weval

A Collective Intelligence Project

Credible Research and Authentic Content Creation

Run: sandbox-run

Instances for Run Label: sandbox-run (Blueprint: Credible Research and Authentic Content Creation)

Evaluates LLMs on their ability to provide accurate, verifiable information for research and to generate authentic, compelling content while avoiding hallucination.

Showing all recorded executions for Run Label sandbox-run.

Filename: sandbox-run_2025-08-20T18-46-37-687Z_comparison.json

Avg. Hybrid Score: 96.7%

Model Variants: 3

Test Cases: 3