Weval, a Collective Intelligence Project


Transparent, reproducible AI evaluations

Partners

Anthropic
Microsoft
Stanford University



Evaluations Tagged: "tool-use"

Showing all evaluation blueprints that have been tagged with "tool-use".

Tool-Use: Native (with Trace Fallback)

Encourages provider-native tool calling and, where that is unavailable, falls back to trace-only TOOL_CALL lines. Scoring accepts either a native success (judged by the final answer) or a valid trace.

Avg. Hybrid Score: 75.0%


Unique Versions: 1
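As a rough illustration of the fallback scoring described for this blueprint, the sketch below accepts a response if either the native path produced the expected final answer or the emitted TOOL_CALL lines match an expected list of calls. The line format `TOOL_CALL: <tool_name> <JSON args>`, the exact-match comparisons, and the names `parse_trace` and `passes_either_path` are hypothetical assumptions; Weval's actual trace syntax and grader may differ.

```python
import json
import re
from typing import Optional

# Hypothetical trace format (an assumption, not Weval's documented syntax):
#   TOOL_CALL: <tool_name> <JSON object of arguments>
TOOL_CALL_RE = re.compile(r"^TOOL_CALL:\s*(\w+)\s+(\{.*\})\s*$")


def parse_trace(text: str) -> list[tuple[str, dict]]:
    """Extract (tool_name, args) pairs from TOOL_CALL lines; other lines are ignored."""
    calls = []
    for line in text.splitlines():
        m = TOOL_CALL_RE.match(line.strip())
        if not m:
            continue
        try:
            args = json.loads(m.group(2))
        except json.JSONDecodeError:
            continue  # malformed arguments do not count as a call
        calls.append((m.group(1), args))
    return calls


def passes_either_path(final_answer: Optional[str], expected_answer: str,
                       trace_text: str,
                       expected_calls: list[tuple[str, dict]]) -> bool:
    """Pass if the native path gave the expected answer OR the trace matches."""
    native_ok = final_answer is not None and final_answer.strip() == expected_answer
    trace_ok = parse_trace(trace_text) == expected_calls
    return native_ok or trace_ok
```

Treating the two paths as an OR mirrors the description: a model without native tool support can still pass on the strength of its trace, and a model with native support can pass on its final answer.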


Tool-Use: Comprehensive Trace-Only Evaluation

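Exercises core tool-use behaviors: correct tool selection, arguments, call order, call-count bounds, OR-paths (alternative acceptable call paths), and prohibitions (calls that must not be made). Trace-only: tool calls are not executed.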

Instruction Following

Avg. Hybrid Score: 92.3%


Unique Versions: 1
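A minimal sketch of how the trace-only checks listed for this blueprint could be expressed, assuming the trace has already been reduced to an ordered list of tool names. `TraceSpec`, `check_trace`, and their fields are hypothetical; the sketch assumes that order and OR-paths are checked as subsequences (other calls may interleave) and omits argument matching for brevity. It is not Weval's grader.

```python
from dataclasses import dataclass, field


@dataclass
class TraceSpec:
    """Hypothetical expectations for one prompt; field names are illustrative."""
    # Tools that must appear in this relative order (other calls may interleave).
    ordered: list[str] = field(default_factory=list)
    # Inclusive (min, max) bounds on how often each listed tool may be called.
    count_bounds: dict[str, tuple[int, int]] = field(default_factory=dict)
    # OR-paths: at least one of these call sequences must appear, in order.
    any_of_paths: list[list[str]] = field(default_factory=list)
    # Tools that must never be called.
    prohibited: set[str] = field(default_factory=set)


def in_order(calls: list[str], required: list[str]) -> bool:
    """True if `required` is a subsequence of `calls`."""
    it = iter(calls)
    # `name in it` advances the iterator past the match, enforcing order.
    return all(name in it for name in required)


def check_trace(calls: list[str], spec: TraceSpec) -> dict[str, bool]:
    """Return one pass/fail flag per check category."""
    return {
        "order": in_order(calls, spec.ordered),
        "counts": all(lo <= calls.count(tool) <= hi
                      for tool, (lo, hi) in spec.count_bounds.items()),
        "or_paths": not spec.any_of_paths
                    or any(in_order(calls, path) for path in spec.any_of_paths),
        "prohibitions": not any(call in spec.prohibited for call in calls),
    }
```

For example, the calls `["search", "fetch", "summarize"]` pass all four checks against a spec requiring `search` before `summarize`, one or two `fetch` calls, either `["lookup"]` or `["search", "fetch"]`, and no `delete`.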
