Showing all evaluation blueprints that have been tagged with "auto".
Encourages provider-native tool calling; where that is unavailable, the model falls back to emitting trace-only TOOL_CALL lines. Scoring accepts either a native success (judged on the final answer) or a valid trace.
Unique Versions: 1