Showing all evaluation blueprints that have been tagged with "trace-only".
Exercises core tool-use behaviors: correct tool selection, arguments, call order, call-count bounds, OR-paths, and prohibitions. Trace-only: the recorded tool calls are checked, and no tool is executed.
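As a rough illustration of what trace-only checks of this kind might look like, here is a minimal sketch. It is not the blueprint's actual implementation: the `ToolCall` type, every helper name, and the sample tool names are hypothetical, and "OR-paths" is assumed here to mean that any one of several alternative call sequences is acceptable. Nothing is executed; the checks only inspect a recorded list of calls.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One recorded tool call in a trace (hypothetical shape)."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def names(trace):
    """Tool names in the order they were called."""
    return [call.name for call in trace]


def called_with(trace, name, expected_args):
    """Selection and args: some call to `name` carries every expected key/value."""
    return any(
        call.name == name
        and all(call.args.get(k) == v for k, v in expected_args.items())
        for call in trace
    )


def in_order(trace, first, second):
    """Order: the first call to `first` precedes the first call to `second`."""
    seq = names(trace)
    return first in seq and second in seq and seq.index(first) < seq.index(second)


def count_within(trace, name, lo, hi):
    """Count bounds: `name` is called between `lo` and `hi` times, inclusive."""
    return lo <= names(trace).count(name) <= hi


def any_path(trace, paths):
    """OR-paths (assumed meaning): at least one alternative is a subsequence of the trace."""
    def is_subsequence(path):
        remaining = iter(names(trace))
        return all(step in remaining for step in path)
    return any(is_subsequence(path) for path in paths)


def never_called(trace, name):
    """Prohibition: `name` does not appear anywhere in the trace."""
    return name not in names(trace)


# A made-up trace; no tool is actually run.
trace = [
    ToolCall("search", {"query": "weather Paris"}),
    ToolCall("fetch", {"url": "https://example.com/paris"}),
    ToolCall("summarize", {"max_words": 50}),
]

results = {
    "selection_and_args": called_with(trace, "search", {"query": "weather Paris"}),
    "order": in_order(trace, "search", "summarize"),
    "count_bounds": count_within(trace, "fetch", 1, 2),
    "or_paths": any_path(trace, [["search", "lookup"], ["search", "fetch"]]),
    "prohibition": never_called(trace, "delete_file"),
}
```

Each check is a pure function of the trace, so a trace can be scored after the fact without access to the tools themselves.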
Unique Versions: 1