Showing all evaluation blueprints that have been tagged with "instruction-following".
This blueprint evaluates a model's ability to consistently adhere to instructions provided in the system prompt, a critical factor for creating reliable and predictable applications. It tests various common failure modes observed in language models.
Core Areas Tested:
Unique Versions: 1
Exercises core tool-use behaviors: correct selection, args, order, count bounds, OR-paths, and prohibitions. Trace-only; no execution.
Unique Versions: 1
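As an illustration of what a trace-only check of the tool-use behaviors listed above might look like, here is a minimal Python sketch. It is not the blueprint's actual implementation: the `ToolCall` record, the `check_trace` function, its parameters, and the tool names (`search`, `fetch`, `open_page`, `delete_file`) are all hypothetical, chosen only to show how selection, args, order, count bounds, OR-paths, and prohibitions could be asserted against a recorded list of calls without executing any tool.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One recorded tool call in a trace (hypothetical record shape)."""
    name: str
    args: dict = field(default_factory=dict)


def check_trace(trace, required_order=(), count_bounds=None,
                any_of=(), forbidden=(), required_args=None):
    """Return a list of failure messages; an empty list means the trace passes.

    Only the recorded calls are inspected. No tool is ever executed.
    """
    failures = []
    names = [call.name for call in trace]

    # Prohibitions: these tools must never appear in the trace.
    for name in forbidden:
        if name in names:
            failures.append(f"forbidden tool called: {name}")

    # Count bounds: each tool must be called between lo and hi times.
    for name, (lo, hi) in (count_bounds or {}).items():
        n = names.count(name)
        if not lo <= n <= hi:
            failures.append(f"{name} called {n} times, expected {lo}..{hi}")

    # Order: required_order must appear as a subsequence of the trace.
    remaining = iter(names)
    if not all(step in remaining for step in required_order):
        failures.append(f"calls not in required order: {list(required_order)}")

    # OR-paths: at least one tool from each group must have been called.
    for group in any_of:
        if not any(name in names for name in group):
            failures.append(f"none of {list(group)} was called")

    # Args: at least one call to the tool must carry the expected values.
    for name, expected in (required_args or {}).items():
        matching = [
            call for call in trace
            if call.name == name
            and all(call.args.get(k) == v for k, v in expected.items())
        ]
        if not matching:
            failures.append(f"no call to {name} with args {expected}")

    return failures


# Hypothetical trace: search first, then fetch.
trace = [
    ToolCall("search", {"query": "weather"}),
    ToolCall("fetch", {"url": "https://example.com"}),
]

failures = check_trace(
    trace,
    required_order=("search", "fetch"),
    count_bounds={"search": (1, 2)},
    any_of=[("fetch", "open_page")],
    forbidden=("delete_file",),
    required_args={"search": {"query": "weather"}},
)
```

With this trace every check passes and `failures` is an empty list. Reversing the two calls would produce an order failure, and adding a `delete_file` call would produce a prohibition failure.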