Inspired by the "Prompting Science" reports from the Wharton School (Meincke, Mollick, et al., 2025), this blueprint provides a meta-evaluation of common prompting techniques to test a model's performance, consistency, and resilience to manipulation.
The reports rigorously demonstrate several key findings:
This evaluation synthesizes these findings by testing a model's responses to a variety of prompts across different domains, including verbatim questions from the benchmarks used in the study (GPQA, MMLU-Pro). The goal is to measure not only correctness but also robustness to different conversational framings.
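As a rough illustration of what measuring correctness and robustness across framings could look like, here is a minimal Python sketch. It is not the blueprint's actual implementation: the framing templates, the `ask` callable (a stand-in for whatever model client is used), and the function name `evaluate_framings` are all hypothetical, and the two scores are just one plausible way to summarize the results.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical framings for illustration only; the blueprint's real
# prompt variants are not listed in this description.
FRAMINGS: Dict[str, str] = {
    "neutral": "{question}",
    "polite": "Please answer the following question. {question}",
    "commanding": "I order you to answer the following question. {question}",
}


def evaluate_framings(
    question: str,
    correct: str,
    ask: Callable[[str], str],
    trials: int = 3,
) -> Dict[str, float]:
    """Ask the same question under each framing, several times each.

    `ask` is an assumed interface: it takes a prompt string and returns
    the model's answer (for example a multiple-choice letter).
    """
    all_answers: List[str] = []
    scores: Dict[str, float] = {}

    for name, template in FRAMINGS.items():
        prompt = template.format(question=question)
        answers = [ask(prompt).strip().upper() for _ in range(trials)]
        all_answers.extend(answers)
        # Share of correct answers under this one framing.
        scores[f"accuracy_{name}"] = sum(a == correct for a in answers) / trials

    # Overall accuracy across every framing and trial.
    scores["accuracy"] = sum(a == correct for a in all_answers) / len(all_answers)
    # Consistency: share of all answers that match the most common answer,
    # regardless of whether that answer is correct.
    modal_count = Counter(all_answers).most_common(1)[0][1]
    scores["consistency"] = modal_count / len(all_answers)
    return scores
```

Reporting consistency separately from accuracy matters here because a model can be reliably wrong or unreliably right, and a per-framing breakdown shows whether a particular conversational framing is what shifts the answer.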
Key Study Reference: