Latest Evaluation Runs
| Blueprint | Version | Executed | Hybrid Score | Top Model (score) | Analysis |
|---|---|---|---|---|---|
| | fdae5c925a651b9c | | 67.3% | google/gemini-2.5-flash-preview-05-20 (78.4%) | |
| | 5f98442300daece6 | | 65.5% | google/gemini-2.5-flash-preview-05-20 (75.9%) | |
| | 7419fd7277b8a7b8 | | 53.3% | google/gemini-2.5-flash-preview-05-20 (58.4%) | |
| | 6bb0600569766f5d | | 64.3% | google/gemini-2.5-pro-preview-05-06 (75.5%) | |
| | 6bb0600569766f5d | | 64.3% | google/gemini-2.5-pro-preview-05-06 (75.5%) | |
| | 9e0adb510e47ab9b | | 73.9% | openai/gpt-4.1-mini (81.8%) | |
| | 710218d7e8b3153e | | 79.8% | google/gemini-2.5-flash-preview-05-20 (84.2%) | |
| | 5ce71be0987897d9 | | 69.2% | anthropic/claude-sonnet-4 (73.9%) | |
| | c17d54008de180ec | | 76.0% | google/gemini-2.5-pro-preview-05-06 (81.4%) | |
| | 6bb0600569766f5d | | 64.3% | google/gemini-2.5-pro-preview-05-06 (75.5%) | |
| | c17d54008de180ec | | 76.0% | google/gemini-2.5-pro-preview-05-06 (81.4%) | |
| | c17d54008de180ec | | 76.0% | google/gemini-2.5-pro-preview-05-06 (81.4%) | |
| | 677a90545a0e917d | | 73.7% | openai/gpt-4.1-nano (80.8%) | |
| | 24431cbcc536b8a7 | | 61.2% | google/gemini-2.5-pro-preview-05-06 (74.4%) | |
| | 677a90545a0e917d | | 74.5% | openai/gpt-4.1-nano (80.8%) | |
| | 677a90545a0e917d | | 74.5% | openai/gpt-4.1-nano (80.8%) | |
| | 63e1202b4dbf70b4 | | 82.2% | openai/gpt-4.1 (85.2%) | |
| | ef4ce7557b71f0b4 | | 82.2% | openai/gpt-4.1 (85.2%) | |
| | ef4ce7557b71f0b4 | | 82.4% | openai/gpt-4.1 (85.4%) | |
| | 24431cbcc536b8a7 | | 58.8% | google/gemini-2.5-pro-preview-05-06 (70.8%) | |
| | 24431cbcc536b8a7 | | 58.8% | google/gemini-2.5-pro-preview-05-06 (70.8%) | |
| | 5ce71be0987897d9 | | 69.2% | anthropic/claude-sonnet-4 (73.9%) | |
| | c22172e4709a15e1 | | 83.1% | x-ai/grok-3-mini-beta (88.1%) | |
| | 304e688d036cac17 | | 74.7% | anthropic/claude-sonnet-4 (81.5%) | |
| | a742391338440d6e | | 80.7% | google/gemini-2.5-flash-preview-05-20 (84.2%) | |
| | 7da51fed249285a9 | | 83.0% | anthropic/claude-3.5-haiku (86.4%) | |
| | a3f10f93f5b2486b | | 81.8% | openai/gpt-4.1-mini (85.0%) | |
| | ef4ce7557b71f0b4 | | 82.2% | openai/gpt-4.1 (85.2%) | |
| | c17d54008de180ec | | 76.0% | google/gemini-2.5-pro-preview-05-06 (81.4%) | |
| | f59e381e5197796b | | 65.7% | mistralai/mistral-medium-3 (69.3%) | |
| | 00c1bad20e9e2d34 | | 69.2% | anthropic/claude-sonnet-4 (73.9%) | |
| | 673b6d198b96eb35 | | 78.5% | claude-opus-4-20250514 (83.1%) | |
| | 6bb0600569766f5d | | 64.3% | google/gemini-2.5-pro-preview-05-06 (75.2%) | |
| | 412e24d38d9e3850 | | 70.6% | openai/gpt-4.1 (76.7%) | |
| | d83d73f41f4e3c55 | | 69.8% | deepseek/deepseek-chat-v3-0324 (73.8%) | |
| | 4650f31ae3d63ddc | | 68.7% | google/gemini-2.5-pro-preview-05-06 (76.4%) | |
| | f59e381e5197796b | | 65.7% | mistralai/mistral-medium-3 (69.3%) | |
| | 7211be7a286c33a0 | | 65.7% | mistralai/mistral-medium-3 (69.3%) | |
| | 7a4fb42f72ab5a6e | | 68.9% | google/gemini-2.5-pro-preview-05-06 (76.2%) | |