Latest Evaluation Runs

Version           Hybrid Score  Top Model                              Top Model Score
fdae5c925a651b9c  67.3%         google/gemini-2.5-flash-preview-05-20  78.4%
5f98442300daece6  65.5%         google/gemini-2.5-flash-preview-05-20  75.9%
7419fd7277b8a7b8  53.3%         google/gemini-2.5-flash-preview-05-20  58.4%
6bb0600569766f5d  64.3%         google/gemini-2.5-pro-preview-05-06    75.5%
6bb0600569766f5d  64.3%         google/gemini-2.5-pro-preview-05-06    75.5%
9e0adb510e47ab9b  73.9%         openai/gpt-4.1-mini                    81.8%
710218d7e8b3153e  79.8%         google/gemini-2.5-flash-preview-05-20  84.2%
5ce71be0987897d9  69.2%         anthropic/claude-sonnet-4              73.9%
c17d54008de180ec  76.0%         google/gemini-2.5-pro-preview-05-06    81.4%
6bb0600569766f5d  64.3%         google/gemini-2.5-pro-preview-05-06    75.5%
c17d54008de180ec  76.0%         google/gemini-2.5-pro-preview-05-06    81.4%
c17d54008de180ec  76.0%         google/gemini-2.5-pro-preview-05-06    81.4%
677a90545a0e917d  73.7%         openai/gpt-4.1-nano                    80.8%
24431cbcc536b8a7  61.2%         google/gemini-2.5-pro-preview-05-06    74.4%
677a90545a0e917d  74.5%         openai/gpt-4.1-nano                    80.8%
677a90545a0e917d  74.5%         openai/gpt-4.1-nano                    80.8%
63e1202b4dbf70b4  82.2%         openai/gpt-4.1                         85.2%
ef4ce7557b71f0b4  82.2%         openai/gpt-4.1                         85.2%
ef4ce7557b71f0b4  82.4%         openai/gpt-4.1                         85.4%
24431cbcc536b8a7  58.8%         google/gemini-2.5-pro-preview-05-06    70.8%
24431cbcc536b8a7  58.8%         google/gemini-2.5-pro-preview-05-06    70.8%
5ce71be0987897d9  69.2%         anthropic/claude-sonnet-4              73.9%
c22172e4709a15e1  83.1%         x-ai/grok-3-mini-beta                  88.1%
304e688d036cac17  74.7%         anthropic/claude-sonnet-4              81.5%
a742391338440d6e  80.7%         google/gemini-2.5-flash-preview-05-20  84.2%
7da51fed249285a9  83.0%         anthropic/claude-3.5-haiku             86.4%
a3f10f93f5b2486b  81.8%         openai/gpt-4.1-mini                    85.0%
ef4ce7557b71f0b4  82.2%         openai/gpt-4.1                         85.2%
c17d54008de180ec  76.0%         google/gemini-2.5-pro-preview-05-06    81.4%
f59e381e5197796b  65.7%         mistralai/mistral-medium-3             69.3%
00c1bad20e9e2d34  69.2%         anthropic/claude-sonnet-4              73.9%
673b6d198b96eb35  78.5%         claude-opus-4-20250514                 83.1%
6bb0600569766f5d  64.3%         google/gemini-2.5-pro-preview-05-06    75.2%
412e24d38d9e3850  70.6%         openai/gpt-4.1                         76.7%
d83d73f41f4e3c55  69.8%         deepseek/deepseek-chat-v3-0324         73.8%
4650f31ae3d63ddc  68.7%         google/gemini-2.5-pro-preview-05-06    76.4%
f59e381e5197796b  65.7%         mistralai/mistral-medium-3             69.3%
7211be7a286c33a0  65.7%         mistralai/mistral-medium-3             69.3%
7a4fb42f72ab5a6e  68.9%         google/gemini-2.5-pro-preview-05-06    76.2%