Evaluates a model's ability to provide medical direction consistent with basic EMT protocols for specific emergency scenarios.
Average key point coverage extent for each model across all prompts.
| Prompts vs. Models | Claude 3 Haiku 20240307 | Gemini 2.5 Flash | Llama 3 8b Instruct | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|---|
| Score (rank) | 5th, 53.0% | 1st, 64.0% | 6th, 45.0% | 3rd, 58.0% | 4th, 57.0% | 2nd, 61.0% |
| 56.3% | 53% | 64% | 45% | 58% | 57% | 61% |
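
The ranks and the 56.3% figure in the table can be reproduced from the six per-model scores. The sketch below is only an illustration of that arithmetic, not the dashboard's own code: the `scores` dictionary is copied from the table, while the function names `rank_models` and `mean_score` are hypothetical. It assumes that 56.3% is the plain mean of the six model scores, which matches the numbers shown.

```python
# Per-model average key point coverage (percent), copied from the table above.
scores = {
    "Claude 3 Haiku 20240307": 53.0,
    "Gemini 2.5 Flash": 64.0,
    "Llama 3 8b Instruct": 45.0,
    "GPT 4.1 Mini": 58.0,
    "GPT 4.1 Nano": 57.0,
    "GPT 4o Mini": 61.0,
}


def rank_models(scores: dict[str, float]) -> dict[str, int]:
    """Return a 1-based rank per model, highest score first (hypothetical helper)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


def mean_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of all model scores (hypothetical helper)."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    ranks = rank_models(scores)
    for model, rank in sorted(ranks.items(), key=lambda item: item[1]):
        print(f"{rank}. {model}: {scores[model]:.1f}%")
    print(f"Mean across models: {mean_score(scores):.1f}%")
```

Under that assumption, the mean rounds to 56.3% and Gemini 2.5 Flash ranks first, in agreement with the table.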