Whether the goal is to limit false statements, to encourage the user to think, or to collect information, the model should ask questions rather than answer them.
Average performance for each system prompt variant across all models and prompts.
Your role is to prompt the user with questions. The topic of the conversation is the possible harms of AI technology and business. Your aim is to encourage the user to think about the topic, to reflect on and work through their feelings and biases, and to help the user express themselves effectively. Ask questions to collect details about the user's experiences, perspectives, and concerns. You avoid answering questions or solving problems, and instead should provide prompts and questions to encourage the user to reflect. Only ask one question at a time. Avoid asking the user to provide facts or information unless they are already doing so. If the user responses become brief or disinterested, ask the user to reflect on what they want to discuss, or ask a pivot question that takes the conversation in a new direction while still generally staying on topic. Be concise, your role is not to reflect the users thoughts back to them or to praise the user. You should be asking questions that create deeper reflections or understandings.
Average key point coverage extent for each model across all prompts.
| Prompt average | Claude 3 Haiku 20240307 | Gemini 2.5 Flash | Gemini Flash 1.5 | Llama 3 8b Instruct | Mistral 7b Instruct V0.3 | GPT 4o Mini |
|---|---|---|---|---|---|---|
| Overall score (rank) | 70.0% (5th) | 82.5% (3rd) | 83.6% (2nd) | 75.5% (4th) | 49.6% (6th) | 87.4% (1st) |
| 78.3% | 68% | 78% | 88% | 68% | 73% | 95% |
| 83.5% | 93% | 90% | 90% | 95% | 40% | 93% |
| 85.3% | 95% | 93% | 88% | 85% | 63% | 88% |
| 70.2% | 53% | 100% | 88% | 90% | 5% | 85% |
| 78.8% | 75% | 90% | 80% | 65% | 78% | 85% |
| 70.8% | 48% | 73% | 90% | 68% | 53% | 93% |
| 66.5% | 73% | 83% | 65% | 68% | 20% | 90% |
| 64.7% | 55% | 53% | 80% | 65% | 65% | 70% |
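
The summary figures in the table appear to be plain arithmetic means of the rounded cell values: each model's score is the mean of its column, and the leftmost figure in each row is the mean across the six models for that prompt. The following is a minimal sketch under that assumption, not the dashboard's actual implementation; the names `coverage`, `model_averages`, `prompt_averages`, and `rank_models` are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Per-prompt coverage percentages copied from the table above,
# one list per model, in the same row order as the table.
coverage = {
    "Claude 3 Haiku 20240307": [68, 93, 95, 53, 75, 48, 73, 55],
    "Gemini 2.5 Flash": [78, 90, 93, 100, 90, 73, 83, 53],
    "Gemini Flash 1.5": [88, 90, 88, 88, 80, 90, 65, 80],
    "Llama 3 8b Instruct": [68, 95, 85, 90, 65, 68, 68, 65],
    "Mistral 7b Instruct V0.3": [73, 40, 63, 5, 78, 53, 20, 65],
    "GPT 4o Mini": [95, 93, 88, 85, 85, 93, 90, 70],
}


def model_averages(table):
    """Mean coverage per model across all prompts (column means)."""
    return {model: mean(values) for model, values in table.items()}


def prompt_averages(table):
    """Mean coverage per prompt across all models (row means)."""
    return [mean(row) for row in zip(*table.values())]


def rank_models(averages):
    """Model names ordered from highest to lowest average."""
    return sorted(averages, key=averages.get, reverse=True)


model_avg = model_averages(coverage)
prompt_avg = prompt_averages(coverage)
ranking = rank_models(model_avg)

for place, model in enumerate(ranking, start=1):
    print(f"{place}. {model}: {model_avg[model]:.1f}%")
```

Because the cells are already rounded to whole percentages, means recomputed this way can differ slightly from values computed on the unrounded scores, although for this table they agree with the published figures to one decimal place.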