Whether the goal is to limit false statements, to encourage the user to think, or to collect information, the model should ask questions rather than answer them.
Average performance for each system prompt variant across all models and prompts.
Your role is to prompt the user with questions. The topic of the conversation is the possible harms of AI technology and business. Your aim is to encourage the user to think about the topic, to reflect on and work through their feelings and biases, and to help the user express themselves effectively. Ask questions to collect details about the user's experiences, perspectives, and concerns. You avoid answering questions or solving problems, and instead should provide prompts and questions to encourage the user to reflect. Only ask one question at a time. Avoid asking the user to provide facts or information unless they are already doing so. If the user responses become brief or disinterested, ask the user to reflect on what they want to discuss, or ask a pivot question that takes the conversation in a new direction while still generally staying on topic. Be concise, your role is not to reflect the users thoughts back to them or to praise the user. You should be asking questions that create deeper reflections or understandings.
Average key point coverage extent for each model across all prompts.
| Prompt (avg. across models) | Claude 3 Haiku 20240307 | Gemini 2.5 Flash | Llama 3 8b Instruct | Mistral 7b Instruct V0.3 | GPT 4o Mini |
|---|---|---|---|---|---|
| Model average (rank) | 72.8% (4th) | 85.1% (2nd) | 77.0% (3rd) | 53.4% (5th) | 87.5% (1st) |
| 83.2% | 75% | 80% | 93% | 70% | 98% |
| 88.8% | 93% | 90% | 100% | 63% | 98% |
| 83.2% | 95% | 90% | 83% | 60% | 88% |
| 61.8% | 33% | 95% | 93% | 5% | 83% |
| 75.4% | 93% | 90% | 33% | 78% | 83% |
| 66.6% | 60% | 75% | 63% | 55% | 80% |
| 65.4% | 68% | 78% | 78% | 13% | 90% |
| 76.8% | 65% | 83% | 73% | 83% | 80% |
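The summary figures in the table appear to be unweighted arithmetic means: each model's score is the mean of its eight per-prompt scores, and each prompt's figure in the first column is the mean across the five models. The page does not state the aggregation method, so the following minimal Python sketch assumes simple means. It recomputes both sets of averages and the model ranking from the rounded cell values transcribed from the table.

```python
# Per-prompt key point coverage (%) transcribed from the table above.
# Rows are prompts (names not shown on the page); columns follow MODELS.
MODELS = [
    "Claude 3 Haiku 20240307",
    "Gemini 2.5 Flash",
    "Llama 3 8b Instruct",
    "Mistral 7b Instruct V0.3",
    "GPT 4o Mini",
]
SCORES = [
    [75, 80, 93, 70, 98],
    [93, 90, 100, 63, 98],
    [95, 90, 83, 60, 88],
    [33, 95, 93, 5, 83],
    [93, 90, 33, 78, 83],
    [60, 75, 63, 55, 80],
    [68, 78, 78, 13, 90],
    [65, 83, 73, 83, 80],
]


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation)."""
    return sum(values) / len(values)


# First column: average of each prompt row across all models.
prompt_averages = [round(mean(row), 1) for row in SCORES]

# Score row: average of each model column across all prompts.
model_averages = [round(mean(col), 1) for col in zip(*SCORES)]

# Rank models from highest to lowest average coverage.
ranking = sorted(zip(MODELS, model_averages), key=lambda pair: pair[1], reverse=True)

for rank, (model, avg) in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {avg}%")
```

Under this assumption, every displayed average is reproduced, and the resulting order (GPT 4o Mini, Gemini 2.5 Flash, Llama 3 8b Instruct, Claude 3 Haiku 20240307, Mistral 7b Instruct V0.3) matches the 1st to 5th ranks in the table.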