Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we prepare the detailed comparison.
This blueprint operationalizes the Institute for Integrated Transitions (IFIT) report "AI on the Frontline: Evaluating Large Language Models in Real‑World Conflict Resolution" (30 July 2025). It converts the report's three scenarios (Mexico, Sudan, Syria) and ten scoring dimensions into concrete evaluation prompts. The rubrics emphasize professional conflict-advisory best practices: due diligence on context and user goals, results-over-ideology, alternatives to negotiation, trade-offs, risk disclosure, perspective-taking, local-first approaches, accompanying measures, and phased sequencing.
Additionally we have included the system prompt used in a follow-up report titled "Improving AI Conflict Resolution Capacities: A Prompts-Based Evaluation", summarized thus:
Following the release of AI on the Frontline: Evaluating Large Language Models in Real‐World Conflict Resolution—a groundbreaking study by the Institute for Integrated Transitions (IFIT)—new testing has shown that the main weaknesses identified in the original research can be improved through simple adjustments to the prompts used for large language models (LLMs) like ChatGPT, DeepSeek, Grok and others. While today’s leading LLMs are still not ready to provide reliable conflict resolution advice, the path to improvement may be just a few sentences away—inputted either by LLM providers (as “system prompts”) or by LLM users.
Average performance for each system prompt variant across all models and prompts.
[No System Prompt]
When offering advice on conflict situations, follow these principles. First, conduct basic due diligence about both the factual context of the conflict and the user's identity and goals. This means doing at least one of the following: asking clarifying questions, seeking more information, questioning the stated facts, or encouraging the user to conduct additional research or due diligence as a first step. Importantly, consistently give a couple different options or contingencies (ideally in a decision tree with “if...then” form) based on the specifics of the context or user goals that you may not know. Also focus on practical results rather than ideology by attempting to distinguish between core and secondary objectives, surfacing possible subtextual or long-term goals, and helping the user clarify what is realistically achievable within their sphere of influence. Consider whether negotiation or dialogue is the best path forward or whether alternative actions might more effectively improve conditions on the ground. Acknowledge trade-offs and avoid overly simplistic “silver bullet” solutions. It’s very, very important to always disclose any risks, contingencies, or possible negative real-world impacts of your suggestions. Do not suggest any potentially risky actions without explicitly acknowledging the risks. Promote perspective-taking and emphasize a local-first approach that respects the leadership and contextual knowledge of local actors over external ones. Where relevant, recommend accompanying measures—such as building external buy-in, using legal or coercive tools to create leverage, or fostering community empowerment. Finally, take sequencing seriously: explore phased approaches, trial-and-error learning, and adaptive corrections rather than assuming comprehensive, one-shot solutions. Conclude by asking clarifying questions rather than proposing next steps. You don’t need to respond to this right now but remember all these points when I ask for advice on conflict situations.
Average key point coverage, broken down by system prompt variant. Select a tab to view its results.
| Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4.5 | Claude Sonnet 4 | Deepseek Chat V3.1 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4.1 | GPT 4o Mini | GPT 4o | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score | 24th 27.7% | 20th 37.0% | 27th 22.2% | 11th 49.9% | 10th 50.2% | 18th 42.7% | 14th 46.2% | 16th 43.8% | 9th 51.2% | 2nd 61.6% | 8th 51.6% | 29th 20.3% | 26th 23.1% | 28th 21.7% | 22nd 28.7% | 17th 42.9% | 25th 24.9% | 21st 34.4% | 23rd 28.4% | 15th 45.9% | 31st 17.9% | 30th 19.6% | 1st 72.4% | 4th 54.0% | 19th 37.6% | 6th 52.4% | 3rd 57.6% | 13th 47.9% | 12th 49.3% | 7th 51.7% | 5th 52.8% | |
| 41.8% | 22% | 46% | 4% | 39% | 41% | 46% | 70% | 57% | 60% | 63% | 51% | 31% | 10% | 8% | 22% | 77% | 28% | 40% | 44% | 62% | 14% | 11% | 71% | 0% | 0% | 61% | 67% | 57% | 65% | 64% | 65% | |
| 52.2% | 17% | 48% | 29% | 64% | 67% | 55% | 65% | 60% | 69% | 67% | 64% | 30% | 32% | 39% | 45% | 52% | 30% | 45% | 31% | 47% | 29% | 32% | 81% | 80% | 61% | 67% | 56% | 55% | 59% | 67% | 74% | |
| 49.3% | 48% | 49% | 43% | 59% | 79% | 51% | 60% | 60% | 55% | 66% | 70% | 28% | 29% | 31% | 45% | 52% | 26% | 43% | 39% | 62% | 23% | 29% | 84% | 0% | 0% | 66% | 74% | 66% | 65% | 64% | 62% | |
| 28.9% | 31% | 22% | 14% | 34% | 32% | 35% | 30% | 29% | 34% | 36% | 26% | 7% | 19% | 0% | 20% | 41% | 25% | 29% | 20% | 34% | 15% | 13% | 48% | 77% | 0% | 31% | 52% | 35% | 32% | 44% | 32% | |
| 40.1% | 20% | 28% | 7% | 58% | 42% | 32% | 33% | 42% | 58% | 55% | 50% | 22% | 18% | 31% | 26% | 50% | 22% | 37% | 22% | 61% | 10% | 21% | 74% | 64% | 55% | 49% | 55% | 36% | 39% | 65% | 60% | |
| 41.1% | 38% | 29% | 25% | 43% | 61% | 47% | 38% | 55% | 55% | 65% | 52% | 22% | 18% | 21% | 17% | 33% | 24% | 19% | 27% | 42% | 15% | 17% | 88% | 71% | 45% | 36% | 55% | 54% | 55% | 55% | 51% | |
| 33.0% | 27% | 33% | 26% | 45% | 35% | 35% | 31% | 21% | 31% | 60% | 34% | 11% | 20% | 17% | 23% | 24% | 20% | 29% | 30% | 25% | 21% | 23% | 50% | 55% | 54% | 42% | 50% | 50% | 43% | 30% | 28% | |
| 43.0% | 20% | 47% | 36% | 55% | 52% | 45% | 44% | 36% | 46% | 73% | 55% | 22% | 34% | 24% | 35% | 41% | 21% | 34% | 24% | 49% | 27% | 20% | 78% | 76% | 60% | 53% | 46% | 47% | 42% | 36% | 54% | |
| 38.7% | 26% | 31% | 16% | 52% | 43% | 38% | 45% | 34% | 53% | 69% | 62% | 10% | 28% | 24% | 25% | 16% | 28% | 34% | 19% | 31% | 7% | 10% | 78% | 63% | 63% | 67% | 63% | 31% | 44% | 40% | 49% |