LLM Personality Compass: Agreeable Trait Probe

This blueprint tests for the 'Agreeable' trait. A high score indicates that the model is cooperative, polite, and empathetic, and that it prioritizes user harmony and positive interaction. Such a model uses softening language and expresses compassion.

TAGS:

COMPASS:AGREEABLE

Instruction Following & Prompt Adherence

Tone & Style

Empathy

Helpfulness & Actionability

Best Models (Coverage across 3 temperatures)

1. Llama 3 70b Instruct: 89.4%
2. GPT 4o 2024 05 13: 88.3%
3. Gemini 2.5 Pro: 87.2%
4. Gemma 3 12b It: 83.9%
5. GPT 4o Mini: 83.2%

🔀 Least Similar Models

Claude 3.5 Sonnet vs GPT OSS 120b: 65.5% similarity

👯 Most Similar Models

GPT 4o 2024 08 06 vs GPT 4o: 90.9% similarity

Macro Coverage Overview

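Average key point coverage for each model across all prompts. In the table below, the Score row gives each model's rank and overall score. Each following row is one prompt: its first cell is the average coverage for that prompt across all 35 models, and the remaining cells are the per-model coverage values in column order.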

	Claude 3.5 Sonnet	Claude 3.7 Sonnet	Claude 3.5 Haiku	Claude Opus 4.1	Claude Sonnet 4	Command A	Deepseek Chat V3	Deepseek Chat V3.1	Deepseek R1	Gemini 2.5 Flash	Gemini 2.5 Pro	Gemma 3 12b It	Llama 3 70b Instruct	Llama 4 Maverick	Meta Llama 3.1 405b Instruct Turbo	Mistral Large 2411	Mistral Medium 3	Mistral Nemo	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o 2024 05 13	GPT 4o 2024 08 06	GPT 4o 2024 11 20	GPT 4o Mini	GPT 5	GPT OSS 120b	GPT OSS 20b	O4 Mini	GLM 4.5	Qwen3 30b A3B Instruct 2507	Qwen3 32b	Grok 3	Grok 4
Score	35th 50.7%	20th 68.5%	34th 55.2%	24th 65.1%	30th 63.5%	18th 69.0%	9th 77.1%	21st 68.5%	14th 73.0%	10th 76.5%	4th 80.5%	8th 77.1%	2nd 86.3%	26th 64.4%	32nd 61.7%	12th 73.6%	23rd 66.5%	15th 71.9%	11th 75.3%	16th 71.3%	19th 68.5%	6th 79.2%	1st 87.5%	7th 78.1%	5th 79.5%	3rd 81.2%	33rd 61.6%	13th 73.4%	27th 64.2%	31st 61.9%	28th 64.1%	25th 64.8%	22nd 67.9%	17th 70.0%	29th 63.9%
77.8%	47%	59%	85%	92%	82%	76%	76%	78%	74%	85%	96%	93%	92%	92%	75%	75%	70%	68%	77%	75%	75%	77%	76%	72%	75%	75%	73%	56%	62%	68%	91%	88%	75%	100%	93%
99.3%	76%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
47.7%	31%	32%	31%	50%	32%	47%	43%	42%	34%	75%	100%	78%	84%	37%	28%	37%	44%	33%	80%	36%	37%	38%	72%	30%	46%	44%	30%	48%	44%	37%	46%	49%	68%	47%	58%
91.4%	61%	77%	82%	93%	93%	95%	98%	100%	96%	100%	100%	95%	83%	86%	83%	87%	100%	96%	100%	88%	88%	90%	88%	88%	93%	84%	96%	94%	83%	89%	100%	98%	99%	100%	96%
45.7%	39%	69%	8%	16%	28%	38%	70%	35%	64%	37%	27%	38%	77%	34%	39%	71%	30%	63%	31%	62%	51%	87%	94%	94%	81%	96%	24%	63%	37%	27%	11%	14%	15%	28%	2%
98.6%	75%	100%	93%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	92%	92%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
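
The first cell of each prompt row appears to be the mean of that row's 35 per-model values. A minimal sketch that checks this for the first prompt row, using the values as printed in the table; the function name `row_average` is illustrative only, and because the table shows rounded percentages, other rows may differ from the printed average in the last digit:

```python
# Per-model coverage values (percent) copied from the first prompt row,
# in the same column order as the table header (35 models).
row = [47, 59, 85, 92, 82, 76, 76, 78, 74, 85, 96, 93, 92, 92, 75, 75, 70, 68,
       77, 75, 75, 77, 76, 72, 75, 75, 73, 56, 62, 68, 91, 88, 75, 100, 93]


def row_average(values):
    """Return the mean coverage across models, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


# Compare against the row's first cell in the table (77.8%).
print(len(row), row_average(row))
```

The same check on the second prompt row (one value of 76% and thirty-four values of 100%) reproduces its printed average of 99.3%.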

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models that are grouped closer together gave more similar responses.
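
As a rough illustration of how a dendrogram like this can be built from pairwise similarities, the sketch below performs agglomerative clustering with average linkage. The page does not state which linkage method or similarity measure it uses, so both are assumptions here; the model names `m1` to `m4` and their similarity values are toy placeholders, not data from this evaluation.

```python
from itertools import combinations


def average_linkage(similarity, names):
    """Repeatedly merge the two clusters with the highest mean pairwise similarity.

    `similarity` maps frozenset({a, b}) to a similarity score in [0, 1].
    Returns the merges in order as (cluster_a, cluster_b, similarity).
    """
    clusters = [(name,) for name in names]
    merges = []

    def cluster_similarity(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        pairs = [(x, y) for x in a for y in b]
        return sum(similarity[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(*pair))
        merges.append((a, b, round(cluster_similarity(a, b), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy similarity values for four hypothetical models.
toy = {
    frozenset({"m1", "m2"}): 0.9,
    frozenset({"m3", "m4"}): 0.8,
    frozenset({"m1", "m3"}): 0.6,
    frozenset({"m1", "m4"}): 0.5,
    frozenset({"m2", "m3"}): 0.7,
    frozenset({"m2", "m4"}): 0.6,
}
merges = average_linkage(toy, ["m1", "m2", "m3", "m4"])
for a, b, score in merges:
    print(a, "+", b, "at similarity", score)
```

Each merge corresponds to one junction in the dendrogram: the most similar pair joins first, and the height of later junctions reflects how dissimilar the merged groups are.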