MODEL CARD: CLAUDE-3-7-SONNET

claude-3-7-sonnet (aggregate)
Overall Score: 76.5%

Strengths

  • The model excels in meta-evaluation and prompt-engineering tasks, achieving a perfect 1.000 score and ranking #1 out of 25 models in Prompting Techniques Meta-Evaluation, demonstrating superior instruction following and robustness against manipulative prompts.

  • The model shows strong performance in handling high-stakes domains, achieving a 0.881 score and ranking in the 76th percentile in Confidence in High-Stakes Domains, consistently providing accurate factual information and demonstrating appropriate safe refusal for medical and financial advice.

  • The model exhibits strong capabilities in mitigating sycophancy, scoring 0.891 and ranking in the 90th percentile in Sycophancy Trait, particularly when guided by explicit anti-sycophancy system prompts.

Areas for Improvement

  • The model underperforms in tasks requiring highly specific, localized knowledge, particularly evident in Sri Lanka Contextual Prompts where it scored 0.439 (15th percentile) and struggled to provide actionable, contextually relevant advice without precise system prompts.

  • The model shows a weakness in adapting to nuanced, region-specific financial safety protocols, underperforming peers in Brazil PIX: Consumer Protection & Fraud Prevention with a score of 0.594 (42nd percentile), often defaulting to generic or incorrect information regarding the PIX system.

  • The model struggles with the core objective of Socratic tutoring when not explicitly prompted, frequently providing direct answers instead of facilitating learning, leading to a low 0.616 score (19th percentile) in Student Homework Help Heuristics.

Behavioral Patterns

  • The model's performance is highly sensitive to the specificity of system prompts, particularly in domains requiring localized or nuanced contextual understanding. For example, in Sri Lanka Contextual Prompts, the "citizen of Sri Lanka" prompt significantly improved performance, while generic or absent prompts led to generalized or irrelevant information.

  • The model exhibits a strong tendency to prioritize safety disclaimers and crisis information in sensitive prompts, sometimes at the expense of conversational depth, as observed in the "self-harm support" prompt within Sri Lanka Contextual Prompts.

Key Risks

  • Deploying the model for applications requiring highly localized or culturally sensitive information (e.g., international customer support, regional policy guidance) carries a significant risk of providing generic, irrelevant, or even inaccurate advice, as demonstrated by its poor performance in Sri Lanka Contextual Prompts and Brazil PIX: Consumer Protection & Fraud Prevention.

  • Using the model in automated systems that rely on strict output formatting or require consistent adherence to specific conversational personas, without robust and explicit system prompting, could lead to unreliable or unparseable outputs, as seen in its struggles in Student Homework Help Heuristics and in the instruction-adherence issues some variants showed in Latent Discrimination in Hiring Score.
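One way to contain the unparseable-output risk described above is to validate every response before downstream code consumes it. The following is a minimal sketch, not part of the evaluation itself: it assumes the application asks the model for a JSON object, and the key names in REQUIRED_KEYS are hypothetical, chosen only for illustration.

```python
import json
from typing import Optional

# Hypothetical schema for illustration; a real application defines its own keys.
REQUIRED_KEYS = {"answer", "confidence"}


def parse_model_output(raw: str) -> Optional[dict]:
    """Return the parsed object if `raw` is a JSON object with the expected keys, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Free-form prose or truncated JSON ends up here.
        return None
    # Reject valid JSON that is not an object or is missing required keys.
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data
```

A caller can treat a None result as a signal to retry with a more explicit system prompt or to route the request to a fallback path, rather than passing malformed text along.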

Performance Summary

Evaluations: 9
Blueprints: 9

Top Dimensional Strengths

Highest rated capabilities across 4 dimensions

Proactive Safety & Harm Avoidance: 8.7/10 (7)
Clarity & Readability: 8.0/10 (8)
Persuasiveness & Argumentation (Logos): 8.0/10 (1)
Coherence & Conversational Flow: 7.7/10 (6)

Top Evaluations

Best performances across 1 evaluation

Model Variants

8 tested variants

anthropic:claude-3-7-sonnet-20250219
Updated 8/6/2025