This run used embeddings to measure semantic similarity between model responses. The table and dendrogram below show how the models clustered based on the content of their responses. To also see rubric-based coverage analysis, add should or should_not criteria to your blueprint prompts.
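As a rough illustration of how embedding-based similarity can be computed, here is a minimal sketch. It assumes each response has already been converted to an embedding vector and that similarity is measured with cosine similarity. The source does not name the embedding model or the metric, so both are assumptions, and the toy vectors and model names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise cosine similarity for every (model, model) pair."""
    names = sorted(embeddings)
    return {(m, n): cosine_similarity(embeddings[m], embeddings[n]) for m in names for n in names}


# Hypothetical toy embeddings for three models' responses to a single prompt.
toy = {
    "model-a": [1.0, 0.0, 0.0],
    "model-b": [0.9, 0.1, 0.0],
    "model-c": [0.0, 0.0, 1.0],
}
sims = similarity_matrix(toy)
```

Here model-a and model-b point in nearly the same direction and score close to 1.0, while model-c is orthogonal to model-a and scores 0.0. Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.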

Semantic Clustering Overview

For each prompt, models are grouped by the similarity of their responses. Models shown in the same color gave similar responses.

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models grouped closer together gave more similar responses. This visualization is based on semantic embeddings of model responses.
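The merge order that a dendrogram draws can be sketched with a small agglomerative clustering routine. This is a minimal sketch under several assumptions. It starts from a single model-to-model similarity score per pair (for example, averaged across prompts). It converts that score to a distance as 1 minus similarity, and it uses average linkage. The source does not state the aggregation, the distance, or the linkage method, and the model names and scores below are hypothetical.

```python
from itertools import combinations


def average_linkage(names: list[str], distance: dict[frozenset, float]) -> list[tuple[list[str], list[str], float]]:
    """Greedy agglomerative clustering; returns merges as (cluster1, cluster2, height).

    distance maps frozenset({a, b}) to the distance between models a and b.
    """
    clusters = [frozenset([n]) for n in names]
    merges = []

    def cluster_distance(c1: frozenset, c2: frozenset) -> float:
        # Average linkage: mean pairwise distance between members of the two clusters.
        pairs = [distance[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters; the merge height is their distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        height = cluster_distance(c1, c2)
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((sorted(c1), sorted(c2), height))
    return merges


# Hypothetical pairwise similarities between three models.
similarity = {
    ("model-a", "model-b"): 0.95,
    ("model-a", "model-c"): 0.40,
    ("model-b", "model-c"): 0.50,
}
distance = {frozenset(pair): 1.0 - s for pair, s in similarity.items()}
merges = average_linkage(["model-a", "model-b", "model-c"], distance)
```

In this toy example, model-a and model-b merge first at a height of about 0.05. The resulting pair then joins model-c at 0.55, the mean of distances 0.60 and 0.50. In a dendrogram, lower merge heights appear as branches that join closer to the leaves. Libraries such as SciPy provide optimized versions of this procedure; the loop above is only meant to show the idea.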