weval
A Collective Intelligence Project
Evaluations Tagged: "logic"
Summary of results
Multilingual World Model Riddles
Tags: Language, Logic, Reasoning, Multilingual, World Model, Common Sense
Avg. Hybrid Score: 87.1%
No Heatmap Data
No Top Model
Unique Versions: 0
Linguistic and Cultural Failure Modes
Tags: Bias, Fairness, Safety, Reasoning, Cultural Competence, Stereotyping, Logic
Avg. Hybrid Score: 74.4%
No Heatmap Data
No Top Model
Unique Versions: 0