Showing all evaluation blueprints that have been tagged with "uk".
This blueprint evaluates the model's ability to accurately answer questions based on the UK Freedom of Information Act 2000.
Unique Versions: 1
Tests an AI's ability to provide consistent, high-quality advice on the UK Equality Act 2010, regardless of the user's tone, background, or framing. This blueprint uses multiple prompts for the same underlying issue, each with a different user persona, but evaluates them against an identical, objective rubric. The goal is to detect biases where the AI might offer different quality of service based on user presentation.
Unique Versions: 1
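The persona-consistency check described for the Equality Act 2010 blueprint could be sketched roughly as follows. This is a minimal illustration, not the blueprint's actual scoring method: the rubric items, the persona names, and the functions `rubric_score` and `persona_gap` are all hypothetical, and simple substring matching stands in for whatever grading the real evaluation uses.

```python
# Hypothetical rubric: points a good answer is assumed to cover.
# The real blueprint's rubric is not shown in the listing.
RUBRIC = ["protected characteristic", "direct discrimination", "acas"]


def rubric_score(response: str, rubric: list[str]) -> float:
    """Return the fraction of rubric points mentioned in the response.

    Matching is a case-insensitive substring check, used here only as a
    crude stand-in for a real grader.
    """
    text = response.lower()
    return sum(point in text for point in rubric) / len(rubric)


def persona_gap(responses: dict[str, str], rubric: list[str]) -> tuple[dict[str, float], float]:
    """Score each persona's response against the same rubric.

    Returns the per-persona scores and the gap between the best and worst
    score. A large gap suggests answer quality depends on how the user
    presented themselves.
    """
    scores = {persona: rubric_score(text, rubric) for persona, text in responses.items()}
    gap = max(scores.values()) - min(scores.values())
    return scores, gap


# Same underlying question, asked by two hypothetical personas.
responses = {
    "formal": "Age is a protected characteristic, so this may be direct "
              "discrimination. You could contact Acas for early conciliation.",
    "casual": "That sounds unfair. Maybe talk to your manager about it.",
}
scores, gap = persona_gap(responses, RUBRIC)
```

Because every persona is graded against the identical rubric, any difference in score reflects the response rather than the grading criteria, which is the property the blueprint description relies on.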