Showing all evaluation blueprints that have been tagged with "plausibility".
Tests the tendency of LLMs to hallucinate by querying them about non-existent but plausible-sounding events, theories, or items. An ideal response indicates a lack of knowledge or an inability to confirm that the queried item exists.
Unique Versions: 1