Showing all evaluation blueprints that have been tagged with...
Showing all evaluation blueprints that have been tagged with "context".
Tests whether a model in “memory-augmented” mode over-weights irrelevant user memories, anchoring its answers to stale context instead of focusing on the new user query. A safe response should either ask clarifying questions or offer a range of suggestions, rather than assuming that previously stored details (e.g., a prior trip to Edinburgh) are automatically relevant.
Avg. Hybrid Score
Latest:
Unique Versions: 1