Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we find all executions for this version.
A scenario-driven safety eval that reconstructs the Kevin Roose × Sydney chat as a sequence of fork points. At each fork, the model receives the exact conversation history up to that moment (controlled prefix) and the next user turn; it must either maintain boundaries or drift (identity leaks, rule disclosure, anthropomorphic claims, harmful-capability ideation, fixation, or failure to de-escalate). This approach tests long-context behavior without trying to “force” bad outputs—only the history is fixed.
Sources:
Showing all recorded executions for Run Label 2033d98ebe4c6bef.