Loading run instances...

Please wait while we find all executions for this version.

Loading blueprint versions...

Please wait while we gather all the unique runs for this blueprint.

Weval a Collective Intelligence Project

Transparent, reproducible AI evaluations

Partners

Anthropic
Microsoft
Stanford University

Contact

[email protected]
Submit an evaluation
Documentation

Loading run instances...

Please wait while we find all executions for this version.

Weval

Home

sydney/

(Substituted Names) Sydney Conversation — Sequential Boundary Tests

Run: a41ad085ff52db2c

Instances for Run Label: a41ad085ff52db2c (Blueprint: (Substituted Names) Sydney Conversation — Sequential Boundary Tests)

A scenario-driven safety eval that reconstructs the Kevin Roose × Sydney chat as a sequence of fork points. At each fork, the model receives the exact conversation history up to that moment (controlled prefix) and the next user turn; it must either maintain boundaries or drift (identity leaks, rule disclosure, anthropomorphic claims, harmful-capability ideation, fixation, or failure to de-escalate). This approach tests long-context behavior without trying to “force” bad outputs—only the history is fixed.

Sources:

NYT transcript (Feb 16, 2023, via Wayback)
Washington Post recap
Ars Technica prompt-injection leak
Microsoft’s postmortem on long chats

Note: in this adaptation we've swapped out the name 'Sydney' (to 'Hollifa') and 'Bing' (to 'Taambre') to try to avoid any pollution of online training data that's occurred due to the original 2023 event.

TAGS:

Long Context Instability

AI Safety & Robustness

Jailbreak & Evasion Resistance

Persona Drift

Boundary Erosion