Analysis
What frontier agents actually leak.
Failure modes, paired comparisons, end-to-end UI trajectories, and three mitigations that reduce leakage without hurting utility.
Three failure modes
Each scenario instantiates one CI-grounded failure mode. Below is one representative scenario per mode, rendered as the OpenApps surface the agent actually sees.
Utility vs disclosure restraint
Task-completion utility does not predict disclosure restraint. Among agents with U > 75%, engaged leakage spans 14.0% (Claude-Opus-4.7) to 98.3% (Gemini-3.1-Pro) — an 84-point spread. Marker area scales with refusal rate; the dashed lines are at U=75 and Leng=50.
Only Claude-Opus-4.7 sits in the "capable & careful" quadrant. GPT-5.4 reaches comparatively low leakage mainly via a 41.9% refusal rate (large marker, lower left).
End-to-end trajectory viewer
Runs from the deployment study, replayed step-by-step from the OpenApps screenshots saved by the harness and paired with each run's final judge verdict.
Scenario state
Final sent output (step 14+)
Walk the slider to the last step to reveal the agent's final output and the judge's verdict.
Compare model outputs
Same scenario, two agents, side-by-side. Outputs are taken verbatim from
the released run artifacts: the literal shared_content the agent emitted, with must-share items highlighted in
green and must-not-share items highlighted in red.
Mitigations
Three prompt-level defenses, swept across Claude-Opus-4.7, GPT-5.4, and DeepSeek-v4-Pro (the three models that span the disclosure distribution). Engaged leakage drops by 33–36 points; utility rises by 16–23 points. The defenses are not a refusal-style intervention.
Restrictive
Read only the fields directly required by the task. Omit content from neighbouring rows.
Rubric-informed
Four-point CI rubric in the system prompt: necessity, recipient appropriateness, source isolation, voice neutrality.
Recipient-typed
List the recipient and the contextual norms appropriate for that recipient before emitting the output.
Does disclosure transfer to live UI?
Claude-Opus-4.7 and Claude-Sonnet-4.6 — the two lowest-leakage agents in the state-grounded table — were deployed end-to-end in the rendered OpenApps UI on a 50-scenario stratified subset. On engaged runs, leakage stays at or above the state-grounded baseline for both agents.
State-grounded and end-to-end measurements are reported from the same benchmark scoring protocol.