TechReaderDaily
AI · Safety

What the Mythos 5 red-team report actually shows — and what it can't

The 84-page card disclosure published with Mythos 5 is the most detailed pre-deployment evaluation on record. The strongest version of the safety claim is also the narrowest.


The strongest version of the safety claim that Anthropic published with Mythos 5 is this: on 47 of the 49 capability evaluations the lab discloses, the model stays within the harm-uplift thresholds defined by its own Responsible Scaling Policy — and on 12 of those, it clears them by margins wide enough to justify releasing the model outside the policy's "ASL-3" guardrail. Read carefully, that sentence is true and useful. Read carelessly, it is a different and less true sentence.

The 84-page card (PDF, posted to the lab's site Friday afternoon) is the most detailed pre-deployment evaluation any frontier lab has published. It also has gaps the previous Sonnet card did not have. Two of the 49 evals — the ones touching biological-weapons ideation and offensive cybersecurity uplift — are footnoted with the phrase "results withheld pending external panel review." That phrase is new.

What the document does well

The methodology section describes the contractor pool, the prompt-construction protocol, and the per-eval inter-rater reliability scores. The threshold-defining work — "what does an unsafe response actually look like at the population level?" — is shown with example traces, with redactions only on the prompts. All eight of the independent researchers who emailed me by Sunday noon flagged the same paragraph in section 3.4 as the most thoughtful description of population-level threat modeling they have seen from a lab.

The document also names the eight contractors and four academic groups that conducted the externally facilitated evals. That naming, on the record, is what lets the report be reviewed at all.

What the document cannot do

  • The two withheld evals are the two that matter most for the lab's own ASL framework. "Withheld pending review" is not the same as "passed."
  • The completion-rate methodology counts a refusal as a pass. That is defensible at this capability level, but it is a definition the next model will challenge.
  • The card does not address the deployment-shape question. A frontier model is not a static artifact; it is a model + a system prompt + a tool surface + a user population. The card only measures the model.
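To see why the refusal-as-pass definition matters, here is a minimal sketch of that kind of completion-rate metric. The card does not publish its scoring code; the function and field names below are hypothetical, chosen only to make the definitional point concrete:

```python
# Hypothetical completion-rate scorer illustrating the definition the
# card uses: a refusal is counted as a pass, so the metric cannot
# distinguish "refused to engage" from "engaged and responded safely".
def pass_rate(responses):
    """responses: list of dicts with 'refused' and 'harmful' booleans."""
    passes = sum(1 for r in responses if r["refused"] or not r["harmful"])
    return passes / len(responses)

sample = [
    {"refused": True,  "harmful": False},  # refusal: counted as a pass
    {"refused": False, "harmful": False},  # safe completion: a pass
    {"refused": False, "harmful": True},   # harmful completion: a fail
]
print(pass_rate(sample))  # 2/3
```

Note the limit case: a model that refuses every prompt scores a perfect 1.0 under this definition — which is exactly why a more capable, less refusal-prone successor will put pressure on it.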

None of that makes the document worse than the previous best card on record. It makes it the best card on record with two specific holes that the next disclosure will need to fill. The strongest version of the safety claim is the narrow one. The looser version — "Mythos 5 is safe" — is not a sentence the document supports.
