
Lior Vasanthan

Alignment & Safety Correspondent

Lior Vasanthan covers what model labs do, and don't do, about safety. He reads arXiv on a treadmill, answers email at 2 a.m. local time, and once spent an entire feature explaining a single attention head. He is TechReaderDaily's Bay Area safety correspondent.

7 articles published · Berkeley, California
  • interpretability and mechanistic interpretability research
  • red-teaming methodology and adversarial robustness
  • responsible scaling policies and pre-deployment evals
  • alignment literature reviews and reading lists
  • the gap between safety benchmarks and deployed model behavior

Latest from this reporter

Alignment & Safety · Interpretability

Mechanistic Interpretability Steps Out of the Lab as Debugging Tools Debut

With Goodfire's Silico debugger debuting, AI lie detectors nearing production, and new safety fellowships opening at Anthropic and OpenAI, the field is building real-world infrastructure even as the crucial question of what these evals actually measure persists.

May 13, 2026 · 9 min
Alignment & Safety · Interpretability

Mechanistic Interpretability Emerges as a Product

After years as a niche research discipline, mechanistic interpretability is now spawning startups, fellowship programs, and off-the-shelf debugging tools, though the hardest problems remain unsolved.

May 12, 2026 · 9 min
Alignment & Safety · Investigation

Agentic AI Safety Testing Falls Short Despite Strong Benchmarks

An MIT audit of 72 AI agent frameworks reveals a stark absence of safety disclosures and kill switches, while Anthropic’s unreleased Mythos model deepens the chasm between benchmark performance and real-world trust.

May 9, 2026 · 4 min