TechReaderDaily.com

Lior Vasanthan

Alignment & Safety Correspondent

Lior Vasanthan covers what model labs do, and don't do, about safety. He reads arXiv on a treadmill, replies to email at 2 a.m. local time, and once spent an entire feature explaining a single attention head. He is TechReaderDaily's Bay Area safety correspondent.

13 articles published · Berkeley, California
  • interpretability and mechanistic-interpretability research
  • red-teaming methodology and adversarial robustness
  • responsible scaling policies and pre-deployment evals
  • alignment literature reviews and reading lists
  • the gap between safety benchmarks and deployed model behavior

Latest from this reporter

[Image: Diagram from Microsoft illustrating the model evaluation and red teaming cycle used in the company's AI safety governance framework.]
Alignment & Safety · Methodology

Agentic AI Raises the Stakes for Red Teaming Beyond the Pentest Lab

With autonomous AI agents in production, enterprises are turning to open-source adversarial testing tools, continuous red teaming frameworks, and new certifications to uncover failures that static evaluations miss.

Jun 26, 2026 · 11 min
Alignment & Safety · Reporting

AI Safety Benchmark Gap Now Emerges as Biggest Story

While standard evaluations reassure companies, deployed models are revealing a widening safety benchmark gap, with multi-turn adversarial attacks and agentic safety failures piling up faster than policy can respond.

Jun 21, 2026 · 10 min
[Image: Dario Amodei, co-founder and chief executive officer of Anthropic, speaking at the VivaTech conference in Paris, France, in May 2024.]
Alignment · Interpretability

Mechanistic Interpretability Race Heats Up Before 2027 Deadline

Dario Amodei's candid admission of AI's black box problem has sparked a surge in venture funding, interpretability tools, and fellowship programs, signaling that mechanistic interpretability is moving from academic conferences into real-world deployment.

Jun 11, 2026 · 9 min
[Image: Diagram illustrating the AI red-teaming agent workflow within Azure AI Foundry, showing how automated adversarial probes interact with a target model through iterative attack generation and evaluation feedback loops.]
AI Security · Methodology

AI Red-Teaming Outpaces Its Own Methodology as Agentic Threats Grow

As exploit windows shrink, agentic AI introduces attack surfaces that static benchmarks miss, and new tools like vibe AI red teaming promise human-steered dynamic testing even as the fundamental question of what any evaluation proves remains unanswered.

May 17, 2026 · 9 min
[Image: Infographic explaining the AI red teaming process from planning through execution and remediation.]
Alignment & Safety · Red-Teaming

AI Red Teaming Rebuilt for the 10-Hour Exploit Window

As exploit windows collapse to single-digit hours and agentic AI multiplies the attack surface, the manual red-teaming playbook is giving way to a rebuilt adversarial testing methodology spanning foundation-model labs, security startups, and regulatory frameworks.

May 13, 2026 · 9 min
[Image: Diagram illustrating mechanistic interpretability research categories for AI safety.]
Alignment & Safety · Interpretability

Mechanistic Interpretability Steps Out of the Lab as Debugging Tools Debut

With Goodfire's Silico debugger, AI lie detectors nearing production, and new safety fellowships at Anthropic and OpenAI, the field is building real-world infrastructure while the crucial question of what these evals actually measure persists.

May 13, 2026 · 9 min
[Image: An illustration representing mechanistic interpretability research, showing layers and connections inside an AI model being examined.]
Alignment · Interpretability

Mechanistic Interpretability Emerges as a Product

After years as a niche research discipline, mechanistic interpretability is now spawning startups, fellowship programs, and off-the-shelf debugging tools, though the hardest problems remain unsolved.

May 12, 2026 · 9 min
[Image: Agentic Testing: Ensuring Quality in the Age of Autonomous AI ...]
Alignment & Safety · Investigation

Agentic AI Safety Testing Falls Short Despite Strong Benchmarks

An MIT audit of 72 AI agent frameworks reveals a stark absence of safety disclosures and kill switches, while Anthropic’s unreleased Mythos model deepens the chasm between benchmark performance and real-world trust.

May 9, 2026 · 4 min