Imagine building a digital fortress for 43,000 employees, investing millions in security, hiring the best experts in the field — and still having an autonomous AI agent fully compromise it in less time than it takes to finish a work meeting.
That is exactly what happened to McKinsey & Company.
The Attack No One Expected
On February 28, 2026, cybersecurity startup CodeWall pointed its autonomous offensive AI agent at Lilli — McKinsey’s internal AI platform used by more than 40,000 consultants. No credentials. No insider knowledge. No human in the loop. Just a domain name.
In two hours, the agent had full read and write access to the entire production database. The numbers are staggering: 46.5 million chat messages covering strategy, mergers and acquisitions, and client engagements; 728,000 confidential files; 57,000 user accounts; and 95 system prompts controlling Lilli’s behavior — all exposed. Most alarming of all: every single one of those prompts was writable. A malicious actor could have silently rewritten the instructions guiding Lilli without deploying a single line of code.
A Vulnerability as Old as the Internet Itself
What makes this incident even more unsettling is that the exploited vulnerability was not some revolutionary technical feat. It was an SQL injection — one of the oldest attack techniques in the book, known since the 1990s.
Lilli had been running in production for over two years. McKinsey’s internal security scanners never found it. Why? Because the CodeWall agent does not follow checklists. It maps, probes, chains findings, and escalates — exactly like a highly skilled human attacker, but continuously and at machine speed. The agent found the API documentation publicly exposed with over 200 endpoints. Most required authentication. Twenty-two did not. One of those unprotected endpoints wrote search queries to the database, and JSON field names were concatenated directly into the SQL. In 15 blind iterations, the agent extracted more and more information until live production data started flowing.
The New Frontier of Cyberattacks: The Prompt Layer
This incident reveals something few organizations are taking seriously: the prompt layer — the instructions that govern how an AI system behaves — is the new high-value target.
Companies have spent decades securing their code, their servers, and their supply chains. But the prompts controlling their AI assistants are being treated as secondary data, without the access controls, integrity monitoring, or audits they deserve. In Lilli’s case, an attacker could have subtly altered financial models, strategic recommendations, or risk assessments — all without triggering a single security alert. No deployments. No code changes. Just a single UPDATE statement wrapped in a single HTTP call.
What This Means for Every Organization
McKinsey is not a careless startup. It is one of the most sophisticated consulting firms in the world, and the fact that this happened to them should be a wake-up call for any organization deploying AI in production.
- Traditional security scanners are not enough against offensive AI agents.
- Unauthenticated APIs connected to AI systems are a critical attack surface.
- System prompts must be treated as high-security assets: with versioning, monitoring, and access control.
- Continuous AI-driven red-teaming is now a necessity, not a luxury.
In the AI era, speed changes everything. What used to take weeks of human reconnaissance now takes minutes. Organizations that fail to adapt their security posture to this new reality will become the next headline.