ChatGPT Just Hit 100 Million Users - And Your Data Loss Prevention Strategy Didn’t Notice
On February 1, 2023, UBS analysts reported what most enterprise security teams had not yet processed: ChatGPT had reached 100 million monthly active users in January, just two months after launch, making it the fastest-growing consumer application in history. Faster than TikTok. Faster than Instagram. And unlike either of those platforms, this one was being fed corporate source code, strategy documents, and internal meeting notes by employees who had no idea they were creating the largest unmonitored data exfiltration channel their companies had ever seen.
The problem was not that ChatGPT existed. The problem was that your DLP tools could not see it.
The perimeter assumption that quietly died
For two decades, enterprise data loss prevention operated on a foundational assumption: sensitive data leaves through known channels that can be monitored, filtered, and blocked. Email gateways. USB ports. Cloud storage endpoints. CASB solutions. The entire DLP architecture was built around the idea that if you watched the exits, you could control what left the building.
Lloyd Walmsley, the UBS analyst who authored the ChatGPT growth report, captured the scale of the shift when he wrote that “in 20 years following the internet space, we cannot recall a faster ramp in a consumer internet app.” That speed mattered for security teams. It meant employees were adopting the tool faster than IT security policies could be written, reviewed, approved, and distributed. By the time most enterprises had a policy conversation about generative AI, their engineers were already pasting proprietary code into chat windows.
The conventional wisdom from analysts at this point was straightforward: treat generative AI the same way you treat any new SaaS application. Apply your existing cloud access security broker. Monitor for data movement. Block if necessary. Gartner’s initial guidance on generative AI focused primarily on the technology’s potential for enterprise productivity, with security considerations framed as manageable extensions of existing cloud governance.
That guidance was reasonable. It was also catastrophically incomplete.
What DLP tools actually could not see
The fundamental mismatch between ChatGPT and existing DLP architecture was not a matter of policy - it was a matter of visibility. Traditional DLP systems rely on pattern recognition. They scan for credit card numbers, Social Security numbers, and content matching predefined dictionaries. When an employee pastes a company’s 2023 strategy document into a ChatGPT prompt, no recognizable pattern triggers. There are no credit card numbers in the M&A plan. No Social Security numbers in the source code.
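To make that gap concrete, here is a minimal sketch of the kind of pattern matching traditional DLP relies on. The regexes and the pasted text are illustrative, not any vendor's actual rules: a strategy-document paste sails straight through because it contains none of the structured identifiers the patterns look for.

```python
import re

# Hypothetical pattern-based DLP rules, similar in spirit to the
# credit-card and SSN detectors traditional tools ship with.
DLP_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def pattern_scan(text: str) -> list:
    """Return the names of any DLP patterns that match the text."""
    return [name for name, pattern in DLP_PATTERNS.items() if pattern.search(text)]

# A paste lifted from a fictional strategy document: highly sensitive,
# yet it contains none of the structured identifiers the rules look for.
paste = (
    "Q3 priority: acquire the Series B vector-database startup before "
    "our largest competitor closes its round. Walk-away price: $120M."
)

print(pattern_scan(paste))  # [] -- nothing triggers, the paste goes through
```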
Howard Ting, then CEO of data security firm Cyberhaven, described the problem with uncomfortable precision in a Dark Reading interview: “There was this big migration of data from on-prem to cloud, and the next big shift is going to be the migration of data into these generative apps.”
Cyberhaven’s research team had been tracking exactly this shift. Their analysis of 1.6 million workers across industries found that by early 2023, 5.6% of knowledge workers had already used ChatGPT in the workplace, and 4.9% had pasted company data directly into the tool. But here was the number that should have terrified every CISO: 4.7% of employees had pasted confidential data into ChatGPT, and 11% of all data pasted was classified as confidential.
Less than 1% of workers were responsible for 80% of the sensitive data leakage incidents. But that small percentage was enough. The average company was leaking confidential material to ChatGPT hundreds of times per week, according to Cyberhaven’s telemetry.
The case studies nobody wanted to become
The theoretical risk became operational reality almost immediately. In one documented instance, an executive pasted bullet points from his company’s 2023 strategy document into ChatGPT and asked it to rewrite them as a PowerPoint deck. In another, a doctor entered a patient’s name and medical condition, requesting that ChatGPT draft a letter to an insurance company. Both cases were flagged by Cyberhaven’s data tracking tools - but only because those organizations happened to be Cyberhaven customers. Most enterprises had zero visibility into identical behavior happening inside their own walls.
Then came Samsung. In late March and early April 2023, Samsung engineers in the semiconductor division pasted proprietary source code, chip test sequences, and internal meeting notes into ChatGPT in three separate incidents within 20 days. One employee copied faulty source code from a semiconductor database and asked ChatGPT to debug it. Another uploaded program code designed to identify defective equipment for optimization suggestions. A third converted a meeting recording to text and fed it to ChatGPT for meeting minutes.
Samsung had lifted its ChatGPT ban on March 11 specifically to boost productivity. Less than three weeks later, the company had leaked semiconductor trade secrets through a tool its own IT security team had approved. By May, Samsung banned all generative AI tools company-wide, according to an internal memo reviewed by Bloomberg - threatening termination for employees who violated the policy.
Why bans did not work either
Samsung’s response - ban everything - became the template for panic-driven AI governance across the enterprise landscape in early 2023. JPMorgan, Amazon, Goldman Sachs, Verizon, and dozens of other large organizations restricted or banned employee use of ChatGPT in the months that followed.
Here is what happened next: employees kept using it anyway.
An internal Samsung survey conducted after the ban found that 65% of respondents believed generative AI posed security risks. But belief in risk did not translate to behavior change. Employees who found ChatGPT made them dramatically more productive were not going to stop using it because their employer sent a memo. They would use personal devices, personal accounts, home networks.
Karla Grossenbacher, a partner at law firm Seyfarth Shaw, warned in a Bloomberg Law column that “prudent employers will include - in employee confidentiality agreements and policies - prohibitions on employees referring to or entering confidential, proprietary, or trade secret information into AI chatbots.” The legal framework was clear. The enforcement mechanism was not.
The data told the story. By early 2024, Cyberhaven’s follow-up research showed that the volume of corporate data placed into AI tools had skyrocketed 485% from March 2023 to March 2024. Bans had not stopped the data migration. They had driven it underground, where security teams had even less visibility than before.
The practitioner’s view: we built the wrong defenses
I spend my days architecting AI-powered systems at enterprise scale. When ChatGPT crossed 100 million users in January 2023, my first thought was not about the technology’s capabilities. It was about the data pipelines. I knew what was about to happen because I had watched it happen before - with cloud storage, with BYOD, with every technology adoption wave that moved faster than corporate governance.
The fundamental error was treating generative AI as a cloud application risk when it was actually a data classification problem. Your CASB could block api.openai.com at the network level. But it could not tell the difference between an employee asking ChatGPT to explain a Python error message and an employee pasting the company’s quarterly financial projections into the same interface. The content mattered. The destination was identical.
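A rough illustration of that gap, with a hypothetical domain blocklist: the network layer returns the same verdict for a harmless debugging question and a confidential paste, because it never inspects the content at all.

```python
# Hypothetical domain blocklist: the only signal available at the network layer.
BLOCKED_DOMAINS = {"api.openai.com", "chat.openai.com"}

def network_verdict(destination: str) -> str:
    """Destination-level control: all a CASB-style domain rule can act on."""
    return "block" if destination in BLOCKED_DOMAINS else "allow"

# Two very different prompts, one destination. The verdict is identical
# for both, because the rule never sees the content.
harmless = "Explain this Python error: KeyError: 'user_id'"
sensitive = "Rewrite our FY2023 revenue projections as a board slide"

for prompt in (harmless, sensitive):
    print(network_verdict("chat.openai.com"), "->", prompt[:40])
```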
This distinction has implications for how we think about AI governance more broadly. The organizations that got ahead of this problem did not start by writing acceptable-use policies. They started by answering a harder question: what data do we have, where does it live, and which employees can access it? If you cannot classify your data before employees interact with AI tools, no amount of network-level blocking will protect you.
What enterprises should have done Monday morning
The playbook that actually worked - and that most organizations did not follow until the damage was done - involved five specific steps, none of which required purchasing a new product:
First, run a network audit for traffic to api.openai.com and chat.openai.com. In February 2023, almost no enterprise had visibility into this traffic. The organizations that checked found it immediately. Cyberhaven’s data confirmed that usage was already widespread in most organizations, whether sanctioned or not.
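The audit can start as something as simple as counting hits against those two hostnames in a proxy or secure web gateway export. The sketch below assumes a CSV log with user and host columns; the file name and field names are placeholders for whatever your gateway actually produces.

```python
import csv
from collections import Counter

AI_DOMAINS = {"api.openai.com", "chat.openai.com"}

def audit_proxy_log(path: str) -> Counter:
    """Count requests per user to known AI endpoints.

    Assumes a CSV export with 'user' and 'host' columns; the exact
    fields will depend on your proxy or secure web gateway.
    """
    hits = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("host", "").strip().lower() in AI_DOMAINS:
                hits[row.get("user", "unknown")] += 1
    return hits

if __name__ == "__main__":
    for user, count in audit_proxy_log("proxy_export.csv").most_common(10):
        print(f"{user}: {count} requests to AI endpoints")
```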
Second, create an AI acceptable-use policy before banning anything. Samsung’s experience proved that bans drive usage underground. A better approach was to define tiers: what data categories can go into AI tools (public documentation, general coding questions) and what categories never can (source code, financial data, customer PII, strategy documents).
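Expressed as configuration rather than prose, the tiering might look like the sketch below. The category names are illustrative, not a standard taxonomy, and anything unrecognized defaults to deny.

```python
# The tiering idea expressed as configuration rather than prose.
AI_USE_POLICY = {
    "allowed": {
        "public_documentation",
        "general_coding_questions",
    },
    "prohibited": {
        "source_code",
        "financial_data",
        "customer_pii",
        "strategy_documents",
    },
}

def policy_decision(data_category: str) -> str:
    """Map a data category to a decision, defaulting to deny."""
    if data_category in AI_USE_POLICY["allowed"]:
        return "allow"
    return "deny"  # unknown categories fall to the safe side

print(policy_decision("general_coding_questions"))  # allow
print(policy_decision("strategy_documents"))        # deny
```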
Third, add generative AI endpoints to your DLP monitoring. Traditional pattern-matching would not catch everything, but content-aware DLP tools could flag when employees copied text from classified documents and pasted it into browser sessions connected to known AI platforms.
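One crude but workable approach is fingerprinting: shingle and hash the text of documents already labeled confidential, then flag any paste that shares shingles with that index and is headed to a known AI endpoint. The sketch below is a toy version of that idea, not a description of any particular product.

```python
import hashlib

AI_DOMAINS = {"api.openai.com", "chat.openai.com"}

def shingles(text: str, size: int = 8) -> set:
    """Overlapping word shingles, hashed -- a crude content fingerprint."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + size]).encode()).hexdigest()
        for i in range(max(len(words) - size + 1, 1))
    }

# In practice the index would cover every document already labeled
# confidential; one in-memory fingerprint stands in for it here.
classified_index = shingles(
    "fy2023 strategy: exit the consumer segment and acquire two "
    "infrastructure startups before the end of q3"
)

def flag_paste(paste: str, destination: str) -> bool:
    """Flag classified content heading to a known AI endpoint."""
    if destination not in AI_DOMAINS:
        return False
    return bool(shingles(paste) & classified_index)

print(flag_paste(
    "exit the consumer segment and acquire two infrastructure startups",
    "chat.openai.com",
))  # True -- the paste shares a fingerprint with a classified document
```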
Fourth, survey your teams. Ask what AI tools they were already using - anonymously if needed. The results would be uncomfortable, but ignorance was more dangerous than knowledge. Every organization that conducted these surveys found usage rates far higher than leadership expected.
Fifth, establish sanctioned AI channels with guardrails rather than blanket prohibition. OpenAI’s eventual release of ChatGPT Enterprise in August 2023 addressed some concerns by guaranteeing that business data would not be used to train models. But eight months of unprotected employee usage had already occurred before that option existed. The enterprises that built their own internal AI interfaces with data classification guardrails - even crude ones - were better positioned than those that simply said “no.”
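The guardrails did not have to be sophisticated to be useful. A minimal internal gateway can classify each prompt before forwarding it to the sanctioned model endpoint and refuse anything that matches a prohibited category. In the sketch below, the classifier is a deliberately naive placeholder for whatever classification service an organization already runs, and the model call is stubbed out entirely.

```python
# Minimal internal gateway sketch: classify each prompt, refuse anything
# in a prohibited category, and only then forward to the sanctioned model.
PROHIBITED = {"source_code", "financial_data", "customer_pii", "strategy_documents"}

def classify(prompt: str) -> set:
    """Toy classifier; real systems use trained models or document fingerprints."""
    labels = set()
    if "def " in prompt or "class " in prompt:
        labels.add("source_code")
    if "$" in prompt and "projection" in prompt.lower():
        labels.add("financial_data")
    return labels

def forward_to_model(prompt: str) -> str:
    """Stub for the call to the approved, contractually protected endpoint."""
    return f"(forwarded {len(prompt)} chars to the sanctioned model)"

def gateway(prompt: str) -> str:
    hits = classify(prompt) & PROHIBITED
    if hits:
        return f"blocked: prompt matches {', '.join(sorted(hits))}"
    return forward_to_model(prompt)

print(gateway("Explain this KeyError in my request logs"))
print(gateway("Summarize our revenue projection: $48M in Q1"))
```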
The 100 million user inflection point
February 2023 was the moment the data perimeter permanently changed. Not because ChatGPT was inherently dangerous, but because it revealed that enterprise data governance had been built for a world where sensitive information left through doors that security teams controlled. When 100 million people gained access to a tool that made pasting confidential data as easy as asking a question, the doors stopped mattering.
The organizations that recognized this early treated February 2023 as a classification problem, not a blocking problem. The organizations that did not are still discovering how much of their proprietary data reached external AI systems before anyone thought to look.
Every enterprise that deployed DLP in 2022 was implicitly betting that they knew all the exits. ChatGPT’s hundred-million-user February proved that the exits had multiplied faster than anyone could count them.
Nik Kale is a Principal Engineer and Product Architect with 17+ years of experience building AI-powered enterprise systems. He is a member of the Coalition for Secure AI (CoSAI), contributes to IETF AGNTCY working groups, and serves on the ACM AISec and CCS Program Committee. The views expressed here are his own.