The Brilliant Intern Who Hallucinated: What Happened When We Plugged an LLM Into Support Data

If you’re deploying LLMs against structured enterprise data, start with the data model. Map the relationships first. Then design retrieval that preserves those relationships.

In early 2024, we did what every enterprise was doing: we took an LLM, pointed it at our support data, and thought - what could go wrong?

We picked the Wireless Compatibility Matrix as our guinea pig. It seemed like a perfect first use case. Customers constantly asked questions about firmware compatibility, version dependencies, and hardware support. The answers existed in our documentation. All the AI had to do was find and present them.

The model answered confidently. And wrongly.

Not sometimes. Not on edge cases. On straightforward questions about which firmware version supported which access point, the model would produce plausible-sounding answers that were simply incorrect. It would confidently state that version X was compatible with hardware Y when the actual compatibility matrix said otherwise.

It was like watching a brilliant intern hallucinate through a technical review - smooth delivery, great vocabulary, completely wrong conclusions.

Why the obvious approach failed

The problem wasn’t the model. The problem was us - specifically, how we’d prepared the data.

Wireless compatibility isn’t prose. It’s structured logic. A compatibility matrix is a web of relationships: this firmware version supports these access points but not those controllers unless you’re running this specific build. The relationships are precise and directional. “A supports B” doesn’t mean “B supports A.”
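To make the directionality concrete, here's a minimal sketch in Python. The version and model names are invented for illustration; the point is that "supports" is a directed relation, so checking it in reverse must fail:

```python
# "Supports" modeled as a set of directed pairs - (a, b) means "a supports b".
# All names here are hypothetical, not from any real compatibility matrix.
SUPPORTS = {
    ("firmware-17.9", "ap-model-x"),
    ("firmware-17.9", "controller-a"),
}

def supports(a: str, b: str) -> bool:
    """True only if the matrix explicitly states 'a supports b'."""
    return (a, b) in SUPPORTS

print(supports("firmware-17.9", "ap-model-x"))  # direction stated: True
print(supports("ap-model-x", "firmware-17.9"))  # reverse direction: False
```

A text chunk that mentions both names gives the model no way to recover which direction the edge points; the pair representation preserves it.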

When we fed this into a standard RAG pipeline, the retrieval step fragmented those relationships. The system would pull relevant text chunks - snippets that mentioned the right product names and version numbers - but lose the structural connections between them. It was giving the model puzzle pieces and asking it to guess the picture.

Traditional RAG gave snippets. We needed structure.

The drawing board

I won’t pretend the solution was obvious. We spent weeks diagnosing why the model was confident about wrong answers. Confidence scores were high precisely because the retrieved chunks were semantically relevant - they contained the right terms. The model had no way to know the relationships between those terms were being lost in retrieval.

The breakthrough came when we stopped thinking about support data as documents and started thinking about it as a graph. Firmware versions, hardware models, software features, known issues - these aren’t paragraphs. They’re nodes in a network of relationships.

We merged RAG with knowledge graphs. Instead of retrieving text chunks, we retrieved structured relationship paths. When a customer asked “does firmware 17.9 support Catalyst 9300 with DNAC?” the system didn’t search for documents containing those terms. It traversed a graph of verified relationships and returned the actual dependency chain.
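The traversal idea can be sketched with a breadth-first search over a tiny directed graph. The node names echo the question above, but the edges and relation labels are made up for illustration - a real system would load verified relationships from the compatibility data:

```python
from collections import deque

# Hypothetical relationship graph: (a, rel, b) means "a --rel--> b".
# Edges are invented for illustration, not real compatibility facts.
EDGES = [
    ("firmware-17.9", "supports", "catalyst-9300"),
    ("firmware-17.9", "requires", "dnac-2.3"),
    ("dnac-2.3", "manages", "catalyst-9300"),
]

def dependency_chain(start: str, goal: str):
    """BFS over the directed graph; returns the hop-by-hop path, or None."""
    adjacency = {}
    for a, rel, b in EDGES:
        adjacency.setdefault(a, []).append((rel, b))
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"--{rel}--> {nxt}"]))
    return None

print(dependency_chain("firmware-17.9", "catalyst-9300"))
```

The answer handed to the model is the chain itself - an explicit, verified path - rather than a bag of snippets that happen to mention the right names.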

We called it GraphRAG. Instead of pages, the AI read maps.

The numbers that shut down the debate

Accuracy jumped from 50 percent to 95 percent on complex queries.

That number ended every internal argument about whether the additional complexity was worth it. At 50 percent, the system was a liability - worse than useless because it was confidently wrong. At 95 percent, it was genuinely useful, and the remaining 5 percent were cases where it knew enough to say “I’m not sure - here’s the closest match and a link to the full matrix.”
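That fallback behavior is easy to sketch: answer only from verified facts, and when no fact exists, say so and point at the closest match plus the source. Everything here is hypothetical - `difflib` is a stand-in for whatever similarity ranking you use, and the URL is a placeholder:

```python
import difflib

# Hypothetical store of verified compatibility facts.
VERIFIED = {
    ("firmware-17.9", "catalyst-9300"): "supported",
}
MATRIX_URL = "https://example.com/compatibility-matrix"  # placeholder link

def answer(firmware: str, hardware: str) -> str:
    fact = VERIFIED.get((firmware, hardware))
    if fact is not None:
        return f"{firmware} + {hardware}: {fact}"
    # No verified fact: surface the nearest known pair instead of guessing.
    known = [f"{a} + {b}" for a, b in VERIFIED]
    closest = difflib.get_close_matches(f"{firmware} + {hardware}", known, n=1, cutoff=0.0)
    hint = closest[0] if closest else "no close match"
    return f"I'm not sure - closest match: {hint}; see {MATRIX_URL}"

print(answer("firmware-17.9", "catalyst-9300"))
print(answer("firmware-17.8", "catalyst-9200"))
```

The design choice is the asymmetry: a verified fact is stated plainly, while anything else is an explicit "I'm not sure" - never a plausible-sounding guess.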

The line I kept repeating to the team: “AI didn’t get smarter. We taught it to read relationships.”

That distinction matters. Every enterprise I talk to is hitting the same wall - throwing LLMs at structured data and wondering why answers are unreliable. The model isn’t the bottleneck. The retrieval architecture is. If you flatten relational data into text for retrieval, you’ve already lost the information the model needs to answer correctly.

What the intern taught me

I still use the “brilliant intern” metaphor because it captures something important about LLMs in enterprise contexts. An intern who’s read every document in your company might sound impressively knowledgeable. But they don’t understand the relationships between the things they’ve read. They’ll confidently tell you that Product A is compatible with Product B because both appeared in the same document - not because they understand the actual dependency logic.

The fix isn’t to fire the intern. It’s to give them a better knowledge base - one that preserves relationships instead of flattening them.

If you’re deploying LLMs against structured enterprise data - compatibility matrices, configuration dependencies, version support tables, or anything where the relationships between entities matter as much as the entities themselves - don’t start with a standard RAG pipeline. Start with the data model. Map the relationships first. Then design retrieval that preserves those relationships.

We wasted three months learning this the hard way. You don’t have to.