From 50% to 95%: How We Taught AI to Read Relationships Instead of Documents
We had a 50 percent accuracy problem and no good theory for why.
Our RAG pipeline was doing everything right by the standard playbook. We’d chunked our documentation intelligently. We’d tuned embedding models. Our retrieval was pulling semantically relevant passages. The LLM was generating fluent, well-structured answers.
And half of them were wrong.
The answers weren’t random failures. They were systematically wrong in a specific way: the model would produce answers that sounded correct and used the right terminology but got the relationships between entities wrong. “Firmware X supports Controller Y” when the truth was “Firmware X supports Controller Z; Y requires a different build.”
It took us weeks to understand why. The answer was embarrassingly fundamental, and it changed how I think about retrieval architecture.
The problem with treating structure as text
Our Wireless Compatibility Matrix is a dense web of conditional relationships. Product A supports Product B if you’re running version C with feature set D enabled and you’re not using configuration E. These aren’t facts you can express in paragraphs: they’re graph relationships with conditions and directions.
When we chunked this data for RAG, we flattened those relationships into text. A chunk might contain the sentence “Firmware 17.9 supports Catalyst 9300 series switches” next to “DNAC integration requires firmware 17.9.3 or later.” Both sentences are true. Both would be retrieved for a query about Catalyst 9300 and DNAC compatibility. But the relationship between them, that 17.9 alone isn’t sufficient if you need DNAC, was lost in the chunking.
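The flattening step can be made concrete. In the sketch below (entity and version names are illustrative, echoing the example above), each structured fact becomes an independent sentence, and the constraint that lived *between* the facts simply no longer exists anywhere in the corpus:

```python
# Structured facts as (subject, relation, object) triples.
facts = [
    ("Firmware 17.9", "supports", "Catalyst 9300"),
    ("DNAC integration", "requires", "firmware 17.9.3 or later"),
]

# Chunking for RAG flattens each triple into an independent sentence.
chunks = [f"{s} {r} {o}." for s, r, o in facts]

# Both sentences are true, and both are retrievable for a
# "Catalyst 9300 + DNAC" query. But neither one records that 17.9
# alone is insufficient when DNAC is in play: that cross-fact
# constraint was destroyed by the flattening, not by retrieval.
```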
The retrieval worked. The chunks were relevant. But the structural information the model needed to answer correctly had been destroyed before the model ever saw it.
What GraphRAG actually means
The fix wasn’t incremental tuning. It was architectural.
We built a knowledge graph that preserved the explicit relationships between entities: products, firmware versions, features, configurations, known issues, and dependencies. Every relationship was typed and directional: “supports,” “requires,” “conflicts with,” “deprecated by.” Each relationship carried meaning the model could reason about.
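A minimal sketch of what “typed and directional” means in practice, using a plain adjacency list (the node names and relation types are illustrative, not our production schema):

```python
from collections import defaultdict

# Typed, directional edges: node -> [(relation, neighbor)].
edges = defaultdict(list)

def relate(src, relation, dst):
    """Record one directed, typed edge: (src) -[relation]-> (dst)."""
    edges[src].append((relation, dst))

relate("Firmware 17.9.3", "supports", "Catalyst 9300")
relate("DNAC", "requires", "Firmware 17.9.3")
relate("Firmware 17.9.1", "deprecated_by", "Firmware 17.9.3")
relate("Feature X", "conflicts_with", "Config E")

# Direction matters: "DNAC requires Firmware 17.9.3" does not imply
# the reverse, so an edge is stored only in the direction asserted.
```

Whether this lives in a graph database or an in-memory structure matters less than the invariant: every edge has a type and a direction, so a traversal can distinguish “requires” from “supports” instead of treating both as undifferentiated adjacency.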
Then we changed retrieval. Instead of finding text chunks that were semantically similar to the query, we traversed the graph to find relationship paths that connected the entities in the query. When someone asked about compatibility, the system didn’t search for documents: it walked the graph and returned the actual dependency chain.
We combined this with traditional retrieval: vector search for context, keyword matching for exact version strings, and graph traversal for relationships. The hybrid approach was more complex to build and maintain, but it preserved the one thing that mattered: the structure.
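One way to merge the three retrievers is reciprocal rank fusion, sketched below; the retriever results are stubs standing in for real vector, keyword, and graph backends, and the document ids are made up for illustration:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: a document ranked highly anywhere scores well overall."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Stub results for a "Catalyst 9300 + DNAC compatibility" query:
vector_hits  = ["doc_overview", "doc_17_9", "doc_dnac"]   # semantic context
keyword_hits = ["doc_17_9_3", "doc_17_9"]                 # exact version strings
graph_hits   = ["doc_dnac", "doc_17_9_3"]                 # dependency chain

merged = reciprocal_rank_fusion([vector_hits, keyword_hits, graph_hits])
```

Fusion is the easy part; the integration the next section argues for is harder, because the graph traversal has to run as a first-class retriever, not as a post-hoc filter layered on top of vector search.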
What 95% accuracy actually felt like
The jump wasn’t gradual. When we switched the Wireless Assistant from traditional RAG to GraphRAG, accuracy on complex multi-entity queries went from 50 percent to 95 percent almost immediately.
The first time I watched it take a query that had been consistently wrong (a three-way compatibility check between firmware, controller, and access point) and produce the correct answer with the full dependency chain cited, I felt the kind of relief you feel when a theory you’ve been defending actually works in production.
The Wireless Assistant could now reason like an engineer. Ask it about firmware compatibility and it cross-checks versions, dependencies, and citations: not because it “understands” networking, but because the knowledge graph gives it the structure to trace relationships the way an engineer would.
That’s not a chatbot. It’s digital expertise on demand.
The principles that generalized
Three things we learned that apply beyond wireless compatibility:
First, if your data has structure, your retrieval must preserve that structure. Flattening relational data into text for RAG is information destruction. It’s like taking a map and converting it to prose directions: technically the same information, practically far less useful.
Second, hybrid retrieval isn’t optional for complex domains. Vector search alone can’t handle precise version matching. Keyword matching alone can’t handle semantic similarity. Graph traversal alone can’t handle unstructured context. You need all three, and you need them integrated rather than layered.
Third, domain-tuned models outperform generic models on domain-specific data. We saw measurable improvement when we fine-tuned on networking documentation. Generic models are impressive generalists, but for enterprise support where precision matters more than fluency, domain tuning is worth the investment.
The line I keep returning to: AI didn’t get smarter. We taught it to read relationships. That distinction, between making the model better and making the data architecture better, is where most enterprise AI projects should focus their effort.