Build vs. buy: the real cost of a DIY RAG stack
The Curiosity Team
A retrieval-augmented generation prototype is one of the most satisfying things you can build in an afternoon. Pull a folder of documents, chunk them, embed the chunks into a vector database, and wire the results into a model. Ask a question, get a grounded answer with citations. It feels finished.
Then you try to roll it out to the company, and the afternoon project turns into a roadmap. The gap between "it works on my documents" and "it works for everyone, on all our data, safely" is where most of the cost lives — and it's almost entirely invisible from the demo.
The demo is the easy 20%
The reason the prototype feels done is that it solves the tractable part of the problem. One data source, one user, static files, no access rules. Under those conditions RAG is close to a solved problem, and the off-the-shelf pieces fit together cleanly.
Production removes every one of those simplifying assumptions at once:
- The data isn't one folder. It's SharePoint, Confluence, Jira, Google Drive, a dozen shared inboxes, a support tool, and three databases.
- The data isn't static. Documents change hourly, tickets close, people leave, permissions shift.
- The users aren't one person. They're thousands of people who are each allowed to see a different slice of the whole.
None of that is exotic. It's just the normal state of a company. But each one turns a weekend script into a system you have to own.
Where the hidden work actually is
If you decide to build the production version yourself, here is the work that doesn't show up in the prototype.
Connectors and ingestion. Every source has its own API, auth model, rate limits, and idea of what a "document" is. Writing one connector is easy. Keeping twenty connectors working as those APIs change, backfilling history, and syncing incrementally so you're not re-indexing everything every night — that's a standing team, not a task.
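To make "syncing incrementally" concrete, here is a minimal sketch of a cursor-based sync loop. Everything in it is an assumption for illustration: `FakeSource` stands in for a real source API, `changed_since` is a hypothetical method, and the index is a plain dictionary rather than a vector store.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    modified_at: int  # e.g. a Unix timestamp reported by the source


@dataclass
class FakeSource:
    """Hypothetical stand-in for a real source system's API."""
    docs: list[Doc] = field(default_factory=list)

    def changed_since(self, cursor: int) -> list[Doc]:
        # Return only documents modified after the cursor, oldest first.
        changed = [d for d in self.docs if d.modified_at > cursor]
        return sorted(changed, key=lambda d: d.modified_at)


def sync_incremental(source: FakeSource, index: dict[str, str], cursor: int) -> int:
    """Apply changes newer than `cursor` to `index` and return the new cursor."""
    for doc in source.changed_since(cursor):
        index[doc.doc_id] = doc.text  # a real pipeline would re-chunk and re-embed here
        cursor = max(cursor, doc.modified_at)
    return cursor


source = FakeSource([Doc("a", "v1", 10), Doc("b", "v1", 20)])
index: dict[str, str] = {}
cursor = sync_incremental(source, index, cursor=0)       # initial backfill
source.docs[0] = Doc("a", "v2", 30)                      # one document changes
cursor = sync_incremental(source, index, cursor=cursor)  # only "a" is re-processed
```

The loop itself is the easy part. The standing cost is that each real source exposes change tracking differently, if at all, so the cursor logic ends up being rewritten per connector.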
Freshness. A vector index is a snapshot. The moment a document changes, your snapshot is wrong, and a confidently cited answer from a stale chunk is worse than no answer. Keeping the index in step with reality means change detection, re-embedding, and cache invalidation across every source, continuously.
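One simple form of change detection is to store a content hash next to each indexed document and compare it with the source on every pass. The sketch below assumes documents are available as plain text keyed by ID; `plan_refresh` and its arguments are hypothetical names, not part of any particular library.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(current: dict[str, str], indexed_hashes: dict[str, str]):
    """Compare source documents against the hashes stored at index time.

    Returns (to_embed, to_delete): IDs that are new or changed and need
    re-embedding, and IDs that vanished from the source and must be removed.
    """
    to_embed = sorted(
        doc_id
        for doc_id, text in current.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in current)
    return to_embed, to_delete


indexed = {
    "a": content_hash("old text"),
    "b": content_hash("unchanged"),
    "c": content_hash("deleted at source"),
}
current = {"a": "new text", "b": "unchanged", "d": "brand new"}
to_embed, to_delete = plan_refresh(current, indexed)
# to_embed == ["a", "d"], to_delete == ["c"]
```

Note the delete list: removing chunks for documents that no longer exist matters as much as re-embedding the ones that changed, because a deleted document can still be cited from a stale index.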
Permissions. This is the one that quietly sinks projects. In a company, who can see what is the whole game, and access lives in the source systems, not in your vector store. A DIY stack has to fetch each user's entitlements, keep them current as they change, and enforce them at retrieval time — not as a filter on the output, which leaks (we wrote about why that fails). Getting this wrong isn't a bug; it's an incident.
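A toy illustration of enforcing access at retrieval time: candidates are restricted to what the user may see before anything is ranked, so restricted text never reaches the model. The `Chunk` type, the group-based access model, and the word-overlap `score` function are all assumptions made for this sketch; real entitlements are usually richer than a set of group names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read the source document


def score(query: str, text: str) -> int:
    """Toy relevance: how many query words appear in the text."""
    words = set(text.lower().split())
    return sum(1 for w in query.lower().split() if w in words)


def retrieve(query: str, chunks: list[Chunk], user_groups: frozenset[str], k: int = 3) -> list[Chunk]:
    """Filter by permission first, then rank only what the user may see."""
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda c: score(query, c.text), reverse=True)
    return [c for c in ranked[:k] if score(query, c.text) > 0]


chunks = [
    Chunk("hr-1", "salary bands for 2024", frozenset({"hr"})),
    Chunk("eng-1", "deploy guide for the salary service", frozenset({"eng", "hr"})),
]
for_engineer = retrieve("salary bands", chunks, frozenset({"eng"}))
for_hr = retrieve("salary bands", chunks, frozenset({"hr"}))
# The engineer only ever sees "eng-1"; the HR user sees both, best match first.
```

The hard part in production is not this filter. It is keeping `allowed_groups` and `user_groups` in step with the source systems as they change.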
Retrieval quality. Pure vector search is good at meaning and bad at exact terms — product codes, error strings, names, dates. Real queries need both, which means hybrid keyword-and-semantic retrieval, plus the relationships between records that pure text matching can't see. Tuning this is ongoing work, not a default setting.
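One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below is a generic version of that technique, offered as an example rather than as the method any particular product uses. It assumes each retriever has already returned an ordered list of document IDs; the IDs and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by more than one retriever rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["err-4012", "faq", "guide"]   # exact-term match on an error code
semantic_hits = ["guide", "faq", "intro"]     # nearest neighbours by meaning
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# fused == ["guide", "faq", "err-4012", "intro"]
```

Documents found by both retrievers outrank those found by only one, which is usually the behaviour you want. Fusion by rank alone knows nothing about the relationships between records, though, and that part still needs its own tuning.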
Everything around the model. Monitoring, evaluation, relevance feedback, audit logs, deployment that satisfies your security team. The model call is the last five percent. The other ninety-five is plumbing that has to be reliable.
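Evaluation can start small. A minimal sketch, assuming you have a handful of test questions with hand-labelled relevant documents, is to track recall@k for the retriever over time. The function name and the example IDs are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant must contain at least one document ID")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


retrieved = ["a", "b", "c", "d"]  # retriever output, best first
relevant = {"a", "d"}             # hand-labelled ground truth for this question
top2 = recall_at_k(retrieved, relevant, k=2)  # 0.5: only "a" is in the top 2
top4 = recall_at_k(retrieved, relevant, k=4)  # 1.0: both are in the top 4
```

A number like this only becomes useful once it is measured on every index or retriever change, which is exactly the kind of plumbing the prototype never needed.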
The honest build-vs-buy question
None of this means you should never build. Plenty of teams should. The useful question isn't "can we build a RAG stack" — you can. It's:
Is retrieval infrastructure the thing our team should be spending its time maintaining?
For a company whose product is search or retrieval, the answer may well be yes. Owning the stack is owning your core. For nearly everyone else, the connectors, the permission syncing, and the freshness pipeline are undifferentiated heavy lifting. The differentiated work is the application you build on top — the assistant, the internal tool, the workflow your users actually see.
A good way to decide: list the five items above and ask which of them your team wants to be expert in a year from now. The ones it doesn't are the ones to buy.
What buying the layer looks like
This is the layer Curiosity is built to be. It connects to enterprise sources, keeps the index current as data changes, resolves each user's permissions at retrieval time, and combines hybrid search with a knowledge graph so answers are grounded in the relationships between people, projects, and documents — not just matching text. You reach it through an API and build your application on top.
The point isn't that building is wrong. It's that most of the RAG stack is infrastructure every company needs and no company's users care about. Buying that layer moves your team's time back to the part that's actually yours.
The takeaway
The RAG prototype you can build in a weekend and the RAG system that survives a company are different projects that happen to share a first step. Before you commit to building, cost the invisible 80% — connectors, freshness, permissions, retrieval quality, and operations — and decide honestly whether that's where your team's time belongs.
Want to see the layer running on your own sources and access model? Talk to an engineer.