Make or buy: the RAG stack decision every AI team faces
The Curiosity Team
A retrieval-augmented generation prototype is quick to build. You pull a folder of documents, chunk them, embed the chunks, and wire the results into a model. Ask a question and you get a grounded answer with citations.
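To make the point concrete, here is a minimal sketch of that prototype in Python using only the standard library. It is an illustration under stated assumptions, not a reference implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the two documents are invented for the example, and the final step only assembles the prompt rather than calling a model.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Return (source, chunk_text, vector) for every chunk of every document."""
    return [(name, c, embed(c)) for name, text in docs.items() for c in chunk(text)]

def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank all chunks against the question and keep the top k."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(src, text) for src, text, _ in ranked[:k]]

def build_prompt(question: str, hits) -> str:
    """Assemble a grounded prompt with numbered, citable sources."""
    context = "\n".join(f"[{i}] ({src}) {text}" for i, (src, text) in enumerate(hits, 1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical documents, invented for this example.
docs = {
    "vacation.txt": "Employees accrue vacation days monthly and may carry over five days.",
    "expenses.txt": "Expense reports are due within thirty days of travel.",
}
question = "How many vacation days can I carry over?"
index = build_index(docs)
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
```

Everything here assumes the easy case: the documents sit in one dictionary, nothing changes after indexing, and every caller may read every chunk.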
That prototype shows you can build a RAG stack. Whether to build or buy the production version is a separate decision, and it depends on factors the prototype doesn't reveal. This post lays out those factors so the trade-off is easier to reason about.
The demo covers the simple case
The prototype works because it handles the tractable version of the problem: one data source, one user, static files, no access rules. Production changes all four of those at once.
The data isn't one folder — it's SharePoint, Confluence, Jira, Drive, shared inboxes, and a few databases. It isn't static — documents change, tickets close, people join and leave. The users aren't one person — they're many, each allowed to see a different subset. That changes the scope from a script to a system that has to be maintained.
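The access-rule part can be sketched as a filter applied before matching, so that chunks a user may not see never reach ranking or the model. The `Chunk` type, the group names, and the sources below are hypothetical, and real systems usually have to sync these permissions from each source rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    source: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read this chunk

def visible_chunks(chunks, user_groups: set[str]):
    """Keep only chunks that share at least one group with the user."""
    return [c for c in chunks if c.allowed_groups & user_groups]

def search(chunks, query: str, user_groups: set[str]):
    """Filter by permission first, then do a naive keyword match."""
    terms = set(query.lower().split())
    candidates = visible_chunks(chunks, user_groups)
    return [c for c in candidates if terms & set(c.text.lower().split())]

# Hypothetical index entries with per-chunk access groups.
chunks = [
    Chunk("confluence/roadmap", "Q3 roadmap for the search product", frozenset({"product", "eng"})),
    Chunk("sharepoint/salaries", "salary bands for the search team", frozenset({"hr"})),
]

eng_hits = search(chunks, "search roadmap", {"eng"})
hr_hits = search(chunks, "search roadmap", {"hr"})
```

The same query returns different results for the two users, which is the behavior the single-user prototype never has to get right.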
Wherever you start, you need all of it
Teams usually begin with the component that feels foundational — a vector database, an LLM, or a graph database — and treat the rest as wiring. But no single component is the system. Answering questions on company data takes ingestion, search, embeddings, a graph, permissions, a model, and a front end, working together. Whichever component you start from, you end up needing the others.
So the work isn't standing up any one component; it's integrating parts that weren't designed to work together into something reliable. That integration is what the build-or-buy decision is really about.
What the decision depends on
Five factors tend to determine which way it goes.
Time to market. Building takes longer than buying — often a year or more before anything is in use. The opportunity cost of not having the solution during that time depends on the value it provides once it's running.
Performance. Separate components communicate over the network, and a network round trip is orders of magnitude slower than an in-memory call, roughly a thousand times as a rule of thumb. A single query may make several of them in sequence: search, then graph, then permissions, then model. The overhead has little effect on a demo but becomes significant at scale, and especially in air-gapped or offline deployments, where managed services aren't available and every component has to run on your own infrastructure.
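A rough way to see how the hops add up is to count call overhead per query. The latencies below are illustrative assumptions rather than measurements: the thousand-fold ratio is the rule of thumb quoted above, and the sketch counts only the cost of crossing each boundary, not the work each stage does (model inference in particular will usually dominate).

```python
# Illustrative per-call overhead in microseconds. Assumptions, not measurements.
IN_PROCESS_US = 1
NETWORK_US = 1_000  # about a thousand times slower, per the rule of thumb

# Stages a single query may pass through, one after another.
STAGES = ["search", "graph", "permissions", "model"]

def overhead_us(stages, per_call_us):
    """Total call overhead when each stage is invoked once, sequentially."""
    return len(stages) * per_call_us

def overhead_per_day_s(stages, per_call_us, queries_per_day):
    """Call overhead accumulated over a day of queries, in seconds."""
    return overhead_us(stages, per_call_us) * queries_per_day / 1_000_000

network = overhead_us(STAGES, NETWORK_US)        # 4000 microseconds per query
in_process = overhead_us(STAGES, IN_PROCESS_US)  # 4 microseconds per query
daily = overhead_per_day_s(STAGES, NETWORK_US, queries_per_day=100_000)  # 400.0 seconds
```

Four milliseconds is invisible in a demo. The figure grows with every extra hop, retry, and fan-out, which is why it matters more as query volume rises.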
Security. A build relies on open-source dependencies, and the open-source supply chain has been a common source of attacks — the xz-utils backdoor, Log4Shell, and recurring npm and PyPI compromises. More dependencies mean more code to track, patch, and trust.
Cost. A build has an up-front cost and a recurring one. The underlying components (a graph database, a vector store, embedding calls) carry licenses or usage fees, and the system needs people to operate it. The total is usually higher than the initial build estimate. We break the ledger down in "What building your own RAG stack actually costs."
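The shape of that ledger can be written down as a small calculation. The `BuildCost` class and its fields are a hypothetical breakdown for illustration, and the figures are placeholders in arbitrary units, chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass

@dataclass
class BuildCost:
    upfront: float            # initial engineering cost
    licenses_per_year: float  # e.g. graph database, vector store
    usage_per_year: float     # e.g. embedding calls
    staff_per_year: float     # people operating the system

    def recurring_per_year(self) -> float:
        """Sum of the costs that repeat every year."""
        return self.licenses_per_year + self.usage_per_year + self.staff_per_year

    def total(self, years: int) -> float:
        """Up-front cost plus recurring costs over the given horizon."""
        return self.upfront + years * self.recurring_per_year()

# Placeholder figures in arbitrary units, not real prices.
estimate = BuildCost(upfront=100, licenses_per_year=20, usage_per_year=10, staff_per_year=50)
three_year_total = estimate.total(years=3)  # 100 + 3 * 80 = 340
```

With these placeholder inputs the recurring costs pass the up-front cost in the second year, which is one way an estimate that covers only the initial build ends up too low.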
Control. Building is often chosen for control. But the underlying components — the graph database, the model, the vector store — are things you buy or adopt rather than write yourself, so what a build gives you ownership of is mainly the integration between them. The part most teams want to control is the application their users interact with, and that can be customized whether the infrastructure beneath it is built or bought.
A useful question to ask
List the pieces of a production stack — connectors, freshness, permissions, retrieval quality, operations — and consider which of them your team wants to specialize in over the next year.
For a company whose product is search or retrieval, the answer may be all of them; owning the stack is owning the core product. For most other companies, these pieces are shared infrastructure rather than the product, and the differentiating work is the application built on top — the assistant, the internal tool, the workflow users actually see.
Where Curiosity fits
Curiosity is built as this layer: one system rather than a set of components wired together. It connects to enterprise sources, keeps the index current as data changes, resolves each user's permissions at retrieval time, and combines hybrid search with a knowledge graph, behind a single API you build your application on top of.
Whether to use it, build your own, or buy something else is your call. The aim here is to make the trade-offs visible, not to argue for one answer — some teams should own this infrastructure, and for them building is the right choice.
The takeaway
The prototype and the production system are different projects that share a first step. The build-or-buy decision depends on time to market, performance, security, cost, and where you need control. Weighing those factors for your own situation is the useful exercise.
Want to see the layer running on your own sources and access model? Talk to an engineer.