What building your own RAG stack actually costs
The Curiosity Team
A RAG prototype is cheap to build: a weekend, a vector database, an API key. A version that runs in production for a whole company costs a lot more, and most of that cost isn't visible from the prototype. To see where it goes, it helps to first look at what a production system is actually made of.
What a production RAG system is made of
A prototype typically covers only one or two of the components below. A production system needs all of them, integrated and kept running:
- Connectors and ingestion — pull content from each source, parse it, chunk it, and keep it in sync as it changes.
- Storage and indexing — a vector database for embeddings, usually alongside a keyword index.
- Embedding models — turn content into vectors, and re-embed when documents or models change.
- Retrieval — hybrid keyword-and-semantic search, often with a reranking step.
- Knowledge graph — the relationships between records, because company knowledge is connections, not just documents.
- Permissions — resolve each user's access at retrieval time, so answers only use what that user is allowed to see.
- LLM and orchestration — the model itself, plus query processing, prompt construction, caching, and fallbacks.
- Monitoring and evaluation — relevance tracking, quality checks, and audit logs.
- API and front end — how people and applications actually reach it.
None of these is exotic on its own. The cost is in building, integrating, and running all of them together.
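To make the integration point concrete, here is a minimal sketch of how two of the components above, hybrid retrieval and retrieval-time permissions, might meet in code. It is an illustration under stated assumptions, not a description of any particular product: the `Chunk` type, the `allowed_groups` field, and the `retrieve` function are hypothetical names, and reciprocal rank fusion is just one common way to merge keyword and vector rankings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    # Hypothetical ACL: groups allowed to read this chunk, attached at ingestion.
    allowed_groups: frozenset[str]


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk ids with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def retrieve(
    keyword_hits: list[str],
    vector_hits: list[str],
    chunks: dict[str, Chunk],
    user_groups: set[str],
    top_n: int = 3,
) -> list[Chunk]:
    """Fuse both rankings, then keep only chunks the user is allowed to see."""
    fused = rrf_fuse([keyword_hits, vector_hits])
    visible = [cid for cid in fused if chunks[cid].allowed_groups & user_groups]
    return [chunks[cid] for cid in visible[:top_n]]


chunks = {
    "a": Chunk("a", "Deployment runbook", frozenset({"eng"})),
    "b": Chunk("b", "Salary bands", frozenset({"hr"})),
    "c": Chunk("c", "Onboarding guide", frozenset({"eng", "hr"})),
}
results = retrieve(["a", "b", "c"], ["b", "c", "a"], chunks, {"eng"})
print([c.chunk_id for c in results])
```

The filter runs before anything reaches the prompt, so the HR-only chunk never enters the context of an engineering user even though it ranks highest overall. In a real system the group check would usually be pushed down into the index query rather than applied afterwards.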
Where the cost goes
Building it. A realistic first build is a team of about four engineers for a year. At €150k per engineer, that's roughly €600k before anything reaches users — and it assumes you can hire people who have built this before.
The opportunity cost of the delay. If the finished system would save the business, say, €2M a year, then every year spent building instead of running is €2M of value not yet realized. A one-year build carries about €2M in opportunity cost on top of the build itself. This is often the largest figure and the one least likely to appear in a budget.
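As a rough sketch of how this figure scales with the length of the build, the snippet below assumes the savings would accrue evenly over the year. That linear assumption and the function name are ours, for illustration only; the €2M figure is the example used above.

```python
def opportunity_cost(annual_savings_eur: float, build_months: float) -> float:
    """Value not realized while the system is still being built.

    Assumes savings accrue evenly over the year (a simplification).
    """
    return annual_savings_eur * build_months / 12


for months in (6, 12, 18):
    cost = opportunity_cost(2_000_000, months)
    print(f"{months:>2}-month build: ~€{cost:,.0f} in unrealized savings")
```

A build that slips from 12 to 18 months adds another €1M under these assumptions, which is more than the cost of the engineering team itself.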
Running it. Once live, the system needs maintenance: connectors break as source APIs change, retrieval needs tuning, models get upgraded. Budget at least one full-time engineer ongoing — about €150k a year — plus the security and compliance work that any system touching company data requires.
Component licenses. A build stitches together separate licensed products — a graph database, a search engine, a vector database, and others — each with its own recurring fee. A bought platform replaces those with a single license. Exact figures depend on vendors and scale, so there are no numbers here.
Costs that are the same either way. Some costs don't change with the decision. LLM inference runs around €50k a year, and infrastructure for large instances — compute and storage — around €100k a year, whether you build or buy. These aren't part of the trade-off; they're the cost of running the workload at all.
A few costs are easy to forget: hiring and ramp-up time for scarce specialists, the key-person risk when the people who built the system move on, security audits and certifications, and the migration work each time an underlying component or model changes.
Summary
| Cost | Type (NRC = non-recurring, RC = recurring) | Build |
|---|---|---|
| Initial development (~4 engineers × 1 year @ €150k) | NRC | ~€600k |
| Opportunity cost of a ~1-year delay (€2M/year in savings) | NRC | ~€2M |
| Maintenance and operations (≥1 engineer, ongoing) | RC | ~€150k/year |
| Component licenses (graph, search, vector DB, …) | RC | multiple licenses |
| LLM inference | RC | ~€50k/year |
| Infrastructure (compute and storage) | RC | ~€100k/year |
Inference and infrastructure cost the same whichever way you go. A bought platform still has its own license, and customization and upkeep aren't free either — but they come without the build, the opportunity cost of the delay, and the separate component licenses, which is where most of the difference sits.
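The figures in the table can be added up with a short script. This is a sketch using only the illustrative numbers from this article; the function name and the `include_shared` flag are hypothetical, and component licenses are left out because no figures are given for them.

```python
# Illustrative figures from the summary table (EUR).
BUILD_ONE_TIME = 4 * 150_000      # ~4 engineers for one year
DELAY_OPPORTUNITY = 2_000_000     # ~1 year of savings not yet realized
MAINTENANCE_PER_YEAR = 150_000    # at least one engineer, ongoing
INFERENCE_PER_YEAR = 50_000       # same whether you build or buy
INFRA_PER_YEAR = 100_000          # same whether you build or buy


def build_cost(years_live: int, include_shared: bool = True) -> int:
    """Cumulative cost of the build path after `years_live` years in production.

    Component licenses are excluded: they are vendor-dependent.
    """
    one_time = BUILD_ONE_TIME + DELAY_OPPORTUNITY
    recurring = MAINTENANCE_PER_YEAR
    if include_shared:
        recurring += INFERENCE_PER_YEAR + INFRA_PER_YEAR
    return one_time + recurring * years_live


for years in (1, 3, 5):
    total = build_cost(years)
    build_only = build_cost(years, include_shared=False)
    print(f"{years} year(s) live: ~€{total:,} total, ~€{build_only:,} build-specific")
```

Setting `include_shared=False` isolates the part of the total that depends on the decision to build, which is the number to compare against a platform license plus its customization and upkeep.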
The takeaway
Building can be the right choice. The point is that the honest figure isn't the build estimate alone: it's the build, the opportunity cost of the delay, the ongoing maintenance, and the stack of component licenses, on top of the inference and infrastructure you'd pay for either way. For how to weigh building against buying, see "Make or buy: the RAG stack decision".