Prompting, RAG, fine-tuning: these three approaches solve different problems, yet they are constantly confused. The costly reflex is to fine-tune when a good prompt or RAG would do. The engineering rule is simple: start with the cheapest, move up in complexity only if the need demands it.
Prompting: the starting point, almost free
Prompt engineering is the craft of framing the request and the context you give the model. It is fast, cheap and often enough. If your need is to rewrite, classify, extract or generate text from clear instructions, start here. Typical budget: a few days to two weeks of engineering, then usage cost only.
- Best for: writing, classification, extraction, on-demand synthesis
- Timeline: a few days to two weeks
- Build cost: low (EUR 1k to 5k depending on scope)
- Limit: the model does not know your internal data or your very specific cases
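To make "clear instructions" concrete, here is a minimal sketch of a classification prompt built in Python. The function name, the triage role and the labels are hypothetical, chosen only for illustration; the resulting string would be sent to whichever model API you use.

```python
def build_classification_prompt(text: str, labels: list[str]) -> str:
    """Assemble a classification prompt from explicit instructions and the input text."""
    label_list = ", ".join(labels)
    return (
        "You are a support triage assistant.\n"
        f"Classify the message below into exactly one of: {label_list}.\n"
        "Answer with the label only.\n\n"
        f"Message:\n{text.strip()}\n"
    )


# Hypothetical message and labels, for illustration only.
prompt = build_classification_prompt(
    "  My invoice shows the wrong VAT number. ",
    ["billing", "technical", "sales"],
)
print(prompt)
```

Everything the model needs (role, allowed labels, output format) sits in the prompt itself, which is why this stage costs so little to iterate on.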
RAG: giving your data to the model
RAG (retrieval-augmented generation) connects the model to your knowledge base: documents, product sheets, history. For each question, the system retrieves the relevant passages and feeds them to the model. It is the right answer when the problem is one of knowledge, not behaviour. In our experience, the vast majority of SMB projects call for RAG, not fine-tuning.
When the model must know what you know, you want RAG. When it must behave differently, you consider fine-tuning.
- Best for: support over your docs, internal search, sourced domain assistant
- Timeline: two to eight weeks depending on data quality
- Build cost: medium (EUR 8k to 30k), dominated by data preparation
- Key advantage: easy updates, sourced and verifiable answers
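The retrieve-then-feed loop described above can be sketched in a few lines. This is a toy version under stated assumptions: retrieval is a simple word-overlap score rather than the embedding search a production system would normally use, and the knowledge base, ids and function names are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k passages sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(
        passages.items(),
        key=lambda item: len(q & tokenize(item[1])),
        reverse=True,
    )
    # Drop passages that share no word at all with the question.
    return [(pid, text) for pid, text in ranked[:k] if q & tokenize(text)]


def build_rag_prompt(question: str, passages: dict[str, str], k: int = 2) -> str:
    """Put the retrieved passages, tagged with their ids, in front of the question."""
    hits = retrieve(question, passages, k)
    context = "\n".join(f"[{pid}] {text}" for pid, text in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\n"
    )


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = {
    "returns-policy": "Products can be returned within 30 days of delivery.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "warranty": "All devices carry a two year warranty.",
}

question = "Can products be returned within 30 days?"
hits = retrieve(question, KNOWLEDGE_BASE, k=1)
prompt = build_rag_prompt(question, KNOWLEDGE_BASE, k=1)
print(prompt)
```

Because each passage carries its id into the prompt, the answer can cite its source, and updating the knowledge base is a matter of changing the documents, not the model.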
Fine-tuning: the last resort, the most expensive
Fine-tuning retrains a model on your examples to change its behaviour: a very specific tone, a rigid output format, a niche task repeated at very high volume. It requires a quality example dataset, serious evaluation work and maintenance every time the base model evolves. Without high, stable volume, it is rarely profitable.
- Best for: very specific style/format, niche task at very high volume, latency/cost to optimise at scale
- Timeline: several weeks to several months
- Build cost: high (starting at EUR 25k to 30k), plus a maintenance debt
- Prerequisites: a clean example dataset and an evaluation protocol
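An evaluation protocol can start very small: a held-out set of examples and one metric applied identically to every candidate. The sketch below assumes exact-match accuracy as that metric; the examples are hypothetical, and the two `predict` functions are simple stand-ins for a prompted base model and a fine-tuned one, not real model calls.

```python
from typing import Callable


def exact_match_accuracy(
    predict: Callable[[str], str],
    examples: list[tuple[str, str]],
) -> float:
    """Share of held-out examples where the prediction equals the expected output."""
    if not examples:
        raise ValueError("evaluation set is empty")
    correct = sum(
        predict(inp).strip() == expected.strip() for inp, expected in examples
    )
    return correct / len(examples)


# Hypothetical held-out examples, never used for training.
HELD_OUT = [
    ("Invoice total is wrong", "billing"),
    ("App crashes on login", "technical"),
    ("Do you offer volume discounts?", "sales"),
    ("Password reset email never arrives", "technical"),
    ("I was charged twice", "billing"),
]


def baseline_predict(text: str) -> str:
    """Stand-in for the current system: always answers 'technical'."""
    return "technical"


def candidate_predict(text: str) -> str:
    """Stand-in for a candidate model: a few keyword rules."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "billing"
    if "discount" in lowered:
        return "sales"
    return "technical"


baseline_score = exact_match_accuracy(baseline_predict, HELD_OUT)
candidate_score = exact_match_accuracy(candidate_predict, HELD_OUT)
print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f}")
```

Running the same set against the prompted or RAG baseline first shows whether fine-tuning has a gap to close at all, and rerunning it after each base-model change is what the maintenance debt looks like in practice.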
The decision tree in practice
Our approach is always the same: we try prompting first. If the model needs your knowledge, we move to RAG. We only consider fine-tuning if, with RAG in place, a behaviour or scale problem remains that nothing else solves. Nine times out of ten we stop before fine-tuning, and the budget ends up three to four times smaller.
Challenge first, then build: the right technical choice is the one that solves the problem at the lowest cost, not the most sophisticated one.
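The decision order above can be written down as a small function. This is a sketch, not a formal rule: the field names are hypothetical, and gating fine-tuning on high, stable volume is our reading of the profitability remark in the fine-tuning section.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    PROMPTING = "prompting"
    RAG = "rag"
    FINE_TUNING = "fine-tuning"


@dataclass
class Need:
    # Hypothetical criteria summarising the questions asked in the text.
    needs_internal_knowledge: bool
    behaviour_or_scale_gap_remains: bool
    high_stable_volume: bool = False


def recommend(need: Need) -> list[Approach]:
    """Return the approaches to stack, cheapest first."""
    stack = [Approach.PROMPTING]  # always the starting point
    if need.needs_internal_knowledge:
        stack.append(Approach.RAG)
    # Assumption: fine-tuning is only worth considering when a gap remains
    # AND the volume is high and stable enough to pay for it.
    if need.behaviour_or_scale_gap_remains and need.high_stable_volume:
        stack.append(Approach.FINE_TUNING)
    return stack


support_bot = Need(needs_internal_knowledge=True, behaviour_or_scale_gap_remains=False)
print([a.value for a in recommend(support_bot)])
```

The function returns a stack, not a single choice, because the approaches combine: prompting stays in place when RAG is added.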
A combination is often the best answer: solid RAG with careful prompting covers nearly all B2B needs in 2026. Unsure which approach fits your case? We frame it with you, budget in hand, before writing a single line of code: contact@nexus-os.fr.