Playbook · Retrieval & RAG

What is Retrieval-Augmented Generation (RAG), and why is it important?

Senior · High frequency · 10 min read · Free
Practical answer framework for AI engineer interview loops.

01 Interview Context

The trap here is giving a buzzword answer about embeddings and vector databases. The interviewer is really testing whether you understand what RAG buys in production: freshness, private knowledge access, attribution, and a cleaner failure model than asking the model to remember everything in its weights.

02 The 90-second answer

RAG combines retrieval with generation. Instead of answering from parametric memory alone, the system first fetches relevant external context and then asks the model to answer using that context. It matters because it lets you inject fresh or private knowledge without retraining the model every time the source material changes.
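The retrieve-then-generate pattern can be sketched in a few lines. This is a toy illustration, not a production design: the scoring function is a keyword-overlap stand-in for a real embedding model, the three-document corpus is invented, and in practice the assembled prompt would be passed to an LLM call rather than returned as a string.

```python
# Minimal sketch of retrieval-augmented generation: fetch relevant
# context first, then assemble a prompt that grounds the model in it.

def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words present in the doc.
    A real system would use embedding similarity instead."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """Assemble the prompt the generator model would receive."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )

corpus = [
    "The refund policy allows returns within 30 days.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
contexts = retrieve("what is the refund policy", corpus, k=1)
prompt = build_prompt("what is the refund policy", contexts)
```

The key point the sketch makes visible: the model's knowledge enters through `contexts` at request time, so updating the corpus updates the answers without touching the model.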

03 Why teams use it in production

I describe RAG as a systems pattern, not a model feature. A useful pipeline usually has indexing, retrieval, optional reranking, prompt assembly, and generation. The value is that world knowledge lives outside the model weights. If the documents change daily or the knowledge is private, that is usually cheaper and safer than constant fine-tuning.

The other advantage is debuggability. When the answer is wrong, you can ask whether retrieval failed, reranking failed, or generation ignored the evidence. That is much easier to improve than a vague complaint that the model "knows the wrong thing."
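That triage can be made mechanical if you log the right things. A sketch under three assumptions: you record the retrieved document ids per request, you have a gold (known-relevant) document id from a labeled set, and you can detect whether the answer actually cites its context (the `answer_cites_context` flag here is a hypothetical stand-in for that check).

```python
# Sketch of failure triage for a wrong RAG answer: attribute the error
# to the stage most likely responsible, instead of blaming "the model".

def triage(gold_doc_id: str, retrieved_ids: list[str],
           answer_cites_context: bool) -> str:
    """Classify a wrong answer by the failing pipeline stage."""
    if gold_doc_id not in retrieved_ids:
        return "retrieval_miss"       # evidence never reached the prompt
    if not answer_cites_context:
        return "generation_ignored"   # evidence was present, model drifted
    return "generation_misread"       # model used the evidence, got it wrong
```

For example, `triage("doc-7", ["doc-1", "doc-2"], False)` returns `"retrieval_miss"`, which points you at chunking or the retriever rather than at prompt wording.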

04 Weak vs Strong Answer

Weak answer

"RAG means you store embeddings in a vector database so the LLM hallucinates less."

Strong answer

"RAG matters because it externalizes knowledge from the model weights. That gives you freshness, private-data access, and cleaner debugging. But it only works if retrieval quality is high, so I would talk about chunking, reranking, and evaluation, not just the vector store."

05 Where RAG actually fails

The weak mental model is that RAG automatically fixes hallucinations. It does not. If retrieval misses the right document, the model has nothing to stand on. If retrieval is correct but the prompt is weak, generation can still drift. That is why I separate retrieval quality from answer quality when I debug or evaluate the system.
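Separating retrieval quality from answer quality usually starts with a ranking metric over a small labeled set of query-to-gold-document pairs. A minimal sketch of recall@k, assuming `retrieved` maps each query to its ranked result ids and `gold` maps each query to its single known-relevant id:

```python
# Measure retrieval quality on its own: what fraction of queries had
# their gold document anywhere in the top k results?

def recall_at_k(retrieved: dict[str, list[str]],
                gold: dict[str, str], k: int) -> float:
    """Fraction of labeled queries whose gold doc appears in the top k."""
    hits = sum(1 for q, doc_id in gold.items() if doc_id in retrieved[q][:k])
    return hits / len(gold)
```

If recall@k is low, no amount of prompt work will save the generator; if it is high and answers are still wrong, the problem has moved downstream to prompt assembly or generation.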

Approach | When it works | Risk
Prompt only | Stable public knowledge and simple workflows | Goes stale or hallucinates
RAG | Fresh enterprise docs and private knowledge | Retrieval failures leak into answers
Fine-tuning | Stable repeated behavior change | Expensive to update frequently

06 Follow-up questions to expect

  1. When is RAG better than fine-tuning?
  2. How would you measure retrieval quality separately from generation quality?
  3. What would you do if the model still hallucinates even with the right context?