Playbook · Retrieval & RAG

What is Retrieval-Augmented Generation (RAG), and why is it important?

Senior · High frequency · 10 min read · Free
Practical answer framework for AI engineer interview loops.

01 Interview Context

The trap here is giving a buzzword answer about embeddings and vector databases. The interviewer is really testing whether you understand what RAG buys in production: freshness, private knowledge access, attribution, and a cleaner failure model than asking the model to remember everything in its weights.

02 The 90-second answer

RAG combines retrieval with generation. Instead of answering from parametric memory alone, the system first fetches relevant external context and then asks the model to answer using that context. It matters because it lets you inject fresh or private knowledge without retraining the model every time the source material changes.
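The retrieve-then-generate pattern can be sketched in a few lines. This is a toy illustration, not a production design: the scoring function is a keyword-overlap stand-in for a real embedding model, the three-document corpus is invented, and in practice the assembled prompt would be passed to an LLM call rather than returned as a string.

```python
# Minimal sketch of retrieval-augmented generation: fetch relevant
# context first, then assemble a prompt that grounds the model in it.

def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words present in the doc.
    A real system would use embedding similarity instead."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """Assemble the prompt the generator model would receive."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )

corpus = [
    "The refund policy allows returns within 30 days.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
contexts = retrieve("what is the refund policy", corpus, k=1)
prompt = build_prompt("what is the refund policy", contexts)
```

The key point the sketch makes visible: the model's knowledge enters through `contexts` at request time, so updating the corpus updates the answers without touching the model.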

03 Why teams use it in production

I describe RAG as a systems pattern, not a model feature. A useful pipeline usually has indexing, retrieval, optional reranking, prompt assembly, and generation. The value is that world knowledge lives outside the model weights. If the documents change daily or the knowledge is private, that is usually cheaper and safer than constant fine-tuning.

The other advantage is debuggability. When the answer is wrong, you can ask whether retrieval failed, reranking failed, or generation ignored the evidence. That is much easier to improve than a vague complaint that the model "knows the wrong thing."
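That triage can be made mechanical if you log the right things. A sketch under three assumptions: you record the retrieved document ids per request, you have a gold (known-relevant) document id from a labeled set, and you can detect whether the answer actually cites its context (the `answer_cites_context` flag here is a hypothetical stand-in for that check).

```python
# Sketch of failure triage for a wrong RAG answer: attribute the error
# to the stage most likely responsible, instead of blaming "the model".

def triage(gold_doc_id: str, retrieved_ids: list[str],
           answer_cites_context: bool) -> str:
    """Classify a wrong answer by the failing pipeline stage."""
    if gold_doc_id not in retrieved_ids:
        return "retrieval_miss"       # evidence never reached the prompt
    if not answer_cites_context:
        return "generation_ignored"   # evidence was present, model drifted
    return "generation_misread"       # model used the evidence, got it wrong
```

For example, `triage("doc-7", ["doc-1", "doc-2"], False)` returns `"retrieval_miss"`, which points you at chunking or the retriever rather than at prompt wording.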

04 Weak vs Strong Answer

Weak answer

"RAG means you store embeddings in a vector database so the LLM hallucinates less."

Strong answer

"RAG matters because it externalizes knowledge from the model weights. That gives you freshness, private-data access, and cleaner debugging. But it only works if retrieval quality is high, so I would talk about chunking, reranking, and evaluation, not just the vector store."

05 Where RAG actually fails

The weak mental model is that RAG automatically fixes hallucinations. It does not. If retrieval misses the right document, the model has nothing to stand on. If retrieval is correct but the prompt is weak, generation can still drift. That is why I separate retrieval quality from answer quality when I debug or evaluate the system.
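Separating retrieval quality from answer quality usually starts with a ranking metric over a small labeled set of query-to-gold-document pairs. A minimal sketch of recall@k, assuming `retrieved` maps each query to its ranked result ids and `gold` maps each query to its single known-relevant id:

```python
# Measure retrieval quality on its own: what fraction of queries had
# their gold document anywhere in the top k results?

def recall_at_k(retrieved: dict[str, list[str]],
                gold: dict[str, str], k: int) -> float:
    """Fraction of labeled queries whose gold doc appears in the top k."""
    hits = sum(1 for q, doc_id in gold.items() if doc_id in retrieved[q][:k])
    return hits / len(gold)
```

If recall@k is low, no amount of prompt work will save the generator; if it is high and answers are still wrong, the problem has moved downstream to prompt assembly or generation.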

Approach | When it works | Risk
Prompt only | Stable public knowledge and simple workflows | Goes stale or hallucinates
RAG | Fresh enterprise docs and private knowledge | Retrieval failures leak into answers
Fine-tuning | Stable repeated behavior change | Expensive to update frequently

06 Follow-up questions to expect

  1. When is RAG better than fine-tuning?
  2. How would you measure retrieval quality separately from generation quality?
  3. What would you do if the model still hallucinates even with the right context?