What is an AI agent, and how does it differ from a simple LLM call?

Practical answer framework for AI engineer interview loops.

01 Interview Context

The trap here is agent hype. A weak answer treats an agent as nothing more than an LLM plus tools. The interviewer is usually probing whether you understand the operational difference between one-shot generation and a loop that can choose actions, mutate state, spend budget, and fail repeatedly before it finishes.

Senior and staff interviewers also probe restraint. The best candidates know when not to use an agent.

02 The 90-second answer

A simple LLM call takes input and returns output once. An agent wraps the model inside a control loop. That loop decides whether to call tools, updates its state from the results, and stops only when it reaches a completion condition or hits a guardrail. The real difference is not the model. It is the orchestration logic around the model.

My short production stance: agents are useful when the task shape is uncertain and tool choice has to happen at runtime. If the workflow is already known, I would rather build a deterministic pipeline — it is cheaper, faster, and easier to test.

03 Weak vs Strong Answer

Weak answer

"An AI agent is an LLM with access to tools so it can do more than chat."

Strong answer

"The difference is control flow. A simple LLM call returns once. An agent maintains state, chooses whether to act, uses tools, reacts to tool outputs, and needs explicit stop conditions. That flexibility is valuable, but only if you add limits, validation, and observability around the loop."

04 What the loop actually does

The real difference is a for loop and a step budget:

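class StepBudgetExceeded(RuntimeError):
    """Raised when the agent hits max_steps without finishing."""

# llm, validate_schema, dispatch, and tool_result are placeholders for a
# model client and helpers that are not shown here.
def run_agent(task, tools, max_steps=10):
    messages = [{"role": "user", "content": task}]
    for step in range(max_steps):
        response = llm.complete(messages, tools=tools)
        if response.stop_reason == "end_turn":   # model signalled it is finished
            return response.text
        tool_call = response.tool_use
        validate_schema(tool_call)             # catch hallucinated params here
        result = dispatch(tool_call)           # real-world side effect
        messages.append(tool_call)             # record what the model asked for
        messages.append(tool_result(result))   # and what the tool returned
    raise StepBudgetExceeded(f"agent did not finish in {max_steps} steps")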

The max_steps guard is not optional. Without it, a model that misreads a tool result will keep calling tools until something external kills the process. The schema validation is not optional either — raw model output should never reach a destructive API call directly.
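A minimal sketch of what that validation step might look like, assuming each tool declares its parameters as a plain mapping of name to expected type. The `TOOL_SCHEMAS` registry, the example tool names, and the `validate_tool_call` helper are all hypothetical; a real system would more likely use JSON Schema or a typed model library.

```python
# Hypothetical registry: tool name -> {parameter name: expected Python type}.
TOOL_SCHEMAS = {
    "search_docs": {"query": str, "limit": int},
    "delete_file": {"path": str},
}


class InvalidToolCall(ValueError):
    """Raised when a model-proposed tool call fails validation."""


def validate_tool_call(name, args):
    """Reject unknown tools, missing or extra parameters, and wrong types."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        raise InvalidToolCall(f"unknown tool: {name!r}")

    missing = sorted(set(schema) - set(args))
    extra = sorted(set(args) - set(schema))
    if missing or extra:
        raise InvalidToolCall(f"{name}: missing={missing} extra={extra}")

    for param, expected in schema.items():
        value = args[param]
        # bool is a subclass of int, so reject it unless the schema asks for bool.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            raise InvalidToolCall(
                f"{name}.{param}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return args
```

The important property is that this runs in ordinary code before dispatch, so a plausible-looking but wrong argument (for example `"limit": "3"` as a string) is rejected without the model having any say in the matter.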

A single LLM call has none of this:

def single_call(prompt):
    result = llm.complete(prompt)
    return result.text

That is the whole architecture. The moment you add a loop, you add state, sequential tool calls, budget consumption, and the possibility of the model making a bad decision that compounds across steps. The loop is not complexity for its own sake. It is what makes the agent capable of adapting mid-task. The cost is that every loop iteration is a new opportunity for things to go wrong.
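To make the contrast concrete, here is a self-contained sketch of the loop driven by a scripted stand-in for the model. `Reply`, `ScriptedModel`, and the `add` tool are hypothetical and exist only so the control flow can run without a real LLM or real tools; validation is left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Stand-in for a model response: either final text or one tool request."""
    text: str = ""
    tool: str = ""
    args: dict = field(default_factory=dict)


class ScriptedModel:
    """Hypothetical fake model that replays a fixed list of replies."""

    def __init__(self, replies):
        self._replies = iter(replies)

    def complete(self, messages):
        return next(self._replies)


class StepBudgetExceeded(RuntimeError):
    """Raised when the loop runs out of steps before the model finishes."""


def run_agent(model, task, tools, max_steps=10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model.complete(messages)
        if not reply.tool:  # no tool requested: the model is done
            return reply.text, messages
        result = tools[reply.tool](**reply.args)
        # State grows on every iteration; a single call never does this.
        messages.append({"role": "assistant", "tool": reply.tool, "args": reply.args})
        messages.append({"role": "tool", "content": str(result)})
    raise StepBudgetExceeded(f"agent did not finish in {max_steps} steps")


tools = {"add": lambda a, b: a + b}
model = ScriptedModel([
    Reply(tool="add", args={"a": 2, "b": 3}),  # step 1: ask for a tool
    Reply(text="The sum is 5."),               # step 2: finish
])
answer, history = run_agent(model, "What is 2 + 3?", tools)
```

After the run, `history` holds three messages (the task, the tool request, and the tool result), which is exactly the accumulated state a one-shot call never has. A scripted model like this is also one way to exercise the loop in tests without touching live tools.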

Push as much enforcement as possible out of the model's reasoning and into deterministic code. Guardrails, schema validation, and permission checks belong at the tool boundary, before execution, not inside the reasoning step where the model might talk itself past them.
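One way to keep those checks outside the model's reasoning is to wrap the dispatcher itself. The sketch below is an illustration under assumptions: `ALLOWED_TOOLS`, `IRREVERSIBLE_TOOLS`, `guarded_dispatch`, and the `approve` callback are hypothetical names, not part of any particular framework.

```python
ALLOWED_TOOLS = {"read_file", "search_docs", "send_email"}  # hypothetical allow-list
IRREVERSIBLE_TOOLS = {"send_email", "delete_file"}          # need explicit approval


class ToolDenied(PermissionError):
    """Raised when a tool call is blocked before execution."""


def guarded_dispatch(name, args, tools, approve):
    """Run a tool only after permission and approval checks pass.

    `approve` is a callable (name, args) -> bool supplied by the host
    application, for example a human review queue. The model never sees it
    and cannot argue with it.
    """
    if name not in ALLOWED_TOOLS or name not in tools:
        raise ToolDenied(f"tool not permitted: {name!r}")
    if name in IRREVERSIBLE_TOOLS and not approve(name, args):
        raise ToolDenied(f"approval refused for {name!r}")
    return tools[name](**args)


# Illustrative tools; real ones would have side effects.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "send_email": lambda to, body: "sent",
}
deny_all = lambda name, args: False

contents = guarded_dispatch("read_file", {"path": "a.txt"}, tools, approve=deny_all)
```

Read-only tools pass straight through, while `send_email` would raise `ToolDenied` here because the approval callback refuses everything. The design choice is that the check lives in the dispatcher, so no prompt wording can bypass it.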

05 Production failure modes

These are failure modes a single LLM call never has:

  • Infinite loops: the model misinterprets each tool result as insufficient progress and keeps calling. The fix is a hard step budget, not a prompt asking the model to stop when done.
  • Parameter hallucination: the model generates plausible-looking but invalid arguments for a tool call. This is where schema validation earns its cost — validate before dispatch, not after.
  • Cascading wrong turns: a bad tool call on step 2 corrupts the context for steps 3–10. The model continues confidently wrong. Intermediate logging and checkpoints are the only way to catch this without replaying the full run.
  • Budget exhaustion: each loop iteration burns tokens, and because every step re-sends the accumulated context, input token cost grows faster than linearly with the number of steps unless the context is cached or pruned. Long agent runs are far more expensive than they look. Log cost per run, not just aggregate cost, so you can see which task shapes bleed money.
  • Irreversible side effects: write, delete, and send operations cannot be undone by asking the model nicely afterward. Scope tool permissions tightly and require explicit approval before any action that cannot be rolled back.
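The budget point can be made concrete with a little arithmetic. Assuming every step re-sends the whole conversation and nothing is cached or pruned, input tokens grow roughly quadratically with the number of steps. The function name and the token counts below are illustrative assumptions, not measurements.

```python
def cumulative_input_tokens(prompt_tokens, tokens_added_per_step, steps):
    """Total input tokens over a run that re-sends the full history each step.

    Assumes no prompt caching and no pruning: step k sends the original
    prompt plus everything appended by the k earlier steps.
    """
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context                  # this step's input is the whole context so far
        context += tokens_added_per_step  # tool call and tool result get appended
    return total


one_call = cumulative_input_tokens(1_000, 500, steps=1)    # 1,000
ten_steps = cumulative_input_tokens(1_000, 500, steps=10)  # 32,500
```

With these numbers, ten steps consume 32,500 input tokens against 1,000 for a single call: ten times the steps, but more than thirty times the input cost. That gap is why per-run cost logging matters more than an aggregate bill.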

06 When I would not use one

If I know the sequence of steps ahead of time, I would rather build a workflow graph or a plain service pipeline.

| Approach | Best fit | Main risk |
| --- | --- | --- |
| Single LLM call | Classification, extraction, summarization | Not enough control for multi-step tasks |
| Workflow graph | Fixed sequence with predictable branching | Less adaptable when task shape changes |
| Agent loop | Open-ended tasks with tool choice under uncertainty | Loops, bad tool calls, unsafe actions |

The point is not that agents are bad. The point is that flexibility is expensive, and you should only buy it when the task actually needs it.

07 Tradeoffs interviewers probe

  • Flexibility vs. reliability: every degree of freedom you give an agent is a degree of freedom it can misuse. The more dynamic the tool selection, the harder the system is to test, because the path through the loop is not fixed. Deterministic pipelines are boring and robust; agents are flexible and fragile. Build the deterministic version first, and introduce the loop only when the task genuinely cannot be pre-planned.
  • Context growth vs. coherence: each loop iteration appends tool calls and results to the context. Long contexts cost more, slow inference, and eventually overflow the window. Summarization and scratchpad pruning help, but each compression is another place where the agent can lose something it needed.
  • Latency: a single LLM call might take 1–2 seconds. An agent doing six sequential tool calls might take 20 seconds or more. That is often acceptable for async batch tasks and usually not acceptable when a user is waiting for a response.
  • Observability cost vs. debuggability: logging every tool call, every model output, and every intermediate state produces a lot of data. You need it to debug a misbehaving agent. The engineers who skip it regret it on the first production incident. Plan the trace format before you ship.
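As a sketch of what planning the trace format could mean in practice, the snippet below writes one JSON line per loop step. The field names (`run_id`, `step`, `tool`, and so on) and the `trace_step` helper are hypothetical; the point is only that each step becomes a self-describing record that can be filtered by run.

```python
import io
import json
import time
import uuid


def trace_step(sink, run_id, step, tool, args, result,
               input_tokens, output_tokens, started):
    """Append one JSON line describing a single loop iteration to `sink`."""
    record = {
        "run_id": run_id,
        "step": step,
        "tool": tool,
        "args": args,
        "result_preview": str(result)[:200],  # truncate so large payloads do not flood logs
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_s": round(time.monotonic() - started, 3),
    }
    sink.write(json.dumps(record) + "\n")
    return record


sink = io.StringIO()        # stand-in for a log file or log shipper
run_id = str(uuid.uuid4())  # one id per agent run, shared by all its steps
started = time.monotonic()
trace_step(sink, run_id, 0, "search_docs", {"query": "refund policy"},
           ["doc-17"], input_tokens=1200, output_tokens=85, started=started)
```

Recording token counts and latency on every step also covers the cost and latency tradeoffs above: summing by `run_id` gives cost per run, and the step sequence shows where a run went wrong.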

08 Follow-up questions to expect

  1. When would you replace an agent with a fixed workflow?
  2. How would you stop an agent from looping forever?
  3. What would you log to debug a bad tool-using agent in production?
  4. How would you validate tool parameters before executing them?
  5. What is your strategy when an agent needs to take an irreversible action?
  6. How would you handle state if an agent run spans multiple requests or takes minutes to complete?
  7. How do you test an agent without running it against live tools?
  8. At what point would you add a human-in-the-loop approval gate, and how would you implement it?