What Breaks In Production Agents
Every agent I have put in production has broken at some point. The breakages are not random. They cluster around three failure modes.
The business problem is trust. If a client has to babysit an agent, they will cancel the contract. Agents earn their keep by running unattended, and that only works if failures surface loudly instead of hiding.
Failure one: tool timeouts
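Your agent calls an API. The API is slow that day. The default timeout is way too long, or there is none at all: Python's requests library, for one, waits indefinitely unless you pass a timeout. The agent hangs. The user waits. The log says nothing.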
Fix: every tool call wrapped in an explicit timeout shorter than the agent step budget.
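
    import requests

    def safe_call(url, timeout=10):
        # Keep the timeout shorter than the agent's step budget.
        try:
            return requests.get(url, timeout=timeout).json()
        except requests.Timeout:
            # Return a structured error instead of hanging or raising.
            return {'error': 'timeout', 'url': url}

Failure two: fuzzy stop conditions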
The agent is told to stop when the task is done. The LLM cannot decide when done is done. The loop runs to max steps. Your bill doubles.
Fix: stop conditions must be checkable by code. Queue empty. File written. HTTP 200 received.
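Here is a minimal sketch of what "checkable by code" can look like. The three predicates mirror the examples above. The names queue_empty, file_written, http_ok, and run_until are made up for illustration, and the default of 20 steps is an assumption, not a recommendation.

```python
import os
from typing import Callable, Sequence


def queue_empty(queue: Sequence) -> bool:
    # Done when there is no work left.
    return len(queue) == 0


def file_written(path: str) -> bool:
    # Done when the output file exists and is not empty.
    return os.path.isfile(path) and os.path.getsize(path) > 0


def http_ok(status_code: int) -> bool:
    # Done when the last response was a plain 200.
    return status_code == 200


def run_until(done: Callable[[], bool], step: Callable[[], object],
              max_steps: int = 20) -> str:
    """Run step() until done() is true or max_steps is used up.

    The stop decision is made by code, never by the model.
    """
    for _ in range(max_steps):
        if done():
            return "done"
        step()
    # Check once more so the final step can still count as success.
    return "done" if done() else "max_steps"
```

The model still chooses what to do inside step(). It just never gets a vote on whether the loop is finished.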
Failure three: unhandled errors poisoning the trajectory
One tool raises an exception. The next step receives a traceback as context. The agent tries to reason about it and makes things worse.
Fix: every error converted to a short structured string before it goes back into the agent's history. Never let raw stack traces into context.
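A rough sketch of that conversion, assuming tools are plain Python callables. The helper names format_tool_error and call_tool, the JSON shape, and the 200 character limit are all my own illustrative choices. The full traceback still goes to the log, where a human can read it.

```python
import json
import logging

logger = logging.getLogger("agent.tools")

MAX_MESSAGE_CHARS = 200  # assumed limit; tune to your context budget


def format_tool_error(tool_name: str, exc: BaseException) -> str:
    # Collapse whitespace and truncate so the agent sees one short line.
    message = " ".join(str(exc).split())[:MAX_MESSAGE_CHARS]
    return json.dumps({
        "tool": tool_name,
        "error": type(exc).__name__,
        "message": message,
    })


def call_tool(tool_name, fn, *args, **kwargs):
    """Call a tool; on failure return a short structured string."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        # The full traceback goes to the log, not into agent history.
        logger.exception("tool %s failed", tool_name)
        return format_tool_error(tool_name, exc)
```

Whatever call_tool returns is what gets appended to the agent's history, so the model sees the error type and a one-line message and nothing else.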
The watchlist
- Max steps hard cap
- Max wall clock hard cap
- Every tool call logged with latency
- Errors structured, not raw
- Stop conditions checkable by code
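The first three items on the list fit in a few lines. This is a sketch under assumptions, not a framework: run_agent and timed_call are hypothetical names, step(i) is assumed to return True once a code-checkable stop condition holds, and the default caps are placeholders.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("agent")


def timed_call(name: str, fn: Callable, *args, **kwargs):
    """Call a tool and log its latency, even if it raises."""
    start = time.monotonic()
    try:
        return fn(*args, **kwargs)
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        logger.info("tool=%s latency_ms=%.0f", name, elapsed_ms)


def run_agent(step: Callable[[int], bool], max_steps: int = 20,
              max_seconds: float = 120.0) -> str:
    """Run step(i) until it returns True or a hard cap is hit."""
    deadline = time.monotonic() + max_seconds
    for i in range(max_steps):
        # Wall clock cap, checked before every step.
        if time.monotonic() >= deadline:
            return "wall_clock_cap"
        if step(i):
            return "done"
    # Step cap: the loop ran out without the stop condition holding.
    return "max_steps_cap"
```

The wall clock check only runs between steps, so it cannot interrupt a step that is already hanging. That is why the per-call timeout from failure one still matters.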
The story
I once had an agent run for seventeen minutes because a single tool call was silently retrying. The client noticed first. That was the last time I shipped without a wall clock cap.
Further reading
Hamel Husain on evals is the best writing I know of on how to keep an agent honest in production. Read it twice.