Production Agent Patterns: What Survives Real Traffic
Every demo agent looks great. Put it in front of real users and most fall apart in a week. The patterns that survive production traffic are boring. They are also what I wish I had known six months ago before I shipped my first agent-powered endpoint.
Pattern One: Agents Are Not Request Handlers
The biggest mistake I made early was treating an agent call like a normal API call. Users click a button, the endpoint invokes the agent, the agent runs for 45 seconds, the request times out. Now I always put a queue between the user and the agent. Enqueue the job, return a job ID, poll for status. Users get instant feedback, the agent gets room to work, the whole thing survives spiky traffic.
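The enqueue, return-an-ID, poll flow can be sketched in a few lines. This is a minimal in-memory illustration, not a production setup: `JOBS`, `enqueue`, `get_status`, `worker`, and the placeholder `run_agent` are all hypothetical names, and a real deployment would likely swap the `asyncio.Queue` and the dict for a durable queue and a job store.

```python
import asyncio
import uuid

# In-memory stand-ins (assumption); use a durable queue and store in production.
JOBS: dict[str, dict] = {}


async def run_agent(prompt: str) -> str:
    """Hypothetical placeholder for a slow agent run."""
    await asyncio.sleep(0.01)
    return f"answer for: {prompt}"


def enqueue(queue: asyncio.Queue, prompt: str) -> str:
    """Called by the endpoint: record the job and return its ID immediately."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "result": None}
    queue.put_nowait((job_id, prompt))
    return job_id


def get_status(job_id: str) -> dict:
    """Called by the polling endpoint."""
    return JOBS.get(job_id, {"status": "unknown", "result": None})


async def worker(queue: asyncio.Queue) -> None:
    """Pulls jobs off the queue and runs the agent outside the request path."""
    while True:
        job_id, prompt = await queue.get()
        JOBS[job_id]["status"] = "running"
        try:
            JOBS[job_id]["result"] = await run_agent(prompt)
            JOBS[job_id]["status"] = "done"
        except Exception as exc:
            JOBS[job_id].update(status="failed", result=str(exc))
        finally:
            queue.task_done()


async def demo() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(worker(queue))
    job_id = enqueue(queue, "summarize this ticket")
    await queue.join()  # a real client would poll get_status(job_id) instead
    task.cancel()
    return get_status(job_id)
```

The point of the split is that `enqueue` never waits on the agent, so the request returns in milliseconds no matter how long the run takes.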
Pattern Two: Every Tool Call Has a Timeout
Agents call tools. Tools call APIs. APIs can hang. Without explicit timeouts, a single slow external call burns your entire agent budget. I wrap every tool in a timeout and return a structured failure if it trips:
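    import asyncio

    # TOOL_REGISTRY is assumed to be defined elsewhere: tool name -> async callable.
    async def call_tool(name: str, args: dict) -> dict:
        try:
            # Bound every tool call so one slow API cannot stall the run.
            return await asyncio.wait_for(
                TOOL_REGISTRY[name](**args),
                timeout=10.0,
            )
        except asyncio.TimeoutError:
            # Structured failure the agent can inspect and act on.
            return {'error': 'tool_timeout', 'tool': name}

The agent sees the timeout and decides whether to retry with different arguments, skip that step, or fail out. What never happens is silent hanging.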
Pattern Three: Budget Caps on Every Run
Every agent run has a hard cap on token spend and tool calls. If a run hits the cap, it terminates with a structured error. Without caps, a buggy prompt can loop forever and quietly drain your budget overnight. Ask me how I know.
Typical caps I use:
- Max 20 tool calls per run
- Max $0.50 spend per run
- Max 5 minutes wall-clock time
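One way to enforce caps like these is a small budget object that every tool call and model call charges against. The sketch below is illustrative: `RunBudget`, `BudgetExceeded`, and the error shape are hypothetical names, the defaults mirror the caps listed above, and how spend is measured per call is left as an assumption for the caller.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run crosses one of its caps."""

    def __init__(self, limit: str, run_id: str):
        super().__init__(f"{limit} exceeded for run {run_id}")
        self.limit = limit
        self.run_id = run_id

    def as_error(self) -> dict:
        """Structured error to return to the caller when the run is stopped."""
        return {"error": "budget_exceeded", "limit": self.limit, "run_id": self.run_id}


@dataclass
class RunBudget:
    run_id: str
    max_tool_calls: int = 20
    max_spend_usd: float = 0.50
    max_seconds: float = 300.0  # 5 minutes wall-clock
    tool_calls: int = 0
    spend_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tool_calls: int = 0, spend_usd: float = 0.0) -> None:
        """Record usage, then raise if any cap is now exceeded."""
        self.tool_calls += tool_calls
        self.spend_usd += spend_usd
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("max_tool_calls", self.run_id)
        if self.spend_usd > self.max_spend_usd:
            raise BudgetExceeded("max_spend_usd", self.run_id)
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("max_seconds", self.run_id)
```

The agent loop would call `budget.charge(tool_calls=1)` before each tool call and `budget.charge(spend_usd=cost)` after each model call, catching `BudgetExceeded` at the top level and returning `exc.as_error()`.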
Pattern Four: Everything Is Observable
I log every tool call, every model call, every decision point, with a run ID. When a user complains that the agent gave a weird answer, I can pull the run ID and replay the whole session. Without that, agents are a black box and debugging is guesswork.
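A minimal version of this is one JSON log line per event, each tagged with the run ID, so a run's timeline can be rebuilt by filtering. The helper names here (`new_run_id`, `log_event`, `events_for_run`) and the event fields are assumptions for illustration; the sketch reconstructs the sequence of events and does not re-execute the run.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("agent.runs")


def new_run_id() -> str:
    """Generate an opaque ID to attach to every event in one run."""
    return uuid.uuid4().hex


def log_event(run_id: str, kind: str, **fields) -> dict:
    """Emit one JSON line per event so a run can be filtered by run_id later."""
    event = {"run_id": run_id, "kind": kind, "ts": time.time(), **fields}
    logger.info(json.dumps(event, default=str))
    return event


def events_for_run(lines: list[str], run_id: str) -> list[dict]:
    """Rebuild one run's timeline from raw JSON log lines, in time order."""
    events = [json.loads(line) for line in lines]
    return sorted(
        (e for e in events if e["run_id"] == run_id),
        key=lambda e: e["ts"],
    )
```

Typical calls would look like `log_event(run_id, "tool_call", tool="search", args=args)` or `log_event(run_id, "decision", choice="retry")`, placed at each tool call, model call, and decision point.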
These four patterns are the floor. They do not make an agent smart. They make a smart agent reliable. That is the difference between a demo and a product.
For production patterns, see the AWS Lambda best practices.