Production Agent Patterns: What Survives Real Traffic

Every demo agent looks great. Put it in front of real users and most fall apart in a week. The patterns that survive production traffic are boring. They are also what I wish I had known six months ago before I shipped my first agent-powered endpoint.

Pattern One: Agents Are Not Request Handlers

The biggest mistake I made early was treating an agent call like a normal API call. A user clicks a button, the endpoint invokes the agent, the agent runs for 45 seconds, and the request times out. Now I always put a queue between the user and the agent: the endpoint enqueues the job and returns a job ID, and the client polls for status. Users get instant feedback, the agent gets room to work, and the whole thing survives spiky traffic.
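Here is a minimal sketch of that enqueue, return, poll flow. It is an illustration and not my production setup: the in-memory JOBS dict and asyncio.Queue stand in for a durable queue and a database, and run_agent, enqueue, get_status, and worker are hypothetical names.

```python
import asyncio
import uuid

# Hypothetical in-memory job store; a real deployment would use durable storage.
JOBS: dict[str, dict] = {}


async def run_agent(prompt: str) -> str:
    """Stand-in for the slow agent call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"


def enqueue(queue: asyncio.Queue, prompt: str) -> str:
    """What the endpoint does: record the job, enqueue it, return at once."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "result": None}
    queue.put_nowait((job_id, prompt))
    return job_id


def get_status(job_id: str) -> dict:
    """What the polling endpoint returns for a job ID."""
    return JOBS.get(job_id, {"status": "unknown", "result": None})


async def worker(queue: asyncio.Queue) -> None:
    """Background worker: pulls jobs off the queue and runs the agent."""
    while True:
        job_id, prompt = await queue.get()
        JOBS[job_id]["status"] = "running"
        try:
            JOBS[job_id]["result"] = await run_agent(prompt)
            JOBS[job_id]["status"] = "done"
        except Exception as exc:
            JOBS[job_id].update(status="failed", result=str(exc))
        finally:
            queue.task_done()


async def demo() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(worker(queue))
    job_id = enqueue(queue, "summarize this ticket")
    first = get_status(job_id)["status"]  # the user sees this immediately
    await queue.join()                    # wait until the worker has finished the job
    task.cancel()
    return {"first": first, "final": get_status(job_id)}
```

The endpoint only ever touches enqueue and get_status, so its latency no longer depends on how long the agent takes.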

Pattern Two: Every Tool Call Has a Timeout

Agents call tools. Tools call APIs. APIs can hang. Without explicit timeouts, a single slow external call burns your entire agent budget. I wrap every tool in a timeout and return a structured failure if it trips:

python

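import asyncio

# TOOL_REGISTRY maps tool names to async callables; it is defined elsewhere.

async def call_tool(name: str, args: dict) -> dict:
    try:
        # Cap each tool call at 10 seconds so one slow API cannot stall the run.
        return await asyncio.wait_for(
            TOOL_REGISTRY[name](**args),
            timeout=10.0,
        )
    except asyncio.TimeoutError:
        # Return a structured failure the agent can reason about.
        return {'error': 'tool_timeout', 'tool': name}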

The agent sees the timeout, decides whether to retry with different arguments, skip that step, or fail out. What never happens is silent hanging.

Pattern Three: Budget Caps on Every Run

Every agent run has a hard cap on spend, tool calls, and wall-clock time. If a run hits any cap, it terminates with a structured error. Otherwise a buggy prompt can loop forever and quietly drain your budget overnight. Ask me how I know.

Typical caps I use:

Max 20 tool calls per run
Max $0.50 spend per run
Max 5 minutes wall-clock time
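
One way to enforce caps like these is a small budget object that the run loop charges after every model or tool call. This is a sketch under assumptions: RunBudget and BudgetExceeded are hypothetical names, and how you compute the dollar cost of a model call depends on your provider's pricing.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run crosses one of its caps."""

    def __init__(self, limit: str, used: float, cap: float):
        super().__init__(f"{limit} cap exceeded: {used} > {cap}")
        self.limit, self.used, self.cap = limit, used, cap

    def to_error(self) -> dict:
        # Structured error to hand back to the caller.
        return {"error": "budget_exceeded", "limit": self.limit,
                "used": self.used, "cap": self.cap}


@dataclass
class RunBudget:
    max_tool_calls: int = 20
    max_spend_usd: float = 0.50
    max_seconds: float = 300.0  # 5 minutes wall-clock
    tool_calls: int = 0
    spend_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tool_calls: int = 0, spend_usd: float = 0.0) -> None:
        """Record usage, then fail fast if any cap is crossed."""
        self.tool_calls += tool_calls
        self.spend_usd += spend_usd
        self.check()

    def check(self) -> None:
        elapsed = time.monotonic() - self.started_at
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool_calls", self.tool_calls, self.max_tool_calls)
        if self.spend_usd > self.max_spend_usd:
            raise BudgetExceeded("spend_usd", self.spend_usd, self.max_spend_usd)
        if elapsed > self.max_seconds:
            raise BudgetExceeded("wall_clock_seconds", elapsed, self.max_seconds)
```

The run loop would call budget.charge(tool_calls=1) after each tool call and budget.charge(spend_usd=cost) after each model call, catch BudgetExceeded at the top level, and return to_error() as the run's result.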

Pattern Four: Everything Is Observable

I log every tool call, every model call, every decision point, with a run ID. When a user complains that the agent gave a weird answer, I can pull the run ID and replay the whole session. Without that, agents are a black box and debugging is guesswork.
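
A rough sketch of what that logging can look like, assuming JSON lines as the log format. RunLog and replay are hypothetical names, and the sink can be any callable that accepts a string, such as print or a file's write method.

```python
import json
import time
import uuid


class RunLog:
    """Collects every event for one agent run under a single run ID."""

    def __init__(self, sink=None):
        self.run_id = uuid.uuid4().hex
        self.events = []
        self.sink = sink  # optional callable that receives one JSON string per event

    def record(self, kind: str, **data) -> dict:
        event = {
            "run_id": self.run_id,
            "seq": len(self.events),  # ordering within the run
            "ts": time.time(),
            "kind": kind,             # e.g. "model_call", "tool_call", "decision"
            "data": data,
        }
        self.events.append(event)
        if self.sink is not None:
            self.sink(json.dumps(event))
        return event


def replay(lines, run_id: str) -> list:
    """Rebuild one run's events, in order, from a mixed stream of JSON log lines."""
    events = (json.loads(line) for line in lines)
    return sorted((e for e in events if e["run_id"] == run_id),
                  key=lambda e: e["seq"])
```

When a complaint comes in, replay(log_lines, run_id) gives back the exact sequence of model calls, tool calls, and decisions for that run, even if the log stream interleaves many runs.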

These four patterns are the floor. They do not make an agent smart. They make a smart agent reliable. That is the difference between a demo and a product.

For production patterns, see the AWS Lambda best practices.
