Error Handling Patterns That Survived Production Load

Error handling is where code gets ugly in production. Every try/except block is a small architecture decision. After a year of real production traffic through my services, a few patterns consistently produced good outcomes and a few consistently produced pain. Here is the honest review.

Pattern 1: Fail Loud at the Boundary

At API boundaries, errors should be explicit, logged, and returned in a structured way. No silent failures. No 500 errors with no log trail.

python

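import logging

from fastapi import APIRouter, HTTPException

# SubscriberCreate, create_subscriber, DuplicateSubscriber, and EmailInvalid
# come from the business logic layer (see Pattern 2).
logger = logging.getLogger(__name__)
router = APIRouter()


@router.post('/subscribe')
async def subscribe(sub: SubscriberCreate):
    try:
        return await create_subscriber(sub)
    except DuplicateSubscriber:
        raise HTTPException(status_code=409, detail='already subscribed')
    except EmailInvalid:
        raise HTTPException(status_code=400, detail='invalid email')
    except Exception:
        # Unknown failure: log the full traceback, return a generic 500.
        logger.exception('subscribe.failed', extra={'email': sub.email})
        raise HTTPException(status_code=500, detail='internal error')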

Each expected exception type maps to a specific status code. Anything unexpected is logged with its full traceback and returned as a generic 500.

Pattern 2: Typed Exceptions in Business Logic

Inside the business logic layer, I define specific exception classes for specific failure modes. That lets the boundary layer map them to status codes without guessing.

python

class DuplicateSubscriber(Exception):
    """The email address is already on the subscriber list."""


class EmailInvalid(Exception):
    """The email address failed validation."""

Small classes, clear names, specific meaning.
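One way the boundary layer could do that mapping without a long chain of except clauses is a lookup table. This is a minimal sketch, not how my services are necessarily wired: ERROR_MAP, to_status, and the code strings are hypothetical names chosen for illustration.

```python
class DuplicateSubscriber(Exception):
    pass


class EmailInvalid(Exception):
    pass


# Hypothetical lookup table: exception type -> (HTTP status, machine-readable code).
ERROR_MAP = {
    DuplicateSubscriber: (409, "already_subscribed"),
    EmailInvalid: (400, "invalid_email"),
}


def to_status(exc: Exception) -> tuple[int, str]:
    """Return the mapped status and code, falling back to a generic 500."""
    for exc_type, mapped in ERROR_MAP.items():
        if isinstance(exc, exc_type):  # isinstance so subclasses map too
            return mapped
    return 500, "internal_error"
```

Adding a new failure mode then means one new class and one new table entry, and anything not in the table still lands on the generic 500.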

Pattern 3: Retry with Backoff for Transient Errors

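For network calls, retry with exponential backoff. Most transient failures (timeouts, dropped connections, a brief 503) resolve within seconds, so a few spaced-out retries ride them out. Permanent failures keep failing, so cap the number of attempts and let the last error surface. Backoff with a cap separates the two cleanly.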
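A minimal sketch of the idea, assuming the caller has already decided which failures count as transient. TransientError, retry_with_backoff, and the default delays are hypothetical, and the version here is synchronous for brevity; an async service would await asyncio.sleep instead.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


def retry_with_backoff(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # 0.5s, 1s, 2s, ... plus up to 10% jitter so clients
            # do not all retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Any exception that is not a TransientError propagates on the first attempt, which is the "fail fast" half of the pattern. The sleep parameter exists only so tests can replace the real delay.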

Pattern 4: Never Swallow Exceptions Silently

The worst anti-pattern I see: except: pass. It hides bugs for months and produces debugging sessions where nothing makes sense. Every exception caught must either be handled or re-raised with context. Never silently swallowed.
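To make "re-raised with context" concrete, here is a small sketch. ConfigError and parse_settings are hypothetical names; the point is the raise ... from form, which keeps the original exception attached as the cause.

```python
import json


class ConfigError(Exception):
    """Hypothetical domain error raised when settings cannot be loaded."""


def parse_settings(raw: str) -> dict:
    # With `except: pass` here, the function would return None and the
    # real failure would show up somewhere unrelated, much later.
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Re-raise with context; `from exc` chains the original traceback.
        raise ConfigError(f"settings are not valid JSON: {exc.msg}") from exc
```

The traceback then shows both the domain-level error and the parser error that caused it.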

Pattern 5: Structured Error Responses

Every error response follows the same shape:

json

{
  "error": "invalid_email",
  "message": "The email address provided is not valid",
  "request_id": "abc123"
}

Machine-readable code plus human-readable message plus traceable ID. Three fields, every error.
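One way to keep that shape consistent is to build every error body through a single type. This is a sketch under assumptions: ErrorBody and new_request_id are hypothetical, and in a real service the request ID would normally come from request middleware rather than being generated on the spot.

```python
import uuid
from dataclasses import asdict, dataclass, field


def new_request_id() -> str:
    # Assumption: stands in for an ID supplied by request middleware.
    return uuid.uuid4().hex


@dataclass(frozen=True)
class ErrorBody:
    error: str    # machine-readable code, e.g. "invalid_email"
    message: str  # human-readable explanation
    request_id: str = field(default_factory=new_request_id)

    def to_dict(self) -> dict:
        """Return the three fields as a JSON-serializable dict."""
        return asdict(self)
```

For example, ErrorBody("invalid_email", "The email address provided is not valid", request_id="abc123").to_dict() produces the same three keys as the JSON above.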

What I Stopped Doing

Catching Exception at the top level and returning 200 (hides failures)
Adding try/except around every line (noise without signal)
Using boolean return values for failure (exceptions communicate better)
Logging errors without context (request_id, user_id, what happened)
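On the last item, the replacement habit looks roughly like the sketch below. log_failure and the "subscribe" logger name are hypothetical; the extra argument is standard library logging and attaches the keys to the log record so a structured formatter can emit them as fields.

```python
import logging

logger = logging.getLogger("subscribe")


def log_failure(request_id: str, user_id: str, action: str) -> None:
    """Log the active exception with enough context to trace it later."""
    # Call this from inside an except block so the traceback is captured.
    logger.exception(
        "%s failed",
        action,
        extra={"request_id": request_id, "user_id": user_id},
    )
```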

The Meta-Pattern

Error handling is how you communicate failure. Communication requires clarity. Clarity requires specific types, specific codes, specific logs. Vague error handling produces vague debugging sessions.

Testing the Error Paths

The happy path is easy to test. The error paths are where bugs hide. I added tests for every expected error case this year and caught three real bugs that would have shipped to production. Error-path tests are not optional. They are the tests that actually earn their keep.
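An error-path test can be as small as the sketch below. normalize_email is a hypothetical stand-in for a piece of business logic, and the standard library unittest is used only to keep the example self-contained; the same assertions translate directly to pytest.raises.

```python
import unittest


class EmailInvalid(Exception):
    pass


def normalize_email(raw: str) -> str:
    """Hypothetical stand-in for a piece of the business logic."""
    email = raw.strip().lower()
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        raise EmailInvalid(raw)
    return email


class NormalizeEmailErrorPaths(unittest.TestCase):
    def test_rejects_missing_at_sign(self):
        with self.assertRaises(EmailInvalid):
            normalize_email("not-an-email")

    def test_rejects_empty_local_part(self):
        with self.assertRaises(EmailInvalid):
            normalize_email("@example.com")

    def test_happy_path_still_works(self):
        self.assertEqual(normalize_email(" A@Example.com "), "a@example.com")
```

Two of the three tests exercise failures, which is about the ratio I aim for on validation code.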

Client Error Messages

The machine-readable error code is for your frontend. The human-readable message is for your user. Keep them aligned. A 400 with the code invalid_email and the message "The email address provided is not valid" tells the frontend what to do and tells the user what happened. A generic "bad request" fails both audiences.

For the current FastAPI exception handling guidance, see the FastAPI documentation on handling errors, which covers HTTPException and custom exception handlers.