Python Error Handling Patterns That Survive Production

Everyone writes try/except. Not everyone writes try/except that holds up when the system is under real load. Six months of production Python taught me which error patterns actually survive and which only look good in a tutorial. Here are the five that stuck.

Pattern One: Catch Specific, Not Bare

A bare except: clause catches everything, including KeyboardInterrupt and SystemExit. That is almost always wrong. Catch the specific exception types you expect. Let everything else propagate to something that can decide what to do with it.

python

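import httpx

def fetch(url):  # illustrative name; fallback_data and log are defined elsewhere
    try:
        response = httpx.get(url, timeout=5.0)
        response.raise_for_status()  # turn 4xx/5xx responses into httpx.HTTPStatusError
    except httpx.TimeoutException:
        return fallback_data()  # timeouts are expected under load: serve the fallback
    except httpx.HTTPError as e:
        log.error('http_error', exc_info=e)  # keep the traceback, then re-raise
        raise
    return response.json()  # assumes the endpoint returns JSON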

Two different handlers for two different problems. Timeouts get a fallback. Other HTTP errors get logged and re-raised. No silent swallowing.
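To make the cost of a bare except: concrete, here is a small standard-library sketch. The function names are placeholders of mine, not part of any library. It compares a bare handler with except Exception when the code under it raises SystemExit.

```python
def swallow_everything(action):
    """Run action under a bare except (the anti-pattern)."""
    try:
        action()
    except:  # noqa: E722 - deliberately bare to show the problem
        return "swallowed"
    return "ok"


def catch_expected(action):
    """Run action but handle only ordinary errors."""
    try:
        action()
    except Exception:
        return "handled"
    return "ok"


def request_exit():
    # SystemExit derives from BaseException, not Exception.
    raise SystemExit(1)


def bad_lookup():
    return {}["missing"]  # raises KeyError, an ordinary Exception
```

The bare handler turns a shutdown request into a normal return value. The except Exception version handles the KeyError but lets SystemExit escape, which is what you want when a supervisor is trying to stop the process.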

Pattern Two: Errors Have Structure

Every caught error produces a structured log line with an error code, not just a stack trace. That lets me query "how many payment_declined errors in the last hour" without parsing stack traces:

python

# Keyword arguments assume a structlog-style logger; stdlib logging needs extra={...}
log.error('payment_declined', code=e.code, request_id=rid)
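If you only have the standard library, roughly the same result is possible with the extra argument and a small JSON formatter. This is a minimal sketch, not a full setup: JsonFormatter, make_logger, the "payments" logger name, and the field list are all names I made up for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    # Hypothetical field names; use whatever your log pipeline queries on.
    FIELDS = ("code", "request_id")

    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname}
        for field in self.FIELDS:
            # Values passed via extra={...} become attributes on the record.
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def make_logger(stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("payments")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call then looks like make_logger(sys.stderr).error("payment_declined", extra={"code": "card_expired", "request_id": "req-123"}), and counting payment_declined events becomes a filter on the event field instead of a regex over tracebacks.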

Pattern Three: Always Include Request Context

Every error log carries the request ID, user ID, and relevant inputs. Errors without context are useless. I use a context-local dict that gets merged into every log entry, so I do not have to remember to pass it through.
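One way to build that context-local dict with the standard library is contextvars plus a logging filter. Treat this as a sketch under my own assumptions: bind_context, reset_context, and ContextFilter are illustrative names, and your logging library may already ship an equivalent.

```python
import contextvars
import logging

# Hypothetical context store holding the fields for the current request.
# The default dict is never mutated; bind_context always builds a new one.
_log_context = contextvars.ContextVar("log_context", default={})


def bind_context(**fields):
    """Add fields to the current context; returns a token for reset_context."""
    return _log_context.set({**_log_context.get(), **fields})


def reset_context(token):
    """Restore the context to what it was before the matching bind_context."""
    _log_context.reset(token)


class ContextFilter(logging.Filter):
    """Copy the current context onto every record passing through."""

    def filter(self, record):
        for key, value in _log_context.get().items():
            setattr(record, key, value)
        return True  # never drop the record, only annotate it
```

Attach ContextFilter to a handler once, call bind_context(request_id=..., user_id=...) when a request starts, and reset it when the request ends. Because contextvars are isolated per thread and per asyncio task, concurrent requests do not see each other's fields.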

Pattern Four: Retryable vs Non-Retryable

I wrap my own exception hierarchy around external failures:

python

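class AppError(Exception):
    """Base class for every error this application raises."""
class RetryableError(AppError):
    """Transient failure, such as a network timeout; safe to retry."""
class FatalError(AppError):
    """Permanent failure, such as a validation error; never retry."""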

Retry logic checks the class, not the message. That keeps my retry policy explicit and testable. Network timeouts are retryable. Validation errors are fatal. Nothing is guessed at.
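Here is a minimal sketch of retry logic that checks the class. The call_with_retry helper, its parameters, and the exponential backoff are my own illustration rather than a fixed recipe. The hierarchy is repeated so the block runs on its own.

```python
import time


class AppError(Exception):
    """Base class for application errors."""


class RetryableError(AppError):
    """Transient failure; safe to retry."""


class FatalError(AppError):
    """Permanent failure; never retry."""


def call_with_retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func, retrying only on RetryableError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except RetryableError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** (attempt - 1))
        # FatalError and anything else propagate on the first failure.
```

Passing sleep as a parameter is what makes the policy testable: a test can substitute a list's append method and assert on the exact delays without waiting.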

Pattern Five: Fail Loud at the Boundary

Inside my code I handle what I expect. At the API boundary, I have a single handler that catches anything that escaped and returns a clean 500 with a request ID. The user gets a reasonable error message. I get a log entry. The process does not crash. The other 99% of requests keep serving.
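A framework-agnostic sketch of that boundary handler follows. The handle_request function, the (status, body) return shape, and the "api" logger name are assumptions of mine; in a real web framework this logic would live in its exception-handler or middleware hook.

```python
import logging
import uuid

log = logging.getLogger("api")


def handle_request(handler, request):
    """Run handler; convert any escaped exception into a clean 500."""
    request_id = str(uuid.uuid4())
    try:
        return 200, handler(request)
    except Exception:
        # Exception, not BaseException: Ctrl-C and SystemExit still stop the process.
        # log.exception records the traceback alongside the request ID.
        log.exception("unhandled_error", extra={"request_id": request_id})
        body = {"error": "internal_error", "request_id": request_id}
        return 500, body
```

The response body carries no exception details, only the request ID, so a user can quote it in a support ticket and I can find the matching log entry.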

These five patterns are not exciting. They are just the floor. But they are the difference between a system that runs for a year and one you rewrite every quarter.

The Python logging documentation covers the structured logging setup.
