Python Performance Wins That Actually Pay Off

Python performance optimization is where developers waste the most time for the smallest gain. Ninety percent of the optimization advice on the internet is premature. After eight months of profiling real production Lambdas and APIs, here are the optimizations that actually moved the needle, and the ones that cost me a day for no reason.

Wins That Paid

Batch database calls. The single biggest win in serverless Python. A Lambda that made ten get_item calls in a loop ran 14x faster after I switched to a single batch_get_item. The cost: one evening rewriting the handler. The savings: 200ms per request, forever.
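Here is a minimal sketch of the batching pattern, not the original handler. It assumes a low-level DynamoDB client (for example boto3.client('dynamodb')) is passed in; the function name batch_get and its parameters are illustrative. A single batch_get_item call accepts at most 100 keys and may hand some of them back under UnprocessedKeys, so the sketch chunks the keys and retries the leftovers.

```python
import time

MAX_KEYS_PER_CALL = 100  # DynamoDB limit for a single BatchGetItem request


def batch_get(client, table_name, keys, max_retries=5):
    """Fetch many items with as few BatchGetItem calls as possible.

    client: a low-level DynamoDB client, e.g. boto3.client("dynamodb")
    keys:   key dicts in DynamoDB format, e.g. {"pk": {"S": "user#1"}}
    """
    items = []
    for start in range(0, len(keys), MAX_KEYS_PER_CALL):
        request = {table_name: {"Keys": keys[start:start + MAX_KEYS_PER_CALL]}}
        for attempt in range(max_retries + 1):
            response = client.batch_get_item(RequestItems=request)
            items.extend(response.get("Responses", {}).get(table_name, []))
            # Anything DynamoDB could not serve comes back in the same
            # shape as RequestItems, so it can be resubmitted as-is.
            request = response.get("UnprocessedKeys") or {}
            if not request:
                break
            time.sleep(0.05 * 2 ** attempt)  # simple exponential backoff
        else:
            raise RuntimeError("keys still unprocessed after retries")
    return items
```

Batch results are not guaranteed to come back in request order, so index them by key if the handler depends on ordering.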

Cache the boto3 client. Creating boto3.client('dynamodb') inside the handler instead of at module scope added 80ms to cold invocations and 20ms to warm ones. Move it to module level, where it is built once and reused by every warm invocation. One-line change. Free speed.

Use Pydantic v2, not v1. The serialization speed jump is real. On a JSON-heavy API, switching to v2 cut median response times by 30 percent. Migration is annoying, but the official Pydantic v2 migration guide covers it well.

Optimizations That Did Not Pay

Async everywhere. On small serverless workloads, async/await cost more than it saved. Cold start overhead and the switch cost outweighed the IO parallelism benefits. I now use async only where I have multiple concurrent IO ops in a single handler.
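For the one case where async does earn its keep, a minimal sketch looks like this. The fetch coroutine is a stand-in for a real IO call (an HTTP request or a database query), and the names and delays are made up for illustration.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for one IO call (HTTP request, database query, ...)."""
    await asyncio.sleep(delay)
    return name


async def handle_async(event):
    # Three independent IO calls overlap instead of running back to back.
    return await asyncio.gather(
        fetch("profile", 0.1),
        fetch("orders", 0.1),
        fetch("recommendations", 0.1),
    )


def handler(event, context):
    # Lambda calls a synchronous entry point; run the coroutine to completion.
    return asyncio.run(handle_async(event))
```

With three overlapping 0.1-second waits, the handler finishes in roughly 0.1 seconds rather than 0.3. With a single IO call per request there is nothing to overlap, which is why async did not pay off on the smaller workloads.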

Replacing CPython with PyPy. Spent a day porting, could not use it on Lambda anyway, reverted. PyPy is great for long-running workers, not for serverless.

Micro-optimizations. Replacing list comprehensions with map(). Using __slots__ on models. Measurable but unobservable. Not worth the complexity in code I will maintain for years.

A Profiling Recipe That Works

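```python
import cProfile
import pstats

# handler is your Lambda entry point; event and context are a sample
# invocation payload and context object (context can be None locally
# if the handler never reads it).
prof = cProfile.Profile()
prof.enable()
handler(event, context)
prof.disable()

# Show the 20 entries with the highest cumulative time.
pstats.Stats(prof).sort_stats("cumulative").print_stats(20)
```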

Run this once in local dev before optimizing anything. The top three lines almost always reveal the real bottleneck, and it is almost never what you guessed.
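If you want the same recipe as a reusable helper, one possible shape is sketched below. The names profile_call and sample_handler are hypothetical; the helper returns the report as text so it can be logged or compared between runs instead of only printed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=20, **kwargs):
    """Run func under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def sample_handler(event, context):
    """Toy handler used only to demonstrate the helper."""
    return sum(i * i for i in range(event["n"]))


if __name__ == "__main__":
    _, report = profile_call(sample_handler, {"n": 100_000}, None)
    print(report)
```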

The Honest Rule

Profile first. Optimize what the profile shows. Ignore everything else. Eight months in, the wins are always the same three categories: database calls, client instantiation, and serialization. Stop looking for clever ones.

The Measurement Habit

I record baseline p50 and p95 latency for every endpoint at deploy time. A regression only exists relative to a baseline. Without numbers, every 'the site feels slow' comment becomes a fishing expedition. With numbers, I either see the regression in the dashboard or I do not, and I act accordingly. I use AWS X-Ray, the serverless-native tracing tool, for this baseline tracking.
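The baseline comparison itself is simple arithmetic. The sketch below is not tied to X-Ray: it assumes you already have a list of latency samples in milliseconds, uses the nearest-rank definition of a percentile, and flags a regression when a metric is more than 10 percent above its baseline. The function names and the 10 percent tolerance are assumptions for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty collection of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Reduce raw latency samples to the two numbers worth tracking."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
    }


def regressions(baseline, current, tolerance=0.10):
    """Return the metrics where current exceeds baseline by more than tolerance."""
    return [
        name for name, base in baseline.items()
        if current[name] > base * (1 + tolerance)
    ]
```

Store the summarize output at deploy time, then run regressions against fresh samples. An empty list means there is nothing to chase.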