Production Lambda Cold Starts: What Actually Moves the Needle
Every founder running on Lambda asks the same question after their first week in production: why is the first request so slow? Cold starts are the most common complaint about serverless, and the internet is full of bad advice. Here is what actually worked.
The Cold Start Budget
A cold start has three phases: container provisioning, runtime initialization, and your code's init. You cannot do much about the first. The second depends on language choice. The third is entirely on you. Four months into running Python Lambdas in production, I measured every phase and found my init code was eating 800ms of a 1200ms cold start. That is where the money was.
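One rough way to find out which imports dominate your own init phase is to time them one at a time. The sketch below is an illustration, not the tooling used for the numbers above: `time_import` is a hypothetical helper, and timings taken on a laptop will differ from those inside a Lambda execution environment.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds spent importing module_name in this process.

    Returns 0.0 if the module is already loaded, because a cached
    import costs almost nothing and would hide the real first-load cost.
    """
    if module_name in sys.modules:
        return 0.0
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Replace these with the modules your function imports at top level.
    for name in ("json", "decimal", "email.parser"):
        print(f"{name}: {time_import(name) * 1000:.1f} ms")
```

Running `python -X importtime your_module.py` gives a finer breakdown per imported module if you want more detail than this helper provides.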
Three Changes That Mattered
First, lazily import anything the main handler path does not need. A single boto3 session import was costing me 400ms. Moving it inside the function that actually needed it dropped p99 cold start to 700ms, because invocations that never reach that function no longer pay for the import.
Second, use Lambda SnapStart if you are on Java, or switch to a faster-starting runtime if you are on Python and cold starts matter. I stayed on Python because warm invocations dominate my traffic, but I know the point at which I would switch.
Third, set provisioned concurrency only where it matters. For my API endpoints that back a live UI, I keep two warm instances. For background workers, I do not. Provisioned concurrency costs real money and most code paths do not need it.
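As a minimal sketch of the third change, provisioned concurrency can be set per alias through boto3's `put_provisioned_concurrency_config`. The wrapper name `keep_warm`, the function name `api-handler`, and the alias `live` are all hypothetical. The client is passed in so the helper can be exercised without AWS credentials.

```python
def keep_warm(lambda_client, function_name, alias, instances=2):
    """Request a fixed number of pre-initialized instances for one alias.

    lambda_client is expected to be boto3.client("lambda") or any object
    exposing the same method. Provisioned concurrency applies to a
    published version or an alias, not to $LATEST.
    """
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=instances,
    )


# Usage (assumes AWS credentials and an existing alias named "live"):
#   import boto3
#   keep_warm(boto3.client("lambda"), "api-handler", "live", instances=2)
```

Background workers simply never get this call, so they keep scaling from zero and cost nothing while idle.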
# Before: top-level import, 400ms cold start cost
import boto3

client = boto3.client('dynamodb')

def handler(event, context):
    return client.get_item(...)

# After: lazy init, cold starts drop under 700ms
_client = None

def get_client():
    global _client
    if _client is None:
        import boto3
        _client = boto3.client('dynamodb')
    return _client

def handler(event, context):
    return get_client().get_item(...)

The Measurement Discipline
Do not optimize cold starts without data. CloudWatch Logs Insights can pull init duration out of your REPORT lines in two minutes. If you cannot see your cold start distribution, you cannot fix it.
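If you have exported log lines and want the same numbers offline, a small parser is enough. This is a sketch under assumptions: it relies on the `Init Duration: <n> ms` field that Lambda appends to the REPORT line of a cold invocation, and `init_durations` and `summarize` are hypothetical helper names.

```python
import re
import statistics

# Only cold invocations carry an "Init Duration" field on their REPORT line.
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def init_durations(log_lines):
    """Extract init durations (in ms) from Lambda REPORT log lines."""
    durations = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue
        match = INIT_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def summarize(durations):
    """Return count, median, and max of the cold start init durations."""
    if not durations:
        return {"count": 0, "median_ms": None, "max_ms": None}
    return {
        "count": len(durations),
        "median_ms": statistics.median(durations),
        "max_ms": max(durations),
    }
```

Feed it the lines from a log export and compare the summary before and after each change. A median tells you about the typical cold start, while the max shows the tail your users actually complain about.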
The broader lesson is one I keep relearning: serverless gives you fantastic defaults, but production traffic exposes the edges. Measure first, then tune. See the Lambda execution environment docs for the full lifecycle breakdown.