Dataclasses or Pydantic: How I Actually Choose
Both dataclasses and Pydantic models describe structured data in Python. Many developers pick between them based on vibes. After using both across a dozen services, I have a clear decision rule that has not steered me wrong.
The Boundary Test
If the data crosses a trust boundary, use Pydantic. If it stays inside your process, use a dataclass.
A trust boundary is anywhere data comes from outside your code: HTTP bodies, database rows, message queues, environment variables, config files. At that boundary, you need validation, coercion, and clear error messages. That is Pydantic's job.
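Here is a minimal sketch of what validation, coercion, and error reporting look like at a boundary. The CommentRequest model, its fields, and the parse_comment helper are hypothetical names invented for illustration, and the raw dicts stand in for a decoded HTTP body.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class CommentRequest(BaseModel):
    """Hypothetical request shape; every field name here is an assumption."""

    post_slug: str
    body: str
    rating: int


def parse_comment(raw: dict) -> Optional[CommentRequest]:
    """Validate untrusted input; return None when it does not fit the shape."""
    try:
        return CommentRequest(**raw)
    except ValidationError as exc:
        # The exception lists every offending field, not only the first one.
        print(exc)
        return None


# The numeric string "5" is coerced to the int 5 (Pydantic's default lax mode).
ok = parse_comment({"post_slug": "hello", "body": "Nice post", "rating": "5"})

# "body" is missing and "five" is not an integer, so validation fails.
bad = parse_comment({"post_slug": "hello", "rating": "five"})
```

Everything downstream of parse_comment can assume rating is an int and body is present, which is the whole point of doing the check once, at the edge.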
Inside your own code, data that has already passed the boundary is clean. Validating it again with Pydantic adds overhead and ceremony with no benefit. Dataclasses are lighter, faster to construct, and easier to read.
A Concrete Example
    from dataclasses import dataclass
    from datetime import datetime, timezone

    from pydantic import BaseModel


    # crosses HTTP boundary: Pydantic
    class CreatePostRequest(BaseModel):
        title: str
        slug: str
        categories: list[str]


    # internal domain object: dataclass
    @dataclass(frozen=True)
    class Post:
        title: str
        slug: str
        categories: tuple[str, ...]  # tuple, so the frozen object stays immutable
        published_at: datetime


    def handle_create(req: CreatePostRequest) -> Post:
        # Pydantic did the validation on the way in
        # Dataclass carries the cleaned domain state forward
        return Post(
            title=req.title,
            slug=req.slug,
            categories=tuple(req.categories),
            published_at=datetime.now(timezone.utc),
        )

Two shapes, one clear rule, clean separation of concerns.
Why frozen=True Matters
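Internal domain objects should be immutable. Mutations are where bugs hide. frozen=True on a dataclass gives you immutability for free: assigning to a field raises an error instead of silently changing state. The freeze is shallow, so a mutable field such as a list can still be changed in place, which is why the domain object stores categories as a tuple. Pydantic models also support freezing via model_config, but with dataclasses it is a single argument on the decorator.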
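A small sketch of what that looks like in practice, using only the standard library. The Post class here is a trimmed copy of the domain object from the example, and dataclasses.replace is the standard way to get a modified copy without mutating the original.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Post:
    title: str
    slug: str
    categories: tuple[str, ...]


post = Post(title="Hello", slug="hello", categories=("python",))

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    post.title = "Changed"
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# To "change" a frozen object, build a new one with some fields replaced.
renamed = replace(post, title="Hello again")
```

The original post is untouched after the replace call, so any code still holding a reference to it sees the state it expects.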
What I Used to Do, Wrong
I used Pydantic everywhere because it was what the FastAPI tutorial showed. The result: internal code that re-validated already-valid data on every function call, forever. The profiler output was embarrassing once I finally looked. See the Python Patterns catalogue for the general principle: layer your data shapes by trust level.
The boundary test is the whole rule. Write it on a sticky note.
The Conversion Cost
Converting between the two at the boundary is cheap if both shapes are thin. Keep Pydantic models and dataclasses close in structure. A five-line converter function at the boundary beats a clever shared base class every time. The Zen of Python applies here: flat is better than nested; explicit is better than implicit. Two shapes you can read in ten seconds beat one shape you have to chase through three files.
What I Do When the Shapes Diverge
Sometimes the incoming request shape needs a field the domain does not, or the domain carries fields the request does not. That divergence is a signal, not a problem. The request shape reflects the transport. The domain shape reflects the business. They are allowed to disagree. Forcing them to match creates leaks: request-only fields ending up in the domain, or domain-only fields being exposed in the API. Keep them separate and let the converter function carry the translation cost in one visible place.
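One way the divergence might look, as a sketch under stated assumptions: publish_now is a hypothetical request-only flag, and published_at is made optional here (None meaning draft) purely to give the converter something to translate. Neither choice comes from a real API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel


class CreatePostRequest(BaseModel):
    title: str
    slug: str
    categories: list[str]
    publish_now: bool = False  # hypothetical request-only field


@dataclass(frozen=True)
class Post:
    title: str
    slug: str
    categories: tuple[str, ...]
    published_at: Optional[datetime]  # domain-only; None means draft (assumption)


def to_post(req: CreatePostRequest, now: datetime) -> Post:
    """The one visible place where transport concerns become domain state."""
    return Post(
        title=req.title,
        slug=req.slug,
        categories=tuple(req.categories),
        # The transport flag is translated here and does not leak into Post.
        published_at=now if req.publish_now else None,
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
live = to_post(
    CreatePostRequest(title="Hi", slug="hi", categories=["python"], publish_now=True),
    now,
)
draft = to_post(CreatePostRequest(title="Hi", slug="hi", categories=[]), now)
```

Passing now in as an argument rather than calling datetime.now inside the converter is a design choice for this sketch: it keeps the translation deterministic and easy to test.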