
DDD is iterative: design aggregates around constraints, not data

Aggregates aren't designed once. They get refined when real pressure forces it. The mistake: grouping fields because they share a row, not because they share an invariant.


We had an Order aggregate in a B2B marketplace backend. It started as a clean domain object representing a purchase order: line items, pricing, tax calculations. Over the months, the business grew, and so did the aggregate. The fulfillment team needed shipping status and tracking on the order. The compliance team needed export classifications and audit flags. Three independent business concerns ended up sharing one aggregate because they shared one database row.

Then we got a concurrency bug in production. Two event handlers were both reacting to an OrderPaidEvent. One handler was writing fulfillment_status = READY_TO_SHIP via a narrow column update. The other was loading the full order row, recalculating tax after a post-payment discount, and writing the whole row back. The whole-row update was working from a stale snapshot where fulfillment_status was still null, and it clobbered the just-set READY_TO_SHIP back to nothing. Orders were getting stuck in a limbo state where payment had gone through but fulfillment never kicked in.
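To make the interleaving concrete, here is a minimal in-memory sketch of the lost update. A `Map` stands in for the orders table. The row shape, the amounts (in cents), the flat 20% tax rate, and the handler names are hypothetical simplifications, not our production code.

```typescript
// Hypothetical, simplified row shape; amounts are in cents.
type OrderRow = {
  id: string;
  amount: number;
  tax: number;
  fulfillment_status: string | null;
};

// In-memory stand-in for the orders table.
const table = new Map<string, OrderRow>();
table.set("o-1", { id: "o-1", amount: 10000, tax: 2000, fulfillment_status: null });

// Fulfillment handler: a narrow update touching one column,
// like `UPDATE orders SET fulfillment_status = ? WHERE id = ?`.
function markReadyToShip(id: string): void {
  const current = table.get(id)!;
  table.set(id, { ...current, fulfillment_status: "READY_TO_SHIP" });
}

// Tax handler: loads a full snapshot of the row...
function loadOrder(id: string): OrderRow {
  return { ...table.get(id)! };
}

// ...and later writes every column of that snapshot back.
function saveWholeRow(row: OrderRow): void {
  table.set(row.id, { ...row });
}

// The interleaving from the incident:
const snapshot = loadOrder("o-1"); // tax handler reads; fulfillment_status is null
markReadyToShip("o-1"); // fulfillment handler commits READY_TO_SHIP
snapshot.amount -= 1000; // post-payment discount
snapshot.tax = Math.round(snapshot.amount * 0.2); // hypothetical flat 20% rate
saveWholeRow(snapshot); // the stale null overwrites READY_TO_SHIP

console.log(table.get("o-1")!.fulfillment_status); // null
```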

The hotfix was straightforward: exclude fulfillment_status from the fields the tax handler destructured out of the loaded row and wrote back, so that handler no longer touched it. That fixed this specific field. But the pattern was fragile: every new field written by a concurrent handler would need the same exclusion, and sooner or later someone would forget.
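A sketch of why the hotfix was fragile, assuming the tax handler wrote back whatever remained after destructuring the loaded row. Here `shipping_provider_id` plays the role of a later field set by another concurrent handler. The column set, the helper names, and the flat 20% tax rate are hypothetical.

```typescript
// Hypothetical row shape; amounts are in cents.
type OrderRow = {
  id: string;
  amount: number;
  tax: number;
  fulfillment_status: string | null;
  shipping_provider_id: string | null; // a later field, written by another handler
};

const table = new Map<string, OrderRow>();
table.set("o-1", {
  id: "o-1",
  amount: 10000,
  tax: 2000,
  fulfillment_status: null,
  shipping_provider_id: null,
});

// Merges the given columns into the stored row (a partial UPDATE).
function updateColumns(id: string, patch: Partial<OrderRow>): void {
  table.set(id, { ...table.get(id)!, ...patch });
}

// Tax handler after the hotfix: fulfillment_status is destructured away,
// but every other column of the (possibly stale) snapshot is still written.
function applyDiscount(snapshot: OrderRow, discount: number): void {
  const { fulfillment_status, ...rest } = snapshot; // the hotfix
  const amount = rest.amount - discount;
  updateColumns(snapshot.id, { ...rest, amount, tax: Math.round(amount * 0.2) });
}

const snapshot = { ...table.get("o-1")! }; // tax handler's stale read
updateColumns("o-1", { fulfillment_status: "READY_TO_SHIP" }); // fulfillment handler
updateColumns("o-1", { shipping_provider_id: "carrier-42" }); // another concurrent handler
applyDiscount(snapshot, 1000);

const row = table.get("o-1")!;
console.log(row.fulfillment_status); // READY_TO_SHIP (protected by the exclusion)
console.log(row.shipping_provider_id); // null (nobody remembered to exclude it)
```

The excluded column survives; the next column written concurrently does not. Every new field reopens the same hole.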

The real problem wasn’t the race. It was the aggregate.

One aggregate, three concerns, zero shared invariants

When I looked at what the Order aggregate was actually doing, three independent clusters of fields became obvious.

Pricing owned amount, tax rate, line items, discounts, and currency. Its invariants: the total must equal the sum of line items minus discounts, the tax rate must be valid for the buyer’s jurisdiction, discounts can’t exceed the order total.

Fulfillment owned fulfillment status, tracking number, and shipping provider reference. Its invariants: status transitions follow a defined lifecycle (pending, ready to ship, shipped, delivered), tracking number is required once status moves to shipped.

Compliance owned export classification codes, customs declarations, and audit flags. Its invariants: export codes must be valid for the destination country, audit flags can’t be removed once set.
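Written as code, each cluster's invariants need only that cluster's own fields. The sketch below is illustrative. The field names and integer-cent amounts are assumptions. The `validTaxRates` and `validExportCodes` parameters are hypothetical stand-ins for the jurisdiction and destination-country lookups. Customs declarations are omitted for brevity.

```typescript
// All amounts are integer cents. Field names are hypothetical.
type Pricing = { lineItemTotals: number[]; discounts: number; total: number; taxRate: number };

type FulfillmentStatus = "PENDING" | "READY_TO_SHIP" | "SHIPPED" | "DELIVERED";
type Fulfillment = { status: FulfillmentStatus; trackingNumber: string | null };

type Compliance = { exportCode: string; auditFlags: string[] };

function checkPricing(p: Pricing, validTaxRates: number[]): string[] {
  const errors: string[] = [];
  const gross = p.lineItemTotals.reduce((sum, x) => sum + x, 0);
  if (p.discounts > gross) errors.push("discounts exceed the order total");
  if (p.total !== gross - p.discounts) errors.push("total must equal line items minus discounts");
  if (!validTaxRates.includes(p.taxRate)) errors.push("tax rate not valid for the buyer's jurisdiction");
  return errors;
}

// The lifecycle: each status may only advance to the next one.
const NEXT_STATUS: Record<FulfillmentStatus, FulfillmentStatus | null> = {
  PENDING: "READY_TO_SHIP",
  READY_TO_SHIP: "SHIPPED",
  SHIPPED: "DELIVERED",
  DELIVERED: null,
};

function checkFulfillment(from: Fulfillment, to: Fulfillment): string[] {
  const errors: string[] = [];
  if (to.status !== from.status && NEXT_STATUS[from.status] !== to.status) {
    errors.push(`illegal transition ${from.status} -> ${to.status}`);
  }
  const shippedOrLater = to.status === "SHIPPED" || to.status === "DELIVERED";
  if (shippedOrLater && !to.trackingNumber) errors.push("tracking number required once shipped");
  return errors;
}

function checkCompliance(from: Compliance, to: Compliance, validExportCodes: string[]): string[] {
  const errors: string[] = [];
  if (!validExportCodes.includes(to.exportCode)) errors.push("export code not valid for destination");
  // Audit flags are append-only: everything set before must still be set.
  for (const flag of from.auditFlags) {
    if (!to.auditFlags.includes(flag)) errors.push(`audit flag ${flag} cannot be removed`);
  }
  return errors;
}

// No violations:
console.log(checkPricing({ lineItemTotals: [6000, 4000], discounts: 1000, total: 9000, taxRate: 0.2 }, [0.2]));
// One violation: shipped without a tracking number.
console.log(checkFulfillment({ status: "READY_TO_SHIP", trackingNumber: null }, { status: "SHIPPED", trackingNumber: null }));
```

No function signature mentions another cluster's type. That is what "no shared invariants" looks like at the code level.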

These three sets of invariants are completely disjoint. No business rule says “the tax rate must be consistent with the fulfillment status.” No rule connects the export classification to the shipping provider. They were grouped into one aggregate because they happened to live on the same database row, not because they belonged together logically.

The race condition was just the symptom that made the structural problem visible. An aggregate that tries to arbitrate three independent consistency boundaries can’t enforce any of them well. Its repository becomes a god-writer that touches every column on every save, and any two handlers reacting to the same event will compete for the same row.

The data-first trap

How does a reasonable team end up with a god-aggregate? The persistence layer makes it easy.

In TypeORM, Hibernate, Drizzle, or ActiveRecord codebases, the default mental model is “one table row equals one entity equals one aggregate.” This model is wrong but seductive. It’s wrong because consistency boundaries don’t follow table boundaries; they follow invariants. It’s seductive because the ORM makes data-first groupings cheap to express. Adding a column to an existing entity costs one migration and one line in the mapper. Creating a second aggregate over the same row costs a new class, a new mapper, a new repository. The path of least resistance always points toward the existing aggregate.

Matthias Noback makes this point well in “DDD Entities and ORM Entities”: if you’re relying on lazy-loading for something in your aggregate, you’ve probably modeled it incorrectly. The ORM’s convenience mechanisms make it nearly invisible when an aggregate has grown beyond its proper boundary. Everything still “works,” so nobody questions the scope.

The trap closes when a third or fourth concern lands on the same row. By then you have a god-aggregate, multiple writers competing for it, and an incident waiting to happen.

Eric Evans defined an aggregate as “a cluster of associated objects we treat as a unit for the purpose of data changes.” The key phrase is “for the purpose of data changes”: an aggregate is defined by what must change together consistently, which makes it a consistency boundary. The definition is clear; the drift happens in implementation, where “associated” gets silently reinterpreted as “stored together.”

Aggregates are consistency boundaries, not data containers

Vaughn Vernon sharpened Evans’ definition into a practical rule: model true invariants in consistency boundaries. An invariant is a business rule that must be atomically true after every operation. If two fields have no rule connecting them, there is no reason they must be in the same aggregate.

Vernon’s “Effective Aggregate Design” series is the most practical guide on this topic. His central warning: “a large-cluster aggregate will never perform or scale well and is more likely to become a nightmare leading only to failure.” The large cluster usually forms not from deliberate design but from false invariants, things the team assumes must be consistent together without ever verifying that a real business rule requires it.

The diagnostic is simple. For each group of fields in your aggregate, ask: “Is there a business rule that requires these fields to be consistent with that other group in the same transaction?” If the answer is no, those groups belong in different aggregates.

Applied to our order case:

Field cluster | Invariant set | Shared invariants with other clusters?
--- | --- | ---
amount, tax, line_items, discounts | Pricing rules | No
fulfillment_status, tracking_number | Fulfillment lifecycle | No
export_code, customs_declaration, audit_flags | Compliance rules | No

Three disjoint invariant sets. Three aggregates. The fact that they share a database row is an implementation detail, not a domain truth.
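The diagnostic is mechanical enough to sketch in code. The helper below is hypothetical, not something from our codebase. It treats each field cluster from the table as a node and merges two clusters whenever some invariant touches fields in both (a union-find over clusters). The rule list is an illustrative encoding of the table. The cross-cluster rule at the end is invented to show what a shared invariant would do.

```typescript
// A business rule and the columns it constrains (hypothetical encoding).
type Invariant = { name: string; fields: string[] };

// Merges clusters linked by any invariant; each resulting group is a candidate aggregate.
function groupClusters(clusters: Record<string, string[]>, invariants: Invariant[]): string[][] {
  // Which cluster owns each column.
  const owner = new Map<string, string>();
  for (const [cluster, fields] of Object.entries(clusters)) {
    for (const field of fields) owner.set(field, cluster);
  }

  // Union-find over cluster names, with path compression.
  const parent = new Map<string, string>(Object.keys(clusters).map((c): [string, string] => [c, c]));
  const find = (c: string): string => {
    const p = parent.get(c)!;
    if (p === c) return c;
    const root = find(p);
    parent.set(c, root);
    return root;
  };

  for (const inv of invariants) {
    // Distinct clusters whose columns this invariant touches.
    const touched = Array.from(new Set(inv.fields.map((f) => owner.get(f)))).filter(
      (c): c is string => c !== undefined,
    );
    for (let i = 1; i < touched.length; i++) {
      const a = find(touched[0]);
      const b = find(touched[i]);
      if (a !== b) parent.set(a, b);
    }
  }

  const groups = new Map<string, string[]>();
  for (const c of Object.keys(clusters)) {
    const root = find(c);
    groups.set(root, [...(groups.get(root) ?? []), c]);
  }
  return Array.from(groups.values());
}

const clusters = {
  pricing: ["amount", "tax_rate", "line_items", "discounts", "currency"],
  fulfillment: ["fulfillment_status", "tracking_number", "shipping_provider_id"],
  compliance: ["export_code", "customs_declaration", "audit_flags"],
};

// Rules from the table: each touches only its own cluster.
const rules: Invariant[] = [
  { name: "total = line items - discounts", fields: ["amount", "line_items", "discounts"] },
  { name: "tracking required once shipped", fields: ["fulfillment_status", "tracking_number"] },
  { name: "audit flags are append-only", fields: ["audit_flags"] },
];
console.log(groupClusters(clusters, rules).length); // 3

// A hypothetical rule connecting pricing to fulfillment status merges those two clusters.
const withCrossRule: Invariant[] = [
  ...rules,
  { name: "no discounts after READY_TO_SHIP", fields: ["discounts", "fulfillment_status"] },
];
console.log(groupClusters(clusters, withCrossRule).length); // 2
```

With only the rules from the table, the three clusters stay separate. A single rule spanning pricing and fulfillment collapses those two into one candidate aggregate.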

Same row, two aggregates, disjoint columns

What we ended up doing was letting multiple aggregates share the same physical table, each owning a disjoint subset of columns. The orders row stays as it is. But instead of one Order aggregate loading and writing the whole thing, you get three smaller aggregates, each with its own mapper that projects only its columns and its own repository that writes only its columns.

orders table (same physical row)
├── pricing.Order aggregate
│   ├── owns: amount, tax_rate, line_items, discounts, currency
│   └── repository: writes only pricing columns
│
├── fulfillment.Order aggregate
│   ├── owns: fulfillment_status, tracking_number, shipping_provider_id
│   └── repository: writes only fulfillment columns
│
└── compliance.Order aggregate
    ├── owns: export_code, customs_declaration, audit_flags
    └── repository: writes only compliance columns

Now each listener on OrderPaidEvent writes to its own column set, and no update can clobber another. The race is gone, not because we serialized the handlers, but because the writes don’t overlap anymore.
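Here is a minimal sketch of the disjoint-columns idea, reduced to two of the three aggregates. An in-memory `Map` stands in for the orders table, and `updateColumns` stands in for an `UPDATE ... SET` over only the listed columns. The types, repository objects, and flat 20% tax rate are hypothetical. The sketch replays the incident's interleaving.

```typescript
// Hypothetical subset of the orders row; amounts are in cents.
type OrdersRow = {
  id: string;
  amount: number;
  tax: number;
  fulfillment_status: string | null;
  tracking_number: string | null;
};

// In-memory stand-in for the shared orders table.
const orders = new Map<string, OrdersRow>([
  ["o-1", { id: "o-1", amount: 10000, tax: 2000, fulfillment_status: null, tracking_number: null }],
]);

// Stand-in for `UPDATE orders SET <listed columns> WHERE id = ?`.
function updateColumns(id: string, patch: Partial<OrdersRow>): void {
  orders.set(id, { ...orders.get(id)!, ...patch });
}

// pricing.Order: its mapper projects, and its repository writes, only pricing columns.
type PricingOrder = { id: string; amount: number; tax: number };
const pricingRepo = {
  load(id: string): PricingOrder {
    const row = orders.get(id)!;
    return { id, amount: row.amount, tax: row.tax };
  },
  save(order: PricingOrder): void {
    updateColumns(order.id, { amount: order.amount, tax: order.tax });
  },
};

// fulfillment.Order: same idea, disjoint columns.
type FulfillmentOrder = { id: string; status: string | null; trackingNumber: string | null };
const fulfillmentRepo = {
  load(id: string): FulfillmentOrder {
    const row = orders.get(id)!;
    return { id, status: row.fulfillment_status, trackingNumber: row.tracking_number };
  },
  save(order: FulfillmentOrder): void {
    updateColumns(order.id, { fulfillment_status: order.status, tracking_number: order.trackingNumber });
  },
};

// Replay the incident: pricing loads first, fulfillment commits, then pricing saves.
const pricing = pricingRepo.load("o-1");
const fulfillment = fulfillmentRepo.load("o-1");
fulfillment.status = "READY_TO_SHIP";
fulfillmentRepo.save(fulfillment);
pricing.amount -= 1000; // post-payment discount
pricing.tax = Math.round(pricing.amount * 0.2); // hypothetical flat 20% rate
pricingRepo.save(pricing);

console.log(orders.get("o-1")!.fulfillment_status); // READY_TO_SHIP
```

Because neither repository's `save` names the other's columns, the order in which the two saves land no longer matters for either set of columns.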

This might feel like a weird thing to do if you’re used to thinking of the schema as the source of truth for aggregate boundaries. But the schema is just row layout; it’s infrastructure. Aggregates are about behavior and invariants, and they’re each owned by a bounded context. There’s no reason those two things have to align. Vernon’s starting recommendation is actually to begin with each entity as its own aggregate root and only merge them when domain experts confirm they must change in the same transaction. Two aggregates mapping to the same table is just what happens when their column groupings are disjoint.

There’s a real cost to this, of course. You end up with multiple mappers and multiple repositories where there used to be one, and the team has to enforce a discipline: each aggregate only reads and writes its own columns. If a shared invariant emerges later, say a new business rule that connects pricing to fulfillment status, that’s the signal to either reunify or split the table properly and mediate via events. And as Kamil Grzybek notes, eventual consistency across split aggregates requires message brokers, idempotency, and monitoring. So you don’t do this preemptively. You do it when the aggregate is actually causing problems.
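One way to make the "own columns only" discipline checkable rather than purely cultural, sketched under assumptions: each aggregate declares the columns it owns in a registry, a startup check or unit test reports any column claimed twice, and writes go through a guard that rejects foreign columns. The `columnOwnership` map and both helpers are hypothetical.

```typescript
// Hypothetical registry: which aggregate owns which columns of the orders row.
const columnOwnership: Record<string, readonly string[]> = {
  "pricing.Order": ["amount", "tax_rate", "line_items", "discounts", "currency"],
  "fulfillment.Order": ["fulfillment_status", "tracking_number", "shipping_provider_id"],
  "compliance.Order": ["export_code", "customs_declaration", "audit_flags"],
};

// Reports every column claimed by more than one aggregate.
function findOverlaps(ownership: Record<string, readonly string[]>): string[] {
  const claimedBy = new Map<string, string>();
  const overlaps: string[] = [];
  for (const [aggregate, columns] of Object.entries(ownership)) {
    for (const column of columns) {
      const previous = claimedBy.get(column);
      if (previous !== undefined && previous !== aggregate) {
        overlaps.push(`${column}: ${previous} and ${aggregate}`);
      } else {
        claimedBy.set(column, aggregate);
      }
    }
  }
  return overlaps;
}

// Guard for repository writes: throws if a patch touches a column the aggregate doesn't own.
function assertOwnedColumns(aggregate: string, patch: Record<string, unknown>): void {
  const owned: readonly string[] = columnOwnership[aggregate] ?? [];
  const foreign = Object.keys(patch).filter((c) => !owned.includes(c));
  if (foreign.length > 0) {
    throw new Error(`${aggregate} may not write ${foreign.join(", ")}`);
  }
}

console.log(findOverlaps(columnOwnership)); // []
```

For example, `assertOwnedColumns("pricing.Order", { fulfillment_status: "READY_TO_SHIP" })` throws, which is the kind of write behind the original incident.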

Refinement, not big-design-up-front

There’s a standard narrative around DDD that presents aggregate design as an upfront discipline: gather around a whiteboard, event-storm the domain, identify aggregates, ship. This narrative is misleading because it implies that with enough upfront effort, you can get the aggregates right the first time.

You can’t. Not because you’re not skilled enough, but because you don’t have enough information yet. You don’t know which concerns will land on the row until they do. You don’t know which invariant sets will be disjoint until the domain has evolved enough to reveal it.

I wrote about this more broadly in “Your backend architecture should evolve, not be designed upfront”: architecture decisions made too early are guesses about how the system will evolve, and those guesses are almost always wrong in ways that matter. The same principle applies at the aggregate level.

The realistic approach is: start small, ship, observe. Design the smallest aggregate that captures today’s known invariants. When a new concern appears, ask whether it shares invariants with the existing aggregate. If it does, fold it in. If not, give it its own aggregate. The DDD facilitator skill I built for Claude Code explicitly encodes this question as an opinion anchor during aggregate design: “Start small. Propose the smallest possible aggregate, then let the user argue for merging based on invariants.”

In our case, the right time to split the order aggregate was when we did it, not earlier. When the aggregate was first created, it genuinely was one concern (pricing). The fulfillment fields arrived months later; the compliance fields after that. At each step, adding to the existing aggregate was the correct call, because the overhead of splitting wasn’t justified by any observed pain. Only when the race condition surfaced did the split become clearly worth its cost.

Recognizing the pressure

If aggregate refinement is triggered by pressure rather than planning, recognizing the pressure becomes the critical skill. I’ve seen three concrete signals.

An incident. That’s what happened to us: two handlers, one row, lost update. This is the loudest signal. When you trace a concurrency issue to two unrelated operations contending on the same aggregate, the aggregate is doing too much.

A new business concept that doesn’t fit. A feature request arrives and it feels awkward to implement on the existing aggregate. You find yourself adding fields and methods that have nothing to do with the aggregate’s existing invariants. The new concept uses the same data but constrains it differently. That friction is a signal that the new concept belongs in its own aggregate.

Repeated friction. Every change to the aggregate breaks something else. The test suite covers too many independent scenarios. A bug fix in one area requires understanding invariants from a completely different area. The aggregate has accumulated enough concerns that it’s become cognitively expensive to work with.

The common thread: the trigger is concrete, not theoretical. You don’t split aggregates because an architecture review says you should. You split them when the cost of keeping them together exceeds the cost of separating them. Vernon’s rule of thumb is useful here: most aggregates should contain a single entity, perhaps two. If yours contains five, it’s worth asking which of them actually share invariants.

Refinement gets faster over time

One more thing worth noting. In the codebase where this happened, the team had already applied the same pattern once before on a different god-aggregate, using the same disjoint-columns technique. So this wasn’t a first encounter with the pattern; it was the second time around.

And it was noticeably faster. The team already had the vocabulary: “disjoint column ownership,” “narrow repository,” “constraint-driven boundary.” The discussion didn’t need to explain the pattern from first principles. It could focus on diagnosing which invariant sets were disjoint and drawing the boundary.

Patterns the team has applied before become cheap. The first aggregate split is an architectural discussion. The second is a refactoring ticket. Refinement accelerates as the team’s vocabulary grows.

TL;DR

  • Aggregates are consistency boundaries, not data containers. If two fields have no business rule connecting them, they don’t belong in the same aggregate, regardless of whether they share a database row.
  • The “one row, one aggregate” mental model is wrong. ORMs make it seductive by making data-first groupings cheap. But consistency boundaries follow invariants, not table layouts. Two aggregates can share one row if their column sets are disjoint and no invariant spans both.
  • Don’t design aggregates upfront. Refine them under pressure. Start with the smallest aggregate that captures today’s known invariants. Split when real pain forces it: a production incident, a new concept that doesn’t fit, repeated friction that makes the aggregate cognitively expensive.
  • The concrete trigger matters more than architectural taste. An incident, a misfit feature, a test suite that covers too many independent concerns. These are the signals. Not “it would be cleaner if.”
  • Refinement gets faster with practice. The first aggregate split is an architectural discussion. The second is a refactoring ticket. The pattern becomes part of the team’s vocabulary, and the cost of splitting drops each time you do it.