Why I Built a DDD Facilitator Skill for Claude Code (And Why Structure Beats Conversation)
Structured interactive workflows turn AI into a real DDD facilitator — not an oracle that generates generic models, and not a rubber duck that just listens.
You need to design a new bounded context. You’ve got a backend with established DDD patterns — CQRS, event-driven, clean architecture, several bounded contexts already running in production — and now there’s a new business capability to model from scratch.
The business need is clear enough. You’ve talked to product. You have a rough idea of the workflow. But translating that into aggregates, invariants, state machines, and context boundaries? That’s the hard part — the part that traditionally requires a whiteboard, a skilled facilitator, and a room full of stakeholders spending half a day on EventStorming.
You don’t have half a day. You don’t have a room full of stakeholders. What you have is an AI coding assistant that’s supposedly great at architecture.
So you try the obvious thing: “Design a domain model for badge issuing.”
You get a clean answer. Aggregates, value objects, a state machine, even some TypeScript interfaces. It looks reasonable at first glance. It’s also completely generic — the kind of model you’d find in a DDD textbook chapter on “order management.” The aggregate is called BadgeManager. The order flow has add-line/remove-line operations, like a shopping cart. There’s a BadgeService that does… everything else.
None of this reflects your reality. Your codebase already has a similar ordering flow in another context — a fixed-quantity checkout pattern, not a cart pattern. Your naming convention uses the domain action (Card Issuing, not Card Manager). Your integration patterns rely on domain events and anti-corruption layers for external providers. The AI’s proposal ignores all of this, because it never had the opportunity to learn it.
This is oracle mode: one prompt in, one model out. The AI treats domain modeling as a generation problem — take the input, produce a complete output. But domain modeling isn’t a generation problem. It’s a discovery problem. The right model emerges from questions, challenges, and iterations. It requires someone to ask “what exactly do you mean by an order here — is it a cart-like flow with add/remove, or a fixed-quantity checkout?” It requires someone to push back: “That aggregate is doing too much. What invariant requires all of this to be in the same transaction?”
The AI can’t ask those questions in oracle mode, because the interaction is over before they’d come up.
The natural next step is conversation. You start a back-and-forth: share more context, paste some existing code, ask follow-up questions. This is genuinely better. The AI learns about your codebase incrementally, starts referencing real patterns, gives more grounded advice.
But without structure, conversations meander. You spend twenty minutes debating aggregate naming, skip bounded context boundaries entirely, jump from event storming straight to repository interfaces, and accept the first event flow without ever questioning failure paths. When the AI proposes something questionable, there’s no natural moment to stop and challenge it — you’re already three topics ahead. When you make an assumption the AI should question, it doesn’t, because there’s no explicit “validate before advancing” step.
I’ve been in these freeform sessions. They feel productive. You’re generating artifacts, the AI is responsive, you’re moving fast. Then you look at the output and realize you designed three aggregates without ever discussing which invariants actually require transactional consistency. Or you built an event flow that misses half the failure paths because nobody asked “what can go wrong at this step?”
The real issue isn’t that AI can’t do DDD. It’s that neither oracle mode nor freeform conversation supports the kind of work DDD actually requires: iterative, opinionated, grounded in real code, with explicit checkpoints to stop and validate before moving forward.
That “stop and validate” dynamic is what makes human DDD workshops valuable. Alberto Brandolini designed EventStorming around it — the facilitator doesn’t just record what people say, they challenge it, reframe it, surface contradictions. The room full of sticky notes isn’t the point; the conversation structure is the point.
And it’s exactly what’s missing from both AI interaction patterns.
Three Levels of AI Interaction for Design Work
There’s a useful way to think about how developers use AI for architecture and design tasks. Not all interactions are equal, and the gap between them isn’t about model quality or prompt cleverness — it’s about the structure of the conversation itself.
Level 1: Oracle mode. You describe what you need, the AI generates a complete answer. This is the default for most AI-assisted work, and for code generation it’s often fine. “Write a function that validates email addresses” is a well-defined problem with a well-defined output. But domain modeling isn’t like that. When you ask “design a domain model for badge issuing,” you’re asking the AI to make dozens of implicit decisions — aggregate boundaries, entity vs. value object classification, consistency requirements, integration patterns — without any way to validate those decisions against your actual context. The AI doesn’t know what it doesn’t know, and neither do you, because the interaction doesn’t surface the gaps.
Oracle mode fails for design work because design is fundamentally about trade-offs, and trade-offs require dialogue. Should this be one aggregate or two? That depends on your consistency requirements, your team’s tolerance for eventual consistency, and your existing patterns. No single prompt can capture all of that, and no single response can navigate it.
Level 2: Conversation mode. You go back and forth with the AI. Share context incrementally, react to proposals, ask follow-up questions. This is a significant step up — the AI can ask for clarification, you can correct misunderstandings, and the model evolves over multiple turns.
But conversation mode has a subtle problem: it lacks intentional structure. The discussion follows whatever thread feels most interesting in the moment. You might dive deep into one aggregate’s state machine while completely forgetting to map the event flow first. You might accept a bounded context boundary without ever applying the linguistic test (“does this term mean the same thing in both contexts?”). The AI follows your lead, and if your lead skips a step, the AI skips it too.
There’s also no natural checkpoint. In a real DDD workshop, the facilitator explicitly pauses: “Before we move to aggregate design, does everyone agree on these context boundaries?” In a freeform conversation, there’s no such gate. You drift from topic to topic, and by the time you realize something was wrong three steps back, you’ve built a lot of design on a shaky foundation.
I’ve noticed another pattern in conversation mode: the AI tends to be agreeable. You propose something, it elaborates on it. You suggest a boundary, it refines it. What’s missing is genuine pushback — the kind a skilled DDD facilitator gives when they spot an aggregate that’s trying to do too much, or when a developer conflates two concepts that should live in different contexts. Freeform conversation doesn’t create the conditions for that pushback, because there’s no moment where the AI is explicitly asked to evaluate and challenge.
Level 3: Structured facilitation. This is the level where results change dramatically. The interaction follows a defined sequence of phases — event storming, then bounded context discovery, then context relationships, then aggregate design, then tactical deep dive. Each phase has a specific goal, a specific interaction pattern, and an explicit gate.
The key differences from conversation mode:
The AI leads with an opinionated proposal. Instead of asking “what aggregates do you think you need?”, the AI explores your codebase, analyzes the events discovered in the previous phase, and proposes: “Based on the event clusters and your existing patterns, I see three aggregates with these boundaries. Here’s my reasoning.” This mirrors how an expert human facilitator works — they don’t start from zero, they bring an informed opinion and let the room react to it.
Questions are targeted and limited. Instead of open-ended “what do you think?”, each round has 1-3 specific questions designed to validate or challenge the proposal. “Does this aggregate need to enforce consistency across both order lines and payment status, or could payment be eventually consistent?” This focuses the conversation and prevents the meandering of freeform chat.
Gates prevent premature advancement. Each phase ends with an explicit checkpoint: “Does this event flow capture the full process? Any events or failure paths missing?” The AI will not move to bounded context discovery until you’ve confirmed the event storm is complete. This is where hidden gaps get caught — you can’t hand-wave past a shaky event flow because the gate forces you to look at it.
The codebase is the ground truth. Before proposing anything, the AI explores your actual code — existing bounded contexts, naming conventions, integration patterns, similar flows that already exist. Proposals reference real implementations: “Your existing card-issuing context uses a fixed-quantity checkout pattern. Should we align with that, or is there a reason this flow is different?” This prevents the AI from generating textbook DDD disconnected from your codebase.
The jump from Level 2 to Level 3 isn’t about having a smarter model. It’s about having a smarter process. The same AI, with the same knowledge, produces dramatically better results when the interaction is structured to support discovery rather than generation.
Tools emerging in this space — Qlerify, Vibe Modeling — are exploring similar ideas: AI that facilitates domain modeling rather than just generating it. But the approach I landed on is different: instead of a dedicated app, it’s a Claude Code skill — a structured prompt that turns your existing coding assistant into a DDD facilitator, with direct access to your codebase.
Anatomy of a DDD Facilitator Skill
A Claude Code skill is a markdown file that teaches the AI how to handle a specific workflow. Think of it as a structured prompt with phases, rules, and reference material — loaded on demand when you invoke it with a slash command. The official docs cover the mechanics, but the interesting part isn’t how skills work. It’s what happens when you design one around a process that’s inherently interactive.
The /ddd-workshop skill I built has two files: SKILL.md (the facilitator script) and ddd-knowledge.md (the facilitator’s playbook — interview questions, opinion anchors, trade-off frameworks, and diagram templates). Together, they’re around 500 lines. Here’s how they’re structured.
Six phases, one gate each. The skill defines a strict sequence:
1. Problem Space Mapping (Event Storming) — Discover domain events, commands, actors, and policies
2. Bounded Context Discovery — Cluster events into bounded contexts, classify as Core/Supporting/Generic
3. Context Relationships — Define how contexts communicate (Customer-Supplier, ACL, Shared Kernel…)
4. Aggregate Design — Identify aggregates with their consistency boundaries
5. Tactical Deep Dive — Full design per aggregate: entities, value objects, invariants, state machines, events, repositories
6. Synthesis — Compile everything into a domain design document
Each phase follows the same pattern: the AI explores relevant code, proposes something opinionated, asks 1-3 targeted questions, iterates until stable, then gates before moving on. This structure isn’t arbitrary — it mirrors the natural progression of a DDD workshop, from big picture (what happens in the business?) down to tactical details (what are the invariants on this aggregate?).
The SKILL.md captures this explicitly. Here’s a simplified excerpt from the Event Storming phase:
```markdown
### Phase 1: Problem Space Mapping (Event Storming)

**Goal:** Discover domain events, commands, actors, and policies.

1. Ask the user to describe the business process
2. Propose an event timeline in past tense, with commands and actors
3. Ask 1-3 questions to validate and discover missing events/failure paths
4. Iterate until the timeline is stable (max 3 rounds)

**Gate:** Present the event timeline. Ask: "Does this capture
the full process? Any events or failure paths missing?"
```
Simple, right? But that simplicity is doing real work. The “max 3 rounds” prevents the AI from going in circles. The gate is an explicit instruction, not a suggestion — the AI must present the timeline and ask for confirmation before moving to Phase 2. And the goal statement keeps the phase focused: we’re discovering events here, not designing aggregates.
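The loop that excerpt describes is easy to picture as code. Here is a hypothetical TypeScript sketch of the per-phase protocol, purely to make the mechanics concrete: all names (`Phase`, `runPhase`, and so on) are mine, not part of the skill.

```typescript
// Hypothetical sketch of the per-phase facilitation loop.
// Phase names and function signatures are illustrative, not the skill's.
type Phase =
  | "event-storming"
  | "context-discovery"
  | "context-relationships"
  | "aggregate-design"
  | "tactical-deep-dive"
  | "synthesis";

interface Round {
  proposal: string;    // opinionated proposal, grounded in the codebase
  questions: string[]; // at most 3 targeted questions
}

const MAX_ROUNDS = 3;

// The gate: a phase completes only when the user explicitly confirms.
function runPhase(
  _phase: Phase,
  propose: (round: number) => Round,
  userConfirms: (proposal: string) => boolean
): boolean {
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const { proposal, questions } = propose(round);
    if (questions.length > 3) {
      throw new Error("Interaction rule violated: max 3 questions per round");
    }
    if (userConfirms(proposal)) return true; // gate passed, advance
  }
  return false; // never stabilized: record as an open question, don't advance
}
```

The point of the sketch is the shape, not the code: propose, ask a bounded number of questions, loop a bounded number of times, and never advance without an explicit yes.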
The interaction rules are the soul of the skill. More than the phase structure, what makes the facilitator effective is a set of behavioral rules defined at the top of the SKILL.md:
```markdown
## Interaction Rules

- **Lead with opinion, then ask.** Don't just interrogate — propose
  your analysis first, then ask the user to validate/correct.
- **Max 3 questions per round.** Don't overwhelm.
  Prioritize the most impactful question.
- **Ground in the codebase.** Reference existing patterns:
  "In your ordering context, we handle this with..."
- **Challenge gently.** If a design choice seems problematic,
  explain the trade-off: "That could work, but consider that..."
- **Gate before advancing.** Never move to the next phase
  without explicit user confirmation.
- **Progressive disclosure.** Don't dump DDD terminology.
  Introduce concepts as they become relevant.
- **Track open questions.** If something can't be resolved now,
  note it and move on.
```
“Lead with opinion, then ask” is the most important rule. Without it, the AI defaults to interviewing you — “What aggregates do you need? What are the invariants? What events should be emitted?” This puts the entire cognitive burden on the developer, which defeats the purpose. You might as well be filling out a template.
With this rule, the AI does the heavy lifting first. It reads your codebase, finds similar patterns, and proposes: “Based on your existing checkout flow and the events we discovered, I see an Order aggregate with a fixed-quantity model, a Badge aggregate tracking individual device lifecycle, and a Fulfillment aggregate handling provider interaction. Here’s my reasoning.” Now you’re reacting to a concrete proposal instead of generating from scratch. That’s a fundamentally different cognitive task — and a much easier one.
The facilitator’s playbook makes opinions grounded. The second file, ddd-knowledge.md, is what turns the AI from a generic DDD assistant into an effective facilitator. It’s organized by phase and contains four types of content:
Interview questions — specific, tested questions for each phase. Not “tell me about your domain” but “What can go wrong at each step? What happens then?” and “Does this term mean the same thing in both of these areas, or does it have a subtly different meaning?” These are the questions a seasoned DDD practitioner would ask, encoded so the AI asks them at the right moment.
Opinion anchors — explicit positions the AI should take when evaluating proposals. For example, during bounded context discovery: “Default to splitting. It’s easier to merge two contexts later than to split one.” Or during aggregate design: “Start small. Propose the smallest possible aggregate, then let the user argue for merging based on invariants.” These aren’t universal truths — they’re opinionated defaults that create a productive starting point for discussion.
Trade-off frameworks — structured decision guides for moments when the user is torn. During aggregate sizing, the playbook provides four questions to walk through: “What invariant requires these to be in the same transaction? How often do concurrent users modify the same aggregate? What’s the cost of eventual consistency here? What’s the cost of strong consistency?” Instead of the AI giving a single answer, it facilitates the trade-off analysis.
Mermaid diagram templates — ready-made templates for event flows, context maps, aggregate class diagrams, and state machines. The AI fills these in as the session progresses, producing visual artifacts that make the design tangible and reviewable.
Here’s an excerpt from the opinion anchors for Event Storming:
```markdown
### Opinion Anchors

- **Challenge vague events.** "OrderProcessed" → "What specifically
  happened? Was it shipped? Paid? Validated?"
- **Challenge missing failure paths.** Every command can fail.
  If the user only describes happy paths, ask: "What if the payment
  fails? What if the badge is already assigned?"
- **Challenge hidden policies.** "We always send a notification
  after approval" → that's a policy/event handler,
  not part of the aggregate.
- **Surface temporal coupling.** "This has to happen before that"
  → is it a true invariant or just a process convention?
```
These anchors are what give the AI its “facilitator personality.” Without them, the AI accepts whatever you say. With them, it actively looks for vague events, missing failures, hidden policies, and temporal assumptions. It’s the encoded experience of running DDD workshops — the patterns a human facilitator recognizes instinctively, made explicit so an AI can apply them consistently.
Mode selection adapts scope. Not every session needs all six phases. The skill supports three modes:
| Mode | When to use | Phases |
|---|---|---|
| Full | New feature, new bounded context | All 6 phases |
| Strategic | Exploring boundaries and relationships | Phases 1-3 + Synthesis |
| Tactical | Designing aggregates for an existing context | Phases 4-5 + Synthesis |
This keeps sessions focused. If you already know your bounded context boundaries and just need to design the aggregates, you skip straight to tactical mode. The AI detects the right mode from your input or asks if it’s ambiguous.
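The mode table maps cleanly onto data. A hypothetical TypeScript encoding, just to show the shape (the skill itself expresses this in markdown, and the phase identifiers here are my own shorthand):

```typescript
// Hypothetical encoding of the mode table. The skill expresses this in
// markdown; the phase identifiers are my own shorthand.
type Mode = "full" | "strategic" | "tactical";

const PHASES = [
  "problem-space-mapping",     // 1
  "bounded-context-discovery", // 2
  "context-relationships",     // 3
  "aggregate-design",          // 4
  "tactical-deep-dive",        // 5
  "synthesis",                 // 6
] as const;

type PhaseName = (typeof PHASES)[number];

const MODE_PHASES: Record<Mode, PhaseName[]> = {
  full: [...PHASES],                               // all 6 phases
  strategic: [...PHASES.slice(0, 3), "synthesis"], // phases 1-3 + synthesis
  tactical: [...PHASES.slice(3, 5), "synthesis"],  // phases 4-5 + synthesis
};
```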
The output is a directly actionable document. The synthesis phase compiles everything into a structured markdown document: event tables, context maps, aggregate class diagrams, state machines, invariant tables, repository interfaces, and open questions. It’s not a report to read — it’s a spec to implement from.
The total skill — SKILL.md plus the knowledge base — is roughly 500 lines of markdown. Not code. Not a complex framework. Just structured instructions that transform how the AI approaches a specific type of work.
What a Session Actually Looks Like
Theory is one thing. Let me walk you through what actually happens when you type /ddd-workshop and start a session.
You’re designing a bounded context for issuing physical devices — let’s say toll badges — that get ordered from an external provider and assigned to vehicles in a fleet. You’ve talked to product, you have a rough understanding of the workflow, and you need to turn that into a domain model.
The opening move sets the tone. The AI doesn’t start with questions. It starts by exploring your codebase — scanning existing bounded contexts, reading domain models, looking at how similar flows are implemented. This takes a few seconds, but it’s crucial. By the time it speaks, it already knows your naming conventions, your integration patterns, and that you have a similar issuing flow for payment cards in another context.
Then it asks you to describe the business process. You explain: badges are ordered in batches, shipped by the provider, received and assigned to vehicles, then active until frozen or terminated. Pretty straightforward.
Here’s where oracle mode would give you a complete model and call it done. Here’s where conversation mode would start a free-flowing discussion that might go anywhere.
Instead, the AI proposes an event timeline:
“Based on what you’ve described, here’s the event flow I see:
BadgeOrderCreated → BadgeOrderSubmittedToProvider → BadgesReceivedFromProvider → BadgeAssignedToVehicle → BadgeActivated → BadgeFrozen → BadgeTerminated. I notice your card-issuing context follows a checkout-then-draft pattern for orders. Two questions: (1) Does the order have a draft state where the user configures it before submitting? (2) Can badges be unassigned from one vehicle and reassigned to another?”
Two things are happening here. First, the AI is leading with opinion — it’s not asking “what events exist?”, it’s saying “here are the events I think exist, correct me.” That’s a much more productive starting point because it gives you something concrete to react to. You immediately notice it missed the deposit step (the provider requires a financial deposit before activation) and that unassignment is indeed possible.
Second, it’s grounding in your codebase. The reference to the card-issuing checkout pattern isn’t decoration — it’s a real question about whether to align this flow with an existing one. You confirm yes, the order should follow the same checkout-then-submit pattern, which immediately saves you from designing a different flow that would be inconsistent with the rest of your system.
After two rounds of iteration, the event timeline stabilizes. The gate kicks in:
“Here’s the complete event flow with the deposit step and reassignment. Does this capture the full process? Any events or failure paths missing?”
You review it. It looks right. You confirm, and the AI moves to Phase 2: bounded context discovery.
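Once confirmed, a timeline like this maps naturally onto a discriminated union of domain events. A hedged TypeScript sketch: the payload fields and the deposit event's exact name are my assumptions, and only the event names from the proposal and the iteration rounds come from the session.

```typescript
// The stabilized timeline as a discriminated union. Payload fields and the
// deposit event's name are illustrative assumptions.
type BadgeDomainEvent =
  | { type: "BadgeOrderCreated"; orderId: string }
  | { type: "BadgeOrderSubmittedToProvider"; orderId: string }
  | { type: "BadgesReceivedFromProvider"; orderId: string; badgeIds: string[] }
  | { type: "BadgeAssignedToVehicle"; badgeId: string; vehicleId: string }
  | { type: "BadgeUnassignedFromVehicle"; badgeId: string } // surfaced in iteration
  | { type: "BadgeDepositPaid"; orderId: string }           // hypothetical name for the deposit step
  | { type: "BadgeActivated"; badgeId: string }
  | { type: "BadgeFrozen"; badgeId: string }
  | { type: "BadgeTerminated"; badgeId: string };

// Lifecycle events concern a single badge; order-level events do not.
function isLifecycleEvent(event: BadgeDomainEvent): boolean {
  return "badgeId" in event;
}
```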
The context discovery phase is where the AI earns its keep. Based on the event clusters, it proposes splitting the domain into four bounded contexts — ordering, fulfillment, device lifecycle, and provider integration — with detailed rationale for each boundary.
Your gut reaction: that’s too many. This isn’t a massive enterprise system. You push back: “I think this is one new bounded context — badge issuing — plus extensions to two existing ones. The provider interaction is an infrastructure concern, not a separate context.”
This is the kind of moment that doesn’t happen in oracle mode. The AI proposed something reasonable but over-engineered for your scale. Because the interaction is structured around proposals and reactions, your pushback is immediate and specific. You’re not trying to articulate the right answer from scratch — you’re editing a concrete proposal, which is cognitively much cheaper.
The AI adjusts, asks a clarifying question about where provider-specific translation logic should live (answer: behind an anti-corruption layer, following the same pattern as your existing payment provider adapter), and presents the revised context map. Gate, confirm, move on.
The aggregate design phase is where things get interesting — and where the session produces its most valuable insight. The AI proposes three aggregates: Order (the checkout-and-submit flow), Badge (individual device lifecycle), and a third for provider synchronization. For the Badge aggregate, it proposes several value objects including a ProviderReference — a provider-specific identifier used for API calls.
Then it asks: “Should ProviderReference live on the Badge aggregate, or is it an infrastructure mapping that stays behind the anti-corruption layer?”
This is an opinion anchor firing. The knowledge base has an explicit instruction to challenge provider-specific data leaking into the domain. And it’s a genuinely good question — one that trips up experienced developers. The provider assigns a reference number to each badge, and it’s tempting to put it on the domain model because it feels like it’s about the badge. But the AI’s question forces you to apply the litmus test: does the user see this reference? Would it exist with a different provider?
The answer is no on both counts. It’s a provider mapping artifact. It stays in the infrastructure layer. Without that question, you’d probably have put it on the aggregate and created a subtle coupling to the provider’s data model — the kind of design mistake that only hurts when you switch providers two years later. (If you’re interested in this specific problem, I wrote more about how leaky abstractions create exactly this kind of coupling.)
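The distinction the question surfaces can be sketched in a few lines of TypeScript: the domain model stays free of provider identifiers, and the mapping lives behind the anti-corruption layer. All names here (`ProviderBadgeMap`, `toProviderRef`) are illustrative assumptions, not the session's actual design.

```typescript
// Domain side: the aggregate's public shape carries no provider identifiers.
interface BadgeSnapshot {
  badgeId: string;
  status: "active" | "frozen" | "terminated";
}

// Infrastructure side: the anti-corruption layer owns the mapping.
// Class and method names are illustrative assumptions.
class ProviderBadgeMap {
  private refsByBadgeId = new Map<string, string>();

  register(badgeId: string, providerReference: string): void {
    this.refsByBadgeId.set(badgeId, providerReference);
  }

  // Used only when calling the provider's API; never exposed to the domain.
  toProviderRef(badgeId: string): string {
    const ref = this.refsByBadgeId.get(badgeId);
    if (ref === undefined) {
      throw new Error(`No provider reference registered for badge ${badgeId}`);
    }
    return ref;
  }
}
```

Swapping providers then means replacing the map and the adapter around it, while `BadgeSnapshot` and the aggregate behind it stay untouched.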
But the most important moment in this session wasn’t a question the AI asked. It was a gap the gate revealed. During the aggregate design phase, the AI proposes that the Order aggregate enforces a maximum number of badges per order. It presents the gate: “Are these boundaries right? Is anything too big or too small?”
You look at the invariant list and realize something is off. The maximum isn’t per order — it’s per vehicle type, and the rules differ depending on the fleet’s contract tier. You hadn’t mentioned this because product only clarified it last week and it wasn’t in your initial mental model of the feature. The gate forced you to carefully review the aggregate’s invariants, and that review surfaced a product requirement you’d have otherwise forgotten to encode in the domain.
This is the moment that justifies the entire approach. In oracle mode, the AI would have generated an aggregate with a simple maxBadges invariant and you’d have moved on. In conversation mode, you might have caught it, or you might have been three topics ahead by the time it mattered. The structured gate — “stop, review everything in this phase before we advance” — created the conditions for you to notice the gap.
You pause the session, go back to the product spec, clarify the contract-tier rules, and feed them back into the aggregate design. The AI adjusts the invariant structure, proposes a ContractTier value object that encodes the per-vehicle-type limits, and the design is now correct.
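A minimal sketch of what that ContractTier value object might look like, with invented tier names and limits, since the real per-vehicle-type rules live in the product spec:

```typescript
// Illustrative ContractTier value object. Tier names and limits are invented;
// the real per-vehicle-type rules come from the product spec.
type VehicleType = "truck" | "van" | "car";

class ContractTier {
  constructor(
    public readonly name: string,
    private readonly maxBadgesPerVehicleType: Readonly<Record<VehicleType, number>>
  ) {}

  // The invariant the gate surfaced: the cap is per vehicle type and
  // depends on the contract tier, not a flat per-order maximum.
  canAddBadge(vehicleType: VehicleType, currentCount: number): boolean {
    return currentCount < this.maxBadgesPerVehicleType[vehicleType];
  }
}
```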
The full session — from /ddd-workshop to a complete domain design document — took about 45 minutes. The output: an event storm with 15+ events, one new bounded context with context relationships mapped, three aggregates with full tactical design (entities, value objects, invariants, state machines, domain events, repository interfaces), and a list of open questions to discuss with the team.
That’s a document you can implement from. Not a vague architecture sketch, not a textbook model that needs to be translated — an actual domain design grounded in your codebase and validated through iterative conversation.
A human-facilitated DDD workshop would produce similar quality, probably with richer stakeholder input. But it would take half a day to schedule, half a day to run, and another day to synthesize. When you’re a solo developer or a small team that can’t afford that overhead, structured AI facilitation isn’t a replacement for the real thing — it’s a way to get 80% of the value in 5% of the time.
Building the Skill Was the Same Process
There’s a recursive quality to this whole approach that’s worth calling out: the skill itself was built through the same structured interactive process it teaches.
I didn’t sit down and write 500 lines of facilitator instructions from scratch. I didn’t prompt Claude with “write me a DDD workshop skill” and paste the output into a file. Both of those would have produced something mediocre — the first because encoding facilitation expertise in markdown is harder than it sounds, the second because oracle mode doesn’t work any better for skill design than it does for domain modeling.
What I actually did was closer to a design session. I started with a rough idea — “I want a skill that facilitates DDD workshops” — and began an interactive back-and-forth with Claude to figure out what that meant in practice.
The first iteration was too simple. It was basically a checklist: “Step 1: Do event storming. Step 2: Identify bounded contexts. Step 3: Design aggregates.” Technically correct, but useless in practice — it read like a table of contents for a DDD book, not a facilitator script. The AI would follow it and produce output that was structurally complete but shallow. No pushback, no targeted questions, no grounding in the codebase.
So I pushed back: “The problem isn’t the phases. It’s that you’re not challenging anything. When I say ‘OrderProcessed’, you should ask me what specifically happened.” That conversation led to the opinion anchors — explicit instructions for the AI to challenge vague events, surface missing failure paths, and question temporal coupling. We iterated on the wording. “Challenge vague events” alone wasn’t enough — the AI needed an example to calibrate: "OrderProcessed" → "What specifically happened? Was it shipped? Paid? Validated?" That concrete example made the difference between the AI sometimes challenging vague events and consistently doing it.
The interaction rules went through a similar refinement. “Lead with opinion” started as “propose something before asking questions” — too vague. The AI would propose a one-sentence summary and then ask five questions. It took a few rounds to land on the right formulation: “Propose your full analysis first — aggregates, boundaries, reasoning — then ask 1-3 targeted questions to validate.” That specificity matters. The difference between a good skill and a mediocre one is often in these details, and they only surface through trial and error.
The gate mechanism was added after I ran the first real session and realized the AI was racing ahead. I’d confirm the event flow was “mostly right” and suddenly we were three phases deep in aggregate design. Adding explicit gates — “Never move to the next phase without explicit user confirmation” — fixed that immediately. But I wouldn’t have known to add it without experiencing the failure mode firsthand.
The knowledge base (ddd-knowledge.md) grew organically over multiple sessions. After each workshop, I’d notice moments where the AI was effective and moments where it fell flat. Effective moments usually meant the knowledge base already had the right question or opinion anchor. Flat moments meant something was missing. I’d go back, add the question that should have been asked, and the next session would be better.
This is the point worth internalizing: good skills aren’t written, they’re iterated. You can’t design a perfect facilitator script upfront any more than you can design a perfect domain model upfront — a lesson that applies to architecture in general. The skill is a living document that improves every time you use it and notice what’s missing.
And the process of improving it is the same process it encodes: propose something, test it against reality, identify gaps, refine, repeat. If you use Claude Code and you’ve been thinking about skills as static prompts you write once, think of them instead as collaborative artifacts you build with the AI and evolve over time. The AI is surprisingly good at helping you refine the instructions that govern its own behavior — as long as the interaction is structured enough to surface what’s not working.
Building a skill like this takes a few hours across multiple sessions. It’s an investment. But once it exists, every DDD workshop you run with it benefits from the accumulated refinement of every previous session. The facilitator gets better every time — not because the model improves, but because the instructions do.
TL;DR
- One-shot prompting doesn’t work for design. Domain modeling is a discovery problem, not a generation problem. Asking AI to produce a complete model in one pass gives you textbook answers disconnected from your reality.
- Freeform conversation is better but not enough. Without structure, you skip steps, accept shallow answers, and miss the “stop and validate” checkpoints that catch design mistakes early.
- Structured facilitation is the real unlock. Phases, gates, opinionated proposals, and targeted questions turn AI from a generator into a facilitator — one that challenges your assumptions, grounds proposals in your codebase, and forces you to validate before advancing.
- The gate is where the value lives. Explicit “review and confirm” checkpoints surface gaps you wouldn’t catch otherwise — missing invariants, forgotten product requirements, provider-specific data leaking into the domain.
- Skills aren’t written, they’re iterated. Build them collaboratively with the AI, test against real sessions, refine what’s missing. The instructions improve with every use.