Luxury Presence · Victor Zanivan

We didn't set out to build an "agentic design system," but we built one anyway.

A retrospective on what "agent ready" actually means, and what it costs.

Someone asked me recently if our design system was "agent ready." I didn't have a clean answer. The term has started to mean everything and nothing. Some people mean MCP servers, some mean metadata, some mean Figma parity, some just mean an LLM can use it.

Around the same time, I read an article arguing for agentic design systems: encode governance through machine-readable metadata, build skills AI can reuse, close the Figma-to-code gap, add parity reports. Good instincts, mostly aspirational. When I mapped it against what we'd actually shipped over the last two years, I realized we were further along than the article's proposed destination. Not because we planned it that way, but because each piece was built for a different reason, and the shape emerged.

What we have, what it cost, what the article got right, what it missed, and what still doesn't work.

1. The inventory

Here's what we have, not as a brag, just as the baseline for the rest of this post.

An MCP server. Around fifteen tools sit on it, but they cluster into three jobs: discovery (what does the DS contain?), specification (what are the exact values?), and guidance (how do I use this correctly?). There are fifteen instead of three by deliberate choice: one tool per response shape, not one tool per capability. "List components" wants a different answer shape than "show me the color palette," which wants a different shape again from "generate a working form example." Generic tools return generic answers; specific tools return useful ones.
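To make "one tool per response shape" concrete, here is a minimal TypeScript sketch. It does not use the real MCP SDK, and every tool name, type, and value in it is hypothetical. The idea it shows is that each tool is tagged with one of the three jobs and fixes its own return type, so a caller never has to dig through a generic blob.

```typescript
// Hypothetical response shapes: each tool returns exactly one of these.
interface ComponentSummary {
  name: string;
  category: string;
}

interface PaletteEntry {
  token: string; // semantic token name (illustrative)
  value: string; // resolved value, e.g. a hex color
}

interface UsageGuidance {
  component: string;
  use: string;
  avoid: string;
}

type Job = "discovery" | "specification" | "guidance";

interface ToolDefinition<T> {
  name: string;
  job: Job; // which of the three jobs this tool serves
  handler: () => T; // the response shape is fixed per tool
}

// Discovery: what does the design system contain?
const listComponents: ToolDefinition<ComponentSummary[]> = {
  name: "list_components",
  job: "discovery",
  handler: () => [
    { name: "Button", category: "actions" },
    { name: "Card", category: "layout" },
  ],
};

// Specification: what are the exact values?
const getColorPalette: ToolDefinition<PaletteEntry[]> = {
  name: "get_color_palette",
  job: "specification",
  handler: () => [
    { token: "primary", value: "#1a1a1a" },
    { token: "surface", value: "#ffffff" },
  ],
};

// Guidance: how do I use this correctly?
const getButtonGuidance: ToolDefinition<UsageGuidance> = {
  name: "get_button_guidance",
  job: "guidance",
  handler: () => ({
    component: "Button",
    use: "primary variant for the single most important action on the page",
    avoid: "several primary buttons competing in one view",
  }),
};

const names = [listComponents, getColorPalette, getButtonGuidance].map((t) => t.name);
console.log(names.join(", ")); // prints "list_components, get_color_palette, get_button_guidance"
```

A real server with around fifteen tools would keep adding entries like these. The grouping by job stays stable even as the list grows.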

The server also runs in two modes. Inside our monorepo it reads the filesystem directly. When consumed via npx, it uses pre-bundled data generated at build time. That's the detail I'd have skipped in a greenfield spec and regretted later: it's the difference between "works in my dev environment" and "works in every consumer repo."
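One way to picture the dual-mode split is a resolver that prefers live files when they exist and otherwise falls back to a build-time snapshot. This is a sketch under assumptions. The real server reads source files directly, while this version reads a single hypothetical `component-index.json`. The path, the `ComponentIndex` shape, and `BUNDLED_INDEX` are all illustrative.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Shape of the component index both modes must produce (illustrative).
interface ComponentIndex {
  components: string[];
}

// Hypothetical snapshot that a build step would inline into the published package.
const BUNDLED_INDEX: ComponentIndex = { components: ["Button", "Card"] };

type Mode = "filesystem" | "bundled";

// Prefer live files when running inside the monorepo; otherwise fall back
// to the data generated at build time (the npx path).
function resolveIndex(repoRoot: string): { mode: Mode; index: ComponentIndex } {
  const indexPath = join(repoRoot, "packages", "ui", "component-index.json"); // hypothetical path
  if (existsSync(indexPath)) {
    const index = JSON.parse(readFileSync(indexPath, "utf8")) as ComponentIndex;
    return { mode: "filesystem", index };
  }
  return { mode: "bundled", index: BUNDLED_INDEX };
}

const { mode, index } = resolveIndex("/path/that/does/not/exist");
console.log(mode, index.components.length); // prints "bundled 2"
```

What matters is that both branches return the same `ComponentIndex` shape. That shared shape is what lets every tool above it behave identically in either mode.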

A shadcn-compatible registry. Our components are published as registry JSON that any shadcn-aware agent (Cursor, v0, Claude, and a dozen others) already knows how to consume without extra integration. This was probably the highest-leverage thing we did, and we did it before I had any agent framing for why.
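For readers who haven't seen one, a registry item is a small JSON document. The sketch below types an abridged subset of the fields in shadcn's `registry-item.json` schema (the official schema has more). The component name, file path, and dependency list are illustrative, not our actual registry.

```typescript
// Abridged subset of shadcn registry-item fields; see the official schema for the rest.
interface RegistryFile {
  path: string;
  type: "registry:ui" | "registry:component" | "registry:lib" | "registry:hook";
  content?: string; // inlined source, usually filled in by a build step
}

interface RegistryItem {
  $schema: string;
  name: string;
  type: "registry:ui" | "registry:component" | "registry:block";
  title?: string;
  description?: string;
  dependencies?: string[]; // npm packages to install
  registryDependencies?: string[]; // other registry items to pull in
  files: RegistryFile[];
}

// Illustrative item; names, paths, and dependencies are hypothetical.
const button: RegistryItem = {
  $schema: "https://ui.shadcn.com/schema/registry-item.json",
  name: "button",
  type: "registry:ui",
  title: "Button",
  description: "Primary action component.",
  dependencies: ["class-variance-authority"],
  files: [{ path: "registry/ui/button.tsx", type: "registry:ui" }],
};

// This JSON, served at a URL, is what an agent or `npx shadcn add <url>` fetches.
console.log(JSON.stringify(button, null, 2));
```

Because the format is already standardized, publishing it is the whole integration. No consumer has to learn anything new.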

A Claude Code plugin. Skills that encode token rules, component guidance, and migration playbooks from our legacy UI libraries. Commands like /build-from-figma that preload the right context before touching the Figma MCP. An agent that analyzes consumer codebases for migration opportunities. Most recently, the plugin picked up three layers we should have named from the start: a knowledge layer (the lux-design subagent, which carries our deviations from Tailwind defaults), a translation layer (a map-design-value tool that turns raw Figma values into the correct LUX class, or returns "no clean match" rather than guessing), and a verification layer (a deterministic audit that runs silently after every Write or Edit, and on demand via /lux:audit, /lux:check, and /lux:review).
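The translation layer's "no clean match rather than guessing" behavior is worth spelling out. Below is a hypothetical re-implementation of that idea, not the actual map-design-value tool. The hex-to-class table and class names are invented for illustration, since the post doesn't list the LUX scale. The key choice is an exact lookup with no nearest-color fallback.

```typescript
// Hypothetical token table: raw Figma hex values -> utility classes.
// Class names are illustrative, not the real LUX scale.
const COLOR_CLASSES: Record<string, string> = {
  "#1a1a1a": "bg-primary",
  "#ffffff": "bg-surface",
  "#f5f5f4": "bg-muted",
};

type MappingResult =
  | { kind: "match"; className: string }
  | { kind: "no-clean-match"; raw: string };

// Normalize, then look up exactly. No nearest-color guessing: an unknown
// value is reported back instead of being silently snapped to a neighbor.
function mapDesignValue(rawHex: string): MappingResult {
  const normalized = rawHex.trim().toLowerCase();
  const className = COLOR_CLASSES[normalized];
  return className !== undefined
    ? { kind: "match", className }
    : { kind: "no-clean-match", raw: rawHex };
}

console.log(mapDesignValue("#FFFFFF").kind); // prints "match"
console.log(mapDesignValue("#fefefe").kind); // prints "no-clean-match"
```

Returning an explicit "no clean match" hands the decision back to the agent, or to a human, instead of hiding an off-palette value inside a plausible class.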

Figma Code Connect on our most-used components, so agents generating from Figma land on the correct React imports instead of inventing them.

Per-component MDX docs, Storybook stories, and tests. Standard stuff, but each one is a surface an agent can read.

A docs/solutions folder of frontmatter-indexed learnings, so past bugs and decisions are queryable by module or problem type instead of buried in Slack.
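"Queryable by module or problem type" needs very little machinery. This sketch assumes flat `key: value` frontmatter and uses hypothetical field names (`module`, `problem_type`) and documents. A real setup would read files from docs/solutions and use a proper YAML parser.

```typescript
// Hypothetical learning documents as they might sit in docs/solutions/*.md.
const docs: string[] = [
  `---
module: button
problem_type: styling
---
Arbitrary CSS variable classes broke dark mode.`,
  `---
module: dialog
problem_type: accessibility
---
Focus was not returned to the trigger on close.`,
];

type Frontmatter = Record<string, string>;

// Minimal frontmatter parser: flat "key: value" pairs between --- fences.
function parseFrontmatter(doc: string): Frontmatter {
  const match = doc.match(/^---\n([\s\S]*?)\n---/);
  const fields: Frontmatter = {};
  if (!match) return fields;
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Query learnings by any frontmatter field instead of searching chat history.
function findLearnings(key: string, value: string): Frontmatter[] {
  return docs.map(parseFrontmatter).filter((fm) => fm[key] === value);
}

console.log(findLearnings("module", "dialog").length); // prints 1
```

The same query serves a human searching for a past decision and an agent about to repeat a known mistake.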

None of this was designed as an agentic stack. It was designed as the stack. That's the first thing worth saying out loud.

2. How we got here

The honest version of this story is that none of these pieces was built as part of an agent-readiness plan. Each one solved a different problem, and the agent-shaped outline only became visible in retrospect.

The MCP server started because we kept answering the same component API questions in Slack. Each new tool was a captured FAQ. The shape it has today, fifteen tools across three jobs, is the residue of two years of that loop, not a deliberate API design. When someone asked the same question for the third time, we wrote a tool. When the answer wanted a different shape, we made it a different tool.

The shadcn-compatible registry was a distribution decision that predated the agent framing entirely. We wanted consumer teams to be able to run npx shadcn add for a component without us in the loop. That it also turned out to be the cleanest surface for agents to consume (every shadcn-aware tool already knows how to read it) was luck.

The Claude Code plugin came out of a different pressure. Human developers in consumer repos kept reaching for legacy components we'd already replaced. The plugin was originally a teaching tool for humans, encoding "use the new one, not the old one" as skills. Agents inherited that guidance for free. The audit hook and /lux:check pipeline showed up later, when the same teaching pressure scaled past what skills could carry. Skills could instruct, but they couldn't catch drift after the fact. The verification layer is what closes that gap.

Figma Code Connect was driven by designer-engineer pairs who wanted fewer handoffs. It only became relevant to agents later, when MCP-driven Figma workflows arrived and the same mappings that helped a human engineer suddenly also let an agent land on the correct React import.

The docs/solutions index started as a postmortem habit: frontmatter-indexed learnings so we could find past decisions without scrolling through Slack. It's only retroactively "metadata for agents."

The point is this: none of this was built under an agentic banner. Teams trying to build "an agentic design system" top down are probably solving the wrong problem. Teams building the right engineering primitives (distribution, structured access, captured knowledge) arrive at the same place from the side, with stronger foundations.

3. What the article got right

Three things, and they're the ones I'd tell any design system team to internalize even if they ignore the rest.

Governance belongs in machine-readable form, not just on a docs site. A written guideline that says "use the primary variant sparingly for the most important action on the page" is decoration. The same guidance surfaced through a tool call when an agent is choosing a variant is governance. We stumbled into this: our MCP has a tool that returns semantic color guidance with examples. But the principle is right. The docs site is for humans; agents need structured answers to structured questions.

Metadata is a product surface. Once you treat agents as a first-class consumer of your design system, component metadata stops being documentation overhead and starts being an API. Every time we added a field to our indexer (category, variants, dependencies, accessibility notes), we were quietly expanding that API. Treating it deliberately from the start would've saved us a round of retrofitting.
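Treating metadata as an API mostly means giving it a typed shape. This is a hypothetical version of the record an indexer might emit. The fields mirror the four named above, but their exact names and the sample data are assumptions. Once the shape is explicit, adding a field is a visible API change, and cross-component queries come almost for free.

```typescript
// Hypothetical metadata record an indexer might emit per component.
interface ComponentMetadata {
  name: string;
  category: string;
  variants: string[];
  dependencies: string[]; // other design-system components this one renders
  accessibilityNotes: string[];
}

const componentIndex: ComponentMetadata[] = [
  {
    name: "Button",
    category: "actions",
    variants: ["primary", "secondary", "ghost"],
    dependencies: [],
    accessibilityNotes: ["Icon-only buttons need an aria-label."],
  },
  {
    name: "Dialog",
    category: "overlays",
    variants: ["default"],
    dependencies: ["Button"],
    accessibilityNotes: ["Return focus to the trigger on close."],
  },
];

// A query an agent might need before changing Button: what else is affected?
function dependentsOf(component: string): string[] {
  return componentIndex
    .filter((c) => c.dependencies.includes(component))
    .map((c) => c.name);
}

console.log(dependentsOf("Button").join(", ")); // prints "Dialog"
```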

Bidirectional Figma-code mapping matters more than a one-way link. Code Connect isn't just Figma knowing about code. It's the glue that lets an agent read a Figma node, map it to a real component, and produce imports that aren't hallucinated. Teams that skip this layer end up with agents generating beautiful, wrong code.
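The value of that glue comes from one lookup: Figma node in, real import out, or nothing at all. This sketch is not the `@figma/code-connect` API. It only models what an agent can do once mappings like Code Connect's exist. The Figma keys and import paths are hypothetical.

```typescript
// Hypothetical mapping: Figma component key -> the real React import.
interface CodeMapping {
  importName: string;
  importPath: string;
}

const FIGMA_TO_CODE: Record<string, CodeMapping> = {
  "button/primary": { importName: "Button", importPath: "@/components/ui/button" },
  "card/listing": { importName: "Card", importPath: "@/components/ui/card" },
};

// Emit an import only when a mapping exists; unmapped nodes return null
// so the caller can flag them instead of inventing a plausible-looking import.
function importFor(figmaKey: string): string | null {
  const mapping = FIGMA_TO_CODE[figmaKey];
  if (mapping === undefined) return null;
  return `import { ${mapping.importName} } from "${mapping.importPath}";`;
}

console.log(importFor("button/primary")); // prints import { Button } from "@/components/ui/button";
console.log(importFor("hero/banner")); // prints null
```

The unmapped case is the important one. Without the mapping, an agent fills that gap with a guess.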

4. What the article missed

This is where it gets practical, because these are the things that don't show up in thought-leadership posts but matter in real systems.

Distribution is half the product. An MCP server that nobody configures does nothing. Our shadcn registry works because it plugs into tools agents already use without being told. The MCP works inside our Claude Code plugin because the plugin routes agents through it by default. If I'd built the MCP and stopped there, adoption would be a fraction of what it is. Distribution deserves as much design attention as the tools themselves.

Skills and MCPs solve different problems. This took me a while to untangle. Skills teach an agent how to use your system: the patterns, conventions, and mental model. MCPs answer what the system contains right now: the live data, names, and values. A skill that lists token names goes stale the moment you ship a new token. An MCP that lists token names is always correct but can't tell you when to use them. You need both, and the split should be intentional.

Runtime enforcement isn't optional. Every skill that says "never write bg-[var(...)]" is only enforced for agents running that skill; humans and other agents ignore it. A lint rule that fails the build is still the ideal endpoint, but what we shipped first is a deterministic audit that runs silently after every Write or Edit, with the same script behind /lux:audit, /lux:check, and /lux:review. It catches violations inside the agent loop, before the file ever reaches CI.
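"Deterministic" is the key property here: the same file in gives the same findings out, with no model in the loop. The sketch below shows that core as a pure function. Only the `bg-[var(...)]` pattern comes from this post. The rule id, the extra `text-` and `border-` prefixes, and the message are assumptions. One plausible wiring is a Claude Code `PostToolUse` hook that matches `Write|Edit` and prints findings as warnings, but that wiring is also an assumption.

```typescript
// Hypothetical rule set; the post names only the bg-[var(...)] pattern.
interface AuditRule {
  id: string;
  pattern: RegExp;
  message: string;
}

interface AuditFinding {
  ruleId: string;
  line: number; // 1-based line number
  message: string;
}

const RULES: AuditRule[] = [
  {
    id: "no-arbitrary-var",
    pattern: /\b(?:bg|text|border)-\[var\(/,
    message: "Use a semantic token class instead of an arbitrary var() value.",
  },
];

// Pure and deterministic: same file content in, same findings out.
function auditSource(source: string): AuditFinding[] {
  const findings: AuditFinding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ ruleId: rule.id, line: i + 1, message: rule.message });
      }
    }
  });
  return findings;
}

const sample = `<div className="p-4">\n  <span className="bg-[var(--brand)]">Hi</span>\n</div>`;
console.log(auditSource(sample).map((f) => `${f.line}: ${f.ruleId}`).join("\n")); // prints "2: no-arbitrary-var"
```

The same function could later sit behind an ESLint or Biome rule in CI. That would make the check portable beyond the agent loop without changing what it detects.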

Dual-mode packaging is a tax worth paying. The bundled-versus-monorepo split in our MCP cost real time. It also means our tools work the same whether you're a contributor or a random agent using npx. That symmetry is what makes "agent ready" actually true, instead of something that only works locally.

Captured learnings compound. The article talks about metadata that instructs agents. It doesn't talk about metadata that captures where agents get things wrong. Every time we document a misuse, a failure mode, or a non-obvious decision in a searchable way, the next agent avoids it. That's a different kind of governance, more retrospective, but just as important.

5. The gaps I still see

If I'm being honest, we're probably a B+. Here's what's missing.

Per-component choice guidance. Our MCP can tell an agent everything about a component. It can't tell the agent which component to pick. When an agent asks "I need to display a list of properties," we return names. We don't return guidance like "use a property list card when users will click into details, use a plain card for read-only display, avoid tables under five rows." That knowledge lives in people's heads. The plan-and-confirm pause in /lux:component and /lux:page names specific primitives before any code is written, which is a partial step toward this: it creates the surface where choice guidance could land, but it doesn't carry the guidance yet. This is still the biggest gap.
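Here is what it could look like to move that knowledge out of people's heads: encode the quoted guidance as data a guidance tool could return. This is a hypothetical sketch. The component names, the `DisplayNeed` fields, and especially the rule order are assumptions. The quoted guidance doesn't say which rule wins when they conflict, or when a table is the right call.

```typescript
// Hypothetical description of what the agent is trying to display.
interface DisplayNeed {
  itemCount: number;
  clickThroughToDetails: boolean; // will users open each item?
  tabular: boolean; // is the data naturally row/column shaped? (assumed field)
}

interface Recommendation {
  component: string; // illustrative component names
  reason: string;
}

// One possible reading of the guidance; precedence is an assumption.
function recommendForList(need: DisplayNeed): Recommendation {
  if (need.clickThroughToDetails) {
    return { component: "PropertyListCard", reason: "Users will click into details." };
  }
  if (need.tabular && need.itemCount >= 5) {
    return { component: "Table", reason: "Tabular data with five or more rows." };
  }
  // Read-only display, including small tabular sets: avoid tables under five rows.
  return { component: "Card", reason: "Read-only display." };
}

console.log(recommendForList({ itemCount: 3, clickThroughToDetails: false, tabular: true }).component); // prints "Card"
```

Returning a reason alongside the component matters. It gives the plan-and-confirm step something to show a human, not just a name.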

Portable enforcement of token usage. The audit hook and /lux:check pipeline now run the rule inside Claude Code, after every Write or Edit. That moved this from aspirational to real for one surface. It's still warnings rather than a CI gate, and it only fires for agents running inside Claude Code. Cursor, v0, humans editing in VS Code, and any other agent surface still bypass it. The remaining work is an ESLint or Biome rule running in CI so the enforcement is portable across every consumer of the system.

Figma and code parity detection. We have Code Connect for our most used components. We don't have an automated way to detect drift when Figma changes but code doesn't. We catch it manually. A nightly check would fix that.
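The core of such a nightly check is a set difference. This sketch assumes two snapshots of variant names per component, one pulled from Figma and one from the code index. How each snapshot gets produced is left out and would be an assumption either way. Both snapshots and the component data below are hypothetical.

```typescript
// Hypothetical snapshots: variant names per component.
type VariantSnapshot = Record<string, string[]>;

interface Drift {
  component: string;
  onlyInFigma: string[];
  onlyInCode: string[];
}

// Report every component whose variant sets differ between Figma and code.
function detectDrift(figma: VariantSnapshot, code: VariantSnapshot): Drift[] {
  const drifts: Drift[] = [];
  const names = Array.from(new Set([...Object.keys(figma), ...Object.keys(code)]));
  for (const component of names) {
    const inFigma = figma[component] ?? [];
    const inCode = code[component] ?? [];
    const onlyInFigma = inFigma.filter((v) => !inCode.includes(v));
    const onlyInCode = inCode.filter((v) => !inFigma.includes(v));
    if (onlyInFigma.length > 0 || onlyInCode.length > 0) {
      drifts.push({ component, onlyInFigma, onlyInCode });
    }
  }
  return drifts;
}

const drift = detectDrift(
  { Button: ["primary", "secondary", "ghost"] },
  { Button: ["primary", "secondary"] },
);
console.log(drift.map((d) => `${d.component}: +${d.onlyInFigma.join(",")}`).join("\n")); // prints "Button: +ghost"
```

Run on a schedule, a non-empty result is the signal that replaces catching drift manually.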

Full Code Connect coverage. We're at around forty percent. The rest is a mix of low priority and just not done. It isn't hard work, just work, and the fact that it's unfinished means the incentives aren't strong enough.

Signal on what agents actually use. We have telemetry on MCP calls. We don't analyze it. The tools that get used most should guide where we invest. The ones that are ignored should probably go away. Building without measuring is a habit worth breaking.
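Analyzing that telemetry can start very small. The post doesn't describe the real telemetry format, so the `ToolCall` record, the tool names, and the threshold below are all hypothetical. The one detail that matters is seeding every registered tool at zero. Otherwise the ignored tools, which are the interesting ones, never show up in the counts.

```typescript
// Hypothetical telemetry record.
interface ToolCall {
  tool: string;
  timestamp: string; // ISO 8601
}

// Count calls per tool, seeding registered tools at zero so unused ones appear.
function usageByTool(calls: ToolCall[], registered: string[]): Map<string, number> {
  const counts = new Map<string, number>(registered.map((name): [string, number] => [name, 0]));
  for (const call of calls) {
    counts.set(call.tool, (counts.get(call.tool) ?? 0) + 1);
  }
  return counts;
}

// Tools below the threshold are candidates for removal or better discoverability.
function underused(counts: Map<string, number>, minCalls: number): string[] {
  return Array.from(counts.entries())
    .filter(([, n]) => n < minCalls)
    .map(([name]) => name);
}

const counts = usageByTool(
  [
    { tool: "list_components", timestamp: "2024-01-01T00:00:00Z" },
    { tool: "list_components", timestamp: "2024-01-01T01:00:00Z" },
    { tool: "get_color_palette", timestamp: "2024-01-01T02:00:00Z" },
  ],
  ["list_components", "get_color_palette", "get_button_guidance"],
);
console.log(underused(counts, 1).join(", ")); // prints "get_button_guidance"
```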

6. What I'd tell another team

If you're starting from scratch, build the shadcn-compatible registry first, the MCP server second, and the plugin third. Don't invert that order. The registry gives you passive distribution. The MCP gives you structured, live access. The plugin gives your team opinionated workflows. Each layer narrows the audience and increases the value.

If you already have some of this, the highest-leverage thing you haven't shipped is probably some form of enforcement, not another tool. A deterministic audit running in the agent loop is a great first version. A CI lint rule is the version that covers every surface. Either beats instruction. Check your consumer repos for the patterns your docs warn against. You'll know quickly.

If you're not sure you need any of this, you probably do, just not for the reason you think. The value isn't that agents can use your system. The value is that making your system legible to agents also makes it clearer for humans: new hires, consumer teams, future you. Agent readiness turns out to be a quality bar for clarity that finally has a budget.

7. The real conclusion

The question I started with, are we agent ready, was the wrong one. Agent readiness isn't a state, it's a direction. You can keep moving along it indefinitely. What matters is whether each new piece compounds with what you already built.

Our MCP compounds with our registry, which compounds with our plugin, which now compounds with the audit hook, which compounds with Code Connect, which compounds with our solutions index. The article's author is building something interesting from a different starting point. We're further along in some areas, behind in others.

I don't think anyone has finished building an agent-ready design system. I'm not even sure what finished looks like. But the teams that treat these pieces as side projects will still be talking about it in a few years. The teams that treat it as the core product will quietly ship something their consumers can't live without.

We didn't set out to build an agent-ready design system. We built the system our consumers needed next, human or otherwise. It turns out those are the same thing. Agent readiness isn't a project. It's the side effect of taking your system seriously enough to make it consumable, by anything.

One last thing: most of the above wouldn't exist without Frankie Ramírez, who's been working on this design system from the ground up.

Let's get in touch!

Feel free to reach out, I'm always up for a good chat :)