How granular should design tokens be before they’re more work than help?

Tokens stay helpful as long as each one captures a meaningful, reusable decision, and they tip into overhead the moment they fragment into so many narrow values that nobody can choose or maintain them. The threshold is not a count. It is whether a token earns its existence by being reused with intent. A token that names a real decision more than one place depends on is pulling its weight. A token invented to wrap a single value used in a single spot is just an extra layer of naming between you and the number. The honest rule is to tokenize decisions, not every individual value.

The reasoning is that a token’s whole purpose is leverage. When you name a decision once and many things reference it, you can change that decision in one place and trust it to propagate correctly. That leverage only exists when the token is genuinely shared. The instinct to create a token for every value, in the name of being fully systematic, quietly destroys the leverage it was meant to create. A system with thousands of single-use tokens is not more systematic. It is harder to navigate, harder to choose from, and harder to keep coherent, because the signal of which values actually matter is buried under a flood of values that do not. Every token you add is a thing someone has to understand, name, and maintain, so each one has to give back more than it costs.
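A minimal sketch of that leverage, in TypeScript. The token names, values, and the two style functions are hypothetical, chosen only to illustrate the point: two components read the same named decision, so one change reaches both.

```typescript
// Hypothetical token set: each entry names a decision, not just a value.
interface Tokens {
  colorActionPrimary: string;
  spaceBase: number; // in px
}

const tokens: Tokens = {
  colorActionPrimary: "#1e63d6", // illustrative value
  spaceBase: 8,
};

// Two hypothetical components that depend on the same decisions.
function buttonStyle(t: Tokens) {
  return {
    background: t.colorActionPrimary,
    padding: `${t.spaceBase}px ${t.spaceBase * 2}px`,
  };
}

function linkStyle(t: Tokens) {
  return { color: t.colorActionPrimary };
}

// Changing the decision once propagates to every component that references it.
const rebranded: Tokens = { ...tokens, colorActionPrimary: "#0b7a5c" };

console.log(buttonStyle(rebranded).background); // "#0b7a5c"
console.log(linkStyle(rebranded).color); // "#0b7a5c"
```

If `colorActionPrimary` were referenced by only one component, the indirection would buy nothing: the change would be just as easy to make at the single call site.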

A concrete contrast makes the line visible. A useful token is something like a primary action color or a base spacing step that dozens of components reference, where a single change ripples meaningfully across the product. Compare that to a system that defines a separate color token for the border of one specific card, a unique spacing token for the gap inside one modal, and a font-size token used by exactly one heading on one page. Now a designer opening the palette faces a list so long they cannot tell which colors are real choices and which are incidental. They give up and hardcode a value rather than hunt through forty near-identical tokens, which is precisely the behavior the system was supposed to prevent. The granular version looks rigorous and behaves like clutter.

The failure runs in the other direction too: under-tokenizing is a real problem, not a virtue. If meaningful, repeated decisions are left as scattered raw values, you lose the single source of truth and every change becomes a search-and-replace hazard. So the answer is not simply fewer tokens. A semantic layer can also justify what looks like duplication: a token that means danger and a token that means error might resolve to the same red today yet deserve to exist separately, because they carry different intent and may diverge later. That is a decision earning its own name, not a stray value being wrapped. Intent, not visual coincidence, is what makes a token worth keeping.
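One way to picture the semantic layer is as a set of aliases over a small primitive palette. The sketch below is an assumption about how such a layer might be structured, not a prescribed format; the names `palette`, `semantic`, and `resolve`, and all color values, are hypothetical.

```typescript
// Hypothetical primitive palette: raw values, named by what they are.
const palette = {
  red600: "#c62828",
  red400: "#ef5350",
  blue600: "#1e63d6",
} as const;

type PaletteKey = keyof typeof palette;
type SemanticToken = "color.danger" | "color.error" | "color.action.primary";

// Hypothetical semantic layer: named by intent, pointing at primitives.
// "danger" and "error" share a value today but remain separate decisions.
const semantic: Record<SemanticToken, PaletteKey> = {
  "color.danger": "red600",
  "color.error": "red600",
  "color.action.primary": "blue600",
};

function resolve(
  token: SemanticToken,
  map: Record<SemanticToken, PaletteKey> = semantic
): string {
  return palette[map[token]];
}

// Later, error states get a lighter red. Only one alias changes,
// and everything that means "danger" is untouched.
const diverged: Record<SemanticToken, PaletteKey> = {
  ...semantic,
  "color.error": "red400",
};

console.log(resolve("color.danger") === resolve("color.error")); // true
console.log(resolve("color.error", diverged)); // "#ef5350"
console.log(resolve("color.danger", diverged)); // "#c62828"
```

Had the two intents been collapsed into a single token, the later divergence would have required finding every usage and deciding, case by case, which meaning it had.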

A practical way to feel the threshold is to ask, for any proposed token, what would break if it did not exist. If the honest answer is that several places would have to change a value independently and could fall out of sync, the token is earning its keep. If the answer is that one place would simply hold its own value, you have found overhead. The same test catches the opposite mistake: if you keep finding the same raw number repeated across components, that repetition is a decision asking to be named.
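Both halves of that test can be roughed out as an audit over where values are used. The following is a sketch under stated assumptions: the `Usage` record shape, the `auditTokens` function, the `minRepeats` cutoff, and the sample data are all hypothetical, and real usage data would have to come from whatever your codebase or design tool can export. It only surfaces candidates for review. Because the threshold is intent rather than count, a flagged single-use token may still be a legitimate semantic token.

```typescript
// Hypothetical record of one place a value is used.
interface Usage {
  component: string;
  value: string; // resolved value, e.g. "12px"
  token?: string; // token name if referenced; undefined if hardcoded
}

interface AuditResult {
  singleUseTokens: string[]; // candidates for overhead
  repeatedRawValues: string[]; // candidates for a missing token
}

function auditTokens(usages: Usage[], minRepeats = 3): AuditResult {
  const tokenSites = new Map<string, Set<string>>();
  const rawSites = new Map<string, Set<string>>();

  // Group the components that depend on each token or raw value.
  for (const u of usages) {
    const bucket = u.token !== undefined ? tokenSites : rawSites;
    const key = u.token ?? u.value;
    const sites = bucket.get(key) ?? new Set<string>();
    sites.add(u.component);
    bucket.set(key, sites);
  }

  const singleUseTokens: string[] = [];
  tokenSites.forEach((sites, name) => {
    if (sites.size === 1) singleUseTokens.push(name);
  });

  const repeatedRawValues: string[] = [];
  rawSites.forEach((sites, value) => {
    if (sites.size >= minRepeats) repeatedRawValues.push(value);
  });

  return {
    singleUseTokens: singleUseTokens.sort(),
    repeatedRawValues: repeatedRawValues.sort(),
  };
}

// Illustrative data only.
const report = auditTokens([
  { component: "Button", value: "#1e63d6", token: "color.action.primary" },
  { component: "Link", value: "#1e63d6", token: "color.action.primary" },
  { component: "PromoCard", value: "#e0e0e0", token: "color.promo-card.border" },
  { component: "Modal", value: "12px" },
  { component: "Card", value: "12px" },
  { component: "Toolbar", value: "12px" },
  { component: "Footer", value: "7px" },
]);

console.log(report.singleUseTokens); // ["color.promo-card.border"]
console.log(report.repeatedRawValues); // ["12px"]
```

The cutoff of three repeats is arbitrary. The useful part is the two lists side by side: one shows names that nothing else depends on, the other shows decisions that are being made repeatedly without a name.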

So set the granularity by reuse with intent, not by a drive to capture everything. Tokenize the decisions that genuinely recur and carry meaning, push back on tokens that serve a single use, and let a semantic token exist when intent differs even if the value matches today. Before adding any token, name the decision it represents and where else that decision lives. If you cannot, you are about to make the system bigger and weaker at the same time.
