Designing MCP tools agents actually use

Half of the MCP tools I see in the wild are technically correct and practically unused. The endpoint works, the schema validates, the server boots clean, and the agent still ignores it. That gap between "correct" and "used" is where most MCP design effort dies.

Here is the thing I keep coming back to. The model will call your tool if, and only if, it understands three things from the description alone: what the tool does, when it should reach for it, and what it gets back. Miss any of those and the agent quietly routes around you.

So here are the 7 rules I follow now, after shipping a lot of tools that worked on paper and got zero calls in production.

1. Name verbs, not nouns

A tool is an action. get_customer beats customer every time. When I name a tool with a noun, the agent has to guess whether it reads, writes, or lists. When I name it with a verb, intent is encoded right in the call site. list_customers, get_customer, create_customer, update_customer. Boring is good. Boring is predictable.
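If you want to enforce this rather than remember it, a tiny lint over tool names is enough. Here is a minimal sketch, assuming a hypothetical allow-list of verb prefixes; the list and the function names are mine for illustration, not part of MCP.

```python
# Hypothetical allow-list of verb prefixes; adjust it to your own conventions.
VERB_PREFIXES = ("list_", "get_", "create_", "update_", "delete_", "search_")


def starts_with_verb(tool_name: str) -> bool:
    """Return True if the name is an allowed verb prefix followed by an object."""
    return any(
        tool_name.startswith(prefix) and len(tool_name) > len(prefix)
        for prefix in VERB_PREFIXES
    )


def names_missing_verb(tool_names: list[str]) -> list[str]:
    """Return the names that fail the check, in their original order."""
    return [name for name in tool_names if not starts_with_verb(name)]


# A bare noun ("customer") and a bare verb with no object ("search") both fail.
print(names_missing_verb(["get_customer", "customer", "list_customers", "search"]))
```

Run it in CI against your server's tool list and the noun-only names never ship.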

2. One job per tool

The worst pattern I see is the Swiss Army tool with a mode flag: customer({ action: "list" }) vs customer({ action: "get", id }). Split it. list_customers and get_customer are two tools, not one tool with a discriminator. The model picks tools by name and description. The more a tool tries to do, the harder it is for the model to know whether this is the right one.
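To make the split concrete, here is a rough sketch of both shapes as plain Python dicts. The name, description, and inputSchema keys follow the shape of an MCP tool definition; the descriptions and the string type for id are assumptions for illustration.

```python
# Before: one tool whose behavior hides behind an "action" discriminator.
customer_tool = {
    "name": "customer",
    "description": "Customer operations.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["list", "get"]},
            "id": {"type": "string"},  # only meaningful when action is "get"
        },
        "required": ["action"],
    },
}

# After: two tools, each with one job and a schema that states exactly what it needs.
list_customers_tool = {
    "name": "list_customers",
    "description": "List customers. Takes no arguments.",
    "inputSchema": {"type": "object", "properties": {}},
}
get_customer_tool = {
    "name": "get_customer",
    "description": "Get one customer by id.",
    "inputSchema": {
        "type": "object",
        "properties": {"id": {"type": "string"}},
        "required": ["id"],
    },
}

split_tools = [list_customers_tool, get_customer_tool]
```

Notice that the "before" schema cannot even say that id is required for a get. The split version can.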

3. Descriptions are documentation for the model

Write the description like a docstring for someone seeing your code for the first time. Not for you, not for the team Slack, not for the README. For a reader with no context and no ability to ask follow-up questions. State what it does in one sentence, then list inputs and outputs in plain language. If there is a side effect, say so. If it is idempotent, say so. The model reads this and decides whether to call it.
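One way to keep descriptions consistent is to assemble them from the same parts every time. This is a minimal sketch; build_description is a hypothetical helper of my own, and the update_customer wording is invented for the example.

```python
def build_description(summary, inputs, returns, side_effect=None, idempotent=None):
    """Assemble a description: one-sentence summary, then inputs, outputs, caveats."""
    parts = [summary]
    if inputs:
        listed = "; ".join(f"{name} ({meaning})" for name, meaning in inputs.items())
        parts.append(f"Inputs: {listed}.")
    parts.append(f"Returns: {returns}.")
    if side_effect:
        parts.append(f"Side effect: {side_effect}.")
    if idempotent is True:
        parts.append("Idempotent: repeating the call with the same arguments is safe.")
    elif idempotent is False:
        parts.append("Not idempotent: each call has an additional effect.")
    return " ".join(parts)


# Hypothetical copy for an update_customer tool.
description = build_description(
    summary="Update the email address of an existing customer.",
    inputs={"id": "customer id, string", "email": "new email address, string"},
    returns="the updated customer with id, name, and email",
    side_effect="writes to the customer record",
    idempotent=True,
)
print(description)
```

The helper matters less than the habit: summary first, then inputs, outputs, side effects, idempotency, in that order, every time.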

4. Schemas are contracts

If a field is required, mark it required. No exceptions, no "well, usually you want this." Optional fields are fine, but only when the tool genuinely works without them. When schemas lie, the agent learns to ignore them and starts guessing, which is exactly the failure mode you wanted to avoid by writing a schema in the first place.
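The contract only holds if the handler enforces what the schema advertises. Below is a small sketch, assuming a hypothetical create_customer tool with made-up fields; it checks only the required list and is not a full JSON Schema validator.

```python
# Hypothetical input schema for a create_customer tool.
create_customer_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "notes": {"type": "string"},  # genuinely optional: the tool works without it
    },
    "required": ["name", "email"],
}


def missing_required(schema: dict, args: dict) -> list[str]:
    """Return required field names that are absent from args, in schema order."""
    return [field for field in schema.get("required", []) if field not in args]


print(missing_required(create_customer_schema, {"name": "Ada"}))
```

If the handler rejects exactly what the schema marks required, and nothing else, the agent has no reason to start guessing.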

5. Error messages are agent UX

This one took me a while to internalize. The agent reads your error messages the same way a user reads a form validation popup. "missing field 'email'" beats "ValidationError" by a mile. "customer with id 42 not found" beats "404". Be specific, name the field, suggest the fix. The agent will retry intelligently if you tell it what was wrong. If you return a generic 500, it gives up.
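Here is roughly what that looks like in a handler. This is a sketch under assumptions: the in-memory CUSTOMERS store is made up, and the result dicts follow the content plus isError shape of an MCP tool result as I understand it.

```python
import json

# Hypothetical in-memory store standing in for a real backend.
CUSTOMERS = {"7": {"id": "7", "name": "Ada", "email": "ada@example.com"}}


def error_result(message: str) -> dict:
    """Wrap a specific, actionable message as an error result."""
    return {"content": [{"type": "text", "text": message}], "isError": True}


def get_customer(args: dict) -> dict:
    """Return one customer, or an error that names the problem and suggests a fix."""
    if "id" not in args:
        return error_result("missing field 'id': pass the customer id as a string")
    customer = CUSTOMERS.get(str(args["id"]))
    if customer is None:
        return error_result(
            f"customer with id {args['id']} not found: call list_customers to see valid ids"
        )
    return {"content": [{"type": "text", "text": json.dumps(customer)}], "isError": False}


print(get_customer({"id": "42"})["content"][0]["text"])
```

Each message names what was wrong and what to try next, so the agent has enough to retry without guessing.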

6. Return small structured data

The model has to read every byte you return. A 2 MB JSON blob with 400 fields is a denial-of-service attack on your own context window. Return what was asked for. If list_products can return thousands of rows, page it and include a cursor. If a row has 50 fields and the agent needs 5, offer a fields parameter or trim by default. Verbose is expensive.
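Paging and trimming fit in a few lines. The sketch below assumes a made-up in-memory catalog and uses a plain integer offset as the cursor; real servers often use an opaque cursor instead, and the parameter names here are illustrative.

```python
# Hypothetical catalog: each row carries more fields than an agent usually needs.
PRODUCTS = [
    {"id": i, "name": f"product-{i}", "price": 10 + i, "in_stock": i % 2 == 0,
     "warehouse_notes": "internal"}
    for i in range(1, 8)
]

DEFAULT_FIELDS = ("id", "name", "price")


def list_products(cursor: int = 0, limit: int = 3, fields=DEFAULT_FIELDS) -> dict:
    """Return one page of products, trimmed to the requested fields."""
    page = PRODUCTS[cursor:cursor + limit]
    rows = [{key: row[key] for key in fields} for row in page]
    # next_cursor is None on the last page so the agent knows to stop.
    next_cursor = cursor + limit if cursor + limit < len(PRODUCTS) else None
    return {"products": rows, "next_cursor": next_cursor}


first_page = list_products()
print(first_page["next_cursor"])
```

The agent gets three small rows and a cursor instead of the whole catalog, and it can ask for more fields only when it needs them.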

7. Add an example when behavior is non-obvious

If your tool has any quirk at all (a date format, a weird enum, a required header), drop a one-line example call in the description. For example: search_products({ query: "red shoes", in_stock: true }). The model pattern-matches off examples better than it parses prose.

Before and after

Quick sketch. Here is a search tool I rewrote last month.

Before:

Name: search. Description: "Searches the catalog." Inputs: q (string), opts (object, optional). Returns: full product objects, all of them, paginated by mystery.

The agent called it twice in a month. It had no idea what opts took or what came back.

After:

Name: search_products. Description: "Search the product catalog by free text query. Returns up to 20 matching products with id, name, price, and in_stock. Use list_products to browse all products without a query. Example: search_products({ query: 'red running shoes', in_stock: true })." Inputs: query (string, required), in_stock (boolean, optional, defaults to false).
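Written out as a tool definition, the "after" version might look something like this. The description text is the one above; the dict layout follows the MCP name, description, inputSchema shape, and the per-field descriptions are my own assumptions about what in_stock means.

```python
search_products_tool = {
    "name": "search_products",
    "description": (
        "Search the product catalog by free text query. "
        "Returns up to 20 matching products with id, name, price, and in_stock. "
        "Use list_products to browse all products without a query. "
        "Example: search_products({ query: 'red running shoes', in_stock: true })."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free text to search for."},
            "in_stock": {
                "type": "boolean",
                "default": False,
                # Assumed semantics: true restricts results to items in stock.
                "description": "If true, only return products that are in stock.",
            },
        },
        "required": ["query"],
    },
}
```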

Calls jumped roughly 8x in the next week. Same backend, same data, same agent. Only the tool's surface changed: its name, its description, its inputs, and the shape of what it returns.

A practical tip on top

You can A/B test descriptions. Ship two versions of the tool with different copy, log which one the agent calls, keep the winner. Treat your tool descriptions the way a growth team treats landing page copy. They are conversion surfaces.
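The bookkeeping for this is tiny. Here is a minimal sketch, assuming you register the two variants under hypothetical names search_products_a and search_products_b and can pull called tool names from your logs; the sample log is made up purely to exercise the code.

```python
from collections import Counter

# Two variants of the same tool: identical behavior, different copy.
# The variant names are hypothetical.
VARIANTS = {
    "search_products_a": "Searches the catalog.",
    "search_products_b": (
        "Search the product catalog by free text query. "
        "Returns up to 20 matching products with id, name, price, and in_stock."
    ),
}


def count_calls(call_log):
    """Count calls per variant, ignoring tools that are not part of the test."""
    counts = Counter({name: 0 for name in VARIANTS})
    counts.update(name for name in call_log if name in VARIANTS)
    return counts


def pick_winner(call_log):
    """Return the variant with the most calls, or None if neither was called."""
    name, calls = count_calls(call_log).most_common(1)[0]
    return name if calls > 0 else None


# Made-up log entries; real ones would come from your server's call log.
sample_log = ["search_products_b", "get_customer", "search_products_a", "search_products_b"]
print(pick_winner(sample_log))
```

Give it enough calls before you trust the result. A handful of entries like the sample above tells you nothing.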

The thing I want to leave you with is simple. Your tool has exactly one user, and that user is the model. The human who wrote the agent is a downstream beneficiary. Design for the reader who has to decide, in one shot, with no follow-up, whether your tool is the right one for the job in front of it. Do that and the calls will come.
