Tool Calls Are Side Effects. Treat Them That Way.
Agents that call tools are running code with real consequences. Most teams build them like they're not.
Somewhere between “LLM decides what to do” and “thing happens in the real world,” there’s a step most teams gloss over. The model picks a tool. The tool runs. An email gets sent, a database row gets written, a payment gets initiated.
That’s a side effect. And most agent implementations treat it like a return value.
The Gap Between “Calling” and “Committing”
When you write application code, you think hard about where side effects live. You wrap mutations in transactions. You check whether an action already ran before running it again. You think about rollback.
When you build an agent, you hand the LLM a list of function signatures and call it a day. The model calls send_invoice(). You run it. Done.
Until the model calls it twice. Or calls it with half-formed arguments because the context window was cramped. Or your harness retries after a timeout, not knowing the first call had already succeeded.
Now you’ve sent two invoices.
Idempotency Is Not Optional
Every tool your agent can call needs to be idempotent or guarded. That means:
- External writes get a deduplication key the caller generates, not the callee
- Tools that can’t be made idempotent get a confirmation step before they execute
- Retries are handled at the infrastructure layer, not by re-running the full agent turn
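The first and third rules can be sketched together in Python. This is a minimal illustration, not a production client: `FakeInvoiceService`, its `send_invoice` signature, `TransientError`, and `call_with_retries` are all hypothetical, and the service keeps its deduplication table in memory where a real one would persist it. The point is where the key comes from. The caller mints it once, before the first attempt, and every retry reuses it, so a timeout that fires after the write has already committed does not produce a second invoice.

```python
import uuid
from dataclasses import dataclass, field


class TransientError(Exception):
    """A failure worth retrying, such as a timeout (hypothetical classification)."""


@dataclass
class FakeInvoiceService:
    """Hypothetical stand-in for an external API that honors idempotency keys."""

    sent: list = field(default_factory=list)
    seen: dict = field(default_factory=dict)  # idempotency key -> original result
    fail_after_commit_once: bool = False

    def send_invoice(self, customer_id: str, amount_cents: int, idempotency_key: str) -> str:
        # A key we've seen before returns the original result instead of sending again.
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]
        invoice_id = f"inv-{len(self.sent) + 1}"
        self.sent.append((customer_id, amount_cents))
        self.seen[idempotency_key] = invoice_id
        # Simulate the worst case: the write committed, but the response was lost.
        if self.fail_after_commit_once:
            self.fail_after_commit_once = False
            raise TransientError("timeout after commit")
        return invoice_id


def call_with_retries(tool, *, attempts: int = 3, **kwargs):
    """Retry one tool call at the infrastructure layer, never the whole agent turn."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # The caller mints the key once, before the first attempt...
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            # ...and every retry sends the same key.
            return tool(idempotency_key=key, **kwargs)
        except TransientError as err:
            last_error = err
    raise last_error
```

Against a `FakeInvoiceService(fail_after_commit_once=True)`, calling `call_with_retries(service.send_invoice, customer_id="c-42", amount_cents=1999)` returns `"inv-1"` on the second attempt with exactly one invoice recorded. Re-running the whole agent turn instead would mint a fresh key, and the deduplication would no longer catch the repeat.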
This isn’t theoretical. The more capable your agent becomes, the more consequential its mistakes. A text generation error is annoying. A duplicate bank transfer is a support ticket.
The “Just Try It” Architecture
The default agent architecture goes something like: model decides, tool runs, next step. Fast to build. Works great in demos. Breaks in production under the conditions that matter most — timeouts, partial failures, ambiguous inputs, and the model misreading its own context.
You need a layer between the model’s decision and the tool’s execution. Not a lot of ceremony — just enough to answer: has this already run? Is the input complete enough to safely proceed? Can I undo this if something goes wrong?
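One way to make those three questions concrete is a small guard the execution layer consults before every call. In this sketch, all names (`Decision`, `ToolSpec`, `ExecutionGuard`) are hypothetical. "Has this already run?" is a lookup of completed idempotency keys, "is the input complete enough?" is a check for missing or empty required arguments, and "can I undo this?" is approximated by whether the tool registers an undo handler. Tools without one are held for explicit confirmation, matching the rule for tools that can't be made idempotent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional


class Decision(Enum):
    ALREADY_RAN = "already_ran"                # has this already run?
    INCOMPLETE = "incomplete"                  # is the input complete enough?
    NEEDS_CONFIRMATION = "needs_confirmation"  # no way to undo, so ask first
    PROCEED = "proceed"


@dataclass
class ToolSpec:
    name: str
    required_args: frozenset
    undo: Optional[Callable[..., Any]] = None  # None means there is no way to undo


@dataclass
class ExecutionGuard:
    completed_keys: set = field(default_factory=set)

    def check(self, spec: ToolSpec, args: dict, key: str, confirmed: bool = False) -> Decision:
        # Has this already run? Look up the idempotency key.
        if key in self.completed_keys:
            return Decision.ALREADY_RAN
        # Is the input complete? Reject missing or empty required arguments.
        missing = [name for name in spec.required_args if args.get(name) in (None, "")]
        if missing:
            return Decision.INCOMPLETE
        # Can I undo this? If not, require explicit confirmation first.
        if spec.undo is None and not confirmed:
            return Decision.NEEDS_CONFIRMATION
        return Decision.PROCEED
```

For a hypothetical `transfer_funds` tool with no undo handler, `check` returns `NEEDS_CONFIRMATION` until the caller passes `confirmed=True`, and `ALREADY_RAN` once its key is in `completed_keys`. The order of the checks matters: a call that already ran is never re-validated or re-confirmed.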
Three questions. Most agents can’t answer any of them.
What Production-Grade Looks Like
Tool execution in a real agent has structure:
- Validate the tool call parameters before touching anything external
- Log intent — record what you’re about to do before you do it
- Execute with idempotency keys
- Confirm success before marking the step complete
- Surface failures explicitly rather than silently continuing
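The five steps above map onto a single wrapper around each tool call. This is one possible shape under stated assumptions: `IntentLog` is an in-memory stand-in for what would be a durable table, `execute_step`, `ToolCallFailed`, and the `validate` and `confirm` callbacks are hypothetical, and every tool is assumed to accept an `idempotency_key` keyword argument.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolCallFailed(Exception):
    """Raised so the agent loop sees the failure instead of silently moving on."""


@dataclass
class IntentLog:
    """In-memory stand-in for a durable intent log (a database table in practice)."""

    entries: dict = field(default_factory=dict)

    def record(self, key: str, tool: str, args: dict) -> None:
        self.entries[key] = {"tool": tool, "args": args, "status": "pending"}

    def mark(self, key: str, status: str, detail: Any = None) -> None:
        self.entries[key]["status"] = status
        self.entries[key]["detail"] = detail


def execute_step(
    log: IntentLog,
    tool_name: str,
    tool: Callable[..., Any],
    args: dict,
    validate: Callable[[dict], list],
    confirm: Callable[[Any], bool],
) -> Any:
    # Validate parameters before touching anything external.
    errors = validate(args)
    if errors:
        raise ToolCallFailed(f"{tool_name}: invalid arguments: {errors}")
    # Log intent before acting; the same key doubles as the idempotency key.
    key = str(uuid.uuid4())
    log.record(key, tool_name, args)
    # Execute with the idempotency key; surface any error instead of swallowing it.
    try:
        result = tool(idempotency_key=key, **args)
    except Exception as err:
        log.mark(key, "failed", repr(err))
        raise ToolCallFailed(f"{tool_name} failed: {err}") from err
    # Confirm success before marking the step complete.
    if not confirm(result):
        log.mark(key, "unconfirmed", result)
        raise ToolCallFailed(f"{tool_name}: result could not be confirmed")
    log.mark(key, "committed", result)
    return result
```

Because intent is recorded before execution, any entry still marked `pending` after a crash identifies an action whose outcome is unknown. That entry needs reconciling against the external system, not a blind re-run.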
This isn’t new engineering. It’s the same discipline you apply to any distributed system. The difference is that distributed systems have been doing this for decades, and agent frameworks have been doing it for about eighteen months.
The gap shows.
Your agent isn’t just reasoning — it’s operating. Build the execution layer like it matters, because every tool call is a bet that you know what you’re doing.