May 22, 2026 · Sovont · 3 min read

The Timeout You Never Set

LLM API calls without explicit timeouts are a production incident waiting to happen. Here's what hangs, why, and how to stop it.

Somewhere in your codebase, there’s an LLM API call with no timeout.

Not a short timeout. Not a conservative timeout. No timeout — just a raw request that will wait as long as the provider wants to take. In development, that’s fine. Responses come back in two seconds and everyone’s happy. In production, that call is a slow grenade.

What actually happens without a timeout.

Provider latency isn’t constant. It varies by model, by load, by the length of your prompt and of the response, and by whether something is degraded on their end that hasn’t shown up on the status page yet. On a normal day, your p50 might be 1.2 seconds. On a bad day, your p99 might be 45 seconds, or the request just hangs.

If you have no timeout, your thread hangs with it. Your connection pool holds a slot. Your queue backs up. Downstream systems waiting on the response start timing out. You don’t get one slow request. You get a cascade that looks like an outage but never registers as a failure, because nothing actually failed. It’s all just waiting.

This is how a single degraded LLM call takes down a system that handles ten unrelated things.

Where teams get this wrong.

Most LLM client libraries have a default timeout, buried in the docs and set conservatively high: sometimes 60 seconds, sometimes much longer (some official SDKs default to 10 minutes). Teams assume the default is reasonable. It isn’t. It’s chosen so that long but legitimate generations never get cut off, not to protect your system.

Others set a timeout on the HTTP request but forget about streaming. In most HTTP clients, a read timeout bounds the wait for the next chunk of data, not the total duration of the response. If you’re streaming, a 10-second timeout can still let a slow stream run for two minutes, as long as each chunk arrives within 10 seconds of the previous one.
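One way to close the streaming gap is to put an overall deadline on top of the per-chunk timeout. The sketch below is a minimal illustration, not any particular SDK's API: `with_total_deadline` and `slow_stream` are hypothetical names, and the fake stream stands in for whatever chunk iterator your client returns.

```python
import time
from typing import Iterable, Iterator


def with_total_deadline(chunks: Iterable[str], total_timeout: float) -> Iterator[str]:
    """Yield chunks until the stream ends or the overall budget is spent.

    The check runs only when a chunk arrives, so this complements a
    per-chunk read timeout; it does not replace one.
    """
    deadline = time.monotonic() + total_timeout
    for chunk in chunks:
        if time.monotonic() > deadline:
            raise TimeoutError(f"stream exceeded {total_timeout}s total budget")
        yield chunk


def slow_stream(n: int, delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a provider stream: n chunks, `delay` seconds apart."""
    for i in range(n):
        time.sleep(delay)
        yield f"chunk-{i}"
```

Wrapping the iterator as `with_total_deadline(stream, total_timeout=30.0)` caps the whole response. A stream that stalls completely still has to be caught by the read timeout, because the deadline check only runs when data shows up.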

Some teams add timeouts to their API layer but forget the background workers, the async processors, the internal microservice that also calls the LLM. Every integration point needs a timeout. One uncovered path is enough.

What a real timeout strategy looks like.

Set timeouts explicitly at every call site. Don’t trust library defaults. Set a number you’ve thought about, not inherited.

Separate connect timeout from read timeout. Connect timeout: how long to wait for the initial connection. Read timeout: how long to wait for data between chunks. Both matter, especially for streaming.
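To make the connect/read distinction concrete, here is a minimal sketch at the socket level using only the Python standard library. `open_connection` is a hypothetical helper, and the timeout values in any real call are yours to choose.

```python
import socket


def open_connection(host: str, port: int,
                    connect_timeout: float, read_timeout: float) -> socket.socket:
    """Open a TCP connection with distinct connect and read timeouts."""
    # Bounds the connection attempt. Left unchanged, the same value
    # would also apply to every later send/recv on this socket.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # Override it: each blocking recv()/send() is now bounded by read_timeout.
    sock.settimeout(read_timeout)
    return sock
```

Higher-level clients usually expose the same split without touching sockets. For example, `requests` accepts a `(connect, read)` tuple for its `timeout` argument, and `httpx` has an `httpx.Timeout` object with separate `connect` and `read` fields. Check your own client's docs for the exact shape.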

Size the timeout to the workflow. A real-time user-facing request can’t wait 30 seconds. An async background job can afford more. Calibrate per use case, not across the board.
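Per-workflow sizing can be as simple as a named table of budgets that every call site has to look up. This is a sketch under assumptions: the workflow names and every number below are illustrative, not recommendations, and `TimeoutBudget` and `budget_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutBudget:
    connect: float  # seconds to establish the connection
    read: float     # seconds to wait between chunks
    total: float    # seconds for the whole call


# Illustrative numbers only; derive real ones from your own latency data.
BUDGETS = {
    "chat_reply": TimeoutBudget(connect=2.0, read=5.0, total=10.0),
    "batch_summarize": TimeoutBudget(connect=5.0, read=30.0, total=120.0),
}


def budget_for(workflow: str) -> TimeoutBudget:
    """Look up a budget; unknown workflows fail loudly instead of inheriting a default."""
    try:
        return BUDGETS[workflow]
    except KeyError:
        raise ValueError(f"no timeout budget defined for workflow {workflow!r}") from None
```

Raising on an unknown workflow is a deliberate choice here: a new integration point cannot ship until someone has decided what it is allowed to cost.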

Handle timeout errors explicitly. A timeout isn’t a generic exception. It has a specific meaning: the provider was too slow, not wrong. Log it separately, alert on it separately, and handle it differently than a 500. If you’re timing out frequently, that’s a signal worth seeing.
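A rough sketch of what "separately" can look like in code. It assumes the call raises Python's built-in `TimeoutError`; real clients often raise their own types (for instance `requests.exceptions.Timeout` or `httpx.TimeoutException`), so catch whatever yours uses. `call_llm_safely` and the `metrics` counter are hypothetical stand-ins for your own wrapper and metrics client.

```python
import logging
from collections import Counter
from typing import Callable, Optional

logger = logging.getLogger("llm")
metrics: Counter = Counter()  # stand-in for a real metrics client


def call_llm_safely(call: Callable[[], str]) -> Optional[str]:
    """Run `call`, reporting timeouts separately from other failures."""
    try:
        return call()
    except TimeoutError:
        # Own metric and log line, so timeouts can be alerted on by themselves.
        metrics["llm.timeout"] += 1
        logger.warning("LLM call timed out; returning fallback")
        return None
    except Exception:
        # Everything else is counted as a generic error and re-raised.
        metrics["llm.error"] += 1
        logger.exception("LLM call failed")
        raise
```

Returning `None` on timeout is just one possible degradation. Depending on the workflow, a cached answer, a retry with a smaller budget, or a clear error to the user may fit better.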

Test the timeout path. It’s easy to forget that timeouts are code paths too. Mock a slow provider response in your test suite. Make sure the timeout fires, the error surfaces correctly, and the system degrades gracefully instead of hanging.
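A test for the timeout path does not need a real provider. The sketch below fakes a degraded one with `asyncio.sleep` and checks that the fallback kicks in. `slow_provider` and `answer_with_fallback` are hypothetical names standing in for your own client and wrapper.

```python
import asyncio
import unittest


async def slow_provider() -> str:
    """Hypothetical stand-in for a degraded provider."""
    await asyncio.sleep(5)
    return "late answer"


async def answer_with_fallback(timeout: float) -> str:
    """Return the provider's answer, or a fallback if it misses the deadline."""
    try:
        return await asyncio.wait_for(slow_provider(), timeout=timeout)
    except asyncio.TimeoutError:
        return "fallback"


class TimeoutPathTest(unittest.TestCase):
    def test_timeout_fires_and_degrades(self):
        # The provider needs 5 seconds; the budget is 50 ms.
        result = asyncio.run(answer_with_fallback(timeout=0.05))
        self.assertEqual(result, "fallback")
```

Run it with `python -m unittest`. The test finishes in a fraction of a second because `asyncio.wait_for` cancels the slow coroutine once the deadline passes, rather than waiting out the full sleep.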

Timeouts are not pessimism. They’re contracts.

When you set a timeout, you’re making a decision: this workflow is worth this much time, and not more. It forces clarity about what your system promises. It forces you to handle failure instead of deferring it.

The LLM call that never times out isn’t robust — it’s just lucky. And luck isn’t a reliability strategy.

Set the timeout. Then check every other call site and set those too.