Controlling cost and turns
Cap how long an agent runs and how much it spends, read back usage, and handle the errors raised when a budget is exceeded.
An agentic loop calls the model repeatedly: each tool call feeds another request back to the LLM until the agent produces a final answer. Left unbounded, a confused or adversarial run can loop indefinitely and burn real money. AgentRoute gives you two hard guardrails on every Agent — max_turns and max_cost — plus a usage object on every result so you can see exactly what a run cost.
This guide covers setting both limits, reading result.usage, and catching the errors raised when a limit is hit.
The two budgets
Both limits are constructor arguments on Agent:
from agentroute import Agent

agent = Agent(
    name="researcher",
    model="claude-sonnet-4",
    instructions="Answer the user's question. Use tools when helpful.",
    max_turns=8,    # at most 8 trips through the loop
    max_cost=0.25,  # stop once this run has spent $0.25
)

max_turns (default 10) bounds the number of iterations of the agentic loop, which is roughly the number of model calls. Each turn is one LLM request plus any tool calls it triggers. When the count exceeds the limit the run raises ErrorMaxTurns.

max_cost (default None, meaning no cost cap) bounds the cumulative spend of a single run in USD. When accumulated cost exceeds the limit the run raises ErrorBudget.
An Agent created without max_cost still has the max_turns=10 default, so it can never loop forever, but it can still spend more than you intend on a few expensive calls. Set max_cost whenever cost matters.
Both limits scope to a single run() / arun() call. They are not a running total across many invocations. If you need a per-user or per-day cap, track it yourself by summing result.usage.total_cost_usd across runs.
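To make the per-run semantics concrete, here is a toy loop that enforces both limits the way the text above describes. It is only a sketch, not AgentRoute's implementation: TurnLimitHit, CostLimitHit, run_once, and the list of per-call costs are hypothetical stand-ins, and checking cost after each call (rather than before) is an assumption.

```python
class TurnLimitHit(Exception):
    """Hypothetical stand-in for ErrorMaxTurns (not the real class)."""

    def __init__(self, turn: int, limit: int) -> None:
        super().__init__(f"turn {turn} exceeds limit {limit}")
        self.turn = turn
        self.limit = limit


class CostLimitHit(Exception):
    """Hypothetical stand-in for ErrorBudget (not the real class)."""

    def __init__(self, spent: float, limit: float) -> None:
        super().__init__(f"spent ${spent:.4f}, limit ${limit:.4f}")
        self.spent = spent
        self.limit = limit


def run_once(call_costs, max_turns=10, max_cost=None):
    """Simulate one run; call_costs holds the made-up USD cost of each model call.

    Returns (turns_used, total_spent) if the run finishes inside both limits.
    """
    spent = 0.0  # counters start at zero on every call: the limits are per-run
    for turn, cost in enumerate(call_costs, start=1):
        if turn > max_turns:
            raise TurnLimitHit(turn=turn, limit=max_turns)
        spent += cost
        # Assumption: the cost check happens after each model call completes.
        if max_cost is not None and spent > max_cost:
            raise CostLimitHit(spent=spent, limit=max_cost)
    return len(call_costs), spent
```

Each call to run_once starts again from zero, which mirrors why a per-user or per-day cap has to be tracked outside the agent.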
Reading what a run cost
Every Result carries a Usage object at result.usage. It accumulates across every model call in the run:
result = agent.run("Summarize the latest changes to the pricing page.")

print(result.usage.input_tokens)    # prompt tokens sent to the model
print(result.usage.output_tokens)   # completion tokens received
print(result.usage.total_cost_usd)  # cumulative cost of the run, in USD
print(result.usage.model_calls)     # how many LLM requests this run made

Usage has exactly four fields:
| Field | Type | Meaning |
|---|---|---|
| input_tokens | int | Total prompt tokens across all model calls. |
| output_tokens | int | Total completion tokens across all model calls. |
| total_cost_usd | float | Cumulative cost of the run, in US dollars. |
| model_calls | int | Number of LLM requests made during the run. |
model_calls is the practical signal for whether a run is doing more loop iterations than you expected. A single-shot answer is one model call; a run that uses tools will have one call per turn. If model_calls is climbing toward max_turns, the agent is probably struggling — tighten the instructions or the toolset.
str(result) gives you the final output text, so you can log the answer and its cost together: print(result, "—", f"${result.usage.total_cost_usd:.4f}").
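One way to act on model_calls is a small helper that flags runs approaching the turn limit. This is a sketch under assumptions: UsageSnapshot is a hypothetical stand-in that mirrors the four Usage fields listed above, and the 80% threshold is an arbitrary choice, not a library default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSnapshot:
    """Hypothetical stand-in mirroring the four documented Usage fields."""

    input_tokens: int
    output_tokens: int
    total_cost_usd: float
    model_calls: int


def near_turn_limit(usage, max_turns: int, threshold: float = 0.8) -> bool:
    """Return True when the run used at least `threshold` of its turn budget."""
    return usage.model_calls >= threshold * max_turns


def cost_per_call(usage) -> float:
    """Average USD per model call; 0.0 for a run that made no calls."""
    if usage.model_calls == 0:
        return 0.0
    return usage.total_cost_usd / usage.model_calls
```

Both helpers only read attributes, so in real code you could pass result.usage directly instead of building a snapshot.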
Handling a blown budget
When a limit is exceeded, the run does not return a Result — it raises. Both errors subclass ErrorAgent, so you can catch them together or individually. They carry structured fields so you can log, alert, or degrade gracefully.
from agentroute import Agent, ErrorMaxTurns, ErrorBudget

agent = Agent(
    name="researcher",
    model="claude-sonnet-4",
    max_turns=8,
    max_cost=0.25,
)

try:
    result = agent.run("Find and cross-check three sources on X.")
    print(result)
except ErrorMaxTurns as e:
    print(f"Stopped at turn {e.turn} (limit {e.limit}); the agent looped too long.")
except ErrorBudget as e:
    print(f"Stopped after spending ${e.spent:.4f} (limit ${e.limit:.4f}).")

The fields on each error mirror the limit that tripped:
| Field | Type | Description |
|---|---|---|
| ErrorMaxTurns.turn | int | The turn number the loop reached when it stopped. |
| ErrorMaxTurns.limit | int | The configured max_turns value. |
| ErrorBudget.spent | float | Cumulative cost in USD at the point the run was stopped. |
| ErrorBudget.limit | float | The configured max_cost value. |
If you only need to know that some guardrail tripped — for example to return a fallback response — catch the base class:
import logging

from agentroute import ErrorAgent

log = logging.getLogger(__name__)

try:
    result = agent.run(prompt)
except ErrorAgent as e:
    # Covers ErrorMaxTurns, ErrorBudget, and any future guardrail.
    log.warning("Agent run halted: %s", e)
    result = None

Retry is not an ErrorAgent and is not caught by these handlers. It is a control-flow signal raised from tools and @agent.output_validator callbacks to ask the model to try again, bounded by Agent(retries=...). See errors and retries for the distinction.
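The "log, alert, or degrade gracefully" idea can be sketched as a wrapper that returns a fallback value plus a structured record of why the run stopped. Everything here is illustrative: HaltError, TurnsExceeded, and BudgetExceeded are hypothetical stand-ins for ErrorAgent, ErrorMaxTurns, and ErrorBudget that carry only the fields documented above, and run_or_fallback is not part of AgentRoute.

```python
class HaltError(Exception):
    """Hypothetical stand-in for ErrorAgent."""


class TurnsExceeded(HaltError):
    """Hypothetical stand-in for ErrorMaxTurns."""

    def __init__(self, turn: int, limit: int) -> None:
        super().__init__(f"turn {turn} exceeds limit {limit}")
        self.turn = turn
        self.limit = limit


class BudgetExceeded(HaltError):
    """Hypothetical stand-in for ErrorBudget."""

    def __init__(self, spent: float, limit: float) -> None:
        super().__init__(f"spent {spent} exceeds limit {limit}")
        self.spent = spent
        self.limit = limit


def halt_record(exc: HaltError) -> dict:
    """Flatten a guardrail error into a dict suitable for structured logging."""
    if isinstance(exc, TurnsExceeded):
        return {"reason": "max_turns", "turn": exc.turn, "limit": exc.limit}
    if isinstance(exc, BudgetExceeded):
        return {"reason": "max_cost", "spent_usd": exc.spent, "limit_usd": exc.limit}
    return {"reason": "other", "detail": str(exc)}


def run_or_fallback(run, prompt, fallback):
    """Call run(prompt); on any guardrail error return (fallback, record)."""
    try:
        return run(prompt), None
    except HaltError as exc:
        return fallback, halt_record(exc)
```

With the real library, run would be something like agent.run and the except clause would name ErrorAgent; exceptions outside that hierarchy still propagate to the caller.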
Choosing sensible limits
There is no universal right number — the budget depends on the model, the tools, and the task. Some guidance that holds up in practice:
A single-shot question (no tools) needs only one or two turns. A tool-using agent needs at least one turn per tool call it must make to finish, plus one for the final answer. If an agent has three tools and a typical task uses two of them, budget around 5–6 turns, not 50. The default of 10 is a reasonable ceiling for most tool-light agents.
Run representative prompts with generous limits first, then read result.usage.model_calls and result.usage.total_cost_usd. Set your real limits a comfortable margin above the observed worst case — tight enough to catch runaway loops, loose enough not to cut off legitimate work.
max_turns caps iterations, but a few large-context calls can be expensive even at low turn counts. max_cost is what protects your bill. Pick a number that would be acceptable to spend per request even in the worst case, and treat ErrorBudget as a signal that something is wrong rather than a routine outcome.
When the prompt comes from end users (especially a public surface), use stricter limits than you would for internal batch jobs. A low max_turns and a small max_cost turn a prompt-injection attack or a runaway loop from a billing incident into a caught exception.
Setting max_turns too low will cause otherwise-valid runs to fail with ErrorMaxTurns before the agent can finish. If you see that error on tasks you expect to succeed, the fix is usually a higher limit or simpler instructions — not a retry.
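The "measure first, then add a margin" advice can be turned into a few lines of arithmetic. This is a minimal sketch: suggest_limits is a hypothetical helper, the 1.5x margin is an arbitrary starting point rather than a recommendation from the library, and the inputs are the model_calls and total_cost_usd values you collected from trial runs.

```python
import math


def suggest_limits(observed_calls, observed_costs, margin=1.5):
    """Derive (max_turns, max_cost) from trial runs made with generous limits.

    observed_calls: model_calls from each trial run
    observed_costs: total_cost_usd from each trial run
    margin: multiplier applied to the observed worst case (must be >= 1.0)
    """
    if not observed_calls or not observed_costs:
        raise ValueError("need at least one observed run")
    if margin < 1.0:
        raise ValueError("margin must be >= 1.0")
    # Round turns up so the worst observed run still fits with room to spare.
    max_turns = math.ceil(max(observed_calls) * margin)
    max_cost = round(max(observed_costs) * margin, 4)
    return max_turns, max_cost
```

For example, observed worst cases of 5 model calls and $0.052 with the default margin yield max_turns=8 and max_cost=0.078. Treat the result as a starting point and revisit it when the tools or prompts change.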
Tracking cost across many runs
Because each limit is per-run, enforcing a session- or user-level budget is your job. Sum total_cost_usd as runs complete and stop dispatching new work once you cross your own ceiling:
import asyncio

from agentroute import Agent, ErrorAgent

agent = Agent(name="worker", model="claude-sonnet-4", max_turns=6, max_cost=0.10)


async def run_with_session_budget(prompts: list[str], session_cap_usd: float) -> None:
    spent = 0.0
    for prompt in prompts:
        if spent >= session_cap_usd:
            print(f"Session cap ${session_cap_usd:.2f} reached; skipping remaining work.")
            break
        try:
            result = await agent.arun(prompt)
        except ErrorAgent as e:
            print(f"Run halted: {e}")
            continue
        spent += result.usage.total_cost_usd
        print(f"{result} (run cost ${result.usage.total_cost_usd:.4f}, session ${spent:.4f})")


asyncio.run(run_with_session_budget(["...", "..."], session_cap_usd=1.00))

The per-run max_cost still protects you against any single run blowing the session budget in one shot; the loop above adds the cumulative ceiling on top.
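For a per-user or per-day cap, the same summing idea can be keyed by user and date. The following is a minimal in-memory sketch; SpendLedger and its methods are hypothetical names, not part of AgentRoute, and a multi-process deployment would need shared storage instead of a dict.

```python
from datetime import date
from typing import Optional


class SpendLedger:
    """Hypothetical in-memory ledger of USD spent per (user, day)."""

    def __init__(self, daily_cap_usd: float) -> None:
        self.daily_cap_usd = daily_cap_usd
        self._spent = {}  # maps (user_id, day) -> cumulative USD

    def record(self, user_id: str, cost_usd: float, day: Optional[date] = None) -> float:
        """Add one run's cost and return the user's new total for that day."""
        key = (user_id, day or date.today())
        self._spent[key] = self._spent.get(key, 0.0) + cost_usd
        return self._spent[key]

    def can_run(self, user_id: str, day: Optional[date] = None) -> bool:
        """True while the user is still under the daily cap."""
        key = (user_id, day or date.today())
        return self._spent.get(key, 0.0) < self.daily_cap_usd
```

After each run you would call ledger.record(user_id, result.usage.total_cost_usd) and check ledger.can_run(user_id) before dispatching the next one. A run halted by ErrorBudget still cost money, so its spent field is the natural amount to record in that case.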