Controlling cost and turns

An agentic loop calls the model repeatedly: each tool call feeds another request back to the LLM until the agent produces a final answer. Left unbounded, a confused or adversarial run can loop indefinitely and burn real money. AgentRoute gives you two hard guardrails on every Agent — max_turns and max_cost — plus a usage object on every result so you can see exactly what a run cost.

This guide covers setting both limits, reading result.usage, and catching the errors raised when a limit is hit.

The two budgets

Both limits are constructor arguments on Agent:

from agentroute import Agent
 
agent = Agent(
    name="researcher",
    model="claude-sonnet-4",
    instructions="Answer the user's question. Use tools when helpful.",
    max_turns=8,        # at most 8 trips through the loop
    max_cost=0.25,      # stop once this run has spent $0.25
)

max_turns (default 10) bounds the number of iterations of the agentic loop — roughly the number of model calls. Each turn is one LLM request plus any tool calls it triggers. When the count exceeds the limit the run raises ErrorMaxTurns.
max_cost (default None, meaning no cost cap) bounds the cumulative spend of a single run in USD. When accumulated cost exceeds the limit the run raises ErrorBudget.

An Agent constructed without max_cost still has the max_turns=10 default, so it can never loop forever, but it can still spend more than you intend on a few expensive calls. Set max_cost whenever cost matters.
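To make the two rules concrete, here is a toy model of a guarded loop. It is a sketch under stated assumptions, not AgentRoute's implementation. The "model" is a plain function that returns (cost_usd, finished) for each turn. MaxTurnsExceeded and BudgetExceeded are hypothetical stand-ins for ErrorMaxTurns and ErrorBudget that carry the same fields. The sketch also assumes the turn check runs before each request and the cost check runs after each response.

```python
from typing import Callable, Optional, Tuple


class MaxTurnsExceeded(Exception):
    """Hypothetical stand-in for ErrorMaxTurns, with the same two fields."""

    def __init__(self, turn: int, limit: int) -> None:
        super().__init__(f"turn {turn} exceeds max_turns={limit}")
        self.turn = turn
        self.limit = limit


class BudgetExceeded(Exception):
    """Hypothetical stand-in for ErrorBudget, with the same two fields."""

    def __init__(self, spent: float, limit: float) -> None:
        super().__init__(f"spent ${spent:.4f} exceeds max_cost=${limit:.4f}")
        self.spent = spent
        self.limit = limit


# A fake model call: given the turn number, return (cost_usd, finished).
ModelCall = Callable[[int], Tuple[float, bool]]


def guarded_loop(call_model: ModelCall, max_turns: int = 10,
                 max_cost: Optional[float] = None) -> int:
    """Run turns until the fake model finishes; return how many turns it took."""
    spent = 0.0
    turn = 0
    while True:
        turn += 1
        if turn > max_turns:
            # The turn count has passed the limit: stop before sending another request.
            raise MaxTurnsExceeded(turn, max_turns)
        cost, finished = call_model(turn)
        spent += cost
        if max_cost is not None and spent > max_cost:
            # This run's cumulative spend is over the cap (None means no cap).
            raise BudgetExceeded(spent, max_cost)
        if finished:
            return turn


print(guarded_loop(lambda turn: (0.01, turn == 3), max_turns=8))  # 3
```

In this toy, the turn carried by MaxTurnsExceeded is the first turn past the limit. The real ErrorMaxTurns.turn may be numbered differently.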

Note

Both limits scope to a single run() / arun() call. They are not a running total across many invocations. If you need a per-user or per-day cap, track it yourself by summing result.usage.total_cost_usd across runs.
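Here is a minimal sketch of such a tracker. It assumes a single process and in-memory storage. DailyUserBudget and its methods are hypothetical names and are not part of AgentRoute. In production you would keep the totals in a shared store, so that they survive restarts and every worker sees them.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Optional, Tuple


class DailyUserBudget:
    """Hypothetical in-memory per-user, per-day spend tracker (not part of AgentRoute)."""

    def __init__(self, daily_cap_usd: float) -> None:
        self.daily_cap_usd = daily_cap_usd
        self._spent: DefaultDict[Tuple[str, date], float] = defaultdict(float)

    def spent(self, user_id: str, day: Optional[date] = None) -> float:
        # .get avoids creating an entry just by reading.
        return self._spent.get((user_id, day or date.today()), 0.0)

    def allows(self, user_id: str, day: Optional[date] = None) -> bool:
        # Check before dispatching a run: is this user still under the day's cap?
        return self.spent(user_id, day) < self.daily_cap_usd

    def record(self, user_id: str, cost_usd: float, day: Optional[date] = None) -> None:
        # Call after each run with result.usage.total_cost_usd.
        self._spent[(user_id, day or date.today())] += cost_usd
```

Call allows(user_id) before agent.run(...) and record(user_id, result.usage.total_cost_usd) after it. The check runs before each run, so a user can overshoot the daily cap by up to one run's max_cost.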

Reading what a run cost

Every Result carries a Usage object at result.usage. It accumulates across every model call in the run:

result = agent.run("Summarize the latest changes to the pricing page.")
 
print(result.usage.input_tokens)     # prompt tokens sent to the model
print(result.usage.output_tokens)    # completion tokens received
print(result.usage.total_cost_usd)   # cumulative cost of the run, in USD
print(result.usage.model_calls)      # how many LLM requests this run made

Usage has exactly four fields:

Field	Type	Meaning
`input_tokens`	`int`	Total prompt tokens across all model calls.
`output_tokens`	`int`	Total completion tokens across all model calls.
`total_cost_usd`	`float`	Cumulative cost of the run, in US dollars.
`model_calls`	`int`	Number of LLM requests made during the run.
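A small stand-in class shows how the totals accumulate. UsageSketch is hypothetical, and AgentRoute's own Usage may be implemented differently. The per-call token and cost figures are made up. The point is that every model call adds to all four totals. This sketch assumes each call re-sends the earlier conversation, as is typical for chat-style APIs. Under that assumption the second call has more input_tokens, and that context is counted again in the run total.

```python
from dataclasses import dataclass


@dataclass
class UsageSketch:
    """Hypothetical stand-in with the same four fields as Usage."""

    input_tokens: int = 0
    output_tokens: int = 0
    total_cost_usd: float = 0.0
    model_calls: int = 0

    def add_call(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        # Each model call in the run adds to every running total.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.total_cost_usd += cost_usd
        self.model_calls += 1


usage = UsageSketch()
usage.add_call(input_tokens=1_200, output_tokens=150, cost_usd=0.006)  # turn 1: asks for a tool
usage.add_call(input_tokens=1_900, output_tokens=320, cost_usd=0.010)  # turn 2: final answer
print(usage.model_calls, usage.input_tokens, usage.output_tokens)  # 2 3100 470
```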

model_calls is the practical signal for whether a run is doing more loop iterations than you expected. A single-shot answer is one model call; a run that uses tools makes one call per turn. If model_calls routinely lands close to max_turns, the agent is probably struggling: tighten the instructions or the toolset.
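One way to act on that signal across many runs is a small check like the one below. It is a hypothetical helper that uses model_calls as a proxy for turns, as the paragraph above does. The 0.8 threshold is an arbitrary starting point, not a recommended value.

```python
from typing import Iterable


def near_turn_limit(model_calls: int, max_turns: int, threshold: float = 0.8) -> bool:
    """True when a run used at least `threshold` of its turn budget."""
    if max_turns <= 0:
        raise ValueError("max_turns must be positive")
    return model_calls >= threshold * max_turns


def share_near_limit(calls_per_run: Iterable[int], max_turns: int,
                     threshold: float = 0.8) -> float:
    """Fraction of runs that came close to max_turns (0.0 for no runs)."""
    calls = list(calls_per_run)
    if not calls:
        return 0.0
    flagged = sum(near_turn_limit(c, max_turns, threshold) for c in calls)
    return flagged / len(calls)


# e.g. model_calls collected from result.usage over five runs, with max_turns=8
print(share_near_limit([2, 3, 7, 8, 3], max_turns=8))  # 0.4
```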

Tip

str(result) gives you the final output text, so you can log the answer and its cost together: print(result, "—", f"${result.usage.total_cost_usd:.4f}").

Handling a blown budget

When a limit is exceeded, the run does not return a Result — it raises. Both errors subclass ErrorAgent, so you can catch them together or individually. They carry structured fields so you can log, alert, or degrade gracefully.

from agentroute import Agent, ErrorMaxTurns, ErrorBudget
 
agent = Agent(
    name="researcher",
    model="claude-sonnet-4",
    max_turns=8,
    max_cost=0.25,
)
 
try:
    result = agent.run("Find and cross-check three sources on X.")
    print(result)
except ErrorMaxTurns as e:
    print(f"Stopped at turn {e.turn} (limit {e.limit}) — the agent looped too long.")
except ErrorBudget as e:
    print(f"Stopped after spending ${e.spent:.4f} (limit ${e.limit:.4f}).")

The fields on each error mirror the limit that tripped:

Attribute	Type	Description
`ErrorMaxTurns.turn`	`int`	The turn number the loop reached when it stopped.
`ErrorMaxTurns.limit`	`int`	The configured `max_turns` value.
`ErrorBudget.spent`	`float`	Cumulative cost in USD at the point the run was stopped.
`ErrorBudget.limit`	`float`	The configured `max_cost` value.
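Because these fields are plain attributes, they map directly onto a structured log record. The helper below is a hypothetical sketch, not an AgentRoute API. It checks for the attribute names in the table instead of the exception types, so it runs without importing AgentRoute. For an exception that has none of those attributes, it reports just the class name and message.

```python
from typing import Any, Dict


def describe_halt(exc: Exception) -> Dict[str, Any]:
    """Flatten a guardrail exception into a dict for structured logging or alerting.

    Checks for the documented attributes (turn/limit, spent/limit) rather than the
    exception types; other exceptions are reported by class name and message only.
    """
    record: Dict[str, Any] = {"error": type(exc).__name__, "message": str(exc)}
    for attr in ("turn", "spent", "limit"):
        if hasattr(exc, attr):
            record[attr] = getattr(exc, attr)
    return record
```

Inside an `except ErrorAgent as e:` block, you might then call log.warning("agent run halted", extra={"halt": describe_halt(e)}), where log is a standard logging.Logger.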

If you only need to know that some guardrail tripped — for example to return a fallback response — catch the base class:

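import logging
 
from agentroute import ErrorAgent
 
log = logging.getLogger(__name__)
 
try:
    result = agent.run(prompt)  # agent and prompt defined as in the earlier examples
except ErrorAgent as e:
    # Covers ErrorMaxTurns, ErrorBudget, and any future guardrail.
    log.warning("Agent run halted: %s", e)
    result = None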

Warning

Retry is not an ErrorAgent and is not caught by these handlers. It is a control-flow signal raised from tools and @agent.output_validator callbacks to ask the model to try again, bounded by Agent(retries=...). See errors and retries for the distinction.

Choosing sensible limits

There is no universal right number — the budget depends on the model, the tools, and the task. Some guidance that holds up in practice:

Start from the shape of the task

A single-shot question (no tools) needs only one or two turns. A tool-using agent needs at least one turn per tool call it must make to finish, plus one for the final answer. Suppose an agent has three tools and a typical task uses two of them. That is three turns at minimum (two tool calls plus the answer), so budget around 5–6 turns for headroom, not 50. The default of 10 is a reasonable ceiling for most tool-light agents.
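The arithmetic above can be written as a rule of thumb: minimum turns equals tool calls plus one, multiplied by a headroom factor. suggest_max_turns is a hypothetical helper. Its default headroom of 2.0 is an assumption, chosen because it reproduces the numbers above: 2 turns for a single-shot question and 6 for a two-tool task. The helper also assumes one tool call per turn, so an agent that batches several tool calls into one turn needs fewer.

```python
import math


def suggest_max_turns(tool_calls: int, headroom: float = 2.0) -> int:
    """Rule-of-thumb max_turns: (tool calls + 1 final answer) * headroom, rounded up."""
    if tool_calls < 0 or headroom < 1.0:
        raise ValueError("tool_calls must be >= 0 and headroom >= 1.0")
    minimum_turns = tool_calls + 1  # one turn per tool call, plus the final answer
    return math.ceil(minimum_turns * headroom)


print(suggest_max_turns(0))                # 2: single-shot question
print(suggest_max_turns(2))                # 6: task that uses two tools
print(suggest_max_turns(2, headroom=1.5))  # 5: tighter margin
```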

Measure before you tighten

Run representative prompts with generous limits first, then read result.usage.model_calls and result.usage.total_cost_usd. Set your real limits a comfortable margin above the observed worst case — tight enough to catch runaway loops, loose enough not to cut off legitimate work.
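Here is a sketch of that procedure. It assumes you have collected (model_calls, total_cost_usd) pairs from result.usage for a set of representative prompts. limits_from_observations is a hypothetical helper. The 1.5x margin is an illustrative default, not a recommendation, and model_calls again stands in for turns.

```python
import math
from typing import Iterable, Tuple


def limits_from_observations(runs: Iterable[Tuple[int, float]],
                             margin: float = 1.5) -> Tuple[int, float]:
    """Suggest (max_turns, max_cost) as `margin` times the worst observed run."""
    observed = list(runs)
    if not observed:
        raise ValueError("need at least one observed run")
    worst_calls = max(calls for calls, _ in observed)
    worst_cost = max(cost for _, cost in observed)
    # Round turns up to a whole number; keep cost to four decimal places.
    return math.ceil(worst_calls * margin), round(worst_cost * margin, 4)


observed = [(3, 0.021), (5, 0.048), (4, 0.033)]  # made-up measurements
print(limits_from_observations(observed))  # (8, 0.072)
```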

Set max_cost as the real backstop

max_turns caps iterations, but a few large-context calls can be expensive even at low turn counts. max_cost is what protects your bill. Pick a number that would be acceptable to spend per request even in the worst case, and treat ErrorBudget as a signal that something is wrong rather than a routine outcome.
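A back-of-the-envelope estimate shows why. worst_case_cost_usd is a hypothetical helper. The rates in the example are placeholders in USD per million tokens; replace them with your provider's actual pricing. The estimate also assumes every call carries the same context size, whereas real runs usually grow it turn by turn.

```python
def worst_case_cost_usd(turns: int, input_tokens_per_call: int,
                        output_tokens_per_call: int,
                        input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Estimate the cost of `turns` model calls that each carry the same context size.

    Prices are in USD per million tokens and must be supplied by you.
    """
    per_call = (input_tokens_per_call * input_usd_per_mtok
                + output_tokens_per_call * output_usd_per_mtok) / 1_000_000
    return turns * per_call


# Placeholder rates: $3 per million input tokens, $15 per million output tokens.
cost = worst_case_cost_usd(turns=3, input_tokens_per_call=150_000,
                           output_tokens_per_call=2_000,
                           input_usd_per_mtok=3.0, output_usd_per_mtok=15.0)
print(f"${cost:.2f}")  # $1.44
```

At those assumed rates, three turns with a roughly 150k-token context cost far more than a $0.25 max_cost, even though a max_turns of 8 is nowhere near reached.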

Tighten in untrusted contexts

When the prompt comes from end users (especially a public surface), use stricter limits than you would for internal batch jobs. A low max_turns and a small max_cost turn a prompt-injection or runaway loop from a billing incident into a caught exception.
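One way to keep the stricter settings from drifting is to define named profiles in one place. GuardrailProfile, PROFILES, and guardrail_kwargs are hypothetical names. The numbers are illustrative, not recommendations. The only AgentRoute surface the sketch relies on is the max_turns and max_cost constructor arguments shown earlier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GuardrailProfile:
    max_turns: int
    max_cost: Optional[float]  # USD per run; None would mean no cost cap


# Illustrative numbers only: derive real ones from measured runs.
PROFILES: Dict[str, GuardrailProfile] = {
    "internal_batch": GuardrailProfile(max_turns=12, max_cost=1.00),
    "public_chat": GuardrailProfile(max_turns=4, max_cost=0.05),
}


def guardrail_kwargs(surface: str) -> Dict[str, object]:
    """Keyword arguments for Agent(...); unknown surfaces get the strictest profile."""
    profile = PROFILES.get(surface, PROFILES["public_chat"])
    return {"max_turns": profile.max_turns, "max_cost": profile.max_cost}


print(guardrail_kwargs("public_chat"))  # {'max_turns': 4, 'max_cost': 0.05}
```

You would then build the agent with Agent(name=..., model=..., **guardrail_kwargs("public_chat")). An unknown surface falls back to the strictest profile, so a typo or a newly added endpoint fails safe rather than open.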

Caution

Setting max_turns too low will cause otherwise-valid runs to fail with ErrorMaxTurns before the agent can finish. If you see that error on tasks you expect to succeed, the fix is usually a higher limit or simpler instructions — not a retry.

Tracking cost across many runs

Because each limit is per-run, enforcing a session- or user-level budget is your job. Sum total_cost_usd as runs complete and stop dispatching new work once you cross your own ceiling:

import asyncio
from agentroute import Agent, ErrorAgent, ErrorBudget
 
agent = Agent(name="worker", model="claude-sonnet-4", max_turns=6, max_cost=0.10)
 
async def run_with_session_budget(prompts: list[str], session_cap_usd: float) -> None:
    spent = 0.0
    for prompt in prompts:
        if spent >= session_cap_usd:
            print(f"Session cap ${session_cap_usd:.2f} reached; skipping remaining work.")
            break
        try:
            result = await agent.arun(prompt)
        except ErrorBudget as e:
            # A run stopped by max_cost still spent money, so count it.
            spent += e.spent
            print(f"Run halted by budget after ${e.spent:.4f}: {e}")
            continue
        except ErrorAgent as e:
            # Other guardrails (e.g. ErrorMaxTurns) carry no cost field,
            # so that run's spend is not reflected in `spent`.
            print(f"Run halted: {e}")
            continue
        spent += result.usage.total_cost_usd
        print(f"{result}  (run cost ${result.usage.total_cost_usd:.4f}, session ${spent:.4f})")
 
asyncio.run(run_with_session_budget(["...", "..."], session_cap_usd=1.00))

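The per-run max_cost still prevents any single run from blowing the session budget in one shot. The session cap is checked before each run starts, so the session total can overshoot session_cap_usd by roughly one run's max_cost ($0.10 here). The loop above adds that cumulative ceiling on top of the per-run limit.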

Context and usage

How Usage accumulates and how the Context carries it through a run.

Errors and retries

The exception hierarchy and how Retry differs from ErrorAgent.