Production checklist

Prototyping an agent takes a few lines of code. Shipping one means thinking about credentials, cost ceilings, failure modes, and what happens when a conversation runs for hours. This page is the short list we run through before an agent serves real traffic.

Work top to bottom. Each step is independent, so adopt the ones that fit your deployment and skip the rest.

Hosted deployment is still being finalized

This guide covers running your agent in your own process — the part of AgentRoute that is stable today. Hosted, managed deployment is on the platform roadmap.

The checklist

Load API keys from the environment, never from code

A bare model string resolves through OpenRouter, so one key works for every model. The key is resolved in order: an explicit api_key= argument first, then your environment or config file. If neither supplies one, no key is set. Keep the key out of source by setting AGENTROUTE_API_KEY (or OPENROUTER_API_KEY) and letting resolve_model find it.

export AGENTROUTE_API_KEY="sk-or-..."

from agentroute import Agent
 
# No api_key= here — it is read from the environment at run time.
agent = Agent(name="support", model="claude-sonnet-4")

You can also persist the key once with the CLI, which writes ~/.agentroute/config:

agentroute login
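If your deployment relies on environment variables only (no config file written by the CLI), it can help to fail at startup rather than on the first request. The sketch below is one way to do that with the standard library. The helper names are hypothetical and not part of AgentRoute, and the order in which the two variables are checked here is arbitrary; this page does not say which one the library prefers.

```python
import os

# Variable names come from this page; the helpers are hypothetical, not AgentRoute API.
KEY_VARS = ("AGENTROUTE_API_KEY", "OPENROUTER_API_KEY")


def find_key_var(env=None):
    """Return the name of the first key variable that is set and non-empty, else None."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        if env.get(name, "").strip():
            return name
    return None


def require_key(env=None):
    """Raise at startup if no key variable is set, instead of failing mid-request."""
    name = find_key_var(env)
    if name is None:
        raise RuntimeError(
            "Set one of " + ", ".join(KEY_VARS) + " before starting the service"
        )
    # Return the variable name only, so the key itself never reaches a log line.
    return name
```

Calling require_key() once before constructing the agent is enough; it reports which variable was found without ever touching the secret value.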

Caution

Do not hardcode keys or commit .env files. If you pass api_key= explicitly (for example, a per-tenant key pulled from a secrets manager), make sure it never lands in logs.
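One way to reduce the risk of a key reaching your logs is a logging filter that masks anything shaped like a key. The sketch below uses only the standard library. It assumes keys look like the sk-or-... example on this page, so adjust the pattern to your real key format; RedactKeys and redact are hypothetical names, not part of AgentRoute.

```python
import logging
import re

# Assumption: keys follow the "sk-or-..." shape shown on this page.
KEY_PATTERN = re.compile(r"sk-or-[A-Za-z0-9_\-]+")


def redact(text: str) -> str:
    """Replace anything that looks like a key with a fixed placeholder."""
    return KEY_PATTERN.sub("sk-or-[REDACTED]", text)


class RedactKeys(logging.Filter):
    """Mask key-shaped substrings in the formatted log message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so keys passed as %-style args are covered too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the record, only its text changes
```

Attach the filter to a handler (handler.addFilter(RedactKeys())) rather than to a single logger so it also sees records propagated from child loggers. It only covers the message text; fields passed through extra= and exception tracebacks would need their own handling.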

See Models for the full resolution rules and per-vendor inference.

Cap spend with max_cost

max_cost is a hard ceiling in USD per run(). Once accumulated cost crosses it, the loop stops and raises ErrorBudget, which carries spent and limit. This protects you from a runaway tool loop quietly burning credits.

agent = Agent(
    name="support",
    model="claude-sonnet-4",
    max_cost=0.50,  # stop this run once accumulated cost crosses $0.50
)

Pick a number a few multiples above a normal run so legitimate traffic is not cut off, but low enough to catch a pathological loop. See Context and usage for how cost is accumulated.
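If you already log per-run cost, you can derive a starting ceiling from it. The sketch below takes "a normal run" to mean the median observed cost and "a few multiples" to mean three; both are assumptions to tune, and suggest_max_cost is a hypothetical helper rather than anything AgentRoute provides.

```python
import statistics


def suggest_max_cost(observed_costs, multiple=3.0):
    """Suggest a ceiling in USD: `multiple` times the median observed run cost.

    The median is used so that one pathological run in the sample does not
    drag the ceiling upward.
    """
    costs = list(observed_costs)
    if not costs:
        raise ValueError("need at least one observed run cost")
    return round(statistics.median(costs) * multiple, 2)
```

Feed it the total_cost_usd values from recent runs and pass the result as max_cost, then revisit the number as traffic changes.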

Bound the loop with max_turns

max_turns (default 10) limits how many model-plus-tool cycles a single run may take. Exceeding it raises ErrorMaxTurns with turn and limit. Tool-heavy agents that chain several calls may need more headroom; a simple Q&A agent can run lower.

agent = Agent(
    name="researcher",
    model="claude-sonnet-4",
    max_turns=20,  # raise for multi-step tool use
)

Treat max_turns and max_cost as two independent safety valves: turns guard against logic loops, cost guards against expensive ones.
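A rough way to see which valve would stop a runaway loop is to compare the worst-case spend allowed by max_turns with max_cost. The sketch below is back-of-the-envelope arithmetic, not the library's accounting: it assumes every turn costs about the same, and binding_limit is a hypothetical helper.

```python
def binding_limit(cost_per_turn: float, max_turns: int, max_cost: float) -> str:
    """Return which limit a loop of equally priced turns would reach first.

    Assumption: each turn costs roughly `cost_per_turn` USD. Real runs vary,
    so treat the answer as an estimate.
    """
    worst_case_spend = cost_per_turn * max_turns
    return "max_cost" if worst_case_spend > max_cost else "max_turns"
```

With max_turns=20 and max_cost=0.50, a loop of cheap turns at about $0.001 each runs out of turns long before it runs out of budget, while a loop of $0.10 turns hits the cost ceiling first. Setting both limits covers both cases.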

Choose a retry budget

retries (default 1) bounds how many times the loop will re-run after a Retry signal raised from a tool or an output validator. Retry is a control-flow hint, not a failure — it feeds a message back to the model so it can correct itself. Higher values give the model more chances to satisfy a validator; each retry costs another model call.

from pydantic import BaseModel
from agentroute import Agent, Retry
 
class Ticket(BaseModel):
    summary: str
    priority: str
 
agent = Agent(
    name="triage",
    model="claude-sonnet-4",
    output=Ticket,
    retries=2,  # allow two corrective passes
)
 
@agent.output_validator
def check_priority(ctx, output: Ticket) -> Ticket:
    if output.priority not in {"low", "medium", "high"}:
        raise Retry("priority must be one of: low, medium, high")
    return output

Keep retries small — validators should converge in one or two passes. See Errors and retries and Structured output.

Persist memory with MemorySQLite

In-RAM Memory is fine for a script. For anything that restarts — a worker, a request handler, a long-lived service — use MemorySQLite(path), which is durable and backed by FTS5 full-text search. Attach it once via Agent(memory=...) and tools reach it through ctx.memory.

from agentroute import Agent, MemorySQLite
 
agent = Agent(
    name="assistant",
    model="claude-sonnet-4",
    memory=MemorySQLite("/var/lib/agentroute/memory.db"),
)

Point the path at durable storage (a mounted volume, not a container's ephemeral filesystem) and back it up like any other database. See Memory and the Memory reference.
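Because the store depends on FTS5 and on a writable location, a preflight check at deploy time can catch a misconfigured image early. The sketch below uses only the standard library; fts5_available and storage_ready are hypothetical helpers, and whether AgentRoute performs similar checks itself is not stated on this page.

```python
import os
import sqlite3
from pathlib import Path


def fts5_available() -> bool:
    """Return True if this Python's SQLite build can create an FTS5 table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
        return True
    except sqlite3.OperationalError:
        # SQLite was compiled without the FTS5 extension.
        return False
    finally:
        conn.close()


def storage_ready(db_path: str) -> bool:
    """Return True if the directory that will hold the database exists and is writable."""
    parent = Path(db_path).parent
    return parent.is_dir() and os.access(parent, os.W_OK)
```

Running both checks in a container entrypoint or health probe turns a missing volume mount into a clear startup failure instead of an error on the first memory write.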

Pick a history policy for long sessions

Without a history policy, a long conversation grows unbounded and eventually overruns the model's context window. Attach one of the built-in compactors via Agent(history=...):

HistorySlidingWindow

Keeps the most recent window_size messages (default 20). Cheap and predictable.

HistoryTruncate

Drops oldest messages to stay under max_tokens (default 100_000), using a ~4-chars-per-token heuristic.

HistorySummarize

Summarizes older turns and keeps the last keep_recent (default 4). Preserves more context at the cost of summary calls.

from agentroute import Agent, HistorySlidingWindow
 
agent = Agent(
    name="assistant",
    model="claude-sonnet-4",
    history=HistorySlidingWindow(window_size=30),
)
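To make the first two policies concrete, here is a toy model of what they describe, operating on a plain list of strings. It illustrates the descriptions above (keep the last N messages; drop the oldest until a ~4-characters-per-token estimate fits) and is not AgentRoute's implementation, whose message types and edge-case handling may differ. All function names are hypothetical.

```python
def sliding_window(messages, window_size=20):
    """Keep only the most recent `window_size` messages."""
    if window_size <= 0:
        return []
    return list(messages[-window_size:])


def estimate_tokens(text: str) -> int:
    """Estimate tokens with the rough 4-characters-per-token heuristic."""
    return max(1, len(text) // 4)


def truncate_oldest(messages, max_tokens=100_000):
    """Drop messages from the front until the estimated total fits the budget."""
    kept = list(messages)
    total = sum(estimate_tokens(m) for m in kept)
    while kept and total > max_tokens:
        total -= estimate_tokens(kept.pop(0))  # the oldest message goes first
    return kept
```

The trade-off shows up directly: the window bounds message count regardless of size, while truncation bounds estimated size regardless of count, so a few very long tool outputs affect the two policies differently.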

Warning

HistorySummarize requires a concrete Model — passing model=None raises RuntimeError. Resolve one explicitly with resolve_model and hand it to the policy.

from agentroute import Agent, HistorySummarize, resolve_model
 
summarizer = resolve_model("claude-sonnet-4")
agent = Agent(
    name="assistant",
    model="claude-sonnet-4",
    history=HistorySummarize(model=summarizer, keep_recent=6),
)

See History and the History reference.

Handle ErrorAgent subclasses at the call site

The two budget guards above raise exceptions. Catch them where you call run() so a single expensive request degrades gracefully instead of crashing the worker. Both ErrorMaxTurns and ErrorBudget subclass ErrorAgent, so you can catch the base class and branch on the specifics.

from agentroute import Agent, ErrorAgent, ErrorBudget, ErrorMaxTurns
 
agent = Agent(
    name="support",
    model="claude-sonnet-4",
    max_cost=0.50,
    max_turns=20,
)
 
try:
    result = agent.run("Investigate ticket #4821 and propose a fix.")
    print(result.output)
except ErrorBudget as exc:
    print(f"Hit cost ceiling: spent ${exc.spent:.4f} of ${exc.limit:.4f}")
except ErrorMaxTurns as exc:
    print(f"Hit turn limit: turn {exc.turn} of {exc.limit}")
except ErrorAgent as exc:
    print(f"Agent failed: {exc}")

Note

Retry is not an ErrorAgent and should not be caught here — the loop handles it internally and bounds it with retries. Only catch it if you are writing a custom layer that re-raises it deliberately.
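Because both specific errors subclass ErrorAgent, the order of the checks matters: the base class has to come last or it swallows the specific cases. The sketch below shows this with local stand-in classes so it runs without the library. The stand-ins only mirror the hierarchy and attributes described on this page; their constructor signatures are assumptions, and in real code you would import the classes from agentroute instead.

```python
# Stand-ins mirroring the hierarchy described above. Assumed constructors;
# import the real classes from agentroute in production code.
class ErrorAgent(Exception):
    pass


class ErrorBudget(ErrorAgent):
    def __init__(self, spent, limit):
        super().__init__(f"spent {spent} of {limit}")
        self.spent, self.limit = spent, limit


class ErrorMaxTurns(ErrorAgent):
    def __init__(self, turn, limit):
        super().__init__(f"turn {turn} of {limit}")
        self.turn, self.limit = turn, limit


def outcome(exc: Exception) -> str:
    """Map an exception to a short metric label, most specific class first."""
    if isinstance(exc, ErrorBudget):
        return "budget_exceeded"
    if isinstance(exc, ErrorMaxTurns):
        return "turns_exceeded"
    if isinstance(exc, ErrorAgent):  # base class last, or it matches everything above
        return "agent_failed"
    return "unexpected"
```

A label like this is convenient as a metric tag, so dashboards can separate budget stops from turn stops from genuine failures.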

See Errors and retries and the Exceptions reference.

Log result.usage for cost tracking

Every Result carries a usage object with input_tokens, output_tokens, total_cost_usd, and model_calls. Emit it after each run so you can attribute spend per request, per tenant, or per agent in your observability stack.

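import logging

log = logging.getLogger("agent")

result = agent.run("Summarize today's incidents.")

usage = result.usage
# Standard-library logging takes structured fields via extra=, not bare keyword arguments.
log.info(
    "agent_run",
    extra={
        "agent": "support",
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cost_usd": usage.total_cost_usd,
        "model_calls": usage.model_calls,
    },
)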

Aggregating total_cost_usd over time also tells you whether your max_cost ceiling is set sensibly. See Context and usage and the Results reference.
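If you do not yet have a metrics backend, a small in-process ledger is enough to start attributing spend. The sketch below keeps running totals per tenant. UsageRecord is a hypothetical stand-in carrying the four fields listed above so the example runs on its own, and SpendLedger is likewise not part of AgentRoute; in practice you would pass result.usage to record().

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical stand-in with the four usage fields named on this page."""
    input_tokens: int
    output_tokens: int
    total_cost_usd: float
    model_calls: int


class SpendLedger:
    """Accumulate cost and run counts per tenant, in memory."""

    def __init__(self):
        self._cost = defaultdict(float)
        self._runs = defaultdict(int)

    def record(self, tenant: str, usage) -> None:
        self._cost[tenant] += usage.total_cost_usd
        self._runs[tenant] += 1

    def total_cost(self, tenant: str) -> float:
        return self._cost.get(tenant, 0.0)

    def mean_cost(self, tenant: str) -> float:
        runs = self._runs.get(tenant, 0)
        return self._cost.get(tenant, 0.0) / runs if runs else 0.0
```

The totals live in process memory and reset on restart, so treat this as a stopgap until the same numbers flow into your observability stack.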

Pin the model string

"claude-sonnet-4" resolves through vendor inference, which is convenient in development but leaves the exact model to the resolver. In production, pin a fully-qualified slug so a provider-side alias change cannot silently shift behavior or pricing under you. The bare name and the explicit slug both go through the same OpenRouter path.

# Development: convenient, vendor inferred.
agent = Agent(name="support", model="claude-sonnet-4")
 
# Production: explicit and stable.
agent = Agent(name="support", model="anthropic/claude-sonnet-4")

Record the pinned string alongside your deployment so you can reproduce a run later. See Models.
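A startup guard can keep a bare name from slipping into production by accident. The sketch below assumes a fully-qualified slug always has the vendor/model shape shown above, which is an inference from the example rather than a documented rule; is_pinned and require_pinned are hypothetical helpers.

```python
def is_pinned(model: str) -> bool:
    """Treat 'vendor/model' as pinned; a bare name such as 'claude-sonnet-4' is not."""
    vendor, sep, name = model.partition("/")
    return bool(sep) and bool(vendor.strip()) and bool(name.strip())


def require_pinned(model: str) -> str:
    """Return the model string unchanged, or raise if it is not fully qualified."""
    if not is_pinned(model):
        raise ValueError(
            f"model {model!r} is not pinned; use a 'vendor/model' slug in production"
        )
    return model
```

Wrapping the configured value, for example Agent(name="support", model=require_pinned(configured_model)), makes an unpinned string a deploy-time error instead of a silent behavior change.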

A production-ready agent, end to end

The example below puts the steps together, with env-loaded keys, both budget guards, a retry budget, persistent memory, a history policy, a pinned model, error handling, and usage logging:

service.py

import logging
 
from agentroute import (
    Agent,
    ErrorAgent,
    ErrorBudget,
    ErrorMaxTurns,
    HistorySlidingWindow,
    MemorySQLite,
)
 
log = logging.getLogger("agent")
 
agent = Agent(
    name="support",
    model="anthropic/claude-sonnet-4",   # pinned slug
    instructions="You are a concise, accurate support agent.",
    max_cost=0.50,                        # USD ceiling per run
    max_turns=20,                         # loop ceiling
    retries=2,                            # corrective passes
    memory=MemorySQLite("/var/lib/agentroute/memory.db"),
    history=HistorySlidingWindow(window_size=30),
)
# API key comes from AGENTROUTE_API_KEY in the environment.
 
 
def handle(prompt: str) -> str:
    try:
        result = agent.run(prompt)
    except ErrorBudget as exc:
        log.warning("budget_exceeded", extra={"spent": exc.spent, "limit": exc.limit})
        return "That request was too expensive to complete. Please narrow it."
    except ErrorMaxTurns as exc:
        log.warning("turns_exceeded", extra={"turn": exc.turn, "limit": exc.limit})
        return "That request took too many steps. Please simplify it."
    except ErrorAgent as exc:
        log.error("agent_failed", extra={"error": str(exc)})
        return "Something went wrong. Please try again."
 
    u = result.usage
    log.info(
        "agent_run",
        extra={
            "input_tokens": u.input_tokens,
            "output_tokens": u.output_tokens,
            "cost_usd": u.total_cost_usd,
            "model_calls": u.model_calls,
        },
    )
    return str(result.output)
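One caveat with service.py: fields passed through extra= are attached to the log record, but the standard library's default formatter does not print them. The sketch below is a minimal JSON formatter that emits them. It uses only the standard library; JsonFormatter is a hypothetical name, and a production setup might prefer an established structured-logging package.

```python
import json
import logging

# Attribute names every LogRecord carries; anything else arrived via extra=.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}  # added later by formatters


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object, including fields passed via extra=."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=str keeps non-JSON types (Decimal, datetime) from raising.
        return json.dumps(payload, default=str)
```

To use it, create a handler, call handler.setFormatter(JsonFormatter()), and add the handler to the "agent" logger. This sketch omits exception tracebacks, so add record.exc_info handling if you log with exc_info=True.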

Where to go next

Models

Model-string resolution, vendor inference, and key precedence.

Errors and retries

The exception hierarchy and how Retry feeds back into the loop.

Memory

In-RAM versus persistent SQLite memory and FTS recall.

History

Sliding-window, truncation, and summarization policies.

Context and usage

What usage tracks and how cost is accumulated per run.

Platform roadmap

Hosted deployment, streaming, and native multi-agent.