Production checklist
A pragmatic pre-flight list for taking an AgentRoute agent from prototype to production — keys, budgets, retries, persistence, history, error handling, and cost tracking.
Prototyping an agent is a few lines. Shipping one means thinking about credentials, cost ceilings, failure modes, and what happens when a conversation runs for hours. This page is the short list we run through before an agent serves real traffic.
Work top to bottom. Each step is independent, so adopt the ones that fit your deployment and skip the rest.
This guide covers running your agent in your own process — the part of AgentRoute that is stable today. Hosted, managed deployment is on the platform roadmap.
The checklist
A bare model string resolves through OpenRouter, so one key works for every model. Resolution order is: an explicit api_key= argument, then your environment / config file, then nothing. Keep the key out of source by setting AGENTROUTE_API_KEY (or OPENROUTER_API_KEY) and letting resolve_model find it.
export AGENTROUTE_API_KEY="sk-or-..."

from agentroute import Agent

# No api_key= here; it is read from the environment at run time.
agent = Agent(name="support", model="claude-sonnet-4")

You can also persist the key once with the CLI, which writes ~/.agentroute/config:

agentroute login

Do not hardcode keys or commit .env files. If you pass api_key= explicitly (for example, a per-tenant key pulled from a secrets manager), make sure it never lands in logs.
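One way to keep keys out of logs is a standard-library logging filter that masks anything key-shaped before a record is emitted. This is a minimal sketch, not part of AgentRoute: the RedactKeys class is hypothetical, and the pattern assumes keys look like the "sk-or-..." example above.

```python
import logging
import re

# Assumption: keys follow the "sk-or-..." shape shown in the example above.
KEY_PATTERN = re.compile(r"sk-or-[A-Za-z0-9_-]+")


class RedactKeys(logging.Filter):
    """Hypothetical filter that masks key-shaped strings in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its args first, then mask and freeze it
        # so later formatting cannot reintroduce the raw key.
        message = record.getMessage()
        record.msg = KEY_PATTERN.sub("sk-or-***", message)
        record.args = None
        return True  # keep the record, now redacted
```

Attach it to the handlers that write your logs, for example handler.addFilter(RedactKeys()), so records from every logger pass through it.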
See Models for the full resolution rules and per-vendor inference.
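The precedence described in this step can be pictured as a small lookup function. The sketch below is illustrative only and is not AgentRoute's resolver: pick_api_key is a hypothetical helper, and it assumes that the environment is checked before the config file, that AGENTROUTE_API_KEY wins over OPENROUTER_API_KEY, and that the config stores the key under "api_key".

```python
import os
from typing import Mapping, Optional

# Assumption: AGENTROUTE_API_KEY is checked before OPENROUTER_API_KEY.
ENV_VARS = ("AGENTROUTE_API_KEY", "OPENROUTER_API_KEY")


def pick_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first key found: explicit argument, environment, config, else None."""
    if explicit:
        return explicit
    env = os.environ if env is None else env
    for name in ENV_VARS:
        if env.get(name):
            return env[name]
    # Assumption: the config file stores the key under "api_key".
    if config and config.get("api_key"):
        return config["api_key"]
    return None
```

Passing env and config as plain mappings keeps the function easy to test without touching the real environment.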
max_cost is a hard ceiling in USD per run(). Once accumulated cost crosses it, the loop stops and raises ErrorBudget, which carries spent and limit. This protects you from a runaway tool loop quietly burning credits.
agent = Agent(
    name="support",
    model="claude-sonnet-4",
    max_cost=0.50,  # stop this run if it would exceed $0.50
)

Pick a ceiling a few times the cost of a normal run: high enough that legitimate traffic is not cut off, low enough to catch a pathological loop. See Context and usage for how cost is accumulated.
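If you already log per-run cost, you can derive a starting ceiling from it. The helper below is a hypothetical sketch, not an AgentRoute function: it takes the median of observed costs and scales it by a multiple you choose (3x here is an arbitrary starting point, not a recommendation from the library).

```python
import statistics
from typing import Sequence


def suggest_max_cost(observed_costs_usd: Sequence[float], multiple: float = 3.0) -> float:
    """Suggest a max_cost as a multiple of the median observed run cost."""
    if not observed_costs_usd:
        raise ValueError("need at least one observed run cost")
    typical = statistics.median(observed_costs_usd)
    # Round to whole cents so the value is easy to read in config.
    return round(typical * multiple, 2)


ceiling = suggest_max_cost([0.10, 0.12, 0.15, 0.11, 0.14])
print(ceiling)
```

The median is used rather than the mean so that one unusually expensive run does not inflate the ceiling.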
max_turns (default 10) limits how many model-plus-tool cycles a single run may take. Exceeding it raises ErrorMaxTurns with turn and limit. Tool-heavy agents that chain several calls may need more headroom; a simple Q&A agent can run lower.
agent = Agent(
    name="researcher",
    model="claude-sonnet-4",
    max_turns=20,  # raise for multi-step tool use
)

Treat max_turns and max_cost as two independent safety valves: turns guard against logic loops, cost guards against expensive ones.
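To see why both valves are worth setting, here is a toy loop that applies a turn limit and a cost limit independently. It is a simulation under stated assumptions, not AgentRoute's loop: run_loop, TurnsExceeded, and BudgetExceeded are stand-ins that only mirror the turn/limit and spent/limit fields described above.

```python
from typing import Iterable


class TurnsExceeded(Exception):
    """Stand-in for a turn-limit error carrying turn and limit."""

    def __init__(self, turn: int, limit: int):
        super().__init__(f"turn {turn} exceeds limit {limit}")
        self.turn, self.limit = turn, limit


class BudgetExceeded(Exception):
    """Stand-in for a cost-ceiling error carrying spent and limit."""

    def __init__(self, spent: float, limit: float):
        super().__init__(f"spent {spent} exceeds limit {limit}")
        self.spent, self.limit = spent, limit


def run_loop(step_costs: Iterable[float], max_turns: int = 10, max_cost: float = 0.50) -> float:
    """Consume one cost per turn; stop on whichever guard trips first."""
    spent = 0.0
    for turn, cost in enumerate(step_costs, start=1):
        if turn > max_turns:
            raise TurnsExceeded(turn, max_turns)
        spent += cost
        if spent > max_cost:
            raise BudgetExceeded(spent, max_cost)
    return spent
```

Fifty cheap steps trip the turn guard long before the budget matters, while two expensive steps trip the budget guard on turn two; neither limit alone catches both cases.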
retries (default 1) bounds how many times the loop will re-run after a Retry signal raised from a tool or an output validator. Retry is a control-flow hint, not a failure — it feeds a message back to the model so it can correct itself. Higher values give the model more chances to satisfy a validator; each retry costs another model call.
from pydantic import BaseModel

from agentroute import Agent, Retry


class Ticket(BaseModel):
    summary: str
    priority: str


agent = Agent(
    name="triage",
    model="claude-sonnet-4",
    output=Ticket,
    retries=2,  # allow two corrective passes
)


@agent.output_validator
def check_priority(ctx, output: Ticket) -> Ticket:
    if output.priority not in {"low", "medium", "high"}:
        raise Retry("priority must be one of: low, medium, high")
    return output

Keep retries small; validators should converge in one or two passes. See Errors and retries and Structured output.
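The feedback mechanism can be pictured as a bounded loop in which the validator's message becomes input to the next attempt. The sketch below is a simulation, not the library's implementation: RetrySignal, RetriesExhausted, and run_with_retries are hypothetical names, and what AgentRoute actually does once retries are exhausted is not specified here.

```python
from typing import Callable, Optional, Tuple


class RetrySignal(Exception):
    """Stand-in for a retry hint raised by a validator."""


class RetriesExhausted(Exception):
    """Hypothetical error for when every corrective pass has been used."""


def run_with_retries(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], str],
    retries: int = 1,
) -> Tuple[str, int]:
    """Return (validated output, attempts used); allow `retries` corrective passes."""
    feedback: Optional[str] = None
    for attempt in range(retries + 1):  # first try plus corrective passes
        candidate = generate(feedback)
        try:
            return validate(candidate), attempt + 1
        except RetrySignal as signal:
            feedback = str(signal)  # fed back so the next attempt can correct itself
    raise RetriesExhausted(feedback)
```

With retries=2 the generator is called at most three times, which is why each extra retry is paid for with another model call.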
In-RAM Memory is fine for a script. For anything that restarts — a worker, a request handler, a long-lived service — use MemorySQLite(path), which is durable and backed by FTS5 full-text search. Attach it once via Agent(memory=...) and tools reach it through ctx.memory.
from agentroute import Agent, MemorySQLite

agent = Agent(
    name="assistant",
    model="claude-sonnet-4",
    memory=MemorySQLite("/var/lib/agentroute/memory.db"),
)

Point the path at durable storage (a mounted volume, not a container's ephemeral filesystem) and back it up like any other database. See Memory and the Memory reference.
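For a sense of what FTS5-backed recall involves, here is a small standalone store built on Python's sqlite3 module. It is a sketch only and says nothing about MemorySQLite's actual schema: the notes table and the open_store, remember, and recall helpers are hypothetical, and it assumes your SQLite build includes the FTS5 extension.

```python
import sqlite3
from typing import List


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a database with a single FTS5 table of notes."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(content)")
    return conn


def remember(conn: sqlite3.Connection, text: str) -> None:
    """Persist one note; commit so it survives a restart."""
    conn.execute("INSERT INTO notes(content) VALUES (?)", (text,))
    conn.commit()


def recall(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[str]:
    """Return the best full-text matches for the query, most relevant first."""
    rows = conn.execute(
        "SELECT content FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

Using a file path instead of ":memory:" is what makes the data outlive the process, which is the property this step is about.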
Without a history policy, a long conversation grows unbounded and eventually overruns the model's context window. Attach one of the built-in compactors via Agent(history=...):
HistorySlidingWindow: keeps the most recent window_size messages (default 20). Cheap and predictable.
Token truncation: drops the oldest messages to stay under max_tokens (default 100_000), using a ~4-chars-per-token heuristic.
HistorySummarize: summarizes older turns and keeps the last keep_recent messages (default 4). Preserves more context at the cost of summary calls.
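The first two policies are simple enough to sketch in plain Python. The functions below are illustrative stand-ins, not the library's classes: they treat each message as a plain string (an assumption) and reuse the defaults and the roughly 4-characters-per-token estimate quoted above.

```python
from typing import List, Sequence


def sliding_window(messages: Sequence[str], window_size: int = 20) -> List[str]:
    """Keep only the most recent window_size messages."""
    return list(messages[-window_size:]) if window_size > 0 else []


def estimate_tokens(text: str) -> int:
    """Rough token count: about four characters per token, at least one."""
    return max(1, len(text) // 4)


def truncate_to_tokens(messages: Sequence[str], max_tokens: int = 100_000) -> List[str]:
    """Drop the oldest messages until the estimated total fits max_tokens."""
    kept: List[str] = []
    total = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

The window bounds message count while truncation bounds estimated size, so a few very long messages can still overflow a window-only policy.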
from agentroute import Agent, HistorySlidingWindow

agent = Agent(
    name="assistant",
    model="claude-sonnet-4",
    history=HistorySlidingWindow(window_size=30),
)

HistorySummarize requires a concrete Model; passing model=None raises RuntimeError. Resolve one explicitly with resolve_model and hand it to the policy.
from agentroute import Agent, HistorySummarize, resolve_model

summarizer = resolve_model("claude-sonnet-4")

agent = Agent(
    name="assistant",
    model="claude-sonnet-4",
    history=HistorySummarize(model=summarizer, keep_recent=6),
)

See History and the History reference.
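Conceptually, summarization replaces everything except the newest messages with one summary entry. The sketch below shows that shape with a pluggable summarizer; compact is a hypothetical function that works on plain strings, and in a real policy the summarize callable would be a model call, which is where the extra cost comes from.

```python
from typing import Callable, List, Sequence


def compact(
    messages: Sequence[str],
    summarize: Callable[[Sequence[str]], str],
    keep_recent: int = 4,
) -> List[str]:
    """Replace all but the last keep_recent messages with a single summary."""
    if len(messages) <= keep_recent:
        return list(messages)  # nothing old enough to summarize
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + list(recent)
```

Because the summarizer is only invoked when there is something to fold away, short conversations incur no summary calls at all.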
The two budget guards above raise exceptions. Catch them where you call run() so a single expensive request degrades gracefully instead of crashing the worker. Both ErrorMaxTurns and ErrorBudget subclass ErrorAgent, so you can catch the base class and branch on the specifics.
from agentroute import Agent, ErrorAgent, ErrorBudget, ErrorMaxTurns

agent = Agent(
    name="support",
    model="claude-sonnet-4",
    max_cost=0.50,
    max_turns=20,
)

try:
    result = agent.run("Investigate ticket #4821 and propose a fix.")
    print(result.output)
except ErrorBudget as exc:
    print(f"Hit cost ceiling: spent ${exc.spent:.4f} of ${exc.limit:.4f}")
except ErrorMaxTurns as exc:
    print(f"Hit turn limit: turn {exc.turn} of {exc.limit}")
except ErrorAgent as exc:
    print(f"Agent failed: {exc}")

Retry is not an ErrorAgent and should not be caught here; the loop handles it internally and bounds it with retries. Only catch it if you are writing a custom layer that re-raises it deliberately.
See Errors and retries and the Exceptions reference.
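One detail worth keeping in mind when branching on specifics: Python tries except clauses in order, so handlers for subclasses must come before the handler for their base class. The sketch below demonstrates this with stand-in classes (AgentError, BudgetError, and TurnsError are hypothetical and only mimic the hierarchy described above).

```python
class AgentError(Exception):
    """Stand-in base class."""


class BudgetError(AgentError):
    """Stand-in for the cost-ceiling error."""


class TurnsError(AgentError):
    """Stand-in for the turn-limit error."""


def classify(exc: Exception) -> str:
    """Map an exception to a label, most specific handler first."""
    try:
        raise exc
    except BudgetError:
        return "budget"
    except TurnsError:
        return "turns"
    except AgentError:  # base class last, or it would shadow the two above
        return "agent"
```

If the base-class handler were listed first, both specific branches would become unreachable.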
Every Result carries a usage object with input_tokens, output_tokens, total_cost_usd, and model_calls. Emit it after each run so you can attribute spend per request, per tenant, or per agent in your observability stack.
import logging

log = logging.getLogger("agent")

result = agent.run("Summarize today's incidents.")
usage = result.usage

# The standard library logger takes structured fields through extra=.
log.info(
    "agent_run",
    extra={
        "agent": "support",
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cost_usd": usage.total_cost_usd,
        "model_calls": usage.model_calls,
    },
)

Aggregating total_cost_usd over time also tells you whether your max_cost ceiling is set sensibly. See Context and usage and the Results reference.
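Attribution per tenant is then a matter of summing the logged values. The sketch below assumes you record one row per run; RunUsage is a hypothetical record whose fields mirror the usage object, plus a tenant label that you would supply yourself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RunUsage:
    """Hypothetical per-run record mirroring the usage fields, plus a tenant."""

    tenant: str
    input_tokens: int
    output_tokens: int
    total_cost_usd: float
    model_calls: int


def cost_by_tenant(runs: Iterable[RunUsage]) -> Dict[str, float]:
    """Sum total_cost_usd per tenant."""
    totals: Dict[str, float] = defaultdict(float)
    for run in runs:
        totals[run.tenant] += run.total_cost_usd
    return dict(totals)
```

The same grouping works per agent or per request by swapping the key used for the totals.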
"claude-sonnet-4" resolves through vendor inference, which is convenient in development but leaves the exact model to the resolver. In production, pin a fully-qualified slug so a provider-side alias change cannot silently shift behavior or pricing under you. The bare name and the explicit slug both go through the same OpenRouter path.
# Development: convenient, vendor inferred.
agent = Agent(name="support", model="claude-sonnet-4")

# Production: explicit and stable.
agent = Agent(name="support", model="anthropic/claude-sonnet-4")

Record the pinned string alongside your deployment so you can reproduce a run later. See Models.
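A small guard can enforce pinning at deploy time and produce the record to store with the release. This is a sketch under assumptions: is_pinned and deployment_manifest are hypothetical helpers, and "pinned" is taken to mean a vendor/name slug such as the one shown above.

```python
import json


def is_pinned(model: str) -> bool:
    """Treat a model string as pinned when it has the form vendor/name."""
    vendor, sep, name = model.partition("/")
    return bool(vendor and sep and name)


def deployment_manifest(model: str, max_cost: float, max_turns: int) -> str:
    """Return a JSON record of the settings needed to reproduce a run."""
    if not is_pinned(model):
        raise ValueError(f"model {model!r} is not a fully-qualified slug")
    record = {"model": model, "max_cost": max_cost, "max_turns": max_turns}
    return json.dumps(record, sort_keys=True)
```

Writing the manifest next to your build artifacts gives you the exact model string and limits a given release ran with.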
A production-ready agent, end to end
Putting the steps together — env-loaded keys, both budget guards, persistent memory, a history policy, a pinned model, error handling, and usage logging:
import logging

from agentroute import (
    Agent,
    ErrorAgent,
    ErrorBudget,
    ErrorMaxTurns,
    HistorySlidingWindow,
    MemorySQLite,
)

log = logging.getLogger("agent")

agent = Agent(
    name="support",
    model="anthropic/claude-sonnet-4",  # pinned slug
    instructions="You are a concise, accurate support agent.",
    max_cost=0.50,  # USD ceiling per run
    max_turns=20,  # loop ceiling
    retries=2,  # corrective passes
    memory=MemorySQLite("/var/lib/agentroute/memory.db"),
    history=HistorySlidingWindow(window_size=30),
)
# API key comes from AGENTROUTE_API_KEY in the environment.


def handle(prompt: str) -> str:
    try:
        result = agent.run(prompt)
    except ErrorBudget as exc:
        log.warning("budget_exceeded", extra={"spent": exc.spent, "limit": exc.limit})
        return "That request was too expensive to complete. Please narrow it."
    except ErrorMaxTurns as exc:
        log.warning("turns_exceeded", extra={"turn": exc.turn, "limit": exc.limit})
        return "That request took too many steps. Please simplify it."
    except ErrorAgent as exc:
        log.error("agent_failed", extra={"error": str(exc)})
        return "Something went wrong. Please try again."

    u = result.usage
    log.info(
        "agent_run",
        extra={
            "input_tokens": u.input_tokens,
            "output_tokens": u.output_tokens,
            "cost_usd": u.total_cost_usd,
            "model_calls": u.model_calls,
        },
    )
    return str(result)

Where to go next
Models: model-string resolution, vendor inference, and key precedence.
Errors and retries: the exception hierarchy and how Retry feeds back into the loop.
Memory: in-RAM versus persistent SQLite memory and FTS recall.
History: sliding-window, truncation, and summarization policies.
Context and usage: what usage tracks and how cost is accumulated per run.
Roadmap: hosted deployment, streaming, and native multi-agent.