FAQ
Quick answers to common questions about models, keys, sync vs async, memory, cost control, structured output, and what ships today.
Short answers to the questions that come up most often when you start building with AgentRoute. Each one links out to the deeper reference.
Models and providers
Which models are supported, and how do API keys work?
AgentRoute routes through OpenRouter by default, so any model string OpenRouter understands works out of the box — Claude, GPT, Gemini, Llama, DeepSeek, Mistral, and more. You can also point at a local Ollama instance or any OpenAI-compatible endpoint.
resolve_model decides where a model string goes:
http://orhttps://prefix → a custom OpenAI-compatible endpoint (the last path segment is the model id).ollama/NAME→http://localhost:11434/v1, no auth.- Anything else → OpenRouter.
For a bare name with no vendor prefix, the vendor is inferred from the name: claude* maps to anthropic, gpt*/o1/o3/o4 to openai, gemini* to google, llama* to meta-llama, deepseek* to deepseek, and mistral* to mistralai.
from agentroute import Agent
# OpenRouter (default) — vendor inferred from "claude"
agent = Agent(name="assistant", model="claude-sonnet-4")
# Fully qualified OpenRouter slug
agent = Agent(name="assistant", model="anthropic/claude-sonnet-4")
# Local Ollama, no key needed
agent = Agent(name="local", model="ollama/llama3.1")
# Any OpenAI-compatible endpoint
agent = Agent(name="custom", model="https://my-host/v1/my-model", api_key="sk-...")One OpenRouter key works for every routed model. AgentRoute resolves it in this order: an explicit api_key= argument, then config (the AGENTROUTE_API_KEY or OPENROUTER_API_KEY environment variable, then ~/.agentroute/config), then nothing for no-auth endpoints like Ollama.
export AGENTROUTE_API_KEY="sk-or-..." # or OPENROUTER_API_KEYOllama and custom endpoints skip auth entirely, so you can develop locally without any key. See Models and the resolve_model reference.
Running agents
What's the difference between run and arun?
run is the synchronous entry point — it calls arun under the hood with asyncio.run, so it's the right choice from a script, a notebook cell, or anywhere you're not already inside an event loop. arun is the native coroutine; use it inside async code or when you want to fan out concurrent calls.
result = agent.run("Summarize this PR") # sync
result = await agent.arun("Summarize this PR") # asyncBoth return a Result; str(result) gives you the output text. For concurrency, gather over arun:
import asyncio
prompts = ["First", "Second", "Third"]
results = await asyncio.gather(*(agent.arun(p) for p in prompts))Don't call the synchronous run from inside a running event loop — asyncio.run raises if a loop is already active. Use await agent.arun(...) there instead.
Memory and history
Does memory persist between runs?
It depends which backend you attach. Memory is in-RAM and lives only as long as the process. MemorySQLite(path) writes to a SQLite file (with FTS5 search), so it survives restarts.
from agentroute import Agent, Memory, MemorySQLite
ephemeral = Agent(name="chat", model="claude-sonnet-4", memory=Memory())
persistent = Agent(name="chat", model="claude-sonnet-4", memory=MemorySQLite("agent.db"))Both implement the same protocol (get_messages, add_messages, remember, recall, forget), and tools reach them through ctx.memory. See Memory and the reference.
How is that different from history?
Memory is long-term recall across sessions. History is how the working conversation is compacted to fit a context window on each turn. Attach a strategy with history=:
HistorySlidingWindow(window_size=20)keeps the last N messages.HistoryTruncate(max_tokens=100_000)drops oldest messages using a ~4-chars/token heuristic.HistorySummarize(model=..., keep_recent=4)summarizes older turns — this one needs a concreteModel, or it raisesRuntimeError.
See History and the reference.
Cost and limits
How do I control how much an agent spends?
Three knobs, all on the Agent constructor or run/arun:
max_turns(default10) caps tool-calling loops; hitting it raisesErrorMaxTurns.max_cost(USD) caps spend; exceeding it raisesErrorBudget.- Every
Resultcarries aUsageobject so you can read what was actually spent.
from agentroute import Agent, ErrorBudget
agent = Agent(
name="bounded",
model="claude-sonnet-4",
max_turns=6,
max_cost=0.25,
)
try:
result = agent.run("Do the task")
print(result.usage.total_cost_usd, result.usage.model_calls)
except ErrorBudget as e:
print(f"Stopped at ${e.spent:.4f} (limit ${e.limit})")See Context and usage and Errors and retries.
Structured output
How do I get JSON back instead of free text?
Pass a Pydantic model as output=. The result's output attribute is then a validated instance of that model rather than a string.
from pydantic import BaseModel
from agentroute import Agent
class Invoice(BaseModel):
number: str
total: float
currency: str
agent = Agent(name="extractor", model="claude-sonnet-4", output=Invoice)
result = agent.run("Invoice INV-42: total 199.00 EUR")
invoice = result.output # an Invoice instance
print(invoice.total, invoice.currency)To enforce extra constraints, add an @agent.output_validator and raise Retry("hint") to make the model try again:
from agentroute import Retry
@agent.output_validator
def positive_total(ctx, output: Invoice) -> Invoice:
if output.total <= 0:
raise Retry("total must be greater than zero")
return outputRetries are bounded by Agent(retries=...). See Structured output and Results.
Platform and SDKs
What's shipped today versus on the roadmap?
The Python SDK — agents, tools, memory, history, structured output, cost/turn limits, and OpenRouter/Ollama/custom-endpoint routing — is real and runnable today at version 1.1.0. Streaming events and hosted deployment are still being finalized: the Event and ResultDeploy types exist as placeholders, and the CLI deploy command predates the current SDK.
Hosted deployment (and the CLI deploy flow) is still being finalized. For what's live now versus what's coming, see the platform roadmap.
Is there a TypeScript SDK?
Python is the primary and only first-class SDK today. A TypeScript SDK is on the roadmap but not yet available — track it on the platform roadmap.
What about multi-agent teams and agent-to-agent calls?
There's no native Team or A2A primitive yet. You can build multi-agent workflows today with plain Python: construct several specialized Agent instances and orchestrate them by calling run/arun (use asyncio.gather for concurrency). Native orchestration and agent-to-agent calls are on the platform roadmap.
Where to next
How routing, vendors, and keys resolve.
In-RAM versus persistent SQLite recall.
Typed results with Pydantic and validators.
Budgets, turn limits, and the Retry signal.
login, deploy, logs, and list.
What's shipped and what's coming.