FAQ

Quick answers to common questions about models, keys, sync vs async, memory, cost control, structured output, and what ships today.


Short answers to the questions that come up most often when you start building with AgentRoute. Each one links out to the deeper reference.

Models and providers

Which models are supported, and how do API keys work?

AgentRoute routes through OpenRouter by default, so any model string OpenRouter understands works out of the box — Claude, GPT, Gemini, Llama, DeepSeek, Mistral, and more. You can also point at a local Ollama instance or any OpenAI-compatible endpoint.

resolve_model decides where a model string goes:

  • http:// or https:// prefix → a custom OpenAI-compatible endpoint (the last path segment is the model id).
  • ollama/NAMEhttp://localhost:11434/v1, no auth.
  • Anything else → OpenRouter.

For a bare name with no vendor prefix, the vendor is inferred from the name: claude* maps to anthropic, gpt*/o1/o3/o4 to openai, gemini* to google, llama* to meta-llama, deepseek* to deepseek, and mistral* to mistralai.

from agentroute import Agent
 
# OpenRouter (default) — vendor inferred from "claude"
agent = Agent(name="assistant", model="claude-sonnet-4")
 
# Fully qualified OpenRouter slug
agent = Agent(name="assistant", model="anthropic/claude-sonnet-4")
 
# Local Ollama, no key needed
agent = Agent(name="local", model="ollama/llama3.1")
 
# Any OpenAI-compatible endpoint
agent = Agent(name="custom", model="https://my-host/v1/my-model", api_key="sk-...")

One OpenRouter key works for every routed model. AgentRoute resolves it in this order: an explicit api_key= argument, then config (the AGENTROUTE_API_KEY or OPENROUTER_API_KEY environment variable, then ~/.agentroute/config), then nothing for no-auth endpoints like Ollama.

export AGENTROUTE_API_KEY="sk-or-..."   # or OPENROUTER_API_KEY
Tip

Ollama and custom endpoints skip auth entirely, so you can develop locally without any key. See Models and the resolve_model reference.

Running agents

What's the difference between run and arun?

run is the synchronous entry point — it calls arun under the hood with asyncio.run, so it's the right choice from a script, a notebook cell, or anywhere you're not already inside an event loop. arun is the native coroutine; use it inside async code or when you want to fan out concurrent calls.

result = agent.run("Summarize this PR")          # sync
result = await agent.arun("Summarize this PR")    # async

Both return a Result; str(result) gives you the output text. For concurrency, gather over arun:

import asyncio
 
prompts = ["First", "Second", "Third"]
results = await asyncio.gather(*(agent.arun(p) for p in prompts))
Warning

Don't call the synchronous run from inside a running event loop — asyncio.run raises if a loop is already active. Use await agent.arun(...) there instead.

Memory and history

Does memory persist between runs?

It depends which backend you attach. Memory is in-RAM and lives only as long as the process. MemorySQLite(path) writes to a SQLite file (with FTS5 search), so it survives restarts.

from agentroute import Agent, Memory, MemorySQLite
 
ephemeral = Agent(name="chat", model="claude-sonnet-4", memory=Memory())
persistent = Agent(name="chat", model="claude-sonnet-4", memory=MemorySQLite("agent.db"))

Both implement the same protocol (get_messages, add_messages, remember, recall, forget), and tools reach them through ctx.memory. See Memory and the reference.

How is that different from history?

Memory is long-term recall across sessions. History is how the working conversation is compacted to fit a context window on each turn. Attach a strategy with history=:

  • HistorySlidingWindow(window_size=20) keeps the last N messages.
  • HistoryTruncate(max_tokens=100_000) drops oldest messages using a ~4-chars/token heuristic.
  • HistorySummarize(model=..., keep_recent=4) summarizes older turns — this one needs a concrete Model, or it raises RuntimeError.

See History and the reference.

Cost and limits

How do I control how much an agent spends?

Three knobs, all on the Agent constructor or run/arun:

  • max_turns (default 10) caps tool-calling loops; hitting it raises ErrorMaxTurns.
  • max_cost (USD) caps spend; exceeding it raises ErrorBudget.
  • Every Result carries a Usage object so you can read what was actually spent.
from agentroute import Agent, ErrorBudget
 
agent = Agent(
    name="bounded",
    model="claude-sonnet-4",
    max_turns=6,
    max_cost=0.25,
)
 
try:
    result = agent.run("Do the task")
    print(result.usage.total_cost_usd, result.usage.model_calls)
except ErrorBudget as e:
    print(f"Stopped at ${e.spent:.4f} (limit ${e.limit})")

See Context and usage and Errors and retries.

Structured output

How do I get JSON back instead of free text?

Pass a Pydantic model as output=. The result's output attribute is then a validated instance of that model rather than a string.

from pydantic import BaseModel
from agentroute import Agent
 
class Invoice(BaseModel):
    number: str
    total: float
    currency: str
 
agent = Agent(name="extractor", model="claude-sonnet-4", output=Invoice)
result = agent.run("Invoice INV-42: total 199.00 EUR")
 
invoice = result.output          # an Invoice instance
print(invoice.total, invoice.currency)

To enforce extra constraints, add an @agent.output_validator and raise Retry("hint") to make the model try again:

from agentroute import Retry
 
@agent.output_validator
def positive_total(ctx, output: Invoice) -> Invoice:
    if output.total <= 0:
        raise Retry("total must be greater than zero")
    return output

Retries are bounded by Agent(retries=...). See Structured output and Results.

Platform and SDKs

What's shipped today versus on the roadmap?

The Python SDK — agents, tools, memory, history, structured output, cost/turn limits, and OpenRouter/Ollama/custom-endpoint routing — is real and runnable today at version 1.1.0. Streaming events and hosted deployment are still being finalized: the Event and ResultDeploy types exist as placeholders, and the CLI deploy command predates the current SDK.

Warning

Hosted deployment (and the CLI deploy flow) is still being finalized. For what's live now versus what's coming, see the platform roadmap.

Is there a TypeScript SDK?

Python is the primary and only first-class SDK today. A TypeScript SDK is on the roadmap but not yet available — track it on the platform roadmap.

What about multi-agent teams and agent-to-agent calls?

There's no native Team or A2A primitive yet. You can build multi-agent workflows today with plain Python: construct several specialized Agent instances and orchestrate them by calling run/arun (use asyncio.gather for concurrency). Native orchestration and agent-to-agent calls are on the platform roadmap.

Where to next