Using local and custom models
Run AgentRoute agents against Ollama on localhost or any OpenAI-compatible endpoint, and understand how model strings resolve to a provider.
AgentRoute talks to models through a single model string. That string decides where requests go: a local Ollama server, your own OpenAI-compatible gateway, or OpenRouter (the default). You never wire up a client by hand — resolve_model reads the string and builds the right connection.
This guide covers the two non-default cases: running locally with Ollama, and pointing at a custom endpoint with your own credentials. For the full provider behavior, see Models and the resolve_model reference.
How a model string resolves
When you pass model="..." to an Agent, AgentRoute applies three rules, in order:
- Starts with http:// or https:// → a custom OpenAI-compatible endpoint.
- Starts with ollama/ → a local Ollama server at http://localhost:11434/v1, no auth.
- Anything else → OpenRouter, the default for every bare name and vendor/model slug.
For bare names that hit OpenRouter, AgentRoute infers the vendor prefix from the name: claude* becomes anthropic/, gpt*/o1*/o3*/o4* become openai/, gemini* becomes google/, llama* becomes meta-llama/, deepseek* becomes deepseek/, and mistral* becomes mistralai/. So model="claude-sonnet-4" resolves to anthropic/claude-sonnet-4 on OpenRouter.
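The rules and the vendor inference can be approximated in a few lines of plain Python. This is an illustrative sketch, not AgentRoute's implementation: sketch_provider and VENDOR_PREFIXES are hypothetical names, the prefix match is assumed to be case-sensitive, and a bare name that matches no prefix is assumed to pass through unchanged.

```python
from typing import Tuple

# Prefix-to-vendor pairs copied from the guide's description.
VENDOR_PREFIXES = (
    ("claude", "anthropic"),
    ("gpt", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("gemini", "google"),
    ("llama", "meta-llama"),
    ("deepseek", "deepseek"),
    ("mistral", "mistralai"),
)


def sketch_provider(model: str) -> Tuple[str, str]:
    """Return (provider, resolved name) for a model string (hypothetical helper)."""
    # Rule 1: a raw URL means a custom OpenAI-compatible endpoint.
    if model.startswith(("http://", "https://")):
        return ("custom", model)
    # Rule 2: the ollama/ prefix means a local Ollama server; the rest is the tag.
    if model.startswith("ollama/"):
        return ("ollama", model[len("ollama/"):])
    # Rule 3: everything else goes to OpenRouter. Bare names get a vendor prefix.
    if "/" not in model:
        for prefix, vendor in VENDOR_PREFIXES:
            if model.startswith(prefix):
                return ("openrouter", f"{vendor}/{model}")
    # Assumption: vendor/model slugs and unrecognized bare names pass through as-is.
    return ("openrouter", model)


print(sketch_provider("claude-sonnet-4"))   # ('openrouter', 'anthropic/claude-sonnet-4')
print(sketch_provider("ollama/llama3:8b"))  # ('ollama', 'llama3:8b')
```

The order of the checks matters: ollama/ is tested before the vendor table, so an Ollama tag such as llama3 is never rewritten to a meta-llama/ slug.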
For the OpenRouter default, a single key — AGENTROUTE_API_KEY or OPENROUTER_API_KEY — works for every model string. The two cases below are exactly when you step off that path.
Run against a local Ollama model
Prefix the model name with ollama/. AgentRoute points at http://localhost:11434/v1 and sends no real API key, so there is nothing to configure beyond having Ollama running.
Follow the Ollama install instructions, then pull a model.
ollama pull llama3
Ollama serves an OpenAI-compatible API on port 11434 while it is running.
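Before running an agent, it can help to confirm that the local server is actually answering. The check below is a minimal sketch using only the standard library; ollama_is_reachable is a hypothetical helper, and it assumes the server exposes the OpenAI-style model listing route at /models under the base URL.

```python
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434/v1",
                        timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, DNS failures, timeouts, and HTTP error
        # statuses (urllib's URLError and HTTPError both derive from OSError).
        return False


if __name__ == "__main__":
    if ollama_is_reachable():
        print("Ollama is answering on port 11434.")
    else:
        print("No response from Ollama; is the server running?")
```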
The ollama/ prefix is the only change. No api_key, no base_url.
from agentroute import Agent

agent = Agent(
    name="local-helper",
    model="ollama/llama3",
    instructions="You are a concise coding assistant.",
)

result = agent.run("Explain what a Python generator is in two sentences.")
print(result)

python local_agent.py

Everything else (tools, structured output, memory) works exactly as it does against a hosted model, as long as the local model supports tool calls.
The part after the slash is the Ollama model tag, so ollama/llama3:8b and ollama/qwen2.5-coder both work. If your Ollama runs on another host or port, pass an explicit base_url (see the next section), or use the raw-URL form directly.
The agent loop relies on the model emitting OpenAI-style tool calls. Smaller local models may not support tool calling well; if your agent ignores its tools, try a tool-capable model or drop the tools and rely on plain prompting.
Point at a custom OpenAI-compatible endpoint
Any endpoint that speaks the OpenAI chat-completions API — a self-hosted gateway, a private deployment, vLLM, LiteLLM, a corporate proxy — works through the raw-URL form. Pass the full URL as the model string and supply your own api_key.
from agentroute import Agent

agent = Agent(
    name="gateway-agent",
    model="https://llm.internal.example.com/v1/my-model",
    base_url="https://llm.internal.example.com/v1",
    api_key="sk-internal-...",
    instructions="Answer using only the provided context.",
)

result = agent.run("Summarize today's incident report.")
print(result)

How the URL is split
For a http(s):// model string, AgentRoute splits on the last /: everything before it is the base URL and the final segment is the model id. So https://llm.internal.example.com/v1/my-model yields base URL https://llm.internal.example.com/v1 and model id my-model.
Passing base_url explicitly (as above) overrides that split and is the clearest, least error-prone form — the model string then only needs to end in the correct model id. Use it whenever your model id itself contains slashes or your path does not end in the model name.
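The default split is easy to reproduce, which also makes the failure mode visible. The helper below is a hypothetical sketch of the "split on the last /" rule only; it does not model what AgentRoute does when base_url is passed explicitly.

```python
from typing import Tuple


def sketch_split(model: str) -> Tuple[str, str]:
    """Split a raw-URL model string on its last '/' into (base URL, model id)."""
    base_url, _, model_id = model.rpartition("/")
    return (base_url, model_id)


# The simple case splits where you would expect.
print(sketch_split("https://llm.internal.example.com/v1/my-model"))
# ('https://llm.internal.example.com/v1', 'my-model')

# A model id that contains a slash is split in the wrong place:
# "org" ends up in the base URL instead of the model id.
print(sketch_split("https://llm.internal.example.com/v1/org/my-model"))
# ('https://llm.internal.example.com/v1/org', 'my-model')
```

The second call shows why an explicit base_url is the safer form whenever the model id has slashes of its own.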
Custom endpoints do not fall back to your OpenRouter key. If the endpoint needs auth, pass api_key= explicitly (or set it in ~/.agentroute/config). Never hardcode secrets — read them from an environment variable, for example api_key=os.environ["MY_GATEWAY_KEY"].
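One way to keep the key out of source code is to read it at startup and fail early with a clear message if it is missing. This is a small sketch; require_env is a hypothetical helper, and MY_GATEWAY_KEY is the example variable name used above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; it should hold the gateway API key.")
    return value


# Usage (assumes the variable was exported in the shell beforehand):
#   api_key = require_env("MY_GATEWAY_KEY")
#   agent = Agent(..., api_key=api_key)
```

Compared with os.environ["MY_GATEWAY_KEY"], this also rejects an empty value and names the missing variable in the error.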
Custom host for a local server
The same form covers an Ollama (or any local server) on a non-default host or port. Point the URL at it directly:
from agentroute import Agent

agent = Agent(
    name="remote-ollama",
    model="http://gpu-box.lan:11434/v1/llama3",
    base_url="http://gpu-box.lan:11434/v1",
    api_key="ollama",  # placeholder; Ollama ignores it
)

API-key resolution order
For any path that needs a key, AgentRoute resolves it in this order:
- The explicit api_key= you pass to the Agent (highest priority).
- Config: environment variables, then ~/.agentroute/config.
- None, for no-auth endpoints like Ollama.
The ollama/ path skips the lookup entirely and sends a placeholder key, so a missing config never blocks local development.
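The precedence can be pictured as "first non-empty value wins". The function below is a hypothetical sketch of that ordering only: it takes the three candidate values as arguments and makes no claim about which environment variables or config fields AgentRoute actually reads for a given path.

```python
from typing import Optional


def sketch_resolve_key(explicit: Optional[str],
                       env_value: Optional[str],
                       config_value: Optional[str]) -> Optional[str]:
    """Return the first non-empty candidate, in the documented priority order."""
    for candidate in (explicit, env_value, config_value):
        if candidate:
            return candidate
    # Nothing configured: fine for no-auth endpoints such as a local Ollama.
    return None


print(sketch_resolve_key("from-argument", "from-env", "from-config"))  # from-argument
print(sketch_resolve_key(None, "from-env", "from-config"))             # from-env
print(sketch_resolve_key(None, None, None))                            # None
```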
Inspect resolution directly
If you want to confirm where a string would route before wiring up an agent, call resolve_model yourself. It returns a concrete Model you can also pass to history compaction (for example HistorySummarize(model=...)).
from agentroute import resolve_model

local = resolve_model("ollama/llama3")
cloud = resolve_model("claude-sonnet-4")  # -> anthropic/claude-sonnet-4 on OpenRouter
custom = resolve_model(
    "https://llm.internal.example.com/v1/my-model",
    api_key="sk-internal-...",
)

resolve_model accepts the same api_key= and base_url= keyword arguments as the Agent constructor, and applies the identical rules.
Where to go next
The full provider model: string resolution, vendor inference, and config.
Signatures and rules for resolve_model, Model, and the message types.
HistorySummarize needs a concrete Model — wire one up with resolve_model.
What happens when an endpoint is unreachable or a budget is exceeded.