Using local and custom models

Run AgentRoute agents against Ollama on localhost or any OpenAI-compatible endpoint, and understand how model strings resolve to a provider.


AgentRoute talks to models through a single model string. That string decides where requests go: a local Ollama server, your own OpenAI-compatible gateway, or OpenRouter (the default). You never wire up a client by hand — resolve_model reads the string and builds the right connection.

This guide covers the two non-default cases: running locally with Ollama, and pointing at a custom endpoint with your own credentials. For the full provider behavior, see Models and the resolve_model reference.

How a model string resolves

When you pass model="..." to an Agent, AgentRoute applies three rules, in order:

  1. Starts with http:// or https:// → a custom OpenAI-compatible endpoint.
  2. Starts with ollama/ → a local Ollama server at http://localhost:11434/v1, no auth.
  3. Anything else → OpenRouter, the default for every bare name and vendor/model slug.

For bare names that hit OpenRouter, AgentRoute infers the vendor prefix from the name: claude* becomes anthropic/, gpt*/o1*/o3*/o4* become openai/, gemini* becomes google/, llama* becomes meta-llama/, deepseek* becomes deepseek/, and mistral* becomes mistralai/. So model="claude-sonnet-4" resolves to anthropic/claude-sonnet-4 on OpenRouter.
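The rules above can be sketched in a few lines of plain Python. This is an illustration of the documented behavior, not AgentRoute's actual implementation: `Route` and `sketch_route` are hypothetical names, matching is assumed to be case-sensitive, and a bare name that matches no known stem is assumed to pass through unchanged.

```python
from dataclasses import dataclass
from typing import Optional

OLLAMA_BASE_URL = "http://localhost:11434/v1"

# Vendor prefixes inferred for bare names, as listed in the text above.
VENDOR_PREFIXES = [
    (("claude",), "anthropic/"),
    (("gpt", "o1", "o3", "o4"), "openai/"),
    (("gemini",), "google/"),
    (("llama",), "meta-llama/"),
    (("deepseek",), "deepseek/"),
    (("mistral",), "mistralai/"),
]


@dataclass
class Route:  # hypothetical type, not part of AgentRoute
    provider: str            # "custom", "ollama", or "openrouter"
    base_url: Optional[str]  # None means "use the provider default"
    model_id: str


def sketch_route(model: str) -> Route:
    """Illustrative only: applies the three documented rules in order."""
    # Rule 1: a raw URL is a custom OpenAI-compatible endpoint.
    if model.startswith(("http://", "https://")):
        base_url, _, model_id = model.rpartition("/")
        return Route("custom", base_url, model_id)
    # Rule 2: the ollama/ prefix targets the local Ollama server.
    if model.startswith("ollama/"):
        return Route("ollama", OLLAMA_BASE_URL, model[len("ollama/"):])
    # Rule 3: everything else goes to OpenRouter. Only bare names
    # (no slash) get a vendor prefix inferred from their stem.
    if "/" not in model:
        for stems, prefix in VENDOR_PREFIXES:
            if model.startswith(stems):
                return Route("openrouter", None, prefix + model)
    # Assumption: slugs and unmatched bare names pass through unchanged.
    return Route("openrouter", None, model)
```

Calling `sketch_route("claude-sonnet-4")` yields the model id `anthropic/claude-sonnet-4` with provider `openrouter`, matching the example in the text.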

One key for the default path

For the OpenRouter default, a single key — AGENTROUTE_API_KEY or OPENROUTER_API_KEY — works for every model string. The two cases below are exactly when you step off that path.

Run against a local Ollama model

Prefix the model name with ollama/. AgentRoute points at http://localhost:11434/v1 and sends no real API key, so there is nothing to configure beyond having Ollama running.

1
Install and start Ollama

Follow the Ollama install instructions, then pull a model.

ollama pull llama3

Ollama serves an OpenAI-compatible API on port 11434 while it is running.
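If you want to confirm the server is reachable before running an agent, a small standard-library check is enough. The sketch below assumes your Ollama version exposes the OpenAI-style model listing at /v1/models; `list_local_models` is a hypothetical helper and not part of AgentRoute.

```python
import json
import urllib.request


def list_local_models(base_url="http://localhost:11434/v1", timeout=2.0):
    """Return the model ids the server reports, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return None
    data = payload.get("data", []) if isinstance(payload, dict) else []
    return [m["id"] for m in data if isinstance(m, dict) and "id" in m]
```

A return value of None means nothing answered at that address. An empty list means the server is up but no model has been pulled yet.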

2
Point your agent at it

The ollama/ prefix is the only change. No api_key, no base_url.

local_agent.py
from agentroute import Agent
 
agent = Agent(
    name="local-helper",
    model="ollama/llama3",
    instructions="You are a concise coding assistant.",
)
 
result = agent.run("Explain what a Python generator is in two sentences.")
print(result)

3
Run it

python local_agent.py

Everything else — tools, structured output, memory — works exactly as it does against a hosted model, as long as the local model supports tool calls.

The part after the slash is the Ollama model tag, so ollama/llama3:8b and ollama/qwen2.5-coder both work. If Ollama runs on another host or port, pass an explicit base_url or use the raw-URL form directly; see "Custom host for a local server" in the next section.

Tool support varies by model

The agent loop relies on the model emitting OpenAI-style tool calls. Smaller local models may not support tool calling well; if your agent ignores its tools, try a tool-capable model or drop the tools and rely on plain prompting.

Point at a custom OpenAI-compatible endpoint

Any endpoint that speaks the OpenAI chat-completions API — a self-hosted gateway, a private deployment, vLLM, LiteLLM, a corporate proxy — works through the raw-URL form. Pass the full URL as the model string and supply your own api_key.

custom_endpoint.py
from agentroute import Agent
 
agent = Agent(
    name="gateway-agent",
    model="https://llm.internal.example.com/v1/my-model",
    base_url="https://llm.internal.example.com/v1",
    api_key="sk-internal-...",
    instructions="Answer using only the provided context.",
)
 
result = agent.run("Summarize today's incident report.")
print(result)

How the URL is split

For an http(s):// model string, AgentRoute splits on the last /: everything before it is the base URL and the final segment is the model id. So https://llm.internal.example.com/v1/my-model yields base URL https://llm.internal.example.com/v1 and model id my-model.

Passing base_url explicitly (as above) overrides that split and is the clearest, least error-prone form — the model string then only needs to end in the correct model id. Use it whenever your model id itself contains slashes or your path does not end in the model name.
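To see why an explicit base_url matters, the last-slash split can be reproduced with plain string handling. This is a sketch of the documented rule only; `split_on_last_slash` is a hypothetical helper, and the second URL with its my-org/my-model id is invented for illustration.

```python
def split_on_last_slash(model_url: str) -> tuple[str, str]:
    """Split a model URL into (base URL, model id) at the final slash."""
    base_url, _, model_id = model_url.rpartition("/")
    return base_url, model_id


# The documented example splits cleanly.
print(split_on_last_slash("https://llm.internal.example.com/v1/my-model"))
# ('https://llm.internal.example.com/v1', 'my-model')

# A model id that contains a slash is cut in the wrong place:
# "my-org" ends up in the base URL instead of the model id.
print(split_on_last_slash("https://llm.internal.example.com/v1/my-org/my-model"))
# ('https://llm.internal.example.com/v1/my-org', 'my-model')
```

In the second case the inferred base URL no longer points at the API root, which is the situation where supplying base_url yourself avoids the ambiguity.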

Provide your own auth

Custom endpoints do not fall back to your OpenRouter key. If the endpoint needs auth, pass api_key= explicitly (or set it in ~/.agentroute/config). Never hardcode secrets — read them from an environment variable, for example api_key=os.environ["MY_GATEWAY_KEY"].
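One way to keep the key out of source code is a small helper that reads it from the environment and fails early with a readable message. `require_env` is a hypothetical helper, not an AgentRoute function; MY_GATEWAY_KEY is the example variable name used above.

```python
import os


def require_env(name: str) -> str:
    """Return the named environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} before starting the agent.")
    return value
```

You would then pass api_key=require_env("MY_GATEWAY_KEY") to the Agent constructor, so a missing key surfaces at startup rather than as an authentication error on the first request.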

Custom host for a local server

The same form covers an Ollama (or any local server) on a non-default host or port. Point the URL at it directly:

from agentroute import Agent
 
agent = Agent(
    name="remote-ollama",
    model="http://gpu-box.lan:11434/v1/llama3",
    base_url="http://gpu-box.lan:11434/v1",
    api_key="ollama",  # placeholder; Ollama ignores it
)

API-key resolution order

For any path that needs a key, AgentRoute resolves it in this order:

  1. The explicit api_key= you pass to the Agent (highest priority).
  2. Config — environment variables, then ~/.agentroute/config.
  3. None, for no-auth endpoints like Ollama.

The ollama/ path skips the lookup entirely and sends a placeholder key, so a missing config never blocks local development.
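The precedence can be pictured as a "first non-empty value wins" chain. The sketch below is not AgentRoute's code: `sketch_resolve_key` is a hypothetical function, the config field name "api_key" is an assumption, and the environment and config are passed in as plain mappings so the logic can be read in isolation.

```python
from typing import Mapping, Optional, Sequence


def sketch_resolve_key(
    explicit: Optional[str],
    env: Mapping[str, str],
    file_config: Mapping[str, str],
    env_names: Sequence[str],
    config_field: str = "api_key",  # assumed field name in the config file
) -> Optional[str]:
    """Illustrative precedence: explicit, then environment, then config file."""
    # 1. An explicit api_key= always wins.
    if explicit:
        return explicit
    # 2a. Environment variables, checked in the order given.
    for name in env_names:
        if env.get(name):
            return env[name]
    # 2b. The config file comes after the environment.
    if file_config.get(config_field):
        return file_config[config_field]
    # 3. Nothing found: no key is sent (fine for no-auth endpoints).
    return None
```

For example, with env={"AGENTROUTE_API_KEY": "from-env"} and a config file that also holds a key, the environment value is returned unless an explicit key is supplied.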

Inspect resolution directly

If you want to confirm where a string would route before wiring up an agent, call resolve_model yourself. It returns a concrete Model you can also pass to history compaction (for example HistorySummarize(model=...)).

from agentroute import resolve_model
 
local = resolve_model("ollama/llama3")
cloud = resolve_model("claude-sonnet-4")  # -> anthropic/claude-sonnet-4 on OpenRouter
custom = resolve_model(
    "https://llm.internal.example.com/v1/my-model",
    api_key="sk-internal-...",
)

resolve_model accepts the same api_key= and base_url= keyword arguments as the Agent constructor, and applies the identical rules.

Where to go next