History
Compact a growing conversation to fit the context window with sliding-window, truncation, or summarization policies.
Long conversations grow without bound. Every turn appends the user message, the assistant's reply, and any tool calls and their results to the transcript — and the whole transcript is re-sent to the model on the next turn. Left alone, it eventually overflows the model's context window and costs more on every call.
A history policy is a small object that takes the current message list and returns a shorter one. You attach one to an agent with Agent(history=...), and the agent applies it before each model call. AgentRoute ships three policies: keep the last N turns, drop turns until you fit a token budget, or fold older turns into an LLM-generated summary.
History compacts the current run's transcript so it fits the context window. Memory is the durable layer — facts and past conversations that persist across runs and processes. They are independent and compose well; see Combining memory and history.
The History protocol
Every policy implements one method. The protocol is runtime_checkable, so any object with a matching compact works — the built-in policies are just the batteries-included implementations.
from agentroute import History # the protocol
from agentroute.models import Message
async def compact(
messages: list[Message],
token_budget: int | None = None,
) -> list[Message]:
...compact receives the full transcript as a list of Message objects and returns a new, equal-or-shorter list. token_budget is an optional override the agent may pass at call time; when None, the policy uses the limit it was constructed with.
See the History reference for the full surface, and Models for the Message shape.
Turn grouping
All three policies operate on turns, not raw messages — so they never split a tool call from its result or strand half an exchange. Compaction happens in two steps before any policy logic runs.
First, leading system messages are split off and always preserved. Your instructions never get dropped.
Second, the remaining messages are grouped into turns. A turn starts at a user message and runs until the next user message. Because an assistant tool-call and the tool-role results it produces come after the user message that triggered them, they stay attached to that turn.
[system] ← split off, always kept
user: what's the weather in NYC? ┐
assistant: (tool_call get_weather)│ turn 1
tool: 72°F and sunny │
assistant: It's 72°F and sunny. ┘
user: and tomorrow? ┐
assistant: (tool_call get_weather)│ turn 2
tool: 68°F, light rain │
assistant: 68°F with light rain. ┘A policy that "keeps the last 1 turn" keeps all four messages of turn 2 — never just the trailing assistant message. This is what makes compaction safe: the model always sees complete, coherent exchanges.
Sliding window
HistorySlidingWindow keeps the last window_size turns and discards everything older. It is the simplest and cheapest policy — no token counting, no extra model calls — and a good default for chat-style agents where only recent context matters.
from agentroute import Agent, HistorySlidingWindow
agent = Agent(
name="support",
model="claude-sonnet-4",
instructions="You are a helpful support assistant.",
history=HistorySlidingWindow(window_size=10), # keep the last 10 turns
)If the conversation has window_size turns or fewer, the transcript is returned unchanged. window_size must be at least 1, otherwise the constructor raises ValueError.
Sliding window discards old turns permanently from the model's view. If the agent needs to recall something said 30 turns ago, pair it with memory so the agent can recall the fact even after the turn has scrolled out of the window.
Truncate
HistoryTruncate drops the oldest turns until the transcript fits a token budget. Use it when you care about a hard size ceiling rather than a fixed number of turns — turns vary wildly in length, so a token budget is a more honest proxy for context-window pressure.
from agentroute import Agent, HistoryTruncate
agent = Agent(
name="researcher",
model="claude-sonnet-4",
history=HistoryTruncate(max_tokens=100_000),
)Token counts use a dependency-free char heuristic — roughly 4 characters per token — so AgentRoute does not need to ship a tokenizer for every model. It is an estimate, not an exact count: leave headroom below the model's real context limit. max_tokens must be at least 1.
The system messages are counted first and always kept; the remaining budget is filled with the most recent turns, oldest dropped first. The newest turn is always retained even if it alone exceeds the budget, so the agent never loses the message it is currently answering.
Summarize
HistorySummarize keeps the last keep_recent turns verbatim and replaces everything older with a short LLM-generated summary, injected as a system message. Use it for long-running agents where early context still matters — decisions, names, numbers — but the verbatim back-and-forth no longer needs to be in the window.
from agentroute import Agent, HistorySummarize
from agentroute import resolve_model
summarizer = resolve_model("claude-sonnet-4")
agent = Agent(
name="planner",
model="claude-sonnet-4",
history=HistorySummarize(model=summarizer, keep_recent=4),
)Older turns are flattened into a transcript and sent to model with the summary_prompt. The default prompt asks for 3-5 bullet points that preserve names, numbers, and decisions without speculation; override it for domain-specific compaction. keep_recent must be at least 0 (use 0 to summarize everything). If the conversation has keep_recent turns or fewer, it is returned unchanged.
HistorySummarize needs a real model to do the summarizing. If you construct it with model=None (the default), the next compact call raises a RuntimeError —
HistorySummarize requires a model. Pass model=... when constructing.Always pass a model resolved via resolve_model or any object implementing the Model protocol.
Summarization adds one extra model call each time compaction triggers, and that call costs tokens. It is the most expensive policy — reach for it only when losing old context outright (as sliding window and truncate do) would hurt the agent's answers.
Choosing a policy
| Parameter | Type | Default | Description |
|---|---|---|---|
HistorySlidingWindow | window_size: int = 20 | — | Keeps the last N turns. No token counting, no extra calls. Best for chat where only recent context matters. Old turns are lost. |
HistoryTruncate | max_tokens: int = 100_000 | — | Drops oldest turns to fit a token budget (~4 chars/token heuristic). Best when you need a hard size ceiling. Old turns are lost. |
HistorySummarize | keep_recent: int = 4 | — | Summarizes older turns into a system message, keeps the last N verbatim. Best for long sessions where early context matters. Costs one extra model call and requires a concrete Model. |
| Policy | Bounds by | Extra model call | Keeps old context | Best for |
|---|---|---|---|---|
HistorySlidingWindow | turn count | no | no | short chat sessions |
HistoryTruncate | token budget | no | no | hard context-window ceiling |
HistorySummarize | turns kept verbatim | yes (per compaction) | yes, as a summary | long-running agents |
Combining memory and history
History and memory solve different problems, so attach both. Memory remembers facts and past conversations across runs; history keeps the live transcript inside the window during a single run.
from agentroute import Agent, Context, MemorySQLite, HistorySummarize
from agentroute import resolve_model
agent = Agent(
name="assistant",
model="claude-sonnet-4",
instructions="You are a long-running personal assistant.",
memory=MemorySQLite("assistant.db"), # durable across restarts
history=HistorySummarize( # keeps the live window small
model=resolve_model("claude-sonnet-4"),
keep_recent=6,
),
)
@agent.tool
async def remember_preference(ctx: Context, key: str, value: str) -> str:
"""Persist a user preference for future conversations."""
await ctx.memory.remember(key, value)
return f"Saved {key}."
result = agent.run("Remember that I prefer metric units.")
print(result)Here a turn can scroll out of the summarized window, but the preference saved with ctx.memory.remember survives — the agent can recall it in a completely new run. See Memory for the full durable layer and Context and usage for how ctx.memory is injected into tools.