History
Conversation compaction policies — the History protocol plus the sliding-window, truncate, and summarize strategies for keeping a transcript inside the context window.
A History policy keeps a long conversation inside the model's context window by compacting the transcript before each turn. AgentRoute ships three concrete policies — drop old turns, fit a token budget, or summarize the past with an LLM — and they all share one async method. Attach a policy with Agent(history=...).
For the conceptual overview and guidance on which policy to pick, see History. For the message shape these policies operate on, see Models.
from agentroute import History, HistorySlidingWindow, HistoryTruncate, HistorySummarize

History
History is a runtime-checkable Protocol. Any object with a matching compact method satisfies it, so you can write your own policy without subclassing.
from agentroute import History

@runtime_checkable
class History(Protocol):
    async def compact(
        self,
        messages: list[Message],
        token_budget: int | None = None,
    ) -> list[Message]: ...

compact
async def compact(
    self,
    messages: list[Message],
    token_budget: int | None = None,
) -> list[Message]

| Parameter | Type | Default | Description |
|---|---|---|---|
| messages (required) | list[Message] | - | The full transcript to compact, oldest first. Leading system messages are always preserved. |
| token_budget | int \| None | None | Optional per-call token ceiling injected by the loop. When set, it overrides the policy's own limit (used by HistoryTruncate); other policies may ignore it. |
All built-in policies operate on turns, not individual messages. Leading system messages are split off first. A turn begins at a user message and runs until the next user message, so an assistant tool-call and its tool-result messages stay attached to the user turn that triggered them. This keeps a tool round-trip from being cut in half during compaction.
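The grouping rule described above can be sketched as follows. This is an illustration, not AgentRoute's implementation: Message here is a hypothetical stand-in dataclass with only role and content fields, and split_turns is an assumed helper name.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for agentroute's Message (role and content only)."""

    role: str
    content: str = ""


def split_turns(messages: list[Message]) -> tuple[list[Message], list[list[Message]]]:
    """Split off leading system messages, then group the rest into user turns."""
    i = 0
    while i < len(messages) and messages[i].role == "system":
        i += 1
    system, rest = messages[:i], messages[i:]

    turns: list[list[Message]] = []
    for msg in rest:
        # A user message opens a new turn; anything else joins the current one.
        if msg.role == "user" or not turns:
            turns.append([msg])
        else:
            turns[-1].append(msg)
    return system, turns


transcript = [
    Message("system", "You are helpful."),
    Message("user", "What's the weather?"),
    Message("assistant", "calling get_weather"),
    Message("tool", "72F"),
    Message("assistant", "It's 72F."),
    Message("user", "Thanks!"),
]
system, turns = split_turns(transcript)
print(len(system), len(turns))        # 1 2
print([m.role for m in turns[0]])     # ['user', 'assistant', 'tool', 'assistant']
```

The tool call and its result land in the same group as the user message that triggered them, so dropping or keeping a group never separates them.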
HistorySlidingWindow
HistorySlidingWindow keeps the last window_size user-turn groups and drops everything older. System messages are always preserved. It is the cheapest policy: no tokenizer, no model call.
from agentroute import HistorySlidingWindow

class HistorySlidingWindow:
    def __init__(self, window_size: int = 20) -> None: ...

| Parameter | Type | Default | Description |
|---|---|---|---|
| window_size | int | 20 | Number of most-recent user-turn groups to keep. Must be >= 1. |

Raises when window_size <= 0.

If the transcript already has window_size turns or fewer, compact returns the messages unchanged.
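A minimal sketch of the sliding-window behavior, under stated assumptions: Message is a hypothetical stand-in dataclass, sliding_window is an illustrative function name, and ValueError is an assumed exception type for an invalid window_size.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for agentroute's Message."""

    role: str
    content: str = ""


def sliding_window(messages: list[Message], window_size: int = 20) -> list[Message]:
    if window_size < 1:
        # Assumed exception type; the real policy may raise something else.
        raise ValueError("window_size must be >= 1")

    # Leading system messages are always preserved.
    n = 0
    while n < len(messages) and messages[n].role == "system":
        n += 1
    system, rest = messages[:n], messages[n:]

    # Each user message marks the start of a turn group.
    starts = [i for i, m in enumerate(rest) if m.role == "user"]
    if len(starts) <= window_size:
        return list(messages)  # nothing to drop

    cut = starts[-window_size]  # first message of the oldest kept turn
    return system + rest[cut:]


transcript = [
    Message("system", "Be brief."),
    Message("user", "q1"), Message("assistant", "a1"),
    Message("user", "q2"), Message("assistant", "a2"),
    Message("user", "q3"), Message("assistant", "a3"),
]
compacted = sliding_window(transcript, window_size=2)
print([m.content for m in compacted])
# ['Be brief.', 'q2', 'a2', 'q3', 'a3']
```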
from agentroute import Agent, HistorySlidingWindow
agent = Agent(
name="support",
model="claude-sonnet-4",
history=HistorySlidingWindow(window_size=10),
)
# After 10 user turns, the oldest turns are dropped on each new turn;
# the system prompt is always kept.
result = agent.run("What did we decide about the refund?")
print(result)

HistoryTruncate
HistoryTruncate drops the oldest user-turn groups until the total fits max_tokens. It uses a dependency-free character heuristic — roughly 4 characters per token — instead of a real tokenizer, so it stays fast and never adds a model dependency.
from agentroute import HistoryTruncate

class HistoryTruncate:
    def __init__(self, max_tokens: int = 100_000) -> None: ...

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_tokens | int | 100_000 | Approximate token budget for the whole transcript. Must be >= 1. |

Raises when max_tokens <= 0.

System messages are counted against the budget first, then the most recent turns are kept until the remaining budget runs out. If compact receives a token_budget, that value overrides max_tokens for that call. At least the newest turn is always kept, even if it alone exceeds the budget.
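The budgeting rules above can be sketched like this. Everything here is illustrative: Message is a hypothetical stand-in, truncate and estimate_tokens are assumed names, and the estimate simply divides total character count by 4, which may differ from how the real policy rounds.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for agentroute's Message."""

    role: str
    content: str = ""


def estimate_tokens(msgs: list[Message]) -> int:
    # Rough heuristic: about 4 characters per token.
    return sum(len(m.content) for m in msgs) // 4


def split_turns(messages):
    n = 0
    while n < len(messages) and messages[n].role == "system":
        n += 1
    turns = []
    for msg in messages[n:]:
        if msg.role == "user" or not turns:
            turns.append([msg])
        else:
            turns[-1].append(msg)
    return messages[:n], turns


def truncate(messages, max_tokens=100_000, token_budget=None):
    # A per-call token_budget overrides the policy's own max_tokens.
    budget = token_budget if token_budget is not None else max_tokens
    system, turns = split_turns(messages)
    remaining = budget - estimate_tokens(system)  # system messages count first

    kept = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if kept and cost > remaining:
            break  # out of budget; the newest turn is already kept
        kept.append(turn)
        remaining -= cost
    kept.reverse()
    return system + [m for turn in kept for m in turn]


transcript = [
    Message("system", "s" * 40),   # ~10 tokens
    Message("user", "a" * 400),    # ~100 tokens each
    Message("user", "b" * 400),
    Message("user", "c" * 400),
]
fits_two = truncate(transcript, max_tokens=250)
print([m.content[0] for m in fits_two])   # ['s', 'b', 'c']

# The override wins, and the newest turn survives even over budget.
tight = truncate(transcript, max_tokens=250, token_budget=50)
print([m.content[0] for m in tight])      # ['s', 'c']
```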
from agentroute import Agent, HistoryTruncate
agent = Agent(
name="researcher",
model="claude-sonnet-4",
history=HistoryTruncate(max_tokens=32_000),
)
result = agent.run("Summarize everything we've covered so far.")
print(result)

The ~4-chars-per-token heuristic is an estimate, not an exact count. Leave headroom under your model's real context limit rather than setting max_tokens to the exact maximum.
HistorySummarize
HistorySummarize replaces older turns with an LLM-generated summary and keeps the last keep_recent turns verbatim. The summary is inserted as a system message so the model retains the gist of the conversation without the full token cost.
from agentroute import HistorySummarize

class HistorySummarize:
    def __init__(
        self,
        model: Model | None = None,
        *,
        keep_recent: int = 4,
        summary_prompt: str = (
            "Summarize the following conversation in 3-5 concise bullet points. "
            "Preserve names, numbers, and decisions. Do not add speculation."
        ),
    ) -> None: ...

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Model \| None | None | The model used to generate the summary. Must be a concrete Model instance; see the requirement below. |
| keep_recent | int | 4 | Number of most-recent user-turn groups to keep verbatim. Older turns are folded into the summary. Must be >= 0. |
| summary_prompt | str | - | System prompt that instructs the model how to summarize. Defaults to a 3-5 bullet point summary that preserves names, numbers, and decisions. |

Raises when keep_recent < 0, and raises RuntimeError at compact time when model is None.

HistorySummarize needs a Model to produce the summary. The constructor accepts model=None, but calling compact while model is still None raises RuntimeError. Build the model with resolve_model and pass it in.
If the transcript has keep_recent turns or fewer, compact returns the messages unchanged — there is nothing old enough to summarize.
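The shape of this compaction can be sketched without a real model. The assumptions are labeled in the code: Message is a hypothetical stand-in, the summarize callable replaces the Model (whose API is not shown here), and placing the summary directly after the leading system messages is a guess at the ordering.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Message:
    """Hypothetical stand-in for agentroute's Message."""

    role: str
    content: str = ""


def split_turns(messages):
    n = 0
    while n < len(messages) and messages[n].role == "system":
        n += 1
    turns = []
    for msg in messages[n:]:
        if msg.role == "user" or not turns:
            turns.append([msg])
        else:
            turns[-1].append(msg)
    return messages[:n], turns


async def summarize_compact(
    messages: list[Message],
    summarize: Callable[[str], Awaitable[str]],  # stands in for the Model call
    keep_recent: int = 4,
) -> list[Message]:
    system, turns = split_turns(messages)
    if len(turns) <= keep_recent:
        return list(messages)  # nothing old enough to summarize

    split = len(turns) - keep_recent  # index-based, so keep_recent=0 works
    old, recent = turns[:split], turns[split:]
    old_text = "\n".join(f"{m.role}: {m.content}" for t in old for m in t)
    summary = await summarize(old_text)

    # Assumed placement: summary follows the original system messages.
    return system + [Message("system", summary)] + [m for t in recent for m in t]


async def fake_summarize(text: str) -> str:
    # Deterministic placeholder instead of an LLM call.
    return f"Summary of {len(text.splitlines())} earlier messages"


transcript = [
    Message("system", "Be brief."),
    Message("user", "q1"), Message("assistant", "a1"),
    Message("user", "q2"), Message("assistant", "a2"),
    Message("user", "q3"), Message("assistant", "a3"),
]
result = asyncio.run(summarize_compact(transcript, fake_summarize, keep_recent=1))
print([m.role for m in result])   # ['system', 'system', 'user', 'assistant']
print(result[1].content)          # Summary of 4 earlier messages
```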
from agentroute import Agent, HistorySummarize, resolve_model
summarizer = resolve_model("claude-sonnet-4")
agent = Agent(
name="planner",
model="claude-sonnet-4",
history=HistorySummarize(model=summarizer, keep_recent=4),
)
# Once the conversation exceeds 4 turns, older turns are replaced by a
# system-message summary while the latest 4 turns stay verbatim.
result = agent.run("Pick up where we left off.")
print(result)

Custom policies
Because History is a Protocol, any object with a matching compact coroutine works as a policy. This one keeps only the final message plus any system messages.
from agentroute import Agent, Message

class KeepLast:
    async def compact(
        self,
        messages: list[Message],
        token_budget: int | None = None,
    ) -> list[Message]:
        system = [m for m in messages if m.role == "system"]
        rest = [m for m in messages if m.role != "system"]
        return system + rest[-1:]  # naive: keep only the final message

agent = Agent(name="terse", model="claude-sonnet-4", history=KeepLast())

A real custom policy should respect turn boundaries so it never separates an assistant tool-call from its tool results. The KeepLast example above is intentionally minimal and may split a tool round-trip; use the built-in policies for production transcripts.