History

Conversation compaction policies — the History protocol plus the sliding-window, truncate, and summarize strategies for keeping a transcript inside the context window.


A History policy keeps a long conversation inside the model's context window by compacting the transcript before each turn. AgentRoute ships three concrete policies — drop old turns, fit a token budget, or summarize the past with an LLM — and they all share one async method. Attach a policy with Agent(history=...).

For the conceptual overview and guidance on which policy to pick, see History. For the message shape these policies operate on, see Models.

from agentroute import History, HistorySlidingWindow, HistoryTruncate, HistorySummarize

History

History is a runtime-checkable Protocol. Any object with a matching compact method satisfies it, so you can write your own policy without subclassing.

from agentroute import History
@runtime_checkable
class History(Protocol):
    async def compact(
        self,
        messages: list[Message],
        token_budget: int | None = None,
    ) -> list[Message]: ...

compact

async def compact(
    self,
    messages: list[Message],
    token_budget: int | None = None,
) -> list[Message]
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| messages | list[Message] | required | The full transcript to compact, oldest first. Leading system messages are always preserved. |
| token_budget | int \| None | None | Optional per-call token ceiling injected by the loop. When set, it overrides the policy's own limit (used by HistoryTruncate); other policies may ignore it. |

Returns
A new, compacted message list. The original list is never mutated.

Turn grouping

All built-in policies operate on turns, not individual messages. Leading system messages are split off first. A turn begins at a user message and runs until the next user message, so an assistant tool-call and its tool-result messages stay attached to the user turn that triggered them. This keeps a tool round-trip from being cut in half during compaction.
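
The grouping rule above can be sketched as a small helper. This is an illustration under stated assumptions, not agentroute's implementation: Msg is a hypothetical stand-in for Message with only role and content, split_turns is a hypothetical name, and placing any non-system messages that precede the first user message into their own group is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for agentroute's Message (role + content only)."""
    role: str
    content: str = ""


def split_turns(messages: list[Msg]) -> tuple[list[Msg], list[list[Msg]]]:
    """Return (leading system messages, turn groups)."""
    # Split off the leading run of system messages.
    i = 0
    while i < len(messages) and messages[i].role == "system":
        i += 1
    system, rest = messages[:i], messages[i:]

    turns: list[list[Msg]] = []
    for m in rest:
        # A user message opens a new turn. Anything before the first user
        # message also gets its own group (an assumption of this sketch).
        if m.role == "user" or not turns:
            turns.append([])
        turns[-1].append(m)
    return system, turns


transcript = [
    Msg("system", "You are helpful."),
    Msg("user", "Weather in Oslo?"),
    Msg("assistant", "<tool call: get_weather>"),
    Msg("tool", "4 C, rain"),
    Msg("assistant", "It is 4 C and raining."),
    Msg("user", "Thanks!"),
    Msg("assistant", "You're welcome."),
]
system, turns = split_turns(transcript)
print(len(system), [len(t) for t in turns])  # prints: 1 [4, 2]
```

The tool call and its result land in the same group as the user message that triggered them, so dropping or keeping whole groups never splits the round-trip.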

HistorySlidingWindow

HistorySlidingWindow keeps the last window_size user-turn groups and drops everything older. System messages are always preserved. It is the cheapest policy: it needs no tokenizer and makes no model call.

from agentroute import HistorySlidingWindow
class HistorySlidingWindow:
    def __init__(self, window_size: int = 20) -> None: ...
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| window_size | int | 20 | Number of most-recent user-turn groups to keep. Must be >= 1. |

Raises
ValueError when window_size <= 0.

If the transcript already has window_size turns or fewer, compact returns the messages unchanged.

from agentroute import Agent, HistorySlidingWindow
 
agent = Agent(
    name="support",
    model="claude-sonnet-4",
    history=HistorySlidingWindow(window_size=10),
)
# After 10 user turns, the oldest turns are dropped on each new turn;
# the system prompt is always kept.
result = agent.run("What did we decide about the refund?")
print(result)
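
To make the windowing behaviour concrete, here is a minimal sketch that runs without agentroute. It is not the library's implementation: Msg, split_turns, and SlidingWindowSketch are hypothetical stand-ins that follow the rules described in this section (leading system messages kept, whole turns dropped oldest first, transcript returned unchanged when it already fits).

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for agentroute's Message."""
    role: str
    content: str = ""


def split_turns(messages):
    """Split off leading system messages, then group the rest by user turn."""
    i = 0
    while i < len(messages) and messages[i].role == "system":
        i += 1
    turns = []
    for m in messages[i:]:
        if m.role == "user" or not turns:
            turns.append([])
        turns[-1].append(m)
    return messages[:i], turns


class SlidingWindowSketch:
    """Illustrative only; mirrors the documented behaviour, not the source."""

    def __init__(self, window_size: int = 20) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be >= 1")
        self.window_size = window_size

    async def compact(self, messages, token_budget=None):
        system, turns = split_turns(messages)
        if len(turns) <= self.window_size:
            return list(messages)  # nothing to drop; return a copy
        kept = turns[-self.window_size:]
        return system + [m for turn in kept for m in turn]


transcript = [Msg("system", "Be brief.")]
for n in range(1, 4):
    transcript += [Msg("user", f"question {n}"), Msg("assistant", f"answer {n}")]

compacted = asyncio.run(SlidingWindowSketch(window_size=2).compact(transcript))
print([m.content for m in compacted])
```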

HistoryTruncate

HistoryTruncate drops the oldest user-turn groups until the total fits max_tokens. It uses a dependency-free character heuristic — roughly 4 characters per token — instead of a real tokenizer, so it stays fast and never adds a model dependency.

from agentroute import HistoryTruncate
class HistoryTruncate:
    def __init__(self, max_tokens: int = 100_000) -> None: ...
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_tokens | int | 100_000 | Approximate token budget for the whole transcript. Must be >= 1. |

Raises
ValueError when max_tokens <= 0.

System messages are counted against the budget first, then the most recent turns are kept until the remaining budget runs out. If compact receives a token_budget, that value overrides max_tokens for that call. At least the newest turn is always kept, even if it alone exceeds the budget.

from agentroute import Agent, HistoryTruncate
 
agent = Agent(
    name="researcher",
    model="claude-sonnet-4",
    history=HistoryTruncate(max_tokens=32_000),
)
result = agent.run("Summarize everything we've covered so far.")
print(result)
Tip

The ~4-chars-per-token heuristic is an estimate, not an exact count. Leave headroom under your model's real context limit rather than setting max_tokens to the exact maximum.
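
The budgeting rules above (system messages counted first, newest turns kept until the budget runs out, token_budget overriding max_tokens, newest turn always kept) can be sketched as follows. This is a hedged illustration, not the library's code: Msg, split_turns, estimate_tokens, and TruncateSketch are hypothetical names, and the exact rounding of the 4-characters-per-token heuristic is an assumption.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for agentroute's Message."""
    role: str
    content: str = ""


def split_turns(messages):
    """Split off leading system messages, then group the rest by user turn."""
    i = 0
    while i < len(messages) and messages[i].role == "system":
        i += 1
    turns = []
    for m in messages[i:]:
        if m.role == "user" or not turns:
            turns.append([])
        turns[-1].append(m)
    return messages[:i], turns


def estimate_tokens(msgs) -> int:
    # Roughly 4 characters per token; integer division is an assumption.
    return sum(len(m.content) for m in msgs) // 4


class TruncateSketch:
    """Illustrative only; mirrors the documented behaviour, not the source."""

    def __init__(self, max_tokens: int = 100_000) -> None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be >= 1")
        self.max_tokens = max_tokens

    async def compact(self, messages, token_budget=None):
        # A per-call token_budget overrides the policy's own max_tokens.
        budget = token_budget if token_budget is not None else self.max_tokens
        system, turns = split_turns(messages)
        remaining = budget - estimate_tokens(system)  # system counted first
        kept = []
        for turn in reversed(turns):  # walk from newest to oldest
            cost = estimate_tokens(turn)
            if kept and cost > remaining:
                break  # budget exhausted; the newest turn is already kept
            kept.append(turn)
            remaining -= cost
        kept.reverse()
        return system + [m for turn in kept for m in turn]


# 8-character system prompt (~2 tokens); each turn is 80 characters (~20 tokens).
transcript = [Msg("system", "s" * 8)]
for n in range(1, 4):
    transcript += [
        Msg("user", f"q{n}".ljust(40, ".")),
        Msg("assistant", f"a{n}".ljust(40, ".")),
    ]

policy = TruncateSketch(max_tokens=45)
fits_two = asyncio.run(policy.compact(transcript))
fits_one = asyncio.run(policy.compact(transcript, token_budget=10))
print(len(fits_two), len(fits_one))  # prints: 5 3
```

With a budget of 45 the two newest turns fit after the system prompt is counted. With a per-call budget of 10 the newest turn is kept even though it alone exceeds the budget.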

HistorySummarize

HistorySummarize replaces older turns with an LLM-generated summary and keeps the last keep_recent turns verbatim. The summary is inserted as a system message so the model retains the gist of the conversation without the full token cost.

from agentroute import HistorySummarize
class HistorySummarize:
    def __init__(
        self,
        model: Model | None = None,
        *,
        keep_recent: int = 4,
        summary_prompt: str = (
            "Summarize the following conversation in 3-5 concise bullet points. "
            "Preserve names, numbers, and decisions. Do not add speculation."
        ),
    ) -> None: ...
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | Model \| None | None | The model used to generate the summary. Must be a concrete Model instance; see the requirement below. |
| keep_recent | int | 4 | Number of most-recent user-turn groups to keep verbatim. Older turns are folded into the summary. Must be >= 0. |
| summary_prompt | str | see signature above | System prompt that instructs the model how to summarize. Defaults to a 3-5 bullet point summary that preserves names, numbers, and decisions. |

Raises
ValueError when keep_recent < 0.
RuntimeError at compact time when model is None.

A concrete model is required

HistorySummarize needs a Model to produce the summary. The constructor accepts model=None, but calling compact while model is still None raises RuntimeError. Build the model with resolve_model and pass it in.

If the transcript has keep_recent turns or fewer, compact returns the messages unchanged — there is nothing old enough to summarize.

from agentroute import Agent, HistorySummarize, resolve_model
 
summarizer = resolve_model("claude-sonnet-4")
 
agent = Agent(
    name="planner",
    model="claude-sonnet-4",
    history=HistorySummarize(model=summarizer, keep_recent=4),
)
# Once the conversation exceeds 4 turns, older turns are replaced by a
# system-message summary while the latest 4 turns stay verbatim.
result = agent.run("Pick up where we left off.")
print(result)
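
The summarize flow can be sketched without a real model. Everything here is an assumption for illustration: the Model interface is not documented in this section, so the sketch takes a hypothetical async callable summarize(prompt, text) instead, and Msg, split_turns, SummarizeSketch, and fake_summarize are stand-in names. Placing the summary directly after the leading system messages, and checking for a missing summarizer before the short-transcript check, are also choices of this sketch.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Msg:
    """Hypothetical stand-in for agentroute's Message."""
    role: str
    content: str = ""


def split_turns(messages):
    """Split off leading system messages, then group the rest by user turn."""
    i = 0
    while i < len(messages) and messages[i].role == "system":
        i += 1
    turns = []
    for m in messages[i:]:
        if m.role == "user" or not turns:
            turns.append([])
        turns[-1].append(m)
    return messages[:i], turns


class SummarizeSketch:
    """Illustrative only; a callable stands in for the Model argument."""

    def __init__(
        self,
        summarize: Optional[Callable[[str, str], Awaitable[str]]] = None,
        *,
        keep_recent: int = 4,
        summary_prompt: str = "Summarize the following conversation.",
    ) -> None:
        if keep_recent < 0:
            raise ValueError("keep_recent must be >= 0")
        self.summarize = summarize
        self.keep_recent = keep_recent
        self.summary_prompt = summary_prompt

    async def compact(self, messages, token_budget=None):
        if self.summarize is None:
            raise RuntimeError("a summarizer is required at compact time")
        system, turns = split_turns(messages)
        if len(turns) <= self.keep_recent:
            return list(messages)  # nothing old enough to summarize
        cut = len(turns) - self.keep_recent
        old, recent = turns[:cut], turns[cut:]
        text = "\n".join(f"{m.role}: {m.content}" for t in old for m in t)
        summary = await self.summarize(self.summary_prompt, text)
        # The summary is inserted as a system message ahead of recent turns.
        return system + [Msg("system", summary)] + [m for t in recent for m in t]


async def fake_summarize(prompt: str, text: str) -> str:
    # Stand-in for an LLM call; reports how many lines it was given.
    return f"Summary of {len(text.splitlines())} earlier messages."


transcript = [Msg("system", "Be brief.")]
for n in range(1, 4):
    transcript += [Msg("user", f"question {n}"), Msg("assistant", f"answer {n}")]

policy = SummarizeSketch(fake_summarize, keep_recent=1)
compacted = asyncio.run(policy.compact(transcript))
print([(m.role, m.content) for m in compacted])
```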

Custom policies

Because History is a Protocol, any object with a matching compact coroutine works as a policy. This one keeps every system message plus only the final non-system message.

from agentroute import Agent, Message
 
class KeepLast:
    async def compact(
        self,
        messages: list[Message],
        token_budget: int | None = None,
    ) -> list[Message]:
        system = [m for m in messages if m.role == "system"]
        rest = [m for m in messages if m.role != "system"]
        return system + rest[-1:]  # naive: keep only the final message
 
agent = Agent(name="terse", model="claude-sonnet-4", history=KeepLast())
Caution

A real custom policy should respect turn boundaries so it never separates an assistant tool-call from its tool results. The KeepLast example above is intentionally minimal and may split a tool round-trip — use the built-in policies for production transcripts.
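
One way to make the example turn-aware is to keep everything from the last user message onward, so a tool call and its results stay together. The sketch below is a hypothetical variant, not part of agentroute: KeepLastTurn is an illustrative name and Msg is a stand-in for Message so the code runs on its own.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for agentroute's Message."""
    role: str
    content: str = ""


class KeepLastTurn:
    """Keeps all system messages plus the final user turn as a whole."""

    async def compact(self, messages, token_budget=None):
        system = [m for m in messages if m.role == "system"]
        rest = [m for m in messages if m.role != "system"]
        # Find where the last turn starts: the index of the last user message.
        # If there is no user message, start stays 0 and everything is kept.
        start = 0
        for i, m in enumerate(rest):
            if m.role == "user":
                start = i
        return system + rest[start:]


transcript = [
    Msg("system", "Be brief."),
    Msg("user", "q1"),
    Msg("assistant", "a1"),
    Msg("user", "Weather in Oslo?"),
    Msg("assistant", "<tool call: get_weather>"),
    Msg("tool", "4 C, rain"),
    Msg("assistant", "It is 4 C and raining."),
]
compacted = asyncio.run(KeepLastTurn().compact(transcript))
print([m.role for m in compacted])
```

The tool call, its result, and the final assistant reply all survive because the cut is made at a user message rather than at an arbitrary index.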

See also

Source: memory/history.py