Knowledge base builder

Turn raw documents into structured Q&A pairs and topic summaries, then process a whole corpus concurrently.


This example builds a knowledge base from plain-text documents. Each document goes through an agent that returns a typed KB object — a list of QAPair items plus a short summary — using structured output. To process a corpus rather than a single file, we fan out over agent.arun with asyncio.gather, covered in the async and concurrency guide.

The pattern is two nested Pydantic models. The agent's output type is the top-level KB model, and the language model fills in the nested QAPair list directly: no parsing, no regex, just result.output.

The agent

kb_builder.py
import asyncio
from typing import Literal
 
from pydantic import BaseModel, Field
 
from agentroute import Agent
 
 
class QAPair(BaseModel):
    question: str
    answer: str
    difficulty: Literal["easy", "medium", "hard"]
    tags: list[str] = Field(default_factory=list)
 
 
class KB(BaseModel):
    summary: str = Field(description="Two-sentence summary of the document.")
    pairs: list[QAPair] = Field(description="Question/answer pairs covering the document.")
 
 
agent = Agent(
    name="kb-builder",
    model="claude-sonnet-4",
    instructions=(
        "You turn a single document into a study-ready knowledge base. "
        "Write clear, self-contained questions and accurate answers grounded "
        "only in the document. Assign a difficulty and 1-3 lowercase tags per pair. "
        "Aim for 4-8 pairs that cover the document's key ideas."
    ),
    output=KB,
)
 
 
def build_kb(document: str) -> KB:
    """Build a knowledge base from one document (synchronous)."""
    result = agent.run(f"Document:\n\n{document}")
    return result.output
 
 
async def build_corpus(documents: list[str]) -> list[KB]:
    """Build knowledge bases for many documents concurrently."""
    results = await asyncio.gather(
        *(agent.arun(f"Document:\n\n{doc}") for doc in documents)
    )
    return [r.output for r in results]

The agent is reused across every document — Agent is stateless between runs, so the same instance is safe to call concurrently. Each arun gets its own Context and Usage, so per-document cost and token counts never bleed into each other.
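One property of asyncio.gather that build_corpus relies on is that results come back in the order the awaitables were passed in, not the order they finish, so kbs[i] always corresponds to documents[i]. The sketch below illustrates this with a stand-in coroutine; fake_arun is hypothetical, only sleeps, and is not part of AgentRoute.

```python
import asyncio


async def fake_arun(doc: str, delay: float) -> str:
    """Hypothetical stand-in for agent.arun: sleeps, then echoes the document."""
    await asyncio.sleep(delay)
    return f"kb for {doc}"


async def demo_order() -> list[str]:
    # The first document is the slowest, so it finishes last...
    docs = [("doc-a", 0.03), ("doc-b", 0.02), ("doc-c", 0.01)]
    # ...but gather still returns the results in input order.
    return await asyncio.gather(*(fake_arun(doc, delay) for doc, delay in docs))


ORDERED = asyncio.run(demo_order())
print(ORDERED)
```

This is why run.py can number documents with enumerate without tracking which result belongs to which input.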

Run it

run.py
import asyncio
 
from kb_builder import build_corpus
 
DOCS = [
    "AgentRoute resolves a bare model name to a vendor and routes the call "
    "through OpenRouter. A single API key works for every model string.",
    "Structured output is requested by passing output=SomeModel to Agent. "
    "The result's output attribute is then an instance of that model.",
]
 
 
async def main() -> None:
    kbs = await build_corpus(DOCS)
    for i, kb in enumerate(kbs, start=1):
        print(f"\nDocument {i}: {kb.summary}")
        for pair in kb.pairs:
            print(f"  [{pair.difficulty}] {pair.question}")
            print(f"    -> {pair.answer}  ({', '.join(pair.tags)})")
 
 
if __name__ == "__main__":
    asyncio.run(main())

Set your key first:

export AGENTROUTE_API_KEY="sk-or-..."
python run.py

Concurrency is bounded by the provider, not the SDK

asyncio.gather launches every document at once. For large corpora, cap concurrency with an asyncio.Semaphore (see the async guide) to stay under your provider's rate limits.
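A minimal sketch of the semaphore approach, using only the standard library. The helper name gather_bounded and its limit parameter are illustrative assumptions, not part of AgentRoute; the worker is any coroutine function that takes one item.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: list[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 5,
) -> list[R]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather keeps results in the same order as `items`.
    return await asyncio.gather(*(guarded(item) for item in items))
```

With the agent from kb_builder.py, this could be called as `await gather_bounded(prompts, agent.arun, limit=5)`, where prompts is the list of formatted "Document:" strings. A sensible limit depends on your provider's rate limits, so treat 5 as a placeholder.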

A note on coordination

This example uses one specialized agent applied in parallel. For pipelines where distinct agents hand work to each other — say an extractor feeding a reviewer feeding an indexer — compose them today with plain Python: call each Agent in turn and pass one result into the next prompt.
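A minimal sketch of that hand-off, assuming each stage can be treated as a function from text to text. The stage functions below are hypothetical stand-ins so the block runs without an API key; in practice each one would wrap a call to its own Agent and format the result into the next prompt.

```python
from collections.abc import Callable

# A stage takes the previous stage's text and returns new text.
Stage = Callable[[str], str]


def run_pipeline(document: str, stages: list[Stage]) -> str:
    """Pass the output of each stage into the next one, in order."""
    current = document
    for stage in stages:
        current = stage(current)
    return current


# Hypothetical stand-ins for three agents.
def extractor(text: str) -> str:
    return f"facts({text})"


def reviewer(text: str) -> str:
    return f"reviewed({text})"


def indexer(text: str) -> str:
    return f"indexed({text})"


RESULT = run_pipeline("doc", [extractor, reviewer, indexer])
print(RESULT)
```

Because the orchestration is ordinary Python, branching, retries, or early exits between stages need no special framework support.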

Native multi-agent is on the roadmap

First-class Team/Company coordination and agent-to-agent (A2A) calls are planned but not yet available. Until then, orchestrate multiple Agent instances with ordinary Python control flow. Track progress on the platform roadmap.

Concepts used