Content moderator

This example builds a content moderator that takes a piece of user-generated text and returns a structured verdict: whether the text is safe, how severe any violation is, which policies it breaks, and what to do about it. The moderation guidelines live entirely in the agent's instructions, and the answer comes back as a typed Pydantic model instead of free text, so you can branch on result.output.is_safe in code without parsing a paragraph.

It also shows a common pattern: a second, focused agent that scans the same text for personally identifiable information (PII). Each stage has its own output model, and you compose them in plain Python.

The main concept on display is structured output — passing output=Model to get an instance of your own type back.

The moderation model

Define exactly the shape you want back. The model is the contract: the agent is told to produce it, and result.output is an instance of it.

models.py

from enum import Enum
from pydantic import BaseModel, Field
 
 
class Severity(str, Enum):
    none = "none"
    low = "low"
    medium = "medium"
    high = "high"
 
 
class Violation(BaseModel):
    category: str = Field(description="Policy category, e.g. 'harassment', 'hate', 'self-harm'")
    quote: str = Field(description="The specific span of text that triggered this violation")
    reason: str = Field(description="Why this span violates the category")
 
 
class Moderation(BaseModel):
    is_safe: bool
    severity: Severity
    violations: list[Violation]
    recommendation: str = Field(description="One of: allow, flag-for-review, block")

Using a str-valued Enum for severity keeps the output honest: the field can only hold one of the four levels, never an arbitrary string.
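To make that constraint concrete, here is a minimal sketch that assumes only Pydantic and a trimmed copy of the models above (repeated so the snippet runs on its own). It suggests how a level outside the enum is rejected at validation time, while recommendation, being a plain str, accepts any value.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Severity(str, Enum):
    none = "none"
    low = "low"
    medium = "medium"
    high = "high"


class Moderation(BaseModel):
    # Trimmed copy of the model above; violations is omitted for brevity.
    is_safe: bool
    severity: Severity
    recommendation: str


# A known level is coerced from the raw string into the enum member.
ok = Moderation(is_safe=False, severity="high", recommendation="block")

# An unknown level fails validation instead of slipping through.
try:
    Moderation(is_safe=False, severity="critical", recommendation="block")
    rejected = False
except ValidationError:
    rejected = True

# recommendation is a plain str, so an off-list value is accepted as-is.
loose = Moderation(is_safe=True, severity="none", recommendation="maybe")

print(ok.severity, rejected, loose.recommendation)
```

If you want the same guarantee for recommendation, typing it as a second enum or as typing.Literal["allow", "flag-for-review", "block"] is one option. That is a suggestion, not part of the example as written.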

The full agent

The guidelines go in instructions. The moderator agent is configured with output=Moderation, so a single run returns a fully populated Moderation instance. A second agent, pii_agent, reuses the same input to detect PII with its own PIIReport model.

content_moderator.py

import asyncio
 
from agentroute import Agent
from pydantic import BaseModel, Field
 
from models import Moderation
 
 
GUIDELINES = """
You are a content moderation system. Evaluate the user's text against these policies:
 
- harassment: targeted insults, threats, or demeaning language aimed at a person or group
- hate: dehumanizing content based on protected attributes
- self-harm: encouragement or instructions for self-harm
- violence: credible threats or glorification of violence
- sexual: explicit sexual content involving minors is always a high-severity block
 
Rules:
- Quote the exact span that triggered each violation.
- severity is the maximum across all violations ("none" if there are none).
- is_safe is true only when there are zero violations.
- recommendation must be one of: "allow", "flag-for-review", "block".
  Use "allow" for safe text, "flag-for-review" for low/medium severity,
  and "block" for high severity.
Be precise. Do not invent violations that are not clearly present.
"""
 
moderator = Agent(
    name="content-moderator",
    model="claude-sonnet-4",
    instructions=GUIDELINES,
    output=Moderation,
)
 
 
class PIIItem(BaseModel):
    kind: str = Field(description="e.g. email, phone, ssn, credit_card, address")
    value: str = Field(description="The detected value as it appears in the text")
 
 
class PIIReport(BaseModel):
    has_pii: bool
    items: list[PIIItem]
 
 
pii_agent = Agent(
    name="pii-detector",
    model="claude-sonnet-4",
    instructions=(
        "Detect any personally identifiable information in the text. "
        "Return each piece of PII with its kind and the exact value found. "
        "has_pii is true only when at least one item is present."
    ),
    output=PIIReport,
)
 
 
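async def review(text: str) -> tuple[Moderation, PIIReport]:
    # Run both stages concurrently; they are independent LLM calls.
    # gather returns results in the same order as its arguments.
    moderation_result, pii_result = await asyncio.gather(
        moderator.arun(text),
        pii_agent.arun(text),
    )
    # Unwrap each Result into its typed output model.
    return moderation_result.output, pii_result.output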
 
 
async def main() -> None:
    text = "Hey loser, I know you live at 42 Elm St — email me at jo@x.com or I'll find you."
    verdict, pii = await review(text)
 
    print(f"safe={verdict.is_safe} severity={verdict.severity.value} -> {verdict.recommendation}")
    for v in verdict.violations:
        print(f"  [{v.category}] {v.quote!r}: {v.reason}")
 
    if pii.has_pii:
        print("PII found:")
        for item in pii.items:
            print(f"  {item.kind}: {item.value}")
 
 
if __name__ == "__main__":
    asyncio.run(main())

Because both moderator and pii_agent expose arun, you can fan them out with asyncio.gather — each runs as its own LLM call, and you read the typed result off each Result.output.
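The fan-out behaviour can be tried without any API key. The sketch below uses only the standard library; fake_agent_run is a hypothetical stand-in for an agent's arun that sleeps instead of calling an LLM. It shows that asyncio.gather returns results in argument order, even when the second coroutine finishes first.

```python
import asyncio
import time


async def fake_agent_run(name: str, delay: float) -> str:
    # Hypothetical stand-in for arun(): sleeps instead of calling an LLM.
    await asyncio.sleep(delay)
    return f"{name}-result"


async def fan_out() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Both coroutines are scheduled together; results come back in
    # argument order, not completion order.
    results = await asyncio.gather(
        fake_agent_run("moderator", 0.2),
        fake_agent_run("pii", 0.1),
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed


results, elapsed = asyncio.run(fan_out())
print(results, f"{elapsed:.2f}s")
```

By default, gather propagates the first exception raised by any of its awaitables. Passing return_exceptions=True makes it return exceptions as ordinary results instead, which can be useful if one stage failing should not discard the other's output.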

Read the result, don't parse it

With output=Moderation, result.output is a Moderation instance — so verdict.severity is a Severity enum and verdict.violations is a list[Violation]. There is no JSON parsing or string matching in your code. See structured output and Result.
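As an illustration of branching on typed fields, the sketch below routes a piece of text based on the verdict and redacts any detected PII. To keep it runnable without extra dependencies, Verdict and PIIItem are dataclass stand-ins for the Pydantic models (attribute access looks the same), and handle and redact are hypothetical helper names, not part of the library.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    none = "none"
    low = "low"
    medium = "medium"
    high = "high"


@dataclass
class Verdict:
    # Stand-in for Moderation with only the fields this sketch reads.
    is_safe: bool
    severity: Severity
    recommendation: str


@dataclass
class PIIItem:
    kind: str
    value: str


def redact(text: str, items: list[PIIItem]) -> str:
    # Replace each detected value with a placeholder naming its kind.
    for item in items:
        text = text.replace(item.value, f"[{item.kind}]")
    return text


def handle(text: str, verdict: Verdict, items: list[PIIItem]) -> str:
    # Branch on typed fields; no string parsing of model output.
    if verdict.severity is Severity.high:
        return "blocked"
    cleaned = redact(text, items)
    if not verdict.is_safe:
        return f"queued for review: {cleaned}"
    return f"published: {cleaned}"


items = [PIIItem("email", "jo@x.com")]
print(handle("Mail jo@x.com today", Verdict(True, Severity.none, "allow"), items))
# prints: published: Mail [email] today
```

Comparing against Severity.high with `is` works because enum members are singletons, so a typo such as Severity.hihg fails loudly with an AttributeError rather than silently never matching.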

Run it

export AGENTROUTE_API_KEY="sk-..."
python content_moderator.py

A single OpenRouter key (AGENTROUTE_API_KEY or OPENROUTER_API_KEY) works for every model string, including claude-sonnet-4. See models for how model strings resolve.

Variations

Validate the verdict

Add an @moderator.output_validator that raises Retry("...") when the model returns is_safe=True but a non-empty violations list, forcing a self-correction. See errors and retries.
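The check itself can be written as a plain function and tested in isolation. The sketch below encodes the rules from GUIDELINES; consistency_problems is a hypothetical helper name, and SimpleNamespace objects stand in for Moderation instances. The validator's exact signature is not shown on this page, so the wiring appears only as a commented-out assumption.

```python
from types import SimpleNamespace

ALLOWED = {"allow", "flag-for-review", "block"}
# Recommendation each severity level should map to, per the guidelines.
EXPECTED = {
    "none": "allow",
    "low": "flag-for-review",
    "medium": "flag-for-review",
    "high": "block",
}


def consistency_problems(verdict) -> list[str]:
    """Return the rules a verdict breaks; an empty list means it is coherent."""
    # Accept either a Severity enum member or its raw string value.
    severity = getattr(verdict.severity, "value", verdict.severity)
    problems = []
    if verdict.is_safe and verdict.violations:
        problems.append("is_safe is true but violations is not empty")
    if not verdict.violations and severity != "none":
        problems.append("severity must be 'none' when there are no violations")
    if verdict.recommendation not in ALLOWED:
        problems.append(f"unknown recommendation {verdict.recommendation!r}")
    elif verdict.recommendation != EXPECTED[severity]:
        problems.append(f"severity {severity!r} calls for {EXPECTED[severity]!r}")
    return problems


# Assumed wiring, based on the description above (signature not verified):
# @moderator.output_validator
# def validate(verdict):
#     problems = consistency_problems(verdict)
#     if problems:
#         raise Retry("; ".join(problems))
#     return verdict

bad = SimpleNamespace(
    is_safe=True, severity="low", violations=["(a violation)"], recommendation="allow"
)
good = SimpleNamespace(
    is_safe=True, severity="none", violations=[], recommendation="allow"
)
print(consistency_problems(bad))
print(consistency_problems(good))
```

Joining the problems into one message gives the model a specific description of what to correct on the retry, rather than a generic failure.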

Synchronous single call

If you only need the moderation verdict, skip the gather and call moderator.run(text).output directly — no async required.

Concepts used

Structured output — output=Moderation and reading result.output
Agents — configuring instructions and model
Results and Result — the typed return value
Errors and retries — Retry from an output validator for self-correction
Models — how claude-sonnet-4 resolves to a provider

See the full examples index for more end-to-end agents.