Code reviewer

A code-review agent that returns a structured verdict — overall score, summary, typed issues, and strengths — using a Pydantic output model.


This example builds an agent that reviews a code snippet against a fixed set of review standards and returns a fully structured verdict instead of free-form prose. The review standards live in instructions, and the shape of the result is defined by a Pydantic model passed as output. Because output is a Pydantic model class, result.output is a validated instance of it: you can iterate over its issues, filter them by severity, and build a CI gate on the result.

It demonstrates structured output: nested Pydantic models (Review containing a list[Issue]), the output= parameter on Agent, and reading the typed object off Result.

The agent

The output schema is the contract. We define an Issue model for a single finding and a Review model that aggregates the findings plus an overall score and summary. Passing output=Review puts the agent in structured mode — the model is steered to emit data matching that schema, and AgentRoute validates the response into a Review instance before handing it back.

code_reviewer.py
from enum import Enum
 
from pydantic import BaseModel, Field
 
from agentroute import Agent
 
 
class Severity(str, Enum):
    critical = "critical"
    major = "major"
    minor = "minor"
    nit = "nit"
 
 
class Issue(BaseModel):
    severity: Severity = Field(description="How serious this issue is.")
    category: str = Field(description="e.g. correctness, security, performance, style.")
    description: str = Field(description="What is wrong and why it matters.")
    suggestion: str = Field(description="A concrete fix the author can apply.")
 
 
class Review(BaseModel):
    overall_score: int = Field(ge=0, le=100, description="Quality score from 0 to 100.")
    summary: str = Field(description="One-paragraph verdict.")
    issues: list[Issue] = Field(default_factory=list)
    strengths: list[str] = Field(default_factory=list)
 
 
reviewer = Agent(
    name="code-reviewer",
    model="claude-sonnet-4",
    instructions=(
        "You are a senior engineer doing a code review. Review standards:\n"
        "- Correctness first: flag bugs, edge cases, and incorrect error handling.\n"
        "- Security: flag injection, unsafe input handling, and secret leakage.\n"
        "- Performance: flag obvious inefficiencies, not micro-optimizations.\n"
        "- Style: follow PEP 8; prefer clear names and small functions.\n"
        "Be specific and actionable. Cite the exact construct you are critiquing. "
        "Score 90+ only for code you would merge as-is; 70-89 for code needing "
        "minor changes; below 70 for code with real defects. Also call out genuine "
        "strengths so the author knows what to keep."
    ),
    output=Review,
)
 
 
SNIPPET = '''
def get_user(db, user_id):
    query = "SELECT * FROM users WHERE id = " + user_id
    row = db.execute(query).fetchone()
    return row["name"]
'''
 
 
def main() -> None:
    result = reviewer.run(f"Review this Python function:\n\n{SNIPPET}")
    review: Review = result.output  # a validated Review instance
 
    print(f"Score: {review.overall_score}/100")
    print(f"Summary: {review.summary}\n")
 
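    # The severity values happen to sort alphabetically from most to least
    # severe (critical < major < minor < nit), so a plain sort on .value works.
    for issue in sorted(review.issues, key=lambda i: i.severity.value):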
        print(f"[{issue.severity.value.upper()}] {issue.category}: {issue.description}")
        print(f"  fix: {issue.suggestion}")
 
    if review.strengths:
        print("\nStrengths:")
        for strength in review.strengths:
            print(f"  + {strength}")
 
 
if __name__ == "__main__":
    main()
The score is just an int

Because overall_score is a plain int with ge=0, le=100, you can gate a pull request on it directly — for example, fail CI when review.overall_score < 80 or when any issue has severity == Severity.critical.
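
The gate described above can be sketched as a small function. This is a minimal illustration, not part of AgentRoute: gate_failures and MIN_SCORE are hypothetical names, and the dataclasses are stand-ins for the Pydantic Review and Issue models, reduced to the fields the gate reads so the sketch runs without an API call.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(str, Enum):
    critical = "critical"
    major = "major"
    minor = "minor"
    nit = "nit"


# Stand-ins for the Pydantic models in code_reviewer.py (assumption: the gate
# only needs overall_score, issues, severity, and description).
@dataclass
class Issue:
    severity: Severity
    description: str


@dataclass
class Review:
    overall_score: int
    issues: list[Issue] = field(default_factory=list)


MIN_SCORE = 80  # hypothetical threshold; tune it per repository


def gate_failures(review, min_score: int = MIN_SCORE) -> list[str]:
    """Return the reasons a review should block a merge (empty list = pass)."""
    reasons = []
    if review.overall_score < min_score:
        reasons.append(f"score {review.overall_score} is below {min_score}")
    for issue in review.issues:
        if issue.severity == Severity.critical:
            reasons.append(f"critical issue: {issue.description}")
    return reasons


# In code_reviewer.py this object would be result.output.
review = Review(
    overall_score=35,
    issues=[
        Issue(Severity.critical, "SQL built by string concatenation"),
        Issue(Severity.major, "fetchone() result is not checked for None"),
    ],
)

failures = gate_failures(review)
for reason in failures:
    print(f"FAIL: {reason}")

# A CI script would finish with sys.exit(exit_code).
exit_code = 1 if failures else 0
```

Returning the list of reasons rather than a bare boolean lets the CI log say why the gate tripped. Because the function only reads attributes, it should accept the real result.output unchanged.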

Run it

Set your key, then run the script:

export AGENTROUTE_API_KEY=sk-...
python code_reviewer.py

One OpenRouter-style key works for every model string; see models for how claude-sonnet-4 is resolved.

Example output:

Score: 35/100
Summary: This function has a critical SQL injection vulnerability and a likely runtime error. It needs to be rewritten before merging.
 
[CRITICAL] security: user_id is concatenated directly into the SQL string, allowing SQL injection.
  fix: Use a parameterized query, e.g. db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).
[MAJOR] correctness: fetchone() returns None when no row matches, so row["name"] will raise a TypeError.
  fix: Guard for a missing row and return None or raise a domain-specific error.
 
Strengths:
  + The function has a single, clear responsibility.

The exact numbers and wording vary per run, but result.output is always a valid Review — every field is present and typed, so downstream code never has to parse model prose.
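
To make "valid" concrete, the sketch below runs the same schema against two hand-written payloads using plain Pydantic. It assumes Pydantic v2 (model_validate) and does not show AgentRoute's internals; the payloads are invented for illustration, and the Field descriptions are omitted for brevity.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class Severity(str, Enum):
    critical = "critical"
    major = "major"
    minor = "minor"
    nit = "nit"


class Issue(BaseModel):
    severity: Severity
    category: str
    description: str
    suggestion: str


class Review(BaseModel):
    overall_score: int = Field(ge=0, le=100)
    summary: str
    issues: list[Issue] = Field(default_factory=list)
    strengths: list[str] = Field(default_factory=list)


# A well-formed payload: plain strings and dicts become typed objects,
# and the omitted strengths field falls back to an empty list.
good_payload = {
    "overall_score": 35,
    "summary": "SQL injection and an unchecked fetchone() result.",
    "issues": [
        {
            "severity": "critical",
            "category": "security",
            "description": "user_id is concatenated into the SQL string.",
            "suggestion": "Use a parameterized query.",
        }
    ],
}
review = Review.model_validate(good_payload)
print(type(review.issues[0].severity).__name__, review.strengths)

# A malformed payload: the score is out of range and the severity label
# is not a member of the Severity enum.
bad_payload = {
    "overall_score": 140,
    "summary": "Looks fine.",
    "issues": [
        {
            "severity": "blocker",
            "category": "style",
            "description": "Naming is inconsistent.",
            "suggestion": "Rename for clarity.",
        }
    ],
}
try:
    Review.model_validate(bad_payload)
    error_locations = []
except ValidationError as exc:
    # Each error records the path to the offending field.
    error_locations = [err["loc"] for err in exc.errors()]

print(error_locations)
```

The second payload is rejected as a whole, with one error per offending field, which is why code that receives a Review instance can rely on the score range and the severity values without re-checking them.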

Concepts used

For output-side validation that asks the model to retry a bad answer, see the output_validator pattern with Retry. Browse more in the examples index.