OpenAI Agents SDK in Production: From Prototype to Deployed Multi-Agent System
The bottom line: The OpenAI Agents SDK (openai-agents on PyPI) ships a lightweight agent loop, handoffs, guardrails, sessions, and built-in tracing — enough to prototype a multi-agent system in an afternoon [1]. Moving to production means layering on multi-model routing (LiteLLM), persistent session stores, sandbox execution, and a decision framework for handoffs vs agents-as-tools. This guide walks through each layer with code you can deploy today.
What the SDK Gives You
The SDK evolved from the experimental Swarm library into a production-grade framework with five core primitives [1]:
| Primitive | What It Does | Production Ready? |
|---|---|---|
| Agent | LLM + instructions + tools | Yes — model-agnostic via model provider interface |
| Runner | Manages the agent loop (tool calls → LLM → done) | Yes — handles interruptions, resumable state |
| Handoffs / Agents as Tools | Two patterns for multi-agent coordination | Yes — handoffs for ownership transfer, tools for delegation |
| Guardrails | Input, output, and tool-level validation | Yes — parallel or blocking, tripwire exceptions |
| Sessions | Persistent memory across turns | Yes — pluggable backends (SQLite, custom) |
| Tracing | Built-in OpenTelemetry-compatible tracing | Yes — free dashboard via OpenAI |
The SDK intentionally keeps abstractions minimal [1]. You orchestrate agents with native Python — no YAML configs, no DSL, no framework-specific state machines. The LLM decides tool calls, handoffs, and when to stop.
Step 1: Core Agent with Function Tools
Start here. A single agent with @function_tool is the right default for 80% of use cases [1].
import asyncio
import httpx
from pydantic import BaseModel
from agents import Agent, Runner, function_tool
class StockPrice(BaseModel):
symbol: str
price_usd: float
change_pct: float
@function_tool
async def get_stock_price(symbol: str) -> StockPrice:
"""Get the current stock price and daily change for a symbol."""
async with httpx.AsyncClient(timeout=10) as client:
r = await client.get(
f"https://api.example-prices.com/v1/quote/{symbol}",
)
data = r.json()
return StockPrice(
symbol=data["symbol"],
price_usd=data["price"],
change_pct=data["changePercent"],
)
agent = Agent(
name="Stock Analyst",
instructions="You are a stock analyst. Provide concise quotes with price and daily change.",
tools=[get_stock_price],
model="gpt-5.4-mini",
)
async def main():
result = await Runner.run(agent, "What's AAPL doing today?")
print(result.final_output)
asyncio.run(main())
Key details: The @function_tool decorator auto-generates JSON schemas from your function signature and validates with Pydantic at runtime [1]. Docstrings become tool descriptions — be explicit. The default model is gpt-5.4-mini [1].
Step 2: Multi-Agent Patterns — Handoffs vs Agents as Tools
This is the most common design decision when scaling beyond one agent. The SDK supports two distinct patterns [2]:
| Pattern | Ownership | When to Use |
|---|---|---|
| Handoffs | Specialist takes over the conversation | Routing to different policy domains (billing vs support), different instructions or tools needed |
| Agents as tools | Manager stays in control | Specialist does a bounded task (summarize, classify, extract) and manager synthesizes final answer |
Handoffs — Delegating Ownership
from agents import Agent, handoff
billing_agent = Agent(
name="Billing",
instructions="Handle billing questions, payment history, and invoices.",
)
refund_agent = Agent(
name="Refund",
instructions="Process refund requests. Ask for order ID and reason.",
)
triage_agent = Agent(
name="Triage",
instructions="Route customers to the right specialist based on their question.",
handoffs=[billing_agent, handoff(refund_agent)],
)
result = await Runner.run(triage_agent, "I need a refund on order ABC-123.")
print(result.last_agent.name) # Refund
The LLM decides when to hand off. result.last_agent tells you which agent produced the final output. Handoffs pass full conversation history to the target agent — the specialist has full context [2].
Agents as Tools — Manager-Style Delegation
from agents import Agent
summarizer = Agent(
name="Summarizer",
instructions="Generate a concise summary of the supplied text. 2-3 sentences.",
)
researcher = Agent(
name="Research assistant",
tools=[
summarizer.as_tool(
tool_name="summarize_text",
tool_description="Summarize the provided text concisely.",
)
],
)
result = await Runner.run(researcher, "Summarize this earnings report: ...")
# Manager keeps control, calls summarizer as a tool
Rule of thumb: Start with one agent. Split only when a branch needs different instructions, tools, or policy [2]. Premature splitting creates more prompts, more traces, and more approval surfaces without improving outcomes.
Step 3: Guardrails — Input, Output, and Tool-Level
Guardrails prevent bad inputs from reaching expensive models and catch problematic outputs before they reach users [3].
from pydantic import BaseModel
from agents import (
Agent, GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
Runner, input_guardrail,
)
class TopicCheck(BaseModel):
is_on_topic: bool
reasoning: str
guardrail_checker = Agent(
name="Topic Guardrail",
instructions="Check if the user is asking about stock market or finance topics.",
output_type=TopicCheck,
)
@input_guardrail
async def finance_only_guardrail(ctx, agent, input_data):
result = await Runner.run(guardrail_checker, input_data)
return GuardrailFunctionOutput(
tripwire_triggered=not result.final_output.is_on_topic,
)
agent = Agent(
name="Stock Analyst",
instructions="You help with stock market questions.",
input_guardrails=[finance_only_guardrail],
)
Execution modes [3]:
- Parallel (default,
run_in_parallel=True): Runs guardrail concurrently with agent. Best latency but agent may consume tokens before guardrail completes. - Blocking (
run_in_parallel=False): Runs before agent starts. Prevents wasted tokens on out-of-domain inputs.
Tool guardrails fire on every custom function_tool invocation — use these when multi-agent workflows need checks on tool arguments across handoffs and delegated calls [3].
Step 4: Sessions and Persistent Memory
The SDK has a pluggable session system for maintaining state across turns [1].
from agents import Agent, Runner, SQLiteSession
session = SQLiteSession("user_123", db_path="./sessions.db")
agent = Agent(
name="Support Agent",
instructions="You are a support agent. Reference previous context from the session.",
)
# First turn
result = await Runner.run(agent, "I need help with my order.", session=session)
# Second turn — session preserves context
result = await Runner.run(agent, "What was my order number?", session=session)
The SQLiteSession backend persists to disk, survives restarts, and is thread-safe. For distributed deployments, implement the Session interface with Redis or PostgreSQL — the SDK provides the abstraction [1].
Step 5: Production Tracing and Monitoring
Tracing is enabled by default and sends traces to OpenAI’s dashboard [1]. For production, you’ll want to export traces to your own observability stack.
import os
os.environ["OPENAI_AGENTS_DISABLE_TRACING"] = "0" # default
# Sentry integration (if using sentry-sdk[openai-agents])
import sentry_sdk
sentry_sdk.init(dsn=os.environ["SENTRY_DSN"])
What tracing captures [1]:
- Agent runs (start → tool calls → handoffs → completion)
- Guardrail evaluations
- Tool invocation timing and results
- Token usage per turn
For self-hosted tracing, configure the exporter to point at your OpenTelemetry collector:
from agents.tracing import set_tracing_processor
from agents.tracing.processors import OTLPTraceExporter
set_tracing_processor(
OTLPTraceExporter(
endpoint="http://your-otel-collector:4318/v1/traces"
)
)
Production metrics to track [4]:
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| Average agent turns per session | Workflow complexity | Unbounded growth → stuck loop |
| Guardrail trip rate | Abuse patterns or prompt confusion | >5% of total sessions |
| Handoff success rate | Agent routing accuracy | <80% → retrain prompts |
| Token cost per agent run | Cost per workflow | Weekly trend, not daily spike |
| p95 time-to-completion | User experience | >30s → investigate |
Step 6: Multi-Model Routing with LiteLLM
The SDK supports custom model providers. Use LiteLLM to route different agents to different models — a common pattern is GPT-5.4-mini for triage/guardrails and Claude Opus 4.7 for writing/code tasks [5].
from agents import Model, Agent, Runner, set_default_model
from agents.models.litellm import LiteLLMModel
# Create model instances
fast_model = LiteLLMModel(
model="gpt-5.4-mini",
temperature=0.1,
)
quality_model = LiteLLMModel(
model="claude-opus-4-7",
temperature=0.3,
)
# Route agents to appropriate models
triage = Agent(
name="Triage",
instructions="Route requests to the right specialist.",
model=fast_model, # Cheap, fast
)
writer = Agent(
name="Writer",
instructions="Write detailed responses.",
model=quality_model, # Expensive, high quality
)
From the April 2026 update [6], the SDK also supports sandbox agents — controlled workspaces with isolated file systems, manifest-defined mounts, and resumable sessions. This is the production path for agents that write code, process files, or need a reproducible environment.
Step 7: Production Checklist
Before deploying a multi-agent system built on the OpenAI Agents SDK:
| Check | Implementation | Why |
|---|---|---|
| Guardrails on every public input | Input guardrail + tool guardrails | Blocks injection and off-topic queries before they cost money [3] |
| Session persistence | SQLiteSession or Redis-backed | Survives restarts, supports multi-turn conversations [1] |
| Model cost isolation | Cheap model for triage/guardrails, expensive model for generation | Cuts per-session cost by 60–80% [5] |
| Timeout per agent run | Runner.run(..., max_turns=25) | Prevents infinite loops and runaway costs |
| Tracing exported | OTLP exporter to OpenTelemetry collector | Debug without OpenAI dashboard dependency [1] |
| Handoff boundaries documented | handoff_description on every handoff | Keeps routing legible as system grows [2] |
| Output validation on final agent | Structured outputs via output_type | Guarantees consumers get valid data [1] |
| Error recovery | Try/except on Runner.run(), retry with exponential backoff | Handles API errors, rate limits, transient failures |
When Not to Use the SDK
The SDK is not always the right choice [1]:
| Use the SDK when… | Use the Responses API directly when… |
|---|---|
| You want the runtime to manage turns, tool execution, guardrails | You own the loop and state yourself |
| Workflow spans multiple coordinated steps | Single model call + tool is enough |
| Agents produce artifacts or need resumable execution | Short-lived workflows returning model’s response |
You can mix both: use the SDK for managed workflows and call the Responses API directly for lower-level paths [1].
Summary
The OpenAI Agents SDK gives you a production-ready foundation in roughly 50 lines of Python. The critical layers you add yourself are:
- Multi-agent topology — handoffs for ownership transfer, agents-as-tools for bounded delegation [2]
- Guardrails at every boundary — input (blocking), output, and tool-level checks [3]
- Persistent sessions — SQLite for single-node, Redis for distributed [1]
- Multi-model routing — LiteLLM for cost-optimized agent assignments [5]
- Tracing export — OTLP to your observability stack [1]
The SDK’s minimal abstraction philosophy means you’re writing Python, not framework code. That makes it easier to debug, easier to test, and easier to migrate when the next framework arrives.
References
[1] OpenAI, “Agents SDK Documentation,” 2026. https://openai.github.io/openai-agents-python/
[2] OpenAI, “Orchestration and Handoffs — Agents SDK,” 2026. https://developers.openai.com/api/docs/guides/agents/orchestration
[3] OpenAI, “Guardrails — Agents SDK,” 2026. https://openai.github.io/openai-agents-python/guardrails/
[4] OpenAI, “Tracing — Agents SDK,” 2026. https://openai.github.io/openai-agents-python/tracing/
[5] Tech Insider, “OpenAI Agents SDK Tutorial: 13 Steps,” May 2026. https://tech-insider.org/openai-agents-sdk-tutorial-python-13-steps-2026/
[6] OpenAI, “The Next Evolution of the Agents SDK,” April 2026. https://openai.com/index/the-next-evolution-of-the-agents-sdk/
📖 Related Reads
- ToolBrain — tool reviews, LLM comparisons, and AI workflow guides
- NoCode Insider — AI workflow automation with no-code tools, agents, and APIs
Cross-links automatically generated from NiteAgent.
← Back to all posts