OpenAI Agents SDK in Production: From Prototype to Deployed Multi-Agent System

The bottom line: The OpenAI Agents SDK (openai-agents on PyPI) ships a lightweight agent loop, handoffs, guardrails, sessions, and built-in tracing — enough to prototype a multi-agent system in an afternoon [1]. Moving to production means layering on multi-model routing (LiteLLM), persistent session stores, sandbox execution, and a decision framework for handoffs vs agents-as-tools. This guide walks through each layer with code you can deploy today.


What the SDK Gives You

The SDK evolved from the experimental Swarm library into a production-grade framework with five core primitives [1]:

PrimitiveWhat It DoesProduction Ready?
AgentLLM + instructions + toolsYes — model-agnostic via model provider interface
RunnerManages the agent loop (tool calls → LLM → done)Yes — handles interruptions, resumable state
Handoffs / Agents as ToolsTwo patterns for multi-agent coordinationYes — handoffs for ownership transfer, tools for delegation
GuardrailsInput, output, and tool-level validationYes — parallel or blocking, tripwire exceptions
SessionsPersistent memory across turnsYes — pluggable backends (SQLite, custom)
TracingBuilt-in OpenTelemetry-compatible tracingYes — free dashboard via OpenAI

The SDK intentionally keeps abstractions minimal [1]. You orchestrate agents with native Python — no YAML configs, no DSL, no framework-specific state machines. The LLM decides tool calls, handoffs, and when to stop.


Step 1: Core Agent with Function Tools

Start here. A single agent with @function_tool is the right default for 80% of use cases [1].

import asyncio
import httpx
from pydantic import BaseModel
from agents import Agent, Runner, function_tool

class StockPrice(BaseModel):
    symbol: str
    price_usd: float
    change_pct: float

@function_tool
async def get_stock_price(symbol: str) -> StockPrice:
    """Get the current stock price and daily change for a symbol."""
    async with httpx.AsyncClient(timeout=10) as client:
        r = await client.get(
            f"https://api.example-prices.com/v1/quote/{symbol}",
        )
        data = r.json()
        return StockPrice(
            symbol=data["symbol"],
            price_usd=data["price"],
            change_pct=data["changePercent"],
        )

agent = Agent(
    name="Stock Analyst",
    instructions="You are a stock analyst. Provide concise quotes with price and daily change.",
    tools=[get_stock_price],
    model="gpt-5.4-mini",
)

async def main():
    result = await Runner.run(agent, "What's AAPL doing today?")
    print(result.final_output)

asyncio.run(main())

Key details: The @function_tool decorator auto-generates JSON schemas from your function signature and validates with Pydantic at runtime [1]. Docstrings become tool descriptions — be explicit. The default model is gpt-5.4-mini [1].


Step 2: Multi-Agent Patterns — Handoffs vs Agents as Tools

This is the most common design decision when scaling beyond one agent. The SDK supports two distinct patterns [2]:

PatternOwnershipWhen to Use
HandoffsSpecialist takes over the conversationRouting to different policy domains (billing vs support), different instructions or tools needed
Agents as toolsManager stays in controlSpecialist does a bounded task (summarize, classify, extract) and manager synthesizes final answer

Handoffs — Delegating Ownership

from agents import Agent, handoff

billing_agent = Agent(
    name="Billing",
    instructions="Handle billing questions, payment history, and invoices.",
)

refund_agent = Agent(
    name="Refund",
    instructions="Process refund requests. Ask for order ID and reason.",
)

triage_agent = Agent(
    name="Triage",
    instructions="Route customers to the right specialist based on their question.",
    handoffs=[billing_agent, handoff(refund_agent)],
)

result = await Runner.run(triage_agent, "I need a refund on order ABC-123.")
print(result.last_agent.name)  # Refund

The LLM decides when to hand off. result.last_agent tells you which agent produced the final output. Handoffs pass full conversation history to the target agent — the specialist has full context [2].

Agents as Tools — Manager-Style Delegation

from agents import Agent

summarizer = Agent(
    name="Summarizer",
    instructions="Generate a concise summary of the supplied text. 2-3 sentences.",
)

researcher = Agent(
    name="Research assistant",
    tools=[
        summarizer.as_tool(
            tool_name="summarize_text",
            tool_description="Summarize the provided text concisely.",
        )
    ],
)

result = await Runner.run(researcher, "Summarize this earnings report: ...")
# Manager keeps control, calls summarizer as a tool

Rule of thumb: Start with one agent. Split only when a branch needs different instructions, tools, or policy [2]. Premature splitting creates more prompts, more traces, and more approval surfaces without improving outcomes.


Step 3: Guardrails — Input, Output, and Tool-Level

Guardrails prevent bad inputs from reaching expensive models and catch problematic outputs before they reach users [3].

from pydantic import BaseModel
from agents import (
    Agent, GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    Runner, input_guardrail,
)

class TopicCheck(BaseModel):
    is_on_topic: bool
    reasoning: str

guardrail_checker = Agent(
    name="Topic Guardrail",
    instructions="Check if the user is asking about stock market or finance topics.",
    output_type=TopicCheck,
)

@input_guardrail
async def finance_only_guardrail(ctx, agent, input_data):
    result = await Runner.run(guardrail_checker, input_data)
    return GuardrailFunctionOutput(
        tripwire_triggered=not result.final_output.is_on_topic,
    )

agent = Agent(
    name="Stock Analyst",
    instructions="You help with stock market questions.",
    input_guardrails=[finance_only_guardrail],
)

Execution modes [3]:

  • Parallel (default, run_in_parallel=True): Runs guardrail concurrently with agent. Best latency but agent may consume tokens before guardrail completes.
  • Blocking (run_in_parallel=False): Runs before agent starts. Prevents wasted tokens on out-of-domain inputs.

Tool guardrails fire on every custom function_tool invocation — use these when multi-agent workflows need checks on tool arguments across handoffs and delegated calls [3].


Step 4: Sessions and Persistent Memory

The SDK has a pluggable session system for maintaining state across turns [1].

from agents import Agent, Runner, SQLiteSession

session = SQLiteSession("user_123", db_path="./sessions.db")
agent = Agent(
    name="Support Agent",
    instructions="You are a support agent. Reference previous context from the session.",
)

# First turn
result = await Runner.run(agent, "I need help with my order.", session=session)

# Second turn — session preserves context
result = await Runner.run(agent, "What was my order number?", session=session)

The SQLiteSession backend persists to disk, survives restarts, and is thread-safe. For distributed deployments, implement the Session interface with Redis or PostgreSQL — the SDK provides the abstraction [1].


Step 5: Production Tracing and Monitoring

Tracing is enabled by default and sends traces to OpenAI’s dashboard [1]. For production, you’ll want to export traces to your own observability stack.

import os
os.environ["OPENAI_AGENTS_DISABLE_TRACING"] = "0"  # default

# Sentry integration (if using sentry-sdk[openai-agents])
import sentry_sdk
sentry_sdk.init(dsn=os.environ["SENTRY_DSN"])

What tracing captures [1]:

  • Agent runs (start → tool calls → handoffs → completion)
  • Guardrail evaluations
  • Tool invocation timing and results
  • Token usage per turn

For self-hosted tracing, configure the exporter to point at your OpenTelemetry collector:

from agents.tracing import set_tracing_processor
from agents.tracing.processors import OTLPTraceExporter

set_tracing_processor(
    OTLPTraceExporter(
        endpoint="http://your-otel-collector:4318/v1/traces"
    )
)

Production metrics to track [4]:

MetricWhat It Tells YouAlert Threshold
Average agent turns per sessionWorkflow complexityUnbounded growth → stuck loop
Guardrail trip rateAbuse patterns or prompt confusion>5% of total sessions
Handoff success rateAgent routing accuracy<80% → retrain prompts
Token cost per agent runCost per workflowWeekly trend, not daily spike
p95 time-to-completionUser experience>30s → investigate

Step 6: Multi-Model Routing with LiteLLM

The SDK supports custom model providers. Use LiteLLM to route different agents to different models — a common pattern is GPT-5.4-mini for triage/guardrails and Claude Opus 4.7 for writing/code tasks [5].

from agents import Model, Agent, Runner, set_default_model
from agents.models.litellm import LiteLLMModel

# Create model instances
fast_model = LiteLLMModel(
    model="gpt-5.4-mini",
    temperature=0.1,
)

quality_model = LiteLLMModel(
    model="claude-opus-4-7",
    temperature=0.3,
)

# Route agents to appropriate models
triage = Agent(
    name="Triage",
    instructions="Route requests to the right specialist.",
    model=fast_model,  # Cheap, fast
)

writer = Agent(
    name="Writer",
    instructions="Write detailed responses.",
    model=quality_model,  # Expensive, high quality
)

From the April 2026 update [6], the SDK also supports sandbox agents — controlled workspaces with isolated file systems, manifest-defined mounts, and resumable sessions. This is the production path for agents that write code, process files, or need a reproducible environment.


Step 7: Production Checklist

Before deploying a multi-agent system built on the OpenAI Agents SDK:

CheckImplementationWhy
Guardrails on every public inputInput guardrail + tool guardrailsBlocks injection and off-topic queries before they cost money [3]
Session persistenceSQLiteSession or Redis-backedSurvives restarts, supports multi-turn conversations [1]
Model cost isolationCheap model for triage/guardrails, expensive model for generationCuts per-session cost by 60–80% [5]
Timeout per agent runRunner.run(..., max_turns=25)Prevents infinite loops and runaway costs
Tracing exportedOTLP exporter to OpenTelemetry collectorDebug without OpenAI dashboard dependency [1]
Handoff boundaries documentedhandoff_description on every handoffKeeps routing legible as system grows [2]
Output validation on final agentStructured outputs via output_typeGuarantees consumers get valid data [1]
Error recoveryTry/except on Runner.run(), retry with exponential backoffHandles API errors, rate limits, transient failures

When Not to Use the SDK

The SDK is not always the right choice [1]:

Use the SDK when…Use the Responses API directly when…
You want the runtime to manage turns, tool execution, guardrailsYou own the loop and state yourself
Workflow spans multiple coordinated stepsSingle model call + tool is enough
Agents produce artifacts or need resumable executionShort-lived workflows returning model’s response

You can mix both: use the SDK for managed workflows and call the Responses API directly for lower-level paths [1].


Summary

The OpenAI Agents SDK gives you a production-ready foundation in roughly 50 lines of Python. The critical layers you add yourself are:

  1. Multi-agent topology — handoffs for ownership transfer, agents-as-tools for bounded delegation [2]
  2. Guardrails at every boundary — input (blocking), output, and tool-level checks [3]
  3. Persistent sessions — SQLite for single-node, Redis for distributed [1]
  4. Multi-model routing — LiteLLM for cost-optimized agent assignments [5]
  5. Tracing export — OTLP to your observability stack [1]

The SDK’s minimal abstraction philosophy means you’re writing Python, not framework code. That makes it easier to debug, easier to test, and easier to migrate when the next framework arrives.


References

[1] OpenAI, “Agents SDK Documentation,” 2026. https://openai.github.io/openai-agents-python/

[2] OpenAI, “Orchestration and Handoffs — Agents SDK,” 2026. https://developers.openai.com/api/docs/guides/agents/orchestration

[3] OpenAI, “Guardrails — Agents SDK,” 2026. https://openai.github.io/openai-agents-python/guardrails/

[4] OpenAI, “Tracing — Agents SDK,” 2026. https://openai.github.io/openai-agents-python/tracing/

[5] Tech Insider, “OpenAI Agents SDK Tutorial: 13 Steps,” May 2026. https://tech-insider.org/openai-agents-sdk-tutorial-python-13-steps-2026/

[6] OpenAI, “The Next Evolution of the Agents SDK,” April 2026. https://openai.com/index/the-next-evolution-of-the-agents-sdk/

  • ToolBrain — tool reviews, LLM comparisons, and AI workflow guides
  • NoCode Insider — AI workflow automation with no-code tools, agents, and APIs

Cross-links automatically generated from NiteAgent.

← Back to all posts