Containerizing AI Agent Services with Docker: A Production Guide

Shipping an AI agent as a script is easy. Shipping one that survives production traffic, restarts cleanly, and doesn’t leak API keys or eat all your disk — that’s the hard part. Docker gives you a repeatable, isolated, and observable runtime for agent services. This guide walks through the patterns that actually work in production, not the tutorial examples.

Prerequisites

Docker Engine 24+ installed (docker --version)
A Python 3.11+ agent project with at least one tool call (any LLM SDK)
Basic familiarity with Dockerfiles and docker compose
A container registry account (Docker Hub, GHCR, or any OCI-compliant registry)

Step 1: Design the Dockerfile with Multi-Stage Builds

A single-stage Dockerfile for an agent service can balloon to 2GB+: pip compiles some dependencies from source, certain ML packages pull in CUDA libraries, and the image carries build tooling that the runtime doesn’t need. A multi-stage build solves this.

# Stage 1: Build dependencies
FROM python:3.11-slim AS builder

WORKDIR /app
COPY requirements.txt .

RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: Runtime — minimal image
FROM python:3.11-slim AS runtime

RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

COPY agent_service/ ./agent_service/
COPY config/ ./config/

EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

ENTRYPOINT ["python", "-m", "agent_service.main"]

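The builder stage installs the Python dependencies, compiling any that lack prebuilt wheels. The runtime stage copies only the installed packages from /root/.local plus your application code, so build-time leftovers stay out of the final image. For a typical agent service with the OpenAI, Anthropic, and LangChain SDKs, the final image comes in at around 450MB instead of 2.1GB.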

Step 2: Manage Tool Dependencies Carefully

Agent tools often pull in heavy dependencies: Playwright (browsers), Pillow (image processing), pandas (dataframes), unstructured (document parsing). Each one multiplies image size and attack surface.

Pattern: Optional extras via build arguments

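# Only install browser tooling when needed (place this in the runtime stage,
# after PATH includes /root/.local/bin)
ARG INSTALL_BROWSER=false

# --with-deps also installs the system libraries Chromium needs
RUN if [ "$INSTALL_BROWSER" = "true" ]; then \
    pip install --user --no-cache-dir playwright && \
    python -m playwright install --with-deps chromium; \
    fi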

Pattern: Separate tool containers

For heavy tools (browser automation, code execution sandboxes), run them as sidecar containers and communicate over HTTP or Unix sockets. The main agent container stays lean.

# docker-compose.yml excerpt
services:
  agent:
    build: .
    ports: ["8080:8080"]
    environment:
      - BROWSER_TOOL_URL=http://browser-tool:9222
      - CODE_TOOL_URL=http://code-sandbox:50051

  browser-tool:
    image: ghcr.io/your-org/browser-agent-tool:latest
    restart: unless-stopped

  code-sandbox:
    image: ghcr.io/your-org/code-exec-sandbox:latest
    read_only: true
    security_opt:
      - no-new-privileges:true
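
On the agent side, talking to a sidecar is just an HTTP call against the URL injected through the environment. The sketch below uses only the standard library and assumes the sidecar accepts JSON over POST; the /render path, the payload shape, and the function names are hypothetical, since the sidecar's API is not specified here.

```python
import json
import os
import urllib.request
from typing import Optional

# Fallback matches the BROWSER_TOOL_URL value from the compose excerpt.
DEFAULT_BROWSER_TOOL_URL = "http://browser-tool:9222"


def build_tool_request(
    path: str, payload: dict, base_url: Optional[str] = None
) -> urllib.request.Request:
    """Build a JSON POST request for a sidecar tool endpoint (hypothetical API)."""
    base = base_url or os.environ.get("BROWSER_TOOL_URL", DEFAULT_BROWSER_TOOL_URL)
    return urllib.request.Request(
        url=f"{base.rstrip('/')}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply; raises on HTTP errors or timeout."""
    request = build_tool_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing an explicit timeout matters here: a hung browser sidecar should fail one tool call, not stall the whole agent turn.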

[1] Docker multi-stage builds for Python — https://docs.docker.com/build/building/multi-stage/

Step 3: Configure the Agent Through Environment Variables

Hardcoding API keys, model names, and endpoint URLs in your agent code is the most common production mistake. Docker’s environment-based configuration pattern solves it cleanly.

# config/settings.py
import os
from pydantic_settings import BaseSettings

class AgentSettings(BaseSettings):
    model_config = {"env_prefix": "AGENT_"}

    llm_provider: str = "openai"
    openai_api_key: str = ""
    anthropic_api_key: str = ""
    model_name: str = "gpt-4o"
    max_tokens_per_call: int = 4096
    tool_timeout_seconds: int = 30
    log_level: str = "INFO"

    # Resource limits
    max_concurrent_tools: int = 5
    rate_limit_rpm: int = 60

    # Observability
    otel_endpoint: str = "http://otel-collector:4318"
    enable_tracing: bool = True

settings = AgentSettings()

Docker Compose then passes these values in as environment variables:

services:
  agent:
    build: .
    environment:
      - AGENT_LLM_PROVIDER=anthropic
      - AGENT_ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - AGENT_MODEL_NAME=claude-sonnet-4-20250514
      - AGENT_MAX_TOKENS_PER_CALL=8192
      - AGENT_LOG_LEVEL=DEBUG
      - AGENT_OTEL_ENDPOINT=http://otel-collector:4318
    env_file:
      - .env.production

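Never bake secrets into the image. Always inject them at runtime through environment variables or a secrets manager. For production, Docker secrets or HashiCorp Vault are better than .env files. Note that docker secret create requires Swarm mode, while Compose can mount file-based secrets through its top-level secrets key.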
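
Docker mounts each secret as a file under /run/secrets inside the container, so the agent needs a way to read a key from a file rather than from an environment variable. The helper below is a minimal sketch of that lookup with an environment fallback for local development; read_secret and the fallback naming rule are assumptions for illustration, not part of any library.

```python
import os
from pathlib import Path
from typing import Optional

# Default location where Docker mounts secrets inside a container.
SECRETS_DIR = Path("/run/secrets")


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> Optional[str]:
    """Return the secret from a mounted file, else from the environment, else None."""
    secret_file = secrets_dir / name
    if secret_file.is_file():
        # Secret files often end with a trailing newline; strip it.
        return secret_file.read_text(encoding="utf-8").strip()
    # Fallback for local development: same name, upper-cased, as an env variable.
    return os.environ.get(name.upper())
```

If you already use pydantic-settings, its secrets_dir configuration option offers a similar file-based lookup without a custom helper.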

[2] Pydantic Settings — https://docs.pydantic.dev/latest/concepts/pydantic_settings/

Step 4: Add Production-Grade Health Checks

A health endpoint is not optional. Orchestrators (Docker Swarm, Kubernetes, Nomad) use it to know when your agent is actually ready to serve requests.

# agent_service/health.py
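import time
from fastapi import FastAPI
from pydantic import BaseModel

# check_llm_connectivity() and registered_tools() are application-specific
# helpers, assumed to be defined elsewhere in agent_service.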

app = FastAPI()
start_time = time.time()

class HealthStatus(BaseModel):
    status: str
    uptime_seconds: float
    llm_available: bool
    tool_count: int
    last_eval_score: float | None = None

@app.get("/health")
async def health():
    # Quick connectivity check to LLM provider
    llm_ok = await check_llm_connectivity(timeout=2)
    return HealthStatus(
        status="healthy" if llm_ok else "degraded",
        uptime_seconds=time.time() - start_time,
        llm_available=llm_ok,
        tool_count=len(registered_tools()),
    )

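The Dockerfile HEALTHCHECK from Step 1 calls this endpoint every 30 seconds, and three consecutive failures mark the container as unhealthy. What happens next depends on the orchestrator: Docker Swarm replaces unhealthy tasks, whereas a standalone Docker Engine only reports the status and does not restart the container on its own. Kubernetes ignores HEALTHCHECK and needs its own probes pointed at /health. Because the handler returns HTTP 200 even when the status is "degraded", curl -f fails only when the service itself stops responding.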
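
The check_llm_connectivity helper called in the handler is not defined in this guide. One way to build it is a generic wrapper that runs any async probe under a hard timeout and converts every failure into False, so the health endpoint itself never hangs or raises. The name check_with_timeout and the probe argument are hypothetical; the probe would be whatever cheap request your LLM SDK offers.

```python
import asyncio
from typing import Awaitable, Callable


async def check_with_timeout(
    probe: Callable[[], Awaitable[object]], timeout: float = 2.0
) -> bool:
    """Return True if the probe finishes within the timeout without raising."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        return True
    except Exception:
        # Timeouts and provider or network errors all count as "not available".
        return False
```

Keep the probe cheap, and consider caching its result for a few seconds: a health check that fires every 30 seconds should not consume meaningful provider quota.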

Step 5: Structured Logging for Agent Debugging

print() statements don’t scale. Agent debugging requires tracing individual turns: which tool was called, what arguments were passed, what the LLM responded, how long the tool took.

# agent_service/logging_setup.py
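import logging
import sys

import structlog

def setup_agent_logging(level: int = logging.INFO):
    # Route stdlib logging to stdout; without this, INFO lines are dropped
    # because the root logger defaults to WARNING.
    logging.basicConfig(format="%(message)s", stream=sys.stdout, level=level)

    # Pretty console output on a TTY, JSON lines everywhere else (e.g. in Docker)
    renderer = (
        structlog.dev.ConsoleRenderer()
        if sys.stdout.isatty()
        else structlog.processors.JSONRenderer()
    )
    structlog.configure(
        processors=[
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            renderer,
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=True,
    )
    return structlog.get_logger()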

# Usage in agent loop
log = setup_agent_logging()
log.info("agent_tool_call",
    tool="web_search",
    query="latest MCP server updates",
    duration_ms=1200,
    tokens_used=845,
    success=True
)

In Docker, output JSON logs to stdout. Your log aggregator (Loki, CloudWatch, Datadog) can then parse them natively. Add the tool name, duration, token count, and success/failure to every structured log line.
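
To make sure every tool call really does carry the same fields, it helps to wrap calls in one place instead of logging by hand. The sketch below is a standard-library-only illustration of that idea: a context manager that times the call and writes one JSON line with the tool name, duration, and success flag. The name log_tool_call and the field names are assumptions chosen to match the example above; in a real service you would likely emit through the structlog logger instead of writing to a stream directly.

```python
import json
import sys
import time
from contextlib import contextmanager


@contextmanager
def log_tool_call(tool: str, stream=None, **fields):
    """Emit one JSON line per tool call with its duration and outcome."""
    out = stream if stream is not None else sys.stdout
    record = {"event": "agent_tool_call", "tool": tool, **fields}
    started = time.perf_counter()
    try:
        yield record  # the caller may add fields such as tokens_used
        record["success"] = True
    except Exception as exc:
        record["success"] = False
        record["error"] = repr(exc)
        raise  # logging must not swallow the tool failure
    finally:
        record["duration_ms"] = round((time.perf_counter() - started) * 1000, 1)
        out.write(json.dumps(record) + "\n")
```

Usage is a plain with block around the tool invocation; any exception still propagates to the agent loop, but the failed call is logged first.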

[3] structlog documentation — https://www.structlog.org/en/stable/

Step 6: Run It with docker compose

# docker-compose.yml
# (no top-level version key: Compose v2 treats it as obsolete and ignores it)

services:
  agent:
    build:
      context: .
      args:
        INSTALL_BROWSER: "false"
    ports:
      - "8080:8080"
    environment:
      - AGENT_LOG_LEVEL=INFO
      - AGENT_OTEL_ENDPOINT=http://otel-collector:4318
    env_file:
      - .env.production
    volumes:
      - agent_data:/data/agent
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: "4G"
        reservations:
          cpus: "0.5"
          memory: "1G"

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
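    # Point the collector at the mounted file; the image's default config path differs
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - ./otel-config.yaml:/etc/otel/config.yaml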
    ports:
      - "4318:4318"

volumes:
  agent_data:

Step 7: Tag and Push to Your Registry

# Semantic versioning for agent images
docker build -t ghcr.io/your-org/agent-service:1.2.0 .
docker tag ghcr.io/your-org/agent-service:1.2.0 ghcr.io/your-org/agent-service:latest
docker push ghcr.io/your-org/agent-service:1.2.0
docker push ghcr.io/your-org/agent-service:latest

Tag with the git commit SHA for traceability:

GIT_SHA=$(git rev-parse --short HEAD)
docker build -t ghcr.io/your-org/agent-service:${GIT_SHA} .

Production Checklist

Before you ship, run through this checklist:

Item	Check
Secrets injected at runtime, not baked into image	✅
HEALTHCHECK endpoint returns liveness within 10s	✅
Logs are structured JSON, not plain text	✅
Resource limits set (CPU, memory)	✅
Log rotation configured (10MB per file, 3 files)	✅
Read-only filesystem where possible	✅
No `root` user in container (use `USER agentuser`)	✅
Build context has a .dockerignore excluding secrets and caches	✅
Container can be stopped gracefully (SIGTERM handler)	✅

Key Takeaways

Multi-stage builds shrink image size by 4-5x for agent services.
Separate heavy tools into sidecar containers — the agent container stays lean and focused.
Never bake secrets into images — use environment variables or a secrets manager at runtime.
Structured JSON logging is non-negotiable for debugging agent tool calls across sessions.
Health checks and resource limits prevent cascading failures when the LLM provider rate-limits your agent.

FastMCP — Build MCP-compatible agent tool servers (each tool can be its own container)
OpenTelemetry Collector — Trace agent tool calls across containers
Docker Scout — Scan agent images for vulnerabilities
HashiCorp Vault — Manage agent API keys across environments

References

[1] Docker multi-stage builds: https://docs.docker.com/build/building/multi-stage/
[2] Pydantic Settings: https://docs.pydantic.dev/latest/concepts/pydantic_settings/
[3] structlog documentation: https://www.structlog.org/en/stable/
