Containerizing AI Agent Services with Docker: A Production Guide

Shipping an AI agent as a script is easy. Shipping one that survives production traffic, restarts cleanly, and doesn’t leak API keys or eat all your disk — that’s the hard part. Docker gives you a repeatable, isolated, and observable runtime for agent services. This guide walks through the patterns that actually work in production, not the tutorial examples.

Prerequisites

  • Docker Engine 24+ installed (docker --version)
  • A Python 3.11+ agent project with at least one tool call (any LLM SDK)
  • Basic familiarity with Dockerfiles and docker compose
  • A container registry account (Docker Hub, GHCR, or any OCI-compliant registry)

Step 1: Design the Dockerfile with Multi-Stage Builds

A single-stage Dockerfile for an agent service can balloon past 2GB: pip compiles some dependencies from source, ML-adjacent packages can pull in CUDA libraries, and the image carries build tooling that the runtime never needs. A multi-stage build keeps all of that out of the final image.

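# Stage 1: Build dependencies
FROM python:3.11-slim AS builder

WORKDIR /app
COPY requirements.txt .

# --user installs everything under /root/.local, so it can be copied as one tree
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: Runtime (minimal image)
FROM python:3.11-slim AS runtime

# curl is required by HEALTHCHECK; agentuser is the unprivileged runtime user.
# /data/agent is pre-created so a volume mounted there is writable by agentuser.
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates curl \
    && rm -rf /var/lib/apt/lists/* \
    && useradd --create-home agentuser \
    && mkdir -p /data/agent && chown agentuser:agentuser /data/agent

WORKDIR /app

COPY --from=builder --chown=agentuser:agentuser /root/.local /home/agentuser/.local
ENV PATH=/home/agentuser/.local/bin:$PATH

COPY agent_service/ ./agent_service/
COPY config/ ./config/

# Drop root before the service starts
USER agentuser

EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

ENTRYPOINT ["python", "-m", "agent_service.main"]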

The builder stage installs the dependencies and is where any compilers or build tooling belong. The runtime stage copies only the installed packages plus your application code. The final image for a typical agent service with OpenAI, Anthropic, and LangChain SDKs comes in around 450MB instead of 2.1GB.

Step 2: Manage Tool Dependencies Carefully

Agent tools often pull in heavy dependencies: Playwright (browsers), Pillow (image processing), pandas (dataframes), unstructured (document parsing). Each one multiplies image size and attack surface.

Pattern: Optional extras via build arguments

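# Only install browser tooling when needed
ARG INSTALL_BROWSER=false

# Run Playwright as a module: pip's --user script directory (/root/.local/bin)
# is not on PATH in the build stage. Browsers download to ~/.cache/ms-playwright,
# which must also be copied into (or installed in) the runtime stage.
RUN if [ "$INSTALL_BROWSER" = "true" ]; then \
    pip install --user --no-cache-dir playwright && \
    python -m playwright install chromium; \
    fi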

Pattern: Separate tool containers

For heavy tools (browser automation, code execution sandboxes), run them as sidecar containers and communicate over HTTP or Unix sockets. The main agent container stays lean.

# docker-compose.yml excerpt
services:
  agent:
    build: .
    ports: ["8080:8080"]
    environment:
      - BROWSER_TOOL_URL=http://browser-tool:9222
      - CODE_TOOL_URL=http://code-sandbox:50051

  browser-tool:
    image: ghcr.io/your-org/browser-agent-tool:latest
    restart: unless-stopped

  code-sandbox:
    image: ghcr.io/your-org/code-exec-sandbox:latest
    read_only: true
    security_opt:
      - no-new-privileges:true

[1] Docker multi-stage builds — https://docs.docker.com/build/building/multi-stage/

Step 3: Configure the Agent Through Environment Variables

Hardcoding API keys, model names, and endpoint URLs in your agent code is one of the most common production mistakes. Reading configuration from environment variables, which Docker injects at container start, solves it cleanly.

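# config/settings.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class AgentSettings(BaseSettings):
    # Fields are read from AGENT_-prefixed variables, e.g. AGENT_MODEL_NAME.
    # protected_namespaces is narrowed because some pydantic versions warn
    # about field names that start with "model_" (such as model_name).
    model_config = SettingsConfigDict(
        env_prefix="AGENT_",
        protected_namespaces=("settings_",),
    )

    llm_provider: str = "openai"
    openai_api_key: str = ""
    anthropic_api_key: str = ""
    model_name: str = "gpt-4o"
    max_tokens_per_call: int = 4096
    tool_timeout_seconds: int = 30
    log_level: str = "INFO"

    # Resource limits
    max_concurrent_tools: int = 5
    rate_limit_rpm: int = 60

    # Observability
    otel_endpoint: str = "http://otel-collector:4318"
    enable_tracing: bool = True

# Instantiated once at import time; values come from the container environment
settings = AgentSettings()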

Docker Compose then supplies the values at runtime:

services:
  agent:
    build: .
    environment:
      - AGENT_LLM_PROVIDER=anthropic
      - AGENT_ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - AGENT_MODEL_NAME=claude-sonnet-4-20250514
      - AGENT_MAX_TOKENS_PER_CALL=8192
      - AGENT_LOG_LEVEL=DEBUG
      - AGENT_OTEL_ENDPOINT=http://otel-collector:4318
    env_file:
      - .env.production

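Never bake secrets into the image. Always inject them at runtime, through environment variables or a secrets manager. For production, Docker secrets (docker secret create, which requires Swarm mode and mounts each secret as a file under /run/secrets/) or HashiCorp Vault are better choices than .env files.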
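
Because file-based secrets arrive as files rather than variables, the agent needs a small shim to accept either source. The sketch below follows the `*_FILE` convention used by several official images: if `AGENT_ANTHROPIC_API_KEY_FILE` is set, the key is read from that path, otherwise from `AGENT_ANTHROPIC_API_KEY`. The helper names `read_secret` and `require_provider_key` are hypothetical, and this is stdlib-only code that sits alongside the pydantic settings rather than replacing them.

```python
import os
from pathlib import Path

# Maps a provider name to the variable that holds its key (names from settings above)
PROVIDER_KEY_VARS = {
    "openai": "AGENT_OPENAI_API_KEY",
    "anthropic": "AGENT_ANTHROPIC_API_KEY",
}


def read_secret(name: str, default: str = "") -> str:
    """Prefer a secret file referenced by NAME_FILE, then fall back to NAME."""
    file_path = os.environ.get(f"{name}_FILE")
    if file_path:
        # Secret files often end with a newline; strip it
        return Path(file_path).read_text(encoding="utf-8").strip()
    return os.environ.get(name, default)


def require_provider_key(provider: str) -> str:
    """Fail fast at startup if the selected provider has no API key configured."""
    var = PROVIDER_KEY_VARS[provider]
    key = read_secret(var)
    if not key:
        raise RuntimeError(f"{var} (or {var}_FILE) must be set for provider {provider!r}")
    return key
```

Calling `require_provider_key` once during startup turns a missing key into an immediate, readable crash instead of a failed LLM call minutes later.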

[2] Pydantic Settings — https://docs.pydantic.dev/latest/concepts/pydantic_settings/

Step 4: Add Production-Grade Health Checks

A health endpoint is not optional. Orchestrators (Docker Swarm, Kubernetes, Nomad) use it to know when your agent is actually ready to serve requests.

# agent_service/health.py
import time
from fastapi import FastAPI
from pydantic import BaseModel

# check_llm_connectivity() and registered_tools() are assumed to be defined
# elsewhere in agent_service; import them from wherever your project keeps them.

app = FastAPI()
start_time = time.time()

class HealthStatus(BaseModel):
    status: str
    uptime_seconds: float
    llm_available: bool
    tool_count: int
    last_eval_score: float | None = None

@app.get("/health")
async def health():
    # Quick connectivity check to LLM provider (kept short so the probe stays fast)
    llm_ok = await check_llm_connectivity(timeout=2)
    # A degraded status still returns HTTP 200; only a crashed or hung service fails the probe
    return HealthStatus(
        status="healthy" if llm_ok else "degraded",
        uptime_seconds=time.time() - start_time,
        llm_available=llm_ok,
        tool_count=len(registered_tools()),
    )

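The Dockerfile HEALTHCHECK from Step 1 calls this endpoint every 30 seconds, and after three consecutive failures Docker marks the container as unhealthy. Standalone Docker does not restart an unhealthy container on its own; acting on that status is the orchestrator's job. Swarm replaces unhealthy tasks, while Kubernetes ignores HEALTHCHECK and relies on its own probes against the same endpoint. Note that the handler returns HTTP 200 even when the status is "degraded", so an LLM provider outage alone does not fail the check.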
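
The handler relies on a check_llm_connectivity helper that the guide does not show. One cheap way to implement it is a TCP reachability probe with a hard timeout, sketched below. The default host `api.openai.com` is an assumption chosen for illustration, and a successful connection only proves the network path is open, not that your API key or quota is valid.

```python
import asyncio


async def check_llm_connectivity(
    timeout: float = 2.0,
    host: str = "api.openai.com",  # assumed default; use your provider's API host
    port: int = 443,
) -> bool:
    """Return True if a TCP connection to the provider opens within the timeout."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        # DNS failure, connection refused, network unreachable, or timeout
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may reset the connection while closing; still reachable
    return True
```

A probe like this sends no tokens and costs nothing per call, which matters when the endpoint is polled every 30 seconds by every replica.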

Step 5: Structured Logging for Agent Debugging

print() statements don’t scale. Agent debugging requires tracing individual turns: which tool was called, what arguments were passed, what the LLM responded, how long the tool took.

# agent_service/logging_setup.py
import logging
import sys

import structlog

def setup_agent_logging(level: str = "INFO"):
    # structlog hands each rendered line to the stdlib logger, so the stdlib
    # side must emit it unmodified on stdout at the desired level.
    logging.basicConfig(format="%(message)s", stream=sys.stdout, level=level)
    structlog.configure(
        processors=[
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            # Readable output on a terminal, JSON everywhere else (e.g. in Docker)
            structlog.dev.ConsoleRenderer() if sys.stdout.isatty()
            else structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=True,
    )
    return structlog.get_logger()

# Usage in agent loop
log = setup_agent_logging()
log.info("agent_tool_call",
    tool="web_search",
    query="latest MCP server updates",
    duration_ms=1200,
    tokens_used=845,
    success=True
)

In Docker, output JSON logs to stdout. Your log aggregator (Loki, CloudWatch, Datadog) can then parse them natively. Add the tool name, duration, token count, and success/failure to every structured log line.
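
To make sure every tool call carries those fields, it helps to wrap the call instead of logging by hand. Here is a minimal stdlib-only sketch of that idea: a context manager (hypothetically named `timed_tool_call`) that measures duration, records success or failure, and writes one JSON line. In a real service you would route the record through the structlog logger rather than writing to the stream directly.

```python
import json
import sys
import time
from contextlib import contextmanager


@contextmanager
def timed_tool_call(tool: str, stream=None, **fields):
    """Emit one JSON log line per tool call with duration and success/failure."""
    out = stream if stream is not None else sys.stdout
    record = {"event": "agent_tool_call", "tool": tool, **fields}
    start = time.perf_counter()
    try:
        yield record  # the caller may add fields such as tokens_used
        record["success"] = True
    except Exception as exc:
        record["success"] = False
        record["error"] = repr(exc)
        raise  # logging must not swallow the tool failure
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 1)
        out.write(json.dumps(record) + "\n")
        out.flush()
```

Usage looks like `with timed_tool_call("web_search", query=q) as rec: ...`, setting `rec["tokens_used"]` inside the block once the count is known.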

[3] structlog documentation — https://www.structlog.org/en/stable/

Step 6: Run It with docker compose

# docker-compose.yml
# No top-level "version" key: it is obsolete in Compose v2 and only triggers a warning

services:
  agent:
    build:
      context: .
      args:
        INSTALL_BROWSER: "false"
    ports:
      - "8080:8080"
    environment:
      - AGENT_LOG_LEVEL=INFO
      - AGENT_OTEL_ENDPOINT=http://otel-collector:4318
    env_file:
      - .env.production
    volumes:
      - agent_data:/data/agent
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: "4G"
        reservations:
          cpus: "0.5"
          memory: "1G"

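  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    # Point the collector at the mounted file; the image's default config path differs
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - ./otel-config.yaml:/etc/otel/config.yaml
    ports:
      - "4318:4318"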

volumes:
  agent_data:
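
After docker compose up -d, it is worth confirming that the agent actually reports healthy before sending traffic. The following is a small smoke-test sketch, assuming the /health endpoint from Step 4 is published on localhost:8080 as in the file above. The function names are hypothetical, and the fetch function is injectable so the polling logic can be tested without a running container.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> Optional[dict]:
    """Return the parsed health payload, or None if the service is not answering yet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses; bad JSON is ValueError
        return None


def wait_until_healthy(
    fetch: Callable[[], Optional[dict]] = fetch_health,
    attempts: int = 10,
    delay_seconds: float = 3.0,
) -> bool:
    """Poll until the service reports status "healthy" or the attempts run out."""
    for attempt in range(attempts):
        payload = fetch()
        if payload and payload.get("status") == "healthy":
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

In a deploy script, `sys.exit(0 if wait_until_healthy() else 1)` is enough to gate the next step on a healthy agent.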

Step 7: Tag and Push to Your Registry

# Semantic versioning for agent images
docker build -t ghcr.io/your-org/agent-service:1.2.0 .
docker tag ghcr.io/your-org/agent-service:1.2.0 ghcr.io/your-org/agent-service:latest
docker push ghcr.io/your-org/agent-service:1.2.0
docker push ghcr.io/your-org/agent-service:latest

Tag with the git commit SHA for traceability:

GIT_SHA=$(git rev-parse --short HEAD)
docker build -t ghcr.io/your-org/agent-service:${GIT_SHA} .

Production Checklist

Before you ship, run through this checklist:

  • Secrets are injected at runtime, not baked into the image
  • The HEALTHCHECK endpoint responds within its 10s timeout
  • Logs are structured JSON, not plain text
  • Resource limits are set (CPU, memory)
  • Log rotation is configured (10MB per file, 3 files)
  • The filesystem is read-only where possible
  • The container does not run as root (use USER agentuser)
  • The build context has a .dockerignore that excludes secrets and caches
  • The container can be stopped gracefully (SIGTERM handler)
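
The last item deserves a concrete shape. docker stop sends SIGTERM and, after a grace period (10 seconds by default), SIGKILL, so the agent should stop picking up new work as soon as SIGTERM arrives. The sketch below shows one way to do that with the standard library; `run_agent_loop` and `handle_one_turn` are hypothetical stand-ins for your own main loop.

```python
import signal
import threading

# Set once a termination signal arrives; checked between units of work
shutdown_requested = threading.Event()


def _handle_termination(signum, frame):
    """Signal handler: only flip the flag and let the main loop exit cleanly."""
    shutdown_requested.set()


def install_shutdown_handlers() -> None:
    """Register handlers for SIGTERM (docker stop) and SIGINT (Ctrl+C)."""
    signal.signal(signal.SIGTERM, _handle_termination)
    signal.signal(signal.SIGINT, _handle_termination)


def run_agent_loop(handle_one_turn, poll_seconds: float = 0.1) -> None:
    """Process turns until shutdown is requested; the in-flight turn always finishes."""
    while not shutdown_requested.is_set():
        handle_one_turn()
        # Sleeps between turns, but wakes immediately if the flag is set
        shutdown_requested.wait(poll_seconds)
```

This only works if Python is PID 1 in the container, which the exec-form ENTRYPOINT from Step 1 ensures. If a single turn can run longer than the grace period, raise it with stop_grace_period in Compose.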

Key Takeaways

  1. Multi-stage builds shrink image size by 4-5x for agent services.
  2. Separate heavy tools into sidecar containers — the agent container stays lean and focused.
  3. Never bake secrets into images — use environment variables or a secrets manager at runtime.
  4. Structured JSON logging is non-negotiable for debugging agent tool calls across sessions.
  5. Health checks and resource limits help contain failures, for example when the LLM provider rate-limits your agent and requests start to pile up.

References

[1] Docker multi-stage builds — https://docs.docker.com/build/building/multi-stage/
[2] Pydantic Settings — https://docs.pydantic.dev/latest/concepts/pydantic_settings/
[3] structlog documentation — https://www.structlog.org/en/stable/
