Containerizing AI Agent Services with Docker: A Production Guide

Shipping an AI agent as a script is easy. Shipping one that survives production traffic, restarts cleanly, and doesn’t leak API keys or eat all your disk — that’s the hard part. Docker gives you a repeatable, isolated, and observable runtime for agent services. This guide walks through the patterns that actually work in production, not the tutorial examples.
Prerequisites
- Docker Engine 24+ installed (docker --version)
- A Python 3.11+ agent project with at least one tool call (any LLM SDK)
- Basic familiarity with Dockerfiles and docker compose
- A container registry account (Docker Hub, GHCR, or any OCI-compliant registry)
Step 1: Design the Dockerfile with Multi-Stage Builds
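A single-stage Dockerfile for an agent service can easily balloon past 2GB. Compilers and headers needed to build some Python packages stay in the final image, heavy ML dependencies (PyTorch, for example) can pull in CUDA libraries, and the runtime carries build tooling it never uses. A multi-stage build keeps all of that build tooling out of the image you ship.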
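# Stage 1: Build dependencies
FROM python:3.11-slim AS builder
# Compilers are only needed for packages without prebuilt wheels; they stay in this stage
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install into a virtualenv so the whole environment can be copied in one step
RUN python -m venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime (minimal image)
FROM python:3.11-slim AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*
# Unprivileged user; /data/agent is the volume mount point used in Step 6
RUN useradd --create-home --uid 10001 agentuser \
    && mkdir -p /data/agent && chown agentuser /data/agent
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH \
    PYTHONUNBUFFERED=1
COPY agent_service/ ./agent_service/
COPY config/ ./config/
USER agentuser
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1
ENTRYPOINT ["python", "-m", "agent_service.main"]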
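The builder stage installs (and, where needed, compiles) the dependencies. Because requirements.txt is copied before the application code, Docker reuses that cached layer until your requirements change. The runtime stage copies only the installed packages plus your application code, leaving compilers and build caches behind. For a typical agent service with the OpenAI, Anthropic, and LangChain SDKs, the final image can come in around 450MB instead of 2.1GB, depending on the dependency tree.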
Step 2: Manage Tool Dependencies Carefully
Agent tools often pull in heavy dependencies: Playwright (browsers), Pillow (image processing), pandas (dataframes), unstructured (document parsing). Each one adds to both image size and attack surface.
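Pattern: Optional tool dependencies via build arguments
# Only install browser tooling when needed.
# Place this in the runtime stage before any USER instruction: --with-deps
# installs the system libraries Chromium needs via apt, which requires root.
ARG INSTALL_BROWSER=false
# Shared location so a non-root user can find the browsers at runtime
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
RUN if [ "$INSTALL_BROWSER" = "true" ]; then \
    pip install --no-cache-dir playwright && \
    python -m playwright install --with-deps chromium; \
    fi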
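On the Python side, the agent can register the browser tool only when the optional dependency is actually installed, so the same code runs in images built with or without INSTALL_BROWSER=true. A minimal sketch using importlib.util.find_spec; register_tools and the tool names are hypothetical placeholders.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def register_tools() -> list[str]:
    # Hypothetical tool names; the core tools are always available.
    tools = ["web_search", "calculator"]
    # Only expose the browser tool when the image was built with Playwright.
    if is_installed("playwright"):
        tools.append("browser")
    return tools
```

The check happens once at startup, so a lean image simply advertises fewer tools to the LLM instead of failing on the first browser call.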
Pattern: Separate tool containers
For heavy tools (browser automation, code execution sandboxes), run them as sidecar containers and communicate over HTTP or Unix sockets. The main agent container stays lean.
# docker-compose.yml excerpt
services:
  agent:
    build: .
    ports: ["8080:8080"]
    environment:
      - BROWSER_TOOL_URL=http://browser-tool:9222
      - CODE_TOOL_URL=http://code-sandbox:50051
  browser-tool:
    image: ghcr.io/your-org/browser-agent-tool:latest
    restart: unless-stopped
  code-sandbox:
    image: ghcr.io/your-org/code-exec-sandbox:latest
    read_only: true
    security_opt:
      - no-new-privileges:true
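From the agent's side, a sidecar tool is just an HTTP endpoint reached through the service name that Compose puts on the shared network. A minimal standard-library sketch, assuming the sidecar accepts a JSON POST on a hypothetical /run path; the payload shape is illustrative. The explicit timeout keeps a stuck sidecar from hanging the agent loop.

```python
import json
import urllib.request


def call_tool(base_url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON payload to a tool sidecar and return its decoded JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/run",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises URLError/HTTPError on failure and a timeout error when exceeded
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In the agent container this would be called as call_tool(os.environ["BROWSER_TOOL_URL"], {...}), so the URL comes from the compose environment rather than the code.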
[1] Docker multi-stage builds for Python — https://docs.docker.com/build/building/multi-stage/
Step 3: Configure the Agent Through Environment Variables
Hardcoding API keys, model names, and endpoint URLs in your agent code is the most common production mistake. Docker’s environment-based configuration pattern solves it cleanly.
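# config/settings.py
from pydantic import SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict


class AgentSettings(BaseSettings):
    # Each field is read from an AGENT_-prefixed environment variable,
    # e.g. AGENT_MODEL_NAME -> model_name (case-insensitive by default)
    model_config = SettingsConfigDict(env_prefix="AGENT_")

    llm_provider: str = "openai"
    # SecretStr masks the value in repr() and logs; use .get_secret_value() to read it
    openai_api_key: SecretStr = SecretStr("")
    anthropic_api_key: SecretStr = SecretStr("")
    model_name: str = "gpt-4o"
    max_tokens_per_call: int = 4096
    tool_timeout_seconds: int = 30
    log_level: str = "INFO"

    # Resource limits
    max_concurrent_tools: int = 5
    rate_limit_rpm: int = 60

    # Observability
    otel_endpoint: str = "http://otel-collector:4318"
    enable_tracing: bool = True


settings = AgentSettings()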
Docker Compose then supplies those variables at runtime:
services:
  agent:
    build: .
    environment:
      - AGENT_LLM_PROVIDER=anthropic
      - AGENT_ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - AGENT_MODEL_NAME=claude-sonnet-4-20250514
      - AGENT_MAX_TOKENS_PER_CALL=8192
      - AGENT_LOG_LEVEL=DEBUG
      - AGENT_OTEL_ENDPOINT=http://otel-collector:4318
    env_file:
      - .env.production
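Never bake secrets into the image; inject them at runtime. Compose interpolates ${ANTHROPIC_API_KEY} from your shell (or from a .env file next to the compose file) when it starts the stack, and values under environment take precedence over the same keys in env_file. For production, prefer a secrets manager over .env files: Docker secrets (docker secret create, which requires Swarm mode; Compose also offers a file-based secrets: key) are mounted as files under /run/secrets/, and HashiCorp Vault is an orchestrator-independent alternative.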
[2] Pydantic Settings — https://docs.pydantic.dev/latest/concepts/pydantic_settings/
Step 4: Add Production-Grade Health Checks
A health endpoint is not optional. Orchestrators (Docker Swarm, Kubernetes, Nomad) use it to know when your agent is actually ready to serve requests.
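# agent_service/health.py
import time

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
start_time = time.monotonic()


class HealthStatus(BaseModel):
    status: str
    uptime_seconds: float
    llm_available: bool
    tool_count: int
    last_eval_score: float | None = None


async def check_llm_connectivity(timeout: float) -> bool:
    # Placeholder: replace with a cheap provider call (e.g. listing models),
    # bounded by timeout seconds and returning False on any error.
    return True


def registered_tools() -> list[str]:
    # Placeholder: return the names of the tools your agent has registered.
    return []


@app.get("/health", response_model=HealthStatus)
async def health() -> HealthStatus:
    # Quick connectivity check to the LLM provider
    llm_ok = await check_llm_connectivity(timeout=2)
    return HealthStatus(
        # "degraded" still returns HTTP 200, so curl -f in HEALTHCHECK passes
        status="healthy" if llm_ok else "degraded",
        uptime_seconds=time.monotonic() - start_time,
        llm_available=llm_ok,
        tool_count=len(registered_tools()),
    )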
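The HEALTHCHECK instruction in the Step 1 Dockerfile calls this endpoint every 30 seconds. After three consecutive failures Docker marks the container unhealthy. Standalone Docker does not restart unhealthy containers by itself (restart policies react to the process exiting), whereas Swarm replaces unhealthy tasks. Because a "degraded" response still returns HTTP 200, curl -f treats it as passing, so an LLM provider outage alone will not flag the container; only an unresponsive service will.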
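One way to keep the provider check cheap is to bound it with asyncio.wait_for and cache the result for a short TTL, so a HEALTHCHECK that fires every 30 seconds does not hit the provider on every run. A minimal sketch: CachedProbe is a hypothetical helper, probe stands in for whatever lightweight provider call you choose, and the 60-second TTL is an arbitrary example.

```python
import asyncio
import time
from typing import Awaitable, Callable


class CachedProbe:
    """Run an async connectivity probe with a timeout and cache its result."""

    def __init__(self, probe: Callable[[], Awaitable[object]], ttl_seconds: float = 60.0):
        self._probe = probe
        self._ttl = ttl_seconds
        self._last_ok = False
        self._checked_at = float("-inf")  # forces a real probe on the first call

    async def check(self, timeout: float = 2.0) -> bool:
        now = time.monotonic()
        if now - self._checked_at < self._ttl:
            return self._last_ok  # still fresh: skip the network call
        try:
            await asyncio.wait_for(self._probe(), timeout=timeout)
            self._last_ok = True
        except Exception:
            # Timeouts, network errors, and auth errors all count as unavailable
            self._last_ok = False
        self._checked_at = now
        return self._last_ok
```

In health.py, llm_ok = await llm_probe.check(timeout=2) would then replace the direct check_llm_connectivity call.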
Step 5: Structured Logging for Agent Debugging
print() statements don’t scale. Agent debugging requires tracing individual turns: which tool was called, what arguments were passed, what the LLM responded, how long the tool took.
# agent_service/logging_setup.py
import logging
import sys

import structlog


def setup_agent_logging(level: str = "INFO"):
    # Route stdlib logging to stdout so docker logs and log drivers capture it
    logging.basicConfig(format="%(message)s", stream=sys.stdout, level=level)
    # Human-readable output in an interactive terminal, JSON lines otherwise
    renderer = (
        structlog.dev.ConsoleRenderer()
        if sys.stdout.isatty()
        else structlog.processors.JSONRenderer()
    )
    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            renderer,
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=True,
    )
    return structlog.get_logger()


# Usage in the agent loop
log = setup_agent_logging()
log.info(
    "agent_tool_call",
    tool="web_search",
    query="latest MCP server updates",
    duration_ms=1200,
    tokens_used=845,
    success=True,
)
In Docker, write JSON logs to stdout so your log aggregator (Loki, CloudWatch, Datadog) can parse them natively. Include the tool name, duration, token count, and success/failure on every tool-call log line.
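To get those fields onto every tool-call line without repeating the bookkeeping, one option is a small context manager around each call. This sketch uses only the standard library (json and logging rather than structlog) so it runs anywhere; log_tool_call is a hypothetical helper, and the field names mirror the structlog example above.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("agent")


@contextmanager
def log_tool_call(tool: str, **fields):
    """Emit one JSON log line per tool call with duration and success flag."""
    start = time.perf_counter()
    record = {"event": "agent_tool_call", "tool": tool, **fields}
    try:
        yield record  # the caller may add fields such as tokens_used
        record["success"] = True
    except Exception as exc:
        record["success"] = False
        record["error"] = type(exc).__name__
        raise
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000)
        logger.info(json.dumps(record))
```

Usage: with log_tool_call("web_search", query="latest MCP server updates") as rec: run the tool, then set rec["tokens_used"]. Because the context manager re-raises, a failed call is still logged and the exception still reaches the agent loop.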
[3] structlog documentation — https://www.structlog.org/en/stable/
Step 6: Run It with docker compose
# docker-compose.yml
# (The top-level version: key is obsolete and ignored by current Docker Compose.)
services:
  agent:
    build:
      context: .
      args:
        INSTALL_BROWSER: "false"
    ports:
      - "8080:8080"
    environment:
      - AGENT_LOG_LEVEL=INFO
      - AGENT_OTEL_ENDPOINT=http://otel-collector:4318
    env_file:
      - .env.production
    volumes:
      - agent_data:/data/agent
    depends_on:
      - otel-collector
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: "4G"
        reservations:
          cpus: "0.5"
          memory: "1G"
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    # The contrib image reads /etc/otelcol-contrib/config.yaml by default,
    # so point it at the mounted file explicitly
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - ./otel-config.yaml:/etc/otel/config.yaml:ro
    ports:
      - "4318:4318"

volumes:
  agent_data:
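After docker compose up -d, a short smoke test can poll the health endpoint until the agent reports healthy, which is handy in CI. A standard-library sketch assuming the 8080 port mapping above and the JSON shape from Step 4; wait_for_healthy is a hypothetical helper and the 60-second deadline is arbitrary.

```python
import json
import time
import urllib.request


def wait_for_healthy(url: str, deadline_s: float = 60.0, interval_s: float = 2.0) -> dict:
    """Poll url until it returns status == "healthy"; raise TimeoutError otherwise."""
    end = time.monotonic() + deadline_s
    last_error = "no response"
    while time.monotonic() < end:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                body = json.loads(resp.read().decode("utf-8"))
            if body.get("status") == "healthy":
                return body
            last_error = f"status={body.get('status')!r}"
        except (OSError, ValueError) as exc:
            # OSError covers URLError and connection refused while the container
            # starts; ValueError covers a non-JSON body
            last_error = repr(exc)
        time.sleep(interval_s)
    raise TimeoutError(f"{url} not healthy after {deadline_s}s: {last_error}")
```

Usage: wait_for_healthy("http://localhost:8080/health"). Requiring "healthy" rather than just HTTP 200 means a CI run also fails when the LLM provider check reports "degraded".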
Step 7: Tag and Push to Your Registry
# Semantic versioning for agent images
docker build -t ghcr.io/your-org/agent-service:1.2.0 .
docker tag ghcr.io/your-org/agent-service:1.2.0 ghcr.io/your-org/agent-service:latest
docker push ghcr.io/your-org/agent-service:1.2.0
docker push ghcr.io/your-org/agent-service:latest
Also tag with the git commit SHA for traceability, and push that tag too:
GIT_SHA=$(git rev-parse --short HEAD)
docker build -t ghcr.io/your-org/agent-service:${GIT_SHA} .
docker push ghcr.io/your-org/agent-service:${GIT_SHA}
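If you script releases in Python, one option is to apply all three tags (version, commit SHA, latest) in a single docker build, since docker build accepts -t more than once, and then push each tag. A sketch: the repository path is the placeholder used above, the function names are hypothetical, and run=False returns the commands without executing Docker (release still needs git to read the SHA).

```python
import subprocess


def image_tags(repo: str, version: str, git_sha: str) -> list[str]:
    """Semantic version, short commit SHA, and latest tags for one image."""
    return [f"{repo}:{version}", f"{repo}:{git_sha}", f"{repo}:latest"]


def release_commands(repo: str, version: str, git_sha: str,
                     context: str = ".") -> list[list[str]]:
    """One docker build with every tag, followed by one docker push per tag."""
    tags = image_tags(repo, version, git_sha)
    build = ["docker", "build"]
    for tag in tags:
        build += ["-t", tag]
    build.append(context)
    return [build] + [["docker", "push", tag] for tag in tags]


def release(repo: str, version: str, context: str = ".", run: bool = False) -> list[list[str]]:
    # Short commit SHA, matching the shell example above
    sha = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    cmds = release_commands(repo, version, sha, context)
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return cmds
```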
Production Checklist
Before you ship, run through this checklist:
| Item | Check |
|---|---|
| Secrets injected at runtime, not baked into image | ✅ |
| HEALTHCHECK endpoint returns liveness within 10s | ✅ |
| Logs are structured JSON, not plain text | ✅ |
| Resource limits set (CPU, memory) | ✅ |
| Log rotation configured (<10MB per file, 3 files) | ✅ |
| Read-only filesystem where possible | ✅ |
| No root user in container (use USER agentuser) | ✅ |
| Dockerfile has .dockerignore excluding secrets/cache | ✅ |
| Container can be stopped gracefully (SIGTERM handler) | ✅ |
Key Takeaways
- Multi-stage builds can shrink agent images by roughly 4-5x (about 2.1GB down to 450MB in the Step 1 example).
- Separate heavy tools into sidecar containers — the agent container stays lean and focused.
- Never bake secrets into images — use environment variables or a secrets manager at runtime.
- Structured JSON logging is non-negotiable for debugging agent tool calls across sessions.
- Health checks and resource limits help contain failures: a hung agent gets flagged unhealthy, and a runaway one cannot starve other containers on the same host.
Related Tools
- FastMCP — Build MCP-compatible agent tool servers (each tool can be its own container)
- OpenTelemetry Collector — Trace agent tool calls across containers
- Docker Scout — Scan agent images for vulnerabilities
- HashiCorp Vault — Manage agent API keys across environments
References
[1] Docker multi-stage builds — https://docs.docker.com/build/building/multi-stage/
[2] Pydantic Settings — https://docs.pydantic.dev/latest/concepts/pydantic_settings/
[3] structlog documentation — https://www.structlog.org/en/stable/

