Agent Engineering Reference Hub

Everything NiteAgent covers — organized by topic. Each section is a cheat sheet with quick picks and links to the full posts. Updated as the landscape evolves.

100+ posts indexed · 12 categories · Updated June 2026

Agent Frameworks & SDKs

12 posts

Production-grade frameworks and vendor SDKs for building AI agents: open-source frameworks (LangGraph, CrewAI, Smolagents, MAF, AG2, Swarms) and vendor SDKs (Claude Agent SDK, OpenAI Agents SDK, Google ADK).

Framework	Paradigm	MCP	State	Complex Tasks	Languages
LangGraph	State machine (cyclic graph)	Native	Checkpointed	62%	Python
CrewAI	Role-based teams	Native	Session only	54%	Python
Smolagents	Code-in-action	Native	Session only	59%	Python
MAF 1.0	Graph workflows	Native	Checkpointed	67%	C#, Python
AG2	Conversational	Native	Session only	58%	Python
Swarms	Swarm/ring	Native	Basic	~55%	Python
Claude Agent SDK	Subagents + OS tools	Deepest	Session only	—	Python, TS
OpenAI Agents SDK	Handoff chains	Adapter	Pluggable	—	Python, TS
Google ADK	Hierarchical supervisor	Adapter	Distributed	—	Python, TS, Java, Go

Quick pick: LangGraph for production durability, CrewAI for the fastest prototype, MAF 1.0 for .NET/enterprise, Claude Agent SDK for OS-level control, OpenAI Agents SDK for model flexibility, Google ADK for enterprise hierarchy. → Vendor SDK Showdown · LangGraph vs CrewAI · Smolagents vs MAF vs AG2

MCP Server Implementations

10 posts

Tools and frameworks for building, testing, deploying, and monitoring MCP servers. From FastMCP scaffolding to production CI/CD pipelines.

Tool / Guide	Runtime	Maturity	Key Differentiator
FastMCP 3.0	Python	Stable	CLI scaffolding, production-ready
MCP Gateway	Python	Beta	Auth + rate limiting + routing
Custom MCP Server	Any	Guide	Full walkthrough from scratch
MCP PDF Extractor	Python	Build log	Real-world PDF extraction server
Context Optimization	Python	Build log	Token-efficient context serving
CI/CD for MCP	—	Guide	Testing + deploy pipeline for servers
Observability	—	Guide	Monitoring MCP in production
Deployment Patterns	—	Guide	Production deployment reference
Integration Patterns	—	Guide	SDK integration patterns catalog

Quick pick: FastMCP 3.0 for quick scaffolding, MCP Gateway for auth + rate limiting, Custom MCP Guide for full control. → FastMCP Build Log · Custom Server Guide

Agent Protocols

5 posts

The protocol stack powering agent communication: MCP (Model Context Protocol), A2A (Agent-to-Agent), Function Calling, WebMCP, and how they compose in production.

Protocol	Purpose	Origin	Adoption	Best For
MCP	Tool/resource access for agents	Anthropic	97M SDK downloads/mo	Connecting agents to tools & data
A2A	Agent-to-agent communication	Google	Growing (multi-agent std)	Cross-system agent coordination
Function Calling	API-style tool invocation	OpenAI	Universal	Simple tool calls, chat completions
WebMCP	Web agent protocol	Google	Early	Browser-based agent actions
Tool Calling Arch	Enterprise tool-calling design	—	—	Production tool-calling at scale

Quick pick: MCP for tool access, A2A for cross-agent, Function Calling for simple/chat. → Protocol Stack Deep Dive · A2A Guide
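
To make the MCP row concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The tool name `search_docs` and its arguments are hypothetical, and the transport and initialization handshake are left out.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique within one connection


def make_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical tool and arguments, for illustration only.
request = make_tool_call("search_docs", {"query": "circuit breaker"})
wire = json.dumps(request)  # what would be written to the transport
```

Function calling carries the same information (a tool name plus JSON arguments) inside a model response rather than as a standalone RPC, which is why the two compose well.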

Production Patterns

12 posts

Battle-tested patterns for running agents in production: caching, routing, resilience, cost optimization, structured outputs, and CI/CD automation.

Pattern	Problem	Solution	Key Metric
Pipeline Resiliency	Agent pipelines fail silently	Retry + circuit breaker + fallback	3 recovery strategies
LLM Router/Fallback	Provider outages kill agents	Multi-provider routing with fallbacks	Zero-downtime switching
Structured Outputs	Parsing failures across providers	Cross-provider schema enforcement	Provider-agnostic parsing
Cache Hit Engineering	High LLM costs from cache misses	Prefix engineering for cache hits	80%+ cache hit rate
Multi-Tier Caching	Single cache isn't enough	Semantic + deterministic + prompt cache	3-tier cache architecture
Cost Optimization	Runaway API bills	Model routing + caching + tiering	~70% cost reduction
Self-Healing CI/CD	Deploys break silently	Automated rollback + health checks	Automated recovery
Codex Pipeline	Multi-agent code generation	DAG-based agent pipelines	Parallel agent execution
Ollama Agent Loop	Running agents locally	Local LLM agent loop design	Zero-cost inference
Agent Router Build	Single provider lock-in	Multi-provider agent router	Provider-agnostic agent
Source-Driven Dev	Untraceable agent behavior	Source-tracked agent execution	Full audit trail
Uncertainty Quant	Blind trust in LLM outputs	Confidence scoring + rejection	Uncertainty-aware agents

Quick pick: Start with cache engineering (biggest cost impact), then add resiliency + routing. → Cache Hit Engineering · Resiliency Patterns
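
The resiliency and router rows combine three ideas: retry, circuit breaker, and fallback. The sketch below shows one way they might fit together. The provider functions are stand-ins for real API clients, and the thresholds are arbitrary defaults, not values from the linked posts.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; retries after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let an attempt through once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def call_with_fallback(providers, prompt, breakers, retries=2, backoff=0.0):
    """Try providers in order; skip open breakers, retry failures, then fall back."""
    errors = []
    for name, fn in providers:
        breaker = breakers.setdefault(name, CircuitBreaker())
        for attempt in range(retries):
            if not breaker.allow():
                break
            try:
                result = fn(prompt)
                breaker.record(True)
                return name, result
            except Exception as exc:
                breaker.record(False)
                errors.append((name, repr(exc)))
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {errors}")


# Stub providers standing in for real LLM clients.
def flaky(prompt):
    raise TimeoutError("primary down")


def stable(prompt):
    return f"echo: {prompt}"


breakers = {}
used, answer = call_with_fallback(
    [("primary", flaky), ("backup", stable)], "hi", breakers
)
```

Keeping the breaker state outside the call (the `breakers` dict) lets repeated requests skip a provider that is already known to be down instead of paying the retry cost each time.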

Agent Security & Governance

6 posts

Guardrails, governance frameworks, safety patterns, and hallucination prevention for production agent systems. What happens when your agent goes rogue.

Topic	Approach	Key Takeaway
Guardrail Automation	Rule-based + LLM-as-judge	Parallel guardrails catch 90%+ failures
Tool Governance	Centralized tool registry + audit	Whitelist + version pinning for tools
Microsoft Gov Toolkit	Enterprise governance framework	Microsoft's production governance playbook
Production Safety	Crisis pattern catalog	11 real-world agent failure patterns
Hallucination Prevention	Pre-publish verification chains	Source-checked output pipeline
Cybersecurity	Agent-specific threat model	Prompt injection + tool abuse vectors

Quick pick: Start with guardrails (biggest risk reduction), add governance as you scale. → Guardrails Guide · Hallucination Prevention
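
As an illustration of the "rule-based + LLM-as-judge" row, the sketch below runs two guardrails concurrently and blocks the output if either one fails. The judge is a placeholder that only checks for one phrase; a real judge would call a model. The blocklist pattern is likewise an assumption chosen for the example.

```python
import asyncio
import re

# Assumed rule: flag anything that looks like a credential assignment.
BLOCKLIST = re.compile(r"(?i)\b(api[_-]?key|password)\b\s*[:=]")


async def rule_check(text: str) -> tuple[str, bool]:
    """Cheap deterministic check for leaked credentials."""
    return "rules", BLOCKLIST.search(text) is None


async def judge_check(text: str) -> tuple[str, bool]:
    """Placeholder for an LLM-as-judge call."""
    await asyncio.sleep(0)  # stands in for network latency
    return "judge", "ignore previous instructions" not in text.lower()


async def run_guardrails(text: str) -> dict[str, bool]:
    """Run all guardrails concurrently and collect pass/fail per check."""
    results = await asyncio.gather(rule_check(text), judge_check(text))
    return dict(results)


def is_allowed(text: str) -> bool:
    """True only when every guardrail passes."""
    return all(asyncio.run(run_guardrails(text)).values())
```

Running the checks in parallel means total latency is bounded by the slowest guardrail rather than the sum of all of them.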

Evaluation & Testing

8 posts

Eval frameworks, testing strategies, and observability tools for nondeterministic agent systems. 57% have agents in production — only 52% run offline evals.

Tool / Guide	Open Source	Multi-Turn	CI/CD	Metrics	Best For
MLflow	Apache 2.0	Trace-aware	Native	40+ (GPA, GEPA)	Full open-source platform
DeepEval	Apache 2.0	Span-level	Pytest	50+	CI/CD + pytest teams
LangSmith	No	LangGraph native	LangChain CI	20+	LangChain stacks
Braintrust	No	Trace-based	Eval-gated	25+	Eval-driven dev culture
OpenInference vs OTel	ELv2 / OTel	Trace-aware	Limited	50+	ML monitoring extension
Eval Harness	DIY	Custom	Custom	—	Building your own eval system
Testing Strategies	—	—	—	—	2026 agent testing landscape
RAG Eval Pipeline	—	—	—	—	RAG-specific evaluation pipeline

Quick pick: MLflow for full platform, DeepEval for CI/CD, Braintrust for eval-driven culture. → Eval Framework Comparison · DeepEval Deep Dive
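
For the DIY "Eval Harness" row, a minimal sketch of an offline eval for a nondeterministic agent might look like the following. Each case is run several times and scored by pass rate, and the run fails if any case drops below a threshold. The `Case` type, the toy agent, and the 0.8 threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]  # True when the output is acceptable


def pass_rate(agent: Callable[[str], str], case: Case, trials: int = 5) -> float:
    """Run one case several times; a single run says little about a nondeterministic agent."""
    passed = sum(case.check(agent(case.prompt)) for _ in range(trials))
    return passed / trials


def evaluate(agent, cases, trials=5, threshold=0.8):
    """Return per-case pass rates and whether every case clears the threshold."""
    rates = {c.prompt: pass_rate(agent, c, trials) for c in cases}
    return rates, all(r >= threshold for r in rates.values())


# Toy agent standing in for a real one.
def toy_agent(prompt: str) -> str:
    return "4" if prompt == "2+2?" else "unsure"


cases = [
    Case("2+2?", lambda out: out.strip() == "4"),
    Case("capital of France?", lambda out: "paris" in out.lower()),
]
rates, ok = evaluate(toy_agent, cases)
```

In CI, the boolean result can gate the build, for example with `assert ok` inside a pytest test.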

Agent Memory

4 posts

Memory is the #1 bottleneck for long-running autonomous agents. Comparing Mem0, Zep, LangMem, Letta — and when to just use markdown + search.

Solution	Approach	Stars	Self-Host	LongMemEval	Pricing
Mem0	Universal memory layer	48K★	Yes	49%	$19→$249/mo
Zep	Temporal knowledge graph	12K★	GraphDB	63.8%	$25/mo
LangMem	LangGraph SDK library	LangChain	Yes	N/A	Free
Letta	OS-tiered memory	18K★	Yes	83.2%	Free + Cloud
RAG vs Long Context	Architecture decision	—	—	—	—

Quick pick: Letta for best LongMemEval score (83.2%), Mem0 for quickstart, Zep for temporal reasoning, LangMem if already on LangGraph. → Full Memory Comparison
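
The "markdown + search" option mentioned above can be surprisingly small. Here is a minimal sketch, assuming memories are stored as `##` sections in a single markdown file and retrieved by plain keyword overlap. The file layout and scoring are assumptions for illustration, not how any of the listed products work.

```python
import re
import tempfile
from pathlib import Path


def remember(notes: Path, heading: str, body: str) -> None:
    """Append a memory as a markdown section."""
    with notes.open("a", encoding="utf-8") as f:
        f.write(f"## {heading}\n{body.strip()}\n\n")


def recall(notes: Path, query: str, top_k: int = 3) -> list[str]:
    """Rank sections by how many query words they contain."""
    if not notes.exists():
        return []
    text = notes.read_text(encoding="utf-8")
    sections = [s.strip() for s in re.split(r"(?m)^## ", text) if s.strip()]
    words = set(re.findall(r"\w+", query.lower()))
    scored = [(len(words & set(re.findall(r"\w+", s.lower()))), s) for s in sections]
    ranked = sorted((x for x in scored if x[0] > 0), key=lambda x: -x[0])
    return [s for _, s in ranked[:top_k]]


notes = Path(tempfile.mkdtemp()) / "memory.md"
remember(notes, "User preferences", "Prefers Python and dark mode.")
remember(notes, "Deploy notes", "Staging deploys run on Fridays.")
hits = recall(notes, "which language does the user prefer? python")
```

This has no embeddings and no temporal reasoning, which is roughly where a dedicated memory layer starts to earn its keep.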

Inference & Hosting

6 posts

Running LLMs locally and in production: inference engines, self-hosting, gateways, and vector databases for agent workloads.

Tool / Guide	Type	Hardware	Key Differentiator
Ollama	Inference engine	CPU/GPU	Easiest setup, model management
llama.cpp	Inference engine	CPU/GPU	Best performance, most configurable
MLX	Inference engine	Apple Silicon	Apple-optimized, fastest on Mac
LiteLLM Gateway	API gateway	Any	100+ provider proxy, cost tracking
Vector DBs	Vector database	Any	Benchmark: Qdrant, Weaviate, Pinecone

Quick pick: Ollama for quick local LLMs, llama.cpp for perf, MLX for Mac, LiteLLM for multi-provider gateway. → Inference Engine Comparison · Vector DB Benchmark

Coding Agents & Developer Tools

8 posts

AI coding assistants, editors, and developer tools compared — from Claude Code to Codex CLI to Cursor. Productivity benchmarks, state of play, and deep dives.

Tool / Topic	Type	Key Metric	Best For
Coding Agents State of Play	Landscape report	—	Overview of 2026 coding agent market
AI Code Editors	Comparison	6 editors tested	Choosing the right AI editor
Coding Productivity	Benchmark	~2x speed boost	Productivity measurement methodology
Claude Code iPhone	Case study	Full app in one session	Claude Code's capabilities
232 Claude Code Skills	Catalog	232 MCP skills	Claude Code skill ecosystem
Harness vs Compute	Architecture	Workflow vs one-shot	When to use harness vs raw LLM
Codebase Graphs	Guide	Graph-based code understanding	Agents that understand large codebases
System Prompt Engineering	Technique	51 prompt patterns	Compound prompt architecture

Quick pick: Claude Code for terminal-native coding, Cursor for IDE integration, Codex CLI for autonomous parallelism. → State of Play · Editor Comparison

Multi-Agent Orchestration

6 posts

Architectures and patterns for orchestrating multiple agents: DAG pipelines, swarm topologies, hierarchical supervisors, and dynamic agent teams.

Topic	Paradigm	Key Insight
Multi-Agent Production	Production survey	Pattern catalog for production multi-agent
VMAO Paper	Verified orchestration	Verification-based multi-agent guarantees
Sim Studio DAG	DAG executor	Visual DAG-based agent pipelines
Mission Control	Control plane	Centralized agent monitoring + orchestration
Architecture Catalog	Decision guide	Which architecture for which workload
RL Conductor	RL-based orchestration	RL-trained orchestrators for dynamic routing

Quick pick: DAG for deterministic workflows, swarm for elastic scaling, hierarchy for enterprise. → Architecture Guide · Production Multi-Agent
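
For the deterministic-workflow case, a DAG executor can be sketched with the standard library's `graphlib`. The step names and lambdas below are hypothetical stand-ins for agent calls, and this version runs steps sequentially.

```python
from graphlib import TopologicalSorter


def run_dag(steps, deps):
    """Run steps in dependency order; each step receives its upstream outputs.

    steps: name -> callable(dict of upstream outputs) -> output
    deps:  name -> set of step names that must finish first
    """
    outputs = {}
    order = TopologicalSorter({name: deps.get(name, set()) for name in steps})
    for name in order.static_order():  # raises CycleError if the graph has a cycle
        upstream = {d: outputs[d] for d in deps.get(name, set())}
        outputs[name] = steps[name](upstream)
    return outputs


# Hypothetical four-step pipeline; each lambda stands in for an agent call.
steps = {
    "research": lambda up: "facts",
    "outline": lambda up: f"outline({up['research']})",
    "draft": lambda up: f"draft({up['outline']})",
    "review": lambda up: f"review({up['draft']}, {up['research']})",
}
deps = {
    "outline": {"research"},
    "draft": {"outline"},
    "review": {"draft", "research"},
}
result = run_dag(steps, deps)
```

To run independent steps in parallel, the same `TopologicalSorter` offers `prepare()`, `get_ready()`, and `done()`, which hand out steps as soon as their dependencies finish.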

LLMs & Models

6 posts

Open-source model comparisons, structured output benchmarks, prompt engineering, and context management for agent workloads.

Topic	Coverage	Key Insight
Open-Source LLM Comparison	DeepSeek R1, Llama 4, Qwen 3	Qwen 3 235B leads GPQA (77.2%), DeepSeek leads MATH (97.3%)
Structured Outputs	Cross-provider JSON mode	Schema enforcement varies wildly across providers
Writing Tools for Agents	Agent writing capabilities	Model selection matters more than prompt engineering
Context Engineering	Prompt patterns catalog	Structured context beats raw prompt length
Codebase Graphs	Graph-based code understanding	Graph RAG outperforms flat context for code

Quick pick: Qwen 3 for permissive Apache 2.0 licensing, DeepSeek R1 for math/reasoning, Llama 4 Scout for 10M context. → Full LLM Comparison

Build Logs

8 posts

Real builds, real mistakes. These are the raw build logs — what broke, what worked, and what the final architecture looked like. Read these first if you want to skip the same mistakes.

Build Log	What Was Built	Key Lesson
Agent Self-Reflection	Self-improving agent loop	Reflection doubles task success on complex workflows
Web Extraction	Agentic web scraping pipeline	Structured extraction beats raw scraping 3:1
Multi-Provider Router	LLM provider router with fallbacks	Circuit breaker pattern essential for production
Custom MCP Server	MCP server from scratch	FastMCP scaffolding saves 80% of boilerplate
FastMCP 3.0 Server	Production MCP server	Testing + observability = MCP production readiness
Multi-Agent Framework	Custom multi-agent framework	Start with existing frameworks, customize when you hit limits
MCP PDF Extractor	PDF-to-text MCP server	PDF parsing is harder than it looks — use marker-pdf
Context Optimization	Token-efficient context server	Compression beats truncation for long context

Quick pick: Self-reflection loop for the biggest win, FastMCP Build Log for MCP patterns, Web Extraction for pipeline design. → Self-Reflection Log · Web Extraction Log
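
The self-reflection loop can be sketched as generate, critique, revise. The version below is a simplified illustration, not the architecture from the build log: `generate` and `critique` are stub functions standing in for model calls, and the critic signals success by returning `None`.

```python
def reflect_loop(generate, critique, task, max_rounds=3):
    """Generate, critique, and revise until the critic is satisfied or rounds run out."""
    attempt = generate(task, feedback=None)
    history = [attempt]
    for _ in range(max_rounds):
        feedback = critique(task, attempt)
        if feedback is None:  # critic has no complaints
            break
        attempt = generate(task, feedback=feedback)
        history.append(attempt)
    return attempt, history


# Stubs standing in for LLM calls.
def generate(task, feedback):
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def critique(task, attempt):
    return None if "a + b" in attempt else "add() subtracts; use +"


final, history = reflect_loop(generate, critique, "write add()")
```

Capping the rounds matters in practice: without `max_rounds`, a critic that never approves would loop forever and burn tokens.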