Agent Engineering Reference Hub
Everything NiteAgent covers — organized by topic. Each section is a cheat sheet with quick picks and links to the full posts. Updated as the landscape evolves.
Agent Frameworks & SDKs
12 posts. Production-grade frameworks and vendor SDKs for building AI agents. Open-source (LangGraph, CrewAI, Smolagents, MAF, AG2, Swarms) and vendor-locked (Claude, OpenAI, Google ADK).
| Framework | Paradigm | MCP | State | Complex Tasks | Languages |
|---|---|---|---|---|---|
| LangGraph | State machine (DAG) | Native | Checkpointed | 62% | Python |
| CrewAI | Role-based teams | Native | Session only | 54% | Python |
| Smolagents | Code-in-action | Native | Session only | 59% | Python |
| MAF 1.0 | Graph workflows | Native | Checkpointed | 67% | C#, Python |
| AG2 | Conversational | Native | Session only | 58% | Python |
| Swarms | Swarm/ring | Native | Basic | ~55% | Python |
| Claude Agent SDK | Subagents + OS tools | Deepest | Session only | — | Python, TS |
| OpenAI Agents SDK | Handoff chains | Adapter | Pluggable | — | Python, TS |
| Google ADK | Hierarchical supervisor | Adapter | Distributed | — | Python, TS, Java, Go |
MCP Server Implementations
10 posts. Tools and frameworks for building, testing, deploying, and monitoring MCP servers. From FastMCP scaffolding to production CI/CD pipelines.
| Tool / Guide | Runtime | Maturity | Key Differentiator |
|---|---|---|---|
| FastMCP 3.0 | Python | Stable | CLI scaffolding, production-ready |
| MCP Gateway | Python | Beta | Auth + rate limiting + routing |
| Custom MCP Server | Any | Guide | Full walkthrough from scratch |
| MCP PDF Extractor | Python | Build log | Real-world PDF extraction server |
| Context Optimization | Python | Build log | Token-efficient context serving |
| CI/CD for MCP | — | Guide | Testing + deploy pipeline for servers |
| Observability | — | Guide | Monitoring MCP in production |
| Deployment Patterns | — | Guide | Production deployment reference |
| Integration Patterns | — | Guide | SDK integration patterns catalog |
Agent Protocols
5 posts. The protocol stack powering agent communication: MCP (Model Context Protocol), A2A (Agent-to-Agent), Function Calling, WebMCP, and how they compose in production.
| Protocol | Purpose | Origin | Adoption | Best For |
|---|---|---|---|---|
| MCP | Tool/resource access for agents | Anthropic | 97M SDK downloads/mo | Connecting agents to tools & data |
| A2A | Agent-to-agent communication | — | Growing (multi-agent std) | Cross-system agent coordination |
| Function Calling | API-style tool invocation | OpenAI | Universal | Simple tool calls, chat completions |
| WebMCP | Web agent protocol | — | Early | Browser-based agent actions |
| Tool Calling Arch | Enterprise tool-calling design | — | — | Production tool-calling at scale |
Production Patterns
12 posts. Battle-tested patterns for running agents in production: caching, routing, resilience, cost optimization, structured outputs, and CI/CD automation.
| Pattern | Problem | Solution | Key Metric |
|---|---|---|---|
| Pipeline Resiliency | Agent pipelines fail silently | Retry + circuit breaker + fallback | 3 recovery strategies |
| LLM Router/Fallback | Provider outages kill agents | Multi-provider routing with fallbacks | Zero-downtime switching |
| Structured Outputs | Parsing failures across providers | Cross-provider schema enforcement | Provider-agnostic parsing |
| Cache Hit Engineering | High LLM costs from cache misses | Prefix engineering for cache hits | 80%+ cache hit rate |
| Multi-Tier Caching | Single cache isn't enough | Semantic + deterministic + prompt cache | 3-tier cache architecture |
| Cost Optimization | Runaway API bills | Model routing + caching + tiering | ~70% cost reduction |
| Self-Healing CI/CD | Deploys break silently | Automated rollback + health checks | Automated recovery |
| Codex Pipeline | Multi-agent code generation | DAG-based agent pipelines | Parallel agent execution |
| Ollama Agent Loop | Running agents locally | Local LLM agent loop design | Zero-cost inference |
| Agent Router Build | Single provider lock-in | Multi-provider agent router | Provider-agnostic agent |
| Source-Driven Dev | Untraceable agent behavior | Source-tracked agent execution | Full audit trail |
| Uncertainty Quant | Blind trust in LLM outputs | Confidence scoring + rejection | Uncertainty-aware agents |
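The "retry + circuit breaker + fallback" and "multi-provider routing with fallbacks" rows above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the linked posts: `CircuitBreaker`, `Provider`, and `route` are hypothetical names, the thresholds are arbitrary, and each provider is modeled as a plain callable instead of a real SDK client.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures (hypothetical helper).

    Once open, calls are refused until `reset_after` seconds have passed.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # stand-in for a real LLM client call
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def route(prompt: str, providers: Sequence[Provider], retries: int = 2) -> str:
    """Try providers in order: retry each, skip open circuits, fall back to the next."""
    errors: List[str] = []
    for provider in providers:
        if not provider.breaker.allow():
            errors.append(f"{provider.name}: circuit open")
            continue
        for _ in range(retries):
            try:
                result = provider.call(prompt)
            except Exception as exc:  # broad on purpose: any failure triggers fallback
                provider.breaker.record_failure()
                errors.append(f"{provider.name}: {exc}")
                if not provider.breaker.allow():
                    break  # circuit just opened, stop retrying this provider
            else:
                provider.breaker.record_success()
                return result
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def backup(prompt: str) -> str:
    return f"backup:{prompt}"


demo = [Provider("primary", flaky, CircuitBreaker(max_failures=2)),
        Provider("backup", backup)]
print(route("hello", demo))  # prints "backup:hello"
```

After the first request the primary's circuit is open, so later requests skip it without paying the timeout again until the cool-down expires. A production router would likely add backoff between retries and distinguish retryable errors from permanent ones.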
Agent Security & Governance
6 posts. Guardrails, governance frameworks, safety patterns, and hallucination prevention for production agent systems. What happens when your agent goes rogue.
| Topic | Approach | Key Takeaway |
|---|---|---|
| Guardrail Automation | Rule-based + LLM-as-judge | Parallel guardrails catch 90%+ failures |
| Tool Governance | Centralized tool registry + audit | Whitelist + version pinning for tools |
| Microsoft Gov Toolkit | Enterprise governance framework | Microsoft's production governance playbook |
| Production Safety | Crisis pattern catalog | 11 real-world agent failure patterns |
| Hallucination Prevention | Pre-publish verification chains | Source-checked output pipeline |
| Cybersecurity | Agent-specific threat model | Prompt injection + tool abuse vectors |
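The Tool Governance row (centralized registry, audit, whitelist, version pinning) could look roughly like the sketch below. It is an illustration only: `ToolRegistry`, `ToolNotAllowed`, and the method names are hypothetical, and it assumes a tool is callable only when an exact version has been pinned, with every attempt written to an in-memory audit log.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool that has no pinned version."""


@dataclass
class ToolRegistry:
    # (name, version) -> implementation; all names here are hypothetical.
    _tools: Dict[Tuple[str, str], Callable[..., Any]] = field(default_factory=dict)
    # name -> the single version agents may call (the allowlist).
    _pins: Dict[str, str] = field(default_factory=dict)
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def register(self, name: str, version: str, fn: Callable[..., Any]) -> None:
        self._tools[(name, version)] = fn

    def pin(self, name: str, version: str) -> None:
        if (name, version) not in self._tools:
            raise KeyError(f"{name}@{version} is not registered")
        self._pins[name] = version

    def call(self, agent: str, name: str, **kwargs: Any) -> Any:
        version = self._pins.get(name)
        allowed = version is not None
        # Record every attempt, including denied ones, before executing anything.
        self.audit_log.append(
            {"agent": agent, "tool": name, "version": version, "allowed": allowed}
        )
        if not allowed:
            raise ToolNotAllowed(f"{name} is not on the allowlist")
        return self._tools[(name, version)](**kwargs)


registry = ToolRegistry()
registry.register("search", "1.0", lambda query: f"v1 results for {query}")
registry.register("search", "2.0", lambda query: f"v2 results for {query}")
registry.pin("search", "1.0")  # agents get 1.0 even though 2.0 is registered

print(registry.call("planner", "search", query="mcp"))  # prints "v1 results for mcp"
```

Registering a tool and allowing it are separate steps on purpose: a new version can be staged in the registry without any agent picking it up until the pin moves.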
Evaluation & Testing
8 posts. Eval frameworks, testing strategies, and observability tools for nondeterministic agent systems. 57% have agents in production, but only 52% run offline evals.
| Tool / Guide | Open Source | Multi-Turn | CI/CD | Metrics | Best For |
|---|---|---|---|---|---|
| MLflow | Apache 2.0 | Trace-aware | Native | 40+ (GPA, GEPA) | Full open-source platform |
| DeepEval | Apache 2.0 | Span-level | Pytest | 50+ | CI/CD + pytest teams |
| LangSmith | No | LangGraph native | LangChain CI | 20+ | LangChain stacks |
| Braintrust | No | Trace-based | Eval-gated | 25+ | Eval-driven dev culture |
| OpenInference vs OTel | ELv2 / OTel | Trace-aware | Limited | 50+ | ML monitoring extension |
| Eval Harness | DIY | Custom | Custom | — | Building your own eval system |
| Testing Strategies | — | — | — | — | 2026 agent testing landscape |
| RAG Eval Pipeline | — | — | — | — | RAG-specific evaluation pipeline |
Agent Memory
4 posts. Memory is the #1 bottleneck for long-running autonomous agents. Compares Mem0, Zep, LangMem, and Letta, and covers when to just use markdown + search.
| Solution | Approach | Stars | Self-Host | LongMemEval | Pricing |
|---|---|---|---|---|---|
| Mem0 | Universal memory layer | 48K★ | Yes | 49% | $19→$249/mo |
| Zep | Temporal knowledge graph | 12K★ | GraphDB | 63.8% | $25/mo |
| LangMem | LangGraph SDK library | LangChain | Yes | N/A | Free |
| Letta | OS-tiered memory | 18K★ | Yes | 83.2% | Free + Cloud |
| RAG vs Long Context | Architecture decision | — | — | — | — |
Inference & Hosting
6 posts. Running LLMs locally and in production: inference engines, self-hosting, gateways, and vector databases for agent workloads.
| Tool / Guide | Type | Hardware | Key Differentiator |
|---|---|---|---|
| Ollama | Inference engine | CPU/GPU | Easiest setup, model management |
| llama.cpp | Inference engine | CPU/GPU | Best performance, most configurable |
| MLX | Inference engine | Apple Silicon | Apple-optimized, fastest on Mac |
| LiteLLM Gateway | API gateway | Any | 100+ provider proxy, cost tracking |
| Vector DBs | Vector database | Any | Benchmark: Qdrant, Weaviate, Pinecone |
Coding Agents & Developer Tools
8 posts. AI coding assistants, editors, and developer tools compared, from Claude Code to Codex CLI to Cursor. Productivity benchmarks, state of play, and deep dives.
| Tool / Topic | Type | Key Metric | Best For |
|---|---|---|---|
| Coding Agents State of Play | Landscape report | — | Overview of 2026 coding agent market |
| AI Code Editors | Comparison | 6 editors tested | Choosing the right AI editor |
| Coding Productivity | Benchmark | ~2x speed boost | Productivity measurement methodology |
| Claude Code iPhone | Case study | Full app in one session | Claude Code's capabilities |
| 232 Claude Code Skills | Catalog | 232 MCP skills | Claude Code skill ecosystem |
| Harness vs Compute | Architecture | Workflow vs one-shot | When to use harness vs raw LLM |
| Codebase Graphs | Guide | Graph-based code understanding | Agents that understand large codebases |
| System Prompt Engineering | Technique | 51 prompt patterns | Compound prompt architecture |
Multi-Agent Orchestration
6 posts. Architectures and patterns for orchestrating multiple agents: DAG pipelines, swarm topologies, hierarchical supervisors, and dynamic agent teams.
| Topic | Paradigm | Key Insight |
|---|---|---|
| Multi-Agent Production | Production survey | Pattern catalog for production multi-agent |
| VMAO Paper | Verified orchestration | Verification-based multi-agent guarantees |
| Sim Studio DAG | DAG executor | Visual DAG-based agent pipelines |
| Mission Control | Control plane | Centralized agent monitoring + orchestration |
| Architecture Catalog | Decision guide | Which architecture for which workload |
| RL Conductor | RL-based orchestration | RL-trained orchestrators for dynamic routing |
LLMs & Models
6 posts. Open-source model comparisons, structured output benchmarks, prompt engineering, and context management for agent workloads.
| Topic | Coverage | Key Insight |
|---|---|---|
| Open-Source LLM Comparison | DeepSeek R1, Llama 4, Qwen 3 | Qwen 3 235B leads GPQA (77.2%), DeepSeek leads MATH (97.3%) |
| Structured Outputs | Cross-provider JSON mode | Schema enforcement varies wildly across providers |
| Writing Tools for Agents | Agent writing capabilities | Model selection matters more than prompt engineering |
| Context Engineering | Prompt patterns catalog | Structured context beats raw prompt length |
| Codebase Graphs | Graph-based code understanding | Graph RAG outperforms flat context for code |
Build Logs
8 posts. Real builds, real mistakes. These are the raw build logs: what broke, what worked, and what the final architecture looked like. Read these first if you want to avoid the same mistakes.
| Build Log | What Was Built | Key Lesson |
|---|---|---|
| Agent Self-Reflection | Self-improving agent loop | Reflection doubles task success on complex workflows |
| Web Extraction | Agentic web scraping pipeline | Structured extraction beats raw scraping 3:1 |
| Multi-Provider Router | LLM provider router with fallbacks | Circuit breaker pattern essential for production |
| Custom MCP Server | MCP server from scratch | FastMCP scaffolding saves 80% of boilerplate |
| FastMCP 3.0 Server | Production MCP server | Testing + observability = MCP production readiness |
| Multi-Agent Framework | Custom multi-agent framework | Start with existing frameworks, customize when you hit limits |
| MCP PDF Extractor | PDF-to-text MCP server | PDF parsing is harder than it looks — use marker-pdf |
| Context Optimization | Token-efficient context server | Compression beats truncation for long context |