Agent Engineering Reference Hub

Everything NiteAgent covers — organized by topic. Each section is a cheat sheet with quick picks and links to the full posts. Updated as the landscape evolves.

100+ posts indexed · 12 categories · Updated June 2026

Agent Frameworks & SDKs

12 posts

Production-grade frameworks and vendor SDKs for building AI agents, covering open-source options (LangGraph, CrewAI, Smolagents, MAF, AG2, Swarms) and vendor-locked SDKs (Claude, OpenAI, Google ADK).

Framework | Paradigm | MCP | State | Complex Tasks | Languages
LangGraph | State machine (DAG) | Native | Checkpointed | 62% | Python
CrewAI | Role-based teams | Native | Session only | 54% | Python
Smolagents | Code-in-action | Native | Session only | 59% | Python
MAF 1.0 | Graph workflows | Native | Checkpointed | 67% | C#, Python
AG2 | Conversational | Native | Session only | 58% | Python
Swarms | Swarm/ring | Native | Basic | ~55% | Python
Claude Agent SDK | Subagents + OS tools | Deepest | Session only | - | Python, TS
OpenAI Agents SDK | Handoff chains | Adapter | Pluggable | - | Python, TS
Google ADK | Hierarchical supervisor | Adapter | Distributed | - | Python, TS, Java, Go
Quick pick: LangGraph for production durability, CrewAI for fastest prototype, MAF 1.0 for .NET/enterprise, Claude SDK for OS-level control, OpenAI SDK for model flexibility, Google ADK for enterprise hierarchy. → Vendor SDK Showdown · LangGraph vs CrewAI · Smolagents vs MAF vs AG2

MCP Server Implementations

10 posts

Tools and frameworks for building, testing, deploying, and monitoring MCP servers. From FastMCP scaffolding to production CI/CD pipelines.

Tool / Guide | Runtime | Maturity | Key Differentiator
FastMCP 3.0 | Python | Stable | CLI scaffolding, production-ready
MCP Gateway | Python | Beta | Auth + rate limiting + routing
Custom MCP Server | Any | Guide | Full walkthrough from scratch
MCP PDF Extractor | Python | Build log | Real-world PDF extraction server
Context Optimization | Python | Build log | Token-efficient context serving
CI/CD for MCP | - | Guide | Testing + deploy pipeline for servers
Observability | - | Guide | Monitoring MCP in production
Deployment Patterns | - | Guide | Production deployment reference
Integration Patterns | - | Guide | SDK integration patterns catalog
Quick pick: FastMCP 3.0 for quick scaffolding, MCP Gateway for auth+gates, Custom MCP Guide for full control. → FastMCP Build Log · Custom Server Guide

Agent Protocols

5 posts

The protocol stack powering agent communication: MCP (Model Context Protocol), A2A (Agent-to-Agent), Function Calling, WebMCP, and how they compose in production.

Protocol | Purpose | Origin | Adoption | Best For
MCP | Tool/resource access for agents | Anthropic | 97M SDK downloads/mo | Connecting agents to tools & data
A2A | Agent-to-agent communication | Google | Growing (multi-agent std) | Cross-system agent coordination
Function Calling | API-style tool invocation | OpenAI | Universal | Simple tool calls, chat completions
WebMCP | Web agent protocol | Google | Early | Browser-based agent actions
Tool Calling Arch | Enterprise tool-calling design | - | - | Production tool-calling at scale
Quick pick: MCP for tool access, A2A for cross-agent, Function Calling for simple/chat. → Protocol Stack Deep Dive · A2A Guide
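To make "API-style tool invocation" concrete, here is a minimal sketch of dispatching a tool call to a registered Python function. The registry, the `get_weather` stub, and the payload shape (`name` plus a JSON-encoded `arguments` string) are assumptions for illustration, not any provider's exact wire format.

```python
import json
from typing import Any, Callable

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> dict:
    # Stub: a real tool would call an external service here.
    return {"city": city, "forecast": "unknown"}


def dispatch(call: dict) -> str:
    """Run one tool call and return a JSON string to hand back to the model."""
    name = call.get("name")
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(call.get("arguments", "{}"))
        return json.dumps({"result": TOOLS[name](**args)})
    except (TypeError, ValueError) as exc:
        # Malformed JSON or wrong argument names are reported, not raised.
        return json.dumps({"error": str(exc)})
```

Returning errors as data rather than raising lets the model see what went wrong and retry with corrected arguments.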

Production Patterns

12 posts

Battle-tested patterns for running agents in production: caching, routing, resilience, cost optimization, structured outputs, and CI/CD automation.

Pattern | Problem | Solution | Key Metric
Pipeline Resiliency | Agent pipelines fail silently | Retry + circuit breaker + fallback | 3 recovery strategies
LLM Router/Fallback | Provider outages kill agents | Multi-provider routing with fallbacks | Zero-downtime switching
Structured Outputs | Parsing failures across providers | Cross-provider schema enforcement | Provider-agnostic parsing
Cache Hit Engineering | High LLM costs from cache misses | Prefix engineering for cache hits | 80%+ cache hit rate
Multi-Tier Caching | Single cache isn't enough | Semantic + deterministic + prompt cache | 3-tier cache architecture
Cost Optimization | Runaway API bills | Model routing + caching + tiering | ~70% cost reduction
Self-Healing CI/CD | Deploys break silently | Automated rollback + health checks | Automated recovery
Codex Pipeline | Multi-agent code generation | DAG-based agent pipelines | Parallel agent execution
Ollama Agent Loop | Running agents locally | Local LLM agent loop design | Zero-cost inference
Agent Router Build | Single provider lock-in | Multi-provider agent router | Provider-agnostic agent
Source-Driven Dev | Untraceable agent behavior | Source-tracked agent execution | Full audit trail
Uncertainty Quant | Blind trust in LLM outputs | Confidence scoring + rejection | Uncertainty-aware agents
Quick pick: Start with cache engineering (biggest cost impact), then add resiliency + routing. → Cache Hit Engineering · Resiliency Patterns
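The retry + circuit breaker + fallback combination can be sketched in a few lines of standard-library Python. Everything here is an assumption for illustration: the `CircuitBreaker` class, the `call_with_fallback` helper, the thresholds, and the idea that each provider is a plain callable taking a prompt. It is not the implementation from the linked posts.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows a retry after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a call through once the cool-down has passed.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()


def call_with_fallback(providers, prompt: str, retries: int = 2, backoff: float = 0.0):
    """Try each (breaker, fn) pair in order: retry first, then fall through."""
    for breaker, fn in providers:
        if not breaker.allow():
            continue  # circuit open: skip this provider entirely
        for attempt in range(retries + 1):
            try:
                result = fn(prompt)
                breaker.record(True)
                return result
            except Exception:
                breaker.record(False)
                if not breaker.allow():
                    break  # breaker just opened; move to the next provider
                if attempt < retries:
                    time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all providers failed or are circuit-open")
```

Keeping one breaker per provider means a sustained outage at the primary stops costing latency on every request: once its circuit opens, calls go straight to the fallback until the cool-down expires.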

Agent Security & Governance

6 posts

Guardrails, governance frameworks, safety patterns, and hallucination prevention for production agent systems. What happens when your agent goes rogue.

Topic | Approach | Key Takeaway
Guardrail Automation | Rule-based + LLM-as-judge | Parallel guardrails catch 90%+ failures
Tool Governance | Centralized tool registry + audit | Whitelist + version pinning for tools
Microsoft Gov Toolkit | Enterprise governance framework | Microsoft's production governance playbook
Production Safety | Crisis pattern catalog | 11 real-world agent failure patterns
Hallucination Prevention | Pre-publish verification chains | Source-checked output pipeline
Cybersecurity | Agent-specific threat model | Prompt injection + tool abuse vectors
Quick pick: Start with guardrails (biggest risk reduction), add governance as you scale. → Guardrails Guide · Hallucination Prevention
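As a rough sketch of running guardrails in parallel, the example below fans rule-based checks out over a thread pool and collects every violation. The three checks and their patterns are illustrative assumptions; an LLM-as-judge check would simply be one more callable with the same signature.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Each guardrail returns None when the text passes, or a reason string.
Guardrail = Callable[[str], Optional[str]]


def no_secrets(text: str) -> Optional[str]:
    # Illustrative pattern only; real secret detection needs broader rules.
    return "possible API key" if re.search(r"sk-[A-Za-z0-9]{16,}", text) else None


def max_length(text: str) -> Optional[str]:
    return "output too long" if len(text) > 2000 else None


def no_shell_rm(text: str) -> Optional[str]:
    return "destructive shell command" if "rm -rf" in text else None


CHECKS: list[Guardrail] = [no_secrets, max_length, no_shell_rm]


def run_guardrails(text: str, checks: list[Guardrail]) -> list[str]:
    """Run all checks concurrently and return the reasons for any failures."""
    with ThreadPoolExecutor(max_workers=len(checks) or 1) as pool:
        results = pool.map(lambda check: check(text), checks)
        return [reason for reason in results if reason is not None]
```

An empty list means the output passed every check. Threads pay off mainly when some checks are I/O-bound, such as a judge-model call; for pure regex rules a plain loop would do.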

Evaluation & Testing

8 posts

Eval frameworks, testing strategies, and observability tools for nondeterministic agent systems. 57% report having agents in production, but only 52% run offline evals.

Tool / Guide | Open Source | Multi-Turn | CI/CD | Metrics | Best For
MLflow | Apache 2.0 | Trace-aware | Native | 40+ (GPA, GEPA) | Full open-source platform
DeepEval | Apache 2.0 | Span-level | Pytest | 50+ | CI/CD + pytest teams
LangSmith | No | LangGraph native | LangChain CI | 20+ | LangChain stacks
Braintrust | No | Trace-based | Eval-gated | 25+ | Eval-driven dev culture
OpenInference vs OTel | ELv2 / OTel | Trace-aware | Limited | 50+ | ML monitoring extension
Eval Harness | DIY | Custom | Custom | - | Building your own eval system
Testing Strategies | - | - | - | - | 2026 agent testing landscape
RAG Eval Pipeline | - | - | - | - | RAG-specific evaluation pipeline
Quick pick: MLflow for full platform, DeepEval for CI/CD, Braintrust for eval-driven culture. → Eval Framework Comparison · DeepEval Deep Dive
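For the DIY route, an offline eval harness can start very small. The sketch below assumes an agent is a plain `str -> str` callable and uses a substring match as the scorer. `EvalCase`, `run_evals`, and `echo_agent` are hypothetical names, not part of any tool listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected_substring: str  # simplest possible check; swap in a real scorer as needed


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case and report the pass rate plus the failing prompts."""
    failures = []
    for case in cases:
        try:
            output = agent(case.prompt)
        except Exception as exc:  # a crash counts as a failure, not an abort
            output = f"<error: {exc}>"
        if case.expected_substring.lower() not in output.lower():
            failures.append(case.prompt)
    total = len(cases)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "failures": failures,
    }


def echo_agent(prompt: str) -> str:
    # Stand-in for a real agent call.
    return f"You asked: {prompt}"
```

In CI, a test can call `run_evals` on a fixed case set and assert that `pass_rate` stays above a chosen threshold, which turns the harness into a regression gate.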

Agent Memory

4 posts

Memory is the #1 bottleneck for long-running autonomous agents. This section compares Mem0, Zep, LangMem, and Letta, and covers when to just use markdown + search.

Solution | Approach | Stars | Self-Host | LongMemEval | Pricing
Mem0 | Universal memory layer | 48K★ | Yes | 49% | $19→$249/mo
Zep | Temporal knowledge graph | 12K★ | GraphDB | 63.8% | $25/mo
LangMem | LangGraph SDK library | LangChain | Yes | N/A | Free
Letta | OS-tiered memory | 18K★ | Yes | 83.2% | Free + Cloud
RAG vs Long Context | Architecture decision | - | - | - | -
Quick pick: Letta for best LongMemEval score (83.2%), Mem0 for quickstart, Zep for temporal reasoning, LangMem if already on LangGraph. → Full Memory Comparison
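The "markdown + search" baseline can be sketched with the standard library alone. This example assumes memories live in one markdown file with `## ` headings and ranks sections by raw term counts. The function names and file layout are illustrative assumptions, and a real setup might prefer BM25 or embeddings.

```python
import re
from collections import Counter


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split a markdown memory file into (heading, body) pairs on '## ' lines."""
    sections, heading, body = [], "preamble", []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if body:
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line[3:].strip(), []
        else:
            body.append(line)
    if body:
        sections.append((heading, "\n".join(body).strip()))
    return sections


def search(markdown: str, query: str, top_k: int = 3) -> list[str]:
    """Return headings of the sections with the most query-term occurrences."""
    terms = re.findall(r"\w+", query.lower())
    scored = []
    for heading, text in split_sections(markdown):
        words = Counter(re.findall(r"\w+", f"{heading} {text}".lower()))
        score = sum(words[term] for term in terms)
        if score:
            scored.append((score, heading))
    scored.sort(key=lambda pair: -pair[0])  # highest score first, stable on ties
    return [heading for _, heading in scored[:top_k]]
```

The appeal of this baseline is that the memory stays human-readable and diffable; the trade-off is that it only matches exact words, so paraphrased queries can miss.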

Inference & Hosting

6 posts

Running LLMs locally and in production: inference engines, self-hosting, gateways, and vector databases for agent workloads.

Tool / Guide | Type | Hardware | Key Differentiator
Ollama | Inference engine | CPU/GPU | Easiest setup, model management
llama.cpp | Inference engine | CPU/GPU | Best performance, most configurable
MLX | Inference engine | Apple Silicon | Apple-optimized, fastest on Mac
LiteLLM Gateway | API gateway | Any | 100+ provider proxy, cost tracking
Vector DBs | Vector database | Any | Benchmark: Qdrant, Weaviate, Pinecone
Quick pick: Ollama for quick local LLMs, llama.cpp for perf, MLX for Mac, LiteLLM for multi-provider gateway. → Inference Engine Comparison · Vector DB Benchmark

Coding Agents & Developer Tools

8 posts

AI coding assistants, editors, and developer tools compared — from Claude Code to Codex CLI to Cursor. Productivity benchmarks, state of play, and deep dives.

Tool / Topic | Type | Key Metric | Best For
Coding Agents State of Play | Landscape report | - | Overview of 2026 coding agent market
AI Code Editors | Comparison | 6 editors tested | Choosing the right AI editor
Coding Productivity | Benchmark | ~2x speed boost | Productivity measurement methodology
Claude Code iPhone | Case study | Full app in one session | Claude Code's capabilities
232 Claude Code Skills | Catalog | 232 MCP skills | Claude Code skill ecosystem
Harness vs Compute | Architecture | Workflow vs one-shot | When to use harness vs raw LLM
Codebase Graphs | Guide | Graph-based code understanding | Agents that understand large codebases
System Prompt Engineering | Technique | 51 prompt patterns | Compound prompt architecture
Quick pick: Claude Code for terminal-native coding, Cursor for IDE integration, Codex CLI for autonomous parallelism. → State of Play · Editor Comparison

Multi-Agent Orchestration

6 posts

Architectures and patterns for orchestrating multiple agents: DAG pipelines, swarm topologies, hierarchical supervisors, and dynamic agent teams.

Topic | Paradigm | Key Insight
Multi-Agent Production | Production survey | Pattern catalog for production multi-agent
VMAO Paper | Verified orchestration | Verification-based multi-agent guarantees
Sim Studio DAG | DAG executor | Visual DAG-based agent pipelines
Mission Control | Control plane | Centralized agent monitoring + orchestration
Architecture Catalog | Decision guide | Which architecture for which workload
RL Conductor | RL-based orchestration | RL-trained orchestrators for dynamic routing
Quick pick: DAG for deterministic workflows, swarm for elastic scaling, hierarchy for enterprise. → Architecture Guide · Production Multi-Agent
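A deterministic DAG workflow can be sketched with Python's standard `graphlib` module. The three agents below are hypothetical stubs standing in for LLM-backed steps; each receives the outputs of the nodes it depends on.

```python
from graphlib import TopologicalSorter
from typing import Callable


# Hypothetical agents: each takes a dict of upstream outputs.
def research(inputs: dict) -> str:
    return "notes"


def draft(inputs: dict) -> str:
    return f"draft from {inputs['research']}"


def review(inputs: dict) -> str:
    return f"review of {inputs['draft']}"


AGENTS: dict[str, Callable[[dict], str]] = {
    "research": research,
    "draft": draft,
    "review": review,
}
# Map each node to the set of nodes it depends on.
DEPENDENCIES = {"research": set(), "draft": {"research"}, "review": {"draft"}}


def run_dag(agents: dict, dependencies: dict) -> dict:
    """Execute agents in dependency order, passing upstream outputs along."""
    outputs: dict[str, str] = {}
    for node in TopologicalSorter(dependencies).static_order():
        upstream = {dep: outputs[dep] for dep in dependencies.get(node, ())}
        outputs[node] = agents[node](upstream)
    return outputs
```

`TopologicalSorter` raises `CycleError` on cyclic graphs, which gives an early failure for misconfigured pipelines. For parallel execution of independent nodes, its `prepare()` / `get_ready()` / `done()` methods can replace `static_order()`.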

LLMs & Models

6 posts

Open-source model comparisons, structured output benchmarks, prompt engineering, and context management for agent workloads.

Topic | Coverage | Key Insight
Open-Source LLM Comparison | DeepSeek R1, Llama 4, Qwen 3 | Qwen 3 235B leads GPQA (77.2%), DeepSeek leads MATH (97.3%)
Structured Outputs | Cross-provider JSON mode | Schema enforcement varies wildly across providers
Writing Tools for Agents | Agent writing capabilities | Model selection matters more than prompt engineering
Context Engineering | Prompt patterns catalog | Structured context beats raw prompt length
Codebase Graphs | Graph-based code understanding | Graph RAG outperforms flat context for code
Quick pick: Qwen 3 for Apache 2.0 licensing, DeepSeek R1 for math/reasoning, Llama 4 Scout for 10M context. → Full LLM Comparison

Build Logs

8 posts

Real builds, real mistakes. These are the raw build logs — what broke, what worked, and what the final architecture looked like. Read these first if you want to skip the same mistakes.

Build Log | What Was Built | Key Lesson
Agent Self-Reflection | Self-improving agent loop | Reflection doubles task success on complex workflows
Web Extraction | Agentic web scraping pipeline | Structured extraction beats raw scraping 3:1
Multi-Provider Router | LLM provider router with fallbacks | Circuit breaker pattern essential for production
Custom MCP Server | MCP server from scratch | FastMCP scaffolding saves 80% of boilerplate
FastMCP 3.0 Server | Production MCP server | Testing + observability = MCP production readiness
Multi-Agent Framework | Custom multi-agent framework | Start with existing frameworks, customize when you hit limits
MCP PDF Extractor | PDF-to-text MCP server | PDF parsing is harder than it looks; use marker-pdf
Context Optimization | Token-efficient context server | Compression beats truncation for long context
Quick pick: Self-reflection loop for the biggest win, FastMCP Build Log for MCP patterns, Web Extraction for pipeline design. → Self-Reflection Log · Web Extraction Log
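The general shape of a self-reflection loop is generate, critique, revise. The sketch below is a generic version with stubbed model calls, not the build log's actual code; `reflect_loop`, the two stubs, and the convention that a critic returns `None` when satisfied are all assumptions.

```python
from typing import Callable, Optional


def reflect_loop(
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Generate, critique, and revise until the critic has no feedback.

    Returns the final answer and the number of generation rounds used.
    """
    feedback: Optional[str] = None
    answer = ""
    for round_no in range(1, max_rounds + 1):
        answer = generate(task, feedback)
        feedback = critique(task, answer)
        if feedback is None:  # critic is satisfied
            return answer, round_no
    return answer, max_rounds  # budget exhausted; return the last attempt


# Stubs standing in for LLM calls.
def stub_generate(task: str, feedback: Optional[str]) -> str:
    return task.upper() if feedback else task


def stub_critique(task: str, answer: str) -> Optional[str]:
    return None if answer.isupper() else "use upper case"
```

The `max_rounds` cap matters in practice: without it, a critic that is never satisfied turns the loop into an unbounded cost.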