// BLOG
Latest notes.
Notes from inside the build. No thought-leadership, no recycled takes, just what I figured out the hard way so the next person doesn't have to.
Small Language Models for Vertical Agents
Small models fine-tuned for a narrow task outperform generalist frontier models on that task, cost a fraction to run, and unlock deployment options the big ones cannot touch.
AI Evals in Production: The Work Nobody Sees Until It Breaks
A model that scored well on benchmarks will quietly rot in production. Evals are the unglamorous work that turns an AI demo into a system you can operate.
LLM Cost Control: Strategies That Actually Move the Bill
Most LLM cost advice stops at "use a smaller model." Real bills get reduced by caching, routing, context discipline, and knowing which 20% of calls drive 80% of spend.
Agentic RAG: When Retrieval Becomes a Reasoning Step
Traditional RAG retrieves once and hopes. Agentic RAG treats retrieval as a loop the model controls, which changes everything about how you build the pipeline.
MCP Servers in Production: What the Hype Misses
Model Context Protocol finally gives agents a clean contract for tool use, but the production story depends on auth, versioning, and rate limits most teams skip.