Engineering Playbook
Opinionated guides on building production ML systems. These are not tutorials; they are lessons learned from shipping AI at scale.
Production LLM Architecture Patterns
How I design LLM pipelines that scale—preprocessing, chunking strategies, prompt engineering, and system architecture that survives production.
Structured Outputs > Prompting: How I Make LLMs Deterministic
Moving beyond prompt engineering to contract-first design with Pydantic schemas, validation layers, and structured outputs.
Embeddings in the Real World: Two-Tower Ranking and When Cosine Fails
From cosine similarity to production retrieval—calibration, two-tower models, and evaluation beyond offline metrics.
Evaluation and Feedback Loops for AI Products
From offline metrics to online learning—drift monitoring, regression tests, A/B testing, and continuous evaluation pipelines.
Latency/Cost Playbook for LLM Apps
What I optimize first when building LLM applications—caching strategies, model selection, prompt compression, and batching.
GraphRAG for Complex Document Understanding
When vanilla RAG fails—using knowledge graphs for legal documents, contracts, and complex hierarchical content.