Your LLM Has a Latency Budget. Do You Know What It Is?
Most teams ship AI features without defining acceptable latency. Then they spend months optimizing the wrong thing.
Notes on production AI, data engineering, and the messy reality of shipping systems that work.
Vague AI initiatives don't die — they consume budget indefinitely. Here's how to kill the cycle before it starts.
Semantic search solves one problem. Hybrid retrieval solves the problem you actually have.
Idempotency is table stakes. The next level is building pipelines that assume everything upstream is lying to you.
Your model changed under your app. Your prompt changed under your users. And nobody noticed until something broke. Fix this before it happens to you.
You've got five AI tools, two vector databases, and three prompt management systems. What you don't have is a production AI system.
Most data pipelines break silently when run twice. Idempotency isn't a nice-to-have — it's the property that separates pipelines you can trust from ones you're afraid to touch.
Tuning chunk size and tweaking similarity thresholds won't save you when your pipeline silently degrades in production.
Every team thinks their use case is special enough to justify building from scratch. Most are wrong — and the decision is costing them months.
You spent three months building the RAG knowledge base. Then you shipped it and moved on. That's why it's already wrong.
You monitor your servers. You don't monitor your models. Here's what that's costing you.
Schemas change. That's fine. What's not fine is discovering you've silently broken three pipelines and a model when they do.
Everyone has an opinion on feature stores. Most of them are wrong. Here's when you actually need one.
Two different tools for two different problems. Picking the wrong one wastes months.
The streaming vs. batch debate isn't about which is better. It's about which problem you're actually solving — and most teams get it wrong by defaulting to one without asking.
Vibes-based RAG evaluation is how you ship broken retrieval to production. Here's what a real eval framework looks like.
Every deployment without a rollback plan is a bet that nothing will go wrong. In production ML systems, that bet loses more often than you think.
Your job posting says "machine learning engineer," but you need someone who ships and operates, not someone who experiments and publishes. The distinction matters more than you think.
Without data contracts, every pipeline change is a potential incident. Here's why informal data agreements between teams are a liability — and what to do instead.
Your model isn't broken — it's just quietly wrong. Here's how to catch drift before it becomes a support ticket.
Regular software debt is a slow leak. ML debt is a pressure cooker — and most teams don't realize it until something explodes.
Prompt management in production isn't a nice-to-have. If you're not versioning, testing, and deploying prompts with the same discipline as code, you're flying blind.
Centralizing your AI talent into a dedicated team feels organized and intentional. It's also one of the fastest ways to kill momentum.
Most RAG pipelines fail at chunk size 512, split by character, never revisited. Here's what actually moves the needle on retrieval quality — and why your defaults are probably wrong.
Your software pipeline won't save your ML system. Here's what actually needs to be different — and why copying your DevOps playbook is a trap.
AI agents are proliferating, but they can't find each other. Agora is an open-source registry and discovery service that fixes that — built to complement A2A and MCP.
Technical debt in data systems doesn't sit quietly. It compounds. Every downstream model, dashboard, and decision built on dirty data pays the price.
The demo worked. Stakeholders loved it. And then nothing happened. Here's why — and how to stop the cycle.
Unit tests don't cover AI behavior. If you're shipping models without eval suites, you're shipping blind.
Why every production ML team needs model versioning, eval tracking, and promotion workflows.
No 90-day discovery phase. No 200-page strategy doc. Here's how we actually work.
If you're hiring four senior data engineers, you're not doing AI yet — you're building the foundation you skipped.