Blog

Thinking out loud.

Notes on production AI, data engineering, and the messy reality of shipping systems that work.

The A100 Surplus Hiding in Plain Sight

RunPod charges $3.29/hr for an A100 SXM. On Vast.ai right now, you can rent one for $0.44. That 7.5× gap isn't a pricing glitch — it's a market signal worth acting on.

AI Infrastructure GPU Pricing MLOps

The Document Freshness Problem Nobody Talks About

Your RAG pipeline retrieves the right document. The problem is it was last updated eight months ago.

RAG & Knowledge Systems

The Column You Dropped That Wasn't Actually Dead

Dropping 'unused' columns without lineage visibility is how you break three downstream teams at once — and none of them will tell you until production is already wrong.

Data Engineering

The Timeout You Never Set

LLM API calls without explicit timeouts are a production incident waiting to happen. Here's what hangs, why, and how to stop it.

AI Production

The Rollout Nobody Communicated

The model is in production. The integration is live. Nobody told the users. This is how AI projects succeed technically and fail completely.

Strategy Culture

The Deployment You Can't Explain to Compliance

You shipped the model. Can you say which version it is, what data trained it, and why it makes the decisions it makes? If not, you have a governance problem — not a compliance problem.

MLOps

The Default Value That Lied to Your Model

Sentinel values and bad defaults look like real data. They pass every schema check, corrupt your features, and make your model confidently wrong in production.

Data Engineering

The Index You Forgot to Rebuild

Your RAG pipeline retrieved the right answer six months ago. The source doc changed. Nobody re-indexed it.

RAG & Knowledge Systems

The Cost Spike You Didn't See Coming

Nobody models LLM costs seriously until they get the bill. By then, the architecture is already wrong.

AI Production

The Feedback Loop You Forgot to Close

You shipped the AI feature. Users are using it. Something's wrong. You don't know what — because you never built a way to find out.

Strategy Culture

The Model That Passed Eval and Failed in Production

Offline metrics look great. Production behavior is a disaster. This gap isn't bad luck — it's a design failure you can prevent.

MLOps

The Join Key That Changed Halfway Through

Source systems quietly change their primary keys and your pipelines keep running — producing wrong answers instead of errors. That's the worst kind of failure.

Data Engineering

The System Prompt That Grew Without Anyone Noticing

System prompt bloat is one of the slowest ways to degrade your LLM system — and one of the easiest to miss until performance tanks and costs spike.

AI Production

The Metadata You Forgot to Index

You built a RAG system that retrieves semantically. You forgot to build the one that retrieves precisely. Metadata filtering isn't an optimization — it's the difference between a search engine and a lucky guess.

RAG & Knowledge Systems

The AI Audit Nobody Scheduled

Your AI system went live six months ago. Has anyone actually checked if it still works the way you think it does?

Strategy Culture

The A/B Test You Never Finished

Half your production models are running inside experiments that nobody has looked at in months. That's not science — that's clutter with a p-value.

MLOps

Late Data Is Not an Edge Case

Treating late-arriving data as an exception is how you get metrics that silently restate themselves for days after the fact. Design for lateness upfront or debug it forever.

Data Engineering

Tool Calls Are Side Effects. Treat Them That Way.

Agents that call tools are running code with real consequences. Most teams build them like they're not.

AI Production

The Demo That Became the Product

Someone built a slick AI demo. Leadership loved it. Now it's in production. This is how systems fail slowly and visibly.

Strategy Culture

The Query Rewriter You're Not Using

Most RAG systems retrieve against the user's raw query. That's the problem. Query rewriting is the highest-leverage improvement most teams skip entirely.

RAG & Knowledge Systems

Canary Deployments for ML Models

Software engineers ship canaries without thinking twice. ML teams ship full replacements and call it 'confidence.' Here's why that's backwards — and how to fix it.

MLOps

The Retry Loop That Ate Your API Quota

Naive retry logic is one of the most common — and most expensive — bugs in LLM production systems. Here's what it looks like and how to fix it.

AI Production

Who Owns the AI System After Go-Live?

The team that built it is already on the next project. The ops team doesn't understand it. And nobody wants to be the one paged at 2 AM when it breaks.

Strategy Culture

The Timestamp That Broke Your Join

Timezone-naive timestamps are a silent data quality bomb. They pass every schema check, join on nothing, and make your dashboards confidently wrong.

Data Engineering

The Model That Works on Your Machine

It runs fine locally. It breaks in staging. It fails silently in production. ML environment parity is not a nice-to-have — it's the job.

MLOps

The Embedding Model You Chose in Week One

You picked an embedding model early, it worked well enough, and you never looked at it again. That's the problem.

RAG & Knowledge Systems

The Context Window Is Not a Clipboard

Bigger context windows didn't solve the problem of what goes in them. Most production LLM failures aren't model failures — they're context failures.

AI Production

Agents Don't Fix Bad Processes

Everyone is building AI agents. Nobody is asking whether the process being automated was worth keeping in the first place.

Strategy Culture

The Partitioning Decision You'll Regret

Bad partitioning doesn't break your pipeline. It just makes everything slightly wrong, forever.

Data Engineering

The Staging Environment That Lies to You

Your ML staging environment feels like safety. It isn't. Here's what it's hiding.

MLOps

Structured Output Is Not a Nice-to-Have

If your LLM integration parses free-text responses in production, you don't have a product. You have a fragile prototype waiting to fail.

AI Production

The AI Vendor That Sold You a Roadmap

A roadmap is not a product. Learn to tell the difference before you sign the contract.

Strategy Culture

Reranking Is Not Optional

Your retrieval pipeline returns 20 chunks. Your LLM sees 5. What happens in that gap is either thoughtful or a coin flip.

RAG & Knowledge Systems

The Experiment That Never Got Turned Off

That A/B test from eight months ago is still running. So is the one before it. Your production model is now a graveyard of half-decisions.

MLOps

The Backfill You Never Scheduled

Backfills aren't a nice-to-have. They're how you find out if your pipeline actually works.

Data Engineering

The Confidence Problem in LLM Outputs

LLMs don't know when they're wrong. Your production system has to.

AI Production

The Stakeholder Who Keeps Moving the Goalposts

Scope creep in AI projects rarely looks like bad faith. It looks like enthusiasm. Here's how to handle it without torching the relationship.

Strategy Culture

Dead Letter Queues: The Unglamorous Hero of Reliable Pipelines

Most data pipelines fail silently. A dead letter queue is the thing that catches what falls through — and tells you why.

Data Engineering

The Shadow ML Dependency

Your model works. Your pipeline is green. But somewhere, something is hardcoded to a version you never wrote down. That's the shadow dependency — and it will break you.

MLOps

Your LLM Has a Latency Budget. Do You Know What It Is?

Most teams ship AI features without defining acceptable latency. Then they spend months optimizing the wrong thing.

AI Production

The AI Project That Never Gets Scoped

Vague AI initiatives don't die — they consume budget indefinitely. Here's how to kill the cycle before it starts.

Strategy Culture

When Vector Search Isn't Enough

Semantic search solves one problem. Hybrid retrieval solves the problem you actually have.

RAG & Knowledge Systems

The Pipeline That Runs Once and Trusts Nothing

Idempotency is table stakes. The next level is building pipelines that assume everything upstream is lying to you.

Data Engineering

LLM Versioning: The Problem Nobody Solves Until It's Too Late

Your model changed under your app. Your prompt changed under your users. And nobody noticed until something broke. Fix this before it happens to you.

MLOps AI Production

The Hidden Cost of AI Platform Sprawl

You've got five AI tools, two vector databases, and three prompt management systems. What you don't have is a production AI system.

Strategy Culture

Idempotency Is the Property Your Pipelines Are Missing

Most data pipelines break silently when run twice. Idempotency isn't a nice-to-have — it's the property that separates pipelines you can trust from ones you're afraid to touch.

Data Engineering

Your RAG Pipeline Needs Monitoring, Not Just Better Retrieval

Tuning chunk size and tweaking similarity thresholds won't save you when your pipeline silently degrades in production.

AI Production RAG & Knowledge Systems

Build vs. Buy AI: Stop Kidding Yourself

Every team thinks their use case is special enough to justify building from scratch. Most are wrong — and the decision is costing them months.

Strategy Culture

Knowledge Base Maintenance Is a Product, Not a Project

You spent three months building the RAG knowledge base. Then you shipped it and moved on. That's why it's already wrong.

RAG & Knowledge Systems

The Observability Stack for ML in Production

You monitor your servers. You don't monitor your models. Here's what that's costing you.

MLOps

Schema Evolution Without Breaking Everything Downstream

Schemas change. That's fine. What's not fine is discovering you've silently broken three pipelines and a model when they do.

Data Engineering

Feature Stores: Overhyped or Underused?

Everyone has an opinion on feature stores. Most of them are wrong. Here's when you actually need one.

AI Production MLOps

Retrain vs Fine-Tune: Stop Guessing, Start Deciding

Two different tools for two different problems. Picking the wrong one wastes months.

MLOps

Streaming vs Batch: When Each Actually Makes Sense

The streaming vs batch debate isn't about which is better. It's about which problem you're actually solving — and most teams get it wrong by defaulting to one without thinking.

Data Engineering

RAG Evaluation Frameworks: Beyond 'Does It Look Right?'

Vibes-based RAG evaluation is how you ship broken retrieval to production. Here's what a real eval framework looks like.

RAG & Knowledge Systems AI Production

The Cost of No Rollback Plan

Every deployment without a rollback plan is a bet that nothing will go wrong. In production ML systems, that bet loses more often than you think.

MLOps

Hiring for AI Production Is Not the Same as Hiring for AI Research

Your job posting says 'machine learning engineer' but you need someone who ships and operates, not someone who experiments and publishes. The distinction matters more than you think.

Strategy Culture

Data Contracts Are How You Stop Breaking Each Other

Without data contracts, every pipeline change is a potential incident. Here's why informal data agreements between teams are a liability — and what to do instead.

Data Engineering

Monitor Model Drift Before Your Users Do

Your model isn't broken — it's just quietly wrong. Here's how to catch drift before it becomes a support ticket.

MLOps

ML Technical Debt Compounds Faster Than You Think

Regular software debt is a slow leak. ML debt is a pressure cooker — and most teams don't realize it until something explodes.

Strategy Culture

Treat Your Prompts Like Code. Because They Are.

Prompt management in production isn't a nice-to-have. If you're not versioning, testing, and deploying prompts with the same discipline as code, you're flying blind.

AI Production

The AI Team Antipattern

Centralizing your AI talent into a dedicated team feels organized and intentional. It's also one of the fastest ways to kill momentum.

Strategy Culture

Chunking Strategies That Actually Affect Retrieval Quality

Most RAG pipelines fail on the same defaults: chunk size 512, split by character, never revisited. Here's what actually moves the needle on retrieval quality, and why your defaults are probably wrong.

RAG & Knowledge Systems

CI/CD for ML Is Not the Same as CI/CD for Software

Your software pipeline won't save your ML system. Here's what actually needs to be different — and why copying your DevOps playbook is a trap.

MLOps

Introducing Agora: DNS for AI Agents

AI agents are proliferating, but they can't find each other. Agora is an open-source registry and discovery service that fixes that — built to complement A2A and MCP.

Agent Infrastructure Open Source

The Real Cost of 'We'll Clean It Later'

Technical debt in data systems doesn't sit quietly. It compounds. Every downstream model, dashboard, and decision built on dirty data pays the price.

Data Engineering

Why Most AI POCs Die Before Production

The demo worked. Stakeholders loved it. And then nothing happened. Here's why — and how to stop the cycle.

Strategy AI Production

Evals Are Your Test Suite Now

Unit tests don't cover AI behavior. If you're shipping models without eval suites, you're shipping blind.

MLOps AI Production

The Model Registry Is Not Optional

Why every production ML team needs model versioning, eval tracking, and promotion workflows.

MLOps

What a Sovont Engagement Actually Looks Like

No 90-day discovery phase. No 200-page strategy doc. Here's how we actually work.

Process Sovont

Your AI Readiness Is Showing

If you're hiring 4 senior data engineers, you're not doing AI yet — you're building the foundation you skipped.

Data Engineering AI Strategy