March 28, 2026 · Sovont · 3 min read

The Shadow ML Dependency

Your model works. Your pipeline is green. But somewhere, something is hardcoded to a version you never wrote down. That's the shadow dependency — and it will break you.

MLOps

Your model is deployed. The pipeline is green. Metrics look fine. You go to sleep.

Six weeks later, a dependency gets upgraded. Or a library ships a silent breaking change. Or the cloud provider quietly updates a managed service. And your model starts doing something subtly wrong: not crashing, just drifting. Not loud enough to trigger an alert. Loud enough to matter.

That’s the shadow ML dependency. And almost every production ML system has at least one.

What It Looks Like

Shadow dependencies don’t show up in your requirements.txt. They live in:

Tokenizer behavior tied to a specific version of a library that never gets pinned
Preprocessing logic that assumes pandas will handle NaNs a certain way — until it doesn’t
Feature encoding in a training notebook that diverges from what’s in your serving code
Model weights loaded from a path that quietly got overwritten by a scheduled retrain
Prompts that rely on LLM behavior that changed when the provider pushed an update

Each of these is a dependency. None of them look like a dependency until they break.
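
To make the feature-encoding case concrete, here is a minimal sketch. It assumes a hypothetical training notebook that assigns category IDs in sorted order and hypothetical serving code that assigns them in first-seen order. Both functions and the sample data are invented for illustration. Neither one raises an error, yet the same input maps to different IDs.

```python
def encode_training(values):
    """Hypothetical notebook logic: IDs follow sorted category order."""
    vocab = {v: i for i, v in enumerate(sorted(set(values)))}
    return [vocab[v] for v in values]


def encode_serving(values):
    """Hypothetical serving logic: IDs follow first-seen order."""
    vocab = {}
    for v in values:
        vocab.setdefault(v, len(vocab))
    return [vocab[v] for v in values]


rows = ["red", "green", "blue", "green"]
train_ids = encode_training(rows)
serve_ids = encode_serving(rows)

print(train_ids)  # [2, 1, 0, 1]
print(serve_ids)  # [0, 1, 2, 1]
print("encodings agree:", train_ids == serve_ids)  # False
```

A model trained on the first encoding and served with the second still returns predictions. They are just predictions for the wrong categories.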

Why It Happens

ML systems have two codebases: the one you track and the one you don’t.

The one you track: your source code, your model code, your pipelines.

The one you don’t: the environment assumptions baked into training notebooks, the ad hoc decisions that became defaults, the “I’ll clean this up later” configurations that ran in production for two years.

Conventional software has this problem too. But ML has it worse, because so much of what determines output isn’t code: it’s weights, tokenizers, preprocessing order, data schemas. Version control doesn’t capture most of it by default.

The Fix Is Boring

Pin everything. Serialize everything. Test cross-version behavior.

Pin your libraries to exact versions, down to the patch release.
Serialize preprocessing steps alongside your model artifact, not separately.
Use hashes to verify model weights at load time.
Log the full environment at training time, not just the major dependencies.
Run integration tests that load the artifact and exercise the full inference path against expected outputs.
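
Two of the steps above, hashing the weights and logging the full environment, can be sketched with the standard library alone. This is one possible shape, not a definitive implementation: the manifest layout, the file names, and the helper names `sha256_of`, `write_manifest`, and `verify_weights` are all assumptions made for this example.

```python
import hashlib
import json
import platform
import sys
import tempfile
from importlib import metadata
from pathlib import Path


def sha256_of(path):
    """Hash the file in chunks so large weight files are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(weights_path, manifest_path):
    """At training time: record the weights hash and every installed package."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    manifest = {
        "weights_sha256": sha256_of(weights_path),
        "python": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_weights(weights_path, manifest_path):
    """At load time: refuse weights whose hash differs from the recorded one."""
    expected = json.loads(Path(manifest_path).read_text())["weights_sha256"]
    actual = sha256_of(weights_path)
    if actual != expected:
        raise RuntimeError(f"weights hash mismatch: expected {expected}, got {actual}")


# Demo with throwaway files standing in for a real artifact.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.bin"
    manifest = Path(tmp) / "manifest.json"
    weights.write_bytes(b"original weights")
    write_manifest(weights, manifest)
    verify_weights(weights, manifest)  # matches, so nothing happens

    weights.write_bytes(b"overwritten by a scheduled retrain")
    try:
        verify_weights(weights, manifest)
    except RuntimeError as err:
        print("refused to load:", err)
```

Storing the manifest next to the weights means the quiet overwrite from earlier becomes a loud failure at load time, and the package list gives you something to diff when behavior changes.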

And for LLM pipelines specifically: treat the model version and the system prompt as one artifact. If either changes, the artifact version changes. No exceptions.
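
One way to enforce that rule is to derive the artifact version from both inputs, so neither can change without the version changing. The sketch below is an assumption about how that could look: `artifact_version`, the 12-character truncation, and the model identifiers are all hypothetical.

```python
import hashlib
import json


def artifact_version(model_version, system_prompt):
    """Derive one version ID from the model version and the system prompt."""
    payload = json.dumps(
        {"model": model_version, "system_prompt": system_prompt},
        sort_keys=True,
        ensure_ascii=False,
    ).encode("utf-8")
    # Truncated for readability; keep the full digest if collisions worry you.
    return hashlib.sha256(payload).hexdigest()[:12]


# Placeholder identifiers, not real provider model names.
base = artifact_version("example-model-2026-01", "You are a concise assistant.")
new_prompt = artifact_version("example-model-2026-01", "You are a concise assistant!")
new_model = artifact_version("example-model-2026-02", "You are a concise assistant.")

print(base, new_prompt, new_model)
print("prompt edit changes version:", base != new_prompt)
print("model bump changes version:", base != new_model)
```

Logging this ID with every request makes it possible to tell which model and prompt pair produced a given output.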

The Real Problem

Shadow dependencies expose a deeper issue: most ML systems don’t have a clear boundary between “what the system does” and “what the environment does.” That ambiguity is debt. The shadow dependency is just where it comes due.

Fix the ambiguity. The dependencies will follow.

The pipeline that breaks silently is worse than the one that breaks loudly. Know what your system actually depends on, all of it.