Sovont · 3 min read

The Model That Works on Your Machine

It runs fine locally. It breaks in staging. It fails silently in production. ML environment parity is not a nice-to-have — it's the job.

MLOps

It runs on your laptop. The metrics look great. You push to staging and something’s off — predictions are different, latency is higher, a dependency you forgot about is missing. You debug for two days, figure out the divergence, and push again. Then production does something else entirely.

This is the ML environment parity problem, and it kills more production deployments than bad models do.


Software engineering dealt with this years ago. Containers, lockfiles, and reproducible builds exist because “it works on my machine” is not a delivery milestone. ML teams are relearning the same lesson, more slowly and with higher stakes.

The problem in ML is worse for a few reasons.

Libraries are not pinned. You trained with scikit-learn==1.3.0, deployed with 1.4.2, and the model’s predict behavior changed in a way that’s not in any changelog you read. Same code, different numbers. Good luck finding it.
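One way to catch this kind of drift early is to save a version manifest next to the model at training time and compare it with the serving environment at startup. The sketch below assumes that approach. The helper names and the manifest contents are illustrative, not part of any particular tool.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_drift(expected, actual):
    """Return {package: (expected, actual)} for every version that differs."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Hypothetical manifest written next to the model at training time.
training_manifest = {"scikit-learn": "1.3.0", "numpy": "1.26.4"}

# Hypothetical serving environment. In a real service this would be
# snapshot_versions(training_manifest) evaluated inside the serving process.
serving_versions = {"scikit-learn": "1.4.2", "numpy": "1.26.4"}

drift = find_drift(training_manifest, serving_versions)
for name, (want, got) in drift.items():
    print(f"{name}: trained with {want}, serving with {got}")
```

Refusing to start when `drift` is non-empty turns a silent numerical difference into a loud deployment failure, which is usually the cheaper of the two.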

Hardware matters. A model trained on GPU and served on CPU can behave differently: not just slower, but numerically different. Floating-point precision, library backends, and hardware-specific optimizations all affect output. If you haven’t tested the exact serving hardware with the exact model, you don’t know what you’re deploying.
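A small illustration of why the numbers can move: floating-point addition is not associative, so a backend that sums in a different order can return a slightly different result for the same inputs. The sketch below shows that effect in plain Python, followed by a tolerance check for comparing reference predictions with predictions from the serving stack. The function names and the default tolerance are assumptions. Pick a tolerance that makes sense for your model.

```python
# The same three numbers, added in a different order.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False


def max_abs_diff(reference, candidate):
    """Largest absolute difference between two equal-length prediction lists."""
    if len(reference) != len(candidate):
        raise ValueError("prediction lists differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def predictions_match(reference, candidate, abs_tol=1e-6):
    """True when every serving prediction is within abs_tol of its reference."""
    return max_abs_diff(reference, candidate) <= abs_tol
```

Running `predictions_match` on a fixed set of inputs, with reference outputs saved at training time, gives you a concrete pass or fail for the serving hardware.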

Data preprocessing is not pinned. Your preprocessing code ran in a notebook during training. Serving runs the same function from a different entry point, on different data shapes, possibly with a different version of the utility function that handles nulls. Nobody checked.

Randomness is not seeded. Anything that uses randomness (dropout, sampling, certain tree algorithms) produces different results from run to run without an explicit seed. Your training run and your evaluation run are not the same run.


How to close the gap:

Containerize training and serving together. Same image, same library versions, same OS. If the training environment and the serving environment aren’t the same container, you have a gap. It will find you eventually.

Pin everything. Not just your main dependencies — all of them. Use a lockfile. Regenerate it on purpose, not by accident.
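As a rough guard, a CI step can reject requirement lines that are not pinned to an exact version. The sketch below is a deliberately simplified check, assuming a plain `requirements.txt`-style file. It does not handle URL or editable requirements, and it is no substitute for a real lockfile that also pins transitive dependencies. The function name is illustrative.

```python
import re

# Matches "name==version" with optional extras, e.g. "uvicorn[standard]==0.29.0".
# Wildcards such as "==1.*" are treated as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s*]+$")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned with an exact '=='."""
    loose = []
    for raw in text.splitlines():
        # Drop comments, trailing line continuations, and whitespace.
        line = raw.split("#", 1)[0].rstrip("\\ ").strip()
        if not line or line.startswith("-"):  # blank lines and pip options
            continue
        requirement = line.split(";", 1)[0].strip()  # ignore environment markers
        if not PINNED.match(requirement):
            loose.append(requirement)
    return loose


# Hypothetical requirements file contents.
sample = """\
scikit-learn==1.3.0
numpy>=1.24
pandas
# a comment
uvicorn[standard]==0.29.0
"""
print(unpinned_requirements(sample))
```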

Freeze preprocessing as a versioned artifact. The transformation logic that touches data at training time must be the same logic that touches data at serving time. Package it, version it, deploy it with the model.
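A minimal sketch of what “versioned artifact” can mean in practice, assuming the preprocessing parameters fit in a small spec object. The spec is fingerprinted at training time, the fingerprint is stored with the model, and serving refuses to load a spec that does not match. The class, field names, and metadata key are hypothetical. Note that the hash covers the parameters only, so the `version` field has to track changes to the transformation code itself.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreprocessingSpec:
    """Hypothetical description of the transformation applied to one feature."""
    version: str
    null_fill: float
    mean: float
    std: float

    def fingerprint(self):
        # Canonical JSON (sorted keys) so equal specs always hash the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def transform(self, values):
        # Fill nulls first, then standardize.
        filled = [self.null_fill if v is None else v for v in values]
        return [(v - self.mean) / self.std for v in filled]


def load_for_serving(spec, expected_fingerprint):
    """Return the spec only if it matches the fingerprint stored with the model."""
    if spec.fingerprint() != expected_fingerprint:
        raise RuntimeError("preprocessing spec does not match the one used in training")
    return spec


# Training side: build the spec and store its fingerprint with the model.
training_spec = PreprocessingSpec(version="1.0.0", null_fill=0.0, mean=10.0, std=2.0)
model_metadata = {"preprocessing_fingerprint": training_spec.fingerprint()}

# Serving side: same spec loads, and a drifted null handler would be rejected.
serving_spec = load_for_serving(training_spec, model_metadata["preprocessing_fingerprint"])
print(serving_spec.transform([12.0, None]))
```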

Run inference on serving hardware before you call it done. Not staging-ish hardware. Actual production hardware, actual serving stack, actual latency budget. Sign off on that run, not the notebook.
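The sign-off run can be as simple as timing real requests through the real predict path and comparing a tail percentile with the budget. The sketch below assumes a `predict` callable and a list of representative inputs, both of which are placeholders for your own serving entry point and data. It uses the 95th percentile as the threshold, which is a choice made here, not something a latency budget has to be defined by.

```python
import statistics
import time


def measure_latencies_ms(predict, inputs, warmup=5):
    """Time predict(x) for each input and return the latencies in milliseconds."""
    for x in inputs[:warmup]:
        predict(x)  # warm caches before timing
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def p95(latencies_ms):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(latencies_ms, n=20)[-1]


def within_budget(latencies_ms, budget_ms):
    """True when the 95th-percentile latency is at or under the budget."""
    return p95(latencies_ms) <= budget_ms


# Stand-in for a real model call; replace with the actual serving entry point.
def fake_predict(x):
    return x * 2


latencies = measure_latencies_ms(fake_predict, list(range(50)))
print(f"p95 latency: {p95(latencies):.4f} ms")
```

The numbers only mean something when this runs on the production hardware and serving stack. On a laptop it measures the laptop.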

Seed your randomness. Set it everywhere. Document it. This is a two-minute fix that eliminates an entire class of “why are the results different” investigations.
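A minimal sketch of “set it everywhere”, assuming a single helper called at the start of every training and evaluation entry point. The helper name is illustrative. NumPy and PyTorch are seeded only if they are installed, so the function also works in environments that have neither.

```python
import random


def seed_everything(seed):
    """Seed the random number generators this process is likely to use."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


seed_everything(42)
first = [random.random() for _ in range(3)]

seed_everything(42)
second = [random.random() for _ in range(3)]

print(first == second)  # True
```

A seed makes runs repeatable within one environment. It does not by itself guarantee identical results across different hardware or library versions, which is why the other steps still apply.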


The model that works on your machine is not your product. The model that works in production — reliably, repeatably, on hardware you don’t control — that’s your product. Build for that environment from day one, or spend weeks retrofitting it later.

It’s not glamorous. It’s the job.