Hiring for AI Production Is Not the Same as Hiring for AI Research
Your job posting says “machine learning engineer,” but you need someone who ships and operates, not someone who experiments and publishes. The distinction matters more than you think.
Most companies write the same job posting for two completely different roles. They call it “Machine Learning Engineer,” list every framework known to science, and hope the right candidate self-selects.
What they actually need and what they end up hiring are often different people with different incentives, different definitions of “done,” and different instincts about what good looks like.
The research mindset in production
Research ML is optimized for discovery. The output is a notebook, a paper, or a demo that proves something is possible. Success is measured in benchmark improvements and novel findings. Deployment is someone else’s problem — or at least a later problem.
That mindset isn’t wrong. It’s just wrong for production.
A researcher hired into a production role will often over-index on model performance at the expense of latency, cost, and operability. They’ll want one more experiment before calling it done. They’ll hand off a model with no monitoring, no fallback, and no documentation because those weren’t part of the original objective.
This isn’t a character flaw. It’s a mismatch.
What production actually demands
Production AI is closer to infrastructure engineering than to research. The job is building systems that are reliable, observable, and cheap to operate — systems that degrade gracefully when inputs go sideways, that can be updated without breaking everything downstream, and that have clear ownership when something goes wrong at 2 AM.
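What “degrade gracefully” means in practice can be made concrete with a small sketch. This is a hypothetical wrapper (all names invented here, not a prescribed implementation): serve the primary model when it behaves, and fall back to a cheap baseline when it errors out, so a model failure never becomes a user-facing outage. A real system would enforce latency with an actual deadline mechanism rather than the after-the-fact check shown.

```python
import time


def predict_with_fallback(model_predict, features, baseline, budget_s=0.2):
    """Call the primary model; degrade to a cheap baseline on error or slowness.

    Returns (prediction, source) so callers can monitor how often the
    fallback path is taken -- a silent fallback is its own failure mode.
    """
    start = time.monotonic()
    try:
        result = model_predict(features)
    except Exception:
        # Any model failure degrades to the baseline rather than a 500.
        return baseline(features), "fallback:error"
    if time.monotonic() - start > budget_s:
        # Answer arrived too late to be useful downstream; note this is a
        # post-hoc check -- it does not cancel the slow call itself.
        return baseline(features), "fallback:slow"
    return result, "primary"
```

Tagging every response with its source is the operability piece: it gives you a metric (“fallback rate”) to alert on, instead of discovering the primary model has been down for a week.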
The people who are good at this tend to care about:
- Latency and throughput, not just accuracy
- Failure modes, not just happy paths
- Deployment velocity — how fast can you push a fix, roll back, or swap a model
- Feedback loops — is the system telling you when it’s wrong, or do you find out from a user complaint three weeks later
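The feedback-loop point is the easiest to turn into code. A minimal sketch (hypothetical class and thresholds, assuming you can eventually join predictions to ground-truth outcomes): track a rolling error rate and raise a flag when it drifts past a threshold, so the system tells you it is wrong before a user does.

```python
from collections import deque


class ErrorRateMonitor:
    """Rolling-window error-rate check so the system reports its own decay."""

    def __init__(self, window=100, threshold=0.2):
        self.outcomes = deque(maxlen=window)  # True = prediction was wrong
        self.threshold = threshold

    def record(self, prediction, actual):
        self.outcomes.append(prediction != actual)

    def alert(self):
        # Only alert once the window is full enough to be meaningful.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return sum(self.outcomes) / len(self.outcomes) > self.threshold
```

The interesting engineering is rarely in the arithmetic; it is in wiring `record` to delayed ground truth and deciding who gets paged when `alert` fires.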
They also tend to be suspicious of complexity. A simpler model that ships and runs reliably beats a better model that never makes it out of the notebook.
Where the interview process goes wrong
The standard ML interview — LeetCode plus a take-home modeling problem — screens for research-adjacent skills. You end up evaluating whether someone can write clean Python and tune a gradient boosted tree, not whether they’ve ever thought seriously about how to monitor a model in production or what they’d do when a feature distribution shifts unexpectedly.
If you’re hiring for a production role, the interview should surface production thinking:
- “Walk me through the last model you deployed. What did you monitor? What broke?”
- “How did you handle a significant distribution shift in production?”
- “What does rollback look like for a model?”
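A candidate with production scar tissue will have a concrete answer to the distribution-shift question. One common approach is the Population Stability Index: compare a feature's live distribution against a training-time reference. The sketch below is a from-scratch illustration (hypothetical helper, not a library API); the conventional rule of thumb is that PSI below 0.1 is stable and above 0.25 warrants investigation.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample (e.g. training data) and live traffic.

    Both inputs are flat lists of numeric feature values; output is a
    non-negative score that grows as the two distributions diverge.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Smooth empty buckets so the log term stays finite.
        return [(c + 1e-6) / (len(xs) + bins * 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

You are not testing whether they can reproduce this formula; you are listening for whether they have ever wired something like it to an alert, and what they did when it fired.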
If the candidate has never thought about these questions, you’re not hiring for the same job they’re interviewing for.
It’s not a seniority problem
Junior researchers can have strong production instincts. Senior researchers can be actively allergic to operational concerns. The signal isn’t years of experience — it’s what someone has actually built, shipped, and maintained in anger.
Look at their commit history, not just their publication list. Ask what they’re proud of that’s running in production right now. Listen for whether “done” means “it works” or “it works, it’s monitored, and I can change it next week.”
You can train someone with production instincts to do more research. Training a researcher to care about uptime is a much longer road.