Sovont · 3 min read

The A100 Surplus Hiding in Plain Sight

RunPod charges $3.29/hr for an A100 SXM. On Vast.ai right now, you can rent one for $0.44/hr. That 7.5× gap isn't a pricing glitch; it's a market signal worth acting on.

AI Infrastructure GPU Pricing MLOps

The A100 SXM is not a budget GPU. It launched in 2020, became the workhorse for production LLM inference at mid-scale, and still powers a significant fraction of deployed AI today. RunPod charges $3.29/hour for one.

On Vast.ai right now, you can rent the same GPU for $0.44/hour.

That’s a 7.5× gap. Not a rounding error — a market signal.


Why the gap exists

Vast.ai is a marketplace, not a managed cloud. When datacenter operators have idle capacity they don’t want to leave cold, they list it on Vast.ai to recover costs. The price you see reflects what the cheapest willing seller currently needs to cover electricity and depreciation.

The $0.44 A100 exists because operators are upgrading to H100s and H200s. They have A100 hardware that isn’t fully loaded, and they’d rather earn $0.44/hr than $0.00/hr while they figure out what to do with it.

The gap between $0.44 and $3.29 is the premium RunPod charges for fixed availability, predictable performance, and not having to scramble for a replacement when your spot instance disappears. For some workloads, that premium is worth every dollar. For others, you’re just paying for peace of mind you don’t need.


What this means in practice

If you’re running batch workloads (evaluations, embedding generation, fine-tuning jobs, dataset processing), the math changes significantly at $0.44/hr:

A 100-hour batch job costs $44 on Vast.ai versus $329 on RunPod. That’s not a marginal optimization — that’s a $285 difference on a single run. Run that weekly and you’ve saved $14,820 over the course of a year for one GPU.
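The arithmetic above can be sketched in a few lines of Python. The function names are illustrative, the two rates are the figures quoted in this post, and the calculation ignores storage, egress, and any time lost to interruptions.

```python
# Hourly rates quoted in this post (crawl of 2026-05-29). Verify before relying on them.
RUNPOD_A100_USD_PER_HR = 3.29
VASTAI_A100_USD_PER_HR = 0.44


def job_cost(hours: float, usd_per_hour: float) -> float:
    """Compute cost of one job, rounded to cents. Ignores storage and egress."""
    return round(hours * usd_per_hour, 2)


def savings_per_run(hours: float) -> float:
    """Difference between the two quoted rates for a job of the given length."""
    return round(
        job_cost(hours, RUNPOD_A100_USD_PER_HR)
        - job_cost(hours, VASTAI_A100_USD_PER_HR),
        2,
    )


def annual_savings(hours_per_run: float, runs_per_year: int) -> float:
    """Yearly savings for one GPU, assuming every run completes without a rerun."""
    return round(savings_per_run(hours_per_run) * runs_per_year, 2)


def price_ratio() -> float:
    """How many times more expensive the managed rate is than the marketplace rate."""
    return RUNPOD_A100_USD_PER_HR / VASTAI_A100_USD_PER_HR


if __name__ == "__main__":
    print(job_cost(100, VASTAI_A100_USD_PER_HR))   # 100-hour job at the marketplace rate
    print(job_cost(100, RUNPOD_A100_USD_PER_HR))   # same job at the managed rate
    print(annual_savings(100, 52))                 # weekly runs for a year
```

The price ratio also gives a rough break-even: at these rates, a marketplace job would have to take about 7.5 times as long (through interruptions and reruns) before it cost as much as the managed one.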

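The 80GB A100 SXM uses HBM2e memory. A 70B-parameter model in fp16 needs roughly 140GB for weights alone, so it does not fit on one card: plan on two or more A100s with tensor parallelism, or quantize to 8-bit or 4-bit to fit on a single GPU. The A100 handles multi-GPU tensor parallelism well and is supported by the major inference frameworks. For most non-real-time workloads, it’s not a compromise; it’s a deliberate choice.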
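A rough way to size a model against the 80GB card is to count weight bytes. The sketch below is a back-of-envelope estimate under stated assumptions: it counts only the weights (no KV cache, activations, or framework overhead), treats 1 GB as 10^9 bytes, and uses hypothetical function names.

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

A100_VRAM_GB = 80  # the 80GB A100 SXM discussed in this post


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Memory for weights only, in GB (1 GB = 1e9 bytes).

    1e9 parameters times N bytes each is N GB, so the billions cancel out.
    """
    return params_billions * BYTES_PER_PARAM[dtype]


def min_gpus_for_weights(
    params_billions: float, dtype: str, vram_gb: float = A100_VRAM_GB
) -> int:
    """Lower bound on GPU count: enough total VRAM to hold the weights.

    Real deployments need headroom for the KV cache and activations,
    so treat this as a floor, not a recommendation.
    """
    return math.ceil(weight_memory_gb(params_billions, dtype) / vram_gb)


if __name__ == "__main__":
    for dtype in ("fp16", "int8", "int4"):
        print(dtype, weight_memory_gb(70, dtype), min_gpus_for_weights(70, dtype))
```

Under these assumptions, 70B parameters in fp16 come to about 140GB of weights, so the floor is two 80GB cards. An 8-bit copy is about 70GB, which technically fits on one card but leaves little room for anything else.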


The catch

Vast.ai is a marketplace. The $0.44 listing is the cheapest available at crawl time (data captured May 29, 2026). Availability varies. You may bid on a machine and find it gone. Your instance can be interrupted if the host reclaims it. There’s no SLA.

This is not the right choice for:

  • Latency-sensitive serving (customer-facing inference)
  • Workloads that can’t tolerate interruption mid-run
  • Teams without the ops capacity to handle spot failures

It is a reasonable choice for:

  • Offline evaluation pipelines
  • Fine-tuning runs with checkpointing
  • Research and experimentation
  • Any job where you can retry on failure without consequences
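For the jobs in this second list, the ability to retry without consequences usually comes from checkpointing. Below is a minimal sketch, assuming the work is a list of independent items and the results are small enough to store as JSON. The helpers run_batch, load_checkpoint, and save_checkpoint are hypothetical and not part of any Vast.ai or RunPod tooling. A real fine-tuning run would save model and optimizer state through its training framework instead.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_item": 0, "results": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename over the target.

    os.replace is atomic on the same filesystem, so an instance killed
    mid-write leaves the previous checkpoint intact.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_name, path)


def run_batch(items: list, process, path: Path, every: int = 10) -> list:
    """Process items in order, resuming from the last checkpoint if one exists.

    Work done since the last checkpoint is repeated after an interruption,
    so `every` trades checkpoint overhead against wasted GPU time.
    """
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        state["results"].append(process(items[i]))
        state["next_item"] = i + 1
        if state["next_item"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state, in case len(items) % every != 0
    return state["results"]
```

The checkpoint file needs to live on storage that survives the instance, such as a mounted volume or a synced bucket. Otherwise a reclaimed host takes the checkpoint with it.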

The bigger picture

The A100 spot data on Vast.ai is a leading indicator of what’s happening across the GPU market. When a capable, widely-deployed GPU starts showing up at 13% of its managed-cloud price, it suggests supply is running ahead of demand for that tier.

That surplus is real. The H100 ramp is real. And the gap will likely compress over time as operators find buyers or retire hardware, or as new demand absorbs the inventory.

Right now, the arbitrage window is open.

If you’re building AI infrastructure and you’re not looking at Vast.ai alongside RunPod and CoreWeave when you scope your next compute budget, you’re leaving money on the table — sometimes a lot of it.


GPU pricing data from CROWler + Selenium crawl of live provider pages, captured 2026-05-29 at ~03:40 UTC. Vast.ai prices are marketplace listings at crawl time — verify before committing. RunPod · Vast.ai