GPU Instance Storage: The Hidden Cost You Keep Forgetting

It’s 2026, and we’ve all learned to scan the $/hr price for A100s or 4090s before hitting ‘deploy’. But how many of us actually scroll down to the storage line item? We’ve seen bills where the NVMe costs rivaled, or even exceeded, the GPU runtime itself for specific workloads. Not because we were bad at resource management, but because the underlying assumptions about ‘included’ or ‘cheap’ storage were just plain wrong.

We recently wrapped up a few weeks of benchmarking various GPU providers for general LLM inference and training (you can read our full A100 comparison here). During that process, we started noticing some jarring discrepancies in how our data loading times and checkpointing impacted the overall job duration and, crucially, the final bill. The GPU might be blazing fast, but if it’s waiting on a slow disk or egressing terabytes from expensive volumes, your ‘cheap’ GPU isn’t cheap at all.

What We’re Actually Comparing: NVMe on Instances

This isn’t about S3 or generic block storage. This is about the local NVMe storage that comes with your GPU instance, or the persistent NVMe volumes you can attach to it. For many ML workloads, especially those involving large datasets, frequent checkpointing, or model loading, local NVMe is critical. Network-attached block storage can introduce latency that chokes your GPU, turning a $1/hr A100 into an expensive paperweight for minutes at a time.

We focused on four providers popular with developers and small teams: Runpod, Vultr, Lambda Labs, and Vast.ai. For consistency, we mostly looked at instances equipped with A100 40GB GPUs, as they represent a common mid-to-high-tier choice for serious ML work. Our goal was to understand not just the stated price per GB, but the actual cost implications of using this storage for typical training and inference scenarios, including cold starts and data transfer.

Price and Raw Specs: The Fine Print of Fast Storage

On paper, some providers seem to offer generous local storage. The devil, as always, is in the details – specifically, whether that storage is ephemeral or persistent, and how much any additional persistent storage will set you back. We also paid close attention to included I/O operations per second (IOPS) and throughput, as a cheap but slow NVMe isn’t much better than a spinning disk for demanding workloads.

Here’s a snapshot of how these providers generally handle NVMe storage for a typical A100 40GB instance as of May 2026:

Provider	Instance (GPU)	Base $/hr	Included NVMe	Persistent NVMe $/GB/month	Typical NVMe IOPS (read/write)	Egress $/GB (after free tier)
Runpod	A100 (40GB)	~$1.19	120 GB local (ephemeral)	$0.05 (for additional)	Up to 1M+	$0.005
Vultr	A100 (40GB)	~$1.35	1 TB NVMe (local, persistent)	N/A (fixed for instance)	Up to 200k / 150k	$0.01
Lambda	A100 (40GB)	~$1.20	1 TB NVMe (local, persistent)	N/A (fixed for instance)	Up to 1.5M+	$0.00
Vast.ai	A100 (40GB)	~$0.70-1.50	Varies (often 120-1000GB)	$0.05-$0.10 (for add-on)	Varies (often 100k-500k)	$0.00-0.01

Note: Vast.ai pricing and specs vary significantly by host, so these are approximate ranges for common A100 offerings.

Runpod’s base instances come with relatively small ephemeral local storage. For persistent data, you provision additional NVMe volumes. This gives flexibility but means you’re almost always paying extra if you need your data to stick around. Vultr and Lambda, conversely, bundle a significant amount of persistent NVMe directly with their A100 instances. This simplifies things but means you’re paying for it whether you use it all or not. Vast.ai is a true marketplace, so storage specs are a grab bag—you have to check each listing carefully.
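To put the add-on pricing model in perspective, here is a rough sketch that converts a per-GB monthly rate into dollars and then into equivalent GPU-hours. The rates are the approximate Runpod figures from the table above. The function names and the 1,000 GB volume size are our own illustrative assumptions, and real invoices may prorate or round differently.

```python
def monthly_volume_cost(size_gb: float, rate_per_gb_month: float) -> float:
    """Cost of keeping a persistent volume provisioned for one month."""
    return size_gb * rate_per_gb_month


def equivalent_gpu_hours(cost: float, gpu_rate_per_hour: float) -> float:
    """How many GPU-hours the same money would buy."""
    return cost / gpu_rate_per_hour


# Approximate figures from the table above (Runpod, A100 40GB).
VOLUME_RATE = 0.05   # $/GB/month for additional persistent NVMe
GPU_RATE = 1.19      # $/hr base price

VOLUME_GB = 1000     # assumption: a 1,000 GB persistent volume

cost = monthly_volume_cost(VOLUME_GB, VOLUME_RATE)
hours = equivalent_gpu_hours(cost, GPU_RATE)
print(f"${cost:.2f}/month, about {hours:.0f} GPU-hours")
# prints "$50.00/month, about 42 GPU-hours"
```

Under these assumptions, a volume that stays provisioned all month costs about as much as 42 hours of GPU time, so for a lightly used instance the storage line can approach the compute line.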

Performance Under Real Workloads: Where NVMe Matters Most

We ran a series of tests to simulate common ML workflows:

Dataset Loading: Reading a 500GB training dataset from local NVMe and streaming it to the GPU.
Checkpointing: Saving a 100GB model checkpoint every hour during a training run.
Model Loading (Cold Start): Loading a 70B parameter LLM (140GB) on instance start.

For dataset loading, providers with generous, fast local NVMe (Vultr, Lambda) consistently delivered the best performance. Our 500GB dataset loaded in roughly 35-45 seconds on Vultr and Lambda, leveraging the dedicated NVMe bandwidth. On Runpod, using a 500GB attached NVMe volume, the load time was closer to 55-70 seconds, or roughly 1.5x as long, which we attribute to the overhead of network-attached storage compared to truly local disks. Vast.ai was a mixed bag; some hosts had phenomenal local NVMe, others were clearly bottlenecked.
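The load times quoted above translate into average throughput with simple division. The sketch below does only that arithmetic on the numbers already given; the function and variable names are ours, and it assumes decimal gigabytes.

```python
def implied_throughput_gb_s(size_gb: float, seconds: float) -> float:
    """Average throughput implied by moving size_gb in the given time."""
    return size_gb / seconds


DATASET_GB = 500  # dataset size from the test description

# (fastest, slowest) load times in seconds, as quoted in the text
LOAD_TIMES = {
    "Vultr / Lambda (local NVMe)": (35, 45),
    "Runpod (attached volume)": (55, 70),
}

for setup, (fastest, slowest) in LOAD_TIMES.items():
    low = implied_throughput_gb_s(DATASET_GB, slowest)
    high = implied_throughput_gb_s(DATASET_GB, fastest)
    print(f"{setup}: {low:.1f}-{high:.1f} GB/s")
# prints:
# Vultr / Lambda (local NVMe): 11.1-14.3 GB/s
# Runpod (attached volume): 7.1-9.1 GB/s
```

These are averages over the whole load. They say nothing about latency or small-file behavior, which can matter just as much for datasets made of many small samples.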

Checkpointing revealed similar patterns. Frequent, large writes benefited greatly from high-IOPS local NVMe. Where providers throttled IOPS or had less performant underlying disks, our training jobs saw noticeable stuttering during checkpoint writes, extending overall job times by 5-10%. For a job running for days, those small delays add up.
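To see how a 5-10% slowdown adds up, here is a small worked example. The 72-hour job length and the $1.20/hr rate are assumptions chosen to sit close to the table's A100 figures, not measurements, and the helper name is ours.

```python
def checkpoint_overhead(job_hours: float, gpu_rate_per_hour: float,
                        overhead_fraction: float, num_gpus: int = 1):
    """Return (extra_hours, extra_cost) caused by slower checkpoint writes."""
    extra_hours = job_hours * overhead_fraction
    extra_cost = extra_hours * gpu_rate_per_hour * num_gpus
    return extra_hours, extra_cost


JOB_HOURS = 72.0   # assumption: a three-day training run
GPU_RATE = 1.20    # assumption: $/hr, close to the table's A100 prices

for fraction in (0.05, 0.10):
    hours, cost = checkpoint_overhead(JOB_HOURS, GPU_RATE, fraction)
    print(f"{fraction:.0%} overhead: +{hours:.1f} h, +${cost:.2f} per GPU")
# prints:
# 5% overhead: +3.6 h, +$4.32 per GPU
# 10% overhead: +7.2 h, +$8.64 per GPU
```

The per-GPU dollar figure is small, but it scales linearly with GPU count (the `num_gpus` parameter) and with job length, and the extra hours delay results either way.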

Cold starts for large models were another point of differentiation. Lambda’s pre-provisioned 1TB NVMe meant our 140GB Llama 3 70B model could be read from disk into GPU VRAM remarkably quickly, with the storage portion of the cold start taking under 10 seconds. Vultr was similarly quick. Runpod required careful management of persistent volumes and image caching to achieve comparable speeds. Vast.ai was, again, a lottery, with cold start times varying from excellent to frustratingly slow depending on the host’s underlying storage and network configuration.
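If you want a quick read on which side of that lottery a given host falls, a sequential-read timing is a reasonable first check. The sketch below is a minimal version using only the Python standard library. It is not the harness we used for the numbers above, and the function name and block size are our own choices.

```python
import os
import time


def measure_read_throughput(path: str, block_size: int = 16 * 1024 * 1024) -> float:
    """Read a file sequentially and return average throughput in GB/s (1 GB = 1e9 bytes)."""
    size_bytes = os.path.getsize(path)
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while f.read(block_size):
            pass
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return size_bytes / elapsed / 1e9


# Example (hypothetical path): point it at a large checkpoint or dataset shard.
# print(f"{measure_read_throughput('/path/to/large_file.bin'):.2f} GB/s")
```

One caveat: if the file was read or written recently, the operating system may serve it from the page cache and report a number far above what the disk can deliver. Use a file larger than system RAM, or one that has not been touched since boot, to get a figure that reflects the drive.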

Operational Friction and Hidden Gotchas

Beyond raw performance and price, the operational aspects of storage can significantly impact your workflow and budget.

Runpod: The flexibility of attaching volumes is good, but managing them (creating, attaching, detaching, snapshotting) adds a layer of manual work or automation. Crucially, if you don’t delete your persistent volumes after a job, you’re paying for them. The 120GB ephemeral storage is useful for temporary files but will be wiped with the instance. Egress is very competitive at $0.005/GB, so pulling data off isn’t usually a shock.
Vultr: The bundled 1TB NVMe is convenient and persistent. No fuss, no extra volume management. However, if your actual data is only 200GB, you’re still paying for the full terabyte. Their egress is $0.01/GB, which is reasonable but can add up if you’re frequently moving large models or datasets out of the cloud.
Lambda Labs: Like Vultr, Lambda bundles 1TB of persistent NVMe. It’s fast, it’s there, and it’s included in the hourly price. This makes pricing very predictable. Their policy of no egress fees is a huge win for anyone moving significant data, effectively removing a major hidden cost that plagues other providers.
Vast.ai: This is the Wild West. You need to inspect each host’s storage offering. Some will have minimal local storage, others will offer multiple terabytes. Pricing for additional storage is often higher than at dedicated providers. Data persistence across reboots or instance changes can be a concern if not explicitly managed. Egress fees are also host-dependent, ranging from free to typical cloud rates.
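The egress rates above are easy to compare once you fix a transfer size. This sketch uses the per-GB rates from the table and an assumed 2,000 GB moved out. It ignores any free tier, and it leaves out Vast.ai because its rate depends on the host.

```python
# Egress rates in $/GB, taken from the table (after any free tier).
EGRESS_RATE_PER_GB = {
    "Runpod": 0.005,
    "Vultr": 0.01,
    "Lambda": 0.00,
}


def egress_cost(gb_moved: float, rate_per_gb: float) -> float:
    """Cost of moving gb_moved gigabytes out of the provider."""
    return gb_moved * rate_per_gb


GB_MOVED = 2000  # assumption: about 2 TB of models and datasets pulled out

for provider, rate in EGRESS_RATE_PER_GB.items():
    print(f"{provider}: ${egress_cost(GB_MOVED, rate):.2f}")
# prints:
# Runpod: $10.00
# Vultr: $20.00
# Lambda: $0.00
```

At this volume the differences are modest, but they grow linearly, so teams that export checkpoints after every run should plug in their own monthly total.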

The Verdict: Don’t Assume Your NVMe is Free or Fast

For small jobs where your data fits within a few hundred gigabytes and persistence is optional, Runpod’s approach of small ephemeral storage plus affordable add-on volumes (and cheap egress) offers great flexibility. You only pay for what you provision, but you must actively manage those volumes.

For teams running larger, consistent training jobs or requiring predictable access to big datasets, the 1TB of persistent NVMe that Lambda Labs and Vultr bundle with their instances is a compelling offer. Lambda’s zero-egress policy makes it particularly attractive if you’re pulling a lot of data out of the cloud over time, as we’ve noted in our egress cost guides. This predictability and lack of hidden fees for data movement can save significant headaches and budget over a long project.

Vast.ai remains the choice for the adventurous, budget-conscious user willing to spend time sifting through listings to find a deal that matches their specific storage and performance needs. The potential for very cheap GPU hours is there, but the storage variability demands constant vigilance.

Ultimately, the cheapest GPU hour means little if your pipeline is bottlenecked by storage I/O or if your ‘free’ local disk is wiped every time the instance reboots. Before your next big training run, map out your data flows, estimate your storage needs, and factor in the cost of persistence and egress. If you want to kick the tyres yourself, you can spin up a pod via our referral link and see how their NVMe volumes hold up for your specific workload.