LLM model load times: how slow cloud block storage costs you money

On a Tuesday in May 2026, we finished a Llama 3 70B inference container setup on a fresh GPU instance. The A100 spun up in less than a minute, but then we hit the model.load() call and watched the terminal. For nearly five minutes, nothing happened but disk activity. That was over $0.20 in idle GPU time before the model was even ready for its first prompt. That’s the tax of slow block storage, and it adds up.

Why LLM startup latency matters for cloud GPUs

When you’re renting GPUs by the second or minute, every moment the hardware is allocated but not doing useful computation is money out the door. For large language models at 70B parameters and above, the weights alone run from roughly 140GB in FP16 to several hundred gigabytes. Loading them into VRAM from attached block storage is often the longest phase of a cold start, and for short-lived jobs it can easily eclipse the actual inference or fine-tuning time.

Consider an interactive LLM application or a serverless function where models are loaded on demand. If your users expect a response within a few seconds, a 3-5 minute model load time is a non-starter. Even for batch inference or iterative fine-tuning, those minutes spent waiting for weights to transfer from disk to GPU accumulate. We’ve written before about the general impact of storage in “GPU Instance Storage: The Hidden Cost You Keep Forgetting”, and this is a prime example of it. When we looked at cold start times on Runpod Serverless, the model load component was consistently the largest variable.

Paying for a top-tier A100 GPU only for it to sit mostly idle while shuffling bits from a cheap, slow block volume defeats the purpose of per-second billing. The goal is to maximize useful GPU cycles, not just minimize the advertised hourly rate.

Our benchmarking setup: providers, GPUs, and LLM

Over the past few weeks leading up to June 2026, we provisioned identical A100 80GB instances across Runpod, Vultr, and Lambda Labs. Our aim was to isolate the block storage performance for LLM loading. We tried to keep the GPU instance itself as consistent as possible, knowing that A100 Cloud Pricing can vary wildly. For our model, we chose Llama 3 70B (the Meta-Llama-3-70B-Instruct FP16 weights), a common model that clocks in at approximately 140GB on disk. We attached a 200GB block storage volume to each instance, ensuring enough room for the model and any temporary files.

Here’s the rundown of our setup:

Provider	GPU	GPU VRAM	Instance Storage Type	LLM Used	Total Model Size	Block Storage Size
Runpod	A100 80GB	80GB	Local NVMe + Block	Llama 3 70B (FP16)	~140GB	200GB
Vultr	A100 80GB	80GB	Local NVMe + Block	Llama 3 70B (FP16)	~140GB	200GB
Lambda Labs	A100 80GB	80GB	Local NVMe + Block	Llama 3 70B (FP16)	~140GB	200GB

For each provider, we used their standard, recommended block storage offering. The process was simple: launch the instance, attach the pre-filled block storage volume holding the Llama 3 weights, and time the model load. We refer to that step throughout as model.load(); in Hugging Face’s transformers library it is the from_pretrained call, here with the model targeted at cuda:0.
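A minimal sketch of how a timing like this could be taken. The helper `time_call`, the function `load_llama`, and the mount path in the example are hypothetical names for illustration. `from_pretrained` is the standard transformers entry point, but the `torch_dtype` and `device_map` values shown are assumptions rather than the exact settings used in the benchmark.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


def load_llama(weights_dir: str, device_map: str = "cuda:0"):
    """Load FP16 weights from a local directory (hypothetical settings)."""
    # Imported lazily so the timing helper works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        weights_dir,
        torch_dtype=torch.float16,  # assumption: weights kept in FP16
        device_map=device_map,      # assumption: needs the accelerate package
    )
    torch.cuda.synchronize()  # wait for pending GPU work before the timer stops
    return model


# Example (not run here); the mount point is a placeholder:
# model, seconds = time_call(lambda: load_llama("/mnt/block/Meta-Llama-3-70B-Instruct"))
# print(f"load took {seconds:.1f} s")
```

FP16 weights of roughly 140GB are larger than a single 80GB card, so a run like this may need `device_map="auto"`, which lets accelerate spread layers across the available GPUs and CPU memory, rather than a single device.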

Runpod vs Vultr vs Lambda: LLM model load time results

We ran the model loading test five times on each platform, rebooting the instance between runs to clear any potential caching artifacts. The results were fairly consistent within each provider, but showed clear differences between them. These aren’t theoretical throughput numbers; these are real-world, wall-clock times we observed for a 140GB model hitting the GPU’s VRAM.

Provider	Average Load Time (s)	Min Load Time (s)	Max Load Time (s)
Runpod	188	182	195
Vultr	276	268	285
Lambda Labs	215	209	222

Runpod consistently delivered the fastest load times, averaging 188 seconds, just over three minutes. Lambda Labs was a respectable second at 215 seconds, finishing under four minutes in every run. Vultr lagged significantly, averaging 276 seconds (about 4.6 minutes) and peaking at 285 seconds. For a process that happens at the beginning of every job, these differences are not trivial.

Analyzing the block storage performance differences

The most immediate culprits for these discrepancies are the underlying block storage performance characteristics. While all providers offer block storage, the devil, as always, is in the details of IOPS, throughput, and, of course, pricing. We’ve covered this extensively in “Runpod vs Vultr vs Lambda: Training I/O When the GPU Isn’t the Bottleneck”, but it’s worth re-examining specifically for large model loading.

Here’s what each provider generally advertises for their block storage, along with their published rates as of June 2026 for a comparable volume:

Provider	Advertised Block Storage IOPS (approx.)	Advertised Block Storage Throughput (MB/s, approx.)	Advertised Block Storage Price ($/GB/hr)
Runpod	Varies by region, often 5k-10k	Varies by region, often 500-1000	0.000003 ¹
Vultr	Up to 15k	Up to 2000	~0.000139 (for $0.10/GB/month) ²
Lambda Labs	10k-20k	500-1000	0.000006 ³

It’s important to note that advertised IOPS and throughput numbers don’t always translate directly to real-world performance, especially when dealing with large, sequential reads like model loading. There are overheads, network latency, and shared infrastructure considerations that can impact actual speed. Runpod sometimes advertises lower peak numbers than Vultr, yet it consistently delivered faster model loading. This suggests that its block storage is more consistently provisioned, or that the network path between its storage and its GPU instances is better suited to this type of workload.
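One rough way to compare measured and advertised numbers is to turn each average load time into an implied read rate. The sketch below assumes the whole load is spent reading the ~140GB of weights (decimal gigabytes) from the block volume, which overstates the disk’s share if deserialization or host-to-GPU copies take meaningful time; the function name is hypothetical.

```python
MODEL_SIZE_GB = 140  # approximate on-disk size from the setup section

# Average load times in seconds, taken from the results table.
avg_load_s = {"Runpod": 188, "Vultr": 276, "Lambda Labs": 215}


def effective_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average read rate implied by moving size_gb in the given time."""
    return size_gb * 1000 / seconds


throughput = {p: effective_mb_per_s(MODEL_SIZE_GB, s) for p, s in avg_load_s.items()}

for provider, rate in throughput.items():
    print(f"{provider}: ~{rate:.0f} MB/s effective")
```

By this measure Runpod lands near 745 MB/s, Lambda Labs near 651 MB/s, and Vultr near 507 MB/s, roughly a quarter of Vultr’s advertised 2000 MB/s ceiling.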

Vultr’s pricing for block storage is notably higher per GB/hour when converted from their monthly rate, and yet its performance for this specific task was the slowest. This means you’re paying more for a slower experience in this particular scenario. Lambda Labs strikes a good balance, with decent performance at a competitive price.

When fast block storage pays for itself

Let’s put some numbers to this. Assume you’re running an A100 80GB instance at an average price of $2.50/hour, and that a Llama 3 70B model load happens 10 times a day for various experiments or inference calls. (We’re being conservative here; many teams spin up and tear down instances far more frequently.)

Runpod: 188 seconds (3.13 minutes) per load
Lambda Labs: 215 seconds (3.58 minutes) per load
Vultr: 276 seconds (4.60 minutes) per load

Daily idle cost for model loading:

Runpod: 3.13 min/load * 10 loads/day = 31.3 min/day = 0.52 hr/day; 0.52 hr/day * $2.50/hr = $1.30 per day
Lambda Labs: 3.58 min/load * 10 loads/day = 35.8 min/day = 0.60 hr/day; 0.60 hr/day * $2.50/hr = $1.50 per day
Vultr: 4.60 min/load * 10 loads/day = 46.0 min/day = 0.77 hr/day; 0.77 hr/day * $2.50/hr = $1.93 per day

Over a month (30 days), those differences become significant:

Runpod: $1.30/day * 30 days = $39.00/month
Lambda Labs: $1.50/day * 30 days = $45.00/month
Vultr: $1.93/day * 30 days = $57.90/month
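The same arithmetic can be reproduced in a few lines. This is a sketch under the assumptions stated above ($2.50/hour, 10 loads a day, 30-day month); the function and variable names are illustrative. It carries the load time in seconds all the way through instead of rounding to two decimals of an hour, so its results differ slightly from the hand-rounded figures.

```python
GPU_RATE_PER_HR = 2.50  # assumed average A100 80GB price
LOADS_PER_DAY = 10
DAYS_PER_MONTH = 30

# Average load times in seconds, from the results table.
avg_load_s = {"Runpod": 188, "Lambda Labs": 215, "Vultr": 276}


def idle_cost_per_day(load_s, loads_per_day=LOADS_PER_DAY, rate=GPU_RATE_PER_HR):
    """Dollars per day of GPU time spent waiting on model loads."""
    return load_s * loads_per_day / 3600 * rate


monthly = {p: idle_cost_per_day(s) * DAYS_PER_MONTH for p, s in avg_load_s.items()}

for provider, cost in monthly.items():
    print(f"{provider}: ${cost / DAYS_PER_MONTH:.2f}/day, ${cost:.2f}/month")
```

Without intermediate rounding, the monthly figures come out near $39.17, $44.79, and $57.50, within about 1% of the values listed above.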

That’s a difference of nearly $19/month between Runpod and Vultr just for the GPU idle time during model loading, assuming a relatively modest 10 loads per day. Factor in the actual block storage costs, and Vultr’s higher price per GB/hour makes the gap even wider. For a 200GB volume for 30 days:

Runpod: 200GB * 0.000003 $/GB/hr * 24 hr/day * 30 days = $0.43/month
Lambda Labs: 200GB * 0.000006 $/GB/hr * 24 hr/day * 30 days = $0.86/month
Vultr: 200GB * 0.000139 $/GB/hr * 24 hr/day * 30 days = $20.02/month
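For the storage side, a small sketch that also shows the monthly-to-hourly conversion behind Vultr’s rate. It assumes the same 720-hour (30-day) month used in the lines above; the names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # 720 hours in the 30-day month used here
VOLUME_GB = 200


def monthly_to_hourly(price_per_gb_month: float) -> float:
    """Convert a $/GB/month price to $/GB/hour."""
    return price_per_gb_month / HOURS_PER_MONTH


# Advertised $/GB/hour rates; Vultr's is derived from $0.10/GB/month.
hourly_rate = {
    "Runpod": 0.000003,
    "Lambda Labs": 0.000006,
    "Vultr": monthly_to_hourly(0.10),
}

storage_monthly = {
    p: VOLUME_GB * rate * HOURS_PER_MONTH for p, rate in hourly_rate.items()
}

for provider, cost in storage_monthly.items():
    print(f"{provider}: ${cost:.2f}/month")
```

With the unrounded conversion, Vultr comes to $20.00 rather than $20.02; the two-cent gap is rounding in the hourly rate.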

When you combine the GPU idle time and the block storage cost, the total monthly overhead for Llama 3 70B loading (10 times a day, 200GB volume) looks like this:

Runpod total: $39.00 (GPU idle) + $0.43 (storage) = $39.43/month
Lambda Labs total: $45.00 (GPU idle) + $0.86 (storage) = $45.86/month
Vultr total: $57.90 (GPU idle) + $20.02 (storage) = $77.92/month

This illustrates how fast block storage doesn’t just improve developer experience; it directly impacts your bottom line. In this scenario, the monthly overhead on the slowest, priciest option is nearly twice that of the fastest.
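The combined totals, and the size of the gap between providers, can be checked with a short sketch. It takes the monthly GPU idle and storage figures listed above as inputs; the variable names are illustrative.

```python
# Monthly figures in dollars, as listed above.
gpu_idle = {"Runpod": 39.00, "Lambda Labs": 45.00, "Vultr": 57.90}
storage = {"Runpod": 0.43, "Lambda Labs": 0.86, "Vultr": 20.02}

total = {p: round(gpu_idle[p] + storage[p], 2) for p in gpu_idle}

# Express each total as a multiple of the cheapest provider's total.
cheapest = min(total, key=total.get)
ratio = {p: total[p] / total[cheapest] for p in total}

for provider in total:
    print(f"{provider}: ${total[provider]:.2f}/month ({ratio[provider]:.2f}x {cheapest})")
```

On these inputs Vultr’s overhead is about 1.98 times Runpod’s, and Lambda Labs’ about 1.16 times.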

The best provider for LLM model load times

For LLM workloads where model loading time is a critical factor, our tests show a clear winner: Runpod. Its block storage had the lowest advertised per-GB rate of the three and still consistently delivered the fastest load times for Llama 3 70B. This translates directly into less idle GPU time and, ultimately, lower overall costs for iterative development, frequent inference, or any scenario demanding quick model readiness.

Lambda Labs is a solid second option, offering a good balance of performance and cost. Vultr, while a strong contender for raw GPU power, simply fell behind on this specific metric, and its comparatively higher block storage costs compound the issue. If you’re spinning up and tearing down instances frequently, or running serverless functions that need to load large models fast, those seconds saved on model loading quickly turn into dollars in your pocket. If you want to try the same workload yourself, our referral link is an easy way to get started.

Our takeaway is simple: don’t just look at the GPU hourly rate. Dig into the real-world performance of the supporting infrastructure, especially when dealing with multi-gigabyte models that need to be ready to run in seconds, not minutes.

¹ Per Runpod’s GPU pricing page, the advertised hourly rate for block storage is around $0.000003/GB/hour.
² Vultr’s advertised rate for high-performance block storage is $0.10/GB/month, which converts to approximately $0.000139/GB/hour over a 720-hour month. See Vultr’s pricing page.
³ Lambda Labs’ advertised hourly rate for block storage is around $0.000006/GB/hour. See Lambda Labs’ pricing page.
