LLM model load times: how slow cloud block storage costs you money
We benchmarked LLM model load times on Runpod, Vultr, and Lambda Labs to see how block storage performance impacts your cloud GPU costs. See who wins.
- gpu
- llm
- storage
- runpod
- vultr
- lambda
- comparison
On a Tuesday in May 2026, we finished setting up a Llama 3 70B inference container on a fresh GPU instance. The A100 spun up in under a minute, but then we hit the model-loading call and watched the terminal. For nearly five minutes, nothing happened but disk activity: over $0.20 of idle GPU time before the model was ready for its first prompt. That’s the tax of slow block storage, and it adds up.
Why LLM startup latency matters for cloud GPUs
When you’re renting GPUs by the second or minute, every moment the hardware is active but not actually doing useful computation is money out the door. For large language models, especially those exceeding 70B parameters, the model weights alone can be hundreds of gigabytes. Loading these into VRAM from attached block storage is often the longest phase of a cold start. It’s a bottleneck that can easily eclipse the actual inference or fine-tuning time for short-lived jobs.
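As a rule of thumb, checkpoint size is parameter count times bytes per parameter. Here is a minimal sketch of that estimate. It ignores tokenizer and config files and uses decimal gigabytes (1 GB = 1e9 bytes); the dtype table and function name are illustrative, not from any library.

```python
# Rough checkpoint size: parameters x bytes per parameter.
# Ignores tokenizer/config files and small metadata overheads.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weights_size_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp16", "int8", "int4"):
    print(f"70B @ {dtype}: ~{weights_size_gb(70e9, dtype):.0f} GB")
```

At FP16, a 70B model works out to roughly 140 GB, which is more than a single 80GB A100 can hold in VRAM.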
Consider an interactive LLM application or a serverless function where models are loaded on demand. If your users expect a response within a few seconds, a 3-5 minute model load is a non-starter. Even for batch inference or iterative fine-tuning, the minutes spent waiting for weights to move from disk to GPU accumulate. We’ve written before about the general impact in “GPU Instance Storage: The Hidden Cost You Keep Forgetting,” and model loading is a prime example of it. When we looked at cold start times on Runpod Serverless, the model load component was consistently the largest variable.
Paying for a top-tier A100 GPU only for it to sit mostly idle while shuffling bits from a cheap, slow block volume defeats the purpose of per-second billing. The goal is to maximize useful GPU cycles, not just minimize the advertised hourly rate.
Our benchmarking setup: providers, GPUs, and LLM
Over the few weeks leading up to June 2026, we provisioned comparable A100 80GB instances on Runpod, Vultr, and Lambda Labs. Our aim was to isolate block storage performance for LLM loading, so we kept the GPU instance itself as consistent as possible across providers, knowing that A100 cloud pricing can vary wildly. For the model, we chose Llama 3 70B (the Meta-Llama-3-70B-Instruct FP16 weights), a common model that takes up approximately 140GB on disk. We attached a 200GB block storage volume to each instance, leaving room for the model and any temporary files.
Here’s the rundown of our setup:
| Provider | GPU | GPU VRAM | Instance Storage Type | LLM Used | Total Model Size | Block Storage Size |
|---|---|---|---|---|---|---|
| Runpod | A100 80GB | 80GB | Local NVMe + Block | Llama 3 70B (FP16) | ~140GB | 200GB |
| Vultr | A100 80GB | 80GB | Local NVMe + Block | Llama 3 70B (FP16) | ~140GB | 200GB |
| Lambda Labs | A100 80GB | 80GB | Local NVMe + Block | Llama 3 70B (FP16) | ~140GB | 200GB |
For each provider, we used their standard, recommended block storage offering. The process was simple: launch the instance, attach the block storage volume pre-filled with the Llama 3 weights, and time the load with Hugging Face’s transformers library (the from_pretrained() call), targeting cuda:0. Note that at FP16 the ~140GB checkpoint is larger than the A100’s 80GB of VRAM, so the portion that doesn’t fit on the GPU has to be offloaded to CPU memory.
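If you want to reproduce this kind of measurement, here is a minimal timing harness. It is a sketch, not our exact script: time_loads and load_llama3_70b are hypothetical names, and the mount path is a placeholder assumption. The loader uses transformers’ AutoModelForCausalLM.from_pretrained() with device_map="auto", which requires the accelerate package and offloads whatever doesn’t fit in VRAM.

```python
import gc
import statistics
import time
from typing import Callable


def time_loads(load_fn: Callable[[], object], runs: int = 5) -> dict:
    """Call load_fn `runs` times; return avg/min/max wall-clock seconds.

    This does not clear the OS page cache between runs, so repeat runs in
    one process can be served from RAM. For cold-start numbers, reboot
    between measurements (as we did) or drop caches first.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        model = load_fn()
        times.append(time.perf_counter() - start)
        del model      # drop our reference to the loaded model
        gc.collect()   # and collect it before the next run
    return {"avg": statistics.mean(times), "min": min(times), "max": max(times)}


def load_llama3_70b(path: str = "/mnt/models/Meta-Llama-3-70B-Instruct"):
    """Hypothetical loader; the mount path above is a placeholder."""
    # Imported lazily so time_loads() works without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        path,
        torch_dtype=torch.float16,
        # ~140 GB of FP16 weights exceeds one 80 GB A100, so let accelerate
        # place what fits on the GPU and offload the rest to CPU RAM.
        device_map="auto",
    )
    torch.cuda.synchronize()  # wait for any queued GPU work to finish
    return model
```

Because each of our runs followed a reboot, the realistic usage is time_loads(load_llama3_70b, runs=1) once per boot, aggregating the results across boots into the avg/min/max shape used in the results table.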
Runpod vs Vultr vs Lambda Labs: LLM model load time results
We ran the model loading test five times on each platform, rebooting the instance between runs to clear any potential caching artifacts. Results were fairly consistent within each provider but showed clear differences between them. These aren’t theoretical throughput numbers; they’re real-world, wall-clock times we observed loading a 140GB model from block storage.
| Provider | Average Load Time (s) | Min Load Time (s) | Max Load Time (s) |
|---|---|---|---|
| Runpod | 188 | 182 | 195 |
| Vultr | 276 | 268 | 285 |
| Lambda Labs | 215 | 209 | 222 |
Runpod consistently delivered the fastest load times, averaging 188 seconds, just over three minutes. Lambda Labs was a respectable second at about three and a half minutes (215 seconds on average, never above 222). Vultr lagged significantly, averaging 276 seconds, or roughly four and a half minutes, with its slowest run approaching five. For a process that happens at the start of every job, these differences are not trivial.
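Dividing checkpoint size by average load time gives a rough effective throughput for each provider. This back-of-the-envelope sketch assumes the ~140GB is read exactly once per load and attributes the whole load time to I/O, even though deserialization and host-to-GPU copies account for some of it.

```python
# Effective read throughput implied by the measured average load times.
# Assumes the whole ~140 GB checkpoint is read once per load (decimal units)
# and that all of the load time is spent on I/O.
MODEL_BYTES = 140e9

avg_load_s = {"Runpod": 188, "Lambda Labs": 215, "Vultr": 276}


def effective_mb_per_s(size_bytes: float, seconds: float) -> float:
    """Bytes moved per second, expressed in MB/s (1 MB = 1e6 bytes)."""
    return size_bytes / seconds / 1e6


for provider, secs in avg_load_s.items():
    print(f"{provider:12s} ~{effective_mb_per_s(MODEL_BYTES, secs):.0f} MB/s")
```

That works out to roughly 745 MB/s for Runpod, 651 MB/s for Lambda Labs, and 507 MB/s for Vultr.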
Analyzing the block storage performance differences
The most immediate culprits for these discrepancies are the underlying block storage performance characteristics. While all providers offer block storage, the devil, as always, is in the details of IOPS, throughput, and, of course, pricing. We’ve covered this extensively in “Runpod vs Vultr vs Lambda: Training I/O When the GPU Isn’t the Bottleneck,” but it’s worth re-examining specifically for large model loading.
Here’s what each provider generally advertises for their block storage, along with their published rates as of June 2026 for a comparable volume:
| Provider | Advertised Block Storage IOPS (approx.) | Advertised Block Storage Throughput (MB/s, approx.) | Advertised Block Storage Price ($/GB/hr) |
|---|---|---|---|
| Runpod | Varies by region, often 5k-10k | Varies by region, often 500-1000 | 0.000003 [1] |
| Vultr | Up to 15k | Up to 2000 | ~0.000139 (from $0.10/GB/month) [2] |
| Lambda Labs | 10k-20k | 500-1000 | 0.000006 [3] |
It’s important to note that advertised IOPS and throughput numbers don’t always translate directly to real-world performance, especially for large, sequential reads like model loading. Protocol overhead, network latency, and shared infrastructure all affect actual speed. Despite advertising lower peak numbers than Vultr, Runpod consistently delivered faster model loading. This suggests its block storage is either more consistently provisioned or its network path to the GPU instances is better optimized for this type of workload.
Vultr’s pricing for block storage is notably higher per GB/hour when converted from their monthly rate, and yet its performance for this specific task was the slowest. This means you’re paying more for a slower experience in this particular scenario. Lambda Labs strikes a good balance, with decent performance at a competitive price.
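Vultr quotes block storage per month, so the hourly figure in the table is a conversion. This small sketch reproduces it under the assumption of a 30-day (720-hour) month, which is the assumption that yields ~0.000139; a 730-hour month (365/12 days) gives a slightly lower rate. The function name is illustrative.

```python
# Convert a monthly block storage price to an hourly one.
# Assumes a 30-day (720-hour) month by default, which reproduces ~0.000139;
# a 730-hour month gives a slightly lower hourly rate.
def monthly_to_hourly(price_per_gb_month: float, hours_per_month: float = 720) -> float:
    return price_per_gb_month / hours_per_month


vultr_hr = monthly_to_hourly(0.10)           # ~0.000139 $/GB/hr
vultr_hr_730 = monthly_to_hourly(0.10, 730)  # ~0.000137 $/GB/hr
runpod_hr = 0.000003                         # Runpod rate from the price table

print(f"Vultr: {vultr_hr:.6f} $/GB/hr (720 h), {vultr_hr_730:.6f} (730 h)")
print(f"Vultr vs Runpod per GB-hour: ~{vultr_hr / runpod_hr:.0f}x")
```

At these listed rates, Vultr costs roughly 46 times as much as Runpod per GB-hour.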
When fast block storage pays for itself
Let’s put some numbers to this. Assume you’re running an A100 80GB instance at an average price of $2.50/hour, and a Llama 3 70B model load happens 10 times a day for various experiments or inference calls. (We’re being conservative here; many teams spin up and tear down instances far more frequently.)
- Runpod: 188 seconds (3.13 minutes) per load
- Lambda Labs: 215 seconds (3.58 minutes) per load
- Vultr: 276 seconds (4.60 minutes) per load
Daily idle cost for model loading:
- Runpod: 3.13 min/load * 10 loads/day = 31.3 min/day ≈ 0.52 hr/day; 0.52 hr/day * $2.50/hr = $1.30 per day
- Lambda Labs: 3.58 min/load * 10 loads/day = 35.8 min/day ≈ 0.60 hr/day; 0.60 hr/day * $2.50/hr = $1.50 per day
- Vultr: 4.60 min/load * 10 loads/day = 46.0 min/day ≈ 0.77 hr/day; 0.77 hr/day * $2.50/hr = $1.93 per day
Over a month (30 days), those differences become significant:
- Runpod: $1.30/day * 30 days = $39.00/month
- Lambda Labs: $1.50/day * 30 days = $45.00/month
- Vultr: $1.93/day * 30 days = $57.90/month
That’s a difference of nearly $19/month between Runpod and Vultr just for GPU idle time during model loading, at a relatively modest 10 loads per day. Factor in the actual block storage costs, and Vultr’s higher price per GB/hour widens the gap further. For a 200GB volume over 30 days:
- Runpod: 200GB * 0.000003 $/GB/hr * 24 hr/day * 30 days = $0.43/month
- Lambda Labs: 200GB * 0.000006 $/GB/hr * 24 hr/day * 30 days = $0.86/month
- Vultr: 200GB * 0.000139 $/GB/hr * 24 hr/day * 30 days = $20.02/month
When you combine the GPU idle time and the block storage cost, the total monthly overhead for Llama 3 70B loading (10 times a day, 200GB volume) looks like this:
- Runpod total: $39.00 (GPU idle) + $0.43 (storage) = $39.43/month
- Lambda Labs total: $45.00 (GPU idle) + $0.86 (storage) = $45.86/month
- Vultr total: $57.90 (GPU idle) + $20.02 (storage) = $77.92/month
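The same cost model fits in a few lines of code, which makes it easy to plug in your own GPU rate, load frequency, or volume size. This is a sketch using the scenario’s assumptions ($2.50/hr GPU, 10 loads/day, a 30-day month, a 200GB volume billed 24 hours a day). Because it doesn’t round intermediate values, it lands slightly below the figures above: about $39.17, $44.79, and $57.50 of idle GPU time, and totals of about $39.60, $45.66, and $77.52.

```python
# Monthly model-loading overhead: idle GPU time plus block storage.
# Scenario assumptions: $2.50/hr A100, 10 loads/day, 30-day month,
# 200 GB volume billed 24 hours a day.
GPU_RATE_PER_HR = 2.50
LOADS_PER_DAY = 10
DAYS = 30
VOLUME_GB = 200

providers = {
    # name: (average load time in seconds, block storage $/GB/hr)
    "Runpod": (188, 0.000003),
    "Lambda Labs": (215, 0.000006),
    "Vultr": (276, 0.000139),
}


def monthly_overhead(load_s: float, storage_rate: float) -> tuple:
    """Return (gpu_idle_cost, storage_cost, total) in dollars per month."""
    idle_hours = load_s / 3600 * LOADS_PER_DAY * DAYS
    gpu_idle = idle_hours * GPU_RATE_PER_HR
    storage = VOLUME_GB * storage_rate * 24 * DAYS
    return gpu_idle, storage, gpu_idle + storage


for name, (load_s, rate) in providers.items():
    idle, storage, total = monthly_overhead(load_s, rate)
    print(f"{name:12s} idle ${idle:6.2f}  storage ${storage:6.2f}  total ${total:6.2f}")
```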
This illustrates that fast block storage doesn’t just improve developer experience; it directly affects your bottom line. In this scenario, Vultr’s total overhead is nearly double Runpod’s.
The best provider for LLM model load times
For LLM workloads where model loading time is a critical factor, our tests show a clear winner: Runpod. Its block storage, which also carries the lowest listed price of the three, consistently delivered the fastest load times for a large Llama 3 70B model. That translates directly into less idle GPU time and lower overall costs for iterative development, frequent inference, or any scenario that demands quick model readiness.
Lambda Labs is a solid second option, offering a good balance of performance and cost. Vultr, while a strong contender for raw GPU power, simply fell behind on this specific metric, and its comparatively higher block storage costs compound the issue. If you’re spinning up and tearing down instances frequently, or running serverless functions that need to load large models fast, those seconds saved on model loading quickly turn into dollars in your pocket. If you want to try the same workload yourself, our referral link is an easy way to get started.
Our takeaway is simple: don’t just look at the GPU hourly rate. Dig into the real-world performance of the supporting infrastructure, especially when dealing with multi-gigabyte models that need to be ready to run in seconds, not minutes.
Footnotes
1. Per Runpod’s GPU pricing page, their advertised hourly rate for block storage is around 0.000003 $/GB/hour.
2. Vultr’s advertised rate for high-performance block storage is $0.10/GB/month, which converts to approximately 0.000139 $/GB/hour (assuming a 30-day, 720-hour month). See Vultr’s pricing page.
3. Lambda Labs’ advertised hourly rate for block storage is around 0.000006 $/GB/hour. See Lambda Labs’ pricing page.