
RTX 4080 Super Cloud: Runpod vs Vast.ai vs Vultr for LLM Fine-Tuning

We threw Llama 3 8B at three providers' RTX 4080 Super instances for a month to see where mid-range LLM fine-tuning dollars really go.

Tobias 5 min read
  • gpu
  • comparison
  • rtx4080super
  • runpod
  • vastai
  • vultr
  • llm

We needed a mid-range GPU for LLM fine-tuning in early 2026: something beyond a well-worn RTX 3090, but not quite warranting a full H100 budget. The RTX 4080 Super, with its 16 GB of VRAM and solid throughput, looked promising on paper. Then we started looking at cloud pricing. The variations weren’t just a few cents; one provider was nearly double the cost for the same theoretical performance, and the hidden fees quickly turned theoretical savings into real headaches. We spent a month pushing Llama 3 8B through fine-tuning runs on instances from Runpod, Vast.ai, and Vultr to find out what actually matters.

What We Were Comparing

Our goal was simple: cost-effective fine-tuning of moderately sized LLMs. The RTX 4080 Super sits in a sweet spot for many smaller teams and researchers who need more than a consumer card but aren’t ready to commit to A100 or H100 budgets, especially given the current pricing on those higher-tier cards (as we’ve explored in our A100 cloud pricing comparison). We focused on instances offering the bare RTX 4080 Super GPU, aiming for similar CPU/RAM configurations where possible, and located everything in a European region to keep network latency consistent.

The workload was a standard Llama 3 8B instruction fine-tune, using LoRA for efficiency. We tracked epochs per hour, overall stability, and, critically, the actual cost per completed epoch, factoring in idle time, storage, and egress. We didn’t just look at the advertised hourly rate; we spun up instances, left them running for a few hours, then tore them down, just like most real-world development cycles.
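To make "cost per completed epoch" concrete, here is a minimal sketch of how that metric can be computed for one spin-up-and-teardown cycle. The `Session` fields and the example numbers are illustrative assumptions, not figures from our billing data; the only point is that idle hours, storage, and egress all land in the numerator while only finished epochs count in the denominator.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One rent-and-teardown cycle. Field values below are hypothetical inputs."""
    hourly_rate_usd: float   # advertised on-demand $/hr
    billed_hours: float      # wall-clock hours the instance was up, idle time included
    storage_usd: float       # storage charges attributed to this session
    egress_usd: float        # egress charges attributed to this session
    completed_epochs: float  # epochs that actually finished


def cost_per_completed_epoch(s: Session) -> float:
    """Total session spend divided by the epochs that finished."""
    if s.completed_epochs <= 0:
        raise ValueError("no completed epochs; cost per epoch is undefined")
    total = s.hourly_rate_usd * s.billed_hours + s.storage_usd + s.egress_usd
    return total / s.completed_epochs


# Hypothetical session: 4 billed hours (some of it idle), 15 epochs finished.
example = Session(hourly_rate_usd=0.42, billed_hours=4.0,
                  storage_usd=0.10, egress_usd=0.05, completed_epochs=15.0)
print(f"${cost_per_completed_epoch(example):.3f} per epoch")
```

Because billed hours include idle time, leaving an instance up while you debug a data loader raises the per-epoch figure even though the hourly rate never changes.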

Price and Raw Specs: The On-Paper Battle

Here’s how the three contenders stacked up on paper, based on typical configurations we found available in mid-May 2026. Note that Vast.ai is a dynamic marketplace, so its numbers are averages we observed for a decent host with the 4080 Super during our testing window.

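| Provider | Instance Type | GPU | VRAM | CPU (vCores) | RAM (GB) | Storage (GB NVMe) | $/hr (on-demand) | Included Traffic | Egress $/GB |
|---|---|---|---|---|---|---|---|---|---|
| Runpod | Community Pod | RTX 4080 Super | 16 GB | 8-16 | 32-64 | 256 | $0.42 | 1 TB/mo | $0.01 |
| Vast.ai | Community Offer | RTX 4080 Super | 16 GB | 8-12 | 32-48 | 200 | $0.35 | Varies | Varies ($0.005-$0.02) |
| Vultr | Cloud GPU | RTX 4080 Super | 16 GB | 16 | 64 | 300 | $0.58 | 1 TB/mo | $0.01 |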

Right away, Vast.ai looks like the clear winner on raw hourly rate. But as always, the devil is in the details. Vultr is noticeably more expensive upfront. Runpod sits in the middle, offering a more predictable experience than Vast.ai’s marketplace.

Performance and Stability Under Load

Paper specs are one thing; real-world performance is another. We ran our Llama 3 8B fine-tuning job across all three providers. The good news is that the RTX 4080 Super itself performed consistently across all hosts when we could get a stable instance. We saw an average of 4.8-5.2 epochs per hour for our specific Llama 3 8B LoRA workload, with minor variations attributable to host CPU or storage speed, not the GPU itself.
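Combining that throughput range with the on-demand rates from the pricing table gives a compute-only floor for cost per epoch. The sketch below is back-of-the-envelope arithmetic, assuming zero idle time and ignoring storage and egress, so real bills will come in higher.

```python
# On-demand $/hr from the pricing table; Vast.ai is the observed marketplace average.
HOURLY_RATE_USD = {"Runpod": 0.42, "Vast.ai": 0.35, "Vultr": 0.58}

# Observed throughput range for the Llama 3 8B LoRA job (epochs per hour).
EPOCHS_PER_HOUR_LOW, EPOCHS_PER_HOUR_HIGH = 4.8, 5.2


def compute_cost_range(rate_usd: float) -> tuple[float, float]:
    """Return (best, worst) compute-only $/epoch across the throughput range."""
    return rate_usd / EPOCHS_PER_HOUR_HIGH, rate_usd / EPOCHS_PER_HOUR_LOW


for provider, rate in HOURLY_RATE_USD.items():
    best, worst = compute_cost_range(rate)
    print(f"{provider:8s} ${best:.3f} to ${worst:.3f} per epoch")
```

The spread within a single provider (a fraction of a cent) is small next to the spread between providers, which is why the hourly rate and reliability matter more here than host-to-host throughput differences.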

The real differences emerged in getting the work done and keeping it running:

  • Runpod: Spin-up times were generally good, usually under 60 seconds for a Community Pod. We experienced solid stability; once a job started, it tended to finish without interruption. The control plane is intuitive enough, and the API allows for easy automation. The storage is simple to manage, though you’ll want to pay attention to persistent storage costs if you’re keeping large datasets or checkpoints around for weeks (a topic we covered in our GPU instance storage deep dive).
  • Vast.ai: This is where the variability hits. While we found instances for as low as $0.30/hr, we also encountered hosts with flaky network, slow storage, or sudden shutdowns. The marketplace means you’re effectively renting from individual machine owners, and quality control is, shall we say, distributed. Finding a reliably fast instance often meant trying a few before settling. For pure raw cost, it’s hard to beat if you’re willing to hunt and tolerate occasional restarts. For a deeper dive into the Vast.ai experience, check out our guide for hobbyists.
  • Vultr: Predictability is Vultr’s strength. Instances spun up quickly (under 30 seconds), were consistently stable, and the network performance was excellent. Their UI is clean, and the overall experience feels more polished and enterprise-ready. This comes at a price, of course, but if you’re running time-sensitive jobs or need high reliability, that premium might be worth it. Our overall Vultr experience, even with their A100s, has been that they’re solid if you can justify the cost (see our A100 review).
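One way to reason about the Vast.ai trade-off is to ask how much billed time you can lose to restarts and bad hosts before the cheaper rate stops paying off. The sketch below uses the table's hourly rates and makes a simplifying assumption that the more stable provider wastes no time at all; `wasted_fraction` is a hypothetical quantity we did not measure directly.

```python
def effective_rate(hourly_rate_usd: float, wasted_fraction: float) -> float:
    """$/hr of useful work when a fraction of billed time is lost to restarts."""
    if not 0.0 <= wasted_fraction < 1.0:
        raise ValueError("wasted_fraction must be in [0, 1)")
    return hourly_rate_usd / (1.0 - wasted_fraction)


def break_even_waste(cheap_rate_usd: float, stable_rate_usd: float) -> float:
    """Wasted fraction at which the cheap host matches the stable host per useful hour."""
    return 1.0 - cheap_rate_usd / stable_rate_usd


# Hourly rates from the pricing table.
VAST, RUNPOD, VULTR = 0.35, 0.42, 0.58

print(f"Break-even vs Runpod: {break_even_waste(VAST, RUNPOD):.1%} of billed time lost")
print(f"Break-even vs Vultr:  {break_even_waste(VAST, VULTR):.1%} of billed time lost")
```

Under these assumptions, losing roughly one billed hour in six on Vast.ai erases its advantage over Runpod, while the cushion against Vultr is much larger.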

The Egress Tax and Operational Friction

Hourly rates are only one part of the equation. For LLM fine-tuning, you’re constantly pulling datasets in and pushing model checkpoints out. Egress fees can quickly eat into your savings, especially if you’re not careful. We’ve dedicated entire posts to this, like our guide on egress costs.
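For a flat-rate policy of the kind listed in the pricing table (an included monthly allowance, then a per-GB charge), the overage is simple arithmetic. This sketch assumes 1 TB means 1000 GB, which a given provider may define differently, and the transfer volumes are made-up examples.

```python
def egress_overage_usd(egress_gb: float,
                       included_gb: float = 1000.0,
                       rate_usd_per_gb: float = 0.01) -> float:
    """Charge for egress beyond the included allowance (flat per-GB model)."""
    billable_gb = max(0.0, egress_gb - included_gb)
    return billable_gb * rate_usd_per_gb


# Hypothetical monthly volumes of checkpoints pushed off the instance.
for gb in (800.0, 1500.0):
    print(f"{gb:.0f} GB out -> ${egress_overage_usd(gb):.2f} overage")
```

Marketplace hosts with per-host rates do not fit this model as neatly, since the rate itself changes from one rental to the next.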

  • Runpod: Their egress is a flat $0.01/GB after 1 TB included, which is competitive and easy to calculate. It’s not free, but it’s not punitive either. We found their included traffic generous enough for typical fine-tuning cycles without racking up huge overage bills.
  • Vast.ai: Egress policies vary wildly by host. Some claim