RTX 4070 Super: Pricing Out Llama 3 Fine-Tuning

The RTX 4070 Super launched in early 2024 with a lot of noise, promising a sweet spot for VRAM and price. At 12GB, it’s just enough to squeeze in smaller LLMs like Llama 3 8B for Q-LoRA fine-tuning without immediately hitting memory limits. We wanted to see if that promise held up in the cloud for real-world workloads by putting three common rental providers to the test: Runpod, Vast.ai, and Vultr. We weren’t looking for raw speed; we were looking for the total bill at the end of a month of consistent, if not heavy, use.

Our first Llama 3 8B fine-tune job on a 4070 Super instance, using a 10,000-sample dataset, took almost exactly 7 hours. The problem wasn’t the time; it was the price variance across providers for that same 7 hours of compute. One bill was $2.70, another $3.08, and a third $4.76. The GPU was the same. The difference was in the fine print and the hidden friction.

What We Compared and How We Ran It

For four weeks in April and May 2026, we cycled through RTX 4070 Super instances on Runpod (Community Cloud), Vast.ai, and Vultr. Our primary workload was Q-LoRA fine-tuning of Llama 3 8B on a custom medical text dataset. This task is VRAM-sensitive but not overwhelmingly so, making the 12GB on the 4070 Super a decent fit. We chose a constant dataset size and ran the fine-tuning for a fixed number of epochs (3 epochs per run) to ensure repeatable results. We also performed a handful of inference tests, but the fine-tuning runs proved to be the more telling metric for total cost.
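To see why 12 GB is a workable fit for this model, a rough estimate helps. The sketch below assumes a round 8 billion parameters and counts only the base-model weights. Activations, adapter weights, optimizer state, and framework overhead are not modeled, so treat the result as a lower bound rather than a measurement from our runs.

```python
# Back-of-envelope weight memory for an ~8B-parameter model.
# PARAMS is an approximation (assumption); real usage adds activations,
# LoRA adapters, optimizer state, and CUDA overhead on top.
PARAMS = 8_000_000_000
VRAM_GB = 12  # RTX 4070 Super


def weight_gb(params: int, bits_per_param: float) -> float:
    """Memory for the base weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    gb = weight_gb(PARAMS, bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{label}: {gb:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

At 16 bits per parameter the weights alone exceed the card, while 4-bit quantization leaves headroom for everything else the training loop needs.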

We focused on the cheapest available instances with an RTX 4070 Super. This meant dealing with varying CPU cores, RAM, and storage across providers. We tried to standardize regions (EU-central for Runpod and Vultr, whatever was available in Europe for Vast.ai), but exact parity was, as always, elusive. It’s part of the trade-off you accept for these budget-friendly tiers.

The Contenders: Specs and Sticker Price

Here’s how the base configurations and hourly rates stacked up for the RTX 4070 Super instances we could reliably procure. Note that Vast.ai’s pricing is a dynamic average; we recorded what we paid for available instances during our testing period.

Provider	Instance Name / Type	GPU	VRAM	vCPU	RAM	Base SSD	Avg. $/hr	Included Traffic
Runpod	Community Pod	RTX 4070 Super	12 GB	8	32 GB	250 GB	$0.44	1 TB
Vast.ai	Marketplace	RTX 4070 Super	12 GB	6-12	24-64 GB	200-500 GB	$0.38	Unspecified
Vultr	Cloud GPU	RTX 4070 Super	12 GB	8	32 GB	300 GB	$0.68	2 TB

Note: Vast.ai instances vary significantly in CPU, RAM, and SSD. The listed values are common ranges we encountered. Included traffic on Vast.ai varies by host and is typically small, with overage charges applying beyond it.

On paper, Vast.ai is the cheapest by a noticeable margin on an hourly basis. However, this comes with the caveat of availability. Finding a stable, well-spec’d 4070 Super on Vast.ai often required patience and a willingness to compromise on CPU/RAM. Runpod’s Community Cloud offered consistent availability and predictable pricing, usually only a few cents more per hour than Vast.ai’s lowest. Vultr, as a more traditional cloud provider, had fixed pricing and immediate availability, but at a higher hourly cost.

Llama 3 8B Fine-Tuning Performance and Cost

We ran our Llama 3 8B Q-LoRA fine-tuning workload multiple times on each platform. The workload involved 3 epochs on a 10,000-sample dataset, with a batch size of 4. We logged the average steps per second and the total duration, then calculated the total cost based on the hourly rate.

Provider	GPU	VRAM	Avg. Steps/sec	Total Fine-Tune Time (hr)	Total Cost (per run)
Runpod	RTX 4070 Super	12 GB	1.8	7.0	$3.08
Vast.ai	RTX 4070 Super	12 GB	1.7	7.1	$2.70
Vultr	RTX 4070 Super	12 GB	1.8	7.0	$4.76

Raw performance (steps/sec) was remarkably similar across all three providers, which isn’t surprising given they’re all running the same GPU. The slight variations likely stem from differences in CPU, RAM, or background noise on shared instances. The key takeaway here is the total cost. Vast.ai, when available at its lower rates, was indeed the cheapest for a full fine-tuning run. Runpod was a close second, offering a more consistent experience. Vultr cost roughly 55% more than Runpod and 76% more than Vast.ai for the same output.
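The per-run totals are simply the hourly rate multiplied by the run duration. Here is a minimal sketch that reproduces them from the table values; the names `RUNS` and `run_cost` are ours for illustration, and `Decimal` is used only to keep the cents rounding exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# (hourly rate in USD, run duration in hours), taken from the tables above.
RUNS = {
    "Runpod": (Decimal("0.44"), Decimal("7.0")),
    "Vast.ai": (Decimal("0.38"), Decimal("7.1")),
    "Vultr": (Decimal("0.68"), Decimal("7.0")),
}


def run_cost(rate: Decimal, hours: Decimal) -> Decimal:
    """Cost of one run, rounded to the nearest cent."""
    return (rate * hours).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


costs = {name: run_cost(rate, hours) for name, (rate, hours) in RUNS.items()}

# Print providers from cheapest to most expensive.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost}")
```

Swapping in your own rates and durations gives a quick comparison before you commit to a provider.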

This cost comparison highlights the trade-offs. If your time is cheaper than the machine, and you don’t mind waiting for a good Vast.ai deal, you can save a few dollars. If you need to spin up a job now without fuss, Runpod offers a compelling middle ground. For context on higher-tier GPUs, we’ve done similar deep dives on A100 Cloud Pricing and RTX 4080 Super Cloud comparisons, where the dynamics shift but the core principle of scrutinizing all costs remains.
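The time-versus-money trade-off can be put into numbers. The sketch below takes the per-run saving of Vast.ai over Runpod from the table and asks how many minutes of instance hunting would use it up. The hourly values of your time are hypothetical placeholders, not figures from our testing.

```python
def break_even_minutes(savings_per_run: float, hourly_value_of_time: float) -> float:
    """Minutes of instance hunting at which the per-run saving is used up."""
    return savings_per_run / hourly_value_of_time * 60


# Runpod vs. Vast.ai per-run cost, from the table above.
SAVINGS = 3.08 - 2.70

# Hypothetical values of an hour of your time, for illustration only.
for hourly_value in (10, 30, 60):
    minutes = break_even_minutes(SAVINGS, hourly_value)
    print(f"${hourly_value}/hr: break-even after {minutes:.1f} min of hunting")
```

For a single run the break-even window is short, so the marketplace route tends to pay off mainly when the saving is repeated across many runs.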

Operational Friction and Hidden Costs

Beyond the hourly rate, the experience differed in subtle but important ways:

Availability: Of the two budget options, Runpod offered the better availability for the 4070 Super, with pods usually spinning up within minutes. Vast.ai required more refreshing and often presented a wider array of specs, some less suitable than others. Vultr was instant, as expected from a traditional cloud provider, but you pay for that convenience.
Storage: While we didn’t run into major storage issues for this workload, it’s always a hidden cost. Runpod and Vultr offer predictable block storage pricing. Vast.ai’s local storage often requires careful management and can be less reliable. We’ve discussed this in more detail in our GPU Instance Storage piece.
Egress: For our fine-tuning workload, egress wasn’t a significant factor, as the dataset was small and models were downloaded once. However, for inference pipelines or heavy data movement, Vultr’s generous 2 TB of included traffic is a clear win over Runpod’s 1 TB and Vast.ai’s often unspecified limits. Always check egress for your specific use case; it can bite you hard, as we’ve noted in our egress cost guide.
User Experience: Runpod’s UI and API are generally straightforward for managing pods. Vast.ai’s marketplace can feel a bit like the wild west, requiring more manual checks and configuration. Vultr’s interface is clean and consistent with their broader cloud offering, which is a plus if you’re already in their ecosystem.
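To gauge how much the egress point above could matter, a rough overage estimate is enough. In this sketch the included allowances come from the spec table, but the overage rate and the 1.5 TB usage figure are hypothetical placeholders. We did not record actual overage pricing, so check each provider's pricing page before relying on the output.

```python
def egress_overage_usd(used_tb: float, included_tb: float, rate_per_gb: float) -> float:
    """Overage charge for traffic beyond the included allowance."""
    billable_gb = max(0.0, used_tb - included_tb) * 1000  # decimal TB to GB
    return billable_gb * rate_per_gb


# Included traffic from the spec table; Vast.ai is omitted because
# its allowance is unspecified and varies by host.
INCLUDED_TB = {"Runpod": 1.0, "Vultr": 2.0}

# Hypothetical rate in USD per GB, for illustration only.
PLACEHOLDER_RATE = 0.01

for provider, included in INCLUDED_TB.items():
    cost = egress_overage_usd(used_tb=1.5, included_tb=included,
                              rate_per_gb=PLACEHOLDER_RATE)
    print(f"{provider}: ${cost:.2f} overage at 1.5 TB of traffic")
```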

The Verdict: Where to Run Your 4070 Super Fine-Tunes

For most developers and small teams looking to fine-tune Llama 3 8B or similar 12GB VRAM-bound models, the RTX 4070 Super offers good value, but the provider choice matters. If you prioritize the absolute lowest hourly rate and are willing to spend some time hunting for instances, Vast.ai can deliver the cheapest per-run cost. However, be prepared for variability in host specs and potential availability issues.

Runpod struck the best balance for us. Its consistent availability, predictable pricing, and straightforward user experience made it the easiest to integrate into a regular workflow. The slight premium over Vast.ai’s lowest rates was often justified by the reduced operational friction and reliable access to instances. If you want to kick the tyres yourself, you can spin up a pod via our referral link.

Vultr, while offering instant provisioning and a polished platform, simply couldn’t compete on price for this specific workload. It might make sense if you’re already deeply invested in Vultr’s cloud or have a demanding uptime requirement where an extra 24 to 30 cents per hour is irrelevant. For budget-conscious LLM fine-tuning, however, it fell short. In the end, we’d recommend starting with Runpod for a smooth experience and only exploring Vast.ai if you consistently need to shave off those last few cents per hour and have the patience to manage the marketplace.