RTX 4070 Super: Pricing Out Llama 3 Fine-Tuning
We put Runpod, Vast.ai, and Vultr's 4070 Super instances through a Llama 3 8B fine-tuning gauntlet to find the real cost.
- gpu
- comparison
- rtx4070super
- llama3
- finetuning
The RTX 4070 Super launched in early 2024 with a lot of noise, promising a sweet spot for VRAM and price. At 12GB, it’s just enough to squeeze in smaller LLMs like Llama 3 8B for Q-LoRA fine-tuning without immediately hitting memory limits. We wanted to see if that promise held up in the cloud for real-world workloads by putting three common rental providers to the test: Runpod, Vast.ai, and Vultr. We weren’t looking for raw speed; we were looking for the total bill at the end of a month of consistent, if not heavy, use.
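You can sanity-check why 12GB is "just enough" with rough arithmetic. The sketch below is a back-of-envelope estimate, not a measurement from our runs. It assumes roughly 8.0 billion base parameters stored at 4 bits each. It also lumps LoRA adapters, optimizer state, activations and CUDA overhead into a single hypothetical 6 GiB allowance (`overhead_gib`). The helper names are illustrative, and real usage depends on sequence length, batch size and the quantization scheme.

```python
# Back-of-envelope VRAM check for Q-LoRA on Llama 3 8B.
# Assumptions (not measured in this article): ~8.0e9 base parameters,
# 4-bit quantized base weights, and a flat, hypothetical allowance for
# LoRA adapters, optimizer state, activations and CUDA overhead.

GIB = 1024 ** 3


def base_weight_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for the frozen, quantized base weights, in GiB."""
    return n_params * bits_per_param / 8 / GIB


def fits_in_vram(n_params: float, bits_per_param: float,
                 overhead_gib: float, vram_gib: float) -> bool:
    """True if quantized weights plus a flat overhead allowance fit in VRAM."""
    return base_weight_gib(n_params, bits_per_param) + overhead_gib <= vram_gib


weights = base_weight_gib(8.0e9, 4)
print(f"4-bit base weights: {weights:.1f} GiB")
print("4-bit + 6 GiB overhead fits in 12 GiB:", fits_in_vram(8.0e9, 4, 6.0, 12.0))
print("16-bit weights alone fit in 12 GiB:", fits_in_vram(8.0e9, 16, 0.0, 12.0))
```

The comparison with 16-bit weights shows why quantization matters here. Unquantized weights alone would exceed the card's memory before training even starts.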
Our first Llama 3 8B fine-tune job on a 4070 Super instance, using a 10,000-sample dataset, took almost exactly 7 hours. The problem wasn’t the time; it was the price variance across providers for those same 7 hours of compute. One bill was $3.08, another $4.76, and a third almost $6.00. The GPU was the same. The difference was in the fine print and the hidden friction.
What We Compared and How We Ran It
For four weeks in April and May 2026, we cycled through RTX 4070 Super instances on Runpod (Community Cloud), Vast.ai, and Vultr. Our primary workload was Q-LoRA fine-tuning of Llama 3 8B on a custom medical text dataset. This task is VRAM-sensitive but not overwhelmingly so, making the 12GB on the 4070 Super a decent fit. We chose a constant dataset size and ran the fine-tuning for a fixed number of epochs (3 epochs per run) to ensure repeatable results. We also performed a handful of inference tests, but the fine-tuning runs proved to be the more telling metric for total cost.
We focused on the cheapest available instances with an RTX 4070 Super. This meant dealing with varying CPU cores, RAM, and storage across providers. We tried to standardize regions (EU-central for Runpod and Vultr, whatever was available in Europe for Vast.ai), but exact parity was, as always, elusive. It’s part of the trade-off you accept for these budget-friendly tiers.
The Contenders: Specs and Sticker Price
Here’s how the base configurations and hourly rates stacked up for the RTX 4070 Super instances we could reliably procure. Note that Vast.ai’s pricing is a dynamic average; we recorded what we paid for available instances during our testing period.
| Provider | Instance Name / Type | GPU | VRAM | vCPU | RAM | Base SSD | Avg. $/hr | Included Traffic |
|---|---|---|---|---|---|---|---|---|
| Runpod | Community Pod | RTX 4070 Super | 12 GB | 8 | 32 GB | 250 GB | $0.44 | 1 TB |
| Vast.ai | Marketplace | RTX 4070 Super | 12 GB | 6-12 | 24-64 GB | 200-500 GB | $0.38 | Unspecified |
| Vultr | Cloud GPU | RTX 4070 Super | 12 GB | 8 | 32 GB | 300 GB | $0.68 | 2 TB |
Note: Vast.ai instances vary significantly in CPU, RAM, and SSD. The listed values are common ranges we encountered. Traffic on Vast.ai is typically minimal before overage charges, varying by host.
On paper, Vast.ai is the cheapest by a noticeable margin on an hourly basis. However, this comes with the caveat of availability. Finding a stable, well-spec’d 4070 Super on Vast.ai often required patience and a willingness to compromise on CPU/RAM. Runpod’s Community Cloud offered consistent availability and predictable pricing, usually only a few cents more per hour than Vast.ai’s lowest. Vultr, as a more traditional cloud provider, had fixed pricing and immediate availability, but at a higher hourly cost.
Llama 3 8B Fine-Tuning Performance and Cost
We ran our Llama 3 8B Q-LoRA fine-tuning workload multiple times on each platform. The workload involved 3 epochs on a 10,000-sample dataset, with a batch size of 4. We logged the average steps per second and the total duration, then calculated the total cost based on the hourly rate.
| Provider | GPU | VRAM | Avg. Steps/sec | Total Fine-Tune Time (hr) | Total Cost (per run) |
|---|---|---|---|---|---|
| Runpod | RTX 4070 Super | 12 GB | 1.8 | 7.0 | $3.08 |
| Vast.ai | RTX 4070 Super | 12 GB | 1.7 | 7.1 | $2.70 |
| Vultr | RTX 4070 Super | 12 GB | 1.8 | 7.0 | $4.76 |
Raw performance (steps/sec) was remarkably similar across all three providers, which isn’t surprising given they’re all running the same GPU. The slight variations likely stem from differences in host CPU and RAM, or from noisy neighbours on shared machines. The key takeaway here is the total cost. Vast.ai, when available at its lower rates, was indeed the cheapest for a full fine-tuning run. Runpod was a close second, about 14% more per run, while offering a more consistent experience. Vultr was significantly more expensive for the same output: roughly 55% more than Runpod and 76% more than Vast.ai.
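The per-run cost column is simply the average hourly rate multiplied by wall-clock hours. The minimal sketch below reproduces it from the two tables above and computes each provider's premium over the cheapest option. It covers compute only; storage and egress are excluded, and the helper name is illustrative.

```python
# Reproduce the per-run cost column: cost = average $/hr x fine-tune hours.
# Rates and durations are the averages reported in the tables above.

RUNS = {
    # provider: (avg $/hr, total fine-tune time in hours)
    "Runpod": (0.44, 7.0),
    "Vast.ai": (0.38, 7.1),
    "Vultr": (0.68, 7.0),
}


def cost_per_run(rate_per_hr: float, hours: float) -> float:
    """Compute-only cost of one run, rounded to cents (no storage or egress)."""
    return round(rate_per_hr * hours, 2)


costs = {name: cost_per_run(rate, hrs) for name, (rate, hrs) in RUNS.items()}
cheapest = min(costs, key=costs.get)

# Print cheapest first, with each provider's relative premium.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    premium = cost / costs[cheapest] - 1
    print(f"{name:8s} ${cost:.2f}  (+{premium:.0%} vs {cheapest})")
```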
This cost comparison highlights the trade-offs. If your time is cheaper than the machine, and you don’t mind waiting for a good Vast.ai deal, you can save a few dollars. If you need to spin up a job now without fuss, Runpod offers a compelling middle ground. For context on higher-tier GPUs, we’ve done similar deep dives on A100 Cloud Pricing and RTX 4080 Super Cloud comparisons, where the dynamics shift but the core principle of scrutinizing all costs remains.
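"If your time is cheaper than the machine" can be made concrete with a break-even calculation. The question is how many minutes you can spend hunting for a cheaper host before the saving is gone. The sketch below uses the per-run costs from the table above. The $30/hr value placed on your time is a hypothetical input, not a figure from our testing.

```python
# Break-even search time: how long can you spend hunting for a cheaper host
# before the per-run saving is cancelled by the value of your own time?
# The hourly value of your time is a hypothetical input.


def break_even_minutes(cost_a: float, cost_b: float, your_rate_per_hr: float) -> float:
    """Minutes of searching after which option B (cheaper per run) stops paying off."""
    saving = cost_a - cost_b
    if saving <= 0:
        return 0.0  # B is not actually cheaper, so no search time is worth it
    return saving / your_rate_per_hr * 60


# Runpod ($3.08) vs Vast.ai ($2.70) per run, time valued at a hypothetical $30/hr
minutes = break_even_minutes(3.08, 2.70, your_rate_per_hr=30.0)
print(f"Break-even search time per run: {minutes:.2f} min")
```

Under that assumption, less than a minute of refreshing per run cancels out the $0.38 saving.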
Operational Friction and Hidden Costs
Beyond the hourly rate, the experience differed in subtle but important ways:
- Availability: Runpod offered the best availability for the 4070 Super, with pods usually spinning up within minutes. Vast.ai required more refreshing and often presented a wider array of specs, some less optimal than others. Vultr was instant, as expected from a traditional cloud provider, but you pay for that convenience.
- Storage: While we didn’t run into major storage issues for this workload, it’s always a hidden cost. Runpod and Vultr offer predictable block storage pricing. Vast.ai’s local storage often requires careful management and can be less reliable. We’ve discussed this in more detail in our GPU Instance Storage piece.
- Egress: For our fine-tuning workload, egress wasn’t a significant factor, as the dataset was small and models were downloaded once. However, for inference pipelines or heavy data movement, Vultr’s generous 2TB of included traffic is a clear win over Runpod’s 1TB and Vast.ai’s often unspecified limits. Always check egress for your specific use case; it can bite you hard, as we’ve noted in our egress cost guide.
- User Experience: Runpod’s UI and API are generally straightforward for managing pods. Vast.ai’s marketplace can feel a bit like the wild west, requiring more manual checks and configuration. Vultr’s interface is clean and consistent with their broader cloud offering, which is a plus if you’re already in their ecosystem.
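The sketch below shows how the included-traffic allowances translate into money once you exceed them. The 1TB and 2TB allowances come from the specs table. The $0.01/GB overage rate and the 1.5TB of monthly traffic are hypothetical placeholders, not published prices. Vast.ai is modelled as "unknown" because its limits vary by host.

```python
from typing import Optional

HYPOTHETICAL_RATE = 0.01  # $/GB placeholder; check each provider's price sheet


def egress_overage_cost(used_tb: float, included_tb: Optional[float],
                        rate_per_gb: float) -> Optional[float]:
    """Dollar cost of traffic beyond the included allowance.

    Returns None when the allowance is unknown (e.g. a marketplace host that
    does not publish one), since the bill cannot be estimated in advance.
    """
    if included_tb is None:
        return None
    overage_gb = max(0.0, used_tb - included_tb) * 1000  # decimal TB -> GB
    return overage_gb * rate_per_gb


# Hypothetical month with 1.5 TB of outbound traffic
for name, included in [("Runpod", 1.0), ("Vultr", 2.0), ("Vast.ai", None)]:
    cost = egress_overage_cost(1.5, included, HYPOTHETICAL_RATE)
    print(name, "unknown" if cost is None else f"${cost:.2f}")
```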
The Verdict: Where to Run Your 4070 Super Fine-Tunes
For most developers and small teams looking to fine-tune Llama 3 8B or similar 12GB VRAM-bound models, the RTX 4070 Super offers good value, but the provider choice matters. If you prioritize the absolute lowest hourly rate and are willing to spend some time hunting for instances, Vast.ai can deliver the cheapest per-run cost. However, be prepared for variability in host specs and potential availability issues.
Runpod struck the best balance for us. Its consistent availability, predictable pricing, and straightforward user experience made it the easiest to integrate into a regular workflow. The slight premium over Vast.ai’s lowest rates was often justified by the reduced operational friction and reliable access to instances. If you want to kick the tyres yourself, you can spin up a pod via our referral link.
Vultr, while offering instant provisioning and a polished platform, simply couldn’t compete on price for this specific workload. It might make sense if you’re already deeply invested in Vultr’s cloud or have a demanding uptime requirement where an extra $0.24 to $0.30 per hour is irrelevant. For budget-conscious LLM fine-tuning, however, it fell short. In the end, we’d recommend starting with Runpod for a smooth experience and only exploring Vast.ai if you consistently need to shave off those last few cents per hour and have the patience to manage the marketplace.
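Our goal was the bill at the end of a month, so a small projection makes the per-run gap concrete. This sketch multiplies the measured per-run costs by a hypothetical eight runs per month. Like the per-run table, it ignores storage and egress.

```python
# Project a monthly compute bill from the measured per-run costs.
# RUNS_PER_MONTH is hypothetical; storage and egress are excluded.

PER_RUN = {"Vast.ai": 2.70, "Runpod": 3.08, "Vultr": 4.76}


def monthly_bill(per_run: float, runs_per_month: int) -> float:
    """Compute-only monthly cost, rounded to cents."""
    return round(per_run * runs_per_month, 2)


RUNS_PER_MONTH = 8  # hypothetical: two fine-tunes a week
for name, cost in PER_RUN.items():
    print(f"{name:8s} ${monthly_bill(cost, RUNS_PER_MONTH):.2f}/month")
```

The Vast.ai line assumes a host at its average rate is available for every run. The availability notes above suggest that is not guaranteed.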