Hetzner GPU: Cloud vs. Dedicated for AI/ML

We needed a GPU for a few days to fine-tune a Llama 3 8B model. Naturally, we gravitated towards Hetzner Cloud’s GPU instances — quick to spin up, hourly billing, seemed like the path of least resistance. The model trained, the job finished, and the bill for those few days was entirely reasonable. But then we projected that same workload over a month, running 24/7 inference, and the numbers started looking less appealing. That’s when we remembered the dedicated server sitting mostly idle in the corner, and the question became: is ‘easy’ always ‘optimal’ when Hetzner offers both?

For most small teams and individual developers, the choice between managed cloud instances and bare metal usually comes down to convenience versus raw, uncontended power. Hetzner, unlike some hyperscalers, offers both at compelling prices. But they’re not interchangeable, and picking the wrong one for your specific AI/ML workload can mean hundreds of euros wasted over a quarter.

What We’re Comparing

Our goal was to understand the real-world trade-offs for typical AI/ML workloads, specifically LLM fine-tuning and sustained batch inference. We weren’t chasing bleeding-edge H100s here; we were looking for cost-effective workhorses. We focused on two distinct Hetzner offerings, both capable of GPU acceleration:

Hetzner Cloud GPU (CX41 + NVIDIA L4): This is a managed virtual machine, where you get a slice of a host server with dedicated GPU resources. Billing is hourly, making it ideal for bursty or experimental use cases. The NVIDIA L4 is a modern, power-efficient inference GPU with good training capabilities for its class.
Hetzner Dedicated Server (Auction Find: AX61 + RTX 3090): This represents the bare-metal route. We specifically looked at an auction find — a common strategy for squeezing maximum value from Hetzner’s dedicated lineup. These servers often come with older but still very capable consumer GPUs like the RTX 3090, offering significant VRAM and compute for a fixed monthly fee. The AX61, a Ryzen 7 5800X machine, provides ample dedicated CPU and RAM, crucial for data pre-processing and loading during training.

All our tests were conducted in Hetzner’s Falkenstein (FSN1) datacenter. For the monthly cost projections we assumed sustained 24/7 usage (roughly 730 hours a month), since one-off tasks are almost always better suited to the hourly cloud model.

Price and Raw Specs

This is where the numbers start to diverge, often dramatically. On paper, the Cloud GPU looks like a quick win. In practice, the dedicated server, if you’re willing to put in a little effort, offers a different kind of value. Here’s a side-by-side of the configurations we tested and their approximate costs, as of late May 2026:

Feature	Hetzner Cloud GPU (CX41 + L4)	Hetzner Dedicated (Auction Find: AX61 + RTX 3090)
CPU	8 vCPU (shared, AMD EPYC)	Ryzen 7 5800X (8c/16t, dedicated)
RAM	16 GB DDR4	64 GB DDR4
GPU	NVIDIA L4 (24 GB VRAM)	RTX 3090 (24 GB VRAM)
Storage	240 GB NVMe (block storage)	2 x 1.92 TB NVMe (local storage)
Network	1 Gbit/s unmetered	1 Gbit/s unmetered
Base Cost	€0.95/hr	€280/month
Monthly Est. (24/7, ~730 h)	€693.50/month	€280/month
Setup Time	< 5 minutes	1-2 days (order, OS install, driver setup)

The immediate takeaway from that table is the monthly cost. For a workload that runs 24/7, the dedicated server costs about 40% of the Cloud GPU instance (€280 vs. €693.50 a month), while offering the same 24 GB of VRAM and significantly more dedicated CPU and RAM. The L4 is the newer, more power-efficient card and is well suited to inference, but the RTX 3090 holds its own for training, especially with dedicated host resources behind it. As we’ve noted in our Hetzner AX52 vs OVH Rise-3 piece, the ‘unmetered’ network on Hetzner is a significant benefit, avoiding the egress surprises that can plague other providers.
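The monthly figure in the table follows from a 730-hour month (365 days × 24 hours / 12). Below is a minimal sketch of that arithmetic using the prices quoted above; the 730-hour convention is our assumption about how the estimate is derived, though it reproduces the €693.50 figure exactly.

```python
# Reproduce the table's monthly estimate for 24/7 usage.
# Assumption: a "month" is 730 hours (365 days * 24 h / 12 months).
HOURS_PER_MONTH = 365 * 24 / 12  # 730.0

CLOUD_EUR_PER_HOUR = 0.95        # Hetzner Cloud GPU (CX41 + L4), from the table
DEDICATED_EUR_PER_MONTH = 280.0  # AX61 + RTX 3090 auction find, from the table


def monthly_cloud_cost(eur_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of keeping an hourly-billed instance running for `hours`."""
    return round(eur_per_hour * hours, 2)


cloud = monthly_cloud_cost(CLOUD_EUR_PER_HOUR)
ratio = DEDICATED_EUR_PER_MONTH / cloud
print(f"Cloud GPU, 24/7: EUR {cloud:.2f}/month")
print(f"Dedicated costs {ratio:.0%} of the cloud bill")
```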

How They Actually Performed

Numbers on a spec sheet are one thing; real-world performance is another. We put both setups through a standardized Llama 3 8B QLoRA fine-tuning job using a 10,000-sample dataset, tracking epochs per hour and VRAM utilization.

For the Hetzner Cloud GPU (CX41 + L4), the L4 proved to be a reliable workhorse. VRAM utilization held steady at around 20 GB, leaving some headroom. We observed an average of 2.8 epochs per hour. The shared vCPUs were generally sufficient for data loading, though we occasionally saw minor throughput dips during I/O-heavy phases, which suggests contention with other VMs on the same host.

The Hetzner Dedicated Server (AX61 + RTX 3090), despite the RTX 3090 being a generation older, delivered strong results. With its 24 GB of VRAM fully utilized and the dedicated Ryzen 7 5800X handling data pre-processing, we consistently hit around 3.5 epochs per hour, about 25% more than the L4. The dedicated CPU and local NVMe storage eliminated the data-loading dips we saw on the Cloud GPU, allowing the RTX 3090 to churn through batches without interruption. This translates directly into shorter training runs and a lower cost per epoch.
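To make "lower cost per epoch" concrete, this sketch divides each option's hourly price by the throughput we measured. It assumes the dedicated server’s flat fee is spread over a fully used 730-hour month; any idle hours would raise its effective cost per epoch.

```python
# Cost per epoch on each setup, using the throughput measured in our QLoRA run.
# Assumption: the dedicated server is busy 24/7, so its flat fee is spread
# over a 730-hour month. Idle hours would raise its effective cost per epoch.
HOURS_PER_MONTH = 730.0


def eur_per_epoch(eur_per_hour: float, epochs_per_hour: float) -> float:
    """Divide the hourly price by hourly throughput."""
    return eur_per_hour / epochs_per_hour


cloud_hourly = 0.95
dedicated_hourly = 280.0 / HOURS_PER_MONTH  # roughly EUR 0.38/h when fully used

cloud = eur_per_epoch(cloud_hourly, 2.8)
dedicated = eur_per_epoch(dedicated_hourly, 3.5)
speedup = 3.5 / 2.8

print(f"Cloud GPU: EUR {cloud:.3f} per epoch")
print(f"Dedicated: EUR {dedicated:.3f} per epoch")
print(f"Dedicated runs {speedup - 1:.0%} more epochs per hour")
```

At full utilization the dedicated box comes out at roughly a third of the cloud cost per epoch; the gap narrows as its idle hours grow.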

Cold Start & Deployment: This is where the Cloud GPU shines. From API call to a running nvidia-smi output, we were typically under 5 minutes. Pulling a Docker image and launching our training container added another 2-3 minutes. For the dedicated server, the initial setup—OS installation, NVIDIA driver compilation, Docker setup—took a solid 4-6 hours of hands-on work over a day or two. However, once configured, it’s always ‘on’ and ready, requiring no cold start waiting.
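Whichever route you take, a quick post-setup check that the driver actually sees the card saves debugging later. Here is a minimal sketch that calls `nvidia-smi` in its CSV query mode (`--query-gpu=name,memory.total --format=csv,noheader,nounits`) and parses the result. The helper names are our own, and the sample line used in testing is illustrative, not output we captured.

```python
import shutil
import subprocess


def parse_gpu_csv(text: str) -> list[tuple[str, int]]:
    """Parse `name, memory.total` rows (MiB, no header, no units) from nvidia-smi."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem_mib = (field.strip() for field in line.split(","))
        gpus.append((name, int(mem_mib)))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Return the GPUs visible to the driver; raises if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found: is the NVIDIA driver installed?")
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    try:
        for name, mem_mib in query_gpus():
            print(f"{name}: {mem_mib / 1024:.1f} GiB")
    except (RuntimeError, subprocess.CalledProcessError) as exc:
        print(f"GPU check failed: {exc}")
```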

Network & Storage: Both options benefited from Hetzner’s generous network. We didn’t hit any egress walls, even when pulling multi-gigabyte datasets. (For more on egress, see our guide on egress costs.) The critical difference was storage. The Cloud GPU’s 240 GB block NVMe, while reasonably fast, is virtualized and limited in size. For larger datasets or frequent checkpointing, you’d need to attach additional block storage, incurring more cost and potential latency. The dedicated server’s 2 x 1.92 TB local NVMe drives offer direct, high-throughput access, making them ideal for I/O-intensive training on large datasets.
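To put the storage gap in numbers, here is a rough checkpoint budget. It rests on illustrative assumptions rather than measurements. First, each full checkpoint is a merged 8B-parameter model in bf16 (8 billion × 2 bytes ≈ 16 GB). Second, the OS, dataset and environment take a hypothetical 60 GB. Third, the dedicated server’s two drives are mirrored in RAID 1, leaving 1.92 TB usable. QLoRA adapter-only checkpoints are far smaller, so treat this as a worst case.

```python
# Rough checkpoint budget: how many full-model checkpoints fit on each disk?
# Assumptions (illustrative, not measured):
#   - a full checkpoint is a merged 8B-parameter model in bf16 (2 bytes/param)
#   - OS, dataset, and environment occupy RESERVED_GB on every machine
#   - the dedicated box mirrors its two 1.92 TB drives (RAID 1): 1,920 GB usable
PARAMS = 8e9
BYTES_PER_PARAM = 2  # bf16
RESERVED_GB = 60


def checkpoint_gb(params: float, bytes_per_param: int) -> float:
    """Size of one checkpoint in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def checkpoints_that_fit(disk_gb: float, reserved_gb: float, ckpt_gb: float) -> int:
    """Whole checkpoints that fit after reserving space for everything else."""
    free = max(disk_gb - reserved_gb, 0)
    return int(free // ckpt_gb)


ckpt = checkpoint_gb(PARAMS, BYTES_PER_PARAM)  # 16 GB
for label, disk_gb in [("Cloud GPU (240 GB)", 240), ("Dedicated, RAID 1 (1,920 GB)", 1920)]:
    print(f"{label}: {checkpoints_that_fit(disk_gb, RESERVED_GB, ckpt)} full checkpoints")
```

Under these assumptions the cloud disk holds around ten full checkpoints, while the dedicated box holds over a hundred.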

Quality-of-Life Differences

Beyond raw performance and cost, the day-to-day experience of managing these environments differs significantly.

Control Panel & API: Hetzner Cloud offers an intuitive web UI and a robust API for spinning up, managing, and tearing down instances. It’s designed for automation and rapid iteration. The dedicated server panel is more basic, focused on power cycles, KVM access, and OS re-installs. You’re expected to manage most aspects via SSH.
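Because Cloud instances keep billing for as long as they exist, automation should guarantee teardown even when a job crashes. Here is a minimal sketch of that pattern as a Python context manager. `create_instance` and `delete_instance` are hypothetical stand-ins for whatever client you use (for example, Hetzner’s `hcloud` Python library), not real API calls.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_instance(
    create_instance: Callable[[], str],
    delete_instance: Callable[[str], None],
) -> Iterator[str]:
    """Create an instance, yield its ID, and always delete it afterwards.

    create_instance / delete_instance are hypothetical callables wrapping
    your provider client; they are injected so the pattern stays testable.
    """
    instance_id = create_instance()
    try:
        yield instance_id
    finally:
        # Runs on success, on exceptions, and on Ctrl-C, so a failed
        # training job never leaves a billed GPU instance running.
        delete_instance(instance_id)


if __name__ == "__main__":
    log = []
    with ephemeral_instance(lambda: "gpu-1", lambda i: log.append(f"deleted {i}")) as iid:
        log.append(f"training on {iid}")
    print(log)
```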

Availability: Hetzner Cloud GPU instances are provisioned on demand, but specific GPU types, especially L4s, can experience ‘no capacity’ errors during peak times. While usually temporary, this can frustrate automated scaling or urgent deployments. Dedicated servers, once leased, are yours. The challenge is finding the right server in the auction. You might need patience and quick fingers to snag a good deal like our AX61 with an RTX 3090; they don’t linger.
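If automated provisioning hits a temporary ‘no capacity’ error, retrying with exponential backoff and jitter is usually gentler than failing the whole pipeline. Here is a minimal sketch. `NoCapacityError` is a hypothetical exception you would raise from your own wrapper after inspecting the provider’s error response, and the delays are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NoCapacityError(Exception):
    """Hypothetical: raised by your wrapper when the requested GPU type is unavailable."""


def with_backoff(
    action: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    max_delay: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on NoCapacityError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts - 1):
        try:
            return action()
        except NoCapacityError:
            delay = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": wait a random fraction of the delay so parallel
            # jobs don't all retry at the same moment.
            sleep(random.uniform(0, delay))
    return action()  # final attempt: let any error propagate to the caller
```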

Maintenance & Support: On the Cloud GPU, Hetzner handles hardware maintenance, hypervisor updates, and basic network issues. On dedicated, you’re responsible for everything from OS patching to driver updates. Hetzner’s support is responsive for hardware failures, but deep OS or software issues are typically on you.

Where Each Option Makes Sense

This isn’t a simple ‘winner takes all’ scenario. Both options excel for different use cases:

Hetzner Cloud GPU (CX41 + L4) for:

Experimentation & Development: When you’re prototyping, trying out new model architectures, or running short, exploratory training runs, the hourly billing and instant spin-up are invaluable. You pay by the hour, so it’s easy to delete an instance the moment a dead end emerges.
Bursty Workloads: If your GPU needs are intermittent – a few hours of training here, a few batch inference jobs there – the Cloud GPU often proves more cost-effective than a dedicated server sitting idle for large portions of the month.
CI/CD & Automation: Its API-driven nature makes it easy to integrate into automated pipelines for model validation, testing, or on-demand batch processing, allowing you to scale GPU resources up and down programmatically.
Developers Who Prefer Managed VMs: If you’d rather not deal with OS maintenance, driver issues, or hardware troubleshooting, the Cloud GPU provides a more ‘hands-off’ virtualized environment.
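The ‘sitting idle’ argument has a concrete break-even point. The dedicated server’s flat fee buys roughly 295 Cloud GPU hours. Here is a minimal sketch using the prices from our table. It ignores the time spent setting up the dedicated server, which would push the break-even somewhat higher.

```python
# Break-even between hourly Cloud GPU billing and a flat-fee dedicated server.
# Prices come from the comparison table; HOURS_PER_MONTH = 730 is our assumption.
HOURS_PER_MONTH = 730.0


def break_even_hours(flat_monthly_eur: float, hourly_eur: float) -> float:
    """GPU hours per month above which the flat-fee server is cheaper."""
    return flat_monthly_eur / hourly_eur


def cheaper_option(gpu_hours: float, flat_monthly_eur: float = 280.0,
                   hourly_eur: float = 0.95) -> str:
    """Pick the cheaper option for a projected monthly workload."""
    return "cloud" if gpu_hours * hourly_eur < flat_monthly_eur else "dedicated"


hours = break_even_hours(280.0, 0.95)
print(f"Break-even: {hours:.0f} GPU hours/month ({hours / HOURS_PER_MONTH:.0%} of the month)")
print(cheaper_option(120))  # a few training runs a month
print(cheaper_option(500))  # near-continuous inference
```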

Hetzner Dedicated (Auction Find: AX61 + RTX 3090) for:

Long-running Training Jobs: The significantly lower monthly cost for comparable (or better) VRAM, coupled with dedicated CPU and RAM, makes this a clear winner for models that train for days or weeks. The cost per epoch drops dramatically.
Consistent Inference: If you’re hosting a 24/7 inference endpoint where predictable performance and cost are paramount, the fixed monthly fee offers peace of mind. No noisy neighbors, no shared resource contention.
Large Datasets & I/O-intensive Workloads: Direct local NVMe access and ample dedicated CPU/RAM eliminate bottlenecks for massive data pipelines, ensuring your GPU isn’t waiting on data.
Cost-sensitive Production: For projects where every euro counts, the dedicated route offers the best cost per GB of VRAM and per epoch for sustained usage, especially as you scale.
Teams Who Want Full Control: Root access allows for deep OS optimization, custom kernel modules, specific driver versions, and bespoke software stacks not possible on a managed VM.

Verdict

When we started, we thought this would be a simple case of ‘dedicated is cheaper, cloud is easier.’ It’s not quite that clean. For quick, bursty, or low-stakes GPU work, Hetzner Cloud GPUs absolutely deliver on convenience, allowing you to spin up a powerful L4 in minutes without commitment. It’s the right tool for rapid iteration and development. But if your AI/ML workloads run for days, weeks, or need consistent throughput without shared resource contention, the dedicated server route—even accounting for the hunt in the auction and the initial setup—still offers a compelling cost-performance advantage. We’d start with Cloud GPU for initial development and migrate to a dedicated box the moment a project shows signs of needing sustained resources. The up-front friction of dedicated pays dividends over time, particularly for any production workload. Just make sure you know what you’re looking for in the auction; those RTX 3090s don’t stay listed for long.