Hetzner AI Hosting.
GPU Cloud Instances, Instant Availability.
Deploy AI models on Hetzner AI servers without managing GPU infrastructure. SUPA gives you a GPU-accelerated cloud API with serverless LLM hosting in German datacenters — same Hetzner hardware, zero ops.
Same Hetzner Hardware. Zero Hassle.
Step-by-Step Guide
How to Install Ollama on a Hetzner Server
A complete walkthrough for self-hosting LLMs on Hetzner Cloud with Ollama — from server selection to production configuration.
Choose Your Hetzner Cloud Server
A CPU-only Hetzner Cloud server is an affordable starting point for experimenting with small language models. The CX32 (8 GB RAM, ~$7/mo) or CX42 (16 GB RAM, ~$14/mo) are good choices.
| Server | RAM | Max Model Size | Example Models |
|---|---|---|---|
| CX22 (4 GB) | 4 GB | ~1-2B params | TinyLlama 1.1B, Qwen2 0.5B |
| CX32 (8 GB) | 8 GB | ~3-4B params | Phi-3 Mini, Gemma 2B |
| CX42 (16 GB) | 16 GB | ~7B params (slow) | Llama 3.2 3B, Mistral 7B |
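The limits in the table follow from a common back-of-envelope rule: a 4-bit (Q4) quantized model needs roughly 0.5 GB of RAM per billion parameters, plus overhead for the runtime and KV cache. The overhead figure below is a rough assumption, not an official Ollama number:

```shell
# Rough RAM estimate for a Q4-quantized model:
# ~0.5 GB per billion parameters, plus ~1.5 GB overhead (assumption)
awk -v params_b=7 'BEGIN { printf "~%.1f GB RAM for a %sB model\n", params_b * 0.5 + 1.5, params_b }'
# → ~5.0 GB RAM for a 7B model
```

This is why a 7B model technically fits on the CX42's 16 GB but still runs slowly: on a CPU-only server, memory bandwidth and compute, not just capacity, become the bottleneck.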
See the full list of models you can run with SUPA (no server management required) on our models page.
Create Your Server & Connect via SSH
- Log in to the Hetzner Cloud Console
- Create a new project (or use an existing one)
- Click "Add Server" and select Ubuntu 24.04
- Choose your server type (CX32 recommended to start)
- Add your SSH key and create the server
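If you prefer the command line, the steps above can also be scripted with Hetzner's official hcloud CLI. A sketch, assuming hcloud is installed and authenticated with an API token; `ollama-server` and `my-key` are placeholder names:

```shell
# Provision a CX32 running Ubuntu 24.04 via the hcloud CLI
# (placeholder names; requires an authenticated hcloud context)
hcloud server create \
  --name ollama-server \
  --type cx32 \
  --image ubuntu-24.04 \
  --ssh-key my-key
```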
Once your server is ready, connect via SSH:
```
ssh root@YOUR_SERVER_IP
```
If you haven't set up SSH keys, Hetzner will email you a root password. SSH keys are recommended for security.
Install Ollama
Ollama provides a one-line installer that sets up the binary and a systemd service:
```
curl -fsSL https://ollama.com/install.sh | sh
```
Verify the installation:
```
ollama --version
```
This installs the ollama binary and registers a systemd service that starts automatically on boot.
Pull and Run Your First Model
Download and start chatting with a model:
```
# Download the model
ollama pull phi3:mini

# Start an interactive chat
ollama run phi3:mini
```
You'll see an interactive prompt where you can type messages. Type /bye to exit.
To see all downloaded models:
```
ollama list
```
Use Ollama as an API
Ollama runs a REST API on port 11434 by default. You can send requests to it programmatically:
```
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "What is Hetzner Cloud?",
  "stream": false
}'
```
Ollama also supports an OpenAI-compatible endpoint at /v1/chat/completions, so you can use it with existing OpenAI SDKs:
```
curl http://localhost:11434/v1/chat/completions -d '{
  "model": "phi3:mini",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```
Configure for Production
To make Ollama accessible from other machines, set the host to listen on all interfaces:
```
# Edit the systemd service
sudo systemctl edit ollama

# Add these lines:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# Restart
sudo systemctl restart ollama
```
Configure your firewall to allow access:
```
sudo ufw allow 22/tcp     # SSH
sudo ufw allow 11434/tcp  # Ollama API
sudo ufw enable
```
For HTTPS, set up a reverse proxy with nginx. Point your domain to the server IP, install certbot for SSL, and proxy requests to localhost:11434.
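A minimal nginx server block for this setup might look like the following sketch — `ollama.example.com` is a placeholder domain, and certbot will typically modify this file to add the SSL directives:

```nginx
server {
    listen 80;
    server_name ollama.example.com;  # placeholder domain

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        # LLM responses can take a while; raise the default read timeout
        proxy_read_timeout 300s;
    }
}
```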
The GPU Server Problem
The tutorial above works for small models on CPU servers. But for real production workloads — Llama 3 70B, Mixtral 8x7B, or any model larger than 7B parameters — you need GPU servers.
- Hetzner GPU servers are dedicated servers, not cloud instances — provisioning takes days, not minutes
- Monthly contracts required, no hourly billing
- You're responsible for CUDA drivers, cooling, hardware failures, and security updates
- Not available through the standard Hetzner Cloud purchase flow
- Significantly more expensive than serverless alternatives for inference
Skip the Infrastructure — Use SUPA Instead
Everything in this tutorial — server provisioning, Ollama setup, GPU management, model deployment, scaling — SUPA handles with a single API call.
Same Hetzner infrastructure in Germany. Same models. Zero ops work.
Frequently Asked Questions
Everything you need to know about Hetzner AI hosting and running LLMs with SUPA