Hetzner AI Hosting.
GPU Cloud Instances, Instant Availability.
Deploy AI models on Hetzner AI servers without managing GPU infrastructure. SUPA gives you a GPU-accelerated cloud API with serverless LLM hosting in German datacenters — same Hetzner hardware, zero ops.
Same Hetzner Hardware. Zero Hassle.
Step-by-Step Guide
How to Install Ollama on a Hetzner Server
A complete walkthrough for self-hosting LLMs on Hetzner Cloud with Ollama — from server selection to production configuration.
Choose Your Hetzner Cloud Server
A CPU-only Hetzner Cloud server is an affordable starting point for experimenting with small language models. The CX32 (8 GB RAM, ~$7/mo) or CX42 (16 GB RAM, ~$14/mo) are good choices.
| Server | RAM | Max Model Size | Example Models |
|---|---|---|---|
| CX22 (4 GB) | 4 GB | ~1-2B params | TinyLlama 1.1B, Qwen2 0.5B |
| CX32 (8 GB) | 8 GB | ~3-4B params | Phi-3 Mini, Gemma 2B |
| CX42 (16 GB) | 16 GB | ~7B params (slow) | Llama 3.2 3B, Mistral 7B |
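The limits in the table follow from a common back-of-envelope rule: a 4-bit (Q4) quantized model needs roughly 0.5 GB of RAM per billion parameters, plus overhead for the runtime and KV cache. The overhead figure below is a rough assumption, not an official Ollama number:

```shell
# Rough RAM estimate for a Q4-quantized model:
# ~0.5 GB per billion parameters, plus ~1.5 GB overhead (assumption)
awk -v params_b=7 'BEGIN { printf "~%.1f GB RAM for a %sB model\n", params_b * 0.5 + 1.5, params_b }'
# → ~5.0 GB RAM for a 7B model
```

This is why a 7B model technically fits on the CX42's 16 GB but still runs slowly: on a CPU-only server, memory bandwidth and compute, not just capacity, become the bottleneck.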
See the full list of models you can run with SUPA (no server management required) on our models page.
Create Your Server & Connect via SSH
- Log in to the Hetzner Cloud Console
- Create a new project (or use an existing one)
- Click "Add Server" and select Ubuntu 24.04
- Choose your server type (CX32 recommended to start)
- Add your SSH key and create the server
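If you prefer the command line, the steps above can also be scripted with Hetzner's official hcloud CLI. A sketch, assuming hcloud is installed and authenticated with an API token; `ollama-server` and `my-key` are placeholder names:

```shell
# Provision a CX32 running Ubuntu 24.04 via the hcloud CLI
# (placeholder names; requires an authenticated hcloud context)
hcloud server create \
  --name ollama-server \
  --type cx32 \
  --image ubuntu-24.04 \
  --ssh-key my-key
```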
Once your server is ready, connect via SSH:
```
ssh root@YOUR_SERVER_IP
```
If you haven't set up SSH keys, Hetzner will email you a root password. SSH keys are recommended for security.
Install Ollama
Ollama provides a one-line installer that sets up the binary and a systemd service:
```
curl -fsSL https://ollama.com/install.sh | sh
```
Verify the installation:
```
ollama --version
```
This installs the ollama binary and registers a systemd service that starts automatically on boot.
Pull and Run Your First Model
Download and start chatting with a model:
```
# Download the model
ollama pull phi3:mini

# Start an interactive chat
ollama run phi3:mini
```
You'll see an interactive prompt where you can type messages. Type /bye to exit.
To see all downloaded models:
```
ollama list
```
Use Ollama as an API
Ollama runs a REST API on port 11434 by default. You can send requests to it programmatically:
```
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "What is Hetzner Cloud?",
  "stream": false
}'
```
Ollama also supports an OpenAI-compatible endpoint at /v1/chat/completions, so you can use it with existing OpenAI SDKs:
```
curl http://localhost:11434/v1/chat/completions -d '{
  "model": "phi3:mini",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```
Configure for Production
To make Ollama accessible from other machines, set the host to listen on all interfaces:
```
# Edit the systemd service
sudo systemctl edit ollama

# Add these lines:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# Restart
sudo systemctl restart ollama
```
Configure your firewall to allow access:
```
sudo ufw allow 22/tcp     # SSH
sudo ufw allow 11434/tcp  # Ollama API
sudo ufw enable
```
For HTTPS, set up a reverse proxy with nginx. Point your domain to the server IP, install certbot for SSL, and proxy requests to localhost:11434.
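A minimal nginx server block for this setup might look like the following sketch — `ollama.example.com` is a placeholder domain, and certbot will typically modify this file to add the SSL directives:

```nginx
server {
    listen 80;
    server_name ollama.example.com;  # placeholder domain

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        # LLM responses can take a while; raise the default read timeout
        proxy_read_timeout 300s;
    }
}
```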
The GPU Server Problem
The tutorial above works for small models on CPU servers. But for real production workloads — Llama 3 70B, Mixtral 8x7B, or any model larger than 7B parameters — you need GPU servers.
- Hetzner GPU servers are dedicated servers, not cloud instances — provisioning takes days, not minutes
- Monthly contracts required, no hourly billing
- You're responsible for CUDA drivers, cooling, hardware failures, and security updates
- Not available through the standard Hetzner Cloud purchase flow
- Significantly more expensive than serverless alternatives for inference
Skip the Infrastructure — Use SUPA Instead
Everything in this tutorial — server provisioning, Ollama setup, GPU management, model deployment, scaling — SUPA handles with a single API call.
Same Hetzner infrastructure in Germany. Same models. Zero ops work.
Frequently Asked Questions
Everything you need to know about Hetzner AI hosting and running LLMs with SUPA