Powered by Hetzner Infrastructure

Hetzner AI Hosting.
GPU Cloud Instances, Instant Availability.

Deploy AI models on Hetzner AI servers without managing GPU infrastructure. SUPA gives you a GPU-accelerated cloud API with serverless LLM hosting in German datacenters — same Hetzner hardware, zero ops.

Same Hetzner Hardware. Zero Hassle.

Same Hetzner GEX44 hardware
Same German datacenters
Serverless API — no GPUs to manage
Pay-per-token pricing
Instant availability
German data residency guaranteed

Step-by-Step Guide

How to Install Ollama on a Hetzner Server

A complete walkthrough for self-hosting LLMs on Hetzner Cloud with Ollama — from server selection to production configuration.

1

Choose Your Hetzner Cloud Server

A CPU-only Hetzner Cloud server is an affordable starting point for experimenting with small language models. Either the CX32 (8 GB RAM, ~$7/mo) or the CX42 (16 GB RAM, ~$14/mo) is a good choice.

Important: CPU-only servers can only run small models — Phi-3 Mini (~3.8B params), Gemma 2B, TinyLlama 1.1B, or Qwen2 0.5B. For anything larger (Llama 3, Mixtral), you need GPU hardware.
Server    RAM      Max Model Size       Example Models
CX22      4 GB     ~1-2B params         TinyLlama 1.1B, Qwen2 0.5B
CX32      8 GB     ~3-4B params         Phi-3 Mini, Gemma 2B
CX42      16 GB    ~7B params (slow)    Llama 3.2 3B, Mistral 7B

See the full list of models you can run with SUPA (no server management required) on our models page.
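Ollama pulls 4-bit quantized model builds by default, so a rough sizing rule — an approximation, not an official formula — is about half a byte per parameter plus roughly 1 GB of overhead for the runtime and context. A quick sketch:

```shell
# Rough RAM estimate for a 4-bit quantized model (approximation, not exact):
# ~0.5 bytes per parameter, plus ~1 GB runtime/context overhead.
estimate_ram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.5 + 1 }'
}

estimate_ram_gb 3.8   # Phi-3 Mini (3.8B params) -> ~2.9 GB
estimate_ram_gb 7     # Mistral 7B -> ~4.5 GB
```

This is why 7B models are the practical ceiling on a 16 GB CX42: the weights fit, but there is little headroom left for context and the OS.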

2

Create Your Server & Connect via SSH

  1. Log in to the Hetzner Cloud Console
  2. Create a new project (or use an existing one)
  3. Click "Add Server" and select Ubuntu 24.04
  4. Choose your server type (CX32 recommended to start)
  5. Add your SSH key and create the server

Once your server is ready, connect via SSH:

ssh root@YOUR_SERVER_IP

If you haven't set up SSH keys, Hetzner will email you a root password. SSH keys are recommended for security.
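If you need a key pair, you can generate one locally before creating the server (the file name and comment below are just examples):

```shell
# Generate an ed25519 key pair locally (file name and comment are examples)
ssh-keygen -t ed25519 -f ./hetzner_key -N "" -C "hetzner-ollama"

# The public half is what you paste into the Hetzner Cloud Console:
cat ./hetzner_key.pub

# Then connect using the private half:
# ssh -i ./hetzner_key root@YOUR_SERVER_IP
```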

3

Install Ollama

Ollama provides a one-line installer that sets up the binary and a systemd service:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version

This installs the ollama binary and registers a systemd service that starts automatically on boot.

4

Pull and Run Your First Model

Download and start chatting with a model:

# Download the model
ollama pull phi3:mini

# Start an interactive chat
ollama run phi3:mini

You'll see an interactive prompt where you can type messages. Type /bye to exit.

To see all downloaded models:

ollama list

CPU performance expectations: On a CX32, Phi-3 Mini generates ~5-10 tokens/second. That is usable for testing and development, but not fast enough for production. Larger models (7B+) will be very slow or may not load at all on CPU.

5

Use Ollama as an API

Ollama runs a REST API on port 11434 by default. You can send requests to it programmatically:

curl http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "What is Hetzner Cloud?",
  "stream": false
}'
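The reply is JSON with the generated text in a `response` field; if `jq` is installed you can extract it directly. A sketch, shown on a canned reply since it assumes a running server (the reply text here is invented for illustration):

```shell
# A trimmed example of the reply shape (real replies also include timing fields):
reply='{"model":"phi3:mini","response":"Hetzner Cloud is a German IaaS provider.","done":true}'

echo "$reply" | jq -r '.response'

# Against a live server you would pipe curl straight into jq:
# curl -s http://localhost:11434/api/generate \
#   -d '{"model":"phi3:mini","prompt":"What is Hetzner Cloud?","stream":false}' \
#   | jq -r '.response'
```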

Ollama also supports an OpenAI-compatible endpoint at /v1/chat/completions, so you can use it with existing OpenAI SDKs:

curl http://localhost:11434/v1/chat/completions -d '{
  "model": "phi3:mini",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
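Because the endpoint mimics the OpenAI API, the official OpenAI SDKs can usually be pointed at it just by overriding the base URL. The environment variables below are the ones the official Python and Node SDKs read; Ollama ignores the API key, but the SDKs require one to be set:

```shell
# Point OpenAI SDKs at the local Ollama endpoint instead of api.openai.com
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"   # any non-empty value; Ollama does not check it
```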

6

Configure for Production

To make Ollama accessible from other machines, set the host to listen on all interfaces:

# Open an override file for the service
sudo systemctl edit ollama

# In the editor that opens, add these lines, then save:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# Restart to apply the change
sudo systemctl restart ollama

Configure your firewall to allow access:

sudo ufw allow 22/tcp    # SSH
sudo ufw allow 11434/tcp # Ollama API
sudo ufw enable

For HTTPS, set up a reverse proxy with nginx. Point your domain to the server IP, install certbot for SSL, and proxy requests to localhost:11434.
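As a sketch of that setup (the domain is a placeholder; assumes Ubuntu's nginx and certbot packages, and that your DNS already points at the server):

```shell
# Install nginx and certbot (Ubuntu packages)
sudo apt install -y nginx certbot python3-certbot-nginx

# Write a server block that proxies to Ollama (domain is a placeholder)
sudo tee /etc/nginx/sites-available/ollama <<'EOF'
server {
    listen 80;
    server_name ollama.example.com;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_buffering off;       # stream tokens as they are generated
        proxy_read_timeout 300s;   # long generations need a longer timeout
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Obtain a certificate; certbot rewrites the server block for HTTPS
sudo certbot --nginx -d ollama.example.com
```

With TLS terminated at nginx, you can also keep port 11434 closed in ufw and expose only 80/443, since the API itself has no built-in authentication.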

The GPU Server Problem

The tutorial above works for small models on CPU servers. But for real production workloads — Llama 3 70B, Mixtral 8x7B, or any model larger than 7B parameters — you need GPU servers.

  • Hetzner GPU servers are dedicated servers, not cloud instances — provisioning takes days, not minutes
  • Monthly contracts required, no hourly billing
  • You're responsible for CUDA drivers, cooling, hardware failures, and security updates
  • Not available through the standard Hetzner Cloud purchase flow
  • Significantly more expensive than serverless alternatives for inference

Skip the Infrastructure — Use SUPA Instead

Everything in this tutorial — server provisioning, Ollama setup, GPU management, model deployment, scaling — SUPA handles with a single API call.

Same Hetzner infrastructure in Germany. Same models. Zero ops work.

Frequently Asked Questions

Everything you need to know about Hetzner AI hosting and running LLMs with SUPA