Ollama UK 2026 Setup Guide: Local LLM in 30 Minutes

Complete Ollama setup guide UK 2026 - install on Mac mini M4 or Linux mini PC, pull Llama 3.3 8B, connect to Home Assistant Assist. 30-minute walkthrough.

Mini PC ready to run Ollama for local LLM inference
Updated How we review →
Rob
By Rob12 June 2026 · 8 min read

Ollama is the easiest way to run a local Large Language Model on your own hardware in 2026 - whether for Home Assistant voice control, automation drafting, or general AI workloads. This guide walks through the complete setup on the two most common UK host platforms: Mac mini M4 (the easiest path) and Linux mini PC (Beelink SER8, Minisforum HX99G, Geekom AX8 Pro). You should be running real conversation responses within 30 minutes of starting.

What is Ollama and why use it?

Ollama is an open-source LLM runtime that wraps the underlying inference engines (llama.cpp on most platforms, MLX on Apple Silicon) in a simple Docker-like CLI. You run ollama run llama3.3:8b and get an interactive chat; you run a single Python or HTTP call against localhost:11434 and get programmatic access. Models, downloads, quantisation, and GPU acceleration are all handled for you. Background on the underlying neural architecture is at the Wikipedia large language model page.

The reason you'd use it over OpenAI API + Anthropic Claude in 2026:

  • Privacy: Nothing leaves your network. Important for households handling medical, security-sensitive, or family-children data.
  • Cost: Zero ongoing API spend after hardware purchase. Hardware breaks even against cloud API in 12-18 months at typical Home Assistant usage rates.
  • Offline operation: Works without internet. Useful during UK power-grid wobbles and ISP outages.
  • Hackability: You can swap models freely (Llama 3.3, Qwen 3.5, Mistral, etc.), tune quantisation, run multiple models simultaneously.

Install Ollama (Mac mini M4)

  1. Download the macOS installer

    Visit ollama.com/download and grab the macOS .pkg. Run it; it installs the Ollama CLI plus a small menubar app that handles auto-start. Single click; no terminal needed for this step.

  2. Verify the install

    Open Terminal. Run `ollama --version`. You should see something like `ollama version 0.6.x`. If you don't, restart and re-check.

  3. Pull your first model

    Run `ollama pull llama3.3:8b`. The model downloads (~4.7GB) once and cached locally. Subsequent runs use the cached version.

  4. Test interactively

    Run `ollama run llama3.3:8b`. You're now in a chat interface; type a question and watch the response stream. Token rate should be 25-35 tokens/sec on a Mac mini M4 24GB. Press Ctrl+D to exit.

  5. Bind Ollama to all network interfaces

    Edit `~/Library/LaunchAgents/com.ollama.plist` and add the environment variable `OLLAMA_HOST=0.0.0.0`. Restart the Ollama menubar app. Ollama now accepts connections from other devices on your network (essential for Home Assistant connectivity).

Install Ollama (Linux mini PC)

  1. Install Ubuntu 24.04 LTS on the mini PC

    If you don't already have Ubuntu running, install it via USB. Pick the 'minimal install' for headless server use. Set up SSH access from your main machine so you don't need a permanent monitor + keyboard.

  2. Run the Ollama installer

    SSH into the mini PC. Run: `curl -fsSL https://ollama.com/install.sh | sh`. The installer detects your hardware (NVIDIA / AMD / CPU) and configures the right backend. Takes 2-3 minutes.

  3. Enable the Ollama service

    Run `sudo systemctl enable ollama` and `sudo systemctl start ollama`. Ollama is now running as a systemd service on boot.

  4. Configure network binding

    Edit `/etc/systemd/system/ollama.service.d/override.conf` (create it if it doesn't exist) and add `Environment="OLLAMA_HOST=0.0.0.0:11434"`. Restart: `sudo systemctl daemon-reload && sudo systemctl restart ollama`.

  5. Pull and test a model

    Run `ollama pull llama3.3:8b` then `ollama run llama3.3:8b` to verify. Token rate varies by hardware - Beelink SER8 ~15-25 tok/sec, Minisforum HX99G ~40-55 tok/sec, Geekom AX8 Pro ~12-22 tok/sec.

Connect Ollama to Home Assistant Assist

  1. Verify Ollama is reachable from HA

    From your HA host, run `curl http://:11434/api/tags`. You should get a JSON list of your pulled models. If you get connection refused, check the OLLAMA_HOST binding from the install step.

  2. Add the Ollama integration in HA

    Settings → Devices & Services → Add Integration → search 'Ollama'. Enter the URL `http://:11434`. Pick the model (llama3.3:8b).

  3. Tune the system prompt

    The default Ollama HA integration prompt is generic. Replace it with a Home Assistant Assist context: 'You are a smart home assistant for a UK household. You respond to device control requests using HA service calls. Be concise.' Save.

  4. Set Ollama as the Conversation backend in Assist

    Settings → Voice Assistants → choose your Assist pipeline. Set Conversation Agent to 'Ollama' from the dropdown. Save.

  5. Test the full pipeline

    Use Home Assistant's voice test or your HA Voice satellite. Say 'turn off the kitchen lights' - should work within 1-3s. Say 'draft an automation that turns the porch light on at sunset on weekdays' - should produce usable YAML in 5-10s.

Choosing the right model

Ollama's model library has hundreds of options. The practical picks for UK Home Assistant + general use in 2026:

  • Llama 3.3 8B (`llama3.3:8b`): The default recommendation. Strong general reasoning, fast inference on 16-24GB hardware, well-supported by the Ollama community.
  • Qwen 2.5 7B (`qwen2.5:7b`): Alternative with slightly stronger structured-output performance (good for HA automation YAML generation). Lighter on memory.
  • Llama 3.3 70B (`llama3.3:70b`): Frontier-class local model. Requires 40GB+ unified RAM (Mac mini M4 Pro 48GB or similar). Materially better reasoning than 8B; slower inference.
  • Mistral Nemo 12B (`mistral-nemo`): Middle ground between 8B and 70B. Good for tasks that need more reasoning than 8B but where 70B is overkill.
  • Codestral 22B (`codestral`): Code-specialised. Useful if you'll also use the LLM for software development assistance. 14-16GB RAM minimum.

Start with Llama 3.3 8B; switch to Qwen 2.5 7B if you find Llama struggling with structured outputs in your HA automations.

Frequently asked questions

Q01Can Ollama use my Nvidia GPU?
Yes - and it dramatically improves inference speed. The Linux installer auto-detects CUDA-capable GPUs and configures llama.cpp to use them. Mac mini users get Apple Silicon Metal acceleration automatically. AMD users on Linux need ROCm installed separately - the installer prompts you through it.
Q02How do I update Ollama and pulled models?

Ollama itself: macOS auto-updates via menubar app; Linux: curl -fsSL https://ollama.com/install.sh | sh re-runs and updates. Models: ollama pull llama3.3:8b re-runs and grabs the latest version. Old versions stay cached - clean them up with ollama rm <model:tag>.

Q03Will Ollama work if my mini PC reboots?
Yes if you've enabled the systemd service (Linux) or have the menubar app running (macOS). Auto-start is on by default after install. Models are persisted to disk so no re-download needed after reboot.
Q04Can I run multiple models simultaneously?
Yes - Ollama keeps multiple models in memory subject to RAM limits. A Mac mini M4 24GB can hold Llama 3.3 8B + Qwen 2.5 7B in memory simultaneously without paging. The Mac mini M4 Pro 48GB can hold Llama 3.3 70B alongside several smaller models.
Q05Is Ollama secure to expose to my home network?
It's not designed for internet exposure - there's no authentication built in by default. For local network use behind your home router it's safe. If you want remote access, run it behind a Tailscale or Cloudflare Tunnel; never expose the raw port 11434 to the public internet.
Q06What about Ollama on a Raspberry Pi or Synology NAS?
Pi 5: technically works but the 3-5 tokens/sec inference rate makes it unusable for voice interactions. Synology NAS: same story unless it's a powerful x86 model with 16GB+ RAM. Stick with dedicated mini PC hardware for HA-grade voice latency.

The bottom line

Ollama in 2026 is the most accessible path to running a serious local LLM for Home Assistant + smart home use. The setup is genuinely 30 minutes start-to-finish on a Mac mini M4; 45 minutes on a fresh Linux mini PC build. Llama 3.3 8B is the default model recommendation; switch to Qwen 2.5 7B if structured-output performance matters more than general reasoning.

Once running, the integration with Home Assistant Assist is straightforward: install the integration, point at your Ollama URL, tune the system prompt, set it as the Conversation backend. Voice interactions feel local-fast (sub-3-second end-to-end) and the privacy posture is materially better than any cloud LLM. For HA-first UK households committed to the privacy-first path, this is the setup to commit to.