LLM Inference with Open WebUI

This guide walks through pulling a local language model with Ollama and chatting with it via Open WebUI — entirely on your own hardware, with no data leaving your container.

Prerequisites

A running Open Laboratory container (GPU recommended, CPU works for smaller models)
Your desktop open at https://{your-slug}.tunnels.laboratory.computer

If you haven’t started a container yet, see the Quick Start.

Step 1: Install Open WebUI

From the Open Laboratory desktop, open the Apps panel and find Open WebUI. Click Install.

Open WebUI includes Ollama — there’s nothing else to install separately. Setup takes 2–4 minutes.

Once installed, click Launch. Open WebUI opens in a new browser tab at:

https://{your-slug}--openwebui.tunnels.laboratory.computer

Step 2: Download a Model

Switch back to your desktop and open the Models panel. Filter by LLM to see the available language models.

Recommended starting models:

Model	Size	Good for
Llama 3.1 8B (Q4_K_M)	~5 GB	General purpose, great quality
Llama 3.2 3B (Q4_K_M)	~2 GB	Faster, good for CPU or low VRAM
Mistral 7B (Q4_K_M)	~4.1 GB	Strong reasoning, fast
Qwen 2.5 14B (Q4_K_M)	~9 GB	High quality, needs 16 GB VRAM
Phi-3 Mini (Q4_K_M)	~2.3 GB	Very fast, good at coding tasks

Click Download on your chosen model. It downloads directly into the container at /data/models/llm/.

Step 3: Start Chatting

Switch back to the Open WebUI tab. In the model selector at the top of the page, choose the model you downloaded.

Type a message in the chat input and press Enter. Your model starts generating a response, streamed token by token.

Step 4: Set a System Prompt (Optional)

System prompts let you give the model persistent instructions — a persona, a writing style, domain focus, etc.

Click the Settings icon → System Prompt and enter your instructions:

You are a helpful assistant specializing in Python and machine learning.
Keep responses concise and include code examples where relevant.

This prompt applies to all new conversations.

Tips

Model not appearing in the selector? Refresh the Open WebUI page — it re-queries Ollama on load. If it still doesn’t appear, the download may still be in progress; check the Models panel.

Slow generation on GPU? Make sure no other apps are consuming GPU memory. Stop any image generation apps from the Apps panel to free VRAM.

Slow generation on CPU? Use a smaller, more aggressively quantized model. Llama 3.2 3B Q4_K_M is the best starting point for CPU inference.

Multiple models: You can download several models and switch between them in the Open WebUI model selector mid-conversation. Each model loads into memory when selected and is evicted when you switch to another.

Conversation history: Open WebUI saves all your chat history locally inside the container. Restart the container and your conversations are still there.

Context length: By default, Ollama uses a 2048-token context window. For longer conversations or document analysis, increase this in Open WebUI under Model Settings → Context Length. Higher values use more VRAM.