ChatGPT Team costs $25 per user per month. For a team of 20, that is $6,000 per year — and your conversations, your company data, your proprietary code, all of it flows through OpenAI’s servers.
There is another way. Open WebUI + Ollama gives you a ChatGPT-like experience that runs entirely on your own hardware. Multi-user support, conversation history, model switching, file uploads, RAG, admin controls — all the features your team expects, with none of the data leaving your network.
This guide walks you through a production-ready deployment. Not a toy demo. A real setup with HTTPS, authentication, persistent storage, model management, and the operational considerations that matter when real people depend on it every day.
Architecture Overview
[Users' Browsers] --> [Nginx Reverse Proxy (HTTPS)]
                                |
                      [Open WebUI (Docker)]
                                |
                      [Ollama (Docker or Host)]
                                |
                      [GPU Server Hardware]
- Ollama runs the language models on GPU hardware
- Open WebUI provides the web interface, user management, and conversation storage
- Nginx handles HTTPS termination and proxying
- Everything runs on a single server (you can split Ollama to a separate GPU server for larger deployments)
Hardware Requirements
Your hardware needs depend on team size and expected usage:
| Team Size | Concurrent Users | Recommended GPU | RAM | Storage |
|---|---|---|---|---|
| 5-10 | 2-3 | RTX 3090 (24GB) | 32GB | 500GB SSD |
| 10-25 | 5-8 | RTX 4090 (24GB) | 64GB | 1TB SSD |
| 25-50 | 10-15 | 2x RTX 4090 or A6000 | 128GB | 1TB NVMe |
| 50+ | 15+ | Consider vLLM + multiple GPUs | 256GB+ | 2TB NVMe |
For this guide, we use a single server with an RTX 3090 or 4090, suitable for teams of 5-25 people.
Estimated hardware cost: $2,500-5,000 for a capable server (or repurpose an existing workstation with a good GPU).
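If you are sizing a GPU yourself, a rough heuristic helps: a Q4-quantized model needs roughly 0.55 bytes per parameter for weights, plus a few GB of headroom for KV cache and runtime overhead. A quick sketch (the constants are approximations, not guarantees):

```shell
# Rough VRAM estimate for a Q4-quantized model.
# Heuristic: ~0.55 bytes/param for weights + ~3 GB margin for KV cache/runtime.
params_b=32
vram_gb=$(awk -v p="$params_b" 'BEGIN { printf "%.0f", p * 0.55 + 3 }')
echo "~${vram_gb} GB VRAM for a ${params_b}B model at Q4"
```

By this estimate a 32B model lands around 21 GB, which is why the table pairs 32B-class models with 24 GB cards.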
Step 1: Server Setup
We will use Ubuntu 24.04 LTS. Start with a fresh install or an existing server.
Install Docker
# Install Docker Engine
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Install Docker Compose plugin
sudo apt install docker-compose-plugin
# Verify
docker --version
docker compose version
Install NVIDIA Container Toolkit
This lets Docker containers access the GPU:
# Add NVIDIA repo
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
Step 2: Docker Compose Configuration
Create a project directory:
sudo mkdir -p /opt/ai-platform && sudo chown "$USER" /opt/ai-platform && cd /opt/ai-platform
Create docker-compose.yml:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"   # Only expose to localhost
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2   # Keep 2 models in VRAM
      - OLLAMA_NUM_PARALLEL=4        # Handle 4 concurrent requests
      - OLLAMA_MAX_QUEUE=20          # Queue up to 20 requests

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:8080"   # Only expose to localhost
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
      - WEBUI_NAME=Team AI Assistant
      - ENABLE_SIGNUP=false            # Admin creates accounts
      - DEFAULT_USER_ROLE=user
      - ENABLE_RAG_WEB_SEARCH=false    # Enable if you want web search
      - RAG_EMBEDDING_ENGINE=ollama    # Use the Ollama-served embedding model
      - RAG_EMBEDDING_MODEL=nomic-embed-text
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
Key Configuration Decisions
OLLAMA_NUM_PARALLEL=4 — Ollama can handle multiple requests to the same model concurrently. Set this based on your GPU’s VRAM headroom. Each parallel request needs additional KV cache memory. With a 24GB GPU running a 32B model, 4 parallel requests is a safe maximum.
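Back-of-envelope arithmetic makes the VRAM tradeoff concrete. The layer and head counts below are illustrative assumptions for a 32B-class model with grouped-query attention, not the published specs of any particular model:

```shell
# KV cache per request = 2 (K and V) x layers x kv_heads x head_dim
#                        x bytes_per_value x context_length
# Assumed shape: 64 layers, 8 KV heads, head_dim 128, fp16 cache, 4096 ctx
layers=64; kv_heads=8; head_dim=128; bytes=2; ctx=4096
kv_gib=$(awk -v l="$layers" -v h="$kv_heads" -v d="$head_dim" \
             -v b="$bytes" -v c="$ctx" \
  'BEGIN { printf "%.1f", 2 * l * h * d * b * c / (1024 ^ 3) }')
echo "~${kv_gib} GiB of KV cache per request at a ${ctx}-token context"
```

On top of roughly 20 GB of Q4 weights for a 32B model, four such requests fit in 24 GB only with a modest context window, which is why raising OLLAMA_NUM_PARALLEL or the context length can push the GPU out of memory.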
OLLAMA_MAX_LOADED_MODELS=2 — Keeps up to 2 models loaded in VRAM simultaneously. If your team uses different models for different tasks (e.g., a coding model and a general model), this avoids reload delays. Reduce to 1 if VRAM is tight.
ENABLE_SIGNUP=false — In a team setting, you want the admin to create accounts, not allow self-registration. Open WebUI still lets the very first account be created through the initial setup screen, and that account becomes the admin; only subsequent self-signups are blocked.
Step 3: HTTPS with Nginx and Let’s Encrypt
Install Nginx and Certbot:
sudo apt install nginx certbot python3-certbot-nginx
Create the Nginx site configuration:
# /etc/nginx/sites-available/ai-platform
server {
    listen 80;
    server_name ai.yourcompany.com;   # Replace with your domain

    # Redirect HTTP to HTTPS
    location / {
        return 301 https://$server_name$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name ai.yourcompany.com;   # Replace with your domain

    # SSL will be configured by Certbot

    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Proxy to Open WebUI
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (needed for streaming responses)
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Increase timeouts for long-running AI responses
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_connect_timeout 60s;

        # Increase max body size for file uploads
        client_max_body_size 50M;
    }
}
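One refinement worth knowing: the configuration above sends Connection: upgrade on every proxied request, which disables HTTP keep-alive for ordinary calls. The standard nginx pattern conditionalizes it with a map block in the http context; this is optional, and the config above works as written:

```nginx
# Optional: add to the http {} context (e.g. /etc/nginx/nginx.conf) so the
# Upgrade header is forwarded only when the client actually sends one:
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
# ...then in the location block, use:
#     proxy_set_header Connection $connection_upgrade;
```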
Enable the site and get an SSL certificate:
sudo ln -s /etc/nginx/sites-available/ai-platform /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Get SSL certificate (your domain must point to this server)
sudo certbot --nginx -d ai.yourcompany.com
Step 4: Launch and Initialize
cd /opt/ai-platform
# Start the stack
docker compose up -d
# Watch logs
docker compose logs -f
# Wait for Ollama to be ready (check for "Listening on ...")
Pull Models
# Pull models your team will use
docker exec ollama ollama pull qwen3:32b # General purpose
docker exec ollama ollama pull deepseek-coder-v2:16b # Coding
docker exec ollama ollama pull phi4:14b # Fast responses
docker exec ollama ollama pull nomic-embed-text # Embeddings for RAG
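These pulls are large, so budget disk space before starting. A rough tally (the sizes below are approximate default-quantization download sizes, not exact figures; check ollama list for actuals after pulling):

```shell
# Approximate on-disk footprint of the four pulls above, in GB:
# ~20 (32B general) + ~9 (coding model) + ~9 (14B) + ~0.3 (embeddings)
awk 'BEGIN {
  total = 20 + 9 + 9 + 0.3
  printf "~%.0f GB of models on disk\n", total
}'
```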
Create the Admin Account
- Open https://ai.yourcompany.com in your browser
- The first user to register becomes the admin
- Register with your admin email and a strong password
- Go to Admin Panel > Settings and configure:
  - Default model for new users
  - Available models (hide models you do not want users to access)
  - Rate limiting (if needed)
Create User Accounts
Since we disabled self-registration:
- Go to Admin Panel > Users
- Click “Add User”
- Enter the user’s name, email, and a temporary password
- The user can change their password on first login
Step 5: Model Configuration for Teams
Open WebUI lets you create model presets with custom system prompts. Set up presets for common team use cases:
General Assistant
Name: General Assistant
Model: qwen3:32b
System Prompt: You are a helpful assistant for the [Company Name] team.
Be professional, concise, and accurate. When you are unsure about something,
say so rather than guessing.
Code Assistant
Name: Code Helper
Model: deepseek-coder-v2:16b
System Prompt: You are a senior software engineer helping the development team.
Write clean, well-documented code. Follow our team's conventions:
[add your coding standards here]. Always explain your reasoning.
Quick Q&A
Name: Quick Answer
Model: phi4:14b
System Prompt: You are a fast, concise assistant. Give brief, direct answers.
Only elaborate if the user asks for more detail.
Step 6: RAG Setup for Company Documents
Open WebUI has built-in RAG support. This lets your team query company documents:
- Go to Admin Panel > Documents
- Upload your company documentation (PDFs, text files, markdown)
- Open WebUI will automatically chunk and embed the documents
- Users can reference documents in their conversations using the # tag
For better RAG performance, organize documents into collections:
- Engineering Docs — Architecture docs, runbooks, API specs
- Product Docs — Feature specs, user guides, release notes
- HR & Policies — Company handbook, policies, benefits info
Users can then scope their queries to specific collections for more relevant results.
Cost Analysis: Self-Hosted vs ChatGPT Team
Let us compare the actual costs over 1 and 3 years for a team of 20:
ChatGPT Team
| Item | Monthly | Year 1 | Year 3 |
|---|---|---|---|
| Subscription (20 users x $25) | $500 | $6,000 | $18,000 |
| Total | $500 | $6,000 | $18,000 |
Self-Hosted (Open WebUI + Ollama)
| Item | One-Time | Monthly | Year 1 | Year 3 |
|---|---|---|---|---|
| Server hardware (RTX 4090 build) | $4,500 | — | $4,500 | $4,500 |
| Electricity (~500W avg, $0.14/kWh) | — | $50 | $600 | $1,800 |
| Admin time (2 hrs/month at $75/hr) | — | $150 | $1,800 | $5,400 |
| Total | $4,500 | $200 | $6,900 | $11,700 |
Break-even point: about 15 months ($4,500 upfront divided by $300/month in savings).
After the hardware is paid off, the self-hosted option costs $200/month vs $500/month for ChatGPT Team. Over 3 years, you save about $6,300 — and that hardware has residual value.
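The break-even figure is straightforward arithmetic, upfront hardware cost divided by the monthly savings:

```shell
# Break-even = upfront hardware / (SaaS monthly - self-hosted monthly)
hardware=4500
saas_monthly=500
self_monthly=200
months=$(( hardware / (saas_monthly - self_monthly) ))
echo "Break-even after ${months} months"
```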
What the Numbers Do Not Capture
- Data privacy: Your conversations and documents never leave your network. For companies handling sensitive data (legal, medical, financial, defense), this is not a nice-to-have — it is a requirement.
- No per-user scaling cost: Adding user 21 costs nothing. With ChatGPT Team, it costs $25/month.
- Customization: System prompts, model selection, RAG over your documents — you control everything.
- No vendor dependency: OpenAI can change pricing, features, or terms of service at any time. Your self-hosted setup is yours.
When ChatGPT Team Is the Better Choice
Being honest: ChatGPT Team is better if:
- Your team is very small (under 8 people) and the break-even period is too long
- Nobody on the team can administer a Linux server
- You need GPT-4o-class reasoning quality for most tasks (local models are close but not identical)
- You need integrated web browsing and current information
- You prioritize zero setup and maintenance
Operations: Keeping It Running
Backups
#!/bin/bash
# Backup script — run nightly via cron
BACKUP_DIR="/backup/ai-platform/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup Open WebUI data (conversations, users, settings).
# Note: Compose prefixes volume names with the project name (the directory
# name, here "ai-platform") — verify the exact name with `docker volume ls`.
docker run --rm \
    -v ai-platform_open_webui_data:/data \
    -v "$BACKUP_DIR":/backup \
    alpine tar czf /backup/open-webui-data.tar.gz -C /data .

# Backup Ollama models (optional — can be re-pulled)
# docker run --rm \
#     -v ai-platform_ollama_data:/data \
#     -v "$BACKUP_DIR":/backup \
#     alpine tar czf /backup/ollama-data.tar.gz -C /data .

echo "Backup completed: $BACKUP_DIR"
Add to cron:
# Append without clobbering any existing crontab entries
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/ai-platform/backup.sh") | crontab -
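The nightly script accumulates one snapshot per day indefinitely. A companion prune step keeps disk usage bounded; this is a sketch, with the root path and 14-day retention as assumptions to adjust:

```shell
#!/bin/bash
# prune-backups.sh — delete backup snapshots older than a retention window.
# Default path and 14-day retention are assumptions; adjust to your layout.
prune_backups() {
    root="${1:-/backup/ai-platform}"
    days="${2:-14}"
    [ -d "$root" ] || return 0   # nothing to prune
    # Each snapshot is a top-level dated directory under the backup root
    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
        -exec rm -rf {} +
}
prune_backups "$@"
```

Run it from the same cron entry after the backup, or as a separate job.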
Monitoring
Monitor these metrics:
- GPU utilization and VRAM usage: nvidia-smi or a Prometheus exporter
- Docker container health: docker compose ps
- Disk usage: models are large; monitor /var/lib/docker/volumes
- Response latency: Open WebUI logs response times
A simple health check script:
#!/bin/bash
# health-check.sh

# Check Ollama
if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
    echo "ALERT: Ollama is not responding"
    # Send alert (email, Slack webhook, etc.)
fi

# Check Open WebUI
if ! curl -sf http://localhost:3000 > /dev/null; then
    echo "ALERT: Open WebUI is not responding"
fi

# Check GPU
GPU_TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)
if [ "$GPU_TEMP" -gt 85 ]; then
    echo "ALERT: GPU temperature is ${GPU_TEMP}°C"
fi
Updates
# Update Open WebUI and Ollama (monthly recommended)
cd /opt/ai-platform
docker compose pull
docker compose up -d
# Update models
docker exec ollama ollama pull qwen3:32b
docker exec ollama ollama pull deepseek-coder-v2:16b
Scaling: When One Server Is Not Enough
If your team outgrows a single GPU:
- Add a second GPU to the same server. Ollama can split a model’s layers across multiple GPUs (layer splitting, not true tensor parallelism).
- Separate Ollama to a dedicated GPU server. Change OLLAMA_BASE_URL in Open WebUI to point to the remote Ollama instance.
- Switch to vLLM for the backend. vLLM’s continuous batching handles higher concurrency than Ollama. Open WebUI supports vLLM as a backend via the OpenAI-compatible API.
- For 50+ users, consider a proper orchestration setup with Kubernetes, multiple GPU nodes, and a load balancer.
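For option 2 above, the change on the Open WebUI side is a single environment variable; gpu-server.internal below is a placeholder hostname for your GPU box:

```yaml
services:
  open-webui:
    environment:
      - OLLAMA_BASE_URL=http://gpu-server.internal:11434   # remote Ollama
```

On the GPU server itself, bind Ollama to an interface the app server can reach (it is localhost-only in the compose file above) and firewall port 11434 so only the app server can connect.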
Security Hardening
For production team deployments:
- Firewall: Only expose ports 80 and 443 externally. Ollama (11434) and Open WebUI (3000) should be localhost-only.
- VPN: For extra security, put the AI platform behind your company VPN so it is not accessible from the public internet.
- Updates: Keep the host OS, Docker, and NVIDIA drivers updated for security patches.
- Audit logs: Open WebUI logs user activity. Review periodically.
- Strong passwords: Enforce password complexity in Open WebUI admin settings.
- Backup encryption: Encrypt backups if they contain sensitive conversation data.
The Launch Email
When you are ready to launch, here is a template for introducing the platform to your team:
Subject: New AI Assistant Available — Private & Unlimited
Team,
We have deployed a self-hosted AI assistant at https://ai.yourcompany.com. It is similar to ChatGPT but runs entirely on our own servers — your conversations never leave our network.
Getting started: Log in with the credentials sent to your email. Change your password on first login.
Available assistants: General Assistant (everyday tasks), Code Helper (programming), Quick Answer (fast lookups).
Key differences from ChatGPT: Unlimited usage. Complete privacy. Works with company documents. No per-query cost.
Limitations to know about: The models are very capable but not identical to GPT-4o. For the hardest reasoning tasks, you may notice a quality difference. Report any issues to [admin].
Welcome aboard.
Running this setup? We would love to hear about your deployment. Share your experience in our community.