Self-Hosted ChatGPT for Your Team: Open WebUI + Ollama Deployment Guide

Deploy a multi-user ChatGPT alternative for your team using Open WebUI and Ollama. Complete guide covering Docker, HTTPS, authentication, model management, and cost analysis vs ChatGPT Team.

ChatGPT Team costs $25 per user per month. For a team of 20, that is $6,000 per year — and your conversations, your company data, your proprietary code, all of it flows through OpenAI’s servers.

There is another way. Open WebUI + Ollama gives you a ChatGPT-like experience that runs entirely on your own hardware. Multi-user support, conversation history, model switching, file uploads, RAG, admin controls — all the features your team expects, with none of the data leaving your network.

This guide walks you through a production-ready deployment. Not a toy demo. A real setup with HTTPS, authentication, persistent storage, model management, and the operational considerations that matter when real people depend on it every day.

Architecture Overview

[Users' Browsers] --> [Nginx Reverse Proxy (HTTPS)]
                          |
                    [Open WebUI (Docker)]
                          |
                    [Ollama (Docker or Host)]
                          |
                    [GPU Server Hardware]
  • Ollama runs the language models on GPU hardware
  • Open WebUI provides the web interface, user management, and conversation storage
  • Nginx handles HTTPS termination and proxying
  • Everything runs on a single server (you can split Ollama to a separate GPU server for larger deployments)

Hardware Requirements

Your hardware needs depend on team size and expected usage:

Team Size | Concurrent Users | Recommended GPU               | RAM    | Storage
5-10      | 2-3              | RTX 3090 (24GB)               | 32GB   | 500GB SSD
10-25     | 5-8              | RTX 4090 (24GB)               | 64GB   | 1TB SSD
25-50     | 10-15            | 2x RTX 4090 or A6000          | 128GB  | 1TB NVMe
50+       | 15+              | Consider vLLM + multiple GPUs | 256GB+ | 2TB NVMe

For this guide, we use a single server with an RTX 3090 or 4090, suitable for teams of 5-25 people.
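As a rough sizing check, you can estimate the VRAM a model's weights will occupy. Assuming the Q4_K_M quantization Ollama commonly ships (roughly 4.85 bits per weight — an assumption; some models use other quants), weights take about params × 4.85/8 GB, plus a few GB for KV cache:

```shell
#!/bin/bash
# Back-of-envelope VRAM estimate for model weights.
# Assumption: Q4_K_M quantization at roughly 4.85 bits per weight.
PARAMS_B=32                              # model size in billions of parameters
WEIGHTS_GB10=$(( PARAMS_B * 485 / 80 ))  # tenths of a GB: 32B -> 194 -> 19.4 GB
echo "~$(( WEIGHTS_GB10 / 10 )).$(( WEIGHTS_GB10 % 10 )) GB for weights"
# Add 2-4 GB for KV cache and overhead: a 32B model just fits in 24GB of VRAM.
```

This is why the table tops out at 32B-class models for single 24GB cards.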

Estimated hardware cost: $2,500-5,000 for a capable server (or repurpose an existing workstation with a good GPU).

Step 1: Server Setup

We will use Ubuntu 24.04 LTS. Start with a fresh install or an existing server.

Install Docker

# Install Docker Engine
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for the group change to take effect

# Install Docker Compose plugin
sudo apt install docker-compose-plugin

# Verify
docker --version
docker compose version

Install NVIDIA Container Toolkit

This lets Docker containers access the GPU:

# Add NVIDIA repo
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update
sudo apt install -y nvidia-container-toolkit

# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi

Step 2: Docker Compose Configuration

Create a project directory:

sudo mkdir -p /opt/ai-platform
sudo chown "$USER" /opt/ai-platform
cd /opt/ai-platform

Create docker-compose.yml:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"  # Only expose to localhost
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2    # Keep 2 models in VRAM
      - OLLAMA_NUM_PARALLEL=4          # Handle 4 concurrent requests
      - OLLAMA_MAX_QUEUE=20            # Queue up to 20 requests

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:8080"  # Only expose to localhost
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
      - WEBUI_NAME=Team AI Assistant
      - ENABLE_SIGNUP=false             # Admin creates accounts
      - DEFAULT_USER_ROLE=user
      - ENABLE_RAG_WEB_SEARCH=false     # Enable if you want web search
      - RAG_EMBEDDING_ENGINE=ollama     # Use Ollama to generate embeddings
      - RAG_EMBEDDING_MODEL=nomic-embed-text
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:

Key Configuration Decisions

OLLAMA_NUM_PARALLEL=4 — Ollama can handle multiple requests to the same model concurrently. Set this based on your GPU’s VRAM headroom. Each parallel request needs additional KV cache memory. With a 24GB GPU running a 32B model, 4 parallel requests is a safe maximum.

OLLAMA_MAX_LOADED_MODELS=2 — Keeps up to 2 models loaded in VRAM simultaneously. If your team uses different models for different tasks (e.g., a coding model and a general model), this avoids reload delays. Reduce to 1 if VRAM is tight.
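You can check what is actually resident in VRAM at runtime with `ollama ps`, which lists the loaded models, their memory footprint, and the GPU/CPU split — useful for tuning the two settings above:

```shell
# Show models currently loaded, their size, and how much sits on GPU vs CPU
docker exec ollama ollama ps
```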

ENABLE_SIGNUP=false — In a team setting, you want the admin to create accounts, not allow self-registration. The first user to sign up becomes the admin.

Step 3: HTTPS with Nginx and Let’s Encrypt

Install Nginx and Certbot:

sudo apt install nginx certbot python3-certbot-nginx

Create the Nginx site configuration:

# /etc/nginx/sites-available/ai-platform
server {
    listen 80;
    server_name ai.yourcompany.com;  # Replace with your domain

    # Redirect HTTP to HTTPS
    location / {
        return 301 https://$server_name$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name ai.yourcompany.com;  # Replace with your domain

    # SSL will be configured by Certbot

    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Proxy to Open WebUI
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (needed for streaming responses)
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Increase timeouts for long-running AI responses
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_connect_timeout 60s;

        # Increase max body size for file uploads
        client_max_body_size 50M;
    }
}

Enable the site and get an SSL certificate:

sudo ln -s /etc/nginx/sites-available/ai-platform /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Get SSL certificate (your domain must point to this server)
sudo certbot --nginx -d ai.yourcompany.com
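The Ubuntu certbot package installs a systemd timer that renews certificates automatically. It is worth confirming renewal works before you forget about it:

```shell
# Simulate a renewal without touching the real certificate
sudo certbot renew --dry-run

# Confirm the renewal timer is active
systemctl list-timers | grep certbot
```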

Step 4: Launch and Initialize

cd /opt/ai-platform

# Start the stack
docker compose up -d

# Watch logs
docker compose logs -f

# Wait for Ollama to be ready (check for "Listening on ...")
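If you script the launch, a small polling helper avoids racing the containers. This is a sketch — `wait_for_url` is a name introduced here; Ollama's /api/tags endpoint answers once the server is up:

```shell
#!/bin/bash
# Poll a URL until it responds, or give up after a number of attempts.
wait_for_url() {
  local url=$1 attempts=${2:-30}
  local i
  for (( i = 0; i < attempts; i++ )); do
    if curl -sf "$url" > /dev/null 2>&1; then
      return 0
    fi
    sleep 2
  done
  return 1
}

# Usage: wait_for_url http://localhost:11434/api/tags && echo "Ollama is ready"
```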

Pull Models

# Pull models your team will use
docker exec ollama ollama pull qwen3:32b       # General purpose
docker exec ollama ollama pull deepseek-coder-v2:16b  # Coding
docker exec ollama ollama pull phi4:14b        # Fast responses
docker exec ollama ollama pull nomic-embed-text # Embeddings for RAG
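After the pulls finish, confirm the models are on disk and smoke-test one with a single non-streaming request against Ollama's generate API (the model name matches the pulls above):

```shell
# List downloaded models and their sizes
docker exec ollama ollama list

# One-shot generation request; the response is a single JSON object
curl -s http://localhost:11434/api/generate \
  -d '{"model": "phi4:14b", "prompt": "Reply with the word ready.", "stream": false}'
```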

Create the Admin Account

  1. Open https://ai.yourcompany.com in your browser
  2. The first user to register becomes the admin
  3. Register with your admin email and a strong password
  4. Go to Admin Panel > Settings and configure:
    • Default model for new users
    • Available models (hide models you do not want users to access)
    • Rate limiting (if needed)

Create User Accounts

Since we disabled self-registration:

  1. Go to Admin Panel > Users
  2. Click “Add User”
  3. Enter the user’s name, email, and a temporary password
  4. The user can change their password on first login

Step 5: Model Configuration for Teams

Open WebUI lets you create model presets with custom system prompts. Set up presets for common team use cases:

General Assistant

Name: General Assistant
Model: qwen3:32b
System Prompt: You are a helpful assistant for the [Company Name] team. 
Be professional, concise, and accurate. When you are unsure about something, 
say so rather than guessing.

Code Assistant

Name: Code Helper
Model: deepseek-coder-v2:16b
System Prompt: You are a senior software engineer helping the development team. 
Write clean, well-documented code. Follow our team's conventions: 
[add your coding standards here]. Always explain your reasoning.

Quick Q&A

Name: Quick Answer
Model: phi4:14b
System Prompt: You are a fast, concise assistant. Give brief, direct answers. 
Only elaborate if the user asks for more detail.

Step 6: RAG Setup for Company Documents

Open WebUI has built-in RAG support. This lets your team query company documents:

  1. Go to Admin Panel > Documents
  2. Upload your company documentation (PDFs, text files, markdown)
  3. Open WebUI will automatically chunk and embed the documents
  4. Users can reference documents in their conversations using the # tag

For better RAG performance, organize documents into collections:

  • Engineering Docs — Architecture docs, runbooks, API specs
  • Product Docs — Feature specs, user guides, release notes
  • HR & Policies — Company handbook, policies, benefits info

Users can then scope their queries to specific collections for more relevant results.

Cost Analysis: Self-Hosted vs ChatGPT Team

Let us compare the actual costs over 1 and 3 years for a team of 20:

ChatGPT Team

Item                          | Monthly | Year 1 | Year 3
Subscription (20 users x $25) | $500    | $6,000 | $18,000
Total                         | $500    | $6,000 | $18,000

Self-Hosted (Open WebUI + Ollama)

Item                               | One-Time | Monthly | Year 1 | Year 3
Server hardware (RTX 4090 build)   | $4,500   | -       | $4,500 | $4,500
Electricity (~500W avg, $0.14/kWh) | -        | $50     | $600   | $1,800
Admin time (2 hrs/month at $75/hr) | -        | $150    | $1,800 | $5,400
Total                              | $4,500   | $200    | $6,900 | $11,700

Break-even point: About 15 months ($4,500 upfront divided by $300/month in savings).

After the hardware is paid off, the self-hosted option costs $200/month vs $500/month for ChatGPT Team. Over 3 years, you save about $6,300 — and that hardware has residual value.
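The break-even figure falls out of the monthly delta. A quick sanity check, using the totals from the tables above:

```shell
#!/bin/bash
# Break-even: months until cumulative ChatGPT Team spend exceeds
# hardware cost plus cumulative self-hosted running costs.
HARDWARE=4500          # one-time server cost ($)
SELF_MONTHLY=200       # electricity + admin time ($/month)
CHATGPT_MONTHLY=500    # 20 users x $25 ($/month)

SAVINGS=$(( CHATGPT_MONTHLY - SELF_MONTHLY ))         # $300/month
BREAK_EVEN=$(( (HARDWARE + SAVINGS - 1) / SAVINGS ))  # round up
echo "Break-even after ${BREAK_EVEN} months"
```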

What the Numbers Do Not Capture

  • Data privacy: Your conversations and documents never leave your network. For companies handling sensitive data (legal, medical, financial, defense), this is not a nice-to-have — it is a requirement.
  • No per-user scaling cost: Adding user 21 costs nothing. With ChatGPT Team, it costs $25/month.
  • Customization: System prompts, model selection, RAG over your documents — you control everything.
  • No vendor dependency: OpenAI can change pricing, features, or terms of service at any time. Your self-hosted setup is yours.

When ChatGPT Team Is the Better Choice

Being honest: ChatGPT Team is better if:

  • Your team is very small (under 8 people) and the break-even period is too long
  • Nobody on the team can administer a Linux server
  • You need GPT-4o-class reasoning quality for most tasks (local models are close but not identical)
  • You need integrated web browsing and current information
  • You prioritize zero setup and maintenance

Operations: Keeping It Running

Backups

#!/bin/bash
# Backup script — run nightly via cron
BACKUP_DIR="/backup/ai-platform/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup Open WebUI data (conversations, users, settings)
docker run --rm \
  -v open_webui_data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/open-webui-data.tar.gz -C /data .

# Backup Ollama models (optional — can be re-pulled)
# docker run --rm \
#   -v ollama_data:/data \
#   -v "$BACKUP_DIR":/backup \
#   alpine tar czf /backup/ollama-data.tar.gz -C /data .

echo "Backup completed: $BACKUP_DIR"

Add to cron:

echo "0 2 * * * /opt/ai-platform/backup.sh" | crontab -

Monitoring

Monitor these metrics:

  • GPU utilization and VRAM usage: nvidia-smi or a Prometheus exporter
  • Docker container health: docker compose ps
  • Disk usage: Models are large; monitor /var/lib/docker/volumes
  • Response latency: Open WebUI logs response times
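For a quick terminal view without setting up a Prometheus exporter, nvidia-smi can poll the key GPU metrics on an interval:

```shell
# Print utilization, VRAM usage, and temperature every 5 seconds
nvidia-smi \
  --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu \
  --format=csv -l 5
```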

A simple health check script:

#!/bin/bash
# health-check.sh

# Check Ollama
if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "ALERT: Ollama is not responding"
  # Send alert (email, Slack webhook, etc.)
fi

# Check Open WebUI
if ! curl -sf http://localhost:3000 > /dev/null; then
  echo "ALERT: Open WebUI is not responding"
fi

# Check GPU
GPU_TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader | head -n1)  # first GPU
if [ "$GPU_TEMP" -gt 85 ]; then
  echo "ALERT: GPU temperature is ${GPU_TEMP}°C"
fi
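The echo statements above only help if someone is watching. One way to wire them to a real alert is a Slack incoming webhook — `send_alert` and `SLACK_WEBHOOK_URL` are names introduced here, a sketch rather than anything built into Open WebUI:

```shell
#!/bin/bash
# Post an alert message to a Slack incoming webhook.
# Assumption: SLACK_WEBHOOK_URL holds a webhook URL created in your workspace.
send_alert() {
  local message=$1
  curl -sf -X POST -H 'Content-type: application/json' \
    --data "{\"text\": \"${message}\"}" \
    "$SLACK_WEBHOOK_URL" > /dev/null
}

# Usage inside health-check.sh:
# send_alert "ALERT: Ollama is not responding on $(hostname)"
```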

Updates

# Update Open WebUI and Ollama (monthly recommended)
cd /opt/ai-platform
docker compose pull
docker compose up -d

# Update models
docker exec ollama ollama pull qwen3:32b
docker exec ollama ollama pull deepseek-coder-v2:16b

Scaling: When One Server Is Not Enough

If your team outgrows a single GPU:

  1. Add a second GPU to the same server. Ollama can split a model's layers across multiple GPUs.
  2. Separate Ollama to a dedicated GPU server. Change OLLAMA_BASE_URL in Open WebUI to point to the remote Ollama instance.
  3. Switch to vLLM for the backend. vLLM’s continuous batching handles higher concurrency than Ollama. Open WebUI supports vLLM as a backend via the OpenAI-compatible API.
  4. For 50+ users, consider a proper orchestration setup with Kubernetes, multiple GPU nodes, and a load balancer.
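Option 2 above is mostly a one-line change on the Open WebUI side. A sketch of the relevant compose fragment, with 10.0.0.50 standing in for your GPU server's address:

```yaml
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Point at the dedicated GPU server instead of the local container
      - OLLAMA_BASE_URL=http://10.0.0.50:11434
    # The local ollama service and its depends_on entry can then be removed
```

On the GPU server, Ollama must listen on a network-reachable interface (publish the port without the 127.0.0.1 prefix) and should be firewalled so only the web server can reach it.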

Security Hardening

For production team deployments:

  1. Firewall: Only expose ports 80 and 443 externally. Ollama (11434) and Open WebUI (3000) should be localhost-only.
  2. VPN: For extra security, put the AI platform behind your company VPN so it is not accessible from the public internet.
  3. Updates: Keep the host OS, Docker, and NVIDIA drivers updated for security patches.
  4. Audit logs: Open WebUI logs user activity. Review periodically.
  5. Strong passwords: Enforce password complexity in Open WebUI admin settings.
  6. Backup encryption: Encrypt backups if they contain sensitive conversation data.
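With ufw, Ubuntu's default firewall frontend, item 1 looks like this (assuming SSH access is needed; adjust to your environment):

```shell
# Allow SSH first so you do not lock yourself out
sudo ufw allow OpenSSH

# Allow web traffic only
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Deny everything else inbound and enable
sudo ufw default deny incoming
sudo ufw enable

sudo ufw status verbose
```

Note that Docker publishes ports by writing iptables rules directly, which can bypass ufw; binding Ollama and Open WebUI to 127.0.0.1 as in the compose file above sidesteps that gotcha.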

The Launch Email

When you are ready to launch, here is a template for introducing the platform to your team:

Subject: New AI Assistant Available — Private & Unlimited

Team,

We have deployed a self-hosted AI assistant at https://ai.yourcompany.com. It is similar to ChatGPT but runs entirely on our own servers — your conversations never leave our network.

Getting started: Log in with the credentials sent to your email. Change your password on first login.

Available assistants: General Assistant (everyday tasks), Code Helper (programming), Quick Answer (fast lookups).

Key differences from ChatGPT: Unlimited usage. Complete privacy. Works with company documents. No per-query cost.

Limitations to know about: The models are very capable but not identical to GPT-4o. For the hardest reasoning tasks, you may notice a quality difference. Report any issues to [admin].

Welcome aboard.

Running this setup? We would love to hear about your deployment. Share your experience in our community.