ChatGPT Team costs $25 per user per month. For a team of 20, that is $6,000 per year — and your conversations, your company data, your proprietary code, all of it flows through OpenAI’s servers.
There is another way. Open WebUI + Ollama gives you a ChatGPT-like experience that runs entirely on your own hardware. Multi-user support, conversation history, model switching, file uploads, RAG, admin controls — all the features your team expects, with none of the data leaving your network.
This guide walks you through a production-ready deployment. Not a toy demo. A real setup with HTTPS, authentication, persistent storage, model management, and the operational considerations that matter when real people depend on it every day.
Architecture Overview
[Users' Browsers] --> [Nginx Reverse Proxy (HTTPS)]
                                |
                      [Open WebUI (Docker)]
                                |
                      [Ollama (Docker or Host)]
                                |
                      [GPU Server Hardware]
- Ollama runs the language models on GPU hardware
- Open WebUI provides the web interface, user management, and conversation storage
- Nginx handles HTTPS termination and proxying
- Everything runs on a single server (you can split Ollama to a separate GPU server for larger deployments)
Hardware Requirements
Your hardware needs depend on team size and expected usage:
| Team Size | Concurrent Users | Recommended GPU | RAM | Storage |
|---|---|---|---|---|
| 5-10 | 2-3 | RTX 3090 (24GB) | 32GB | 500GB SSD |
| 10-25 | 5-8 | RTX 4090 (24GB) | 64GB | 1TB SSD |
| 25-50 | 10-15 | 2x RTX 4090 or A6000 | 128GB | 1TB NVMe |
| 50+ | 15+ | Consider vLLM + multiple GPUs | 256GB+ | 2TB NVMe |
For this guide, we use a single server with an RTX 3090 or 4090, suitable for teams of 5-25 people.
Estimated hardware cost: $2,500-5,000 for a capable server (or repurpose an existing workstation with a good GPU).
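If you are sizing a GPU yourself, a rough heuristic helps: a Q4-quantized model needs roughly 0.55 bytes per parameter for weights, plus a few GB of headroom for KV cache and runtime overhead. A quick sketch (the constants are approximations, not guarantees):

```shell
# Rough VRAM estimate for a Q4-quantized model.
# Heuristic: ~0.55 bytes/param for weights + ~3 GB margin for KV cache/runtime.
params_b=32
vram_gb=$(awk -v p="$params_b" 'BEGIN { printf "%.0f", p * 0.55 + 3 }')
echo "~${vram_gb} GB VRAM for a ${params_b}B model at Q4"
```

By this estimate a 32B model lands around 21 GB, which is why the table pairs 32B-class models with 24 GB cards.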
Step 1: Server Setup
We will use Ubuntu 24.04 LTS. Start with a fresh install or an existing server.
Install Docker
# Install Docker Engine
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Install Docker Compose plugin
sudo apt install docker-compose-plugin
# Verify
docker --version
docker compose version
Install NVIDIA Container Toolkit
This lets Docker containers access the GPU:
# Add NVIDIA repo
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
Step 2: Docker Compose Configuration
Create a project directory:
sudo mkdir -p /opt/ai-platform && sudo chown "$USER" /opt/ai-platform && cd /opt/ai-platform
Create docker-compose.yml:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"   # Only expose to localhost
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2   # Keep 2 models in VRAM
      - OLLAMA_NUM_PARALLEL=4        # Handle 4 concurrent requests
      - OLLAMA_MAX_QUEUE=20          # Queue up to 20 requests

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:8080"   # Only expose to localhost
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
      - WEBUI_NAME=Team AI Assistant
      - ENABLE_SIGNUP=false            # Admin creates accounts
      - DEFAULT_USER_ROLE=user
      - ENABLE_RAG_WEB_SEARCH=false    # Enable if you want web search
      - RAG_EMBEDDING_ENGINE=ollama    # Use the Ollama-served embedding model
      - RAG_EMBEDDING_MODEL=nomic-embed-text
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
Key Configuration Decisions
OLLAMA_NUM_PARALLEL=4 — Ollama can handle multiple requests to the same model concurrently. Set this based on your GPU’s VRAM headroom. Each parallel request needs additional KV cache memory. With a 24GB GPU running a 32B model, 4 parallel requests is a safe maximum.
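Back-of-envelope arithmetic makes the VRAM tradeoff concrete. The layer and head counts below are illustrative assumptions for a 32B-class model with grouped-query attention, not the published specs of any particular model:

```shell
# KV cache per request = 2 (K and V) x layers x kv_heads x head_dim
#                        x bytes_per_value x context_length
# Assumed shape: 64 layers, 8 KV heads, head_dim 128, fp16 cache, 4096 ctx
layers=64; kv_heads=8; head_dim=128; bytes=2; ctx=4096
kv_gib=$(awk -v l="$layers" -v h="$kv_heads" -v d="$head_dim" \
             -v b="$bytes" -v c="$ctx" \
  'BEGIN { printf "%.1f", 2 * l * h * d * b * c / (1024 ^ 3) }')
echo "~${kv_gib} GiB of KV cache per request at a ${ctx}-token context"
```

On top of roughly 20 GB of Q4 weights for a 32B model, four such requests fit in 24 GB only with a modest context window, which is why raising OLLAMA_NUM_PARALLEL or the context length can push the GPU out of memory.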
OLLAMA_MAX_LOADED_MODELS=2 — Keeps up to 2 models loaded in VRAM simultaneously. If your team uses different models for different tasks (e.g., a coding model and a general model), this avoids reload delays. Reduce to 1 if VRAM is tight.
ENABLE_SIGNUP=false — In a team setting, you want the admin to create accounts, not allow self-registration. Open WebUI still lets the very first account be created through the initial setup screen, and that account becomes the admin; only subsequent self-signups are blocked.
Step 3: HTTPS with Nginx and Let’s Encrypt
Install Nginx and Certbot:
sudo apt install nginx certbot python3-certbot-nginx
Create the Nginx site configuration:
# /etc/nginx/sites-available/ai-platform
server {
    listen 80;
    server_name ai.yourcompany.com;   # Replace with your domain

    # Redirect HTTP to HTTPS
    location / {
        return 301 https://$server_name$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name ai.yourcompany.com;   # Replace with your domain

    # SSL will be configured by Certbot

    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Proxy to Open WebUI
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (needed for streaming responses)
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Increase timeouts for long-running AI responses
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_connect_timeout 60s;

        # Increase max body size for file uploads
        client_max_body_size 50M;
    }
}
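One refinement worth knowing: the configuration above sends Connection: upgrade on every proxied request, which disables HTTP keep-alive for ordinary calls. The standard nginx pattern conditionalizes it with a map block in the http context; this is optional, and the config above works as written:

```nginx
# Optional: add to the http {} context (e.g. /etc/nginx/nginx.conf) so the
# Upgrade header is forwarded only when the client actually sends one:
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
# ...then in the location block, use:
#     proxy_set_header Connection $connection_upgrade;
```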
Enable the site and get an SSL certificate:
sudo ln -s /etc/nginx/sites-available/ai-platform /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Get SSL certificate (your domain must point to this server)
sudo certbot --nginx -d ai.yourcompany.com
Step 4: Launch and Initialize
cd /opt/ai-platform
# Start the stack
docker compose up -d
# Watch logs
docker compose logs -f
# Wait for Ollama to be ready (check for "Listening on ...")
Pull Models
# Pull models your team will use
docker exec ollama ollama pull qwen3:32b # General purpose
docker exec ollama ollama pull deepseek-coder-v2:16b # Coding
docker exec ollama ollama pull phi4:14b # Fast responses
docker exec ollama ollama pull nomic-embed-text # Embeddings for RAG
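These pulls are large, so budget disk space before starting. A rough tally (the sizes below are approximate default-quantization download sizes, not exact figures; check ollama list for actuals after pulling):

```shell
# Approximate on-disk footprint of the four pulls above, in GB:
# ~20 (32B general) + ~9 (coding model) + ~9 (14B) + ~0.3 (embeddings)
awk 'BEGIN {
  total = 20 + 9 + 9 + 0.3
  printf "~%.0f GB of models on disk\n", total
}'
```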
Create the Admin Account
- Open https://ai.yourcompany.com in your browser
- The first user to register becomes the admin
- Register with your admin email and a strong password
- Go to Admin Panel > Settings and configure:
  - Default model for new users
  - Available models (hide models you do not want users to access)
  - Rate limiting (if needed)
Create User Accounts
Since we disabled self-registration:
- Go to Admin Panel > Users
- Click “Add User”
- Enter the user’s name, email, and a temporary password
- The user can change their password on first login
Step 5: Model Configuration for Teams
Open WebUI lets you create model presets with custom system prompts. Set up presets for common team use cases:
General Assistant
Name: General Assistant
Model: qwen3:32b
System Prompt: You are a helpful assistant for the [Company Name] team.
Be professional, concise, and accurate. When you are unsure about something,
say so rather than guessing.
Code Assistant
Name: Code Helper
Model: deepseek-coder-v2:16b
System Prompt: You are a senior software engineer helping the development team.
Write clean, well-documented code. Follow our team's conventions:
[add your coding standards here]. Always explain your reasoning.
Quick Q&A
Name: Quick Answer
Model: phi4:14b
System Prompt: You are a fast, concise assistant. Give brief, direct answers.
Only elaborate if the user asks for more detail.
Step 6: RAG Setup for Company Documents
Open WebUI has built-in RAG support. This lets your team query company documents:
- Go to Admin Panel > Documents
- Upload your company documentation (PDFs, text files, markdown)
- Open WebUI will automatically chunk and embed the documents
- Users can reference documents in their conversations using the # tag
For better RAG performance, organize documents into collections:
- Engineering Docs — Architecture docs, runbooks, API specs
- Product Docs — Feature specs, user guides, release notes
- HR & Policies — Company handbook, policies, benefits info
Users can then scope their queries to specific collections for more relevant results.
Cost Analysis: Self-Hosted vs ChatGPT Team
Let us compare the actual costs over 1 and 3 years for a team of 20:
ChatGPT Team
| Item | Monthly | Year 1 | Year 3 |
|---|---|---|---|
| Subscription (20 users x $25) | $500 | $6,000 | $18,000 |
| Total | $500 | $6,000 | $18,000 |
Self-Hosted (Open WebUI + Ollama)
| Item | One-Time | Monthly | Year 1 | Year 3 |
|---|---|---|---|---|
| Server hardware (RTX 4090 build) | $4,500 | — | $4,500 | $4,500 |
| Electricity (~500W avg, $0.14/kWh) | — | $50 | $600 | $1,800 |
| Admin time (2 hrs/month at $75/hr) | — | $150 | $1,800 | $5,400 |
| Total | $4,500 | $200 | $6,900 | $11,700 |
Break-even point: about 15 months ($4,500 upfront divided by $300/month in savings).
After the hardware is paid off, the self-hosted option costs $200/month vs $500/month for ChatGPT Team. Over 3 years, you save about $6,300 — and that hardware has residual value.
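The break-even figure is straightforward arithmetic, upfront hardware cost divided by the monthly savings:

```shell
# Break-even = upfront hardware / (SaaS monthly - self-hosted monthly)
hardware=4500
saas_monthly=500
self_monthly=200
months=$(( hardware / (saas_monthly - self_monthly) ))
echo "Break-even after ${months} months"
```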
What the Numbers Do Not Capture
- Data privacy: Your conversations and documents never leave your network. For companies handling sensitive data (legal, medical, financial, defense), this is not a nice-to-have — it is a requirement.
- No per-user scaling cost: Adding user 21 costs nothing. With ChatGPT Team, it costs $25/month.
- Customization: System prompts, model selection, RAG over your documents — you control everything.
- No vendor dependency: OpenAI can change pricing, features, or terms of service at any time. Your self-hosted setup is yours.
When ChatGPT Team Is the Better Choice
Being honest: ChatGPT Team is better if:
- Your team is very small (under 8 people) and the break-even period is too long
- Nobody on the team can administer a Linux server
- You need GPT-4o-class reasoning quality for most tasks (local models are close but not identical)
- You need integrated web browsing and current information
- You prioritize zero setup and maintenance
Operations: Keeping It Running
Backups
#!/bin/bash
# Backup script — run nightly via cron
BACKUP_DIR="/backup/ai-platform/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup Open WebUI data (conversations, users, settings).
# Note: Compose prefixes volume names with the project name (the directory
# name, here "ai-platform") — verify the exact name with `docker volume ls`.
docker run --rm \
    -v ai-platform_open_webui_data:/data \
    -v "$BACKUP_DIR":/backup \
    alpine tar czf /backup/open-webui-data.tar.gz -C /data .

# Backup Ollama models (optional — can be re-pulled)
# docker run --rm \
#     -v ai-platform_ollama_data:/data \
#     -v "$BACKUP_DIR":/backup \
#     alpine tar czf /backup/ollama-data.tar.gz -C /data .

echo "Backup completed: $BACKUP_DIR"
Add to cron:
# Append without clobbering any existing crontab entries
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/ai-platform/backup.sh") | crontab -
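The nightly script accumulates one snapshot per day indefinitely. A companion prune step keeps disk usage bounded; this is a sketch, with the root path and 14-day retention as assumptions to adjust:

```shell
#!/bin/bash
# prune-backups.sh — delete backup snapshots older than a retention window.
# Default path and 14-day retention are assumptions; adjust to your layout.
prune_backups() {
    root="${1:-/backup/ai-platform}"
    days="${2:-14}"
    [ -d "$root" ] || return 0   # nothing to prune
    # Each snapshot is a top-level dated directory under the backup root
    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
        -exec rm -rf {} +
}
prune_backups "$@"
```

Run it from the same cron entry after the backup, or as a separate job.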
Monitoring
Monitor these metrics:
- GPU utilization and VRAM usage: nvidia-smi or a Prometheus exporter
- Docker container health: docker compose ps
- Disk usage: models are large; monitor /var/lib/docker/volumes
- Response latency: Open WebUI logs response times
A simple health check script:
#!/bin/bash
# health-check.sh

# Check Ollama
if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
    echo "ALERT: Ollama is not responding"
    # Send alert (email, Slack webhook, etc.)
fi

# Check Open WebUI
if ! curl -sf http://localhost:3000 > /dev/null; then
    echo "ALERT: Open WebUI is not responding"
fi

# Check GPU
GPU_TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)
if [ "$GPU_TEMP" -gt 85 ]; then
    echo "ALERT: GPU temperature is ${GPU_TEMP}°C"
fi
Updates
# Update Open WebUI and Ollama (monthly recommended)
cd /opt/ai-platform
docker compose pull
docker compose up -d
# Update models
docker exec ollama ollama pull qwen3:32b
docker exec ollama ollama pull deepseek-coder-v2:16b
Scaling: When One Server Is Not Enough
If your team outgrows a single GPU:
- Add a second GPU to the same server. Ollama can split a model’s layers across multiple GPUs (layer splitting, not true tensor parallelism).
- Separate Ollama to a dedicated GPU server. Change OLLAMA_BASE_URL in Open WebUI to point to the remote Ollama instance.
- Switch to vLLM for the backend. vLLM’s continuous batching handles higher concurrency than Ollama. Open WebUI supports vLLM as a backend via the OpenAI-compatible API.
- For 50+ users, consider a proper orchestration setup with Kubernetes, multiple GPU nodes, and a load balancer.
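For option 2 above, the change on the Open WebUI side is a single environment variable; gpu-server.internal below is a placeholder hostname for your GPU box:

```yaml
services:
  open-webui:
    environment:
      - OLLAMA_BASE_URL=http://gpu-server.internal:11434   # remote Ollama
```

On the GPU server itself, bind Ollama to an interface the app server can reach (it is localhost-only in the compose file above) and firewall port 11434 so only the app server can connect.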
Security Hardening
For production team deployments:
- Firewall: Only expose ports 80 and 443 externally. Ollama (11434) and Open WebUI (3000) should be localhost-only.
- VPN: For extra security, put the AI platform behind your company VPN so it is not accessible from the public internet.
- Updates: Keep the host OS, Docker, and NVIDIA drivers updated for security patches.
- Audit logs: Open WebUI logs user activity. Review periodically.
- Strong passwords: Enforce password complexity in Open WebUI admin settings.
- Backup encryption: Encrypt backups if they contain sensitive conversation data.
The Launch Email
When you are ready to launch, here is a template for introducing the platform to your team:
Subject: New AI Assistant Available — Private & Unlimited
Team,
We have deployed a self-hosted AI assistant at https://ai.yourcompany.com. It is similar to ChatGPT but runs entirely on our own servers — your conversations never leave our network.
Getting started: Log in with the credentials sent to your email. Change your password on first login.
Available assistants: General Assistant (everyday tasks), Code Helper (programming), Quick Answer (fast lookups).
Key differences from ChatGPT: Unlimited usage. Complete privacy. Works with company documents. No per-query cost.
Limitations to know about: The models are very capable but not identical to GPT-4o. For the hardest reasoning tasks, you may notice a quality difference. Report any issues to [admin].
Welcome aboard.
Running this setup? We would love to hear about your deployment. Share your experience in our community.