Windows is the most popular desktop operating system, and it runs local LLMs well thanks to native support from Ollama, LM Studio, and other tools. This guide walks you through the complete setup process, from GPU driver installation to running your first model, with specific instructions for both NVIDIA and AMD hardware. Whether you want a simple one-click experience or a full development environment with WSL2, this guide covers every path.
Prerequisites
Before starting, verify your system meets these requirements:
- Windows 10 version 21H2 or later, or Windows 11
- 8 GB RAM minimum (16 GB recommended)
- 10 GB free disk space (more for models)
- A 64-bit processor with AVX2 support (most CPUs from 2015 onward)
Check your system:
# Open PowerShell and check Windows version
winver
# Check your CPU model (wmic does not list instruction-set flags;
# look up AVX2 support for your CPU model, or use Sysinternals Coreinfo)
wmic cpu get Name,NumberOfCores,NumberOfLogicalProcessors
Step 1: GPU Driver Setup
NVIDIA GPU Setup
NVIDIA GPUs offer the best local AI experience on Windows. You need the latest Game Ready or Studio drivers.
Install or update drivers:
- Open GeForce Experience or go to nvidia.com/drivers
- Download the latest driver for your GPU
- Install with “Express” settings
- Restart your computer
Verify the installation:
# Open PowerShell or Command Prompt
nvidia-smi
You should see output showing your GPU name, driver version, and CUDA version. If nvidia-smi is not recognized, the drivers didn’t install correctly.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 560.xx Driver Version: 560.xx CUDA Version: 12.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name | VRAM Usage | GPU Utilization |
|-------------------------------+----------------------+----------------------+
| 0 RTX 4090 | 0MiB/24GB | 0% |
+-------------------------------+----------------------+----------------------+
CUDA Toolkit (optional, needed for building from source):
Download from developer.nvidia.com/cuda-downloads. For Ollama and LM Studio, you do NOT need the CUDA Toolkit — the drivers alone are sufficient.
AMD GPU Setup
AMD GPU support on Windows uses ROCm (for Ollama) or Vulkan (for LM Studio and llama.cpp).
- Download the latest AMD Adrenalin drivers from amd.com/drivers
- Install with default settings
- Restart your computer
Supported GPUs: RX 6600 and newer (RDNA 2+) for best compatibility. Older GCN GPUs may work with Vulkan but performance will be limited.
Intel Arc GPU Setup
Intel Arc support is experimental. Install the latest Intel drivers and try LM Studio or llama.cpp with Vulkan backend.
Step 2: Install Ollama (Recommended)
Ollama is the fastest path to running a local LLM on Windows.
Installation
- Download the installer from ollama.com/download
- Run the .exe installer
- Follow the prompts (default settings are fine)
- Ollama starts automatically and runs in the system tray
Verify Installation
Open PowerShell or Command Prompt:
ollama --version
Run Your First Model
# Download and run Llama 3.1 8B
ollama run llama3.1:8b
The first run downloads the model (~4.7 GB). Subsequent runs start instantly.
# Try different models
ollama run phi3:mini # Small and fast (2.3 GB)
ollama run qwen2.5:7b # Strong general purpose
ollama run qwen2.5-coder:7b # Best for coding
ollama run llama3.1:70b # Large model (needs 48+ GB RAM/VRAM)
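To judge which of these models will fit on your hardware, a rough rule of thumb helps: a Q4-quantized model needs about 0.5-0.6 bytes per parameter for its weights, plus a flat allowance for the KV cache and runtime buffers. The helper below is a hypothetical back-of-the-envelope sketch, not an exact figure — real usage varies with context length and quantization.

```python
def estimate_vram_gb(params_billions: float, bytes_per_param: float = 0.6,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weights plus a flat overhead
    for the KV cache and runtime buffers."""
    return params_billions * bytes_per_param + overhead_gb

# Q4 quantization stores weights in roughly 0.5-0.6 bytes per parameter
print(round(estimate_vram_gb(8), 1))    # 8B model at Q4 -> about 6.3 GB
print(round(estimate_vram_gb(70), 1))   # 70B model at Q4 -> about 43.5 GB
```

This lines up with the note above: an 8B model fits comfortably in 8-12 GB of VRAM, while a 70B model needs 48+ GB of combined RAM/VRAM.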
Verify GPU Is Being Used
While a model is running, check GPU utilization:
# In a new PowerShell window
nvidia-smi
# Or open Task Manager > Performance > GPU
# Look for "3D" or "Compute" usage
If the GPU shows 0% utilization while the model is generating text, Ollama is running on CPU. This usually means a driver issue.
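If you want to monitor utilization from a script rather than eyeballing Task Manager, nvidia-smi's machine-readable query flags make this easy. A small sketch, assuming an NVIDIA GPU and nvidia-smi on the PATH:

```python
import subprocess

def parse_gpu_line(line: str) -> tuple[int, int]:
    """Parse one CSV line of 'utilization.gpu, memory.used' into ints."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)

def gpu_utilization() -> list[tuple[int, int]]:
    """Query utilization (%) and memory used (MiB) for each GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.strip().splitlines()]

# A busy GPU line such as "87, 9216" parses to (87, 9216);
# near-zero utilization during generation suggests CPU fallback.
```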
Ollama Configuration on Windows
Ollama stores models in C:\Users\<username>\.ollama\models by default. To change the storage location:
# Set environment variable (persistent)
[System.Environment]::SetEnvironmentVariable("OLLAMA_MODELS", "D:\ollama\models", "User")
# Restart Ollama from system tray
Other useful environment variables:
# Change API listen address (default: localhost only)
[System.Environment]::SetEnvironmentVariable("OLLAMA_HOST", "0.0.0.0:11434", "User")
# Keep a model loaded in memory after the last request (default: 5 minutes)
[System.Environment]::SetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "30m", "User")
# Limit models loaded simultaneously
[System.Environment]::SetEnvironmentVariable("OLLAMA_MAX_LOADED_MODELS", "2", "User")
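Scripts that talk to the Ollama API can honor the same OLLAMA_HOST variable so they follow the server wherever you point it. A minimal sketch, assuming the server's default of port 11434 on localhost when the variable is unset:

```python
import os

def ollama_base_url() -> str:
    """Resolve the Ollama API base URL from OLLAMA_HOST, falling back
    to the server's default of localhost:11434 when it is unset."""
    host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
    if "://" not in host:
        host = "http://" + host
    return host

print(ollama_base_url())
```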
Step 3: Install LM Studio (Alternative)
LM Studio provides a graphical desktop experience with an integrated model browser.
Installation
- Download from lmstudio.ai
- Run the installer
- Launch LM Studio
Using LM Studio
Browse and download models:
- Click the Search icon in the left sidebar
- Search for a model (e.g., “llama 3.1 8b”)
- Select a quantization level (Q4_K_M recommended)
- Click Download
Chat with a model:
- Click the Chat icon
- Select your downloaded model from the dropdown
- Start typing
Run as a local server:
- Click the Server icon
- Load a model
- Toggle the server on
- The API is available at http://localhost:1234/v1
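Because LM Studio's server speaks the OpenAI chat-completions format, any OpenAI-style client can talk to it. A minimal stdlib sketch — the model name "local-model" is a placeholder for whatever model you have loaded:

```python
import json
import urllib.request

def chat_request(model: str, prompt: str,
                 base: str = "http://localhost:1234/v1") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        base + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})

# With the server toggled on and a model loaded:
# resp = json.load(urllib.request.urlopen(chat_request("local-model", "Hello")))
# print(resp["choices"][0]["message"]["content"])
```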
LM Studio is particularly good for browsing Hugging Face models and trying different quantization levels without command-line work.
Step 4: WSL2 Setup (Advanced)
Windows Subsystem for Linux 2 gives you a full Linux environment inside Windows. This is useful for:
- Running vLLM (which doesn’t support native Windows)
- Using Docker with GPU passthrough
- Running Linux-only tools
- Building llama.cpp from source with full optimization
Install WSL2
# Open PowerShell as Administrator
wsl --install
# This installs Ubuntu by default. Restart when prompted.
# After restart, set up your Linux username and password.
Enable GPU in WSL2
NVIDIA GPU passthrough works automatically in WSL2 if you have the correct Windows drivers installed. No separate Linux driver is needed.
# Inside WSL2, verify GPU access
nvidia-smi
If nvidia-smi works, your GPU is available inside WSL2.
Install Ollama in WSL2
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama (it may need to be started manually in WSL)
ollama serve &
# Run a model
ollama run llama3.1:8b
Docker with GPU Support in WSL2
# Install Docker in WSL2
curl -fsSL https://get.docker.com | sh
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Test GPU in Docker
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi
# Run Ollama in Docker with GPU
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Step 5: Add a Web Interface
For a ChatGPT-like experience, install Open WebUI.
Using Docker Desktop
Install Docker Desktop for Windows first, then:
docker run -d -p 3000:8080 ^
--add-host=host.docker.internal:host-gateway ^
-v open-webui:/app/backend/data ^
--name open-webui ^
ghcr.io/open-webui/open-webui:main
Open http://localhost:3000 in your browser.
Without Docker
If you don’t want Docker, Open WebUI can be installed via pip (it requires Python 3.11):
pip install open-webui
open-webui serve
Performance Optimization
NVIDIA GPU Optimization
# Set GPU to maximum performance mode
# Open NVIDIA Control Panel > Manage 3D Settings
# Set "Power management mode" to "Prefer maximum performance"
# Or via command line (requires admin):
nvidia-smi -pm 1
nvidia-smi -pl 350 # Set power limit (adjust for your GPU)
Windows-Specific Tweaks
Disable memory compression (can interfere with model loading):
# PowerShell as Administrator
Disable-MMAgent -MemoryCompression
Increase virtual memory (for models larger than available RAM):
- Open System Properties > Advanced > Performance Settings
- Advanced tab > Virtual Memory > Change
- Set custom size: Initial = RAM size, Maximum = 2x RAM size
Exclude Ollama from antivirus scanning:
- Open Windows Security > Virus & threat protection
- Manage settings > Exclusions > Add exclusion
- Add folder: C:\Users\<username>\.ollama
- Add process: ollama.exe
Context Length and Memory
Longer conversations use more memory. If you run out of VRAM mid-conversation:
# Reduce context length in Ollama
# Create a Modelfile
echo "FROM llama3.1:8b" > Modelfile
echo "PARAMETER num_ctx 2048" >> Modelfile
ollama create llama3.1-short -f Modelfile
ollama run llama3.1-short
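The memory saved by shrinking num_ctx is easy to estimate: the KV cache stores keys and values for every layer, token, and KV head. A sketch using Llama 3.1 8B's approximate architecture values (32 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache) — these are assumptions, and other models differ:

```python
def kv_cache_mib(num_ctx: int, layers: int = 32, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """KV-cache size in MiB: keys + values (factor of 2) for every
    layer, context token, and KV head at the given element width."""
    return 2 * layers * num_ctx * kv_heads * head_dim * bytes_per_value / (1024 ** 2)

print(kv_cache_mib(2048))   # reduced context -> 256 MiB
print(kv_cache_mib(8192))   # larger context  -> 1024 MiB
```

Dropping from an 8K to a 2K context frees roughly three-quarters of the cache, which is often enough to keep a model fully on the GPU.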
Troubleshooting
"CUDA not available" or Model Running on CPU
# Check NVIDIA driver
nvidia-smi
# If nvidia-smi fails:
# 1. Reinstall NVIDIA drivers from nvidia.com/drivers
# 2. Make sure to select "Clean install" option
# 3. Restart computer
# Check Ollama GPU detection
ollama run llama3.1:8b --verbose
# Look for "using GPU" in the output
Ollama Won’t Start
# Check if Ollama is already running
tasklist | findstr ollama
# Kill existing instance
taskkill /f /im ollama.exe
# Restart
ollama serve
# Check logs
type %LOCALAPPDATA%\Ollama\server.log
Model Download Fails
# Check disk space
wmic logicaldisk get size,freespace,caption
# If space is low, move models to another drive
[System.Environment]::SetEnvironmentVariable("OLLAMA_MODELS", "D:\ollama\models", "User")
# Retry download
ollama pull llama3.1:8b
Out of Memory Errors
# Check how much VRAM is available
nvidia-smi
# Try a smaller model
ollama run phi3:mini
# Or use a lower quantization
ollama run llama3.1:8b-instruct-q3_K_M
# Close other GPU applications (games, video editors, Chrome with hardware acceleration)
WSL2 GPU Not Detected
# In Windows PowerShell (admin)
wsl --update
# Ensure Windows NVIDIA driver is up to date (not a Linux driver!)
# The WSL2 kernel uses the Windows driver through a shim
# Restart WSL
wsl --shutdown
wsl
Recommended Workflow
Here’s a recommended setup for different Windows users:
Casual User
Install LM Studio. Browse models in the GUI, download what interests you, chat directly. No command line needed.
Developer
Install Ollama natively. Use it from PowerShell and integrate with your IDE via Continue or similar extensions. Use the API for scripting.
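As an example of the scripting workflow, Ollama's /api/generate endpoint takes a JSON body and returns the completion. A minimal stdlib sketch, assuming Ollama is running on its default port:

```python
import json
import urllib.request

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    """POST a prompt to a locally running Ollama server, return the reply."""
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(host + "/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# With Ollama running:
# print(generate("llama3.1:8b", "Summarize AVX2 in one sentence."))
```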
Power User
Install both Ollama (for daily use) and WSL2 (for advanced tools). Run Open WebUI in Docker Desktop for a web interface. Use WSL2 for vLLM, fine-tuning, or other Linux-specific tools.
Team Lead
Set up Ollama + Open WebUI in Docker Desktop. Configure Open WebUI with user accounts for your team. See our Open WebUI guide for multi-user setup details.
Next Steps
Your Windows machine is now running local AI. Here’s where to go next:
- Choose the right model: Model selection guide
- Add a web interface: Open WebUI + Ollama setup
- Set up code assistance: Local AI Code Assistant guide
- Build a RAG chatbot: Local RAG Chatbot tutorial
- Understand the full stack: The Local AI Stack