Your First Local AI in 5 Minutes: Ollama Quickstart

Running an AI model on your own machine takes less than 5 minutes with Ollama. No cloud account, no API key, no GPU required. Here’s how.

Step 1: Install Ollama

macOS / Linux — Open your terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

Windows — Download the installer from ollama.com/download and run it.

Docker — Pull and run the official image:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
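With the container running, you can open a chat inside it. This assumes you kept the container name ollama from the command above:

```shell
# Start an interactive chat with a model inside the running container
docker exec -it ollama ollama run llama3.2
```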

Step 2: Run Your First Model

Once installed, pull and run a model in one command:

ollama run llama3.2

That’s it. You’re now chatting with Llama 3.2 running entirely on your machine. Type a message and press Enter; type /bye to exit.

For a smaller, faster model (great for older hardware):

ollama run phi3:mini

For a larger, more capable model (a 70B model needs roughly 40GB+ of RAM):

ollama run llama3.1:70b
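A few housekeeping commands are handy as you try out different models:

```shell
ollama list            # show downloaded models and their sizes on disk
ollama pull phi3:mini  # download a model without starting a chat
ollama rm phi3:mini    # delete a model to free disk space
```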

Step 3: Try Some Prompts

Here are some things to try:

>>> Explain quantum computing to a 10-year-old
>>> Write a Python function to find prime numbers
>>> Summarize the key differences between REST and GraphQL
>>> Write a haiku about programming

What’s Happening Under the Hood

When you run ollama run llama3.2:

  1. Download — Ollama downloads the model weights (~2GB for the default 3B model at 4-bit quantization) from the Ollama library
  2. Load — The model loads into RAM (or VRAM if you have a GPU)
  3. Inference — Your prompts are processed locally. No data leaves your machine
  4. Response — Tokens are generated one at a time and streamed to your terminal
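The memory needed for the load step can be estimated with quick arithmetic: parameters times bits per weight, divided by 8 to get bytes. A back-of-envelope sketch for a 3B model at 4-bit quantization (real usage adds KV-cache and runtime overhead on top):

```shell
# Rough weight-memory estimate: parameters x bits-per-weight / 8 = bytes
params_billions=3   # e.g. a 3B-parameter model
bits_per_weight=4   # Q4 quantization
awk -v p="$params_billions" -v b="$bits_per_weight" \
  'BEGIN { printf "~%.1f GB for weights\n", p * 1e9 * b / 8 / 1e9 }'
# prints: ~1.5 GB for weights
```

This is why a quantized 3B model fits comfortably on an 8GB machine, while unquantized or 70B-class models do not.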

Using Ollama as an API Server

Ollama automatically runs an API server on http://localhost:11434. You can use it from any programming language:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is local AI?",
  "stream": false
}'

Ollama also exposes an OpenAI-compatible endpoint under the /v1 path (the example above uses the native API), so tools like Open WebUI, LangChain, and Continue work out of the box.
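For OpenAI-style clients, the same local server answers on the /v1 path. A sketch of a chat-completions request against it; the json.tool check just confirms the payload is well-formed JSON before sending:

```shell
payload='{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What is local AI?"}]
}'
# Sanity-check the payload, then send it to the local server
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload"
```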

Next Steps

You’re now running AI locally. From here, try other models from the Ollama library, point a chat UI like Open WebUI at your local server, or build your own tools against the API.

Welcome to local AI. Your data stays yours.

Frequently Asked Questions

Do I need a GPU to run Ollama?

No. Ollama runs on CPU by default. A GPU accelerates inference significantly, but smaller models (1B-7B) run well on modern CPUs with 8-16GB RAM.
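To see where a loaded model actually landed, Ollama's process list reports CPU vs. GPU usage per model:

```shell
# While a model is loaded (e.g. after `ollama run llama3.2` in another
# terminal), list loaded models and the processor they are running on
ollama ps
```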

How much disk space do I need?

Ollama itself is small (~100MB). Models range from 1GB (small, 1B) to 40GB+ (large, 70B). Start with a 3-7B model at around 2-4GB.

Can I run Ollama offline?

Yes. Once a model is downloaded, Ollama works completely offline. You only need internet for the initial download.