How to Run AI Locally: Complete Step-by-Step Guide
2026-02-01 • 14 min read
Want to run AI on your own computer — no cloud, no subscriptions, no data leaving your machine? This guide walks you through setting up local AI step by step. Whether you want complete privacy, offline access, or just to experiment, here's exactly how to do it.
Why Run AI Locally?
Before we dive into setup, let's clarify why you'd want this:
Privacy: Your conversations never leave your computer. No one can read your prompts or train on your data.
Cost: No monthly subscriptions. Run as much as you want once set up.
Offline Access: Works without internet (after model download).
Control: Choose exactly which model runs, modify settings, no restrictions.
Learning: Understand how AI actually works under the hood.
The tradeoff: Local models are generally less capable than GPT-4 or Claude. But they're improving rapidly, and for many tasks, they're good enough.
What You'll Need
Hardware Requirements:
*Minimum (basic chat):*
- Apple M1/M2 Mac with 8GB RAM, OR
- Windows/Linux with NVIDIA GPU (8GB+ VRAM), OR
- Modern CPU with 16GB+ RAM (slower)
*Recommended (good experience):*
- Apple M2/M3/M4 with 16GB+ RAM, OR
- Windows/Linux with RTX 3080+ or equivalent
- 32GB system RAM for larger models
*Optimal (run big models):*
- Apple M2/M3/M4 Max with 32GB+ RAM, OR
- RTX 4090 or multiple GPUs
- 64GB+ system RAM
Storage:
- 10-50GB free space depending on models you want
Software:
- Terminal/command line access
- Basic familiarity with running commands
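Not sure what your machine can handle? Here's a quick way to check; the commands vary by OS, and these are the standard built-ins:
```bash
# macOS: chip and installed RAM
sysctl -n machdep.cpu.brand_string
sysctl -n hw.memsize   # RAM in bytes

# Linux: RAM and free disk space
free -h
df -h ~

# NVIDIA GPU (if present): VRAM and driver version
nvidia-smi
```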
Method 1: Ollama (Easiest)
Ollama is the simplest way to run local AI. One command to install, one command to run.
Step 1: Install Ollama
*Mac:*
```bash
brew install ollama
```
*Windows/Linux:*
Download from ollama.com or:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Step 2: Run your first model
```bash
ollama run llama3.1
```
That's it. First run downloads the model (~4GB for 8B version), then you can chat.
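Once the chat prompt appears, a couple of built-in commands are worth knowing (these are standard Ollama chat commands):
```bash
# Inside the chat session:
#   /?    list available commands
#   /bye  exit the chat
# Back in your terminal, see what you've downloaded:
ollama list
```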
Step 3: Try different models
```bash
# Smaller, faster
ollama run phi3
# Larger, smarter
ollama run llama3.1:70b
# Code-focused
ollama run codellama
# Uncensored
ollama run dolphin-mixtral
```
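Each of these is a multi-gigabyte download, so a little housekeeping helps (standard Ollama commands):
```bash
# Pre-download a model without starting a chat
ollama pull llama3.1:70b
# Remove one you no longer need to free disk space
ollama rm codellama
```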
Pros: Dead simple. Works great.
Cons: Just chat, no assistant capabilities (email, calendar, etc.)
Method 2: OpenClaw with Local Models
Want local AI with full assistant capabilities? OpenClaw can use Ollama models.
Step 1: Install both
```bash
# Install Ollama first
brew install ollama
# Then OpenClaw
npm install -g openclaw
```
Step 2: Start Ollama
```bash
ollama serve
```
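Before layering anything on top, confirm the Ollama API is actually listening (it serves on port 11434 by default):
```bash
# Should return JSON listing your downloaded models
curl http://localhost:11434/api/tags
```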
Step 3: Configure OpenClaw for local
```bash
openclaw setup --provider ollama
```
When asked for model, choose: `llama3.1` or another you've downloaded.
Step 4: Start your assistant
```bash
openclaw start
```
Now you have:
- Local AI (no cloud)
- Full memory system
- Works in Telegram, WhatsApp
- Email, calendar, reminders
Pro tip: You can run a hybrid setup, using local models for simple tasks and the Claude API for complex ones:
```bash
openclaw config edit
# Set default to ollama
# Set fallback to anthropic for complex tasks
```
Method 3: Jan (No Terminal Required)
If command lines scare you, Jan provides a graphical interface.
Step 1: Download Jan
Go to jan.ai and download for your OS.
Step 2: Install and open
Run the installer. Open Jan.
Step 3: Download a model
Click "Hub" → Browse models → Download one
Recommended: Start with Phi-3 (smaller) or Llama 3.1 8B
Step 4: Start chatting
Select your model and chat. That's it.
Pros:
- No terminal needed
- Clean interface
- Easy model management
Cons:
- Chat only, no assistant features
- Less flexible than command line
- No messaging integration
Choosing the Right Model
Not all models are equal. Here's a practical guide:
For general chat (8GB RAM):
- Llama 3.1 8B — Best balance of speed and quality
- Phi-3 — Smaller but surprisingly capable
- Mistral 7B — Good reasoning
For smarter responses (32GB+ RAM):
- Llama 3.1 70B — Approaching GPT-4 quality (expect to need roughly 40GB+ even quantized)
- Mixtral 8x7B — Fast with mixture of experts
- DeepSeek-V2 — Strong reasoning
For coding:
- CodeLlama — Optimized for code
- DeepSeek-Coder — Excellent for programming
- StarCoder — Good for completion
For uncensored/creative:
- Dolphin-Mixtral — Unrestricted version
- WizardLM — Creative writing
Quantization explained:
Models are distributed at several quantization levels: Q4_K_M, Q5_K_M, Q8_0, and so on.
- Lower numbers (Q4) = smaller, faster, slightly dumber
- Higher numbers (Q8) = larger, slower, smarter
- K_M variants are usually the best balance
Start with the default and adjust based on your hardware.
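With Ollama, you pick a quantization by pulling a specific tag. Exact tag names vary per model, so treat these as illustrative and check the model's page on ollama.com for what's actually available:
```bash
# Example tags (verify on the model's library page)
ollama pull llama3.1:8b-instruct-q4_K_M
ollama pull llama3.1:8b-instruct-q8_0
# Inspect what a model's default tag contains
ollama show llama3.1
```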
Performance Optimization
Getting the best speed from local AI:
On Mac (Metal acceleration):
Ollama uses Metal automatically. To confirm it's working, check the timing stats:
```bash
# --verbose prints load time and tokens/sec; fast eval speeds mean the GPU is doing the work
ollama run llama3.1 --verbose
```
On NVIDIA GPU (CUDA):
Install NVIDIA drivers and CUDA toolkit. Ollama detects automatically.
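To confirm the GPU is actually being used, watch it while a model is loaded (nvidia-smi ships with the NVIDIA driver):
```bash
# In one terminal
ollama run llama3.1
# In another, you should see an ollama process holding VRAM
nvidia-smi
```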
On CPU only (slower but works):
Use smaller models and quantized versions:
```bash
ollama run phi3:3.8b-mini-4k-instruct-q4_K_M
```
Memory management:
```bash
# Cap Ollama's VRAM usage via environment variable (check your Ollama version's docs for the expected units)
OLLAMA_MAX_VRAM=6000 ollama serve
# Or export it in your shell profile so it persists across sessions
export OLLAMA_MAX_VRAM=6000
```
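You can also check what's currently loaded and where it landed; a model split between CPU and GPU is a common cause of slowdowns:
```bash
# Shows loaded models, their size, and whether they're running on GPU, CPU, or split
ollama ps
```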
Context length:
Longer context = more memory. Default is usually fine, but:
```bash
# Reduce the context window to save memory: start the model, then set the parameter inside the chat
ollama run llama3.1
/set parameter num_ctx 2048
```
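If you want a setting like this to stick, Ollama can bake parameters into a custom model via a Modelfile. The model name `llama3.1-small-ctx` below is just an example:
```bash
# Write a Modelfile with a reduced context window
cat > Modelfile <<'EOF'
FROM llama3.1
PARAMETER num_ctx 2048
EOF

# Build and run the customized model
ollama create llama3.1-small-ctx -f Modelfile
ollama run llama3.1-small-ctx
```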
Adding a Web Interface
Ollama is command-line by default. Want a ChatGPT-like interface?
Option 1: Open WebUI
The most polished option:
```bash
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
Then open http://localhost:3000
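If the interface loads but shows no models, check that the container started and can reach Ollama (standard Docker commands):
```bash
# Confirm the container is running
docker ps --filter name=open-webui
# Look for connection errors to the Ollama API
docker logs open-webui
```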
Option 2: Chatbot UI
Minimal interface:
```bash
git clone https://github.com/mckaywrigley/chatbot-ui
cd chatbot-ui
npm install
npm run dev
```
Option 3: Use Jan
Already has a nice UI built in.
My recommendation: If you want chat, use Jan or Open WebUI. If you want an assistant, use OpenClaw.
Local vs Cloud: Honest Comparison
Let's be real about trade-offs:
Local AI (Llama 3.1 70B):
✓ Complete privacy
✓ No subscription cost
✓ Works offline
✓ No rate limits
✗ Slower responses
✗ Less capable than GPT-4/Claude
✗ Requires good hardware
✗ Setup complexity
Cloud AI (Claude, GPT-4):
✓ Best-in-class intelligence
✓ Fast responses
✓ No hardware requirements
✓ Always improving
✗ Privacy concerns
✗ Ongoing costs
✗ Rate limits
✗ Requires internet
The pragmatic approach:
Use local for:
- Sensitive data
- High volume tasks
- Offline situations
- Learning/experimenting
Use cloud for:
- Complex reasoning
- Creative writing
- When quality matters most
OpenClaw lets you mix both.
Troubleshooting Common Issues
"Model runs very slowly"
- Use smaller model or lower quantization
- Check the GPU is actually being used (`--verbose` shows token speeds; `ollama ps` shows GPU vs CPU)
- Close other applications
- Reduce context length
"Out of memory"
- Use smaller model (phi3 instead of llama3.1)
- Lower quantization (q4 instead of q8)
- Reduce context: `/set parameter num_ctx 2048` inside the chat
"Model gives weird outputs"
- Try a different model
- Adjust temperature: `/set parameter temperature 0.7` inside the chat
- Check if model is appropriate for your task
"Ollama won't start"
```bash
# Kill existing process
pkill ollama
# Restart
ollama serve
```
"Can't download models"
- Check internet connection
- Check disk space
- Try: `ollama pull llama3.1` directly
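To see how much space models are already taking, check the local model store. The default location is ~/.ollama on macOS and typical Linux user installs; system-wide installs may differ:
```bash
# Free disk space
df -h
# Size of the local model store (path may vary by install)
du -sh ~/.ollama
```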
Need more help?
- Ollama Discord has active support
- Check GitHub issues for specific problems
Next Steps: From Local Chat to Full Assistant
You now have AI running locally. What's next?
Level up to full assistant:
Local chat is nice, but an assistant that takes action is powerful.
- Set up OpenClaw with local models
Explore capabilities:
- 10 ways to use your AI assistant
- Connect to WhatsApp
- Connect to Telegram
Learn more:
- Why open source matters
- AI memory explained
- Private AI benefits
The bottom line:
Running AI locally is easier than ever. Start simple with Ollama, upgrade to OpenClaw when you want real assistant capabilities. Your data, your hardware, your rules.
Real People Using AI Assistants
“I was intimidated by 'run AI locally' but Ollama made it stupid simple. One command and I'm chatting with Llama. Amazing.”
“Running completely local for my legal work. Can't have client data touching cloud servers. Llama 3.1 handles everything I need.”
“Started with local Ollama, then added OpenClaw for assistant features. Best of both worlds — privacy with capability.”
Ready to try it yourself?
Get the free guide and set up your AI assistant in 30 minutes.