In this tutorial, we’ll walk through setting up Ollama on your computer and getting started with the deepseek-r1:8b model, one of DeepSeek’s efficient distilled models. Before we dive into the setup, let’s understand what makes this model special and why it’s a great choice for local development.
Quick Setup: With this guide, you can have your own AI assistant running locally in under 30 minutes! All you need is a decent computer and basic familiarity with terminal commands. No cloud services, no API keys, and no subscription fees required. As this is a rapid guide, keep an eye out for new blogs on how to use other models and more complex setups.
Understanding DeepSeek Models
DeepSeek offers a range of models to suit different needs and hardware capabilities:
The Full DeepSeek-R1
- Massive Scale: The core model contains 671 billion parameters
- Large Context: Supports a 128,000 token context window
- Hardware Requirements: Requires enterprise-grade hardware and significant computational resources
Distilled Models (What We’ll Use)
DeepSeek has created smaller, more efficient versions through knowledge distillation:
- Size Range: From 1.5B to 70B parameters (if 8b is too large for your hardware, try 7b, then 1.5b)
- Accessibility: Can run on consumer hardware (laptops/desktops with decent GPUs)
- Performance Balance: Maintains good performance while being more resource-efficient
In this tutorial, we’ll use the 8B parameter version, which offers an excellent balance between performance and resource requirements, making it perfect for local development on standard hardware.
Prerequisites
Before we begin, ensure you have:
- At least 8GB of RAM (16GB recommended)
- Approximately 8GB of free storage space
- For GPU acceleration (optional):
- NVIDIA GPU with CUDA support (Windows/Linux)
- Apple Silicon or Intel processor (macOS)
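If you want to confirm these requirements before installing, a couple of terminal commands will do it. This is a rough sketch for Linux and macOS; nvidia-smi only applies if you have an NVIDIA GPU with drivers installed:
# Linux: check available RAM and free disk space
free -h
df -h ~
# macOS: print total RAM in bytes
sysctl hw.memsize
# Windows/Linux with NVIDIA: confirm the GPU and driver are visible
nvidia-smi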
Installing Ollama
Download Ollama from Ollama.com/download
Choose your operating system below for specific installation instructions:
macOS Installation
- Download the .dmg file
- Open the .dmg file and drag Ollama to your Applications folder
- Open Terminal and verify the installation:
ollama --version
Windows Installation
- Enable Windows Subsystem for Linux (WSL2):
wsl --install
- Restart your computer
- Run the Ollama installer you downloaded
- Open Command Prompt or PowerShell and verify:
ollama --version
Linux Installation
- Install Ollama using the official install script:
curl -fsSL https://ollama.com/install.sh | sh
- Verify the installation:
ollama --version
Setting Up deepseek-r1:8b
The process is the same across all operating systems:
- Pull the model:
See the model’s README at Ollama.com/library/deepseek-r1, then pull it:
ollama pull deepseek-r1:8b
- Start a conversation:
ollama run deepseek-r1:8b
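Beyond the interactive chat, Ollama also serves a local REST API on port 11434, which is handy for scripting. Here’s a minimal sketch using curl against the standard /api/generate endpoint (swap in any prompt you like):
# Send a single prompt to deepseek-r1:8b and get the full response back
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Explain knowledge distillation in one paragraph.",
  "stream": false
}'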
Using Ollama with Chatbox
Chatbox provides a user-friendly interface across all platforms:
- Download Chatbox:
- Windows: Windows Installer
- macOS: macOS DMG
- Linux: AppImage or .deb package
- Configure Ollama in Chatbox:
- Click Settings
- Add New API Endpoint
- Choose "Ollama"
- Set API URL:
http://127.0.0.1:11434
- Start a new chat:
- Click "+" for new conversation
- Select "deepseek-r1:8b"
- Begin chatting!
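If no models show up in Chatbox, first make sure the URL you entered actually reaches Ollama. A quick sanity check from the terminal, using Ollama’s /api/tags endpoint to list installed models:
# deepseek-r1:8b should appear in the JSON returned here
curl http://127.0.0.1:11434/api/tags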
Testing Your Setup
To verify everything is working correctly, try asking the model a simple question:
What is deepseek-r1:8b and how is it different from deepseek-r1:14b?
Once you’ve asked the question, you’ll see a “Thinking” phase before the model generates its answer, as shown here:
Just to be clear, the response in the screenshot above was generated by the deepseek-r1:14b model. I have a gaming desktop with a more powerful GPU, so I added a larger model there that would not run on my laptop. To point Chatbox at a remote model, follow the same process but set the API URL to the remote machine’s IP and run Ollama as a server on that machine, as sketched below.
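As a rough sketch of that remote setup: Ollama listens only on localhost by default, so on the machine hosting the model you can set the OLLAMA_HOST environment variable so it binds to all interfaces, then point Chatbox at that machine’s address (the IP below is just a placeholder):
# On the remote machine: expose Ollama on the local network
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# In Chatbox, set the API URL to the remote machine's IP, for example:
# http://192.168.1.50:11434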
Performance Tips
- Hardware Optimization:
- CPU: The model will run on CPU, but expect slower responses
- GPU: Significantly faster performance with NVIDIA GPUs (Windows/Linux) or Apple Silicon (macOS)
- RAM Usage: The 8B model typically uses 6-8GB of RAM during operation
- Operating System Specific Tips:
- Windows: Close background applications and ensure WSL2 has adequate resources
- macOS: Monitor CPU/RAM usage through Activity Monitor
- Linux: Use
nvidia-smi
to monitor GPU usage if applicable
- Temperature Settings:
- Lower (0.1-0.3): More focused, deterministic responses
- Higher (0.7-1.0): More creative responses
- Default (0.5): Balanced responses
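There are a couple of ways to apply a temperature. Inside an interactive ollama run session you can use the /set command, and API calls accept it in the options object. A minimal sketch (the value 0.3 is just an example):
# Inside `ollama run deepseek-r1:8b`, set the temperature for the session:
# /set parameter temperature 0.3

# Or pass it per-request through the API
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Summarize the benefits of running LLMs locally.",
  "stream": false,
  "options": { "temperature": 0.3 }
}'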
Troubleshooting
- Model Not Loading:
Verify the model is installed:
ollama list
- Platform-Specific Issues:
- Windows:
- Ensure WSL2 is properly configured
- Check NVIDIA drivers if using GPU
- macOS:
- Reset Ollama:
killall ollama
- Linux:
- Check CUDA installation:
nvidia-smi
- Verify permissions:
ls -l ~/.ollama
- Connection Issues:
Restart the Ollama service:
ollama serve
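When something isn’t working, a quick end-to-end check usually narrows it down: is the server up, is the model installed, does the API answer? A minimal sequence to run:
# 1. Is the server responding? (should print something like "Ollama is running")
curl http://127.0.0.1:11434
# 2. Is the model installed?
ollama list
# 3. If the server isn't running, start it (keep this terminal open)
ollama serve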
Next Steps
Now that you have Ollama and the deepseek-r1:8b model running locally, you can:
- Experiment with different prompt styles
- Try other available models
- Integrate Ollama into your development workflow (a small example follows this list)
- Consider upgrading to larger models if you have more powerful hardware
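For the workflow-integration idea above, even a tiny wrapper script goes a long way. Here’s a minimal sketch that sends a question from the command line to the local model through Ollama’s /api/chat endpoint; the script name ask.sh is just an example, and prompts containing double quotes would need extra escaping:
#!/usr/bin/env sh
# ask.sh - ask the local deepseek-r1:8b model a one-off question
# Usage: ./ask.sh "Why is the sky blue?"
PROMPT="$1"
curl -s http://127.0.0.1:11434/api/chat -d "{
  \"model\": \"deepseek-r1:8b\",
  \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}],
  \"stream\": false
}"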
Remember that while we’re using the 8B parameter version, DeepSeek offers larger models if you need more advanced capabilities and have the hardware to support them. The beauty of running these models locally is that your data stays on your machine, making it perfect for sensitive or private development work.
Here is an example of installing and listing models in the CLI:
And here is a screenshot of the same models in Chatbox once you’ve added the Ollama API endpoint for the local machine. You can see the models and the ability to select the one you want to use:
Feel free to experiment with different models and settings to find what works best for your needs. The deepseek-r1:8b model provides an excellent balance of performance and resource usage, making it an ideal choice for local development, but if you have an old gaming PC or want to upgrade your video card, you can use the larger models.