In this tutorial, we’ll walk through setting up Ollama on your computer and getting started with the deepseek-r1:8b model, one of DeepSeek’s efficient distilled models. Before we dive into the setup, let’s understand what makes this model special and why it’s a great choice for local development.
Quick Setup: With this guide, you can have your own AI assistant running locally in under 30 minutes! All you need is a decent computer and basic familiarity with terminal commands. No cloud services, no API keys, and no subscription fees required. As this is a rapid guide, keep an eye out for new blogs on how to use other models and more complex setups.
Understanding DeepSeek Models
DeepSeek offers a range of models to suit different needs and hardware capabilities:
The Full DeepSeek-R1
- Massive Scale: The core model contains 671 billion parameters
- Large Context: Supports a 128,000 token context window
- Hardware Requirements: Requires enterprise-grade hardware and significant computational resources
Distilled Models (What We’ll Use)
DeepSeek has created smaller, more efficient versions through knowledge distillation:
- Size Range: From 1.5B to 70B parameters (if 8b is too large for your hardware, try 7b, then 1.5b)
- Accessibility: Can run on consumer hardware (laptops/desktops with decent GPUs)
- Performance Balance: Maintains good performance while being more resource-efficient
In this tutorial, we’ll use the 8B parameter version, which offers an excellent balance between performance and resource requirements, making it perfect for local development on standard hardware.
Prerequisites
Before we begin, ensure you have:
- At least 8GB of RAM (16GB recommended)
- Approximately 8GB of free storage space
- For GPU acceleration (optional):
- NVIDIA GPU with CUDA support (Windows/Linux)
- Apple Silicon or Intel processor (macOS)
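If you want to confirm these requirements before installing, a couple of terminal commands will do it. This is a rough sketch for Linux and macOS; nvidia-smi only applies if you have an NVIDIA GPU with drivers installed:
# Linux: check available RAM and free disk space
free -h
df -h ~
# macOS: print total RAM in bytes
sysctl hw.memsize
# Windows/Linux with NVIDIA: confirm the GPU and driver are visible
nvidia-smi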
Installing Ollama
Download Ollama from Ollama.com/download
Choose your operating system below for specific installation instructions:
macOS Installation
- Download the .dmg file
- Open the .dmg file and drag Ollama to your Applications folder
- Open Terminal and verify the installation:
ollama --version
Windows Installation
- Enable Windows Subsystem for Linux (WSL2):
wsl --install
- Restart your computer
- Run the Ollama installer you downloaded
- Open Command Prompt or PowerShell and verify:
ollama --version
Linux Installation
- Install Ollama using the official install script:
curl -fsSL https://ollama.com/install.sh | sh
- Verify the installation:
ollama --version
Setting Up deepseek-r1:8b
The process is the same across all operating systems:
- Pull the model:
See the model’s README at Ollama.com/library/deepseek-r1, then pull it:
ollama pull deepseek-r1:8b
- Start a conversation:
ollama run deepseek-r1:8b
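Beyond the interactive chat, Ollama also serves a local REST API on port 11434, which is handy for scripting. Here’s a minimal sketch using curl against the standard /api/generate endpoint (swap in any prompt you like):
# Send a single prompt to deepseek-r1:8b and get the full response back
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Explain knowledge distillation in one paragraph.",
  "stream": false
}'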
Using Ollama with Chatbox
Chatbox provides a user-friendly interface across all platforms:
- Download Chatbox:
- Windows: Windows Installer
- macOS: macOS DMG
- Linux: AppImage or .deb package
- Configure Ollama in Chatbox:
- Click Settings
- Add New API Endpoint
- Choose "Ollama"
- Set API URL:
http://127.0.0.1:11434
- Start a new chat:
- Click "+" for new conversation
- Select "deepseek-r1:8b"
- Begin chatting!
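If no models show up in Chatbox, first make sure the URL you entered actually reaches Ollama. A quick sanity check from the terminal, using Ollama’s /api/tags endpoint to list installed models:
# deepseek-r1:8b should appear in the JSON returned here
curl http://127.0.0.1:11434/api/tags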
Testing Your Setup
To verify everything is working correctly, try asking the model a simple question:
What is deepseek-r1:8b and how is it different from deepseek-r1:14b?
Once you’ve asked the question, you’ll see a “Thinking” phase before the model generates its answer, as shown here:
Just to be clear, the response in the screenshot above was generated by the deepseek-r1:14b model. I have a gaming desktop with a more powerful GPU, so I added a larger model there that would not run on my laptop. To point Chatbox at a remote model, follow the same process but set the API URL to the remote machine’s IP and run Ollama as a server on that machine, as sketched below.
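As a rough sketch of that remote setup: Ollama listens only on localhost by default, so on the machine hosting the model you can set the OLLAMA_HOST environment variable so it binds to all interfaces, then point Chatbox at that machine’s address (the IP below is just a placeholder):
# On the remote machine: expose Ollama on the local network
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# In Chatbox, set the API URL to the remote machine's IP, for example:
# http://192.168.1.50:11434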
Performance Tips
- Hardware Optimization:
- CPU: The model will run on CPU, but expect slower responses
- GPU: Significantly faster performance with NVIDIA GPUs (Windows/Linux) or Apple Silicon (macOS)
- RAM Usage: The 8B model typically uses 6-8GB of RAM during operation
- Operating System Specific Tips:
- Windows: Close background applications and ensure WSL2 has adequate resources
- macOS: Monitor CPU/RAM usage through Activity Monitor
- Linux: Use
nvidia-smi
to monitor GPU usage if applicable
- Temperature Settings:
- Lower (0.1-0.3): More focused, deterministic responses
- Higher (0.7-1.0): More creative responses
- Default (0.5): Balanced responses
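There are a couple of ways to apply a temperature. Inside an interactive ollama run session you can use the /set command, and API calls accept it in the options object. A minimal sketch (the value 0.3 is just an example):
# Inside `ollama run deepseek-r1:8b`, set the temperature for the session:
# /set parameter temperature 0.3

# Or pass it per-request through the API
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Summarize the benefits of running LLMs locally.",
  "stream": false,
  "options": { "temperature": 0.3 }
}'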
Troubleshooting
- Model Not Loading:
Verify the model is installed:
ollama list
- Platform-Specific Issues:
- Windows:
- Ensure WSL2 is properly configured
- Check NVIDIA drivers if using GPU
- macOS:
- Reset Ollama:
killall ollama
- Linux:
- Check CUDA installation:
nvidia-smi
- Verify permissions:
ls -l ~/.ollama
- Connection Issues:
Restart the Ollama service:
ollama serve
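When something isn’t working, a quick end-to-end check usually narrows it down: is the server up, is the model installed, does the API answer? A minimal sequence to run:
# 1. Is the server responding? (should print something like "Ollama is running")
curl http://127.0.0.1:11434
# 2. Is the model installed?
ollama list
# 3. If the server isn't running, start it (keep this terminal open)
ollama serve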
Next Steps
Now that you have Ollama and the deepseek-r1:8b model running locally, you can:
- Experiment with different prompt styles
- Try other available models
- Integrate Ollama into your development workflow (a small example follows this list)
- Consider upgrading to larger models if you have more powerful hardware
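For the workflow-integration idea above, even a tiny wrapper script goes a long way. Here’s a minimal sketch that sends a question from the command line to the local model through Ollama’s /api/chat endpoint; the script name ask.sh is just an example, and prompts containing double quotes would need extra escaping:
#!/usr/bin/env sh
# ask.sh - ask the local deepseek-r1:8b model a one-off question
# Usage: ./ask.sh "Why is the sky blue?"
PROMPT="$1"
curl -s http://127.0.0.1:11434/api/chat -d "{
  \"model\": \"deepseek-r1:8b\",
  \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}],
  \"stream\": false
}"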
Remember that while we’re using the 8B parameter version, DeepSeek offers larger models if you need more advanced capabilities and have the hardware to support them. The beauty of running these models locally is that your data stays on your machine, making it perfect for sensitive or private development work.
Here is an example of installing and listing models in the CLI:
And here is a screenshot of the same models in Chatbox once you’ve added the Ollama API endpoint for the local machine. You can see the models and the ability to select the one you want to use:
Feel free to experiment with different models and settings to find what works best for your needs. The deepseek-r1:8b model provides an excellent balance of performance and resource usage, making it an ideal choice for local development, but if you have an old gaming PC or want to upgrade your video card, you can use the larger models.