Ollama lets you run models like Llama, Mistral, Qwen, and DeepSeek locally. With Ollama configured in Rumus, you can keep prompts entirely on your machine — no network round-trip, no per-token cost, no provider account.
Before you start
You need:
- Ollama installed and running on the same machine (or a reachable one). Get it from ollama.com/download.
- At least one model pulled locally (see the example after this list).
- Enough RAM and disk for the model you choose. A 7B model wants ~8 GB RAM; bigger models scale from there.
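A minimal example, assuming you want a small general-purpose model (llama3.2 here is just a placeholder for whatever you plan to run):

```bash
# Download a model into the local Ollama store
ollama pull llama3.2
```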
Add Ollama in Rumus
Make sure Ollama is running
By default Ollama listens on http://localhost:11434. Verify it's up with a request to the tags endpoint:
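```bash
# List the models Ollama has available locally
curl http://localhost:11434/api/tags
```

You should get JSON listing the models you’ve pulled.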
Pick the provider
Set Provider to Ollama. There’s no API key field — Ollama is unauthenticated by default.
Base URL
The default http://localhost:11434 works for local Ollama. Override it if you’ve set OLLAMA_HOST to bind to another port, or if Ollama is running on another machine on your LAN.
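For instance, if you've bound Ollama to a different local port (11500 below is just an illustration), the Base URL needs to match:

```bash
# Bind Ollama to a non-default local port (example port only)
OLLAMA_HOST=127.0.0.1:11500 ollama serve
```

Then set Base URL to http://localhost:11500 in Rumus.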
Enter the model ID
Type the exact model tag you pulled (e.g. llama3.2, qwen2.5-coder:32b, mistral). Rumus does not auto-fetch the list — model IDs come straight from ollama list.
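To copy the exact tag, check the NAME column of ollama list (the output below is illustrative; your IDs, sizes, and dates will differ):

```bash
$ ollama list
NAME                 ID              SIZE      MODIFIED
qwen2.5-coder:32b    0b7a4e3b0e4f    19 GB     2 days ago
llama3.2:latest      a80c4f17acd5    2.0 GB    3 hours ago
```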
Capabilities
On the Capabilities tab, mark only what your chosen model actually supports:
- Tool Calling — only some models (e.g. Llama 3.1+, Qwen 2.5) handle tools well; a quick way to check follows this list.
- Vision — only multimodal variants (e.g. llava, qwen2.5-vl).
- Prompt Cache — Ollama doesn’t support an explicit prompt cache API; leave this off.
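If you're not sure whether a model handles tools, one rough check is to send a throwaway tool definition to Ollama's /api/chat endpoint (the model name and tool here are placeholders); a tool-capable model should respond with a tool_calls entry instead of prose:

```bash
# Ask the model to use a dummy tool; a tool-capable model should return
# "tool_calls" in the response message rather than answering in plain text.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5-coder:7b",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```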
Recommended models for Rumus
Rumus benefits from models that are good at tool use and code. Solid local choices:

| Model | Why |
|---|---|
| qwen2.5-coder:32b | Strong at code, supports tools — good agent driver if you have the RAM |
| qwen2.5-coder:7b | Smaller variant — runs comfortably on 16 GB RAM |
| llama3.2 | Fast generalist for chat-style queries |
| llava | Multimodal — useful for screenshots and diagrams |
Tips
- Keep a model warm. First-token latency on a cold model can be many seconds while Ollama loads weights into memory. Hit it with a quick prompt right before a session.
- Reachability across the LAN. Set OLLAMA_HOST=0.0.0.0:11434 on the Ollama host and point Rumus at http://<host-ip>:11434. Make sure the firewall allows it (see the sketch after this list).
- Tool calling quality varies wildly. If the agent stops mid-task or fails to invoke a tool, fall back to a model with documented tool-use support.
- Quantization matters. A 7B model at Q4 quant runs on far less RAM than the FP16 version with little quality loss — pick the tag that fits your hardware.
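A rough sketch of the warm-up and LAN tips (192.168.1.20 is a placeholder for the Ollama host's address):

```bash
# Warm the model so the first real prompt doesn't pay the cold-load cost
ollama run llama3.2 "ready?"

# On the Ollama host: listen on all interfaces so other machines can reach it
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From the machine running Rumus: confirm reachability, then use this
# address as the Base URL
curl http://192.168.1.20:11434/api/tags
```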
Troubleshooting
Connection refused / no response
Ollama isn’t running, or it’s bound to a different host/port. Run ollama serve (or restart the app) and verify with curl http://localhost:11434/api/tags.
404 model not found
The model ID doesn’t match anything in ollama list. Either pull it (ollama pull <name>) or correct the ID — tags are case-sensitive and include the size suffix (e.g. qwen2.5-coder:32b, not just qwen2.5-coder).
Very slow generation
Either the model is too large for available RAM (Ollama is offloading to disk), or there’s no GPU acceleration. Try a smaller model or a more aggressive quant.
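One way to check is ollama ps (available in recent Ollama releases), which shows what's loaded and how much of it is running on the GPU versus the CPU:

```bash
# Show loaded models and whether they're running on GPU, CPU, or split
ollama ps
```

If a large share of the model is on the CPU, it isn't fitting in GPU memory.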
Tool calls don't work
The model doesn’t support tools well. Switch to a model with documented tool support like Llama 3.1+ or Qwen 2.5.
Hit a snag we didn’t cover? Ask in the Rumus community.
Next steps
Other providers
Anthropic, OpenAI, Google, Z.AI, DeepSeek, Kimi, OpenAI-compatible.
OpenAI-compatible
For vLLM, LiteLLM, and other local servers that speak OpenAI’s API.