

Z.AI provides the GLM model family — strong, cost-effective models with first-class tool calling and prompt caching. Rumus has a dedicated provider so you can use both the general endpoint and the Coding-specialized one.

Before you start

You need:
  • A Z.AI account, created at z.ai (the international portal) or on the China portal.
  • An API key from your Z.AI dashboard.
  • Sufficient credit on the Z.AI account.

Add Z.AI in Rumus

1. Open the model settings

Go to Settings → AI → Models and click Add Model.

2. Pick the provider

Set Provider to Z.AI.

3. Paste your API key

Paste the key into API Key. It is stored encrypted in your local vault.

4. (Optional) Custom base URL

Two endpoints are common:
  • General: https://api.z.ai/api/paas/v4 (default — leave blank).
  • Coding: https://api.z.ai/api/coding/paas/v4 — better routing for code-heavy workloads.
Paste the Coding URL into Base URL if you want that variant.

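If you want to sanity-check a key and base URL before (or after) adding them to Rumus, a quick request from outside the app helps. A minimal sketch, assuming the endpoint exposes an OpenAI-compatible /chat/completions route with Bearer authentication; ZAI_API_KEY, ZAI_BASE_URL, and the glm-4.7 model ID are placeholders to adjust for your account.

```typescript
// Minimal sketch: verify a Z.AI key and base URL outside Rumus.
// Assumes an OpenAI-compatible /chat/completions route with Bearer auth;
// swap in the Coding base URL to test that variant instead.
const baseUrl = process.env.ZAI_BASE_URL ?? "https://api.z.ai/api/paas/v4";

async function ping(): Promise<void> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "glm-4.7",
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 8,
    }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = await res.json();
  console.log(data.choices?.[0]?.message?.content);
}

ping().catch(console.error);
```

A 200 with a short reply means the key and URL are good; a 401 usually points at a key from the wrong portal (see Troubleshooting below).
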
5. Pick a model

Choose from the list (GLM-4.7, GLM-4.6, GLM-4.5, GLM-4.5 Air, GLM-4.5 Flash, etc.) or toggle Enter custom ID to type a model ID manually.

6. Capabilities

On the Capabilities tab:
  • Tool Calling — supported on GLM-4.5 and later, including GLM-4.7.
  • Vision — supported on multimodal GLM versions.
  • Prompt Cache — supported on GLM-4.7; cache reads are billed at roughly 50% of the input price.

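If you enable Tool Calling, it can help to see what a tool-calling request looks like on the wire. A hedged sketch, assuming the OpenAI-compatible tools / tool_choice request fields; the get_weather tool and its schema are purely illustrative and not part of Rumus or Z.AI.

```typescript
// Sketch of a tool-calling request in the OpenAI-compatible shape.
// The get_weather tool below is hypothetical and only for illustration.
const baseUrl = process.env.ZAI_BASE_URL ?? "https://api.z.ai/api/paas/v4";

async function askWithTools(): Promise<void> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "glm-4.7",
      messages: [{ role: "user", content: "What's the weather in Berlin?" }],
      tools: [
        {
          type: "function",
          function: {
            name: "get_weather", // hypothetical tool
            description: "Look up current weather for a city",
            parameters: {
              type: "object",
              properties: { city: { type: "string" } },
              required: ["city"],
            },
          },
        },
      ],
      tool_choice: "auto",
    }),
  });
  const data = await res.json();
  // When the model decides to call the tool, the assistant message carries
  // a tool_calls array instead of plain text content.
  console.log(JSON.stringify(data.choices?.[0]?.message, null, 2));
}

askWithTools().catch(console.error);
```
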
7. Save

The model appears in the picker under Custom Models.

Models and what they're good for:
  • GLM-4.7: the default — long context (128K), strong tool calling, prompt caching.
  • GLM-4.5 Air: lightweight, faster, lower cost — great for autocomplete.
  • GLM-4.5 Flash: high-volume, latency-sensitive work.
For the latest model details and limits, see the Z.AI documentation.

Tips

  • Z.AI's default temperature/top_p (0.7 / 0.9) differ from OpenAI's defaults. Rumus respects whatever you set per thread (see the sketch after these tips).
  • Coding endpoint vs general: if you’re mostly using the agent for code tasks, configure the Coding endpoint — routing and rate limits are tuned for it.
  • Prompt cache kicks in automatically on supported models. Long, stable system prompts (skills, rules, long file context) benefit the most.
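
To make the first tip concrete, here is a small sketch of pinning sampling per request instead of inheriting provider defaults. The field names follow the OpenAI-compatible request shape assumed in the earlier sketches; the values are illustrative, not recommendations.

```typescript
// Pin sampling explicitly rather than relying on provider defaults
// (Z.AI: temperature 0.7 / top_p 0.9). Values below are illustrative.
const request = {
  model: "glm-4.7",
  messages: [{ role: "user", content: "Refactor this function to be pure." }],
  temperature: 0.2, // lower for more deterministic code edits
  top_p: 0.95,
};
```

In Rumus itself you would set these per thread; the snippet only shows what the override looks like in the request body.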

Troubleshooting

  • Authentication errors: recheck the API key. Z.AI keys are not interchangeable between the China portal and the international portal.
  • Slow first token: streaming first-token latency varies by region. If you're far from the endpoint, switch to a closer region (Z.AI publishes regional URLs); the sketch below shows one way to measure it.
  • Model not in the list: toggle Enter custom ID and paste the exact model ID (e.g. glm-4.7).
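
If you suspect regional latency, you can measure time-to-first-token directly against your configured endpoint. A rough sketch, assuming the OpenAI-compatible stream: true flag streams chunks as they are generated; it times only the first chunk and then cancels the stream.

```typescript
// Rough sketch: measure time-to-first-token (TTFT) against the endpoint.
// Assumes `stream: true` returns a streaming response body.
const baseUrl = process.env.ZAI_BASE_URL ?? "https://api.z.ai/api/paas/v4";

async function timeToFirstToken(): Promise<number> {
  const start = performance.now();
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "glm-4.7",
      messages: [{ role: "user", content: "Say hi" }],
      stream: true,
      max_tokens: 16,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  const reader = res.body.getReader();
  await reader.read(); // first streamed chunk arrives here
  const ttft = performance.now() - start;
  await reader.cancel(); // stop the stream once we have the number
  return ttft;
}

timeToFirstToken().then((ms) => console.log(`first chunk after ${ms.toFixed(0)} ms`));
```

Running it from your usual location against both the general and Coding base URLs shows whether the endpoint, rather than the model, is the bottleneck.
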
Hit a snag we didn’t cover? Ask in the Rumus community.

Next steps

Other providers

Anthropic, OpenAI, Google, DeepSeek, Kimi, Ollama, OpenAI-compatible.

Built-in models

Use Rumus’s bundled GLM-4.7 without managing a key yourself.