

Z.AI provides the GLM model family — strong, cost-effective models with first-class tool calling and prompt caching. Rumus has a dedicated provider so you can use both the general endpoint and the Coding-specialized one.

Before you start

You need:
  • A Z.AI account, created at z.ai (the international portal) or on the China portal.
  • An API key from your Z.AI dashboard.
  • Sufficient credit on the Z.AI account.

Add Z.AI in Rumus

1. Open the model settings

Go to Settings → AI → Models and click Add Model.

2. Pick the provider

Set Provider to Z.AI.

3. Paste your API key

Paste the key into API Key. It is stored encrypted in your local vault.

4. (Optional) Custom base URL

Two endpoints are common:
  • General: https://api.z.ai/api/paas/v4 (default — leave blank).
  • Coding: https://api.z.ai/api/coding/paas/v4 — better routing for code-heavy workloads.
Paste the Coding URL into Base URL if you want that variant.

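If you want to sanity-check a key and base URL before (or after) adding them to Rumus, a quick request from outside the app helps. A minimal sketch, assuming the endpoint exposes an OpenAI-compatible /chat/completions route with Bearer authentication; ZAI_API_KEY, ZAI_BASE_URL, and the glm-4.7 model ID are placeholders to adjust for your account.

```typescript
// Minimal sketch: verify a Z.AI key and base URL outside Rumus.
// Assumes an OpenAI-compatible /chat/completions route with Bearer auth;
// swap in the Coding base URL to test that variant instead.
const baseUrl = process.env.ZAI_BASE_URL ?? "https://api.z.ai/api/paas/v4";

async function ping(): Promise<void> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "glm-4.7",
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 8,
    }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = await res.json();
  console.log(data.choices?.[0]?.message?.content);
}

ping().catch(console.error);
```

A 200 with a short reply means the key and URL are good; a 401 usually points at a key from the wrong portal (see Troubleshooting below).
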
5. Pick a model

Choose from the list (GLM-4.7, GLM-4.6, GLM-4.5, GLM-4.5 Air, GLM-4.5 Flash, etc.) or toggle Enter custom ID to type a model ID manually.

6. Capabilities

On the Capabilities tab:
  • Tool Calling — supported on GLM-4.5 and later, including GLM-4.7.
  • Vision — supported on multimodal GLM versions.
  • Prompt Cache — supported on GLM-4.7; cache reads are billed at roughly 50% of the input price.

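If you enable Tool Calling, it can help to see what a tool-calling request looks like on the wire. A hedged sketch, assuming the OpenAI-compatible tools / tool_choice request fields; the get_weather tool and its schema are purely illustrative and not part of Rumus or Z.AI.

```typescript
// Sketch of a tool-calling request in the OpenAI-compatible shape.
// The get_weather tool below is hypothetical and only for illustration.
const baseUrl = process.env.ZAI_BASE_URL ?? "https://api.z.ai/api/paas/v4";

async function askWithTools(): Promise<void> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "glm-4.7",
      messages: [{ role: "user", content: "What's the weather in Berlin?" }],
      tools: [
        {
          type: "function",
          function: {
            name: "get_weather", // hypothetical tool
            description: "Look up current weather for a city",
            parameters: {
              type: "object",
              properties: { city: { type: "string" } },
              required: ["city"],
            },
          },
        },
      ],
      tool_choice: "auto",
    }),
  });
  const data = await res.json();
  // When the model decides to call the tool, the assistant message carries
  // a tool_calls array instead of plain text content.
  console.log(JSON.stringify(data.choices?.[0]?.message, null, 2));
}

askWithTools().catch(console.error);
```
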
7. Save

The model appears in the picker under Custom Models.

Models and what they're good for:
  • GLM-4.7: the default — long context (128K), strong tool calling, prompt caching.
  • GLM-4.5 Air: lightweight, faster, lower cost — great for autocomplete.
  • GLM-4.5 Flash: high-volume, latency-sensitive work.
For the latest model details and limits, see the Z.AI documentation.

Tips

  • Z.AI's default temperature/top_p (0.7 / 0.9) differ from OpenAI's defaults. Rumus respects whatever you set per thread (see the sketch after these tips).
  • Coding endpoint vs general: if you’re mostly using the agent for code tasks, configure the Coding endpoint — routing and rate limits are tuned for it.
  • Prompt cache kicks in automatically on supported models. Long, stable system prompts (skills, rules, long file context) benefit the most.
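
To make the first tip concrete, here is a small sketch of pinning sampling per request instead of inheriting provider defaults. The field names follow the OpenAI-compatible request shape assumed in the earlier sketches; the values are illustrative, not recommendations.

```typescript
// Pin sampling explicitly rather than relying on provider defaults
// (Z.AI: temperature 0.7 / top_p 0.9). Values below are illustrative.
const request = {
  model: "glm-4.7",
  messages: [{ role: "user", content: "Refactor this function to be pure." }],
  temperature: 0.2, // lower for more deterministic code edits
  top_p: 0.95,
};
```

In Rumus itself you would set these per thread; the snippet only shows what the override looks like in the request body.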

Troubleshooting

  • Authentication errors: recheck the API key. Z.AI keys are not interchangeable between the China portal and the international portal.
  • Slow first token: streaming first-token latency varies by region. If you're far from the endpoint, switch to a closer region (Z.AI publishes regional URLs); the sketch below shows one way to measure it.
  • Model not in the list: toggle Enter custom ID and paste the exact model ID (e.g. glm-4.7).
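
If you suspect regional latency, you can measure time-to-first-token directly against your configured endpoint. A rough sketch, assuming the OpenAI-compatible stream: true flag streams chunks as they are generated; it times only the first chunk and then cancels the stream.

```typescript
// Rough sketch: measure time-to-first-token (TTFT) against the endpoint.
// Assumes `stream: true` returns a streaming response body.
const baseUrl = process.env.ZAI_BASE_URL ?? "https://api.z.ai/api/paas/v4";

async function timeToFirstToken(): Promise<number> {
  const start = performance.now();
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "glm-4.7",
      messages: [{ role: "user", content: "Say hi" }],
      stream: true,
      max_tokens: 16,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  const reader = res.body.getReader();
  await reader.read(); // first streamed chunk arrives here
  const ttft = performance.now() - start;
  await reader.cancel(); // stop the stream once we have the number
  return ttft;
}

timeToFirstToken().then((ms) => console.log(`first chunk after ${ms.toFixed(0)} ms`));
```

Running it from your usual location against both the general and Coding base URLs shows whether the endpoint, rather than the model, is the bottleneck.
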
Hit a snag we didn’t cover? Ask in the Rumus community.

Next steps

Other providers

Anthropic, OpenAI, Google, DeepSeek, Kimi, Ollama, OpenAI-compatible.

Built-in models

Use Rumus’s bundled GLM-4.7 without managing a key yourself.