T

Cannot Use Custom OpenAI-Compatible Endpoint in Rumus Agent

testkey · June 6, 2026 at 07:46 AM

Issue Description

I am trying to use my own OpenAI-compatible endpoint in Rumus Agent.

Endpoint:

http://192.168.1.1:4000/v1/chat/completions

However, every request fails with the following error:

Error: url not allowed on the configured scope:
http://192.168.1.1:4000/v1/chat/completions

The endpoint itself is working correctly.

I have tested it with:

  • Open WebUI

  • LiteLLM

  • OpenAI-compatible clients

  • Direct API requests

All of them can successfully connect and generate responses.

The issue only occurs inside Rumus Agent.

Questions

  1. Does Rumus Agent restrict custom API URLs?

  2. Is there an allowlist/whitelist configuration that needs to be modified?

  3. Are self-hosted LiteLLM endpoints supported?

  4. Is HTTPS required for custom endpoints?

  5. Where can I configure the allowed URL scope?

Environment

  • Rumus Agent Version: 0.1.19

  • Endpoint Type: LiteLLM (OpenAI Compatible)

  • API URL:

    http://192.168.1.1:4000/v1/chat/completions

Any help would be appreciated.

Thank you.

6 replies 71 views

6 Replies

Rumus 6/6/2026

Hi testkey, thanks for the detailed report — this was a real bug on our end, and your repro made it easy to pin down. 🙏

What was happening: Rumus routes model requests through a built-in HTTP allow-list. The rule was only matching URLs without an explicit port, so a standard endpoint like https://api.openai.com/v1 worked, but anything with a port — your LiteLLM on :4000, Ollama on :11434, in-house gateways, etc. — got rejected with url not allowed on the configured scope. That's why the exact same endpoint worked from curl and Open WebUI but failed inside Rumus.

The fix: We've broadened the allow-list to accept any host and port over http/https. Self-hosted, OpenAI-compatible endpoints with a port now connect normally — no whitelisting or HTTPS required, and http://192.168.1.1:4000/v1 works as-is.

This ships in v0.1.20. Once you update, just point the OpenAI-compatible provider at your LiteLLM base URL and it should connect. If you still hit anything after updating, reply here and we'll take another look.

Thanks again for helping us catch this! 🚀

testkey 6/6/2026

Thanks for your quick reply, so I will wait for the new version and test again.

Rumus 6/6/2026

Hi testkey, good news — v0.1.20 is now available, so you can go ahead and update. Once you're on the new version, just point the OpenAI-compatible provider at your LiteLLM base URL (port included, e.g. http://192.168.1.1:4000/v1) and it should connect right away.

If you run into anything else — with this or anything else in Rumus — feel free to reply here anytime. We keep an eye on the community and will get to it as soon as we can. 🚀

Thanks again for the report!

testkey 6/6/2026

Hi, there,

I’ve updated to 0.1.20; however, when I used the Agent conversation, I encountered another issue with a custom OpenAI-compatible model.

Model Information

  • Model: Qwen3-Next-80B-Instruct

  • Backend: NVIDIA NIM

  • API Type: OpenAI Compatible

Problem

The agent successfully executes commands on the server.

For example:

  • Commands are executed correctly.

  • Files are created or modified successfully.

  • The server-side task completes without errors.

However, in the Rumus chat interface:

  • The action remains in the "Executing" state indefinitely.

  • The Send button remains disabled.

  • The conversation does not return control to the user.

  • The task appears to be stuck even though execution has already completed on the server.

The only way to continue is to refresh the page or restart the session.

Expected Behavior

After the server finishes executing the command and returns a successful response, Rumus should:

  1. Mark the action as completed.

  2. Display the final result in the chat.

  3. Re-enable the Send button.

  4. Return control to the user.

Actual Behavior

The server-side operation completes successfully, but the Rumus UI continues to show that the action is still running.

Additional Information

  • This issue occurs when using a custom OpenAI-compatible endpoint.

  • The endpoint itself is working correctly.

  • Commands are successfully executed on the server.

  • The problem appears to be related to how Rumus handles the completion status or streaming response from the model.

Could this be related to streaming response handling, tool-calling completion detection, or compatibility with OpenAI-compatible providers?

testkey 6/6/2026

The reason is that the model response time is too long, but the final result is all correct.

Rumus 6/7/2026

Thanks for the update — and great news that the port fix got you connected on v0.1.20! 🎉

This new "stuck on Executing" behavior is a separate issue, and your own diagnosis is spot on: it's tied to the very long response time of the model (Qwen3-Next-80B on NIM is a big model, so time-to-first-token can be quite long).

Here's what's happening under the hood: Rumus drives the conversation off the model's streaming response. Right now there's no idle timeout or heartbeat on that stream — so if the model goes quiet for a long stretch while it's "thinking" (or an intermediary like a gateway/proxy/load-balancer silently drops the idle connection without closing it cleanly), the client never receives a stream-end or an error. It just keeps waiting, which is why the UI stays in Executing and the send button stays disabled until you refresh — even though the command already ran and the final result is correct.

What we're going to do: add an idle-timeout watchdog on the model stream, so a stalled/slow response turns into a visible, retryable error instead of an indefinite hang. We'll get this into an upcoming release.

In the meantime, a couple of things that may help:

  • If there's a gateway/reverse proxy/load-balancer in front of NIM, bump its idle/read timeout (and any streaming buffering settings) so it doesn't drop long-running streamed connections.

  • Make sure the endpoint is actually streaming responses (SSE) rather than buffering the full reply.

  • For interactive use, a faster/smaller model will sidestep the long time-to-first-token entirely.

We'll update this thread once the watchdog fix ships. Thanks again for the thorough reports — they're genuinely helpful. 🚀

Related Posts