Is HelloGML free to use?

HelloGML is open-source, so there is no software license fee to run it yourself. HelloGML still depends on your Cloudflare account and whatever usage or operational costs come with the services you connect to it. For self-hosted deployments, the code is free even though the runtime is not always zero-cost.

How does HelloGML compare to LiteLLM?

HelloGML is narrower and more opinionated than LiteLLM. HelloGML runs on Cloudflare Workers as an edge proxy for the chatglm.cn web API and its session-based auth, while LiteLLM is a broader router for many official providers and deployment patterns. Pick HelloGML when you specifically need GLM web API translation, and pick LiteLLM when provider-agnostic routing matters more.

Does HelloGML support OpenAI tool calls?

Yes, HelloGML supports OpenAI-style `tools` and `tool_choice` payloads. That makes HelloGML compatible with agentic clients such as `claude-code` and `open-code` that expect structured function calls. The downstream model still has to choose the tool call, but HelloGML preserves the protocol contract.
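
As a rough illustration of that contract, the sketch below builds an OpenAI-style request carrying `tools` and `tool_choice`, then reads tool calls back out of a response. The `get_weather` tool, the helper names, and the sample response are hypothetical; the response is hand-written in the standard OpenAI chat completion shape, not captured from HelloGML.

```python
import json

def build_tool_request(model, user_text):
    """Build an OpenAI-style chat request that offers one function tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool name
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }

def extract_tool_calls(response):
    """Return (name, arguments) pairs from an OpenAI-style chat completion."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # OpenAI encodes arguments as a JSON string, not a nested object.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Hand-written sample in the OpenAI response format (an assumption, not real output).
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

request_body = build_tool_request("glm-4.7", "What is the weather in Paris?")
tool_calls = extract_tool_calls(sample_response)
```

A client would run the named function with the decoded arguments and send the result back as a `tool` role message on the next turn.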

Can HelloGML stream responses in real time?

Yes, HelloGML supports SSE streaming and returns token-by-token deltas to the client. HelloGML can also expose `reasoning_content` when the upstream flow emits thought or search traces. That makes HelloGML usable in terminals, editors, and chat UIs that need incremental output.
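
A client-side reader for that stream might look like the sketch below. It assumes the chunks follow the OpenAI streaming shape (`data: {...}` lines ending with `data: [DONE]`) and that `reasoning_content` arrives as a field on each `delta`; both are assumptions about the wire format rather than details confirmed by the repo, and the sample lines are hand-written.

```python
import json

def parse_sse_stream(lines):
    """Collect answer text and reasoning text from OpenAI-style SSE lines."""
    answer, reasoning = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Assumed placement: reasoning text sits next to content on the delta.
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(answer), "".join(reasoning)

# Hand-written sample stream for illustration only.
sample_stream = [
    'data: {"choices":[{"delta":{"reasoning_content":"Thinking. "}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "",
    "data: [DONE]",
]
answer_text, reasoning_text = parse_sse_stream(sample_stream)
```

Keeping the two buffers separate lets a UI show reasoning in a collapsible pane while the answer renders incrementally.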

What do I need to deploy HelloGML on Cloudflare Workers?

HelloGML needs a Cloudflare account, a `GLM_TOKENS` KV namespace, and a valid `ADMIN_KEY` in `wrangler.toml`. You also need one or more `chatglm_refresh_token` values from a signed-in chatglm.cn session. After that, `npx wrangler deploy` publishes HelloGML to your Worker domain.

Why use HelloGML instead of calling chatglm.cn directly?

HelloGML gives you a stable API boundary instead of wiring every client to a private web endpoint. It also centralizes token rotation, API-key authorization, and protocol conversion in one place. If you run multiple clients or agent frameworks, HelloGML removes duplicated auth and translation code.

When should I add multiple refresh tokens?

Add multiple refresh tokens when one account starts hitting rate limits, expires too often, or becomes a bottleneck for several clients. HelloGML rotates through the token pool, so extra tokens improve continuity without changing your endpoint URL. That is especially useful for shared team access and test environments.
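
To make the rotation idea concrete, here is a minimal round-robin pool in Python. It is a sketch of the general technique, not HelloGML's Worker code: the `TokenPool` class, the `disable` method, and the `rt-a` style names are all hypothetical.

```python
class TokenPool:
    """Round-robin selection over refresh tokens, skipping disabled ones."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._disabled = set()
        self._cursor = 0

    def disable(self, token):
        """Mark a token as unusable, e.g. after an expiry or rate-limit error."""
        self._disabled.add(token)

    def next_token(self):
        """Return the next usable token, or raise if none are left."""
        for _ in range(len(self._tokens)):
            token = self._tokens[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._tokens)
            if token not in self._disabled:
                return token
        raise RuntimeError("no usable refresh tokens in the pool")

pool = TokenPool(["rt-a", "rt-b", "rt-c"])
first_round = [pool.next_token() for _ in range(4)]   # wraps back to rt-a
pool.disable("rt-b")                                  # simulate an expired account
second_round = [pool.next_token() for _ in range(3)]  # rt-b is now skipped
```

With three tokens, losing one still leaves two in rotation, which is the continuity benefit described above.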

HelloGML: Best AI API Gateway for self-hosting developers in 2026

HelloGML turns chatglm.cn's private web API into a Cloudflare Worker gateway that speaks OpenAI, Claude, and Gemini while rotating refresh tokens, preserving streaming, and keeping tool calls and multimodal requests compatible.

What Is HelloGML?

HelloGML is a Cloudflare Worker AI API gateway built by Hello-Application-XH that turns chatglm.cn's private web API into three client-compatible interfaces: OpenAI, Claude, and Gemini. HelloGML is one of the best AI API gateway tools for self-hosting developers because it exposes GLM models through standard endpoints, supports SSE streaming, and rotates refresh tokens across multiple accounts without changing client code.

Quick Overview

Attribute	Details
Type	AI API Gateways
Best For	self-hosting developers and AI app teams using GLM through standard clients
Language/Stack	Cloudflare Workers, KV, Cache API, Wrangler, OpenAI/Claude/Gemini-compatible HTTP APIs
License	N/A
GitHub Stars	N/A
Pricing	Open-Source
Last Release	N/A

Who Should Use HelloGML?

Indie hackers shipping GLM-backed apps who want one /v1/chat/completions endpoint instead of provider-specific glue code.
Platform engineers managing several API keys and refresh tokens who need round-robin token selection and centralized admin controls.
Claude Code and OpenAI SDK users who want to point existing clients at a Worker and keep streaming, tool calling, and long-context behavior intact.
Teams already on Cloudflare that want deployment to stay inside the edge runtime with KV and Cache rather than a separate server.

Not ideal for:

Teams that need a vendor-backed API contract with SLAs from the model provider.
Products that cannot store browser-session refresh tokens or do not want to depend on chatglm.cn web auth.
Production systems that cannot tolerate the repo's experimental auto branch, which the maintainer says is unstable and currently separate from the main line.

Key Features of HelloGML

Three-protocol compatibility: HelloGML speaks OpenAI /v1/chat/completions, Claude /v1/messages, and Gemini /v1beta/models/... request formats. That lets one edge gateway front multiple client stacks without custom adapters.
SSE streaming with reasoning content: The Worker converts upstream streaming into token-by-token Server-Sent Events and preserves reasoning_content when the model emits it. That matters for terminals, chat UIs, and agent loops that need incremental output.
Round-robin refresh-token pool: All rt:* records live in Cloudflare KV and are selected by rotation. This spreads load across accounts and reduces the chance that one exhausted token takes the whole service down.
Multi-key authorization fallback: A single request can carry comma-separated API keys in Authorization: Bearer key-a,key-b,key-c, and HelloGML will try them in order until one validates. That is useful when several team keys point at the same token pool.
Tool calling and agent compatibility: HelloGML supports OpenAI-style tools and tool_choice, which makes it compatible with claude-code, open-code, and agent wrappers that expect structured function calls. The gateway preserves the contract instead of flattening tool metadata.
Multimodal request paths: The repo exposes AI drawing, image-to-image, text-to-video, and image-to-video flows through the same service boundary. It also accepts base64 images and long text context, which makes it usable for multimodal automation instead of chat only.
Edge-native state handling: API keys and refresh tokens are kept separate. ak:* entries act as front-door auth, while rt:* entries represent actual capacity. Cloudflare KV stores the mappings and the Cache API holds access_token values for reuse, so the design avoids a traditional database for the hot path.
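
The multi-key fallback in the list above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Worker's implementation: a plain dict stands in for the KV namespace, the stored values are placeholders, and the function name first_valid_key is hypothetical. Only the ak:* and rt:* prefixes and the comma-separated Bearer format come from the description.

```python
def first_valid_key(authorization_header, kv):
    """Return the first Bearer key with a matching ak:* record, else None."""
    scheme, _, value = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return None
    for candidate in value.split(","):
        key = candidate.strip()
        # Only ak:* records authorize callers; rt:* records are never consulted here.
        if key and f"ak:{key}" in kv:
            return key
    return None

# Dict stand-in for the KV namespace (values are placeholders, an assumption).
kv = {"ak:key-b": "1", "ak:key-c": "1", "rt:token-1": "1"}
matched = first_valid_key("Bearer key-a,key-b,key-c", kv)
```

Here key-a has no ak:* record, so the lookup falls through to key-b. A refresh-token name presented as an API key is rejected because it only exists under the rt:* prefix.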

HelloGML vs Alternatives

Tool	Best For	Key Differentiator	Pricing
HelloGML	Proxying chatglm.cn into OpenAI, Claude, and Gemini clients	Edge-only protocol adapter with refresh-token rotation	Open-Source
LiteLLM	Routing many official model providers through one API	Broad provider abstraction and deployment flexibility	Open-Source
OpenRouter	Buying access to many models through one hosted endpoint	Hosted marketplace with billing and model aggregation	Freemium
Cloudflare Workers AI	Running Cloudflare-hosted inference at the edge	Native Cloudflare model hosting instead of upstream web scraping	Paid

Choose HelloGML when you specifically need chatglm.cn web auth translated into standard client protocols and want the service to live inside Cloudflare Workers. Choose LiteLLM when provider-agnostic routing matters more than GLM web-session compatibility. Choose OpenRouter when you want a hosted billable marketplace and do not want to manage your own token pool. Choose Cloudflare Workers AI when you want Cloudflare-managed inference rather than a proxy layer.

If the gateway is only one layer in a larger agent workflow, pair it with OpenSwarm for orchestration or Claude Code Canvas for prompt iteration. For request-level visibility and tracing, OpenTrace fits well because HelloGML has a clear edge boundary and predictable request rewriting.

How HelloGML Works

HelloGML sits between the client and chatglm.cn. It accepts a standard Bearer token from the caller, checks whether that api_key exists in KV under an ak:* record, selects a usable refresh_token from the rt:* pool, exchanges that refresh token for a cached access_token, and then forwards the request to the private GLM endpoint with the required signature headers.

The architecture is intentionally split into identity and capacity. API keys only authorize callers, while refresh tokens supply upstream access, so one team can run many clients against one shared pool without exposing browser cookies in each app. That design also keeps rotation logic inside the Worker instead of pushing it into every SDK integration.

curl -X POST 'https://<worker>/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-a,sk-b' \
  -d '{"model":"glm-4.7","messages":[{"role":"user","content":"Explain quantum computing in one sentence"}],"stream":true}'

That request shows the important flow. HelloGML accepts multiple fallback API keys, translates the payload into the upstream format, and streams deltas back to the caller as SSE. If stream is false, the same path returns a single completion instead of incremental chunks.
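
For clients written in Python, the same request can be assembled with the standard library. The sketch below only builds the request object and does not send it; worker.example.com is a placeholder for your Worker URL and build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request

def build_chat_request(worker_url, api_keys, prompt, stream=True):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "model": "glm-4.7",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=worker_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Fallback keys are joined with commas inside one Bearer value.
            "Authorization": "Bearer " + ",".join(api_keys),
        },
        method="POST",
    )

req = build_chat_request(
    "https://worker.example.com",  # placeholder; substitute your Worker URL
    ["sk-a", "sk-b"],
    "Explain quantum computing in one sentence",
)
```

Passing req to urllib.request.urlopen would send it. With stream=True the response body arrives as SSE lines, and with stream=False it is a single JSON document.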

The Worker implementation also explains the operational trade-offs. KV gives the service durable mapping for API keys and refresh tokens, Cache avoids hitting the auth exchange on every request, and the edge runtime keeps latency low for global clients. If you need to inspect that flow end to end, the clean request boundary makes HelloGML a good candidate for tracing in OpenTrace.
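
The caching trade-off can be illustrated with a small time-to-live cache. This is a sketch in Python rather than the Worker's Cache API code, and the AccessTokenCache class, the 60-second lifetime, and the fake_exchange stand-in are all assumptions chosen for the example.

```python
import time

class AccessTokenCache:
    """Cache access tokens per refresh token until they expire."""

    def __init__(self, exchange, ttl_seconds, clock=time.monotonic):
        self._exchange = exchange   # callable: refresh_token -> access_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}          # refresh_token -> (access_token, expires_at)

    def get(self, refresh_token):
        now = self._clock()
        entry = self._entries.get(refresh_token)
        if entry and entry[1] > now:
            return entry[0]         # cache hit: skip the auth exchange
        access_token = self._exchange(refresh_token)
        self._entries[refresh_token] = (access_token, now + self._ttl)
        return access_token

exchange_calls = []

def fake_exchange(refresh_token):
    """Stand-in for the upstream auth exchange; records each call."""
    exchange_calls.append(refresh_token)
    return f"access-for-{refresh_token}-{len(exchange_calls)}"

fake_now = [0.0]  # controllable clock so the example needs no real waiting
cache = AccessTokenCache(fake_exchange, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("rt-a")
second = cache.get("rt-a")   # served from cache
fake_now[0] = 61.0
third = cache.get("rt-a")    # expired, so the exchange runs again
```

Three lookups trigger only two exchanges here, which is the saving the cache is meant to provide on a busy gateway.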

The repo's design choice is practical rather than pure. It does not pretend to be an official vendor SDK, and it does not hide the fact that upstream access depends on chatglm.cn web state. That means you get protocol translation, token pooling, and edge deployment, but you also inherit the operational reality of the source service.

Pros and Cons of HelloGML

Pros:

Protocol normalization across OpenAI, Claude, and Gemini reduces client-specific adapter code.
Edge deployment on Cloudflare Workers keeps the gateway close to users and avoids a separate VM layer.
Token pool rotation spreads traffic across multiple refresh tokens instead of hard-binding one app to one account.
Function calling support makes HelloGML usable with agentic clients that expect tools payloads.
Multimodal support covers image upload, drawing, and video generation, not just plain chat completions.
Low operational surface area because KV and Cache replace a heavier persistence stack for the hot path.

Cons:

Depends on chatglm.cn web auth, so the service is only as stable as the upstream private API flow.
Refresh tokens are operationally sensitive, which means setup requires browser-cookie extraction and careful secret handling.
Cloudflare KV is eventually consistent, so token mapping changes may not appear instantly in every edge location.
The repo's auto branch is experimental, and the maintainer says it is not part of the stable main line.
Access through workers.dev can be unreliable in some regions, so a custom domain may be necessary.

Getting Started with HelloGML

git clone https://github.com/Hello-Application-XH/HelloGML.git
cd HelloGML/cf-worker
npm install
npx wrangler dev --local

That starts the Worker locally with simulated KV and Cache so you can test the gateway before deploying it. After the first run, create the GLM_TOKENS namespace, set a strong ADMIN_KEY in wrangler.toml, add one or more refresh_token values through /admin/token, and add any client-facing keys through /admin/apikey.

When local tests pass, deploy with npx wrangler deploy and point your client at the Worker URL. If your setup needs multiple clients or agents, seed the token pool first so HelloGML can rotate capacity immediately instead of failing on the first expired account.

Verdict

HelloGML is the strongest option for teams that need a Cloudflare-edge proxy from chatglm.cn into OpenAI, Claude, and Gemini clients and are willing to manage refresh tokens themselves. Its main strength is protocol translation with token rotation at the edge; its main caveat is dependence on an upstream private web API. Use it when you want GLM access without rewriting client code.
