What Is HelloGML?
HelloGML is a Cloudflare Worker AI API gateway built by Hello-Application-XH that turns chatglm.cn's private web API into three client-compatible interfaces: OpenAI, Claude, and Gemini. It is a strong fit for self-hosting developers because it exposes GLM models through standard endpoints, supports SSE streaming, and rotates refresh tokens across multiple accounts without changing client code.
Quick Overview
| Attribute | Details |
|---|---|
| Type | AI API Gateway |
| Best For | self-hosting developers and AI app teams using GLM through standard clients |
| Language/Stack | Cloudflare Workers, KV, Cache API, Wrangler, OpenAI/Claude/Gemini-compatible HTTP APIs |
| License | N/A |
| GitHub Stars | N/A |
| Pricing | Open-Source |
| Last Release | N/A |
Who Should Use HelloGML?
- Indie hackers shipping GLM-backed apps who want one `/v1/chat/completions` endpoint instead of provider-specific glue code.
- Platform engineers managing several API keys and refresh tokens who need round-robin token selection and centralized admin controls.
- Claude Code and OpenAI SDK users who want to point existing clients at a Worker and keep streaming, tool calling, and long-context behavior intact.
- Teams already on Cloudflare that want deployment to stay inside the edge runtime with KV and Cache rather than a separate server.
Not ideal for:
- Teams that need a vendor-backed API contract with SLAs from the model provider.
- Products that cannot store browser-session refresh tokens or do not want to depend on chatglm.cn web auth.
- Production systems that cannot tolerate the repo's experimental auto branch, which the maintainer says is unstable and currently separate from the main line.
Key Features of HelloGML
- Three-protocol compatibility: HelloGML speaks OpenAI `v1/chat/completions`, Claude `v1/messages`, and Gemini `v1beta/models/...` request formats. That lets one edge gateway front multiple client stacks without custom adapters.
- SSE streaming with reasoning content: The Worker converts upstream streaming into token-by-token Server-Sent Events and preserves `reasoning_content` when the model emits it. That matters for terminals, chat UIs, and agent loops that need incremental output.
- Round-robin refresh-token pool: All `rt:*` records live in Cloudflare KV and are selected by rotation. This spreads load across accounts and reduces the chance that one exhausted token takes the whole service down.
- Multi-key authorization fallback: A single request can carry comma-separated API keys in `Authorization: Bearer key-a,key-b,key-c`, and HelloGML will try them in order until one validates. That is useful when several team keys point at the same token pool.
- Tool calling and agent compatibility: HelloGML supports OpenAI-style `tools` and `tool_choice`, which makes it compatible with `claude-code`, `open-code`, and agent wrappers that expect structured function calls. The gateway preserves the contract instead of flattening tool metadata.
- Multimodal request paths: The repo exposes AI drawing, image-to-image, text-to-video, and image-to-video flows through the same service boundary. It also accepts base64 images and long text context, which makes it usable for multimodal automation instead of chat only.
- Edge-native state handling: API keys and refresh tokens are separated cleanly: `ak:*` entries act as front-door auth, while `rt:*` entries represent actual capacity. Cloudflare KV stores the mappings and the Cache API holds each `access_token` for reuse, so the design avoids a traditional database for the hot path.
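To make the round-robin pool concrete, here is a minimal Python sketch of rotation over `rt:*` records. It is not HelloGML's actual code: the `RefreshTokenPool` class, the in-memory dict standing in for Cloudflare KV, and the in-process cursor are all assumptions for illustration. The source does not say how the real Worker persists its rotation position.

```python
from typing import Dict


class RefreshTokenPool:
    """Hypothetical round-robin selector over refresh tokens keyed as rt:*."""

    def __init__(self, kv: Dict[str, str]) -> None:
        # `kv` is a plain dict standing in for Cloudflare KV (assumption).
        self.kv = kv
        self._cursor = 0

    def next_token(self) -> str:
        # Only rt:* records count as capacity; ak:* records are ignored here.
        keys = sorted(k for k in self.kv if k.startswith("rt:"))
        if not keys:
            raise LookupError("no refresh tokens in pool")
        key = keys[self._cursor % len(keys)]
        self._cursor += 1
        return self.kv[key]


pool = RefreshTokenPool({"rt:a": "tok-a", "rt:b": "tok-b", "ak:client": "1"})
print([pool.next_token() for _ in range(3)])  # ['tok-a', 'tok-b', 'tok-a']
```

Sorting the keys keeps the rotation order stable between calls, so each token gets an equal share of requests even as the cursor wraps around.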
HelloGML vs Alternatives
| Tool | Best For | Key Differentiator | Pricing |
|---|---|---|---|
| HelloGML | Proxying chatglm.cn into OpenAI, Claude, and Gemini clients | Edge-only protocol adapter with refresh-token rotation | Open-Source |
| LiteLLM | Routing many official model providers through one API | Broad provider abstraction and deployment flexibility | Open-Source |
| OpenRouter | Buying access to many models through one hosted endpoint | Hosted marketplace with billing and model aggregation | Freemium |
| Cloudflare Workers AI | Running Cloudflare-hosted inference at the edge | Native Cloudflare model hosting instead of upstream web scraping | Paid |
Choose HelloGML when you specifically need chatglm.cn web auth translated into standard client protocols and want the service to live inside Cloudflare Workers. Choose LiteLLM when provider-agnostic routing matters more than GLM web-session compatibility. Choose OpenRouter when you want a hosted billable marketplace and do not want to manage your own token pool. Choose Cloudflare Workers AI when you want Cloudflare-managed inference rather than a proxy layer.
If the gateway is only one layer in a larger agent workflow, pair it with OpenSwarm for orchestration or Claude Code Canvas for prompt iteration. For request-level visibility and tracing, OpenTrace fits well because HelloGML has a clear edge boundary and predictable request rewriting.
How HelloGML Works
HelloGML sits between the client and chatglm.cn. It accepts a standard Bearer token from the caller, checks whether that api_key exists in KV under an ak:* record, selects a usable refresh_token from the rt:* pool, exchanges that refresh token for a cached access_token, and then forwards the request to the private GLM endpoint with the required signature headers.
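The first steps of that flow can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the Worker's implementation: the function names, the dicts standing in for KV and the Cache API, and the injected `exchange` callable are hypothetical, and the upstream signature headers are left out because the source does not describe them.

```python
from typing import Callable, Dict, List, Optional


def parse_bearer_keys(authorization: str) -> List[str]:
    """Split 'Bearer key-a,key-b' into individual API keys."""
    scheme, _, value = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return []
    return [k.strip() for k in value.split(",") if k.strip()]


def authorize(authorization: str, kv: Dict[str, str]) -> Optional[str]:
    """Return the first key that has an ak:* record, trying keys in order."""
    for key in parse_bearer_keys(authorization):
        if f"ak:{key}" in kv:
            return key
    return None


def access_token_for(
    refresh_token: str,
    cache: Dict[str, str],
    exchange: Callable[[str], str],
) -> str:
    """Reuse a cached access token; call the exchange only on a cache miss."""
    token = cache.get(refresh_token)
    if token is None:
        token = exchange(refresh_token)  # hypothetical upstream auth call
        cache[refresh_token] = token
    return token


kv = {"ak:sk-b": "1"}  # stand-in for Cloudflare KV (assumption)
print(authorize("Bearer sk-a,sk-b", kv))  # sk-b
```

Passing `exchange` in as a parameter keeps the sketch offline and makes the caching behavior easy to check: a second lookup for the same refresh token never reaches the exchange.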
The architecture is intentionally split into identity and capacity. API keys only authorize callers, while refresh tokens supply upstream access, so one team can run many clients against one shared pool without exposing browser cookies in each app. That design also keeps rotation logic inside the Worker instead of pushing it into every SDK integration.
curl -X POST https://<worker>/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: Bearer sk-a,sk-b' -d '{"model":"glm-4.7","messages":[{"role":"user","content":"Explain quantum computing in one sentence"}],"stream":true}'
That request shows the important flow. HelloGML accepts multiple fallback API keys, translates the payload into the upstream format, and streams deltas back to the caller as SSE. If stream is false, the same path returns a single completion instead of incremental chunks.
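On the client side, consuming that stream means reading `data:` lines and joining the deltas. The Python sketch below assumes the gateway emits the standard OpenAI chunk shape (`choices[].delta.content`, terminated by `data: [DONE]`), which is what OpenAI compatibility implies. The `iter_deltas` helper and the sample payload are made up for illustration.

```python
import json
from typing import Iterator


def iter_deltas(sse_text: str) -> Iterator[str]:
    """Yield content fragments from an OpenAI-style SSE response body."""
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Invented sample stream, not captured from a real HelloGML deployment.
sample = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"Qubits "}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"superpose."}}]}\n\n'
    "data: [DONE]\n\n"
)
print("".join(iter_deltas(sample)))  # Qubits superpose.
```

A client that also wants the reasoning trace could read `reasoning_content` from the same delta object, assuming the gateway places it there.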
The Worker implementation also explains the operational trade-offs. KV gives the service durable mapping for API keys and refresh tokens, Cache avoids hitting the auth exchange on every request, and the edge runtime keeps latency low for global clients. If you need to inspect that flow end to end, the clean request boundary makes HelloGML a good candidate for tracing in OpenTrace.
The repo's design choice is practical rather than pure. It does not pretend to be an official vendor SDK, and it does not hide the fact that upstream access depends on chatglm.cn web state. That means you get protocol translation, token pooling, and edge deployment, but you also inherit the operational reality of the source service.
Pros and Cons of HelloGML
Pros:
- Protocol normalization across OpenAI, Claude, and Gemini reduces client-specific adapter code.
- Edge deployment on Cloudflare Workers keeps the gateway close to users and avoids a separate VM layer.
- Token pool rotation spreads traffic across multiple refresh tokens instead of hard-binding one app to one account.
- Function calling support makes HelloGML usable with agentic clients that expect `tools` payloads.
- Multimodal support covers image upload, drawing, and video generation, not just plain chat completions.
- Low operational surface area because KV and Cache replace a heavier persistence stack for the hot path.
Cons:
- Depends on chatglm.cn web auth, so the service is only as stable as the upstream private API flow.
- Refresh tokens are operationally sensitive, which means setup requires browser-cookie extraction and careful secret handling.
- Cloudflare KV is eventually consistent, so token mapping changes may not appear instantly in every edge location.
- The repo's auto branch is experimental, and the maintainer says it is not part of the stable main line.
- Workers.dev access can be awkward in some regions, so a custom domain may be necessary for reliable access.
Getting Started with HelloGML
git clone https://github.com/Hello-Application-XH/HelloGML.git
cd HelloGML/cf-worker
npm install
npx wrangler dev --local
That starts the Worker locally with simulated KV and Cache so you can test the gateway before deploying it. After the first run, create the GLM_TOKENS namespace, set a strong ADMIN_KEY in wrangler.toml, add one or more refresh_token values through /admin/token, and add any client-facing keys through /admin/apikey.
When local tests pass, deploy with npx wrangler deploy and point your client at the Worker URL. If your setup needs multiple clients or agents, seed the token pool first so HelloGML can rotate capacity immediately instead of failing on the first expired account.
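As a rough illustration of pointing a client at the deployed Worker, the Python sketch below builds the same request as the earlier curl example using only the standard library. The `build_chat_request` helper and the `https://worker.example.com` URL are placeholders, not part of HelloGML. The request is constructed but not sent, so the snippet runs offline.

```python
import json
import urllib.request
from typing import List


def build_chat_request(
    base_url: str,
    api_keys: List[str],
    prompt: str,
    model: str = "glm-4.7",
    stream: bool = False,
) -> urllib.request.Request:
    """Build a POST to /v1/chat/completions with comma-separated fallback keys."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + ",".join(api_keys),
        },
        method="POST",
    )


# Placeholder Worker URL and keys; replace with your own deployment values.
req = build_chat_request("https://worker.example.com", ["sk-a", "sk-b"], "Hello")
print(req.full_url)  # https://worker.example.com/v1/chat/completions
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```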
Verdict
HelloGML is the strongest option for teams that need a Cloudflare-edge proxy from chatglm.cn into OpenAI, Claude, and Gemini clients when they are willing to manage refresh tokens themselves. Its main strength is protocol translation with token rotation at the edge; its main caveat is dependence on an upstream private web API. Use it when you want GLM access without rewriting client code.



