Frequently
asked questions
Everything you need to know about running inference on Keln. Still have questions? Reach out at kris@keln.ai.
Keln is an LLM inference provider. We serve large open-weight models — starting with OpenAI's GPT OSS 120B, with GLM 5.1, Kimi K2.5, MiniMax M2.7 and others coming soon — on NVIDIA Blackwell hardware, available through OpenRouter at prices below the official API. Our optimized infrastructure lets us serve them for less.
Teams building on open-weight LLMs through OpenRouter — especially ones running agent and coding workloads through tools like Cursor, Cline, Claude Code, Aider, and Continue, where long context and caching matter most.
Same category — inference for open-weight LLMs on OpenRouter. What sets Keln apart: we run on the latest NVIDIA B200 and B300 GPUs, host larger models that most providers skip, publish live speed and latency benchmarks for every model, and come in below official API pricing with no tiers, thanks to our optimized infrastructure.
No. Keln is a managed service. The models we serve are open-weight, so you can always run them yourself, but Keln itself is hosted and accessed through OpenRouter.
GPT OSS 120B (OpenAI) is live today. GLM 5.1 (Zhipu AI), Kimi K2.5 (Moonshot), MiniMax M2.7, Qwen 3.5 397B, and Step 3.5 Flash are coming soon, alongside a rotating catalog of popular community models. The full list with live benchmarks is on the Models page.
Possibly. If it is an open-weight model and there is real demand, we will evaluate it. Email kris@keln.ai with the model and your expected volume.
No. Keln serves open-weight models as released — we don't alter the weights. What we optimize is how they run (batching, precision, caching), and we publish quality benchmarks against the original so you can see how closely we match it.
NVIDIA B200 and B300 GPUs — the Blackwell generation. These are the fastest GPUs available for inference today, which is what lets us host the largest open-weight models.
Across multiple regions and cloud providers, with automatic failover between them. A single region outage doesn't take your traffic offline.
99.5% or better, backed by redundant capacity across multiple regions and clouds.
Yes. Tokens per second, time-to-first-token, and pricing are published for every model, updated continuously. See the Models page.
Per-token, priced below the official API for each model. No subscriptions, no tiers, no minimum spend — our optimized infrastructure lets us offer lower rates than the model's own provider.
No. Billing flows through OpenRouter. You top up OpenRouter credit, route requests to Keln-served model IDs, and OpenRouter handles the rest. There is no separate Keln invoice or signup for standard usage.
Call the OpenRouter API and pick a Keln-served model. Any OpenAI-compatible client works — no Keln-specific SDK or separate signup required.
Yes. All of these tools work with OpenRouter, so pointing them at a Keln-served model works out of the box. We tune our setup specifically for the long-context, cache-heavy usage these tools produce.
Yes — the full context window of each model is available. Our setup is built for sustained long-context use, not just short chat. Agent tools sending hundreds of thousands of input tokens per request are a first-class use case.
Yes. Token streaming is available on every model we serve through OpenRouter.
Yes, for models that support them. Function calling, JSON mode, and tool use pass through unchanged — we don't restrict features the underlying model already supports.
OpenRouter applies its standard account-level rate limits. Keln doesn't add any of its own. For sustained high-volume usage, reach out to kris@keln.ai about reserved capacity.
No. Keln doesn't store your prompts or completions beyond what's needed to serve each request, and we never train on user data. The only thing we keep is basic operational metadata (latency, token counts, status) for billing and reliability.