Introduction
Obol tracks what you spend on Large Language Models across OpenAI, Anthropic, OpenRouter, Google, Mistral, Groq, xAI, DeepSeek, Cohere, and OpenCode Go. Providers fall into three categories depending on how usage data is collected:
- Sync providers — Obol polls the provider's usage API hourly and builds a historical dashboard automatically. Works with OpenAI, Anthropic, and OpenRouter.
- CSV upload — Google AI has no public usage API. You can import history via CSV upload, or route traffic through the proxy for live tracking.
- Proxy-only providers — Mistral, Groq, xAI, DeepSeek, Cohere, and OpenCode Go are tracked per-request as traffic flows through the Obol LLM proxy. Some (xAI, Cohere) have partial billing APIs — Obol may add sync support in the future. OpenCode Go is a flat-rate plan, so per-call cost displays as $0 by design — see the OpenCode Go section.
For sync providers you can use sync, the proxy, or both. Most users pair them: sync for historical visibility, the proxy for zero-lag observability on hot traffic.
Connections
A connection is one provider API key that Obol uses to pull usage data on your behalf. Each provider has a specific key type it accepts — using the wrong kind is the most common reason a connection fails.
| Provider | Aggregate usage API | Obol tracking | Key type |
|---|---|---|---|
| OpenAI | Yes | Auto-sync + proxy | Admin key (sync) + separate inference key (proxy) |
| Anthropic | Yes | Auto-sync + proxy | Admin key sk-ant-admin... (sync) + API key (proxy) |
| OpenRouter | Yes | Auto-sync + proxy | Same key for both |
| Google (Gemini) | No (usage visible in the AI Studio UI only) | CSV upload + proxy | Gemini API key |
| Mistral | No | Proxy only | API key |
| Groq | No | Proxy only | API key |
| xAI (Grok) | Available | Proxy only (sync planned) | API key |
| DeepSeek | No | Proxy only | API key |
| Cohere | Partial | Proxy only | API key |
| OpenCode Go | No | Proxy only (flat rate) | OpenCode Go API key |
Anthropic
Auto-sync · Requires an Admin API key in the format sk-ant-admin01-…
Important: Admin keys are only available if you have an Organization in the Anthropic Console, and only members with the admin role can create them. Individual (personal) accounts can't generate admin keys. Creating an organization is free and takes under a minute.
- Go to console.anthropic.com and sign in.
- Open Settings → Organization. If you don't have one, create it — free, instant.
- In the left sidebar, click Admin Keys.
- Click Create Admin Key, name it, copy the key. It is shown only once.
- In Obol → Connections → Add connection → Anthropic, paste and save.
CSV fallback
Can't create an admin key? console.anthropic.com → Usage → Export CSV, then upload in Obol.
What syncs: per-model daily spend, input tokens, output tokens, cache-read tokens, cache-creation tokens.
For the proxy: admin keys cannot make inference calls. To use the LLM proxy you'll also need a regular sk-ant-api-… key; attach it on the connection card. See Admin vs inference keys.
OpenAI
Auto-sync · Requires an Admin key in the format sk-admin-…
Important: OpenAI splits keys into admin keys (for org-level management endpoints, including usage) and project keys (for inference). Project keys return 403 on /organization/usage, so you need an admin key for sync.
- Go to platform.openai.com → Settings → API keys.
- In the key type selector, choose Admin (not "Project").
- Create the key, copy it, paste into Obol.
What syncs: completions, embeddings, images, audio — all usage types fetched in parallel, per-model daily.
For the proxy: admin keys can't make chat completions. Attach a separate sk-… or sk-proj-… inference key. See Admin vs inference keys.
OpenRouter
Auto-sync · Requires a Management key in the format sk-or-v1-…
Important: OpenRouter has two key types — inference keys (for API calls) and management keys (for admin/usage). Management keys can't make completions, and inference keys can't read usage. For sync, use a management key.
- Go to openrouter.ai/settings/management-keys.
- Click Create new key, copy, paste into Obol.
Limitation
OpenRouter's activity API only exposes the last 30 completed UTC days. Older data is not available via auto-sync.
CSV fallback
openrouter.ai/activity → Export CSV, then upload in Obol.
What syncs: per-model daily spend, prompt tokens, completion tokens, reasoning tokens, request counts.
For the proxy: you can attach a separate inference key if you want to isolate proxy traffic, or leave it blank and the management key falls back automatically — OpenRouter accepts either for inference.
Google AI
CSV only · proxy recommended · Accepts any valid Google AI Studio API key.
Google AI Studio exposes usage in its UI but has no public usage endpoint Obol can poll. Your options are to upload CSVs manually, or (strongly recommended) route requests through the Obol LLM proxy so every call is metered as it happens.
- Go to aistudio.google.com/app/apikey and copy any valid key.
- In Obol → Connections → Add connection → Google AI, paste the key. This validates the key and unlocks CSV upload — it does not start an auto-sync.
- To import history: in Google AI Studio, open the usage/logs panel and export as CSV, then upload via the connection card's Upload CSV button.
For live tracking: switch your Gemini SDK to point at Obol (see Proxy quick start). Every call will flow through Obol and show up in your dashboard in seconds.
Proxy-only providers
Proxy only · Covers Mistral, Groq, xAI, DeepSeek, and Cohere.
These providers are currently tracked through the proxy only. Every call is metered and lands in your dashboard in seconds. OpenCode Go is also proxy-only but has its own section below because it's a flat-rate gateway with a fixed model list.
| Provider | Per-response usage | Aggregate API | Notes |
|---|---|---|---|
| Mistral | Yes | No | Dashboard only at console.mistral.ai |
| Groq | Yes | No | Open feature request on their community forum |
| xAI (Grok) | Yes | Available | Management API at management-api.x.ai — sync support planned |
| DeepSeek | Yes | No | Dashboard only, no programmatic API |
| Cohere | Yes | Partial | Per-response billed_units metadata, no aggregate endpoint |
- In Obol, go to Connections → Add connection and select the provider.
- Paste your provider API key (the same key you'd use for inference).
- Create a virtual key on the Proxy page.
- Point your app at https://useobol.pages.dev/v1/chat/completions and set the header X-Obol-Provider: <provider> (e.g. X-Obol-Provider: mistral).
Important: Because Obol has no usage sync for these providers, the proxy is the only way to capture their spend. Direct calls to the provider bypass Obol entirely and will not appear in your dashboard.
OpenCode Go
Proxy only · Flat rate · OpenCode Go is a $10/month flat-rate gateway from sst/opencode that resells curated open-source coding models (GLM, Kimi, MiniMax, MiMo, plus a few others) over an OpenAI-compatible API at https://opencode.ai/zen/go/v1.
Cost on Obol displays as $0, and that is accurate: Go is a flat-rate plan, not a per-token bill. Its usage caps ($12 / 5h, $30 / wk, $60 / mo) are measured in the dollar value of upstream model time. Obol still records latency, errors, and quota; only the dollar column is inert. Use OpenCode's own dashboard for the flat-rate consumption view.
Models
Pulled from GET https://opencode.ai/zen/go/v1/models (public, no auth needed). 14 models are live today, and OpenCode adds more periodically. A new model proxies fine immediately; it just records as unpriced_model until Obol's pricing supplement is updated.
| Family | Models | Auto-route prefix |
|---|---|---|
| GLM (Zhipu AI) | glm-5.1, glm-5 | glm- |
| Kimi (Moonshot) | kimi-k2.6, kimi-k2.5 | kimi- |
| MiniMax | minimax-m2.7, minimax-m2.5 | minimax- |
| MiMo | mimo-v2-pro, mimo-v2-omni, mimo-v2.5, mimo-v2.5-pro | mimo- |
| Qwen (Alibaba) | qwen3.5-plus, qwen3.6-plus | none — header required |
| DeepSeek (via OpenCode) | deepseek-v4-pro, deepseek-v4-flash | none — header required |
Models matching an auto-route prefix are routed to OpenCode without X-Obol-Provider. Qwen and DeepSeek prefixes are deliberately excluded because they collide with Groq, OpenRouter, and native DeepSeek; those families need an explicit X-Obol-Provider: opencode header to disambiguate.
Setup
- Sign up for OpenCode Go at opencode.ai/go and copy your API key from the OpenCode Zen dashboard.
- In Obol, open Connections → Add connection, choose OpenCode Go, and paste the key. The test step validates it by calling GET /zen/go/v1/models.
- Mint a virtual key on the Proxy page. Optionally set allowed_providers=["opencode"] so this key is dedicated to OpenCode traffic.
- Point your client at https://useobol.pages.dev/v1/chat/completions using your Obol virtual key. Set X-Obol-Provider: opencode, or rely on auto-route for glm- / kimi- / minimax- / mimo- models.
Test request
curl https://useobol.pages.dev/v1/chat/completions \
  -H "Authorization: Bearer obol_sk_live_..." \
  -H "X-Obol-Provider: opencode" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.1","messages":[{"role":"user","content":"hi"}],"max_tokens":10}'
OpenCode CLI integration
If you use the OpenCode CLI itself, point it at Obol via its config:
// ~/.config/opencode/config.json
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"opencode": {
"options": { "baseURL": "https://useobol.pages.dev/v1" }
}
}
}
Then export your Obol virtual key (not the OpenCode key) as OPENCODE_API_KEY and run opencode.
OpenCode CLI doesn't support custom HTTP headers declaratively, so for per-feature attribution mint one virtual key per project. Obol surfaces them by proxy_key_id on the Monitor dashboard.
Dashboard
The Overview page is your main spend dashboard. Each panel below is one row on that page.
Summary cards
The four stat cards at the top give you a quick pulse check.
| Card | What it shows |
|---|---|
| This Month | Total spend since the 1st of the current month. The percentage below compares to the same period last month. |
| Today | Spend since UTC midnight. Resets daily. |
| Top Model | The single most expensive model across all providers in the current window. |
| Connections | Count of provider integrations set up. Click through to manage them. |
Spend over time
Stacked area chart — each colored band is one provider, so the total height is your combined daily spend. Hover any day for a per-provider breakdown plus the total. Window is 30 days on Free, up to 90 on Pro. The x-axis starts from whichever is earlier: your oldest connection or your oldest recorded spend.
By provider
Horizontal bars ranking providers by spend share over the window. Zero-spend providers are hidden.
Model breakdown
Three tabs — Spend, Requests, Tokens — each ranking every model that produced usage in the window. Switch tabs to find your most-requested model vs your most expensive one — they're often not the same.
Two extra columns, σ (standard deviation of per-request cost) and the p95/p50 spike ratio, quantify how variable a model's per-request cost is. A spike ratio ≥ 5× (rendered amber) means a small number of outlier requests account for a disproportionate share of that model's spend; investigate or cap. Variance is computed from per-request events and only shown when at least 5 events exist for the model.
Forecast Pro
Projects the current month's spend to end-of-month using a linear trend over recent daily data. If your usage is growing, the forecast will sit above a naive extrapolation of today's rate.
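One way a linear-trend forecast could work is an ordinary least-squares line fitted to this month's daily spend, summed over the remaining days of the month. Fitting over all month-to-date days (rather than a shorter recent window) and clamping negative projections to zero are assumptions; forecast_month_end is a hypothetical name.

```python
import calendar
from datetime import date


def forecast_month_end(daily: list[float], today: date) -> float:
    """daily[i] is the spend on day i+1 of the current month, up to and including today."""
    n = len(daily)
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    month_to_date = sum(daily)
    if n < 2:
        # Not enough points for a slope: fall back to a flat extrapolation.
        return month_to_date / max(n, 1) * days_in_month
    xs = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = month_to_date / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Sum the fitted line over the days still to come, never projecting negative spend.
    remaining = sum(max(0.0, intercept + slope * d) for d in range(n + 1, days_in_month + 1))
    return month_to_date + remaining


# Spend rising by $1/day over the first five days of June.
print(forecast_month_end([1, 2, 3, 4, 5], date(2024, 6, 5)))
```

With growing usage the fitted line keeps climbing, which is why the projection lands well above a flat extrapolation of the month-to-date average.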
Cache stats Pro
Your Anthropic prompt-cache hit rate over the window: cache read tokens ÷ total input tokens. A higher hit rate means more input is being served from cache at a fraction of the normal cost.
Alerts
Set a monthly budget in Settings → Alerts. Obol emails you once when spend crosses the threshold and shows a banner on the Overview page while you're over. The alert resets on the 1st of each month.
Monitor Pro
Per-event observability for the LLM proxy. While the main dashboard shows what you spent, Monitor shows how requests behaved — latency percentiles, error taxonomy, throughput, time-of-day patterns, and per-SDK / per-IP-network breakdowns. Pulled live from Keplor's per-event log on every page load (no caching), so it reflects your last 24 hours of traffic in real time.
Monitor reads up to 1000 events from the last 24h plus 5000 events across the last 7 days for the heatmap. Sample size is shown at the top of the page, with an amber warning when the sample is clipped.
Summary + throughput
Top row of stat cards — total requests, error rate, wasted spend (cost on 4xx / 5xx / aborted requests), p50 / p95 time-to-first-token, stream-abort rate. Below that, a per-minute throughput sparkline across the 24h window. Useful for spotting traffic spikes or sudden drops at a glance.
Latency by provider & model
One row per (provider, model, mode) — streaming and non-streaming get separate rows, since TTFT means different things in each (first chunk arriving vs. full response complete). Columns: p50 / p95 / p99 TTFT, p50 / p95 total latency, and average output throughput in tok/s. Surfaces "Groq is 3× faster than Anthropic" as a number.
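A sketch of the per-(provider, model, mode) aggregation behind this table. The event field names (provider, model, stream, ttft_ms, total_ms, output_tokens) are hypothetical, and computing throughput as output tokens divided by total latency is an assumption; Keplor may measure it differently.

```python
from collections import defaultdict
from statistics import fmean, quantiles


def pct(values: list[float], p: int) -> float:
    """Inclusive-method percentile p (1..99) of a non-empty list."""
    if len(values) == 1:
        return values[0]
    return quantiles(values, n=100, method="inclusive")[p - 1]


def latency_table(events: list[dict]) -> dict:
    groups = defaultdict(list)
    for e in events:
        # Streaming and non-streaming get separate rows, since TTFT means different things.
        mode = "stream" if e["stream"] else "non-stream"
        groups[(e["provider"], e["model"], mode)].append(e)
    rows = {}
    for key, evs in groups.items():
        ttft = [e["ttft_ms"] for e in evs]
        total = [e["total_ms"] for e in evs]
        tps = [e["output_tokens"] / (e["total_ms"] / 1000) for e in evs if e["total_ms"] > 0]
        rows[key] = {
            "ttft_p50": pct(ttft, 50), "ttft_p95": pct(ttft, 95), "ttft_p99": pct(ttft, 99),
            "total_p50": pct(total, 50), "total_p95": pct(total, 95),
            "tok_per_s": fmean(tps) if tps else None,
        }
    return rows
```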
Cost variance lives on the /models dashboard, which adds σ and p95/p50 spike-ratio columns to distinguish metronomic models from spiky ones where a few outliers dominate spend.
Error taxonomy
Errors grouped by Keplor's flattened error discriminator into actionable buckets like rate_limited, auth_failed, context_length_exceeded, content_filtered, upstream_timeout, upstream_unavailable, and invalid_request. Each row shows the total count plus a per-provider breakdown, so "13 context_length_exceeded on Anthropic" is a single line you can act on.
Latency by prompt size
Buckets requests by input-token count (0–500, 500–2K, 2K–8K, 8K+) and shows p50 / p95 TTFT and total latency for each. Tells you whether a model's time-to-first-token scales linearly with context length, or flatlines past a certain prompt size — critical when picking a model for long-context workloads.
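A minimal sketch of the prompt-size bucketing. The bucket edges come from the text above; treating each bucket as a half-open interval [lo, hi) is an assumption, as are the function names and the ASCII labels.

```python
from collections import defaultdict
from statistics import quantiles

# Bucket edges from the docs; half-open [lo, hi) intervals are an assumption.
BUCKETS = [(0, 500, "0-500"), (500, 2_000, "500-2K"), (2_000, 8_000, "2K-8K"), (8_000, None, "8K+")]


def bucket_for(input_tokens: int) -> str:
    for lo, hi, label in BUCKETS:
        if input_tokens >= lo and (hi is None or input_tokens < hi):
            return label
    raise ValueError("input_tokens must be non-negative")


def p50_p95(values: list[float]) -> tuple[float, float]:
    if len(values) == 1:
        return values[0], values[0]
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[49], cuts[94]


def latency_by_prompt_size(events):
    """events: iterable of (input_tokens, ttft_ms, total_ms) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for tokens, ttft, total in events:
        ttfts, totals = grouped[bucket_for(tokens)]
        ttfts.append(ttft)
        totals.append(total)
    return {
        label: {"ttft_p50_p95": p50_p95(t), "total_p50_p95": p50_p95(tt)}
        for label, (t, tt) in grouped.items()
    }
```

Comparing the TTFT percentiles across the four labels is enough to see whether latency grows with context or plateaus.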
Time-of-day heatmap
A 7-day × 24-hour grid where each cell is colored by request volume (or error rate, via the toggle). Reveals "Thursday 14:00 UTC is always our peak" or "we have a stuck retry loop running every Saturday at 03:00". Hover any cell for exact counts. Times are in UTC; the cell tooltip shows the absolute hour.
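The heatmap reduces to counting events into a weekday × hour grid after converting each timestamp to UTC. The sketch below assumes events arrive as (timezone-aware datetime, is_error) pairs and indexes weekdays Monday = 0, as Python's datetime.weekday() does; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heatmap(events, mode: str = "volume"):
    """events: iterable of (timezone-aware datetime, is_error) pairs.

    Returns grid[weekday][hour] in UTC, Monday = 0, matching the 7 x 24 layout.
    """
    requests = [[0] * 24 for _ in range(7)]
    errors = [[0] * 24 for _ in range(7)]
    for ts, is_error in events:
        utc = ts.astimezone(timezone.utc)  # every cell is bucketed in UTC
        requests[utc.weekday()][utc.hour] += 1
        errors[utc.weekday()][utc.hour] += int(is_error)
    if mode == "volume":
        return requests
    # Error-rate toggle: errors / requests per cell, 0.0 for empty cells.
    return [[e / r if r else 0.0 for e, r in zip(er, rr)] for er, rr in zip(errors, requests)]


thu_1405_utc = datetime(2024, 5, 2, 14, 5, tzinfo=timezone.utc)
thu_1630_cest = datetime(2024, 5, 2, 16, 30, tzinfo=timezone(timedelta(hours=2)))  # 14:30 UTC
grid = heatmap([(thu_1405_utc, False), (thu_1630_cest, True)])
print(grid[3][14])
```

Both requests land in the Thursday 14:00 UTC cell even though one was stamped in a UTC+2 zone, which is why the tooltip can show a single absolute hour.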
Provider failure correlation
Pearson correlation matrix of per-provider error rates across 5-minute windows. High amber cells (≥0.7) mean two providers tend to fail in the same windows, usually a shared-infrastructure outage (your network, Cloudflare, or a region-wide incident). Near-zero cells mean the providers fail independently, so the issue is single-provider, like a key exhausting its rate limit.
Distinguishes "Anthropic AND Mistral both 5xx'd at 14:30" from "key X exhausted on Anthropic only" — answers a question the per-provider error count alone can't. Hidden when fewer than 2 providers are active in the window.
SDK + geographic breakdowns
SDK breakdown: a coarse identifier derived from the caller's User-Agent header (openai-python, anthropic-sdk-typescript, curl, browser, etc.). The long tail collapses into other; raw User-Agent strings are never displayed.
Geographic spread: the top 10 source-IP buckets at /24 for IPv4 and /64 for IPv6; raw IPs are never rendered. Useful for spotting traffic concentration (one /24 doing 95% = a single backend service) or anomalies (a single /24 with 500 errors and 0 successes = a misconfigured client or an attacker).
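Collapsing addresses to /24 and /64 networks is straightforward with the standard ipaddress module. This sketch shows only the bucketing and the top-10 count; the function names are hypothetical and it is not Obol's actual pipeline.

```python
import ipaddress
from collections import Counter


def network_bucket(ip: str) -> str:
    """Collapse a client IP to its /24 (IPv4) or /64 (IPv6) network."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def top_networks(client_ips, n: int = 10):
    """Top-n buckets by request count; only network strings leave this function."""
    return Counter(network_bucket(ip) for ip in client_ips).most_common(n)


print(top_networks(["198.51.100.1", "198.51.100.9", "2001:db8:abcd:12:1::5"]))
```

strict=False lets ip_network accept a host address and zero its host bits, which is exactly the truncation the dashboard needs.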
Both sections only populate from requests that arrive after Obol started forwarding User-Agent / client IP. They hide automatically on accounts with no recorded events of either kind.
LLM Proxy
Point your apps at Obol's edge instead of the provider directly. Every request is metered in real time — no 24-hour polling lag, no admin keys required for tracking, and we can capture Google Gemini usage (which has no public usage API).
Base URL
https://useobol.pages.dev
Quick start
- Attach an inference key to each provider you want to proxy (see Admin vs inference keys).
- Go to Proxy → Create virtual key. The raw obol_sk_live_… token is shown exactly once, so copy it now.
- Point your SDK at Obol, using the virtual key as the API key.
- Open Proxy → Recent requests to watch calls in real time.
OpenAI (Python)
from openai import OpenAI
client = OpenAI(
base_url="https://useobol.pages.dev/v1",
api_key="obol_sk_live_…",
)
client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role":"user","content":"hi"}],
)
Anthropic (Python)
from anthropic import Anthropic
client = Anthropic(
base_url="https://useobol.pages.dev",
api_key="obol_sk_live_…",
)
client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=50,
messages=[{"role":"user","content":"hi"}],
)
Gemini (curl)
curl "https://useobol.pages.dev/v1beta/models/gemini-2.0-flash:generateContent?key=obol_sk_live_…" \
-H "content-type: application/json" \
-d '{"contents":[{"parts":[{"text":"hi"}]}]}'
Virtual keys
Each virtual key is the token your app presents to the proxy. They're distinct from your provider API keys and from the Personal Access Tokens used for the desktop widget.
- Format: obol_sk_live_<64 hex>.
- Hashed at rest (SHA-256). Only the prefix is stored unhashed for UI display.
- Every key can set its own RPM cap, daily request cap, monthly budget cap (Pro), allowed providers, and allowed-models pattern.
- Revoking a key takes effect on the next proxy call — no caching.
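The key format and hashing rules above can be sketched with the standard library: 32 random bytes give the 64 hex characters, and only a SHA-256 digest plus a short display prefix are kept. The prefix length (20 characters) and the function names are hypothetical.

```python
import hashlib
import hmac
import re
import secrets

KEY_RE = re.compile(r"obol_sk_live_[0-9a-f]{64}")
DISPLAY_PREFIX_LEN = 20  # hypothetical; the docs only say "the prefix"


def mint_virtual_key() -> tuple[str, dict]:
    """Return (raw_key, stored_record); the raw key is shown to the user exactly once."""
    raw = "obol_sk_live_" + secrets.token_hex(32)  # 32 random bytes -> 64 hex chars
    record = {
        "prefix": raw[:DISPLAY_PREFIX_LEN],  # the only unhashed part, for UI display
        "sha256": hashlib.sha256(raw.encode()).hexdigest(),
    }
    return raw, record


def verify_virtual_key(presented: str, record: dict) -> bool:
    if not KEY_RE.fullmatch(presented):
        return False
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, record["sha256"])  # constant-time comparison
```

In practice a proxy would look the record up by its hash (an indexed column) instead of comparing against a single record, which is also what makes revocation take effect on the next call.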
Admin vs inference keys
OpenAI and Anthropic split their API surface into two distinct keys:
| Provider | Sync key (admin) | Proxy key (inference) |
|---|---|---|
| OpenAI | sk-admin-… | sk-… or sk-proj-… |
| Anthropic | sk-ant-admin01-… | sk-ant-api-… |
| OpenRouter | sk-or-v1-… (management) | Same key works — optional |
| Google AI | n/a (no usage API) | Same key works |
Admin keys can read usage reports but cannot make inference calls. If you try to run the proxy against a connection that only has an admin key, Obol returns a clear 402 no_inference_key with setup instructions rather than forwarding the request and getting rejected upstream. Attach an inference key from the Connections page; each card shows "Add proxy inference key" when one is needed. The admin key stays in place for the hourly sync, and the inference key is used only by the /v1/* proxy routes.
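A sketch of telling admin and inference keys apart by the prefixes in the table above, and of the 402 fallback when only an admin key is attached. Prefix matching is ordered most-specific first so sk-admin- is not mistaken for a plain sk- key. The function names are hypothetical, and the error dict is illustrative: the docs only specify status 402 and the code no_inference_key.

```python
# (provider, prefix, kind), most specific prefix first so "sk-admin-" wins over "sk-".
KEY_PREFIXES = [
    ("anthropic", "sk-ant-admin", "admin"),
    ("anthropic", "sk-ant-api", "inference"),
    ("openai", "sk-admin-", "admin"),
    ("openai", "sk-proj-", "inference"),
    ("openai", "sk-", "inference"),
]


def classify_key(provider: str, key: str) -> str:
    for p, prefix, kind in KEY_PREFIXES:
        if p == provider and key.startswith(prefix):
            return kind
    return "unknown"


def pick_inference_key(provider: str, keys: list[str]):
    """Return (key, None), or (None, error) when only admin keys are attached."""
    for key in keys:
        if classify_key(provider, key) == "inference":
            return key, None
    # Hypothetical envelope; the docs only specify status 402 and code no_inference_key.
    return None, {"status": 402, "error": {"code": "no_inference_key"}}
```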
Endpoints & routing
| Path | Compat | Upstream |
|---|---|---|
| POST /v1/chat/completions | OpenAI | OpenAI (default), OpenRouter, or any provider selected via X-Obol-Provider |
| POST /v1/messages | Anthropic | Anthropic |
| POST /v1beta/models/{m}:generateContent | Gemini | Google AI |
| POST /v1beta/models/{m}:streamGenerateContent | Gemini (stream) | Google AI |
/v1/chat/completions routes to OpenAI by default. Use OpenRouter either by setting the header X-Obol-Provider: openrouter or by passing an OpenRouter-style slug like anthropic/claude-3-5-sonnet in the model field.
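The routing rules scattered across this page (explicit header, OpenRouter-style slug, OpenCode Go auto-route prefixes, OpenAI default) can be summarized as one function. The precedence order below is an assumption consolidated from this section and the OpenCode Go section, and route_chat_completions is a hypothetical name.

```python
OPENCODE_PREFIXES = ("glm-", "kimi-", "minimax-", "mimo-")


def route_chat_completions(headers: dict[str, str], body: dict) -> str:
    """Pick an upstream for POST /v1/chat/completions (header names lower-cased)."""
    forced = headers.get("x-obol-provider")
    if forced:
        return forced.strip().lower()  # an explicit header always wins
    model = body.get("model", "")
    if "/" in model:
        return "openrouter"  # OpenRouter-style slug, e.g. anthropic/claude-3-5-sonnet
    if model.startswith(OPENCODE_PREFIXES):
        return "opencode"  # OpenCode Go auto-route prefixes
    return "openai"  # default upstream
```

Checking the slash before the OpenCode prefixes means an OpenRouter slug such as a vendor-prefixed GLM model still goes to OpenRouter, and qwen or deepseek models without a header fall through to the default rather than guessing.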
Customer headers
Optional headers your app can send on every proxied request. None are required; all are length-capped server-side so a hostile client can't bloat the audit row.
| Header | Purpose |
|---|---|
| X-Obol-Provider | Force a specific provider for ambiguous endpoints (e.g. openrouter on /v1/chat/completions). |
| X-Obol-Fallback | Comma-separated provider chain for cross-provider failover. See Fallback. |
| X-Obol-User | Opaque tag for grouping requests by feature, tenant, or customer. Surfaces in Monitor → User tags and CSV exports. |
| X-Obol-Session | Opaque per-conversation tag. Same shape as X-Obol-User; intended for grouping multi-turn flows. |
| X-Obol-Cache | Set to bypass to skip the response cache for a single request — useful when you've enabled per-key caching but need a fresh answer. |
| Idempotency-Key | Deduplicate retries. Same key + same body replays the stored response without hitting upstream. See Idempotency. |
The proxy also auto-captures the User-Agent header and the true client IP (via cf-connecting-ip) on every event, with no client change needed. These power the SDK and geographic breakdowns on Monitor and are stored on Keplor as part of the audit row. Raw IPs are never displayed in the dashboard; see Security → Data.
Cross-provider fallback
If a provider is down or you don't have a connection for it, the proxy can automatically try the next provider in a fallback chain. Set the X-Obol-Fallback header to a comma-separated list of providers:
curl https://useobol.pages.dev/v1/chat/completions \
  -H "Authorization: Bearer obol_sk_live_..." \
  -H "X-Obol-Provider: deepseek" \
  -H "X-Obol-Fallback: groq,openrouter" \
  -d '...'
If DeepSeek fails (no connection, or upstream 502/503/504), the proxy tries Groq. If Groq also fails, it tries OpenRouter. When a fallback succeeds, the response includes an X-Obol-Fallback-Provider header so your app knows which provider actually served the request.
- Max 3 fallback providers per request.
- Each fallback must be in the virtual key's allowed-providers list (if configured).
- Fallback only triggers on upstream failures and missing connections — not on auth errors, rate limits, or quota violations.
- Only OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings) can fall back across providers. Anthropic /v1/messages and Google generateContent are endpoint-locked.
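A sketch of the fallback rules listed above: at most 3 fallbacks, fall through only on 502/503/504 or a missing connection, stop on anything else, and report the serving provider when a fallback wins. The send callable, NoConnection, and the all_providers_failed error body are hypothetical, and silently skipping fallbacks outside the key's allowed-providers list is an assumption (the proxy might reject the request instead).

```python
RETRYABLE = {502, 503, 504}
MAX_FALLBACKS = 3


class NoConnection(Exception):
    """Raised by the hypothetical `send` callable when no connection exists for a provider."""


def call_with_fallback(primary, fallback_header, send, allowed=None):
    """send(provider) -> (status, body). Returns (status, body, extra_response_headers)."""
    fallbacks = [p.strip() for p in fallback_header.split(",") if p.strip()][:MAX_FALLBACKS]
    if allowed is not None:
        fallbacks = [p for p in fallbacks if p in allowed]  # assumption: disallowed entries are skipped
    last_status = 502
    for i, provider in enumerate([primary, *fallbacks]):
        try:
            status, body = send(provider)
        except NoConnection:
            continue  # a missing connection moves on to the next provider
        if status in RETRYABLE:
            last_status = status
            continue
        # Success or a non-retryable error (auth, rate limit, quota): stop here.
        extra = {"X-Obol-Fallback-Provider": provider} if i > 0 and status < 400 else {}
        return status, body, extra
    return last_status, {"error": {"type": "all_providers_failed"}}, {}
```

Returning a 429 from the primary unchanged is deliberate: retrying a rate limit on a different provider would hide a quota problem the caller should see.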
Streaming
Streaming responses pass through unchanged — the first byte lands at your client as soon as the upstream produces it, so Obol adds only ~20ms of edge overhead. Token counting runs in the background on a teed copy of the stream; accounting lands in the dashboard after the stream closes.
One mutation: for OpenAI and OpenRouter we force-set stream_options.include_usage=true so the final SSE chunk includes token totals. It's the only change we ever make to your request body.
TTFT (time-to-first-token) is captured for streaming responses — the interval between the proxy forwarding the request and the first SSE data chunk arriving from the upstream provider.
Reasoning tokens: for OpenAI o1/o3/o4 and other thinking models, the proxy captures completion_tokens_details.reasoning_tokens from the response and logs them separately for accurate cost attribution.
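The two streaming behaviors above, forcing include_usage on the request and reading token totals (including reasoning tokens) from the teed SSE copy, can be sketched as follows. Patching only requests with stream set to true, and receiving the tee as already-decoded text lines, are assumptions; both function names are hypothetical.

```python
import json


def force_include_usage(body: dict) -> dict:
    """Copy of an OpenAI-style request body with stream_options.include_usage forced on."""
    if not body.get("stream"):
        return body  # assumption: only streaming requests are patched
    patched = dict(body)
    patched["stream_options"] = {**(body.get("stream_options") or {}), "include_usage": True}
    return patched


def usage_from_sse(lines):
    """Read a teed copy of the SSE stream and return token totals from the usage chunk."""
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skips blank separators and comments such as ": heartbeat"
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # with include_usage, the final chunk carries usage
    if usage is None:
        return None
    details = usage.get("completion_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "reasoning_tokens": details.get("reasoning_tokens", 0),  # logged separately
    }
```

Because the client receives the original stream untouched, this parsing can run after the fact on the copy without adding latency to the first byte.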
Cost header
Non-streaming 2xx responses include an x-obol-cost-cents header with the computed cost of that request:
x-obol-cost-cents: 0.0312
The value is derived from the pricing table using actual token counts from the upstream response. Useful for client-side spend tracking without polling the dashboard.
- Only present on successful non-streaming responses with token usage.
- Streaming responses do not include this header — cost is unknown until the stream completes.
- Value is in cents with 4 decimal places.
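The header value is token counts multiplied by per-million-token rates, converted to cents and rounded to 4 decimal places. The rates in this sketch are illustrative only, not Obol's pricing table, and the function names are hypothetical.

```python
def cost_cents(input_tokens: int, output_tokens: int,
               input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Price one request from its token counts; rates are USD per million tokens."""
    usd = (input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok) / 1_000_000
    return round(usd * 100, 4)  # cents, 4 decimal places


def cost_header(cents: float) -> dict[str, str]:
    return {"x-obol-cost-cents": f"{cents:.4f}"}


# Illustrative rates only: $0.15 / MTok input, $0.60 / MTok output.
print(cost_header(cost_cents(1_000, 500, 0.15, 0.60)))
```

On the client side, openai-python exposes response headers through client.chat.completions.with_raw_response.create(...), whose result has a .headers mapping and a .parse() method for the usual completion object.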
Timeout
The proxy enforces a configurable upstream timeout. If the provider does not respond within the window, the proxy returns 504:
{ "error": { "type": "upstream_timeout", "message": "openai did not respond within 120s." } }
Default: 120 000 ms (2 minutes). Configure via the PROXY_TIMEOUT_MS environment variable, and set it higher for o3/thinking workloads that routinely exceed 2 minutes.
Heartbeat
Long-running streaming responses (o3 extended thinking, large-context generation) can trigger intermediate proxy idle-timeout kills. The proxy injects SSE comment lines every 30 seconds of silence:
: heartbeat
Per the SSE spec, comment lines (starting with :) are ignored by compliant clients, including the OpenAI, Anthropic, and Vercel AI SDKs. This keeps the TCP connection alive through Cloudflare's 100-second idle timeout without affecting the response payload.
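A heartbeat relay can be sketched with asyncio: keep one pending read on the upstream iterator and, whenever it stays silent for idle_s seconds, emit a comment line instead. This assumes the upstream iterator yields whole SSE events (so a heartbeat can never land inside one); with_heartbeat and the helper names are hypothetical.

```python
import asyncio
from typing import AsyncIterator

HEARTBEAT = b": heartbeat\n\n"  # SSE comment line; compliant clients ignore it
_DONE = object()


async def _next_chunk(it):
    try:
        return await it.__anext__()
    except StopAsyncIteration:
        return _DONE


async def with_heartbeat(source: AsyncIterator[bytes], idle_s: float = 30.0) -> AsyncIterator[bytes]:
    """Relay upstream SSE events, emitting a heartbeat after every idle_s seconds of silence."""
    it = source.__aiter__()
    pending = asyncio.create_task(_next_chunk(it))
    try:
        while True:
            done, _ = await asyncio.wait({pending}, timeout=idle_s)
            if not done:
                yield HEARTBEAT  # upstream still silent; keep the connection warm
                continue
            chunk = pending.result()
            if chunk is _DONE:
                return
            yield chunk
            pending = asyncio.create_task(_next_chunk(it))
    finally:
        pending.cancel()  # no-op if the task already finished
```

Waiting on a persistent task with asyncio.wait, rather than wrapping each read in asyncio.wait_for, matters here: a timeout in wait_for would cancel the in-flight read and break the upstream stream.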
Idempotency
Add an Idempotency-Key header to any proxy POST to prevent duplicate charges on network retries. This follows the Stripe / OpenAI conventions.
curl https://useobol.pages.dev/v1/chat/completions \
  -H "Authorization: Bearer obol_sk_live_..." \
  -H "Idempotency-Key: my-unique-request-id-123" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'
Behaviour
| Scenario | Result |
|---|---|
| Same key + same body | Replays the stored response (no upstream call, no quota consumed) |
| Same key + different body | 409 idempotency_mismatch |
| Same key while original is in-flight | 409 idempotency_in_progress |
| Malformed key (>255 chars or invalid chars) | 400 invalid_idempotency_key |
| Upstream returned 5xx | Not stored — retries hit a fresh upstream |
| Streaming request | Supported — stream is buffered and stored on completion |
Key format
1–255 characters. Alphanumeric, dashes (-), and underscores (_) only. UUIDs work well.
TTL
Keys expire after 24 hours. After expiry the same key can be reused for a new request.
Replay headers
Replayed responses include the original upstream headers that SDKs use for debugging: x-request-id, openai-organization, openai-processing-ms, anthropic-request-id, and rate-limit headers. An additional x-obol-idempotent-replay: true header is added so your app can distinguish replays from fresh responses.
Quota
Confirmed replays do not consume RPM, daily request, or budget quota. You paid for the original — the replay is free.
Streaming note: When a streaming request is replayed, the response is returned as a single buffered body (not SSE chunks). Your SDK will receive the complete response at once rather than token-by-token. Most OpenAI/Anthropic SDKs handle this transparently.
Quotas & limits
| Limit | Free | Pro |
|---|---|---|
| Virtual keys | 1 | 50 |
| Requests / UTC day | 100 | Unlimited |
| RPM per key | 60 (capped) | 60 default, up to 600 |
| Monthly budget cap per key | — | Optional |
| Allowed-providers filter | ✓ | ✓ |
| Allowed-models glob filter | ✓ | ✓ |
When a limit is breached the proxy returns an OpenAI-shaped error envelope, so SDK error handlers that key off error.type and error.code keep working unchanged.
Security
- Your upstream provider keys are encrypted at rest with AES-256-GCM. They're decrypted in memory for the duration of a single request and never logged.
- Virtual keys are hashed with SHA-256 at rest — the raw value is unrecoverable from the DB. Rotating or revoking takes effect immediately.
- Request and response bodies are not logged. We store metadata only: model, token counts, cost, status, latency, timestamp.
- Every proxy endpoint is Bearer-auth only (no cookies), exempt from same-origin checks, and returns permissive CORS. This is safe because browsers never attach a Bearer token automatically, so a cross-site request can't use a virtual key it doesn't already hold.
Powered by Keplor
Under the hood, every proxy request is streamed to Keplor, an open-source LLM log aggregator written in Rust. Keplor handles:
- Automatic cost computation from a 2,263-model pricing catalog (LiteLLM). You send tokens — Keplor returns cost. Separate cache_read and cache_creation rates are applied where available (e.g. Anthropic prompt caching) for accurate cached-input pricing.
- Daily rollups refreshed every 60 seconds, so the proxy dashboard loads in milliseconds even with millions of events.
- Real-time quota checks for budget enforcement: Obol queries Keplor's GET /v1/quota on every request to enforce monthly spend caps.
- Event storage with zstd compression and content-hash deduplication. Single static binary under 10 MB.
Keplor is provider-agnostic — it works with any LLM gateway, not just Obol. See the Keplor docs for the full API reference.
Optimization Engine Pro
Scans your last 30 days of usage and produces actionable cost-saving recommendations. Every recommendation must project at least $1/month in savings to appear — no noise.
Dynamic swap engine
Instead of hardcoded rules, Obol scans all 342 models in its pricing table against your actual usage. For every model you use, it finds all cheaper alternatives in the same model family (Claude, GPT, Gemini, Grok, Mistral, etc.) on the same provider.
Models are classified into families and tiers (flagship, standard, lite, mini, nano). Swaps are limited to two tier steps down at most — Opus can suggest Sonnet or Haiku, but never a nano model. Confidence is based on the tier gap:
- High — same tier (newer or cheaper variant)
- Medium — one tier step down
- Low — two tier steps down
Savings are computed by re-pricing your actual token volumes (input, output, cached) with the suggested model's rates and scaling to a 30-day month. The highest-savings match per model wins. Provider-aware — an OpenRouter user only sees OpenRouter alternatives.
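A sketch of the swap search: re-price the observed token volumes on every same-family, same-provider candidate within two tier steps down, scale to 30 days, keep the biggest saving above $1, and map the tier gap to a confidence level. The Model fields, function names, and the prices in the usage example are illustrative assumptions, not Obol's pricing table.

```python
from dataclasses import dataclass

TIERS = ["flagship", "standard", "lite", "mini", "nano"]
CONFIDENCE = {0: "high", 1: "medium", 2: "low"}  # keyed by tier steps down


@dataclass
class Model:
    name: str
    provider: str
    family: str
    tier: str
    input_per_mtok: float  # USD per million tokens
    output_per_mtok: float
    cached_per_mtok: float


def window_cost(m: Model, inp: int, out: int, cached: int) -> float:
    return (inp * m.input_per_mtok + out * m.output_per_mtok + cached * m.cached_per_mtok) / 1e6


def best_swap(current: Model, catalog: list[Model], inp: int, out: int, cached: int,
              days_observed: int, min_monthly_savings: float = 1.0):
    """Return (model_name, monthly_savings, confidence) for the best swap, or None."""
    scale = 30 / days_observed  # scale the observed window to a 30-day month
    base = window_cost(current, inp, out, cached) * scale
    best = None
    for cand in catalog:
        if cand.name == current.name or (cand.family, cand.provider) != (current.family, current.provider):
            continue  # same family, same provider only
        steps_down = TIERS.index(cand.tier) - TIERS.index(current.tier)
        if steps_down not in CONFIDENCE:
            continue  # never more than two tier steps down (and never a step up)
        savings = base - window_cost(cand, inp, out, cached) * scale
        if savings >= min_monthly_savings and (best is None or savings > best[1]):
            best = (cand.name, savings, CONFIDENCE[steps_down])
    return best


# Illustrative catalog and prices only.
catalog = [
    Model("opus", "anthropic", "claude", "flagship", 15.0, 75.0, 1.5),
    Model("sonnet", "anthropic", "claude", "standard", 3.0, 15.0, 0.3),
    Model("haiku", "anthropic", "claude", "lite", 1.0, 5.0, 0.1),
    Model("tiny", "anthropic", "claude", "nano", 0.1, 0.4, 0.01),
]
print(best_swap(catalog[0], catalog, 10_000_000, 1_000_000, 0, days_observed=30))
```

The hypothetical nano model would save the most, but it is four tier steps below the flagship, so it is never suggested.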
Each swap card carries a per-day what-if sparkline beneath the headline estimate — actual cost on the current model vs. what the same per-day traffic would have cost on the alternative, with totals + delta. Real history, not extrapolation; useful for confirming the estimate matches your actual variability before committing to a switch.
Cross-provider arbitrage
The same model can have different pricing across providers. Obol normalizes model slugs across all 9 providers (stripping vendor prefixes, date suffixes, and version format differences) to find the same underlying model available at a lower price on another provider you're already connected to.
Example: Claude Sonnet via OpenRouter might cost more than direct Anthropic for your usage pattern. Obol spots the delta and shows the exact monthly savings. These recommendations always carry high confidence — same model, same quality, just cheaper routing. The per-day what-if sparkline (see Dynamic swap engine) renders on these cards too, so you can see the historical delta in addition to the projected one.
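Slug normalization is what lets the same model be matched across providers. The regexes below are an illustrative guess at the three rules named above (vendor prefix, date suffix, dotted versions), and the prices in the example are made up; normalize_slug and cheaper_route are hypothetical names.

```python
import re

DATE_SUFFIX = re.compile(r"[-@](?:\d{8}|\d{4}-\d{2}-\d{2})$")
DOTTED_VERSION = re.compile(r"(?<=\d)\.(?=\d)")


def normalize_slug(slug: str) -> str:
    s = slug.lower().rsplit("/", 1)[-1]  # drop vendor prefix: "anthropic/claude-3.5-sonnet"
    s = DATE_SUFFIX.sub("", s)           # drop date suffix: "-20241022", "-2024-08-06"
    return DOTTED_VERSION.sub("-", s)    # unify versions: "3.5" -> "3-5"


def cheaper_route(provider, slug, tokens_in, tokens_out, prices, connected):
    """prices: {(provider, slug): (usd_in_per_mtok, usd_out_per_mtok)}.

    Returns (cheaper_provider, savings_usd) or None.
    """
    def cost(rates):
        return (tokens_in * rates[0] + tokens_out * rates[1]) / 1e6

    target = normalize_slug(slug)
    current = cost(prices[(provider, slug)])
    options = [
        (prov, cost(rates))
        for (prov, s), rates in prices.items()
        if prov != provider and prov in connected and normalize_slug(s) == target
    ]
    if not options:
        return None
    best_provider, best_cost = min(options, key=lambda o: o[1])
    return (best_provider, current - best_cost) if best_cost < current else None
```

Only providers you are already connected to are considered, matching the "another provider you're already connected to" rule.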
Cache opportunity
For models that support prompt caching (Anthropic, some OpenAI models), Obol checks three conditions:
- Current cache hit rate < 10%
- Average input tokens > 2,000 (enough context to benefit from caching)
- At least 7 days of data (avoids noise from small samples)
Savings are computed using the actual pricing delta between the model's regular input rate and its cache-read rate (e.g. Anthropic charges 10% of input rate for cached tokens, Google charges 25%). Obol projects what you'd save at an 80% cache hit rate.
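In formula terms, moving from the current hit rate h to 80% turns that extra share of input tokens from the regular rate into the cache-read rate, so savings ≈ input tokens × (0.8 − h) × (input rate − cache-read rate). The sketch applies the three gating conditions first. It ignores any cache-write surcharge (a simplifying assumption), and the function name and example rates are hypothetical.

```python
def cache_opportunity_savings(total_input_tokens: int, requests: int, current_hit_rate: float,
                              days_of_data: int, input_per_mtok: float,
                              cache_read_per_mtok: float, target_hit_rate: float = 0.8):
    """Projected monthly USD savings from prompt caching, or None if the model doesn't qualify."""
    if current_hit_rate >= 0.10 or days_of_data < 7 or requests == 0:
        return None
    if total_input_tokens / requests <= 2_000:
        return None  # not enough context per request to benefit from caching
    per_token_delta = (input_per_mtok - cache_read_per_mtok) / 1e6
    window_savings = total_input_tokens * (target_hit_rate - current_hit_rate) * per_token_delta
    return window_savings * 30 / days_of_data  # scale to a 30-day month


# Illustrative: cache reads at 10% of a $3/MTok input rate, 100M input tokens in 30 days.
print(cache_opportunity_savings(100_000_000, 20_000, 0.0, 30, 3.0, 0.30))
```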
Efficiency insights
Beyond model choice, Obol analyzes token patterns to surface waste in your application:
- Input bloat — flags models where the input-to-output ratio exceeds 20:1. Often signals oversized system prompts or unnecessary context.
- Cache opportunity — quantifies the exact dollar amount you'd save by enabling prompt caching, based on your actual input volumes and the model's pricing rates.
- High cost per request — highlights models in the top 25% of your per-request cost, which may indicate mismatched model selection.
Each model gets a 0–100 efficiency score — a weighted composite of cost-per-request rank (50%), cache utilization (25%), and input/output balance (25%). Scores are relative to your own model set, not absolute. Only models with 7+ active days and 50+ requests are included.
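A sketch of the weighted composite. The weights (50/25/25) and the eligibility rule come from the text above; how each component is normalized is a hypothetical choice: cost-per-request rank scaled from 1.0 (cheapest) to 0.0, cache hit rate used directly, and an input/output balance that starts falling past the 20:1 input-bloat threshold.

```python
def efficiency_scores(models: list[dict]) -> dict[str, int]:
    """models: dicts with name, cost, requests, active_days, input_tokens, output_tokens, cache_hit_rate."""
    eligible = [m for m in models if m["active_days"] >= 7 and m["requests"] >= 50]
    ranked = sorted(eligible, key=lambda m: m["cost"] / m["requests"])  # cheapest per request first
    n = len(ranked)
    scores = {}
    for rank, m in enumerate(ranked):
        cost_part = 1.0 if n == 1 else 1 - rank / (n - 1)  # relative to your own model set
        cache_part = min(max(m["cache_hit_rate"], 0.0), 1.0)
        ratio = m["input_tokens"] / max(m["output_tokens"], 1)
        balance_part = 1.0 if ratio <= 20 else 20 / ratio  # 20:1 mirrors the input-bloat flag
        scores[m["name"]] = round(100 * (0.5 * cost_part + 0.25 * cache_part + 0.25 * balance_part))
    return scores
```

Because the cost component is a rank, adding or removing a model can shift everyone else's score, which is what "relative to your own model set" implies.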
What-if playground
An interactive cost simulator. Pick any model you currently use, select a target model to switch to, and see the projected cost impact — current monthly spend, projected spend, savings amount and percentage — all computed instantly in the browser using your actual token volumes.
Toggle cross-provider mode to compare models across all your connected providers. Useful for planning optimizations before committing.
Trend detection
Compares the last 7 days against the previous 7 days per model. A model is flagged when spend grows > 50% week-over-week and the projected monthly delta exceeds $1. This catches runaway costs before they hit your budget alert — an early warning with intentionally low confidence to surface potential problems without causing alarm on normal traffic spikes.
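A sketch of the week-over-week check for one model. The 50% growth and $1 thresholds come from the text; scaling the weekly difference by 30/7 to get the projected monthly delta, and skipping models with no spend in the previous week, are assumptions. trend_flag is a hypothetical name.

```python
def trend_flag(daily_spend: list[float]):
    """daily_spend: one model's daily USD spend, oldest first.

    Returns the projected monthly delta when the model is flagged, else None.
    """
    if len(daily_spend) < 14:
        return None  # need two full weeks to compare
    last7 = sum(daily_spend[-7:])
    prev7 = sum(daily_spend[-14:-7])
    if prev7 <= 0:
        return None  # assumption: no baseline week, no growth figure
    growth = (last7 - prev7) / prev7
    monthly_delta = (last7 - prev7) / 7 * 30  # weekly difference scaled to a 30-day month
    if growth > 0.5 and monthly_delta > 1.0:
        return monthly_delta
    return None


# Spend doubling from $1/day to $2/day.
print(trend_flag([1.0] * 7 + [2.0] * 7))
```

The dollar floor is what keeps a cheap model that triples from a few cents a day out of the list.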
Wall of shame
A ranked table of your 10 most expensive (model, date) pairs by absolute cost — not per-request cost, but total spend on that model that day. Useful for identifying which single days burned the most and whether any model is disproportionately driving your bill.
Personality
Obol adds a human layer on top of raw cost metrics. Mood indicators, daily roasts, and relatable spend comparisons turn your dashboard into something you actually want to check — not just another monitoring page. All personality features are Pro.
Personality modes
Three modes, switchable in Settings:
- Professional — clean metric labels and dry observations. No humor.
- Casual — ASCII faces, light sarcasm, and gentle nudges when you're over budget.
- Unhinged — aggressive roasting, Hindi-English slang, and zero sympathy for your wallet.
Mode affects the mood meter, roasts, and badge flavor text. It does not affect underlying data, alerts, or cost calculations.
Mood meter
Your current-month spend as a percentage of budget drives a severity tier:
| Budget % | Severity | Animation |
|---|---|---|
| 0–25% | Chill | Slow breathing pulse |
| 25–50% | Warm | Gentle horizontal sway |
| 50–75% | Hot | Fast throbbing pulse |
| 75–100% | Fire | Rapid shake |
| > 100% | Meltdown | Glitchy jitter + hue shift |
Each mode maps to a different ASCII face at each tier: Professional uses geometric symbols, Casual uses expressive faces like (O_O), and Unhinged adds sarcastic commentary. All animations respect prefers-reduced-motion.
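The tier table maps cleanly to a threshold lookup. The table's ranges share their endpoints (25%, 50%, 75%), so placing each boundary value in the calmer tier is an assumption; mood is a hypothetical function name.

```python
TIERS = [(25, "Chill"), (50, "Warm"), (75, "Hot"), (100, "Fire")]


def mood(spend: float, budget: float) -> str:
    """Map month-to-date spend against the budget to a severity tier."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    pct = spend / budget * 100
    for upper, name in TIERS:
        if pct <= upper:  # assumption: a boundary value stays in the calmer tier
            return name
    return "Meltdown"  # strictly over 100%


print(mood(62.0, 100.0))
```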
Daily roasts
Context-aware one-liners generated from your current state — spend percentage, top model, days into the billing period. Each personality mode has its own pool of roast functions. The system evaluates all functions against your context, collects the ones that match, and picks one via a seeded random selection so you see a consistent roast per page load.
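The evaluate-filter-pick flow can be sketched with random.Random seeded by a string, which is deterministic across runs. The seed format and the roast functions in the example are hypothetical; the docs don't say what the seed is derived from.

```python
import random
from typing import Callable, Optional

Roast = Callable[[dict], Optional[str]]


def pick_roast(roasts: list[Roast], ctx: dict, seed: str) -> Optional[str]:
    """Evaluate every roast against ctx, keep the matches, then pick one deterministically."""
    matching = [line for fn in roasts if (line := fn(ctx)) is not None]
    if not matching:
        return None
    return random.Random(seed).choice(matching)  # same seed -> same roast


# Hypothetical roast functions for Casual mode.
def over_budget(ctx):
    return "Your budget called. It's in a coma" if ctx["budget_pct"] > 100 else None


def opus_habit(ctx):
    return "Opus for this? Bold." if "opus" in ctx["top_model"] else None


ctx = {"budget_pct": 120, "top_model": "claude-opus-4"}
print(pick_roast([over_budget, opus_habit], ctx, seed="user-42:2024-05-02"))
```

Using a private Random instance instead of the module-level functions keeps the choice reproducible without disturbing any other randomness in the process.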
Examples: Casual mode might tell you "Your budget called. It's in a coma" when you're over 100%. Unhinged mode might say "Opus 4 to format a JSON? Aukaat dekho bhai" (roughly "know your place, bro") if you're running expensive models for simple tasks.
Spend comparisons
Converts your monthly spend into relatable items. The item pool is timezone-aware:
- India (Asia/Kolkata) — cups of chai, samosas, auto rickshaw rides, vada pavs, Jio recharges
- Everywhere else — coffees, Netflix months, Spotify months, GitHub Copilot months, tacos, ramen bowls
"That's 210 samosas" hits differently than "$42.00".
Achievements
Eight unlockable badges evaluated against your current-month metrics:
| Badge | Condition |
|---|---|
| First Blood | Single day over $10 spend |
| Penny Pincher | 30 consecutive days under budget |
| Cache Lord | 50%+ prompt cache hit rate |
| The Collector | 3+ providers connected |
| The Diet Works | Month-over-month spend down 20%+ |
| The $100 Club | $100+ spend in a single day |
| Model Tourist | 5+ unique models used in a month |
| Zen Master | $0.00 spend day with active connections |
Badges are re-evaluated on every visit to the Fun page. The Penny Pincher badge tracks your budget streak via a daily cron — each day you stay under your monthly budget threshold, the streak increments.