Obol handbook

How to connect your AI providers, read your spend dashboard, and use the LLM proxy.

Introduction

Obol tracks what you spend on Large Language Models across OpenAI, Anthropic, OpenRouter, Google, Mistral, Groq, xAI, DeepSeek, Cohere, and OpenCode Go. Providers fall into three categories depending on how usage data is collected:

  • Sync providers — Obol polls the provider's usage API hourly and builds a historical dashboard automatically. Works with OpenAI, Anthropic, and OpenRouter.
  • CSV upload — Google AI has no public usage API. You can import history via CSV upload, or route traffic through the proxy for live tracking.
  • Proxy-only providers — Mistral, Groq, xAI, DeepSeek, Cohere, and OpenCode Go are tracked per-request as traffic flows through the Obol LLM proxy. Some (xAI, Cohere) have partial billing APIs — Obol may add sync support in the future. OpenCode Go is a flat-rate plan, so per-call cost displays as $0 by design — see the OpenCode Go section.

You can use sync, the proxy, or both. Most users pair them: sync for historical visibility, proxy for zero-lag observability on hot traffic.

Connections

A connection is one provider API key that Obol uses to pull usage data on your behalf. Each provider has a specific key type it accepts — using the wrong kind is the most common reason a connection fails.

Provider | Aggregate usage API | Obol tracking | Key type
OpenAI | Yes | Auto-sync + proxy | Admin key (sync) + separate inference key (proxy)
Anthropic | Yes | Auto-sync + proxy | Admin key sk-ant-admin... (sync) + API key (proxy)
OpenRouter | Yes | Auto-sync + proxy | Same key for both
Google (Gemini) | Partial | CSV upload + proxy | Gemini API key
Mistral | No | Proxy only | API key
Groq | No | Proxy only | API key
xAI (Grok) | Available | Proxy only (sync planned) | API key
DeepSeek | No | Proxy only | API key
Cohere | Partial | Proxy only | API key

Anthropic

Auto-sync

Requires an Admin API key — format: sk-ant-admin01-…

Important: Admin keys are only available if you have an Organization in the Anthropic Console, and only members with the admin role can create them. Individual (personal) accounts can't generate admin keys. Creating an organization is free and takes under a minute.

  1. Go to console.anthropic.com and sign in.
  2. Open Settings → Organization. If you don't have one, create it — free, instant.
  3. In the left sidebar, click Admin Keys.
  4. Click Create Admin Key, name it, copy the key. It is shown only once.
  5. In Obol → Connections → Add connection → Anthropic, paste and save.

CSV fallback

Can't create an admin key? console.anthropic.com → Usage → Export CSV, then upload in Obol.

What syncs: per-model daily spend, input tokens, output tokens, cache-read tokens, cache-creation tokens.

For the proxy: admin keys cannot make inference calls. If you want to use the LLM proxy, you'll also need a regular sk-ant-api-… key — attach it on the connection card. See Admin vs inference keys.

OpenAI

Auto-sync

Requires an Admin key — format: sk-admin-…

Important: OpenAI splits keys into admin keys (for org-level management endpoints including usage) and project keys (for inference). Project keys return 403 on /organization/usage. You need an admin key for sync.

  1. Go to platform.openai.com → Settings → API keys.
  2. In the key type selector, choose Admin (not "Project").
  3. Create the key, copy it, paste into Obol.

What syncs: completions, embeddings, images, audio — all usage types fetched in parallel, per-model daily.

For the proxy: admin keys can't make chat completions. Attach a separate sk-… or sk-proj-… inference key. See Admin vs inference keys.

OpenRouter

Auto-sync

Requires a Management key — format: sk-or-v1-…

Important: OpenRouter has two key types — inference keys (for API calls) and management keys (for admin/usage). Management keys can't make completions, and inference keys can't read usage. For sync, use a management key.

  1. Go to openrouter.ai/settings/management-keys.
  2. Click Create new key, copy, paste into Obol.

Limitation

OpenRouter's activity API only exposes the last 30 completed UTC days. Older data is not available via auto-sync.

CSV fallback

openrouter.ai/activity → Export CSV, then upload in Obol.

What syncs: per-model daily spend, prompt tokens, completion tokens, reasoning tokens, request counts.

For the proxy: you can attach a separate inference key if you want to isolate proxy traffic, or leave it blank and the management key falls back automatically — OpenRouter accepts either for inference.

Google AI

CSV only · proxy recommended

Any valid Google AI Studio API key. No public usage API exists — data must come from CSV upload, or the proxy.

Google AI Studio shows usage in its UI but exposes no public usage endpoint that Obol can poll. You can upload CSVs manually or, strongly recommended, route requests through the Obol LLM proxy so every call is metered as it happens.

  1. Go to aistudio.google.com/app/apikey and copy any valid key.
  2. In Obol → Connections → Add connection → Google AI, paste the key. This validates the key and unlocks CSV upload — it does not start an auto-sync.
  3. To import history: in Google AI Studio, open the usage/logs panel and export as CSV, then upload via the connection card's Upload CSV button.

For live tracking: switch your Gemini SDK to point at Obol (see Proxy quick start). Every call will flow through Obol and show up in your dashboard in seconds.

Proxy-only providers

Proxy only

Covers Mistral, Groq, xAI, DeepSeek, and Cohere.

These providers are currently tracked through the proxy only. Every call is metered and lands in your dashboard in seconds. OpenCode Go is also proxy-only but has its own section below because it's a flat-rate gateway with a fixed model list.

Provider | Per-response usage | Aggregate API | Notes
Mistral | Yes | No | Dashboard only at console.mistral.ai
Groq | Yes | No | Open feature request on their community forum
xAI (Grok) | Yes | Available | Management API at management-api.x.ai; sync support planned
DeepSeek | Yes | No | Dashboard only, no programmatic API
Cohere | Yes | Partial | Per-response billed_units metadata, no aggregate endpoint

  1. In Obol, go to Connections → Add connection and select the provider.
  2. Paste your provider API key (the same key you'd use for inference).
  3. Create a virtual key on the Proxy page.
  4. Point your app at https://useobol.pages.dev/v1/chat/completions and set the header X-Obol-Provider: <provider> (e.g. X-Obol-Provider: mistral).

Important: Because there is no usage API, the only way to capture spend for these providers is through the proxy. Direct calls to the provider bypass Obol entirely and will not appear in your dashboard.

OpenCode Go

Proxy only · Flat rate

OpenCode Go is a $10/month flat-rate gateway from sst/opencode that resells curated open-source coding models — GLM, Kimi, MiniMax, MiMo, plus a few others — over an OpenAI-compatible API at https://opencode.ai/zen/go/v1.

Cost on Obol displays as $0. That is accurate: Go is a flat-rate plan with usage caps ($12 per 5 hours, $30 per week, $60 per month, measured in dollar value of upstream model time) rather than a per-token bill. Obol still records latency, errors, and quota; only the dollar column is inert. Use OpenCode's own dashboard for the flat-rate consumption view.

Models

Pulled from GET https://opencode.ai/zen/go/v1/models (public, no auth needed). 14 models are live today; OpenCode adds more periodically. A new model can be proxied immediately, but it is recorded as unpriced_model until the supplement is updated.

Family | Models | Auto-route prefix
GLM (Zhipu AI) | glm-5.1, glm-5 | glm-
Kimi (Moonshot) | kimi-k2.6, kimi-k2.5 | kimi-
MiniMax | minimax-m2.7, minimax-m2.5 | minimax-
MiMo | mimo-v2-pro, mimo-v2-omni, mimo-v2.5, mimo-v2.5-pro | mimo-
Qwen (Alibaba) | qwen3.5-plus, qwen3.6-plus | none (header required)
DeepSeek (via OpenCode) | deepseek-v4-pro, deepseek-v4-flash | none (header required)

Models matching an auto-route prefix are routed to OpenCode Go without an X-Obol-Provider header. Qwen and DeepSeek prefixes are deliberately excluded because they collide with Groq / OpenRouter / native DeepSeek, so those families need an explicit X-Obol-Provider: opencode header to disambiguate.

Setup

  1. Sign up for OpenCode Go at opencode.ai/go and copy your API key from the OpenCode Zen dashboard.
  2. In Obol, open Connections → Add connection, choose OpenCode Go, paste the key. The test step validates by calling GET /zen/go/v1/models.
  3. Mint a virtual key on the Proxy page — optionally set allowed_providers=["opencode"] so this key is dedicated to OpenCode traffic.
  4. Point your client at https://useobol.pages.dev/v1/chat/completions using your Obol virtual key. Set X-Obol-Provider: opencode (or rely on auto-route for glm-/kimi-/minimax-/mimo- models).

Test request

curl https://useobol.pages.dev/v1/chat/completions \
  -H "Authorization: Bearer obol_sk_live_..." \
  -H "X-Obol-Provider: opencode" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-5.1","messages":[{"role":"user","content":"hi"}],"max_tokens":10}'

OpenCode CLI integration

If you use the OpenCode CLI itself, point it at Obol via its config:

// ~/.config/opencode/config.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "opencode": {
      "options": { "baseURL": "https://useobol.pages.dev/v1" }
    }
  }
}

Then export your Obol virtual key (not the OpenCode key) as OPENCODE_API_KEY and run opencode. OpenCode CLI doesn't support custom HTTP headers declaratively, so for per-feature attribution mint one virtual key per project — Obol surfaces them by proxy_key_id on the Monitor dashboard.

Dashboard

The Overview page is your main spend dashboard. Each panel below is one row on that page.

Summary cards

The four stat cards at the top give you a quick pulse check.

Card | What it shows
This Month | Total spend since the 1st of the current month. The percentage below compares to the same period last month.
Today | Spend since UTC midnight. Resets daily.
Top Model | The single most expensive model across all providers in the current window.
Connections | Count of provider integrations set up. Click through to manage them.

Spend over time

Stacked area chart — each colored band is one provider, so the total height is your combined daily spend. Hover any day for a per-provider breakdown plus the total. Window is 30 days on Free, up to 90 on Pro. The x-axis starts from whichever is earlier: your oldest connection or your oldest recorded spend.

By provider

Horizontal bars ranking providers by spend share over the window. Zero-spend providers are hidden.

Model breakdown

Three tabs — Spend, Requests, Tokens — each ranking every model that produced usage in the window. Switch tabs to find your most-requested model vs your most expensive one — they're often not the same.

Two extra columns — σ (standard deviation of per-request cost) and p95/p50 spike ratio — quantify how variable a model's per-request cost is. A spike ratio ≥ 5× (rendered amber) means a small number of outliers drive most of that model's spend; investigate or cap. Variance is computed from per-request events and only shown when at least 5 events exist for the model.
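
As a rough illustration of the two variance columns, the sketch below computes σ and the p95/p50 ratio from a list of per-request costs. The function names, the nearest-rank percentile method, and the use of population standard deviation are assumptions made for the example; Obol's exact method is not specified here.

```python
import math
import statistics

MIN_EVENTS = 5         # variance is only shown with at least 5 events
SPIKE_THRESHOLD = 5.0  # ratios at or above this render amber


def percentile(sorted_values, q):
    """Nearest-rank percentile (an assumed method) on a sorted list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def cost_variance(costs):
    """Return (sigma, spike_ratio, is_spiky), or None with too few events."""
    if len(costs) < MIN_EVENTS:
        return None
    ordered = sorted(costs)
    sigma = statistics.pstdev(ordered)
    p50 = percentile(ordered, 50)
    p95 = percentile(ordered, 95)
    ratio = p95 / p50 if p50 > 0 else float("inf")
    return sigma, ratio, ratio >= SPIKE_THRESHOLD
```

Nine requests at $0.01 plus one at $0.10 give a ratio of about 10×, which would be flagged; ten identical requests give a ratio of 1×.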

Forecast (Pro)

Projects the current month's spend to end-of-month using a linear trend over recent daily data. If your usage is growing, the forecast will sit above a naive extrapolation of today's rate.
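
One way to picture a linear-trend projection is an ordinary least-squares line through the month's daily spend, summed over the remaining days. This is a sketch under that assumption; the function name, the choice of fitting window, and the clamp at zero are illustrative, not Obol's documented algorithm.

```python
def forecast_month_end(daily_spend, days_in_month):
    """Project the month-end total from a least-squares line through daily spend.

    daily_spend[i] is the spend on day i + 1 of the month so far.
    """
    n = len(daily_spend)
    spent = sum(daily_spend)
    if n < 2:
        # Not enough points for a trend; fall back to a flat daily rate.
        rate = daily_spend[0] if n else 0.0
        return spent + rate * (days_in_month - n)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = spent / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_spend))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    # Sum the fitted line over the remaining days, never projecting below zero.
    remaining = sum(
        max(0.0, intercept + slope * day) for day in range(n + 1, days_in_month + 1)
    )
    return spent + remaining
```

With spend of $1, $2, $3, $4, $5 over the first five days of a ten-day window, the trend projects $55, while holding today's $5 rate flat would give $40. That gap is why a growing account sees a forecast above the naive extrapolation.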

Cache stats (Pro)

Your Anthropic prompt-cache hit rate over the window: cache read tokens ÷ total input tokens. A higher hit rate means more input is being served from cache at a fraction of the normal cost.

Alerts

Set a monthly budget in Settings → Alerts. Obol emails you once when spend crosses the threshold and shows a banner on the Overview page while you're over. The alert resets on the 1st of each month.

Monitor (Pro)

Per-event observability for the LLM proxy. While the main dashboard shows what you spent, Monitor shows how requests behaved — latency percentiles, error taxonomy, throughput, time-of-day patterns, and per-SDK / per-IP-network breakdowns. Pulled live from Keplor's per-event log on every page load (no caching), so it reflects your last 24 hours of traffic in real time.

Monitor reads up to 1000 events from the last 24h plus 5000 events across the last 7 days for the heatmap. Sample size is shown at the top of the page, with an amber warning when the sample is clipped.

Summary + throughput

Top row of stat cards — total requests, error rate, wasted spend (cost on 4xx / 5xx / aborted requests), p50 / p95 time-to-first-token, stream-abort rate. Below that, a per-minute throughput sparkline across the 24h window. Useful for spotting traffic spikes or sudden drops at a glance.

Latency by provider & model

One row per (provider, model, mode) — streaming and non-streaming get separate rows, since TTFT means different things in each (first chunk arriving vs. full response complete). Columns: p50 / p95 / p99 TTFT, p50 / p95 total latency, and average output throughput in tok/s. Surfaces "Groq is 3× faster than Anthropic" as a number.

Cost variance is handled separately: the /models dashboard has σ and p95/p50 spike-ratio columns, which distinguish metronomic models from spiky ones where a few outliers dominate spend.

Error taxonomy

Errors grouped by Keplor's flattened error discriminator — actionable buckets like rate_limited, auth_failed, context_length_exceeded, content_filtered, upstream_timeout, upstream_unavailable, invalid_request. Each row shows total count plus per-provider breakdown, so "13 context_length_exceeded on Anthropic" is a single line you can act on.

Latency by prompt size

Buckets requests by input-token count (0–500, 500–2K, 2K–8K, 8K+) and shows p50 / p95 TTFT and total latency for each. Tells you whether a model's time-to-first-token scales linearly with context length, or flatlines past a certain prompt size — critical when picking a model for long-context workloads.

User tags + peak concurrency

When your client tags requests with the X-Obol-User header (e.g. one tag per downstream feature, customer, or tenant), Monitor shows the top 10 tags by request count with their per-tag spend, error rate, and p95 TTFT. Hidden when no events in the window have a user tag.

A peak concurrency stat shows the highest number of in-flight requests in the window plus the timestamp it occurred — computed via sweep-line over (timestamp, latency_total_ms) intervals. Reveals burst patterns invisible in averaged metrics.
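
A sweep-line pass of this kind can be sketched as follows. The example assumes each event's timestamp marks the start of the request and that a request ending at the same millisecond another begins does not overlap it; both are assumptions, as is the function name.

```python
def peak_concurrency(events):
    """events: iterable of (start_ms, latency_total_ms) pairs.

    Returns (peak_in_flight, timestamp_ms_of_peak); (0, None) when empty.
    """
    points = []
    for start_ms, latency_ms in events:
        points.append((start_ms, 1))                # request begins
        points.append((start_ms + latency_ms, -1))  # request ends
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a request that ends frees its slot before the next one starts.
    points.sort()
    in_flight = peak = 0
    peak_at = None
    for timestamp, delta in points:
        in_flight += delta
        if in_flight > peak:
            peak, peak_at = in_flight, timestamp
    return peak, peak_at
```

Sorting dominates the cost, so the pass is O(n log n) in the number of events, which is comfortable for a sample capped at a few thousand rows.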

Reasoning-token usage gets its own card pair when you call o1, o3, or Claude-thinking models — total reasoning tokens, share of output tokens, and a prorated cost estimate. Hidden on accounts that don't use thinking models.

Time-of-day heatmap

A 7-day × 24-hour grid where each cell is colored by request volume (or error rate, via the toggle). Reveals "Thursday 14:00 UTC is always our peak" or "we have a stuck retry loop running every Saturday at 03:00". Hover any cell for exact counts. Times are in UTC; the cell tooltip shows the absolute hour.

Provider failure correlation

Pearson correlation matrix of per-provider error rates across 5-minute windows. High amber cells (≥0.7) mean two providers tend to fail in the same windows, which usually points to a shared-infrastructure outage (your network, Cloudflare, or a region-wide incident). Near-zero cells mean the providers fail independently, so the issue is single-provider, such as a key exhausting its rate limit.

Distinguishes "Anthropic AND Mistral both 5xx'd at 14:30" from "key X exhausted on Anthropic only" — answers a question the per-provider error count alone can't. Hidden when fewer than 2 providers are active in the window.

SDK + geographic breakdowns

SDK breakdown: coarse identifier derived from the caller's User-Agent header (openai-python, anthropic-sdk-typescript, curl, browser, etc.). Long tail collapses into other; raw User-Agent strings are never displayed.

Geographic spread: top 10 source-IP buckets at /24 for IPv4 and /64 for IPv6 — raw IPs are never rendered. Useful for spotting traffic concentration (one /24 doing 95% = single backend service) or anomalies (a single /24 with 500 errors and 0 successes = misconfigured client or attacker).
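
The bucketing itself is easy to reproduce with Python's standard ipaddress module. The sketch below shows how a raw address collapses to its /24 or /64 network; the function name is hypothetical and the example addresses come from the reserved documentation ranges.

```python
import ipaddress


def ip_bucket(raw_ip):
    """Collapse an address to its /24 (IPv4) or /64 (IPv6) network string."""
    addr = ipaddress.ip_address(raw_ip)
    prefix = 24 if addr.version == 4 else 64
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network)
```

For instance, 203.0.113.77 and 203.0.113.200 both map to 203.0.113.0/24, so they count as one source in the breakdown.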

Both sections only populate from requests that arrive after Obol started forwarding User-Agent / client IP. They hide automatically on accounts with no recorded events of either kind.

LLM Proxy

Point your apps at Obol's edge instead of the provider directly. Every request is metered in real time: no sync polling lag, no admin keys required for tracking, and Google Gemini usage (which has no public usage API) is captured as well.

Base URL

https://useobol.pages.dev

Quick start

  1. Attach an inference key to each provider you want to proxy (see Admin vs inference keys).
  2. Go to Proxy → Create virtual key. The raw obol_sk_live_… token is shown exactly once — copy it now.
  3. Point your SDK at Obol, using the virtual key as the API key.
  4. Open Proxy → Recent requests to watch calls in real time.

OpenAI (Python)

from openai import OpenAI
client = OpenAI(
    base_url="https://useobol.pages.dev/v1",
    api_key="obol_sk_live_…",
)
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role":"user","content":"hi"}],
)

Anthropic (Python)

from anthropic import Anthropic
client = Anthropic(
    base_url="https://useobol.pages.dev",
    api_key="obol_sk_live_…",
)
client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=50,
    messages=[{"role":"user","content":"hi"}],
)

Gemini (curl)

curl "https://useobol.pages.dev/v1beta/models/gemini-2.0-flash:generateContent?key=obol_sk_live_…" \
     -H "content-type: application/json" \
     -d '{"contents":[{"parts":[{"text":"hi"}]}]}'

Virtual keys

Each virtual key is the token your app presents to the proxy. They're distinct from your provider API keys and from the Personal Access Tokens used for the desktop widget.

  • Format: obol_sk_live_<64 hex>.
  • Hashed at rest (SHA-256). Only the prefix is stored unhashed for UI display.
  • Every key can set its own RPM cap, daily request cap, monthly budget cap (Pro), allowed providers, and allowed-models pattern.
  • Revoking a key takes effect on the next proxy call — no caching.
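
To make the storage model concrete, the following sketch mints a token in the documented format and keeps only its SHA-256 digest plus a short display prefix. The function names and the length of the displayed prefix are assumptions; this is not Obol's actual implementation.

```python
import hashlib
import secrets

PREFIX = "obol_sk_live_"


def mint_virtual_key():
    """Return (raw_token, sha256_digest, display_prefix)."""
    raw = PREFIX + secrets.token_hex(32)       # 32 random bytes -> 64 hex chars
    digest = hashlib.sha256(raw.encode()).hexdigest()
    display_prefix = raw[: len(PREFIX) + 6]    # assumed length for UI display
    return raw, digest, display_prefix


def verify(presented, stored_digest):
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)
```

Because only the digest is stored, the raw token can be shown once at creation and never recovered afterwards.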

Admin vs inference keys

OpenAI and Anthropic split their API surface across two distinct key types:

Provider | Sync key (admin) | Proxy key (inference)
OpenAI | sk-admin-… | sk-… or sk-proj-…
Anthropic | sk-ant-admin-… | sk-ant-api-…
OpenRouter | sk-or-v1-… (management) | Same key works (separate key optional)
Google AI | n/a (no usage API) | Same key works

Admin keys can read usage reports but cannot make inference calls. If you try to run the proxy against a connection that only has an admin key, Obol returns a clear 402 no_inference_key with setup instructions rather than forwarding and getting rejected upstream.

Attach an inference key from the Connections page — each card shows "Add proxy inference key" when one is needed. The admin key stays in place for the hourly sync; the inference key is used only by /v1/* proxy routes.

Endpoints & routing

Path | Compat | Upstream
POST /v1/chat/completions | OpenAI | OpenAI or OpenRouter
POST /v1/messages | Anthropic | Anthropic
POST /v1beta/models/{m}:generateContent | Gemini | Google
POST /v1beta/models/{m}:streamGenerateContent | Gemini (stream) | Google

/v1/chat/completions routes to OpenAI by default. Use OpenRouter either by setting the header X-Obol-Provider: openrouter or by passing an OpenRouter-style slug like anthropic/claude-3-5-sonnet in the model field.
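
The routing rules for /v1/chat/completions described in this handbook (explicit header, OpenRouter-style slug, OpenCode Go auto-route prefixes, OpenAI default) can be combined into one small resolver. The sketch below is illustrative only: the function name and, in particular, the order of precedence between the rules are assumptions.

```python
OPENCODE_PREFIXES = ("glm-", "kimi-", "minimax-", "mimo-")


def resolve_provider(model, headers):
    """Pick an upstream for a /v1/chat/completions request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    explicit = lowered.get("x-obol-provider")
    if explicit:
        return explicit.strip().lower()   # explicit header wins (assumed)
    if "/" in model:
        return "openrouter"               # slug such as vendor/model
    if model.startswith(OPENCODE_PREFIXES):
        return "opencode"                 # auto-route prefix
    return "openai"                       # default upstream
```

Under this sketch a Qwen or DeepSeek model sent without a header would fall through to the OpenAI default, which is why those families need the explicit header.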

Customer headers

Optional headers your app can send on every proxied request. None are required; all are length-capped server-side so a hostile client can't bloat the audit row.

Header | Purpose
X-Obol-Provider | Force a specific provider for ambiguous endpoints (e.g. openrouter on /v1/chat/completions).
X-Obol-Fallback | Comma-separated provider chain for cross-provider failover. See Fallback.
X-Obol-User | Opaque tag for grouping requests by feature, tenant, or customer. Surfaces in Monitor → User tags and CSV exports.
X-Obol-Session | Opaque per-conversation tag. Same shape as X-Obol-User; intended for grouping multi-turn flows.
X-Obol-Cache | Set to bypass to skip the response cache for a single request. Useful when you've enabled per-key caching but need a fresh answer.
Idempotency-Key | Deduplicate retries. Same key + same body replays the stored response without hitting upstream. See Idempotency.

The proxy also auto-captures the User-Agent header and the true client IP (via cf-connecting-ip) on every event — no client change needed. These power the SDK and geographic breakdowns on Monitor and are stored on Keplor as part of the audit row. Raw IPs are never displayed in the dashboard; see Security → Data.

Cross-provider fallback

If a provider is down or you don't have a connection for it, the proxy can automatically try the next provider in a fallback chain. Set the X-Obol-Fallback header with a comma-separated list of providers:

curl https://useobol.pages.dev/v1/chat/completions \
  -H "Authorization: Bearer obol_sk_live_..." \
  -H "X-Obol-Provider: deepseek" \
  -H "X-Obol-Fallback: groq,openrouter" \
  -H "Content-Type: application/json" \
  -d '...'

If DeepSeek fails (no connection, upstream 502/503/504), the proxy tries Groq. If Groq also fails, it tries OpenRouter. When a fallback succeeds, the response includes an X-Obol-Fallback-Provider header so your app knows which provider actually served the request.

  • Max 3 fallback providers per request.
  • Each fallback must be in the virtual key's allowed-providers list (if configured).
  • Fallback only triggers on upstream failures and missing connections — not on auth errors, rate limits, or quota violations.
  • Only OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings) can fall back across providers. Anthropic /v1/messages and Google generateContent are endpoint-locked.
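
The chain behaviour above can be modelled in a few lines. In this sketch, send is a hypothetical callable that returns an HTTP status code, or None when no connection exists for the provider. Truncating the list to three entries is an assumption, since the text does not say whether extra entries are rejected or ignored, and the allowed-providers check is left out for brevity.

```python
RETRYABLE_STATUSES = {502, 503, 504}  # upstream failures named in the text
MAX_FALLBACKS = 3


def try_chain(primary, fallback_header, send):
    """Try the primary provider, then each fallback in order.

    Returns (provider, status) for the attempt that ended the chain.
    """
    fallbacks = [p.strip() for p in fallback_header.split(",") if p.strip()]
    chain = [primary] + fallbacks[:MAX_FALLBACKS]  # truncation is an assumption
    result = (primary, None)
    for provider in chain:
        status = send(provider)
        result = (provider, status)
        if status is not None and status not in RETRYABLE_STATUSES:
            break  # success, or an error that must not trigger fallback
    return result
```

A 429 from the primary stops the chain immediately, matching the rule that rate limits never trigger fallback, while a 503 or a missing connection moves on to the next provider.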

Streaming

Streaming responses pass through unchanged — the first byte lands at your client as soon as the upstream produces it, so Obol adds only ~20ms of edge overhead. Token counting runs in the background on a teed copy of the stream; accounting lands in the dashboard after the stream closes.

One mutation: for OpenAI and OpenRouter we force-set stream_options.include_usage=true so the final SSE chunk includes token totals. It's the only change we ever make to your request body.

TTFT (time-to-first-token) is captured for streaming responses — the interval between the proxy forwarding the request and the first SSE data chunk arriving from the upstream provider.

Reasoning tokens — for OpenAI o1/o3/o4 and other thinking models, the proxy captures completion_tokens_details.reasoning_tokens from the response and logs them separately for accurate cost attribution.

Cost header

Non-streaming 2xx responses include an x-obol-cost-cents header with the computed cost of that request:

x-obol-cost-cents: 0.0312

The value is derived from the pricing table using actual token counts from the upstream response. Useful for client-side spend tracking without polling the dashboard.

  • Only present on successful non-streaming responses with token usage.
  • Streaming responses do not include this header — cost is unknown until the stream completes.
  • Value is in cents with 4 decimal places.
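
For client-side tracking, the header can be parsed into a Decimal so that four-decimal cent values add up without floating-point drift. This is a minimal sketch; the helper names are hypothetical, and headers is assumed to be any plain mapping of response header names to values.

```python
from decimal import Decimal

HEADER = "x-obol-cost-cents"


def cost_cents(headers):
    """Return the request cost in cents, or None when the header is absent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(HEADER)
    return Decimal(raw) if raw is not None else None


def total_cents(all_headers):
    """Sum the cost over many responses, skipping those without the header."""
    costs = (cost_cents(headers) for headers in all_headers)
    return sum((c for c in costs if c is not None), Decimal("0"))
```

Streaming responses simply contribute nothing to the total here, since they carry no cost header.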

Timeout

The proxy enforces a configurable upstream timeout. If the provider does not respond within the window, the proxy returns 504:

{
  "error": {
    "type": "upstream_timeout",
    "message": "openai did not respond within 120s."
  }
}

Default: 120 000 ms (2 minutes). Configure via the PROXY_TIMEOUT_MS environment variable. Set higher for o3/thinking workloads that routinely exceed 2 minutes.

Heartbeat

Long-running streaming responses (o3 extended thinking, large-context generation) can trigger intermediate proxy idle-timeout kills. The proxy injects SSE comment lines every 30 seconds of silence:

: heartbeat

Per the SSE spec, comment lines (starting with :) are ignored by all compliant clients including the OpenAI, Anthropic, and Vercel AI SDKs. This keeps the TCP connection alive through Cloudflare's 100-second idle timeout without affecting the response payload.
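
If you parse the stream by hand rather than through an SDK, skipping comment lines is all that is needed. The sketch below is a simplified reader that yields one payload per data line and does not reassemble multi-line events; the function name is hypothetical.

```python
def iter_sse_data(lines):
    """Yield data payloads from SSE lines, ignoring comments and blank lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. ": heartbeat"
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            yield payload[1:] if payload.startswith(" ") else payload
```

Heartbeats interleaved with real chunks therefore never reach the code that decodes the JSON payloads.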

Idempotency

Add an Idempotency-Key header to any proxy POST to prevent duplicate charges on network retries. Follows Stripe / OpenAI conventions.

curl https://useobol.pages.dev/v1/chat/completions \
  -H "Authorization: Bearer obol_sk_live_..." \
  -H "Idempotency-Key: my-unique-request-id-123" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

Behaviour

Scenario | Result
Same key + same body | Replays the stored response (no upstream call, no quota consumed)
Same key + different body | 409 idempotency_mismatch
Same key while original is in-flight | 409 idempotency_in_progress
Malformed key (>255 chars or invalid chars) | 400 invalid_idempotency_key
Upstream returned 5xx | Not stored; retries hit a fresh upstream
Streaming request | Supported; stream is buffered and stored on completion

Key format

1–255 characters. Alphanumeric, dashes (-), and underscores (_) only. UUIDs work well.
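
A client can check keys against this format before sending, which avoids a round trip that would end in 400 invalid_idempotency_key. The sketch assumes "alphanumeric" means ASCII letters and digits; the helper names are hypothetical.

```python
import re
import uuid

# 1-255 characters: ASCII letters, digits, dashes, underscores (assumed ASCII).
KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,255}")


def is_valid_idempotency_key(key):
    """True when the key matches the documented format."""
    return KEY_PATTERN.fullmatch(key) is not None


def new_idempotency_key():
    """Generate a fresh key; a UUID4 string fits the format."""
    return str(uuid.uuid4())
```

Generate the key once per logical request and reuse it on every retry of that request, so that retries replay instead of re-billing.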

TTL

Keys expire after 24 hours. After expiry the same key can be reused for a new request.

Replay headers

Replayed responses include the original upstream headers that SDKs use for debugging: x-request-id, openai-organization, openai-processing-ms, anthropic-request-id, and rate-limit headers. An additional x-obol-idempotent-replay: true header is added so your app can distinguish replays from fresh responses.

Quota

Confirmed replays do not consume RPM, daily request, or budget quota. You paid for the original — the replay is free.

Streaming note: When a streaming request is replayed, the response is returned as a single buffered body (not SSE chunks). Your SDK will receive the complete response at once rather than token-by-token. Most OpenAI/Anthropic SDKs handle this transparently.

Quotas & limits

Limit | Free | Pro
Virtual keys | 1 | 50
Requests / UTC day | 100 | Unlimited
RPM per key | 60 (capped) | 60 default, up to 600
Monthly budget cap per key | | Optional
Allowed-providers filter | |
Allowed-models glob filter | |

When a limit is breached the proxy returns an OpenAI-shaped error envelope — SDK error handlers that key off error.type and error.code keep working unchanged.

Security

  • Your upstream provider keys are encrypted at rest with AES-256-GCM. They're decrypted in memory for the duration of a single request and never logged.
  • Virtual keys are hashed with SHA-256 at rest — the raw value is unrecoverable from the DB. Rotating or revoking takes effect immediately.
  • Request and response bodies are not logged. We store metadata only: model, token counts, cost, status, latency, timestamp.
  • Every proxy endpoint is Bearer-auth only (no cookies), exempt from same-origin checks, and returns permissive CORS — safe because virtual keys can't be replayed via CSRF.

Powered by Keplor

Under the hood, every proxy request is streamed to Keplor, an open-source LLM log aggregator written in Rust. Keplor handles:

  • Automatic cost computation from a 2,263-model pricing catalog (LiteLLM). You send tokens — Keplor returns cost. Separate cache_read and cache_creation rates are applied where available (e.g. Anthropic prompt caching) for accurate cached-input pricing.
  • Daily rollups refreshed every 60 seconds, so the proxy dashboard loads in milliseconds even with millions of events.
  • Real-time quota checks for budget enforcement — Obol queries Keplor's GET /v1/quota on every request to enforce monthly spend caps.
  • Event storage with zstd compression and content-hash deduplication. Single static binary under 10 MB.

Keplor is provider-agnostic — it works with any LLM gateway, not just Obol. See the Keplor docs for the full API reference.

Optimization Engine (Pro)

Scans your last 30 days of usage and produces actionable cost-saving recommendations. Every recommendation must project at least $1/month in savings to appear — no noise.

Dynamic swap engine

Instead of hardcoded rules, Obol scans all 342 models in its pricing table against your actual usage. For every model you use, it finds all cheaper alternatives in the same model family (Claude, GPT, Gemini, Grok, Mistral, etc.) on the same provider.

Models are classified into families and tiers (flagship, standard, lite, mini, nano). Swaps are limited to two tier steps down at most — Opus can suggest Sonnet or Haiku, but never a nano model. Confidence is based on the tier gap:

  • High — same tier (newer or cheaper variant)
  • Medium — one tier step down
  • Low — two tier steps down

Savings are computed by re-pricing your actual token volumes (input, output, cached) with the suggested model's rates and scaling to a 30-day month. The highest-savings match per model wins. Provider-aware — an OpenRouter user only sees OpenRouter alternatives.
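
The tier-gap rule and the re-pricing step can be sketched together. The example assumes the tiers are ordered as listed above, that rates are quoted in USD per million tokens, and that token volumes are keyed by input, output, and cached; the names and data shapes are illustrative, not Obol's internal schema.

```python
TIERS = ["flagship", "standard", "lite", "mini", "nano"]  # assumed order, high to low
CONFIDENCE = {0: "high", 1: "medium", 2: "low"}


def swap_confidence(current_tier, candidate_tier):
    """Confidence for a swap, or None when the swap is not allowed."""
    gap = TIERS.index(candidate_tier) - TIERS.index(current_tier)
    return CONFIDENCE.get(gap)  # upward moves and gaps over two steps give None


def monthly_savings(tokens, current_rates, candidate_rates, days_observed):
    """Re-price observed token volumes and scale the saving to a 30-day month."""
    def cost(rates):
        return sum(tokens[kind] * rates[kind] for kind in tokens) / 1_000_000
    return (cost(current_rates) - cost(candidate_rates)) * 30 / days_observed
```

For example, 10M input and 2M output tokens over 30 days at $3/$15 per million cost $60; at $1/$5 they cost $20, so the swap card would show $40 per month.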

Each swap card carries a per-day what-if sparkline beneath the headline estimate — actual cost on the current model vs. what the same per-day traffic would have cost on the alternative, with totals + delta. Real history, not extrapolation; useful for confirming the estimate matches your actual variability before committing to a switch.

Cross-provider arbitrage

The same model can have different pricing across providers. Obol normalizes model slugs across all 9 providers (stripping vendor prefixes, date suffixes, and version format differences) to find the same underlying model available at a lower price on another provider you're already connected to.

Example: Claude Sonnet via OpenRouter might cost more than direct Anthropic for your usage pattern. Obol spots the delta and shows the exact monthly savings. These recommendations always carry high confidence — same model, same quality, just cheaper routing. The per-day what-if sparkline (see Dynamic swap engine) renders on these cards too, so you can see the historical delta in addition to the projected one.

Cache opportunity

For models that support prompt caching (Anthropic, some OpenAI models), Obol checks three conditions:

  • Current cache hit rate < 10%
  • Average input tokens > 2,000 (enough context to benefit from caching)
  • At least 7 days of data (avoids noise from small samples)

Savings are computed using the actual pricing delta between the model's regular input rate and its cache-read rate (e.g. Anthropic charges 10% of input rate for cached tokens, Google charges 25%). Obol projects what you'd save at an 80% cache hit rate.
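
A worked version of this check and projection might look like the sketch below. It assumes rates in USD per million tokens and ignores any cache-write surcharge, which the text does not mention; function and parameter names are hypothetical.

```python
TARGET_HIT_RATE = 0.80


def is_cache_opportunity(hit_rate, avg_input_tokens, days_of_data):
    """All three documented conditions must hold."""
    return hit_rate < 0.10 and avg_input_tokens > 2000 and days_of_data >= 7


def projected_cache_savings(monthly_input_tokens, input_rate, cache_read_rate,
                            current_hit_rate):
    """Saving from moving the hit rate from its current value to the 80% target."""
    newly_cached = monthly_input_tokens * (TARGET_HIT_RATE - current_hit_rate)
    return newly_cached * (input_rate - cache_read_rate) / 1_000_000
```

With 100M input tokens a month, a $3.00 input rate, a $0.30 cache-read rate (10% of input), and a 5% current hit rate, 75M more tokens would be served from cache, saving about $202.50.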

Efficiency insights

Beyond model choice, Obol analyzes token patterns to surface waste in your application:

  • Input bloat — flags models where the input-to-output ratio exceeds 20:1. Often signals oversized system prompts or unnecessary context.
  • Cache opportunity — quantifies the exact dollar amount you'd save by enabling prompt caching, based on your actual input volumes and the model's pricing rates.
  • High cost per request — highlights models in the top 25% of your per-request cost, which may indicate mismatched model selection.

Each model gets a 0–100 efficiency score — a weighted composite of cost-per-request rank (50%), cache utilization (25%), and input/output balance (25%). Scores are relative to your own model set, not absolute. Only models with 7+ active days and 50+ requests are included.
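
The weighting can be illustrated as follows. How each component is normalised is not described here, so the sketch assumes every component has already been reduced to a 0 to 1 score where 1 is best; only the weights and the eligibility thresholds come from the text.

```python
WEIGHTS = {"cost_rank": 0.50, "cache": 0.25, "balance": 0.25}


def is_eligible(active_days, requests):
    """Only models with 7+ active days and 50+ requests are scored."""
    return active_days >= 7 and requests >= 50


def efficiency_score(cost_rank, cache, balance):
    """Weighted composite on a 0-100 scale; inputs are assumed 0-1 scores."""
    composite = (
        WEIGHTS["cost_rank"] * cost_rank
        + WEIGHTS["cache"] * cache
        + WEIGHTS["balance"] * balance
    )
    return round(100 * composite)
```

A model scoring 0.8 on cost rank, 0.4 on cache utilization, and 0.6 on input/output balance would land at 65.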

What-if playground

An interactive cost simulator. Pick any model you currently use, select a target model to switch to, and see the projected cost impact — current monthly spend, projected spend, savings amount and percentage — all computed instantly in the browser using your actual token volumes.

Toggle cross-provider mode to compare models across all your connected providers. Useful for planning optimizations before committing.

Wall of shame

A ranked table of your 10 most expensive (model, date) pairs by absolute cost — not per-request cost, but total spend on that model that day. Useful for identifying which single days burned the most and whether any model is disproportionately driving your bill.

Personality

Obol adds a human layer on top of raw cost metrics. Mood indicators, daily roasts, and relatable spend comparisons turn your dashboard into something you actually want to check — not just another monitoring page. All personality features are Pro.

Personality modes

Three modes, switchable in Settings:

  • Professional — clean metric labels and dry observations. No humor.
  • Casual — ASCII faces, light sarcasm, and gentle nudges when you're over budget.
  • Unhinged — aggressive roasting, Hindi-English slang, and zero sympathy for your wallet.

Mode affects the mood meter, roasts, and badge flavor text. It does not affect underlying data, alerts, or cost calculations.

Mood meter

Your current-month spend as a percentage of budget drives a severity tier:

Budget % | Severity | Animation
0–25% | Chill | Slow breathing pulse
25–50% | Warm | Gentle horizontal sway
50–75% | Hot | Fast throbbing pulse
75–100% | Fire | Rapid shake
> 100% | Meltdown | Glitchy jitter + hue shift

Each mode maps to a different ASCII face at each tier — Professional uses geometric symbols, Casual uses expressive faces like (O_O), Unhinged adds sarcastic commentary. All animations respect prefers-reduced-motion.
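
The tier lookup amounts to a threshold check on spend as a percentage of budget. The table's ranges share their endpoints, so this sketch assumes a boundary value such as exactly 25% belongs to the higher tier; the function name is hypothetical.

```python
def mood_tier(spend, budget):
    """Map current-month spend to a severity tier; budget must be positive."""
    pct = 100 * spend / budget
    if pct > 100:
        return "Meltdown"
    if pct >= 75:
        return "Fire"
    if pct >= 50:
        return "Hot"
    if pct >= 25:
        return "Warm"
    return "Chill"
```

Spending exactly the budget still reads as Fire; only going over it tips into Meltdown.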

Daily roasts

Context-aware one-liners generated from your current state — spend percentage, top model, days into the billing period. Each personality mode has its own pool of roast functions. The system evaluates all functions against your context, collects the ones that match, and picks one via a seeded random selection so you see a consistent roast per page load.

Examples: Casual mode might tell you "Your budget called. It's in a coma" when you're over 100%. Unhinged mode might say "Opus 4 to format a JSON? Aukaat dekho bhai" if you're running expensive models for simple tasks.

Spend comparisons

Converts your monthly spend into relatable items. The item pool is timezone-aware:

  • India (Asia/Kolkata) — cups of chai, samosas, auto rickshaw rides, vada pavs, Jio recharges
  • Everywhere else — coffees, Netflix months, Spotify months, GitHub Copilot months, tacos, ramen bowls

"That's 210 samosas" hits differently than "$42.00".

Achievements

Eight unlockable badges evaluated against your current-month metrics:

Badge | Condition
First Blood | Single day over $10 spend
Penny Pincher | 30 consecutive days under budget
Cache Lord | 50%+ prompt cache hit rate
The Collector | 3+ providers connected
The Diet Works | Month-over-month spend down 20%+
The $100 Club | $100+ spend in a single day
Model Tourist | 5+ unique models used in a month
Zen Master | $0.00 spend day with active connections

Badges are re-evaluated on every visit to the Fun page. The Penny Pincher badge tracks your budget streak via a daily cron — each day you stay under your monthly budget threshold, the streak increments.