https://github.com/ejfox/openrouter-census
🤖 Complete analysis of 316+ AI models on OpenRouter with interactive Observable Plot dashboard
https://github.com/ejfox/openrouter-census
Last synced: 9 months ago
JSON representation
🤖 Complete analysis of 316+ AI models on OpenRouter with interactive Observable Plot dashboard
- Host: GitHub
- URL: https://github.com/ejfox/openrouter-census
- Owner: ejfox
- Created: 2025-08-21T18:25:17.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-21T18:25:21.000Z (10 months ago)
- Last Synced: 2025-09-06T18:00:17.847Z (9 months ago)
- Language: JavaScript
- Size: 132 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 🤖 OpenRouter Model Census
A complete analysis tool for **all 316+ AI models** on OpenRouter—featuring fast scraping, Python-style data analysis, and beautiful Observable Plot visualizations. Get pricing insights, provider comparisons, and model capabilities in an interactive dashboard.
  
---
## 🚀 Quick Start
```bash
# Clone and install
git clone https://github.com/your-username/openrouter-scrape
cd openrouter-scrape
npm install
# Run complete pipeline (scrape + analyze + serve dashboard)
npm run full-pipeline
```
**That's it!** Your dashboard opens at `http://localhost:8080` with fresh data on all 316+ models.
## ✨ Features
* **⚡ Fast scraping** - All 316 models in under 1 second
* **📊 Beautiful charts** - 7 interactive Observable Plot visualizations
* **đź’° Pricing insights** - Find the cheapest models, price vs context analysis
* **🏢 Provider analysis** - Market share, strategy mapping, capabilities
* **📱 Responsive design** - Works great on mobile and desktop
* **🔄 Easy updates** - Re-run anytime for fresh data
---
## What gets collected
### Layer 1 — Models (`GET /api/v1/models`)
* Identity: `id`, `name`, `canonical_slug`, `created`, `description`
* Architecture: `input_modalities`, `output_modalities`, `tokenizer`, `instruct_type`
* Aggregates: `context_length` (model-wide), `supported_parameters` (**union across providers**)
* Pricing (as published): `prompt`, `completion`, `request`, `image`, `web_search`, `internal_reasoning`, `input_cache_read`, `input_cache_write`
* Top provider summary: `context_length`, `max_completion_tokens`, `is_moderated`
Notes: `supported_parameters` at model level is a **union**, not guaranteed at any single endpoint. ([OpenRouter][2])
### Layer 2 — Provider Endpoints (`GET /api/v1/models/:author/:slug/endpoints`)
Per provider that hosts the model:
* `provider_name`, `name` (endpoint), `status`, `uptime_last_30m`
* Limits: `context_length`, `max_prompt_tokens`, `max_completion_tokens`, `quantization`
* Capabilities: endpoint-level `supported_parameters`
* Pricing overrides: `prompt`, `completion`, `request`, `image`
Use these to resolve real availability and price for your route. ([OpenRouter][4])
### Layer 3 — Optional Generation Telemetry (`GET /api/v1/generation` or inline **usage accounting**)
If you have generation IDs or enable usage accounting in your workloads:
* Identity & routing: `model`, `provider_name`, `origin`, `upstream_id`
* Tokens (native): `native_tokens_prompt`, `native_tokens_completion`, `native_tokens_reasoning`
* Tokens (gateway): `tokens_prompt`, `tokens_completion`
* Cost: `total_cost`, `cache_discount`, `upstream_inference_cost`
* Timings: `latency`, `generation_time`, moderation latency
* Termination: `finish_reason`, `native_finish_reason`
* Media/search counters; cached token counts (reads)
This enables empirical latency/cost analyses grounded in the provider’s native tokenizer. ([OpenRouter][3])
---
## Outputs (immutable, analysis-ready)
* `artifacts/YYYY-MM-DD/`
* `models.jsonl` (one model per line; raw API objects)
* `endpoints.jsonl` (provider-expanded rows)
* `joined.parquet` (denormalized: model ⨯ endpoint)
* `telemetry.jsonl` (optional generation/usage rows)
* `schema/*.json` (JSON Schema for each file)
* `MANIFEST.sha256` (file → SHA-256)
* `FETCH_METADATA.json` (request URLs, headers, etags, timestamps, versions)
* `CITATION.cff` (how to cite this snapshot)
* `NOTICE.txt` (license + terms provenance)
* All files compressed with `zstd` if `--zstd` is set.
---
## Install
```bash
# Node 20+ recommended
pnpm i -g openrouter-census # or: npx openrouter-census
```
Local dev:
```bash
git clone https://github.com/yourorg/openrouter-census
cd openrouter-census
pnpm install
```
---
## Configuration
```bash
# Optional: required only if you fetch protected usage for your own generations
export OPENROUTER_API_KEY=...
# Rate limits (Bottleneck): default is conservative; override as needed
export CENSUS_RESERVOIR=100 # max requests per window
export CENSUS_INTERVAL_MS=60_000 # window length
export CENSUS_MAX_CONCURRENCY=8 # parallelism
```
> Tip: tune Bottleneck’s **reservoir** + **interval** for burst/steady control. Keep headroom for Cloudflare/DDOS protections and any per-account limits. ([OpenRouter][5], [DEV Community][6])
---
## Usage (CLI & TUI)
```bash
openrouter-census tui
```
**TUI features**
* Live counters: queued / inflight / completed / failed
* Throttles: adjust concurrency & reservoir at runtime
* Filters: by provider, family, modality, supports(param)
* Export panel: write JSONL/Parquet + schemas + manifest
* Retry controls: backoff/rehydrate failed requests
**Non-interactive**
```bash
# Full scrape → immutable snapshot directory
openrouter-census run --out artifacts/$(date +%F) --zstd
# Fetch with telemetry (ids from stdin) and merge
cat ids.txt | openrouter-census usage --append artifacts/2025-08-21/telemetry.jsonl
```
---
## Data model (schemas)
* `schema/models.schema.json` – exact API fields from `/models` with permissive `additionalProperties: true`
* `schema/endpoints.schema.json` – rows keyed by `(model_id, provider_name, endpoint_name)`
* `schema/telemetry.schema.json` – union of `/generation` + inline usage accounting fields
* `schema/joined.schema.json` – normalized and namespaced (`model.*`, `endpoint.*`, `price.*`, `limits.*`)
Schemas are versioned; each run stores the **exact** schema copy used to validate outputs.
---
## Reproducibility & provenance
* **Deterministic traversal:** stable sort of model IDs; endpoint requests queued in lexical order.
* **Version pins:** `pnpm-lock.yaml` checked in; run logs emit Node + package versions.
* **Raw preservation:** we persist **unmodified** API payloads alongside normalized tables.
* **Integrity:** SHA-256 digests and `FETCH_METADATA.json` (timestamps, etags, cache status).
* **Citations:** `CITATION.cff` includes title, version, DOI placeholder, and snapshot date.
* **License/Terms capture:** `NOTICE.txt` records the doc URLs and access dates for API docs.
---
## Limits, caveats, ethics
* `supported_parameters` at model level is **union**; intersect with endpoint capabilities for truth in UX or routing guarantees. ([OpenRouter][1])
* Endpoint `uptime_last_30m` is a short pulse, not an SLA. Prefer aggregates if you collect telemetry. ([OpenRouter][4])
* Pricing may differ per endpoint; do not assume model-level pricing applies to your chosen provider. ([OpenRouter][4])
* Respect rate limits & terms; don’t stress shared free endpoints. Consider backoffs and caching. ([OpenRouter][5])
---
## Programmatic API (library usage)
```ts
import { census } from "openrouter-census";
const run = await census({
outDir: "artifacts/2025-08-21",
concurrency: 8,
reservoir: 100,
intervalMs: 60_000,
fetchTelemetry: false,
});
console.log(run.summary);
```
---
## Minimal pipeline spec
1. **Models pass**
* GET `https://openrouter.ai/api/v1/models` → write raw → validate → extract rows. ([OpenRouter][2])
2. **Endpoints pass**
* For each `{author}/{slug}`, GET `/api/v1/models/:author/:slug/endpoints` → write raw → validate → rows. ([OpenRouter][4])
3. **(Optional) Telemetry pass**
* For provided generation IDs or inline usage payloads, normalize and append to `telemetry.jsonl`. ([OpenRouter][3])
4. **Join & export**
* Create `joined.parquet` + schemas + manifest + checksums.
---
## What you can visualize later (suggestions)
* **Price vs context** (per endpoint)
* **Parameter support heatmap** (models Ă— params; intersected, not union)
* **Short-window reliability** (endpoint uptime)
* **Observed latency & effective \$/1k** from telemetry (native tokens)
---
## References
* Models API & schema. ([OpenRouter][2])
* List models (union of `supported_parameters`). ([OpenRouter][1])
* Per-model endpoints (provider-specific limits/pricing/uptime). ([OpenRouter][4])
* Usage accounting (native tokens, cached tokens, reasoning tokens). ([OpenRouter][7])
* Precise token accounting & `/generation`. ([OpenRouter][3])
* Reasoning tokens overview. ([OpenRouter][8])
* OpenRouter limits (guidance for throttling). ([OpenRouter][5])
* Bottleneck patterns for Node rate limiting. ([DEV Community][6])
---
## License
MIT for code. Data artifacts inherit upstream terms; see `NOTICE.txt`.
[1]: https://openrouter.ai/docs/api-reference/list-available-models?utm_source=chatgpt.com "List available models | OpenRouter | Documentation"
[2]: https://openrouter.ai/docs/models?utm_source=chatgpt.com "Access 400+ AI Models Through One API - OpenRouter"
[3]: https://openrouter.ai/docs/api-reference/overview?utm_source=chatgpt.com "OpenRouter API Reference | Complete API Documentation"
[4]: https://openrouter.ai/docs/api-reference/list-endpoints-for-a-model?utm_source=chatgpt.com "List endpoints for a model | OpenRouter | Documentation"
[5]: https://openrouter.ai/docs/api-reference/limits?utm_source=chatgpt.com "API Rate Limits | Configure Usage Limits in OpenRouter"
[6]: https://dev.to/arifszn/prevent-api-overload-a-comprehensive-guide-to-rate-limiting-with-bottleneck-c2p?utm_source=chatgpt.com "Prevent API Overload: A Comprehensive Guide to Rate Limiting with ..."
[7]: https://openrouter.ai/docs/use-cases/usage-accounting?utm_source=chatgpt.com "Usage Accounting | Track AI Model Usage with OpenRouter"
[8]: https://openrouter.ai/docs/use-cases/reasoning-tokens?utm_source=chatgpt.com "Reasoning Tokens | Enhanced AI Model Reasoning with OpenRouter"