An open API service indexing awesome lists of open source software.

https://github.com/ejfox/openrouter-census

🤖 Complete analysis of 316+ AI models on OpenRouter with interactive Observable Plot dashboard
https://github.com/ejfox/openrouter-census

Last synced: 9 months ago
JSON representation

🤖 Complete analysis of 316+ AI models on OpenRouter with interactive Observable Plot dashboard

Awesome Lists containing this project

README

          

# 🤖 OpenRouter Model Census

A complete analysis tool for **all 316+ AI models** on OpenRouter—featuring fast scraping, Python-style data analysis, and beautiful Observable Plot visualizations. Get pricing insights, provider comparisons, and model capabilities in an interactive dashboard.

![Dashboard Preview](https://img.shields.io/badge/Charts-7%20Interactive-blue) ![Models](https://img.shields.io/badge/Models-316+-green) ![Providers](https://img.shields.io/badge/Providers-54+-orange)

---

## 🚀 Quick Start

```bash
# Clone and install
git clone https://github.com/your-username/openrouter-scrape
cd openrouter-scrape
npm install

# Run complete pipeline (scrape + analyze + serve dashboard)
npm run full-pipeline
```

**That's it!** Your dashboard opens at `http://localhost:8080` with fresh data on all 316+ models.

## ✨ Features

* **⚡ Fast scraping** - All 316 models in under 1 second
* **📊 Beautiful charts** - 7 interactive Observable Plot visualizations
* **đź’° Pricing insights** - Find the cheapest models, price vs context analysis
* **🏢 Provider analysis** - Market share, strategy mapping, capabilities
* **📱 Responsive design** - Works great on mobile and desktop
* **🔄 Easy updates** - Re-run anytime for fresh data

---

## What gets collected

### Layer 1 — Models (`GET /api/v1/models`)

* Identity: `id`, `name`, `canonical_slug`, `created`, `description`
* Architecture: `input_modalities`, `output_modalities`, `tokenizer`, `instruct_type`
* Aggregates: `context_length` (model-wide), `supported_parameters` (**union across providers**)
* Pricing (as published): `prompt`, `completion`, `request`, `image`, `web_search`, `internal_reasoning`, `input_cache_read`, `input_cache_write`
* Top provider summary: `context_length`, `max_completion_tokens`, `is_moderated`
Notes: `supported_parameters` at model level is a **union**, not guaranteed at any single endpoint. ([OpenRouter][2])

### Layer 2 — Provider Endpoints (`GET /api/v1/models/:author/:slug/endpoints`)

Per provider that hosts the model:

* `provider_name`, `name` (endpoint), `status`, `uptime_last_30m`
* Limits: `context_length`, `max_prompt_tokens`, `max_completion_tokens`, `quantization`
* Capabilities: endpoint-level `supported_parameters`
* Pricing overrides: `prompt`, `completion`, `request`, `image`
Use these to resolve real availability and price for your route. ([OpenRouter][4])

### Layer 3 — Optional Generation Telemetry (`GET /api/v1/generation` or inline **usage accounting**)

If you have generation IDs or enable usage accounting in your workloads:

* Identity & routing: `model`, `provider_name`, `origin`, `upstream_id`
* Tokens (native): `native_tokens_prompt`, `native_tokens_completion`, `native_tokens_reasoning`
* Tokens (gateway): `tokens_prompt`, `tokens_completion`
* Cost: `total_cost`, `cache_discount`, `upstream_inference_cost`
* Timings: `latency`, `generation_time`, moderation latency
* Termination: `finish_reason`, `native_finish_reason`
* Media/search counters; cached token counts (reads)
This enables empirical latency/cost analyses grounded in the provider’s native tokenizer. ([OpenRouter][3])

---

## Outputs (immutable, analysis-ready)

* `artifacts/YYYY-MM-DD/`

* `models.jsonl` (one model per line; raw API objects)
* `endpoints.jsonl` (provider-expanded rows)
* `joined.parquet` (denormalized: model ⨯ endpoint)
* `telemetry.jsonl` (optional generation/usage rows)
* `schema/*.json` (JSON Schema for each file)
* `MANIFEST.sha256` (file → SHA-256)
* `FETCH_METADATA.json` (request URLs, headers, etags, timestamps, versions)
* `CITATION.cff` (how to cite this snapshot)
* `NOTICE.txt` (license + terms provenance)
* All files compressed with `zstd` if `--zstd` is set.

---

## Install

```bash
# Node 20+ recommended
pnpm i -g openrouter-census # or: npx openrouter-census
```

Local dev:

```bash
git clone https://github.com/yourorg/openrouter-census
cd openrouter-census
pnpm install
```

---

## Configuration

```bash
# Optional: required only if you fetch protected usage for your own generations
export OPENROUTER_API_KEY=...

# Rate limits (Bottleneck): default is conservative; override as needed
export CENSUS_RESERVOIR=100 # max requests per window
export CENSUS_INTERVAL_MS=60_000 # window length
export CENSUS_MAX_CONCURRENCY=8 # parallelism
```

> Tip: tune Bottleneck’s **reservoir** + **interval** for burst/steady control. Keep headroom for Cloudflare/DDOS protections and any per-account limits. ([OpenRouter][5], [DEV Community][6])

---

## Usage (CLI & TUI)

```bash
openrouter-census tui
```

**TUI features**

* Live counters: queued / inflight / completed / failed
* Throttles: adjust concurrency & reservoir at runtime
* Filters: by provider, family, modality, supports(param)
* Export panel: write JSONL/Parquet + schemas + manifest
* Retry controls: backoff/rehydrate failed requests

**Non-interactive**

```bash
# Full scrape → immutable snapshot directory
openrouter-census run --out artifacts/$(date +%F) --zstd

# Fetch with telemetry (ids from stdin) and merge
cat ids.txt | openrouter-census usage --append artifacts/2025-08-21/telemetry.jsonl
```

---

## Data model (schemas)

* `schema/models.schema.json` – exact API fields from `/models` with permissive `additionalProperties: true`
* `schema/endpoints.schema.json` – rows keyed by `(model_id, provider_name, endpoint_name)`
* `schema/telemetry.schema.json` – union of `/generation` + inline usage accounting fields
* `schema/joined.schema.json` – normalized and namespaced (`model.*`, `endpoint.*`, `price.*`, `limits.*`)

Schemas are versioned; each run stores the **exact** schema copy used to validate outputs.

---

## Reproducibility & provenance

* **Deterministic traversal:** stable sort of model IDs; endpoint requests queued in lexical order.
* **Version pins:** `pnpm-lock.yaml` checked in; run logs emit Node + package versions.
* **Raw preservation:** we persist **unmodified** API payloads alongside normalized tables.
* **Integrity:** SHA-256 digests and `FETCH_METADATA.json` (timestamps, etags, cache status).
* **Citations:** `CITATION.cff` includes title, version, DOI placeholder, and snapshot date.
* **License/Terms capture:** `NOTICE.txt` records the doc URLs and access dates for API docs.

---

## Limits, caveats, ethics

* `supported_parameters` at model level is **union**; intersect with endpoint capabilities for truth in UX or routing guarantees. ([OpenRouter][1])
* Endpoint `uptime_last_30m` is a short pulse, not an SLA. Prefer aggregates if you collect telemetry. ([OpenRouter][4])
* Pricing may differ per endpoint; do not assume model-level pricing applies to your chosen provider. ([OpenRouter][4])
* Respect rate limits & terms; don’t stress shared free endpoints. Consider backoffs and caching. ([OpenRouter][5])

---

## Programmatic API (library usage)

```ts
import { census } from "openrouter-census";

const run = await census({
outDir: "artifacts/2025-08-21",
concurrency: 8,
reservoir: 100,
intervalMs: 60_000,
fetchTelemetry: false,
});
console.log(run.summary);
```

---

## Minimal pipeline spec

1. **Models pass**

* GET `https://openrouter.ai/api/v1/models` → write raw → validate → extract rows. ([OpenRouter][2])
2. **Endpoints pass**

* For each `{author}/{slug}`, GET `/api/v1/models/:author/:slug/endpoints` → write raw → validate → rows. ([OpenRouter][4])
3. **(Optional) Telemetry pass**

* For provided generation IDs or inline usage payloads, normalize and append to `telemetry.jsonl`. ([OpenRouter][3])
4. **Join & export**

* Create `joined.parquet` + schemas + manifest + checksums.

---

## What you can visualize later (suggestions)

* **Price vs context** (per endpoint)
* **Parameter support heatmap** (models Ă— params; intersected, not union)
* **Short-window reliability** (endpoint uptime)
* **Observed latency & effective \$/1k** from telemetry (native tokens)

---

## References

* Models API & schema. ([OpenRouter][2])
* List models (union of `supported_parameters`). ([OpenRouter][1])
* Per-model endpoints (provider-specific limits/pricing/uptime). ([OpenRouter][4])
* Usage accounting (native tokens, cached tokens, reasoning tokens). ([OpenRouter][7])
* Precise token accounting & `/generation`. ([OpenRouter][3])
* Reasoning tokens overview. ([OpenRouter][8])
* OpenRouter limits (guidance for throttling). ([OpenRouter][5])
* Bottleneck patterns for Node rate limiting. ([DEV Community][6])

---

## License

MIT for code. Data artifacts inherit upstream terms; see `NOTICE.txt`.

[1]: https://openrouter.ai/docs/api-reference/list-available-models?utm_source=chatgpt.com "List available models | OpenRouter | Documentation"
[2]: https://openrouter.ai/docs/models?utm_source=chatgpt.com "Access 400+ AI Models Through One API - OpenRouter"
[3]: https://openrouter.ai/docs/api-reference/overview?utm_source=chatgpt.com "OpenRouter API Reference | Complete API Documentation"
[4]: https://openrouter.ai/docs/api-reference/list-endpoints-for-a-model?utm_source=chatgpt.com "List endpoints for a model | OpenRouter | Documentation"
[5]: https://openrouter.ai/docs/api-reference/limits?utm_source=chatgpt.com "API Rate Limits | Configure Usage Limits in OpenRouter"
[6]: https://dev.to/arifszn/prevent-api-overload-a-comprehensive-guide-to-rate-limiting-with-bottleneck-c2p?utm_source=chatgpt.com "Prevent API Overload: A Comprehensive Guide to Rate Limiting with ..."
[7]: https://openrouter.ai/docs/use-cases/usage-accounting?utm_source=chatgpt.com "Usage Accounting | Track AI Model Usage with OpenRouter"
[8]: https://openrouter.ai/docs/use-cases/reasoning-tokens?utm_source=chatgpt.com "Reasoning Tokens | Enhanced AI Model Reasoning with OpenRouter"