https://github.com/ejfox/openrouter-census

🤖 Complete analysis of 316+ AI models on OpenRouter with interactive Observable Plot dashboard
https://github.com/ejfox/openrouter-census
Last synced: 9 months ago
JSON representation
🤖 Complete analysis of 316+ AI models on OpenRouter with interactive Observable Plot dashboard
Host: GitHub
URL: https://github.com/ejfox/openrouter-census
Owner: ejfox
Created: 2025-08-21T18:25:17.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-08-21T18:25:21.000Z (10 months ago)
Last Synced: 2025-09-06T18:00:17.847Z (9 months ago)
Language: JavaScript
Size: 132 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # 🤖 OpenRouter Model Census

A complete analysis tool for **all 316+ AI models** on OpenRouter—featuring fast scraping, Python-style data analysis, and beautiful Observable Plot visualizations. Get pricing insights, provider comparisons, and model capabilities in an interactive dashboard.

![Dashboard Preview](https://img.shields.io/badge/Charts-7%20Interactive-blue) ![Models](https://img.shields.io/badge/Models-316+-green) ![Providers](https://img.shields.io/badge/Providers-54+-orange)

---

## 🚀 Quick Start

```bash

# Clone and install

git clone https://github.com/your-username/openrouter-scrape

cd openrouter-scrape

npm install

# Run complete pipeline (scrape + analyze + serve dashboard)

npm run full-pipeline

```

**That's it!** Your dashboard opens at `http://localhost:8080` with fresh data on all 316+ models.

## ✨ Features

* **⚡ Fast scraping** - All 316 models in under 1 second

* **📊 Beautiful charts** - 7 interactive Observable Plot visualizations  

* **💰 Pricing insights** - Find the cheapest models, price vs context analysis

* **🏢 Provider analysis** - Market share, strategy mapping, capabilities

* **📱 Responsive design** - Works great on mobile and desktop

* **🔄 Easy updates** - Re-run anytime for fresh data

---

## What gets collected

### Layer 1 — Models (`GET /api/v1/models`)

* Identity: `id`, `name`, `canonical_slug`, `created`, `description`

* Architecture: `input_modalities`, `output_modalities`, `tokenizer`, `instruct_type`

* Aggregates: `context_length` (model-wide), `supported_parameters` (**union across providers**)

* Pricing (as published): `prompt`, `completion`, `request`, `image`, `web_search`, `internal_reasoning`, `input_cache_read`, `input_cache_write`

* Top provider summary: `context_length`, `max_completion_tokens`, `is_moderated`

  Notes: `supported_parameters` at model level is a **union**, not guaranteed at any single endpoint. ([OpenRouter][2])

### Layer 2 — Provider Endpoints (`GET /api/v1/models/:author/:slug/endpoints`)

Per provider that hosts the model:

* `provider_name`, `name` (endpoint), `status`, `uptime_last_30m`

* Limits: `context_length`, `max_prompt_tokens`, `max_completion_tokens`, `quantization`

* Capabilities: endpoint-level `supported_parameters`

* Pricing overrides: `prompt`, `completion`, `request`, `image`

  Use these to resolve real availability and price for your route. ([OpenRouter][4])

### Layer 3 — Optional Generation Telemetry (`GET /api/v1/generation` or inline **usage accounting**)

If you have generation IDs or enable usage accounting in your workloads:

* Identity & routing: `model`, `provider_name`, `origin`, `upstream_id`

* Tokens (native): `native_tokens_prompt`, `native_tokens_completion`, `native_tokens_reasoning`

* Tokens (gateway): `tokens_prompt`, `tokens_completion`

* Cost: `total_cost`, `cache_discount`, `upstream_inference_cost`

* Timings: `latency`, `generation_time`, moderation latency

* Termination: `finish_reason`, `native_finish_reason`

* Media/search counters; cached token counts (reads)

  This enables empirical latency/cost analyses grounded in the provider’s native tokenizer. ([OpenRouter][3])

---

## Outputs (immutable, analysis-ready)

* `artifacts/YYYY-MM-DD/`

  * `models.jsonl` (one model per line; raw API objects)

  * `endpoints.jsonl` (provider-expanded rows)

  * `joined.parquet` (denormalized: model ⨯ endpoint)

  * `telemetry.jsonl` (optional generation/usage rows)

  * `schema/*.json` (JSON Schema for each file)

  * `MANIFEST.sha256` (file → SHA-256)

  * `FETCH_METADATA.json` (request URLs, headers, etags, timestamps, versions)

  * `CITATION.cff` (how to cite this snapshot)

  * `NOTICE.txt` (license + terms provenance)

* All files compressed with `zstd` if `--zstd` is set.

---

## Install

```bash

# Node 20+ recommended

pnpm i -g openrouter-census   # or: npx openrouter-census

```

Local dev:

```bash

git clone https://github.com/yourorg/openrouter-census

cd openrouter-census

pnpm install

```

---

## Configuration

```bash

# Optional: required only if you fetch protected usage for your own generations

export OPENROUTER_API_KEY=... 

# Rate limits (Bottleneck): default is conservative; override as needed

export CENSUS_RESERVOIR=100        # max requests per window

export CENSUS_INTERVAL_MS=60_000   # window length

export CENSUS_MAX_CONCURRENCY=8    # parallelism

```

> Tip: tune Bottleneck’s **reservoir** + **interval** for burst/steady control. Keep headroom for Cloudflare/DDOS protections and any per-account limits. ([OpenRouter][5], [DEV Community][6])

---

## Usage (CLI & TUI)

```bash

openrouter-census tui

```

**TUI features**

* Live counters: queued / inflight / completed / failed

* Throttles: adjust concurrency & reservoir at runtime

* Filters: by provider, family, modality, supports(param)

* Export panel: write JSONL/Parquet + schemas + manifest

* Retry controls: backoff/rehydrate failed requests

**Non-interactive**

```bash

# Full scrape → immutable snapshot directory

openrouter-census run --out artifacts/$(date +%F) --zstd

# Fetch with telemetry (ids from stdin) and merge

cat ids.txt | openrouter-census usage --append artifacts/2025-08-21/telemetry.jsonl

```

---

## Data model (schemas)

* `schema/models.schema.json` – exact API fields from `/models` with permissive `additionalProperties: true`

* `schema/endpoints.schema.json` – rows keyed by `(model_id, provider_name, endpoint_name)`

* `schema/telemetry.schema.json` – union of `/generation` + inline usage accounting fields

* `schema/joined.schema.json` – normalized and namespaced (`model.*`, `endpoint.*`, `price.*`, `limits.*`)

Schemas are versioned; each run stores the **exact** schema copy used to validate outputs.

---

## Reproducibility & provenance

* **Deterministic traversal:** stable sort of model IDs; endpoint requests queued in lexical order.

* **Version pins:** `pnpm-lock.yaml` checked in; run logs emit Node + package versions.

* **Raw preservation:** we persist **unmodified** API payloads alongside normalized tables.

* **Integrity:** SHA-256 digests and `FETCH_METADATA.json` (timestamps, etags, cache status).

* **Citations:** `CITATION.cff` includes title, version, DOI placeholder, and snapshot date.

* **License/Terms capture:** `NOTICE.txt` records the doc URLs and access dates for API docs.

---

## Limits, caveats, ethics

* `supported_parameters` at model level is **union**; intersect with endpoint capabilities for truth in UX or routing guarantees. ([OpenRouter][1])

* Endpoint `uptime_last_30m` is a short pulse, not an SLA. Prefer aggregates if you collect telemetry. ([OpenRouter][4])

* Pricing may differ per endpoint; do not assume model-level pricing applies to your chosen provider. ([OpenRouter][4])

* Respect rate limits & terms; don’t stress shared free endpoints. Consider backoffs and caching. ([OpenRouter][5])

---

## Programmatic API (library usage)

```ts

import { census } from "openrouter-census";

const run = await census({

  outDir: "artifacts/2025-08-21",

  concurrency: 8,

  reservoir: 100,

  intervalMs: 60_000,

  fetchTelemetry: false,

});

console.log(run.summary);

```

---

## Minimal pipeline spec

1. **Models pass**

   * GET `https://openrouter.ai/api/v1/models` → write raw → validate → extract rows. ([OpenRouter][2])

2. **Endpoints pass**

   * For each `{author}/{slug}`, GET `/api/v1/models/:author/:slug/endpoints` → write raw → validate → rows. ([OpenRouter][4])

3. **(Optional) Telemetry pass**

   * For provided generation IDs or inline usage payloads, normalize and append to `telemetry.jsonl`. ([OpenRouter][3])

4. **Join & export**

   * Create `joined.parquet` + schemas + manifest + checksums.

---

## What you can visualize later (suggestions)

* **Price vs context** (per endpoint)

* **Parameter support heatmap** (models × params; intersected, not union)

* **Short-window reliability** (endpoint uptime)

* **Observed latency & effective \$/1k** from telemetry (native tokens)

---

## References

* Models API & schema. ([OpenRouter][2])

* List models (union of `supported_parameters`). ([OpenRouter][1])

* Per-model endpoints (provider-specific limits/pricing/uptime). ([OpenRouter][4])

* Usage accounting (native tokens, cached tokens, reasoning tokens). ([OpenRouter][7])

* Precise token accounting & `/generation`. ([OpenRouter][3])

* Reasoning tokens overview. ([OpenRouter][8])

* OpenRouter limits (guidance for throttling). ([OpenRouter][5])

* Bottleneck patterns for Node rate limiting. ([DEV Community][6])

---

## License

MIT for code. Data artifacts inherit upstream terms; see `NOTICE.txt`.

[1]: https://openrouter.ai/docs/api-reference/list-available-models?utm_source=chatgpt.com "List available models | OpenRouter | Documentation"

[2]: https://openrouter.ai/docs/models?utm_source=chatgpt.com "Access 400+ AI Models Through One API - OpenRouter"

[3]: https://openrouter.ai/docs/api-reference/overview?utm_source=chatgpt.com "OpenRouter API Reference | Complete API Documentation"

[4]: https://openrouter.ai/docs/api-reference/list-endpoints-for-a-model?utm_source=chatgpt.com "List endpoints for a model | OpenRouter | Documentation"

[5]: https://openrouter.ai/docs/api-reference/limits?utm_source=chatgpt.com "API Rate Limits | Configure Usage Limits in OpenRouter"

[6]: https://dev.to/arifszn/prevent-api-overload-a-comprehensive-guide-to-rate-limiting-with-bottleneck-c2p?utm_source=chatgpt.com "Prevent API Overload: A Comprehensive Guide to Rate Limiting with ..."

[7]: https://openrouter.ai/docs/use-cases/usage-accounting?utm_source=chatgpt.com "Usage Accounting | Track AI Model Usage with OpenRouter"

[8]: https://openrouter.ai/docs/use-cases/reasoning-tokens?utm_source=chatgpt.com "Reasoning Tokens | Enhanced AI Model Reasoning with OpenRouter"
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ejfox/openrouter-census

Awesome Lists containing this project

README