https://github.com/maureranton/artificialanalysis-ai-parser

Parser for artificialanalysis.ai — extract AI model pricing, benchmarks & speed without an API key. Python (CLI) + JavaScript (browser & Node.js). Rewrites the broken demianarc/artificialanalysisscrapper.

ai-models artificial-analysis artificialanalysis benchmarks data-extraction llm model-data parser pricing python rsc scraper


Host: GitHub
URL: https://github.com/maureranton/artificialanalysis-ai-parser
Owner: MaurerAnton
License: other
Created: 2026-05-10T21:08:07.000Z (about 2 months ago)
Default Branch: master
Last Pushed: 2026-05-10T21:44:08.000Z (about 2 months ago)
Last Synced: 2026-05-10T23:24:04.954Z (about 2 months ago)
Topics: ai-models, artificial-analysis, artificialanalysis, benchmarks, data-extraction, llm, model-data, parser, pricing, python, rsc, scraper
Language: JavaScript
Size: 24.4 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# artificialanalysis-ai-parser

Parser for [artificialanalysis.ai](https://artificialanalysis.ai) — extracts AI model data (pricing, benchmarks, speed) **without an API key**.

## Why?

The idea started from [demianarc/artificialanalysisscrapper](https://github.com/demianarc/artificialanalysisscrapper) — a Python scraper that fetched model data from the Artificial Analysis Next.js RSC endpoint. It was a clever approach: the site's React Server Components stream exposed the full dataset (`hostsModels`) in a single 10 MB response, no authentication needed.

However, after the site's redesign ("A new look for Artificial Analysis"), the old line-based parser broke completely. The RSC format changed from simple `key:value` pairs to a chunk-referenced wire format with `I[...]` inline references and `$c:props:...` circular links.

This project:

- **Rewrites the extraction** using regex + bracket-counting instead of line-based parsing

- **Deduplicates** 867 host-model pairs down to 326 unique models (keeping first occurrence with full non-circular data)

- **Cleans** the output to only essential fields (pricing, IQ, speed, context window)

- **Outputs** `models.json`: the 314 models that have input/output pricing (plus cache pricing where the site provides it), ready for downstream use

The Python version is a single self-contained script with zero dependencies beyond the standard library.

## Quick start

### C++

```bash
g++ -std=c++17 -O2 artificialanalysis.ai-parser.cpp -lcurl -o aaparser
./aaparser --minimal --pretty          # fetch + save to models.json
```

Requires: `libcurl`, `nlohmann/json` (header-only, auto-downloaded if missing).

### Python

```bash
python3 artificialanalysis.ai-parser.py --minimal --pretty
```

### JavaScript (Node.js)

```js
// Node.js: works without CORS restrictions
const { AAParser } = require('./artificialanalysis.ai-parser.js');

// CommonJS modules have no top-level await, so wrap the call in an async function
(async () => {
  const models = await AAParser.fetch({ minimal: true });
  console.log(models[0].name, models[0].price_1m_input_tokens);
})();
```

> **Note:** The JS parser does **not** work directly in the browser. The RSC endpoint requires the custom `rsc` header, which triggers a CORS preflight, and the server does not return `Access-Control-Allow-Headers`. Use it in Node.js or through a CORS proxy.

### Output

```
Downloading RSC data from https://artificialanalysis.ai/leaderboards/providers?_rsc=hgvan ...
Downloaded 10,481,155 bytes
Extracted 867 raw entries (host-model pairs)
Deduplicated to 326 unique models
Models with pricing: 314
Saved 314 models to models.json (134,549 bytes)
Top model: GPT-5.5 (xhigh) (OpenAI)
  IQ: 60.24 | Coding: 59.12 | Math: None
  Price: $5.00 in / $30.00 out
  Speed: 57 tok/s
```

## models.json structure

Each entry:

| Field | Description |
|---|---|
| `name` | Model name |
| `creator` | AI lab / company |
| `slug` | URL-friendly identifier |
| `intelligence_index` | AA Intelligence Index score |
| `coding_index` | AA Coding Index score |
| `math_index` | AA Math Index score |
| `price_1m_input_tokens` | Input price per 1M tokens (USD) |
| `price_1m_output_tokens` | Output price per 1M tokens (USD) |
| `price_1m_cache_hit` | Cache hit price per 1M tokens (USD) |
| `blended_price_3_1` | Blended price per 1M tokens at a 3:1 input:output ratio (USD) |
| `context_window_tokens` | Context window size (tokens) |
| `output_tokens_per_second` | Generation speed (tokens/s) |
| `time_to_first_token_ms` | Latency to first token (ms) |
| `reasoning` | Whether it's a reasoning model |
| `open_weights` | Whether weights are open |
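
To make the `blended_price_3_1` field concrete, here is a minimal sketch of the arithmetic, assuming the blend is a plain weighted average of input and output prices at the stated 3:1 ratio. The function name `blended_price` is hypothetical and is not taken from the parser; the sample prices are the ones shown in the Output section above.

```python
def blended_price(price_in: float, price_out: float,
                  ratio_in: int = 3, ratio_out: int = 1) -> float:
    """Weighted average price per 1M tokens (assumed formula, not the parser's code)."""
    total_weight = ratio_in + ratio_out
    return (ratio_in * price_in + ratio_out * price_out) / total_weight


# Prices from the sample output: $5.00 in / $30.00 out
print(blended_price(5.00, 30.00))  # (3 * 5 + 1 * 30) / 4 = 11.25
```

If the value stored in `models.json` differs from this calculation, treat the stored value as authoritative.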

## Data coverage

| Metric | Coverage |
|---|---|
| Pricing (input/output) | 100% (314/314) |
| Intelligence Index | 87% |
| Coding Index | 90% |
| Math Index | 60% |
| Speed (tok/s) | 100% |
| Cache pricing | 33% |
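
Coverage figures like these can be recomputed from a loaded `models.json`. The sketch below assumes that a missing metric is stored as `null` (or the key is absent) and that the file is a flat list of entries; the helper name `coverage` and the tiny sample list are illustrative only.

```python
def coverage(models: list, field: str) -> float:
    """Percentage of entries whose `field` is present and not null."""
    if not models:
        return 0.0
    present = sum(1 for m in models if m.get(field) is not None)
    return 100.0 * present / len(models)


# Synthetic entries for illustration (not real data)
sample = [
    {"name": "a", "math_index": 41.0},
    {"name": "b", "math_index": None},
    {"name": "c", "math_index": 12.5},
    {"name": "d"},
]
print(f"{coverage(sample, 'math_index'):.0f}%")  # 2 of 4 entries -> 50%
```

With real data, the list would come from `json.load(open("models.json"))`, assuming that file layout.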

## How it works

```text
artificialanalysis.ai
  └─ /leaderboards/providers?_rsc=hgvan
       └─ Next.js RSC stream (10 MB, text/x-component)
            └─ Contains "hostsModels":[{...}] with ~867 entries
                 └─ Extract JSON via bracket-counting
                      └─ Deduplicate by model_id
                           └─ Clean & output models.json
```
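
The "Extract JSON via bracket-counting" step can be sketched roughly as follows. This is an illustration of the technique under stated assumptions, not the repository's actual code: the function name `extract_json_array` is hypothetical, and it assumes the array after `"hostsModels":` is valid JSON once its matching closing bracket is found.

```python
import json


def extract_json_array(text: str, key: str = '"hostsModels":'):
    """Return the parsed JSON array that follows `key`, or None if not found."""
    start = text.find(key)
    if start == -1:
        return None
    open_idx = text.find("[", start + len(key))
    if open_idx == -1:
        return None

    depth = 0
    in_string = False
    escaped = False
    for pos in range(open_idx, len(text)):
        ch = text[pos]
        if in_string:
            # Brackets inside string literals must not affect the depth
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return json.loads(text[open_idx:pos + 1])
    return None  # stream ended before the array was closed


# Synthetic stand-in for an RSC chunk
chunk = 'c:{"hostsModels":[{"name":"a]b","tags":["x","y"]}],"other":[1]}'
print(extract_json_array(chunk))
```

Tracking string state matters because model names or descriptions may themselves contain `[` or `]`, which a naive counter would miscount.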

The RSC endpoint requires specific headers (`rsc: 1`, `next-router-state-tree`, `next-url`) but no cookies or authentication.
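
As a rough sketch of such a request using only the standard library: the header names come from the paragraph above, but the header values are assumptions. In particular, the `next-router-state-tree` value is left as a caller-supplied argument because its exact content is not documented here, and the `next-url` default is a guess based on the endpoint path. `build_rsc_request` is a hypothetical helper name.

```python
import urllib.request

RSC_URL = "https://artificialanalysis.ai/leaderboards/providers?_rsc=hgvan"


def build_rsc_request(state_tree: str,
                      next_url: str = "/leaderboards/providers",  # assumed value
                      url: str = RSC_URL) -> urllib.request.Request:
    """Build (but do not send) a request carrying the three RSC headers."""
    headers = {
        "rsc": "1",
        "next-router-state-tree": state_tree,  # caller must supply a valid value
        "next-url": next_url,
    }
    return urllib.request.Request(url, headers=headers)


req = build_rsc_request(state_tree="<state-tree-value>")
print(req.full_url)
print(sorted(req.header_items()))
```

Passing `req` to `urllib.request.urlopen` would perform the actual download; that call is left out so the sketch runs without network access.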

## Limitations

- **No API key = fragile.** The RSC endpoint is an internal Next.js mechanism. If the site changes its chunk format again, the bracket-counting may need updating.

- **Circular references.** From the 2nd entry onward, some nested model fields use `$c:props:...` reference strings instead of actual values. We keep only the *first* occurrence per `model_id` (which has full data).

- **Official API is preferred** for production use. This parser is a workaround for when you don't have (or don't want) an API key. See [artificialanalysis.ai/documentation](https://artificialanalysis.ai/documentation) for the free API tier (1,000 req/day).
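
The first-occurrence rule from the circular-references point can be sketched like this. It assumes each raw entry exposes `model_id` as a top-level key and that unresolved references are strings beginning with `$c:props:`; the helper names and the sample entries are illustrative, not the parser's own.

```python
def is_reference(value) -> bool:
    """True for strings that look like unresolved `$c:props:` references."""
    return isinstance(value, str) and value.startswith("$c:props:")


def dedupe_first(entries: list, key: str = "model_id") -> list:
    """Keep only the first entry seen for each `key` value, preserving order."""
    seen = {}
    for entry in entries:
        ident = entry.get(key)
        if ident is not None and ident not in seen:
            seen[ident] = entry
    return list(seen.values())


# Synthetic host-model pairs: the repeat of "m1" carries a reference string
raw = [
    {"model_id": "m1", "name": "Model One"},
    {"model_id": "m2", "name": "Model Two"},
    {"model_id": "m1", "name": "$c:props:example"},
]
unique = dedupe_first(raw)
print([m["name"] for m in unique])
print(any(is_reference(m["name"]) for m in unique))
```

Because dictionaries preserve insertion order, the result keeps the order in which models first appear in the stream.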

## Companion: interactive cost calculator

`dashboard.html` — a dark-themed token cost dashboard that lets you see how much you'd spend using different AI model providers.

`compact-dashboard.html` — a lightweight version: no charts, 4 top models compared side by side. Each model card shows estimated total cost for your token data at a glance.

**Try it live:**  

[Full dashboard](https://maureranton.github.io/dashboard/dashboard.html) — charts, model selector, date range filter  

[Compact dashboard](https://maureranton.github.io/dashboard/compact-dashboard.html) — 4 models, instant cost comparison

**To run locally:**

1. Open `dashboard.html` or `compact-dashboard.html` in a browser (or serve via any HTTP server)

2. The dashboards read `paths.json`, which points to `data.json` and `models.json`

3. Select a model — prices auto-fill from Artificial Analysis data

4. Tweak token counts — costs recalculate instantly

Example files included:

- `example-paths.json` — points to `example-data.json` and `models.json`

- `example-data.json` — 7 days of synthetic token data for demo

To use your own data, rename `example-paths.json` → `paths.json`, point it at your data file, and update your `data.json` with real token counts.
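
For reference, the cost arithmetic behind such a calculator presumably looks like the sketch below. This is an assumption about the calculation, not the dashboards' actual code, and the `data.json` schema is not described here, so token counts are passed in directly. Only the two price field names come from the `models.json` table above; `estimate_cost` is a hypothetical name.

```python
def estimate_cost(input_tokens: int, output_tokens: int, model: dict) -> float:
    """Estimated USD cost from per-1M-token prices in a models.json entry."""
    cost_in = input_tokens / 1_000_000 * model["price_1m_input_tokens"]
    cost_out = output_tokens / 1_000_000 * model["price_1m_output_tokens"]
    return cost_in + cost_out


# Prices from the sample output above: $5.00 in / $30.00 out per 1M tokens
model = {"price_1m_input_tokens": 5.00, "price_1m_output_tokens": 30.00}
print(estimate_cost(2_000_000, 500_000, model))  # 10.0 + 15.0 = 25.0
```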

## License

GPL-3.0 — Copyright (C) 2026 Anton Maurer

## Credits

- Original scraping concept by [demianarc/artificialanalysisscrapper](https://github.com/demianarc/artificialanalysisscrapper)

- Model data source: [artificialanalysis.ai](https://artificialanalysis.ai)
