{"id":50733316,"url":"https://github.com/thepriben/statswiki","last_synced_at":"2026-06-10T11:01:19.950Z","repository":{"id":247061673,"uuid":"766290222","full_name":"thepriben/StatsWiki","owner":"thepriben","description":"Lightweight, forkable Wikipedia pageview rankings powered by Wikidata and Parquet, here English version.","archived":false,"fork":false,"pushed_at":"2026-06-09T16:34:40.000Z","size":48099,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-06-09T18:16:29.805Z","etag":null,"topics":["2024","2026","english-wikipedia","pageview-ranking","parquet-files","python","statswiki","vuejs","wikidata"],"latest_commit_sha":null,"homepage":"https://statswiki.info","language":"Vue","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thepriben.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-03-02T21:10:44.000Z","updated_at":"2026-06-09T16:41:43.000Z","dependencies_parsed_at":"2024-07-21T16:15:35.965Z","dependency_job_id":null,"html_url":"https://github.com/thepriben/StatsWiki","commit_stats":null,"previous_names":["benprieur/statswiki","thepriben/statswiki"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thepriben/StatsWiki","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepriben%2FStatsWiki","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepriben%2FStatsWiki/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepriben%2FStatsWiki/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepriben%2FStatsWiki/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thepriben","download_url":"https://codeload.github.com/thepriben/StatsWiki/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepriben%2FStatsWiki/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34149132,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["2024","2026","english-wikipedia","pageview-ranking","parquet-files","python","statswiki","vuejs","wikidata"],"created_at":"2026-06-10T11:01:18.486Z","updated_at":"2026-06-10T11:01:19.944Z","avatar_url":"https://github.com/thepriben.png","language":"Vue","funding_links":[],"categories":[],"sub_categories":[],"readme":"# StatsWiki\n\n**Most-read articles on English Wikipedia** — daily rankings from **July 1, 2015** to yesterday.\n\n**Live site:** https://statswiki.info/\n\n**X:** https://x.com/statswiki\n\n**Bluesky:** https://bsky.app/profile/statswiki.bsky.social\n\nMIT license — fork for [another language or project → ADAPT.md](ADAPT.md).\n\n---\n\n## At a glance\n\n| | |\n|---|---|\n| **Data** | Wikimedia Pageviews API → Parquet → static JSON |\n| **Site** | Vue 3 SPA on GitHub Pages (no runtime API calls) |\n| **Updates** | Daily cron + manual backfill for history |\n| **Enrichment** | Wikidata QID, label, description, image |\n| **Rankings** | Top 50 per day, month, year, all-time |\n\n---\n\n## What the site shows\n\n### Home\n\nThree live panels (top 50 each), with fallback to the latest available period when yesterday / this month are not yet ingested:\n\n- **Yesterday** (or latest day)\n- **This month** (or latest month)\n- **This year**\n\n### Period pages\n\n| View | URL | Content |\n|------|-----|---------|\n| Day | `/2026/05/31` | Top 50 that day |\n| Month | `/2026/05` | Top 50 aggregated over the month |\n| Year | `/2026` | Top 50 aggregated over the year |\n| All time | `/alltime` | Top 50 since July 2015 |\n\nBrowse via **Year / Month / Day** dropdowns in the header (no date in the page title).\n\n### Article stats (QID)\n\nClick a **Wikidata QID** in any table → `/q/Q22686` with monthly / yearly view charts, total views, peak period.\n\nEach row: rank, Wikipedia link, QID, description, thumbnail (links to Wikimedia Commons), view count.\n\n### Wikirace\n\nCompare **daily Wikipedia pageviews** for a group of articles over any date range.\n\n| View | URL | Content |\n|------|-----|---------|\n| Builder | `/wikirace` | Search catalog, pick articles, set dates |\n| Race | `/wikirace/Q1+Q2/YYYY-MM-DD/YYYY-MM-DD` | Chart, Race% table, shareable link |\n| Help | `/wikirace/help` | Public guide (from `docs/wikirace-help.md`) |\n\n**Race%** = one article’s views as a % of the group total (area under the curve). Data is fetched live from the Wikimedia Pageviews API.\n\n**Docs:** [docs/wikirace.md](docs/wikirace.md) (maintainer README) · [docs/wikirace-help.md](docs/wikirace-help.md) (public help → `npm run build:help`)\n\n---\n\n## Architecture\n\n```\nWikimedia Pageviews API     one HTTP request per day\n         │\n         ▼\ndata/pageviews/             Parquet (date, article, views, rank)\ndata/articles.parquet       Wikidata catalog\n         │\n         ▼  aggregate + merge by QID\nweb/public/data/            static JSON (top 50 per period)\n         │\n         ▼\nVue 3 SPA                   GitHub Pages CDN\n```\n\n**Day → month → year:** months and years are **sums of daily rows**, never fetched separately. See [consolidation](#day--month--year) below.\n\n**Redirects:** old article titles that share a Wikidata item have views merged before ranking.\n\n---\n\n## Quick start (local)\n\n```bash\n# Pipeline\ncd pipeline \u0026\u0026 python3 -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -e .\n\nsw-fetch --date 2026-05-01          # one day\nsw-backfill --year 2026               # full year\nsw-daily                              # yesterday + export\nsw-export-qids                        # QID time-series JSON\n\n# Frontend\ncd web \u0026\u0026 npm ci \u0026\u0026 npm run dev\n# → http://localhost:5173/\n```\n\n---\n\n## Deployment (GitHub Pages)\n\n**Custom domain:** [statswiki.info](https://statswiki.info) — DNS at the registrar, `web/public/CNAME`, and **Settings → Pages → Custom domain** on [thepriben/StatsWiki](https://github.com/thepriben/StatsWiki).\n\n1. **Settings → Pages → Source: GitHub Actions** (one-time).\n2. Push to `main` — **Deploy Pages** runs when `web/` or `data/` changes.\n3. Backfill and daily workflows **commit data, then deploy** in the same run.\n\n| Workflow | Trigger | Role |\n|----------|---------|------|\n| **Deploy Pages** | Push or manual | Build Vue → publish |\n| **Daily update** | 08:00 \u0026 14:00 UTC or manual | Yesterday → daily top 5 + period posts → commit → deploy |\n| **Backfill** | Manual (pick year) | One year of history |\n| **Backfill sequence** | Manual | 2025 → 2016 in one job |\n\n### Backfill order (recommended)\n\n1. **Current year** first — homepage needs recent data.\n2. **Backfill sequence** (or year-by-year) down to **2015** (July 1 for 2015).\n3. Leave **Daily update** enabled.\n\n~5–10 minutes per year on GitHub Actions.\n\n### Daily fetch schedule\n\nWikimedia publishes **top/day** pageviews roughly **24 hours after UTC midnight**. The workflow runs twice:\n\n| Run | UTC | Purpose |\n|-----|-----|---------|\n| Primary | **08:00** | Fetch yesterday, enrich, export |\n| Retry | **14:00** | Same pipeline if morning data was not ready |\n\n**If data is not available yet:** the fetch retries up to 3× per attempt (with backoff), then the job exits without commit or deploy. The 14:00 run tries again automatically.\n\n**If yesterday is already in the database** (e.g. after a successful morning run), the fetch is skipped but enrich/export still run — useful if Wikidata mapping changed.\n\n### Social posts (@statswiki on X and Bluesky)\n\nAfter each successful daily run:\n\n| Trigger | When | Post |\n|---------|------|------|\n| **Day** | Every run | Top 5 for yesterday |\n| **Week** | Yesterday was **Sunday** | Top 5 for Mon–Sun (e.g. `Mon 26 May – Sun 1 Jun 2026`) |\n| **Month** | Yesterday was the **last day of the month** | Top 5 for that month |\n| **Year** | Yesterday was **31 December** | Top 5 for that year |\n\nManual dry-run: `sw-period-posts --dry-run --date YYYY-MM-DD --force`\n\n---\n\n## Repository layout\n\n```\nStatsWiki/\n├── web/                         # Vue 3 frontend\n│   ├── src/\n│   │   ├── App.vue              # routing, header, home\n│   │   ├── QidPage.vue          # article stats + chart\n│   │   ├── RankingTable.vue\n│   │   ├── wikirace/            # Wikirace feature\n│   │   └── lib.js\n│   ├── public/wikirace/         # groups.json, catalog.json, help.json\n│   └── public/data/             # generated JSON (+ q/Q*.json)\n├── docs/\n│   ├── wikirace.md              # Wikirace maintainer README\n│   └── wikirace-help.md         # Wikirace public help (English)\n├── data/                        # Parquet source of truth\n│   ├── pageviews/year=Y/month=M/\n│   ├── articles.parquet\n│   └── manifest.json\n├── pipeline/src/statswiki/      # Python ETL\n└── .github/workflows/\n```\n\n---\n\n## Pipeline commands\n\n| Command | Purpose |\n|---------|---------|\n| `sw-fetch --date YYYY-MM-DD` | Ingest one day |\n| `sw-backfill --year YYYY` | Ingest year + Wikidata top 1000 + export |\n| `sw-daily` | Yesterday + enrich + export recent |\n| `sw-enrich --top 500` | Re-enrich top articles by total views |\n| `sw-enrich --refresh-shadows 100` | Retry unresolved QIDs |\n| `sw-export --recent` | Rebuild yesterday / month / year / alltime JSON |\n| `sw-export --year YYYY` | Export all periods for one year |\n| `sw-export-qids` | Export `data/q/Q*.json` time series for charts |\n| `sw-wikirace-catalog` | Export `web/public/wikirace/catalog.json` for autocomplete |\n| `sw-period-posts` | Post week/month/year top 5 to X and Bluesky when due |\n\n| npm (in `web/`) | Purpose |\n|-----------------|---------|\n| `npm run build:help` | `docs/wikirace-help.md` → `web/public/wikirace/help.json` |\n\nAll ingest is **idempotent** — existing days are skipped.\n\n---\n\n## Data model\n\n### Pageviews (`data/pageviews/`)\n\n| Column | Description |\n|--------|-------------|\n| `date` | Day |\n| `article` | Title with underscores (as in API) |\n| `views` | View count |\n| `rank` | Position in daily top ~1000 |\n\n### Articles catalog (`data/articles.parquet`)\n\n| Column | Description |\n|--------|-------------|\n| `article` | Pageview title |\n| `qid` | Wikidata QID (e.g. Q22686) |\n| `resolved_title` | Canonical title after Wikipedia redirects |\n| `label`, `description`, `image` | From Wikidata |\n| `updated_at` | Last enrichment |\n\n### Export JSON (`web/public/data/`)\n\nEach file has `period`, `lines` (array of ranked articles), and optionally `nav` (sub-links on year/month views).\n\n| Field | Description |\n|-------|-------------|\n| `rank` | 1–50 |\n| `title` | Wikipedia title (`Article_Name`) |\n| `label` | Display name from Wikidata |\n| `description` | Short Wikidata description |\n| `views` | View count for the period |\n| `qid` | Wikidata ID (e.g. `Q12345`) |\n| `image` | Commons thumbnail URL |\n\n`manifest.json` — `start`, `end`, `updated`, `language`.\n\n---\n\n## Day → month → year\n\n```\n1 API call / day  →  Parquet row per (date, article)\n                         │\n                         ├─ SUM(days in month)  →  month/YYYY/MM.json\n                         ├─ SUM(days in year)   →  year/YYYY.json\n                         └─ SUM(all days)       →  alltime.json\n```\n\n---\n\n## Wikidata\n\nBatched enrichment (50 titles / request):\n\n1. **QID** — Wikipedia `pageprops`, follows redirects\n2. **Fallbacks** — Wikidata search + opensearch\n3. **Entity** — label, description, image (P18 / P154)\n4. **Export** — merge views by QID before top-50 ranking\n\nManual overrides in `filters.py` for edge cases. Shadow QIDs (`Q_en_…`) retried on high-traffic articles.\n\nModules: `wikidata.py`, `mapping.py`, `qid_export.py`.\n\n---\n\n## Fork for another language\n\nThis repo tracks **English Wikipedia only**. To run StatsWiki for French, German, Japanese, etc.:\n\n→ **[ADAPT.md](ADAPT.md)** — step-by-step fork guide (config, Pages URL, Wikidata language, backfill).\n\nMulti-language in a **single** site is not implemented. One fork per language is the intended model. **Pull requests to this repo are not accepted** — fork under MIT and maintain your own copy.\n\n---\n\n## License\n\n**Code:** [MIT](LICENSE)\n\n**Data** (Wikipedia / Wikidata content shown on the site): [Wikimedia Terms of Use](https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use), [Wikidata CC0](https://creativecommons.org/publicdomain/zero/1.0/) (Commons images retain their own licenses).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthepriben%2Fstatswiki","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthepriben%2Fstatswiki","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthepriben%2Fstatswiki/lists"}