{"id":35237264,"url":"https://github.com/mrshu/github-statuses","last_synced_at":"2026-04-09T05:32:03.918Z","repository":{"id":41056989,"uuid":"502458912","full_name":"mrshu/github-statuses","owner":"mrshu","description":"The \"Missing GitHub Status Page\" -- a Flat Data attempt at historically documenting GitHub statuses","archived":false,"fork":false,"pushed_at":"2026-04-03T05:49:14.000Z","size":2439,"stargazers_count":141,"open_issues_count":3,"forks_count":10,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-04-03T13:31:26.103Z","etag":null,"topics":["data-extraction","flat-data","github","ner","open-data","status","status-page","uptime"],"latest_commit_sha":null,"homepage":"https://mrshu.github.io/github-statuses/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mrshu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-06-11T21:23:19.000Z","updated_at":"2026-04-03T13:08:06.000Z","dependencies_parsed_at":"2026-01-26T08:04:58.068Z","dependency_job_id":null,"html_url":"https://github.com/mrshu/github-statuses","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mrshu/github-statuses","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrshu%2Fgithub-statuses","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrshu%2Fgithub-statuses/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/mrshu%2Fgithub-statuses/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrshu%2Fgithub-statuses/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mrshu","download_url":"https://codeload.github.com/mrshu/github-statuses/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrshu%2Fgithub-statuses/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31587798,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"online","status_checked_at":"2026-04-09T02:00:06.848Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-extraction","flat-data","github","ner","open-data","status","status-page","uptime"],"created_at":"2025-12-30T04:04:52.600Z","updated_at":"2026-04-09T05:32:03.911Z","avatar_url":"https://github.com/mrshu.png","language":"HTML","funding_links":[],"categories":["HTML"],"sub_categories":[],"readme":"# github-statuses\n\nA Flat Data attempt at historically documenting GitHub statuses.\n\n## About\n\nThis project builds the **\"missing GitHub status page\"**: a historical mirror that shows actual uptime\npercentages and incidents across the entire platform, plus per-service uptime based on the incident data.\nIt reconstructs timelines from the Atom feed history and turns them into structured 
outputs and a static site.\n\n## Mentioned in\n\n- [GitHub seems to be struggling with three nines availability](https://www.theregister.com/2026/02/10/github_outages/)\n  (The Register, 2026-02-10), which cites this project's reconstructed \"missing GitHub status page\" as\n  an unofficial historical source.\n- [\"Github might be in trouble\"](https://www.youtube.com/watch?v=f3u57jkwBFE)\n  (The PrimeTime on YouTube, 2026-03-09), which covered this project.\n- [\"AI Has Broken the Internet\"](https://www.youtube.com/watch?v=44JBZwAsfJI\u0026t=78s)\n  (ForrestKnight on YouTube, 2026-03-30), which referenced this project's 90-day uptime and\n  incident counts while discussing GitHub's recent outage pattern.\n\n## Prerequisites\n\n### uv\n\nInstall [uv](https://github.com/astral-sh/uv) per the [installation docs](https://docs.astral.sh/uv/getting-started/installation/).\n\n### Python (for running the extractor)\n\nThe extractor depends on `onnxruntime`, which supports **Python 3.11–3.13** (not 3.14+).\nCheck your version: `python --version` or `python3 --version`.\n\nIf you need a compatible version, uv can install it:\n\n```bash\nuv python install 3.13\n```\n\nThen create the virtual environment with that Python:\n\n```bash\nuv venv --python 3.13\nuv sync\n```\n\nOr use `uv sync` with an existing venv. If `uv sync` fails with an `onnxruntime` error, your Python is\nlikely 3.14+; switch to 3.11–3.13 as above.\n\n### Dependencies\n\nRun `uv sync` before using the extractor scripts. 
**To view the static site only**, no dependencies are\nneeded: the `parsed/` directory in the repo contains pre-generated data; just serve the repo root with\nany HTTP server.\n\n## Quick start (uv)\n\n```\nuv venv\nuv sync\n```\n\nRun the extractor across all history:\n\n```\nuv run python scripts/extract_incidents.py --out out\n```\n\nRun the extractor for the last year (UTC example):\n\n```\nuv run python scripts/extract_incidents.py --out out_last_year --since 2025-01-03 --until 2026-01-03\n```\n\nUse JSONL (default) or split per-incident outputs:\n\n```\nuv run python scripts/extract_incidents.py --out out --incidents-format jsonl\nuv run python scripts/extract_incidents.py --out out --incidents-format split\n```\n\nEnrich incidents with impact level by scraping the incident pages (cached):\n\n```\nuv run python scripts/extract_incidents.py --out out --enrich-impact\n```\n\nInfer missing components with GLiNER2 (used only when the incident page lacks \"affected components\"):\n\n```\nuv run python scripts/extract_incidents.py --out out --infer-components gliner2\n```\n\n## Automation\n\nAfter each Flat Data update, a GitHub Action runs the parser and commits outputs to `parsed/`.\n\nRun tests:\n\n```\nuv run python -m unittest discover -s tests\n```\n\n## Status site\n\nThe static site lives in `site/` and reads data from `parsed/`.\nNo build step is needed: serve the repo root with any static HTTP server:\n\n```bash\npython -m http.server 8000\n# or: python3 -m http.server 8000\n```\n\nThen open \u003chttp://localhost:8000/site/\u003e in your browser.\n\n## GLiNER2 component inference\n\nSome incident pages do not list \"affected components\". 
In those cases we use GLiNER2 as a fallback:\n\n- Input text: incident title + non-Resolved updates.\n- Labels: the 10 GitHub services with short descriptions.\n- Thresholded inference (default: 0.75 confidence).\n- Final filter: the label must also appear via explicit service aliases in the text.\n\nThis keeps HTML tags as the source of truth and uses ML only to fill gaps.\n\n### GLiNER2 experiment (evaluation + audit)\n\nTo validate the fallback approach, an experiment is run that produces:\n\n- **Audit**: every GLiNER2-tagged incident with text evidence snippets.\n- **Evaluation**: GLiNER2 predictions compared against incidents that *do* have HTML \"affected components\".\n\nReproduce the experiment at a fixed time point (numbers will change as new data arrives):\n\n```\nuv run python scripts/run_gliner_experiment.py --as-of 2026-01-08 --output-dir tagging-experiment\n```\n\nOutputs are written to:\n\n- `tagging-experiment/gliner2_audit.jsonl` (tagged incidents + evidence snippets)\n- `tagging-experiment/gliner2_eval.json` (metrics, per-label breakdown, sample mismatches)\n- `tagging-experiment/error_analysis.md` (diff-style table of sample errors)\n\nLatest results (as-of 2026-01-08, threshold 0.75, alias filter on, non-Resolved text only):\n\n| Metric | Value |\n|---|---:|\n| Evaluated incidents | 447 |\n| Predicted non-empty | 419 |\n| Precision | 0.950 |\n| Recall | 0.883 |\n| Exact match rate | 0.785 |\n| Audit count (missing-tag incidents) | 51 |\n\nPer-label precision/recall (top-level service components):\n\n| Label | Precision | Recall | TP | FP | FN |\n|---|---:|---:|---:|---:|---:|\n| Git Operations | 0.968 | 0.909 | 60 | 2 | 6 |\n| Webhooks | 0.938 | 0.918 | 45 | 3 | 4 |\n| API Requests | 0.915 | 0.915 | 54 | 5 | 5 |\n| Issues | 1.000 | 0.286 | 22 | 0 | 55 |\n| Pull Requests | 0.948 | 0.979 | 92 | 5 | 2 |\n| Actions | 0.958 | 0.947 | 161 | 7 | 9 |\n| Packages | 0.917 | 0.971 | 33 | 3 | 1 |\n| Pages | 0.855 | 0.964 | 53 | 9 | 2 |\n| Codespaces | 0.982 
| 0.982 | 110 | 2 | 2 |\n| Copilot | 1.000 | 0.967 | 58 | 0 | 2 |\n\nSummary: the fallback is high-precision and mostly conservative. Most errors are **false negatives**\n(missing a true component), while false positives are typically \"extra\" components inferred from\nmulti-service incident titles.\n\nSample errors (diffs highlighted with `+` for extra and `-` for missing):\n\n| Type | Incident | Predicted | Truth |\n|---|---|---|---|\n| false_positive | [Incident with GitHub Actions and Codespaces](https://www.githubstatus.com/incidents/vxvyrmy9w1vp) | Actions, `+Codespaces` | Actions |\n| false_positive | [Incident with GitHub Packages and GitHub Pages](https://www.githubstatus.com/incidents/b40k7ckrs7sp) | Packages, `+Pages` | Packages |\n| false_positive | [Incident with Pull Requests and Webhooks](https://www.githubstatus.com/incidents/d1stj7xcn4fy) | Pull Requests, `+Webhooks` | Pull Requests |\n| false_negative | [Incident on 2022-09-06 22:05 UTC](https://www.githubstatus.com/incidents/cwp52gsftl5n) | `none` | `-Git Operations`, `-Visit www`, `-Webhooks` |\n| false_negative | [Incident on 2022-09-06 22:56 UTC](https://www.githubstatus.com/incidents/wl09fvhb20x8) | `none` | `-Git Operations`, `-Visit www`, `-Webhooks` |\n| false_negative | [Incident with API Requests](https://www.githubstatus.com/incidents/4vr2qz8lgdmq) | `none` | `-API Requests` |\n\n## Outputs\n\nOutputs are written to the directory passed to `--out` (local examples use `out/`).\nThe GLiNER2 experiment writes to `tagging-experiment/` by default.\nThe automation workflow writes to `parsed/`.\n\n- `\u003cout\u003e/incidents.json`: merged incident timeline records\n- `\u003cout\u003e/incidents.jsonl`: one JSON object per incident (default)\n- `\u003cout\u003e/incidents/`: per-incident JSON files when using `--incidents-format split`\n- `\u003cout\u003e/segments.csv`: per-status timeline segments for Gantt/phase views\n- `\u003cout\u003e/downtime_windows.csv`: downtime windows for incident 
bar charts\n\nIncident records include optional `impact` and `components` fields when enrichment is enabled.\nService components are sourced as follows:\n\n- **Primary**: the incident page \"affected components\" section (if present).\n- **Fallback**: GLiNER2 schema-driven extraction from the incident title + non-resolved updates, filtered\n  by explicit service aliases to avoid generic matches.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrshu%2Fgithub-statuses","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrshu%2Fgithub-statuses","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrshu%2Fgithub-statuses/lists"}