# github-statuses

A Flat Data attempt at historically documenting GitHub statuses.

## About

This project builds the **"missing GitHub status page"**: a historical mirror that shows actual uptime
percentages and incidents across the entire platform, plus per-service uptime based on the incident data.
It reconstructs timelines from the Atom feed history and turns them into structured outputs and a static site.
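The extractor computes these numbers itself; as a rough sketch (not the project's exact implementation), an uptime percentage over a window is simply the share of time not covered by downtime intervals:

```python
from datetime import datetime

def uptime_pct(window_start, window_end, downtime_windows):
    """Percentage of the window not covered by (merged) downtime intervals."""
    total = (window_end - window_start).total_seconds()
    merged = []
    for start, end in sorted(downtime_windows):
        # Clip each downtime interval to the reporting window.
        start = max(start, window_start)
        end = min(end, window_end)
        if end <= start:
            continue
        # Merge overlapping or touching intervals so they are not double-counted.
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    down = sum((e - s).total_seconds() for s, e in merged)
    return 100.0 * (1 - down / total)
```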

## Mentioned in

- [GitHub seems to be struggling with three nines availability](https://www.theregister.com/2026/02/10/github_outages/)
(The Register, 2026-02-10), which cites this project's reconstructed "missing GitHub status page" as
an unofficial historical source.
- ["Github might be in trouble"](https://www.youtube.com/watch?v=f3u57jkwBFE)
(The PrimeTime on YouTube, 2026-03-09), which later covered this project in a YouTube video.
- ["AI Has Broken the Internet"](https://www.youtube.com/watch?v=44JBZwAsfJI&t=78s)
(ForrestKnight on YouTube, 2026-03-30), which referenced this project's 90-day uptime and
incident counts while discussing GitHub's recent outage pattern.

## Prerequisites

### uv

Install [uv](https://github.com/astral-sh/uv) per the [installation docs](https://docs.astral.sh/uv/getting-started/installation/).

### Python (for running the extractor)

The extractor depends on `onnxruntime`, which supports **Python 3.11–3.13** (not 3.14+).
Check your version: `python --version` or `python3 --version`.

If you need a compatible version, uv can install it:

```bash
uv python install 3.13
```

Then create the virtual environment with that Python:

```bash
uv venv --python 3.13
uv sync
```

Or use `uv sync` with an existing venv. If `uv sync` fails with an `onnxruntime` error, your Python is
likely 3.14+; switch to 3.11–3.13 as above.

### Dependencies

Run `uv sync` before using the extractor scripts. **To view the static site only**, no dependencies are
needed: the `parsed/` directory in the repo contains pre-generated data, so you can serve the repo root
with any HTTP server.

## Quick start (uv)

```bash
uv venv
uv sync
```

Run the extractor across all history:

```bash
uv run python scripts/extract_incidents.py --out out
```

Run the extractor for the last year (UTC example):

```bash
uv run python scripts/extract_incidents.py --out out_last_year --since 2025-01-03 --until 2026-01-03
```

Use JSONL (default) or split per-incident outputs:

```bash
uv run python scripts/extract_incidents.py --out out --incidents-format jsonl
uv run python scripts/extract_incidents.py --out out --incidents-format split
```
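
The JSONL output keeps one JSON object per line, which makes ad-hoc analysis easy. A minimal loader sketch (the `incidents.jsonl` filename matches the outputs listed in this README; the fields inside each record depend on the extractor's schema):

```python
import json
from pathlib import Path

def load_incidents(out_dir):
    """Load one JSON object per non-empty line from <out_dir>/incidents.jsonl."""
    path = Path(out_dir) / "incidents.jsonl"
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```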

Enrich incidents with impact level by scraping the incident pages (cached):

```bash
uv run python scripts/extract_incidents.py --out out --enrich-impact
```
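
Once enriched, each record may carry an optional `impact` field (see Outputs below). A small sketch for tallying incidents by impact level, assuming a missing value simply means the incident was not enriched:

```python
from collections import Counter

def impact_counts(incidents):
    """Tally incident records by their (optional) enriched impact level."""
    return Counter(inc.get("impact", "unknown") for inc in incidents)
```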

Infer missing components with GLiNER2 (used only when the incident page lacks "affected components"):

```bash
uv run python scripts/extract_incidents.py --out out --infer-components gliner2
```

## Automation

After each Flat Data update, a GitHub Action runs the parser and commits outputs to `parsed/`.

Run tests:

```bash
uv run python -m unittest discover -s tests
```

## Status site

The static site lives in `site/` and reads data from `parsed/`.
There is no build step; serve the repo root with any static HTTP server:

```bash
python -m http.server 8000
# or: python3 -m http.server 8000
```

Then open http://localhost:8000 in your browser.

## GLiNER2 component inference

Some incident pages do not list "affected components". In those cases we use GLiNER2 as a fallback:

- Input text: incident title + non-Resolved updates.
- Labels: the 10 GitHub services with short descriptions.
- Thresholded inference (default: 0.75 confidence).
- Final filter: the label must also appear via explicit service aliases in the text.

This keeps HTML tags as the source of truth and uses ML only to fill gaps.
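
To make the final filtering step concrete, here is a sketch of the alias filter. The alias table shown is hypothetical (the real one lives in the extractor), but the logic is the same: a GLiNER2-predicted label survives only if one of its aliases literally occurs in the incident text:

```python
# Hypothetical alias table; the extractor maintains the real one.
SERVICE_ALIASES = {
    "Actions": ["actions", "workflow", "workflows"],
    "Pages": ["pages", "github pages"],
    "Git Operations": ["git operations", "git push", "git clone"],
}

def alias_filter(predicted_labels, text):
    """Keep a predicted label only if one of its aliases occurs in the text."""
    text = text.lower()
    return [
        label for label in predicted_labels
        if any(alias in text for alias in SERVICE_ALIASES.get(label, []))
    ]
```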

### GLiNER2 experiment (evaluation + audit)

To validate the fallback approach, the repository includes an experiment that produces:

- **Audit**: every GLiNER2-tagged incident with text evidence snippets.
- **Evaluation**: GLiNER2 predictions compared against incidents that *do* have HTML "affected components".

Reproduce the experiment at a fixed time point (numbers will change as new data arrives):

```bash
uv run python scripts/run_gliner_experiment.py --as-of 2026-01-08 --output-dir tagging-experiment
```

Outputs are written to:

- `tagging-experiment/gliner2_audit.jsonl` (tagged incidents + evidence snippets)
- `tagging-experiment/gliner2_eval.json` (metrics, per-label breakdown, sample mismatches)
- `tagging-experiment/error_analysis.md` (diff-style table of sample errors)

Latest results (as-of 2026-01-08, threshold 0.75, alias filter on, non-Resolved text only):

| Metric | Value |
|---|---:|
| Evaluated incidents | 447 |
| Predicted non-empty | 419 |
| Precision | 0.950 |
| Recall | 0.883 |
| Exact match rate | 0.785 |
| Audit count (missing-tag incidents) | 51 |

Per-label precision/recall (top-level service components):

| Label | Precision | Recall | TP | FP | FN |
|---|---:|---:|---:|---:|---:|
| Git Operations | 0.968 | 0.909 | 60 | 2 | 6 |
| Webhooks | 0.938 | 0.918 | 45 | 3 | 4 |
| API Requests | 0.915 | 0.915 | 54 | 5 | 5 |
| Issues | 1.000 | 0.286 | 22 | 0 | 55 |
| Pull Requests | 0.948 | 0.979 | 92 | 5 | 2 |
| Actions | 0.958 | 0.947 | 161 | 7 | 9 |
| Packages | 0.917 | 0.971 | 33 | 3 | 1 |
| Pages | 0.855 | 0.964 | 53 | 9 | 2 |
| Codespaces | 0.982 | 0.982 | 110 | 2 | 2 |
| Copilot | 1.000 | 0.967 | 58 | 0 | 2 |

Summary: the fallback is high-precision and mostly conservative. Most errors are **false negatives**
(missing a true component), while false positives are typically "extra" components inferred from
multi-service incident titles.
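
The per-label numbers follow directly from the TP/FP/FN counts: precision is TP / (TP + FP) and recall is TP / (TP + FN). For example, checking the "Git Operations" row of the table above:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

# "Git Operations" row: TP=60, FP=2, FN=6.
p, r = precision_recall(60, 2, 6)
```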

Sample errors (diffs highlighted with `+` for extra and `-` for missing):

| Type | Incident | Predicted | Truth |
|---|---|---|---|
| false_positive | [Incident with GitHub Actions and Codespaces](https://www.githubstatus.com/incidents/vxvyrmy9w1vp) | Actions, `+Codespaces` | Actions |
| false_positive | [Incident with GitHub Packages and GitHub Pages](https://www.githubstatus.com/incidents/b40k7ckrs7sp) | Packages, `+Pages` | Packages |
| false_positive | [Incident with Pull Requests and Webhooks](https://www.githubstatus.com/incidents/d1stj7xcn4fy) | Pull Requests, `+Webhooks` | Pull Requests |
| false_negative | [Incident on 2022-09-06 22:05 UTC](https://www.githubstatus.com/incidents/cwp52gsftl5n) | `none` | `-Git Operations`, `-Visit www`, `-Webhooks` |
| false_negative | [Incident on 2022-09-06 22:56 UTC](https://www.githubstatus.com/incidents/wl09fvhb20x8) | `none` | `-Git Operations`, `-Visit www`, `-Webhooks` |
| false_negative | [Incident with API Requests](https://www.githubstatus.com/incidents/4vr2qz8lgdmq) | `none` | `-API Requests` |

## Outputs

Outputs are written to the directory passed to `--out` (local examples use `out/`).
The GLiNER2 experiment writes to `tagging-experiment/` by default.
The automation workflow writes to `parsed/`.

- `/incidents.json`: merged incident timeline records
- `/incidents.jsonl`: one JSON object per incident (default)
- `/incidents/`: per-incident JSON files when using `--incidents-format split`
- `/segments.csv`: per-status timeline segments for Gantt/phase views
- `/downtime_windows.csv`: downtime windows for incident bar charts
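
As a consumer-side sketch (the `start`/`end` column names and ISO-8601 timestamps are assumptions, not a documented schema), total downtime can be summed from `downtime_windows.csv`:

```python
import csv
from datetime import datetime

def total_downtime_seconds(csv_path, start_col="start", end_col="end"):
    """Sum downtime durations; column names and ISO-8601 timestamps are assumptions."""
    total = 0.0
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            start = datetime.fromisoformat(row[start_col])
            end = datetime.fromisoformat(row[end_col])
            total += (end - start).total_seconds()
    return total
```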

Incident records include optional `impact` and `components` fields when enrichment is enabled.
Service components are sourced as follows:

- **Primary**: the incident page "affected components" section (if present).
- **Fallback**: GLiNER2 schema-driven extraction from the incident title + non-Resolved updates, filtered
by explicit service aliases to avoid generic matches.