https://github.com/vdutts7/speedtests
- Host: GitHub
- URL: https://github.com/vdutts7/speedtests
- Owner: vdutts7
- Created: 2026-06-27T09:15:29.000Z (4 days ago)
- Default Branch: main
- Last Pushed: 2026-06-27T10:08:15.000Z (4 days ago)
- Last Synced: 2026-06-27T11:19:55.072Z (4 days ago)
- Topics: crawl, data, isp, latency, ookla, speedtests
- Language: HTML
- Homepage: https://vdutts7.github.io/speedtests/
- Size: 354 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
speedtests
crawl public speedtest result pages → ISP intelligence map
https://vdutts7.github.io/speedtests
---

## Issue
result IDs opaque; no bulk export API
❌ one-off lookups: can't build ISP/regional aggregates from manual page visits
❌ third-party datasets: stale, licensed, or missing server/latency fields you need
❌ naive crawl: 403/rate limits on sequential IDs; gaps without checkpoint/resume
## Realization
public [speedtest.net](https://www.speedtest.net) result pages embed `window.OOKLA.INIT_DATA` JSON with **no auth**
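A minimal sketch of pulling that embedded JSON out of a fetched page. It assumes the page assigns the blob as `window.OOKLA.INIT_DATA = {...};` inside a script tag; the helper name `extract_init_data` is hypothetical and the real parser in `sweep.py` may differ.

```python
import json
import re

# Assumption: the blob appears as `window.OOKLA.INIT_DATA = {...};` in the HTML.
_INIT_RE = re.compile(r"window\.OOKLA\.INIT_DATA\s*=\s*")


def extract_init_data(html: str):
    """Return the decoded INIT_DATA value, or None if it is missing or malformed."""
    match = _INIT_RE.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses exactly one JSON value starting at the given index
        # and ignores whatever script text follows it.
        obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return obj
```

Using `raw_decode` avoids guessing where the object ends with a regex, which tends to break on nested braces.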
## How it works
anchor from your own run:
1. open [speedtest.net](https://www.speedtest.net), run a test
2. copy the result URL from the address bar; the trailing digits are the ID (`/result/19360616699`)
3. pass that ID to `sweep.py --start`
4. sweep decrements by 1 each fetch (`19360616699`, `19360616698`, …)
5. each `/result/{id}` hit parses into `data/*.jsonl`
```text
+-------------+ +----------+ +--------+ +----------+
| your result | --> | sweep.py | --> | data/ | --> | query.py |
| ID | | decrement| | jsonl | | |
+-------------+ +----------+ +--------+ +----------+
```
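The decrementing loop above could be sketched roughly as follows. This is an illustration, not the actual `sweep.py`: the function names, the curl flags chosen, the `delay_s` parameter, and the `{"id": ..., "data": ...}` record shape are all assumptions. `parse` is any callable that turns page HTML into a dict, or `None` when the ID has no result.

```python
import json
import subprocess
import time

RESULT_URL = "https://www.speedtest.net/result/{id}"


def fetch_with_curl(result_id: int) -> str:
    """Fetch one result page via /usr/bin/curl; return '' on any failure."""
    proc = subprocess.run(
        ["/usr/bin/curl", "-sS", "--fail", "--max-time", "15",
         RESULT_URL.format(id=result_id)],
        capture_output=True, text=True,
    )
    return proc.stdout if proc.returncode == 0 else ""


def sweep(start, count, out_path, parse, fetch=fetch_with_curl, delay_s=1.0):
    """Walk IDs start, start-1, ... and append each parsed hit as one JSON line."""
    hits = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for result_id in range(start, start - count, -1):
            record = parse(fetch(result_id))
            if record is not None:
                # Hypothetical record shape: the ID plus whatever parse() returned.
                out.write(json.dumps({"id": result_id, "data": record}) + "\n")
                hits += 1
            time.sleep(delay_s)  # pause between requests to stay under throttling
    return hits
```

Opening the output in append mode means an interrupted run keeps every line written so far.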
## Setup
```bash
python3 --version # stdlib only; sweep shells out to /usr/bin/curl
```
## Run
```bash
# --start = ID from your own speedtest.net result URL
OUTPUT=data/ookla_results.jsonl python3 sweep.py --start 19360616699 --count 50000
```
```bash
# --resume = continue an interrupted sweep instead of starting over
OUTPUT=data/ookla_results.jsonl python3 sweep.py --start 19360616699 --count 50000 --resume
```
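One way a resume step could work is to derive the checkpoint from the output file itself. The sketch below assumes each JSONL record carries an integer `id` field; both that field name and the helper `next_start` are hypothetical, and the real `--resume` may track state differently.

```python
import json
import os


def next_start(out_path: str, default_start: int) -> int:
    """Return the ID to resume from: one below the lowest ID already saved."""
    if not os.path.exists(out_path):
        return default_start
    lowest = None
    with open(out_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rid = int(json.loads(line)["id"])
            except (ValueError, KeyError, TypeError):
                continue  # e.g. a truncated last line from an interrupted run
            if lowest is None or rid < lowest:
                lowest = rid
    return default_start if lowest is None else min(default_start, lowest - 1)
```

Since misses write nothing, this approach may re-fetch a few empty IDs below the last saved hit; a separate checkpoint file would avoid that at the cost of extra state.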
```bash
python3 query.py --input data/ookla_results.jsonl --isp Airtel
python3 query.py --input data/ookla_results.jsonl --top 20
python3 query.py --input data/ookla_results.jsonl --csv > airtel_export.csv
```
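For a sense of what an ISP aggregate over the JSONL might look like, here is a small sketch. The field names `isp` and `latency_ms` are assumptions about the record layout, and `isp_latency` is a hypothetical helper, not part of `query.py`.

```python
import json
from collections import defaultdict
from statistics import median


def isp_latency(lines):
    """Return (isp, sample_count, median_latency_ms) tuples, most-sampled ISP first."""
    by_isp = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        # Assumed field names; adjust to the keys actually present in the data.
        isp, latency = rec.get("isp"), rec.get("latency_ms")
        if isp and isinstance(latency, (int, float)):
            by_isp[isp].append(latency)
    rows = [(isp, len(vals), median(vals)) for isp, vals in by_isp.items()]
    return sorted(rows, key=lambda r: (-r[1], r[0]))
```

It accepts any iterable of lines, so it can be fed an open file handle such as `open("data/ookla_results.jsonl")`.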
## Gotchas
| symptom | fix | stability | why |
|---|---|---|---|
| 403 bursts | slow the request rate via `RATE_S` or `--rate` | intermittent | speedtest.net edge throttle |
| sparse hit rate | re-run your own test; use fresher `--start` ID | stable | not every decremented ID exists |
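Beyond a fixed rate, one common way to ride out 403 bursts is exponential backoff. The sketch below is not something the repo is known to do: `fetch_with_backoff`, its parameters, and the `fetch_status(result_id) -> (status_code, body)` callable are all hypothetical.

```python
import time


def fetch_with_backoff(fetch_status, result_id, base_delay_s=1.0,
                       max_delay_s=60.0, max_tries=5, sleep=time.sleep):
    """Retry on HTTP 403, doubling the wait each time up to max_delay_s."""
    delay = base_delay_s
    for _ in range(max_tries):
        status, body = fetch_status(result_id)
        if status != 403:
            return status, body  # success or a non-throttle error: stop retrying
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
    return 403, ""  # still throttled after max_tries attempts
```

Passing `sleep` as a parameter keeps the function testable without real waits.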
## Tools Used

## Contact