https://github.com/ipanalytics/ip-knowledge-layer
Open IP enrichment knowledge layer: CIDR, ASN, cloud, CDN, crawler, Tor, and VPN-adjacent network context with source provenance and confidence.
asn bot-detection cloud-ranges edge-computing fraud-detection ip-enrichment ip-intelligence ip-ranges threat-intelligence tor-relays vpn-detection
- Host: GitHub
- URL: https://github.com/ipanalytics/ip-knowledge-layer
- Owner: ipanalytics
- License: other
- Created: 2026-05-20T06:20:28.000Z (29 days ago)
- Default Branch: main
- Last Pushed: 2026-06-06T09:10:19.000Z (12 days ago)
- Last Synced: 2026-06-06T11:09:00.587Z (12 days ago)
- Topics: asn, bot-detection, cloud-ranges, edge-computing, fraud-detection, ip-enrichment, ip-intelligence, ip-ranges, threat-intelligence, tor-relays, vpn-detection
- Language: Python
- Homepage:
- Size: 160 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# IP Knowledge Layer
Open IP enrichment knowledge layer for CIDR, ASN, cloud, crawler, Tor, and
VPN-adjacent network intelligence.
It also includes satellite-internet prefix intelligence derived from public
operator GeoIP feeds, subnet-to-PoP mappings, BGP evidence, and ownership
signals.
This repository is data-first: the main output is a set of machine-readable files
that can be pulled directly with `curl` and consumed by GitHub Actions, SIEM
pipelines, WAF tooling, anti-fraud systems, and internal enrichment jobs.
## Why This Exists
Most public IP repositories publish one narrow list: cloud IPs, Tor IPs, crawler
IPs, or ASN mappings. IP Knowledge Layer combines multiple public and derived
signals into one normalized enrichment layer.
The value is context:
```text
CIDR or ASN -> layer -> provider -> service -> tags -> confidence -> source
```
Instead of only knowing that a prefix exists, consumers can tell whether it
belongs to cloud hosting, CDN edge, GitHub infrastructure, AI crawlers, Tor, or a
satellite internet provider, or whether an ASN carries a VPN-adjacent signal.
## Current Release
| Metric | Value |
|---|---:|
| Updated | `2026-06-12T20:38:20Z` |
| Release | [data-20260612-203820Z](https://github.com/ipanalytics/IP-Knowledge-Layer/releases/tag/data-20260612-203820Z) |
| Records | 131,835 |
| Prefix records | 131,835 |
| ASN signals | 0 |
| Sources | 12 |
| Collector errors | 1 |

| Layer | Records |
|---|---:|
| `hosting-cloud` | 101,531 |
| `anonymity` | 11,520 |
| `satellite-internet` | 11,468 |
| `crawler-bot` | 7,316 |

| Top Provider | Records |
|---|---:|
| Azure | 75,773 |
| AWS | 16,063 |
| Tor | 11,520 |
| GitHub | 7,476 |
| starlink | 5,625 |
## Download URLs
Replace `main` with another branch if needed.
```bash
BASE="https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current"
curl -fsSL "$BASE/summary.json"
curl -fsSL "$BASE/source-index.json"
curl -fsSL "$BASE/ip-knowledge.jsonl"
curl -fsSL "$BASE/ip-knowledge.csv"
curl -fsSL "$BASE/cloud-prefixes.csv"
curl -fsSL "$BASE/asn-signals.csv"
curl -fsSL "$BASE/cidr-tags.txt"
```
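For consumers that prefer Python over `curl`, the same URLs can be built and fetched with the standard library. This is a minimal sketch: the helper names `data_url` and `fetch_text` are illustrative and not part of the project, and UTF-8 decoding is an assumption about the published files.

```python
from urllib.request import urlopen

# Same base as the BASE variable in the shell example above.
REPO_RAW = "https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer"


def data_url(filename, branch="main"):
    """Build the raw URL for a file under data/current on the given branch."""
    return f"{REPO_RAW}/{branch}/data/current/{filename}"


def fetch_text(filename, branch="main", timeout=30):
    """Download one data file and return it as text (performs a network request).

    Assumption: the files are UTF-8 encoded.
    """
    with urlopen(data_url(filename, branch), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (requires network access):
# summary_text = fetch_text("summary.json")
```

Passing `branch="some-branch"` mirrors the "replace `main`" note above.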
## Which File Should I Use?
| Need | Use this file | Why |
|---|---|---|
| I want the full knowledge layer | `ip-knowledge.jsonl` | Best for pipelines, `jq`, streaming, and preserving nested fields |
| I want Excel/BI/SIEM-friendly data | `ip-knowledge.csv` | Same broad dataset in tabular form |
| I only need cloud/CDN/developer platform ranges | `cloud-prefixes.csv` | Smaller and focused on AWS, Azure, GCP, Cloudflare, Fastly, GitHub, Oracle |
| I need quick CIDR-to-tags lookup | `cidr-tags.txt` | Lightweight text file: one CIDR plus comma-separated tags per line |
| I care about VPN-heavy/provider ASN signals | `asn-signals.csv` | ASN-level aggregate evidence, without raw VPN IP publication |
| I need to check source health and counts | `summary.json` | Current run status, layer counts, provider/source aggregates |
| I need source provenance | `source-index.json` | Source URLs, source types, and record counts |
For most users:
```text
Start with cloud-prefixes.csv if you only need cloud/datacenter/CDN ranges.
Start with ip-knowledge.jsonl if you want the full enrichment layer.
Start with cidr-tags.txt if you want the simplest possible feed.
```
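As an illustration of the simplest feed, the sketch below parses `cidr-tags.txt`-style lines and looks up the tags for one address. It assumes that whitespace separates the CIDR from the comma-separated tag list and that blank lines and `#` comments can be skipped; neither detail is confirmed by this README. The sample lines are hypothetical, built from the record examples in this document.

```python
import ipaddress


def parse_cidr_tags(lines):
    """Yield (network, tags) pairs from `CIDR tags` lines.

    Assumptions: whitespace separates the CIDR from the tag list, tags are
    comma-separated, and blank lines or `#` comments may be skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        network = ipaddress.ip_network(parts[0], strict=False)
        tag_field = parts[1] if len(parts) > 1 else ""
        tags = [t.strip() for t in tag_field.split(",") if t.strip()]
        yield network, tags


def tags_for(ip, entries):
    """Return the sorted union of tags from every entry that contains `ip`."""
    addr = ipaddress.ip_address(ip)
    found = set()
    for network, tags in entries:
        if network.version == addr.version and addr in network:
            found.update(tags)
    return sorted(found)


# Hypothetical sample lines in the assumed format.
SAMPLE_LINES = [
    "104.16.0.0/13 cdn,edge,proxy",
    "185.220.101.1/32 anonymity-network,tor,tor-exit",
]
ENTRIES = list(parse_cidr_tags(SAMPLE_LINES))
```

A linear scan like this is fine for spot checks; a full feed of this size would usually be loaded into a prefix tree or a sorted interval structure instead.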
## Files
| File | Purpose | Approx size |
|---|---|---:|
| `data/current/summary.json` | Current build summary, counts, layer/provider/source aggregates | 8 KB |
| `data/current/source-index.json` | Source metadata, URLs, source types, record counts | 3 KB |
| `data/current/ip-knowledge.jsonl` | Full normalized knowledge layer, one JSON record per line | 49 MB |
| `data/current/ip-knowledge.csv` | Full normalized knowledge layer as CSV | 25 MB |
| `data/current/cloud-prefixes.csv` | Official cloud/CDN/developer-platform prefixes only | 22 MB |
| `data/current/asn-signals.csv` | ASN-level VPN-adjacent aggregate signals | 399 KB |
| `data/current/cidr-tags.txt` | Simple `CIDR tags` text file for lightweight consumers | 4.7 MB |
| `data/history/summary.csv` | Build history | small |
| `data/snapshots/*.json` | Compact summary snapshots, not full data copies | small |
## Layers
### `hosting-cloud`
Official cloud, CDN, edge, and developer-platform IP ranges.
Current providers:
- AWS
- Azure
- Google Cloud
- Google public infrastructure
- Cloudflare
- Fastly
- GitHub
- Oracle Cloud
### `crawler-bot`
Crawler, AI bot, monitoring probe, scanner, SEO bot, and social preview ranges
derived from [CrawlerScope](https://github.com/ipanalytics/CrawlerScope).
### `anonymity`
Tor relay host routes derived from [Tor-Radar](https://github.com/ipanalytics/Tor-Radar).
### `satellite-internet`
Satellite internet and satellite service provider prefixes derived from
[Sat-geoip](https://github.com/ipanalytics/Sat-geoip). Records preserve operator,
orbit class, BGP state, GeoIP semantics, PoP assignment, and confidence evidence
in JSONL `metrics`.
### `asn-signal`
ASN-level VPN-adjacent aggregate signals from provider analysis. This layer does
not publish raw VPN IP lists. It only publishes aggregate provider-to-ASN evidence.
## Source Inventory
Official/public sources:
- AWS IP ranges: `https://ip-ranges.amazonaws.com/ip-ranges.json`
- Azure Service Tags: `https://www.microsoft.com/en-us/download/details.aspx?id=56519`
- Google Cloud ranges: `https://www.gstatic.com/ipranges/cloud.json`
- Google public ranges: `https://www.gstatic.com/ipranges/goog.json`
- Cloudflare ranges: `https://www.cloudflare.com/ips-v4`, `https://www.cloudflare.com/ips-v6`
- Fastly public IP list: `https://api.fastly.com/public-ip-list`
- GitHub Meta API: `https://api.github.com/meta`
- Oracle Cloud ranges: `https://docs.oracle.com/en-us/iaas/tools/public_ip_ranges.json`
Derived project sources:
- CrawlerScope: crawler, AI bot, monitoring, scanner, and SEO bot ranges
- Tor-Radar: Tor relay and exit IPs
- Sat-geoip: satellite internet prefixes, operator attribution, BGP/PoP/GeoIP evidence
- VPN provider ASN summary: aggregate ASN signals, no raw VPN IP feed
## Record Shape
Example `hosting-cloud` JSONL record:
```json
{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare","service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,"source_id":"cloudflare-v4"}
```
Example `crawler-bot` JSONL record:
```json
{"prefix":"66.249.64.0/19","layer":"crawler-bot","provider":"Google","service":"Google common crawlers","tags":["bot","crawler","search"],"confidence":0.95,"source_id":"crawler-scope"}
```
Example `anonymity` JSONL record:
```json
{"prefix":"185.220.101.1/32","layer":"anonymity","provider":"Tor","service":"exit","tags":["anonymity-network","tor","tor-exit"],"confidence":0.98,"source_id":"tor-radar"}
```
Example `satellite-internet` JSONL record:
```json
{"prefix":"143.105.187.0/24","layer":"satellite-internet","provider":"starlink","service":"satellite_internet","tags":["satellite","satellite-internet","leo","bgp_announced"],"confidence":0.985,"source_id":"sat-geoip"}
```
Example `asn-signal` JSONL record:
```json
{"layer":"asn-signal","provider":"NordVPN","asn":9009,"asn_name":"M247","tags":["asn-signal","vpn-adjacent"],"confidence":0.7}
```
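To show how these records might be consumed, here is a minimal sketch that loads JSONL lines and enriches an IP address by longest-prefix match. The function names `load_prefix_records` and `enrich` are illustrative, not part of the project. The sample lines are copied from the examples above, and records without a `prefix` field (such as `asn-signal` rows) are skipped.

```python
import ipaddress
import json

# Sample lines copied from the record examples above.
SAMPLE_JSONL = """\
{"prefix":"104.16.0.0/13","layer":"hosting-cloud","provider":"Cloudflare","service":"edge","tags":["cdn","edge","proxy"],"confidence":0.99,"source_id":"cloudflare-v4"}
{"prefix":"185.220.101.1/32","layer":"anonymity","provider":"Tor","service":"exit","tags":["anonymity-network","tor","tor-exit"],"confidence":0.98,"source_id":"tor-radar"}
{"layer":"asn-signal","provider":"NordVPN","asn":9009,"asn_name":"M247","tags":["asn-signal","vpn-adjacent"],"confidence":0.7}
"""


def load_prefix_records(lines):
    """Parse JSONL lines into (network, record) pairs, skipping prefix-less rows."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if "prefix" in rec:
            network = ipaddress.ip_network(rec["prefix"], strict=False)
            records.append((network, rec))
    return records


def enrich(ip, records):
    """Return every record whose prefix contains `ip`, most specific first."""
    addr = ipaddress.ip_address(ip)
    hits = [
        (net, rec)
        for net, rec in records
        if net.version == addr.version and addr in net
    ]
    hits.sort(key=lambda pair: pair[0].prefixlen, reverse=True)
    return [rec for _, rec in hits]


INDEX = load_prefix_records(SAMPLE_JSONL.splitlines())
```

Calling `enrich("104.17.0.1", INDEX)` returns the Cloudflare record, and an address outside every prefix returns an empty list. The scan is linear, which keeps the sketch short but would be slow against the full file.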
## Usage Examples
Get current build stats:
```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/summary.json | jq .
```
Download cloud prefixes:
```bash
curl -fsSLO https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv
```
Extract Cloudflare rows:
```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv \
| awk -F, '$3 == "Cloudflare" { print }'
```
Extract Tor exits from JSONL:
```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
| jq -r 'select(.layer=="anonymity" and .service=="exit") | .prefix'
```
Extract AI crawler prefixes:
```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \
| jq -r 'select(.layer=="crawler-bot" and (.tags | index("ai-crawler"))) | .prefix'
```
Use as a lightweight block/allow enrichment feed:
```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cidr-tags.txt \
| grep 'cloud'
```
Find all ASN signals for a provider:
```bash
curl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/asn-signals.csv \
| awk -F, '$3 == "NordVPN" { print }'
```
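The `awk` filters above depend on column position and do not handle quoted fields. A header-based alternative in Python might look like the sketch below. The column name `provider` and the sample header are assumptions, since this README does not document the CSV schema; check the real header before relying on it.

```python
import csv
import io


def rows_for_provider(csv_text, provider, column="provider"):
    """Return rows whose `column` matches `provider`, ignoring case.

    Assumption: the CSV has a header row containing `column`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        raise KeyError(f"column {column!r} not found in header: {fieldnames}")
    wanted = provider.casefold()
    return [row for row in reader if (row[column] or "").casefold() == wanted]


# Hypothetical sample; the real header may differ.
SAMPLE_CSV = (
    "prefix,layer,provider,service,confidence\n"
    "104.16.0.0/13,hosting-cloud,Cloudflare,edge,0.99\n"
    '198.51.100.0/24,hosting-cloud,Azure,"Storage, regional",0.99\n'
)
CLOUDFLARE_ROWS = rows_for_provider(SAMPLE_CSV, "cloudflare")
```

Because `csv.DictReader` honors quoting, a service label that contains a comma stays in one field instead of shifting the columns.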
## What It Can Help With
- IP enrichment for fraud/risk systems
- WAF and SIEM context
- Cloud/datacenter detection
- CDN/edge infrastructure classification
- AI crawler and bot visibility
- Tor relay context
- ASN-level VPN-adjacent signals
- Source provenance for explainable decisions
- Building internal allowlists, denylists, and review queues
This project is not a malware or abuse blacklist. It provides operational
network context with source provenance and confidence.
## Local Update
```bash
python3 scripts/update.py
```
The collector prefers local sibling project outputs when present:
```text
../crawler-scope/data/current/crawlers.json
../tor-radar/data/current/network.json
../release/analysis/data/provider_asn.csv
```
When those files are not present, it pulls the public raw GitHub project outputs
where possible.
## GitHub Actions
The workflow runs every 6 hours and commits updated files under `data/`.
```text
.github/workflows/ip-knowledge-layer.yml
```
The workflow intentionally stores full data only in `data/current/*`. Historical
snapshots are compact summaries to avoid repository bloat.
## Planned Improvements
Planned additions inspired by projects such as `ipverse`:
- `asn-knowledge.csv`: ASN-level rollup with tags, cloud presence, Tor presence,
crawler presence, VPN-adjacent evidence, and confidence.
- `asn-prefixes.csv.gz`: compressed bulk ASN-to-prefix layer, kept separate from
`ip-knowledge.jsonl` to avoid making the main file too large.
- `provider-index.json`: normalized provider metadata and aliases.
- `overlap-summary.csv`: overlap between cloud/CDN, crawler, Tor, and
VPN-adjacent ASN signals.
- `diff/current.json`: added/removed prefix summary between runs.
The intent is not to clone `ipverse`. The goal is to build a higher-level
knowledge layer with source provenance, tags, and confidence.
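As a rough illustration of what an added/removed summary between runs could compute, the sketch below compares two sets of prefix strings. The function name and the output keys are hypothetical. This is not the planned `diff/current.json` format, which is not specified here.

```python
def diff_prefixes(previous, current):
    """Summarize which prefix strings were added or removed between two runs.

    The output keys are illustrative only, not a project-defined schema.
    """
    prev, curr = set(previous), set(current)
    return {
        "added": sorted(curr - prev),
        "removed": sorted(prev - curr),
        "unchanged_count": len(prev & curr),
    }


# Hypothetical example using documentation-range prefixes.
PREVIOUS_RUN = ["192.0.2.0/24", "198.51.100.0/24"]
CURRENT_RUN = ["198.51.100.0/24", "203.0.113.0/24"]
SUMMARY = diff_prefixes(PREVIOUS_RUN, CURRENT_RUN)
```

Comparing prefixes as strings treats differently written but equivalent CIDRs as distinct, so inputs would need to be normalized first.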
## Notes
- The project avoids full IPv4 expansion.
- The project avoids mass RDAP/whois lookups in GitHub Actions.
- `vpn-adjacent` signals are aggregate ASN-level indicators, not a raw VPN IP
dump.
- Confidence is source-level confidence, not a claim that traffic from a network
is malicious.
- Some official providers publish overlapping service rows for the same prefix.
Those rows are preserved because service labels carry useful context.
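Since overlapping service rows are preserved, a consumer may want to collect every service label seen for a prefix, optionally dropping rows below a confidence threshold. The sketch below assumes the record fields shown in the Record Shape section. The helper name and the sample rows are hypothetical.

```python
from collections import defaultdict


def services_by_prefix(records, min_confidence=0.0):
    """Group distinct service labels by prefix, keeping first-seen order.

    Rows without a prefix, or below `min_confidence`, are skipped.
    """
    grouped = defaultdict(list)
    for rec in records:
        prefix = rec.get("prefix")
        if prefix is None or rec.get("confidence", 0.0) < min_confidence:
            continue
        service = rec.get("service")
        if service is not None and service not in grouped[prefix]:
            grouped[prefix].append(service)
    return dict(grouped)


# Hypothetical rows: one prefix listed under two service labels.
SAMPLE_RECORDS = [
    {"prefix": "198.51.100.0/24", "provider": "Azure", "service": "AzureCloud", "confidence": 0.99},
    {"prefix": "198.51.100.0/24", "provider": "Azure", "service": "Storage", "confidence": 0.99},
    {"layer": "asn-signal", "provider": "NordVPN", "asn": 9009, "confidence": 0.7},
]
GROUPED = services_by_prefix(SAMPLE_RECORDS)
```

The threshold filters on source-level confidence only; it says nothing about whether traffic from a prefix is malicious.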
## License
CC0-1.0. See `LICENSE`.