{"id":50181497,"url":"https://github.com/ipanalytics/ip-knowledge-layer","last_synced_at":"2026-06-12T22:01:04.300Z","repository":{"id":359054572,"uuid":"1244295795","full_name":"ipanalytics/IP-Knowledge-Layer","owner":"ipanalytics","description":"Open IP enrichment knowledge layer: CIDR, ASN, cloud, CDN, crawler, Tor, and VPN-adjacent network context with source provenance and confidence.","archived":false,"fork":false,"pushed_at":"2026-06-06T09:10:19.000Z","size":167452,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-06T11:09:00.587Z","etag":null,"topics":["asn","bot-detection","cloud-ranges","edge-computing","fraud-detection","ip-enrichment","ip-intelligence","ip-ranges","threat-intelligence","tor-relays","vpn-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ipanalytics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-20T06:20:28.000Z","updated_at":"2026-06-06T09:10:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ipanalytics/IP-Knowledge-Layer","commit_stats":null,"previous_names":["ipanalytics/ip-knowledge-layer"],"tags_count":55,"template":false,"template_full_name":null,"purl":"pkg:github/ipanalytics/IP-Knowledge-Layer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FIP-Knowledge-Layer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FIP-Knowledge-Layer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FIP-Knowledge-Layer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FIP-Knowledge-Layer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ipanalytics","download_url":"https://codeload.github.com/ipanalytics/IP-Knowledge-Layer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipanalytics%2FIP-Knowledge-Layer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34263874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asn","bot-detection","cloud-ranges","edge-computing","fraud-detection","ip-enrichment","ip-intelligence","ip-ranges","threat-intelligence","tor-relays","vpn-detection"],"created_at":"2026-05-25T07:00:29.257Z","updated_at":"2026-06-12T22:01:04.291Z","avatar_url":"https://github.com/ipanalytics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IP Knowledge Layer\n\nOpen IP enrichment knowledge layer for CIDR, ASN, cloud, crawler, Tor, and\nVPN-adjacent 
network intelligence.\nIt also includes satellite-internet prefix intelligence derived from public\noperator GeoIP feeds, subnet-to-PoP mappings, BGP evidence, and ownership\nsignals.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./site/banner.png\" alt=\"Open IP enrichment knowledge layer for CIDR, ASN, cloud, crawler, Tor, and VPN-adjacent network intelligence.\" width=\"100%\"\u003e\n\u003c/p\u003e\n\nThis repository is data-first: the main output is a set of machine-readable files\nthat can be fetched directly by `curl`, GitHub Actions, SIEM pipelines, WAF\ntooling, anti-fraud systems, and internal enrichment jobs.\n\n## Why This Exists\n\nMost public IP repositories publish one narrow list: cloud IPs, Tor IPs, crawler\nIPs, or ASN mappings. IP Knowledge Layer combines multiple public and derived\nsignals into one normalized enrichment layer.\n\nThe value is context:\n\n```text\nCIDR or ASN -\u003e layer -\u003e provider -\u003e service -\u003e tags -\u003e confidence -\u003e source\n```\n\nInstead of only knowing that a prefix exists, consumers can see whether it\nbelongs to cloud hosting, CDN edge, GitHub infrastructure, AI crawlers, Tor, or a\nsatellite internet provider, and whether its ASN carries a VPN-adjacent signal.\n\n## Current Release\n\n\u003c!-- IPKL_SUMMARY_START --\u003e\n| Metric | Value |\n|---|---:|\n| Updated | `2026-06-12T20:38:20Z` |\n| Release | [data-20260612-203820Z](https://github.com/ipanalytics/IP-Knowledge-Layer/releases/tag/data-20260612-203820Z) |\n| Records | 131,835 |\n| Prefix records | 131,835 |\n| ASN signals | 0 |\n| Sources | 12 |\n| Collector errors | 1 |\n\n| Layer | Records |\n|---|---:|\n| `hosting-cloud` | 101,531 |\n| `anonymity` | 11,520 |\n| `satellite-internet` | 11,468 |\n| `crawler-bot` | 7,316 |\n\n| Top Provider | Records |\n|---|---:|\n| Azure | 75,773 |\n| AWS | 16,063 |\n| Tor | 11,520 |\n| GitHub | 7,476 |\n| starlink | 5,625 |\n\u003c!-- IPKL_SUMMARY_END --\u003e\n\n## Download URLs\n\nReplace `main` with 
another branch if needed.\n\n```bash\nBASE=\"https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current\"\n\ncurl -fsSL \"$BASE/summary.json\"\ncurl -fsSL \"$BASE/source-index.json\"\ncurl -fsSL \"$BASE/ip-knowledge.jsonl\"\ncurl -fsSL \"$BASE/ip-knowledge.csv\"\ncurl -fsSL \"$BASE/cloud-prefixes.csv\"\ncurl -fsSL \"$BASE/asn-signals.csv\"\ncurl -fsSL \"$BASE/cidr-tags.txt\"\n```\n\n## Which File Should I Use?\n\n| Need | Use this file | Why |\n|---|---|---|\n| I want the full knowledge layer | `ip-knowledge.jsonl` | Best for pipelines, `jq`, streaming, and preserving nested fields |\n| I want Excel/BI/SIEM-friendly data | `ip-knowledge.csv` | Same broad dataset in tabular form |\n| I only need cloud/CDN/developer platform ranges | `cloud-prefixes.csv` | Smaller and focused on AWS, Azure, GCP, Cloudflare, Fastly, GitHub, Oracle |\n| I need quick CIDR-to-tags lookup | `cidr-tags.txt` | Lightweight text file: one CIDR plus comma-separated tags per line |\n| I care about VPN-adjacent ASN signals | `asn-signals.csv` | ASN-level aggregate evidence, without raw VPN IP publication |\n| I need to check source health and counts | `summary.json` | Current run status, layer counts, provider/source aggregates |\n| I need source provenance | `source-index.json` | Source URLs, source types, and record counts |\n\nFor most users:\n\n```text\nStart with cloud-prefixes.csv if you only need cloud/datacenter/CDN ranges.\nStart with ip-knowledge.jsonl if you want the full enrichment layer.\nStart with cidr-tags.txt if you want the simplest possible feed.\n```\n\n## Files\n\n| File | Purpose | Approx size |\n|---|---|---:|\n| `data/current/summary.json` | Current build summary, counts, layer/provider/source aggregates | 8 KB |\n| `data/current/source-index.json` | Source metadata, URLs, source types, record counts | 3 KB |\n| `data/current/ip-knowledge.jsonl` | Full normalized knowledge layer, one JSON record per line | 49 MB |\n| 
`data/current/ip-knowledge.csv` | Full normalized knowledge layer as CSV | 25 MB |\n| `data/current/cloud-prefixes.csv` | Official cloud/CDN/developer-platform prefixes only | 22 MB |\n| `data/current/asn-signals.csv` | ASN-level VPN-adjacent aggregate signals | 399 KB |\n| `data/current/cidr-tags.txt` | Simple `CIDR tags` text file for lightweight consumers | 4.7 MB |\n| `data/history/summary.csv` | Build history | small |\n| `data/snapshots/*.json` | Compact summary snapshots, not full data copies | small |\n\n## Layers\n\n### `hosting-cloud`\n\nOfficial cloud, CDN, edge, and developer-platform IP ranges.\n\nCurrent providers:\n\n- AWS\n- Azure\n- Google Cloud\n- Google public infrastructure\n- Cloudflare\n- Fastly\n- GitHub\n- Oracle Cloud\n\n### `crawler-bot`\n\nCrawler, AI bot, monitoring probe, scanner, SEO bot, and social preview ranges\nderived from [CrawlerScope](https://github.com/ipanalytics/CrawlerScope).\n\n### `anonymity`\n\nTor relay host routes derived from [Tor-Radar](https://github.com/ipanalytics/Tor-Radar).\n\n### `satellite-internet`\n\nSatellite internet and satellite service provider prefixes derived from\n[Sat-geoip](https://github.com/ipanalytics/Sat-geoip). Records preserve operator,\norbit class, BGP state, GeoIP semantics, PoP assignment, and confidence evidence\nin JSONL `metrics`.\n\n### `asn-signal`\n\nASN-level VPN-adjacent aggregate signals from provider analysis. This layer does\nnot publish raw VPN IP lists. 
It only publishes aggregate provider-to-ASN evidence.\n\n## Source Inventory\n\nOfficial/public sources:\n\n- AWS IP ranges: `https://ip-ranges.amazonaws.com/ip-ranges.json`\n- Azure Service Tags: `https://www.microsoft.com/en-us/download/details.aspx?id=56519`\n- Google Cloud ranges: `https://www.gstatic.com/ipranges/cloud.json`\n- Google public ranges: `https://www.gstatic.com/ipranges/goog.json`\n- Cloudflare ranges: `https://www.cloudflare.com/ips-v4`, `https://www.cloudflare.com/ips-v6`\n- Fastly public IP list: `https://api.fastly.com/public-ip-list`\n- GitHub Meta API: `https://api.github.com/meta`\n- Oracle Cloud ranges: `https://docs.oracle.com/en-us/iaas/tools/public_ip_ranges.json`\n\nDerived project sources:\n\n- CrawlerScope: crawler, AI bot, monitoring, scanner, and SEO bot ranges\n- Tor-Radar: Tor relay and exit IPs\n- Sat-geoip: satellite internet prefixes, operator attribution, BGP/PoP/GeoIP evidence\n- VPN provider ASN summary: aggregate ASN signals, no raw VPN IP feed\n\n## Record Shape\n\nExample `hosting-cloud` JSONL record:\n\n```json\n{\"prefix\":\"104.16.0.0/13\",\"layer\":\"hosting-cloud\",\"provider\":\"Cloudflare\",\"service\":\"edge\",\"tags\":[\"cdn\",\"edge\",\"proxy\"],\"confidence\":0.99,\"source_id\":\"cloudflare-v4\"}\n```\n\nExample `crawler-bot` JSONL record:\n\n```json\n{\"prefix\":\"66.249.64.0/19\",\"layer\":\"crawler-bot\",\"provider\":\"Google\",\"service\":\"Google common crawlers\",\"tags\":[\"bot\",\"crawler\",\"search\"],\"confidence\":0.95,\"source_id\":\"crawler-scope\"}\n```\n\nExample `anonymity` JSONL record:\n\n```json\n{\"prefix\":\"185.220.101.1/32\",\"layer\":\"anonymity\",\"provider\":\"Tor\",\"service\":\"exit\",\"tags\":[\"anonymity-network\",\"tor\",\"tor-exit\"],\"confidence\":0.98,\"source_id\":\"tor-radar\"}\n```\n\nExample `satellite-internet` JSONL 
record:\n\n```json\n{\"prefix\":\"143.105.187.0/24\",\"layer\":\"satellite-internet\",\"provider\":\"starlink\",\"service\":\"satellite_internet\",\"tags\":[\"satellite\",\"satellite-internet\",\"leo\",\"bgp_announced\"],\"confidence\":0.985,\"source_id\":\"sat-geoip\"}\n```\n\nExample `asn-signal` JSONL record:\n\n```json\n{\"layer\":\"asn-signal\",\"provider\":\"NordVPN\",\"asn\":9009,\"asn_name\":\"M247\",\"tags\":[\"asn-signal\",\"vpn-adjacent\"],\"confidence\":0.7}\n```\n\n## Usage Examples\n\nGet current build stats:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/summary.json | jq .\n```\n\nDownload cloud prefixes:\n\n```bash\ncurl -fsSLO https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv\n```\n\nExtract Cloudflare rows:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cloud-prefixes.csv \\\n  | awk -F, '$3 == \"Cloudflare\" { print }'\n```\n\nExtract Tor exits from JSONL:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \\\n  | jq -r 'select(.layer==\"anonymity\" and .service==\"exit\") | .prefix'\n```\n\nExtract AI crawler prefixes:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/ip-knowledge.jsonl \\\n  | jq -r 'select(.layer==\"crawler-bot\" and (.tags | index(\"ai-crawler\"))) | .prefix'\n```\n\nUse as a lightweight block/allow enrichment feed:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/cidr-tags.txt \\\n  | grep 'cloud'\n```\n\nFind all ASN signals for a provider:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/ipanalytics/IP-Knowledge-Layer/main/data/current/asn-signals.csv \\\n  | awk -F, '$3 == \"NordVPN\" { print }'\n```\n\n## What It Can Help With\n\n- IP enrichment for fraud/risk 
systems\n- WAF and SIEM context\n- Cloud/datacenter detection\n- CDN/edge infrastructure classification\n- AI crawler and bot visibility\n- Tor relay context\n- ASN-level VPN-adjacent signals\n- Source provenance for explainable decisions\n- Building internal allowlists, denylists, and review queues\n\nThis project is not a malware or abuse blacklist. It provides operational\nnetwork context with source provenance and confidence.\n\n## Local Update\n\n```bash\npython3 scripts/update.py\n```\n\nThe collector prefers local sibling project outputs when present:\n\n```text\n../crawler-scope/data/current/crawlers.json\n../tor-radar/data/current/network.json\n../release/analysis/data/provider_asn.csv\n```\n\nWhen those files are not present, it pulls the public raw GitHub project outputs\nwhere possible.\n\n## GitHub Actions\n\nThe workflow runs every 6 hours and commits updated files under `data/`.\n\n```text\n.github/workflows/ip-knowledge-layer.yml\n```\n\nThe workflow intentionally stores full data only in `data/current/*`. Historical\nsnapshots are compact summaries to avoid repository bloat.\n\n## Planned Improvements\n\nPlanned additions inspired by projects such as `ipverse`:\n\n- `asn-knowledge.csv`: ASN-level rollup with tags, cloud presence, Tor presence,\n  crawler presence, VPN-adjacent evidence, and confidence.\n- `asn-prefixes.csv.gz`: compressed bulk ASN-to-prefix layer, kept separate from\n  `ip-knowledge.jsonl` to avoid making the main file too large.\n- `provider-index.json`: normalized provider metadata and aliases.\n- `overlap-summary.csv`: overlap between cloud/CDN, crawler, Tor, and\n  VPN-adjacent ASN signals.\n- `diff/current.json`: added/removed prefix summary between runs.\n\nThe intent is not to clone `ipverse`. 
The goal is to build a higher-level\nknowledge layer with source provenance, tags, and confidence.\n\n## Notes\n\n- The project avoids full IPv4 expansion.\n- The project avoids mass RDAP/whois lookups in GitHub Actions.\n- `vpn-adjacent` signals are aggregate ASN-level indicators, not a raw VPN IP\n  dump.\n- Confidence is source-level confidence, not a claim that traffic from a network\n  is malicious.\n- Some official providers publish overlapping service rows for the same prefix.\n  Those rows are preserved because service labels carry useful context.\n\n## License\n\nCC0-1.0. See `LICENSE`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fipanalytics%2Fip-knowledge-layer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fipanalytics%2Fip-knowledge-layer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fipanalytics%2Fip-knowledge-layer/lists"}