{"id":30283222,"url":"https://github.com/tn3w/ipset","last_synced_at":"2025-09-12T11:49:47.744Z","repository":{"id":296672282,"uuid":"994108105","full_name":"tn3w/IPSet","owner":"tn3w","description":"A comprehensive IP address categorization and lookup tool that collects addresses from VPN providers, Tor exit nodes, datacenter ASNs, and known proxy lists.","archived":false,"fork":false,"pushed_at":"2025-08-11T02:26:57.000Z","size":1261481,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-11T04:20:56.681Z","etag":null,"topics":["ip-database","ip-lookup","network-security","vpn-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tn3w.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-01T08:13:12.000Z","updated_at":"2025-08-11T02:27:09.000Z","dependencies_parsed_at":"2025-07-14T04:28:03.880Z","dependency_job_id":"ffd64b36-5d06-4179-a571-b53ae2aa0a44","html_url":"https://github.com/tn3w/IPSet","commit_stats":null,"previous_names":["tn3w/ipset"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tn3w/IPSet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tn3w%2FIPSet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tn3w%2FIPSet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tn3w%2FIPSet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/tn3w%2FIPSet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tn3w","download_url":"https://codeload.github.com/tn3w/IPSet/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tn3w%2FIPSet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270746924,"owners_count":24638449,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ip-database","ip-lookup","network-security","vpn-detection"],"created_at":"2025-08-16T17:33:52.245Z","updated_at":"2025-08-16T17:33:52.819Z","avatar_url":"https://github.com/tn3w.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \n# IPSet\n\n🔒  A comprehensive IP address categorization and lookup tool that collects addresses from VPN providers, Tor exit nodes, datacenter ASNs, and known proxy lists. 
\n\n![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/tn3w/IPSet/main.yml?label=Build\u0026style=for-the-badge)\n\n### IPInfo Category\n[IPSet](https://github.com/tn3w/IPSet) | [ProtonVPN-IPs](https://github.com/tn3w/ProtonVPN-IPs) | [TunnelBear-IPs](https://github.com/tn3w/TunnelBear-IPs)\n\n\u003c/div\u003e\n\n## 🚀 Key Features\n\n- ✅ Fast IP categorization: a lookup takes about 70 ms with the optimized in-memory structure\n- ✅ 1.7M+ IPs and CIDRs contained in the dataset\n- ✅ Sources from 7+ VPN providers (ExpressVPN, Surfshark, ProtonVPN, TunnelBear, Private-Internet-Access, CyberGhost, Mullvad)\n- ✅ Includes Tor exit nodes\n- ✅ Datacenter/hosting ASN identification\n- ✅ Multiple optimized output formats (JSON, text)\n- ✅ Support for both IPv4 and IPv6\n\n## 📊 Data Files\n\nThe repository maintains five regularly updated data files:\n\n1. `ipset.json`: The primary dataset, mapping group names to lists of IP addresses and CIDR ranges. Groups include:\n   - Tor exit nodes\n   - VPN providers (ExpressVPN, Surfshark, ProtonVPN, TunnelBear, Private-Internet-Access, CyberGhost, Mullvad)\n   - Awesome-Proxies list\n   - StopForumSpam\n   - Firehol-Level1 (CIDRs)\n   - Firehol-Proxies\n   - Datacenter (CIDRs)\n\n2. `iplookup.json`: An inverse mapping of the data in `ipset.json` (IP or CIDR to group names). Exact IP matches are O(1) dictionary lookups; CIDR entries still require a containment check per range.\n\n3. `iplist.json`: A flat list of all IP addresses without any group information.\n\n4. `iplist.txt`: A text file version of the flat list for easy integration with other tools.\n\n5. 
`datacenter_asns.json`: A list of Autonomous System Numbers (ASNs) associated with datacenters and hosting providers.\n\n## 🛠️ Usage\n\n### Installing and Running\n\nInstall the dependencies:\n```bash\npython3 -m venv venv\nsource venv/bin/activate\npip install -r requirements.txt\n```\n\nRun the following command to generate the JSON files:\n\n```bash\npython main.py\n```\n\n### Creating the Optimized Lookup Structure\n\nThe `iplookup.json` file is automatically created when running `main.py`. This creates an inverse mapping of the data in `ipset.json` for faster O(1) lookups.\n\n```python\ndef create_ip_lookup_file(group_to_ips: Dict[str, List[str]]) -\u003e None:\n    \"\"\"\n    Create a more efficient lookup structure where keys are IPs and values are lists of groups.\n    This makes it easy to determine which groups an IP belongs to.\n    \"\"\"\n    print(\"Creating IP lookup file...\")\n    ip_to_groups: Dict[str, List[str]] = {}\n\n    for group, ips in group_to_ips.items():\n        for ip in ips:\n            if ip not in ip_to_groups:\n                ip_to_groups[ip] = []\n            ip_to_groups[ip].append(group)\n\n    with open(LOOKUP_FILE, \"w\", encoding=\"utf-8\") as json_file:\n        json.dump(ip_to_groups, json_file)\n\n    print(f\"Successfully created {LOOKUP_FILE} with {len(ip_to_groups)} unique IPs\")\n```\n\n### Searching for IP Group Membership\n\nYou can use the following functions to check which groups an IP belongs to:\n\n```python\nimport json\nfrom typing import List\nfrom netaddr import IPAddress, IPNetwork\n\ndef search_ip_in_ipset(ip: str, ipset_file: str = \"ipset.json\") -\u003e List[str]:\n    \"\"\"\n    Search for an IP address in the ipset.json file and return all groups it belongs to.\n    This is slower as it has to iterate through all groups and their IP lists.\n    Supports both direct IP matches and CIDR range matches.\n\n    Args:\n        ip: The IP address to search for\n        ipset_file: Path to the ipset.json 
file\n\n    Returns:\n        List of group names that contain the IP address\n    \"\"\"\n    try:\n        with open(ipset_file, \"r\", encoding=\"utf-8\") as f:\n            group_to_ips = json.load(f)\n\n        ip_obj = IPAddress(ip)\n        ip_version = ip_obj.version\n        matching_groups = []\n        for group, ips in group_to_ips.items():\n            for ip_or_cidr in ips:\n                if \"/\" in ip_or_cidr:\n                    cidr = IPNetwork(ip_or_cidr)\n                    if cidr.version != ip_version:\n                        continue\n\n                    ip_int = int(ip_obj)\n                    net_int = int(cidr.network)\n                    prefix_len = cidr.prefixlen\n\n                    if ip_version == 4:\n                        mask = ((1 \u003c\u003c 32) - 1) ^ ((1 \u003c\u003c (32 - prefix_len)) - 1)\n                    else:\n                        mask = ((1 \u003c\u003c 128) - 1) ^ ((1 \u003c\u003c (128 - prefix_len)) - 1)\n\n                    if (ip_int \u0026 mask) != (net_int \u0026 mask):\n                        continue\n\n                    if ip_obj in cidr:\n                        matching_groups.append(group)\n                        break\n                elif ip == ip_or_cidr:\n                    matching_groups.append(group)\n                    break\n\n        return matching_groups\n    except Exception as e:\n        print(f\"Error searching for IP in ipset.json: {e}\")\n        return []\n\n\ndef search_ip_in_lookup(ip: str, lookup_file: str = \"iplookup.json\") -\u003e List[str]:\n    \"\"\"\n    Search for an IP address in the iplookup.json file and return all groups it belongs to.\n    This checks for direct IP matches and also if the IP is contained within any CIDR ranges.\n\n    Args:\n        ip: The IP address to search for\n        lookup_file: Path to the iplookup.json file\n\n    Returns:\n        List of group names that contain the IP address\n    \"\"\"\n    try:\n        with 
open(lookup_file, \"r\", encoding=\"utf-8\") as f:\n            ip_to_groups = json.load(f)\n\n        matching_groups = ip_to_groups.get(ip, [])\n\n        ip_obj = IPAddress(ip)\n        ip_version = ip_obj.version\n        for ip_or_cidr, groups in ip_to_groups.items():\n            if \"/\" in ip_or_cidr:\n                cidr = IPNetwork(ip_or_cidr)\n                if cidr.version != ip_version:\n                    continue\n\n                ip_int = int(ip_obj)\n                net_int = int(cidr.network)\n                prefix_len = cidr.prefixlen\n\n                if ip_version == 4:\n                    mask = ((1 \u003c\u003c 32) - 1) ^ ((1 \u003c\u003c (32 - prefix_len)) - 1)\n                else:\n                    mask = ((1 \u003c\u003c 128) - 1) ^ ((1 \u003c\u003c (128 - prefix_len)) - 1)\n\n                if (ip_int \u0026 mask) != (net_int \u0026 mask):\n                    continue\n\n                if ip_obj in cidr:\n                    for group in groups:\n                        if group not in matching_groups:\n                            matching_groups.append(group)\n\n        return matching_groups\n    except Exception as e:\n        print(f\"Error searching for IP in iplookup.json: {e}\")\n        return []\n```\n\n### Searching for IP Group Membership (Optimized)\n\nThis is a more optimized way to search for IP group membership. It uses a dictionary of IP addresses and their groups, and a dictionary of CIDR ranges and their groups. 
It loads the data into memory and then uses the dictionary to search for the IP address.\n\nOutput:\n```\nIP: 8.8.4.4\n  Time taken: 0.06616806983947754 seconds\n  Groups: ['Datacenter']\n\nIP: 185.220.101.33\n  Time taken: 0.06744527816772461 seconds\n  Groups: ['TorExitNodes', 'StopForumSpam']\n\nIP: 76.240.243.24\n  Time taken: 0.06687688827514648 seconds\n  Groups: None\n```\n\nExample:\n```python\nimport json\nimport time\nfrom typing import List, Dict, Tuple\nfrom netaddr import IPAddress, IPNetwork\n\n\ndef load_ip_file(\n    lookup_file: str = \"ipset.json\",\n) -\u003e Tuple[Dict[str, List[str]], Dict[IPNetwork, List[str]]]:\n    \"\"\"Load the lookup IP file into a dictionary.\"\"\"\n    try:\n        with open(lookup_file, \"r\", encoding=\"utf-8\") as f:\n            data = json.load(f)\n\n            ip_to_groups: Dict[str, List[str]] = {}\n            cidrs_to_ips: Dict[IPNetwork, List[str]] = {}\n\n            for group, ips in data.items():\n                for ip in ips:\n                    if \"/\" in ip:\n                        ip_obj = IPNetwork(ip)\n                        if ip_obj not in cidrs_to_ips:\n                            cidrs_to_ips[ip_obj] = []\n                        cidrs_to_ips[ip_obj].append(group)\n                        continue\n\n                    if ip not in ip_to_groups:\n                        ip_to_groups[ip] = []\n                    ip_to_groups[ip].append(group)\n\n            return ip_to_groups, cidrs_to_ips\n    except Exception as e:\n        print(f\"Error loading lookup IP file: {e}\")\n        return {}, {}\n\n\ndef search_ip_in_lookup(\n    ip: str, ips: Dict[str, List[str]], cidrs: Dict[IPNetwork, List[str]]\n) -\u003e List[str]:\n    \"\"\"\n    Search for an IP address in the iplookup.json file and return all groups it belongs to.\n    This checks for direct IP matches and also if the IP is contained within any CIDR ranges.\n\n    Args:\n        ip: The IP address to search for\n        ips: The 
dictionary of IP addresses and their groups\n        cidrs: The dictionary of CIDR ranges and their groups\n\n    Returns:\n        List of group names that contain the IP address\n    \"\"\"\n    try:\n        # Copy the list so the appends below do not mutate the entry stored in `ips`\n        matching_groups = list(ips.get(ip, []))\n\n        ip_obj = IPAddress(ip)\n        ip_version = ip_obj.version\n\n        for cidr, groups in cidrs.items():\n            if cidr.version != ip_version:\n                continue\n\n            ip_int = int(ip_obj)\n            net_int = int(cidr.network)\n            prefix_len = cidr.prefixlen\n\n            if ip_version == 4:\n                mask = ((1 \u003c\u003c 32) - 1) ^ ((1 \u003c\u003c (32 - prefix_len)) - 1)\n            else:\n                mask = ((1 \u003c\u003c 128) - 1) ^ ((1 \u003c\u003c (128 - prefix_len)) - 1)\n\n            # Cheap integer pre-check before the netaddr containment test\n            if (ip_int \u0026 mask) != (net_int \u0026 mask):\n                continue\n\n            if ip_obj in cidr:\n                for group in groups:\n                    if group not in matching_groups:\n                        matching_groups.append(group)\n\n        return matching_groups\n    except Exception as e:\n        print(f\"Error searching for IP: {e}\")\n        return []\n\n\nif __name__ == \"__main__\":\n    ips, cidrs = load_ip_file()\n    for ip in [\n        \"8.8.4.4\",  # Google DNS (datacenter)\n        \"185.220.101.33\",  # Known Tor exit node\n        \"76.240.243.24\",  # Typical residential IP\n    ]:\n        print(f\"IP: {ip}\")\n        start_time = time.time()\n        groups = search_ip_in_lookup(ip, ips, cidrs)\n        end_time = time.time()\n        print(f\"  Time taken: {end_time - start_time} seconds\")\n        print(f\"  Groups: {groups if groups else 'None'}\\n\")\n```\n\n### Working with Datacenter ASNs\n\n\u003e [!NOTE]\n\u003e This file may be deprecated, since the datacenter CIDRs are already in the `ipset.json` file.\n\u003e It is only useful if you want to check whether an ASN belongs to a datacenter.\n\u003e For normal IP lookups, you should use the `ipset.json` or `iplookup.json` 
files.\n\nThe `datacenter_asns.json` file contains a list of ASNs (Autonomous System Numbers) associated with datacenter and hosting providers. You can use this list to identify traffic coming from non-residential sources.\n\nHere's an example of how to efficiently check if an ASN belongs to a datacenter:\n\n```python\nimport json\nfrom typing import Optional\n\ndef load_datacenter_asns(asn_file: str = \"datacenter_asns.json\") -\u003e set:\n    \"\"\"Load datacenter ASNs into a set for O(1) lookups.\"\"\"\n    try:\n        with open(asn_file, \"r\", encoding=\"utf-8\") as f:\n            return set(json.load(f))\n    except Exception as e:\n        print(f\"Error loading ASNs: {e}\")\n        return set()\n\ndef is_datacenter_asn(asn: str, asns: Optional[set] = None) -\u003e bool:\n    \"\"\"Check if ASN belongs to a datacenter.\"\"\"\n    # Load from disk only when no set was passed in; an empty set is a valid argument\n    if asns is None:\n        asns = load_datacenter_asns()\n    # Assumes the file stores ASNs as strings without the AS prefix\n    return asn.replace(\"AS\", \"\") in asns\n\nif __name__ == \"__main__\":\n    asns = load_datacenter_asns()\n    for asn in [\"AS16509\", \"AS14618\"]:  # Both are Amazon (AWS) ASNs\n        print(f\"{asn} is{' not' if not is_datacenter_asn(asn, asns) else ''} a datacenter ASN\")\n```\n\n## 📜 License\nCopyright 2025 TN3W\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftn3w%2Fipset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftn3w%2Fipset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftn3w%2Fipset/lists"}