{"id":19684239,"url":"https://github.com/elliotwutingfeng/inversion-dnsbl-generator","last_synced_at":"2025-04-29T06:30:25.344Z","repository":{"id":38949717,"uuid":"431464987","full_name":"elliotwutingfeng/Inversion-DNSBL-Generator","owner":"elliotwutingfeng","description":"Generate malicious URL blocklists for DNSBL applications like pfBlockerNG or Pi-hole by scanning various public URL sources using the Safe Browsing API from Google and/or Yandex.","archived":false,"fork":false,"pushed_at":"2024-08-20T05:48:00.000Z","size":461,"stargazers_count":20,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-08-20T08:05:46.339Z","etag":null,"topics":["aiohttp","blocklist","dnsbl","domains-project","domcop","fasttld","firewall","google-safe-browsing","icann","pfblockerng","pi-hole","pihole","python","python3","ray","safebrowsing","sqlite3","top1m","trancolist","yandex-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elliotwutingfeng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-24T11:53:29.000Z","updated_at":"2024-08-20T05:48:03.000Z","dependencies_parsed_at":"2023-12-13T08:54:26.644Z","dependency_job_id":"9bd0347e-075c-460c-8fb3-b7cd55cbec00","html_url":"https://github.com/elliotwutingfeng/Inversion-DNSBL-Generator","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfen
g%2FInversion-DNSBL-Generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FInversion-DNSBL-Generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FInversion-DNSBL-Generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2FInversion-DNSBL-Generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elliotwutingfeng","download_url":"https://codeload.github.com/elliotwutingfeng/Inversion-DNSBL-Generator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224151082,"owners_count":17264436,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aiohttp","blocklist","dnsbl","domains-project","domcop","fasttld","firewall","google-safe-browsing","icann","pfblockerng","pi-hole","pihole","python","python3","ray","safebrowsing","sqlite3","top1m","trancolist","yandex-api"],"created_at":"2024-11-11T18:17:12.414Z","updated_at":"2025-04-29T06:30:25.322Z","avatar_url":"https://github.com/elliotwutingfeng.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ch3 align=\"center\"\u003eInversion DNSBL (Domain Name System-based blackhole list) Generator\u003c/h3\u003e\n  \u003cimg src=\"images/inversion_logo.svg\" alt=\"Logo\" width=\"200\" height=\"200\"\u003e\n  \u003cp align=\"center\"\u003e\n    Generate malicious URL blocklists for \u003ca 
href=\"https://en.wikipedia.org/wiki/Domain_Name_System-based_blackhole_list\"\u003eDNSBL\u003c/a\u003e applications like \u003ca href=\"https://linuxincluded.com/block-ads-malvertising-on-pfsense-using-pfblockerng-dnsbl\"\u003epfBlockerNG\u003c/a\u003e or \u003ca href=\"https://pi-hole.net\"\u003ePi-hole\u003c/a\u003e by scanning various public URL sources using the Safe Browsing API from \u003ca href=\"https://developers.google.com/safe-browsing\"\u003eGoogle\u003c/a\u003e and/or \u003ca href=\"https://yandex.com/dev/safebrowsing\"\u003eYandex\u003c/a\u003e.\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://github.com/elliotwutingfeng/Inversion-DNSBL-Blocklists/issues\"\u003eReport Bug\u003c/a\u003e\n    ·\n    \u003ca href=\"https://github.com/elliotwutingfeng/Inversion-DNSBL-Blocklists/issues\"\u003eRequest Feature\u003c/a\u003e\n  \u003c/p\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://python.org\"\u003e\u003cimg src=\"https://img.shields.io/badge/Python-FFD43B?style=for-the-badge\u0026logo=python\u0026logoColor=blue\" alt=\"Python\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://www.sqlite.org\"\u003e\u003cimg src=\"https://img.shields.io/badge/SQLite-07405E?style=for-the-badge\u0026logo=sqlite\u0026logoColor=white\" alt=\"SQLite\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://docs.aiohttp.org/en/stable\"\u003e\u003cimg src=\"https://img.shields.io/badge/AIOHTTP-2C5BB4?style=for-the-badge\u0026logo=aiohttp\u0026logoColor=white\" alt=\"AIOHTTP\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://www.ray.io\"\u003e\u003cimg src=\"https://img.shields.io/badge/Ray-028CF0?style=for-the-badge\u0026logo=ray\u0026logoColor=white\" alt=\"Ray\"/\u003e\u003c/a\u003e\n  \u003c/p\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/elliotwutingfeng/Inversion-DNSBL-Generator/stargazers\"\u003e\u003cimg 
src=\"https://img.shields.io/github/stars/elliotwutingfeng/Inversion-DNSBL-Generator?style=for-the-badge\" alt=\"GitHub stars\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/elliotwutingfeng/Inversion-DNSBL-Generator/watchers\"\u003e\u003cimg src=\"https://img.shields.io/github/watchers/elliotwutingfeng/Inversion-DNSBL-Generator?style=for-the-badge\" alt=\"GitHub watchers\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/elliotwutingfeng/Inversion-DNSBL-Generator/network/members\"\u003e\u003cimg src=\"https://img.shields.io/github/forks/elliotwutingfeng/Inversion-DNSBL-Generator?style=for-the-badge\" alt=\"GitHub forks\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/elliotwutingfeng/Inversion-DNSBL-Generator/issues\"\u003e\u003cimg src=\"https://img.shields.io/github/issues/elliotwutingfeng/Inversion-DNSBL-Generator?style=for-the-badge\" alt=\"GitHub issues\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://codeclimate.com/github/elliotwutingfeng/Inversion-DNSBL-Generator\"\u003e\u003cimg src=\"https://img.shields.io/codeclimate/maintainability/elliotwutingfeng/Inversion-DNSBL-Generator?style=for-the-badge\" alt=\"Code Climate Maintainability\"/\u003e\u003c/a\u003e\n    \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/LICENSE-BSD--3--CLAUSE-GREEN?style=for-the-badge\" alt=\"GitHub license\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/elliotwutingfeng/Inversion-DNSBL-Generator/commits/master\"\u003e\u003cimg src=\"https://img.shields.io/github/commit-activity/w/elliotwutingfeng/Inversion-DNSBL-Generator?style=for-the-badge\" alt=\"GitHub commit activity\"/\u003e\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\u003cdetails\u003e\n  \u003csummary\u003eTable of Contents\u003c/summary\u003e\n  \u003col\u003e\n    \u003cli\u003e\u003ca href=\"#blocklists-available-for-download\"\u003eBlocklists available for download\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca 
href=\"#url-sources\"\u003eURL sources\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#safe-browsing-api-vendors\"\u003eSafe Browsing API vendors\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#requirements\"\u003eRequirements\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#system-mandatory\"\u003eSystem (mandatory)\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#safe-browsing-api-access-mandatory\"\u003eSafe Browsing API Access (mandatory)\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#url-feed-access-optional\"\u003eURL feed access (optional)\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#uploading-blocklists-to-github-optional\"\u003eUploading blocklists to GitHub (optional)\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#download-limits\"\u003eDownload limits\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#setup-instructions\"\u003eSetup instructions\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#declare-environment-variables\"\u003eDeclare environment variables\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#install-dependencies\"\u003eInstall dependencies\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#download-domains-project-urls-optional\"\u003eDownload Domains Project URLs (optional)\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#download-google-safe-browsing-api-hashes\"\u003eDownload Google Safe Browsing API hashes\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#download-and-identify-malicious-urls-from-tranco-top1m\"\u003eDownload and Identify malicious URLs from Tranco 
TOP1M\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#other-examples\"\u003eOther Examples\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#download-domcop-top10m-urls\"\u003eDownload DomCop TOP10M URLs\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#download-and-identify-malicious-urls-from-all-sources\"\u003eDownload and Identify malicious URLs from all sources\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#retrieve-urls-marked-as-malicious-from-past-scans-from-database\"\u003eRetrieve URLs marked as malicious from past scans from database\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#display-help-message\"\u003eDisplay help message\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#known-issues\"\u003eKnown Issues\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#disclaimer\"\u003eDisclaimer\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#references\"\u003eReferences\u003c/a\u003e\u003c/li\u003e\n  \u003c/ol\u003e\n\u003c/details\u003e\n\n---\n\n## Blocklists available for download\n\n![Total Blocklist URLs](https://tokei-rs.onrender.com/b1/github/elliotwutingfeng/Inversion-DNSBL-Blocklists?label=Total%20Blocklist%20URLS\u0026style=for-the-badge)\n\nYou may download the blocklists [here](https://github.com/elliotwutingfeng/Inversion-DNSBL-Blocklists#inversion-dnsbl-domain-name-system-based-blackhole-list-blocklists)\n\n## URL sources\n\n| Name | URL Count | Source | Description |\n|-|-|-|-|\n| Tranco TOP1M | 1M | \u003chttps://tranco-list.eu\u003e | A Research-Oriented Top Sites Ranking Hardened Against Manipulation |\n| DomCop TOP10M | 10M | \u003chttps://www.domcop.com/top-10-million-domains\u003e | Top 10 million domains Based on Open PageRank data |\n| Registrar R01 | 6M | \u003chttps://r01.ru\u003e | Zone files for 
.ru .su .rf domains |\n| CubDomain.com | 196M | \u003chttps://cubdomain.com\u003e | Aggregator that tracks newly registered domains daily |\n| ICANN CZDS (Centralized Zone Data Service) | 247M | \u003chttps://czds.icann.org\u003e | ICANN's centralized point for interested parties to request access to Zone Files provided by participating Top Level Domain Registries |\n| Domains Project | 2.1B | \u003chttps://domainsproject.org\u003e | World’s single largest Internet domains dataset |\n| Amazon Web Services EC2 | 57M | \u003chttps://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-hostnames\u003e | Amazon Elastic Compute Cloud hostnames |\n| Google Compute Engine | 11M | \u003chttps://www.gstatic.com/ipranges/cloud.json\u003e | Google Compute Engine |\n| OpenINTEL.nl | 6M | \u003chttps://openintel.nl\u003e | Zone files for .se .nu .ee domains |\n| Switch.ch | 3.3M | \u003chttps://switch.ch/open-data\u003e | Zone files for .ch .li domains |\n| AFNIC.fr | 7M | \u003chttps://www.afnic.fr/en/products-and-services/fr-and-associated-services/shared-data-reuse-fr-data\u003e | Daily newly registered .fr .re .pm .tf .wf .yt domains |\n| Internet.ee | 153K | \u003chttps://www.internet.ee/domains/ee-zone-file\u003e | Estonian Internet Foundation (.ee) |\n| Internetstiftelsen | 1.7M | \u003chttps://zonedata.iis.se\u003e | Swedish Internet Foundation |\n| SK-NIC.sk | 400K | \u003chttps://sk-nic.sk/subory/domains.txt\u003e | Domain Registry of the Slovak Republic (.sk) |\n| Google TAG IOCs | 200 | \u003chttps://blog.google/threat-analysis-group\u003e | Google Threat Analysis Group Indicators of Compromise |\n| IPv4 Addresses | 4.2B | 0.0.0.0 - 255.255.255.255 | Exhaustive list of all IPv4 addresses |\n\n## Safe Browsing API vendors\n\n| \u003ca href=\"https://developers.google.com/safe-browsing\"\u003e\u003cimg height=\"100px\" src=\"images/google.svg\" alt=\"Google Safe Browsing API\" /\u003e\u003c/a\u003e | \u003ca 
href=\"https://yandex.com/dev/safebrowsing\"\u003e\u003cimg height=\"100px\" src=\"images/yandex.png\" alt=\"Yandex Safe Browsing API\" /\u003e\u003c/a\u003e |\n|:-:|:-:|\n|[Google](https://developers.google.com/safe-browsing)|[Yandex](https://yandex.com/dev/safebrowsing)|\n|[Terms-of-Service](https://developers.google.com/safe-browsing/terms)|[Terms-of-Service](https://yandex.ru/legal/yandex_sb_api/?lang=en)|\n\n## Requirements\n\n### System (mandatory)\n\n- Linux or macOS\n- Python 3.10+\n- Multi-core x86-64 CPU (for Python Ray support)\n- RAM: At least 32GB\n- SSD Storage Space: At least 700GB required to process all URL sources\n\n### Safe Browsing API Access (mandatory)\n\nChoose at least one\n\n- Google: [Obtain a Google Developer API key and set it up for the Safe Browsing API](https://developers.google.com/safe-browsing/v4/get-started)\n- Yandex: [Obtain a Yandex Developer API key](https://yandex.com/dev/safebrowsing)\n\n### URL feed access (optional)\n\n- ICANN Zone Files: [Sign up for an ICANN CZDS account](https://czds.icann.org)\n- Once registered, turn off email notifications in the user settings (otherwise they will send you hundreds of acknowledgement emails),\nthen select `Create New Request` on the Dashboard to request zone file access.\n\n### Uploading blocklists to GitHub (optional)\n\n- [Create a GitHub API Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)\n\n### Download limits\n\n- **ICANN CZDS (Centralized Zone Data Service):** Once every 24 hours per zone file\n- **Switch.ch:** Once every 24 hours per zone file\n\n## Setup instructions\n\n`git clone` and `cd` into the project directory\n\n### Declare environment variables\n\n```bash\ncp --update=none .env-dev .env\n```\n\nIn `.env`, fill in the following variables\n\n```bash\n# Mandatory: At least one of the following Safe Browsing API keys\nGOOGLE_API_KEY=\nYANDEX_API_KEY=\n\n# Optional: ICANN zone file 
access\nICANN_ACCOUNT_USERNAME=\nICANN_ACCOUNT_PASSWORD=\n# Some registrars will not accept your request reason unless you include your Name, Email, IP Address, Physical Address (Building, Street, Postcode etc.), and Phone Number\nICANN_REQUEST_REASON='Detection of potentially malicious domains for cybersecurity research. Name: _ Email: _ IP Address: _ Physical Address: _ Phone Number: _'\n\n# Optional: Upload generated blocklists to your GitHub repository\nGITHUB_ACCESS_TOKEN=\nBLOCKLIST_REPOSITORY_NAME=\n```\n\n### Install dependencies\n\nAccording to [PEP 668](https://peps.python.org/pep-0668), use of a virtual environment is [strongly recommended](https://packaging.python.org/en/latest/specifications/externally-managed-environments) as of 2023.\n\n```bash\npython3 -m venv venv\nvenv/bin/python3 -m pip install --upgrade pip\nvenv/bin/python3 -m pip install -r requirements.txt\n```\n\n### Download Domains Project URLs (optional)\n\n```bash\n# Dataset size ~49Gb\ncd ../\ngit clone https://github.com/tb0hdan/domains.git\ncd domains\ngit lfs install # you will need to install Git LFS first (https://git-lfs.github.com)\n```\n\nEdit `unpack.sh` and remove `combine` from the last line, then run:\n\n```bash\n./unpack.sh\n```\n\n## Getting Started\n\n### Download Google Safe Browsing API hashes\n\n\u003e :warning: As of 4 August 2023, the following command will make around 9000 calls (exact number depends on number of hashes in Google's dataset) to Google Safe Browsing API. 
As the daily limit is 10,000 calls, `--update-hashes` should be run no more than once every 24 hours.\n\n```bash\nvenv/bin/python3 main.py --update-hashes --vendors google\n```\n\n### Download and Identify malicious URLs from Tranco TOP1M\n\n- :heavy_check_mark: Add Tranco TOP1M URLs to database\n- :heavy_check_mark: Identify malicious URLs from database using Safe Browsing API hashes, and generate a blocklist\n- :heavy_check_mark: Update database with latest malicious URL statuses\n- :memo: Sources: **Tranco TOP1M**\n- :shield: Vendors: **Google**\n\n```bash\nvenv/bin/python3 main.py --fetch-urls --identify-malicious-urls --sources top1m --vendors google\n```\n\n## Other Examples\n\n### Download DomCop TOP10M URLs\n\n- :heavy_check_mark: Add DomCop TOP10M URLs to database (no blocklist will be generated)\n- :memo: Sources: **DomCop TOP10M**\n- :shield: Vendors: **Not Applicable**\n\n```bash\nvenv/bin/python3 main.py --fetch-urls --sources top10m\n```\n\n### Download and Identify malicious URLs from all sources\n\n\u003e :warning: Requires at least 700GB free space.\n\u003e\n\u003e :information_source: If you have not downloaded any Safe Browsing API hashes yet, add the `--update-hashes` flag to the following command.\n\n- :heavy_check_mark: Add URLs from all sources to database\n- :heavy_check_mark: Identify malicious URLs from database using Safe Browsing API hashes, and generate a blocklist\n- :heavy_check_mark: Update database with latest malicious URL statuses\n- :memo: Sources: Everything\n- :shield: Vendors: **Google**\n\n```bash\nvenv/bin/python3 main.py --fetch-urls --identify-malicious-urls --vendors google\n```\n\n### Retrieve URLs marked as malicious from past scans from database\n\n- :heavy_check_mark: Retrieve URLs with malicious statuses (attained from past scans) from database, and generate a blocklist\n- :memo: Sources: **DomCop TOP10M**, **Domains Project**\n- :shield: Vendors: **Google**\n\n```bash\nvenv/bin/python3 main.py 
--retrieve-known-malicious-urls --sources top10m domainsproject --vendors google\n```\n\n### Display help message\n\n```bash\nvenv/bin/python3 main.py --help\n```\n\n## Known Issues\n\n- Yandex Safe Browsing Update API appears to be unserviceable. Yandex Technical support has been notified.\n\n## Disclaimer\n\n- This project is not sponsored, endorsed, or otherwise affiliated with Google and/or Yandex.\n\n- Google works to provide the most accurate and up-to-date information about unsafe web resources. However, Google cannot guarantee that its information is comprehensive and error-free: some risky sites may not be identified, and some safe sites may be identified in error.\n\n- URLs detected with the Safe Browsing API usually have a malicious validity period of about 5 minutes. As the blocklists are updated only once every 24 hours, the blocklists must not be used to display user warnings.\n\n**More information on Google Safe Browsing API usage limits:** \u003chttps://developers.google.com/safe-browsing/v4/usage-limits\u003e\n\n## References\n\n- \u003chttps://developers.google.com/safe-browsing\u003e\n- \u003chttps://yandex.com/dev/safebrowsing\u003e\n- \u003chttps://remusao.github.io/posts/few-tips-sqlite-perf.html\u003e\n- \u003chttps://github.com/icann/czds-api-client-python\u003e\n- \u003chttps://jpmens.net/2021/05/18/dns-open-zone-data\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felliotwutingfeng%2Finversion-dnsbl-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felliotwutingfeng%2Finversion-dnsbl-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felliotwutingfeng%2Finversion-dnsbl-generator/lists"}