{"id":20932488,"url":"https://github.com/adulau/domainclassifier","last_synced_at":"2025-10-31T00:49:16.144Z","repository":{"id":2296754,"uuid":"3255194","full_name":"adulau/DomainClassifier","owner":"adulau","description":"DomainClassifier is a Python (2/3) library to extract and classify Internet domains/hostnames/IP addresses from raw unstructured text files following their DNS existence, localization or attributes.","archived":false,"fork":false,"pushed_at":"2024-01-31T20:53:37.000Z","size":142,"stargazers_count":77,"open_issues_count":0,"forks_count":11,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-05-08T01:42:35.218Z","etag":null,"topics":["csirt","data-mining","location-discovery","network-security","python-library"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adulau.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-01-24T10:59:04.000Z","updated_at":"2025-04-24T22:03:02.000Z","dependencies_parsed_at":"2024-01-19T08:09:57.315Z","dependency_job_id":"bb40fca5-ab5a-4602-8c47-8d18ae44efd9","html_url":"https://github.com/adulau/DomainClassifier","commit_stats":{"total_commits":63,"total_committers":3,"mean_commits":21.0,"dds":0.09523809523809523,"last_synced_commit":"1e55e0a5a7a573c0da4ca565695b1507eb2cd464"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adulau%2FDomainClassifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/adulau%2FDomainClassifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adulau%2FDomainClassifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adulau%2FDomainClassifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adulau","download_url":"https://codeload.github.com/adulau/DomainClassifier/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254020902,"owners_count":22000805,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csirt","data-mining","location-discovery","network-security","python-library"],"created_at":"2024-11-18T21:48:50.403Z","updated_at":"2025-10-31T00:49:11.110Z","avatar_url":"https://github.com/adulau.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"DomainClassifier\n================\n\nDomainClassifier is a simple Python library to extract and classify Internet domains/hostnames/IP addresses from raw unstructured text files according to their DNS existence, location, or attributes.\n\nDomainClassifier can be used to extract Internet hosts from any free text or collected unstructured information. A passive DNS output is also available.\n\n![An overview of the DomainClassifier methods](https://raw.github.com/adulau/DomainClassifier/master/doc/domainclassifier-flow.png)\n\nInstall\n-------\n\n[DomainClassifier](https://pypi.python.org/pypi/DomainClassifier/) is available on PyPI. 
It can be installed using the pip command:\n\n`pip install DomainClassifier`\n\nA quick check from an IPython session:\n\n```python\nIn [10]: import DomainClassifier.domainclassifier\n\nIn [11]: c = DomainClassifier.domainclassifier.Extract(rawtext=\"www.google.com foo.bar ppp.ppp\")\n\nIn [12]: c.potentialdomain()\nOut[12]: ['www.google.com', 'foo.bar']\n```\n\nHow To Use It\n-------------\n\n\n```python\nimport DomainClassifier.domainclassifier\n\nc = DomainClassifier.domainclassifier.Extract(rawtext=\"www.xxx.com this is a text with a domain called test@foo.lu another test abc.lu something a.b.c.d.e end of 1.2.3.4 foo.be www.belnet.be http://www.cert.be/ www.public.lu www.allo.lu quuxtest www.eurodns.com something-broken-www.google.com www.google.lu trailing test www.facebook.com www.nic.ru www.youporn.com 8.8.8.8 201.1.1.1\")\n\n# extract potentially valid domains from rawtext\nprint(c.potentialdomain())\n\n# reduce set of potentially valid domains to existing domains\n# (based on SOA,A,AAAA,CNAME,MX records)\nprint(c.validdomain(extended=True))\n\n# reduce set of valid domains with DNS records associated to a\n# specified country\nprint(\"US:\")\nprint(c.localizedomain(cc='US'))\nprint(\"LU:\")\nprint(c.localizedomain(cc='LU'))\nprint(\"BE:\")\nprint(c.localizedomain(cc='BE'))\nprint(\"Ranking:\")\nprint(c.rankdomain())\n\n# extract valid IPv4 addresses (using the potential list of valid domains)\nprint(\"List of ip addresses:\")\nprint(c.ipaddress(extended=True))\n\n# some more filtering\nprint(\"Include dot.lu:\")\nprint(c.include(expression=r'\\.lu$'))\nprint(\"Exclude dot.lu:\")\nprint(c.exclude(expression=r'\\.lu$'))\n```\n\n### Sample output\n\n```python\n['www.xxx.com', 'foo.lu', 'abc.lu', 'a.b.c.d.e', '1.2.3.4', 'foo.be', 'www.belnet.be', 'www.cert.be', 'www.public.lu', 'www.allo.lu', 'www.eurodns.com', 'something-broken-www.google.com', 'www.google.lu', 'www.facebook.com', 'www.nic.ru', 'www.youporn.com', '8.8.8.8', '201.1.1.1']\n[('www.xxx.com', 'A', \u003cDNS IN A rdata: 67.23.112.226\u003e), ('abc.lu', 'SOA', \u003cDNS IN SOA 
rdata: neptun.vo.lu. Administrator.vo.lu. 2006063001 86400 7200 2419200 3600\u003e), ('abc.lu', 'MX', \u003cDNS IN MX rdata: 10 proteus.vo.lu.\u003e), ('foo.be', 'A', \u003cDNS IN A rdata: 188.65.217.78\u003e), ('foo.be', 'AAAA', \u003cDNS IN AAAA rdata: 2001:6f8:202:2df::2\u003e), ('foo.be', 'SOA', \u003cDNS IN SOA rdata: ka.quuxlabs.com. adulau.foo.be. 2010121901 21600 3600 604800 86400\u003e), ('foo.be', 'MX', \u003cDNS IN MX rdata: 10 mail.foo.be.\u003e), ('www.belnet.be', 'A', \u003cDNS IN A rdata: 193.190.130.15\u003e), ('www.belnet.be', 'AAAA', \u003cDNS IN AAAA rdata: 2001:6a8:3c80:8300::15\u003e), ('www.belnet.be', 'CNAME', \u003cDNS IN CNAME rdata: fiorano.belnet.be.\u003e), ('www.cert.be', 'A', \u003cDNS IN A rdata: 193.190.198.61\u003e), ('www.cert.be', 'AAAA', \u003cDNS IN AAAA rdata: 2001:6a8:3c80::61\u003e), ('www.cert.be', 'SOA', \u003cDNS IN SOA rdata: ns.belnet.be. hostmaster.belnet.be. 2013053039 360 180 1209600 3600\u003e), ('www.cert.be', 'MX', \u003cDNS IN MX rdata: 10 asp-mxa.belnet.be.\u003e), ('www.cert.be', 'CNAME', \u003cDNS IN CNAME rdata: cert.be.\u003e), ('www.public.lu', 'A', \u003cDNS IN A rdata: 194.154.200.74\u003e), ('www.allo.lu', 'A', \u003cDNS IN A rdata: 80.90.47.69\u003e), ('www.eurodns.com', 'A', \u003cDNS IN A rdata: 80.92.65.165\u003e), ('www.google.lu', 'A', \u003cDNS IN A rdata: 173.194.66.94\u003e), ('www.google.lu', 'AAAA', \u003cDNS IN AAAA rdata: 2a00:1450:400c:c03::5e\u003e), ('www.facebook.com', 'A', \u003cDNS IN A rdata: 31.13.64.1\u003e), ('www.facebook.com', 'AAAA', \u003cDNS IN AAAA rdata: 2a03:2880:10:8f07:face:b00c::1\u003e), ('www.facebook.com', 'MX', \u003cDNS IN MX rdata: 10 msgin.t.facebook.com.\u003e), ('www.facebook.com', 'CNAME', \u003cDNS IN CNAME rdata: star.c10r.facebook.com.\u003e), ('www.nic.ru', 'A', \u003cDNS IN A rdata: 194.85.61.42\u003e), ('www.nic.ru', 'MX', \u003cDNS IN MX rdata: 0 nomail.nic.ru.\u003e), ('www.youporn.com', 'A', \u003cDNS IN A rdata: 31.192.116.24\u003e), 
('www.youporn.com', 'SOA', \u003cDNS IN SOA rdata: pdns1.ultradns.net. dns.manwin.com. 2012041840 86400 86400 86400 86400\u003e), ('www.youporn.com', 'MX', \u003cDNS IN MX rdata: 20 smtp-scan01.mx.reflected.net.\u003e), ('www.youporn.com', 'CNAME', \u003cDNS IN CNAME rdata: youporn.com.\u003e)]\nUS:\n[('www.xxx.com', 'A', \u003cDNS IN A rdata: 67.23.112.226\u003e), ('www.google.lu', 'A', \u003cDNS IN A rdata: 173.194.66.94\u003e)]\nLU:\n[('www.public.lu', 'A', \u003cDNS IN A rdata: 194.154.200.74\u003e), ('www.allo.lu', 'A', \u003cDNS IN A rdata: 80.90.47.69\u003e), ('www.eurodns.com', 'A', \u003cDNS IN A rdata: 80.92.65.165\u003e)]\nBE:\n[('foo.be', 'A', \u003cDNS IN A rdata: 188.65.217.78\u003e), ('www.belnet.be', 'A', \u003cDNS IN A rdata: 193.190.130.15\u003e), ('www.belnet.be', 'CNAME', \u003cDNS IN CNAME rdata: fiorano.belnet.be.\u003e), ('www.cert.be', 'A', \u003cDNS IN A rdata: 193.190.198.61\u003e), ('www.cert.be', 'CNAME', \u003cDNS IN CNAME rdata: cert.be.\u003e)]\nRanking:\n[(1.0, 'www.youporn.com'), (1.0, 'www.youporn.com'), (1.0000120563271599, 'www.belnet.be'), (1.0000120563271599, 'www.belnet.be'), (1.0000120563271599, 'www.cert.be'), (1.0000120563271599, 'www.cert.be'), (1.0000372023809501, 'foo.be'), (1.0001395089285701, 'www.public.lu'), (1.00015419407895, 'www.allo.lu'), (1.0003662109375, 'www.eurodns.com'), (1.0004111842105301, 'www.xxx.com'), (1.0005944293478299, 'www.nic.ru'), (1.0024646577381, 'www.facebook.com'), (1.0024646577381, 'www.facebook.com'), (1.002635288165, 'www.google.lu')]\nList of ip addresses:\n('15169', 'AU', \u003cDNS IN TXT rdata: \"15169 | 1.2.3.0/24 | AU | apnic | 2011-08-11\"\u003e)\n('15169', 'US', \u003cDNS IN TXT rdata: \"15169 | 8.8.8.0/24 | US | arin | 1992-12-01\"\u003e)\n('27699', 'BR', \u003cDNS IN TXT rdata: \"27699 | 201.1.0.0/17 | BR | lacnic | 2003-12-08\"\u003e)\nset([('201.1.1.1', '(\\'27699\\', \\'BR\\', \u003cDNS IN TXT rdata: \"27699 | 201.1.0.0/17 | BR | lacnic | 2003-12-08\"\u003e)'), ('8.8.8.8', 
'(\\'15169\\', \\'US\\', \u003cDNS IN TXT rdata: \"15169 | 8.8.8.0/24 | US | arin | 1992-12-01\"\u003e)'), ('1.2.3.4', '(\\'15169\\', \\'AU\\', \u003cDNS IN TXT rdata: \"15169 | 1.2.3.0/24 | AU | apnic | 2011-08-11\"\u003e)')])\nInclude dot.lu:\n['abc.lu', 'abc.lu', 'www.public.lu', 'www.allo.lu', 'www.google.lu', 'www.google.lu']\nExclude dot.lu:\n['www.xxx.com', 'foo.be', 'foo.be', 'foo.be', 'foo.be', 'www.belnet.be', 'www.belnet.be', 'www.belnet.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.eurodns.com', 'www.facebook.com', 'www.facebook.com', 'www.facebook.com', 'www.facebook.com', 'www.nic.ru', 'www.nic.ru', 'www.youporn.com', 'www.youporn.com', 'www.youporn.com', 'www.youporn.com']\n```\n\n### Software Required\n\n* Python (tested successfully on versions 2.6, 2.7 and 3.5)\n* dnspython library - http://www.dnspython.org/\n* IPy library\n* [pybgpranking](https://github.com/D4-project/BGP-Ranking/tree/master/client) to get the malicious ranking of BGP AS numbers via [BGP Ranking](https://github.com/D4-project/BGP-Ranking)\n\n### Software using DomainClassifier\n\n* [AIL framework - Analysis Information Leak framework](https://github.com/ail-project/ail-framework)\n\n### License\n\n~~~~\nCopyright (C) 2012-2023 Alexandre Dulaunoy - a(at)foo.be\nCopyright (C) 2021 Aurelien Thirion\n\nThis program is free software: you can redistribute it and/or modify\nit under the terms of the GNU Affero General Public License as\npublished by the Free Software Foundation, either version 3 of the\nLicense, or (at your option) any later version.\n\nThis program is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\nGNU Affero General Public License for more details.\n\nYou should have received a copy of the GNU Affero General Public License\nalong with this program.  
If not, see \u003chttp://www.gnu.org/licenses/\u003e.\n~~~~\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadulau%2Fdomainclassifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadulau%2Fdomainclassifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadulau%2Fdomainclassifier/lists"}