https://github.com/adulau/domainclassifier

DomainClassifier
================

DomainClassifier is a simple Python library that extracts Internet domains, hostnames and IP addresses from raw unstructured text files and classifies them according to their DNS existence, localization or attributes.

DomainClassifier can be used to extract Internet hosts from any free text or collected unstructured information. A passive DNS output is also available.
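To illustrate what extracting hosts from free text can look like, here is a minimal sketch based on a regular expression. The pattern `HOSTNAME_RE` and the helper `candidate_hostnames` are hypothetical names chosen for this example; they are not part of DomainClassifier and do not reproduce its matching rules. The sketch only finds hostname-like tokens (it skips bare IPv4 addresses) and performs no DNS lookup.

```python
import re

# Hypothetical pattern: dot-separated labels followed by an alphabetic,
# TLD-like label. This is an illustration only, not the expression
# DomainClassifier uses internally.
HOSTNAME_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)


def candidate_hostnames(text):
    """Return hostname-like tokens in order of first appearance, lowercased
    and without duplicates."""
    seen = set()
    found = []
    for match in HOSTNAME_RE.finditer(text):
        name = match.group(0).lower()
        if name not in seen:
            seen.add(name)
            found.append(name)
    return found


print(candidate_hostnames("see www.example.com and http://mail.example.org/ for details"))
# ['www.example.com', 'mail.example.org']
```

A token that passes such a pattern is only a candidate. Whether it actually exists still has to be checked against the DNS, which is the step the library's `validdomain()` method covers.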

![An overview of the DomainClassifier methods](https://raw.github.com/adulau/DomainClassifier/master/doc/domainclassifier-flow.png)

Install
-------

[DomainClassifier](https://pypi.python.org/pypi/DomainClassifier/) is available on PyPI. It can be installed using the pip command:

`pip install DomainClassifier`

A quick check from an IPython session:

```python
In [10]: import DomainClassifier.domainclassifier

In [11]: c = DomainClassifier.domainclassifier.Extract(rawtext="www.google.com foo.bar ppp.ppp")

In [12]: c.potentialdomain()
Out[12]: ['www.google.com', 'foo.bar']
```

How To Use It
-------------

```python
import DomainClassifier.domainclassifier

# the raw text is split over adjacent string literals for readability
c = DomainClassifier.domainclassifier.Extract(
    rawtext="www.xxx.com this is a text with a domain called test@foo.lu "
    "another test abc.lu something a.b.c.d.e end of 1.2.3.4 foo.be "
    "www.belnet.be http://www.cert.be/ www.public.lu www.allo.lu quuxtest "
    "www.eurodns.com something-broken-www.google.com www.google.lu "
    "trailing test www.facebook.com www.nic.ru www.youporn.com 8.8.8.8 "
    "201.1.1.1"
)

# extracting potentially valid domains from rawtext
print(c.potentialdomain())

# reduce set of potentially valid domains to existing domains
# (based on SOA,A,AAAA,CNAME,MX records)
print(c.validdomain(extended=True))

# reduce set of valid domains with DNS records associated to a
# specified country
print("US:")
print(c.localizedomain(cc='US'))
print("LU:")
print(c.localizedomain(cc='LU'))
print("BE:")
print(c.localizedomain(cc='BE'))
print("Ranking:")
print(c.rankdomain())

# extract valid IPv4 addresses (using the potential list of valid domains)
print("List of ip addresses:")
print(c.ipaddress(extended=True))

# some more filtering
print("Include dot.lu:")
print(c.include(expression=r'\.lu$'))
print("Exclude dot.lu:")
print(c.exclude(expression=r'\.lu$'))
```
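The `include` and `exclude` calls above take a regular expression. As a rough sketch of that kind of filtering, the following standalone functions apply `re.search` to a list of names. `keep_matching` and `drop_matching` are hypothetical helpers written for this illustration; they are an assumption about the general idea and not the library's implementation.

```python
import re


def keep_matching(names, expression):
    """Return the names in which the regular expression is found."""
    pattern = re.compile(expression)
    return [name for name in names if pattern.search(name)]


def drop_matching(names, expression):
    """Return the names in which the regular expression is not found."""
    pattern = re.compile(expression)
    return [name for name in names if not pattern.search(name)]


hosts = ["abc.lu", "foo.be", "www.public.lu", "www.lu.example.com"]
print(keep_matching(hosts, r"\.lu$"))  # ['abc.lu', 'www.public.lu']
print(drop_matching(hosts, r"\.lu$"))  # ['foo.be', 'www.lu.example.com']
```

The `$` anchor matters here: without it, `www.lu.example.com` would also be treated as a `.lu` name.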

### Sample output

```python
['www.xxx.com', 'foo.lu', 'abc.lu', 'a.b.c.d.e', '1.2.3.4', 'foo.be', 'www.belnet.be', 'www.cert.be', 'www.public.lu', 'www.allo.lu', 'www.eurodns.com', 'something-broken-www.google.com', 'www.google.lu', 'www.facebook.com', 'www.nic.ru', 'www.youporn.com', '8.8.8.8', '201.1.1.1']
[('www.xxx.com', 'A', ), ('abc.lu', 'SOA', ), ('abc.lu', 'MX', ), ('foo.be', 'A', ), ('foo.be', 'AAAA', ), ('foo.be', 'SOA', ), ('foo.be', 'MX', ), ('www.belnet.be', 'A', ), ('www.belnet.be', 'AAAA', ), ('www.belnet.be', 'CNAME', ), ('www.cert.be', 'A', ), ('www.cert.be', 'AAAA', ), ('www.cert.be', 'SOA', ), ('www.cert.be', 'MX', ), ('www.cert.be', 'CNAME', ), ('www.public.lu', 'A', ), ('www.allo.lu', 'A', ), ('www.eurodns.com', 'A', ), ('www.google.lu', 'A', ), ('www.google.lu', 'AAAA', ), ('www.facebook.com', 'A', ), ('www.facebook.com', 'AAAA', ), ('www.facebook.com', 'MX', ), ('www.facebook.com', 'CNAME', ), ('www.nic.ru', 'A', ), ('www.nic.ru', 'MX', ), ('www.youporn.com', 'A', ), ('www.youporn.com', 'SOA', ), ('www.youporn.com', 'MX', ), ('www.youporn.com', 'CNAME', )]
US:
[('www.xxx.com', 'A', ), ('www.google.lu', 'A', )]
LU:
[('www.public.lu', 'A', ), ('www.allo.lu', 'A', ), ('www.eurodns.com', 'A', )]
BE:
[('foo.be', 'A', ), ('www.belnet.be', 'A', ), ('www.belnet.be', 'CNAME', ), ('www.cert.be', 'A', ), ('www.cert.be', 'CNAME', )]
Ranking:
[(1.0, 'www.youporn.com'), (1.0, 'www.youporn.com'), (1.0000120563271599, 'www.belnet.be'), (1.0000120563271599, 'www.belnet.be'), (1.0000120563271599, 'www.cert.be'), (1.0000120563271599, 'www.cert.be'), (1.0000372023809501, 'foo.be'), (1.0001395089285701, 'www.public.lu'), (1.00015419407895, 'www.allo.lu'), (1.0003662109375, 'www.eurodns.com'), (1.0004111842105301, 'www.xxx.com'), (1.0005944293478299, 'www.nic.ru'), (1.0024646577381, 'www.facebook.com'), (1.0024646577381, 'www.facebook.com'), (1.002635288165, 'www.google.lu')]
List of ip addresses:
('15169', 'AU', )
('15169', 'US', )
('27699', 'BR', )
set([('201.1.1.1', '(\'27699\', \'BR\', )'), ('8.8.8.8', '(\'15169\', \'US\', )'), ('1.2.3.4', '(\'15169\', \'AU\', )')])
Include dot.lu:
['abc.lu', 'abc.lu', 'www.public.lu', 'www.allo.lu', 'www.google.lu', 'www.google.lu']
Exclude dot.lu:
['www.xxx.com', 'foo.be', 'foo.be', 'foo.be', 'foo.be', 'www.belnet.be', 'www.belnet.be', 'www.belnet.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.cert.be', 'www.eurodns.com', 'www.facebook.com', 'www.facebook.com', 'www.facebook.com', 'www.facebook.com', 'www.nic.ru', 'www.nic.ru', 'www.youporn.com', 'www.youporn.com', 'www.youporn.com', 'www.youporn.com']
```
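In the sample output, a hostname appears once per matching DNS record type, so the result lists contain repeated names. If one entry per hostname is preferred, the results can be post-processed. The sketch below uses two hypothetical helpers, `group_record_types` and `unique_ranking`, which are not part of DomainClassifier. It assumes that each validated record is a tuple whose first two elements are the hostname and the record type, and that each ranking entry is a `(score, hostname)` pair, as the sample output suggests.

```python
def group_record_types(records):
    """Map each hostname to the list of record types seen for it.

    Only the first two elements of each tuple are read, so tuples that
    carry extra elements are accepted as well.
    """
    grouped = {}
    for record in records:
        name, rtype = record[0], record[1]
        types = grouped.setdefault(name, [])
        if rtype not in types:
            types.append(rtype)
    return grouped


def unique_ranking(ranking):
    """Keep the first (score, hostname) pair seen for each hostname."""
    seen = set()
    result = []
    for score, name in ranking:
        if name not in seen:
            seen.add(name)
            result.append((score, name))
    return result


records = [("foo.be", "A"), ("foo.be", "AAAA"), ("abc.lu", "SOA"), ("foo.be", "A")]
print(group_record_types(records))
# {'foo.be': ['A', 'AAAA'], 'abc.lu': ['SOA']}

print(unique_ranking([(1.0, "www.belnet.be"), (1.0, "www.belnet.be"), (1.5, "foo.be")]))
# [(1.0, 'www.belnet.be'), (1.5, 'foo.be')]
```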

### Software Required

* Python (tested successfully on versions 2.6, 2.7 and 3.5)
* dnspython library - http://www.dnspython.org/
* IPy library
* [pybgpranking](https://github.com/D4-project/BGP-Ranking/tree/master/client) to get the malicious ranking of BGP AS numbers via [BGP Ranking](https://github.com/D4-project/BGP-Ranking)
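Before running the library, it can help to confirm that the dependencies listed above are importable. The following Python 3 sketch does this with the standard `importlib` module. The import names `dns` (dnspython), `IPy` and `pybgpranking` are assumptions about how the listed packages are imported, and `missing_modules` is a hypothetical helper, not something DomainClassifier ships.

```python
import importlib.util

# Assumed top-level import names for the packages listed above.
REQUIRED_MODULES = ("dns", "IPy", "pybgpranking")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable.")
```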

### Software using DomainClassifier

* [AIL framework - Analysis Information Leak framework](https://github.com/ail-project/ail-framework)

### License

~~~~
Copyright (C) 2012-2023 Alexandre Dulaunoy - a(at)foo.be
Copyright (C) 2021 Aurelien Thirion

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
~~~~