https://github.com/opendns/og-miner
OpenDNS Graph Miner
OpenDNS Graph Miner
- Host: GitHub
- URL: https://github.com/opendns/og-miner
- Owner: opendns
- License: mit
- Created: 2017-03-27T22:12:15.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-04-28T18:02:45.000Z (over 7 years ago)
- Last Synced: 2024-08-05T17:42:26.605Z (4 months ago)
- Language: Python
- Homepage:
- Size: 15.6 MB
- Stars: 45
- Watchers: 9
- Forks: 18
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-hacking-lists - opendns/og-miner - OpenDNS Graph Miner (Python)
README
# OG-Miner
Data Crawling on Steroids
## Installation
```
git clone git@github.com:opendns/og-miner.git
cd og-miner
pip install -r requirements.txt
```
After this step, you will need to configure your API keys in *conf.json*.
## Usage
```
$ ./miner.py --help
Miner Script (version 3.7)
usage: miner.py [-h] [--domain DOMAIN] [--domains DOMAINS] [--url URL]
[--urls URLS] [--ip IP] [--ips IPS] [--asn ASN] [--asns ASNS]
[--email EMAIL] [--emails EMAILS] [--hash HASH]
[--hashes HASHES] [--regex REGEX] [--regexes REGEXES]
[--query QUERY] [--json JSON] [--pull PULL] [--push PUSH]
[--config CONFIG] [--profile PROFILE] [--token TOKEN]
[--ttl TTL] [--title TITLE] [--explore EXPLORE]
[--operate OPERATE] [--depth DEPTH] [--workers WORKERS]
[--output OUTPUT] [--mongo MONGO] [--reset] [--no-output]
[--stats]

optional arguments:
-h, --help show this help message and exit
--domain DOMAIN Mine from a domain.
--domains DOMAINS Mine from a list of domains in a file.
--url URL Mine from a URL.
--urls URLS Mine from a list of URLs in a file.
--ip IP Mine from an IP.
--ips IPS Mine from a list of IPs in a file.
--asn ASN Mine from an ASN.
--asns ASNS Mine from a list of ASNs in a file.
--email EMAIL Mine from an email address.
--emails EMAILS Mine from a list of emails in a file.
--hash HASH Mine from a hash.
--hashes HASHES Mine from a list of hashes in a file.
--regex REGEX Mine from a regex.
--regexes REGEXES Mine from a list of regexes in a file.
--query QUERY Mine from graph vertices matching the query
--json JSON Load custom tasks from a JSON file.
--pull PULL Pull entries to mine from a ZMQ stream.
--push PUSH Push mined results to a ZMQ stream.
--config CONFIG Select a configuration file.
--profile PROFILE Select a mining profile.
--token TOKEN Set the mining token.
--ttl TTL Set the mining token TTL (in seconds).
--title TITLE Set the dataset title.
--explore EXPLORE Set the list of explorers.
--operate OPERATE Set the list of operators.
--depth DEPTH Set the mining maximum depth.
--workers WORKERS Set the number of worker threads.
--output OUTPUT Set the output JSON filename.
--mongo MONGO Use MongoDB as a graph database.
--reset Reset graph.
--no-output No JSON output.
--stats Compute performance metrics.
```
## Documentation
The miner script is a data mining tool that helps users discover and build relationships between various entry points in a graph-oriented fashion. Multiple data sources are already implemented, and new ones can be easily integrated using a modular plugin system.
Before digging too deep into the miner details, it is important to see the big picture. At OpenDNS, we build the "Security Graph". This security graph can be seen as a complex relational database representing Internet entities (Domains, IPs, ASNs, Whois ...), built on one hand from our DNS logs and on the other hand from external parties (Whois DB, MaxMind GeoIP, etc.). We connect those entities using several relationships (Co-occurrence, Related Domains, Domain-IP mapping, Registration, etc.).
In other words, all this agglomerated data can be seen as a giant graph connecting dots of information. The miner script is a useful tool to extract parts of this graph ("subgraphs"). It digs inside the whole data network from given entry points using a certain mining profile. You can define as many entry points as you want from the command line and the mining profile is defined in a JSON file inside the "profiles" folder. If no profile is defined, it will fall back to the default one.
Once the miner has finished running, the output is a graph dataset stored in JSON format. You can set the name of the resulting file with the --output argument; the file can then be loaded and analyzed with various graph analysis tools (e.g. OpenGraphiti).
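As a rough illustration of inspecting the resulting file, here is a minimal Python sketch. The README does not document the JSON layout, so the top-level "nodes" and "edges" lists and the per-node "type" field are assumptions, and `summarize` and `summarize_file` are hypothetical helper names. Check a real output file and adjust the keys accordingly.

```python
import json
from collections import Counter


def summarize(dataset):
    """Return (node_count, edge_count, node_type_counts) for a loaded dataset.

    ASSUMPTION: the dataset is a dict with top-level "nodes" and "edges"
    lists, and each node may carry a "type" field. This layout is not
    confirmed by the README.
    """
    nodes = dataset.get("nodes", [])
    edges = dataset.get("edges", [])
    types = Counter(node.get("type", "unknown") for node in nodes)
    return len(nodes), len(edges), types


def summarize_file(path):
    """Load a JSON dataset from `path` and summarize it."""
    with open(path) as handle:
        return summarize(json.load(handle))
```

For example, `summarize_file("result.json")` would report how many nodes and edges a run produced, provided the file follows the assumed layout.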
# Entry points
You can start from any domain, IP, email, ASN or binary hash. Use the --domain, --ip, --email, --asn and --hash arguments to pass a single entry (or one of each type). Use --domains, --ips, --emails, --asns and --hashes to pass a list stored in a file. The file must contain exactly one entry per line.
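If your entry points come from another script, a list file in the one-entry-per-line format described above can be produced with a few lines of Python. This is only a sketch: `normalize_entries` and `write_entries` are hypothetical helpers, not part of og-miner, and dropping blanks and duplicates is a convenience choice rather than a documented requirement.

```python
def normalize_entries(entries):
    """Strip whitespace and drop empty or duplicate entries, keeping order."""
    seen = set()
    cleaned = []
    for entry in entries:
        entry = entry.strip()
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return cleaned


def write_entries(path, entries):
    """Write the normalized entries to `path`, one per line."""
    cleaned = normalize_entries(entries)
    with open(path, "w") as handle:
        handle.write("".join(entry + "\n" for entry in cleaned))
    return cleaned
```

A file written with `write_entries("domains.txt", [...])` can then be passed to the miner with the --domains argument.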
Examples:
Starts digging from test.com
```
$ ./miner.py --domain test.com
```
Starts digging from domain test.com, IP 8.8.8.8 and ASN 1234.
```
$ ./miner.py --domain test.com --ip 8.8.8.8 --asn 1234
```
Starts digging from all domains located in 'domains.txt', saves the result in 'result.json' and sets the title of the dataset.
```
$ ./miner.py --domains domains.txt --output result.json --title "Infected Domains"
```
# Mining Profiles
In practice, the data mining process is nothing more than a customizable breadth-first traversal.
Long story short, here is what it does:
- Start from a set of seed nodes.
- Parse all neighbors
- Parse the neighbors of the neighbors
- Parse the neighbors of the neighbors of the neighbors
- Repeat until no new neighbors can be found, a certain limit is reached, or a certain condition is met (depth, number of nodes, memory size ...).

The mining profiles help you customize a couple of things:
- The types of nodes and edges you want to parse.
- The node and edge attributes you want to extract.
- The neighbor selection method.

Different mining profiles will give you different results. Usually, a given profile corresponds to a given use case. For example, the "default.json" profile parses every type of node, edge and attribute, but selects only a few neighbors at every iteration and is limited to a small depth. It is only intended to give you a relatively small dataset that provides an overview of a node's neighborhood and a quick understanding of the various types of data that we collect.
Please take a look at "profiles/default.json" for concrete examples.
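To make the traversal described in this section more concrete, here is a minimal Python sketch of a breadth-first traversal with a depth limit and a simple neighbor cap. It is not the miner's actual implementation: `mine`, `get_neighbors`, `max_depth` and `max_neighbors` are hypothetical names, the toy graph is made-up data, and taking the first few neighbors is just one possible selection method.

```python
from collections import deque


def mine(seeds, get_neighbors, max_depth=2, max_neighbors=None):
    """Breadth-first traversal from `seeds`.

    Returns a dict mapping each discovered node to its depth, and the
    list of (source, neighbor) edges that were followed.
    """
    depth_of = {seed: 0 for seed in seeds}
    edges = []
    queue = deque(depth_of)  # seeds, de-duplicated, in insertion order
    while queue:
        node = queue.popleft()
        if depth_of[node] >= max_depth:
            continue  # depth limit reached: keep the node, do not expand it
        neighbors = list(get_neighbors(node))
        if max_neighbors is not None:
            neighbors = neighbors[:max_neighbors]  # naive neighbor selection
        for neighbor in neighbors:
            edges.append((node, neighbor))
            if neighbor not in depth_of:
                depth_of[neighbor] = depth_of[node] + 1
                queue.append(neighbor)
    return depth_of, edges


# Toy adjacency data, for illustration only.
TOY_GRAPH = {
    "test.com": ["8.8.8.8", "mail.test.com"],
    "8.8.8.8": ["AS1234"],
    "AS1234": ["other.com"],
}

depths, followed = mine(["test.com"], lambda n: TOY_GRAPH.get(n, []), max_depth=2)
```

With `max_depth=2`, the node "AS1234" is reached at depth 2 but not expanded, so "other.com" never enters the result. Lowering `max_neighbors` shrinks the dataset in a similar way, which is the kind of trade-off a mining profile controls.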