https://github.com/erbbysam/dnsgrep
Quickly Search Large DNS Datasets
- Host: GitHub
- URL: https://github.com/erbbysam/dnsgrep
- Owner: erbbysam
- License: mit
- Created: 2019-02-09T17:12:46.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-12-21T02:27:33.000Z (almost 4 years ago)
- Last Synced: 2024-10-26T11:31:54.584Z (18 days ago)
- Language: Go
- Size: 1.44 MB
- Stars: 580
- Watchers: 19
- Forks: 109
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
# DNSGrep
A utility for quickly searching presorted DNS names. Built around the Rapid7 rdns & fdns dataset.

# How does it work?
This utility assumes the file provided is presorted (both letters and symbols, i.e. sorted with `LC_COLLATE=C`).
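A quick way to check the presorted assumption is to confirm that no line sorts before its predecessor in byte order. The sketch below is not part of dnsgrep; the function names and the sample names are hypothetical, and it relies only on the fact that Go compares strings byte by byte, which is the same ordering `LC_COLLATE=C sort` produces.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// firstUnsortedLine returns the 1-based number of the first line that sorts
// before its predecessor in byte order, or 0 if the input is fully sorted.
func firstUnsortedLine(r io.Reader) (int, error) {
	sc := bufio.NewScanner(r) // default buffer: lines up to 64KB
	prev := ""
	for n := 1; sc.Scan(); n++ {
		line := sc.Text()
		// Go compares strings byte by byte, matching LC_COLLATE=C ordering.
		if n > 1 && line < prev {
			return n, nil
		}
		prev = line
	}
	return 0, sc.Err()
}

// checkText is a convenience wrapper for in-memory input.
func checkText(text string) (int, error) {
	return firstUnsortedLine(strings.NewReader(text))
}

func main() {
	// Hypothetical names: in byte order '-' (0x2D) sorts before '.' (0x2E).
	fmt.Println(checkText("a-b.test\na.b.test\na.test\n")) // prints "0 <nil>"
	fmt.Println(checkText("a.test\na-b.test\n"))           // prints "2 <nil>"
}
```

For a real dataset, `firstUnsortedLine` could be given an `*os.File` instead of a string reader; a non-zero result would suggest the file was sorted under a different locale.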
The algorithm is pretty simple:
1) Use a binary search algorithm to seek through the file, looking for a substring match against the query.
2) Once a match is found, the file is scanned backwards in 10KB increments looking for a non-matching substring.
3) Once a non-matching substring is found, the file is scanned forwards until all exact matches are returned.

# Limits
There is a built-in limit system. This prevents 2 things:
1) scanning too far backwards (`MaxScan`)
2) scanning too far forwards after scanning backwards (`MaxOutputLines`)

This allows for any input while stopping requests that are taking too long.
Additionally, this utility does not handle the edge cases (start/end of the file) and will return an error if one is encountered.
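The three steps and the two limits described above can be sketched roughly as follows. This is an illustrative sketch, not the code in `dnsgrep.go`. It makes several assumptions: the query is treated as a line prefix, the `limits` struct and its field names are hypothetical stand-ins for `MaxScan` and `MaxOutputLines`, the sample data is made up, and, unlike the utility, it stops at the start or end of the input instead of returning an error.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// limits loosely mirrors MaxScan and MaxOutputLines; names and units are assumptions.
type limits struct {
	stepBack       int64 // bytes per backward step (10KB in the description)
	maxScan        int   // maximum number of backward steps
	maxOutputLines int   // maximum number of lines returned
}

// linesFrom returns a reader positioned at the first line starting at or after
// off. Read errors are treated as end of input, a simplification for brevity.
func linesFrom(f io.ReaderAt, size, off int64) *bufio.Reader {
	if off <= 0 {
		return bufio.NewReader(io.NewSectionReader(f, 0, size))
	}
	// Begin one byte early so a line starting exactly at off is not skipped.
	r := bufio.NewReader(io.NewSectionReader(f, off-1, size-off+1))
	_, _ = r.ReadString('\n') // discard the tail of the previous line
	return r
}

// firstLine returns the first line at or after off; ok is false at end of input.
func firstLine(f io.ReaderAt, size, off int64) (line string, ok bool) {
	raw, _ := linesFrom(f, size, off).ReadString('\n')
	return strings.TrimSuffix(raw, "\n"), raw != ""
}

// search returns the lines of a byte-sorted file that start with query.
func search(f io.ReaderAt, size int64, query string, lim limits) ([]string, error) {
	// Step 1: binary search over byte offsets for any matching line.
	lo, hi, hit := int64(0), size, int64(-1)
	for lo < hi && hit < 0 {
		mid := lo + (hi-lo)/2
		line, ok := firstLine(f, size, mid)
		switch {
		case ok && strings.HasPrefix(line, query):
			hit = mid
		case !ok || line > query:
			hi = mid
		default:
			lo = mid + 1
		}
	}
	if hit < 0 {
		return nil, nil // no match
	}
	// Step 2: step backwards until a non-matching line or the file start is seen.
	start := hit
	for steps := 0; start > 0; steps++ {
		if steps >= lim.maxScan {
			return nil, errors.New("backward scan limit reached")
		}
		start -= lim.stepBack // a negative offset is read as the file start
		if line, _ := firstLine(f, size, start); !strings.HasPrefix(line, query) {
			break
		}
	}
	// Step 3: scan forwards, skipping leading non-matches and collecting matches.
	r := linesFrom(f, size, start)
	var out []string
	for {
		raw, err := r.ReadString('\n')
		line := strings.TrimSuffix(raw, "\n")
		switch {
		case raw != "" && strings.HasPrefix(line, query):
			if len(out) >= lim.maxOutputLines {
				return out, errors.New("output line limit reached")
			}
			out = append(out, line)
		case len(out) > 0:
			return out, nil // first non-match after the block of matches
		}
		if err != nil {
			return out, nil // end of input
		}
	}
}

// sample is hypothetical data, already sorted in byte order.
const sample = "alpha.test,192.0.2.1\nbeta-1.test,192.0.2.2\nbeta.test,192.0.2.3\nbetamax.test,192.0.2.4\ngamma.test,192.0.2.5\n"

// searchText runs search over an in-memory string.
func searchText(text, query string, lim limits) ([]string, error) {
	return search(strings.NewReader(text), int64(len(text)), query, lim)
}

func main() {
	lim := limits{stepBack: 10 * 1024, maxScan: 100, maxOutputLines: 1000}
	fmt.Println(searchText(sample, "beta", lim))
}
```

Because matching lines form one contiguous block in a byte-sorted file, the binary search only needs to land anywhere inside that block; the backward steps then find a safe starting point for the forward scan. The two limits bound the work done in steps 2 and 3 respectively.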
# Install
`go get` the following packages:
```
# used for dnsgrep cli flags
go get "github.com/jessevdk/go-flags"
# used by the experimental server for http routing
go get "github.com/gorilla/mux"
# pull in a string reversal function
go get "github.com/golang/example/stringutil"
```
# Run
The following steps were tested with Ubuntu 16.04 & go 1.11.5.
Generate fdns_a.sort.txt and rdns.sort.txt first using the scripts found in the scripts/ folder:
```
# Each of these scripts requires:
# * 3 hours+ on an SSD
# * 300GB+ temp disk space (under the same folder)
# * ~65GB for output (under the same folder)
# * jq to be installed
./scripts/fdns_a.sh
./scripts/rdns.sh
```

Run the command line utility:
```
go run dnsgrep.go -f DNSBinarySearch/test_data.txt -i "amiccom.com.tw"
```

Run the experimental server in the same folder as fdns_a.sort.txt & rdns.sort.txt:
```
go run experimentalServer.go
```

# Docker
You can also run the command line utility using Docker:
```
docker build -t dnsgrep .
docker run --rm -it -v "$PWD"/DNSBinarySearch:/files dnsgrep -f /files/test_data.txt -i ".amiccom.com.tw"
```

# Data Source
The source of this data referenced throughout this repository is Rapid7 Labs. Please review the Terms of Service:
https://opendata.rapid7.com/about/

https://opendata.rapid7.com/sonar.rdns_v2/
https://opendata.rapid7.com/sonar.fdns_v2/
# Stack Overflow References
via https://unix.stackexchange.com/a/35472
* we need to sort with LC_COLLATE=C to also sort ., chars

via https://unix.stackexchange.com/a/350068
* To sort a large file: split it into chunks, sort the chunks and then simply merge the results

# License
See LICENSE file.