https://github.com/quartz/ark
Tools for processing Ark traceroute data
- Host: GitHub
- URL: https://github.com/quartz/ark
- Owner: Quartz
- Created: 2016-04-08T14:31:12.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-06-10T18:10:09.000Z (over 9 years ago)
- Last Synced: 2025-01-12T02:03:19.431Z (9 months ago)
- Topics: ark, caida, postgresql, qz-things, tool
- Language: Ruby
- Homepage:
- Size: 1.19 MB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ark
Tools for processing traceroute data from CAIDA's [Ark project](http://www.caida.org/projects/ark/).
Tools in the `ark-tools` folder were provided by CAIDA's [Young Hyun](http://www.caida.org/~youngh/).
## Setup
```
mkvirtualenv ark
pip install -r requirements.txt
cd ark-tools
gem install rb-asfinder-0.10.1.gem rb-wartslib-1.4.2.gem
cd ..
```
This project also requires a running, local instance of Postgres with a no-password user named `ark` who owns a geo-enabled database named `ark`.
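If the `ark` role does not exist yet, you can create it first; this is a minimal sketch that assumes local peer or trust authentication, so no password is needed:
```
createuser ark
```
Then create the PostGIS-enabled database: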
```
createdb -O ark ark
psql -q ark -c "CREATE EXTENSION postgis;"
```
## Sourcing the data
This script uses data from CAIDA's [Ark IPv4 Routed /24 Topology Dataset](http://www.caida.org/data/active/ipv4_routed_24_topology_dataset.xml). The following script will download all data, for all three monitoring teams, for every day in March of 2014. Caution: **This is 87GB of data**. The script can be stopped and started without starting over.
```
./fetch.sh
```
You will also need to download the following files to the root project directory (example commands below):
* [MaxMind GeoLite2 City database](http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz) (unzip it)
* [BGP Reports mapping of AS numbers to names](http://bgp.potaroo.net/as6447/asnames.txt)
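For example, from the project root (assuming `curl` and `gunzip` are available; `wget` works equally well):
```
curl -O http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
gunzip GeoLite2-City.mmdb.gz
curl -O http://bgp.potaroo.net/as6447/asnames.txt
```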
## Building the database
Warning: This script will run for around a half hour **per day of data**. If you're loading a month of data it could easily take a full day. (Assuming you even have the disk space to hold it all.)
```
./process.py
```
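Since a month of data can take a full day to load, it may be convenient to run the loader detached from your terminal; this is plain shell usage, not a feature of the script itself:
```
nohup ./process.py > process.log 2>&1 &
```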
## Running queries
```
cat by_country.sql | psql -q ark
cat by_monitor.sql | psql -q ark
```
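Equivalently, `psql`'s standard `-f` and `-o` flags will run a query file and write the results to disk; adding `-A -F,` gives roughly CSV-shaped output:
```
psql -q -A -F, -f by_country.sql -o by_country.csv ark
```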
## Analyzing a trace path
You can pass a `trace` from the database into `parse_trace.py` to generate detailed path data in CSV format:
```
./parse_trace.py "216.66.30.102:6939,216.66.30.101:6939,213.248.67.125:1299,213.155.130.34:1299,157.130.60.13:701,:q,:r,108.51.141.48:701" > nyc_to_dc.csv
```