https://github.com/martelogan/cftrace
Experiments with worldwide traceroutes
https://github.com/martelogan/cftrace
Last synced: 2 months ago
JSON representation
Experiments with worldwide traceroutes
- Host: GitHub
- URL: https://github.com/martelogan/cftrace
- Owner: martelogan
- License: apache-2.0
- Created: 2024-11-27T10:05:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-16T07:07:48.000Z (over 1 year ago)
- Last Synced: 2025-01-26T08:14:17.602Z (over 1 year ago)
- Language: Ruby
- Size: 101 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Traceroute Data Collection Tool
This tool collects traceroute data from Cloudflare colos to a specified target IP or domain and organizes the results into structured JSON files and aggregate CSVs for analysis. It supports geo-location lookups, hop analysis, and optional GCP region identification for targets hosted on Google Cloud.
---
## **Directory Structure**
The results are saved in a structured directory hierarchy based on colos and regions:
- **Regions**: `na/` (North America), `emea/` (Europe, Middle East, and Africa), `apac/` (Asia-Pacific), `latam/` (Latin America).
- **Files**: Each file is named using a human-readable target name and the target IP, e.g., `c1-gclb_34.49.121.93.json`.
---
## **CSV Formats**
### **Traceroute Summary CSV**
The `traceroute_summary.csv` summarizes traceroute data for each `` pair. Below is the list of columns in the CSV, ordered as they appear:
| **Column Name** | **Description** |
|-----------------------------|--------------------------------------------------------------------------------------------------|
| `start_region` | Region grouping for the colo, e.g., `na`, `emea`, etc. |
| `start_colo` | The colo's short name, e.g., `pdx`. |
| `trace_target` | Human-readable name for the target IP, e.g., `c1-gclb`. |
| `rtt_ms` | Mean round-trip time across all hops in milliseconds (rounded to an integer). |
| `hops_count` | Total number of hops in the traceroute. |
| `start_city` | City, state, and country of the colo, e.g., `Portland, OR, US`. |
| `approx_final_hop` | Geo-location of the final hop, e.g., `Mountain View, CA, US`. |
| `approx_nearest_gcp` | Closest GCP region to the final hop, e.g., `us-east1`. |
| `target_distance_km` | Approximate distance (in kilometers) between the colo and the target. Defaults to `unknown`. |
| `start_subcolo` | The sub-colo (if provided in the API response), e.g., `pdx02`. |
| `target_ip` | Target IP address, e.g., `34.49.121.93`. |
| `target_domain` | Domain name of the target, e.g., `google.com`. Defaults to `unknown` if unavailable. |
| `traceroute_time_ms` | Total time for the traceroute in milliseconds. |
| `traceroute_packet_count` | Total packets sent to the target. |
| `min_rtt_ms` | Minimum RTT observed across all hops. |
| `max_rtt_ms` | Maximum RTT observed across all hops. |
| `std_dev_rtt_ms` | Standard deviation of RTTs across all hops. |
| `colo_lat` | Latitude of the colo. Defaults to `unknown` if unavailable. |
| `colo_long` | Longitude of the colo. Defaults to `unknown` if unavailable. |
| `colo_country` | Country of the colo, e.g., `US`. Defaults to `unknown`. |
| `target_lat` | Latitude of the target. Defaults to `unknown` if unavailable. |
| `target_long` | Longitude of the target. Defaults to `unknown` if unavailable. |
| `target_country` | Country of the target, e.g., `US`. Defaults to `unknown`. |
### **Skipped Colos CSV**
The `skipped_colos.csv` tracks colos that were skipped during data collection. Below is the list of columns:
| **Column Name** | **Description** |
|-----------------------------|--------------------------------------------------------------------------------------------------|
| `start_region` | Region grouping for the colo, e.g., `na`, `emea`, etc. |
| `start_colo` | The colo's short name, e.g., `pdx`. |
| `trace_target` | Human-readable name for the target IP, e.g., `c1-gclb`. |
| `target_ip` | Target IP address, e.g., `34.49.121.93`. |
| `target_domain` | Domain name of the target, e.g., `google.com`. Defaults to `unknown` if unavailable. |
| `skipped_reason` | Reason the colo was skipped, e.g., `no_traceroute_response`. |
---
## **Geo-Location and GCP Region Lookup**
### Geo-Location
- Geo-location data (`colo_lat`, `colo_long`, `colo_country`, `target_lat`, `target_long`, `target_country`) is fetched using a GeoIP API.
- Defaults to `unknown` if the API call fails or data is missing.
### GCP Region
- When the `target_is_gcp=true` flag is enabled, the `approx_nearest_gcp` column identifies the closest GCP region (e.g., `us-east1`) based on the final hop's geo-location.
- This is implemented using an in-memory lookup table mapping GCP regions to approximate lat/long coordinates.
---
## **Hop Analysis**
### Congested Hops
- Hops with **>50% packet loss** are flagged as congested.
- Stored in the JSON result files.
### Slowest Hops
- Hops with RTT >1 standard deviation above the mean are flagged as slow.
- Stored in the JSON result files.
---
## **Usage**
```bash
ruby traceroute_collector.rb --traceroute-uri URI --output-dir DIR --cf-colo-file FILE --targets IP:NAME:DOMAIN --region REGION