https://github.com/messede-degod/sstable-migrator
Generate SStables From CSV. The Data Conversion Workhorse behind https://ip.thc.org
cassandra-database segfault subdomain-discovery subdomains-enumeration thc
- Host: GitHub
- URL: https://github.com/messede-degod/sstable-migrator
- Owner: messede-degod
- License: gpl-3.0
- Created: 2023-11-14T16:25:39.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-12-15T16:50:13.000Z (6 months ago)
- Last Synced: 2025-12-18T22:57:50.824Z (6 months ago)
- Topics: cassandra-database, segfault, subdomain-discovery, subdomains-enumeration, thc
- Language: Java
- Homepage: https://ip.thc.org
- Size: 56.9 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
### sstable-migrator
# Building
- Install and use **Java 8**; check the active version with `java -version`
- Compile - `mvn compile`
- Run - `MAVEN_OPTS="-Xmx7114M" mvn exec:java -DargLine="-Xms6144m -Xmx7144m"` to convert the files matching `input/*` to SSTables in `/output`
# Setup Cassandra
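- Start Container - `sudo docker run -v ./output/:/ferret/dnsdata -d --name cassandra --hostname cassandra --network cassandra cassandra` (allow up to a minute for startup; the `cassandra` Docker network must already exist, e.g. create it with `sudo docker network create cassandra`)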
- Start a cqlsh shell - `sudo docker exec -it cassandra cqlsh`
- Create Keyspace
```
CREATE KEYSPACE ferret WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
```
- Create RDNS Table
```
CREATE TABLE ferret.rdnsv4 (
ip8 INET,
ip16 INET,
ip24 INET,
ipAddress INET,
p1 VARCHAR,
p2 VARCHAR,
p3 VARCHAR,
p4 VARCHAR,
p5 VARCHAR,
p6 VARCHAR,
p7 VARCHAR,
country VARCHAR,
city VARCHAR,
asn INT,
as_name VARCHAR,
source VARCHAR,
sourceRecordType VARCHAR,
firstSeen timestamp,
lastSeen timestamp,
updatedAt timestamp,
PRIMARY KEY (ip8, ip16, ip24, ipAddress, p1, p2, p3, p4, p5, p6, p7)
);
```
- Create SubDomains Table
```
CREATE TABLE ferret.subdomains (
p1 VARCHAR,
p2 VARCHAR,
p3 VARCHAR,
p4 VARCHAR,
p5 VARCHAR,
p6 VARCHAR,
p7 VARCHAR,
source VARCHAR,
sourceRecordType VARCHAR,
firstSeen timestamp,
lastSeen timestamp,
updatedAt timestamp,
PRIMARY KEY ((p1, p2, p3), p4, p5, p6, p7)
);
```
- Create CNAME Table
```
CREATE TABLE ferret.cnames (
target VARCHAR,
apexDomain VARCHAR,
domain VARCHAR,
source VARCHAR,
firstSeen timestamp,
lastSeen timestamp,
updatedAt timestamp,
PRIMARY KEY (target, apexDomain, domain)
);
```
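- Move Data - `sudo docker container exec -it cassandra sstableloader -d 172.18.0.2 /ferret/dnsdata/` (`172.18.0.2` is presumably the container's address on the Docker network; substitute your own if it differs)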
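
The `ip8`, `ip16`, and `ip24` columns of `ferret.rdnsv4` look like network prefixes of `ipAddress`, but this README does not say how they are computed. The sketch below assumes each one is simply the IPv4 address with its host bits zeroed (a /8, /16, and /24 mask). The class and method names are hypothetical and not taken from the project.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Sketch only: derive masked network addresses from an IPv4 literal. */
final class Ipv4Prefixes {

    /**
     * Returns the dotted-quad form of ipv4 with every bit after the first
     * prefixBits cleared. Assumption: this is what ip8/ip16/ip24 hold.
     */
    static String prefix(String ipv4, int prefixBits) {
        if (prefixBits < 0 || prefixBits > 32) {
            throw new IllegalArgumentException("prefixBits must be 0..32");
        }
        try {
            // Passing an IP literal does not trigger a DNS lookup.
            byte[] raw = InetAddress.getByName(ipv4).getAddress();
            if (raw.length != 4) {
                throw new IllegalArgumentException("IPv4 addresses only");
            }
            for (int i = 0; i < raw.length; i++) {
                // Number of prefix bits that fall inside byte i (0..8).
                int bitsInByte = Math.min(8, Math.max(0, prefixBits - 8 * i));
                raw[i] &= (byte) (0xFF << (8 - bitsInByte));
            }
            return InetAddress.getByAddress(raw).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ipv4, e);
        }
    }

    public static void main(String[] args) {
        String ip = "10.20.30.40";
        System.out.println(prefix(ip, 8));   // 10.0.0.0
        System.out.println(prefix(ip, 16));  // 10.20.0.0
        System.out.println(prefix(ip, 24));  // 10.20.30.0
    }
}
```

If the assumption holds, listing `ip8`, `ip16`, and `ip24` ahead of `ipAddress` in the primary key groups all addresses of a /8 into one partition and orders them by progressively narrower networks.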
# Possible Improvements
- Use Java `FileChannel` to read files (tried for possible performance gains; no improvement observed)
- Use the fastjson parser
- Use multithreaded writes to `CQLSSTableWriter` (https://issues.apache.org/jira/browse/CASSANDRA-7463). This turned out to be a bad idea: write performance is far better when keys arrive in order, and writes with out-of-order keys use a lot of CPU without improving conversion time.
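
The note about key order suggests sorting each batch of rows by primary key before handing it to the writer. The following is a minimal sketch of that idea, using the key columns of `ferret.cnames` as an example. `OrderedBatch`, `CnameRow`, and `sortedCopy` are hypothetical names, and the writer itself is left out. The sketch orders rows by plain string comparison, whereas Cassandra orders partitions by partitioner token, so treat this as an illustration of feeding rows in a stable key order rather than as the project's actual approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch only: put a batch of rows into primary-key order before writing. */
final class OrderedBatch {

    /** Hypothetical row holding the three primary-key columns of ferret.cnames. */
    static final class CnameRow {
        final String target;
        final String apexDomain;
        final String domain;

        CnameRow(String target, String apexDomain, String domain) {
            this.target = target;
            this.apexDomain = apexDomain;
            this.domain = domain;
        }
    }

    /** Compares by partition key first, then by the clustering columns in order. */
    static final Comparator<CnameRow> KEY_ORDER =
            Comparator.comparing((CnameRow r) -> r.target)
                    .thenComparing((CnameRow r) -> r.apexDomain)
                    .thenComparing((CnameRow r) -> r.domain);

    /** Returns a new list in key order; the input list is left untouched. */
    static List<CnameRow> sortedCopy(List<CnameRow> rows) {
        List<CnameRow> copy = new ArrayList<>(rows);
        copy.sort(KEY_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<CnameRow> batch = Arrays.asList(
                new CnameRow("cdn.example.net", "example.org", "www.example.org"),
                new CnameRow("cdn.example.net", "example.com", "img.example.com"),
                new CnameRow("a.example.net", "example.com", "www.example.com"));
        for (CnameRow r : sortedCopy(batch)) {
            // A real converter would pass each row to the SSTable writer here.
            System.out.println(r.target + " " + r.apexDomain + " " + r.domain);
        }
    }
}
```

Sorting in a single thread before writing keeps the writer's input ordered, which is the property the observation above credits for the better write performance.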
# TLD Source
- https://data.iana.org/TLD/tlds-alpha-by-domain.txt
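
The IANA file lists one upper-case TLD per line, preceded by a comment line that starts with `#`. How the migrator consumes it is not described here. The sketch below shows one plausible way to load the list into a set and check whether a hostname ends in a known TLD. `TldList`, `parse`, and `hasKnownTld` are hypothetical names, and the sample lines in `main` stand in for the downloaded file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch only: load tlds-alpha-by-domain.txt style lines into a lookup set. */
final class TldList {

    /** Collects lower-cased TLDs, skipping blank lines and '#' comment lines. */
    static Set<String> parse(Iterable<String> lines) {
        Set<String> tlds = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            tlds.add(trimmed.toLowerCase(Locale.ROOT));
        }
        return tlds;
    }

    /** True if the last label of hostname (ignoring a trailing dot) is in tlds. */
    static boolean hasKnownTld(String hostname, Set<String> tlds) {
        String host = hostname.trim().toLowerCase(Locale.ROOT);
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        String lastLabel = host.substring(host.lastIndexOf('.') + 1);
        return !lastLabel.isEmpty() && tlds.contains(lastLabel);
    }

    public static void main(String[] args) {
        // Sample content only; the real file is much longer.
        Set<String> tlds = parse(Arrays.asList("# sample header line", "COM", "ORG", "XN--P1AI"));
        System.out.println(hasKnownTld("www.example.com", tlds));
        System.out.println(hasKnownTld("printer.localdomain", tlds));
    }
}
```

In practice the lines could come from `java.nio.file.Files.readAllLines` on a local copy of the file.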