https://github.com/mrboombastic/dsacord
Simple utility to download Discord data from DSA Transparency Database to Postgres database
- Host: GitHub
- URL: https://github.com/mrboombastic/dsacord
- Owner: MrBoombastic
- License: mit
- Created: 2025-04-28T21:34:32.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2025-04-29T10:03:58.000Z (about 1 year ago)
- Last Synced: 2025-04-29T10:48:29.919Z (about 1 year ago)
- Topics: discord, downloader, dsa, osint, tool, transparency
- Language: Go
- Homepage:
- Size: 2.44 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# DSAcord
A simple utility for downloading Discord data from
the [DSA Transparency Database](https://transparency.dsa.ec.europa.eu/explore-data/download?from_date=&to_date=&uuid=caca0689-3c4f-4a72-8a10-ddc719d22256)
and storing it locally in your Postgres.
Written in Go, of course.

*Ugly image by ChatGPT. Thanks to [MinerPL](https://github.com/MinerPL) for inspiring me to create this tool.*
## Functionality
This project is designed to download transparency data from the Digital Services Act (DSA) Transparency Database and
store it locally in a PostgreSQL database.
The tool automates the downloading of ZIP archives, extracts detailed records,
and inserts them in bulk.
You can specify the date range of the required data, and the tool will handle parallel
downloads, processing, and data insertion, while keeping track of execution time and table size.
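
The date-range handling described above boils down to one download per day. Below is a minimal Go sketch of that expansion, not DSAcord's actual code: it assumes every daily Discord dump follows the `sor-discord-netherlands-bv-YYYY-MM-DD-full.zip` naming that appears in the sample log under Test, and `dailyDumpURLs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// dumpURLFormat mirrors the download URLs in the sample log of this README.
// Assumption: every daily Discord dump uses exactly this naming scheme.
const dumpURLFormat = "https://dsa-sor-data-dumps.s3.eu-central-1.amazonaws.com/sor-discord-netherlands-bv-%s-full.zip"

// dailyDumpURLs returns one URL per day from `from` to `to`, both inclusive.
// Dates use the YYYY-MM-DD layout of the --from and --to flags.
func dailyDumpURLs(from, to string) ([]string, error) {
	const layout = "2006-01-02"
	start, err := time.Parse(layout, from)
	if err != nil {
		return nil, fmt.Errorf("invalid from date: %w", err)
	}
	end, err := time.Parse(layout, to)
	if err != nil {
		return nil, fmt.Errorf("invalid to date: %w", err)
	}
	if end.Before(start) {
		return nil, fmt.Errorf("to date %s is before from date %s", to, from)
	}
	var urls []string
	// Parsed dates are midnight UTC, so AddDate(0, 0, 1) moves exactly one calendar day.
	for d := start; !d.After(end); d = d.AddDate(0, 0, 1) {
		urls = append(urls, fmt.Sprintf(dumpURLFormat, d.Format(layout)))
	}
	return urls, nil
}

func main() {
	urls, err := dailyDumpURLs("2024-12-28", "2025-01-02")
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Both ends of the range are inclusive in this sketch, which agrees with the sample log, where the `--to` day itself (2025-08-08) is requested.
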
- Download daily data dumps for a user-specified date range.
- Extract nested ZIP files in parallel using goroutines and a `sync.WaitGroup`.
- Show a progress bar, but only when running with a single worker.
- Insert rows into PostgreSQL in bulk, using transactions to ensure atomicity.
- Display the total number of rows inserted, the elapsed time, and the size of the database table on completion.
> [!NOTE]
> There is no data available to download before 2024-08-21.
> Also, the most recent daily dumps may be published with a delay.
> Watch out!
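
Given that cut-off, a caller could validate the requested range before downloading anything. The Go sketch below is hypothetical (`checkRange` and `mustParse` are not part of DSAcord): it assumes that clamping `--from` to 2024-08-21 is acceptable, and it flags a `--to` date of today or later, similar in spirit to the warning DSAcord prints in the sample log.

```go
package main

import (
	"fmt"
	"time"
)

const dateLayout = "2006-01-02"

// earliestDump is the first day with Discord data, per the note above.
var earliestDump = time.Date(2024, time.August, 21, 0, 0, 0, 0, time.UTC)

// checkRange clamps from to earliestDump and reports whether to is today or
// later, i.e. a day whose dump is probably not published yet. today is a
// parameter so the function stays deterministic and easy to test.
func checkRange(from, to, today time.Time) (clampedFrom time.Time, warnFuture bool) {
	if from.Before(earliestDump) {
		from = earliestDump
	}
	return from, !to.Before(today)
}

// mustParse parses a YYYY-MM-DD date (as UTC midnight) or panics.
func mustParse(s string) time.Time {
	t, err := time.Parse(dateLayout, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := time.Now().UTC()
	today := time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, time.UTC)
	from, warn := checkRange(mustParse("2024-06-01"), mustParse("2025-03-24"), today)
	fmt.Println("importing from", from.Format(dateLayout))
	if warn {
		fmt.Println("warning: --to is today or in the future; expect 404 errors")
	}
}
```
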
## Usage Examples
> [!WARNING]
> Be careful with the number of workers.
> The memory usage can be very high.

> [!NOTE]
> The database must already exist before importing.
> The table will be created automatically.
### Help
```bash
./dsacord --help
```
### Single worker (for slower CPUs or low-memory machines):
```bash
./dsacord --dbhost=localhost --dbuser=postgres --dbpassword=secret --from=2024-12-28 --to=2025-03-24 --workers=1
```
### Multiple workers (much faster):
```bash
./dsacord --dbhost=localhost --dbuser=postgres --dbpassword=secret --from=2024-12-28 --to=2025-03-24 --workers=5
```
> [!NOTE]
> Two flags were added recently: `overwriteDuplicates` and `skipCheckingDuplicates`.
> The source files do contain duplicated entries,
> so the first flag is recommended if you don't mind individual entries being overwritten.
> The second one is experimental and may either increase or decrease insert time depending on the scenario; test it yourself.
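
The overwrite behaviour can be pictured as last-write-wins deduplication. The Go sketch below is an assumption about one way to do it, not DSAcord's implementation: it collapses duplicates inside a single batch, keeping the last occurrence of each key. Doing this before a bulk upsert matters in PostgreSQL, because a single `INSERT ... ON CONFLICT ... DO UPDATE` statement fails when it would update the same row twice. The `decision` type and its `UUID` key are hypothetical.

```go
package main

import "fmt"

// decision is a hypothetical, heavily trimmed stand-in for one CSV row.
type decision struct {
	UUID   string // assumed unique key of a statement of reasons
	Detail string
}

// dedupLastWins keeps only the last occurrence of each UUID. The surviving
// row stays at the position where its UUID first appeared.
func dedupLastWins(batch []decision) []decision {
	index := make(map[string]int, len(batch))
	out := make([]decision, 0, len(batch))
	for _, d := range batch {
		if i, seen := index[d.UUID]; seen {
			out[i] = d // a later duplicate overwrites the earlier row
			continue
		}
		index[d.UUID] = len(out)
		out = append(out, d)
	}
	return out
}

func main() {
	batch := []decision{{"a", "first"}, {"b", "only"}, {"a", "second"}}
	fmt.Println(dedupLastWins(batch)) // prints [{a second} {b only}]
}
```
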
## Database notes
The data is stored in a table called `decisions`, whose schema matches the one used in the CSV files.
However, for clarity, `PlatformUID` is split into `SnowflakeTime`, `EntityID` and `EntityType`.
The table is created automatically if it does not exist, but the database itself IS NOT: create it before importing.
The table follows the rules of [GORM's automigration](https://gorm.io/docs/migration.html), with all of its nuances.
## Test
```bash
./dsacord --dbuser postgres --dbpassword root --from=2024-12-28 --to=2025-08-08 --workers=5 --overwriteDuplicates --skipCheckingDuplicates
DSAcord v0.2.0
Connected to the database
Importing from 2024-12-28 to 2025-08-08
Your --to date is in the future or in today. This may result in excess 404 errors.
Inserting decisions in parallel. Progress bar will not be shown.
Watch out: duplicated keys will be silently overwritten!
2025/08/07 22:43:51 Start!
(cut...)
2025/08/07 22:49:54 Downloading https://dsa-sor-data-dumps.s3.eu-central-1.amazonaws.com/sor-discord-netherlands-bv-2025-08-08-full.zip
2025/08/07 22:49:54 Error: download failed for https://dsa-sor-data-dumps.s3.eu-central-1.amazonaws.com/sor-discord-netherlands-bv-2025-08-07-full.zip: forbidden or does not exist
2025/08/07 22:49:54 Error: download failed for https://dsa-sor-data-dumps.s3.eu-central-1.amazonaws.com/sor-discord-netherlands-bv-2025-08-08-full.zip: forbidden or does not exist
Rows inserted: 14405318
Elapsed time: 6m19.644562s
Table size: 15 GB
```