{"id":27862563,"url":"https://github.com/mrboombastic/dsacord","last_synced_at":"2026-04-15T07:31:47.858Z","repository":{"id":290538724,"uuid":"974492539","full_name":"MrBoombastic/DSAcord","owner":"MrBoombastic","description":"Simple utility to download Discord data from DSA Transparency Database to Postgres database","archived":false,"fork":false,"pushed_at":"2025-04-29T10:03:58.000Z","size":2556,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-29T10:48:29.919Z","etag":null,"topics":["discord","downloader","dsa","osint","tool","transparency"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MrBoombastic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-28T21:34:32.000Z","updated_at":"2025-04-29T10:04:02.000Z","dependencies_parsed_at":"2025-05-04T20:31:55.897Z","dependency_job_id":null,"html_url":"https://github.com/MrBoombastic/DSAcord","commit_stats":null,"previous_names":["mrboombastic/dsacord"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MrBoombastic%2FDSAcord","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MrBoombastic%2FDSAcord/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MrBoombastic%2FDSAcord/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MrBoombastic%2FDSAcord/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MrBoombastic","download_url":"https://codeload.github.com/MrBoombastic/DSAcord/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252395604,"owners_count":21741065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["discord","downloader","dsa","osint","tool","transparency"],"created_at":"2025-05-04T20:30:43.392Z","updated_at":"2025-10-25T17:21:53.739Z","avatar_url":"https://github.com/MrBoombastic.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DSAcord\n\nA simple utility for downloading Discord data from\nthe [DSA Transparency Database](https://transparency.dsa.ec.europa.eu/explore-data/download?from_date=\u0026amp;to_date=\u0026amp;uuid=caca0689-3c4f-4a72-8a10-ddc719d22256)\nand storing it locally in your Postgres.\nWritten in Go, of course.\n\n![hero.png](docs/hero.png)\n*Ugly image by ChatGPT. Thanks to [MinerPL](https://github.com/MinerPL) for inspiring me to create this tool. 
😻*\n\n## Functionality\n\nThis project is designed to download transparency data from the Digital Services Act (DSA) Transparency Database and\nstore it locally in a PostgreSQL database.\nThe tool automates the downloading of ZIP archives, extracts detailed records,\nand inserts them in bulk.\nYou can specify the date range of the required data, and the tool will handle parallel\ndownloads, processing, and data insertion, while keeping track of execution time and table size.\n\n✅ Downloading daily data dumps based on user-specified date ranges.\n✅ Extracting nested ZIP files in parallel using goroutines and a WaitGroup.\n✅ Showing a progress bar only when there is a single worker.\n✅ Inserting into PostgreSQL in bulk, with transaction handling to ensure atomicity.\n✅ Displaying the total number of rows inserted, the time taken, and the size of the database table upon completion.\n\n\u003e [!NOTE]  \n\u003e There is no data available to download before 2024-08-21.\n\u003e Also, fresh data may be delayed.\n\u003e Watch out!\n\n## Usage Examples\n\n\u003e [!WARNING]  \n\u003e Be careful with the number of workers.\n\u003e Memory usage can be very high.\n\n\u003e [!NOTE]  \n\u003e The database must already exist before importing.\n\u003e The table will be created automatically.\n\n### Help\n\n```bash\n./dsacord --help\n```\n\n### Single worker (for slower CPUs/lower memory machines):\n\n```bash\n./dsacord --dbhost=localhost --dbuser=postgres --dbpassword=secret --from=2024-12-28 --to=2025-03-24 --workers=1\n```\n\n### Multiple workers (much faster):\n\n```bash\n./dsacord --dbhost=localhost --dbuser=postgres --dbpassword=secret --from=2024-12-28 --to=2025-03-24 --workers=5\n```\n\n\u003e [!NOTE]  \n\u003e There are two recently added flags: `overwriteDuplicates` and `skipCheckingDuplicates`.\n\u003e The source files actually contain duplicated entries,\n\u003e so the first flag is recommended if you don't care about single entries being 
overwritten.\n\u003e The latter one is experimental and may increase or decrease insert time in various scenarios - test it yourself.\n\n## Database notes\n\nThe data is stored in a table called `decisions` with a schema that matches the one in the CSV files.\nHowever, for clarity, PlatformUID is split into SnowflakeTime, EntityID and EntityType.\nThe table is created automatically if it does not exist, but the selected database IS NOT.\nThe table will follow the rules of [automigration by Gorm](https://gorm.io/docs/migration.html) along with all the\nnuances.\n\n## Test\n\n```bash\n./dsacord --dbuser postgres --dbpassword root --from=2024-12-28 --to=2025-08-08 --workers=5 --overwriteDuplicates --skipCheckingDuplicates\nℹ️  DSAcord v0.2.0\n✅  Connected to the database\n📆  Importing from 2024-12-28 to 2025-08-08\n⚠️  Your --to date is in the future or in today. This may result in excess 404 errors.\n💾  Inserting decisions in parallel. Progress bar will not be shown.\n💀  Watch out: duplicated keys will be silently overwritten!\n2025/08/07 22:43:51 Start!\n\n(cut...)\n\n2025/08/07 22:49:54 🌍  Downloading https://dsa-sor-data-dumps.s3.eu-central-1.amazonaws.com/sor-discord-netherlands-bv-2025-08-08-full.zip\n2025/08/07 22:49:54 Error: download failed for https://dsa-sor-data-dumps.s3.eu-central-1.amazonaws.com/sor-discord-netherlands-bv-2025-08-07-full.zip: forbidden or does not exist\n2025/08/07 22:49:54 Error: download failed for https://dsa-sor-data-dumps.s3.eu-central-1.amazonaws.com/sor-discord-netherlands-bv-2025-08-08-full.zip: forbidden or does not exist\n\n✅  Rows inserted: 14405318\n⏱  Elapsed time: 6m19.644562s\n📁  Table size: 15 GB\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrboombastic%2Fdsacord","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrboombastic%2Fdsacord","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrboombastic%2Fdsacord/lists"}