{"id":16025064,"url":"https://github.com/justintimperio/gdelt-diff","last_synced_at":"2026-06-01T08:32:28.595Z","repository":{"id":116968651,"uuid":"193986511","full_name":"JustinTimperio/gdelt-diff","owner":"JustinTimperio","description":"An Automated File Manager for Maintaining a Local Copy of GDELT Source Files","archived":false,"fork":false,"pushed_at":"2020-10-22T18:22:05.000Z","size":76,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-03-31T19:54:29.675Z","etag":null,"topics":["gdelt","gdelt-data","gdelt-events","gdelt-files","gdelt-knowledge-graph"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JustinTimperio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-26T22:38:01.000Z","updated_at":"2024-02-23T22:27:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"89e233c3-ee00-491f-bb14-7c275b91160e","html_url":"https://github.com/JustinTimperio/gdelt-diff","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/JustinTimperio/gdelt-diff","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinTimperio%2Fgdelt-diff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinTimperio%2Fgdelt-diff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinTimperio%2Fgdelt-diff/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/JustinTimperio%2Fgdelt-diff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JustinTimperio","download_url":"https://codeload.github.com/JustinTimperio/gdelt-diff/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinTimperio%2Fgdelt-diff/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33767435,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gdelt","gdelt-data","gdelt-events","gdelt-files","gdelt-knowledge-graph"],"created_at":"2024-10-08T19:41:14.007Z","updated_at":"2026-06-01T08:32:28.579Z","avatar_url":"https://github.com/JustinTimperio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GDELT-Diff\n![Codacy grade](https://img.shields.io/codacy/grade/1596ab013d1f4ac99d5cfb86db94d7f2?style=for-the-badge)\n![GitHub](https://img.shields.io/github/license/justintimperio/gdelt-diff?style=for-the-badge)\n## Abstract\nThis small tool is designed to automate the download, organization, and storage of [GDELT source files](https://www.gdeltproject.org/data.html#rawdatafiles). 
GDELT-Diff includes a daemon that runs every 60 minutes, fetching any new or missing files and sorting them into folders for easy storage. Additionally, an extremely lightweight tool is provided to maintain a copy of only each stream's most recent files in /tmp/gdelt-live. This is for anyone doing real-time analysis of GDELT who doesn't require a full copy of the source files.\n\n## What is the GDELT?\nThe GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of \"big data\" study of global human society. Advanced users and those with unique use cases can download the entire underlying event and graph datasets in CSV format. Deep technical knowledge and extensive experience working with large datasets is required to make use of these datasets, with the GKG alone requiring more than 2.5TB of storage compressed.\n\nTo learn more about the GDELT and the records that make up its database, check out the [official documentation page](https://www.gdeltproject.org/data.html#documentation).\n  \n## Install Instructions  \n_NOTE: This utility is designed for large servers with, at MINIMUM, a 100GB+ OS drive, 10TB+ of storage, and 32GB+ of RAM. Also, please consider how many files you need to sync before running._  \n  \n1. If you have a pre-existing directory of GDELT files, **YOU MUST** ensure that the files are organized into folders by stream, year, and month (`/path/stream/2015/05/`).\n2. Install GDELT-Diff:\n```\ncurl https://raw.githubusercontent.com/JustinTimperio/gdelt-diff/master/build/install.sh | bash\n```\n3. Edit Your User Config File With The Paths You Wish to Use:\n```\nsudo vi /etc/gdelt-diff/config\n```\n4. 
Manually Run GDELT-Diff to Ensure Everything Is Set Up:\n```\nsudo gdelt-diff -d\n```\n5. Enable Automatic Downloads With:\n```\nsudo systemctl enable gdelt-diff.timer\n```\n6. Enable Automatic Live Downloads With:\n```\nsudo systemctl enable gdelt-live.timer\n```\n\n## Uninstall GDELT-Diff:\n**This will NOT remove the files you have downloaded**\n```\nsudo /opt/gdelt-diff/build/remove.sh\n```\n\n## CLI-Tool\nTo run the utility manually, stop the systemd timers and call gdelt-diff directly:\n```\nsudo gdelt-diff --diff\n```\n\nTo sync only one stream use:\n```\nsudo gdelt-diff --diff_english\n```\nOR\n```\nsudo gdelt-diff --diff_translation\n```\n\nTo force a fetch of 404'd URLs use:\n```\nsudo gdelt-diff --retry\n```\n\nTo refresh the database of synced files:\n```\nsudo gdelt-diff --refresh_database\n```\n\nTo see all options and flags:\n```\nsudo gdelt-diff -help\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustintimperio%2Fgdelt-diff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjustintimperio%2Fgdelt-diff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustintimperio%2Fgdelt-diff/lists"}