{"id":24835755,"url":"https://github.com/tobilg/analyze-twitter-export","last_synced_at":"2025-07-27T09:36:54.946Z","repository":{"id":262917034,"uuid":"888388210","full_name":"tobilg/analyze-twitter-export","owner":"tobilg","description":"Analyze exported Twitter data","archived":false,"fork":false,"pushed_at":"2024-11-15T01:25:35.000Z","size":456,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-15T02:24:19.593Z","etag":null,"topics":["dataanalytics","duckdb","twitter"],"latest_commit_sha":null,"homepage":"https://sql-workbench.com","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tobilg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-14T10:08:31.000Z","updated_at":"2024-11-15T01:52:55.000Z","dependencies_parsed_at":"2024-11-15T05:15:33.731Z","dependency_job_id":null,"html_url":"https://github.com/tobilg/analyze-twitter-export","commit_stats":null,"previous_names":["tobilg/analyze-twitter-export"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobilg%2Fanalyze-twitter-export","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobilg%2Fanalyze-twitter-export/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobilg%2Fanalyze-twitter-export/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobilg%2Fanalyze-twitter-export/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/tobilg","download_url":"https://codeload.github.com/tobilg/analyze-twitter-export/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236464833,"owners_count":19152979,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataanalytics","duckdb","twitter"],"created_at":"2025-01-31T04:51:37.894Z","updated_at":"2025-01-31T04:51:38.374Z","avatar_url":"https://github.com/tobilg.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# analyze-twitter-export\nAnalyze your Twitter export data with the help of [DuckDB](https://duckdb.org/).\n\n## Usage\nThe following steps are required to analyze your Twitter export data.\n\n1. Install DuckDB.  \n    This can be done by running the `scripts/install_duckdb.sh` script (it assumes you're on a Linux machine). Otherwise, you can run `brew install duckdb` on macOS, or follow the [instructions](https://duckdb.org/docs/installation) for your platform from the DuckDB website.\n2. Copy the downloaded Twitter export data to the `src-data` directory.  \n    This should be the zip file you downloaded from Twitter.\n3. Prepare the Twitter export data for import into DuckDB.  \n    The data needs to be converted into a format that can be imported into DuckDB. This can be done by running the `scripts/prepare_tweets.sh` script.\n4. Create a DuckDB database from your Twitter export data.  \n    This can be done by running the `scripts/create_database.sh` script. 
The result will be a file called `twitter.duckdb` in the `data` directory.\n5. Analyze the data.  \n    This can be done by running `duckdb data/twitter.duckdb` in the project root directory, and then executing SQL queries in the DuckDB CLI session that opens.\n \n## Entity Relationship Diagram\nThe following diagram shows the structure of the resulting database.\n\n![Twitter Export Database ERD](docs/erd.png)\n\n## SQL Workbench\nYou can use [SQL Workbench](https://sql-workbench.com) to analyze the data locally, in the browser. Just drag \u0026 drop the `data/twitter.duckdb` file onto SQL Workbench's file drop area.\n\nMake sure, though, that you **add the database name as a prefix to the table names** in your queries (e.g. `SELECT * FROM twitter.tweet LIMIT 10;`).\n\n![SQL Workbench](docs/screenshot.png)\n\n## Example Queries\nThe following example queries can be used to analyze the data.\n\n### Show all tweets and replies\n```sql\nSELECT \n    * \nFROM \n    tweet\nORDER BY created_at DESC;\n```\n\n### Show all tweets with expanded content (w/o replies)\n```sql\nSELECT \n    tweet_id, created_at, content_expanded, favorite_count, retweet_count, language\nFROM \n    tweet\nWHERE\n    is_reply = false\nORDER BY created_at DESC;\n```\n\n### Show most liked tweets\n```sql\nSELECT \n    tweet_id, created_at, content_expanded, favorite_count, retweet_count\nFROM \n    tweet\nORDER BY favorite_count DESC;\n```\n\n### Number of tweets per day\n```sql\nSELECT \n    strftime(created_at, '%Y-%m-%d') as day, COUNT(*) as count\nFROM \n    tweet\nGROUP BY day\nORDER BY day;\n```\n\n### Most used hashtags\n```sql\nSELECT \n    h.hashtag, COUNT(distinct rh.tweet_id) as count\nFROM \n    hashtag h\nINNER JOIN\n    rel_tweet_hashtag rh ON h.hashtag_id = rh.hashtag_id\nGROUP BY h.hashtag\nORDER BY count DESC;\n```\n\n### Most mentioned users\n```sql\nSELECT \n    u.screen_name, COUNT(distinct ru.tweet_id) as count\nFROM \n    user u\nINNER JOIN\n    
rel_tweet_mentioned_user ru ON u.user_id = ru.user_id\nGROUP BY u.screen_name\nORDER BY count DESC;\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobilg%2Fanalyze-twitter-export","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftobilg%2Fanalyze-twitter-export","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobilg%2Fanalyze-twitter-export/lists"}