{"id":16275152,"url":"https://github.com/vijinho/tweets-cli","last_synced_at":"2025-10-29T15:31:21.371Z","repository":{"id":71221196,"uuid":"151031847","full_name":"vijinho/tweets-cli","owner":"vijinho","description":"Stand-alone PHP CLI script to batch/post-process downloaded full twitter backup-archive files","archived":false,"fork":false,"pushed_at":"2024-04-17T00:22:33.000Z","size":172,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2024-12-22T06:20:31.018Z","etag":null,"topics":["php-cli-script","php-tweet","php-twitter","tweet-analysis","tweets","twitter","twitter-cli","twitter-cli-client","twitter-client","twitter-data"],"latest_commit_sha":null,"homepage":"https://github.com/vijinho/tweets-gb","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vijinho.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-10-01T03:24:03.000Z","updated_at":"2024-04-17T00:22:37.000Z","dependencies_parsed_at":"2023-05-23T18:45:39.423Z","dependency_job_id":null,"html_url":"https://github.com/vijinho/tweets-cli","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vijinho%2Ftweets-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vijinho%2Ftweets-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vijinho%2Ftweets-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vijinho%2Ftweets-
cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vijinho","download_url":"https://codeload.github.com/vijinho/tweets-cli/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238845334,"owners_count":19540326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["php-cli-script","php-tweet","php-twitter","tweet-analysis","tweets","twitter","twitter-cli","twitter-cli-client","twitter-client","twitter-data"],"created_at":"2024-10-10T18:32:14.319Z","updated_at":"2025-10-29T15:31:16.040Z","avatar_url":"https://github.com/vijinho.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tweets.php\n\nA command-line (CLI) script to batch-process and work with the files unzipped from the twitter backup archive zip file.  
\n\n## Features\n\n- It can be used to generate 'grailbird' javascript files which are compatible with the default twitter archive viewer application, of which I have written an experimental updated version called [tweets-gb](https://github.com/vijinho/tweets-gb) where the generated files can be dropped-in and optionally linked via a **file:///** URL to the physical file on disk when browsed off-line, locally and in a web browser.\n- Exported grailbird data can be viewed in [@vijinho/tweets-gb](https://github.com/vijinho/tweets-gb)\n- It can also import grailbird files, join them, and optionally merge into existing tweets.js data.\n- It can unshorten all short-links and resolve all links fully, saving the results to *urls.json* for re-use on successive runs (checking links is a time-consuming process which can take hours!).  This can be used with the **--offline** option to speed-up subsequent processing further. Media entity attributes will be updated to reflect changes.\n- The **--local** option will check subfolders for content and add file and path information into the tweet under new attributes (videos, images, files).  These local files will also be swapped-in for the remote ones for viewing off-line and loading faster.\n- The **--delete** option will delete lower-bitrate video files and keep the highest bitrate file if used with the local-files option **--local**.\n- Can filter tweets on date/time (from and to specific dates) using [PHP strtotime](https://secure.php.net/manual/en/function.strtotime.php) for flexible date/time format\n- An option exists to also delete duplicate local tweet files.  These are files named 9999999999-XXXXXXXX.(jpg|png|mp4|...); the script strips the numeric tweet_id (and dash) from the start of the filename, renames one of the duplicate files to XXXXXXXX.(jpg|png|mp4|...), and deletes the rest.  
The script will update the file and entity links to reflect the new filename.\n- Option to filter results on a list of given attributes/keys **--keys-filter** and also to drop keys altogether from tweets with **--keys-remove**.\n- Can filter tweets based on executing a [PHP regular-expression](https://secure.php.net/manual/en/function.preg-match.php) (and optionally save the regular expression results in the tweet as a new attribute **regexps**)\n- Creates a new tweet attribute: **created_at_unixtime** which is the unixtime of the tweet.\n- Creates a new tweet attribute: **text** which is the cleaned-up tweet-text after processing, also named to be compatible with default twitter export.\n- Option to specify previous output of batch processing as input with **--tweets-file**\n- All processed tweets are saved to **output.json** by default but this can be changed with **--filename**\n- Output can also be optionally changed to .txt or serialized php.\n- Option to discard tweets which are mentions or retweets (**--no-retweets** and **--no-mentions**)\n- Can just return a json file of any of the following: js/json files, images, videos or all files in the twitter backup folder.\n- Save basic info of all users mentioned or RT'd to *users.json* with **--list-users**\n- Adds a new tweet attribute 'rt', containing the RT'd username, if the tweet is a retweet\n\n## Usage - CLI Options\n\nThis is intentionally written as a stand-alone self-contained command-line php script, hacked-together, written in a procedural style.  
These are the command-line options available:\n\n```\nUsage: php tweets.php\n\n-h,  --help                   Display this help and exit\n-v,  --verbose                Run in verbose mode\n-d,  --debug                  Run in debug mode (implies also -v, --verbose)\n-t,  --test                   Run in test mode, show what would be done, NO filesystem changes.\n     --dir={.}                Directory of unzipped twitter backup files (current dir if not specified)\n     --dir-output={.}         Directory to output files in (default to -dir above)\n     --format={json}          Output format for script data: txt|php|json (default)\n-f,  --filename={output.}     Filename for output data from operation, default is 'output.{--OUTPUT_FORMAT}'\n     --grailbird-import={dir} Import in data from the grailbird json files of the standard twitter export. If specified with '-a' will merge into existing tweets before outputting new file.\n-g,  -g={dir}        Generate json output files compatible with the standard twitter export feature to dir\n     --grailbird-media        Copy local media files to grailbird folder, using same file path\n     --media-prefix           Prefix to local media folder instead of direct file:// path, e.g. 
'/' if media folders are to be replicated under webroot for serving via web and prefixing a URL path, implies --local\n     --list                   Only list all files in export folder and halt - filename\n     --list-js                Only List all javascript files in export folder and halt\n     --list-images            Only list all image files in export folder and halt\n     --list-videos            Only list all video files in export folder and halt\n     --list-users             Only list all users in tweets, (default filename 'users.json') and halt\n     --list-missing-media     List media URLs for which no local file exists and halt (implies --local)\n     --organize-media         Organize local downloaded media, for example split folder into date/month subfolders\n     --download-missing-media Download missing media (from --list-missing-media) and halt, e.g.. missing media files (implies --local)\n     --list-profile-images    Only list users profile images, (in filename 'users.json') and halt\n     --download-profile-images  WARNING: This can be a lot of users! 
Download profile images.\n     --tweets-count           Only show the total number of tweets and halt\n-i,  --tweets-file={tweet.js} Load tweets from different json input file instead of default twitter 'tweet.js' or 'tweet.json' (priority if exists)\n-a,  --tweets-all             Get all tweets (further operations below will depend on this)\n     --date-from              Filter tweets from date/time, see: https://secure.php.net/manual/en/function.strtotime.php\n     --date-to                Filter tweets up-to date/time, see: https://secure.php.net/manual/en/function.strtotime.php\n     --no-retweets            Drop re-tweets (RT's)\n     --no-mentions            Drop tweets starting with mentions\n      --minimal               Minimal output for each tweet, no superfluous data like tweet IDs.\n     --media-only             Only media tweets\n     --urls-expand            Expand URLs where shortened and data available (offline) in tweet (new attribute: text)\n-u,  --urls-resolve           Unshorten and dereference URLs in tweet (in new attribute: text) - implies --urls-expand\n     --urls-check             Check every single target url (except for twitter.com and youtube.com) and update - implies --urls-resolve\n     --urls-check-source      Check failed source urls - implies --urls-resolve\n     --urls-check-force       Forcibly checks every single failed (numeric) source and target url and update - implies --urls-check\n-o,  --offline                Do not go-online when performing tasks (only use local files for url resolution for example)\n-l,  --local                  Fetch local file information (if available) (new attributes: images,videos,files)\n-x,  --delete                 DANGER! At own risk. Delete files where savings can occur (i.e. low-res videos of same video), run with -t to test only and show files\n     --dupes                  List (or delete) duplicate files. 
Requires '-x/--delete' option to delete (will rename duplicated file from '{tweet_id}-{id}.{ext}' to '{id}.{ext}). Preview with '--test'!\n     --keys-required=k1,k2,.  Returned tweets which MUST have all of the specified keys\n-r,  --keys-remove=k1,k2,.    List of keys to remove from tweets, comma-separated (e.g. 'sizes,lang,source,id_str')\n-k,  --keys-filter=k1,k2,.    List of keys to only show in output - comma, separated (e.g. id,created_at,text)\n     --regexp='/\u003cpattern\u003e/i'  Filter tweet text on regular expression, i.e /(google)/i see https://secure.php.net/manual/en/function.preg-match.php\n     --regexp-save=name       Save --regexp results in the tweet under the key 'regexps' using the key/id name given\n     --thread=id              Returned tweets for the thread with id\n```\n\n## Usage Examples\n\n```\nReport duplicate tweet media files and output to 'dupes.json':\n        tweets.php -fdupes.json --dupes\n\nDelete duplicate tweet media files (will rename them from '{tweet_id}-{id}.{ext}' to '{id}.{ext})':\n        tweets.php --delete --dupes\n\nShow total tweets in tweets file:\n        tweets.php --tweets-count --format=txt\n\nWrite all users mentioned in tweets to default file 'users.json':\n        tweets.php --list-users\n\nShow javascript files in backup folder:\n        tweets.php -v --list-js\n\nResolve all URLs in 'tweet.js' file, writing output to 'tweet.json':\n        tweets.php -v -u --filename=tweet.json\n\nResolve all URLs in 'tweet.js' file, writing output to grailbird files in 'grailbird' folder and also 'tweet.json':\n        tweets.php -u --filename=tweet.json -g=export/grailbird\n\nGet tweets from 1 Jan 2017 to 'last friday', only id, created and text keys:\n        tweets.php -d -v -o -u --keys-filter=id,created_at,text,files --date-from '2017-01-01' --date-to='last friday'\n\nList URLs for which there are missing local media files:\n        tweets.php -v --list-missing-media\n\nDownload files from URLs for which there are 
missing local media files:\n        tweets.php -v --download-missing-media\n\nOrganize 'tweet_media' folder into year/month subfolders:\n        tweets.php -v --organize-media\n\nPrefix the local media with to a URL path 'assets':\n        tweets.php -v --media-prefix='/assets'\n\nGenerate grailbird files with expanded/resolved URLs:\n        tweets.php -v -u -g=export/grailbird\n\nGenerate grailbird files with expanded/resolved URLs using offline saved url data - no fresh checking:\n        tweets.php -v -o -u -g=export/grailbird\n\nGenerate grailbird files with expanded/resolved URLs using offline saved url data and using local file references where possible:\n        tweets.php -v -o -u -l -g=export/grailbird\n\nGenerate grailbird files with expanded/resolved URLs using offline saved url data and using local file references, dropping retweets:\n        tweets.php -v -o -u -l -g=export/grailbird --no-retweets\n\nFilter tweet text on word 'hegemony' since last year, exporting grailbird:\n        tweets.php -v -o -u -l -g=export/grailbird --regexp='/(hegemony)/i' --regexp-save=hegemony\n\nExtract the first couple of words of the tweet and name the saved regexp 'words':\n        tweets.php -v -o -u -l -x -g=export/grailbird --regexp='/^(?P\u003cfirst\u003e[a-zA-Z]+)\\s+(?P\u003csecond\u003e[a-zA-Z]+)/i' --regexp-save=words\n\nImport grailbird tweets and export tweets with local media files to web folder:\n        tweets.php -v -g=www/vijinho/ --media-prefix='/vijinho/' --grailbird-media --grailbird-import=vijinho/import/data/js/tweets\n\nImport twitter grailbird files,check URL and export new grailbird files:\n        tweets.php -v -g=www/vijinho/ --grailbird-import=import/data/js/tweets --urls-check\n\nImport and merge grailbird files from 'import/data/js/tweets', fully-resolving links and local files:\n        tweets.php -v -o -l -u --grailbird-import=import/data/js/tweets -g=export/grailbird\n\nExport only tweets which have the 'withheld_in_countries' key to 
export/grailbird folder:\n        tweets.php -v -u -o --keys-required='withheld_in_countries' -g=export/grailbird\n\nExport only tweets containing text 'youtu':\n        tweets.php -v --regexp='/youtu/' -g=www/vijinho/ --media-prefix='/vijinho/' --grailbird-media\n\nExport only no mentions, no RTs':\n        tweets.php -v -g=www/vijinho/ --media-prefix='/vijinho/' --grailbird-media --no-retweets --no-mentions\n\nExport only media tweets only':\n        tweets.php -v -g=www/vijinho/ --media-prefix='/vijinho/' --grailbird-media --media-only\n\nExport the tweet thread 967915766195609600 as grailbird export files, to tweets to thread.json and folder called thread:\n        tweets.php -v --thread=967915766195609600 --filename=www/thread/data/js/thread.json -g=www/thread/ --media-prefix='/thread/' --grailbird-media\n\nExport the tweet thread 967915766195609600 as a js file test/test.json, and copy media files too:\n        tweets.php -v --dir=vijinho --thread=1108500373298442240 --filename=test/test.json --copy-media=test\n\nExport the tweet thread 967915766195609600 as markdown, and copy media files too:\n        tweets.php -d -v --dir=vijinho --thread=967915766195609600 --filename=thread/vijinho_967915766195609600_md/item.md --media-prefix=/vijinho_967915766195609600_md/ --copy-media=thread/vijinho_967915766195609600_md --format=md        \n\nResolve URLs from tweets.js/tweets.json file and create a complete grailbird-data export, creating a new tweets.json file after to\n        tweets.php -v -d  --date-from '2019-05-01' --urls-expand --urls-resolve --grailbird-media --media-prefix='/' --grailbird=grailbird --filename=\"tweet.json\"\n\nGenerate markdown output file of all tweets except RTs and mentions for threads which have at least 10 tweets\n        tweets.php -v -d --no-retweets --no-mentions --format=md --filename=output.md --threads-tweets=10\n```\n\n## Note\n\n- *I have only tested it on MacOS* but it should work under Linux.\n- This script is memory-hungry, I 
had to increase my limit to 512MB to handle 10 years and over 30,000 tweets.\n\n## Re-constructing the folder structure from a standard (old) twitter year/month file export\n\nSupposing `tweets.php` is in the folder 'cli' and you are running for a user 'euromoan'.\n\n### Make the following folders:\n\n```\neuromoan/www/euromoan - this is the top-level folder of the un-zipped file (containing the twitter index.html file)\neuromoan/profile_media\neuromoan/tweet_media\neuromoan/tweet_files\n```\n\n### Create the following files\n\nIn the euromoan folder, copying the data from the account `data/js/user_details.js` and from browsing the twitter page for the user:\n\n`account.js`:\n\n```\nwindow.YTD.account.part0 = [{\n        \"account\": {\n            \"email\": \"euromoan@example.com\",\n            \"createdVia\": \"web\",\n            \"username\": \"euromoan\",\n            \"accountId\": \"816715694133964800\",\n            \"createdAt\": \"2007-01-01T00:00:00.000Z\",\n            \"accountDisplayName\": \"Mario Drago\",\n            \"timeZone\": \"Basel, Switzerland\"\n        }\n    }]\n```\n\n`profile.js`:\n\n```\nwindow.YTD.profile.part0 = [{\n        \"profile\": {\n            \"description\": {\n                \"bio\": \"Evil banker. #TBTJ untouchable Communist Head of ECB. 
I do whatever it takes to keep EU masses enslaved, enriching my cronies of the BIS, FSB, G30 etc PARODY!.\",\n                \"website\": \"\",\n                \"location\": \"Basel, Switzerland\"\n            },\n            \"avatarMediaUrl\": \"https://pbs.twimg.com/profile_images/986255258073657350/g8fvWiDX.jpg\",\n            \"headerMediaUrl\": \"https://pbs.twimg.com/profile_banners/816715694133964800/1523976777\"\n        }\n    }]\n```\n\nSave the images from these URLs to files in `profile_media`.\n\n### Combine files to make the `tweet.js` file\n\n#### Making a default `tweet.js` file\n\nThis will create a `tweet.js` file similar to the one a full twitter backup download zip contains.\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan --grailbird-import=euromoan/www/euromoan/data/js/tweets --filename=tweet.js --debug`\n\nThis will also make `users.json` and `urls.json` files containing the user and URL information contained therein.\n\n#### Resolve URLs when creating\n\nAfter the previous step, you can make a `tweet.json` (note extension change - by default `tweet.js` cli creates .json files) file with the un-shortened/resolved URLs:\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan -a -itweet.js --filename=tweet.json -u --urls-check-source --debug`\n\nOr run the whole create step again with URL resolving:\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan --grailbird-import=euromoan/www/euromoan/data/js/tweets --filename=tweet.js -u --debug`\n\n#### Generate grailbird export data file using data from previous step\n\nThis will create the YYYY-MM.js files with the resolved URLs in a folder structure as with the original twitter download in `export/grailbird`.\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan -itweet.js --filename=tweet.json -u -o -g=euromoan/www/euromoan --debug`\n\n#### Missing local `tweet_media` files\n\nThis will list the local `tweet_media` files that are missing and where they 
would be downloaded:\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan -itweet.js --filename=missing.json -a -u -l --list-missing-media --debug`\n\nTo download:\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan -itweet.js --filename=missing.json -a -u -l --download-missing-media --debug`\n\nTo organize the `tweet_media` files into subfolders:\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan -itweet.js --filename=missing.json -a -u -l --organize-media --debug`\n\n#### Generate locally viewable offline tweets linked to downloaded files\n\nFiles will be exported to `euromoan/export/grailbird` in the correct folder structure to overwrite/replace the original download or use as data files for [@vijinho/tweets-gb](https://github.com/vijinho/tweets-gb)\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan -a -u -o -l -g=euromoan/www/euromoan --debug`\n\n#### Fully check all URLs\n\nThis will check/update the source and destination URLs (if they have been redirected/changed) unless they are twitter.com or www.youtube.com hosts.\n\n    `php cli/tweets.php --dir=euromoan --dir-output=euromoan -a -u --urls-check-force --debug`\n\n#### Exporting tweets and media files along with (grailbird) data for web browsing:\n\nAssuming your target data grailbird folder (containing files from [tweets-gb](https://github.com/vijinho/tweets-gb)) is in `euromoan/www/euromoan` and that `euromoan/www` is the webroot.\n\n#### Export tweets, with media files to web-viewable folder\n\nThis will process tweets in `euromoan`, exporting data and media files to `euromoan/www/euromoan`. The media file URLs will be prefixed with `/euromoan/`, so that after starting a webserver (with php) in the webroot `euromoan/www`, browsing http://127.0.0.1:9012 will reference the local files under the webroot path `/euromoan/path/to/file`\n\n```\n$ php cli/tweets.php --dir=euromoan -g=euromoan/www/euromoan/ --grailbird-media  --media-prefix='/euromoan/' 
--debug\n$ cd euromoan/www\n$ php -S 127.0.0.1:9012\n```\n\n## To Do\n\n- Reduce memory-usage!\n- Work with and process other files in the twitter backup fileset, e.g. for Twitter Moments\n- Option to export/copy a tweet and all associated files\n- Option to write filtered tweets to different file formats, e.g. CSV or HTML\n- Option to generate markdown .md files from tweets in subfolders, compatible with [grav](https://getgrav.org/)\n\n## Project History\n\nThis was written after browsing [@mwichary/twitter-export-image-fill](https://github.com/mwichary/twitter-export-image-fill) and reading about [this issue](https://github.com/mwichary/twitter-export-image-fill/issues/10):\n\n\u003e \"Twitter has two ways of getting an archive. One is the way you show. The second requires going to:\n\u003e **Settings and privacy \u003e Your Twitter data \u003e Download your Twitter data \u003e Download data**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvijinho%2Ftweets-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvijinho%2Ftweets-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvijinho%2Ftweets-cli/lists"}