{"id":17239808,"url":"https://github.com/mhucka/taupe","last_synced_at":"2025-12-14T12:32:51.891Z","repository":{"id":63263476,"uuid":"565484535","full_name":"mhucka/taupe","owner":"mhucka","description":"Taupe takes a downloaded Twitter archive ZIP file, extracts the URLs corresponding to tweets, retweets, replies, quote tweets, and liked tweets, and outputs the results in a comma-separated values (CSV) format that you can use with other software tools.","archived":true,"fork":false,"pushed_at":"2023-04-13T00:42:29.000Z","size":180,"stargazers_count":33,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-18T01:43:07.136Z","etag":null,"topics":["archives","comma-separated-values","csv","data-extraction","markdown","twitter","twitter-archive","twitter-archives","url"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mhucka.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null}},"created_at":"2022-11-13T15:03:19.000Z","updated_at":"2025-01-19T07:42:48.000Z","dependencies_parsed_at":"2023-01-25T18:30:11.641Z","dependency_job_id":null,"html_url":"https://github.com/mhucka/taupe","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":"mhucka/template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhucka%2Ftaupe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhucka%2Ftaupe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhucka%2Ftaupe/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhucka%2Ftaupe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mhucka","download_url":"https://codeload.github.com/mhucka/taupe/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240258225,"owners_count":19772969,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archives","comma-separated-values","csv","data-extraction","markdown","twitter","twitter-archive","twitter-archives","url"],"created_at":"2024-10-15T05:49:50.509Z","updated_at":"2025-12-14T12:32:51.853Z","avatar_url":"https://github.com/mhucka.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Taupe\u003cimg width=\"70em\" align=\"right\" src=\"https://raw.githubusercontent.com/mhucka/taupe/main/.graphics/taupe-icon.png\"\u003e\n\nA simple program to extract the URLs of your tweets, retweets, replies, quote tweets, and \"likes\" from a personal Twitter archive.\n\n[![License](https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square)](https://choosealicense.com/licenses/mit)\n[![Latest release](https://img.shields.io/github/v/release/mhucka/taupe.svg?style=flat-square\u0026color=purple\u0026label=Release)](https://github.com/mhucka/taupe/releases)\n\n\n## Table of contents\n\n* [Introduction](#introduction)\n* [Installation](#installation)\n* [Usage](#usage)\n* [Known issues and limitations](#known-issues-and-limitations)\n* [Relationships to other similar tools](#relationships-to-other-similar-tools)\n* [Getting 
help](#getting-help)\n* [Contributing](#contributing)\n* [License](#license)\n* [Acknowledgments](#authors-and-acknowledgments)\n\n\n## Introduction\n\nWhen you [download your personal Twitter archive](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive), you receive a [ZIP](https://en.wikipedia.org/wiki/ZIP_(file_format)) file. The contents are not necessarily in a format convenient for doing something with them. For example, you may want to send the URLs to the [Wayback Machine at the Internet Archive](https://archive.org/web/) or do something else with the URLs. For tasks like that, you need to extract URLs from your Twitter archive. That's the purpose of Taupe.\n\n_Taupe_ (a loose acronym of \u003cins\u003e\u003cb\u003eT\u003c/b\u003e\u003c/ins\u003ewitter \u003cins\u003e\u003cb\u003ea\u003c/b\u003e\u003c/ins\u003erchive \u003cins\u003e\u003cb\u003eU\u003c/b\u003e\u003c/ins\u003eRL \u003cins\u003e\u003cb\u003ep\u003c/b\u003e\u003c/ins\u003ears\u003cins\u003e\u003cb\u003ee\u003c/b\u003e\u003c/ins\u003er) takes a Twitter archive ZIP file, extracts the URLs corresponding to your tweets, retweets, replies, quote tweets, and liked tweets, and outputs the results in a [comma-separated values (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) format that you can easily use with other software tools. Once you have [installed it](#installation), using `taupe` is easy:\n```shell\n# Extract tweets, retweets, replies, and quote tweets:\ntaupe /path/to/your/twitter-archive.zip\n\n# Extract likes:\ntaupe --extract likes /path/to/your/twitter-archive.zip\n\n# Learn more:\ntaupe --help\n```\n\n## Installation\n\nThere are multiple ways of installing Taupe.  
Please choose the alternative that suits you.\n\n### _Alternative 1: installing Taupe using `pipx`_\n\n[Pipx](https://pypa.github.io/pipx/) lets you install Python programs in a way that isolates Python dependencies, and yet the resulting `taupe` command can be run from any shell and directory \u0026ndash; like any normal program on your computer. If you use `pipx` on your system, you can install Taupe with the following command:\n```sh\npipx install taupe\n```\n\nPipx can also let you run Taupe directly using `pipx run taupe`, although in that case, you must always prefix every Taupe command with `pipx run`.  Consult the [documentation for `pipx run`](https://github.com/pypa/pipx#walkthrough-running-an-application-in-a-temporary-virtual-environment) for more information.\n\n\n### _Alternative 2: installing Taupe using `pip`_\n\nYou should be able to install `taupe` with [`pip`](https://pip.pypa.io/en/stable/installing/) for Python\u0026nbsp;3.  To install `taupe` from the [Python package repository (PyPI)](https://pypi.org), run the following command:\n```sh\npython3 -m pip install taupe\n```\n\nAs an alternative to getting it from [PyPI](https://pypi.org), you can use `pip` to install `taupe` directly from GitHub:\n```sh\npython3 -m pip install git+https://github.com/mhucka/taupe.git\n```\n\n_If you already installed Taupe once before_, and want to update to the latest version, add `--upgrade` to the end of either command line above.\n\n\n### _Alternative 3: installing Taupe from sources_\n\nIf  you prefer to install Taupe directly from the source code, you can do that too. 
To get a copy of the files, you can clone the GitHub repository:\n```sh\ngit clone https://github.com/mhucka/taupe\n```\n\nAlternatively, you can download the software source files as a ZIP archive directly from your browser using this link: \u003chttps://github.com/mhucka/taupe/archive/refs/heads/main.zip\u003e\n\nNext, after getting a copy of the files,  run `setup.py` inside the code directory:\n```sh\ncd taupe\npython3 setup.py install\n```\n\n\n## Usage\n\nIf the installation process described above is successful, you should end up with a program named `taupe` in a location where software is normally installed on your computer.  Running `taupe` should be as simple as running any other command-line program. For example, the following command should print a helpful message to your terminal:\n```shell\ntaupe --help\n```\n\nIf not given the option `--help` or `--version`, this program expects to be given a [personal Twitter archive file](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive), either on the command line (as an argument) or on standard input (from a pipe or file redirection). Here's an example (and note this path is fake \u0026ndash; substitute a real path on your computer when you do this!):\n```shell\ntaupe /path/to/twitter-archive.zip\n```\n\nThe URLs produced by `taupe` will be, by default, as they appear in the archive. If you want to [normalize the URLs](https://developer.twitter.com/en/blog/community/2020/getting-to-the-canonical-url-for-a-tweet) into the canonical form `https://twitter.com/twitter/status/TWEETID`, use the option `--canonical-urls` (`-c` for short):\n```shell\ntaupe -c /path/to/twitter-archive.zip\n```\n\n\n### The structure of the output\n\nThe option `--extract` controls both the content and the format of the output. 
The following options are recognized:\n\n| Value            | Synonym        | Output |\n|------------------|----------------|--------|\n| `all-tweets`     | `tweets`       | CSV table with all tweets and details (default) |\n| `my-tweets`      |                | list of URLs of only your original tweets |\n| `retweets`       |                | list of URLs of tweets that are retweets |\n| `quoted-tweets`  | `quote-tweets` | list of URLs of other tweets you quoted |\n| `replied-tweets` | `reply-tweets` | list of URLs of other tweets you replied to |\n| `liked`          | `likes`        | list of URLs of tweets you \"liked\" |\n\n\n#### `all-tweets`\n\nWhen using `--extract all-tweets` (the default), `taupe` produces a table with four columns.  Each row of the table corresponds to a type of event in the Twitter timeline: a tweet, a retweet, a reply to another tweet, or a quote tweet. The values in the columns provide details about the event. The following is a summary of the structure:\n\n| Column\u0026nbsp;1 | Column 2 | Column 3 | Column 4 |\n|:-------------:|----------|----------|----------|\n| tweet timestamp in ISO format  | The\u0026nbsp;URL of the tweet | The type; one of `tweet`, `reply`, `retweet`, or `quote` | (For type `reply` or `quote`.) The URL of the original or source tweet |\n\nThe last column only has a value for replies and quote-tweets; in those cases, the URL in the column refers to the tweet being replied to or the tweet being quoted.  
The fourth column does not have a value for retweets even though it would be desirable, because the Twitter archive \u0026ndash; strangely \u0026ndash; does not provide the URLs of retweeted tweets.\n\nHere is an example of the output:\n```text\n2022-09-21T22:36:29+00:00,https://twitter.com/mhucka/status/1572716422857658368,quote,https://twitter.com/poppy_northcutt/status/1572714310077673472\n2022-10-10T22:04:20+00:00,https://twitter.com/mhucka/status/1579593701965582336,reply,https://twitter.com/arfon/status/1579572453726355456\n2022-10-14T04:17:01+00:00,https://twitter.com/mhucka/status/1580774654217625600,tweet\n2022-10-25T14:49:06+00:00,https://twitter.com/mhucka/status/1584919989307715586,retweet\n...\n```\n\n#### `my-tweets`\n\nWhen using `--extract my-tweets`, the output is just a single column (a list) of URLs, one per line, of just your original tweets. This list corresponds exactly to column 2 in the `--extract all-tweets` case above.\n\n\n#### `retweets`\n\nWhen using `--extract retweets`, the output is a single column (a list) of URLs, one per line, of tweets that are retweets of other tweets. This list corresponds to the values of column 2 above when the type is `retweet`. **Important**: the Twitter archive does not contain the original tweet's URL, only the URL of your retweet. Consequently, the output for `--extract retweets` is _your_ retweet's URL, not the URL of the source tweet.\n\n\n#### `quoted-tweets`\n\nWhen using `--extract quoted-tweets`, the output is a list of the URLs of other tweets that you have quoted. It corresponds to the subset of column 4 values above when the type is \"quote\". Note that these are the source tweet URLs, not the URLs of your tweets.\n\n\n#### `replied-tweets`\n\nWhen using `--extract replied-tweets`, the output is a list of the URLs of other tweets that you have replied to. It corresponds to the subset of column 4 values above when the type is \"reply\". 
Note that these are the source tweet URLs, not the URLs of your tweets.\n\n\n#### `likes`\n\nWhen using the option `--extract likes`, the output will only contain one column: the URLs of the \"liked\" tweets. `taupe` cannot provide more detail because the Twitter archive format does not contain date/time information for \"likes\". (This is also why \"likes\" are _not_ part of the output when `--extract all-tweets` is used \u0026ndash; there is no possible value for column 1.)\n\nHere is an example of the output when using `--extract likes` in combination with `--canonical-urls`:\n```\nhttps://twitter.com/twitter/status/1588146224376463365\nhttps://twitter.com/twitter/status/1588349144803905536\nhttps://twitter.com/twitter/status/1590475356976578560\n...\n```\n\n\n### Other options recognized by `taupe`\n\nRunning `taupe` with the option `--help` will make it print help text and exit without doing anything else.\n\nThe option `--output` controls where `taupe` writes the output. If the value given to `--output` is `-` (a single dash), the output is written to the terminal (stdout). Otherwise, the value must be a file.\n\nIf given the `--version` option, this program will print its version and other information, and exit without doing anything else.\n\nIf given the `--debug` argument, `taupe` will output a detailed trace of what it is doing. 
The debug trace will be sent to the given destination, which can be `-` to indicate console output, or a file path to send the debug output to a file.\n\n### _Summary of command-line options_\n\nThe following table summarizes all the command line options available.\n\n| Short\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;    | Long\u0026nbsp;form\u0026nbsp;opt\u0026nbsp;\u0026nbsp; | Meaning | Default |  |\n|---------------|------------------------|----------------------|---------|---|\n| `-c`          | `--canonical-urls`     | Normalize Twitter URLs | Leave as-is| |\n| `-h`          | `--help`               | Print help info and exit | | |\n| `-e`\u0026nbsp;_E_ | `--extract`\u0026nbsp;_E_   | Extract URL type _E_ | `all-tweets` | ⚑ |\n| `-o`\u0026nbsp;_O_ | `--output`\u0026nbsp;_O_    | Write output to file _O_ | Terminal | ✦ |\n| `-V`          | `--version`            | Print program version \u0026 exit | | |\n| `-@`\u0026nbsp;_OUT_ | `--debug`\u0026nbsp;_OUT_ | Write debug output to _OUT_ |  | ⚐ |\n\n ⚑ \u0026nbsp; Recognized values: `all-tweets`, `tweets`, `my-tweets`, `retweets`, `quoted-tweets`, `replied-tweets`, and `likes`. See [section above](#the-structure-of-the-output) for more information. \u003cbr\u003e\n✦ \u0026nbsp; To write to the console, you can also use the character `-` as the value of _O_; otherwise, _O_ must be the name of a file where the output should be written.\u003cbr\u003e\n⚐ \u0026nbsp; To write to the console, use the character `-` as the value of _OUT_; otherwise, _OUT_ must be the name of a file where the output should be written.\n\n\n## Known issues and limitations\n\nThis program assumes that the Twitter archive ZIP file is in the format which Twitter produced in mid-November 2022. 
Twitter probably used a different format in the past, and may change the format again in the future, so `taupe` may or may not work on Twitter archives obtained in different historical periods.\n\nThe Twitter archive format for \"likes\" contains only the tweet identifier and the text of the tweet; consequently, `taupe` cannot provide date/time information for this case.\n\nThis program does all its work in memory, which means that `taupe`'s ability to process a given archive depends on its size and how much RAM the computer has. It has only been tested with modest-sized archives. It is unknown how it will behave with exceptionally large archives.\n\n\n## Relationships to other similar tools\n\nTo the author's knowledge, Taupe is the only tool that will directly and easily extract the URLs of tweets and \"likes\" from a Twitter archive ZIP file. There do exist other software tools for working with Twitter archives; the following is a (possibly incomplete) list:\n* [twitter-archive-parser](https://github.com/timhutton/twitter-archive-parser) \u0026ndash; convert the contents of a Twitter archive into other formats and extract other information such as lists of followers.\n* [Save Your Threads](https://archive.social) \u0026ndash; lets you download signed PDFs of Twitter URLs.\n* [tweetback Twitter Archive](https://github.com/tweetback/tweetback) \u0026ndash; \"Take ownership of your Twitter data\".\n* [twitter-tools](https://github.com/selfawaresoup/twitter-tools) \u0026ndash; perform various operations such as get details about specific tweets using the Twitter API.\n* [Twitter-Archive](https://github.com/jarulsamy/Twitter-Archive) \u0026ndash; a Python CLI tool to download media from bookmarked tweets.\n* [get_twitter_bookmarks.py](https://gist.github.com/divyajyotiuk/9fb29c046e1dfcc8d5683684d7068efe#file-get_twitter_bookmarks_v3-py) \u0026ndash; extract the URLs from bookmarked tweets; requires first using your web browser's developer interface to grab Twitter's bookmarks JSON 
data.\n* [archive.alt-text.org](https://github.com/alt-text-org/www.alt-text.org) \u0026ndash; a tool for saving the alt text you've written on Twitter.\n* [twitter-archive-tweets](https://observablehq.com/@enjalot/twitter-archive-tweets) \u0026ndash; a notebook to use as a starting point for processing tweets from your Twitter archive.\n* [fork of TWINT](https://github.com/woluxwolu/twint) \u0026ndash; a fork of the now-defunct [Twitter Intelligence Tool](https://github.com/twintproject/twint). \n* [pleroma-bot](https://github.com/robertoszek/pleroma-bot) \u0026ndash; bot for mirroring your favorite Twitter accounts in the Fediverse as well as migrating your own to the Fediverse using a Twitter archive.\n* [twitter-archive-analysis](https://github.com/dangoldin/twitter-archive-analysis) \u0026ndash; a script to analyze your Twitter archive.\n* [twitter-archive-reader](https://github.com/alkihis/twitter-archive-reader) \u0026ndash; explore tweets, DMs, media and more in a Twitter archive.\n* [twitter-archive-parser](https://github.com/leandrojmp/twitter-archive-converter) \u0026ndash; extract tweets from a Twitter archive.\n\n\n## Getting help\n\nIf you find a problem or have a request or suggestion, please submit it in [the GitHub issue tracker](https://github.com/mhucka/taupe/issues) for this repository.\n\n\n## Contributing\n\nI would be happy to receive your help and participation if you are interested.  Everyone is asked to read and respect the [code of conduct](CONDUCT.md) when participating in this project.  Please feel free to [report issues](https://github.com/mhucka/taupe/issues) or do a [pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) to fix bugs or add new features.\n\n\n## License\n\nThis software is Copyright (C) 2022, by Michael Hucka.  This software is freely distributed under the MIT license.  
Please see the [LICENSE](LICENSE) file for more information.\n\n\n## Acknowledgments\n\nThis work is a personal project developed by the author, using computing equipment owned by the [California Institute of Technology Library](https://www.library.caltech.edu).\n\nThe [vector artwork](https://thenounproject.com/icon/bird-233023/) of a bird, used as the icon for this repository, was created by [Noe Araujo](https://thenounproject.com/noearaujo/) from the Noun Project.  It is licensed under the Creative Commons [CC-BY 3.0](https://creativecommons.org/licenses/by/3.0/) license. I manually changed the color to be a shade of taupe.\n\nTaupe uses multiple other open-source packages, without which it would have taken much longer to write the software. I want to acknowledge this debt. In alphabetical order, the packages are:\n* [Aenum](https://github.com/ethanfurman/aenum) \u0026ndash; Python package for advanced enumerations\n* [CommonPy](https://github.com/caltechlibrary/commonpy) \u0026ndash; a collection of commonly-useful Python functions\n* [Plac](https://github.com/ialbert/plac) \u0026ndash; a command line argument parser\n* [Rich](https://github.com/Textualize/rich) \u0026ndash; library for writing styled text to the terminal\n* [Sidetrack](https://github.com/caltechlibrary/sidetrack) \u0026ndash; simple debug logging/tracing package\n* [Twine](https://github.com/pypa/twine) \u0026ndash; utilities for publishing Python packages on [PyPI](https://pypi.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhucka%2Ftaupe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmhucka%2Ftaupe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhucka%2Ftaupe/lists"}