{"id":18300294,"url":"https://github.com/docnow/twarc-csv","last_synced_at":"2025-04-05T13:36:01.949Z","repository":{"id":43262885,"uuid":"352366163","full_name":"DocNow/twarc-csv","owner":"DocNow","description":"A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.","archived":false,"fork":false,"pushed_at":"2023-07-06T11:51:10.000Z","size":776,"stargazers_count":32,"open_issues_count":6,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-21T05:32:46.488Z","etag":null,"topics":["csv","dataframe","pandas","pandas-dataframe","twarc","twitter","twitter-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DocNow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-28T15:30:15.000Z","updated_at":"2024-12-04T05:31:40.000Z","dependencies_parsed_at":"2024-11-05T15:12:09.142Z","dependency_job_id":"39f6f93f-12d4-4678-bc53-50fce8ae6905","html_url":"https://github.com/DocNow/twarc-csv","commit_stats":{"total_commits":115,"total_committers":3,"mean_commits":"38.333333333333336","dds":0.06956521739130439,"last_synced_commit":"cb4745396e70c18dc1b169cf5c69f383992c6f25"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2Ftwarc-csv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2Ftwarc-csv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2F
twarc-csv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2Ftwarc-csv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DocNow","download_url":"https://codeload.github.com/DocNow/twarc-csv/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247342703,"owners_count":20923642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","dataframe","pandas","pandas-dataframe","twarc","twitter","twitter-api"],"created_at":"2024-11-05T15:11:58.282Z","updated_at":"2025-04-05T13:36:01.467Z","avatar_url":"https://github.com/DocNow.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# twarc-csv\n\nThis module adds CSV Export for Tweets to `twarc`.\n\nMake sure twarc is installed and configured:\n\n```\npip3 install --upgrade twarc\ntwarc2 configure\n```\n\nInstall this plugin:\n\n```\npip3 install --upgrade twarc-csv\n```\n\nA new `csv` command will be available in twarc. 
If you have collected some\ntweets in a file `tweets.jsonl`, you can now convert them to CSV:\n\n```\ntwarc2 search --limit 500 \"blacklivesmatter\" tweets.jsonl # collect some tweets\ntwarc2 csv tweets.jsonl tweets.csv # convert to CSV\n```\n\n## Extra Command Line Options\n\nRun\n\n```\ntwarc2 csv --help\n```\n\nfor a list of options:\n\n```\nUsage: twarc2 csv [OPTIONS] [INFILE] [OUTFILE]\n\n  Convert tweets to CSV.\n\nOptions:\n  --input-data-type [tweets|users|counts|compliance|lists]\n                                  Input data type - you can turn \"tweets\",\n                                  \"users\", \"counts\" or \"compliance\" or \"lists\"\n                                  data into CSV.\n  --inline-referenced-tweets / --no-inline-referenced-tweets\n                                  Output referenced tweets inline as separate\n                                  rows. Default: no.\n  --merge-retweets / --no-merge-retweets\n                                  Merge original tweet metadata into retweets.\n                                  The Retweet Text, metrics and entities are\n                                  merged from the original tweet. Default:\n                                  Yes.\n  --process-entities / --no-process-entities\n                                  Preprocess entities like URLs, mentions and\n                                  hashtags, providing expanded urls and lists\n                                  only instead of full json objects. Default:\n                                  Yes.\n  --json-encode-all / --no-json-encode-all\n                                  JSON encode / escape all fields. Default: no\n  --json-encode-text / --no-json-encode-text\n                                  Apply JSON encode / escape to text fields.\n                                  Default: no\n  --json-encode-lists / --no-json-encode-lists\n                                  JSON encode / escape lists. 
Default: yes\n  --allow-duplicates              List every tweets as is, including\n                                  duplicates. Default: No, only unique tweets\n                                  per row. Retweets are not duplicates.\n  --extra-input-columns TEXT      Manually specify extra input columns. Comma\n                                  separated string. Only modify this if you\n                                  have processed the json yourself. Default\n                                  output is all available object columns, no\n                                  extra input columns.\n  --output-columns TEXT           Specify what columns to output in the CSV.\n                                  Default is all input columns.\n  --batch-size INTEGER            How many lines to process per chunk. Default\n                                  is 100. Reduce this if output is slow.\n  --hide-stats                    Hide stats about the dataset on completion.\n                                  Always hidden if you're using stdin / stdout\n                                  pipes.\n  --hide-progress                 Hide the Progress bar. Always hidden if\n                                  you're using stdin / stdout pipes.\n  --help                          Show this message and exit.\n```\n\n## Issues with Twitter Data in CSV\n\nCSV isn't the best choice for storing Twitter data. Always keep the original API responses, and perform feature extraction on the JSON objects.\n\nThis export script is intended as a convenience for importing samples of data into other tools. There are many ways to format a CSV of tweets, and this is just one of them.\n\n## Contributing\n\nSuggestions, opinions, and pull requests are welcome and encouraged. 
Even if you are just interested in using this plugin, post your use case in the Issues.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocnow%2Ftwarc-csv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdocnow%2Ftwarc-csv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocnow%2Ftwarc-csv/lists"}