Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/coskundeniz/twitter-data-extractor
Twitter Data Extractor
https://github.com/coskundeniz/twitter-data-extractor
data-extraction excel google-sheets mongodb python sqlite tweepy tweets-extraction twitter twitter-api
Last synced: 26 days ago
JSON representation
Twitter Data Extractor
- Host: GitHub
- URL: https://github.com/coskundeniz/twitter-data-extractor
- Owner: coskundeniz
- License: apache-2.0
- Created: 2022-04-21T13:17:30.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-06T09:15:22.000Z (over 1 year ago)
- Last Synced: 2025-01-12T05:45:47.933Z (30 days ago)
- Topics: data-extraction, excel, google-sheets, mongodb, python, sqlite, tweepy, tweets-extraction, twitter, twitter-api
- Language: Python
- Homepage:
- Size: 17.6 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
Twitter Data Extractor
======================This command-line tool extracts user and tweet data from Twitter and reports the results to CSV, Excel, Google Sheets documents or MongoDB, SQLite databases.
[Related post on Medium](https://medium.com/@codenineeight/designing-a-twitter-data-extractor-tool-using-python-part-1-intro-50cd1c6fcb2e)
### Supported Features
* Extract single/multiple user data.
* Extract user’s friends/followers data.
* Extract tweets data for a user.
* Extract tweets data for a search keyword.
* Report results to CSV, Excel or Google Sheets documents.
* Report results to MongoDB or SQLite databases.**Fields to extract for user data**
* User ID
* Username
* Name
* Account creation date
* Bio
* URLs, Hashtags, Mentions
* Location
* Pinned Tweet ID
* Pinned Tweet
* Profile image URL
* Account protected flag
* Public metrics (followers/following/tweet/listed counts)
* External URL
* Verified flag**Fields to extract for tweet data**
* Tweet ID
* Tweet text
* Tweet creation date
* Source
* Language
* Public metrics (retweet/reply/like/quote count)
* URLs, Hashtags, Mentions
* Media (key, type, url, duration_ms(for video), width, height, public_metrics)
* Place (ID, full name, country, country code, place type, geo coordinates)
* Author data (for search tweets)You can see the user manual [here](https://github.com/coskundeniz/twitter-data-extractor/blob/main/docs/user_manual.pdf).
## How to setup
* Run the following commands to install required packages in the project directory.
* `python -m venv env`
* `source env/bin/activate`
* `python -m pip install -r requirements.txt`### Setting Environment Variables
For using the Twitter API service, set the `TWITTER_BEARER_TOKEN_CODE` environment variable with your bearer token value. Set the `TWITTER_CONSUMER_KEY_CODE` and `TWITTER_CONSUMER_SECRET_CODE` environment variables for your consumer key and consumer secret tokens to use the tool on behalf of another user account.
You can see the instructions to set environment variables [here for Linux](https://phoenixnap.com/kb/linux-set-environment-variable), [here for Windows](https://phoenixnap.com/kb/windows-set-environment-variable), and [here for Mac](https://phoenixnap.com/kb/set-environment-variable-mac).
### MongoDB Installation
If you will use MongoDB to save users/tweets data, install it from [here](https://docs.mongodb.com/manual/administration/install-community/).
You can check the running status after installation and start the database server with the following commands on Linux.
* `sudo service mongod status`
* `sudo service mongod start`## How to use
```sh
usage: python twitter_data_extractor.py [-h] [-c] [-cf CONFIGFILE] [--forme] [-u USER] [-ul USERS] [-fr] [-fl] [-ut] [-s SEARCH]
[-tc TWEET_COUNT] [-e EXCLUDES] [-ot OUTPUT_TYPE] [-of OUTPUT_FILE] [-sm SHARE_MAIL]optional arguments:
-h, --help show this help message and exit
-c, --useconfig Read configuration from config.json file
-cf CONFIGFILE, --configfile CONFIGFILE Read configuration from given file
--forme Determine API user(account owner or on behalf of a user)
-u USER, --user USER Extract user data for the given username
-ul USERS, --users USERS Extract user data for the given comma separated usernames
-fr, --friends Extract friends data for the given username
-fl, --followers Extract followers data for the given username
-ut, --user_tweets Extract tweets of user with the given username
-s SEARCH, --search SEARCH Extract latest tweets for the given search keyword
-tc TWEET_COUNT, --tweet_count TWEET_COUNT Limit the number of tweets gathered
-e EXCLUDES, --excludes EXCLUDES Fields to exclude from tweets queried as comma separated values (replies,retweets)
-ot OUTPUT_TYPE, --output_type OUTPUT_TYPE Output file type (csv, xlsx, gsheets, mongodb or sqlite)
-of OUTPUT_FILE, --output_file OUTPUT_FILE Output file name
-sm SHARE_MAIL, --share_mail SHARE_MAIL Mail address to share Google Sheets document
```* If config will be used for getting parameters, boolean parameters like --forme, --friends, --followers, --user_tweets still must be passed as command-line option.
* "user" and "users" field should be empty for "search" keyword to be used.The following is an example of config.json content.
```json
{
"user": "gvanrossum",
"users": "",
"search": "",
"excludes": "retweets",
"tweet_count": 20,
"output_type": "xlsx",
"output_file": "results.xlsx",
"share_mail": "[email protected]"
}
```### Basic Usage
The following commands are a few examples of getting user data, user’s friends, tweets or tweets of a given keyword.
* `python twitter_data_extractor.py -u gvanrossum`
* `python twitter_data_extractor.py --forme -ul "gvanrossum,nedbat"`
* `python twitter_data_extractor.py -u gvanrossum -fr`
* `python twitter_data_extractor.py --forme -u gvanrossum -ut`
* `python twitter_data_extractor.py -s python`Results are written to *results.xlsx* file by default.
Logs can be seen in the *tw_data_extractor.log* file in the project directory.### Example Commands
* Get user data for username *gvanrossum* and save results to *results.xlsx* file on behalf of another account.
* `python twitter_data_extractor.py -u gvanrossum`* Get user data for username *gvanrossum* and save results to *results.xlsx* file for your own account.
* `python twitter_data_extractor.py --forme -u gvanrossum`* Get user data for *gvanrossum* and write the results to *results.xlsx* by getting parameters from the default config file(*config.json*).
* `python twitter_data_extractor.py --forme -c`* Get user data for *gvanrossum* and write the results to *results.xlsx* by getting parameters from the given config file.
* `python twitter_data_extractor.py --forme -c -cf /home/coskun/custom_config.json`* Get user data for usernames *gvanrossum* and *nedbat*.
* `python twitter_data_extractor.py -ul "gvanrossum,nedbat"`* Get friends data for username *gvanrossum* and save results to *results.csv* file.
* `python twitter_data_extractor.py -u gvanrossum -fr -ot csv -of results.csv`* Get followers data for username *gvanrossum*.
* `python twitter_data_extractor.py -u gvanrossum -fl`* Get the last tweets data for username *gvanrossum*.
* `python twitter_data_extractor.py --forme -u gvanrossum -ut`* Get the last 50 tweets data for username *gvanrossum* and exclude both *replies* and *retweets*.
* `python twitter_data_extractor.py --forme -u gvanrossum -ut -tc 50 -e "replies,retweets"`* Get the last 50 tweets data for keyword *python*.
* `python twitter_data_extractor.py -s python -tc 50`* Get the last tweets data for keyword *python* and write results to Google Sheets document with name *last_tweets* and share with the given email.
* `python twitter_data_extractor.py -s python -ot gsheets -of last_tweets -sm [email protected]`### Example Runs & Outputs
* Search the last 20 tweet data for the keyword "python" and save it to the "results.xlsx" file.
* `python twitter_data_extractor.py -s python -tc 20`![Tweets Search](assets/python_tweet_search.gif)
* Get the last 5 tweets data excluding replies and retweets for the username "gvanrossum" and write results to Google Sheets document named as "gvanrossum_last_tweets" and share with the given email.
* `python twitter_data_extractor.py --forme -u gvanrossum -ut -tc 5 -e "replies-retweets" -ot gsheets -of gvanrossum_last_tweets -sm [email protected]`![User Tweets Run](assets/gvanrossum_tweets.gif)
![User Tweets Output](assets/gvanrossum_tweets.png)
---
## Support
If you need support, you can contact me by emailing to [email protected] with the “twitter_data_extractor” prefix in the subject. You can also see my Upwork profile [here](https://www.upwork.com/freelancers/~011e3fe44e575092f0).
If you benefit from this tool, please consider donating using the sponsor links.