https://github.com/souvic/mtweepy
Fastest scraping using multiple apps and user tokens for the Twitter API!
- Host: GitHub
- URL: https://github.com/souvic/mtweepy
- Owner: Souvic
- License: mit
- Created: 2021-07-04T07:37:59.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-07-15T06:55:23.000Z (over 4 years ago)
- Last Synced: 2025-02-20T17:19:05.536Z (10 months ago)
- Topics: scraping, scraping-python, scraping-web, tweepy, twitter, twitter-api
- Language: Python
- Homepage: https://github.com/Souvic/mtweepy
- Size: 81.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Makes Twitter scraping with multiple Twitter apps easy again!
[MIT License](https://opensource.org/licenses/MIT)
[PyPI](https://pypi.org/project/mtweepy/)
[Build Status](https://scrutinizer-ci.com/g/Souvic/mtweepy/build-status/main)
[Code Quality](https://scrutinizer-ci.com/g/Souvic/mtweepy/?branch=main)
### Support me
[Buy Me a Coffee](https://www.buymeacoffee.com/Souvic)
## Install from PyPI
```
pip3 install mtweepy
```
## Or install from the main branch
```
pip3 install git+https://github.com/Souvic/mtweepy.git
```
# Example usage
There are three functions in the repo: `get_followers`, `get_timelines`, and `get_users`.
All of them use every supplied auth token, so scraping runs as fast as the combined rate limits allow.
Apart from the self-explanatory inputs:
1. As `auths`, a list of Tweepy bearer tokens is expected if you want to use the OAuth 2 rate limits of the Twitter API.
2. As `auths`, a list of \[_oauth_consumer_key, oauth_consumer_secret, client_secret, oauth_token, oauth_token_secret_] entries is expected if you want to use the OAuth 1 rate limits of the Twitter API (see the sketch after this list).
3. The `use_userid` parameter defaults to _False_. If it is passed as _True_ to `get_followers`, the `screen_name_or_userid` parameter is treated as the user ID whose followers are to be scraped.
4. `output_folder` should be an empty folder in which the output of the `get_timelines` and `get_users` functions is saved.
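For illustration, the `auths` argument could be assembled as below (a minimal sketch; the token strings are placeholders, and only one of the two formats should be used at a time):
```
# OAuth 2: a list of bearer tokens (placeholder values)
auths = [
    "BEARER_TOKEN_1",
    "BEARER_TOKEN_2",
]

# OAuth 1: a list of credential lists, in the order given above (placeholder values)
auths = [
    ["OAUTH_CONSUMER_KEY", "OAUTH_CONSUMER_SECRET", "CLIENT_SECRET", "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET"],
]
```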
Example usage is shown below.
### Gets 5000*ceil(max_num/5000) followers' user IDs as a list for the screen name INCIndia
```
from mtweepy import get_followers, get_users, get_timelines
list_followers = get_followers(auths, "INCIndia", max_num=500)  # returns followers in chunks of 5000; if max_num < 5000, the last 5000 followers are returned
```
### Gets all the maximally extended user objects for list_followers (a list of user IDs)
The output is saved in the output_folder as multiple JSONL files (one file per access token).
Each line of the JSONL files contains the maximally extended user object for one user.
```
get_users(auths, list_followers, output_folder="./testfolder1")
```
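The JSONL output can be read back with the standard library. A minimal sketch, assuming the folder layout described above (each line is parsed defensively as either a single user object or a batch of user objects):
```
import glob
import json

# Read back every user object written by get_users.
users = []
for path in glob.glob("./testfolder1/*.jsonl"):
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if isinstance(record, list):
                users.extend(record)   # a batch of user objects on one line
            else:
                users.append(record)   # a single user object on one line

print(len(users), "user objects loaded")
```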
### Gets all the tweets in the timelines of list_followers (a list of user IDs)
The output is saved in the output_folder as multiple JSONL files (one file per access token).
Each line of the JSONL files contains the last 3200 tweets of one user.
```
get_timelines(auths, list_followers, output_folder="./testfolder2")
```
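Likewise, a minimal sketch for reading the timeline output back, assuming each line holds the list of tweets collected for one user as described above:
```
import glob
import json

# Read back the timelines written by get_timelines and count tweets per user.
tweet_counts = []
for path in glob.glob("./testfolder2/*.jsonl"):
    with open(path) as f:
        for line in f:
            timeline = json.loads(line)      # one user's collected tweets
            tweet_counts.append(len(timeline))

print(sum(tweet_counts), "tweets across", len(tweet_counts), "timelines")
```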
### To get the total number of lines written to files in the directory ./testfolder1
Run this on the command line at any point during data collection:
```
find ./testfolder1 -name '*.jsonl' | xargs wc -l
```
For the _get_users_ function: each line contains approximately 100 users.
For the _get_timelines_ function: each line contains one user timeline.
So you can calculate an approximate collection rate from these counts and estimate when data collection will finish, as sketched below.
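For instance, a rough progress estimate for `get_users` could be computed like this (a hedged sketch; the ~100-users-per-line figure is the approximation given above, and `list_followers` is the list built earlier):
```
import glob

# Rough progress estimate for get_users, using the approximation
# that each output line covers about 100 users.
total_users = len(list_followers)        # users requested earlier
lines_written = 0
for path in glob.glob("./testfolder1/*.jsonl"):
    with open(path) as f:
        lines_written += sum(1 for _ in f)

users_done = min(lines_written * 100, total_users)   # ~100 users per line
print(f"~{users_done} of {total_users} users collected so far")
```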