Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shaikhsajid1111/twitter-scraper-selenium
Python package to scrape Twitter's front-end easily
automation contribution-welcome csv hacktoberfest json open-source pypi python python3 selenium social-media tweets twitter twitter-api twitter-bot twitter-hashtag twitter-profile twitter-profiles twitter-scraper web-scraping
- Host: GitHub
- URL: https://github.com/shaikhsajid1111/twitter-scraper-selenium
- Owner: shaikhsajid1111
- License: mit
- Created: 2021-08-26T10:49:23.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-07T14:54:39.000Z (5 months ago)
- Last Synced: 2025-01-25T00:03:19.768Z (8 days ago)
- Topics: automation, contribution-welcome, csv, hacktoberfest, json, open-source, pypi, python, python3, selenium, social-media, tweets, twitter, twitter-api, twitter-bot, twitter-hashtag, twitter-profile, twitter-profiles, twitter-scraper, web-scraping
- Language: Python
- Homepage: https://pypi.org/project/twitter-scraper-selenium
- Size: 127 KB
- Stars: 334
- Watchers: 6
- Forks: 51
- Open Issues: 34
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Twitter scraper selenium
Python's package to scrape Twitter's front-end easily with selenium.
[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-360/)
[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)
Table of Contents
- Getting Started
- Usage
- Privacy
- License
Prerequisites
Installation
Installing from the source
Download the source code or clone it with:
```
git clone https://github.com/shaikhsajid1111/twitter-scraper-selenium
```
Open a terminal inside the downloaded folder and run:
```
python3 setup.py install
```
Installing with PyPI
```
pip3 install twitter-scraper-selenium
```
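To confirm the installation worked, a minimal check is to import the entry points documented below (this sketch assumes nothing beyond the package being importable):
```python
# Minimal sanity check that the package installed correctly.
from twitter_scraper_selenium import scrape_profile, get_profile_details

print("twitter-scraper-selenium imported successfully")
```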
Usage
Available Functions in this Package - Summary

| Function Name | Function Description | Scraping Method | Scraping Speed |
| --- | --- | --- | --- |
| scrape_profile() | Scrapes tweets from a Twitter user's profile | Browser Automation | Slow |
| get_profile_details() | Scrapes a Twitter user's profile details | HTTP Request | Fast |
| scrape_profile_with_api() | Scrapes tweets by a Twitter profile's username. It expects the username of the profile | Browser Automation & HTTP Request | Fast |

Note: the HTTP Request method sends requests directly to Twitter's API to fetch data, while Browser Automation visits the page and scrolls through it while collecting data.
To scrape Twitter profile details:
```python
from twitter_scraper_selenium import get_profile_details
twitter_username = "TwitterAPI"
filename = "twitter_api_data"
browser = "firefox"
headless = True
get_profile_details(twitter_username=twitter_username, filename=filename, browser=browser, headless=headless)
```
Output:
```js
{
"id": 6253282,
"id_str": "6253282",
"name": "Twitter API",
"screen_name": "TwitterAPI",
"location": "San Francisco, CA",
"profile_location": null,
"description": "The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? It's on my website.",
"url": "https:\/\/t.co\/8IkCzCDr19",
"entities": {
"url": {
"urls": [{
"url": "https:\/\/t.co\/8IkCzCDr19",
"expanded_url": "https:\/\/developer.twitter.com",
"display_url": "developer.twitter.com",
"indices": [
0,
23
]
}]
},
"description": {
"urls": []
}
},
"protected": false,
"followers_count": 6133636,
"friends_count": 12,
"listed_count": 12936,
"created_at": "Wed May 23 06:01:13 +0000 2007",
"favourites_count": 31,
"utc_offset": null,
"time_zone": null,
"geo_enabled": null,
"verified": true,
"statuses_count": 3656,
"lang": null,
"contributors_enabled": null,
"is_translator": null,
"is_translation_enabled": null,
"profile_background_color": null,
"profile_background_image_url": null,
"profile_background_image_url_https": null,
"profile_background_tile": null,
"profile_image_url": null,
"profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/942858479592554497\/BbazLO9L_normal.jpg",
"profile_banner_url": null,
"profile_link_color": null,
"profile_sidebar_border_color": null,
"profile_sidebar_fill_color": null,
"profile_text_color": null,
"profile_use_background_image": null,
"has_extended_profile": null,
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null,
"translator_type": null
}
```
get_profile_details() arguments:

| Argument | Argument Type | Description |
| --- | --- | --- |
| twitter_username | String | Twitter username |
| output_filename | String | Filename where the output is stored |
| output_dir | String | Directory where the output file is saved |
| proxy | String | Optional parameter, for scraping through a proxy. If the proxy requires authentication, the format is username:password@host:port |
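As a hedged sketch, these arguments can be combined in one call. Note that the example earlier passes `filename=`, while the table above lists `output_filename`/`output_dir`, so check the signature of your installed version; the proxy address below is a placeholder:
```python
from twitter_scraper_selenium import get_profile_details

# Placeholder values for illustration only.
get_profile_details(
    twitter_username="TwitterAPI",
    filename="twitter_api_data",   # output file name (the table calls this output_filename)
    proxy="66.115.38.247:5678",    # optional; username:password@host:port if the proxy needs auth
    browser="firefox",
    headless=True,
)
```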
To scrape profile's tweets:
In JSON format:
```python
from twitter_scraper_selenium import scrape_profile
microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
print(microsoft)
```
Output:
```javascript
{
"1430938749840629773": {
"tweet_id": "1430938749840629773",
"username": "Microsoft",
"name": "Microsoft",
"profile_picture": "https://twitter.com/Microsoft/photo",
"replies": 29,
"retweets": 58,
"likes": 453,
"is_retweet": false,
"retweet_link": "",
"posted_time": "2021-08-26T17:02:38+00:00",
"content": "Easy to use and efficient for all \u2013 Windows 11 is committed to an accessible future.\n\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW ",
"hashtags": [],
"mentions": [],
"images": [],
"videos": [],
"tweet_url": "https://twitter.com/Microsoft/status/1430938749840629773",
"link": "https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC"
},...
}
```
In CSV format:
```python
from twitter_scraper_selenium import scrape_profile
scrape_profile(twitter_username="microsoft",output_format="csv",browser="firefox",tweets_count=10,filename="microsoft",directory="/home/user/Downloads")
```
Output:

| tweet_id | username | name | profile_picture | replies | retweets | likes | is_retweet | retweet_link | posted_time | content | hashtags | mentions | images | videos | post_url | link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1430938749840629773 | Microsoft | Microsoft | https://twitter.com/Microsoft/photo | 64 | 75 | 521 | False |  | 2021-08-26T17:02:38+00:00 | Easy to use and efficient for all – Windows 11 is committed to an accessible future. Here's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW | [] | [] | [] | [] | https://twitter.com/Microsoft/status/1430938749840629773 | https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC |
| ... |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
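As a hedged sketch, the resulting CSV can be read back with the standard library's csv module (assuming the file is written as `<filename>.csv` inside the directory passed to the call above):
```python
import csv

# Assumes scrape_profile wrote "microsoft.csv" into /home/user/Downloads,
# matching the filename and directory used in the CSV example.
with open("/home/user/Downloads/microsoft.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["tweet_id"], row["posted_time"], row["likes"])
```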
scrape_profile() arguments:

| Argument | Argument Type | Description |
| --- | --- | --- |
| twitter_username | String | Twitter username of the account |
| browser | String | Which browser to use for scraping. Only Chrome and Firefox are supported. Default is Firefox |
| proxy | String | Optional parameter, for scraping through a proxy. If the proxy requires authentication, the format is username:password@host:port |
| tweets_count | Integer | Number of posts to scrape. Default is 10 |
| output_format | String | The output format, either JSON or CSV. Default is JSON |
| filename | String | Used when output_format is set to CSV; if not passed, the filename defaults to the username |
| directory | String | Used when output_format is set to CSV; if not passed, the CSV file is saved in the current working directory |
| headless | Boolean | Whether to run the crawler in headless mode. Default is True |
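For illustration, a hedged sketch that combines several of these arguments in one call (the path is a placeholder):
```python
from twitter_scraper_selenium import scrape_profile

# Placeholder values for illustration only.
scrape_profile(
    twitter_username="microsoft",
    browser="chrome",                  # "firefox" (default) or "chrome"
    tweets_count=25,                   # number of posts to collect
    output_format="csv",               # "json" (default) or "csv"
    filename="microsoft_tweets",       # CSV file name
    directory="/home/user/Downloads",  # where the CSV is written
    headless=True,                     # run the browser without a visible window
)
```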
Keys of the output

| Key | Type | Description |
| --- | --- | --- |
| tweet_id | String | Post identifier (an integer cast to a string) |
| username | String | Username of the profile |
| name | String | Name of the profile |
| profile_picture | String | Profile picture link |
| replies | Integer | Number of replies to the tweet |
| retweets | Integer | Number of retweets of the tweet |
| likes | Integer | Number of likes of the tweet |
| is_retweet | Boolean | Is the tweet a retweet? |
| retweet_link | String | The retweet link if it is a retweet, otherwise an empty string |
| posted_time | String | Time the tweet was posted, in ISO 8601 format |
| content | String | Content of the tweet as text |
| hashtags | Array | Hashtags present in the tweet, if any |
| mentions | Array | Mentions present in the tweet, if any |
| images | Array | Image links, if any are present in the tweet |
| videos | Array | Video links, if any are present in the tweet |
| tweet_url | String | URL of the tweet |
| link | String | Any external link present inside the tweet |
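As a hedged sketch, the JSON output can be loaded and iterated as a dictionary keyed by tweet_id (assuming scrape_profile returns a JSON string, as in the example above):
```python
import json

from twitter_scraper_selenium import scrape_profile

# Returns a JSON string when output_format="json" (per the example above).
raw = scrape_profile(twitter_username="microsoft", output_format="json",
                     browser="firefox", tweets_count=10)
tweets = json.loads(raw)

for tweet_id, tweet in tweets.items():
    # Each value carries the keys listed in the table above.
    print(tweet_id, tweet["posted_time"], tweet["likes"], tweet["content"][:60])
```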
To scrape a profile's tweets with the API:
```python
from twitter_scraper_selenium import scrape_profile_with_api
scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count= 100)
```
scrape_profile_with_api() arguments:

| Argument | Argument Type | Description |
| --- | --- | --- |
| username | String | Twitter profile username |
| tweets_count | Integer | Number of tweets to scrape |
| output_filename | String | Filename where the output is stored |
| output_dir | String | Directory where the output file is saved |
| proxy | String | Optional parameter, for scraping through a proxy. If the proxy requires authentication, the format is username:password@host:port |
| browser | String | Which browser to use for extracting the GraphQL key. Default is firefox |
| headless | String | Whether to run the browser in headless mode |
Output:
```js
{
"1608939190548598784": {
"tweet_url" : "https://twitter.com/elonmusk/status/1608939190548598784",
"tweet_details":{
...
},
"user_details":{
...
}
}, ...
}
```
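A hedged sketch combining the arguments listed above (the output directory is a placeholder):
```python
from twitter_scraper_selenium import scrape_profile_with_api

# Placeholder values for illustration only.
scrape_profile_with_api(
    "elonmusk",
    tweets_count=50,
    output_filename="musk",  # output is written under this name
    output_dir="/tmp",       # directory for the output file
    browser="firefox",       # used to extract the GraphQL key
    headless=True,
)
```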
Using the scraper with a proxy (HTTP proxy)
Just pass the proxy argument to the function.
```python
from twitter_scraper_selenium import scrape_profile
scrape_profile("elonmusk", headless=False, proxy="66.115.38.247:5678", output_format="csv",filename="musk") #In IP:PORT format
```
Proxy that requires authentication:
```python
from twitter_scraper_selenium import scrape_profile
microsoft_data = scrape_profile(twitter_username="microsoft", browser="chrome", tweets_count=10, output_format="json",
proxy="sajid:[email protected]:5678") # username:password@IP:PORT
print(microsoft_data)
```
Privacy
This scraper only scrapes public data that is available to unauthenticated users and does not have the capability to scrape anything private.
LICENSE
MIT