Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/shaikhsajid1111/twitter-scraper-selenium

Python's package to scrape Twitter's front-end easily

Topics: automation, contribution-welcome, csv, hacktoberfest, json, open-source, pypi, python, python3, selenium, social-media, tweets, twitter, twitter-api, twitter-bot, twitter-hashtag, twitter-profile, twitter-profiles, twitter-scraper, web-scraping


README

        

Twitter scraper selenium


Python's package to scrape Twitter's front-end easily with selenium.

[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-360/)
[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)

Table of Contents

  1. Getting Started
  2. Usage
  3. Privacy
  4. License


Prerequisites


  • Internet Connection

  • Python 3.6+

  • Chrome or Firefox browser installed on your machine



Installation


    Installing from the source


    Download the source code or clone it with:

    ```
    git clone https://github.com/shaikhsajid1111/twitter-scraper-selenium
    ```

    Open a terminal inside the downloaded folder and run:


    ```
    python3 setup.py install
    ```


    Installing with PyPI

    ```
    pip3 install twitter-scraper-selenium
    ```
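
    To verify the installation, you can check that the package imports and print the installed version (a minimal sketch; importlib.metadata needs Python 3.8+, and it assumes the installed distribution name matches the PyPI name above):

    ```python
    from importlib.metadata import version

    import twitter_scraper_selenium  # the import itself is the sanity check

    print(version("twitter-scraper-selenium"))
    ```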




    Usage


    Available Functions in This Package - Summary


    | Function Name | Function Description | Scraping Method | Scraping Speed |
    |---|---|---|---|
    | scrape_profile() | Scrapes a Twitter user's profile tweets | Browser Automation | Slow |
    | get_profile_details() | Scrapes a Twitter user's details | HTTP Request | Fast |
    | scrape_profile_with_api() | Scrapes tweets by Twitter profile username. It expects the username of the profile | Browser Automation & HTTP Request | Fast |


    Note: The HTTP Request method sends requests directly to Twitter's API to collect data, while Browser Automation visits the page and scrolls through it while collecting data.
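
    If speed matters, the table above suggests preferring the HTTP-based function where it fits the task. A minimal sketch contrasting the two approaches documented in this README (values are illustrative):

    ```python
    from twitter_scraper_selenium import get_profile_details, scrape_profile

    # Fast: a plain HTTP request for the profile details
    get_profile_details(twitter_username="TwitterAPI", filename="twitter_api_data")

    # Slower: browser automation that scrolls the profile page to collect tweets
    tweets_json = scrape_profile(twitter_username="TwitterAPI", browser="firefox",
                                 tweets_count=5, output_format="json")
    print(tweets_json)
    ```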







    To scrape Twitter profile details:


    ```python
    from twitter_scraper_selenium import get_profile_details

    twitter_username = "TwitterAPI"
    filename = "twitter_api_data"
    browser = "firefox"
    headless = True
    get_profile_details(twitter_username=twitter_username, filename=filename, browser=browser, headless=headless)

    ```
    Output:
    ```js
    {
    "id": 6253282,
    "id_str": "6253282",
    "name": "Twitter API",
    "screen_name": "TwitterAPI",
    "location": "San Francisco, CA",
    "profile_location": null,
    "description": "The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? It's on my website.",
    "url": "https:\/\/t.co\/8IkCzCDr19",
    "entities": {
    "url": {
    "urls": [{
    "url": "https:\/\/t.co\/8IkCzCDr19",
    "expanded_url": "https:\/\/developer.twitter.com",
    "display_url": "developer.twitter.com",
    "indices": [
    0,
    23
    ]
    }]
    },
    "description": {
    "urls": []
    }
    },
    "protected": false,
    "followers_count": 6133636,
    "friends_count": 12,
    "listed_count": 12936,
    "created_at": "Wed May 23 06:01:13 +0000 2007",
    "favourites_count": 31,
    "utc_offset": null,
    "time_zone": null,
    "geo_enabled": null,
    "verified": true,
    "statuses_count": 3656,
    "lang": null,
    "contributors_enabled": null,
    "is_translator": null,
    "is_translation_enabled": null,
    "profile_background_color": null,
    "profile_background_image_url": null,
    "profile_background_image_url_https": null,
    "profile_background_tile": null,
    "profile_image_url": null,
    "profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/942858479592554497\/BbazLO9L_normal.jpg",
    "profile_banner_url": null,
    "profile_link_color": null,
    "profile_sidebar_border_color": null,
    "profile_sidebar_fill_color": null,
    "profile_text_color": null,
    "profile_use_background_image": null,
    "has_extended_profile": null,
    "default_profile": false,
    "default_profile_image": false,
    "following": null,
    "follow_request_sent": null,
    "notifications": null,
    "translator_type": null
    }
    ```





    get_profile_details() arguments:



    | Argument | Argument Type | Description |
    |---|---|---|
    | twitter_username | String | Twitter username |
    | output_filename | String | Filename where the output will be stored. |
    | output_dir | String | Directory where the output file will be saved. |
    | proxy | String | Optional parameter, if the user wants to use a proxy for scraping. If the proxy requires authentication, the format is username:password@host:port. |
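
    A call that also saves the output to a specific directory and routes traffic through a proxy might look like this (a sketch based on the argument names in the table above; note that the earlier example used filename, and the path and proxy values are placeholders):

    ```python
    from twitter_scraper_selenium import get_profile_details

    get_profile_details(
        twitter_username="TwitterAPI",
        output_filename="twitter_api_data",
        output_dir="/home/user/Downloads",   # placeholder directory
        proxy="66.115.38.247:5678",          # IP:PORT, as in the proxy example below
    )
    ```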







    Keys of the output:
    Details of each key can be found here.







    To scrape a profile's tweets:


    In JSON format:

    ```python
    from twitter_scraper_selenium import scrape_profile

    microsoft = scrape_profile(twitter_username="microsoft",output_format="json",browser="firefox",tweets_count=10)
    print(microsoft)
    ```
    Output:
    ```javascript
    {
    "1430938749840629773": {
    "tweet_id": "1430938749840629773",
    "username": "Microsoft",
    "name": "Microsoft",
    "profile_picture": "https://twitter.com/Microsoft/photo",
    "replies": 29,
    "retweets": 58,
    "likes": 453,
    "is_retweet": false,
    "retweet_link": "",
    "posted_time": "2021-08-26T17:02:38+00:00",
    "content": "Easy to use and efficient for all \u2013 Windows 11 is committed to an accessible future.\n\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW ",
    "hashtags": [],
    "mentions": [],
    "images": [],
    "videos": [],
    "tweet_url": "https://twitter.com/Microsoft/status/1430938749840629773",
    "link": "https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC"
    },...
    }
    ```
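
    The returned JSON is keyed by tweet id. A minimal sketch for working with it (this assumes scrape_profile returns the JSON shown above as a string; if it already returns a dict, drop the json.loads step):

    ```python
    import json

    from twitter_scraper_selenium import scrape_profile

    raw = scrape_profile(twitter_username="microsoft", output_format="json",
                         browser="firefox", tweets_count=10)
    tweets = json.loads(raw)  # dict keyed by tweet_id

    for tweet_id, tweet in tweets.items():
        print(tweet_id, tweet["posted_time"], tweet["likes"], tweet["tweet_url"])
    ```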



    In CSV format:

    ```python
    from twitter_scraper_selenium import scrape_profile

    scrape_profile(twitter_username="microsoft",output_format="csv",browser="firefox",tweets_count=10,filename="microsoft",directory="/home/user/Downloads")

    ```

    Output:

    | tweet_id | username | name | profile_picture | replies | retweets | likes | is_retweet | retweet_link | posted_time | content | hashtags | mentions | images | videos | post_url | link |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | 1430938749840629773 | Microsoft | Microsoft | https://twitter.com/Microsoft/photo | 64 | 75 | 521 | False |  | 2021-08-26T17:02:38+00:00 | Easy to use and efficient for all – Windows 11 is committed to an accessible future.<br>Here's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW | [] | [] | [] | [] | https://twitter.com/Microsoft/status/1430938749840629773 | https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC |
    | ... |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
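
    The call above writes the CSV into the given directory. A quick way to inspect it (a sketch assuming the file is saved as <directory>/<filename>.csv, i.e. /home/user/Downloads/microsoft.csv for the call above, and that pandas is installed):

    ```python
    import pandas as pd

    df = pd.read_csv("/home/user/Downloads/microsoft.csv")
    print(df[["tweet_id", "posted_time", "likes", "content"]].head())
    ```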





    scrape_profile() arguments:



    | Argument | Argument Type | Description |
    |---|---|---|
    | twitter_username | String | Twitter username of the account |
    | browser | String | Which browser to use for scraping. Only Chrome and Firefox are supported. Default is Firefox. |
    | proxy | String | Optional parameter, if the user wants to use a proxy for scraping. If the proxy requires authentication, the format is username:password@host:port. |
    | tweets_count | Integer | Number of posts to scrape. Default is 10. |
    | output_format | String | The output format, either JSON or CSV. Default is JSON. |
    | filename | String | If output_format is set to CSV, the filename parameter should be passed. If not passed, the filename will be the same as the username. |
    | directory | String | If output_format is set to CSV, the directory parameter may be passed. If not passed, the CSV file will be saved in the current working directory. |
    | headless | Boolean | Whether to run the crawler headlessly. Default is True. |
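
    Putting several of these arguments together (a sketch with illustrative values):

    ```python
    from twitter_scraper_selenium import scrape_profile

    data = scrape_profile(
        twitter_username="microsoft",
        browser="chrome",        # "chrome" or "firefox"
        tweets_count=25,
        output_format="json",
        headless=True,
    )
    print(data)
    ```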







    Keys of the output



    | Key | Type | Description |
    |---|---|---|
    | tweet_id | String | Post identifier (an integer cast to a string) |
    | username | String | Username of the profile |
    | name | String | Name of the profile |
    | profile_picture | String | Profile picture link |
    | replies | Integer | Number of replies to the tweet |
    | retweets | Integer | Number of retweets of the tweet |
    | likes | Integer | Number of likes of the tweet |
    | is_retweet | Boolean | Is the tweet a retweet? |
    | retweet_link | String | If it is a retweet, the retweet link; otherwise an empty string |
    | posted_time | String | Time when the tweet was posted, in ISO 8601 format |
    | content | String | Content of the tweet as text |
    | hashtags | Array | Hashtags present in the tweet, if any |
    | mentions | Array | Mentions present in the tweet, if any |
    | images | Array | Image links, if any are present in the tweet |
    | videos | Array | Video links, if any are present in the tweet |
    | tweet_url | String | URL of the tweet |
    | link | String | Any external website link present inside the tweet |
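
    These keys make simple post-processing straightforward, for example keeping only tweets that carry media and summing their likes (a sketch; as above, it assumes the JSON output is returned as a string):

    ```python
    import json

    from twitter_scraper_selenium import scrape_profile

    raw = scrape_profile(twitter_username="microsoft", output_format="json",
                         browser="firefox", tweets_count=10)
    tweets = json.loads(raw)

    media_tweets = {tid: t for tid, t in tweets.items() if t["images"] or t["videos"]}
    total_likes = sum(t["likes"] for t in media_tweets.values())
    print(f"{len(media_tweets)} tweets with media, {total_likes} likes in total")
    ```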






    To scrape a profile's tweets with the API:

    ```python
    from twitter_scraper_selenium import scrape_profile_with_api

    scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)
    ```





    scrape_profile_with_api() Arguments:






    | Argument | Argument Type | Description |
    |---|---|---|
    | username | String | Twitter profile username |
    | tweets_count | Integer | Number of tweets to scrape. |
    | output_filename | String | Filename where the output will be stored. |
    | output_dir | String | Directory where the output file will be saved. |
    | proxy | String | Optional parameter, if the user wants to use a proxy for scraping. If the proxy requires authentication, the format is username:password@host:port. |
    | browser | String | Which browser to use for extracting the GraphQL key. Default is firefox. |
    | headless | Boolean | Whether to run the browser in headless mode. |




    Output:


    ```js
    {
    "1608939190548598784": {
    "tweet_url" : "https://twitter.com/elonmusk/status/1608939190548598784",
    "tweet_details":{
    ...
    },
    "user_details":{
    ...
    }
    }, ...
    }
    ```
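
    A call that also sets the output directory, browser, and headless mode (a sketch using the arguments documented above; the path is a placeholder):

    ```python
    from twitter_scraper_selenium import scrape_profile_with_api

    scrape_profile_with_api(
        "elonmusk",
        tweets_count=100,
        output_filename="musk",
        output_dir="/home/user/Downloads",   # placeholder directory
        browser="firefox",
        headless=True,
    )
    ```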






    Using the scraper with a proxy (HTTP proxy)


    Just pass the proxy argument to the function.

    ```python
    from twitter_scraper_selenium import scrape_profile

    scrape_profile("elonmusk", headless=False, proxy="66.115.38.247:5678", output_format="csv",filename="musk") #In IP:PORT format

    ```




    Proxy that requires authentication:

    ```python

    from twitter_scraper_selenium import scrape_profile

    microsoft_data = scrape_profile(twitter_username="microsoft", browser="chrome", tweets_count=10, output_format="json",
    proxy="sajid:[email protected]:5678")  # username:password@IP:PORT
    print(microsoft_data)

    ```
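
    If the proxy credentials should not live in the source file, the proxy string can be assembled at runtime, for example from environment variables (a sketch; the variable names are hypothetical and the format follows the username:password@host:port convention above):

    ```python
    import os

    from twitter_scraper_selenium import scrape_profile

    # PROXY_USER, PROXY_PASS, PROXY_HOST and PROXY_PORT are hypothetical env vars
    proxy = "{user}:{password}@{host}:{port}".format(
        user=os.environ["PROXY_USER"],
        password=os.environ["PROXY_PASS"],
        host=os.environ["PROXY_HOST"],
        port=os.environ["PROXY_PORT"],
    )

    data = scrape_profile(twitter_username="microsoft", tweets_count=10,
                          output_format="json", proxy=proxy)
    print(data)
    ```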







    Privacy


    This scraper only scrapes public data available to an unauthenticated user and does not have the capability to scrape anything private.








    LICENSE

    MIT