https://github.com/sferez/twitter_toolbox
Complete Toolbox for Scraping, Streaming, Interact with API, Cleaning, Preprocessing, Applying NLP on Twitter Data
data-collection data-science nlp preprocessing twitter twitter-api twitter-scraping twitter-streaming-api
- Host: GitHub
- URL: https://github.com/sferez/twitter_toolbox
- Owner: sferez
- Created: 2023-06-08T15:27:27.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-24T07:44:43.000Z (almost 2 years ago)
- Last Synced: 2025-03-24T03:03:12.914Z (3 months ago)
- Topics: data-collection, data-science, nlp, preprocessing, twitter, twitter-api, twitter-scraping, twitter-streaming-api
- Language: Python
- Homepage:
- Size: 29.3 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Twitter Toolbox
Welcome to the Twitter Toolbox, a comprehensive suite designed to simplify data acquisition, preprocessing, and analysis
from Twitter. This project is an up-to-date solution built in response to the recent changes in Twitter's API and front
end. Given that several existing libraries are no longer maintained or updated, this Twitter Toolbox ensures a seamless
data extraction process for data analysts, researchers, marketers, and developers alike.

## Table of Contents
- [Twitter Toolbox](#twitter-toolbox)
- [Table of Contents](#table-of-contents)
- [Features](#features)
- [Articles](#articles)
- [Data Acquisition](#data-acquisition)
- [Preprocessing](#preprocessing)
- [NLP](#nlp)
- [Future Developments](#future-developments)
- [Contributions and Feedback](#contributions-and-feedback)
- [Disclaimer](#disclaimer)
- [Structure](#structure)

## Features
The Twitter Toolbox offers a broad spectrum of functionalities, including:
- **Data Acquisition**: Our toolbox equips you with everything you need to extract a variety of data from Twitter, from
  streaming and scraping real-time data to making API calls and hydrating or dehydrating tweets.
- **Preprocessing**: Our tools offer data cleaning, language filtering, data labeling, and group generation features to
  refine your dataset for accurate and reliable analyses.
- **Natural Language Processing** (NLP): The toolbox is equipped with sentiment analysis, emotion analysis, topic
  analysis, and named entity recognition to provide you with meaningful insights from the content of tweets.

Each of these capabilities is designed to help you make the most out of Twitter data, whether you're exploring public
sentiment, detecting emotional trends, identifying key themes, or recognizing named entities such as organizations or
individuals.

## Articles
I have written a series of articles to explain how to use the Twitter Toolbox. You can find them here:
- [Your Guide to Real-Time Tweet Streaming](https://medium.com/@simeon.ferez/ep1-twitter-toolbox-17436c8ba4e6)
- [Your Ultimate Guide to Data Scraping](https://medium.com/python-in-plain-english/ep2-twitter-toolbox-your-ultimate-guide-to-data-scraping-fa9f7aa18b23)

## Data Acquisition
Collect data from Twitter through scraping, streaming, and the Twitter API.
Learn more about data collection
[here](https://github.com/sferez/Noisy_Entropy_Estimation/tree/main/src/dataAcquisition).

## Preprocessing
In progress...
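
Until this section is documented, the data cleaning step listed under Features can be illustrated with a minimal sketch. It is not the toolbox's own implementation: the function name `clean_tweet` and the specific cleaning rules (dropping URLs and @mentions, keeping hashtag words without the `#`) are assumptions chosen for illustration.

```python
import re

# Hypothetical cleaning rules; the toolbox's actual preprocessing may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs and @mentions, drop '#' from hashtags, collapse whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("Hello @bob check https://t.co/abc #NLP  rocks"))
```

Keeping the hashtag word while removing the `#` preserves topical vocabulary for later NLP steps; whether that is desirable depends on the analysis.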
## NLP
In progress...
## Future Developments
The Twitter Toolbox is an evolving project. We plan to continue adding new features as they are developed. Stay tuned
for regular updates and improvements!

## Contributions and Feedback
This toolbox is designed to grow with the contributions and feedback from the community. You are welcome to suggest new
features, report any issues, or even submit pull requests. Let's collaborate to create the most valuable Twitter Toolbox
possible!

## Disclaimer
Please note that the use of the Twitter API and all data retrieved through this toolbox should comply with the Twitter
Terms of Service, Developer Agreement, and Developer Policy, including Twitter's privacy policy. This project includes a
dehydration script to comply with Twitter's terms of service, allowing for sharing only the tweet_id. Always de-identify
the information and respect user privacy when sharing or publishing data.

## Structure
The project is structured as follows:
```
├── data (Data is not stored in the repository)
├── src
│ ├── dataAcquisition
│ ├── preprocessing
│ ├── nlp
├── docs
└──
```

Data is stored in the following structure:
```
├── data
│ ├── (Scrape from user, hashtag or keyword)
│ │ ├──
│ │ │ ├── __.csv
│ │ │ ├── __.csv
│ │ │ └── ...
│ │ ├──
│ │ │ ├── __.csv
│ │ │ ├── __.csv
│ │ │ └── ...
│ │ └── ...
│ ├── (Stream 1% of tweets)
│ │ ├── .csv
│ │ ├── .csv
│ │ └── ...
│ ├── (Scrape from Github and rehydrate)
│ │ ├── .csv
│ │ ├── .csv
│ │ └── ...
│ └──
└──
```
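
The dehydration step mentioned in the Disclaimer can be sketched against this layout. The following is a rough illustration rather than the project's own script: it assumes each CSV has a header row with a `tweet_id` column, and the function name `dehydrate`, its parameters, and the one-ID-per-line output format are all hypothetical.

```python
import csv
from pathlib import Path


def dehydrate(data_dir: str, out_file: str, id_column: str = "tweet_id") -> int:
    """Collect unique tweet IDs from every CSV under data_dir.

    Writes one ID per line to out_file and returns the number of IDs written.
    The column name 'tweet_id' is an assumption about the CSV schema.
    """
    seen = set()
    ids = []
    # Walk nested folders so scraped, streamed, and rehydrated data are all covered.
    for csv_path in sorted(Path(data_dir).rglob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                tweet_id = (row.get(id_column) or "").strip()
                if tweet_id and tweet_id not in seen:
                    seen.add(tweet_id)
                    ids.append(tweet_id)
    text = "\n".join(ids) + ("\n" if ids else "")
    Path(out_file).write_text(text, encoding="utf-8")
    return len(ids)
```

A call such as `dehydrate("data", "tweet_ids.txt")` would leave a file containing only IDs, which can be shared and later rehydrated by anyone with API access. Writing the output with a non-CSV extension keeps it from being picked up on a later run over the same folder.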