https://github.com/alejo1630/whatsapp_chat_analysis
A Jupyter Notebook with the analysis for a Whatsapp Chat using several techniques of data wrangling, EDA and Sentiment Analysis
data-visualization data-wrangling emojis exploratory-data-analysis jupyter-notebook sentiment-analysis whatsapp-chat
- Host: GitHub
- URL: https://github.com/alejo1630/whatsapp_chat_analysis
- Owner: alejo1630
- Created: 2024-02-14T18:55:01.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-03-27T22:11:01.000Z (10 months ago)
- Last Synced: 2024-10-15T05:41:08.018Z (3 months ago)
- Topics: data-visualization, data-wrangling, emojis, exploratory-data-analysis, jupyter-notebook, sentiment-analysis, whatsapp-chat
- Language: Jupyter Notebook
- Homepage:
- Size: 13.4 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Whatsapp Chat Analysis
A Jupyter Notebook with the analysis for a Whatsapp Chat using several techniques of data wrangling, EDA and Sentiment Analysis
## How does it work?
The chat is exported from the WhatsApp app as a .txt file using the "Without Media" option and loaded into the Jupyter Notebook.
### Data Wrangling
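As a minimal sketch of the loading step, the export can be split into the Complete Date, User and Message fields with a regular expression. The line layout `14/02/2024, 18:55 - User A: Hola` is an assumption (the real layout depends on the phone's locale and the WhatsApp version), and `parse_chat` is a hypothetical helper, not the notebook's own code.

```python
import re
from datetime import datetime

# Assumed prefix: "14/02/2024, 18:55 - " followed by "User: message".
PREFIX_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - (.*)$")


def parse_chat(lines):
    """Return a list of dicts with the keys Complete Date, User and Message."""
    rows = []
    current = None  # last user message, the target for continuation lines
    for raw in lines:
        line = raw.rstrip("\n")
        match = PREFIX_RE.match(line)
        if match is None:
            # No timestamp prefix: continuation of a multi-line message.
            if current is not None:
                current["Message"] += "\n" + line
            continue
        date_part, time_part, rest = match.groups()
        user, sep, message = rest.partition(": ")
        if not sep:
            # System notice without an author; skipped in this sketch.
            current = None
            continue
        stamp = datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M")
        current = {"Complete Date": stamp, "User": user, "Message": message}
        rows.append(current)
    return rows


sample = [
    "14/02/2024, 18:50 - Messages and calls are end-to-end encrypted.",
    "14/02/2024, 18:55 - User A: Hola",
    "14/02/2024, 18:56 - User L: Todo bien?",
    "segunda linea",
]
rows = parse_chat(sample)
```

The resulting list of dicts can be passed to a DataFrame constructor if a tabular view is preferred.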
- Chat data is organized into three columns:
  - Complete Date
  - User
  - Message
- A **Regex Function** performs several operations and creates a *Message_Modified* column:
  - Lowercase the messages
  - Delete messages containing URLs
  - Remove numbers
  - Remove messages with spaces
  - Remove special characters
  - Remove repeated letters
- An **Emoji** column is created to identify all the emojis used in messages
- Complete Date information is used to create the following columns:
  - Date [YYYY-MM-DD]
  - Month (Name)
  - Day (Number 1-31)
  - Day (Week Name)
  - Hour
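The wrangling steps listed above could be sketched with the standard library as follows. This is an illustration under stated assumptions, not the notebook's implementation: the function names, the `Day` and `Weekday` keys, the emoji ranges, the choice to collapse only runs of three or more identical letters (so Spanish "ll" and "rr" survive), and the reading of "messages with spaces" as messages that are empty after cleaning are all hypothetical.

```python
import re
from datetime import datetime

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Approximate emoji ranges (assumption); not the complete Unicode emoji set.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_message(message):
    """Return the cleaned text, or None when the message should be dropped."""
    if URL_RE.search(message):
        return None                                   # delete messages with a URL
    text = message.lower()                            # lowercase
    text = re.sub(r"\d+", "", text)                   # remove numbers
    text = re.sub(r"[^\w\s]|_", "", text)             # remove special characters
    text = re.sub(r"([^\W\d_])\1{2,}", r"\1", text)   # "holaaaa" becomes "hola"
    text = re.sub(r"\s+", " ", text).strip()          # normalize whitespace
    return text or None                               # drop empty results


def extract_emojis(message):
    """Return every character of the message that falls in the emoji ranges."""
    return EMOJI_RE.findall(message)


def date_columns(stamp):
    """Derive the date-based columns from a datetime value."""
    return {
        "Date": stamp.strftime("%Y-%m-%d"),
        "Month": stamp.strftime("%B"),    # month name, follows the process locale
        "Day": stamp.day,                 # 1-31
        "Weekday": stamp.strftime("%A"),  # weekday name, follows the process locale
        "Hour": stamp.hour,
    }


def wrangle(row):
    """Add Message_Modified, Emoji and the date columns to one chat row."""
    out = dict(row)
    out["Message_Modified"] = clean_message(row["Message"])
    out["Emoji"] = extract_emojis(row["Message"])
    out.update(date_columns(row["Complete Date"]))
    return out


row = {
    "Complete Date": datetime(2024, 2, 14, 18, 55),
    "User": "User A",
    "Message": "Holaaaa!!! 123 🥺",
}
result = wrangle(row)
```

Emojis are extracted from the original message rather than from *Message_Modified*, because the special-character step would otherwise remove them.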
This is what the final dataset looks like:
### Exploratory Data Analysis (EDA)
Several questions were answered using the dataset information.
#### *Who has sent the highest number of messages?*
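One way to answer this question, sketched here with made-up sample rows rather than the real chat, is to count the User field with `collections.Counter`:

```python
from collections import Counter

# Hypothetical (user, message) pairs standing in for the dataset rows.
messages = [
    ("User A", "Hola"),
    ("User L", "Todo bien?"),
    ("User A", "Si"),
    ("User A", "Y tu?"),
]

counts = Counter(user for user, _ in messages)
top_user, top_count = counts.most_common(1)[0]
```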
#### *Message interactions over time*
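A possible way to build the series behind such a plot is to count messages per calendar day. The timestamps below are invented for illustration, and grouping by day is an assumption; the notebook may aggregate by month or hour instead.

```python
from collections import Counter
from datetime import datetime

# Hypothetical message timestamps.
stamps = [
    datetime(2024, 2, 14, 18, 55),
    datetime(2024, 2, 14, 23, 10),
    datetime(2024, 2, 15, 9, 0),
]

per_day = Counter(stamp.date().isoformat() for stamp in stamps)
timeline = sorted(per_day.items())  # chronological (date, count) pairs
```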
#### *Use of Emojis*
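Rankings of `(emoji, count)` pairs per user can be produced by tallying the Emoji column separately for each sender. The sketch below assumes each row already carries a list of extracted emojis; the sample rows are invented.

```python
from collections import Counter, defaultdict

# Hypothetical (user, emojis) rows, as the Emoji column might hold them.
rows = [
    ("User A", ["🥺", "🤪"]),
    ("User L", ["🥰"]),
    ("User A", ["🥺"]),
    ("User L", []),
]

per_user = defaultdict(Counter)
for user, emojis in rows:
    per_user[user].update(emojis)

top_a = per_user["User A"].most_common(5)
top_l = per_user["User L"].most_common(5)
```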
*Top Emojis User A*
`[('ð', 7403), ('🥺', 1058), ('ðŽ', 411), ('🤪', 277), ('âĪ', 266)]`
*Top Emojis User L*
`[('🥺', 1378), ('ð', 1111), ('🥰', 912), ('âš', 622), ('ð', 587)]`
#### *Count of Words*
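Word frequencies, which also feed a word cloud, can be sketched with a `Counter` over the cleaned messages. The `word_counts` helper and the tiny stop-word set are hypothetical; a real Spanish stop-word list would be much longer.

```python
from collections import Counter


def word_counts(messages, stop_words=frozenset()):
    """Count whitespace-separated words, skipping any listed stop words."""
    counts = Counter()
    for message in messages:
        counts.update(w for w in message.split() if w not in stop_words)
    return counts


cleaned = ["hola como estas", "hola bien", "como siempre"]
all_words = word_counts(cleaned)
without_stops = word_counts(cleaned, stop_words={"como"})
```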
#### *WordCloud*
*Both Users*
*Both Users with Spanish Stop Words*
*A User*
*L User*
### Sentiment Analysis
#### *Spanish Analysis*
Sentiment score {Negative [0 : 0.33] | Neutral [0.33 : 0.66] | Positive [0.66 : 1]}
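The mapping from a score to a label can be sketched as a small threshold function. The name `label_score` is hypothetical, and because the stated intervals share their endpoints, assigning a boundary value to the higher class is an assumption. The thresholds are parameters so that a scale with different cut-offs can reuse the same function.

```python
def label_score(score, low=0.33, high=0.66):
    """Map a sentiment score to Negative, Neutral or Positive."""
    if score < low:
        return "Negative"
    if score < high:
        return "Neutral"
    return "Positive"


labels = [label_score(s) for s in (0.10, 0.50, 0.90)]
```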
*Positive Words*
*Negative Words*
The *Spanish Sentiment Analysis* did not produce good results, so the next step is to run the sentiment analysis on the messages translated into English.
#### *Sentiment Analysis Using [TextBlob](https://www.analyticsvidhya.com/blog/2021/10/making-natural-language-processing-easy-with-textblob/)*
Sentiment score {Negative [-1 : -0.33] | Neutral [-0.33 : 0.33] | Positive [0.33 : 1]}
*Positive Words*
*Negative Words*
#### *Sentiment Analysis Using [NLTK](https://akladyous.medium.com/sentiment-analysis-using-vader-c56bcffe6f24)*
Sentiment score {Negative [-1 : -0.05] | Neutral [-0.05 : 0.05] | Positive [0.05 : 1]}
*Positive Words*
*Negative Words*