https://github.com/bartekpog/messenger-analysis
Messenger chat analyzer. Take a look at the in-depth study of your chat history.
algorithms data-science data-visualization exploration messenger python text-analysis
- Host: GitHub
- URL: https://github.com/bartekpog/messenger-analysis
- Owner: BartekPog
- Created: 2020-05-03T11:28:57.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-07-06T22:00:34.000Z (over 2 years ago)
- Last Synced: 2025-03-18T10:51:29.151Z (7 months ago)
- Topics: algorithms, data-science, data-visualization, exploration, messenger, python, text-analysis
- Language: Python
- Homepage:
- Size: 32.3 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: readme.md
# Messenger Report Generator

For many of us, Messenger is the main messaging app. It holds a lot of information about ourselves and our relationships. This repository contains a script that generates a set of charts from your message history.
##### Charts generated by the script:
- message count rank
- overall activity over time
- average activity over a day
- average activity over a week
- average message lengths in _significant_ chats
- word clouds of important phrases in chats
- activity over time per chat
- message length distributions in _significant_ chats
- language diversity rank _(experimental)_
## Table of contents
1. [Usage](#usage)
   - [Collecting data](#collecting-data)
   - [Setting up the script](#setting-up-the-script)
   - [Running script](#running-script)
1. [Examples](#examples)
   - [Activity in chats plot](#activity-in-chats-plot)
   - [Average messages length in significant chats](#average-messages-length-in-significant-chats)
   - [Chat keyword cloud](#chat-keyword-cloud)
   - [Language diversity rank](#language-diversity-rank)
1. [Contribute](#contribute)
## Usage
#### Collecting data
Facebook lets its users download their Messenger **message history**.
To request the data:
1. Go to Facebook settings and then proceed to [downloading your data](https://www.facebook.com/dyi/?referrer=yfi_settings).
1. Deselect all data and select only **Messages**.
1. Set the data format to **JSON**.
1. Set the media quality to **low** (all media in the chats are downloaded as well, but the script ignores them).
1. Confirm the data request.

Preparing the data file should not take more than 24 hours. You will be notified when your file is ready.
#### Setting up the script
After **cloning** this repository, place the downloaded zip in the `zips` subdirectory and set up a virtual environment for **Python 3.8**.
On Linux you can use [virtualenv](https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/26/python-virtual-env/).
On Windows you have to use a `conda` virtual environment. You can use either:
- [Miniconda](https://docs.conda.io/en/latest/miniconda.html) - install it, open a command prompt via `Anaconda Prompt (miniconda3)` and `cd` to the cloned repository directory
- [Anaconda](https://www.anaconda.com/products/individual#windows) - install it and run `Anaconda Navigator (anaconda3)`, then go to **Environments**, set up a new environment, start it via cmd and `cd` to the repository directory.

After setting up the environment and opening the repository directory, run:
```bash
pip install -r requirements.txt
python -m spacy download pl_core_news_md
python -m spacy download en_core_web_sm
```
In `params.json`, set your `"user"`, `"language"` and `"timezone"`:
```JSON
{
"user": "Bartek Pogod",
"language": "polish",
"timezone": "Europe/Warsaw",
[...]
}
```
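A missing or empty value in `params.json` is easy to overlook, so it can help to check the file before a long run. The snippet below is a minimal sketch, not part of the repository: the helper names `check_params` and `load_params` are hypothetical, and the only keys it knows about are the three named above.

```python
import json
from pathlib import Path

# The three keys this README asks you to set; other keys are left alone.
REQUIRED_KEYS = ("user", "language", "timezone")


def check_params(params):
    """Return params unchanged, or raise if a required value is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not params.get(key)]
    if missing:
        raise ValueError("params.json is missing values for: " + ", ".join(missing))
    return params


def load_params(path="params.json"):
    """Read the JSON file at `path` and validate it (hypothetical helper)."""
    return check_params(json.loads(Path(path).read_text(encoding="utf-8")))


# Usage, from the repository directory:
#     params = load_params()
#     print(params["timezone"])
```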
#### Running script
If everything is set up properly, the charts are generated by running:
```bash
python messages_analysis.py
```
After a couple of minutes, all the plots should appear in the `figures` folder (or another folder specified in `params.json`).
## Examples:
#### Activity in chats plot
This plot can show how your relationships changed over time, including when they started to form or to collapse. The lines are smoothed to improve readability.
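
This README does not say which smoothing method the script applies. As an illustration only, here is a minimal sketch of one common choice, a centered moving average over per-period message counts. The function name, the window size and the sample counts are all made up for the example.

```python
def moving_average(counts, window=3):
    """Smooth a sequence with a centered moving average.

    `window` is expected to be odd. Near the edges the window shrinks
    to the values that actually exist.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


weekly_messages = [0, 12, 3, 30, 9, 0, 6]  # made-up counts for one chat
print(moving_average(weekly_messages, window=3))
# [6.0, 5.0, 15.0, 14.0, 13.0, 5.0, 3.0]
```

A wider window gives a calmer line but hides short bursts of activity.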

#### Average messages length in significant chats
This chart can say a lot about the interactions. Usually, longer messages are more formal, possibly more personal. It says _"in significant chats"_, because some chats have too few messages to be considered important.
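
The idea of filtering out small chats before averaging can be sketched as follows. This is an illustration under assumptions: the README does not state the threshold that makes a chat _significant_, nor whether length is measured in characters or words, so `min_messages=100` and character counts are placeholders.

```python
from statistics import mean


def average_lengths(chats, min_messages=100):
    """Map chat name to mean message length in characters.

    Chats with fewer than `min_messages` messages are skipped.
    """
    return {
        name: mean(len(text) for text in messages)
        for name, messages in chats.items()
        if len(messages) >= min_messages
    }


chats = {
    "Alice": ["hi", "hello"],  # made-up data
    "Bob": ["x"],
}
print(average_lengths(chats, min_messages=2))
# {'Alice': 3.5}
```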

#### Chat keyword cloud
It is generated using the [TextRank algorithm](https://www.aclweb.org/anthology/W04-3252.pdf). The size of each word represents its importance in the chat. The example chart is in Polish because that is the author's first language.
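
To give a feel for how TextRank scores words, here is a minimal sketch of its keyword variant: words that occur close together are linked in an undirected graph, and a PageRank-style iteration scores each word. It is not the script's implementation. It assumes the input is already tokenized and leaves out the filtering and lemmatization a real pipeline would do; the function name and defaults are chosen for the example.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank score on a word co-occurrence graph."""
    # Link each word to the distinct words that follow it within the window
    # (window=2 links only directly adjacent words).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Each word repeatedly collects score from its neighbors, and every
    # neighbor splits its own score evenly among its links.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = "pizza tonight pizza tomorrow pizza always".split()
print(textrank_keywords(tokens)[0][0])  # pizza
```

In a word cloud, a higher score would translate into a larger font size.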

#### Language diversity rank
The language diversity score represents how diverse a speaker's vocabulary is in a chat.
To calculate the score, the messages sent by a chat participant are first cleaned: numbers, punctuation and entities are removed. All words are then lemmatized to obtain their base forms. Next, the messages sent by one person are divided into batches of 2000 words. For each batch, the number of distinct lemmas is divided by the batch size (2000). The final score is the mean of those quotients.
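
The batching step can be sketched in a few lines. This is a simplified illustration, not the script's code: it assumes the input is already cleaned and lemmatized, that the lemma count means distinct lemmas per batch, and that a trailing batch shorter than the batch size is dropped (the README does not say how that case is handled). The example uses a batch size of 4 so the arithmetic is easy to follow.

```python
def diversity_score(lemmas, batch_size=2000):
    """Mean ratio of distinct lemmas to batch size over all full batches."""
    # Only full batches are used; a shorter trailing batch is dropped (assumption).
    batches = [
        lemmas[start : start + batch_size]
        for start in range(0, len(lemmas) - batch_size + 1, batch_size)
    ]
    if not batches:
        return None  # not enough words to fill a single batch
    ratios = [len(set(batch)) / batch_size for batch in batches]
    return sum(ratios) / len(ratios)


lemmas = ["go", "go", "home", "now", "eat", "eat", "eat", "eat", "late"]
print(diversity_score(lemmas, batch_size=4))
# 0.5  (batches score 3/4 and 1/4; the trailing "late" is dropped)
```

Using fixed-size batches keeps the score comparable between people who wrote very different amounts of text, since the share of new words naturally falls as a text grows.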

## Contribute
The possibilities are almost endless. Take a look at the **issues** tab to post your own ideas or see how you can help! Let's make something great :D.