https://github.com/andrewdarnall/the-observer
A Big Data processing pipeline with a topic modeling model (BERTopic) using Mastodon data
- Host: GitHub
- URL: https://github.com/andrewdarnall/the-observer
- Owner: AndrewDarnall
- License: gpl-3.0
- Created: 2023-05-28T13:37:14.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-14T23:13:19.000Z (over 1 year ago)
- Last Synced: 2025-10-26T13:38:47.429Z (7 months ago)
- Topics: apache-kafka, apache-spark, bertopic, dataengineering, mastodon, tapunict
- Language: Python
- Homepage:
- Size: 93 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# The Observer
Many of us have experienced long, uninterrupted sessions of scrolling through social media feeds. But has anyone ever paused to ask: what exactly are all these users discussing?
This project seeks to answer that question. Its primary objective is to provide an unbiased, comprehensive analysis of public Mastodon servers by leveraging the capabilities of the `BERTopic` model. The project is designed with an extensible architecture, allowing for flexible and scalable insights into public discourse across these platforms.

-------------------
## Architecture
The architecture is laid out on this [Miro board](https://miro.com/app/board/uXjVMrHQaa4=/?share_link_id=492488903107).

-------------------
## Requirements
| Component | Version |
|---------------|-----------|
| Docker | `20.10.5` |
| Docker-Compose| `1.25.0` |
| Mastodon API | `2.0` |
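
The Docker and Docker-Compose entries in the table can be checked locally before building. The following is a minimal sketch, not part of the project: it assumes the listed versions are minimums (the table does not say whether they are exact or minimum versions), and the names `REQUIRED`, `parse_version`, and `check_tool` are hypothetical helpers introduced for illustration. The Mastodon API version is not checked here.

```python
import re
import subprocess

# Versions from the Requirements table, treated here as minimums (an assumption).
REQUIRED = {
    "docker": (20, 10, 5),
    "docker-compose": (1, 25, 0),
}


def parse_version(text):
    """Return the first x.y.z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def check_tool(tool, minimum):
    """Run `<tool> --version` and report (meets_minimum, found_version)."""
    try:
        result = subprocess.run(
            [tool, "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        # The tool is not installed or cannot be executed.
        return False, None
    found = parse_version(result.stdout)
    return (found is not None and found >= minimum), found


if __name__ == "__main__":
    for tool, minimum in REQUIRED.items():
        ok, found = check_tool(tool, minimum)
        status = "ok" if ok else "missing or too old"
        print(f"{tool}: found {found}, need >= {minimum} -> {status}")
```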
--------------------
## Setup
Clone the repository
```bash
git clone https://github.com/AndrewDarnall/The-Observer.git
cd The-Observer
```
Run the setup script, which builds some of the container images (such as Apache Kafka)
```bash
bash ./Scripts/project_setup.sh
```
Build the project (this might take a while)
```bash
docker-compose build
```
Run the project
```bash
docker-compose up
```
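
After `docker-compose up`, the containers may need some time before the dashboard is reachable. The sketch below polls a TCP port until it accepts connections. It is an illustration only: `wait_for_port` is a hypothetical helper, the timeout values are arbitrary, and `localhost:5601` is the dashboard address given under Dashboard Configuration. An open port does not guarantee that the application behind it has finished starting.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=2.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True if the port accepted a connection, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself limited to `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (assumes the dashboard is published on localhost:5601):
#   wait_for_port("localhost", 5601)
```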
-------------------
## Dashboard Configuration
Open your web browser and go to:
```text
http://localhost:5601/
```
1) Go to Saved Objects > Import, select `The-Observer/Data_Visualization/saved-objects/dashboard_export.ndjson`, and click Import
2) Reload the page
3) Go to Dashboard and select the `the_observer` dashboard
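
Before importing, it can help to confirm that the export file is well formed. The `.ndjson` extension indicates newline-delimited JSON, one object per line. The sketch below counts the objects by their `type` field; that field name is an assumption about the export format, and `summarize_ndjson` and `summarize_file` are hypothetical helpers, not part of the project.

```python
import json
from collections import Counter


def summarize_ndjson(lines):
    """Count objects per "type" field in an iterable of NDJSON lines.

    Blank lines are skipped. Objects without a "type" key are counted
    under "(no type)". Raises ValueError on a line that is not valid JSON.
    """
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number} is not valid JSON: {err}") from err
        if isinstance(obj, dict):
            kind = obj.get("type", "(no type)")
        else:
            kind = "(not an object)"
        counts[kind] += 1
    return counts


def summarize_file(path):
    """Open an NDJSON file and return the per-type counts."""
    with open(path, encoding="utf-8") as handle:
        return summarize_ndjson(handle)


# Example, run from the directory that contains the clone:
#   summarize_file("The-Observer/Data_Visualization/saved-objects/dashboard_export.ndjson")
```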
--------------------------
## End Result

--------------------------
