https://github.com/andrewdarnall/the-observer
A big data processing pipeline with a topic modeling model (BERTopic) using Mastodon data
apache-kafka apache-spark bertopic dataengineering mastodon tapunict
- Host: GitHub
- URL: https://github.com/andrewdarnall/the-observer
- Owner: AndrewDarnall
- License: gpl-3.0
- Created: 2023-05-28T13:37:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-14T23:13:19.000Z (3 months ago)
- Last Synced: 2025-01-20T21:57:54.135Z (about 8 hours ago)
- Topics: apache-kafka, apache-spark, bertopic, dataengineering, mastodon, tapunict
- Language: Python
- Homepage:
- Size: 93 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
![Project Logo](./img/The_Observer_candidate_2.png)
# The Observer

Many of us have experienced long, uninterrupted sessions of scrolling through social media feeds. But has anyone ever paused to ask: what exactly are all these users discussing?
This project seeks to answer that question. Its primary objective is to provide an unbiased, comprehensive analysis of public Mastodon servers by leveraging the capabilities of the `BERTopic` model. The project is designed with an extensible architecture, allowing for flexible and scalable insights into public discourse across these platforms.
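BERTopic works on plain-text documents, while the Mastodon API returns post bodies as HTML in the `content` field. The sketch below shows one way the first step of such an analysis could look: reading a server's public timeline and reducing each post to plain text. It is an illustration only, not the project's actual ingestion code. The helper names `strip_html` and `fetch_public_posts` and the host `mastodon.example` are hypothetical, and it assumes the server allows unauthenticated access to `/api/v1/timelines/public`, which some servers disable.

```python
import json
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the HTML tags that wrap a post."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Paragraphs and line breaks become whitespace so words do not fuse.
        if tag in ("p", "br"):
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(content: str) -> str:
    """Return the plain text of a post's HTML `content` field."""
    parser = _TextExtractor()
    parser.feed(content)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())


def fetch_public_posts(instance: str, limit: int = 20) -> list[str]:
    """Fetch recent public posts from one server as plain-text strings."""
    url = f"https://{instance}/api/v1/timelines/public?limit={limit}"
    with urllib.request.urlopen(url, timeout=10) as response:
        statuses = json.load(response)
    return [strip_html(status["content"]) for status in statuses]


# Example call (hypothetical host; replace with a real server):
# documents = fetch_public_posts("mastodon.example")
```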
-------------------
## Architecture
Miro board [here](https://miro.com/app/board/uXjVMrHQaa4=/?share_link_id=492488903107)
![Architecture Diagram](./img/The_Observer_Architecture.png)
-------------------
## Requirements
| Component | Version |
|---------------|-----------|
| Docker | `20.10.5` |
| Docker-Compose| `1.25.0` |
| Mastodon API  | `2.0`     |

--------------------
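Before building, it may be worth checking that the local tools match the table above. The following is a minimal sketch, assuming the listed versions are minimums (the README does not say whether newer or older releases also work) and that both tools print their version with `--version`. The function names are illustrative and not part of the project.

```python
import re
import shutil
import subprocess

# Versions copied from the requirements table.
# Treating them as minimums is an assumption of this sketch.
REQUIRED = {
    "docker": (20, 10, 5),
    "docker-compose": (1, 25, 0),
}


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first `major.minor.patch` triple found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_requirement(installed: tuple[int, int, int],
                      required: tuple[int, int, int]) -> bool:
    """Compare versions component by component."""
    return installed >= required


def check_tool(name: str, required: tuple[int, int, int]) -> str:
    """Describe whether `name` is installed at `required` or newer."""
    if shutil.which(name) is None:
        return f"{name}: not found on PATH"
    result = subprocess.run([name, "--version"],
                            capture_output=True, text=True, check=False)
    try:
        installed = parse_version(result.stdout)
    except ValueError:
        return f"{name}: could not read a version from its output"
    status = "ok" if meets_requirement(installed, required) else "older than listed"
    found = ".".join(map(str, installed))
    return f"{name}: found {found} ({status})"


def report() -> list[str]:
    """Check every tool in REQUIRED and return one line per tool."""
    return [check_tool(name, required) for name, required in REQUIRED.items()]
```

Calling `print("\n".join(report()))` prints one status line per tool.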
## Setup
Obtain the project
```bash
git clone https://github.com/AndrewDarnall/The-Observer.git
cd The-Observer
```

Run the setup shell script (builds some container images, such as Apache Kafka)
```bash
bash ./Scripts/project_setup.sh
```

Build the project (this might take a while)
```bash
docker-compose build
```

Run the project
```bash
docker-compose up
```

-------------------
## Dashboard Configuration
Open your web browser of choice and enter:
```text
localhost:5601/
```
1) Go to Saved Objects > Import, select `The-Observer/Data_Visualization/saved-objects/dashboard_export.ndjson`, and click Import
2) Reload the page
3) Go to Dashboard and select the `the_observer` dashboard

--------------------------
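The containers can take some time to start, so the dashboard may not answer on port 5601 right after `docker-compose up`. A small polling helper can confirm the page is reachable before starting the import steps. This is a sketch under the assumption that the dashboard is served over plain HTTP at `http://localhost:5601/`. The names `is_reachable` and `wait_for` are illustrative.

```python
import time
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return False


def wait_for(url: str, attempts: int = 30, delay: float = 5.0) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_reachable(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_for("http://localhost:5601/")` returns `True` once the page responds, or `False` after roughly two and a half minutes with the default settings.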
## End Result
![Dashboard One](./img/Project_Dashboard_1.png)
--------------------------
![Dashboard Two](./img/Project_Dashboard_2.png)