{"id":21071217,"url":"https://github.com/andrewdarnall/the-observer","last_synced_at":"2026-02-17T22:01:50.842Z","repository":{"id":174810573,"uuid":"646463483","full_name":"AndrewDarnall/The-Observer","owner":"AndrewDarnall","description":"A Big Data processing pipeline with a topic modeling model (BERTopic) using Mastodon data","archived":false,"fork":false,"pushed_at":"2024-10-14T23:13:19.000Z","size":97519,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-26T13:38:47.429Z","etag":null,"topics":["apache-kafka","apache-spark","bertopic","dataengineering","mastodon","tapunict"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AndrewDarnall.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-28T13:37:14.000Z","updated_at":"2025-06-12T20:33:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"dbc53f6e-ec18-4d3a-933c-796e6b9bad4d","html_url":"https://github.com/AndrewDarnall/The-Observer","commit_stats":null,"previous_names":["andrewdarnall/tap_project","andrewdarnall/the-observer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AndrewDarnall/The-Observer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndrewDarnall%2FThe-Observer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndrewDarnall%2FThe-Observer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndrewDarnall%
2FThe-Observer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndrewDarnall%2FThe-Observer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AndrewDarnall","download_url":"https://codeload.github.com/AndrewDarnall/The-Observer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndrewDarnall%2FThe-Observer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29559961,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T21:50:49.831Z","status":"ssl_error","status_checked_at":"2026-02-17T21:46:15.313Z","response_time":100,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-kafka","apache-spark","bertopic","dataengineering","mastodon","tapunict"],"created_at":"2024-11-19T18:50:47.641Z","updated_at":"2026-02-17T22:01:50.821Z","avatar_url":"https://github.com/AndrewDarnall.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Project Logo](./img/The_Observer_candidate_2.png)\n# The Observer\n\nMany of us have experienced long, uninterrupted sessions of scrolling through social media feeds. But has anyone ever paused to ask: what exactly are all these users discussing?\n\nThis project seeks to answer that question. 
Its primary objective is to provide an unbiased, comprehensive analysis of public Mastodon servers by leveraging the capabilities of the `BERTopic` model. The project is designed with an extensible architecture, allowing for flexible and scalable insights into public discourse across these platforms.\n\n-------------------\n\n## Architecture\n\nMiro board [here](https://miro.com/app/board/uXjVMrHQaa4=/?share_link_id=492488903107)\n\n![Architecture Diagram](./img/The_Observer_Architecture.png)\n\n-------------------\n\n## Requirements\n\n| Component     | Version   |\n|---------------|-----------|\n| Docker        | `20.10.5` |\n| Docker-Compose| `1.25.0`  |\n| Mastodon API  | `2.0`     |\n\n--------------------\n\n## Setup\n\nObtain the project\n\n```bash\ngit clone https://github.com/AndrewDarnall/The-Observer.git\ncd The-Observer\n```\n\nRun the setup shell script (builds some container images, such as Apache Kafka)\n\n```bash\nbash ./Scripts/project_setup.sh\n```\n\nBuild the project (this might take a while)\n\n```bash\ndocker-compose build\n```\n\nRun the project\n\n```bash\ndocker-compose up\n```\n\n-------------------\n\n## Dashboard Configuration\n\nOpen your web browser of choice and enter:\n\n```bash\nlocalhost:5601/\n```\n1) Go to Saved Objects \u003e Import, select The-Observer/Data_Visualization/saved-objects/dashboard_export.ndjson, and click Import\n2) Reload the page\n3) Go to Dashboard and select the 'the_observer' dashboard\n\n--------------------------\n\n## End Result\n\n![Dashboard One](./img/Project_Dashboard_1.png)\n\n--------------------------\n\n![Dashboard Two](./img/Project_Dashboard_2.png)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewdarnall%2Fthe-observer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrewdarnall%2Fthe-observer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewdarnall%2Fthe-observer/lists"}