{"id":16362297,"url":"https://github.com/dpacassi/dynamic-event-detection-in-data-streams","last_synced_at":"2025-07-10T20:35:22.688Z","repository":{"id":90609612,"uuid":"249940960","full_name":"dpacassi/dynamic-event-detection-in-data-streams","owner":"dpacassi","description":null,"archived":false,"fork":false,"pushed_at":"2020-03-25T09:54:33.000Z","size":63652,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-30T06:12:33.174Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dpacassi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-25T09:49:54.000Z","updated_at":"2023-04-26T11:08:31.000Z","dependencies_parsed_at":"2023-06-01T15:00:48.177Z","dependency_job_id":null,"html_url":"https://github.com/dpacassi/dynamic-event-detection-in-data-streams","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dpacassi%2Fdynamic-event-detection-in-data-streams","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dpacassi%2Fdynamic-event-detection-in-data-streams/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dpacassi%2Fdynamic-event-detection-in-data-streams/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dpacassi%2Fdynamic-event-detection-in-data-streams/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dpacassi","download_url":"https://codeload.github.com/dpacassi/dynamic-event-detection-in-data-streams/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239763622,"owners_count":19692812,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T02:23:49.670Z","updated_at":"2025-02-20T02:21:01.185Z","avatar_url":"https://github.com/dpacassi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dynamic Event Detection in Data Streams\n\n## Abstract\nDetecting events in data streams can be difficult,\nespecially if the definition, content, or properties of an event change over time.\n\nThis bachelor thesis focuses on the development and evaluation of an online clustering solution\nin which events are defined either as changes in existing clusters or as the formation of new clusters.\nThe solution is a text mining application that receives news articles over a data stream and processes them.\nArticles are assigned to clusters based on their similarity to other articles.\nThe assumption is that very similar articles cover the same news story.\nIn addition, the clustering quality is measured with a custom scoring function.\n\nThe first part of this work consists of determining a suitable data set,\nwhich is the subject of the clustering and provides the ground truth for evaluating the results.\nThe implemented solution uses HDBSCAN as the clustering method\nand 
compares it with the state-of-the-art method *k*-means.\nIt turned out that HDBSCAN has advantages over *k*-means in terms of both performance and precision.\nFurthermore, various text preprocessing methods and vector space models are evaluated,\nwith text lemmatization and tf-idf providing the most promising results.\nWhen the solution was applied in a simulated online setting,\nthe final evaluation found that the noise rate in the overall clustering reduces the precision of the event detection.\n\nThe resulting precision of the clustering is 72% with a standard deviation of 12%.\nThe precision for detecting new events is 62% with a standard deviation of 43%.\nDetecting changes in existing events results in a precision of 69% with a standard deviation of 16%.\nA continuation of this work should focus on improving the overall clustering to increase the precision of the event detection.\n\n## Thesis\nSee [doc/thesis.pdf](doc/thesis.pdf).\n\n## Authors\n- [Daniel Milenkovic](http://danielmilenkovic.me/)\n- [David Pacassi Torrico](https://pacassi.ch/)\n\n## Supervisors\n- [Dr. Andreas Weiler](https://www.zhaw.ch/de/ueber-uns/person/wele/)\n- [Prof. Dr. Kurt Stockinger](https://www.zhaw.ch/de/ueber-uns/person/stog/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdpacassi%2Fdynamic-event-detection-in-data-streams","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdpacassi%2Fdynamic-event-detection-in-data-streams","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdpacassi%2Fdynamic-event-detection-in-data-streams/lists"}