Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gaizkiaadeline/clustering-and-topic-extraction-of-twitter-x-responses-to-bsi-s-2023-ransomware-attack
A project analyzing user tweets about the 2023 BSI ransomware attack using clustering and topic extraction methods. Persona analysis is performed on both approaches, with a comparison of the results to extract key insights.
cluster-analysis lda mining nltk text topic-extraction topic-modeling
- Host: GitHub
- URL: https://github.com/gaizkiaadeline/clustering-and-topic-extraction-of-twitter-x-responses-to-bsi-s-2023-ransomware-attack
- Owner: gaizkiaadeline
- Created: 2024-10-16T15:33:44.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-19T06:58:57.000Z (3 months ago)
- Last Synced: 2024-11-06T10:14:02.211Z (about 2 months ago)
- Topics: cluster-analysis, lda, mining, nltk, text, topic-extraction, topic-modeling
- Language: Jupyter Notebook
- Homepage:
- Size: 752 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Clustering and Topic Extraction of Twitter (X) Responses to BSI's 2023 Ransomware Attack
This project explores user responses to the 2023 ransomware attack on BSI, which disrupted the bank's national operations. Twitter (X) was widely used for public discussion, and this project aims to model and analyze these discussions using both clustering and topic extraction methods.

**The project includes:**
- Clustering Analysis: Different clustering models are applied to group user tweets. Several candidate cluster counts (k) are tested, and the silhouette score is used to compare them and justify the chosen value.
- Persona Analysis: Persona analysis is conducted on the resulting clusters to gain insights into the characteristics and behaviors of users in each group.
- Topic Extraction: An alternative approach using topic extraction is implemented to uncover the main topics discussed by users. The method is compared to clustering, and persona analysis is repeated based on the extracted topics.
- Comparison of Clustering vs. Topic Extraction: A detailed comparison is conducted between the results of clustering and topic extraction, highlighting key similarities, differences, and interesting patterns in the data.

**Key Features:**
- Clustering: Multiple values of k are tested with silhouette score analysis.
- Persona Analysis: Insights into user personas based on both clustering and topic extraction.
- Topic Extraction: Models are applied to identify key topics discussed by users.
- Comparison: Detailed analysis comparing clustering and topic extraction results.

**Technologies Used:**
- Python: For data processing and model development.
- Scikit-learn: For clustering and silhouette score analysis.
- NLTK: For text preprocessing.
- Latent Dirichlet Allocation (LDA): For topic modeling.
- Matplotlib / Seaborn: For visualizing results.

![image](https://github.com/user-attachments/assets/1e3e42ec-4d7f-4291-a271-7a1a9c0019c0)