https://github.com/sukanyadutta52/topic_modeling
What are the most pressing concerns regarding ‘Climate Change’ among tweeters according to Topic Modeling?
- Host: GitHub
- URL: https://github.com/sukanyadutta52/topic_modeling
- Owner: sukanyadutta52
- Created: 2024-01-31T09:11:39.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-10-16T15:22:38.000Z (about 1 month ago)
- Last Synced: 2024-10-18T10:58:46.742Z (about 1 month ago)
- Topics: climate-change, gensim, matplotlib, nltk, numpy, pandas, pyldavis, regular-expression, spacy
- Language: Jupyter Notebook
- Homepage:
- Size: 1.89 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Topic Modeling
## What are the most pressing concerns regarding ‘Climate Change’ among tweeters according to Topic Modeling?

This project uses the ‘Climate Change Tweets’ dataset (Kaggle, 2022). The file contains top tweets containing the keyword ‘Climate Change’ for the period 1/01/2022 through 19/07/2022. It comprises 9050 tweets and 11 columns, including UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, and Image links.
* Objective : This project identifies the aspects of climate change that most concern tweeters in the dataset, and asks whether the public neglects some key threats while focusing on others, given that climate change affects organizations, economies, and the environment.
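* Procedure : Latent Dirichlet Allocation (LDA) is a generative probabilistic model for topic modeling. It assumes that each document is a mixture of topics and that each topic is a distribution over words. It acts as a form of dimensionality reduction, since each document is represented by its topic proportions rather than by its raw word counts. The hyperparameter alpha controls topic density (how many topics tend to appear in one document), and beta controls word density (how many words tend to carry weight within one topic). Preprocessing quality and the choice of the number of topics are key factors for meaningful results.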
* Conclusion : The topic modeling analysis shows that industrial and organizational aspects of climate change receive more attention than political or environmental concerns, with environmental issues being the least discussed. The derived keywords mostly carry no sentiment of concern. A future analysis with lexicon-based sentiment tools could assess the subjectivity and polarity of these topics.
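
The procedure above can be illustrated with a small, self-contained sketch. The repository lists gensim among its topics, but the code below does not use it and is not the notebook's implementation: it is a toy collapsed Gibbs sampler in plain Python, written only to show where alpha and beta enter the model. The function names (`tokenize`, `lda_gibbs`), the stopword list, the default hyperparameter values, and the example tweets are all assumptions made for illustration.

```python
import random
import re

# Tiny illustrative stopword list (an assumption; a real run would use a fuller one).
STOPWORDS = {"the", "is", "a", "of", "to", "and", "in", "on", "for", "are"}


def tokenize(text):
    """Lowercase, strip URLs and @mentions, keep alphabetic tokens, drop stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA on tokenized docs with collapsed Gibbs sampling.

    Returns (vocab, phi, theta): phi[k][v] is the probability of word v in
    topic k, theta[d][k] is the proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # counts per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # counts per (topic, word)
    topic_total = [0] * num_topics                               # total words per topic

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # alpha smooths the document-topic term, beta the topic-word term.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


if __name__ == "__main__":
    # Invented example tweets, not taken from the dataset.
    tweets = [
        "Carbon emissions from industry are rising https://example.com",
        "@someone Companies pledge to cut carbon emissions",
        "Floods and drought hit farmers and forests",
        "Drought and floods threaten forests",
    ]
    docs = [tokenize(t) for t in tweets]
    vocab, phi, theta = lda_gibbs(docs, num_topics=2)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:4]
        print(f"topic {k}:", [vocab[v] for v in top])
```

Smaller values of `alpha` push each document toward fewer topics, and smaller values of `beta` push each topic toward fewer dominant words. On a corpus of this size the topics are not meaningful; the sketch is only meant to make the role of the two hyperparameters concrete.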