Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/luizabash/reddit-datahoarder
NLP project focused on the subreddit r/datahoarder
- Host: GitHub
- URL: https://github.com/luizabash/reddit-datahoarder
- Owner: luizabash
- Created: 2024-09-10T09:10:31.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-12T16:41:44.000Z (4 months ago)
- Last Synced: 2024-11-11T19:03:51.812Z (2 months ago)
- Topics: eda, nlp
- Language: Python
- Homepage:
- Size: 678 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Reddit-Datahoarder
This repository contains a data analysis project focused on the subreddit r/datahoarder.
# Reddit Data Analysis - r/datahoarder

This project explores the community discussions and trends in the subreddit [r/datahoarder](https://www.reddit.com/r/datahoarder/). The analysis focuses on understanding what topics and themes are popular in the subreddit by examining word frequencies and visualizing key terms.
## Project Overview
- **Objective**: To analyze Reddit posts from `r/datahoarder` and identify the most common words, trends, and sentiment in the community's discussions.
- **Data Source**: Reddit posts were collected using the PRAW library (Python Reddit API Wrapper).
- **Tools and Libraries Used**: Python, Pandas, Matplotlib, NLTK, Gensim, and Streamlit.

## Project Structure
- **`data/`**: Contains the cleaned dataset used for analysis.
- **`notebooks/`**: Jupyter Notebook with Exploratory Data Analysis (EDA) and visualizations.
- **`images/`**: Visualizations generated during EDA.
- **`streamlit_app.py`**: A Streamlit app to create an interactive dashboard.

## Key Steps
1. **Data Collection**: Collected Reddit posts from `r/datahoarder` using the PRAW API.
2. **Data Preprocessing**: Cleaned the text data by removing URLs, punctuation, and stop words, then lemmatizing the remaining tokens.
3. **Exploratory Data Analysis (EDA)**: Conducted word frequency analysis and created visualizations like word clouds and bar plots.
4. **Sentiment Analysis** (Optional): Attempted sentiment analysis using VADER, but faced compatibility issues with the current setup.
5. **Results and Visualization**: Visualized the most common words in titles and post bodies, and identified key trends.
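
The preprocessing and word-frequency steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses only the Python standard library, swaps NLTK's stop-word list for a tiny hypothetical `STOP_WORDS` set, and skips lemmatization entirely. The helper names `clean_text` and `top_words` and the sample titles are made up for the example.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption);
# the project itself relies on NLTK for stop words and lemmatization.
STOP_WORDS = {
    "a", "an", "and", "the", "to", "of", "for", "in", "on",
    "is", "my", "i", "it", "with", "or",
}

# Matches http(s) links and bare www. links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase the text, strip URLs and punctuation, and drop stop words."""
    # URLs are removed first; stripping punctuation first would mangle them
    # into ordinary-looking tokens that the URL pattern no longer matches.
    text = URL_PATTERN.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return [token for token in text.split() if token not in STOP_WORDS]


def top_words(posts, n=10):
    """Return the n most frequent cleaned tokens across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(clean_text(post))
    return counts.most_common(n)


# Made-up sample titles, not data collected by the project.
sample_posts = [
    "Best NAS for a 40TB backup? https://example.com/nas",
    "My backup drive failed, is the data gone?",
    "Backup strategy for the NAS: drive or tape?",
]

print(top_words(sample_posts, n=3))
```

The `(word, count)` pairs returned by `top_words` are the kind of input a bar plot or word cloud would take. With NLTK installed, the stop-word set and a lemmatization pass could replace the simplified parts of this sketch.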