https://github.com/shamim-akhtar/tpcraw
- Host: GitHub
- URL: https://github.com/shamim-akhtar/tpcraw
- Owner: shamim-akhtar
- License: apache-2.0
- Created: 2025-03-10T08:18:58.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-22T08:47:23.000Z (about 1 month ago)
- Last Synced: 2025-03-22T09:27:32.202Z (about 1 month ago)
- Language: Python
- Size: 1.85 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# TPCraw - TemasekPoly Reddit Sentiment Analysis
TPCraw is a Python-based project that crawls posts and comments from the TemasekPoly subreddit, then analyzes them using Google’s Generative AI (Gemini). The project produces sentiment analyses, summaries, and visualizations (via a web-based dashboard) to gain insights into discussions about Temasek Polytechnic on Reddit.
[View the Dashboard](https://shamim-akhtar.github.io/tpcraw/).
## Features
- **Incremental Reddit Crawler**
  Uses PRAW to fetch new posts and comments from r/TemasekPoly while avoiding duplicates.
- **Data Storage**
  Stores posts and comments as plain-text files in a structured format.
- **AI-Based Analysis**
  - Sentiment Analysis uses Google Gemini to classify text as positive, negative, or neutral.
  - Topic Extraction categorizes discussions (e.g., facilities, academics, internships).
  - Summaries give concise overviews of lengthy discussions.
- **Aggregated Results**
  Combines raw and analyzed data, then saves them as JSON or CSV for further processing.
- **Interactive Dashboard**
  An index.html file that uses Chart.js to visualize sentiment counts (positive vs. negative). Clicking a bar reveals additional details about the post and its comments, including AI analysis.
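
As a rough illustration of the "avoiding duplicates" and "sentiment counts" ideas above, the following is a minimal sketch and not the project's actual code. It assumes each crawled item is a dict with an `id` field and each analyzed item carries a `sentiment` label. The file name `seen_ids.json` and all function names are hypothetical. No PRAW or Gemini calls are made, so it runs offline.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical location for IDs recorded by earlier crawls (an assumption).
SEEN_IDS_FILE = Path("seen_ids.json")


def load_seen_ids(path: Path) -> set[str]:
    """Return IDs recorded by earlier runs (empty set on the first run)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def filter_new(items: list[dict], seen: set[str]) -> list[dict]:
    """Keep only items whose 'id' is not in `seen`, and record them as seen."""
    fresh = []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            fresh.append(item)
    return fresh


def save_seen_ids(path: Path, seen: set[str]) -> None:
    """Persist the seen IDs so the next run can skip them."""
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")


def count_sentiments(analyzed: list[dict]) -> dict[str, int]:
    """Tally sentiment labels in the shape a bar chart would consume."""
    counts = Counter(item["sentiment"] for item in analyzed)
    labels = ("positive", "negative", "neutral")
    return {label: counts.get(label, 0) for label in labels}
```

A crawl loop under these assumptions would call `load_seen_ids`, pass each fetched batch through `filter_new`, analyze only the fresh items, and finish with `save_seen_ids`. The dict returned by `count_sentiments` could then be written to JSON for the dashboard to read.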