https://github.com/jillmpla/sentimentanalysis
Comment sentiment analysis of the top 25 posts (from the last 24 hrs) on a subreddit (reddit.com) using a web scraper.
python reddit sentiment-analysis sqlite web-scraper
- Host: GitHub
- URL: https://github.com/jillmpla/sentimentanalysis
- Owner: jillmpla
- License: mit
- Created: 2017-11-28T23:48:30.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-08-27T02:52:54.000Z (over 4 years ago)
- Last Synced: 2024-05-02T04:05:22.183Z (8 months ago)
- Topics: python, reddit, sentiment-analysis, sqlite, web-scraper
- Language: HTML
- Homepage:
- Size: 8.49 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
Comment Sentiment Analysis of the Top 25 Posts on a Subreddit (www.reddit.com) (from the last 24 hrs)
Languages: Python, SQL(SQLite)
Purpose of the program:
To define, evaluate, and visualize overall public sentiment towards various news articles.
Three versions of the program are available in this repository:
- RedditbotSpidernews scrapes and analyzes the top posts from the last 24 hrs on /r/news/.
- RedditbotSpiderpolitics scrapes and analyzes the top posts from the last 24 hrs on /r/politics/.
- RedditbotSpiderworldnews scrapes and analyzes the top posts from the last 24 hrs on /r/worldnews/.
Note: In theory, the program can be used on any subreddit by changing the address and, if needed, adjusting the XPath expressions in RedditbotSpider.py.
What the program does:
- Web scraper connects to subreddit and collects the top 25 post titles, as well as comments within each post.
- Data is inserted into a SQLite database.
- Data is cleaned up: any rows lacking a comment are deleted.
- Comments are combined for each corresponding title and placed into a new database table.
- A unique ID (1-25) is added for each title and corresponding group of comments.
- A word-based sentiment lexicon is applied to each set of comments.
- Data visualization: an interactive (html) bar chart, CSV file, and completion window are generated.
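The database steps above (insert, clean-up, combining comments, and assigning IDs) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the repository's code: the table names `raw` and `combined`, the column names, and the sample rows are all assumptions, and an in-memory database stands in for the test.db file the program generates.

```python
import sqlite3

# Hypothetical sample rows standing in for scraped (title, comment) pairs.
rows = [
    ("Title A", "good news"),
    ("Title A", "this is great"),
    ("Title B", None),   # a post whose comment is missing
    ("Title B", ""),     # a post whose comment is empty
    ("Title C", "so unfair"),
]

# In-memory database for the sketch; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (title TEXT, comment TEXT)")
conn.executemany("INSERT INTO raw (title, comment) VALUES (?, ?)", rows)

# Clean-up: delete any rows lacking a comment.
conn.execute("DELETE FROM raw WHERE comment IS NULL OR TRIM(comment) = ''")

# Combine the comments for each title into a new table. The INTEGER PRIMARY KEY
# column assigns a sequential ID to each title and its group of comments.
conn.execute(
    "CREATE TABLE combined (id INTEGER PRIMARY KEY, title TEXT, comments TEXT)"
)
conn.execute(
    "INSERT INTO combined (title, comments) "
    "SELECT title, GROUP_CONCAT(comment, ' ') FROM raw "
    "GROUP BY title ORDER BY MIN(rowid)"
)
conn.commit()

combined = conn.execute(
    "SELECT id, title, comments FROM combined ORDER BY id"
).fetchall()
for row in combined:
    print(row)
```

In this sketch a title whose rows all lack comments ("Title B") drops out entirely after the clean-up step, so the IDs run over the remaining titles only.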
Lexicon used to extract an overall sentiment level:
| Positive +1 | Negative -1 |
| --- | --- |
| good | fuck |
| great | corrupt |
| happy | stupid |
| win | irrelevant |
| love | colluding |
| nice | horrible |
| authentic | unfair |
| like | guilty |
| fun | foolish |
| appreciate | hateful |
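As a rough illustration of how the lexicon above could produce an overall sentiment level, the sketch below adds +1 for each positive word and -1 for each negative word found in a block of comments. The function name `sentiment_score` and the whole-word, case-insensitive matching are assumptions; the repository may tokenize or match differently.

```python
import re

# Word lists taken from the lexicon above.
POSITIVE = {"good", "great", "happy", "win", "love",
            "nice", "authentic", "like", "fun", "appreciate"}
NEGATIVE = {"fuck", "corrupt", "stupid", "irrelevant", "colluding",
            "horrible", "unfair", "guilty", "foolish", "hateful"}


def sentiment_score(text: str) -> int:
    """Return the sum of +1 per positive word and -1 per negative word.

    Matching is on whole lowercase words, which is an assumption of this sketch.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


print(sentiment_score("Good news, I love it"))                # 2
print(sentiment_score("So stupid and unfair, but nice try"))  # -1
```

Because matching is on whole words, variants such as "loved" or "winning" are not counted here; a stemming step would be needed to catch them.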
How to run the program:
- Download and install SQLite
- Download and install Python 3.6.3
- Make sure your System PATH includes the path to Python's interpreter
- In Windows Command Prompt, run the following:
  - pip3 install pandas
  - pip3 install scrapy
  - pip3 install plotly
  - pip install pypiwin32
- Download this repository & unzip it
- Open the sentimentanalysis-master folder, then RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews. Right-click main.py, choose Edit with IDLE, then select Run -> Run Module.
Note: Before running the program a second time, move or delete the generated result files (test.db, temp-plot.html, and results.csv) from the RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews folder.
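The clean-up described in the note could be automated with a small helper like the one below. It is only a sketch: the function name `remove_generated` is hypothetical and not part of the repository, and it deletes the files rather than moving them, so copy any results you want to keep first.

```python
from pathlib import Path
from typing import List

# Generated files named in the note above.
GENERATED = ("test.db", "temp-plot.html", "results.csv")


def remove_generated(folder: str) -> List[str]:
    """Delete the generated result files in `folder` and return the names removed."""
    removed = []
    for name in GENERATED:
        path = Path(folder) / name
        if path.exists():
            path.unlink()  # permanently deletes the file
            removed.append(name)
    return removed
```

For example, calling `remove_generated("RedditbotSpidernews")` from inside the unzipped repository folder would clear the previous run's output for the /r/news/ version.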
Tools/Libraries/Packages used: