https://github.com/jillmpla/sentimentanalysis

Comment sentiment analysis of the top 25 posts (from the last 24 hrs) on a subreddit (reddit.com) using a web scraper.

python reddit sentiment-analysis sqlite web-scraper


Host: GitHub
URL: https://github.com/jillmpla/sentimentanalysis
Owner: jillmpla
License: mit
Created: 2017-11-28T23:48:30.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-08-27T02:52:54.000Z (almost 5 years ago)
Last Synced: 2025-03-03T16:48:43.796Z (4 months ago)
Topics: python, reddit, sentiment-analysis, sqlite, web-scraper
Language: HTML
Homepage:
Size: 8.49 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

README

Comment Sentiment Analysis of the Top 25 Posts on a Subreddit (www.reddit.com) (from the last 24 hrs)

Languages: Python, SQL (SQLite)

Purpose of the program:
To define, evaluate, and visualize overall public sentiment towards various news articles.

Three versions of the program are available in this repository:

RedditbotSpidernews scrapes and analyzes the top posts from the last 24 hrs on /r/news/.

RedditbotSpiderpolitics scrapes and analyzes the top posts from the last 24 hrs on /r/politics/.

RedditbotSpiderworldnews scrapes and analyzes the top posts from the last 24 hrs on /r/worldnews/.

Note: In theory, the program can be used on any subreddit by changing the address and, if needed, altering the XPath expressions within RedditbotSpider.py.
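As a rough illustration of "changing the address", the sketch below builds a listing URL for a given subreddit. The function name top_posts_url and the URL pattern (/top/?t=day for the past 24 hours) are assumptions made for this example; the address actually hard-coded in RedditbotSpider.py may differ.

```python
# Hypothetical helper, not taken from the repository.
SUBREDDITS = ("news", "politics", "worldnews")


def top_posts_url(subreddit, period="day"):
    """Return an assumed 'top posts' listing URL for a subreddit name.

    Accepts forms such as 'news', 'r/news', or '/r/news/'.
    """
    name = subreddit.strip().strip("/")
    if name.lower().startswith("r/"):
        name = name[2:]
    return f"https://www.reddit.com/r/{name}/top/?t={period}"


if __name__ == "__main__":
    for sub in SUBREDDITS:
        print(top_posts_url(sub))
```

Pointing the spider at a different subreddit would then be a one-line change, although the XPath expressions may still need adjusting if the page layout differs.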

What the program does:

The web scraper connects to the subreddit and collects the top 25 post titles, along with the comments on each post.

Data is inserted into a SQLite database.

Data is cleaned up: any rows lacking a comment are deleted.

Comments are combined for each corresponding title and placed into a new database table.

A unique ID (1-25) is added for each title and corresponding group of comments.

A word-based sentiment lexicon is applied to each set of comments.

Data visualization: an interactive HTML bar chart, a CSV file, and a completion window are generated.
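The database steps above (insert, clean up, combine, assign IDs) can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions, not the repository's code: the table names raw_comments and grouped_comments, the column names, and the function build_grouped_table are all made up for illustration.

```python
import sqlite3


def build_grouped_table(conn, rows):
    """Load (title, comment) rows, drop empty comments, group comments by title.

    Table and column names are hypothetical.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE raw_comments (title TEXT, comment TEXT)")
    cur.executemany("INSERT INTO raw_comments VALUES (?, ?)", rows)

    # Clean-up step: delete any rows lacking a comment.
    cur.execute(
        "DELETE FROM raw_comments WHERE comment IS NULL OR TRIM(comment) = ''"
    )

    # New table: INTEGER PRIMARY KEY hands out IDs 1, 2, 3, ... in insert order.
    cur.execute(
        "CREATE TABLE grouped_comments ("
        "id INTEGER PRIMARY KEY, title TEXT, comments TEXT)"
    )
    # Combine all comments belonging to the same title into one text value,
    # keeping titles in the order they were first scraped.
    cur.execute(
        "INSERT INTO grouped_comments (title, comments) "
        "SELECT title, GROUP_CONCAT(comment, ' ') "
        "FROM raw_comments GROUP BY title ORDER BY MIN(rowid)"
    )
    conn.commit()
    return cur.execute(
        "SELECT id, title, comments FROM grouped_comments ORDER BY id"
    ).fetchall()
```

Calling build_grouped_table(sqlite3.connect(":memory:"), rows) with a list of (title, comment) tuples returns one (id, title, comments) row per title that still has at least one comment after clean-up.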

Lexicon used to extract an overall sentiment level:

| Positive (+1) | Negative (-1) |
|---|---|
| good | fuck |
| great | corrupt |
| happy | stupid |
| win | irrelevant |
| love | colluding |
| nice | horrible |
| authentic | unfair |
| like | guilty |
| fun | foolish |
| appreciate | hateful |
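To show how a lexicon like this can produce an overall sentiment level, here is a minimal scoring sketch. The word lists come from the lexicon above; the matching rule (case-insensitive, whole words only) and the function name sentiment_score are assumptions, and the repository may match words differently.

```python
import re

POSITIVE = {"good", "great", "happy", "win", "love",
            "nice", "authentic", "like", "fun", "appreciate"}
NEGATIVE = {"fuck", "corrupt", "stupid", "irrelevant", "colluding",
            "horrible", "unfair", "guilty", "foolish", "hateful"}


def sentiment_score(text):
    """Add +1 per positive lexicon word and -1 per negative lexicon word."""
    # Assumption: lowercase the text and compare whole words only,
    # so "likes" does not count as "like".
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


if __name__ == "__main__":
    print(sentiment_score("Great win, but the ruling was unfair and stupid."))
```

A score above zero suggests the combined comments lean positive, a score below zero suggests they lean negative, and every repeated word counts each time it appears.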

How to run the program:

Download and install SQLite

Download and install Python 3.6.3

Make sure your System PATH includes the path to Python's interpreter

In Windows Command Prompt, install the following packages:

pip3 install pandas

pip3 install scrapy

pip3 install plotly

pip install pypiwin32

Download this repository & unzip it

Open sentimentanalysis-master, then the RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews folder. Right-click main.py, choose Edit with IDLE, then select Run -> Run Module.

Note: Before running the program a second time, move or delete the generated result files (test.db, temp-plot.html, and results.csv) from the RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews folder.
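The clean-up can also be scripted. The sketch below deletes the three generated files if they exist. The function name remove_generated_files is hypothetical and is not part of the repository; the file names are the ones listed in the note above.

```python
from pathlib import Path

# File names taken from the note above.
GENERATED_FILES = ("test.db", "temp-plot.html", "results.csv")


def remove_generated_files(folder):
    """Delete leftover result files in folder; return the names removed."""
    removed = []
    for name in GENERATED_FILES:
        path = Path(folder) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Assumes the script is run from inside one of the RedditbotSpider* folders.
    print(remove_generated_files("."))
```

Files that are not in the list, such as main.py, are left untouched. To keep earlier results, move them elsewhere before running this.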

Tools/Libraries/Packages used: