
https://github.com/jillmpla/sentimentanalysis

Comment sentiment analysis of the top 25 posts (from the last 24 hrs) on a subreddit (reddit.com) using a web scraper.
python reddit sentiment-analysis sqlite web-scraper

Comment Sentiment Analysis of the Top 25 Posts on a Subreddit (www.reddit.com) (from the last 24 hrs)

Languages: Python, SQL (SQLite)

Purpose of the program:
To define, evaluate, and visualize overall public sentiment towards various news articles.

Three versions of the program are available in this repository:


  • RedditbotSpidernews scrapes and analyzes the top posts from the last 24 hrs on /r/news/.

  • RedditbotSpiderpolitics scrapes and analyzes the top posts from the last 24 hrs on /r/politics/.

  • RedditbotSpiderworldnews scrapes and analyzes the top posts from the last 24 hrs on /r/worldnews/.



Note: In theory, the program can be used on any subreddit by changing the address and, if needed, altering the XPaths within RedditbotSpider.py.


What the program does:


  • A web scraper connects to the subreddit and collects the titles of the top 25 posts, along with the comments within each post.

  • Data is inserted into a SQLite database.

  • Data is cleaned up: any rows lacking a comment are deleted.

  • Comments are combined for each corresponding title and placed into a new database table.

  • A unique ID (1-25) is added for each title and corresponding group of comments.

  • A word-based lexicon is applied to each set of comments to compute its sentiment.

  • Data visualization: an interactive (html) bar chart, CSV file, and completion window are generated.
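
The database steps above can be sketched as follows. This is a minimal illustration and not the repository's actual code: the table names (raw, combined), the column names, and the sample rows are all assumptions, and an in-memory database stands in for test.db.

```python
import sqlite3

# Hypothetical sample rows standing in for scraped (title, comment) pairs.
rows = [
    ("Title A", "good news, I love it"),
    ("Title A", "this is stupid"),
    ("Title B", None),   # post with no comment scraped
    ("Title B", ""),     # post with an empty comment
    ("Title C", "great win"),
]

conn = sqlite3.connect(":memory:")  # assumption: the real program writes to test.db
conn.execute("CREATE TABLE raw (title TEXT, comment TEXT)")
conn.executemany("INSERT INTO raw VALUES (?, ?)", rows)

# Clean up: delete any rows lacking a comment.
conn.execute("DELETE FROM raw WHERE comment IS NULL OR TRIM(comment) = ''")

# Combine the comments for each title, keeping titles in first-seen order.
grouped = conn.execute(
    "SELECT title, GROUP_CONCAT(comment, ' ') FROM raw "
    "GROUP BY title ORDER BY MIN(rowid)"
).fetchall()

# Place the combined rows into a new table with a unique ID starting at 1.
conn.execute("CREATE TABLE combined (id INTEGER PRIMARY KEY, title TEXT, comments TEXT)")
conn.executemany(
    "INSERT INTO combined (id, title, comments) VALUES (?, ?, ?)",
    [(i, title, comments) for i, (title, comments) in enumerate(grouped, start=1)],
)

combined = conn.execute("SELECT id, title, comments FROM combined ORDER BY id").fetchall()
for row in combined:
    print(row)
```

With 25 scraped posts that all have comments, the IDs would run from 1 to 25. A title whose rows were all deleted during cleanup does not appear in the combined table.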


Lexicon used to extract an overall sentiment level:


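| Positive (+1) | Negative (-1) |
|---------------|---------------|
| good          | fuck          |
| great         | corrupt       |
| happy         | stupid        |
| win           | irrelevant    |
| love          | colluding     |
| nice          | horrible      |
| authentic     | unfair        |
| like          | guilty        |
| fun           | foolish       |
| appreciate    | hateful       |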
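
As a rough illustration of how such a lexicon can produce an overall sentiment level, the sketch below adds 1 for each positive word and subtracts 1 for each negative word. The function name sentiment_score, the case-insensitive whole-word matching, and the choice to count every occurrence are assumptions; the repository may tokenize or count differently.

```python
import re

# Word lists taken from the lexicon above.
POSITIVE = {"good", "great", "happy", "win", "love",
            "nice", "authentic", "like", "fun", "appreciate"}
NEGATIVE = {"fuck", "corrupt", "stupid", "irrelevant", "colluding",
            "horrible", "unfair", "guilty", "foolish", "hateful"}


def sentiment_score(text):
    """Return (# positive words) - (# negative words) found in text.

    Assumption: matching is case-insensitive and on whole words only,
    so "likes" does not count as "like".
    """
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


print(sentiment_score("Great win, but the process was unfair and stupid"))
```

Under these assumptions, a positive total suggests mostly favorable comments, a negative total suggests mostly unfavorable ones, and zero means the two balance out or no lexicon words were found.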


How to run the program:


  • Download and install SQLite

  • Download and install Python 3.6.3

  • Make sure your System PATH includes the path to Python's interpreter

  • In Windows Command Prompt, run the following commands:


    • pip3 install pandas

    • pip3 install scrapy

    • pip3 install plotly

    • pip install pypiwin32


  • Download this repository & unzip it

  • Open the sentimentanalysis-master folder, go into RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews, right-click main.py, choose Edit with IDLE, then select Run -> Run Module.


Note: Before running the program a second time, move or delete the generated results files (test.db, temp-plot.html, and results.csv) from the RedditbotSpidernews, RedditbotSpiderpolitics, or RedditbotSpiderworldnews folder.
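
If you prefer not to clear these files by hand, a small helper along the following lines could do it. This is only a sketch: clear_previous_results is a hypothetical function that is not part of the repository, and it deletes the files rather than moving them.

```python
from pathlib import Path
from typing import List

# File names listed in the note above.
GENERATED_FILES = ("test.db", "temp-plot.html", "results.csv")


def clear_previous_results(folder: str) -> List[str]:
    """Delete leftover results files in folder and return the names removed."""
    removed = []
    for name in GENERATED_FILES:
        path = Path(folder) / name
        if path.is_file():
            path.unlink()  # permanently deletes the file
            removed.append(name)
    return removed
```

For example, calling clear_previous_results("RedditbotSpidernews") from the sentimentanalysis-master folder would remove whichever of the three files exist there and leave everything else untouched.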


Tools/Libraries/Packages used: