Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/100/reddit-karma-prediction
Reddit Comment Karma Prediction and Analysis via Supervised and Unsupervised techniques
flask graph-visualization machine-learning python reddit
- Host: GitHub
- URL: https://github.com/100/reddit-karma-prediction
- Owner: 100
- Created: 2016-01-01T14:50:02.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-01-09T15:56:30.000Z (almost 9 years ago)
- Last Synced: 2024-11-30T15:57:24.141Z (23 days ago)
- Topics: flask, graph-visualization, machine-learning, python, reddit
- Language: Python
- Homepage: https://analyzekarma.herokuapp.com/
- Size: 129 MB
- Stars: 3
- Watchers: 4
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Predicting and Analyzing Reddit Comment Karma
## SEE IT LIVE [HERE](https://analyzekarma.herokuapp.com/)!
### Uses various data-science and NLP methods to analyze comment karma throughout Reddit and provide this information with visualizations and a RESTful API.
## Dependencies:
* Flask (0.9)
* Flask-Limiter (0.9.1)
* WTForms (2.1)
* gunicorn (0.17.2)
* textblob (0.11.0)
* numpy (1.10.1)
* scikit-learn (0.17)
* Python 2.7.11
* [Optional] Pickle files present in pickles folder

## Implements:
* K-means unsupervised clustering
* Linear SVM supervised classification
* RESTful API
  * Documented
  * Rate-limited
* Extensive JSON parsing
* Various facets of Natural Language Processing
* Graph visualization
  * JS (cytoscape.js)
* Various facets of web-design
  * HTML
  * CSS (Bootstrap)

Inspired by the work done [here](http://cs229.stanford.edu/proj2014/Daria%20Lamberson,Leo%20Martel,%20Simon%20Zheng,Hacking%20the%20Hivemind.pdf) and [here](http://users.wpi.edu/~hsahay/assets/PredictingRedditPostPopularity.pdf), this application uses two classifiers and k-means clustering to provide insights into comment karma on Reddit. The first classifier uses n-grams to classify comments as either positive or negative. The second, which takes the first classifier's output as one of its features, assigns each comment to one of five bins of score ranges, using various metadata and calculated NLP-related features. Independently of these tasks, the application also parses the aggregate data to cluster comments by average karma per comment, and creates a clustered graph visualization of the top subreddits in the data set. Lastly, the application implements a fully documented and rate-limited RESTful API so that developers can use the prediction service in their own applications.
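
The two-stage classification described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the class name `TwoStageKarmaModel`, the bin edges, the toy comments, and the two metadata features (character and word counts) are all assumptions made for the example, and it targets a current scikit-learn release rather than the 0.17 version listed under Dependencies. It also assumes that "positive or negative" refers to the sign of the karma score, and that both stages are linear SVMs.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical bin edges; they split karma scores into five bins (0..4).
BIN_EDGES = [0, 5, 20, 100]


def score_to_bin(score):
    """Map a karma score to a bin index: <0, 0-4, 5-19, 20-99, >=100."""
    return int(np.digitize(score, BIN_EDGES))


def metadata_features(comments):
    """Toy stand-in for metadata/NLP features: character and word counts."""
    return np.array([[len(c), len(c.split())] for c in comments], dtype=float)


class TwoStageKarmaModel:
    """Stage 1: n-gram polarity classifier. Stage 2: five-bin classifier."""

    def __init__(self):
        # Unigrams and bigrams as the n-gram features (an assumption).
        self.vectorizer = CountVectorizer(ngram_range=(1, 2))
        self.polarity_clf = LinearSVC(random_state=0)
        self.bin_clf = LinearSVC(random_state=0)

    def _stage_two_input(self, comments):
        # The stage-1 prediction becomes one feature column for stage 2.
        ngrams = self.vectorizer.transform(comments)
        polarity = self.polarity_clf.predict(ngrams).reshape(-1, 1)
        return np.hstack([polarity, metadata_features(comments)])

    def fit(self, comments, scores):
        scores = np.asarray(scores)
        ngrams = self.vectorizer.fit_transform(comments)
        # Stage 1 label: 1 if the comment's karma is positive, else 0.
        self.polarity_clf.fit(ngrams, (scores > 0).astype(int))
        bins = np.array([score_to_bin(s) for s in scores])
        self.bin_clf.fit(self._stage_two_input(comments), bins)
        return self

    def predict(self, comments):
        return self.bin_clf.predict(self._stage_two_input(comments))


# Made-up training data, for illustration only.
comments = [
    "this is great, thanks for sharing",
    "terrible take, you are wrong",
    "helpful answer",
    "worst post ever",
    "amazing explanation, saved my day",
    "meh",
]
scores = [42, -7, 8, -2, 150, 1]

model = TwoStageKarmaModel().fit(comments, scores)
predicted_bins = model.predict(["thanks, great answer"])
```

With only six toy comments the predicted bin is not meaningful; the sketch is meant to show how the first classifier's output is passed to the second as a feature. A real model would need a large comment corpus and richer features.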