Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mubaris/urban-robot
Reddit bot :computer: which replies to sarcastic comments :trollface: :trollface:
bot praw python reddit-bot sarcasm sentiment-analysis svm
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/mubaris/urban-robot
- Owner: mubaris
- Created: 2017-09-15T10:52:57.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-12-22T11:23:16.000Z (about 7 years ago)
- Last Synced: 2024-08-03T01:22:07.211Z (6 months ago)
- Topics: bot, praw, python, reddit-bot, sarcasm, sentiment-analysis, svm
- Language: Jupyter Notebook
- Homepage:
- Size: 21.5 MB
- Stars: 33
- Watchers: 2
- Forks: 15
- Open Issues: 3
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-rainmana - mubaris/urban-robot - Reddit bot :computer: which replies to sarcastic comments :trollface: :trollface: (Jupyter Notebook)
README
# Urban Robot
![Urban Robot](ub.png)
Reddit bot which replies to sarcastic comments
## Libraries
* numpy, scipy - For Mathematical and Scientific processes
* nltk - NLP Application
* scikit-learn - Model Training and Feature Extraction
* textblob - Sentiment Analysis
* pickle - Pickling Models and Vectorizers
* langdetect - Language Detection of comments
* praw - Reddit Bot

## Features Used
* Sentiment Analysis of the full text, and of the text split into 2 and 3 equal parts
* n-grams of length 1 to 5
* Term Frequency–Inverse Document Frequency (TF-IDF) after stemming, tokenizing and using n-grams of 1 to 5
* Part of Speech Dictionary Vector
* Topic Modeling

## Data Preprocessing
* Removed URLs
* Removed Stopwords
* Removed words with less than 4 tokens

## Model Training and Classification
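As a rough illustration of the inputs to the models, the sketch below strips URLs and stopwords, builds word n-grams of length 1 to 5, and splits a token list into equal parts for per-part sentiment scoring. It is not taken from `classifier.py`: the function names, the tiny `STOPWORDS` set, and the tokenizing regex are all assumptions made for the example, and the third preprocessing step is left out because its exact rule is not spelled out here.

```python
import re

# Hypothetical, tiny stopword set for illustration only; the project lists
# nltk, whose stopword corpus would be the more likely source.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "that"}

# Assumed URL pattern: anything starting with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def preprocess(text):
    """Lowercase, drop URLs, and drop stopwords; returns a list of tokens."""
    text = URL_RE.sub(" ", text.lower())
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOPWORDS]


def word_ngrams(tokens, n_min=1, n_max=5):
    """Return all word n-grams with n_min <= n <= n_max as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def split_equal(tokens, parts):
    """Split tokens into `parts` contiguous chunks of near-equal length."""
    size, extra = divmod(len(tokens), parts)
    chunks, start = [], 0
    for k in range(parts):
        # The first `extra` chunks take one additional token each.
        end = start + size + (1 if k < extra else 0)
        chunks.append(tokens[start:end])
        start = end
    return chunks
```

Each chunk returned by `split_equal(tokens, 2)` and `split_equal(tokens, 3)` could then be joined back into a string and passed to a sentiment scorer such as textblob, giving one score per part alongside the score for the full text.
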
Using the above features and preprocessing, 4 models are trained:
* Logistic Regression
* Linear SVM
* SVM with Gaussian Kernel
* Random Forest

If a comment is predicted as 'sarcastic' by 3 out of 4 models, it is treated as sarcastic.
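The voting rule above can be sketched in a few lines. This is a minimal illustration, not the code in `bot.py`: the function name, the model keys, and the label strings are hypothetical, and it assumes that agreement by all 4 models also counts as sarcastic (that is, "at least 3").

```python
def ensemble_label(predictions, positive="sarcastic", threshold=3):
    """Combine per-model labels with a simple vote.

    `predictions` maps a model name to its predicted label. The comment is
    labelled `positive` when at least `threshold` models agree (assumption:
    "3 out of 4" is read as "3 or more").
    """
    agree = sum(1 for label in predictions.values() if label == positive)
    return positive if agree >= threshold else "not sarcastic"


# Hypothetical per-model outputs for a single comment.
example_votes = {
    "logistic_regression": "sarcastic",
    "linear_svm": "sarcastic",
    "rbf_svm": "not sarcastic",
    "random_forest": "sarcastic",
}
result = ensemble_label(example_votes)
```

Requiring 3 of 4 votes rather than a bare majority trades some recall for fewer false positives, which seems a sensible choice for a bot that replies publicly.
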
## Files
* `classifier.py` - Training and Testing Models
* `bot.py` - Reddit Bot
* `cli_bot.py` - A Command Line Interactive Interface for the Reddit Bot
* `main.ipynb` - IPython notebook that led to the final model hypothesis

## Running
1. Register a new Reddit app [here](https://www.reddit.com/prefs/apps/) and fill in its details (username, password, client id, client secret) under the name 'bot1' in `praw.ini`
2. (Optional) Run `classifier.py` with Python 3 to train the models, or use the pretrained models
3. Run `bot.py` with Python 3 for the automated Reddit Bot
4. Run `cli_bot.py` with Python 3 for an interactive version of the Reddit Bot.

That's it.

Logs can be accessed at `comment.log`

How to fill [praw.ini](https://praw.readthedocs.io/en/v4.0.0/getting_started/configuration/prawini.html)

The final accuracy of each model is in `final_accuracy.txt`
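For step 1, a quick way to check that the `bot1` section is complete before starting the bot is sketched below. It uses only Python's standard `configparser`; the helper name and the placeholder values are assumptions for illustration, while the key names (`client_id`, `client_secret`, `username`, `password`) follow the PRAW configuration documentation linked above.

```python
import configparser

# Keys the bot1 section is expected to contain (per the PRAW docs).
REQUIRED_KEYS = ("client_id", "client_secret", "username", "password")

# Placeholder values only; replace them with the details of your Reddit app.
EXAMPLE_INI = """
[bot1]
client_id = YOUR_CLIENT_ID
client_secret = YOUR_CLIENT_SECRET
username = YOUR_REDDIT_USERNAME
password = YOUR_REDDIT_PASSWORD
"""


def missing_bot_keys(ini_text, section="bot1"):
    """Return the required keys that are absent or empty in `section`."""
    # interpolation=None so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


missing = missing_bot_keys(EXAMPLE_INI)
```

To check a real file, read it first, for example `missing_bot_keys(open("praw.ini").read())`, and fix any keys it reports before running `bot.py`.
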
## Dataset
The dataset is available in `container`. It was downloaded from [here](https://nlds.soe.ucsc.edu/sarcasm1).