https://github.com/nandahkrishna/sarcasmdetection

Detecting sarcasm in Reddit comments

Host: GitHub
URL: https://github.com/nandahkrishna/sarcasmdetection
Owner: nandahkrishna
Created: 2020-06-28T07:16:28.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2020-06-28T07:58:41.000Z (almost 6 years ago)
Last Synced: 2025-10-21T19:24:46.370Z (8 months ago)
Topics: bert-embeddings, classification, explainable-ml, jupyter-notebook, machine-learning, natural-language-processing, python, reddit, sarcasm, sarcasm-detection, tfidf
Language: Jupyter Notebook
Homepage:
Size: 688 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Detecting Sarcasm in Reddit Comments

This was a small project I worked on with Rubini and Vikram during my 2020 summer internship at Carnegie Mellon University.

## Aim

The aim is to detect sarcasm in comments found on Reddit using the [Sarcasm on Reddit](https://www.kaggle.com/danofer/sarcasm) dataset available on Kaggle. Through this, we also aim to identify features that are indicative of sarcasm and to explain our models' predictions.

## Methodology and Results

We experimented with TF-IDF and BERT sentence embeddings to extract features from the text. We tried various combinations of features, drawing on the comment itself, its characteristics (such as the subreddit and author) and its parent comment, which provides context. We also tried PCA for dimensionality reduction.
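The feature combinations described above can be sketched as follows. This is a minimal, from-scratch illustration under stated assumptions, not the notebooks' code: `TfidfSketch`, `one_hot` and the toy rows are hypothetical, the comment and parent comment are assumed to get separate TF-IDF vocabularies, and the IDF smoothing `ln((1 + n) / (1 + df)) + 1` mirrors the defaults of scikit-learn's `TfidfVectorizer`. The README does not say which TF-IDF implementation was used, and BERT embeddings and PCA are not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of word characters; a simplistic stand-in tokenizer.
    return re.findall(r"\w+", text.lower())


class TfidfSketch:
    """Hypothetical minimal TF-IDF: raw term counts times smoothed IDF, L2-normalized.

    Uses idf(t) = ln((1 + n) / (1 + df(t))) + 1, matching scikit-learn's
    TfidfVectorizer default smoothing (its tokenizer differs from this one).
    """

    def fit(self, docs):
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency: count each term once per doc
        self.vocab = sorted(df)
        self.index = {term: i for i, term in enumerate(self.vocab)}
        self.idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in self.vocab]
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            vec = [0.0] * len(self.vocab)
            for term, count in Counter(tokenize(doc)).items():
                i = self.index.get(term)
                if i is not None:  # ignore terms unseen during fit
                    vec[i] = count * self.idf[i]
            norm = math.sqrt(sum(v * v for v in vec))
            rows.append([v / norm for v in vec] if norm else vec)
        return rows


def one_hot(values, categories):
    # Encode a categorical characteristic (e.g. subreddit) as 0/1 indicator columns.
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


# Toy rows (hypothetical, not taken from the dataset).
comments = ["Oh great, another Monday", "Thanks for the help"]
parents = ["Back to work tomorrow", "Here is the fix you asked for"]
subreddits = ["AskReddit", "learnpython"]

comment_vec = TfidfSketch().fit(comments)
parent_vec = TfidfSketch().fit(parents)
sub_categories = sorted(set(subreddits))

# Concatenate comment, parent and characteristic features column-wise.
features = [
    c + p + s
    for c, p, s in zip(
        comment_vec.transform(comments),
        parent_vec.transform(parents),
        one_hot(subreddits, sub_categories),
    )
]
print(len(features), len(features[0]))
```

With the toy rows this prints `2 21`: 8 comment terms, 11 parent-comment terms and 2 subreddit indicators per row. In a real notebook, a library vectorizer fitted only on the training split would be the usual choice.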

The classifiers we used include the Random Forest Classifier, the Gradient Boosting Classifier and the Multi-Layer Perceptron, among others.

Our best-performing model was a Random Forest Classifier trained on TF-IDF features extracted from the raw text (comment and parent comment), together with the comment's characteristics, such as its subreddit and author. It obtained an F1-score of 0.66 on the validation set. The models we built assigned high importance to the comment's characteristics.
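For reference, the F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes it from scratch, assuming binary labels where 1 marks a sarcastic comment and F1 is reported for that class; the README does not state which averaging was used, and `f1_score` here and the toy labels are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = sarcastic, 0 = not sarcastic.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(round(f1_score(y_true, y_pred), 3))
```

Here precision and recall are both 2/3, so the script prints `0.667`.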

## Code

The code is available as three Jupyter Notebook files. To run it, start a Jupyter Notebook server and open the notebooks. Before running the code, install the dependencies by executing this command in the terminal:

```bash
pip install -r requirements.txt
```

## Presentation

Our [presentation](Sarcasm.pdf) is also available in this repository, and provides more information.
