{"id":17215569,"url":"https://github.com/nandahkrishna/sarcasmdetection","last_synced_at":"2026-05-14T20:06:28.058Z","repository":{"id":104910054,"uuid":"275529453","full_name":"nandahkrishna/SarcasmDetection","owner":"nandahkrishna","description":"Detecting sarcasm in Reddit comments","archived":false,"fork":false,"pushed_at":"2020-06-28T07:58:41.000Z","size":704,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-21T19:24:46.370Z","etag":null,"topics":["bert-embeddings","classification","explainable-ml","jupyter-notebook","machine-learning","natural-language-processing","python","reddit","sarcasm","sarcasm-detection","tfidf"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nandahkrishna.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-28T07:16:28.000Z","updated_at":"2020-09-24T09:24:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"ff64197f-7d12-4357-9513-fd736fcab69b","html_url":"https://github.com/nandahkrishna/SarcasmDetection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nandahkrishna/SarcasmDetection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nandahkrishna%2FSarcasmDetection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nandahkrishna%2FSarcasmDetection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nand
ahkrishna%2FSarcasmDetection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nandahkrishna%2FSarcasmDetection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nandahkrishna","download_url":"https://codeload.github.com/nandahkrishna/SarcasmDetection/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nandahkrishna%2FSarcasmDetection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33041255,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-embeddings","classification","explainable-ml","jupyter-notebook","machine-learning","natural-language-processing","python","reddit","sarcasm","sarcasm-detection","tfidf"],"created_at":"2024-10-15T03:24:50.567Z","updated_at":"2026-05-14T20:06:28.038Z","avatar_url":"https://github.com/nandahkrishna.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Detecting Sarcasm in Reddit Comments\n\nThis was a small project I worked on, with Rubini and Vikram, during my 2020 Summer Internship at Carnegie Mellon University.\n\n## Aim\n\nThe aim is to detect sarcasm in comments found on Reddit, using the [Sarcasm on 
Reddit](https://www.kaggle.com/danofer/sarcasm) dataset available from Kaggle. Through this, we also aim to identify features that are indicative of sarcasm, and explain our models' predictions.\n\n## Methodology and Results\n\nWe experimented with TF-IDF and BERT Sentence Embeddings to extract features from text. We tried using various combinations of features, such as using only the comment, its characteristics and also its parent comment, to provide context. Additionally, we tried to use PCA for dimensionality reduction.\n\nThe classifiers we used include the Random Forest Classifier, Gradient Boosting Classifier and the Multi-Layer Perceptron, among others.\n\nOur best-performing model was a Random Forest Classifier trained on TF-IDF features extracted from raw text (comment and parent) and the comment's characteristics, such as the subreddit and author. It obtained an F1-Score of 0.66 on the validation set. The comment's characteristics were deemed very important features by the models we built.\n\n## Code\n\nThe code is available as three Jupyter Notebook files. Start up a Jupyter Notebook server and run the notebooks. Ensure that the dependencies are installed before you run the code. To do so, execute this command in the Terminal:\n\n```bash\npip install -r requirements.txt\n```\n\n## Presentation\n\nOur [presentation](Sarcasm.pdf) is also available in this repository, and provides more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnandahkrishna%2Fsarcasmdetection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnandahkrishna%2Fsarcasmdetection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnandahkrishna%2Fsarcasmdetection/lists"}