{"id":18833033,"url":"https://github.com/declare-lab/cascade","last_synced_at":"2025-04-14T04:31:55.196Z","repository":{"id":40985240,"uuid":"137744868","full_name":"declare-lab/CASCADE","owner":"declare-lab","description":"This repo contains code to detect sarcasm from text in discussion forum using deep learning","archived":false,"fork":false,"pushed_at":"2023-07-06T21:24:42.000Z","size":71859,"stargazers_count":86,"open_issues_count":4,"forks_count":48,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-27T18:21:42.667Z","etag":null,"topics":["deep-learning","lstm","reddit","sarcasm-detection","stylometric-features","tensorflow","tweets","user-embeddings"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/declare-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-06-18T11:43:40.000Z","updated_at":"2025-02-20T19:53:09.000Z","dependencies_parsed_at":"2022-09-01T11:40:25.772Z","dependency_job_id":"af4b2163-44be-4657-ae64-905f10c73320","html_url":"https://github.com/declare-lab/CASCADE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2FCASCADE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2FCASCADE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2FCASCADE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2FCASCADE/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/declare-lab","download_url":"https://codeload.github.com/declare-lab/CASCADE/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248821759,"owners_count":21166952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","lstm","reddit","sarcasm-detection","stylometric-features","tensorflow","tweets","user-embeddings"],"created_at":"2024-11-08T01:59:57.590Z","updated_at":"2025-04-14T04:31:50.187Z","avatar_url":"https://github.com/declare-lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CASCADE: Contextual Sarcasm Detection in Online Discussion Forums\n\nCode for the paper [CASCADE: Contextual Sarcasm Detection in Online Discussion Forums](http://aclweb.org/anthology/C18-1156) (COLING 2018, Santa Fe, New Mexico).\n\n## Description\n\nIn this paper, we propose a ContextuAl SarCasm DEtector (CASCADE), which adopts a hybrid approach of both content- and context-driven modeling for sarcasm detection in online social media discussions (Reddit).\n\n## Requirements\n\n1. Clone this repo.\n2. Install Python (2.7 or 3.3-3.6).\n3. Install your preferred build of TensorFlow 1.4.0 (CPU or GPU; from PyPI, compiled from source, etc.).\n4. Install the rest of the requirements: `pip install -r requirements.txt`\n5. Download the [FastText pre-trained embeddings](https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip) and extract them to a directory of your choice.\n6. 
Download the [`comments.json` dataset file](https://drive.google.com/file/d/1ew-85sh2z3fv1yGgIwBoeIHUvP8fMnxU/view?usp=sharing) [1] and place it in `data/`.\n7. To run the optional preprocessing steps, install YAJL 2, download [the `train-balanced.csv` file](https://drive.google.com/file/d/18GwcTqXo_lcMJmc5ms6s2KaL0Dh-95GP/view), save it under `data/`, and continue with the [Preprocessing instructions](#preprocessing). Otherwise, download [user_gcca_embeddings.npz](https://drive.google.com/file/d/1mQoe_48LO67plyo98DVeCC9NabVXdm82/view?usp=sharing), place it in `users/user_embeddings/`, and go directly to the [Running CASCADE section](#running-cascade).\n\n## Preprocessing\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"cca.jpg\" alt=\"User Embeddings\" width=\"90%\"\u003e\n\u003c/p\u003e\n\n1. User Embeddings: Stylometric features\n\n    The file `data/comments.json` contains Reddit users and their corresponding comments. Since a user may have posted multiple comments, we concatenate all comments by the same user, delimited by the `\u003cEND\u003e` tag:\n\n    ```bash\n    cd users\n    python create_per_user_paragraph.py\n    ```\n\n    The ParagraphVector algorithm is used to generate the stylometric features. First, train the model:\n\n    ```bash\n    python train_stylometric.py\n    ```\n\n    Generate `user_stylometric.csv` (user stylometric features) using the trained model:\n\n    ```bash\n    python generate_stylometric.py\n    ```\n\n2. User Embeddings: Personality features\n\n    Pre-train a CNN-based model to detect personality features from text. The code uses two datasets for training. 
The second dataset [2] is available on request from its original authors.\n\n    ```bash\n    python process_data.py [path/to/FastText_embedding]\n    python train_personality.py\n    ```\n\n    Generate `user_personality.csv` (user personality features) using this model:\n\n    ```bash\n    python generate_user_personality.py\n    ```\n\n    To use the pre-trained model from our experiments instead, download the [model weights](https://drive.google.com/file/d/1KK0p6tStgaEXLtAni1u3_W2jGlq8g1Nq/view?usp=sharing) and unzip them inside the folder `user/`.\n\n3. User Embeddings: Multi-view fusion\n\n    Merge the `user_stylometric.csv` and `user_personality.csv` files into a single `user_view_vectors.csv` file:\n\n    ```bash\n    python merge_user_views.py\n    ```\n\n    Multi-view fusion of the user views (stylometric and personality) is performed using GCCA (generalized CCA, which reduces to standard CCA for two views). Generate the fused user embeddings `user_gcca_embeddings.npz` with the following command:\n\n    ```bash\n    python user_wgcca.py --input user_embeddings/user_view_vectors.csv --output user_embeddings/user_gcca_embeddings.npz --k 100 --no_of_views 2\n    ```\n\n    This GCCA implementation has been adapted from the [wgcca repo](https://github.com/abenton/wgcca).\n\n    Finally, return to the repository root:\n\n    ```bash\n    cd ..\n    ```\n\n4. Discourse Embeddings\n\n    As with the user stylometric features, create the discourse features for each discussion forum (subreddit):\n\n    ```bash\n    cd discourse\n    python create_per_discourse_paragraph.py\n    ```\n\n    The ParagraphVector algorithm is also used to generate the discourse features. 
First, train the model:\n\n    ```bash\n    python train_discourse.py\n    ```\n\n    Generate `discourse.csv` (discourse features) using the trained model:\n\n    ```bash\n    python generate_discourse.py\n    ```\n\n    Finally, return to the repository root:\n\n    ```bash\n    cd ..\n    ```\n\n## Running CASCADE\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"overall_model.jpg\" alt=\"Hybrid CNN\" width=\"90%\"\u003e\n\u003c/p\u003e\n\nCASCADE is a hybrid CNN that combines user embeddings and discourse features with textual modeling. Preprocess the data and train the model:\n\n```bash\ncd src\npython process_data.py [path/to/FastText_embedding]\npython train_cascade.py\n```\n\nThe CNN codebase has been adapted from Denny Britz's [cnn-text-classification-tf repo](https://github.com/dennybritz/cnn-text-classification-tf).\n\n## Citation\n\nIf you use this code in your work, please cite the paper [CASCADE: Contextual Sarcasm Detection in Online Discussion Forums](http://aclweb.org/anthology/C18-1156) as follows:\n\n```bibtex\n@InProceedings{C18-1156,\n  author = \t\"Hazarika, Devamanyu\n\t\tand Poria, Soujanya\n\t\tand Gorantla, Sruthi\n\t\tand Cambria, Erik\n\t\tand Zimmermann, Roger\n\t\tand Mihalcea, Rada\",\n  title = \t\"CASCADE: Contextual Sarcasm Detection in Online Discussion Forums\",\n  booktitle = \t\"Proceedings of the 27th International Conference on Computational Linguistics\",\n  year = \t\"2018\",\n  publisher = \t\"Association for Computational Linguistics\",\n  pages = \t\"1837--1848\",\n  location = \t\"Santa Fe, New Mexico, USA\",\n  url = \t\"http://aclweb.org/anthology/C18-1156\"\n}\n```\n\n## References\n\n[1]. Khodak, Mikhail, Nikunj Saunshi, and Kiran Vodrahalli. [\"A large self-annotated corpus for sarcasm.\"](https://arxiv.org/abs/1704.05579) Proceedings of the Eleventh International Conference on Language Resources and Evaluation. 2018.\n\n[2]. Celli, Fabio, et al. 
[\"Workshop on computational personality recognition (shared task).\"](http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/download/6190/6306) Proceedings of the Workshop on Computational Personality Recognition. 2013.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeclare-lab%2Fcascade","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeclare-lab%2Fcascade","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeclare-lab%2Fcascade/lists"}