https://github.com/soujanyaporia/mustard
Multimodal Sarcasm Detection Dataset
multimodal-deep-learning multimodal-interactions sarcasm sarcasm-detection
- Host: GitHub
- URL: https://github.com/soujanyaporia/mustard
- Owner: soujanyaporia
- License: mit
- Created: 2019-02-20T02:08:20.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-08-22T07:50:29.000Z (9 months ago)
- Last Synced: 2025-03-17T06:02:00.690Z (2 months ago)
- Topics: multimodal-deep-learning, multimodal-interactions, sarcasm, sarcasm-detection
- Language: OpenEdge ABL
- Homepage: https://www.aclweb.org/anthology/P19-1455/
- Size: 75.4 MB
- Stars: 335
- Watchers: 8
- Forks: 62
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# MUStARD: Multimodal Sarcasm Detection Dataset
[Open in Colab](https://colab.research.google.com/drive/11kCY9u9XuFzablYcluhhZq6DTXCViq8w?usp=sharing)
This repository contains the dataset and code for our ACL 2019 paper:
[Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)](https://www.aclweb.org/anthology/P19-1455/)

We release the MUStARD dataset, a multimodal video corpus for research in automated sarcasm discovery. The dataset
is compiled from popular TV shows including *Friends*, *The Golden Girls*, *The Big Bang Theory*, and
*Sarcasmaholics Anonymous*. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is
accompanied by its context, providing additional information on the scenario where it occurs.

## Example Instance

Example sarcastic utterance from the dataset along with its context and transcript.
## Raw Videos
We provide [the raw video clips](https://huggingface.co/datasets/MichiganNLP/MUStARD/resolve/main/mmsd_raw_data.zip),
including both the utterances and their respective context.

## Data Format
The annotations and transcripts of the audiovisual clips are available at [`data/sarcasm_data.json`](data/sarcasm_data.json).
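A minimal sketch of reading this file with Python's standard `json` module. The key names come from the table and the example instance in this section. The helper names (`load_annotations`, `dialogue_turns`, `label_counts`) and the inline `SAMPLE` are illustrative assumptions and are not part of the repository's code.

```python
import json
from collections import Counter

# Inline sample copied from the example instance in this section, so the sketch
# runs without a checkout of the repository.
SAMPLE = json.loads("""
{
  "1_60": {
    "utterance": "It's just a privilege to watch your mind at work.",
    "speaker": "SHELDON",
    "context": [
      "I never would have identified the fingerprints of string theory in the aftermath of the Big Bang.",
      "My apologies. What's your plan?"
    ],
    "context_speakers": ["LEONARD", "SHELDON"],
    "sarcasm": true
  }
}
""")


def load_annotations(path):
    """Read the annotation file into a dict keyed by instance identifier."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def dialogue_turns(instance):
    """Return (speaker, text) pairs: context turns first, target utterance last."""
    turns = list(zip(instance["context_speakers"], instance["context"]))
    turns.append((instance["speaker"], instance["utterance"]))
    return turns


def label_counts(annotations):
    """Count sarcastic (True) versus non-sarcastic (False) instances."""
    return Counter(bool(item["sarcasm"]) for item in annotations.values())


for speaker, text in dialogue_turns(SAMPLE["1_60"]):
    print(f"{speaker}: {text}")
print(label_counts(SAMPLE))
```

With the repository checked out, `load_annotations("data/sarcasm_data.json")` would return the full dictionary in place of `SAMPLE`.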
Each instance in the JSON file is assigned an identifier (e.g., "1\_60") that maps to a dictionary with the following items:

| Key                | Value                                                                       |
|--------------------|:---------------------------------------------------------------------------:|
| `utterance`        | The text of the target utterance to classify.                               |
| `speaker`          | Speaker of the target utterance.                                            |
| `context`          | List of utterances (in chronological order) preceding the target utterance. |
| `context_speakers` | Respective speakers of the context utterances.                              |
| `sarcasm`          | Binary sarcasm label (`true` or `false`).                                   |

Example format in JSON:
```json
{
  "1_60": {
    "utterance": "It's just a privilege to watch your mind at work.",
    "speaker": "SHELDON",
    "context": [
      "I never would have identified the fingerprints of string theory in the aftermath of the Big Bang.",
      "My apologies. What's your plan?"
    ],
    "context_speakers": [
      "LEONARD",
      "SHELDON"
    ],
    "sarcasm": true
  }
}
```

## Citation
Please cite the following paper if you find this dataset useful in your research:
```bibtex
@inproceedings{mustard,
    title = "Towards Multimodal Sarcasm Detection (An \_Obviously\_ Perfect Paper)",
    author = "Castro, Santiago  and
      Hazarika, Devamanyu  and
      P{\'e}rez-Rosas, Ver{\'o}nica  and
      Zimmermann, Roger  and
      Mihalcea, Rada  and
      Poria, Soujanya",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = "7",
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
}
```

## Run the code
1. Set up the environment with Conda:
```bash
conda env create
conda activate mustard
python -c "import nltk; nltk.download('punkt')"
```

2. Download [Common Crawl pretrained GloVe word vectors of size 300d, 840B tokens](http://nlp.stanford.edu/data/glove.840B.300d.zip) somewhere.
3. [Download the pre-extracted visual features](https://huggingface.co/datasets/MichiganNLP/MUStARD/tree/main/features) to the `data/` folder (so `data/features/` contains the folders `context_final/` and `utterances_final/` with the features) or [extract the visual features](visual) yourself.
4. [Download the pre-extracted BERT features](https://huggingface.co/datasets/MichiganNLP/MUStARD/resolve/main/BERT_text_features.zip) and place the two files directly under the folder `data/` (so they are `data/bert-output.jsonl` and `data/bert-output-context.jsonl`), or extract the BERT features in another environment with Python 2 and TensorFlow 1.11.0 following
["Using BERT to extract fixed feature vectors (like ELMo)" from BERT's repo](https://github.com/google-research/bert/tree/d66a146741588fb208450bde15aa7db143baaa69#using-bert-to-extract-fixed-feature-vectors-like-elmo)
and running:

```bash
# Download BERT-base uncased in some dir:
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
# Then put the location in this var:
BERT_BASE_DIR=...
python extract_features.py \
--input_file=data/bert-input.txt \
--output_file=data/bert-output.jsonl \
--vocab_file=${BERT_BASE_DIR}/vocab.txt \
--bert_config_file=${BERT_BASE_DIR}/bert_config.json \
--init_checkpoint=${BERT_BASE_DIR}/bert_model.ckpt \
--layers=-1,-2,-3,-4 \
--max_seq_length=128 \
--batch_size=8
```

5. Check the options in `python train_svm.py -h` to select a run configuration (or modify [`config.py`](config.py)) and then run it:
```bash
python train_svm.py # Add the flags you want.
```

6. Evaluation: We evaluate using a weighted F-score metric in a 5-fold cross-validation scheme. The fold indices are available at `data/split_indices.p`. Refer to our baseline scripts for more details.
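The evaluation scheme in step 6 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's baseline code: the layout of the pickled fold file is not described here, so folds are assumed to be `(train_indices, test_indices)` pairs, and the names `weighted_f_score`, `cross_validated_score`, and `always_sarcastic` as well as the toy labels are hypothetical. The weighted F-score is taken to mean per-class F1 averaged with weights proportional to class support, which is also what scikit-learn's `f1_score(..., average="weighted")` computes.

```python
from collections import Counter


def weighted_f_score(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        # count > 0 for every label in `support`, so the denominator is never zero.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += count * f1
    return total / len(y_true)


def cross_validated_score(labels, folds, predict):
    """Mean weighted F-score over folds.

    `folds` is assumed to hold (train_indices, test_indices) pairs.
    `predict(train_indices, test_indices)` is a hypothetical callback that fits
    on the training fold and returns one predicted label per test index.
    """
    scores = []
    for train_idx, test_idx in folds:
        y_true = [labels[i] for i in test_idx]
        y_pred = predict(train_idx, test_idx)
        scores.append(weighted_f_score(y_true, y_pred))
    return sum(scores) / len(scores)


# Toy labels and a 5-fold split (not MUStARD data): fold k tests on indices k and k + 5.
toy_labels = [True, False, True, True, False, False, True, False, True, False]
toy_folds = [
    ([i for i in range(10) if i % 5 != k], [i for i in range(10) if i % 5 == k])
    for k in range(5)
]


def always_sarcastic(train_idx, test_idx):
    """Trivial stand-in for a trained classifier: predicts True for every test item."""
    return [True] * len(test_idx)


print(cross_validated_score(toy_labels, toy_folds, always_sarcastic))
```

Replacing `always_sarcastic` with a callback that trains a real classifier on the training indices keeps the scoring logic unchanged.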