https://github.com/necla-ml/SNLI-VE
Dataset and starting code for visual entailment dataset
- Host: GitHub
- URL: https://github.com/necla-ml/SNLI-VE
- Owner: necla-ml
- License: bsd-3-clause
- Created: 2018-11-27T01:20:53.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-04-21T14:46:41.000Z (over 2 years ago)
- Last Synced: 2024-08-01T02:24:41.730Z (6 months ago)
- Language: Python
- Homepage: https://arxiv.org/abs/1901.06706
- Size: 1.12 MB
- Stars: 106
- Watchers: 8
- Forks: 7
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-self-supervised-multimodal-learning: [Github](https://github.com/necla-ml/SNLI-VE) (Summary of Common Multimodal Datasets / Image-Text Datasets)
README
# SNLI-VE: Visual Entailment Dataset
**Ning Xie, Farley Lai, Derek Doran, Asim Kadav**

**SNLI-VE** is the dataset proposed for the **Visual Entailment (VE)** task investigated in [Visual Entailment Task for Visually-Grounded Language Learning](https://arxiv.org/abs/1811.10582), accepted to the [NeurIPS 2018 ViGIL workshop](https://nips2018vigil.github.io/).
Refer to our [full paper](https://arxiv.org/abs/1901.06706) for detailed analysis and evaluations.

![Example](https://drive.google.com/uc?export=view&id=1Bo83CcaPKJqrNg0F_crbeAfCRiTDWlqz)
## Updates
12/10/2021:
- The Flickr images download is updated and now hosted by AllenNLP
- The Flickr features download link is updated, but the archive may require a newer `unzip` to decompress on Linux

NOTE:
- The data remains hosted by external parties and is subject to change
## Leaderboard
[Check out the leaderboard on Papers with Code](https://paperswithcode.com/sota/visual-entailment-on-snli-ve-val)
**NOTE**
`e-SNLI-VE-2.0` relabels the neutral class in the `dev` and `test` splits and evaluates the resulting performance in the order of the original, val-correction and val/test-correction configurations.
## Overview
SNLI-VE is built on top of [SNLI](https://nlp.stanford.edu/projects/snli) and [Flickr30K](http://shannon.cs.illinois.edu/DenotationGraph).
The problem that VE is trying to solve is to reason about the relationship between an `image premise` **_Pimage_** and a `text hypothesis` **_Htext_**. Specifically, given an image as `premise` and a natural language sentence as `hypothesis`, one of three labels (`entailment`, `neutral` and `contradiction`) is assigned based on the relationship conveyed by the pair (**_Pimage_**, **_Htext_**):
- `entailment` holds if there is enough evidence in **_Pimage_** to conclude that **_Htext_** is true.
- `contradiction` holds if there is enough evidence in **_Pimage_** to conclude that **_Htext_** is false.
- Otherwise, the relationship is `neutral`, implying the evidence in **_Pimage_** is insufficient to draw a conclusion about **_Htext_**.
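
The three-way label scheme above can be captured in a small data structure. The sketch below is purely illustrative: the `Label` enum and the `VEExample` class, along with their field names, are hypothetical and are not part of this repository.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The three VE labels described above."""
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class VEExample:
    """One (Pimage, Htext) pair. Hypothetical container, not the repo's schema."""
    image_id: str    # identifies the image premise
    hypothesis: str  # the text hypothesis
    label: Label


# Made-up values for illustration only.
example = VEExample(
    image_id="12345.jpg",
    hypothesis="Two people are outdoors.",
    label=Label("entailment"),
)
print(example.label.value)
```

Using an enum rather than raw strings makes a misspelled label fail at construction time instead of silently creating a fourth class.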
### Examples from SNLI-VE
![Examples](https://drive.google.com/uc?export=view&id=1FUL2PkSQB6EaTbyMvNz1yj_fMCxU3g_W)
## SNLI-VE Statistics
Some highlighted dataset statistics are shown below; details can be found in our [paper](https://arxiv.org/abs/1811.10582).
### Distribution by Split
The details of the `train`, `dev` and `test` splits are shown below. The instances of the three labels (`entailment`, `neutral` and `contradiction`) are evenly distributed within each split.
| | **Train** | **Dev** | **Test** |
| ------------------- | --------- | ------- | -------- |
| **#Image** | 29783 | 1000 | 1000 |
| **#Entailment** | 176932 | 5959 | 5973 |
| **#Neutral** | 176045 | 5960 | 5964 |
| **#Contradiction** | 176550 | 5939 | 5964 |
| **Vocabulary Size** | 29550 | 6576 | 6592 |
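
As a quick sanity check on the "evenly distributed" claim, the per-split totals and label shares can be recomputed from the counts in the table. This is a minimal sketch using only the numbers above; the variable names are arbitrary.

```python
# Label counts copied from the table above.
counts = {
    "train": {"entailment": 176932, "neutral": 176045, "contradiction": 176550},
    "dev": {"entailment": 5959, "neutral": 5960, "contradiction": 5939},
    "test": {"entailment": 5973, "neutral": 5964, "contradiction": 5964},
}

# Total number of (image, hypothesis) pairs per split.
totals = {split: sum(by_label.values()) for split, by_label in counts.items()}

# Fraction of each split taken up by each label.
shares = {
    split: {label: n / totals[split] for label, n in by_label.items()}
    for split, by_label in counts.items()
}

for split, by_label in shares.items():
    formatted = ", ".join(f"{label}: {share:.2%}" for label, share in by_label.items())
    print(f"{split} ({totals[split]} pairs): {formatted}")
```

The per-split totals agree with the SNLI-VE partition sizes listed in the dataset comparison table, and every label share stays close to one third.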
### Dataset Comparison
Below is a dataset comparison among [SNLI-VE](https://bit.ly/2VgSfbI), [VQA-v2.0](https://bit.ly/2Vhe9vn) and [CLEVR](https://bit.ly/2VkpoD8).
| | [SNLI-VE](https://bit.ly/2VgSfbI) | [VQA-v2.0](https://bit.ly/2Vhe9vn) | [CLEVR](https://bit.ly/2VkpoD8) |
| ------------------- | --------- | ------ | ------ |
| **Partition Size:** | | | |
| Training | 529527 | 443757 | 699989 |
| Validation | 17858 | 214354 | 149991 |
| Test | 17901 | 555187 | 149988 |
| **Question Length:** | | | |
| Mean | 7.4 | 6.1 | 18.4 |
| Median | 7 | 6 | 17 |
| Mode | 6 | 5 | 14 |
| Max | **56** | 23 | 43 |
| **Vocabulary Size** | **32191** | 19174 | 87 |
### Question Length Distribution
The *question* for the SNLI-VE dataset is the `hypothesis`.
As shown in the figure, the question length distribution of SNLI-VE has a long tail.

![Question length distribution](https://drive.google.com/uc?export=view&id=1d3iwptpIzQZjZdwn_d1GlXl8N1kBoCcH)
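
Length statistics like those in the comparison table (mean, median, mode, max) can be computed along the following lines. This is a sketch under an assumption: it uses whitespace tokenization, which may differ from the tokenization used for the reported numbers. The `length_stats` helper and the sample sentences are made up for illustration.

```python
from collections import Counter
from statistics import mean, median


def length_stats(hypotheses):
    """Return mean/median/mode/max token counts (whitespace tokenization)."""
    lengths = [len(h.split()) for h in hypotheses]
    # most_common(1) yields the single most frequent length and its count.
    mode_length, _ = Counter(lengths).most_common(1)[0]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "mode": mode_length,
        "max": max(lengths),
    }


# Made-up hypotheses for illustration; not taken from the dataset.
sample = [
    "A man is sleeping.",
    "Two dogs run outside.",
    "A woman is playing a guitar on stage.",
    "Children are playing.",
]
stats = length_stats(sample)
print(stats)
```

A long tail shows up as a `max` far above the `median`, as with the SNLI-VE values of 56 and 7 in the table.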
## Caveats
To check the quality of the SNLI-VE dataset, we randomly sampled 217 pairs from all three splits (565286 pairs in total).
Among the sampled pairs, 20 (about 9.2%) are incorrectly labeled, the majority of which are in the `neutral` class.
This is consistent with the analysis reported by [GTE](https://www.aclweb.org/anthology/C18-1199) in its Table 2.

It is worth noting that the original SNLI dataset is not perfectly labeled,
with 8.8% of the sampled data not assigned a `gold label`,
implying disagreement among human labelers.
SNLI-VE is no exception, but we believe this is common in other large-scale datasets.
However, if dataset quality is a major concern to you,
we suggest dropping the `neutral` class and using only `entailment` and `contradiction` examples.

## SNLI-VE Creation
[snli_ve_generator.py](vet/tools/snli_ve_generator.py) generates the SNLI-VE dataset in `train`, `dev` and `test` splits with disjoint image sets.
Each entry contains a `Flickr30kID` field that associates it with the original Flickr30K image id.

[snli_ve_parser.py](vet/tools/snli_ve_parser.py) parses entries in SNLI-VE for applications; feel free to revise it.
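
For readers who want to post-process the generated entries, here is a minimal sketch of dropping the `neutral` class (as suggested in the Caveats) and grouping the remaining entries by image. It is not the repository's `snli_ve_parser.py`. It assumes the generated files are JSON Lines and that entries keep the SNLI field names `gold_label` and `sentence2`. Those two names are assumptions, and only `Flickr30kID` is confirmed by the text above. The standard-library `json` module is used to keep the example self-contained.

```python
import json
from collections import defaultdict


def load_entries(lines):
    """Parse JSON Lines text into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def drop_neutral(entries, label_key="gold_label"):
    """Keep only entailment/contradiction entries (label_key is an assumed name)."""
    return [e for e in entries if e[label_key] != "neutral"]


def group_by_image(entries, id_key="Flickr30kID"):
    """Map each Flickr30K image id to the entries that use it as premise."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[id_key]].append(entry)
    return dict(groups)


# Made-up records for illustration; real entries come from the generated files.
raw = [
    '{"Flickr30kID": "111.jpg", "sentence2": "A dog runs.", "gold_label": "entailment"}',
    '{"Flickr30kID": "111.jpg", "sentence2": "The dog is old.", "gold_label": "neutral"}',
    '{"Flickr30kID": "222.jpg", "sentence2": "Nobody is here.", "gold_label": "contradiction"}',
]

kept = drop_neutral(load_entries(raw))
by_image = group_by_image(kept)
print(len(kept), sorted(by_image))
```

To read from a file instead, pass an open file handle to `load_entries`, since iterating over a handle yields one line at a time.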
Follow the instructions below to set up the environment and generate SNLI-VE:
1. Set up the conda environment and dependencies
```sh
conda create -n vet37 python=3.7
conda activate vet37
conda install jsonlines
# conda install -c NECLA-ML ml
```

2. Clone the repo
```sh
git clone https://github.com/necla-ml/SNLI-VE.git
```

3. Generate SNLI-VE in `data/`
```sh
cd SNLI-VE
# With -m, pass the module path without the .py suffix
python -m vet.tools.snli_ve_generator
```

4. Download dependent datasets: Flickr30K, Entities, SNLI, and RoI features
```sh
cd data
./download # y to all if necessary
```

### SNLI-VE Extensions
The [Flickr30k Entities](http://web.engr.illinois.edu/~bplumme2/Flickr30kEntities) dataset is an extension to Flickr30k that contains grounded RoI and entity annotations.

It is easy to extend our SNLI-VE dataset with **[Flickr30k Entities](http://web.engr.illinois.edu/~bplumme2/Flickr30kEntities/)** if fine-grained annotations are required in your experiments.
## Bibtex
The first entry is our full paper; the second is the ViGIL workshop version.
```
@article{xie2019visual,
title={Visual Entailment: A Novel Task for Fine-grained Image Understanding},
author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
journal={arXiv preprint arXiv:1901.06706},
year={2019}
}

@article{xie2018visual,
title={Visual Entailment Task for Visually-Grounded Language Learning},
author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
journal={arXiv preprint arXiv:1811.10582},
year={2018}
}
```
Thank you for your interest in our dataset!
Please contact [us]([email protected]) for any questions, comments, or suggestions!