Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/AtmaHou/Task-Oriented-Dialogue-Research-Progress-Survey
A datasets and methods survey about task-oriented dialogue, including recent datasets and SOTA leaderboards.
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/AtmaHou/Task-Oriented-Dialogue-Research-Progress-Survey
- Owner: AtmaHou
- Created: 2018-05-01T15:02:45.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-11-08T09:02:20.000Z (almost 2 years ago)
- Last Synced: 2024-05-22T07:52:56.441Z (6 months ago)
- Homepage:
- Size: 279 KB
- Stars: 1,238
- Watchers: 66
- Forks: 216
- Open Issues: 2
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
- Awesome-Paper-List - Task-Oriented Dialogue ![](https://img.shields.io/github/stars/AtmaHou/Task-Oriented-Dialogue-Dataset-Survey) (Natural Language Processing)
- awesome-ai-list-guide - Task-Oriented-Dialogue-Research-Progress-Survey - A datasets and methods survey about task-oriented dialogue, including recent datasets and SOTA leaderboards. (NLP)
README
# Task-Oriented Dialogue Research Progress Survey
### Content
- ##### [Introduction](#intro)
- ##### [Updates](#updates)
- ##### [Call for Contribution](#call)
- ##### [Leader Boards](#leader)
- ##### [Datasets Introduction](#detail)
- ##### [Acknowledgement](#acknowledgement)

## Introduction
This repo is a dataset and methods survey for task-oriented dialogue. We investigated most existing dialogue datasets and summarized their basic information, such as a brief description, download link, and size.
We also include leaderboards of popular datasets to present research progress in the task-oriented dialogue field.
A Chinese intro & news for this project is available [here](https://mp.weixin.qq.com/s?__biz=MzIxMjAzNDY5Mg==&mid=2650793618&idx=1&sn=dc5e592c5d8b451531383350af76e254&chksm=8f477379b830fa6fb0b5909f6d6a3f85dae44e8a37aa0ab9763df354cabcc9224c211075f127&mpshare=1&scene=1&srcid=#rd)
#### To refer to this repo, please cite:
```
@misc{Task-Oriented-Dialogue-Survey,
author = {Yutai Hou},
title = {Task-Oriented Dialogue Research Progress Survey},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/AtmaHou/Task-Oriented-Dialogue-Research-Progress-Survey/}},
commit = {master}
}
```

## Updates
This section records major updates for easy reference (see `./release_detail.md` or click the links below):
- [Updates 2020.7.30](https://github.com/AtmaHou/Task-Oriented-Dialogue-Dataset-Survey/blob/master/release_detail.md#20200730): Add 3 new datasets, flag 6 abnormal NLU results, and refine the layout.
- [Updates 2020.7.27](https://github.com/AtmaHou/Task-Oriented-Dialogue-Dataset-Survey/blob/master/release_detail.md#20200727): Add 10 new datasets, 5 new SOTA results, and 1 correction.

## Call for Contributions
Contributions are welcome; you are encouraged to:
- Open a pull request directly
- Send me new dataset info
- Send me new experiment results from published papers or public code implementations.

## Leader Boards
The rankings are based on the published results of the related papers, and we try to keep them up to date. The rankings may be unfair, because the features used and the train/dev splits may differ across papers. Nevertheless, they show the research trend and should be helpful for anyone starting a project on task-oriented dialogue.
### Dialogue State Tracking
The dialogue state tracking task aims to predict or represent the dialogue state,
which usually contains a goal constraint, a set of requested slots, and the user's dialogue act.

#### [MultiWOZ 2.0](http://dialogue.mi.eng.cam.ac.uk/index.php/corpus/) - Dialogue State Tracking
Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora.
The new, corrected versions of the dataset are available at [MultiWOZ 2.1 (2019)](https://arxiv.org/abs/1907.01669), [MultiWOZ 2.2 (2020)](https://www.aclweb.org/anthology/2020.nlp4convai-1.13.pdf).
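The leaderboard below reports joint accuracy (the entire per-turn state must match the gold state exactly) and slot accuracy (per-slot correctness). As a rough, illustrative sketch of what a belief state looks like and how the two metrics differ (the slot names and values here are simplified and do not follow the exact MultiWOZ annotation schema):

```python
# Illustrative sketch only: a simplified MultiWOZ-style belief state and the two
# metrics reported in the leaderboard below. Slot names/values are made up.

def joint_accuracy(predicted_states, gold_states):
    """Fraction of turns whose predicted state matches the gold state exactly."""
    correct = sum(p == g for p, g in zip(predicted_states, gold_states))
    return correct / len(gold_states)

def slot_accuracy(predicted_states, gold_states, all_slots):
    """Fraction of (turn, slot) pairs predicted correctly ('none' counts as a value)."""
    total, correct = 0, 0
    for p, g in zip(predicted_states, gold_states):
        for slot in all_slots:
            total += 1
            correct += p.get(slot, "none") == g.get(slot, "none")
    return correct / total

gold = [{"restaurant-food": "italian", "restaurant-area": "centre"}]
pred = [{"restaurant-food": "italian", "restaurant-area": "north"}]
slots = ["restaurant-food", "restaurant-area", "restaurant-pricerange"]

print(joint_accuracy(pred, gold))        # 0.0  (one wrong slot makes the whole turn wrong)
print(slot_accuracy(pred, gold, slots))  # 0.67 (2 of 3 slots correct)
```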
> Notice: Models marked with * are open-vocabulary based models.
| Model | Joint Acc. | Slot Acc. | Paper / Source |
| ------------- | :-----:| :-----:| :-----:|
| SOM-DST (BERT-large)* (Kim et al, 2020) | 52.32 | - | [Efficient Dialogue State Tracking by Selectively Overwriting Memory](https://arxiv.org/pdf/1911.03906.pdf) |
| SOM-DST* (Kim et al, 2020) | 51.72 | - | [Efficient Dialogue State Tracking by Selectively Overwriting Memory](https://arxiv.org/pdf/1911.03906.pdf) |
| SAS (Hu et al, 2020) | 51.03 | 97.20 | [SAS: Dialogue State Tracking via Slot Attention and Slot Information Sharing](https://www.aclweb.org/anthology/2020.acl-main.567.pdf) |
| MERET (Huang et al, 2020) | 50.91 | 97.07 | [Meta-Reinforced Multi-Domain State Generator for Dialogue Systems](https://www.aclweb.org/anthology/2020.acl-main.636.pdf)|
| NADST* (Le et al, 2020) | 50.52 | - | [Non-Autoregressive Dialog State Tracking](https://openreview.net/pdf?id=H1e_cC4twS) |
| TRADE* (Wu et al, 2019) | 48.62 | 96.92 | [Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems](https://arxiv.org/pdf/1905.08743.pdf) |
| SUMBT (Lee et al, 2019) | 46.649 | 96.44 | [SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking](https://www.aclweb.org/anthology/P19-1546) |
| HyST* (Goel et al, 2019) | 44.24 | - | [HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking](https://arxiv.org/pdf/1907.00883.pdf) |
| Neural Reading (Gao et al, 2019) | 41.10 | - | [Dialog State Tracking: A Neural Reading Comprehension Approach](https://arxiv.org/pdf/1908.01946.pdf) |
| GLAD (Zhong et al., 2018) | 35.57 | 95.44 | [Global-Locally Self-Attentive Dialogue State Tracker](https://arxiv.org/abs/1805.09655) |
| MDBT (Ramadan et al., 2018) | 15.57 | 89.53 | [Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing](https://www.aclweb.org/anthology/P18-2069) |

#### DSTC2 - Dialogue State Tracking
Clarification of dataset types: the main results listed here are obtained on the original DSTC2 dataset (ASR n-best).
We do not list results obtained on other DSTC2 variants, such as **DSTC2-text**
(which formulates dialogue state tracking as a machine reading problem
that reads the dialogue transcriptions multiple times and answers questions about each slot;
for more information please refer to this [paper](https://aaai.org/ocs/index.php/WS/AAAIW18/paper/view/17447/15652))
and **DSTC2-cleaned**
(used by the NBT paper; it fixes ASR noise and typos during training and includes ASR noise during testing;
the cleaned version is available [here](https://github.com/Divye02/baby-jarvis/tree/master/data/dstc2)).

| Model | Area | Food | Price | Joint | Paper / Source |
| ------------- | :-----:| :-----:| :-----:| :-----:| --- |
| Liu et al. (2018) | 90 | 84 | 92 | 72 | [Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems](https://arxiv.org/abs/1804.06512) |
| Neural belief tracker (Mrkšić et al., 2017) | 90 | 84 | 94 | 72 | [Neural Belief Tracker: Data-Driven Dialogue State Tracking](https://arxiv.org/abs/1606.03777) |
| RNN (Henderson et al., 2014) | 92 | 86 | 86 | 69 | [Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation](http://svr-ftp.eng.cam.ac.uk/~sjy/papers/htyo14.pdf) |

### NLU: Slot Filling
The slot filling task aims to recognize key entities (slots) within a user utterance, such as location and time.

#### Snips - Slot Filling
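The leaderboard below reports span-level F1 over BIO-tagged slots. As a rough, illustrative sketch of the input/output format (the utterance and labels are invented for the example, not taken from the Snips release):

```python
# Illustrative sketch only: slot filling as BIO sequence labeling.

tokens = ["play", "hey", "jude", "by", "the", "beatles"]
labels = ["O", "B-track", "I-track", "O", "B-artist", "I-artist"]

def extract_slots(tokens, labels):
    """Collect (slot_type, text) spans from a BIO-labeled token sequence."""
    slots, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_type:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            if current_type:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        slots.append((current_type, " ".join(current_tokens)))
    return slots

print(extract_slots(tokens, labels))
# [('track', 'hey jude'), ('artist', 'the beatles')]
```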
| Model | F1 | Paper / Source |
| ------------- | :-----:| :-----:|
| Enc-dec (focus) + BERT | 97.17 | [Code](https://github.com/sz128/slot_filling_and_intent_detection_of_SLU) |
| Stack-Propagation + BERT (Qin et al., 2019) | 97.0 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| Joint BERT (Chen et al., 2019) | 97.0 | [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/pdf/1902.10909.pdf) |
| BLSTM-CRF + ELMo word embedding | 96.92 | [Code](https://github.com/sz128/slot_filling_and_intent_detection_of_SLU) |
| Stack-Propagation (Qin et al., 2019) | 94.2 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 93.90 | [Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents](https://arxiv.org/pdf/1811.05370.pdf) |
| Capsule Neural Networks (Zhang et al., 2018) | 91.8 | [Joint Slot Filling and Intent Detection via Capsule Neural Networks](https://arxiv.org/pdf/1812.09471.pdf) |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 88.8 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |
| BLSTM-CRF (Siddhant et al., 2018) | 88.78 | [Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents](https://arxiv.org/pdf/1811.05370.pdf) |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 88.3 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |

#### ATIS - Slot Filling
> Notice: The following works report abnormally high scores because they are considered to exploit special pre-processing steps:
> [Bi-model-Decoder (Wang et al., 2018)](https://aclweb.org/anthology/N18-2050), [Intent Gating + Self-atten. (Li et al., 2018)](http://aclweb.org/anthology/D18-1417), [Atten.-Based (Liu and Lane, 2016)](https://arxiv.org/pdf/1609.01454.pdf)

| Model | F1 | Paper / Source |
|:-------------:|:-----:|:-----:|
| Bi-model-Decoder (Wang et al., 2018) | 96.89 | [A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling](https://aclweb.org/anthology/N18-2050) |
| Intent Gating + Self-atten. (Li et al., 2018) | 96.52 | [A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding](http://aclweb.org/anthology/D18-1417) |
| Stack-Propagation + BERT (Qin et al., 2019) | 96.10 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| Joint BERT (Chen et al., 2019) | 96.10 | [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/pdf/1902.10909.pdf) |
| Atomic Concept (Su Zhu and Kai Yu, 2018) | 96.08 | [Concept Transfer Learning for Adaptive Language Understanding](http://aclweb.org/anthology/W18-5047) |
| Atten.-Base + Delexicalization (Shin et al., 2018) | 96.08 | [Slot Filling with Delexicalized Sentence Generation](https://www.isca-speech.org/archive/Interspeech_2018/pdfs/1808.pdf) |
| Atten.-Based (Liu and Lane, 2016) | 95.98 | [Attention-based recurrent neural network models for joint intent detection and slot filling](https://arxiv.org/pdf/1609.01454.pdf) |
| Stack-Propagation (Qin et al., 2019) | 95.90 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| Encoder-Decoder-Pointer (Zhai et al., 2017) | 95.86 | [Neural Models for Sequence Chunking](https://arxiv.org/pdf/1701.04027.pdf) |
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 95.62 | [Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents](https://arxiv.org/pdf/1811.05370.pdf) |
| Capsule Neural Networks (Zhang et al., 2018) | 95.2 | [Joint Slot Filling and Intent Detection via Capsule Neural Networks](https://arxiv.org/pdf/1812.09471.pdf) |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 95.2 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 94.8 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |

### NLU: Intent Detection
The intent detection task aims to classify a user utterance into different domains or intents.

#### Snips - Intent Detection
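The leaderboard below reports utterance-level accuracy. As a rough, illustrative sketch of the task setup (the intent names follow commonly cited Snips intents, but the utterances and predictions are invented):

```python
# Illustrative sketch only: intent detection as single-label utterance classification.

examples = [
    ("will it rain in boston tomorrow", "GetWeather"),
    ("play some jazz music", "PlayMusic"),
    ("book a table for two tonight", "BookRestaurant"),
]

def accuracy(predicted, gold):
    """Fraction of utterances whose predicted intent matches the gold intent."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

gold = [intent for _, intent in examples]
predicted = ["GetWeather", "PlayMusic", "PlayMusic"]  # the last utterance is misclassified

print(accuracy(predicted, gold))  # 0.67
```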
| Model | Acc. | Paper / Source |
| ------------- | :-----:| :-----:|
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 99.29 | [Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents](https://arxiv.org/pdf/1811.05370.pdf) |
| Enc-dec (focus) + ELMo | 99.14 | [Code](https://github.com/sz128/slot_filling_and_intent_detection_of_SLU) |
| Stack-Propagation + BERT (Qin et al., 2019) | 99.0 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| Joint BERT (Chen et al., 2019) | 98.6 | [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/pdf/1902.10909.pdf) |
| Stack-Propagation (Qin et al., 2019) | 98.0 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| Capsule Neural Networks (Zhang et al., 2018) | 97.7 | [Joint Slot Filling and Intent Detection via Capsule Neural Networks](https://arxiv.org/pdf/1812.09471.pdf) |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 97.0 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 96.8 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |

#### ATIS - Intent Detection
> Notice: The following works report abnormally high scores because they are considered to exploit special pre-processing steps:
> [Bi-model-Decoder (Wang et al., 2018)](https://aclweb.org/anthology/N18-2050), [Intent Gating + Self-atten. (Li et al., 2018)](http://aclweb.org/anthology/D18-1417), [Atten.-Based (Liu and Lane, 2016)](https://arxiv.org/pdf/1609.01454.pdf), [BLSTM (Zhang et al., 2016)](https://www.ijcai.org/Proceedings/16/Papers/425.pdf)

| Model | Acc. | Paper / Source |
| ------------- | :-----:| :-----:|
| BLSTM + BERT | 99.10 | [Code](https://github.com/sz128/slot_filling_and_intent_detection_of_SLU) |
| Bi-model-Decoder (Wang et al., 2018) | 98.99 | [A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling](https://aclweb.org/anthology/N18-2050) |
| Intent Gating + Self-atten. (Li et al., 2018) | 98.77 | [A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding](http://aclweb.org/anthology/D18-1417) |
| Atten.-Based (Liu and Lane, 2016) | 98.43 | [Attention-based recurrent neural network models for joint intent detection and slot filling](https://arxiv.org/pdf/1609.01454.pdf) |
| BLSTM (Zhang et al., 2016) | 98.10 | [A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding](https://www.ijcai.org/Proceedings/16/Papers/425.pdf) |
| Joint BERT (Chen et al., 2019) | 97.9 | [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/pdf/1902.10909.pdf) |
| Stack-Propagation + BERT (Qin et al., 2019) | 97.5 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 97.42 | [Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents](https://arxiv.org/pdf/1811.05370.pdf) |
| Stack-Propagation (Qin et al., 2019) | 96.9 | [A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding](https://www.aclweb.org/anthology/D19-1214/) |
| Capsule Neural Networks (Zhang et al., 2018) | 95.0 | [Joint Slot Filling and Intent Detection via Capsule Neural Networks](https://arxiv.org/pdf/1812.09471.pdf) |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 94.1 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 93.6 | [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://www.aclweb.org/anthology/N18-2118) |

## Dataset Introductions
See the data details below or in the [Excel file](https://github.com/AtmaHou/Task-Oriented-Dialogue-Dataset-Survey/blob/master/Atma'sDatasetSurvey.xlsx?raw=true). The following information is included for each dataset:
- Name
- Introduction
- Link (Download & Paper)
- Multi or single turn
- Task detail
- Whether Public Accessible
- Size & Stats
- Included Label
- Missing Label

> Tips: The table below may not be displayed completely; **scroll right** to see more~
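Before the full table, here is a minimal sketch of one catalog record expressed with the fields above (a hypothetical Python representation for illustration only; the values are abridged from the ATIS row of the table):

```python
# Illustrative sketch only: one catalog record using the fields listed above.
# Field names are a hypothetical convention; read the authoritative details from the table.

dataset_record = {
    "name": "ATIS",
    "introduction": "Airline Travel Information Systems dataset, widely used in SLU research",
    "links": {"download": "https://github.com/yvchen/JointSLU/tree/master/data", "paper": None},
    "multi_or_single_turn": "single",
    "task_detail": "Airline travel information",
    "public_accessible": True,
    "size_and_stats": {"train": 4478, "test": 893, "slots": 120, "intents": 21},
    "included_label": ["Intent", "Slots"],
    "missing_label": [],
}
```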
| Name | Introduction | Links | Multi/Single Turn | Task Detail | Public Accessible | Size & Stats | Included Label | Missing Label |
| ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Few-shot Slot Tagging Benchmark | 1. Dialogue slot tagging dataset for the few-shot learning setting<br>2. First few-shot sequence labeling benchmark (meta-episode style data format)<br>3. Also includes 5 NER datasets for few-shot sequence labeling evaluation | Download: https://atmahou.github.io/attachments/ACL2020data.zip<br>Paper: https://arxiv.org/pdf/2006.05702.pdf | S | 7 dialogue tasks:<br>weather, play music, search, add to list, book, movie<br>5 NER tasks | Yes | For each task, it contains 100 episodes.<br>Each episode contains a query set (20 samples) and a support set (1-shot & 5-shot) | Slots | Intent |
| Taskmaster-2 (2020) | 1. Unlike Taskmaster-1, which includes both written "self-dialogs" and spoken two-person dialogs, Taskmaster-2 consists entirely of spoken two-person dialogs<br>2. Users were led to believe they were interacting with an automated system that "spoke" using text-to-speech (TTS)<br>3. Intents are labeled on slots | Download: https://github.com/google-research-datasets/Taskmaster/tree/master/TM-2-2020/data<br>Homepage: https://github.com/google-research-datasets/Taskmaster/tree/master/TM-2-2020 | M | 7 domains:<br>restaurants, food ordering, movies, hotels, flights, music, sports | Yes | 17,289 dialogs:<br>restaurants (3,276)<br>food ordering (1,050)<br>movies (3,047)<br>hotels (2,355)<br>flights (2,481)<br>music (1,602)<br>sports (3,478) | NLU (Intent, Slots) | |
| JDDC Corpus 2020 | 1. A large-scale multimodal Chinese e-commerce conversation corpus<br>2. Human2Human conversations | Download: https://jddc.jd.com/auth_environment<br>Homepage: https://jddc.jd.com/description | M | Multimodal e-commerce conversation | Yes | Electronics: 130k dialogues, 950k utterances, 215k images.<br>Clothing: 116k dialogues, 810k utterances, 200k images. | Intents (only on images),<br>Database | NLU (Intent, Slots) |
| CrossWOZ | 1. CrossWOZ, the first large-scale Chinese cross-domain Wizard-of-Oz task-oriented dataset<br>2. Encourages natural transitions across domains in conversation<br>3. Provides a user simulator<br>4. Human2Human | Download: https://github.com/thu-coai/CrossWOZ<br>Paper: https://arxiv.org/pdf/2002.11893.pdf | M | 5 domains, including hotel, restaurant, attraction, metro, and taxi | Yes | 5,012 dialogues,<br>84,692 turns,<br>16.9 avg. turns,<br>Annotation:<br>72 slots, 7,871 values, 6 intents | User Goals,<br>State (Intent, Slots),<br>Database | API calls |
| JDDC Corpus 2019 | 1. A large-scale real-scenario Chinese e-commerce conversation corpus<br>2. Human2Human conversations covering task-oriented dialogue, chitchat, and question answering<br>3. Large scale: 1 million multi-turn dialogues, 20 million utterances<br>4. Main task: dialogue generation | Download: http://jddc.jd.com/auth_environment<br>Paper: https://arxiv.org/pdf/1911.09969.pdf | M | E-commerce conversation | Yes | Total: 1 million dialogues, 20 million utterances.<br>Annotation: 289 different intents.<br>Challenge 1: 300 dialogues, 300 questions;<br>Challenge 2: 15 dialogues, 168 questions;<br>Challenge 3: 108 dialogues, 500 questions | Intent (machine-labeled),<br>Database | Slot |
| CAIS | 1. Dialogue utterances from Chinese Artificial Intelligence Speakers (CAIS), annotated with slot tags and intent labels | Download: https://github.com/Adaxry/CM-Net<br>Paper: https://www.aclweb.org/anthology/D19-1097.pdf | S | Mostly music-related tasks | Yes | Train 7,995;<br>Dev 994;<br>Test 1,012;<br>11 intents, 75 slots | Intent<br>Slots | |
| Multimodal Dialogs (MMD) Dataset | 1. Multimodal conversations in the fashion domain<br>2. Human2Human<br>3. Contains annotation of query type (similar to intent)<br>4. Large size: 150K conversations | Download: https://amritasaha1812.github.io/MMD/<br>Paper: https://arxiv.org/abs/1704.00200 | M | Shopping assistant | Yes | 150K conversation sessions | Question type (Intent)<br>State type (17 types of dialogue state classes) | Slot |
| Taskmaster-1 (2019) | 1. A task-based dataset collected with two different procedures: Wizard-of-Oz and self-dialogs<br>2. Encourages realism and diversity by not restricting speakers to a knowledge base<br>3. Both Human2Machine and Human2Human dialogues | Download: https://g.co/dataset/taskmaster-1<br>Paper: https://arxiv.org/pdf/1909.05358.pdf | M | 6 domains: ordering pizza, creating auto repair appointments, setting up rides, ordering movie tickets, ordering coffee drinks, and making restaurant reservations | Yes | Human-Human: 7,708 dialogues, 169,469 utterances<br>Human-Machine: 10,438 dialogues, 132,610 utterances | API calls,<br>Arguments (Slots) | Intent |
| MetaLWOz | 1. Dialogue dataset for developing fast adaptation methods for conversation models (track in DSTC8)<br>2. Many domains and tasks: 47 domains and 227 tasks<br>3. Suitable for meta-learning<br>4. Main task: dialogue generation | Download: https://www.microsoft.com/en-us/download/58389<br>Homepage: https://www.microsoft.com/en-us/research/project/metalwoz/ | M | 47 domains and 227 tasks | Yes | 37,884 dialogues (>10 turns long),<br>47 domains and 227 tasks | Only utterances | NLU (Intent, Slots) |
| Minecraft Dialogue Corpus | 1. The goal of this project is to develop systems that can collaborate and communicate with each other to solve tasks in a 3D environment<br>2. Human2Human<br>3. Main task: given the context and 3D scene, generate the response | Download: http://juliahmr.cs.illinois.edu/Minecraft/<br>Paper: https://www.aclweb.org/anthology/P19-1537.pdf | M | An "Architect" instructs a "Builder" to build a 3D structure | Yes | 509 human-human dialogues;<br>15,926 utterances (train 6,548, dev 2,855, test 2,251 Architect utterances) | Golden utterances,<br>game logs,<br>screenshots | NLU (Intent, Slots) |
| E-commerce Dialogue Corpus (EDC) | 1. Real-world conversations between customers and customer service staff from e-commerce partners on Taobao<br>2. Main task: response selection | Download: https://github.com/cooelf/<br>Paper: https://arxiv.org/pdf/1806.09102.pdf | M | Contains 5 types of conversations: commodity consultation, logistics express, recommendation, negotiation, and chitchat, based on over 20 commodities | Yes | Dialogues 1,020,000,<br>Utterances 7,500,000 | Only utterances | NLU (Intent, Slots) |
| Schema-Guided Dialogue State Tracking (DSTC8) | 1. Largest to date, containing over 16k multi-domain conversations spanning 16 domains<br>2. Presents a schema-guided paradigm<br>3. Enables zero-shot generalization to new APIs | Download: https://github.com/google-research-datasets/dstc8-schema-guided-dialogue<br>Paper: https://arxiv.org/pdf/1909.05855.pdf | M | 16 domains: Alarm, Banks, Buses, Calendar, Events, Flights, Homes, Media, Messaging, Movies, etc. | Yes | Over 16k dialogues; the average number of turns is 20.44 for multi-domain dialogues; 329,964 turns in total | The schema for each service contains:<br>service_name and description,<br>slots,<br>intents | - |
| MultiWOZ 2.0 | 1. Proposed by the EMNLP 2018 best paper<br>2. Largest at the time & multi-domain<br>3. Human2Human<br>4. Goal changes are encouraged | Download: http://dialogue.mi.eng.cam.ac.uk/index.php/corpus/<br>Paper: https://arxiv.org/pdf/1810.00278.pdf | M | 7 domains:<br>Attraction, Hospital, Police, Hotel, Restaurant, Taxi, Train | Yes | Total 10,438 dialogues;<br>the average number of turns is 8.93 and 15.39 for single- and multi-domain dialogues respectively;<br>115,434 turns in total | Belief state<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | NLU (Intent, Slots) |
| Facebook Multilingual Task Oriented Dataset | 1. (Facebook) We release a dataset of around 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) for three task-oriented domains … ALARM, REMINDER, and WEATHER<br>2. For cross-lingual natural language understanding | Download: https://fb.me/multilingual_task_oriented_data<br>Paper: https://arxiv.org/pdf/1810.13327.pdf | S | 3 domains: Alarm, Reminder, Weather<br>3 languages: English, Spanish, Thai | Yes | English Train: 30,521<br>English Dev: 4,181<br>English Test: 8,621<br>Spanish Train: 3,617<br>Spanish Dev: 1,983<br>Spanish Test: 3,043<br>Thai Train: 2,156<br>Thai Dev: 1,235<br>Thai Test: 1,692 | Slot<br>Intent | |
| Medical DS | 1. Collected from the pediatric department of a Chinese online healthcare community<br>2. Task-oriented dialogue system for automatic diagnosis | Download: http://www.sdspeople.fudan.edu.cn/zywei/data/acl2018-mds.zip<br>Paper: http://www.sdspeople.fudan.edu.cn/zywei/paper/liu-acl2018.pdf | M | Automatic diagnosis | Yes | 4 diseases,<br>67 symptoms | Slot<br>Action | |
| Snips | 1. Collected by Snips for model evaluation<br>2. For natural language understanding<br>3. Homepage: https://medium.com/snips-ai/benchmarking-natural-language-understanding-systems-google-facebook-microsoft-and-snips-2b8ddcf9fb19 | Download: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06-custom-intent-engines | S | 7 tasks:<br>weather, play music, search, add to list, book, movie | Yes | Train: 13,084<br>Test: 700<br>7 intents, 72 slot labels | Intent<br>Slots | |
| MIT Restaurant Corpus | 1. A semantically tagged training and test corpus in BIO format<br>2. For natural language understanding | Download: https://groups.csail.mit.edu/sls/downloads/restaurant/ | S | Restaurant | Yes | Train 6,894, Dev 766, Test 1,521 | Slot | Intent |
| MIT Movie Corpus | 1. A semantically tagged training and test corpus in BIO format. The eng corpus contains simple queries, and the trivia10k13 corpus contains more complex queries<br>2. For natural language understanding | Download: https://groups.csail.mit.edu/sls/downloads/movie/ | S | Movie | Yes | Train, Dev, Test:<br>MIT Movie Eng 8,798 / 977 / 2,443<br>MIT Movie Trivia 7,035 / 781 / 1,953<br>(refer to: Data Augmentation for Spoken Language Understanding via Joint Variational Generation) | Slot | Intent |
| ATIS | 1. The ATIS (Airline Travel Information Systems) dataset (Tur et al., 2010) is widely used in SLU research<br>2. For natural language understanding | Download:<br>1. https://github.com/AtmaHou/Bi-LSTM_PosTagger/tree/master/data<br>2. https://github.com/yvchen/JointSLU/tree/master/data | S | Airline travel information | Yes | Train: 4,478<br>Test: 893<br>120 slots and 21 intents | Intent<br>Slots | |
| Microsoft Dialogue Challenge | 1. Contains human-annotated conversational data in three domains<br>2. Provides an experiment platform with built-in simulators in each domain, for training and evaluation purposes | Paper: https://arxiv.org/pdf/1807.11125.pdf | M | Movie-ticket booking<br>Restaurant reservation<br>Taxi ordering | Yes | Task / Intents / Slots / Dialogues:<br>Movie-Ticket Booking 11 / 29 / 2,890<br>Restaurant Reservation 11 / 30 / 4,103<br>Taxi Ordering 11 / 29 / 3,094 | Intent<br>Slots | Database<br>API call |
| CamRest676 | The CamRest676 Human2Human dataset contains the following three JSON files:<br>1. CamRest676.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn<br>2. CamRestDB.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes<br>3. The ontology file, specifying all the values the three informable slots can take | Download: https://www.repository.cam.ac.uk/handle/1810/260970<br>Paper: https://arxiv.org/abs/1604.04562 | M | Booking a restaurant | Yes | Total 676 dialogues,<br>Total 1,500 turns,<br>Train:Dev:Test 3:1:1 (test set not given) | Slot<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | Intent<br>API call<br>Database |
| Human-human goal-oriented dataset | 1. Maluuba released a travel booking dataset<br>2. Designed for a new task: frame tracking (allows comparison between entities in the dialogue history)<br>3. Homepage: https://datasets.maluuba.com/Frames<br>4. Human2Human | Download: https://datasets.maluuba.com/Frames/dl<br>Paper: https://arxiv.org/abs/1706.01690<br>https://1drv.ms/b/s!Aqj1OvgfsHB7dsg42yp2BzDUK6U | M | Travel booking | Yes | Dialogues 1,369<br>Turns 19,986<br>Average user satisfaction (1-5): 4.58 | Frame<br>User agenda<br>User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>API Call<br>User satisfaction<br>Task success<br>Database<br>Entity reference | Intent |
| Dialog bAbI tasks data | 1. Facebook's task-oriented dialogue dataset, consisting of 6 different tasks<br>2. The data for tasks 1-5 is constructed automatically from bot-to-bot chat (Bot2Bot), and the data for task 6 is simply the reformatted DSTC2 dataset<br>3. A shared database is included<br>4. This is the only task-oriented dataset among the bAbI tasks<br>5. Its goal is to evaluate end-to-end systems, so there are no intents or slots | Download: https://research.fb.com/downloads/babi/<br>Paper: http://arxiv.org/abs/1605.07683 | M | Book a table at a restaurant | Yes | For each task:<br>train 1,000,<br>dev 1,000,<br>test 1,000.<br>For tasks 1-5, a second test set (with suffix -OOV.txt) contains dialogues with entities not present in the training set | API call<br>Full database | Slot<br>Intent<br>User Act<br>Agent Act |
| Stanford Dialog Dataset | 1. The Stanford NLP group's dataset for an in-car assistant agent<br>2. Human2Human<br>3. A quick intro: http://m.sohu.com/n/499803391/ | Download: http://nlp.stanford.edu/projects/kvret/kvret_dataset_public.zip<br>Paper: https://arxiv.org/abs/1705.05414 | M | In-car assistant agent: schedule, weather, navigation | Yes | Training dialogues 2,425<br>Validation dialogues 302<br>Test dialogues 304<br>Avg. # of utterances per dialogue 5.25 | Dialogue-level database<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | API call<br>Intent<br>Slot |
| Stanford Dialog Dataset LU | 1. The Stanford data relabeled by HIT with slots & intents<br>2. Human2Human<br>3. A quick intro to the Stanford data: http://m.sohu.com/n/499803391/<br>4. Annotation handbook: https://docs.google.com/document/d/1ROARKf8AJNnG2_nPINe1Xm5Rza7V0jPnQV8io09hcFY/edit | N/A | M | In-car assistant agent: schedule, weather, navigation | No | Training dialogues 2,425<br>Validation dialogues 302<br>Test dialogues 304<br>Avg. # of utterances per dialogue 5.25 | Slot<br>Intent | API call<br>Sample alignment is needed to get the following:<br>Dialogue-level database<br>User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>Agent reply |
| DSTC-2 | 1. Human2Bot restaurant booking dataset<br>2. For usage refer to: http://camdial.org/~mh521/dstc/downloads/handbook.pdf<br>3. Each dialogue is stored in a separate folder, which contains a log and a label file | http://camdial.org/~mh521/dstc/ | M | Booking a restaurant | Yes | Train 1,612 calls<br>Dev 506 calls<br>Test 1,117 dialogs | Slot<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | Intent<br>API call<br>Database |
| DSTC4 | 1. The dataset, named TourSG, consists of 35 dialog sessions on touristic information for Singapore, collected from Skype calls between three tour guides and 35 tourists<br>2. All recorded dialogs, 21 hours in total, have been manually transcribed and annotated with speech act and semantic labels at the turn level<br>3. Homepage: http://www.colips.org/workshop/dstc4/data.html<br>4. Human2Human | N/A | M | Querying touristic information | No | Train 20 dialogs<br>Test 15 dialogs | Speech act (User & Agent)<br>Semantic labels (Intent? User & Agent)<br>Topic per turn (Intent?) | N/A |
| Movie Booking Dataset | 1. (Microsoft) Raw conversational data collected via Amazon Mechanical Turk, with annotations provided by domain experts<br>2. Human2Human | Download: https://github.com/MiuLab/TC-Bot#data<br>Paper: TC-bot | M | Booking a movie | Yes | 280 dialogues,<br>approximately 11 turns per dialogue | User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>Intent<br>Slots | Database<br>API call |
| Lingxi | 1. Single-turn user inputs that are already word-segmented; the data is relatively noisy<br>2. Part-of-speech tagging and slot labeling are completed<br>3. Language: Chinese | N/A | S | Conversational robot service user logs | No | Utterances: 5,132 | Slot<br>POS | Agent reply<br>Intent<br>API call<br>Database |
| TOP semantic parsing | 1. (Facebook) A hierarchical semantic representation for task-oriented dialog systems that can model compositional and nested queries (hierarchical intents and slots)<br>2. For natural language understanding<br>3. Human2Bot | Download: http://fb.me/semanticparsingdialog<br>Paper: https://arxiv.org/pdf/1810.07942.pdf | S | Navigation and events | Yes | Train 31,279 utterances<br>Dev 4,462 utterances<br>Test 9,042 utterances | Hierarchical intents<br>Slots | |
| SwDA | 1. The Switchboard Dialog Act Corpus (SwDA) extends the Switchboard-1 Telephone Speech Corpus, Release 2, with turn/utterance-level dialog-act tags<br>2. The tags summarize syntactic, semantic, and pragmatic information about the associated turn. The SwDA project was undertaken at UC Boulder in the late 1990s | Download: http://compprag.christopherpotts.net/swda.html<br>Instructions: https://web.stanford.edu/~jurafsky/ws97/manual.august1.html | S | Switchboard dialogs | Yes | Train: 197,489 utterances, 1,115 conversations<br>Test: 40 conversations<br>Annotation: 42 classes | Act | Slot |
## Acknowledgment

Thanks for the support from my adviser [Wanxiang Che](http://ir.hit.edu.cn/~car/english.htm).
Thanks for **public contributions** from:
[Shuai Lin](https://github.com/ha-lins),
[JiAnge](https://github.com/linjian93),
[Su Zhu](https://github.com/sz128),
[seeledu](https://github.com/seeledu),
[Tony Lin](https://github.com/tnlin),
[Jason Krone](https://github.com/jasonkrone),
[Libo Qin](https://github.com/yizhen20133868),
[HariiHe](https://github.com/HariiHe),
[Jelle Bosscher](https://github.com/jellebosscher).