https://github.com/victorsungo/MMDialog

The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

Host: GitHub
URL: https://github.com/victorsungo/MMDialog
Owner: victorsungo
Created: 2022-11-09T09:48:21.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-09-03T08:31:01.000Z (almost 2 years ago)
Last Synced: 2024-11-16T07:33:23.866Z (8 months ago)
Topics: chat, dataset
Language: Python
Homepage:
Size: 2.86 MB
Stars: 190
Watchers: 4
Forks: 7
Open Issues: 1
Metadata Files:
- Readme: Readme.md

# MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation #

This repository is the official site of the ACL'23 paper: [MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation](https://aclanthology.org/2023.acl-long.405/)

## About the dataset

**A Dialogue Case of MMDialog:**

[Figure: a dialogue case from the dataset]

**Statistics:**

[Figure: dataset statistics]

If you use it in your work, please cite our paper:
[![LINK](https://img.shields.io/badge/-Paper%20Link-lightgrey)](https://aclanthology.org/2023.acl-long.405/) [![PDF](https://img.shields.io/badge/-PDF-red)](https://aclanthology.org/2023.acl-long.405.pdf)

```
@inproceedings{feng-etal-2023-mmdialog,
title = "{MMD}ialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation",
author = "Feng, Jiazhan and Sun, Qingfeng and Xu, Can and Zhao, Pu and Yang, Yaming and Tao, Chongyang and Zhao, Dongyan and Lin, Qingwei",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.405",
doi = "10.18653/v1/2023.acl-long.405",
pages = "7348--7363"
}
```

**Dataset Folder Format:**

[Figure: dataset folder format]

**File: conversations.json**

[Figure: a dialogue case in conversations.json]

**Note:**
1. The training set does not contain "negative_candidate_media_keys" or "negative_candidate_texts"; these fields exist only in the test and validation sets. Each "negative_candidate_xxx" field contains 999 negative candidates for the retrieval task.
2. All image filenames are in "media_key.jpg" format.
3. Words like :smiling_face_with_smiling_eyes: and :raising_hands: are emotion (emoji) tokens; please refer to https://github.com/carpedm20/emoji
4. To compute the CLIP scores in the MM-Relevance metric, we provide a demo in [compute_mmrel.py](compute_mmrel.py).
5. We also provide an evaluation example for metrics evaluated within a single modality (e.g., BLEU, Recall) in [EvaluationExample.md](EvaluationExample.md).
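
The notes above can be illustrated with a small Python sketch. It is not taken from the repository: the helper names (`image_path`, `find_emotion_tokens`, `recall_at_k`), the `images` directory, the example media key, and the scores are all hypothetical, and the ranking function simply assumes that higher scores mean better matches. Only the "media_key.jpg" naming, the `:name:` token form, and the pool of 999 negative candidates come from the notes.

```python
import re
from pathlib import Path

# Heuristic assumption: emotion tokens look like :lowercase_name:, as in the
# two examples given in the notes. Other colon-delimited text may also match.
EMOTION_TOKEN = re.compile(r":[a-z0-9_]+:")


def image_path(image_dir, media_key):
    """Build the image path for a media key (filenames are "media_key.jpg")."""
    return Path(image_dir) / f"{media_key}.jpg"


def find_emotion_tokens(text):
    """Return the emotion tokens (e.g. ":raising_hands:") found in a text turn."""
    return EMOTION_TOKEN.findall(text)


def recall_at_k(positive_score, negative_scores, k):
    """Return 1.0 if the positive candidate ranks within the top k, else 0.0.

    Assumes higher scores are better; ties are counted against the positive.
    """
    rank = 1 + sum(score >= positive_score for score in negative_scores)
    return 1.0 if rank <= k else 0.0


if __name__ == "__main__":
    # Hypothetical directory name and media key.
    print(image_path("images", "abc123"))
    print(find_emotion_tokens("Well done :raising_hands: :smiling_face_with_smiling_eyes:"))
    # One positive scored against 999 negatives, using made-up scores.
    negatives = [i / 1000 for i in range(999)]
    print(recall_at_k(0.9975, negatives, k=5))
```

Averaging `recall_at_k` over all test dialogues would give a Recall@k figure under these assumptions; for the authors' own evaluation procedure, see EvaluationExample.md.
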
## How to get the dataset

### To get this dataset, you and your organization must meet the following requirements:
1. Who it's for: You are a master’s student, doctoral candidate, post-doc, faculty member, or research-focused employee at an academic institution or university.
2. Non-commercial use: You may use this access only for non-commercial purposes.
3. Clear plan: You have a clearly defined research objective and specific plans for how you intend to use and analyze this data in your research.
4. Sharing limitation: You promise not to share this dataset without our qualification review and permission.

If you don't meet **all of the requirements** above, we **will not** share the dataset with you.

### We need you to fill in the form below:

| Item | Description |
| ----------- | ----------- |
| Your Name | [Your name here] |
| Your Role | [master’s student / doctoral candidate / post-doc / faculty / research-focused employee / others] |
| Your Study or Work Organization | e.g. Microsoft Research, DeepMind, Cornell University, ... |
| Your Personal Academic Homepage **With Publications** | Your [Google Scholar] or [Homepage_URL running on your organization website (e.g. yourname.people.xxx.edu / yourname.xxx.people.msr.microsoft.com)] with publications. |
| Non-commercial Use | I [promise / cannot promise] that I will not apply this MMDialog dataset to commercial scenarios or products. |
| Sharing Limitation | I [promise / cannot promise] I would not share this MMDialog dataset without your qualification review and permission. |
| Your Plan | (Describe your research plan and how you intend to use and analyze this data in your research. **>= 50 words**) |

Then use your **edu or research email account** to send the form to [[email protected]] for review. If you meet **all** the requirements, we will share with you a cloud folder that stores the pre-processed dataset **within a week**.
