https://github.com/victorsungo/MMDialog
The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
- Host: GitHub
- URL: https://github.com/victorsungo/MMDialog
- Owner: victorsungo
- Created: 2022-11-09T09:48:21.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-03T08:31:01.000Z (about 1 year ago)
- Last Synced: 2024-06-24T05:40:36.365Z (5 months ago)
- Topics: chat, dataset
- Language: Python
- Homepage:
- Size: 2.86 MB
- Stars: 179
- Watchers: 4
- Forks: 7
- Open Issues: 1
Metadata Files:
- Readme: Readme.md
Awesome Lists containing this project
- StarryDivineSky - victorsungo/MMDialog
README
# MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation #
This repository is the official site of the ACL'23 paper: [MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation](https://aclanthology.org/2023.acl-long.405/)
## About the dataset
**A Dialogue Case of MMDialog:**
**Statistics:**
If you use it in your work, please cite our paper:
[![LINK](https://img.shields.io/badge/-Paper%20Link-lightgrey)](https://aclanthology.org/2023.acl-long.405/) [![PDF](https://img.shields.io/badge/-PDF-red)](https://aclanthology.org/2023.acl-long.405.pdf)
```
@inproceedings{feng-etal-2023-mmdialog,
    title = "{MMD}ialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation",
    author = "Feng, Jiazhan and Sun, Qingfeng and Xu, Can and Zhao, Pu and Yang, Yaming and Tao, Chongyang and Zhao, Dongyan and Lin, Qingwei",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.405",
    doi = "10.18653/v1/2023.acl-long.405",
    pages = "7348--7363"
}
```
**Dataset Folder Format:**
**File: conversations.json**
**Note:**
1. The training set does not contain "negative_candidate_media_keys" or "negative_candidate_texts"; these fields exist only in the test and validation sets. Each "negative_candidate_xxx" field contains 999 negative candidates for the retrieval task.
2. All image filenames are in "media_key.jpg" format.
3. Words like :smiling_face_with_smiling_eyes: and :raising_hands: are emotion (emoji) tokens; please refer to https://github.com/carpedm20/emoji
4. To compute the CLIP scores for the MM-Relevance metric, we provide a demo in [compute_mmrel.py](compute_mmrel.py).
5. We also provide an evaluation example for metrics evaluated within a single modality (e.g., BLEU, Recall) in [EvaluationExample.md](EvaluationExample.md).
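
As a rough illustration of Notes 1 to 3, the sketch below shows how a loader might map a media key to its image file, check whether a record carries the negative-candidate fields, and spot emoji tokens in text. It is not part of the official repository: the helper names, the `image_dir` argument, the regular expression, and the assumption that a record is a plain dictionary are all hypothetical. Only the two field names and the "media_key.jpg" filename format come from the notes above.

```python
import re
from pathlib import Path

# Field names taken from Note 1; they appear only in the test and validation sets.
NEGATIVE_FIELDS = ("negative_candidate_media_keys", "negative_candidate_texts")

# Assumed pattern for emoji tokens such as :raising_hands: (Note 3).
# It is a heuristic and can also match things like ":30:" inside "10:30:45".
EMOJI_TOKEN = re.compile(r":[a-z0-9_+\-]+:")


def image_path(image_dir, media_key):
    """Build the image path for a media key, using the "media_key.jpg" format (Note 2)."""
    return Path(image_dir) / f"{media_key}.jpg"


def has_negative_candidates(record):
    """Return True when a record (assumed to be a dict) has both negative-candidate fields."""
    return all(field in record for field in NEGATIVE_FIELDS)


def find_emoji_tokens(text):
    """Return the emoji tokens found in a piece of text, in order of appearance."""
    return EMOJI_TOKEN.findall(text)


if __name__ == "__main__":
    # Hypothetical values, used only to exercise the helpers.
    print(image_path("images", "abc123").name)
    print(has_negative_candidates({"negative_candidate_texts": []}))
    print(find_emoji_tokens("Well done :raising_hands:"))
```

A check like `has_negative_candidates` lets the same loading code handle the training split and the evaluation splits without branching on the split name.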
## How to get the dataset
### To get this dataset, you and your organization must meet the following requirements:
1. Who it's for: You are either a master’s student, doctoral candidate, post-doc, faculty, or research-focused employee at an academic institution or university.
2. Non-commercial use: You should only use this access for non-commercial purposes.
3. Clear plan: You have a clearly defined research objective and specific plans for how you intend to use and analyze this data in your research.
4. Sharing commitment: You promise not to share this dataset without our qualification review and permission.

If you don't meet **all of the requirements** above, we **will not** share the dataset with you.
### We need you to fill in the form below:
| Item | Description |
| ----------- | ----------- |
| Your Name | [Your name here] |
| Your Role | [master’s student / doctoral candidate / post-doc / faculty / research-focused employee / others] |
| Your Study or Work Organization | e.g. Microsoft Research, DeepMind, Cornell University, ... |
| Your Personal Academic Homepage **With Publications** | Your [Google Scholar] or [Homepage_URL running on your organization website (e.g. yourname.people.xxx.edu / yourname.xxx.people.msr.microsoft.com)] with publications. |
| Non-commercial Use | I [promise / cannot promise] that I will not apply this MMDialog dataset to commercial scenarios or products. |
| Sharing Limitation | I [promise / cannot promise] I would not share this MMDialog dataset without your qualification review and permission. |
| Your Plan | (Describe your research plan and how you intend to use and analyze this data in your research. **>= 50 words**) |

Then use your **edu or research email account** to send the form to [[email protected]] for review. If you meet **all** the requirements, we will share a cloud folder containing the pre-processed dataset **within a week**.