{"id":13754332,"url":"https://github.com/victorsungo/MMDialog","last_synced_at":"2025-05-09T22:31:55.261Z","repository":{"id":63203032,"uuid":"563769112","full_name":"victorsungo/MMDialog","owner":"victorsungo","description":"The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation","archived":false,"fork":false,"pushed_at":"2023-09-03T08:31:01.000Z","size":3000,"stargazers_count":190,"open_issues_count":1,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-16T07:33:23.866Z","etag":null,"topics":["chat","dataset"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/victorsungo.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-09T09:48:21.000Z","updated_at":"2024-10-14T03:19:50.000Z","dependencies_parsed_at":"2024-08-03T09:17:18.228Z","dependency_job_id":null,"html_url":"https://github.com/victorsungo/MMDialog","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorsungo%2FMMDialog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorsungo%2FMMDialog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorsungo%2FMMDialog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorsungo%2FMMDialog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/victorsungo","download_url":"https://codeload.github.com/victorsungo/MMDialog/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253335805,"owners_count":21892738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chat","dataset"],"created_at":"2024-08-03T09:01:55.149Z","updated_at":"2025-05-09T22:31:50.239Z","avatar_url":"https://github.com/victorsungo.png","language":"Python","funding_links":[],"categories":["NLP语料和数据集","Python"],"sub_categories":["其他_文本生成、文本对话"],"readme":"\r\n# MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation #\r\n\r\n\r\n\r\nThis repository is the official site of ACL'23 paper: [MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation](https://aclanthology.org/2023.acl-long.405/)\r\n\r\n## About the dataset\r\n\r\n**A Dialogue Case of MMDialog:**\r\n\r\n\u003cimg title=\"Dataset ADialogueCase\" alt=\"Dataset ADialogueCase\" src=\"./ADialogueCase.PNG\" style=\"height: 800px;\"/\u003e\r\n\r\n\r\n**Statistics:**\r\n\r\n\u003cimg title=\"Dataset Statistics\" alt=\"Dataset Statistics\" src=\"./DatasetStatistics_1.png\" style=\"height: 260px;\"/\u003e\r\n\r\n\u003cimg title=\"Dataset Statistics\" alt=\"Dataset Statistics\" src=\"./DatasetStatistics_2.png\" style=\"height: 260px;\"/\u003e\r\n\r\nIf you use it in your work, please cite our paper:\r\n [![LINK](https://img.shields.io/badge/-Paper%20Link-lightgrey)](https://aclanthology.org/2023.acl-long.405/) 
[![PDF](https://img.shields.io/badge/-PDF-red)](https://aclanthology.org/2023.acl-long.405.pdf)\r\n\r\n```\r\n@inproceedings{feng-etal-2023-mmdialog,\r\n    title = \"{MMD}ialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation\",\r\n    author = \"Feng, Jiazhan and Sun, Qingfeng and Xu, Can and Zhao, Pu and Yang, Yaming and Tao, Chongyang and Zhao, Dongyan and Lin, Qingwei\",\r\n    booktitle = \"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\r\n    month = jul,\r\n    year = \"2023\",\r\n    address = \"Toronto, Canada\",\r\n    publisher = \"Association for Computational Linguistics\",\r\n    url = \"https://aclanthology.org/2023.acl-long.405\",\r\n    doi = \"10.18653/v1/2023.acl-long.405\",\r\n    pages = \"7348--7363\"\r\n}\r\n```\r\n\r\n**Dataset Folder Format:**\r\n\r\n\u003cimg title=\"Dataset Format\" alt=\"Dataset Format\" src=\"./DatasetTree.png\" style=\"height: 360px;\"/\u003e\r\n\r\n**File: conversations.json**\r\n\r\n\u003cimg title=\"Dialogue Case\" alt=\"Dialogue Case\" src=\"./ConvCase.png\"\u003e\r\n\r\n**Note:** \r\n1. The training set does not contain \"negative_candidate_media_keys\" and \"negative_candidate_texts\", which exist only in the test and validation sets. Each \"negative_candidate_xxx\" field contains 999 negative candidates for the retrieval task.\r\n2. All image filenames are in \"media_key.jpg\" format.\r\n3. Words like :smiling_face_with_smiling_eyes: and :raising_hands: are emoji tokens; please refer to https://github.com/carpedm20/emoji\r\n4. To compute the CLIP scores used in the MM-Relevance metric, we provide a demo in [compute_mmrel.py](compute_mmrel.py).\r\n5. We also provide an evaluation example for metrics evaluated within a single modality (e.g., BLEU, Recall) in [EvaluationExample.md](EvaluationExample.md).\r\n\r\n## How to get the dataset\r\n\r\n### To get this dataset, you and your organization must meet the following requirements:\r\n1. 
Who it's for: You are a master’s student, doctoral candidate, post-doc, faculty member, or research-focused employee at an academic institution or university.\r\n2. Non-commercial use: You must use this access only for non-commercial purposes.\r\n3. Clear plan: You have a clearly defined research objective and specific plans for how you intend to use and analyze this data in your research. \r\n4. Sharing limitation: You must promise not to share this dataset without our qualification review and permission.\r\n\r\nIf you don't meet **all of the requirements** above, we **will not** share the dataset with you.\r\n\r\n### We need you to fill in the form below:\r\n\r\n| Item      | Description |\r\n| ----------- | ----------- |\r\n| Your  Name      | [Your name here]       |\r\n| Your  Role      | [master’s student / doctoral candidate / post-doc / faculty / research-focused employee / others]       |\r\n| Your  Study or Work Organization | e.g. Microsoft Research, DeepMind, Cornell University, ...       |\r\n| Your  Personal Academic Homepage **With Publications** | Your [Google Scholar] or [Homepage_URL running on  your organization website (e.g. yourname.people.xxx.edu / yourname.xxx.people.msr.microsoft.com)] with publications. |\r\n| Non-commercial Use  | I [promise / cannot promise] that I will not apply this MMDialog dataset to commercial scenarios or products.  |\r\n| Sharing Limitation  | I [promise / cannot promise] that I will not share this MMDialog dataset without your qualification review and permission.  |\r\n| Your Plan      | (Describe your research plan and how you intend to use and analyze this data in your research. 
**\u003e= 50 words**)   |\r\n\r\nThen use your **edu or research email account** to send the form to [fengjiazhan@pku.edu.cn] for review. If you meet **all** the requirements, we will share a cloud folder containing the pre-processed dataset with you **within a week**.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictorsungo%2FMMDialog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvictorsungo%2FMMDialog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictorsungo%2FMMDialog/lists"}