{"id":13486363,"url":"https://github.com/microsoft/GODEL","last_synced_at":"2025-03-27T20:33:05.094Z","repository":{"id":37912232,"uuid":"490885207","full_name":"microsoft/GODEL","owner":"microsoft","description":"Large-scale pretrained models for goal-directed dialog","archived":false,"fork":false,"pushed_at":"2023-12-10T04:20:12.000Z","size":52249,"stargazers_count":865,"open_issues_count":29,"forks_count":112,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-03-20T12:14:24.167Z","etag":null,"topics":["conversational-ai","data-processing","dialogpt","dialogue","dialogue-systems","grounded-generation","language-grounding","language-model","machine-learning","pretrained-model","pytorch","text-data","text-generation","transformer","transformers"],"latest_commit_sha":null,"homepage":"http://aka.ms/GODEL","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-10T22:58:14.000Z","updated_at":"2025-03-13T15:00:14.000Z","dependencies_parsed_at":"2024-07-23T01:35:08.044Z","dependency_job_id":"e641132e-787b-49b1-9366-b2d7b1f3225c","html_url":"https://github.com/microsoft/GODEL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FGODEL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FGODEL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/microsoft%2FGODEL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FGODEL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/GODEL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245858861,"owners_count":20684061,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversational-ai","data-processing","dialogpt","dialogue","dialogue-systems","grounded-generation","language-grounding","language-model","machine-learning","pretrained-model","pytorch","text-data","text-generation","transformer","transformers"],"created_at":"2024-07-31T18:00:44.671Z","updated_at":"2025-03-27T20:33:05.088Z","avatar_url":"https://github.com/microsoft.png","language":"Python","readme":"# GODEL: Large-Scale Pre-Training for Goal-Directed Dialog\n\n## News\n\n(Update 10/23/2022) We have released GODEL V1.1, which is trained on 551M multi-turn dialogs from Reddit discussion threads and 5M instruction- and knowledge-grounded dialogs. It has shown significantly better results on our benchmark, especially in the zero-shot setting.\n\nPlease check out our model cards on the Hugging Face Hub. With a few lines of code, it is straightforward to chat with GODEL. 
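Chatting with the released checkpoints boils down to flattening an instruction, the grounding text, and the dialog history into a single seq2seq prompt. A minimal sketch of that prompt construction, assuming the `[CONTEXT]`/`[KNOWLEDGE]` layout described on the model cards (the instruction string and the `transformers` calls in the comments are illustrative):

```python
def build_godel_query(instruction, knowledge, dialog):
    """Flatten instruction, grounding text, and dialog history into one prompt.

    Layout per the GODEL model cards:
    "<instruction> [CONTEXT] <turn EOS turn ...> [KNOWLEDGE] <grounding text>";
    the [KNOWLEDGE] part is dropped when there is no grounding text.
    """
    if knowledge:
        knowledge = "[KNOWLEDGE] " + knowledge
    context = " EOS ".join(dialog)
    return f"{instruction} [CONTEXT] {context} {knowledge}".strip()

query = build_godel_query(
    "Instruction: given a dialog context, you need to respond empathically.",
    "",  # no grounding document for chit-chat
    ["Does money buy happiness?", "Money buys many things, but not happiness."],
)
print(query)

# The query is then fed to the seq2seq model, e.g.:
#   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
#   tokenizer = AutoTokenizer.from_pretrained("microsoft/GODEL-v1_1-base-seq2seq")
#   model = AutoModelForSeq2SeqLM.from_pretrained("microsoft/GODEL-v1_1-base-seq2seq")
#   input_ids = tokenizer(query, return_tensors="pt").input_ids
#   outputs = model.generate(input_ids, max_length=128)
#   reply = tokenizer.decode(outputs[0], skip_special_tokens=True)
```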
A live demo is shown [here.](https://huggingface.co/spaces/microsoft/GODEL-Demo)\n\nBase model: https://huggingface.co/microsoft/GODEL-v1_1-base-seq2seq\n\nLarge model: https://huggingface.co/microsoft/GODEL-v1_1-large-seq2seq\n\n## Introduction\nThis repository showcases **building goal-directed dialog** using GODEL, and contains the dataset, source code, and pre-trained model for the following paper:\n\n\n[GODEL: Large-Scale Pre-Training for Goal-Directed Dialog](https://www.microsoft.com/en-us/research/publication/godel-large-scale-pre-training-for-goal-directed-dialog/)\u003cbr\u003eBaolin Peng, Michel Galley, Pengcheng He, Chris Brockett, Lars Liden, Elnaz Nouri, Zhou Yu, Bill Dolan, Jianfeng Gao\n![image](doc/GODEL.png)\n\nGODEL is a large-scale pre-trained model for goal-directed dialogs. It is parameterized with a Transformer-based encoder-decoder model and trained for response generation grounded in external text, which allows more effective fine-tuning on dialog tasks that require conditioning the response on information that is external to the current conversation (e.g., a retrieved document). The pre-trained model can be efficiently fine-tuned and adapted to accomplish a new dialog task with a handful of task-specific dialogs.\n\nThis repository is based on Hugging Face Transformers. Some evaluation scripts and datasets are adapted from [DSTC7-End-to-End-Conversation-Modeling](data/grounded), [DialoGPT](data/ungrounded), [UnifiedQA](https://github.com/allenai/unifiedqa), [MS MARCO](https://microsoft.github.io/msmarco/), [MultiWOZ](https://github.com/budzianowski/multiwoz), [Schema-Guided Dataset](https://github.com/google-research-datasets/dstc8-schema-guided-dialogue), etc.\n\nThe included scripts can be used to reproduce the results reported in the paper. Project and demo webpage: [https://aka.ms/GODEL](https://aka.ms/GODEL)\n\n## Installation \n**Requirements:** The interactive interface requires *node.js* and *npm*. 
Please refer to [here](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm) for installation.\n\nUse the commands below to create the environment, clone the repo, and install the required packages.\n```bash\nconda create -n godel-env python=3.8\nconda activate godel-env\nconda install nodejs\ngit clone https://github.com/microsoft/GODEL.git\ncd GODEL\npip install -r requirements.txt\nexport PYTHONPATH=\"`pwd`\"\n```\nFetch and unzip the pretrained model on which to fine-tune your own data:  \n\n```zsh\nwget https://bapengstorage.blob.core.windows.net/fileshare/godel_base.tar.gz\ntar -zxvf godel_base.tar.gz\n```\n## Pipeline\n**Data format**\n```json\n  {\n    \"Context\": \"Please remind me of calling to Jessie at 2PM.\",\n    \"Knowledge\": \"reminder_contact_name is Jessie, reminder_time is 2PM\",\n    \"Response\": \"Sure, set the reminder: call to Jessie at 2PM\"\n  },\n```\nWe use JSON to represent a training example. As shown in the example above, it contains the following fields:\n* **Context** - The dialog context from the beginning of the session to the current turn.\n* **Knowledge** - External or environment state represented in plain text.\n* **Response** - The target agent response. 
It can be a template, an API call, or natural language.\n\n**Fine-tuning**\n```Bash\nDATA_NAME={path_of_data}\nOUTPUT_DIR={path_of_fine-tuned_model}\nMODEL_PATH={path_of_pre-trained_model}\nEXP_NAME={experiment_name}\n\npython train.py --model_name_or_path ${MODEL_PATH} \\\n\t--dataset_name ${DATA_NAME} \\\n\t--output_dir ${OUTPUT_DIR} \\\n\t--per_device_train_batch_size=16 \\\n\t--per_device_eval_batch_size=16 \\\n\t--max_target_length 512 \\\n\t--max_length 512 \\\n\t--num_train_epochs 50 \\\n\t--save_steps 10000 \\\n\t--num_beams 5 \\\n\t--exp_name ${EXP_NAME} --preprocessing_num_workers 24\n```\n\n\n**Generation**\n```bash\nDATA_NAME={path_of_data}\nOUTPUT_DIR={path_to_save_predictions}\nMODEL_PATH={path_of_fine-tuned_model}\n\npython generate.py --model_name_or_path ${MODEL_PATH}  \\\n\t--dataset_name ${DATA_NAME}  \\\n\t--output_dir ${OUTPUT_DIR}  \\\n\t--per_device_eval_batch_size=16  \\\n\t--max_target_length 128 \\\n\t--max_length 512  \\\n\t--preprocessing_num_workers 24  \\\n\t--num_beams 5 \n```\n\n**Interaction**  \n\nWe provide a demo interface to chat with fine-tuned models. The backend server is based on *flask* and the interface is based on *vue*, *bootstrap-vue*, and *BasicVueChat*.\n\nStart the backend server:\n```bash\n# Create the backend server referring to, e.g., dstc9_server.py\npython EXAMPLE_server.py # start the server and expose port 8080 \n```\n\nStart serving the frontend page:\n```bash\ncd GODEL/html\nnpm install\nnpm run serve \n```\nOpen localhost:8080 in your web browser and you will see the chat page. Note that the backend port should be consistent with the port used in html/components/chat.vue.\n\nA live demo is shown [here.](https://huggingface.co/spaces/microsoft/GODEL-Demo)\n\n## Models\n\nWe have released GODEL V1.1, which is trained on 551M multi-turn dialogs from Reddit discussion threads and 5M instruction and knowledge-grounded dialogs. 
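Training files in the `Context`/`Knowledge`/`Response` format shown in the Pipeline section can be assembled with the standard `json` module; a minimal sketch (the `make_example` helper is illustrative, not part of the repo):

```python
import json

def make_example(context, knowledge, response):
    """Build one training record in GODEL's Context/Knowledge/Response format."""
    return {"Context": context, "Knowledge": knowledge, "Response": response}

# The record mirrors the example from the Pipeline section.
examples = [
    make_example(
        "Please remind me of calling to Jessie at 2PM.",
        "reminder_contact_name is Jessie, reminder_time is 2PM",
        "Sure, set the reminder: call to Jessie at 2PM",
    ),
]

# Serialize; write the result to the file you pass as DATA_NAME / --dataset_name.
payload = json.dumps(examples, indent=2)
print(payload)
```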
More models will be released later.\n\n~~We have released three fine-tuned models which can be further fine-tuned on low-resource user-customized datasets. The total parameters in these models range from 117M to 2.7B.~~\n\n| Model      | Huggingface Model Cards |\n| :---: | :---: |\n| Base      |  [microsoft/GODEL-v1_1-base-seq2seq](https://huggingface.co/microsoft/GODEL-v1_1-base-seq2seq)      |\n| Large   |     [microsoft/GODEL-v1_1-large-seq2seq](https://huggingface.co/microsoft/GODEL-v1_1-large-seq2seq)    |\n\n\n## Training\n\n5/22/2023: Pretraining GODEL models with our codebase is no longer supported, but GODEL models remain available. See [here](TRAIN.md) for details.\n\n### Fine-tuning and Evaluation\n\nGODEL is fine-tuned and evaluated on four tasks. We provide scripts to create training and testing data in our format. Please refer to *create_downstream_dataset.sh* to download the original data, then execute the following commands:\n\n```Bash\ncd scripts \n./create_downstream_dataset.sh\n```\n\n```Bash\nGROUNDED_CHECKPOINT={path_to_saved_checkpoint}\nOUTPUT_DIR={path_to_save_predictions}\nTASK=wow\naccelerate launch --config_file configs/G16_config.yaml train.py \\\n\t--model_name_or_path ${GROUNDED_CHECKPOINT} \\\n\t--dataset_name ./datasets_loader/${TASK}_dataset.py \\\n\t--output_dir ${OUTPUT_DIR} \\\n\t--per_device_train_batch_size=16 \\\n\t--per_device_eval_batch_size=16 \\\n\t--max_target_length 256 \\\n\t--max_length 512 \\\n\t--num_train_epochs 10 \\\n\t--preprocessing_num_workers 24 \\\n\t--num_beams 5 \\\n\t--exp_name ${TASK}  \\\n\t--learning_rate 5e-5 \\\n\t--save_every_checkpoint \\\n\t--save_steps 50000 \n```\n\n## Tutorial - Adding a new task using GODEL\n\nIn this tutorial, you will build a grounded dialog model based on GODEL for the DSTC9 task. 
Detailed information can be found [here](https://github.com/alexa/alexa-with-dstc9-track1-dataset).\n\nFirst, download the data and convert it to the GODEL format.\n```bash\ncd examples/dstc9\n./create_data.sh\n```\n*Fine-tune with the pre-trained GODEL model*\n```bash\ncd GODEL \nGODEL_MODEL={path_to_pre-trained_model}\npython train.py \\\n\t--model_name_or_path ${GODEL_MODEL}   \\\n\t--dataset_name ../examples/dstc9/dstc9_dataset.py   \\\n\t--output_dir ../examples/dstc9/ckpt   \\\n\t--per_device_train_batch_size=16  \\\n\t--per_device_eval_batch_size=16  \\\n\t--max_target_length 128  \\\n\t--max_length 512  \\\n\t--num_train_epochs 50  \\\n\t--save_steps 10000  \\\n\t--num_beams 5  \\\n\t--exp_name wow-test \\\n\t--preprocessing_num_workers 24 \\\n\t--save_every_checkpoint \n```\n*Interact with the trained model*\n```bash\ncd examples/dstc9\n# replace the model path in dstc9_server.py (line 49) with a trained checkpoint\npython dstc9_server.py\n\ncd GODEL/html \nnpm install\nnpm run serve\n```\n\n## Disclaimer\nThis repository aims to facilitate research in a paradigm shift of building task bots at scale. This toolkit contains only part of the modeling machinery needed to actually produce a model weight file in a running dialog. On its own, this model provides only information about the weights of various text spans; in order for a researcher to actually use it, they will need to bring in-house conversational data of their own for future pre-training and decode the response generation from the pretrained/finetuned system. Microsoft is not responsible for any generation from the third-party utilization of the pretrained system.\n\n\u003c!-- ## Contact\nShould you have any questions/suggestions, feel free to contact bapeng@microsoft.com. 
--\u003e\n\n## Citation\nIf you use this code and data in your research, please cite our arXiv paper:\n```\n@misc{peng2022godel,\nauthor = {Peng, Baolin and Galley, Michel and He, Pengcheng and Brockett, Chris and Liden, Lars and Nouri, Elnaz and Yu, Zhou and Dolan, Bill and Gao, Jianfeng},\ntitle = {GODEL: Large-Scale Pre-training for Goal-Directed Dialog},\nhowpublished = {arXiv},\nyear = {2022},\nmonth = {June},\nurl = {https://www.microsoft.com/en-us/research/publication/godel-large-scale-pre-training-for-goal-directed-dialog/},\n}\n```\n\n\n\u003c!-- # Project\n\n\u003e This repo has been populated by an initial template to help get you started. Please\n\u003e make sure to update the content to build a great experience for community-building.\n\nAs the maintainer of this project, please make a few updates:\n\n- Improving this README.MD file to provide a great experience\n- Updating SUPPORT.MD with content about this project's support experience\n- Understanding the security reporting process in SECURITY.MD\n- Remove this section from the README --\u003e\n\n## Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. 
You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft \ntrademarks or logos is subject to and must follow \n[Microsoft's Trademark \u0026 Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos is subject to those third parties' policies.\n","funding_links":[],"categories":["Uncategorized","Python"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FGODEL","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2FGODEL","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FGODEL/lists"}