{"id":13486369,"url":"https://github.com/huggingface/transfer-learning-conv-ai","last_synced_at":"2025-05-15T13:08:52.943Z","repository":{"id":38106699,"uuid":"185401193","full_name":"huggingface/transfer-learning-conv-ai","owner":"huggingface","description":"🦄 State-of-the-Art Conversational AI with Transfer Learning","archived":false,"fork":false,"pushed_at":"2023-06-12T21:27:49.000Z","size":57,"stargazers_count":1749,"open_issues_count":69,"forks_count":433,"subscribers_count":83,"default_branch":"master","last_synced_at":"2025-05-08T05:39:54.408Z","etag":null,"topics":["chatbots","deep-learning","dialog","gpt","gpt-2","neural-networks","nlp","pytorch","transfer-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huggingface.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-07T12:51:08.000Z","updated_at":"2025-04-27T03:21:56.000Z","dependencies_parsed_at":"2022-07-20T01:18:27.357Z","dependency_job_id":"ee670426-f4b4-48c2-85e6-2e6671191bd8","html_url":"https://github.com/huggingface/transfer-learning-conv-ai","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftransfer-learning-conv-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftransfer-learning-conv-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftransfer-learning-conv-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hugg
ingface%2Ftransfer-learning-conv-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huggingface","download_url":"https://codeload.github.com/huggingface/transfer-learning-conv-ai/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254346625,"owners_count":22055808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbots","deep-learning","dialog","gpt","gpt-2","neural-networks","nlp","pytorch","transfer-learning"],"created_at":"2024-07-31T18:00:44.755Z","updated_at":"2025-05-15T13:08:47.928Z","avatar_url":"https://github.com/huggingface.png","language":"Python","funding_links":[],"categories":["Uncategorized","Python","A01_文本生成_文本对话"],"sub_categories":["Uncategorized","大语言对话模型及数据"],"readme":"# 🦄 Building a State-of-the-Art Conversational AI with Transfer Learning\n\nThe present repo contains the code accompanying the blog post [🦄 How to build a State-of-the-Art Conversational AI with Transfer Learning](https://medium.com/@Thomwolf/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313).\n\nThis is a clean, commented code base with training and testing scripts that can be used to train a dialog agent leveraging transfer learning from an OpenAI GPT or GPT-2 Transformer language model.\n\nThis codebase can be used to reproduce the results of HuggingFace's participation in the NeurIPS 2018 dialog competition [ConvAI2](http://convai.io/), which was state-of-the-art on the automatic metrics. 
The 3k+ lines of competition code were distilled into about 250 lines of training code with distributed \u0026 FP16 options to form the present repository.\n\nThis model can be trained in about one hour on an 8 V100 cloud instance (currently costs about $25) and a pre-trained model is also made available.\n\n## Installation\n\nTo install and use the training and inference scripts, please clone the repo and install the requirements:\n\n```bash\ngit clone https://github.com/huggingface/transfer-learning-conv-ai\ncd transfer-learning-conv-ai\npip install -r requirements.txt\npython -m spacy download en\n```\n\n## Installation with Docker\n\nTo install using Docker, please build the self-contained image:\n\n```bash\ndocker build -t convai .\n```\n\n_Note: Make sure your Docker setup allocates enough memory for building the container. Building with the default of 1.75GB will fail due to the large PyTorch wheel._\n\nYou can then enter the image  \n\n```bash\nip-192-168-22-157:transfer-learning-conv-ai loretoparisi$ docker run --rm -it convai bash\nroot@91e241bb823e:/# ls\nDockerfile  README.md  boot                  dev  home         lib    media  models  proc              root  sbin  sys  train.py  utils.py\nLICENCE     bin        convai_evaluation.py  etc  interact.py  lib64  mnt    opt     requirements.txt  run   srv   tmp  usr       var\n```\n\nYou can then run the `interact.py` script on the pretrained model:\n\n```bash\npython3 interact.py --model models/\n```\n\n## Pretrained model\n\nWe make a pretrained and fine-tuned model available on our S3 [here](https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/finetuned_chatbot_gpt.tar.gz). The easiest way to download and use this model is to run the `interact.py` script to talk with the model. 
Without any argument, this script will automatically download and cache our model.\n\n## Using the training script\n\nThe training script can be used in single-GPU or multi-GPU settings:\n\n```bash\npython ./train.py  # Single GPU training\npython -m torch.distributed.launch --nproc_per_node=8 ./train.py  # Training on 8 GPUs\n```\n\nThe training script accepts several arguments to tweak the training:\n\nArgument | Type | Default value | Description\n---------|------|---------------|------------\ndataset_path | `str` | `\"\"` | Path or url of the dataset. If empty download from S3.\ndataset_cache | `str` | `'./dataset_cache.bin'` | Path or url of the dataset cache\nmodel | `str` | `\"openai-gpt\"` | Path, url or short name of the model\nnum_candidates | `int` | `2` | Number of candidates for training\nmax_history | `int` | `2` | Number of previous exchanges to keep in history\ntrain_batch_size | `int` | `4` | Batch size for training\nvalid_batch_size | `int` | `4` | Batch size for validation\ngradient_accumulation_steps | `int` | `8` | Accumulate gradients on several steps\nlr | `float` | `6.25e-5` | Learning rate\nlm_coef | `float` | `1.0` | LM loss coefficient\nmc_coef | `float` | `1.0` | Multiple-choice loss coefficient\nmax_norm | `float` | `1.0` | Clipping gradient norm\nn_epochs | `int` | `3` | Number of training epochs\npersonality_permutations | `int` | `1` | Number of permutations of personality sentences\ndevice | `str` | `\"cuda\" if torch.cuda.is_available() else \"cpu\"` | Device (cuda or cpu)\nfp16 | `str` | `\"\"` | Set to O0, O1, O2 or O3 for fp16 training (see apex documentation)\nlocal_rank | `int` | `-1` | Local rank for distributed training (-1: not distributed)\n\nHere is how to reproduce our results on a server with 8 V100 GPUs (adapt the number of nodes and batch sizes to your configuration):\n\n```bash\npython -m torch.distributed.launch --nproc_per_node=8 ./train.py --gradient_accumulation_steps=4 --lm_coef=2.0 --max_history=2 --n_epochs=1 
--num_candidates=4 --personality_permutations=2 --train_batch_size=2 --valid_batch_size=2\n```\n\nThis model should give a Hits@1 over 79, perplexity of 20.5 and F1 of 16.5 using the convai2 evaluation script (see below).\n\nThese numbers are slightly lower than the numbers we obtained in the ConvAI2 competition. Here is what you can tweak to reach the same results:\n\n- in the ConvAI2 competition we also used tweaked position embeddings so that the history of the dialog always starts with the same embeddings. This is easy to add with pytorch-transformers and should improve the hits@1 metric.\n- in the ConvAI2 competition we used a beam search decoder. While the results are better in terms of the F1 metric, our feeling is that the human experience is less compelling with beam search than with the nucleus sampling decoder provided in the present repository.\n\n## Using the interaction script\n\nThe training script saves all the experiments and checkpoints in a sub-folder named with the timestamp of the experiment in the `./runs` folder of the repository base folder.\n\nYou can then use the interactive script to interact with the model simply by pointing to this folder.\n\nHere is an example command line to run the interactive script:\n\n```bash\npython ./interact.py --model_checkpoint ./data/Apr17_13-31-38_thunder/  # run the interactive script with a training checkpoint\npython ./interact.py  # run the interactive script with the finetuned model on our S3\n```\n\nThe fine-tuned model will give FINAL Hits@1: 0.715\n\nThe interactive script accepts a few arguments to tweak the decoding algorithm:\n\nArgument | Type | Default value | Description\n---------|------|---------------|------------\ndataset_path | `str` | `\"\"` | Path or url of the dataset. 
If empty download from S3.\ndataset_cache | `str` | `'./dataset_cache.bin'` | Path or url of the dataset cache\nmodel | `str` | `\"openai-gpt\"` | Path, url or short name of the model\nmax_history | `int` | `2` | Number of previous utterances to keep in history\ndevice | `str` | `cuda` if `torch.cuda.is_available()` else `cpu` | Device (cuda or cpu)\nno_sample | action `store_true` | `False` | Set to use greedy decoding instead of sampling\nmax_length | `int` | `20` | Maximum length of the output utterances\nmin_length | `int` | `1` | Minimum length of the output utterances\nseed | `int` | `42` | Seed\ntemperature | `int` | `0.7` | Sampling softmax temperature\ntop_k | `int` | `0` | Filter top-k tokens before sampling (`\u003c=0`: no filtering)\ntop_p | `float` | `0.9` | Nucleus filtering (top-p) before sampling (`\u003c=0.0`: no filtering)\n\n## Running ConvAI2 evaluation scripts\n\nTo run the evaluation scripts of the ConvAI2 challenge, you first need to install `ParlAI` in the repo base folder like this:\n\n```bash\ngit clone https://github.com/facebookresearch/ParlAI.git\ncd ParlAI\npython setup.py develop\n```\n\nYou can then run the evaluation script from the `ParlAI` base folder:\n\n```bash\ncd ParlAI\npython ../convai_evaluation.py --eval_type hits@1  # to download and evaluate our fine-tuned model on hits@1 metric\npython ../convai_evaluation.py --eval_type hits@1  --model_checkpoint ./data/Apr17_13-31-38_thunder/  # to evaluate a training checkpoint on hits@1 metric\n```\n\nThe evaluation script accepts a few arguments to select the evaluation metric and tweak the decoding algorithm:\n\nArgument | Type | Default value | Description\n---------|------|---------------|------------\neval_type | `str` | `\"hits@1\"` | Evaluate the model on `hits@1`, `ppl` or `f1` metric on the ConvAI2 validation dataset\nmodel | `str` | `\"openai-gpt\"` | Path, url or short name of the model\nmax_history | `int` | `2` | Number of previous utterances to keep in history\ndevice | `str` | 
`cuda` if `torch.cuda.is_available()` else `cpu` | Device (cuda or cpu)\nno_sample | action `store_true` | `False` | Set to use greedy decoding instead of sampling\nmax_length | `int` | `20` | Maximum length of the output utterances\nmin_length | `int` | `1` | Minimum length of the output utterances\nseed | `int` | `42` | Seed\ntemperature | `int` | `0.7` | Sampling softmax temperature\ntop_k | `int` | `0` | Filter top-k tokens before sampling (`\u003c=0`: no filtering)\ntop_p | `float` | `0.9` | Nucleus filtering (top-p) before sampling (`\u003c=0.0`: no filtering)\n\n## Data Format\n\nSee `example_entry.py` and the comment at the top of that file.\n\n## Citation\n\nIf you use this code in your research, you can cite our NeurIPS CAI workshop [paper](http://arxiv.org/abs/1901.08149):\n\n```bash\n@article{DBLP:journals/corr/abs-1901-08149,\n  author    = {Thomas Wolf and\n               Victor Sanh and\n               Julien Chaumond and\n               Clement Delangue},\n  title     = {TransferTransfo: {A} Transfer Learning Approach for Neural Network\n               Based Conversational Agents},\n  journal   = {CoRR},\n  volume    = {abs/1901.08149},\n  year      = {2019},\n  url       = {http://arxiv.org/abs/1901.08149},\n  archivePrefix = {arXiv},\n  eprint    = {1901.08149},\n  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},\n  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1901-08149},\n  bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Ftransfer-learning-conv-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuggingface%2Ftransfer-learning-conv-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Ftransfer-learning-conv-ai/lists"}