{"id":13437668,"url":"https://github.com/bigscience-workshop/xmtf","last_synced_at":"2025-04-09T05:10:21.916Z","repository":{"id":62791272,"uuid":"553619111","full_name":"bigscience-workshop/xmtf","owner":"bigscience-workshop","description":"Crosslingual Generalization through Multitask Finetuning","archived":false,"fork":false,"pushed_at":"2024-09-22T00:38:33.000Z","size":30005,"stargazers_count":529,"open_issues_count":11,"forks_count":38,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-02T04:03:29.729Z","etag":null,"topics":["bloom","bloomz","instruction-tuning","language-models","large-language-models","mt0","multilingual-nlp","multitask-learning","t5","zero-shot-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2211.01786","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bigscience-workshop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-18T13:49:59.000Z","updated_at":"2025-03-18T09:38:32.000Z","dependencies_parsed_at":"2024-11-11T03:12:57.063Z","dependency_job_id":null,"html_url":"https://github.com/bigscience-workshop/xmtf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fxmtf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fxmtf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fxmtf/releases"
,"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fxmtf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bigscience-workshop","download_url":"https://codeload.github.com/bigscience-workshop/xmtf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247980837,"owners_count":21027808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom","bloomz","instruction-tuning","language-models","large-language-models","mt0","multilingual-nlp","multitask-learning","t5","zero-shot-learning"],"created_at":"2024-07-31T03:00:59.166Z","updated_at":"2025-04-09T05:10:21.900Z","avatar_url":"https://github.com/bigscience-workshop.png","language":"Jupyter Notebook","readme":"# Crosslingual Generalization through Multitask Finetuning\n\n![](xmtf_banner.png)\n\nThis repository provides an overview of all components used for the creation of BLOOMZ \u0026 mT0 and xP3 introduced in the paper [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786). 
[Link to 25min video](https://www.youtube.com/watch?v=LG_N5ITizDo\u0026pp=ygU4Q3Jvc3NsaW5ndWFsIEdlbmVyYWxpemF0aW9uIHRocm91Z2ggTXVsdGl0YXNrIEZpbmV0dW5pbmc%3D) on the paper by Samuel Albanie; [Link to 4min video](https://www.youtube.com/watch?v=DFMH9f2cj3A\u0026t=8s\u0026pp=ygU4Q3Jvc3NsaW5ndWFsIEdlbmVyYWxpemF0aW9uIHRocm91Z2ggTXVsdGl0YXNrIEZpbmV0dW5pbmc%3D) on the paper by Niklas Muennighoff.\n\n\u003c!-- TOC --\u003e\n\n- [Data](#data)\n- [Models](#models)\n- [Create xP3](#create-xp3)\n- [Train models](#train-models)\n    - [BLOOMZ](#bloomz)\n    - [mT0](#mt0)\n- [Evaluate models](#evaluate-models)\n    - [Rank Evaluation](#rank-evaluation)\n    - [Generation Evaluation](#generation-evaluation)\n- [Plots \u0026 Tables](#plots--tables)\n    - [Plots](#plots)\n    - [Tables](#tables)\n- [Citation](#citation)\n\n\u003c!-- /TOC --\u003e\n\n## Data\n\n\u003ctable\u003e\n  \u003ctr\u003e\n\u003cth\u003eName\u003c/th\u003e\n\u003cth\u003eExplanation\u003c/th\u003e\n\u003cth\u003eExample models\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/datasets/Muennighoff/xP3x\u003exP3x\u003c/a\u003e\u003c/td\u003e \n\u003ctd\u003eMixture of 17 tasks in 277 languages with English prompts\u003c/td\u003e\n\u003ctd\u003eWIP - Join us at Project Aya @\u003ca href=https://cohere.for.ai/\u003eC4AI\u003c/a\u003e to help!\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/datasets/bigscience/xP3\u003exP3\u003c/a\u003e\u003c/td\u003e \n\u003ctd\u003eMixture of 13 training tasks in 46 languages with English prompts\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz\u003eBLOOMZ\u003c/a\u003e \u0026 \u003ca href=https://huggingface.co/bigscience/mt0-xxl\u003emT0-13B\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/datasets/bigscience/xP3mt\u003exP3mt\u003c/a\u003e\u003c/td\u003e \n\u003ctd\u003eMixture of 
13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English)\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-mt\u003eBLOOMZ-MT\u003c/a\u003e \u0026 \u003ca href=https://huggingface.co/bigscience/mt0-xxl-mt\u003emT0-13B-MT\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/datasets/bigscience/xP3all\u003exP3all\u003c/a\u003e\u003c/td\u003e \n\u003ctd\u003exP3 + our evaluation datasets, adding 3 more tasks for a total of 16 tasks in 46 languages with English prompts\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/datasets/bigscience/xP3megds\u003exP3megds\u003c/a\u003e\u003c/td\u003e \n\u003ctd\u003e\u003ca href=https://github.com/bigscience-workshop/Megatron-DeepSpeed\u003eMegatron-DeepSpeed\u003c/a\u003e processed version of xP3\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz\u003eBLOOMZ\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/datasets/Muennighoff/P3\u003eP3\u003c/a\u003e\u003c/td\u003e \n\u003ctd\u003eRe-preprocessed version of the English-only \u003ca href=https://huggingface.co/datasets/bigscience/P3\u003eP3\u003c/a\u003e with 8 training tasks\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-p3\u003eBLOOMZ-P3\u003c/a\u003e \u0026 \u003ca href=https://huggingface.co/bigscience/mt0-xxl-p3\u003emT0-13B-P3\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## Models\n\n\u003ctable\u003e\n  \u003ctr\u003e\n\u003cth colspan=\"12\"\u003eMultitask finetuned on \u003ca style=\"font-weight:bold\" href=https://huggingface.co/datasets/bigscience/xP3\u003exP3\u003c/a\u003e. 
Recommended for prompting in English.\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eParameters\u003c/td\u003e\n\u003ctd\u003e300M\u003c/td\u003e\n\u003ctd\u003e580M\u003c/td\u003e\n\u003ctd\u003e1.2B\u003c/td\u003e\n\u003ctd\u003e3.7B\u003c/td\u003e\n\u003ctd\u003e13B\u003c/td\u003e\n\u003ctd\u003e560M\u003c/td\u003e\n\u003ctd\u003e1.1B\u003c/td\u003e\n\u003ctd\u003e1.7B\u003c/td\u003e\n\u003ctd\u003e3B\u003c/td\u003e\n\u003ctd\u003e7.1B\u003c/td\u003e\n\u003ctd\u003e176B\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eFinetuned Model\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/mt0-small\u003emt0-small\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/mt0-base\u003emt0-base\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/mt0-large\u003emt0-large\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/mt0-xl\u003emt0-xl\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/mt0-xxl\u003emt0-xxl\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-560m\u003ebloomz-560m\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-1b1\u003ebloomz-1b1\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-1b7\u003ebloomz-1b7\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-3b\u003ebloomz-3b\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-7b1\u003ebloomz-7b1\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz\u003ebloomz\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n  \u003ctr\u003e\n\u003cth colspan=\"12\"\u003eMultitask finetuned on \u003ca style=\"font-weight:bold\" 
href=https://huggingface.co/datasets/bigscience/xP3mt\u003exP3mt\u003c/a\u003e. Recommended for prompting in non-English.\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eFinetuned Model\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/mt0-xxl-mt\u003emt0-xxl-mt\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-7b1-mt\u003ebloomz-7b1-mt\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-mt\u003ebloomz-mt\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003cth colspan=\"12\"\u003eMultitask finetuned on \u003ca style=\"font-weight:bold\" href=https://huggingface.co/datasets/Muennighoff/P3\u003eP3\u003c/a\u003e. Released for research purposes only. Strictly inferior to above models!\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eFinetuned Model\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/mt0-xxl-p3\u003emt0-xxl-p3\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-7b1-p3\u003ebloomz-7b1-p3\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloomz-p3\u003ebloomz-p3\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003cth colspan=\"12\"\u003eOriginal pretrained checkpoints. Not recommended.\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003ePretrained Model\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/google/mt5-small\u003emt5-small\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/google/mt5-base\u003emt5-base\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/google/mt5-large\u003emt5-large\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/google/mt5-xl\u003emt5-xl\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/google/mt5-xxl\u003emt5-xxl\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloom-560m\u003ebloom-560m\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloom-1b1\u003ebloom-1b1\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloom-1b7\u003ebloom-1b7\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloom-3b\u003ebloom-3b\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloom-7b1\u003ebloom-7b1\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=https://huggingface.co/bigscience/bloom\u003ebloom\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## Create xP3(x)\n\nWe have processed \u0026 uploaded [xP3](https://huggingface.co/datasets/bigscience/xP3). If you want to recreate it, follow these steps:\n\n1. Get promptsource: for xP3mt, `git clone -b xp3mt https://github.com/Muennighoff/promptsource.git`; for xP3, `git clone -b tr13 https://github.com/Muennighoff/promptsource.git`; then install it with `cd promptsource; pip install -e .`\n2. Get the required packages: `pip install -q datasets iso-639`\n3. 
Get the [creation script](https://github.com/bigscience-workshop/bigscience/blob/master/data/xp3/prepare_xp3_train.py) \u0026 edit it if necessary:\n- For xP3mt, set `USE_ENGLISH_PROMPTS = False` in the beginning\n- For xP3, set `USE_ENGLISH_PROMPTS = True` in the beginning\n4. Run the script, such as via `python prepare_xp3.py` or a [SLURM script](https://github.com/bigscience-workshop/bigscience/blob/master/data/xp3/prepare_xp3_train.slurm)\n\nFor the new extension of xP3, [xP3x](https://huggingface.co/datasets/Muennighoff/xP3x), the process is largely the same except:\n\n1. Install the `xp3x` branch instead, i.e. `pip install git+https://github.com/Muennighoff/promptsource.git@xp3x`\n2. The creation script is in this repository \u0026 named `create_xp3x.py`.\n\nxP3x is a superset of xP3, so unless you want to reproduce the paper, we recommend always using xP3x (or xP3mt if you want machine-translated prompts).\n\n## Train models\n\n### BLOOMZ\n\n1. Download the pretrained model [checkpoint](https://huggingface.co/bigscience/bloom-optimizer-states), which is of shape PP=12, TP=4, DP=4. If you'd like to reshape the model, you will also need to download [the universal checkpoint](https://huggingface.co/bigscience/bloom-optimizer-states/tree/global_step95000_universal). If you want to continue finetuning, you should use [our finetuned checkpoint](https://huggingface.co/bigscience/bloomz-optimizer-states), which is of shape PP=72, TP=1, DP=4.\n2. Set up the training code: `git clone -b t0loading https://github.com/bigscience-workshop/Megatron-DeepSpeed` \u0026 follow its [setup guide](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/t0loading#get-started-fast) to create an environment with the necessary packages.\n3. 
Download the Megatron-DeepSpeed processed [xP3megds](https://huggingface.co/datasets/bigscience/xP3megds) or re-preprocess it yourself by downloading [xP3](https://huggingface.co/datasets/bigscience/xP3), removing the `merged_{lang}.jsonl` files \u0026 preprocessing the rest with the script [here](https://github.com/bigscience-workshop/bigscience/blob/master/data/xp3/xp3_jsonl_to_meg.slurm).\n4. Set up \u0026 run the training script: we use the SLURM scripts available at [bigscience-workshop/bigscience/train/tr13-mtf](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr13-mtf), referred to as `xp3capmixnewcodelonglossseq`. E.g. [this is the script launched to train bloomz](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr13-mtf/tr13-176B-mtf-xp3capmixnewcodelonglossseq.slurm). Important parts of the script to modify are:\n- `#SBATCH` variables, such as nodes, gpus, time, etc. - Our SLURM guide is [here](https://github.com/bigscience-workshop/bigscience/tree/master/jz/slurm#slurm-how-to)\n- `source $six_ALL_CCFRWORK/start-tr13f-6B3-ml-t0` to point to your own conda environment set up via Megatron-DeepSpeed\n- PATH environment variables, notably\n    - `TRAIN_DATA_PATH` \u0026 `VALID_DATA_PATH`, which point to files listing your processed training and validation data. We provide our files in this repository (`xp3capmixnewcodelong_train.txt` \u0026 `xp3capmixnewcodelong_validation.txt`), but you will likely want to change the paths inside. The per-language percentages are based on each language's share of xP3, with code slightly upsampled.\n- `PP_SIZE=72`, `TP_SIZE=1` \u0026 the batch-size variables, which specify the layout. This will depend on the hardware available to you. If you change the layout, you may have to reshape the model. For reshaping, use the universal checkpoint and pass the `--universal` flag in the script. 
We recommend saving a new checkpoint right after \u0026 then continuing training without `--universal`, which will be faster.\n- If you want to restart from a saved checkpoint (e.g. after training a few steps like above), make sure to remove the `--no-load-optim` \u0026 `--reset-progress` flags\n- After training, you can convert the checkpoint to transformers format using the script [here](https://github.com/huggingface/transformers/blob/ee8e80a060d65ab349743ffcb5842365eb0e5606/src/transformers/models/bloom/convert_bloom_original_checkpoint_to_pytorch.py)\n\nHelpful resources:\n- [Blog post](https://huggingface.co/blog/bloom-megatron-deepspeed)\n- BLOOM community tab, such as [here](https://huggingface.co/bigscience/bloom/discussions/46)\n\n### mT0\n\nFollow the finetuning instructions [here](https://github.com/google-research/t5x/blob/main/docs/usage/finetune.md), making sure to use pretrained mT5 models \u0026 the xP3 dataset.\n\nHelpful resources:\n- [T5X paper](https://arxiv.org/abs/2203.17189)\n\n## Evaluate models\n\nAll evaluation results are available at https://huggingface.co/datasets/bigscience/evaluation-results under the respective models.\nBelow we explain how to run the evaluations.\n\n### Rank Evaluation\n\nWe evaluate the models via rank evaluation on [XCOPA](https://huggingface.co/datasets/xcopa), [XNLI](https://huggingface.co/datasets/xnli), [XStoryCloze](https://huggingface.co/datasets/Muennighoff/xstory_cloze) \u0026 [XWinograd](https://huggingface.co/datasets/Muennighoff/xwinograd):\n\n1. Get the promptsource fork: `git clone -b xp3mt https://github.com/Muennighoff/promptsource.git` \u0026 `cd promptsource; pip install -e .`\n2. Get the t-zero fork: `git clone -b muennighoff/upgrdps https://github.com/Muennighoff/t-zero.git` \u0026 `cd t-zero; pip install -e .`\n3. 
Download the model \u0026 run the evaluation script, for example for [bloomz](https://github.com/bigscience-workshop/bigscience/blob/master/evaluation/results/tr13/tzeroeval/evaluate_t0_176b.slurm).\n\n### Generation Evaluation\n\nDuring training, we evaluate generation on translation \u0026 summarization for validation:\n\n1. Get the promptsource fork: `git clone -b xp3mt https://github.com/Muennighoff/promptsource` \u0026 `cd promptsource; pip install -e .`\n2. Get [bigscience-workshop/lm-evaluation-harness](https://github.com/bigscience-workshop/lm-evaluation-harness): `git clone https://github.com/bigscience-workshop/lm-evaluation-harness`. The script for the 7.1B model, for example, is [here](https://github.com/bigscience-workshop/bigscience/blob/master/evaluation/results/tr13/lmeval/run_generation_7b1.slurm).\n\nWe also evaluate code generation on [HumanEval](https://huggingface.co/datasets/openai_humaneval):\n\n1. Get the code evaluation repository: `git clone https://github.com/loubnabnl/bloom-code-evaluation` \u0026 go through its setup.\n2. Set `prepend_eos` to `False` in `code_eval.py` at `complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=True, **gen_kwargs)`, i.e. use `complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=False, **gen_kwargs)`.\n3. 
Download model \u0026 run evaluation script swapping out MODEL_CKPT for your path, for example for bloomz use [this](https://github.com/loubnabnl/bloom-code-evaluation/blob/master/generate_code_bloom.slurm).\n\n\n## Plots \u0026 Tables\n\n### Plots\n\n- Figure 1: `plotstables/xp3_taxonomy.drawio` \u0026 `plotstables/xp3_taxonomy.pdf`\n- Figure 2: `plotstables/xp3_languages.ipynb` \u0026 [colab](https://colab.research.google.com/drive/1yRDXktu030DnipFBj6-dwOGNVIdgktA9?usp=sharing)\n- Figure 3: `plotstables/xp3_variants.pdf` \u0026 [drawings](https://docs.google.com/drawings/d/1wSt_X0olUFcOFQ5D1UnMv1V-LKMr3WZIRIgaFypTP24/edit?usp=sharing)\n- Figure 4: `plotstables/xp3_generalization_bar.pdf` \u0026 [colab](https://colab.research.google.com/drive/1bz083LuBJi0-pLOqdr4_ycEctn6obYST?usp=sharing)\n- Figure 5: `plotstables/lang_generalization` \u0026 [colab](https://colab.research.google.com/drive/1lFFR6_ijR_iWJQnqIW5y5-LuRnRoRTS3?usp=sharing)\n- Figure 6: `plotstables/scale.pdf` \u0026 [colab](https://colab.research.google.com/drive/19GcYT5SJFpyu8B0RrewN462w3i461mZ5?usp=sharing)\n- Figure 7: `plotstables/validation.pdf` \u0026 [colab](https://colab.research.google.com/drive/1FWW7LMKC9kQNLgCLZXl_dBER5wBSPGMu?usp=sharing)\n- Figure 8: `plotstables/pretraining_sizes.pdf` \u0026 [colab](https://colab.research.google.com/drive/1hpW6xEnU56Ed7DmXrREzczGwEeNV8KJ2?usp=sharing)\n- Figure 9: `plotstables/english_task_generalization.pdf` \u0026 [colab](https://colab.research.google.com/drive/1lFFR6_ijR_iWJQnqIW5y5-LuRnRoRTS3?usp=sharing)\n- Figure 10: `plotstables/task_generalization.pdf` \u0026 [colab](https://colab.research.google.com/drive/1lFFR6_ijR_iWJQnqIW5y5-LuRnRoRTS3?usp=sharing)\n- Figure 11: `plotstables/roots_xp3_languages.pdf` \u0026 [colab](https://colab.research.google.com/drive/1ankXUcTqjPantCzIfUSwAjYfAhkR7M6o?usp=sharing) requiring some of the files in `plotstables/contamination`\n- Figure 12: `plotstables/examples/bloom_code_example.py` \u0026 
`plotstables/examples/bloom_code_light.pdf` \u0026 `plotstables/examples/bloomz_code_light.pdf`; The raw code files can be found [here](https://huggingface.co/datasets/bigscience/evaluation-results/blob/main/bloom/codeeval/transformers/openai_humaneval/code_generations_bloom.zip) \u0026 [here](https://huggingface.co/datasets/bigscience/evaluation-results/blob/main/bloomz/codeeval/transformers/openai_humaneval/code_generations_bloomz.zip)\n- Figure 13 - Figure 16: `plotstables/examples/*.pdf` \u0026 `plotstables/examples/generations.drawio`\n\n### Tables\n\n- Table 1: [Colab](https://colab.research.google.com/drive/1ZhwHDaHBPUlZiTp-ZZxy7axuWgE68FkW?usp=sharing) \u0026 [Colab for complex version](https://colab.research.google.com/drive/1WCUgfjToVJ9b_fJHzkWKsuGzVofqv38x?usp=sharing)\n- Table 2: Adapted from the Codex paper\n- Table 3: Manual\n- Table 4: `plotstables/compute_codegen_len.ipynb` for generations \u0026 `plotstables/countcode.py` for xP3\n- Table 5: Manual\n- Table 6: Manual\n- Table 7: `plotstables/levenshtein.py`\n- Table 8: Same as Table 1 with languages swapped from L1 to L2\n- Table 9: [Colab](https://colab.research.google.com/drive/1AWJk3jbrD1VpiMARW-xATalrupwFzZN-?usp=sharing)\n- Table 10: [Colab](https://colab.research.google.com/drive/14t9w6QSf2K5BQP0cInyGsreAhY271DLB?usp=sharing)\n- Prompt Appendix: https://github.com/albanie/prompt_formatting_in_latex\n\n## Citation\n\n```bibtex\n@article{muennighoff2022crosslingual,\n  title={Crosslingual generalization through multitask finetuning},\n  author={Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng-Xin and Schoelkopf, Hailey and others},\n  journal={arXiv preprint arXiv:2211.01786},\n  year={2022}\n}\n```\n","funding_links":[],"categories":["xMTF - BigScience","Jupyter Notebook","Open-source Instruction Data","Instruct/Prompt Tuning 
Data","A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigscience-workshop%2Fxmtf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbigscience-workshop%2Fxmtf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigscience-workshop%2Fxmtf/lists"}