{"id":20216225,"url":"https://github.com/thudm/multilingual-glm","last_synced_at":"2025-07-24T11:35:11.870Z","repository":{"id":88826295,"uuid":"516067489","full_name":"THUDM/Multilingual-GLM","owner":"THUDM","description":"The multilingual variant of GLM, a general language model trained with autoregressive blank infilling objective ","archived":false,"fork":false,"pushed_at":"2022-11-19T16:05:07.000Z","size":15654,"stargazers_count":62,"open_issues_count":4,"forks_count":6,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-24T13:11:18.606Z","etag":null,"topics":["deep-learning","language-model","nlp","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"README.md","changelog":"change_mp.py","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-20T17:04:34.000Z","updated_at":"2024-08-12T20:25:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"acf31667-8c72-4cb6-b22d-230d42db3d55","html_url":"https://github.com/THUDM/Multilingual-GLM","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FMultilingual-GLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FMultilingual-GLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FMultilingual-GLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FMultilingual-GLM/manifests","owner_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM","download_url":"https://codeload.github.com/THUDM/Multilingual-GLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248241671,"owners_count":21071046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","language-model","nlp","pytorch"],"created_at":"2024-11-14T06:26:52.274Z","updated_at":"2025-04-10T15:04:56.729Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multilingual-GLM\nThis repository contains the code of mGLM: a multilingual variant of GLM, a general language model trained with an autoregressive blank infilling objective. \n\nYou may want to check out our [interactive demo](https://models.aminer.cn/mglm-1b/demo/) based on mGLM that generates a brief Chinese/English summary for your article in any commonly used language.\n\nThe backbone structure of this model is based on [GLM: General Language Model Pretraining with Autoregressive Blank Infilling](https://aclanthology.org/2022.acl-long.26/) (Du et al., ACL 2022) \n\nCode is mainly based on [THUDM/GLM](https://github.com/THUDM/GLM). Part of the code is also based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [PET](https://github.com/timoschick/pet).\n\n### Parameters\nHere we provide a comparison between the sizes of different multilingual language models. 
\n| Model     | Parameters |\n|  ----  | ---- | \n| [mBERT](https://github.com/google-research/bert/blob/master/multilingual.md) | 180M | \n| [XLM-R](https://arxiv.org/abs/1911.02116) | 550M |\n| [MT5-Large](https://arxiv.org/abs/2010.11934) | 1.2B                   |\n| GLM-Large | 1B                 |\n\n## Pretrained Models\n\nYou can download [Our Pretrained Checkpoint](https://lfs.aminer.cn/misc/sunmengyang/mglm1b/new_pretrained.pt) and specify the checkpoint path in a script. The multilingual tokenizer and configuration file of our model are already included in this repo. \n\n\n## Test Results\n\n### Tasks in XTREME Benchmark\n|  Model | XNLI | PAWS-X | XQuAD | MLQA | TyDiQA |\n|  ----  | ---- | ---- | ---- | ---- | ---- |\n| GLM-Large | 75.6 | 85.2 | 83.6/71.9 | 67.52/54.34 |69.6/55.6 |\n| [MT5-Large](https://github.com/google-research/multilingual-t5) | 81.1 | 88.9 | 77.8/61.5 | 71.2/51.7 | 69.9/52.2 |\n\n\n### Neural Cross Lingual Summarization\n\nThe following table contains our test results for the [NCLS](https://aclanthology.org/D19-1302/) English to Chinese(EN2ZHSUM) dataset\n\nMetric is Rouge-1/Rouge-2/Rouge-L\n\n|  Model | NCLS English to Chinese|\n|  ----  | ---- | \n| GLM-Large | 50.27/30.94/38.44 | \n| MT5-Large(Reproduced) | 42.31/22.40/31.33 |\n\n## Get Started\n\u003c!--\n### Docker Image\nWe prepare two docker images based on CUDA 10.2 and CUDA 11.2. You can pull the pre-built images from Docker Hub and run with docker v19.03+\n  ```shell\n    docker run --gpus all --rm -it --ipc=host zxdu20/glm-cuda102\n  ```\n  or replace `glm-cuda102` with `glm-cuda112`.\n\n  You can also modify the image according to your requirements in [docker/cuda102.dockerfile](docker/cuda102.dockerfile) and build the image yourself\n  ```shell\n    docker build -f cuda102.dockerfile . 
-t glm-cuda102\n  ```\n--\u003e\n### Manual Installation\nPlease first install PyTorch \n`pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html  --no-cache-dir`\nand [apex](https://github.com/NVIDIA/apex).\n\nThen install other dependencies\n`pip3 install -r requirements.txt`\n\n\n## Usage\n\n### XTREME\n\n- Download the [XTREME](https://sites.research.google/xtreme/) data and check the experiment setup in \n  [scripts/ds_finetune_superglue.sh](scripts/ds_finetune_superglue.sh). Note that `DATA_ROOT`, `CHECKPOINT_PATH`, and `SAVE_PATH` \n  need to be changed to your local paths. You may also change the `batch-size` and `nproc_per_node` according to your \n  available hardware.\n\n- For classification tasks, we use the script `scripts/ds_finetune_superglue.sh`. Run the following script to train on the XNLI dataset.\n```shell\n  bash scripts/ds_finetune_superglue.sh \\\n     config_tasks/model_blocklm_multilingual_large.sh \\\n     config_tasks/task_xnli.sh\n```\n\n- For QA tasks, we use the script `scripts/ds_finetune_seq2seq.sh`. Run the following script to train on the MLQA dataset.\n```shell\n  bash scripts/ds_finetune_seq2seq.sh  \\\n    config_tasks/model_blocklm_multilingual_large.sh  \\\n    config_tasks/seq_mlqa.sh\n```\n### Cross-lingual Summary\n- Download the [NCLS dataset](https://github.com/ZNLP/NCLS-Corpora)\n- For summarization tasks, we use the script `scripts/ds_finetune_summary.sh`. Run the following script to train on NCLS English to Chinese. 
\n```shell\n  bash scripts/ds_finetune_summary.sh  \\\n    config_tasks/model_blocklm_multilingual_large.sh  \\\n    config_tasks/seq_ncls.sh\n```\n\n### Blank Filling (Interactive)\n- Change `CHECKPOINT_PATH` in `scripts/generate_block.sh` to your local path and run the following script.\n```shell\n  bash scripts/generate_block.sh  \\\n    config_tasks/model_blocklm_multilingual_large.sh\n```\n\n### Model Parallelism\nIf you encounter the `CUDA out of memory` error, which means your GPU memory is insufficient, you can use model parallelism to split the parameters across multiple GPUs. Take two-way model parallelism as an example. First run `change_mp.py` to split the checkpoint:\n```shell\n  python3 change_mp.py path_to_the_checkpoint 2\n```\nThen update the checkpoint path in the model config file (such as [config_tasks/model_blocklm_multilingual_large.sh](config_tasks/model_blocklm_multilingual_large.sh)) and change `MP_SIZE` in the script (such as [scripts/ds_finetune_superglue.sh](scripts/ds_finetune_superglue.sh)) to `2`.\n\n## Pretrain\nRun the following script to pre-train the mGLM-Large model\n```shell\n  bash scripts/ds_pretrain_nvidia.sh config/ds_multi_blockta_large.sh\n```\n\nThe script [scripts/ds_pretrain_nvidia.sh](scripts/ds_pretrain_nvidia.sh) launches the training program with DeepSpeed. You should change `NUM_WORKERS` and `NUM_GPUS_PER_WORKER` to the number of workers and the number of GPUs per worker. Also change `HOST_FILE_PATH` to the path to an OpenMPI-style hostfile. More details about the DeepSpeed launcher can be found [here](https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node).\n\nThe file [config/ds_multi_blockta_large.sh](config/ds_multi_blockta_large.sh) defines the hyperparameters for pretraining. Most of the arguments are fairly self-explanatory. Specifically, `--train-data` can be multiple keywords defined in `NAMED_CORPORA` in [data_utils/corpora.py](data_utils/corpora.py). 
The hyperparameters of the optimizer are defined in the corresponding JSON file under `config`. The semantics of the JSON file can be found [here](https://www.deepspeed.ai/docs/config-json).\n\n## MT5 Reproduction \nThe code for reproducing experiments in MT5 is at `mt5/finetune_mt5.py`. We use a tool called [wandb](https://wandb.ai/site) to track our experiments. After signing up for a new account, use `wandb login --relogin` to log in. You can also use `wandb offline` to stop wandb from syncing your experiments online.\n\nIf you only want to use one GPU to train, use\n```shell\n  cd mt5\n  python3 finetune_mt5.py scisummnet simple\n``` \nto train on the [scisummnet dataset](https://cs.stanford.edu/~myasu/projects/scisumm_net/). \n\nOur distributed training is automated with [Accelerate](https://huggingface.co/docs/accelerate/index). `accelerate config` sets up the configuration for distributed training. `accelerate test` runs a sanity check.\n```shell\n  cd mt5\n  accelerate launch finetune_mt5.py scisummnet simple\n``` \nruns the training on the scisummnet dataset.\n\n## Citation \nCitation for the GLM paper: \n```\n@inproceedings{du-etal-2022-glm,\n    title = \"{GLM}: General Language Model Pretraining with Autoregressive Blank Infilling\",\n    author = \"Du, Zhengxiao  and\n      Qian, Yujie  and\n      Liu, Xiao  and\n      Ding, Ming  and\n      Qiu, Jiezhong  and\n      Yang, Zhilin  and\n      Tang, Jie\",\n    booktitle = \"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    month = may,\n    year = \"2022\",\n    address = \"Dublin, Ireland\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2022.acl-long.26\",\n    doi = \"10.18653/v1/2022.acl-long.26\",\n    pages = \"320--335\",\n    abstract = \"There have been various types of pretraining architectures including autoencoding models (e.g., BERT), autoregressive models (e.g., 
GPT), and encoder-decoder models (e.g., T5). However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans, which results in performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks. On a wide range of tasks across NLU, conditional and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data, and achieves the best performance from a single pretrained model with 1.25{\\mbox{$\\times$}} parameters of BERT Large , demonstrating its generalizability to different downstream tasks.\",\n}\n```\n\nCitation for the Multilingual GLM paper to be released\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fmultilingual-glm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthudm%2Fmultilingual-glm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fmultilingual-glm/lists"}