{"id":24109251,"url":"https://github.com/beomi/transformers-language-modeling","last_synced_at":"2025-08-03T21:41:45.845Z","repository":{"id":75378692,"uuid":"368020909","full_name":"Beomi/transformers-language-modeling","owner":"Beomi","description":"Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3","archived":false,"fork":false,"pushed_at":"2021-05-20T05:44:21.000Z","size":624,"stargazers_count":23,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-07T06:55:34.534Z","etag":null,"topics":["bert","deepspeed","language-model","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Beomi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-05-17T01:20:11.000Z","updated_at":"2025-01-14T11:56:11.000Z","dependencies_parsed_at":"2023-06-06T08:15:32.225Z","dependency_job_id":null,"html_url":"https://github.com/Beomi/transformers-language-modeling","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Beomi/transformers-language-modeling","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2Ftransformers-language-modeling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2Ftransformers-language-modeling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2Ftransformers-language-modeling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/Beomi%2Ftransformers-language-modeling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Beomi","download_url":"https://codeload.github.com/Beomi/transformers-language-modeling/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2Ftransformers-language-modeling/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268618276,"owners_count":24279243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-03T02:00:12.545Z","response_time":2577,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","deepspeed","language-model","transformers"],"created_at":"2025-01-11T00:05:55.201Z","updated_at":"2025-08-03T21:41:45.821Z","avatar_url":"https://github.com/Beomi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003e Forked from https://github.com/huggingface/transformers/tree/86d5fb0b360e68de46d40265e7c707fe68c8015b/examples/pytorch/language-modeling on 2021-05-17.\n\n\n\u003c!---\nCopyright 2020 The HuggingFace Team. 
All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n--\u003e\n\n## Language model training\n\nThese scripts fine-tune (or train from scratch) the library models for language modeling on a text dataset: GPT, GPT-2,\nALBERT, BERT, DistilBERT, RoBERTa, XLNet, and others. GPT and GPT-2 are trained or fine-tuned using a causal language modeling\n(CLM) loss, while ALBERT, BERT, DistilBERT and RoBERTa are trained or fine-tuned using a masked language modeling (MLM)\nloss. XLNet uses permutation language modeling (PLM); you can find more information about the differences between these\nobjectives in our [model summary](https://huggingface.co/transformers/model_summary.html).\n\nThere are two sets of scripts provided. The first set uses the `Trainer` API. The second set, whose script names end in `no_trainer`, uses a custom training loop built on the 🤗 Accelerate library. Both sets use the 🤗 Datasets library. You can easily customize them if your datasets need extra processing.\n\n**Note:** The old script `run_language_modeling.py` is still available [here](https://github.com/huggingface/transformers/blob/master/examples/legacy/run_language_modeling.py).\n\nThe following examples run on datasets hosted on our [hub](https://huggingface.co/datasets) or on your own\ntext files for training and validation. We give examples of both below.\n\n### GPT-2/GPT and causal language modeling\n\nThe following example fine-tunes GPT-2 on WikiText-2.
We're using the raw WikiText-2 (no tokens were replaced before\nthe tokenization). The loss here is that of causal language modeling.\n\n```bash\npython run_clm.py \\\n    --model_name_or_path gpt2 \\\n    --dataset_name wikitext \\\n    --dataset_config_name wikitext-2-raw-v1 \\\n    --do_train \\\n    --do_eval \\\n    --output_dir /tmp/test-clm\n```\n\nThis takes about half an hour to train on a single K80 GPU and about one minute for the evaluation to run. It reaches\na perplexity of ~20 once fine-tuned on the dataset.\n\nTo run on your own training and validation files, use the following command:\n\n```bash\npython run_clm.py \\\n    --model_name_or_path gpt2 \\\n    --train_file path_to_train_file \\\n    --validation_file path_to_validation_file \\\n    --do_train \\\n    --do_eval \\\n    --output_dir /tmp/test-clm\n```\n\nThis uses the built-in HuggingFace `Trainer` for training. If you want a custom training loop instead, you can use or adapt the `run_clm_no_trainer.py` script. Take a look at the script for a list of supported arguments. An example is shown below:\n\n```bash\npython run_clm_no_trainer.py \\\n    --dataset_name wikitext \\\n    --dataset_config_name wikitext-2-raw-v1 \\\n    --model_name_or_path gpt2 \\\n    --output_dir /tmp/test-clm\n```\n\n### RoBERTa/BERT/DistilBERT and masked language modeling\n\nThe following example fine-tunes RoBERTa on WikiText-2. Here too, we're using the raw WikiText-2. The loss is different\nbecause BERT/RoBERTa have a bidirectional mechanism; we therefore use the same loss that was used during their\npre-training: masked language modeling.\n\nIn accordance with the RoBERTa paper, we use dynamic masking rather than static masking.
The model may, therefore,\nconverge slightly more slowly (overfitting takes more epochs).\n\n```bash\npython run_mlm.py \\\n    --model_name_or_path roberta-base \\\n    --dataset_name wikitext \\\n    --dataset_config_name wikitext-2-raw-v1 \\\n    --do_train \\\n    --do_eval \\\n    --output_dir /tmp/test-mlm\n```\n\nTo run on your own training and validation files, use the following command:\n\n```bash\npython run_mlm.py \\\n    --model_name_or_path roberta-base \\\n    --train_file path_to_train_file \\\n    --validation_file path_to_validation_file \\\n    --do_train \\\n    --do_eval \\\n    --output_dir /tmp/test-mlm\n```\n\nIf your dataset is organized with one sample per line, you can use the `--line_by_line` flag (otherwise the script\nconcatenates all texts and then splits them into blocks of the same length).\n\nThis uses the built-in HuggingFace `Trainer` for training. If you want a custom training loop instead, you can use or adapt the `run_mlm_no_trainer.py` script. Take a look at the script for a list of supported arguments. An example is shown below:\n\n```bash\npython run_mlm_no_trainer.py \\\n    --dataset_name wikitext \\\n    --dataset_config_name wikitext-2-raw-v1 \\\n    --model_name_or_path roberta-base \\\n    --output_dir /tmp/test-mlm\n```\n\n**Note:** On TPU, you should use the flag `--pad_to_max_length` in conjunction with the `--line_by_line` flag to make\nsure all your batches have the same length.\n\n### Whole word masking\n\nThis part was moved to `examples/research_projects/mlm_wwm`.\n\n### XLNet and permutation language modeling\n\nXLNet uses a different training objective, which is permutation language modeling.
It is an autoregressive method\nto learn bidirectional contexts by maximizing the expected likelihood over all permutations of the input\nsequence factorization order.\n\nThe `--plm_probability` flag defines the ratio of the length of a span of masked tokens to the length of the surrounding\ncontext for permutation language modeling.\n\nThe `--max_span_length` flag may also be used to limit the length of a span of masked tokens used\nfor permutation language modeling.\n\nHere is how to fine-tune XLNet on WikiText-2:\n\n```bash\npython run_plm.py \\\n    --model_name_or_path=xlnet-base-cased \\\n    --dataset_name wikitext \\\n    --dataset_config_name wikitext-2-raw-v1 \\\n    --do_train \\\n    --do_eval \\\n    --output_dir /tmp/test-plm\n```\n\nTo fine-tune it on your own training and validation files, run:\n\n```bash\npython run_plm.py \\\n    --model_name_or_path=xlnet-base-cased \\\n    --train_file path_to_train_file \\\n    --validation_file path_to_validation_file \\\n    --do_train \\\n    --do_eval \\\n    --output_dir /tmp/test-plm\n```\n\nIf your dataset is organized with one sample per line, you can use the `--line_by_line` flag (otherwise the script\nconcatenates all texts and then splits them into blocks of the same length).\n\n**Note:** On TPU, you should use the flag `--pad_to_max_length` in conjunction with the `--line_by_line` flag to make\nsure all your batches have the same length.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeomi%2Ftransformers-language-modeling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeomi%2Ftransformers-language-modeling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeomi%2Ftransformers-language-modeling/lists"}