{"id":20298074,"url":"https://github.com/shaochenze/PatchTrain","last_synced_at":"2025-05-07T20:34:10.665Z","repository":{"id":248950674,"uuid":"830113989","full_name":"shaochenze/PatchTrain","owner":"shaochenze","description":"Code for paper \"Patch-Level Training for Large Language Models\"","archived":false,"fork":false,"pushed_at":"2024-11-15T16:25:47.000Z","size":1566,"stargazers_count":71,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-15T17:27:52.559Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shaochenze.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-17T16:03:23.000Z","updated_at":"2024-11-15T16:25:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"f75542c8-ab3c-4936-9a79-89091835420c","html_url":"https://github.com/shaochenze/PatchTrain","commit_stats":null,"previous_names":["shaochenze/patchtrain"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaochenze%2FPatchTrain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaochenze%2FPatchTrain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaochenze%2FPatchTrain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaochenze%2FPatchTrain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shaochenze","download_url":"https://codeload.github.com/shaochenze/PatchTrain/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252953716,"owners_count":21830890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T16:02:05.040Z","updated_at":"2025-05-07T20:34:10.639Z","avatar_url":"https://github.com/shaochenze.png","language":"Python","readme":"# Patch-Level Training for Large Language Models\n\nThis repo contains the code for our paper [Patch-Level Training for Large Language Models](https://arxiv.org/pdf/2407.12665).\n\nPatch-level training is an efficient training approach for large language models (LLMs), in which models read training data in patches and learn to predict the next patch. Following this, a small amount of training data is used to adjust the model to the token-level. 
## Usage

The implementation of patch-level training is straightforward, with only 10 lines of core code, so feel free to incorporate it directly into your own LLM training code. Our implementation is based on the LLaMA model [modeling_llama.py](https://github.com/shaochenze/PatchTrain/blob/main/modeling_llama.py), with the primary changes around lines 884–886 and 1058–1064, as follows:

### Model Input
```python
# Average every patch_size consecutive token embeddings into a single patch embedding.
num_patches = seq_length // self.patch_size
inputs_embeds = inputs_embeds.view(batch_size, num_patches, self.patch_size, -1).mean(2)
position_ids = position_ids[:, :num_patches]
```

### Loss Calculation
```python
# Each patch position predicts all patch_size tokens of the next patch;
# the per-token cross-entropy losses are averaged over the patch.
shift_logits = logits[..., :-1, :].reshape(-1, self.config.vocab_size)
shift_labels = labels[..., self.patch_size:].reshape(-1, self.patch_size)
loss = 0
log_probs = F.log_softmax(shift_logits, dim=1)
for i in range(self.patch_size):
    loss = loss + F.nll_loss(log_probs, shift_labels[:, i])
loss = loss / self.patch_size
```

## Example

We provide an example here for quick replication. Required environment: transformers>=4.34.

Due to copyright issues, the Pile dataset is no longer publicly available. An alternative is [pile-uncopyrighted](https://huggingface.co/datasets/monology/pile-uncopyrighted), which contains approximately 25% fewer tokens than the Pile. First, run the following script to download and pre-process the pile-uncopyrighted dataset, yielding ~270B tokens:

```bash
bash get_data.sh
```

Next, train a Transformer with 370M parameters on the pile-uncopyrighted dataset. Run the following script for token-level training:

```bash
bash run_token.sh
```

Run the following script to perform patch-level training with a patch size of K=4 on 180B tokens, followed by token-level training on 90B tokens:

```bash
bash run_patch.sh
```

In practice, the speedup from patch-level training is lower than the patch size K. This is primarily due to the time consumed by data loading and processing; tokenization in particular takes a lot of time. The speedup will be much closer to K if streaming mode is disabled.

### Loss Curves

Below are the loss curves obtained from our training on the Pile dataset (360B tokens), provided for reference.

![loss](./loss.png)

## Citation

If you find the resources in this repository useful, please cite as:
```
@article{shao2024patch,
  title={Patch-Level Training for Large Language Models},
  author={Shao, Chenze and Meng, Fandong and Zhou, Jie},
  journal={arXiv preprint arXiv:2407.12665},
  year={2024}
}
```