{"id":13993925,"url":"https://github.com/jquesnelle/yarn","last_synced_at":"2025-04-08T13:03:25.000Z","repository":{"id":177027575,"uuid":"658780434","full_name":"jquesnelle/yarn","owner":"jquesnelle","description":"YaRN: Efficient Context Window Extension of Large Language Models","archived":false,"fork":false,"pushed_at":"2024-04-17T18:29:36.000Z","size":1517,"stargazers_count":1455,"open_issues_count":37,"forks_count":120,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-04-01T12:01:41.250Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jquesnelle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-26T13:27:56.000Z","updated_at":"2025-03-28T09:22:50.000Z","dependencies_parsed_at":"2023-11-20T05:24:38.664Z","dependency_job_id":"a00c6ba9-7ab4-4408-ba30-590ec268e7aa","html_url":"https://github.com/jquesnelle/yarn","commit_stats":null,"previous_names":["jquesnelle/scaled-rope","jquesnelle/yarn"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jquesnelle%2Fyarn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jquesnelle%2Fyarn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jquesnelle%2Fyarn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jquesnelle%2Fyarn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jquesnelle","do
wnload_url":"https://codeload.github.com/jquesnelle/yarn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247847601,"owners_count":21006099,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-09T14:02:37.600Z","updated_at":"2025-04-08T13:03:24.957Z","avatar_url":"https://github.com/jquesnelle.png","language":"Python","funding_links":[],"categories":["Python","A01_文本生成_文本对话","4. Context Optimization"],"sub_categories":["大语言对话模型及数据","Rust"],"readme":"# YaRN\n\nThis repo contains the code and data for the YaRN context window extension method.\n\n## Paper\n\nPaper (ICLR 2024): [YaRN: Efficient Context Window Extension of Large Language Models](https://openreview.net/forum?id=wHBfxhZu1u)  \nOld Preprint [(arXiv)](https://arxiv.org/abs/2309.00071)\n\n## Models\n\n### LLaMA\n\nWe publish variants of [Llama 2](https://about.fb.com/news/2023/07/llama-2/) fine-tuned with YaRN at 32K, 64K and 128K context window length.\nThey are available under the Llama 2 license on 🤗 Hugging Face.\n\n| Size | Context | Link   |\n| ---: | ------: | :----- |\n|   7B |     64K | [NousResearch/Yarn-Llama-2-7b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-7b-64k)     |\n|   7B |    128K | [NousResearch/Yarn-Llama-2-7b-128k](https://huggingface.co/NousResearch/Yarn-Llama-2-7b-128k)   |\n|  13B |     64K | [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k)   |\n|  13B |    128K | 
[NousResearch/Yarn-Llama-2-13b-128k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-128k) |\n|  70B |     32K | [NousResearch/Yarn-Llama-2-70b-32k](https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k)   |\n\nWe also publish 8K context window versions of Llama 2 7B fine-tuned with [NTK-aware](https://huggingface.co/emozilla/NTK-Llama-2-7b-8k) and [YaRN](https://huggingface.co/emozilla/Yarn-Llama-2-7b-8k) (Table 1 in the conference paper).\n\n### Mistral\n\nWith the release of v2 of our paper we are also publishing 64K and 128K variants of [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).\n\n| Size | Context | Link   |\n| ---: | ------: | :----- |\n|   7B |     64K | [NousResearch/Yarn-Mistral-7b-64k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-64k)     |\n|   7B |    128K | [NousResearch/Yarn-Mistral-7b-128k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)   |\n\n### SOLAR\n\nThe [SOLAR 10.7B v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) model utilizes [depth-up scaling](https://arxiv.org/abs/2312.15166) to add layers to [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), which may improve long context performance on a per-parameter basis.\nWe publish 32K and 64K variants.\n\n|    Size | Context | Link   |\n| ------: | ------: | :----- |\n|   10.7B |     32K | [NousResearch/Yarn-Solar-10b-32k](https://huggingface.co/NousResearch/Yarn-Solar-10b-32k)   |\n|   10.7B |     64K | [NousResearch/Yarn-Solar-10b-64k](https://huggingface.co/NousResearch/Yarn-Solar-10b-64k)   |\n\n## Reproduction\n\nWe strongly believe in open science, and thus publish all code and data to reproduce the results in our paper.\nTo reproduce, clone the repository and perform a local installation.\n\n```sh\ngit clone https://github.com/jquesnelle/yarn\ncd yarn\npip install -e .\n```\n\n### Training\n\nTo train the models, run `accelerate config` and enable DeepSpeed acceleration. 
`deepspeed/zero3.json` was the configuration file used for training.\n\n```sh\n# ./train.sh\n```\n\nThe tokenized training data is available on [🤗Hugging Face](https://huggingface.co/datasets/emozilla/pg_books-tokenized-bos-eos-chunked-65536) and was derived from the [pg19](https://huggingface.co/datasets/emozilla/pg19) dataset.\nFor the Mistral models, a mix of the pretrain and fine-tune splits of [Long-Data-Collections](https://huggingface.co/datasets/togethercomputer/Long-Data-Collections) was used and the tokenized dataset is also available on [🤗Hugging Face](https://huggingface.co/datasets/emozilla/yarn-train-tokenized-16k-mistral).\n\n### Evaluation\n\nTo reproduce the evaluations, install [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) with `pip install git+https://github.com/EleutherAI/lm-evaluation-harness` and then run the two provided scripts.\n\n```sh\n# ./eval.sh\n# ./eval-harness.sh\n```\n\n### Citation\n\n```\n@inproceedings{\n      peng2024yarn,\n      title={Ya{RN}: Efficient Context Window Extension of Large Language Models},\n      author={Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole},\n      booktitle={The Twelfth International Conference on Learning Representations},\n      year={2024},\n      url={https://openreview.net/forum?id=wHBfxhZu1u}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjquesnelle%2Fyarn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjquesnelle%2Fyarn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjquesnelle%2Fyarn/lists"}