{"id":13478443,"url":"https://github.com/lucidrains/rotary-embedding-torch","last_synced_at":"2025-05-14T08:05:50.638Z","repository":{"id":39909852,"uuid":"381470350","full_name":"lucidrains/rotary-embedding-torch","owner":"lucidrains","description":"Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch","archived":false,"fork":false,"pushed_at":"2024-11-27T13:18:43.000Z","size":104,"stargazers_count":673,"open_issues_count":17,"forks_count":57,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-05-11T23:04:47.191Z","etag":null,"topics":["artificial-intelligence","deep-learning","positional-encoding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-29T19:05:39.000Z","updated_at":"2025-05-10T01:10:04.000Z","dependencies_parsed_at":"2023-11-22T03:27:00.260Z","dependency_job_id":"0d620de2-e579-4d3a-a2ef-922582db51bc","html_url":"https://github.com/lucidrains/rotary-embedding-torch","commit_stats":{"total_commits":66,"total_committers":4,"mean_commits":16.5,"dds":0.06060606060606055,"last_synced_commit":"4da3d07cb3685dbb5ec7cd1cfe53b9693bc735ec"},"previous_names":[],"tags_count":50,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frotary-embedding-torch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frotary-embedding-torch/tags","releases_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frotary-embedding-torch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Frotary-embedding-torch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucidrains","download_url":"https://codeload.github.com/lucidrains/rotary-embedding-torch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254101615,"owners_count":22014909,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","positional-encoding"],"created_at":"2024-07-31T16:01:57.079Z","updated_at":"2025-05-14T08:05:45.629Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cimg src=\"./rope.png\" width=\"450px\"\u003e\u003c/img\u003e\n\n## Rotary Embeddings - Pytorch\n\nA standalone library for adding \u003ca href=\"https://arxiv.org/abs/2104.09864\"\u003erotary embeddings\u003c/a\u003e to transformers in Pytorch, following its success as \u003ca href=\"https://blog.eleuther.ai/rotary-embeddings/\"\u003erelative positional encoding\u003c/a\u003e. Specifically it will make rotating information into any axis of a tensor easy and efficient, whether they be fixed positional or learned. 
This library will give you state-of-the-art results for positional embedding, at little cost.\n\nMy gut also tells me there is something \u003ca href=\"https://www.nature.com/articles/s41593-021-00821-9\"\u003emore\u003c/a\u003e to rotations that can be exploited in artificial neural networks.\n\n## Install\n\n```bash\n$ pip install rotary-embedding-torch\n```\n\n## Usage\n\n```python\nimport torch\nfrom rotary_embedding_torch import RotaryEmbedding\n\n# instantiate the positional embedding in your transformer and pass to all your attention layers\n\nrotary_emb = RotaryEmbedding(dim = 32)\n\n# mock queries and keys - dimensions should end with (seq_len, feature dimension), and any number of preceding dimensions (batch, heads, etc)\n\nq = torch.randn(1, 8, 1024, 64) # queries - (batch, heads, seq len, dimension of head)\nk = torch.randn(1, 8, 1024, 64) # keys\n\n# apply the rotations to your queries and keys after the heads have been split out, but prior to the dot product and subsequent softmax (attention)\n\nq = rotary_emb.rotate_queries_or_keys(q)\nk = rotary_emb.rotate_queries_or_keys(k)\n\n# then do your attention with your queries (q) and keys (k) as usual\n```\n\nIf you do all the steps above correctly, you should see a dramatic improvement during training\n\n## Inference Key-Value Cache\n\nWhen dealing with key / value caches at inference, the query position needs to be offset by `key_value_seq_length - query_seq_length`\n\nTo make this easy, use the `rotate_queries_with_cached_keys` method\n\n```python\nq = torch.randn(1, 8, 1, 64)     # only one query at a time\nk = torch.randn(1, 8, 1024, 64)  # key / values with cache concatenated\n\nq, k = rotary_emb.rotate_queries_with_cached_keys(q, k)\n```\n\nYou can also do this manually like so\n\n```python\nq = rotary_emb.rotate_queries_or_keys(q, offset = k.shape[-2] - q.shape[-2])\n```\n\n## Axial Rotary Embeddings\n\nFor easy use of n-dimensional axial relative positional embedding, e.g. 
video transformers\n\n```python\nimport torch\n\nfrom rotary_embedding_torch import (\n    RotaryEmbedding,\n    apply_rotary_emb\n)\n\npos_emb = RotaryEmbedding(\n    dim = 16,\n    freqs_for = 'pixel',\n    max_freq = 256\n)\n\n# queries and keys for frequencies to be rotated into\n# say for a video with 8 frames, and rectangular image (feature dimension comes last)\n\nq = torch.randn(1, 8, 64, 32, 64)\nk = torch.randn(1, 8, 64, 32, 64)\n\n# get axial frequencies - (8, 64, 32, 16 * 3 = 48)\n# will automatically do partial rotary\n\nfreqs = pos_emb.get_axial_freqs(8, 64, 32)\n\n# rotate in frequencies\n\nq = apply_rotary_emb(freqs, q)\nk = apply_rotary_emb(freqs, k)\n```\n\n## Length Extrapolatable Rotary Embeddings\n\nIn \u003ca href=\"https://arxiv.org/abs/2212.10554v1\"\u003ethis paper\u003c/a\u003e, the authors fix the length extrapolation issue with rotary embeddings by giving them a decay similar to ALiBi. They named this technique XPos, and you can use it by setting `use_xpos = True` on initialization.\n\nThis can only be used for autoregressive transformers\n\n```python\nimport torch\nfrom rotary_embedding_torch import RotaryEmbedding\n\n# instantiate the positional embedding in your transformer and pass to all your attention layers\n\nrotary_emb = RotaryEmbedding(\n    dim = 32,\n    use_xpos = True   # set this to True to make rotary embeddings extrapolate better to sequence lengths greater than the one used at training time\n)\n\n# mock queries and keys - dimensions should end with (seq_len, feature dimension), and any number of preceding dimensions (batch, heads, etc)\n\nq = torch.randn(1, 8, 1024, 64) # queries - (batch, heads, seq len, dimension of head)\nk = torch.randn(1, 8, 1024, 64) # keys\n\n# apply the rotations to your queries and keys after the heads have been split out, but prior to the dot product and subsequent softmax (attention)\n\n# instead of using `rotate_queries_or_keys`, you will use `rotate_queries_and_keys`, the rest is taken care 
of\n\nq, k = rotary_emb.rotate_queries_and_keys(q, k)\n```\n\n## Interpolating Sequence Positions\n\nThis MetaAI \u003ca href=\"https://arxiv.org/abs/2306.15595\"\u003epaper\u003c/a\u003e proposes simply fine-tuning on interpolations of the sequence positions for extending to longer context length for pretrained models. They show this performs much better than simply fine-tuning on the same sequence positions but extended further.\n\nYou can use this by setting the `interpolate_factor` on initialization to a value greater than `1.` (ex. if pretrained model was trained on 2048, setting `interpolate_factor = 2.` would allow fine-tuning to `2048 x 2. = 4096`)\n\nUpdate: someone in the community has reported that it does not work well. Please email me if you see either a positive or negative result\n\n```python\nimport torch\nfrom rotary_embedding_torch import RotaryEmbedding\n\nrotary_emb = RotaryEmbedding(\n    dim = 32,\n    interpolate_factor = 2.    # add this line of code to pretrained model and fine-tune for ~1000 steps, as shown in paper\n)\n```\n\n## Citations\n\n```bibtex\n@misc{su2021roformer,\n    title   = {RoFormer: Enhanced Transformer with Rotary Position Embedding},\n    author  = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},\n    year    = {2021},\n    eprint  = {2104.09864},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.CL}\n}\n```\n\n```bibtex\n@inproceedings{Sun2022ALT,\n    title     = {A Length-Extrapolatable Transformer},\n    author    = {Yutao Sun and Li Dong and Barun Patra and Shuming Ma and Shaohan Huang and Alon Benhaim and Vishrav Chaudhary and Xia Song and Furu Wei},\n    year      = {2022}\n}\n```\n\n```bibtex\n@inproceedings{Chen2023ExtendingCW,\n    title   = {Extending Context Window of Large Language Models via Positional Interpolation},\n    author  = {Shouyuan Chen and Sherman Wong and Liangjian Chen and Yuandong Tian},\n    year    = {2023}\n}\n```\n\n```bibtex\n@misc{bloc97-2023,\n    title   = 
{NTK-Aware Scaled RoPE allows LLaMA models to have extended (8k+) context size without any fine-tuning and minimal perplexity degradation.},\n    author  = {/u/bloc97},\n    url     = {https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Frotary-embedding-torch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Frotary-embedding-torch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Frotary-embedding-torch/lists"}