{"id":18428970,"url":"https://github.com/codefuse-ai/collinear-constrained-attention","last_synced_at":"2025-04-07T17:32:16.219Z","repository":{"id":205020730,"uuid":"713172456","full_name":"codefuse-ai/Collinear-Constrained-Attention","owner":"codefuse-ai","description":null,"archived":true,"fork":false,"pushed_at":"2024-06-17T03:18:04.000Z","size":3289,"stargazers_count":62,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-05T20:45:15.666Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codefuse-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-02T01:37:01.000Z","updated_at":"2025-03-17T09:58:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"481c8e4f-b224-4c18-a470-6498a8c65410","html_url":"https://github.com/codefuse-ai/Collinear-Constrained-Attention","commit_stats":null,"previous_names":["codefuse-ai/collinear-constrained-attention"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FCollinear-Constrained-Attention","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FCollinear-Constrained-Attention/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FCollinear-Constrained-Attention/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FCollin
ear-Constrained-Attention/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codefuse-ai","download_url":"https://codeload.github.com/codefuse-ai/Collinear-Constrained-Attention/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247697868,"owners_count":20981262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T05:15:11.087Z","updated_at":"2025-04-07T17:32:15.584Z","avatar_url":"https://github.com/codefuse-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/codefuse-ai/Collinear-Constrained-Attention/blob/master/assets/logo.png\" width=\"540px\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n🤗 \u003ca href=\"https://huggingface.co/codefuse-ai/Collinear-Constrained-Attention\" target=\"_blank\"\u003eHugging Face(is coming)\u003c/a\u003e \n• \n🤖 \u003ca href=\"https://modelscope.cn/models/codefuse-ai/Collinear-Constrained-Attention/summary\" target=\"_blank\"\u003eModelScope(is coming)\u003c/a\u003e \n  • \n📄 \u003ca href=\"https://arxiv.org/abs/2309.08646\" target=\"_blank\"\u003ePaper\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![GitHub issues](https://img.shields.io/github/issues/codefuse-ai/Collinear-Constrained-Attention)](https://github.com/codefuse-ai/Collinear-Constrained-Attention/issues)\n[![GitHub Repo 
stars](https://img.shields.io/github/stars/codefuse-ai/Collinear-Constrained-Attention?style=social)](https://github.com/codefuse-ai/Collinear-Constrained-Attention)\n\n\u003c/div\u003e\n\n[comment]: \u003c\u003e ([\u003cimg src=\"https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg\" alt=\"Weights \u0026 Biases monitoring\" height=20\u003e]\u0026#40;https://wandb.ai/eleutherai/neox\u0026#41;)\n\nThis repository provides an implementation of [CoCA (Collinear Constrained Attention)](https://arxiv.org/abs/2309.08646). The implementation is based on two transformer models in [Hugging Face]().\n\n- [GPT-NeoX](https://github.com/huggingface/transformers/tree/main/src/transformers/models/gpt_neox), [EleutherAI](https://www.eleuther.ai)'s library for training large-scale language models on GPUs.\n- [LLaMA](https://github.com/huggingface/transformers/tree/main/src/transformers/models/llama), from the Meta AI team.\n\nHere we only point out the modifications made to implement CoCA. 
For more information about model training and inference, see [transformers](https://github.com/huggingface/transformers).\n\nFor practicality, we improved CoCA's compute and memory efficiency with [opt_einsum](https://github.com/dgasmith/opt_einsum); see that repository for more information.\n\n![Model Structure](https://github.com/codefuse-ai/Collinear-Constrained-Attention/blob/master/assets/model.png \"Model Structure\")\n\n![PPL Performance](https://github.com/codefuse-ai/Collinear-Constrained-Attention/blob/master/assets/PPL.png \"PPL Performance\") ![Passkey Performance](https://github.com/codefuse-ai/Collinear-Constrained-Attention/blob/master/assets/passkey.png \"Passkey Performance\")\n\n[comment]: \u003c\u003e (\u003cimg src=\"https://github.com/codefuse-ai/Collinear-Constrained-Attention/blob/master/assets/PPL.png\" width=\"210px\"\u003e)\n\n## 🚀 Quick Start\n\n### 💻 Environment\nAtorch is an optimized version of torch from Ant Group. It is not yet available to the open-source community, but it will be open-sourced in the near future. 
Until then, you can use the original torch version instead.\n\n### 📂 Datasets\nYou can train on either raw data or tokenized data.\n\nWhen using raw data, format each record as:\n```json\n{\"content\" : \"It is a sentence for training.\"}\n```\nand save the data as `.jsonl`.\n\nYou can also use tokenized data saved as `.bin`, produced with the [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) tokenizer:\n```bash\npython ./data/tokenization/generate_dataset.py\n```\nRemember to modify `input_dict`, `conver_type_list`, `output_name`, and `seq_length` for your own dataset.\n\n### 🏋️‍♂️ Training\nYou can train a model from scratch as follows:\n```bash\nbash ./train/run_coca.sh 32 1 8 2\n```\n\n- The first parameter is the `per gpu batch size`.\n- The second parameter is `tensor parallel` (values larger than 1 are not supported yet).\n- The third parameter is `data parallel`, which equals the number of GPUs.\n- The last parameter is `train epochs`.\n\nTo load a pre-trained model, set `--pretrained_model_path $PRETRAINED_MODEL_PATH \\`.\n\n### 🧠 Inference\nCoCA can be loaded using `transformers` functionality:\n\n```python\nfrom model.gpt_neox.modeling_gpt_neox import GPTNeoXForCausalLM, GPTNeoXConfig\nfrom transformers import AutoTokenizer\nfrom transformers import GenerationConfig\n\n# Placeholder path (an assumption): point this at your own CoCA checkpoint directory\ncheckpoint = \"path/to/coca_checkpoint\"\n\nconfig = GPTNeoXConfig.from_pretrained(checkpoint)\nconfig.is_decoder = True\n\n# To run inference beyond the training length, enable NTK-aware scaled RoPE.\n# CoCA is compatible with it and performs much better than the original attention structure.\nrope_scaling = {\"type\": \"dynamic\", \"factor\": 4.0}\nconfig.rope_scaling = rope_scaling\n\nmodel = GPTNeoXForCausalLM.from_pretrained(checkpoint,\n                                           config=config,\n                                           device_map=\"auto\")\n\ntokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side=\"left\")\ntokenizer.add_special_tokens({'eos_token': 
\"\u003c|endoftext|\u003e\"})\ntokenizer.add_special_tokens({'pad_token': \"\u003c|pad|\u003e\"})\n```\n\n## 📝 Administrative Notes\n\n### 📚 Citing CoCA\n\nIf you have found the CoCA library helpful in your work, you can cite this repository as\n\n```bibtex\n@inproceedings{zhu2024coca,\n    title={CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending}, \n    author={Shiyi Zhu and Jing Ye and Wei Jiang and Siqiao Xue and Qi Zhang and Yifan Wu and Jianguo Li},\n    booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics},\n    month = august,\n    year = {2024},\n    publisher = {Association for Computational Linguistics},\n}\n```\n\n### 📜 Licensing\n\nThis repository hosts code of CoCA project. Copyright (c) 2023, Ant Group. Licensed under the Apache License:\n\n    Licensed under the Apache License, Version 2.0 (the \"License\");\n    you may not use this file except in compliance with the License.\n    You may obtain a copy of the License at\n    \n        http://www.apache.org/licenses/LICENSE-2.0\n    \n    Unless required by applicable law or agreed to in writing, software\n    distributed under the License is distributed on an \"AS IS\" BASIS,\n    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n    See the License for the specific language governing permissions and\n    limitations under the License.\n\nThis repository is based off code written by EleutherAI that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by EleutherAI maintain a EleutherAI copyright header. When the EleutherAI code has been modified from its original version, that fact is noted in the copyright header. 
All derivative works of this repository must preserve these headers under the terms of the Apache License.\n\nThis repository is based off code written by Meta AI that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by Meta AI maintain a Meta AI copyright header. When the Meta AI code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.\n\nThis repository is based off code written by NVIDIA that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by NVIDIA maintain an NVIDIA copyright header. All files that do not contain such a header are the exclusive copyright of EleutherAI. When the NVIDIA code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.\n\nThis repository also contains code written by a number of other authors. Such contributions are marked and the relevant licensing is included where appropriate.\n\nFor full terms, see the `LICENSE` file. If you have any questions, comments, or concerns about licensing, please email me at zhushiyi.zsy@antgroup.com.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodefuse-ai%2Fcollinear-constrained-attention","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodefuse-ai%2Fcollinear-constrained-attention","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodefuse-ai%2Fcollinear-constrained-attention/lists"}