{"id":30213772,"url":"https://github.com/labackdoor/rope-t5","last_synced_at":"2025-08-13T23:17:08.792Z","repository":{"id":303888001,"uuid":"1017022028","full_name":"LaBackDoor/RoPE-T5","owner":"LaBackDoor","description":"A from-scratch implementation of a T5 model modified with Rotary Position Embeddings (RoPE). This project includes the code for pre-training on the C4 dataset in streaming mode with Flash Attention 2.","archived":false,"fork":false,"pushed_at":"2025-07-09T23:43:22.000Z","size":37,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-10T09:25:33.406Z","etag":null,"topics":["c4-dataset","evaluation-benchmark","flash-attention","from-scratch","huggingface","language-model","llm","nlp","pre-training","pytorch","rope","rotary-position-embedding","sequence-to-sequence","span-corruption","t5"],"latest_commit_sha":null,"homepage":"https://www.labackdoor.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LaBackDoor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-09T22:49:38.000Z","updated_at":"2025-07-09T23:43:25.000Z","dependencies_parsed_at":"2025-07-10T09:25:37.142Z","dependency_job_id":"86207359-c7bb-4ce5-912b-7369e340020d","html_url":"https://github.com/LaBackDoor/RoPE-T5","commit_stats":null,"previous_names":["labackdoor/rope-t5"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/LaBackDoor/RoPE-T5","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/LaBackDoor%2FRoPE-T5","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaBackDoor%2FRoPE-T5/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaBackDoor%2FRoPE-T5/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaBackDoor%2FRoPE-T5/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LaBackDoor","download_url":"https://codeload.github.com/LaBackDoor/RoPE-T5/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaBackDoor%2FRoPE-T5/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270330914,"owners_count":24565890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-13T02:00:09.904Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c4-dataset","evaluation-benchmark","flash-attention","from-scratch","huggingface","language-model","llm","nlp","pre-training","pytorch","rope","rotary-position-embedding","sequence-to-sequence","span-corruption","t5"],"created_at":"2025-08-13T23:16:53.608Z","updated_at":"2025-08-13T23:17:08.782Z","avatar_url":"https://github.com/LaBackDoor.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RoPE-T5: A T5 Model with Rotary Position Embeddings\n\nThis repository provides a from-scratch implementation of a 
T5-style transformer model that replaces the standard relative position biases with **Rotary Position Embeddings (RoPE)**. The model is pre-trained on the C4 dataset using a span corruption objective.\n\nThis project serves as a foundational building block for more advanced models like **[RoPE-ByT5](https://github.com/LaBackDoor/RoPE-ByT5)** and is inspired by the work done in [melmoth/ru-rope-t5-small-instruct](https://huggingface.co/melmoth/ru-rope-t5-small-instruct).\n\n## ✨ Features\n\n* **T5 with Rotary Position Embeddings (RoPE)**: A from-scratch implementation of the T5 architecture that natively uses RoPE for positional information, removing the need for relative attention biases.\n* **Efficient Pre-training**: Utilizes the Hugging Face `Trainer` with an efficient, streaming-based data pipeline to process the large C4 dataset on the fly.\n* **Span Corruption**: Employs the canonical text-to-text denoising objective from the original T5 paper for robust pre-training.\n* **Performance Optimized**: Integrated with Flash Attention 2 for optimized training speed and reduced memory footprint.\n* **Comprehensive Evaluation**: Includes scripts to evaluate the pre-trained model on zero-shot and fine-tuning benchmarks like GLUE, SQuAD, and CNN/DailyMail.\n* **Modern Tooling**: Set up with `uv` for fast and reliable Python package management.\n\n## 🚀 Setup and Installation\n\nThis project uses `uv` for package management. Follow these steps to set up the environment.\n\n1.  **Create a Virtual Environment:**\n    First, create and activate a new virtual environment using `uv`.\n    ```bash\n    uv venv\n    source .venv/bin/activate\n    ```\n\n2.  **Install Core Build Dependencies:**\n    Install `torch` and `setuptools` first. This is a necessary step to prepare for building packages like `flash-attn` from source.\n    ```bash\n    uv add torch setuptools\n    ```\n\n3.  **Sync Project Dependencies:**\n    Use `uv sync` with the `--no-build-isolation` flag. 
This flag is crucial as it allows `flash-attn` to find the already-installed `torch` and build correctly.\n    ```bash\n    uv sync --no-build-isolation\n    ```\n\n## ▶️ How to Run\n\nMake sure you run all commands from the root directory of the project.\n\n1.  **Pre-training the Model**\n\n    To start pre-training the `RoPE-T5` model from scratch on the C4 dataset, run the following command:\n\n    ```bash\n    python scripts/run_pretraining.py\n    ```\n\n    The script initializes the model, streams the dataset, and saves checkpoints to the `c4-rope-t5-from-scratch-stream` directory in your project root. Training progress is logged to Weights \u0026 Biases.\n\n2.  **Evaluating the Model**\n\n    After pre-training is complete, you can evaluate your model on various downstream tasks using the evaluation script:\n\n    ```bash\n    python scripts/run_evaluation.py\n    ```\n\n    This script will:\n    * Load your pre-trained model from the output directory.\n    * Run a series of zero-shot evaluations on benchmarks like GLUE and SQuAD.\n    * Save the results and intermediate files to the `evaluation/` directory.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flabackdoor%2Frope-t5","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flabackdoor%2Frope-t5","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flabackdoor%2Frope-t5/lists"}