{"id":14081280,"url":"https://github.com/volcengine/veScale","last_synced_at":"2025-07-30T19:32:36.993Z","repository":{"id":224602362,"uuid":"763697768","full_name":"volcengine/veScale","owner":"volcengine","description":"A PyTorch Native LLM Training Framework","archived":false,"fork":false,"pushed_at":"2024-08-25T22:43:22.000Z","size":2646,"stargazers_count":674,"open_issues_count":4,"forks_count":34,"subscribers_count":34,"default_branch":"main","last_synced_at":"2024-12-01T14:08:59.881Z","etag":null,"topics":["llm-training","pytorch"],"latest_commit_sha":null,"homepage":"http://vescale.xyz","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/volcengine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-26T19:01:27.000Z","updated_at":"2024-11-30T06:55:58.000Z","dependencies_parsed_at":"2024-04-02T00:27:48.587Z","dependency_job_id":"30cd00c1-92cb-41b0-bf52-7e63aa142058","html_url":"https://github.com/volcengine/veScale","commit_stats":null,"previous_names":["volcengine/vescale"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveScale","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveScale/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveScale/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveScale/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/volcengine","download_url":"https://codeload.github.com/volcengine/veScale/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227574217,"owners_count":17788147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm-training","pytorch"],"created_at":"2024-08-13T13:00:38.395Z","updated_at":"2025-07-30T19:32:36.969Z","avatar_url":"https://github.com/volcengine.png","language":"Python","funding_links":[],"categories":["Computation and Communication Optimisation","A01_文本生成_文本对话","Python","7. Training \u0026 Fine-tuning Ecosystem","Open Source Projects"],"sub_categories":["大语言对话模型及数据"],"readme":"# Breaking Changes Coming Soon ...\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"./docs/pictures/icon.png\" width=\"150\"/\u003e\n\u003c/div\u003e\n\n# A PyTorch Native LLM Training Framework\n\n_**An Industrial-Level Framework for Ease-of-Use**_\n\n- 🔥 **PyTorch Native**: veScale is rooted in PyTorch-native data structures, operators, and APIs, enjoying the ecosystem of PyTorch that dominates the ML world.\n\n- 🛡 **Zero Model Code Change**: veScale decouples distributed system design from model architecture, requiring near-zero or zero modification to users' model code.\n\n- 🚀 **Single Device Abstraction**: veScale provides single-device semantics to users, automatically distributing and orchestrating model execution in a cluster of devices. 
\n\n- 🎯 **Automatic Parallelism Planning**: veScale parallelizes model execution with a synergy of strategies (tensor, sequence, data, ZeRO, pipeline parallelism) under semi- or full-automation [coming soon].\n\n- ⚡ **Eager \u0026 Compile Mode**: veScale supports not only Eager-mode automation for parallel training and inference but also Compile-mode for ultimate performance [coming soon].\n\n- 📀 **Automatic Checkpoint Resharding**: veScale manages distributed checkpoints automatically with online resharding across different cluster sizes and different parallelism strategies. \n\n## Latest News\n\n- [2024-7-25] veScale's [pipeline parallelism](https://github.com/volcengine/veScale/blob/main/vescale/pipe/README.md) open sourced with API, graph parser, stage abstraction, schedules and execution runtime along with [nD distributed timeline](https://github.com/volcengine/veScale/blob/main/vescale/ndtimeline/README.md).\n\n- [2024-5-31] veScale's [fast checkpointing system](https://github.com/volcengine/veScale/blob/main/vescale/checkpoint/README.md) open sourced with automatic checkpoint resharding, caching, load-balancing, fast copying, deduplicating, and asynchronous I/O.\n\n- [2024-5-21] veScale's examples ([Mixtral](https://github.com/volcengine/veScale/tree/main/examples/mixtral_4D_training), [LLama2](https://github.com/volcengine/veScale/tree/main/examples/llama2_4D_finetune), and [nanoGPT](https://github.com/volcengine/veScale/tree/main/examples/nanogpt_4D_finetune)) open sourced with bit-wise correctness of training loss curves.\n\n- [2024-5-13] The debut of veScale at MLSys 2024 as a [poster](https://volcengine.github.io/veScaleWeb/blog/mlsys2024.html).\n\n- [2024-4-16] Our [internal LLM training system](https://volcengine.github.io/veScaleWeb/blog/megascale.html) presented at NSDI 2024.\n\n## Coming Soon\n\n_**veScale**_ is still in its early phase. We are refactoring our internal LLM training system components to meet open source standards. 
The tentative timeline is as follows:\n\n- High-level [nD parallel API](https://github.com/volcengine/veScale/issues/39) for extreme ease of use\n\n- Power-user plan API for easy customization of nD parallel training\n\n- End-to-end vescale/examples with 5D parallel training (TP, SP, DP, ZeRO, PP)\n\n## Table of Contents ([web view](https://volcengine.github.io/veScaleWeb/))\n\n**[Introduction](./docs/texts/introduction.md)**\n\n**[Quick Start](./docs/texts/quick-start.md)**\n\n**[DTensor](./vescale/dtensor/README.md)**\n\n**Parallel**\n  * [Overview](./docs/texts/parallel_overview.md)\n  * [Tensor Parallel \u0026 Sequence Parallel](./vescale/dmodule/README.md)\n  * [Data Parallel](./vescale/ddp/README.md)\n  * [Optimizer Parallel](./vescale/optim/README.md)\n  * [Pipeline Parallel](./vescale/pipe/README.md)\n  * [nD Device Mesh](./vescale/devicemesh_api/README.md)\n\n**Plan**\n  * [Auto TP \u0026 SP Plan](./vescale/dmp/README.md)\n\n**[Checkpoint](./vescale/checkpoint/README.md)**\n\n## [We Are Hiring!](https://volcengine.github.io/veScaleWeb/misc/join-us.html) ##\n\n## [License](./LICENSE)\n\nThe veScale Project is under the Apache License v2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvolcengine%2FveScale","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvolcengine%2FveScale","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvolcengine%2FveScale/lists"}