{"id":19845965,"url":"https://github.com/vchitect/optix","last_synced_at":"2025-05-01T21:30:49.982Z","repository":{"id":219261691,"uuid":"748569460","full_name":"Vchitect/Optix","owner":"Vchitect","description":"Memory Efficient Training Framework for Large Video Generation Model","archived":false,"fork":false,"pushed_at":"2024-04-12T07:55:08.000Z","size":100,"stargazers_count":17,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-04-12T15:14:01.874Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Vchitect.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-01-26T09:21:55.000Z","updated_at":"2024-04-15T03:29:33.688Z","dependencies_parsed_at":"2024-01-26T12:28:48.173Z","dependency_job_id":"55438b1b-f8a2-48e5-a84e-94b610a13814","html_url":"https://github.com/Vchitect/Optix","commit_stats":null,"previous_names":["vchitect/optix"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FOptix","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FOptix/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FOptix/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FOptix/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Vchitect","download_url":"https://codeload.github.com/Vchitect/Optix/tar.gz/refs/he
ads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224278457,"owners_count":17285080,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T13:09:53.244Z","updated_at":"2024-11-12T13:09:53.425Z","avatar_url":"https://github.com/Vchitect.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Optix\n\nOptix: Memory Efficient Training Framework for Large Video Generation Model\n\n**Update**:\n- Support training [LATTE](https://github.com/Vchitect/Latte) with 320 frames of 512*512 video on an A100 without sequence parallelism; see [Latte training example](./example/train_latte_t2v.py)\n\n\n**Features**:\n- 4x the batch size when training with high-resolution images\n- 1.2x the training throughput on average\n- Optix remains effective in DiT model training!\n\nResults of training stable-diffusion models:\n\n![max batchsize](./doc/imgs/sdto_bs.png)\n![acc ratio](./doc/imgs/sdto_acc.png)\n\n\n*Baseline config*: tf32, grad checkpointing\n\n*Tested on*: A100 80GB; PyTorch 2.1.2+cu118\n\n\n## Getting started\n\n### install optix\n\n`python setup.py develop`\n\n### install dependency\n\nRefer to [requirements](./requiresments.txt)\n\n## API Usage\n\n```py\nimport optix\n\n\n# optimize model (fused ops, ddp, etc.), set up the optimizer, and create an EMA\nmodel, vae, opt, ema = optix.compile(model, vae, learning_rate=1e-5, weight_decay=1e-5, use_ema=True)\n\n# or do not create an EMA:\nmodel, vae, opt, _ = optix.compile(model, vae, learning_rate=1e-5, weight_decay=1e-5)\n\n\n# use `sliced_vae` to replace the 
original vae.encode code:\n# with torch.no_grad():\n#     x = vae.encode(x)\n#     if not args.use_video:\n#         x = x.latent_dist.sample().mul_(vae.config.scaling_factor)\nmodel_input = optix.sliced_vae(vae, model_input, use_autocast=True, nhwc=True)\n\n```\n\nKeyword arguments for `optix.compile` and their default values:\n```py\n{\n    'use_ema': False,                   # create an EMA\n    'compile_vae': True,                # [PERF] for torch\u003e2.0, recommended to use torch.compile\n    'ddp': True,                        # automatically create a ddp module over unet\n    'dp_group': None,                   # ddp communication group, default is None\n    'gradient_checkpointing': True,     # [PERF] grad_ckpt is ON by default; for small batch sizes this can be turned off for a speedup\n    'xformer': True,                    # [PERF] using xformer gives a small speedup\n    'fusedln': True,                    # [PERF] using fusedln gives a speedup\n    'compile_unet': False,              # [PERF] this feature is not stable, so it is OFF by default\n    'vae_channels_last': True,          # [PERF] use channels_last format for vae\n    'optim': 'adamw',                   # the optimizer type\n    'learning_rate': 1e-5,              # optimizer params\n    'weight_decay': 0,                  # optimizer params\n    'hybrid_zero': True,                # [PERF] for multi-node training, hybrid zero can be faster\n}\n```\nThese keyword arguments can be passed directly to `optix.compile`, for example:\n```py\nmodel, vae, opt, _ = optix.compile(model, vae, learning_rate=1e-5, weight_decay=1e-5,\n                                   use_ema=False, compile_vae=False, optim='sgd',\n                                   xformer=False)\n\n```\n\n\n## Examples\n\nStable Diffusion: 
[train_sd_unet.py](./example/train_sd_unet.py)\n\nDiT: [train_dit.py](./example/train_dit.py)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvchitect%2Foptix","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvchitect%2Foptix","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvchitect%2Foptix/lists"}