{"id":13908166,"url":"https://github.com/lopho/sd-video","last_synced_at":"2025-07-18T07:30:32.532Z","repository":{"id":148390623,"uuid":"616129398","full_name":"lopho/sd-video","owner":"lopho","description":"Text to Video","archived":false,"fork":false,"pushed_at":"2023-03-28T18:59:39.000Z","size":2425,"stargazers_count":26,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-25T17:45:55.524Z","etag":null,"topics":["latent-diffusion","video"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lopho.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-19T17:36:50.000Z","updated_at":"2024-08-12T20:30:21.000Z","dependencies_parsed_at":"2023-05-20T00:00:13.520Z","dependency_job_id":null,"html_url":"https://github.com/lopho/sd-video","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/lopho/sd-video","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopho%2Fsd-video","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopho%2Fsd-video/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopho%2Fsd-video/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopho%2Fsd-video/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lopho","download_url":"https://codeload.github.com/lopho/sd-video/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/lopho%2Fsd-video/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265720333,"owners_count":23817210,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["latent-diffusion","video"],"created_at":"2024-08-06T23:02:30.904Z","updated_at":"2025-07-18T07:30:32.109Z","avatar_url":"https://github.com/lopho.png","language":"Python","funding_links":[],"categories":["HarmonyOS"],"sub_categories":["Windows Manager"],"readme":"# sd-video\n\nText to Video\n\n\n## Example\n\n### Text 2 Video\n```py\nimport torch\nfrom sd_video import SDVideo, save_gif\nmodel = SDVideo('/path/to/model_and_config', 'cuda', dtype=torch.float16)\n# if installed, use xformers for a small performance boost\nmodel.enable_xformers(True)\nx = model('arnold schwarzenegger eating a giant cheeseburger')\nsave_gif(x, 'output.gif')\n```\n\n![](examples/arnold_burger.gif)\n\n### Video 2 Video\n```py\n  denoise_strength = 0.7\n  timesteps = 50\n  model = SDVideo('/path/to/model_and_config', 'cuda')\n  init_frames = load_sequence('path/to/image_sequence')\n  x = model(\n          'very wrinkly and old',\n          initial_frames = init_frames,\n          bar = True,\n          timesteps = timesteps,\n          t_start = round(timesteps * denoise_strength)\n  )\n  save_gif(x, 'output.gif')\n```\n\n![](examples/old_input.gif)\n![](examples/old.gif)\n\n\n## Sampling options\n```py\nmodel(\n  text = 'some text', # text conditioning\n  text_neg = 'other text', # negative text conditioning\n  guidance_scale = 9.0, # positive / negative conditioning ratio (cfg)\n  timesteps = 50, # 
sampling steps\n  image_size = (256, 256), # output image resolution (w,h)\n  num_frames = 16, # number of video frames to generate\n  eta = 0.0, # DDIM randomness\n  bar = False, # display TQDM progress bar for sampling process\n)\n```\n\n## Model options\n```py\nmodel = SDVideo(\n  model_path = 'path/to/model', # path to model and configuration.json\n  device = 'cuda', # device (string or torch.device)\n  dtype = torch.float32, # precision to load the model in (float types only: float32, float16, bfloat16)\n  amp = True # sample with automatic mixed precision\n)\n```\n\n## Training\n```py\n  import torch\n  from torch.utils.data import DataLoader\n  from sd_video import SDVideo\n  from functools import partial\n  from trainer import SDVideoTrainer\n  from dataloader.gif import GifSet, gif_collate_fn\n  model = SDVideo('path/to/model')\n  # example dataset, expects folder with gifs + text files (0001.gif, 0001.txt)\n  dataset = GifSet('path/to/dataset')\n  # if you write your own dataset and collate_fn\n  # the trainer expects batches in the following format:\n  # { 'pixel_values': tensor with shape b f c h w,\n  #   'text': list[str] with len == b\n  # }\n  dataloader = DataLoader(\n      dataset,\n      batch_size = 1,\n      shuffle = True,\n      num_workers = 4,\n      collate_fn = partial(gif_collate_fn,\n              num_frames = 16,\n              image_size = (256,256),\n              dtype = torch.float32)\n      )\n  trainer = SDVideoTrainer(\n          model,\n          dataloader,\n          output_dir = 'output'\n  )\n  trainer.train(save_every = 1000, log_every = 10)\n```\nRead the code of `SDVideoTrainer`'s `__init__` and `train` methods for all available training parameters.\n\n## Model weights\n- From Huggingface\n  - https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis\n  - last version released under Apache 2.0 (later versions are CC-BY-NC-ND-4.0): 
https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis/tree/6961f660ba8d22f98da33829c73c7da5d205518e\n- From Modelscope\n  - https://modelscope.cn/models/damo/text-to-video-synthesis/files (v1.0.4 is released under Apache 2.0; later versions are released under CC-BY-NC-ND-4.0)\n\n\n## Acknowledgements\n\nPartly based on the following works:\n  - https://github.com/openai/guided-diffusion (licensed MIT)\n  - https://github.com/CompVis/stable-diffusion (licensed MIT)\n  - https://github.com/modelscope/modelscope/blob/master/modelscope/pipelines/multi_modal/text_to_video_synthesis_pipeline.py (licensed Apache 2.0 at the time of copy)\n  - https://github.com/modelscope/modelscope/tree/master/modelscope/models/multi_modal/video_synthesis (licensed Apache 2.0 at the time of copy)\n\nAll other code is released under the GNU Affero General Public License v3 (AGPLv3).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flopho%2Fsd-video","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flopho%2Fsd-video","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flopho%2Fsd-video/lists"}