{"id":19156208,"url":"https://github.com/kyegomez/gen1","last_synced_at":"2025-05-07T07:35:21.650Z","repository":{"id":191284559,"uuid":"684334470","full_name":"kyegomez/Gen1","owner":"kyegomez","description":"My Implementation of \" Structure and Content-Guided Video Synthesis with Diffusion Models\" by RunwayML","archived":false,"fork":false,"pushed_at":"2024-01-16T17:05:57.000Z","size":312,"stargazers_count":27,"open_issues_count":1,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-05T08:05:28.963Z","etag":null,"topics":["diffusion","opensource","opensource-projects","opensourceforgood","stable","stable-diffusion-webui","texttoimage","texttovideo"],"latest_commit_sha":null,"homepage":"https://discord.gg/qUtxnK2NMf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kyegomez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-08-28T23:42:21.000Z","updated_at":"2025-03-26T15:27:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"a5584941-2ef5-4d6d-ae18-eba4f6a77aaa","html_url":"https://github.com/kyegomez/Gen1","commit_stats":{"total_commits":45,"total_committers":2,"mean_commits":22.5,"dds":0.06666666666666665,"last_synced_commit":"796e27d35bd679c78e94388e08a3f1180238c784"},"previous_names":["kyegomez/gen1"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FGen1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FGen1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/kyegomez%2FGen1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FGen1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kyegomez","download_url":"https://codeload.github.com/kyegomez/Gen1/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252834233,"owners_count":21811339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion","opensource","opensource-projects","opensourceforgood","stable","stable-diffusion-webui","texttoimage","texttovideo"],"created_at":"2024-11-09T08:33:36.267Z","updated_at":"2025-05-07T07:35:21.623Z","avatar_url":"https://github.com/kyegomez.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)\n\n\n# Gen1\nMy Implementation of \" Structure and Content-Guided Video Synthesis with Diffusion Models\" by RunwayML. \"Input videos x are encoded to z0 with a fixed encoder E and diffused to zt. We extract a\nstructure representation s by encoding depth maps obtained with MiDaS, and a content representation c by encoding one of the frames\nwith CLIP. The model then learns to reverse the diffusion process in the latent space, with the help of s, which gets concatenated to zt, as\nwell as c, which is provided via cross-attention blocks. During inference (right), the structure s of an input video is provided in the same\nmanner. 
To specify content via text, we convert CLIP text embeddings to image embeddings via a prior.\"\n\n\n\n# Install\n`pip3 install gen1`\n\n# Usage\n```python\nimport torch\nfrom gen1.model import Gen1\n\n# Create an instance of the Gen1 model\nmodel = Gen1()\n\n# Generate random input images and video tensors\nimages = torch.randn(1, 3, 128, 128)\nvideo = torch.randn(1, 3, 16, 128, 128)\n\n# Pass the input images and video through the model's forward method\nrun_out = model.forward(images, video)\n\n```\n\n## Datasets\nHere is a summary table of the datasets used in the Structure and Content-Guided Video Synthesis with Diffusion Models paper:\n\n| Dataset | Type | Size | Domain | Description | Source |\n|-|-|-|-|-|-|\n| Internal dataset | Images | 240M | General | Uncaptioned images | Private |\n| Custom video dataset | Videos | 6.4M clips | General | Uncaptioned short video clips | Private |\n| DAVIS | Videos | - | General | Video object segmentation | [Link](https://davischallenge.org/) |\n| Stock footage | Videos | - | General | Diverse video clips | - |\n\n\n\n## Citation\n```\n@misc{2302.03011,\nAuthor = {Patrick Esser and Johnathan Chiu and Parmida Atighehchian and Jonathan Granskog and Anastasis Germanidis},\nTitle = {Structure and Content-Guided Video Synthesis with Diffusion Models},\nYear = {2023},\nEprint = {arXiv:2302.03011},\n}\n```\n\n\n# Todo\n- [ ] Add training script\n- [ ] Add a conditional text parameter to pass in text, not just images and/or other videos","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Fgen1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyegomez%2Fgen1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Fgen1/lists"}