# Text2LIVE: Text-Driven Layered Image and Video Editing (ECCV 2022 - Oral)
## [<a href="https://text2live.github.io/" target="_blank">Project Page</a>]

[![arXiv](https://img.shields.io/badge/arXiv-Text2LIVE-b31b1b.svg)](https://arxiv.org/abs/2204.02491)
![Pytorch](https://img.shields.io/badge/PyTorch->=1.10.0-Red?logo=pytorch)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/weizmannscience/text2live)

![teaser](https://user-images.githubusercontent.com/22198039/179798581-ca6f6652-600a-400a-b21b-713fc5c15d56.png)

**Text2LIVE** is a method for text-driven editing of real-world images and videos, as described in <a href="https://arxiv.org/abs/2204.02491" target="_blank">our paper</a>.

[//]: # (. It can be used for localized and global edits that change the texture of existing objects or augment the scene with semi-transparent effects &#40;e.g. smoke, fire, snow&#41;.)

[//]: # (### Abstract)
>We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Specifically, given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., object's texture) or augment the scene with new visual effects (e.g., smoke, fire) in a semantically meaningful manner. Our framework trains a generator using an internal dataset of training examples, extracted from a single input (image or video and target text prompt), while leveraging an external pre-trained CLIP model to establish our losses. Rather than directly generating the edited output, our key idea is to generate an edit layer (color+opacity) that is composited over the original input. This allows us to constrain the generation process and maintain high fidelity to the original input via novel text-driven losses that are applied directly to the edit layer. Our method neither relies on a pre-trained generator nor requires user-provided edit masks. Thus, it can perform localized, semantic edits on high-resolution natural images and videos across a variety of objects and scenes.
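
The abstract describes generating an edit layer (color + opacity) that is composited over the original input. The snippet below is a minimal NumPy sketch of that compositing step, assuming the standard alpha "over" operator, `alpha * edit_rgb + (1 - alpha) * image`, with all values in [0, 1]. The function and argument names are hypothetical and the exact formulation in the Text2LIVE code may differ.

```python
import numpy as np


def composite(edit_rgb: np.ndarray, alpha: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Alpha-composite an edit layer over the input image (assumed "over" operator).

    edit_rgb: (H, W, 3) edit-layer colors in [0, 1]   (hypothetical name)
    alpha:    (H, W) edit-layer opacity in [0, 1]      (hypothetical name)
    image:    (H, W, 3) original input in [0, 1]
    """
    a = alpha[..., None]  # add a channel axis so the opacity broadcasts over RGB
    return a * edit_rgb + (1.0 - a) * image


if __name__ == "__main__":
    h, w = 4, 4
    image = np.full((h, w, 3), 0.5)  # flat grey input
    edit_rgb = np.zeros((h, w, 3))
    edit_rgb[..., 0] = 1.0           # pure red edit color
    alpha = np.zeros((h, w))
    alpha[:2] = 1.0                  # opaque edit on the top half only

    out = composite(edit_rgb, alpha, image)
    print(out[0, 0], out[3, 3])      # edited pixel vs. untouched pixel
```

Wherever the opacity is 0 the output equals the input exactly, which is one way to see why operating on an edit layer, rather than generating the whole output, helps keep the result faithful to the original.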

## Getting Started
### Installation

```
git clone https://github.com/omerbt/Text2LIVE.git
cd Text2LIVE
conda create --name text2live python=3.9
conda activate text2live
pip install -r requirements.txt
```

### Download sample images and videos
Download the sample images and videos (the videos are taken from the DAVIS dataset). Run the following from the repository root; downloading requires `gdown` (`pip install gdown` if it is not already installed):
```
gdown "https://drive.google.com/uc?id=1osN4PlPkY9uk6pFqJZo8lhJUjTIpa80J&export=download"
unzip data.zip
```
This creates a `data` folder:
```
Text2LIVE
├── ...
├── data
│   ├── pretrained_nla_models # NLA models are stored here
│   ├── images # sample images
│   └── videos # sample videos from DAVIS dataset
│         ├── car-turn # contains video frames
│         ├── ...
└── ...
```
To enforce temporal consistency in video edits, we use Neural Layered Atlases (NLA). The pretrained NLA models are taken from the <a href="https://layered-neural-atlases.github.io">Layered Neural Atlases project page</a> and are already included in the `data` folder.
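
Conceptually, an atlas lets an edit be defined once and reused across frames: every frame pixel has a coordinate in the atlas, so a given point on an object receives the same edit color and opacity in every frame where it is visible. The NumPy sketch below illustrates only this lookup step. It assumes the per-frame UV coordinates are already available as an array in [0, 1] (in NLA they are predicted by a mapping network) and uses nearest-neighbor sampling for brevity, where a real implementation would more likely interpolate bilinearly. `sample_atlas` is a hypothetical helper, not part of the Text2LIVE code.

```python
import numpy as np


def sample_atlas(atlas: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Look up atlas values at per-pixel UV coordinates (nearest neighbor).

    atlas: (Ha, Wa, C) layer defined once in atlas space (hypothetical layout)
    uv:    (H, W, 2) per-pixel (u, v) atlas coordinates in [0, 1] for one frame
    returns (H, W, C) atlas values mapped into that frame
    """
    ha, wa = atlas.shape[:2]
    # Convert normalized coordinates to integer indices (u -> column, v -> row).
    cols = np.clip(np.rint(uv[..., 0] * (wa - 1)).astype(int), 0, wa - 1)
    rows = np.clip(np.rint(uv[..., 1] * (ha - 1)).astype(int), 0, ha - 1)
    return atlas[rows, cols]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    atlas_edit = rng.random((8, 8, 4))  # RGBA edit layer in atlas space

    # Two frames in which the same atlas point appears at different pixels
    # (e.g. the object moved one pixel to the right between frames).
    uv_frame0 = np.zeros((2, 3, 2))
    uv_frame1 = np.zeros((2, 3, 2))
    uv_frame0[0, 0] = (0.5, 0.5)
    uv_frame1[0, 1] = (0.5, 0.5)

    e0 = sample_atlas(atlas_edit, uv_frame0)
    e1 = sample_atlas(atlas_edit, uv_frame1)
    print(np.allclose(e0[0, 0], e1[0, 1]))  # same atlas point, same edit
```

The sampled RGBA values for each frame could then be alpha-composited over that frame, so consistency across frames comes from the shared atlas rather than from editing each frame independently.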

### Run examples
* Our method is designed to change the textures of existing objects or augment the scene with semi-transparent effects (e.g., smoke, fire). It is not designed for adding new objects or significantly deviating from the original spatial layout.
* Training **Text2LIVE** multiple times with the same inputs can lead to slightly different results.
* CLIP sometimes exhibits a bias towards specific solutions (see Figure 9 in the paper), so slightly different text prompts may lead to different flavors of edits.


The required GPU memory depends on the input image/video size, but a Tesla V100 with 32 GB should be sufficient.
Mixed-precision training is currently not supported because it introduces some instability in the training process; it may be added later.

#### Video Editing
Run the following command to start training:
```
python train_video.py --example_config car-turn_winter.yaml
```
#### Image Editing
Run the following command to start training:
```
python train_image.py --example_config golden_horse.yaml
```
Intermediate results are saved to `results` during optimization. How often they are saved is set by the `log_images_freq` field in the configuration.

## Sample Results
https://user-images.githubusercontent.com/22198039/179797381-983e0453-2e5d-40e8-983d-578217b358e4.mov

For more results, see the [supplementary material](https://text2live.github.io/sm/index.html).


## Citation
```
@inproceedings{bar2022text2live,
  title={Text2live: Text-driven layered image and video editing},
  author={Bar-Tal, Omer and Ofri-Amar, Dolev and Fridman, Rafail and Kasten, Yoni and Dekel, Tali},
  booktitle={European Conference on Computer Vision},
  pages={707--723},
  year={2022},
  organization={Springer}
}
```