{"id":20256575,"url":"https://github.com/interdigitalinc/latent-transformer","last_synced_at":"2025-09-17T23:33:50.609Z","repository":{"id":41383376,"uuid":"396966429","full_name":"InterDigitalInc/latent-transformer","owner":"InterDigitalInc","description":"Official implementation for paper: A Latent Transformer for Disentangled Face Editing in Images and Videos.","archived":false,"fork":false,"pushed_at":"2021-08-20T09:39:24.000Z","size":8551,"stargazers_count":146,"open_issues_count":2,"forks_count":22,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-24T21:11:11.934Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2106.11895","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InterDigitalInc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-08-16T20:46:55.000Z","updated_at":"2025-03-12T06:41:46.000Z","dependencies_parsed_at":"2022-09-20T23:10:43.272Z","dependency_job_id":null,"html_url":"https://github.com/InterDigitalInc/latent-transformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InterDigitalInc%2Flatent-transformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InterDigitalInc%2Flatent-transformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InterDigitalInc%2Flatent-transformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InterDigitalInc%2Flatent-transformer/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/InterDigitalInc","download_url":"https://codeload.github.com/InterDigitalInc/latent-transformer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248318928,"owners_count":21083751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T10:47:16.194Z","updated_at":"2025-09-17T23:33:45.494Z","avatar_url":"https://github.com/InterDigitalInc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## A Latent Transformer for Disentangled Face Editing in Images and Videos\r\n\r\nOfficial implementation for paper: A Latent Transformer for Disentangled Face Editing in Images and Videos. \r\n\r\n[[Video Editing Results]](https://drive.google.com/drive/folders/1aIfmbgJL1CUFgZQzqDVaUtLHrqxS6QjP?usp=sharing)\r\n\r\n## Requirements\r\n\r\n### Dependencies\r\n\r\n- Python 3.6\r\n- PyTorch 1.8\r\n- Opencv\r\n- Tensorboard_logger\r\n\r\nYou can install a new environment for this repo by running\r\n```\r\nconda env create -f environment.yml\r\nconda activate lattrans \r\n```\r\n\r\n### Prepare StyleGAN2 encoder and generator\r\n\r\n* We use the pretrained StyleGAN2 encoder and generator released from paper [Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation](https://arxiv.org/pdf/2008.00951.pdf). Download and save the [official implementation](https://github.com/eladrich/pixel2style2pixel.git) to `pixel2style2pixel/` directory. 
Download and save the [pretrained model](https://drive.google.com/file/d/1bMTNWkh5LArlaWSc_wa8VKyq2V42T2z0/view) to `pixel2style2pixel/pretrained_models/`.\r\n\r\n* To save the latent codes to the desired path, we slightly modify `pixel2style2pixel/scripts/inference.py`.\r\n\r\n    ```\r\n    # modify run_on_batch()\r\n    if opts.latent_mask is None:\r\n        result_batch = net(inputs, randomize_noise=False, resize=opts.resize_outputs, return_latents=True)\r\n        \r\n    # modify run()\r\n    tic = time.time()\r\n    result_batch, latent_batch = run_on_batch(input_cuda, net, opts) \r\n    latent_save_path = os.path.join(test_opts.exp_dir, 'latent_code_%05d.npy'%global_i)\r\n    np.save(latent_save_path, latent_batch.cpu().numpy())\r\n    toc = time.time()\r\n    ```\r\n\r\n\r\n## Training\r\n\r\n* Prepare the training data\r\n\r\n    To train the latent transformers, you can download [our prepared dataset](https://drive.google.com/drive/folders/1aXVc-q2ER7A9aACSwml5Wyw5ZgrgPq52?usp=sharing) to the directory `data/` and the [pretrained latent classifier](https://drive.google.com/file/d/1K_ShWBfTOCbxBcJfzti7vlYGmRbjXTfn/view?usp=sharing) to the directory `models/`. \r\n    ```\r\n    sh download.sh\r\n    ```\r\n\r\n    You can also prepare your own training data. To do so, you need to map your dataset to latent codes using the StyleGAN2 encoder. The corresponding label file is also required. You can continue to use our pretrained latent classifier. If you want to train your own latent classifier on new labels, you can use `pretraining/latent_classifier.py`. \r\n\r\n* Training\r\n\r\n    You can modify the training options of the config file in the directory `configs/`.\r\n    ```\r\n    python train.py --config 001 \r\n    ```\r\n\r\n## Testing \r\n\r\n### Single Attribute Manipulation\r\n\r\nMake sure that the latent classifier is downloaded to the directory `models/` and the StyleGAN2 encoder is prepared as required. 
After training your latent transformers, you can use `test.py` to run the latent transformer on the images in the test directory `data/test/`. We also provide several pretrained models [here](https://drive.google.com/file/d/14uipafI5mena7LFFtvPh6r5HdzjBqFEt/view?usp=sharing) (run ```download.sh``` to download them). The output images will be saved in the folder `outputs/`. You can change the desired attribute with `--attr`.\r\n\r\n```\r\npython test.py --config 001 --attr Eyeglasses --out_path ./outputs/\r\n```\r\nIf you want to test the model on your custom images, you first need to encode the images into the latent space of StyleGAN using the pretrained encoder.\r\n```\r\ncd pixel2style2pixel/\r\npython scripts/inference.py \\\r\n--checkpoint_path=pretrained_models/psp_ffhq_encode.pt \\\r\n--data_path=../data/test/ \\\r\n--exp_dir=../data/test/ \\\r\n--test_batch_size=1\r\n```\r\n\r\n### Sequential Attribute Manipulation\r\n\r\nYou can reproduce the sequential editing results in the paper using `notebooks/figure_sequential_edit.ipynb` and the results in the supplementary material using `notebooks/figure_supplementary.ipynb`.\r\n\r\n![User Interface](./image/user_interface.jpg)\r\n\r\nWe also provide an interactive visualization `notebooks/visu_manipulation.ipynb`, where the user can choose the desired attributes for manipulation and define the magnitude of edit for each attribute.  \r\n\r\n\r\n## Video Manipulation\r\n\r\n![Video Result](./image/video_result.jpg)\r\n\r\nWe provide a script to achieve attribute manipulation for the videos in the test directory `data/video/`. Please ensure that the StyleGAN2 encoder is prepared as required. You can upload your own video and modify the options in `run_video_manip.sh`. 
You can view our [video editing results](https://drive.google.com/drive/folders/1aIfmbgJL1CUFgZQzqDVaUtLHrqxS6QjP?usp=sharing) presented in the paper.\r\n\r\n```\r\nsh run_video_manip.sh\r\n```\r\n\r\n## Citation\r\n```\r\n@article{yao2021latent,\r\n  title={A Latent Transformer for Disentangled Face Editing in Images and Videos},\r\n  author={Yao, Xu and Newson, Alasdair and Gousseau, Yann and Hellier, Pierre},\r\n  journal={2021 International Conference on Computer Vision},\r\n  year={2021}\r\n}\r\n```\r\n## License\r\n\r\nCopyright © 2021, InterDigital R\u0026D France. All rights reserved.\r\n\r\nThis source code is made available under the license found in the LICENSE.txt in the root directory of this source tree.\r\n\r\n\r\n\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finterdigitalinc%2Flatent-transformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finterdigitalinc%2Flatent-transformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finterdigitalinc%2Flatent-transformer/lists"}