{"id":13411493,"url":"https://github.com/eladrich/pixel2style2pixel","last_synced_at":"2025-05-15T03:05:29.837Z","repository":{"id":37678850,"uuid":"300247371","full_name":"eladrich/pixel2style2pixel","owner":"eladrich","description":"Official Implementation for \"Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation\" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework","archived":false,"fork":false,"pushed_at":"2022-10-01T11:23:39.000Z","size":95170,"stargazers_count":3243,"open_issues_count":11,"forks_count":575,"subscribers_count":62,"default_branch":"master","last_synced_at":"2025-05-15T03:05:28.326Z","etag":null,"topics":["cvpr2021","generative-adversarial-network","image-translation","pixel2style2pixel","psp-framework","psp-model","stylegan","stylegan-encoder"],"latest_commit_sha":null,"homepage":"https://eladrich.github.io/pixel2style2pixel/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eladrich.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-01T11:01:03.000Z","updated_at":"2025-05-14T01:33:44.000Z","dependencies_parsed_at":"2022-07-14T07:20:40.065Z","dependency_job_id":null,"html_url":"https://github.com/eladrich/pixel2style2pixel","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladrich%2Fpixel2style2pixel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladrich%2Fpixel2style2pixel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladrich%2Fpixel2style2pixel/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladrich%2Fpixel2style2pixel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eladrich","download_url":"https://codeload.github.com/eladrich/pixel2style2pixel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254264765,"owners_count":22041793,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cvpr2021","generative-adversarial-network","image-translation","pixel2style2pixel","psp-framework","psp-model","stylegan","stylegan-encoder"],"created_at":"2024-07-30T20:01:14.047Z","updated_at":"2025-05-15T03:05:29.820Z","avatar_url":"https://github.com/eladrich.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation\n\u003ca href=\"https://arxiv.org/abs/2008.00951\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2008.00951-b31b1b.svg\" height=22.5\u003e\u003c/a\u003e\n\u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" height=22.5\u003e\u003c/a\u003e  \n\n\u003ca href=\"https://www.youtube.com/watch?v=bfvSwhqsTgM\"\u003e\u003cimg src=\"https://img.shields.io/static/v1?label=CVPR 2021\u0026message=5 Minute Video\u0026color=red\" height=22.5\u003e\u003c/a\u003e  \n\u003ca href=\"https://replicate.ai/eladrich/pixel2style2pixel\"\u003e\u003cimg 
src=\"https://img.shields.io/static/v1?label=Replicate\u0026message=Demo and Docker Image\u0026color=darkgreen\" height=22.5\u003e\u003c/a\u003e\n\n\u003ca href=\"http://colab.research.google.com/github/eladrich/pixel2style2pixel/blob/master/notebooks/inference_playground.ipynb\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=22.5\u003e\u003c/a\u003e  \n\n\u003e We present a generic image-to-image translation framework, pixel2style2pixel (pSp). \nOur pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, \nforming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization.\nNext, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the \nlatent domain. By deviating from the standard \"invert first, edit later\" methodology used with previous StyleGAN encoders, our approach can handle a variety of \ntasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support \n\u003efor solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. \nFinally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain. \n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/teaser.png\" width=\"800px\"/\u003e\n\u003cbr\u003e\nThe proposed pixel2style2pixel framework can be used to solve a wide variety of image-to-image translation tasks. 
Here we show results of pSp on StyleGAN inversion, multi-modal conditional image synthesis, facial frontalization, inpainting and super-resolution.\n\u003c/p\u003e\n\n## Description   \nOfficial Implementation of our pSp paper for both training and evaluation. The pSp method extends the StyleGAN model to \nallow solving different image-to-image translation problems using its encoder.\n\n## Table of Contents\n  * [Description](#description)\n  * [Table of Contents](#table-of-contents)\n  * [Recent Updates](#recent-updates)\n  * [Applications](#applications)\n    + [StyleGAN Encoding](#stylegan-encoding)\n    + [Face Frontalization](#face-frontalization)\n    + [Conditional Image Synthesis](#conditional-image-synthesis)\n    + [Super Resolution](#super-resolution)\n  * [Getting Started](#getting-started)\n    + [Prerequisites](#prerequisites)\n    + [Installation](#installation)\n    + [Inference Notebook](#inference-notebook)\n    + [Pretrained Models](#pretrained-models)\n  * [Training](#training)\n    + [Preparing your Data](#preparing-your-data)\n    + [Training pSp](#training-psp)\n      - [Training the pSp Encoder](#training-the-psp-encoder)\n      - [Frontalization](#frontalization)\n      - [Sketch to Face](#sketch-to-face)\n      - [Segmentation Map to Face](#segmentation-map-to-face)\n      - [Super Resolution](#super-resolution-1)\n    + [Additional Notes](#additional-notes)\n    + [Weights \u0026 Biases Integration](#weights--biases-integration)\n  * [Testing](#testing)\n    + [Inference](#inference)\n    + [Multi-Modal Synthesis with Style-Mixing](#multi-modal-synthesis-with-style-mixing)\n    + [Computing Metrics](#computing-metrics)\n  * [Additional Applications](#additional-applications)\n    + [Toonify](#toonify)\n  * [Repository structure](#repository-structure)\n  * [TODOs](#todos)\n  * [Credits](#credits)\n  * [Inspired by pSp](#inspired-by-psp)\n  * [pSp in the Media](#psp-in-the-media)\n  * [Citation](#citation)\n  \n## Recent 
Updates\n**`2020.10.04`**: Initial code release  \n**`2020.10.06`**: Added pSp toonify model (Thanks to the great work from [Doron Adler](https://linktr.ee/Norod78) and [Justin Pinkney](https://www.justinpinkney.com/))!  \n**`2021.04.23`**: Added several new features: \n  - Added support for StyleGANs of different resolutions (e.g., 256, 512, 1024). This can be set using the flag `--output_size`, which is set to 1024 by default. \n  - Added support for the MoCo-based similarity loss introduced in [encoder4editing (Tov et al. 2021)](https://github.com/omertov/encoder4editing). More details are provided [below](https://github.com/eladrich/pixel2style2pixel#training-psp).  \n  \n**`2021.07.06`**: Added support for training with Weights \u0026 Biases. [See below for details](https://github.com/eladrich/pixel2style2pixel#weights--biases-integration).\n\n## Applications\n### StyleGAN Encoding\nHere, we use pSp to find the latent code of real images in the latent domain of a pretrained StyleGAN generator. \n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/encoding_inputs.jpg\" width=\"800px\"/\u003e\n\u003cimg src=\"docs/encoding_outputs.jpg\" width=\"800px\"/\u003e\n\u003c/p\u003e\n\n\n### Face Frontalization\nIn this application we want to generate a front-facing face from a given input image. \n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/frontalization_inputs.jpg\" width=\"800px\"/\u003e\n\u003cimg src=\"docs/frontalization_outputs.jpg\" width=\"800px\"/\u003e\n\u003c/p\u003e\n\n### Conditional Image Synthesis\nHere we wish to generate photo-realistic face images from ambiguous sketch images or segmentation maps. 
Using style-mixing, we inherently support multi-modal synthesis for a single input.\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/seg2image.png\" width=\"800px\"/\u003e\n\u003cimg src=\"docs/sketch2image.png\" width=\"800px\"/\u003e\n\u003c/p\u003e\n\n### Super Resolution\nGiven a low-resolution input image, we generate a corresponding high-resolution image. As this too is an ambiguous task, we can use style-mixing to produce several plausible results.\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/super_res_32.jpg\" width=\"800px\"/\u003e\n\u003cimg src=\"docs/super_res_style_mixing.jpg\" width=\"800px\"/\u003e\n\u003c/p\u003e\n\n\n## Getting Started\n### Prerequisites\n- Linux or macOS\n- NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)\n- Python 2 or 3\n\n### Installation\n- Clone this repo:\n``` \ngit clone https://github.com/eladrich/pixel2style2pixel.git\ncd pixel2style2pixel\n```\n- Dependencies:  \nWe recommend running this repository using [Anaconda](https://docs.anaconda.com/anaconda/install/). \nAll dependencies for defining the environment are provided in `environment/psp_env.yaml`.\n\n### Inference Notebook\nTo help visualize the pSp framework on multiple tasks and to help you get started, we provide a Jupyter notebook found in `notebooks/inference_playground.ipynb` that allows one to visualize the various applications of pSp.   \nThe notebook will download the necessary pretrained models and run inference on the images found in `notebooks/images`.  \nFor the tasks of conditional image synthesis and super resolution, the notebook also demonstrates pSp's ability to perform multi-modal synthesis using \nstyle-mixing. \n\n### Pretrained Models\nPlease download the pre-trained models from the following links. 
Each pSp model contains the entire pSp architecture, including the encoder and decoder weights.\n| Path | Description\n| :--- | :----------\n|[StyleGAN Inversion](https://drive.google.com/file/d/1bMTNWkh5LArlaWSc_wa8VKyq2V42T2z0/view?usp=sharing)  | pSp trained with the FFHQ dataset for StyleGAN inversion.\n|[Face Frontalization](https://drive.google.com/file/d/1_S4THAzXb-97DbpXmanjHtXRyKxqjARv/view?usp=sharing)  | pSp trained with the FFHQ dataset for face frontalization.\n|[Sketch to Image](https://drive.google.com/file/d/1lB7wk7MwtdxL-LL4Z_T76DuCfk00aSXA/view?usp=sharing)  | pSp trained with the CelebA-HQ dataset for image synthesis from sketches.\n|[Segmentation to Image](https://drive.google.com/file/d/1VpEKc6E6yG3xhYuZ0cq8D2_1CbT0Dstz/view?usp=sharing) | pSp trained with the CelebAMask-HQ dataset for image synthesis from segmentation maps.\n|[Super Resolution](https://drive.google.com/file/d/1ZpmSXBpJ9pFEov6-jjQstAlfYbkebECu/view?usp=sharing)  | pSp trained with the CelebA-HQ dataset for super resolution (up to x32 down-sampling).\n|[Toonify](https://drive.google.com/file/d/1YKoiVuFaqdvzDP5CZaqa3k5phL-VDmyz/view)  | pSp trained with the FFHQ dataset for toonification using StyleGAN generator from [Doron Adler](https://linktr.ee/Norod78) and [Justin Pinkney](https://www.justinpinkney.com/).\n\nIf you wish to use one of the pretrained models for training or inference, you may do so using the flag `--checkpoint_path`.\n\nIn addition, we provide various auxiliary models needed for training your own pSp model from scratch as well as pretrained models needed for computing the ID metrics reported in the paper.\n| Path | Description\n| :--- | :----------\n|[FFHQ StyleGAN](https://drive.google.com/file/d/1EM87UquaoQmk17Q8d5kYIAHqu0dkYqdT/view?usp=sharing) | StyleGAN model pretrained on FFHQ taken from [rosinality](https://github.com/rosinality/stylegan2-pytorch) with 1024x1024 output resolution.\n|[IR-SE50 
Model](https://drive.google.com/file/d/1KW7bjndL3QG3sxBbZxreGHigcCCpsDgn/view?usp=sharing) | Pretrained IR-SE50 model taken from [TreB1eN](https://github.com/TreB1eN/InsightFace_Pytorch) for use in our ID loss during pSp training.\n|[MoCo ResNet-50](https://drive.google.com/file/d/18rLcNGdteX5LwT7sv_F7HWr12HpVEzVe/view?usp=sharing)  | Pretrained ResNet-50 model trained using MOCOv2 for computing MoCo-based similarity loss on non-facial domains. The model is taken from the [official implementation](https://github.com/facebookresearch/moco).\n|[CurricularFace Backbone](https://drive.google.com/file/d/1f4IwVa2-Bn9vWLwB-bUwm53U_MlvinAj/view?usp=sharing)  | Pretrained CurricularFace model taken from [HuangYG123](https://github.com/HuangYG123/CurricularFace) for use in ID similarity metric computation.\n|[MTCNN](https://drive.google.com/file/d/1tJ7ih-wbCO6zc3JhI_1ZGjmwXKKaPlja/view?usp=sharing)  | Weights for MTCNN model taken from [TreB1eN](https://github.com/TreB1eN/InsightFace_Pytorch) for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.)\n\nBy default, we assume that all auxiliary models are downloaded and saved to the directory `pretrained_models`. However, you may use your own paths by changing the necessary values in `configs/paths_config.py`. \n\n## Training\n### Preparing your Data\n- Currently, we provide support for numerous datasets and experiments (encoding, frontalization, etc.).\n    - Refer to `configs/paths_config.py` to define the necessary data paths and model paths for training and evaluation. \n    - Refer to `configs/transforms_config.py` for the transforms defined for each dataset/experiment. \n    - Finally, refer to `configs/data_configs.py` for the source/target data paths for the train and test sets\n      as well as the transforms.\n- If you wish to experiment with your own dataset, you can simply make the necessary adjustments in \n    1. `data_configs.py` to define your data paths.\n    2. 
`transforms_config.py` to define your own data transforms.\n    \nAs an example, assume we wish to run encoding using FFHQ (`dataset_type=ffhq_encode`). \nWe first go to `configs/paths_config.py` and define:\n``` \ndataset_paths = {\n    'ffhq': '/path/to/ffhq/images256x256',\n    'celeba_test': '/path/to/CelebAMask-HQ/test_img',\n}\n```\nThe transforms for the experiment are defined in the class `EncodeTransforms` in `configs/transforms_config.py`.   \nFinally, in `configs/data_configs.py`, we define:\n``` \nDATASETS = {\n   'ffhq_encode': {\n        'transforms': transforms_config.EncodeTransforms,\n        'train_source_root': dataset_paths['ffhq'],\n        'train_target_root': dataset_paths['ffhq'],\n        'test_source_root': dataset_paths['celeba_test'],\n        'test_target_root': dataset_paths['celeba_test'],\n    },\n}\n``` \nWhen defining our datasets, we will take the values in the above dictionary.\n\n\n### Training pSp\nThe main training script can be found in `scripts/train.py`.   \nIntermediate training results are saved to `opts.exp_dir`. This includes checkpoints, train outputs, and test outputs.  
\nAdditionally, if you have tensorboard installed, you can visualize tensorboard logs in `opts.exp_dir/logs`.\n\n#### Training the pSp Encoder\n```\npython scripts/train.py \\\n--dataset_type=ffhq_encode \\\n--exp_dir=/path/to/experiment \\\n--workers=8 \\\n--batch_size=8 \\\n--test_batch_size=8 \\\n--test_workers=8 \\\n--val_interval=2500 \\\n--save_interval=5000 \\\n--encoder_type=GradualStyleEncoder \\\n--start_from_latent_avg \\\n--lpips_lambda=0.8 \\\n--l2_lambda=1 \\\n--id_lambda=0.1\n```\n\n#### Frontalization\n```\npython scripts/train.py \\\n--dataset_type=ffhq_frontalize \\\n--exp_dir=/path/to/experiment \\\n--workers=8 \\\n--batch_size=8 \\\n--test_batch_size=8 \\\n--test_workers=8 \\\n--val_interval=2500 \\\n--save_interval=5000 \\\n--encoder_type=GradualStyleEncoder \\\n--start_from_latent_avg \\\n--lpips_lambda=0.08 \\\n--l2_lambda=0.001 \\\n--lpips_lambda_crop=0.8 \\\n--l2_lambda_crop=0.01 \\\n--id_lambda=1 \\\n--w_norm_lambda=0.005\n```\n\n#### Sketch to Face\n```\npython scripts/train.py \\\n--dataset_type=celebs_sketch_to_face \\\n--exp_dir=/path/to/experiment \\\n--workers=8 \\\n--batch_size=8 \\\n--test_batch_size=8 \\\n--test_workers=8 \\\n--val_interval=2500 \\\n--save_interval=5000 \\\n--encoder_type=GradualStyleEncoder \\\n--start_from_latent_avg \\\n--lpips_lambda=0.8 \\\n--l2_lambda=1 \\\n--id_lambda=0 \\\n--w_norm_lambda=0.005 \\\n--label_nc=1 \\\n--input_nc=1\n```\n\n#### Segmentation Map to Face\n```\npython scripts/train.py \\\n--dataset_type=celebs_seg_to_face \\\n--exp_dir=/path/to/experiment \\\n--workers=8 \\\n--batch_size=8 \\\n--test_batch_size=8 \\\n--test_workers=8 \\\n--val_interval=2500 \\\n--save_interval=5000 \\\n--encoder_type=GradualStyleEncoder \\\n--start_from_latent_avg \\\n--lpips_lambda=0.8 \\\n--l2_lambda=1 \\\n--id_lambda=0 \\\n--w_norm_lambda=0.005 \\\n--label_nc=19 \\\n--input_nc=19\n```\nNotice with conditional image synthesis no identity loss is utilized (i.e. 
`--id_lambda=0`)\n\n#### Super Resolution\n``` \npython scripts/train.py \\\n--dataset_type=celebs_super_resolution \\\n--exp_dir=/path/to/experiment \\\n--workers=8 \\\n--batch_size=8 \\\n--test_batch_size=8 \\\n--test_workers=8 \\\n--val_interval=2500 \\\n--save_interval=5000 \\\n--encoder_type=GradualStyleEncoder \\\n--start_from_latent_avg \\\n--lpips_lambda=0.8 \\\n--l2_lambda=1 \\\n--id_lambda=0.1 \\\n--w_norm_lambda=0.005 \\\n--resize_factors=1,2,4,8,16,32\n```\n\n### Additional Notes\n- See `options/train_options.py` for all training-specific flags. \n- See `options/test_options.py` for all test-specific flags.\n- If you wish to resume from a specific checkpoint (e.g. a pretrained pSp model), you may do so using `--checkpoint_path`.\n- By default, we assume that the StyleGAN used outputs images at resolution `1024x1024`. If you wish to use a StyleGAN at a smaller resolution, you can do so by using the flag `--output_size` (e.g., `--output_size=256`). \n- If you wish to generate images from segmentation maps, please specify `--label_nc=N` and `--input_nc=N` where `N` \nis the number of semantic categories. \n- Similarly, for generating images from sketches, please specify `--label_nc=1` and `--input_nc=1`.\n- Specifying `--label_nc=0` (the default value) will directly use the RGB colors as input.\n\n**Identity/Similarity Losses**   \nIn pSp, we introduce a facial identity loss using a pre-trained ArcFace network for facial recognition. When operating on the human facial domain, we \nhighly recommend employing this loss objective by using the flag `--id_lambda`.  \nIn a more recent paper, [encoder4editing](https://github.com/omertov/encoder4editing), the authors generalize this identity loss to other domains by \nusing a MoCo-based ResNet to extract features instead of an ArcFace network.\nApplying this MoCo-based similarity loss can be done by using the flag `--moco_lambda`. We recommend setting `--moco_lambda=0.5` in your experiments.  
\nPlease note, you \u003cins\u003ecannot\u003c/ins\u003e set both `id_lambda` and `moco_lambda` to be active simultaneously (e.g., to use the MoCo-based loss, you should specify \n`--moco_lambda=0.5 --id_lambda=0`).\n\n### Weights \u0026 Biases Integration\nTo help track your experiments, we've integrated [Weights \u0026 Biases](https://wandb.ai/home) into our training process. \nTo enable Weights \u0026 Biases (`wandb`), first make an account on the platform's webpage and install `wandb` using \n`pip install wandb`. Then, to train pSp using `wandb`, simply add the flag `--use_wandb`. \n\nNote that when running for the first time, you will be asked to provide your access key, which can be found on the\nWeights \u0026 Biases platform. \n\nUsing Weights \u0026 Biases will allow you to visualize the training and testing loss curves as well as \nintermediate training results.\n\n\n## Testing\n### Inference\nHaving trained your model, you can use `scripts/inference.py` to apply the model to a set of images.   \nFor example, \n```\npython scripts/inference.py \\\n--exp_dir=/path/to/experiment \\\n--checkpoint_path=experiment/checkpoints/best_model.pt \\\n--data_path=/path/to/test_data \\\n--test_batch_size=4 \\\n--test_workers=4 \\\n--couple_outputs\n```\nAdditional notes to consider: \n- During inference, the options used during training are loaded from the saved checkpoint and are then updated using the \ntest options passed to the inference script. For example, there is no need to pass `--dataset_type` or `--label_nc` to the \n inference script, as they are taken from the loaded `opts`.\n- When running inference for segmentation-to-image or sketch-to-image, it is highly recommended to do so with style-mixing,\nas is done in the paper. 
This can simply be done by adding `--latent_mask=8,9,10,11,12,13,14,15,16,17` when calling the \nscript.\n- When running inference for super-resolution, please provide a single down-sampling value using `--resize_factors`.\n- Adding the flag `--couple_outputs` will save an additional image containing the input and output images side-by-side in the sub-directory\n`inference_coupled`. Otherwise, only the output image is saved to the sub-directory `inference_results`.\n- By default, the images will be saved at a resolution of 1024x1024, the original output size of StyleGAN. If you wish to save \noutputs resized to a resolution of 256x256, you can do so by adding the flag `--resize_outputs`.\n\n\n### Multi-Modal Synthesis with Style-Mixing\nGiven a trained model for conditional image synthesis or super-resolution, we can easily generate multiple outputs \nfor a given input image. This can be done using the script `scripts/style_mixing.py`.    \nFor example, running the following command will perform style-mixing for a segmentation-to-image experiment:\n```\npython scripts/style_mixing.py \\\n--exp_dir=/path/to/experiment \\\n--checkpoint_path=/path/to/experiment/checkpoints/best_model.pt \\\n--data_path=/path/to/test_data/ \\\n--test_batch_size=4 \\\n--test_workers=4 \\\n--n_images=25 \\\n--n_outputs_to_generate=5 \\\n--latent_mask=8,9,10,11,12,13,14,15,16,17\n``` \nHere, we inject `5` randomly drawn vectors and perform style-mixing on the latents `[8,9,10,11,12,13,14,15,16,17]`.  \n\nAdditional notes to consider: \n- To perform style-mixing on a subset of images, you may use the flag `--n_images`. The default value of `None` will perform \nstyle mixing on every image in the given `data_path`. 
\n- You may also include the argument `--mix_alpha=m` where `m` is a float defining the mixing coefficient between the \ninput latent and the randomly drawn latent.\n- When performing style-mixing for super-resolution, please provide a single down-sampling value using `--resize_factors`.\n- By default, the images will be saved at a resolution of 1024x1024, the original output size of StyleGAN. If you wish to save \noutputs resized to a resolution of 256x256, you can do so by adding the flag `--resize_outputs`.\n\n\n### Computing Metrics\nSimilarly, given a trained model and generated outputs, we can compute the loss metrics on a given dataset.  \nThese scripts receive the inference output directory and ground truth directory.\n- Calculating the identity loss: \n```\npython scripts/calc_id_loss_parallel.py \\\n--data_path=/path/to/experiment/inference_outputs \\\n--gt_path=/path/to/test_images\n```\n- Calculating LPIPS loss:\n```\npython scripts/calc_losses_on_images.py \\\n--mode lpips \\\n--data_path=/path/to/experiment/inference_outputs \\\n--gt_path=/path/to/test_images\n```\n- Calculating L2 loss:\n```\npython scripts/calc_losses_on_images.py \\\n--mode l2 \\\n--data_path=/path/to/experiment/inference_outputs \\\n--gt_path=/path/to/test_images\n```\n\n## Additional Applications\nTo better show the flexibility of our pSp framework we present additional applications below.\n\nAs with our main applications, you may download the pretrained models here: \n| Path | Description\n| :--- | :----------\n|[Toonify](https://drive.google.com/file/d/1YKoiVuFaqdvzDP5CZaqa3k5phL-VDmyz/view)  | pSp trained with the FFHQ dataset for toonification using StyleGAN generator from [Doron Adler](https://linktr.ee/Norod78) and [Justin Pinkney](https://www.justinpinkney.com/).\n\n### Toonify\nUsing the toonify StyleGAN built by [Doron Adler](https://linktr.ee/Norod78) and [Justin Pinkney](https://www.justinpinkney.com/),\nwe take a real face image and generate a toonified version of 
the given image. We train the pSp encoder to directly reconstruct real \nface images inside the toon latent space, resulting in a projection of each image to the closest toon. We do so without requiring any labeled pairs\nor distillation!\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/toonify_input.jpg\" width=\"800px\"/\u003e\n\u003cimg src=\"docs/toonify_output.jpg\" width=\"800px\"/\u003e\n\u003c/p\u003e\n\nThis is trained in the same way as the StyleGAN inversion task, with the following changes:   \n- Change from FFHQ StyleGAN to toonified StyleGAN (can be set using `--stylegan_weights`)\n    - The toonify generator is taken from [Doron Adler](https://linktr.ee/Norod78) and [Justin Pinkney](https://www.justinpinkney.com/) \n      and converted to PyTorch using [rosinality's](https://github.com/rosinality/stylegan2-pytorch) conversion script.\n    - For convenience, the converted generator PyTorch model may be downloaded [here](https://drive.google.com/file/d/1r3XVCt_WYUKFZFxhNH-xO2dTtF6B5szu/view?usp=sharing).\n- Increase `id_lambda` from `0.1` to `1`  \n- Increase `w_norm_lambda` from `0.005` to `0.025`  \n\nWe obtain the best results after around `6000` iterations of training (can be set using `--max_steps`). \n\n\n## Repository structure\n| Path | Description \u003cimg width=200\u003e\n| :--- | :---\n| pixel2style2pixel | Repository root folder\n| \u0026boxvr;\u0026nbsp; configs | Folder containing configs defining model/data paths and data transforms\n| \u0026boxvr;\u0026nbsp; criteria | Folder containing various loss criteria for training\n| \u0026boxvr;\u0026nbsp; datasets | Folder with various dataset objects and augmentations\n| \u0026boxvr;\u0026nbsp; environment | Folder containing Anaconda environment used in our experiments\n| \u0026boxvr;\u0026nbsp; models | Folder containing all the models and training objects\n| \u0026boxv;\u0026nbsp; \u0026boxvr;\u0026nbsp; encoders | Folder containing our pSp encoder architecture implementation and ArcFace encoder 
implementation from [TreB1eN](https://github.com/TreB1eN/InsightFace_Pytorch)\n| \u0026boxv;\u0026nbsp; \u0026boxvr;\u0026nbsp; mtcnn | MTCNN implementation from [TreB1eN](https://github.com/TreB1eN/InsightFace_Pytorch)\n| \u0026boxv;\u0026nbsp; \u0026boxvr;\u0026nbsp; stylegan2 | StyleGAN2 model from [rosinality](https://github.com/rosinality/stylegan2-pytorch)\n| \u0026boxv;\u0026nbsp; \u0026boxur;\u0026nbsp; psp.py | Implementation of our pSp framework\n| \u0026boxvr;\u0026nbsp; notebooks | Folder with Jupyter notebook containing pSp inference playground\n| \u0026boxvr;\u0026nbsp; options | Folder with training and test command-line options\n| \u0026boxvr;\u0026nbsp; scripts | Folder with running scripts for training and inference\n| \u0026boxvr;\u0026nbsp; training | Folder with main training logic and Ranger implementation from [lessw2020](https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer)\n| \u0026boxvr;\u0026nbsp; utils | Folder with various utility functions\n| \u003cimg width=300\u003e | \u003cimg\u003e\n\n## TODOs\n- [ ] Add multi-GPU support\n\n## Credits\n**StyleGAN2 implementation:**  \nhttps://github.com/rosinality/stylegan2-pytorch  \nCopyright (c) 2019 Kim Seonghyeon  \nLicense (MIT) https://github.com/rosinality/stylegan2-pytorch/blob/master/LICENSE  \n\n**MTCNN, IR-SE50, and ArcFace models and implementations:**  \nhttps://github.com/TreB1eN/InsightFace_Pytorch  \nCopyright (c) 2018 TreB1eN  \nLicense (MIT) https://github.com/TreB1eN/InsightFace_Pytorch/blob/master/LICENSE  \n\n**CurricularFace model and implementation:**   \nhttps://github.com/HuangYG123/CurricularFace  \nCopyright (c) 2020 HuangYG123  \nLicense (MIT) https://github.com/HuangYG123/CurricularFace/blob/master/LICENSE  \n\n**Ranger optimizer implementation:**  \nhttps://github.com/lessw2020/Ranger-Deep-Learning-Optimizer   \nLicense (Apache License 2.0) https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/LICENSE  \n\n**LPIPS implementation:**  
\nhttps://github.com/S-aiueo32/lpips-pytorch  \nCopyright (c) 2020, Sou Uchida  \nLicense (BSD 2-Clause) https://github.com/S-aiueo32/lpips-pytorch/blob/master/LICENSE  \n\n**Please Note**: The CUDA files under the [StyleGAN2 ops directory](https://github.com/eladrich/pixel2style2pixel/tree/master/models/stylegan2/op) are made available under the [Nvidia Source Code License-NC](https://nvlabs.github.io/stylegan2/license.html)\n\n## Inspired by pSp\nBelow are several works inspired by pSp that we found particularly interesting:  \n\n**Reverse Toonification**  \nUsing our pSp encoder, artist [Nathan Shipley](https://linktr.ee/nathan_shipley) transformed animated figures and paintings into real life. Check out his amazing work on his [Twitter page](https://twitter.com/citizenplain?lang=en) and [website](http://www.nathanshipley.com/gan).   \n\n**Deploying pSp with StyleSpace for Editing**  \nAwesome work from [Justin Pinkney](https://www.justinpinkney.com/) who deployed our pSp model on Runway and provided support for editing the resulting inversions using the [StyleSpace Analysis paper](https://arxiv.org/abs/2011.12799). Check out his repository [here](https://github.com/justinpinkney/pixel2style2pixel).\n\n**Encoder4Editing (e4e)**   \nBuilding on the work of pSp, Tov et al. design an encoder to enable high quality edits on real images. Check out their [paper](https://arxiv.org/abs/2102.02766) and [code](https://github.com/omertov/encoder4editing).\n\n**Style-based Age Manipulation (SAM)**  \nLeveraging pSp and the rich semantics of StyleGAN, SAM learns non-linear latent space paths for modeling the age transformation of real face images. Check out the project page [here](https://yuval-alaluf.github.io/SAM/).\n\n**ReStyle**  \nReStyle builds on recent encoders such as pSp and e4e by introducing an iterative refinement mechanism to gradually improve the inversion of real images. 
Check out the project page [here](https://yuval-alaluf.github.io/restyle-encoder/).\n\n## pSp in the Media\n* bycloud: [AI Generates Cartoon Characters In Real Life Pixel2Style2Pixel](https://www.youtube.com/watch?v=g-N8lfceclI\u0026ab_channel=bycloud)\n* Synced: [Pixel2Style2Pixel: Novel Encoder Architecture Boosts Facial Image-To-Image Translation](https://syncedreview.com/2020/08/07/pixel2style2pixel-novel-encoder-architecture-boosts-facial-image-to-image-translation/)\n* Cartoon Brew: [An Artist Has Used Machine Learning To Turn Animated Characters Into Creepy Photorealistic Figures](https://www.cartoonbrew.com/tech/an-artist-has-used-machine-learning-to-turn-animated-characters-into-creepy-photorealistic-figures-197975.html)\n\n\n## Citation\nIf you use this code for your research, please cite our paper \u003ca href=\"https://arxiv.org/abs/2008.00951\"\u003eEncoding in Style: a StyleGAN Encoder for Image-to-Image Translation\u003c/a\u003e:\n\n```\n@InProceedings{richardson2021encoding,\n      author = {Richardson, Elad and Alaluf, Yuval and Patashnik, Or and Nitzan, Yotam and Azar, Yaniv and Shapiro, Stav and Cohen-Or, Daniel},\n      title = {Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation},\n      booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n      month = {June},\n      year = {2021}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feladrich%2Fpixel2style2pixel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feladrich%2Fpixel2style2pixel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feladrich%2Fpixel2style2pixel/lists"}