{"id":15025251,"url":"https://github.com/nvidia/pix2pixhd","last_synced_at":"2025-05-15T01:04:08.113Z","repository":{"id":37579998,"uuid":"112777740","full_name":"NVIDIA/pix2pixHD","owner":"NVIDIA","description":"Synthesizing and manipulating 2048x1024 images with conditional GANs","archived":false,"fork":false,"pushed_at":"2024-11-04T18:13:20.000Z","size":57017,"stargazers_count":6764,"open_issues_count":247,"forks_count":1410,"subscribers_count":167,"default_branch":"master","last_synced_at":"2025-04-22T20:09:21.029Z","etag":null,"topics":["computer-graphics","computer-vision","deep-learning","deep-neural-networks","gan","generative-adversarial-network","image-to-image-translation","pix2pix","pytorch"],"latest_commit_sha":null,"homepage":"https://tcwang0509.github.io/pix2pixHD/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVIDIA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-01T19:19:14.000Z","updated_at":"2025-04-22T09:05:56.000Z","dependencies_parsed_at":"2022-07-09T11:00:27.065Z","dependency_job_id":"7396a297-c8de-4304-836c-35fe2ccb8800","html_url":"https://github.com/NVIDIA/pix2pixHD","commit_stats":{"total_commits":31,"total_committers":9,"mean_commits":"3.4444444444444446","dds":0.4193548387096774,"last_synced_commit":"5a2c87201c5957e2bf51d79b8acddb9cc1920b26"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fpix2pixHD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/NVIDIA%2Fpix2pixHD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fpix2pixHD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fpix2pixHD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVIDIA","download_url":"https://codeload.github.com/NVIDIA/pix2pixHD/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252882767,"owners_count":21819152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-graphics","computer-vision","deep-learning","deep-neural-networks","gan","generative-adversarial-network","image-to-image-translation","pix2pix","pytorch"],"created_at":"2024-09-24T20:01:53.997Z","updated_at":"2025-05-07T12:48:42.022Z","avatar_url":"https://github.com/NVIDIA.png","language":"Python","readme":"\u003cimg src='imgs/teaser_720.gif' align=\"right\" width=360\u003e\r\n\r\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\r\n\r\n# pix2pixHD\r\n### [Project](https://tcwang0509.github.io/pix2pixHD/) | [YouTube](https://youtu.be/3AIpPlzM_qs) | [Paper](https://arxiv.org/pdf/1711.11585.pdf) \u003cbr\u003e\r\nPyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic image-to-image translation. It can be used for turning semantic label maps into photorealistic images or synthesizing portraits from face label maps. 
\u003cbr\u003e\u003cbr\u003e\r\n[High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs](https://tcwang0509.github.io/pix2pixHD/)  \r\n [Ting-Chun Wang](https://tcwang0509.github.io/)\u003csup\u003e1\u003c/sup\u003e, [Ming-Yu Liu](http://mingyuliu.net/)\u003csup\u003e1\u003c/sup\u003e, [Jun-Yan Zhu](http://people.eecs.berkeley.edu/~junyanz/)\u003csup\u003e2\u003c/sup\u003e, Andrew Tao\u003csup\u003e1\u003c/sup\u003e, [Jan Kautz](http://jankautz.com/)\u003csup\u003e1\u003c/sup\u003e, [Bryan Catanzaro](http://catanzaro.name/)\u003csup\u003e1\u003c/sup\u003e  \r\n \u003csup\u003e1\u003c/sup\u003eNVIDIA Corporation, \u003csup\u003e2\u003c/sup\u003eUC Berkeley  \r\n In CVPR 2018.  \r\n\r\n## Image-to-image translation at 2k/1k resolution\r\n- Our label-to-streetview results\r\n\u003cp align='center'\u003e  \r\n  \u003cimg src='imgs/teaser_label.png' width='400'/\u003e\r\n  \u003cimg src='imgs/teaser_ours.jpg' width='400'/\u003e\r\n\u003c/p\u003e\r\n- Interactive editing results\r\n\u003cp align='center'\u003e  \r\n  \u003cimg src='imgs/teaser_style.gif' width='400'/\u003e\r\n  \u003cimg src='imgs/teaser_label.gif' width='400'/\u003e\r\n\u003c/p\u003e\r\n- Additional streetview results\r\n\u003cp align='center'\u003e\r\n  \u003cimg src='imgs/cityscapes_1.jpg' width='400'/\u003e\r\n  \u003cimg src='imgs/cityscapes_2.jpg' width='400'/\u003e\r\n\u003c/p\u003e\r\n\u003cp align='center'\u003e\r\n  \u003cimg src='imgs/cityscapes_3.jpg' width='400'/\u003e\r\n  \u003cimg src='imgs/cityscapes_4.jpg' width='400'/\u003e\r\n\u003c/p\u003e\r\n\r\n- Label-to-face and interactive editing results\r\n\u003cp align='center'\u003e\r\n  \u003cimg src='imgs/face1_1.jpg' width='250'/\u003e\r\n  \u003cimg src='imgs/face1_2.jpg' width='250'/\u003e\r\n  \u003cimg src='imgs/face1_3.jpg' width='250'/\u003e\r\n\u003c/p\u003e\r\n\u003cp align='center'\u003e\r\n  \u003cimg src='imgs/face2_1.jpg' width='250'/\u003e\r\n  \u003cimg src='imgs/face2_2.jpg' width='250'/\u003e\r\n  
\u003cimg src='imgs/face2_3.jpg' width='250'/\u003e\r\n\u003c/p\u003e\r\n\r\n- Our editing interface\r\n\u003cp align='center'\u003e\r\n  \u003cimg src='imgs/city_short.gif' width='330'/\u003e\r\n  \u003cimg src='imgs/face_short.gif' width='450'/\u003e\r\n\u003c/p\u003e\r\n\r\n## Prerequisites\r\n- Linux or macOS\r\n- Python 2 or 3\r\n- NVIDIA GPU (11G memory or larger) + CUDA and cuDNN\r\n\r\n## Getting Started\r\n### Installation\r\n- Install PyTorch and dependencies from http://pytorch.org\r\n- Install the Python library [dominate](https://github.com/Knio/dominate):\r\n```bash\r\npip install dominate\r\n```\r\n- Clone this repo:\r\n```bash\r\ngit clone https://github.com/NVIDIA/pix2pixHD\r\ncd pix2pixHD\r\n```\r\n\r\n\r\n### Testing\r\n- A few example Cityscapes test images are included in the `datasets` folder.\r\n- Please download the pre-trained Cityscapes model from [here](https://drive.google.com/file/d/1OR-2aEPHOxZKuoOV34DvQxreqGCSLcW9/view?usp=drive_link) (Google Drive link) and put it under `./checkpoints/label2city_1024p/`.\r\n- Test the model (`bash ./scripts/test_1024p.sh`):\r\n```bash\r\n#!./scripts/test_1024p.sh\r\npython test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop none\r\n```\r\nThe test results will be saved to an HTML file here: `./results/label2city_1024p/test_latest/index.html`.\r\n\r\nMore example scripts can be found in the `scripts` directory.\r\n\r\n\r\n### Dataset\r\n- We use the Cityscapes dataset. 
To train a model on the full dataset, please download it from the [official website](https://www.cityscapes-dataset.com/) (registration required).\r\nAfter downloading, please put it under the `datasets` folder in the same way the example images are provided.\r\n\r\n\r\n### Training\r\n- Train a model at 1024 x 512 resolution (`bash ./scripts/train_512p.sh`):\r\n```bash\r\n#!./scripts/train_512p.sh\r\npython train.py --name label2city_512p\r\n```\r\n- To view training results, please check out the intermediate results in `./checkpoints/label2city_512p/web/index.html`.\r\nIf you have TensorFlow installed, you can see TensorBoard logs in `./checkpoints/label2city_512p/logs` by adding `--tf_log` to the training scripts.\r\n\r\n### Multi-GPU training\r\n- Train a model using multiple GPUs (`bash ./scripts/train_512p_multigpu.sh`):\r\n```bash\r\n#!./scripts/train_512p_multigpu.sh\r\npython train.py --name label2city_512p --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7\r\n```\r\nNote: this is not tested, and we trained our model using a single GPU only. Please use at your own discretion.\r\n\r\n### Training with Automatic Mixed Precision (AMP) for faster speed\r\n- To train with mixed precision support, please first install apex from: https://github.com/NVIDIA/apex\r\n- You can then train the model by adding `--fp16`. For example,\r\n```bash\r\n#!./scripts/train_512p_fp16.sh\r\npython -m torch.distributed.launch train.py --name label2city_512p --fp16\r\n```\r\nIn our test case, it trains about 80% faster with AMP on a Volta machine.\r\n\r\n### Training at full resolution\r\n- Training at full resolution (2048 x 1024) requires a GPU with 24G memory (`bash ./scripts/train_1024p_24G.sh`), or 16G memory if using mixed precision (AMP).\r\n- If only GPUs with 12G memory are available, please use the 12G script (`bash ./scripts/train_1024p_12G.sh`), which will crop the images during training. 
Performance is not guaranteed using this script.\r\n\r\n### Training with your own dataset\r\n- If you want to train with your own dataset, please generate one-channel label maps whose pixel values correspond to the object labels (i.e. 0,1,...,N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps. Please also specify `--label_nc N` during both training and testing.\r\n- If your input is not a label map, please just specify `--label_nc 0`, which will directly use the RGB colors as input. The folders should then be named `train_A`, `train_B` instead of `train_label`, `train_img`, where the goal is to translate images from A to B.\r\n- If you don't have instance maps or don't want to use them, please specify `--no_instance`.\r\n- The default setting for preprocessing is `scale_width`, which will scale the width of all training images to `opt.loadSize` (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the `--resize_or_crop` option. For example, `scale_width_and_crop` first resizes the image to have width `opt.loadSize` and then does random cropping of size `(opt.fineSize, opt.fineSize)`. `crop` skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify `none`, which will do nothing other than making sure the image dimensions are divisible by 32.\r\n\r\n## More Training/Test Details\r\n- Flags: see `options/train_options.py` and `options/base_options.py` for all the training flags; see `options/test_options.py` and `options/base_options.py` for all the test flags.\r\n- Instance map: we take in both label maps and instance maps as input. 
If you don't want to use instance maps, please specify the flag `--no_instance`.\r\n\r\n\r\n## Citation\r\n\r\nIf you find this useful for your research, please use the following.\r\n\r\n```\r\n@inproceedings{wang2018pix2pixHD,\r\n  title={High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs},\r\n  author={Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},  \r\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\r\n  year={2018}\r\n}\r\n```\r\n\r\n## Acknowledgments\r\nThis code borrows heavily from [pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Fpix2pixhd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvidia%2Fpix2pixhd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Fpix2pixhd/lists"}