https://github.com/ue2020/colorize
A deep learning image & video colorizer
- Host: GitHub
- URL: https://github.com/ue2020/colorize
- Owner: UE2020
- License: MIT
- Created: 2023-10-09T03:07:38.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-25T18:35:05.000Z (almost 2 years ago)
- Last Synced: 2024-02-13T21:43:50.207Z (over 1 year ago)
- Topics: colorization, image-colorization, libtorch, pix2pix, rust
- Language: Rust
- Size: 42.2 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Colorize!
A deep learning image & video colorizer using Rust and libtorch. The model is a slightly modified pix2pix (Isola et al.). Video demos are available here:
- [The Three Stooges Episode 117 (Malice In The Palace) colorized](https://www.youtube.com/watch?v=F3TNbHVFwqw)
- [MLK Interview colorized](https://www.youtube.com/watch?v=OosQa905nYQ)
## Training
To initialize the model, run `src/transform.py` and `src/model.py` to export the LAB<->RGB transform and the pre-trained generator as TorchScript modules, respectively. This requires PyTorch and the fastai library.
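A minimal sketch of that setup, assuming both scripts run without arguments from the repository root (check the scripts themselves for any options):
```
# assumption: a Python environment; torch and fastai are the standard PyPI packages
pip install torch fastai
python src/transform.py   # exports the LAB<->RGB transform TorchScript
python src/model.py       # exports the pre-trained generator TorchScript
```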
Training is then as simple as running the binary with the following arguments, where `use_gan` is a boolean:
```
./target/release/autoencoder train starting_model.pt /data/path duration_in_hours use_gan
```
See below for a link to a pre-trained model.
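For example, a two-hour run without the discriminator might look like this (the dataset path and duration are illustrative):
```
cargo build --release                 # build the trainer binary first
./target/release/autoencoder train starting_model.pt /data/imagenet 2 false
```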
### Obtaining a dataset
The ImageNet Object Localization Challenge dataset (a subset of the full ImageNet dataset) is available on Kaggle and was used to train the baseline model. A diverse sampling of images is recommended to avoid overfitting. Any dataset that consists of images in a folder is usable, as long as there are no corrupted images or non-image files; subdirectories are crawled automatically. One way to weed out bad files is sketched below.
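A minimal check, assuming ImageMagick's `identify` is installed (this helper is not part of the repo):
```
# flag files under the dataset root that ImageMagick cannot decode;
# `identify` exits non-zero for corrupted and non-image files alike
find /data/path -type f | while read -r f; do
    identify "$f" >/dev/null 2>&1 || echo "bad file: $f"
done
```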
### 3-Step Training Procedure
Models are trained in three steps to reduce the undesirable visual artifacts caused by GAN training:
1. Train for a long time without the discriminator network (`use_gan = false`).
2. Continue training the network produced by the previous step for a shorter time with the discriminator network enabled (`use_gan = true`).
3. **Merge** the two resulting networks using the pre-defined weighted-average formula: `./target/release/autoencoder merge gan.pt no_gan.pt` (order matters). The merged model is saved to `./merged.pt`; beware of overwriting any model already at that path. A worked example of the full procedure is shown after this list.
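Put together, the three steps might look like this (file names and durations are illustrative; check the trainer's output for the actual saved-model file names):
```
# step 1: long run without the discriminator (use_gan = false)
./target/release/autoencoder train starting_model.pt /data/path 24 false
# step 2: shorter run from step 1's output, discriminator enabled (use_gan = true)
./target/release/autoencoder train no_gan.pt /data/path 6 true
# step 3: merge, GAN-trained model first; writes ./merged.pt
./target/release/autoencoder merge gan.pt no_gan.pt
```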
## Running
Running the model is as simple as:
```
./target/release/autoencoder test model.pt image.jpg image_size
```
Images will be written to `./fixed.jpg`.
Only powers of 2 may be used for the `image_size` parameter; 256 is recommended, while 512 and 1024 are useful for colorizing fine details.
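For example, colorizing a single photo at the recommended resolution (the model and image file names are illustrative):
```
./target/release/autoencoder test merged.pt old_photo.jpg 256
# the colorized result is written to ./fixed.jpg
```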
A pre-trained model is available here: https://drive.google.com/file/d/1S6wAA-YkJsOVdh5-oHC6DkyPvfWiACA7/view?usp=sharing

## Demo
Colorizing legacy photos:
## Credits
Although it's currently unused, the multi-scale discriminator implementation in `src/model.py` is courtesy of https://github.com/NVIDIA/pix2pixHD.

## Citation
The model is based on the following papers:
```
@article{pix2pix2017,
title={Image-to-Image Translation with Conditional Adversarial Networks},
author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
journal={CVPR},
year={2017}
}
```

```
@inproceedings{wang2018pix2pixHD,
title={High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs},
author={Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2018}
}
```