https://github.com/aseem09/super-resolution-cnn
Deep CNN based implementation of Super-Resolution
cnn deep-learning keras super-resolution tensorflow
- Host: GitHub
- URL: https://github.com/aseem09/super-resolution-cnn
- Owner: aseem09
- Created: 2020-10-10T10:01:16.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-13T11:24:23.000Z (over 5 years ago)
- Last Synced: 2025-06-21T00:11:33.792Z (about 1 year ago)
- Topics: cnn, deep-learning, keras, super-resolution, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 2.23 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Super-Resolution-CNN
It is possible to represent the entire process of super-resolution as a deep convolutional neural network. The state-of-the-art models for super-resolution are based on GANs; this repository instead contains a CNN-based implementation that learns an end-to-end mapping between low- and high-resolution images. It takes a 64x64 image as input and outputs a 128x128 image.
## Dataset
This project uses the `Linnaeus 5` dataset, which contains 6000 training images and 2000 test images, all at a resolution of 256x256. For this model, I resized each image to 64x64 (the input data) and to 128x128 (the ground truth for that image).
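The README does not say which resizing method was used. As a minimal sketch, assuming simple block averaging (an area-style resize) and pixel values scaled to [0, 1], a single 256x256 image can be turned into an input/ground-truth pair as below. `box_downsample` and `make_training_pair` are hypothetical helpers, not functions from the repository.

```python
import numpy as np


def box_downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Downsample an (H, W, C) image by averaging non-overlapping factor x factor blocks.

    Assumption: block averaging stands in for whatever resize the notebook uses.
    """
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    # Element [i, a, j, b, k] is original pixel [i*factor + a, j*factor + b, k].
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def make_training_pair(image_256: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (64x64 model input, 128x128 ground truth) built from a 256x256 image."""
    image = image_256.astype(np.float32) / 255.0  # scale to [0, 1] (assumption)
    ground_truth = box_downsample(image, 2)  # 256 -> 128
    model_input = box_downsample(image, 4)   # 256 -> 64
    return model_input, ground_truth


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in for a dataset image
x, y = make_training_pair(image)
print(x.shape, y.shape)  # (64, 64, 3) (128, 128, 3)
```

Averaging 4x4 blocks directly gives the same result as halving the 128x128 ground truth once more, so the input is always a consistent low-resolution version of its target.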
## Model Architecture
The CNN architecture is similar to the one described in ['Reconstructing Obfuscated Human Faces'](http://cs231n.stanford.edu/reports/2017/pdfs/223.pdf).
Click [here](https://user-images.githubusercontent.com/43964071/95691772-509ef600-0c3f-11eb-86f8-f1639ead7288.png) to view the model architecture.
## Loss Function
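In the un-optimized version, `MeanSquaredError` is used as the loss function. This corresponds to the pixel loss, given by

$$\mathcal{L}_{pixel} = \frac{1}{CHW}\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(\hat{y}_{c,h,w} - y_{c,h,w}\right)^2$$

where $\hat{y}$ is the predicted image, $y$ is the ground-truth image, and $C$, $H$ and $W$ are the number of channels, the height and the width.
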
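For reference, here is a minimal NumPy sketch of the pixel loss. The notebook itself relies on the Keras `MeanSquaredError` loss, which should give the same value for same-shaped images, since it averages the squared error over every element; `pixel_loss` is a hypothetical helper used only for illustration.

```python
import numpy as np


def pixel_loss(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared error over all pixels and channels (the pixel loss)."""
    if predicted.shape != ground_truth.shape:
        raise ValueError("predicted and ground_truth must have the same shape")
    diff = predicted.astype(np.float64) - ground_truth.astype(np.float64)
    return float(np.mean(diff ** 2))


y_true = np.zeros((128, 128, 3))
y_pred = np.full((128, 128, 3), 0.1)  # every pixel off by 0.1
print(pixel_loss(y_pred, y_true))     # approximately 0.01
```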
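In the optimized version, a linear combination of pixel loss and perceptual loss is used. Perceptual loss measures the difference between the feature maps that a pre-trained network, such as VGGNet, produces for the predicted image and for the ground-truth image. The perceptual loss is given by

$$\mathcal{L}_{perceptual} = \frac{1}{C_\Phi H_\Phi W_\Phi}\left\lVert \Phi(\hat{y}) - \Phi(y) \right\rVert_2^2$$

Here Φ denotes the activation of the 6th layer of a pre-trained VGG16 model, applied to the predicted image $\hat{y}$ and the ground truth $y$, and $C_\Phi$, $H_\Phi$ and $W_\Phi$ are the dimensions of that feature map.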
Click [here](https://user-images.githubusercontent.com/43964071/95691772-509ef600-0c3f-11eb-86f8-f1639ead7288.png) to view the architecture of the custom VGG model.
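The final loss function is a weighted sum of the two terms:

$$\mathcal{L} = \lambda_{pixel}\,\mathcal{L}_{pixel} + \lambda_{perceptual}\,\mathcal{L}_{perceptual}$$

where $\mathcal{L}_{pixel}$ is the pixel loss, $\mathcal{L}_{perceptual}$ is the perceptual loss, and $\lambda_{pixel}$ and $\lambda_{perceptual}$ are the weights given to each term.
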
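The combined loss can be sketched in NumPy as follows. To keep the example self-contained and runnable without downloading pre-trained weights, the feature extractor `phi` is passed in as a function. `toy_phi`, which returns image gradients, is a hypothetical stand-in for the VGG16 layer described above, and the weights 1.0 and 0.1 are placeholders, not the values used in the notebook.

```python
from typing import Callable

import numpy as np

# phi maps an (H, W, C) image to a feature map; in the README it is a VGG16 layer.
FeatureExtractor = Callable[[np.ndarray], np.ndarray]


def pixel_loss(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared error over all pixels and channels."""
    return float(np.mean((predicted - ground_truth) ** 2))


def perceptual_loss(predicted: np.ndarray, ground_truth: np.ndarray,
                    phi: FeatureExtractor) -> float:
    """Mean squared error between the feature maps phi(predicted) and phi(ground_truth)."""
    return float(np.mean((phi(predicted) - phi(ground_truth)) ** 2))


def combined_loss(predicted: np.ndarray, ground_truth: np.ndarray, phi: FeatureExtractor,
                  lambda_pixel: float = 1.0, lambda_perceptual: float = 0.1) -> float:
    """Weighted sum of pixel and perceptual loss (default weights are hypothetical)."""
    return (lambda_pixel * pixel_loss(predicted, ground_truth)
            + lambda_perceptual * perceptual_loss(predicted, ground_truth, phi))


def toy_phi(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the VGG16 feature extractor.

    Returns horizontal and vertical finite differences, a crude proxy for the
    edge-like responses of early convolutional layers.
    """
    dx = np.diff(image, axis=1)[:-1, :, :]  # shape (H-1, W-1, C)
    dy = np.diff(image, axis=0)[:, :-1, :]  # shape (H-1, W-1, C)
    return np.concatenate([dx, dy], axis=-1)


rng = np.random.default_rng(0)
y_true = rng.random((128, 128, 3))
y_flat = np.full_like(y_true, y_true.mean())  # a "blurry" prediction with no edges
print(perceptual_loss(y_flat, y_true, toy_phi) > 0)  # True: the missing edges are penalized
print(combined_loss(y_flat, y_true, toy_phi))
```

In the real model, `phi` would be a truncated pre-trained VGG16 (for example one built from `tf.keras.applications.VGG16`), and the loss would be written with TensorFlow ops so that it can be differentiated during training.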
## Results
### 1. Un-Optimized Model
*Figure: four sample results from the un-optimized model, each comparing Input (64x64), Ground Truth and Predicted.*
These results show that the model performs well when it comes to smoothing out curves and edges.
However, the predicted images are blurry and miss intricate details. This can be addressed by adding the `Perceptual Loss` to the `Pixel Loss` function, which pushes the model to reproduce the detailed structures of the objects in the image.
### 2. Optimized Model
*Figure: four sample results from the optimized model, each comparing Input (64x64), Ground Truth and Predicted.*
With the perceptual loss added, the model performs noticeably better. There is one drawback, though: the predicted images show a checkerboard-like pattern, which is attributed to the perceptual loss. For a few images, the model also reaches a PSNR (Peak Signal-to-Noise Ratio) of around 35-36 dB.
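PSNR is computed from the mean squared error as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the largest possible pixel value. For images scaled to [0, 1], 35 dB therefore corresponds to an MSE of about 3.2e-4, i.e. an RMSE of about 0.018 (roughly 4.5 gray levels on a 0-255 scale). Below is a minimal NumPy sketch, assuming pixel values in [0, 1]; `psnr` is a hypothetical helper, and TensorFlow also provides `tf.image.psnr` for the same computation.

```python
import numpy as np


def psnr(predicted: np.ndarray, ground_truth: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    diff = np.asarray(predicted, dtype=np.float64) - np.asarray(ground_truth, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


rng = np.random.default_rng(0)
y_true = rng.random((128, 128, 3))
y_pred = y_true + 0.01  # uniform error of 0.01, so MSE = 1e-4
print(round(psnr(y_pred, y_true), 2))  # 40.0
```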