# BiCycleGAN - Satellite to Map Image Translation

## Overview
This project implements the **BiCycleGAN** architecture for multimodal image-to-image translation from scratch using PyTorch. The BiCycleGAN model is particularly suited for tasks where multiple plausible outputs exist for a given input image. This project specifically focuses on translating satellite images to their corresponding Google Maps images and vice versa.

### What is BiCycleGAN?
BiCycleGAN is a hybrid model that combines the strengths of **cVAE-GAN** (Conditional Variational Autoencoder GAN) and **cLR-GAN** (Conditional Latent Regressor GAN). The cVAE-GAN helps in generating diverse outputs by sampling from a Gaussian latent space, while the cLR-GAN ensures the consistency and accuracy of these outputs by regressing the latent code back from the generated image.
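
To make these two cycles concrete, here is a minimal, self-contained sketch of one pass through each branch. `E` and `G` are trivial stand-ins for the real encoder and generator networks (defined later in the code walkthrough), and `latent_dim = 8` is an assumed latent size:

```python
import torch

latent_dim = 8
# Stand-ins, only so the flow below runs: E maps an image to (mu, logvar),
# G maps (input image, latent code) to an output image.
E = lambda img: (torch.zeros(img.size(0), latent_dim), torch.zeros(img.size(0), latent_dim))
G = lambda img, z: img

A = torch.randn(1, 3, 128, 128)  # input (satellite) image
B = torch.randn(1, 3, 128, 128)  # target (map) image

# cVAE-GAN branch: encode the real target, then reconstruct it through the generator
mu, logvar = E(B)
z_enc = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
B_rec = G(A, z_enc)   # trained with L1 + adversarial + KL losses

# cLR-GAN branch: draw a random code, generate, then recover the code with the encoder
z_rand = torch.randn(1, latent_dim)
B_gen = G(A, z_rand)  # trained with an adversarial loss
mu_rec, _ = E(B_gen)  # trained to match z_rand (latent L1 loss)
```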

## Dataset
### Satellite to Map Translation Dataset
The dataset consists of satellite images of New York and their corresponding Google Maps images. It is structured as follows:
- **Training Set**: 1,097 images
- **Validation Set**: 1,099 images

Each image in the dataset is 1,200 pixels wide and 600 pixels tall, with the satellite view on the left and the map view on the right.

## Implementation Details

### 1. Data Preprocessing and Augmentation
The data preprocessing involves:
- **Image Splitting**: Each image is split into two parts: the left half as the satellite image and the right half as the map image.
- **Data Augmentation**: Random horizontal flipping is applied to both satellite and map images with a 50% probability. This augmentation helps in increasing the diversity of the training data, making the model more robust.

### 2. Model Architecture
The core of the BiCycleGAN implementation involves several key components:

#### a. U-Net Generator
- **U-Net Architecture**: The generator is based on a U-Net architecture, which allows it to capture both high-level and low-level features, making it effective for tasks where the input and output images have a similar structure.
- **Skip Connections**: These connections between the encoder and decoder layers help in retaining spatial information, leading to sharper and more accurate image generation.

#### b. PatchGAN Discriminators
- **Discriminator Architecture**: The model employs PatchGAN discriminators, which classify whether 70x70 image patches are real or fake. This ensures that both global structures and finer details in the image are well captured.
- **Multi-Scale Discriminators**: Multiple discriminators operating at different scales are used to ensure both local and global consistency of the generated images; a simplified sketch of such a wrapper follows below.
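
A simplified sketch of how a multi-scale wrapper can be built. The layer sizes and the use of average pooling between scales are assumptions for illustration, not values taken from the notebook:

```python
import torch.nn as nn

def patch_discriminator(in_channels=3):
    # A small PatchGAN stack: strided convolutions ending in a 1-channel map of patch scores
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, 2, 1), nn.InstanceNorm2d(128), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 1, 4, padding=1),
    )

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, in_channels=3, num_scales=2):
        super().__init__()
        self.downsample = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)
        self.discriminators = nn.ModuleList(
            patch_discriminator(in_channels) for _ in range(num_scales)
        )

    def forward(self, x):
        outputs = []
        for disc in self.discriminators:
            outputs.append(disc(x))  # patch predictions at the current scale
            x = self.downsample(x)   # halve the resolution before the next discriminator
        return outputs
```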

### 3. Latent Space Engineering
- **Gaussian Latent Space**: A Gaussian latent space is engineered for the model, enabling it to produce a diverse set of outputs for a single input image. This is done by sampling latent vectors from a Gaussian distribution during training.
- **Stochastic Sampling**: During generation, stochastic sampling of latent codes lets the model explore different modes of the output distribution, producing varied yet realistic image translations (see the sketch below).
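
The sampling itself comes down to the reparameterization trick during training and direct sampling from the prior at inference time. A minimal sketch, assuming a latent size of 8 (the value used in the original BiCycleGAN paper):

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I): differentiable sampling around the encoded distribution
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# At inference time, each draw from the prior yields a different plausible translation
latent_dim = 8
z = torch.randn(4, latent_dim)  # four codes -> four different outputs for one input image
```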

### 4. Loss Functions
The training process of the BiCycleGAN is guided by several loss functions:
- **Adversarial Loss (GAN Loss)**: This loss drives the generator to produce images that are indistinguishable from real images, as judged by the discriminator.
- **L1 Loss**: This loss is used to minimize the pixel-wise difference between the generated image and the ground truth image, ensuring that the generated image closely matches the target.
- **KL Divergence Loss (KL Loss)**: The KL divergence loss is applied to enforce that the latent code distribution approximates a standard Gaussian distribution, which is crucial for meaningful sampling from the latent space.

### 5. Training Procedure
- **Data Handling**: Efficient data loading and augmentation techniques are employed to handle the large dataset without bottlenecks.
- **Training**: The model is trained with a combination of the aforementioned loss functions, ensuring that it learns both the global structure and fine details of the images. The training is monitored with real-time loss plots and image outputs.

## Code Walkthrough

### 1. Imports and Setup
The notebook begins by importing the necessary libraries: PyTorch and torchvision for modeling and data handling, PIL for image processing, and matplotlib for visualization.

```python
import glob
import os

import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import matplotlib.pyplot as plt
```

### 2. DataLoader Setup
A custom dataset class is implemented to handle the loading and preprocessing of the images. The class splits each image into its satellite and map components, applies the necessary transformations, and supports on-the-fly data augmentation.

```python
class ImageDataset(Dataset):
    def __init__(self, root, transforms_=None, mode="train"):
        self.transform = transforms_
        self.files = sorted(glob.glob(os.path.join(root, mode) + "/*.*"))

    def __getitem__(self, index):
        img = Image.open(self.files[index])
        # Left half is the satellite image, right half is the map image
        img_A = img.crop((0, 0, img.width // 2, img.height))
        img_B = img.crop((img.width // 2, 0, img.width, img.height))

        # Random horizontal flip applied jointly to both halves
        if np.random.random() < 0.5:
            img_A = Image.fromarray(np.array(img_A)[:, ::-1, :], "RGB")
            img_B = Image.fromarray(np.array(img_B)[:, ::-1, :], "RGB")

        img_A = self.transform(img_A)
        img_B = self.transform(img_B)

        return {"A": img_A, "B": img_B}

    def __len__(self):
        return len(self.files)
```
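
A possible way to instantiate the dataset and its loader is shown below. The resize dimensions, normalization values, dataset path `data/maps`, and batch size are assumptions for illustration, not values taken from the notebook:

```python
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

transforms_ = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale images to [-1, 1] for Tanh outputs
])

dataloader = DataLoader(
    ImageDataset("data/maps", transforms_=transforms_, mode="train"),
    batch_size=8,
    shuffle=True,
    num_workers=4,
)
```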

### 3. Generator and Discriminator Definitions
The generator is implemented using a U-Net architecture with skip connections, while the PatchGAN discriminators operate on 70x70 patches of the image, allowing the model to attend to both fine details and overall structure. The snippets below are simplified, illustrative versions of these networks; note that the generator is also conditioned on the latent code `z`.

```python
class UNetGenerator(nn.Module):
    def __init__(self, in_channels, out_channels, latent_dim=8):
        super(UNetGenerator, self).__init__()
        # Simplified two-level encoder/decoder; the notebook's network stacks more blocks.
        # The latent code z is tiled spatially and concatenated with the input image.
        self.down1 = nn.Sequential(nn.Conv2d(in_channels + latent_dim, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.InstanceNorm2d(128), nn.LeakyReLU(0.2))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.InstanceNorm2d(64), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(128, out_channels, 4, 2, 1), nn.Tanh())

    def forward(self, x, z):
        z_img = z.view(z.size(0), z.size(1), 1, 1).expand(-1, -1, x.size(2), x.size(3))
        d1 = self.down1(torch.cat([x, z_img], dim=1))
        d2 = self.down2(d1)
        u1 = self.up1(d2)
        # Skip connection: concatenate encoder features with the matching decoder features
        return self.up2(torch.cat([u1, d1], dim=1))


class Discriminator(nn.Module):
    def __init__(self, in_channels):
        super(Discriminator, self).__init__()
        # PatchGAN: strided convolutions ending in a 1-channel map of per-patch real/fake scores
        self.model = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.InstanceNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1),
        )

    def forward(self, x):
        return self.model(x)
```
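
BiCycleGAN also requires an encoder `E` that maps a target image to the mean and log-variance of a Gaussian over the latent code; it is used by both the cVAE-GAN and cLR-GAN branches. It is not shown above, so here is a minimal, simplified sketch (the actual notebook may use a deeper, ResNet-style encoder):

```python
class Encoder(nn.Module):
    def __init__(self, in_channels=3, latent_dim=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),  # global average pooling to a 256-dim feature vector
        )
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)
```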

### 4. Loss Function Implementations
The loss functions guide the training of the BiCycleGAN, ensuring that the model generates images that are both diverse and accurate. The adversarial and pixel losses use standard PyTorch criteria, while the KL term is computed in closed form from the encoder's mean and log-variance:

```python
adversarial_loss = nn.MSELoss()  # LSGAN-style adversarial loss
l1_loss = nn.L1Loss()            # pixel-wise and latent reconstruction loss

# KL divergence of N(mu, sigma^2) from the standard normal prior, in closed form
def kl_loss(mu, logvar):
    return 0.5 * torch.sum(torch.exp(logvar) + mu ** 2 - 1.0 - logvar)
```

### 5. Training Loop
The training loop integrates all components, including data loading, model forward passes, loss computation, and backpropagation. Real-time monitoring of the losses and visual outputs is performed to ensure the model is learning effectively.

```python
for epoch in range(num_epochs):
    for i, batch in enumerate(dataloader):
        # Load batch data
        # Forward pass through generator and discriminator
        # Compute losses
        # Backpropagation and optimizer step

        # Print losses and save images for monitoring
        ...
```
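
For concreteness, here is a hedged sketch of what a single generator/encoder update could look like when the pieces above are wired together. The loss weights (10 for the pixel term, 0.01 for KL, 0.5 for the latent term) follow the original BiCycleGAN paper, `num_epochs = 200` is an assumption, and the discriminator updates are omitted; the notebook's exact procedure may differ:

```python
latent_dim, num_epochs = 8, 200
generator = UNetGenerator(3, 3, latent_dim)
encoder = Encoder(3, latent_dim)
D_VAE, D_LR = Discriminator(3), Discriminator(3)
optimizer_G = torch.optim.Adam(
    list(generator.parameters()) + list(encoder.parameters()), lr=2e-4, betas=(0.5, 0.999)
)
lambda_pixel, lambda_kl, lambda_latent = 10.0, 0.01, 0.5

for epoch in range(num_epochs):
    for i, batch in enumerate(dataloader):
        real_A, real_B = batch["A"], batch["B"]

        # cVAE-GAN: reconstruct real_B from its own encoded latent code
        mu, logvar = encoder(real_B)
        z_enc = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        fake_B = generator(real_A, z_enc)
        pred_vae = D_VAE(fake_B)
        loss_vae = adversarial_loss(pred_vae, torch.ones_like(pred_vae))
        loss_pixel = l1_loss(fake_B, real_B)
        loss_KL = kl_loss(mu, logvar)

        # cLR-GAN: recover a randomly drawn latent code from the generated image
        z_rand = torch.randn(real_A.size(0), latent_dim)
        fake_B_rand = generator(real_A, z_rand)
        pred_lr = D_LR(fake_B_rand)
        loss_lr = adversarial_loss(pred_lr, torch.ones_like(pred_lr))
        mu_rec, _ = encoder(fake_B_rand)
        loss_latent = l1_loss(mu_rec, z_rand)

        # Combined objective (the original implementation handles the latent term separately)
        loss_G = (loss_vae + loss_lr + lambda_pixel * loss_pixel
                  + lambda_kl * loss_KL + lambda_latent * loss_latent)

        optimizer_G.zero_grad()
        loss_G.backward()
        optimizer_G.step()
```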

### 6. Results and Evaluation
After training, the model is evaluated on the validation dataset. The results include both qualitative visualizations of generated images and quantitative analysis of the losses.
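
For the qualitative side, multimodality can be visualized by holding a validation input fixed and varying only the latent code. A minimal sketch, reusing the names from the earlier snippets (the dataset path is again an assumption):

```python
val_dataloader = DataLoader(
    ImageDataset("data/maps", transforms_=transforms_, mode="val"), batch_size=1, shuffle=True
)

generator.eval()
with torch.no_grad():
    real_A = next(iter(val_dataloader))["A"]
    # Five different latent codes -> five plausible map renderings of the same satellite image
    samples = [generator(real_A, torch.randn(1, latent_dim)) for _ in range(5)]
```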

## Getting Started

### Prerequisites
- Python 3.x
- PyTorch
- torchvision
- PIL (Pillow)
- matplotlib

### Installation
1. Clone the repository:
```bash
git clone https://github.com/ravindramohith/SatelliteMapGAN.git
```

2. Install the required packages:
```bash
pip install -r requirements.txt
```

3. Download and prepare the dataset as described above.

4. Run the Jupyter notebook to start training:
```bash
jupyter notebook bicyclegan.ipynb
```

## Results
The model successfully learns to translate satellite images into maps and vice versa. The results demonstrate the model's ability to generate multiple plausible outputs for a single input image, showcasing the power of multimodal image-to-image translation.

### Sample Results
- Generated maps from satellite images with high fidelity.
- Diverse outputs generated from stochastic sampling of the latent space.

## Conclusion
This project demonstrates the effective application of BiCycleGAN for satellite-to-map image translation, achieving high-quality results with diverse outputs. The combination of cVAE-GAN and cLR-GAN, along with U-Net generators and PatchGAN discriminators, proves to be powerful for multimodal image generation tasks.

## Acknowledgments
This implementation is inspired by the BiCycleGAN paper titled **"Toward Multimodal Image-to-Image Translation"**. The dataset used is provided by the pix2pix repository.