# Understanding clouds from satellite images
This is the code for my solution to the Kaggle competition hosted by the Max Planck Institute for Meteorology, where the task is to segment satellite images to identify 4 types of cloud formations. It scores in the top 10%.

For the neural network I used a very standard approach: a pre-trained U-Net. My main innovations were in pre-processing the training images to remove noise, and in post-processing the neural network outputs to get the final prediction.

## Code structure
`get_image_arrays.py` processes the images and ground truth data. It calls functions in `prepare_data.py`.
`train_nn4.py` trains and saves the neural network. There are options for training using all the training data, or holding out some for calibration and validation.

`calibrate_and_predict.py` calibrates the model on a hold-out set of the training data to find the best threshold, and predicts the probability of a given image containing any pixels of a particular class.

## Pre-processing the images
I worked with grayscale images shrunk to 25% of their original size.

I got a large boost in model accuracy from filtering out the over-exposed areas in the images. Below is a sample image before and after correction.

![Before and after correction](https://raw.githubusercontent.com/btrotta/kaggle-clouds/master/img/before_after.png)
I achieved this by identifying the local "background colour" of each part of the image. The background is defined to be the part of the image with small variation in pixel intensity; the parts of the image that are over-exposed have a lighter background colour. We divide the image into a 50 x 50 grid of rectangles (each rectangle having size 28 x 42 pixels). Within each rectangle, we consider 8 x 8 pixel squares and define a square to be in the background if the standard deviation of its pixel intensity is below some threshold. We calculate the average colour of the background in each 28 x 42 rectangle, then smooth the results to get a smoothed picture of the background colour over the whole image. Finally, we rescale the image so that the range `[background, 255]` maps onto `[0, 255]`.

I also changed the missing areas to a medium-grey colour. I thought this would help the neural network converge, since it's more similar to the average colour of the images overall, which I thought would make the optimisation problem smoother. I'm not sure whether it made any difference.

Below is a visualisation of the steps in the correction process.
![Steps in the image correction](https://raw.githubusercontent.com/btrotta/kaggle-clouds/master/img/3_step.png)
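A minimal sketch of this correction in numpy/scipy (the repository's versions live in `get_image_arrays.py` / `prepare_data.py`); the standard-deviation threshold, the Gaussian smoothing, and the fallback for rectangles containing no background squares are illustrative assumptions rather than the exact competition settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def correct_background(img, grid=(28, 42), square=8, std_thresh=5.0):
    """Remove local over-exposure by estimating and rescaling away the
    background colour. img is a 2-d grayscale array whose size is
    divisible by the grid dimensions."""
    h, w = img.shape
    bg = np.zeros((h // grid[0], w // grid[1]))
    for i in range(bg.shape[0]):
        for j in range(bg.shape[1]):
            rect = img[i * grid[0]:(i + 1) * grid[0],
                       j * grid[1]:(j + 1) * grid[1]]
            # 8 x 8 squares with low intensity variation count as background.
            means = []
            for y in range(0, grid[0] - square + 1, square):
                for x in range(0, grid[1] - square + 1, square):
                    sq = rect[y:y + square, x:x + square]
                    if sq.std() < std_thresh:
                        means.append(sq.mean())
            # Fallback assumption: use the rectangle mean if nothing qualifies.
            bg[i, j] = np.mean(means) if means else rect.mean()
    # Smooth the per-rectangle estimate and upsample it to full resolution.
    bg = gaussian_filter(bg, sigma=1)
    bg_full = np.kron(bg, np.ones(grid))[:h, :w]
    # Map [background, 255] onto [0, 255].
    out = (img - bg_full) / np.maximum(255.0 - bg_full, 1e-6) * 255.0
    return np.clip(out, 0, 255)
```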
## Model
I used a blend of efficientnetb4 and efficientnetb5, both pre-trained, from this library: .
I trained for 10 epochs with the encoder layers frozen, then fine-tuned the whole model for 10 epochs with a lower learning rate. I used horizontal and vertical flip augmentation; I tried other augmentations but found they didn't help.
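As a minimal sketch of this schedule, assuming the qubvel `segmentation_models` Keras package (whose encoder names match those above; the library link above is missing, so this is an assumption); the loss, learning rates, and the `x_train`/`y_train` arrays are likewise illustrative:

```python
import tensorflow as tf
import segmentation_models as sm

# U-Net with a pre-trained encoder; encoder weights frozen initially.
# Assumed data: x_train of shape (N, H, W, 3), y_train of shape (N, H, W, 4).
model = sm.Unet('efficientnetb4', classes=4, activation='sigmoid',
                encoder_weights='imagenet', encoder_freeze=True)
model.compile(tf.keras.optimizers.Adam(1e-3), loss=sm.losses.bce_dice_loss)
model.fit(x_train, y_train, epochs=10)

# Unfreeze the encoder and fine-tune everything at a lower learning rate.
for layer in model.layers:
    layer.trainable = True
model.compile(tf.keras.optimizers.Adam(1e-5), loss=sm.losses.bce_dice_loss)
model.fit(x_train, y_train, epochs=10)
```

The efficientnetb5 model would be trained the same way, with the two models' predicted probabilities blended (e.g. averaged).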
## Post-processing the model predictions
The key to post-processing is to observe that the dice metric is not continuous: if a class doesn't exist in an image, there is a huge difference between predicting 1 pixel (dice score 0) and predicting 0 pixels (dice score 1). So, to decide whether to make a non-zero prediction, we need to estimate 2 things: the probability that the class exists in the image, and the expected dice score given that the class does exist. Then we can calculate the expected dice score for a zero and for a non-zero prediction, and choose between them accordingly.
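Concretely, writing `p` for the probability that the class exists and `d` for the expected dice score given that it exists (both names are mine), the comparison is:

```python
def expected_scores(p, d):
    """Expected dice of the two possible submissions for one image/class.

    An all-zero mask scores 1 when the class is absent and 0 when it is
    present; a non-zero mask scores 0 when absent and roughly d when present.
    """
    return 1 - p, p * d  # (all-zero prediction, non-zero prediction)

# e.g. with p = 0.6 and d = 0.5, the all-zero mask scores 0.4 in expectation
# versus 0.3 for the non-zero mask, so predicting all zeros is the better bet.
```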
I built very simple models for these two quantities, each using just a single variable: the 95th percentile of the predicted class probabilities for each image. These models were built using a set of 500 images excluded from the training of the neural network model.

### Predicting the expected dice score and choosing a threshold, given that the class exists
The optimal threshold depends on the confidence of the prediction: for more confident predictions, a higher threshold is better. I tested thresholds between 0.05 and 0.95 at intervals of 0.05. For each threshold, I built a model to predict the expected dice score. This model has just one x variable, the 95th percentile prediction for the class in each image, and is essentially just a smoothed average of the actual dice score at each x value (calculated only on the images where the class exists, because of the discontinuity of the dice metric noted above).
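A minimal sketch of one such smoothed-average model, fitted on the 500 held-out images (the rolling-window size and the interpolation at prediction time are illustrative assumptions):

```python
import numpy as np

def fit_dice_model(x_holdout, dice_holdout, window=50):
    """Map the 95th-percentile feature to an expected dice score via a
    rolling mean over the held-out images, sorted by feature value.
    Both arguments are 1-d numpy arrays of the same length."""
    order = np.argsort(x_holdout)
    x_sorted = x_holdout[order]
    kernel = np.ones(window) / window
    dice_smooth = np.convolve(dice_holdout[order], kernel, mode='same')
    # Interpolate between the smoothed points at prediction time.
    return lambda x: np.interp(x, x_sorted, dice_smooth)
```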
When making predictions, we calculate the expected dice score for each image and threshold, then choose the best threshold. However, if the expected score of this non-zero prediction is lower than the expected score of an all-zero prediction, i.e. the probability that the class is absent from the image (as calculated by the model below), then we just predict all zeros.
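Putting the pieces together for one image and class, with `dice_models[t]` the smoothed-average model for threshold `t` and `exists_model` the class-existence model described below (both names are mine):

```python
import numpy as np

THRESHOLDS = np.arange(0.05, 0.96, 0.05)

def choose_mask(pixel_probs, dice_models, exists_model):
    """Return the submitted mask for one image/class, given the per-pixel
    class probabilities predicted by the neural network."""
    x = np.percentile(pixel_probs, 95)           # the single model feature
    # Expected dice of each threshold, given that the class exists.
    scores = {t: dice_models[t](x) for t in THRESHOLDS}
    best_t = max(scores, key=scores.get)
    p_exists = exists_model(x)
    # The all-zero prediction wins if its expected score is higher.
    if 1 - p_exists > p_exists * scores[best_t]:
        return np.zeros_like(pixel_probs, dtype=bool)
    return pixel_probs > best_t
```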
### Predicting whether a class exists in an image
This is similar to the model above. We use the same x variable, the 95th percentile of the pixel predictions, but this time it's a classification model, so instead of a rolling average of the target we use a local logistic regression approach. I also used this successfully in the Santander competition; there is more detail at my github repo for that competition ().
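A minimal sketch of the local logistic regression idea, fitting a kernel-weighted logistic model around each query point (the Gaussian weighting and bandwidth are illustrative assumptions, not necessarily the original implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_logistic_predict(x_query, x_holdout, y_holdout, bandwidth=0.1):
    """Estimate P(class exists) at x_query by fitting a logistic regression
    to the held-out points, weighted by a Gaussian kernel centred there.
    y_holdout is 1 where the class exists in the image, else 0."""
    weights = np.exp(-0.5 * ((x_holdout - x_query) / bandwidth) ** 2)
    model = LogisticRegression()
    model.fit(x_holdout.reshape(-1, 1), y_holdout, sample_weight=weights)
    return model.predict_proba([[x_query]])[0, 1]
```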
I didn't attempt to reshape the predicted areas into rectangles or polygons, as in some published kernels. I also didn't enforce a minimum predicted area. My hypothesis is that this information is already built into the neural network predictions, and that this is why augmentations that change the size or shape of the masks (e.g. skew, rotation, zoom) give poor results.