https://github.com/ram81/humpback-whale-identification
- Host: GitHub
- URL: https://github.com/ram81/humpback-whale-identification
- Owner: Ram81
- Created: 2019-03-29T05:25:47.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-07-17T09:15:35.000Z (almost 6 years ago)
- Last Synced: 2025-01-07T19:21:33.151Z (5 months ago)
- Language: Jupyter Notebook
- Size: 26.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Humpback Whale Identification
### Training Data
The training data contains thousands of images of humpback whale flukes. Researchers have identified individual whales and given each one an Id. The challenge is to predict the whale Id for each image in the test set. What makes this difficult is that there are only a few examples for each of the 3,000+ whale Ids.
### Approach
My solution is to use Siamese Neural Networks to compute the dissimilarity between a whale image and all 3,000+ whale classes. After computing the dissimilarity of an image with every target category, we choose the 5 least dissimilar classes as the top 5 predictions, i.e. the image is considered most likely to belong to one of the 5 classes it is least dissimilar to.
The training data has 25,000+ images across 5,005 target classes, so computing the dissimilarity against every training image is computationally expensive. To reduce the number of comparisons, we use the [Linear Assignment Problem](https://en.wikipedia.org/wiki/Assignment_problem) to choose which whale image represents each whale category. To generate a prediction for a whale image, we then compute its dissimilarity with 5,005 images (one representative image per whale category) and use these dissimilarity values to produce the top 5 predictions.
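The final ranking step can be sketched as below. This is a minimal illustration, not the code from the kernel: it assumes the Siamese network has already produced one dissimilarity value per class representative, and the function name `top5_predictions` and the toy whale Ids are hypothetical.

```python
def top5_predictions(dissimilarity, class_ids, k=5):
    """Return the k class ids with the smallest dissimilarity scores.

    dissimilarity[i] is assumed to be the score between the query image
    and the representative image of class_ids[i] (lower = more similar).
    """
    if len(dissimilarity) != len(class_ids):
        raise ValueError("need exactly one dissimilarity value per class")
    # Sort class indices by ascending dissimilarity and keep the first k.
    order = sorted(range(len(class_ids)), key=lambda i: dissimilarity[i])
    return [class_ids[i] for i in order[:k]]


# Toy example with 6 hypothetical classes instead of 5,005.
class_ids = ["w_a", "w_b", "w_c", "w_d", "w_e", "w_f"]
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
print(top5_predictions(scores, class_ids))
# ['w_b', 'w_f', 'w_d', 'w_c', 'w_e']
```

With one representative per class, each prediction costs one pass over 5,005 scores rather than over all 25,000+ training images.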
### Evaluation Metric
Mean Average Precision @ 5 (MAP@5)
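For reference, here is a small sketch of how MAP@5 can be computed when each image has exactly one true whale Id, which is the setting described above. In that case the average precision for an image is `1 / rank` if the true Id appears at position `rank` among its 5 predictions, and 0 otherwise. The function name `map_at_5` and the sample Ids are illustrative assumptions, not taken from the kernel.

```python
def map_at_5(true_ids, predictions):
    """Mean Average Precision @ 5 for single-label ground truth.

    true_ids:    list with the one correct id per image
    predictions: list of ranked prediction lists (best guess first)
    """
    if len(true_ids) != len(predictions):
        raise ValueError("need one prediction list per true id")
    total = 0.0
    for true_id, preds in zip(true_ids, predictions):
        # Only the first 5 predictions count; score 1/rank at the first hit.
        for rank, pred in enumerate(preds[:5], start=1):
            if pred == true_id:
                total += 1.0 / rank
                break
    return total / len(true_ids)


# Hypothetical example: hit at rank 1, hit at rank 2, and a miss.
true_ids = ["w_a", "w_b", "w_c"]
predictions = [
    ["w_a", "w_x", "w_y", "w_z", "w_q"],
    ["w_x", "w_b", "w_y", "w_z", "w_q"],
    ["w_x", "w_y", "w_z", "w_q", "w_r"],
]
print(map_at_5(true_ids, predictions))  # (1 + 0.5 + 0) / 3 = 0.5
```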
### Results
Our solution has an accuracy of 90.003%. You can find the kernel [here](https://www.kaggle.com/axel81/siamese-baseline-lb-0-822).