# iMet Collection 2019 - FGVC

The Metropolitan Museum of Art in New York, also known as The Met, has a diverse collection of over 1.5M objects of which over 200K have been digitized with imagery. The online cataloguing information is generated by Subject Matter Experts (SME) and includes a wide range of data. These include, but are not limited to: multiple object classifications, artist, title, period, date, medium, culture, size, provenance, geographic location, and other related museum objects within The Met’s collection. While the SME-generated annotations describe the object from an art history perspective, they can also be indirect in describing finer-grained attributes from the museum-goer’s understanding. Adding fine-grained attributes to aid in the visual understanding of the museum objects will enable the ability to search for visually related objects.

## About
This is an FGVCx competition hosted as part of the [FGVC6 workshop](https://sites.google.com/view/fgvc6/home) at [CVPR 2019](http://cvpr2019.thecvf.com/). View the [github page](https://github.com/visipedia/imet-fgvcx) for more details.

## Dataset

In this dataset, you are presented with a large number of artwork images and associated attributes of the art. Multiple modalities can be expected and the camera sources are unknown. The photographs are often centered for objects, and in the case where the museum artifact is an entire room, the images are scenic in nature.

Each object is annotated by a single annotator without a verification step. Annotators were advised to add multiple labels from an ontology provided by The Met, and additionally are allowed to add free-form text when they see fit. They were able to view the museum's online collection pages and advised to avoid annotating labels already present. The attributes can relate to what one "sees" in the work or what one infers as the object's "utility."

While we have made efforts to make the attribute labels as high quality as possible, you should consider these annotations noisy. There may be a small number of attributes with similar meanings. The competition metric, F2 score, was intentionally chosen to provide some robustness against noisy labels, favoring recall over precision.

This is a kernels-only competition with two stages. After the deadline, Kaggle will rerun your selected kernels on an unseen test set. The second-stage test set is approximately five times the size of the first. You should plan your kernel's memory, disk, and runtime footprint accordingly.

### Files
The filename of each image is its id.

- train.csv gives the attribute_ids for the train images in /train
- /test contains the test images. You must predict the attribute_ids for these images.
- sample_submission.csv contains a submission in the correct format
- labels.csv provides descriptions of the attributes (a hedged loading sketch follows this list)
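
Below is a minimal pandas sketch of how these files might be loaded. It assumes `attribute_ids` in train.csv is a space-separated list of integer label ids and that labels.csv has `attribute_id` and `attribute_name` columns; this is how the competition data is commonly described, but it is not guaranteed by this README.

```python
import pandas as pd

# Assumed layout: train.csv -> columns `id`, `attribute_ids` (space-separated
# label ids); labels.csv -> columns `attribute_id`, `attribute_name`.
train = pd.read_csv("train.csv")
labels = pd.read_csv("labels.csv")

# Turn "13 405 896" into [13, 405, 896] for each training image.
train["attribute_ids"] = train["attribute_ids"].str.split().map(
    lambda ids: [int(i) for i in ids]
)

# Map label id -> human-readable attribute name.
id_to_name = dict(zip(labels["attribute_id"], labels["attribute_name"]))
```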

## Solution

Our solution uses **Squeeze-and-Excitation Networks** to compute, for each image, the probability of every available label. After some exploratory data analysis we chose **SE-ResNeXt** as the base model to experiment with.
The main insights from that analysis are listed below, followed by a minimal sketch of the SE block itself:
- Extremely elongated ("super long") images exist in the dataset
- The dataset is highly imbalanced: over 1,000 labels, with 90% of images having fewer than 5 labels
- Class labels fall into two categories, i.e. culture and tag
- Similar target classes (e.g. men, women, portraits, human figures) appear in both categories, i.e. culture:men and tag:men
- RGB pixel statistics are roughly normally distributed, but with different means
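
For illustration, here is a minimal PyTorch sketch of the Squeeze-and-Excitation block that gives SE-ResNeXt its name, plus the kind of sigmoid multi-label head one would attach on top. The reduction ratio and the head wiring are placeholders, not a reproduction of our exact model.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: re-weight feature channels with a learned gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global average pool
        self.fc = nn.Sequential(                       # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        gate = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gate                                # scale each channel

class MultiLabelHead(nn.Module):
    """One sigmoid probability per attribute class (over 1,000 in this task)."""
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, features):
        return torch.sigmoid(self.fc(features))
```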

Based on these insights we explored multiple augmentation techniques and loss functions to tackle the class imbalance.
We finally settled on [Focal Loss](https://arxiv.org/abs/1708.02002) as our loss function, combined with augmentations such as horizontal flip, rotation, zoom, and symmetric warp.
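
A minimal sketch of binary focal loss for the multi-label setting, using the defaults from the paper (gamma=2, alpha=0.25); our actual hyperparameters are not recorded in this README.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al., 2017) for multi-label targets.

    gamma down-weights well-classified examples; alpha balances positive
    and negative labels. Values shown are the paper's defaults.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                   # prob. of the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```
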
For the final solution we used a k-fold approach: train 5 models, one per fold, and take a weighted average of their predictions as the final output.
All of our models were trained with cosine annealing and the one-cycle LR policy to reach better local minima.
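
The scheduler setup could look roughly like the sketch below with plain PyTorch's `OneCycleLR` (which anneals with a cosine shape). The model, optimizer, and hyperparameters here are placeholders rather than our exact training configuration.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                         # stand-in for the SE-ResNeXt model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

epochs, steps_per_epoch = 10, 100                # placeholder schedule length
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.01,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    anneal_strategy="cos",                       # cosine annealing within the cycle
)

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        # ... forward pass, loss.backward() ...
        optimizer.step()
        scheduler.step()                         # step the LR once per batch
```
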
To make the final predictions on the test set we used TTA (Test-Time Augmentation).
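
A sketch of the TTA idea, assuming we simply average sigmoid outputs over the original image and its horizontal flip; the real pipeline may have used more augmentations than shown here.

```python
import torch

@torch.no_grad()
def predict_with_tta(model, images):
    """Average predictions over the image and its horizontal flip."""
    model.eval()
    probs = model(images)
    probs_flipped = model(torch.flip(images, dims=[-1]))   # flip along width
    return (probs + probs_flipped) / 2
```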

Since this is a multi-class, multi-label classification problem, we also needed a thresholding approach to select the final set of labels for an image.
Thresholds were searched over a pruned search space derived from heuristics gathered during data analysis.
The selected threshold was the one that maximized the [F2 score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html) between the targets and the model's predictions on the validation set.
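
A sketch of that threshold search, assuming a single global threshold and sample-averaged F2 via scikit-learn; the `candidates` range below stands in for the pruned search space derived from our data analysis.

```python
import numpy as np
from sklearn.metrics import fbeta_score

def search_threshold(val_probs, val_targets,
                     candidates=np.arange(0.05, 0.50, 0.01)):
    """Return the global threshold that maximizes F2 on the validation set."""
    best_threshold, best_f2 = None, -1.0
    for t in candidates:
        preds = (val_probs >= t).astype(int)
        f2 = fbeta_score(val_targets, preds, beta=2,
                         average="samples", zero_division=0)
        if f2 > best_f2:
            best_threshold, best_f2 = t, f2
    return best_threshold, best_f2
```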

## Evaluation metric

[F2 score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html)

## Results

All individual models had an F2 score in the range 0.603-0.615.
Our final ensemble scored 0.621, placing [74th on the private leaderboard](https://www.kaggle.com/c/imet-2019-fgvc6/leaderboard), and 0.625 on the [public leaderboard](https://www.kaggle.com/c/imet-2019-fgvc6/leaderboard).