Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/btrotta/kaggle-google-quest
Top 6% solution for Google Quest Q&A Labeling Competition on Kaggle
- Host: GitHub
- URL: https://github.com/btrotta/kaggle-google-quest
- Owner: btrotta
- Created: 2020-02-11T08:27:26.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-02-11T09:59:16.000Z (almost 5 years ago)
- Last Synced: 2024-11-24T03:26:50.257Z (about 1 month ago)
- Language: Python
- Size: 2.93 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: Readme.md
README
# Google Quest Q&A labeling
This is the code for my top 6% solution to the Google Quest Q&A Labeling challenge on Kaggle. This NLP competition requires us to predict the scores given by human raters to questions and answers on various Stack Exchange Q&A websites. The questions and answers are scored on 30 dimensions, including whether they are useful, well-written, etc.

My code relies heavily on the following public notebook: https://www.kaggle.com/akensert/bert-base-tf2-0-now-huggingface-transformer
The BERT modelling code is almost identical to that kernel; the only change I made was to insert a special token between the question title and body. (I'm not sure whether this had any real effect.)
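The repo follows that kernel's tokenisation pipeline, so the following is only a minimal sketch of the special-token idea using the HuggingFace tokenizer; the token name `[TITLE_END]` and all surrounding details are illustrative assumptions, not the repo's exact code.

```python
# Sketch: mark the boundary between question title and body with a new
# special token before feeding the pair to BERT. [TITLE_END] is a made-up
# name for illustration; the repo's actual token may differ.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Registering it as a special token keeps it from being split into word
# pieces. (The model's embedding matrix must then be resized to match,
# e.g. model.resize_token_embeddings(len(tokenizer)).)
tokenizer.add_special_tokens({"additional_special_tokens": ["[TITLE_END]"]})

title = "How do I merge two dictionaries?"
body = "I have two dicts and want to combine them into a single one."
answer = "Use the unpacking operator: merged = {**d1, **d2}."

# Title and body become one segment, separated by the marker; the answer
# is the second segment, as in a standard BERT sentence pair.
encoded = tokenizer(
    f"{title} [TITLE_END] {body}",
    answer,
    max_length=512,
    truncation=True,
    padding="max_length",
)
print(encoded["input_ids"][:16])
```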
To optimise the Spearman rho metric, I found it was helpful to limit the predicted scores to a smaller number of distinct values. I did this by post-processing with a LightGBM model with a small leaf size (3) and few iterations (20).
The model takes just a single predictor (the output prediction of the BERT model) and optimises the cross-entropy loss. This gives a boost of around 0.02 on the private leaderboard.
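Below is a minimal sketch of this post-processing step on synthetic data, reading "leaf size (3)" as `num_leaves=3` and using LightGBM's `cross_entropy` objective with 20 boosting rounds per the description above; these are my assumptions, not the repo's exact configuration.

```python
# Sketch: discretise BERT predictions by fitting a tiny LightGBM model
# on the single predictor. With 3 leaves and 20 rounds the fitted
# function is a coarse step function, so the post-processed scores take
# only a small number of distinct values.
import numpy as np
import lightgbm as lgb
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
y_true = rng.uniform(0, 1, size=1000)  # rater scores in [0, 1]
bert_pred = np.clip(y_true + rng.normal(0, 0.1, size=1000), 0, 1)  # stand-in BERT output

train = lgb.Dataset(bert_pred.reshape(-1, 1), label=y_true)
params = {
    "objective": "cross_entropy",  # labels must lie in [0, 1]
    "num_leaves": 3,
    "learning_rate": 0.3,
    "verbosity": -1,
}
booster = lgb.train(params, train, num_boost_round=20)
post = booster.predict(bert_pred.reshape(-1, 1))

print("distinct values after post-processing:", len(np.unique(post)))
print("rho before:", spearmanr(y_true, bert_pred).correlation)
print("rho after: ", spearmanr(y_true, post).correlation)
```

On the competition data the targets themselves take relatively few distinct values (they are aggregated rater scores), which is presumably why matching that tie structure helps the rank-based Spearman metric; on the continuous synthetic target above, the discretisation need not improve rho.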