https://github.com/chenglongchen/kaggle-crowdflower

1st Place Solution for CrowdFlower Product Search Results Relevance Competition on Kaggle.

crowdflower kaggle kaggle-competetion kaggle-crowdflower natural-language-processing nlp product-search relevance-competition search-engine search-relevance semantic-matching semantic-similarity

Host: GitHub
URL: https://github.com/chenglongchen/kaggle-crowdflower
Owner: ChenglongChen
Created: 2015-07-12T06:41:27.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2021-09-25T02:32:49.000Z (about 4 years ago)
Last Synced: 2025-05-15T19:04:23.044Z (5 months ago)
Topics: crowdflower, kaggle, kaggle-competetion, kaggle-crowdflower, natural-language-processing, nlp, product-search, relevance-competition, search-engine, search-relevance, semantic-matching, semantic-similarity
Language: C++
Homepage: https://www.kaggle.com/c/crowdflower-search-relevance
Size: 6.44 MB
Stars: 1,767
Watchers: 101
Forks: 658
Open Issues: 1
Metadata Files:
- Readme: README.md

# Kaggle_CrowdFlower

1st Place Solution for [Search Results Relevance Competition on Kaggle](https://www.kaggle.com/c/crowdflower-search-relevance)

The best single model we obtained during the competition was an [XGBoost](https://github.com/dmlc/xgboost) model with a linear booster, which scored **0.69322** on the Public LB and **0.70768** on the Private LB. Our final winning submission was a median ensemble of the 35 best Public LB submissions. This submission scored **0.70807** on the Public LB and **0.72189** on the Private LB.
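To illustrate what a median ensemble does, here is a minimal sketch that takes the element-wise median across several submissions. It is not the code from this repository: the function name `median_ensemble` and the toy prediction values are hypothetical, and the predictions are assumed to be integer relevance labels listed in the same row order in every submission.

```python
from statistics import median


def median_ensemble(submissions):
    """Return the per-sample median across submissions.

    `submissions` is a list of equal-length prediction lists,
    one list per submission (hypothetical layout for illustration).
    """
    if not submissions:
        raise ValueError("need at least one submission")
    n_samples = len(submissions[0])
    if any(len(s) != n_samples for s in submissions):
        raise ValueError("all submissions must have the same length")
    # zip(*...) regroups the values so each tuple holds every
    # submission's prediction for a single sample.
    return [median(per_sample) for per_sample in zip(*submissions)]


# Toy predictions from three hypothetical submissions.
subs = [
    [1, 3, 4, 2],
    [2, 3, 4, 2],
    [1, 4, 3, 2],
]
ensemble = median_ensemble(subs)
print(ensemble)  # [1, 3, 4, 2]
```

With an odd number of submissions (such as 35), the median of integer labels is always one of the submitted labels, so no rounding step is needed.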

## What's New
* 2016/05/14: For a cleaner and more modular version of this code and framework, you may want to check [Kaggle_HomeDepot](https://github.com/ChenglongChen/Kaggle_HomeDepot), which holds the code of Turing Test's solution for the recent [Home Depot Product Search Relevance Competition on Kaggle](https://www.kaggle.com/c/home-depot-product-search-relevance).

## FlowChart

[Figure: FlowChart]

## Documentation

See `./Doc/Kaggle_CrowdFlower_ChenglongChen.pdf` for documentation.

## Instruction

* Download the data from the [competition website](https://www.kaggle.com/c/crowdflower-search-relevance/data) and put all of it into the `./Data` folder.
* Run `python ./Code/Feat/run_all.py` to generate the features. This will take a few hours.
* Run `python ./Code/Model/generate_best_single_model.py` to generate the best single model submission. In our experience, it takes only a few trials to produce a model with the best or similar performance. See the training log in `./Output/Log/[Pre@solution]_[Feat@svd100_and_bow_Jun27]_[Model@reg_xgb_linear]_hyperopt.log` for an example.
* Run `python ./Code/Model/generate_model_library.py` to generate the model library. This is quite time-consuming. **But you don't have to wait for this script to finish: you can run the next step once you have some models trained.**
* Run `python ./Code/Model/generate_ensemble_submission.py` to generate a submission via ensemble selection.
* If you don't want to run the code, just submit the file in `./Output/Subm`.
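
The script steps above can be chained from a small driver. The following is a sketch only, not part of the repository: the names `STEPS`, `build_commands`, and `run_pipeline` are hypothetical, and it assumes the scripts are launched from the repository root with a `python` executable compatible with the original code. By default it only prints the commands (dry run).

```python
import subprocess

# Script paths as listed in the instructions, in order.
STEPS = [
    ("generate features", "./Code/Feat/run_all.py"),
    ("best single model", "./Code/Model/generate_best_single_model.py"),
    ("model library", "./Code/Model/generate_model_library.py"),
    ("ensemble submission", "./Code/Model/generate_ensemble_submission.py"),
]


def build_commands(python_exe="python"):
    """Return one argv list per pipeline step."""
    return [[python_exe, script] for _, script in STEPS]


def run_pipeline(dry_run=True, python_exe="python"):
    """Print each command; execute it only when dry_run is False."""
    for (label, _), cmd in zip(STEPS, build_commands(python_exe)):
        print("[{}] {}".format(label, " ".join(cmd)))
        if not dry_run:
            # check=True stops the pipeline if a step fails.
            subprocess.run(cmd, check=True)


# Dry run: only prints the four commands.
run_pipeline(dry_run=True)
```

This driver runs the steps strictly in sequence, so it waits for the model library to finish. To start ensemble selection earlier, as the instructions allow, run the last two scripts by hand instead.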
