Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
1st Place Solution for CrowdFlower Product Search Results Relevance Competition on Kaggle.
- Host: GitHub
- URL: https://github.com/chenglongchen/kaggle-crowdflower
- Owner: ChenglongChen
- Created: 2015-07-12T06:41:27.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2021-09-25T02:32:49.000Z (about 3 years ago)
- Last Synced: 2024-08-05T02:01:17.758Z (3 months ago)
- Topics: crowdflower, kaggle, kaggle-competetion, kaggle-crowdflower, natural-language-processing, nlp, product-search, relevance-competition, search-engine, search-relevance, semantic-matching, semantic-similarity
- Language: C++
- Homepage: https://www.kaggle.com/c/crowdflower-search-relevance
- Size: 6.44 MB
- Stars: 1,755
- Watchers: 102
- Forks: 660
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Kaggle_CrowdFlower
1st Place Solution for [Search Results Relevance Competition on Kaggle](https://www.kaggle.com/c/crowdflower-search-relevance)
The best single model we obtained during the competition was an [XGBoost](https://github.com/dmlc/xgboost) model with a linear booster, scoring **0.69322** on the Public LB and **0.70768** on the Private LB. Our final winning submission was a median ensemble of our 35 best Public LB submissions; it scored **0.70807** on the Public LB and **0.72189** on the Private LB.
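A median ensemble like the one described above can be sketched in a few lines. This is a minimal illustration, not the repo's actual ensembling code: it assumes each submission is a CSV with `id,prediction` columns in the same row order, and rounds the per-row median back onto the 1–4 relevance scale used in this competition. File paths and column names are hypothetical.

```python
import csv
import statistics

def median_ensemble(paths):
    """Combine several submission CSVs by taking the per-row median prediction."""
    columns = []
    ids = []
    for path in paths:
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        # assumes all files list the same ids in the same order
        ids = [r["id"] for r in rows]
        columns.append([float(r["prediction"]) for r in rows])
    # per-row median across submissions, clipped/rounded to the 1..4 label scale
    merged = [min(4, max(1, round(statistics.median(col)))) for col in zip(*columns)]
    return list(zip(ids, merged))
```

Medians are a common choice here because they are robust to a few outlier submissions, unlike a mean.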
## What's New
* 2016/05/14: For a cleaner and more modular version of this code and framework, you may want to check [Kaggle_HomeDepot](https://github.com/ChenglongChen/Kaggle_HomeDepot), which holds the code of Turing Test's solution for the recent [Home Depot Product Search Relevance Competition on Kaggle](https://www.kaggle.com/c/home-depot-product-search-relevance).

## FlowChart
## Documentation
See `./Doc/Kaggle_CrowdFlower_ChenglongChen.pdf` for documentation.
## Instruction
* Download the data from the [competition website](https://www.kaggle.com/c/crowdflower-search-relevance/data) and put all of it into the folder `./Data`.
* Run `python ./Code/Feat/run_all.py` to generate features. This will take a few hours.
* Run `python ./Code/Model/generate_best_single_model.py` to generate the best single model submission. In our experience, it takes only a few trials to produce a model with the best (or similar) performance. See the training log in `./Output/Log/[Pre@solution]_[Feat@svd100_and_bow_Jun27]_[Model@reg_xgb_linear]_hyperopt.log` for an example.
* Run `python ./Code/Model/generate_model_library.py` to generate the model library. This is quite time consuming. **But you don't have to wait for this script to finish: you can run the next step once you have some models trained.**
* Run `python ./Code/Model/generate_ensemble_submission.py` to generate a submission via ensemble selection.
* If you don't want to run the code, just submit the file in `./Output/Subm`.
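Submissions in this competition were scored with quadratic weighted kappa, so it is useful to be able to compute that metric locally when comparing models from the steps above. The sketch below is an independent illustration of the metric (it is not taken from this repo's code), assuming integer relevance labels on the 1–4 scale:

```python
import numpy as np

def quadratic_weighted_kappa(a, b, min_rating=1, max_rating=4):
    """Quadratic weighted kappa between two integer rating vectors."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    n = max_rating - min_rating + 1
    # observed agreement matrix O[i, j]: count of pairs rated i by a, j by b
    O = np.zeros((n, n))
    for x, y in zip(a - min_rating, b - min_rating):
        O[x, y] += 1
    # quadratic disagreement weights: 0 on the diagonal, growing with distance
    W = np.array([[(i - j) ** 2 / (n - 1) ** 2 for j in range(n)]
                  for i in range(n)])
    # expected matrix under independence of the two raters' marginals
    hist_a = np.bincount(a - min_rating, minlength=n)
    hist_b = np.bincount(b - min_rating, minlength=n)
    E = np.outer(hist_a, hist_b) / len(a)
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Perfect agreement yields 1.0; chance-level agreement yields roughly 0. Rounding regression outputs to integers before scoring, as the best single model's `reg_xgb_linear` setup implies, matters because the metric is defined on discrete ratings.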