https://github.com/btrotta/kaggle-homesite
Top 2% solution for Kaggle's "Homesite Quote Conversion" competition.
- Host: GitHub
- URL: https://github.com/btrotta/kaggle-homesite
- Owner: btrotta
- Created: 2017-02-12T00:20:16.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-02-12T01:54:00.000Z (almost 8 years ago)
- Last Synced: 2024-11-24T03:26:50.750Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 2.93 KB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# kaggle-homesite
This is a relatively simple, high-scoring solution for the "Homesite Quote Conversion" contest on Kaggle
(https://www.kaggle.com/c/homesite-quote-conversion). The task is to predict which insurance quotes will actually result in the sale of a
policy. We are given a training data set of around 260,000 rows and around 300 features, a mixture of numeric and categorical. This code scored 30th out of 1764 entries, in the top 2%.

The solution makes use of the powerful xgboost package for gradient-boosted regression trees, and uses only basic feature engineering.
String-valued columns are one-hot encoded, and an integer feature is added representing the day of the week.
The SalesField8 feature is dropped since it has a large number of unique integer values and therefore seems to represent some kind of
identifier rather than a genuine feature.
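A minimal sketch of this preprocessing, assuming the competition's column names (Original_Quote_Date for the quote date) and modern pandas idioms; it is not taken from this repository's code:

```python
import pandas as pd

# Load the competition training data (file name assumed).
train = pd.read_csv('train.csv')

# Add an integer feature for the day of the week, then drop the raw date.
# The date column name (Original_Quote_Date) is an assumption based on
# the competition data.
train['Original_Quote_Date'] = pd.to_datetime(train['Original_Quote_Date'])
train['day_of_week'] = train['Original_Quote_Date'].dt.dayofweek
train = train.drop('Original_Quote_Date', axis=1)

# Drop SalesField8: its many unique integer values suggest an identifier.
train = train.drop('SalesField8', axis=1)

# One-hot encode the remaining string-valued columns.
string_cols = train.select_dtypes(include=['object']).columns
train = pd.get_dummies(train, columns=string_cols)
```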
We use relatively large values for the gamma and lambda regularisation parameters, which reduces the impact of irrelevant features without
the need to manually test and exclude them. We also use a small eta and a large number of rounds.

The code is written in Python 2.7. It requires the pandas, numpy, sklearn, and xgboost packages.
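An illustrative sketch of this training setup, continuing from the preprocessing above. The target column name (QuoteConversion_Flag), the AUC metric, and all parameter values are assumptions chosen to reflect the strategy described, not the repository's actual choices:

```python
import xgboost as xgb

# Split off the target; the column name is an assumption based on the
# competition data.
y = train['QuoteConversion_Flag']
X = train.drop('QuoteConversion_Flag', axis=1)

params = {
    'objective': 'binary:logistic',  # binary target: quote converts or not
    'eval_metric': 'auc',            # assumed competition metric
    'eta': 0.02,                     # small learning rate
    'gamma': 5,                      # large minimum split loss, damps weak splits
    'lambda': 5,                     # large L2 regularisation on leaf weights
}
dtrain = xgb.DMatrix(X, label=y)
model = xgb.train(params, dtrain, num_boost_round=2000)  # many rounds
```

With a small eta, each tree contributes only a little, so many boosting rounds are needed; the large gamma and lambda penalise splits and leaf weights, letting the model ignore irrelevant features without manual selection.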