Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/btrotta/kaggle-plasticc
14th place solution for the Kaggle PLAsTiCC challenge to classify objects in space.
- Host: GitHub
- URL: https://github.com/btrotta/kaggle-plasticc
- Owner: btrotta
- License: MIT
- Created: 2018-12-18T06:47:27.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-01-08T07:39:39.000Z (almost 6 years ago)
- Last Synced: 2024-11-24T20:10:56.702Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 890 KB
- Stars: 24
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE
README
# kaggle-plasticc
Code for the 14th place solution in the Kaggle PLAsTiCC competition.
See `Modelling_approach.pdf` for a detailed discussion of the modelling approach.
#### Quick-start guide to running the code
Total runtime is around 5.5 hours on a laptop with 24 GB of RAM.
- Download the code. Create a subfolder called `data` and save the competition csv files there.
- To reproduce the results exactly, create an environment with the specific package versions I used. (If you already have numpy, pandas, scikit-learn, and lightgbm you can skip this step, but the results may differ slightly if you have different versions.) If you have conda, the easiest option is to build a conda environment using this command:
```
conda env create -f environment.yml
```
This will create an environment called `plasticc-bt`.
The `requirements.txt` file is provided as well if you want to build an environment with pip.
- Run `split_test.py` to split the test data into 100 HDF5 files. They will be saved in an automatically created subfolder `split_100` of the `data` folder. Takes around 15 minutes. (See the first sketch below.)
- Run `calculate_features.py` to calculate the features. This will generate 3 files in a folder called `features` (the folder is created automatically). Takes around 3.5 hours. (See the second sketch below.)
- Run `predict.py` to train the model and make predictions on the test set. Takes around 1.5 hours. (See the third sketch below.)
- Run `scale.py` to apply regularisation to the class 99 predictions and generate the final submission file. Takes a couple of minutes. (See the fourth sketch below.)
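#### Illustrative sketches of the pipeline steps
The sketches below are illustrations only, not the repository's actual code. They are written against the public PLAsTiCC data layout (`test_set.csv` with an `object_id` column); file names, chunk sizes, and logic are assumptions. First, splitting the test set into 100 HDF5 files, in the spirit of `split_test.py`:
```
import os
import pandas as pd

# Illustrative only: the real split_test.py may differ in details.
# Assumes the PLAsTiCC test set lives at data/test_set.csv and has an
# object_id column; HDF5 output requires the `tables` package.
n_splits = 100
out_dir = os.path.join('data', 'split_100')
os.makedirs(out_dir, exist_ok=True)

# Stream the large csv in chunks; route each row to a bucket by object_id,
# so all observations of a given object land in the same file.
for chunk in pd.read_csv(os.path.join('data', 'test_set.csv'), chunksize=1_000_000):
    for bucket, part in chunk.groupby(chunk['object_id'] % n_splits):
        part.to_hdf(os.path.join(out_dir, 'test_{}.h5'.format(bucket)),
                    key='data', mode='a', append=True, format='table')
```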
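Second, a toy version of the feature step. `calculate_features.py` computes a much richer feature set (see `Modelling_approach.pdf`); this only shows the general groupby-aggregate pattern, and the column names (`object_id`, `flux`) come from the competition data, not from the script:
```
import os
import pandas as pd

# Toy feature computation; the real calculate_features.py is far richer.
def basic_flux_features(lc):
    """Aggregate simple per-object statistics from light-curve rows."""
    grouped = lc.groupby('object_id')['flux']
    feats = grouped.agg(['mean', 'std', 'min', 'max'])
    feats['flux_range'] = feats['max'] - feats['min']
    return feats

os.makedirs('features', exist_ok=True)
lc = pd.read_hdf(os.path.join('data', 'split_100', 'test_0.h5'), key='data')
basic_flux_features(lc).to_hdf(os.path.join('features', 'toy_features.h5'),
                               key='features', mode='w')
```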
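Third, the training step uses LightGBM (it is among the listed dependencies). Below is a minimal multiclass setup; the actual features, hyperparameters, and validation scheme in `predict.py` are not reproduced here, and the random placeholder data exists only so the snippet runs:
```
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import train_test_split

# Minimal multiclass LightGBM setup with placeholder data; predict.py's
# real features, parameters, and validation scheme live in the repository.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))     # stand-in feature matrix
y = rng.integers(0, 14, size=1000)  # stand-in labels: 14 known classes

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
          callbacks=[lgb.early_stopping(stopping_rounds=50)])
probs = model.predict_proba(X_val)  # one probability column per known class
```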
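Fourth, class 99 covers unknown objects absent from the training set, and `scale.py` regularises those predictions; the exact method is described in `Modelling_approach.pdf`. The snippet below shows one heuristic that was common in the competition, with an arbitrary weight, as an illustration rather than the author's method:
```
import numpy as np

# A common PLAsTiCC heuristic (illustrative, not necessarily scale.py's
# method): class 99 probability is high when no known class dominates.
def add_class_99(probs_known, weight=0.14):
    """probs_known: (n_objects, 14) array of known-class probabilities.
    The weight is an arbitrary placeholder, not a value from the repo."""
    p99 = weight * np.prod(1.0 - probs_known, axis=1)
    combined = np.hstack([probs_known * (1.0 - p99)[:, None], p99[:, None]])
    return combined / combined.sum(axis=1, keepdims=True)  # rows sum to 1
```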