https://github.com/geffy/kaggle-malware
Kaggle 'Microsoft Malware Classification Challenge' 3rd place solution
- Host: GitHub
- URL: https://github.com/geffy/kaggle-malware
- Owner: geffy
- Created: 2015-05-05T20:53:19.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2015-05-05T20:59:13.000Z (about 10 years ago)
- Last Synced: 2025-03-23T18:52:06.349Z (2 months ago)
- Language: Python
- Size: 319 KB
- Stars: 91
- Watchers: 7
- Forks: 56
- Open Issues: 2
Metadata Files:
- Readme: README.md
Kaggle ['Microsoft Malware Classification Challenge'](https://www.kaggle.com/c/malware-classification) 3rd place solution
=======
### Mikhail Trofimov, Dmitry Ulyanov, Stanislav Semenov

Gets a score of 0.0040 on the private leaderboard.

How to reproduce submission
=======
Don't forget to check the paths in `./src/set_up.py`!
```
./create_dirs.sh
cd ./src
./main.sh
cd ../
```
and run all the code in
`learning-main-model.ipynb`,
`learning-4gr-only.ipynb`,
`semi-supervised-trick.ipynb` and
`final-submission-builder.ipynb`.

Dependencies
=======
* python 2.7.9
* ipython 3.1.0
* sklearn 0.16.1
* numpy 1.9.2
* pandas 0.16.0
* hickle 1.1.1
* pypy 2.5.1 (with installed joblib 0.8.4)
* scipy 0.15.1
* xgboost 0.3

Hardware
=======
We ran this code on a machine with 16 cores and 120 GB RAM.
The most memory-consuming part is processing 4-grams. All other steps require no more than 32 GB RAM.
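
To give a rough sense of why 4-gram processing can dominate memory, here is a minimal sketch. It assumes the 4-grams are taken over raw bytes, which this README does not state, and `count_ngrams` is a hypothetical helper, not a function from this repository. Under that assumption there are up to 256^4 (about 4.3 billion) distinct 4-grams, so a count table built over a large corpus can grow very large.

```python
from collections import Counter

# Upper bound on distinct byte 4-grams (assumes 4-grams over raw bytes).
MAX_BYTE_4GRAMS = 256 ** 4


def count_ngrams(data, n=4):
    """Count every contiguous n-gram in a bytes object.

    Hypothetical helper for illustration; not taken from the repository.
    """
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


if __name__ == "__main__":
    sample = b"\x00\x01\x02\x03\x00\x01\x02\x03"
    counts = count_ngrams(sample)
    print("distinct 4-grams in sample:", len(counts))
    print("possible byte 4-grams:", MAX_BYTE_4GRAMS)
```

Each input of length `m` contributes up to `m - 3` entries, so the number of keys held in memory grows with both file size and corpus size until the table is pruned or written to disk.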