https://github.com/piercingdan/kaggle-dsb
My Work on Kaggle Data Science Bowl as a member of the University of Toronto Data Science Team
- Host: GitHub
- URL: https://github.com/piercingdan/kaggle-dsb
- Owner: PiercingDan
- Created: 2017-04-01T22:52:39.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-04-21T20:42:04.000Z (about 9 years ago)
- Last Synced: 2025-02-14T18:23:09.695Z (over 1 year ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 235 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Kaggle Data Science Bowl
My work on the Kaggle Data Science Bowl as a member of the University of Toronto Data Science Team.
I learned how powerful multiprocessing is, especially on AWS instances with multi-core CPUs (up to 36 for c4.8xlarge!).
## Preprocessing Tests
For the smaller `sample_images` dataset, here are the times it took to preprocess all 20 sample patients:
* 2 processes on 4 CPU c4.xlarge: 6m 47s
* 4 processes on 4 CPU c4.xlarge: 4m 29s
* 6 processes on 4 CPU c4.xlarge: 3m 40s
* 8 processes on 8 CPU c4.2xlarge: 2m 09s
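To make the scaling easier to read, the timings above can be converted to seconds and compared against the 2-process run. This is only arithmetic on the numbers listed; the `to_seconds` helper and the choice of the 2-process run as a baseline are assumptions made for illustration.

```python
# Timings copied from the list above: (processes, instance type, elapsed time).
timings = [
    (2, "c4.xlarge", "6m 47s"),
    (4, "c4.xlarge", "4m 29s"),
    (6, "c4.xlarge", "3m 40s"),
    (8, "c4.2xlarge", "2m 09s"),
]


def to_seconds(text):
    """Convert a string such as '6m 47s' to a number of seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


# Use the 2-process run as the reference point (an arbitrary choice).
baseline = to_seconds(timings[0][2])

for processes, instance, elapsed in timings:
    secs = to_seconds(elapsed)
    print(f"{processes} processes on {instance}: {secs}s, "
          f"{baseline / secs:.2f}x relative to 2 processes")
```

Note that the last row also changes the instance type, so it is not a like-for-like comparison with the first three.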
For the first 50 images (by path name) in the full `stage_1` dataset:
* 10 processes on 8 CPU c4.2xlarge: 4m 45s
* 14 processes on 8 CPU c4.2xlarge: 4m 14s
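The runs above vary the number of worker processes for a fixed set of patients. A minimal sketch of that kind of experiment with Python's standard `multiprocessing.Pool` might look like the following. The function `preprocess_patient`, the `run` helper, and the patient IDs are hypothetical placeholders, not the code used in this repository, and the dummy workload stands in for the real preprocessing step.

```python
import multiprocessing as mp
import time


def preprocess_patient(patient_id):
    """Hypothetical stand-in for preprocessing one patient's scans."""
    # Placeholder CPU-bound work; the real step would load and process images.
    checksum = sum(i * i for i in range(100_000))
    return patient_id, checksum


def run(patient_ids, n_processes):
    """Preprocess all patients with a pool of n_processes workers and time it."""
    start = time.perf_counter()
    with mp.Pool(processes=n_processes) as pool:
        results = pool.map(preprocess_patient, patient_ids)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed IDs for illustration only.
    patients = [f"patient_{i:02d}" for i in range(20)]
    for n in (2, 4):
        _, elapsed = run(patients, n)
        print(f"{n} processes: {elapsed:.2f}s")
```

`pool.map` returns results in the same order as the input list, so the output can be matched back to patients regardless of which worker finished first.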