{"id":13934818,"url":"https://github.com/snakers4/ds_bowl_2018","last_synced_at":"2025-04-16T05:48:34.216Z","repository":{"id":92294386,"uuid":"128542622","full_name":"snakers4/ds_bowl_2018","owner":"snakers4","description":"Kaggle Data Science Bowl 2018","archived":false,"fork":false,"pushed_at":"2018-04-24T10:39:47.000Z","size":9836,"stargazers_count":119,"open_issues_count":1,"forks_count":26,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-29T05:03:56.602Z","etag":null,"topics":["data-science-bowl-2018","deep-watershed-transform","docker","dockerfile","dwt","gpu","kaggle","python3","pytorch","unet","vgg16"],"latest_commit_sha":null,"homepage":"https://spark-in.me/post/playing-with-dwt-and-ds-bowl-2018","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snakers4.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-04-07T15:37:27.000Z","updated_at":"2024-12-22T07:46:24.000Z","dependencies_parsed_at":"2023-06-06T03:15:46.027Z","dependency_job_id":null,"html_url":"https://github.com/snakers4/ds_bowl_2018","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snakers4%2Fds_bowl_2018","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snakers4%2Fds_bowl_2018/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snakers4%2Fds_bowl_2018/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snakers4%2Fds_bowl_2018/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snakers4","download_url":"https://codeload.github.com/snakers4/ds_bowl_2018/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249204657,"owners_count":21229778,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science-bowl-2018","deep-watershed-transform","docker","dockerfile","dwt","gpu","kaggle","python3","pytorch","unet","vgg16"],"created_at":"2024-08-07T23:01:15.446Z","updated_at":"2025-04-16T05:48:34.185Z","avatar_url":"https://github.com/snakers4.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"![Architecture](ds_bowl.png)\n\n**More stuff from us**\n- [Telegram](https://t.me/snakers4) \n- [Twitter](https://twitter.com/AlexanderVeysov)\n- [Blog](https://spark-in.me/tag/data-science)\n\n\n# 0 Introduction\n\nThis is a [DWT-inspired](https://arxiv.org/abs/1611.08303) solution to Kaggle's 2018 [DS Bowl](https://www.kaggle.com/c/data-science-bowl-2018/) that I built in roughly the final week of the competition.\n\n**UPDATE 2018-04-22** - my score was 114th. 
I guess they are cleaning up the LB at the end.\n\n**UPDATE 2018-04-24** - found out why my model generalized poorly: I forgot to re-create the `optimizer` after unfreezing the encoder weights.\n\nMost notably, it features a dockerized PyTorch implementation of an approach similar to the Deep Watershed Transform.\n\nSince the target metric was highly unstable (mAP averaged over IoU thresholds from 0.5 to 0.95) and the private LB contained data mostly unrelated to the train dataset, it is difficult to evaluate the code's performance, but it's safe to say that:\n- The performance is for a single model trained on one fold;\n- ~400th place on the stage 1 LB and ~100th place on the stage 2 LB (most likely the position will rise, since the LB is not finalized yet);\n- I did not invest time in ensembling / folding / annotation etc., because I entered late and it was obvious that the second stage would be a gamble given the quality of the dataset and of the organization;\n\nKey takeaways:\n- People have reported that VGG-based models overfitted heavily on train/val, while heavy ResNet models (e.g. ResNet152) were key. I did not test heavy ResNets, so it's very likely that performance can be improved greatly;\n- Top solutions also featured learning 2 masks: the ground-truth mask + a boundary BETWEEN nuclei;\n- DWT / energy helps greatly: it improved mAP by ~0.07-0.09 locally and by ~0.05-0.07 on the LB;\n- All other promising post-processing techniques (e.g. 
detecting the centers of nuclei with [blob_log](http://scikit-image.org/docs/dev/auto_examples/features_detection/plot_blob.html) and using them as an additional energy level) did not work;\n- The base model is very powerful, but probably prone to overfitting due to the dataset being of very low quality;\n- Dataset curation / balancing / proper annotation matters more than a particular architecture in this case;\n\n![One of the best local models](best_local.jpg)\n\n\n# 1 Hardware requirements\n\n**Training**\n\n- 6+ core modern CPU (Xeon, i7) for fast image pre-processing (here the distance transform takes some time for each nucleus);\n- The models were trained on 2 * GeForce 1080 Ti;\n- Training time on my setup: ~**6-8 hours** per fold;\n- Disk space: 10GB should be more than enough, ~20GB if you also build the Docker image;\n\n**Inference**\n\n- 6+ core modern CPU (Xeon, i7) for fast image pre-processing;\n- On 2 * GeForce 1080 Ti, inference takes **1-2 minutes** for the public test dataset (65 images);\n\n# 2 Preparing and launching the Docker environment\n\n**Clone the repository**\n\n`git clone https://github.com/snakers4/ds_bowl_2018 .`\n\n\n**This repository contains a Dockerfile used when training models**\n- `/dockerfiles/Dockerfile` - this is the main Dockerfile\n\n\n**Build a Docker image**\n\n```\ncd dockerfiles\ndocker build -t bowl_image .\n```\n\n**Install the latest nvidia docker**\n\nFollow the instructions [here](https://github.com/NVIDIA/nvidia-docker).\nPlease prefer nvidia-docker2 for more stable performance.\n\n\nTo test that everything works, run:\n\n\n`docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi`\n\n**(IMPORTANT) Run the Docker container (IMPORTANT)**\n\nUnless you use this exact command (with the `--shm-size` flag; you can change ports and mounted volumes, of course), the PyTorch generators **WILL NOT WORK**: PyTorch data loader worker processes exchange batches via shared memory, and Docker's default `/dev/shm` (64MB) is too small. 
\n\n\n- nvidia-docker 2: `docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -it -v /path/to/cloned/repository:/home/keras/notebook -p 8888:8888 -p 6006:6006 --shm-size 16G bowl_image`\n- nvidia-docker: `nvidia-docker run -it -v /path/to/cloned/repository:/home/keras/notebook -p 8888:8888 -p 6006:6006 --shm-size 8G bowl_image`\n\n\n**To start a stopped container**\n\n\n`docker start -i YOUR_CONTAINER_ID`\n\n\n# 3 Preparing the data and the machine for running scripts\n\n- Open a shell in the Docker container via `docker exec -it YOUR_CONTAINER_ID bash`\n- `cd` to the root folder of the repo\n- Download the data into `data/`\n- Note that `data/` already contains pickled train / test dataframes with metadata (for convenience only)\n- If Kaggle removes the data download links from the competition page, you can download the data from [here](https://drive.google.com/open?id=1uRO3elNqVVxeWpU8hsCn0tRP_YAtGkql)\n\nAfter all of these steps, your directory should look like this (CSV files omitted):\n\n```\n├── README.md          \u003c- The top-level README for developers using this project.\n├── data\n│   ├── stage1_test                 \u003c- A folder with stage1 test data\n│   ├── stage2_test                 \u003c- A folder with stage2 test data\n│   ├── test_df_stage1_meta         \u003c- A pickled dataframe with stage1 test meta data\n│   ├── train_df_stage1_meta        \u003c- A pickled dataframe with stage1 train meta data\n│   └── stage1_train                \u003c- A folder with stage1 train data\n│       ├─ f8e74d4006dd68c1dbe68df7be905835e00d8ba4916f3b18884509a15fdc0b55\n│       │  ├──  images\n│       │  └──  masks\n│       ├─ ...\n│       └─ ff599c7301daa1f783924ac8cbe3ce7b42878f15a39c2d19659189951f540f48\n│\n├── dockerfiles                               \u003c- A folder with Dockerfiles\n│\n└── src                                       \u003c- Source code\n```\n\n\n# 4 Training the model\n\nYou can see the list of available model presets in 
`src/models/model_params.py`\n\nAccording to my tests, the best model was Unet16 (U-Net with a pre-trained VGG16 encoder).\n\nIf everything is OK, use the following steps to train the model:\n\n- Open a shell in the Docker container via `docker exec -it YOUR_CONTAINER_ID bash`\n- `cd` to the root folder of the repo\n- `cd src`\n- optional: turn on TensorBoard to monitor progress by running `tensorboard --logdir='ds_bowl_2018/src/tb_logs' --port=6006` from a Jupyter notebook console or via tmux + docker exec (the model converges in 100-150 epochs)\n- then, for example, train on 2 folds:\n\n```\necho 'python3 train_energy.py \\\n\t--arch unet16_160_7_dc --epochs 150 --workers 10 \\\n\t--channels 7 --batch-size 12 --fold_num 0 \\\n\t--lr 1e-3 --optimizer adam \\\n\t--bce_weight 0.9 --dice_weight 0.1 --ths 0.5 \\\n\t--print-freq 1 --lognumber unet16_160_7_dc_ths5_energy_distance_gray_final \\\n\t--tensorboard True --tensorboard_images True --is_distance_transform True --is_boundaries True \\\n\t--freeze True\n\npython3 train_energy.py \\\n\t--arch unet16_160_7_dc --epochs 150 --workers 10 \\\n\t--channels 7 --batch-size 12 --fold_num 1 \\\n\t--lr 1e-3 --optimizer adam \\\n\t--bce_weight 0.9 --dice_weight 0.1 --ths 0.5 \\\n\t--print-freq 1 --lognumber unet16_160_7_dc_ths5_energy_distance_gray_final \\\n\t--tensorboard True --tensorboard_images True --is_distance_transform True --is_boundaries True \\\n\t--freeze True' \u003e train.sh\n```\n- `sh train.sh`\n\n\n# 5 Making predictions / evaluation\n\n\n- Open a shell in the Docker container via `docker exec -it YOUR_CONTAINER_ID bash`\n- `cd` to the root folder of the repo\n- `cd src`\n- then:\n```\necho 'python3 train_energy.py \\\n\t--arch unet16_64_7_dc --channels 7 --batch-size 1 --ths 0.5 \\\n\t--lognumber unet16_64_7_dc_ths5_energy_distance_gray_longer_rerun \\\n\t--workers 0 --predict' \u003e predict.sh\n```\n- `sh predict.sh`\n- note that `lognumber` must match the `lognumber` you specified when training\n- please check which fold is used in the prediction 
loop\n\n- You can also run evaluation only, like this:\n```\npython3 train_energy.py \\\n\t--evaluate \\\n\t--resume weights/unet16_160_7_dc_ths5_energy_distance_gray_final_fold2_best.pth.tar \\\n\t--arch unet16_160_7_dc --epochs 50 --workers 10 \\\n\t--channels 7 --fold_num 2 \\\n\t--ths 0.5 --is_distance_transform True --is_boundaries True \\\n\t--print-freq 10 --lognumber eval_validation --tensorboard_images True\n```\n\n# 6 Watershed\n\n- The model is analogous to DWT in that it uses the predicted energy for the watershed;\n- The best-performing watershed post-processing function is `utils.watershed.energy_baseline`;\n- All the other functions in `utils.watershed` performed worse;\n\n\n# 7 Additional notes\n\n\n- The model randomly crops images when training and resizes them when predicting;\n- An unfinished `src/train_energy_pad.py` is also available. It works, but produces lower-quality results;\n\n\n# 8 Jupyter notebooks\n\nUse these notebooks at your own risk!\n\n- `src/bowl.ipynb` - general debugging notebook with new models / generators / etc.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnakers4%2Fds_bowl_2018","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnakers4%2Fds_bowl_2018","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnakers4%2Fds_bowl_2018/lists"}