{"id":15037487,"url":"https://github.com/mic-dkfz/medicaldetectiontoolkit","last_synced_at":"2025-04-08T09:07:28.093Z","repository":{"id":40973217,"uuid":"152747947","full_name":"MIC-DKFZ/medicaldetectiontoolkit","owner":"MIC-DKFZ","description":"The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.  ","archived":false,"fork":false,"pushed_at":"2024-06-17T22:47:46.000Z","size":4565,"stargazers_count":1322,"open_issues_count":47,"forks_count":294,"subscribers_count":52,"default_branch":"master","last_synced_at":"2025-04-08T09:07:23.284Z","etag":null,"topics":["3d-mask-rcnn","3d-models","3d-object-detection","deep-learning","deep-neural-networks","detection","mask-rcnn","medical-image-analysis","medical-image-computing","medical-image-processing","medical-imaging","object-detection","pytorch-cnn","pytorch-deeplearning","pytorch-implementation","retina-net","retina-unet","segmentation","semantic-segmentation","u-net"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MIC-DKFZ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-12T12:34:57.000Z","updated_at":"2025-04-02T08:56:07.000Z","dependencies_parsed_at":"2024-05-28T22:01:21.078Z","dependency_job_id":"e5cd3044-4da3-4e5d-a044-994b9f8cfc9f","html_url":"https://github.com/MIC-DKFZ/medicaldetectiontoolkit","commit_stats":null,"previous_names":[],"tag
s_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIC-DKFZ%2Fmedicaldetectiontoolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIC-DKFZ%2Fmedicaldetectiontoolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIC-DKFZ%2Fmedicaldetectiontoolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIC-DKFZ%2Fmedicaldetectiontoolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MIC-DKFZ","download_url":"https://codeload.github.com/MIC-DKFZ/medicaldetectiontoolkit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247809962,"owners_count":20999816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-mask-rcnn","3d-models","3d-object-detection","deep-learning","deep-neural-networks","detection","mask-rcnn","medical-image-analysis","medical-image-computing","medical-image-processing","medical-imaging","object-detection","pytorch-cnn","pytorch-deeplearning","pytorch-implementation","retina-net","retina-unet","segmentation","semantic-segmentation","u-net"],"created_at":"2024-09-24T20:34:46.389Z","updated_at":"2025-04-08T09:07:28.066Z","avatar_url":"https://github.com/MIC-DKFZ.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n## *** MedicalDetectionToolkit is no longer maintained. 
Please check out our follow-up framework: \u003ca href=\"https://github.com/MIC-DKFZ/nnDetection\"\u003e nnDetection \u003c/a\u003e ***\n\u003cbr\u003e\n\u003cbr\u003e\n\n[\u003cimg src=\"https://img.shields.io/badge/chat-slack%20channel-75BBC4.svg\"\u003e](https://join.slack.com/t/mdtoolkit/shared_invite/enQtNTQ3MjY2MzE0MDg2LWNjY2I2Njc5MTY0NmM0ZWIxNmQwZDRhYzk2MDdhM2QxYjliYTcwYzhkNTAxYmRkMDA0MjcyNDMyYjllNTZhY2M)\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/mdt_logo_2.png\"  width=450\u003e\u003c/p\u003e\u003cbr\u003e\n\nCopyright © German Cancer Research Center (DKFZ), \u003ca href=\"https://www.dkfz.de/en/mic/index.php\"\u003eDivision of Medical Image Computing (MIC)\u003c/a\u003e. Please make sure that your usage of this code is in compliance with the code \u003ca href=\"https://github.com/pfjaeger/medicaldetectiontoolkit/blob/master/LICENSE\"\u003elicense\u003c/a\u003e.  \n\n## Overview\nThis is a comprehensive framework for object detection featuring:\n- 2D + 3D implementations of prevalent object detectors: e.g. Mask R-CNN [1], Retina Net [2], Retina U-Net [3]. \n- Modular and light-weight structure ensuring sharing of all processing steps (incl. backbone architecture) for comparability of models.\n- training with bounding box and/or pixel-wise annotations.\n- dynamic patching and tiling of 2D + 3D images (for training and inference).\n- weighted consolidation of box predictions across patch-overlaps, ensembles, and dimensions [3].\n- monitoring + evaluation simultaneously on object and patient level. \n- 2D + 3D output visualizations.\n- integration of COCO mean average precision metric [5]. \n- integration of MIC-DKFZ batch generators for extensive data augmentation [6].\n- easy modification to evaluation of instance segmentation and/or semantic segmentation.\n\u003cbr/\u003e\n[1] He, Kaiming, et al.  \u003ca href=\"https://arxiv.org/abs/1703.06870\"\u003e\"Mask R-CNN\"\u003c/a\u003e ICCV, 2017\u003cbr\u003e\n[2] Lin, Tsung-Yi, et al.  
\u003ca href=\"https://arxiv.org/abs/1708.02002\"\u003e\"Focal Loss for Dense Object Detection\"\u003c/a\u003e TPAMI, 2018.\u003cbr\u003e\n[3] Jaeger, Paul et al. \u003ca href=\"https://ml4health.github.io/2019/pdf/232_ml4h_preprint.pdf\"\u003e \"Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection\" \u003c/a\u003e, 2018\n\n[5] https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py\u003cbr/\u003e\n[6] https://github.com/MIC-DKFZ/batchgenerators\u003cbr/\u003e\u003cbr\u003e\n\n## How to cite this code\nPlease cite the original publication [3].\n\n## Installation\nSet up the package in a virtual environment:\n```\ngit clone https://github.com/pfjaeger/medicaldetectiontoolkit.git\ncd medicaldetectiontoolkit\nvirtualenv -p python3.6 venv\nsource venv/bin/activate\npip3 install -e .\n```\n\nWe use two CUDA functions: Non-Maximum Suppression (taken from [pytorch-faster-rcnn](https://github.com/ruotianluo/pytorch-faster-rcnn) and adapted for 3D) and RoiAlign (taken from [RoiAlign](https://github.com/longcw/RoIAlign.pytorch), fixed according to [this bug report](https://hackernoon.com/how-tensorflows-tf-image-resize-stole-60-days-of-my-life-aba5eb093f35), and adapted for 3D). In this framework, they come pre-compiled for TitanX. If you have a different GPU, you need to re-compile these functions, replacing [arch] with the value for your GPU:\n\n\n| GPU | arch |\n| --- | --- |\n| TitanX | sm_52 |\n| GTX 960M | sm_50 |\n| GTX 1070 | sm_61 |\n| GTX 1080 (Ti) | sm_61 |\n  \n```\ncd cuda_functions/nms_xD/src/cuda/\nnvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=[arch]\ncd ../../\npython build.py\ncd ../\n\ncd cuda_functions/roi_align_xD/roi_align/src/cuda/\nnvcc -c -o crop_and_resize_kernel.cu.o crop_and_resize_kernel.cu -x cu -Xcompiler -fPIC -arch=[arch]\ncd ../../\npython build.py\ncd ../../\n```\n\n## Prepare the Data\nThis framework is meant for you to be able to train models on your own data sets. 
\nTwo example data loaders are provided in medicaldetectiontoolkit/experiments, including thorough documentation to ensure a quick start for your own project. The approach used here is to have a preprocessing script that converts the data from whatever source format into numpy arrays and saves them (this is run only once). During training / testing, the data loader then loads these numpy arrays dynamically. Please note that the data input side is meant to be customized according to your own needs, and the provided data loaders are merely examples: LIDC has a powerful data loader that handles 2D/3D inputs and is optimized for patch-based training and inference. The toy experiments have a lightweight data loader that only handles 2D inputs without patching. The latter is a good starting point if you want to get familiar with the framework.\n\n## Execute\n1. Set I/O paths, model and training specifics in the configs file: medicaldetectiontoolkit/experiments/your_experiment/configs.py\n2. Train the model: \n\n    ```\n    python exec.py --mode train --exp_source experiments/my_experiment --exp_dir path/to/experiment/directory       \n    ``` \n    This copies snapshots of configs and model to the specified exp_dir, where all outputs will be saved. By default, the data is split into 60% training, 20% validation and 20% testing data to perform a 5-fold cross validation (this can be changed to a hold-out test set in configs), and all folds are trained one after another. To train only specific folds, specify them using the folds arg: \n    ```\n    python exec.py --folds 0 1 2 .... # specify any combination of folds [0-4]\n    ```\n3. 
Run inference:\n    ```\n    python exec.py --mode test --exp_dir path/to/experiment/directory \n    ```\n    This runs the prediction pipeline and saves all results to exp_dir.\n    \n    \n## Models\n\nThis framework features all models explored in [3] (implemented in 2D + 3D): the proposed Retina U-Net, a simple but effective architecture fusing state-of-the-art semantic segmentation with object detection,\u003cbr\u003e\u003cbr\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/retu_figure.png\"  width=50%\u003e\u003c/p\u003e\u003cbr\u003e\nas well as implementations of prevalent object detectors, such as Mask R-CNN, Faster R-CNN+ (Faster R-CNN with RoIAlign), Retina Net, U-Faster R-CNN+ (the two-stage counterpart of Retina U-Net: Faster R-CNN with auxiliary semantic segmentation), and DetU-Net (a U-Net-like segmentation architecture with heuristics for object detection).\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/baseline_figure.png\"  width=85%\u003e\u003c/p\u003e\u003cbr\u003e\n\n## Training annotations\nThis framework features training with pixel-wise and/or bounding box annotations. To avoid having to transform box coordinates during data augmentation, we feed the annotation masks through data augmentation (creating a pseudo mask if only bounding box annotations are provided) and draw the boxes afterwards.\u003cbr\u003e\u003cbr\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/annotations.png\"  width=85%\u003e\u003c/p\u003e\u003cbr\u003e\n\n\nThe framework further handles two types of pixel-wise annotations: \n\n1. A label map with individual ROIs identified by increasing label values, accompanied by a vector containing in each position the class target for the lesion with the corresponding label (for this mode set get_rois_from_seg_flag = False when calling ConvertSegToBoundingBoxCoordinates in your data loader).\n2. A binary label map. 
There is only one foreground class and individual lesions are not distinguished. All lesions have the same class target (foreground). In this case the data loader runs a connected component labelling algorithm to create processable pairs of lesion and class target on the fly (for this mode set get_rois_from_seg_flag = True when calling ConvertSegToBoundingBoxCoordinates in your data loader). \n\n## Prediction pipeline\nThis framework provides an inference module, which automatically handles patching of inputs, as well as tiling, ensembling, and weighted consolidation of output predictions:\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\u003cimg src=\"assets/prediction_pipeline.png\" \u003e\u003cbr\u003e\u003cbr\u003e\n\n\n## Consolidation of predictions (Weighted Box Clustering)\nMultiple predictions of the same image (from test time augmentations, tested epochs and overlapping patches) result in a large number of boxes (or cubes), which need to be consolidated. In semantic segmentation, the final output would typically be obtained by averaging every pixel over all predictions. 
As described in [3], **weighted box clustering** (WBC) does this for box predictions:\u003cbr\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/wcs_text.png\"  width=650\u003e\u003cbr\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/wcs_readme.png\"  width=800\u003e\u003cbr\u003e\u003cbr\u003e\u003c/p\u003e\n\n\n\n## Visualization / Monitoring\nBy default, loss functions and performance metrics are monitored:\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\u003cimg src=\"assets/loss_monitoring.png\"  width=700\u003e\u003cbr\u003e\n\u003chr\u003e\nHistograms of matched output predictions for training/validation/testing are plotted per foreground class:\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\u003cimg src=\"assets/hist_example.png\"  width=550\u003e\n\u003chr\u003e\nInput images + ground truth annotations + output predictions of a sampled validation batch are plotted after each epoch (here a sampled 2D slice with ±3 neighbouring context slices in channels):\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\u003cimg src=\"assets/output_monitoring_1.png\"  width=750\u003e\n\u003chr\u003e\nA zoomed-in view of the last two lines of the plot:\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\u003cimg src=\"assets/output_monitoring_2.png\"  width=700\u003e\n\n\n## License\nThis framework is published under the [Apache License Version 2.0](LICENSE).\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmic-dkfz%2Fmedicaldetectiontoolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmic-dkfz%2Fmedicaldetectiontoolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmic-dkfz%2Fmedicaldetectiontoolkit/lists"}