{"id":18736729,"url":"https://github.com/digitalslidearchive/cnncelldetection","last_synced_at":"2025-04-12T19:31:51.260Z","repository":{"id":37605877,"uuid":"183457604","full_name":"DigitalSlideArchive/CNNCellDetection","owner":"DigitalSlideArchive","description":"This repository contains adaptations of the state-of-the-art deep learning based object detection models for the detection of cell nuclei in histology images","archived":false,"fork":false,"pushed_at":"2022-12-08T05:06:58.000Z","size":7669,"stargazers_count":27,"open_issues_count":22,"forks_count":3,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-03-26T13:54:08.980Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DigitalSlideArchive.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-25T15:04:37.000Z","updated_at":"2024-08-28T07:24:43.000Z","dependencies_parsed_at":"2023-01-24T12:00:12.103Z","dependency_job_id":null,"html_url":"https://github.com/DigitalSlideArchive/CNNCellDetection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FCNNCellDetection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FCNNCellDetection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FCNNCellDetection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FCNNCellDetection/
manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DigitalSlideArchive","download_url":"https://codeload.github.com/DigitalSlideArchive/CNNCellDetection/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248621143,"owners_count":21134755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T15:22:16.743Z","updated_at":"2025-04-12T19:31:50.649Z","avatar_url":"https://github.com/DigitalSlideArchive.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CNNCellDetection\nThis repository contains adaptations of state-of-the-art deep-learning-based object detection models for the detection of cell nuclei in histology images.\n\n\n\n\nFirst, clone the Luminoth repo, either from its official repo or from the forked repo, which contains some internal modifications.\n\n1. Official repo: https://github.com/tryolabs/luminoth.git\n2. Forked Modified repo: https://github.com/cramraj8/luminoth.git\n\nThe forked repo has:\n\n* the ResNet block 3 **num_units** hyper-parameter exposed in base_config.yml for all ResNet variants\n* ```img = input_img[:, :, :num_channels]``` added to the dataset loading function\n to facilitate grayscale image loading and avoid unnecessary TensorFlow reshaping exceptions.\n* end-points exposed for feature extraction from R-CNN layers\n\n\n\n# Data Generation\n\n\nRaw data can be found in different formats. 
To train the model, it must be in either csv or **PascalVoc** format.\nYou can check the documentation for more info at https://luminoth.readthedocs.io/en/latest/usage/dataset.html\n\n1. The csv file only needs to contain the following columns; the column names can be overridden by input arguments.\n```\n    a. image_id\n    b. bounding box coordinates in either convention\n            - x_min, y_min, x_max, y_max\n            - x_center, y_center, width, height\n    c. class label (for a class-agnostic model, represented by a single objectness class)\n```\n\n2. The PascalVoc data folder should look like this:\n\n\n\n```\n   Data\n    ├── annotations                     - Folder containing XML ground-truth annotations.\n    │\n    │\n    ├── ImageSets\n    │   └──Main\n    │       ├── objectness_train.txt    - contains the image_ids that have this particular 'objectness' class.\n    │       └── train.txt               - contains the image_ids that are going to be used for training.\n    │\n    │\n    └── JPEGImages                      - Folder containing JPEG/PNG images.\n```\n\n\n\n\n\n## To create tfrecord data\n\nEither arrange the **images**, **annotations**, and **train.txt** appropriately inside the '**pascalvoc_format_data**' folder, or\nprovide a csv file in the appropriate format.\n\n\n\n## Luminoth CLI:\n\nGenerating tfrecords from PascalVoc format:\n```\n    $ lumi dataset transform \\\n        --type pascal \\\n        --data-dir ./data/pascalvoc_format_data \\\n        --output-dir ./data/tf_dataset \\\n        --split train\n```\n\n\nGenerating tfrecords from csv format:\n```\n    $ lumi dataset transform \\\n        --type csv \\\n        --data-dir ./data/csv_file \\\n        --output-dir ./data/tf_dataset \\\n        --split train\n```\n\n\nYou may want to run these commands from inside the **luminoth/** folder.\n\n\n\n\n\n\n\n\n# Training\n```\n$ lumi train -c train/config_resnet101_ff12ep4_default.yml\n```\n\nTo monitor the real-time performance of the training 
model, we can enable TensorBoard.\n```\n$ tensorboard --logdir=\u003cpath_to_dir_jobs/my-run/\u003e\n```\n\nIf you want to modify the model or training parameters, you can do so in the config file. There is a full config file (**base_config.yml**) inside the path\n'**luminoth/models/fasterrcnn/**'. There is also a much shorter config file (**sample_config.yml**) inside the path '**examples/**'. You can copy the entire\nbase_config.yml, make changes to it, and pass its path with the **-c** flag in the CLI so that the parameters are overridden when training starts.\n\nFrequently changed parameters can be found here: https://github.com/cramraj8/luminoth/blob/nuclei-detection/luminoth/README.md\n\n\n\n\n# Prediction\n```\n$ lumi predict -c train/config_resnet101_ff12ep4_default.yml --min-prob 0.1 --max-detections 800 -d \u003coutput_dir_path\u003e\n```\nThis CLI command generates visual prediction overlays as .png files and writes the predicted coordinates together with their labels to a .json file.\nThe only parameters that are safe to tune at the inference stage are min-prob and max-detections, which are used to filter the final predictions.\n\n**Remember**: When you increase max-detections, the R-CNN part of the network emits more detection proposals, which consume more GPU memory. This\ncan result in an out-of-memory exception.\n\n\n\n\n# Evaluation\n\nTwo evaluation methods were used:\n\n1. IoU based mAP evaluation method\n2. Objectness/classification confidence score based AUC evaluation method\n        using the Hungarian Algorithm for mapping GT to Predictions\n\nGenerally, in the PascalVOC and COCO object detection challenges people often use method #1; however, this method\nis better suited to problems with a small number of objects per image. In the nuclei-detection problem domain,\nwe usually face more than 100 nuclei (objects) in each image. 
In this case, without mapping the best prediction bndbox to each ground-truth box,\nit is hard to identify redundant detection bndboxes with method #1.\n\nThe example evaluation results below are overlaid on the input images using method #2, mainly because of the crowded bndbox detections.\nGreen boxes are TPs, blue boxes are FPs, and red boxes are FNs.\n\nEvaluation overlays of prediction example-1                |  Evaluation overlays of prediction example-2\n:-------------------------:|:-------------------------:\n![](evaluation/ex1-overlay_TCGA-G9-6362-01Z-00-DX1_3.png)  |  ![](evaluation/ex2-overlay_TCGA-HE-7130-01Z-00-DX1_2.png)\n\n\n\n\n\n# Nuclei Detection Web CLI Plugin\n\nAn extended plugin for [girder/slicer_cli_web](https://github.com/girder/slicer_cli_web)\n\nThe complete workflow of the Dask-TensorFlow enabled pipeline is shown below.\n![Alt text](cli/FasterNuclieDetectionCPU/pipeline-workflow.png?raw=true \"Complete Nuclei Detection Pipeline\")\n\n\nTo build a Docker Image from this CLI Plugin:\n\nFirst, pull the Nuclei-Detection TensorFlow pre-trained model files and place them inside the 'cli/jobs/' folder, because these files\nare copied into the Docker Image. 
Alternatively, add a `wget \u003clink_for_pretrained_file\u003e` line to the Dockerfile after line #13.\n\n\n\nThen run:\n\n```\n$ docker build -t \u003cDockerImage_name\u003e:\u003cDockerImage_tag_version\u003e \u003cDockerfile_path\u003e\n```\n\nTo check that the Docker Image runs correctly:\n\n1. First create a Docker Container from this Docker Image and open a /bin/bash shell in it.\n```\n$ docker run -ti -v \u003clocal_volume_folder\u003e:\u003cDocker_volume_folder\u003e --rm --entrypoint=/bin/bash \u003cDockerImage_name\u003e:\u003cDockerImage_tag_version\u003e\n```\n  Note: -v mounts a local folder into the Docker Container so that changes to the folder are mirrored immediately.\n\n2. Your default working directory inside the Container shell is '/Applications/', so run a sample test from there.\n```\n$ python FasterNuclieDetection/FasterNuclieDetection.py ../91316_leica_at2_40x.svs.38552.50251.624.488.jpg annot.anot timeprofile.csv\n```\n3. Check the annotation and timeprofile files in the local folder.\n\n\n# Dask Execution\n\nThe Dask-CPU enabled CLI application folder is **cli/FasterNuclieDetectionCPU/**. The script contains the complete pipeline. If we run this CLI on a local\nmachine, Dask will create the default number of workers for the run.\n\n\n\n\n![Alt text](cli/FasterNuclieDetectionCPU/Cluster-Dask-topology.png?raw=true \"Dask Cluster Topology\")\n\nIf we want to run this pipeline in a **cluster** environment, we first need to initialize the **Dask-scheduler** and then run the script. The Dask-scheduler will connect to and configure the workers and split the work among them\nto run the tasks in parallel.\n\n```\n$ dask-ssh --hostfile=nodenames.txt  --remote-python=\u003cvirtualenv_path_python_folder\u003e --remote-dask-worker=distributed.cli.dask_worker --nprocs=6\n```\n\n**nodenames.txt** is just a text file that lists one node name per line. **virtualenv_path_python_folder** is usually `~/.virtualenvs/\u003cvirtualenv_name\u003e/bin/python`. 
The **nprocs** flag is\nused to specify the number of processes that we want to run on each node.\n\nOnce we run the above command, the Dask-scheduler will be created in a virtual terminal session. Now you can run your script in another terminal session.\n\n**Note**: Any paths that you define inside the script should be declared in a file so that all the nodes connected to the head node know where to look.\nFor this purpose, we usually create a `\u003csample_filename\u003e.pth` file inside the **site-packages** folder (usually located at `~/.local/lib/python2.7/site-packages/`), and write\none path per line. The paths that we usually include are:\n\n1. any libraries that we installed outside site-packages.\n2. any 'import file' location (like utils.py).\n3. input file location.\n4. output dir location.\n5. ckpt (pretrained files) dir location.\n6. sample_config.yml file location.\n\nNow when we run the CLI\n```\n$ python FasterNuclieDetectionCPU/FasterNuclieDetectionCPU.py ../91316_leica_at2_40x.svs.38552.50251.624.488.jpg annot.anot timeprofile.csv\n```\nthe tasks will run in parallel across the worker nodes.\n\nYou can monitor real-time Dask performance using the **Bokeh** tool. If this library is already installed in the virtualenv, the Bokeh dashboard is served on `port: 8787`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalslidearchive%2Fcnncelldetection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigitalslidearchive%2Fcnncelldetection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalslidearchive%2Fcnncelldetection/lists"}