{"id":51024305,"url":"https://github.com/turhancan97/cell_size","last_synced_at":"2026-06-21T18:30:39.700Z","repository":{"id":345331124,"uuid":"1178241941","full_name":"turhancan97/cell_size","owner":"turhancan97","description":"Batch cell segmentation and size estimation for microscopy images using Cellpose-SAM, with automated good/bad cell classification and an interactive Gradio demo.","archived":false,"fork":false,"pushed_at":"2026-06-16T21:33:25.000Z","size":2292,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-16T23:20:16.089Z","etag":null,"topics":["biology","cell-segmentation","cell-size","cellpose","computer-vision","deep-learning","gradio","hydra","image-segmentation","microscopy-image-analysis","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/turhancan97.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-10T20:46:12.000Z","updated_at":"2026-06-16T21:33:29.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/turhancan97/cell_size","commit_stats":null,"previous_names":["turhancan97/cell_size"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/turhancan97/cell_size","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/turhancan97%2Fcell_size","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/turhancan97%2Fcell_size/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/turhancan97%2Fcell_size/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/turhancan97%2Fcell_size/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/turhancan97","download_url":"https://codeload.github.com/turhancan97/cell_size/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/turhancan97%2Fcell_size/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34622271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biology","cell-segmentation","cell-size","cellpose","computer-vision","deep-learning","gradio","hydra","image-segmentation","microscopy-image-analysis","pytorch"],"created_at":"2026-06-21T18:30:38.735Z","updated_at":"2026-06-21T18:30:39.687Z","avatar_url":"https://github.com/turhancan97.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cell Size Estimator\n\nBatch cell segmentation and size estimation for microscopy images using\n[Cellpose-SAM](https://github.com/MouseLand/cellpose), driven by\n[Hydra](https://hydra.cc/) configuration.\n\n## Features\n\n- **Batch processing** -- point at a directory and segment every image in one\n  command.\n- **Membrane or nucleus** segmentation via preset configs.\n- **Hydra configuration** -- override any parameter from the CLI; supports\n  multirun for parameter sweeps.\n- **Resume support** -- re-run safely; already-processed images are skipped.\n- **Auto pixel-scale detection** from OME-TIFF metadata, with manual fallback.\n- **Multiple output formats** -- masks as 16-bit TIFF or NumPy `.npy`.\n- **Optional overlays and histograms** for quality checking.\n- **Catalog CSV** recording every processed image with metadata.\n- **Cell quality classifier** -- train a binary good/bad classifier from human\n  feedback and automatically filter cells during size estimation.\n- **Nucleus measurements** -- run segmentation twice (membrane + nucleus) and\n  the inference pipeline automatically matches nuclei to cells, reporting\n  nucleus area, diameters, and nucleus-to-cytoplasm (N/C) ratio.\n\n## Installation\n\n### Option 1: conda (recommended)\n\n```bash\n# Clone with submodules\ngit clone --recurse-submodules \u003crepo-url\u003e\ncd cell_size\n\n# Create and activate a new conda environment\nconda create -n cell-size python=3.10 -y\nconda activate cell-size\n\n# Install PyTorch with CUDA (adjust cuda version to match your driver)\nconda install pytorch torchvision pytorch-cuda=12.4 -c pytorch -c nvidia -y\n\n# Install the package in editable mode\npip install -e .\n\n# Install the cellpose submodule\npip install -e cellpose/\n```\n\n### Option 2: pip + venv\n\n```bash\n# Clone with submodules\ngit clone --recurse-submodules \u003crepo-url\u003e\ncd cell_size\n\n# Create and activate a virtual environment\npython -m venv .venv \u0026\u0026 source .venv/bin/activate\n\n# Install in editable mode\npip install -e .\n\n# The cellpose submodule is imported at runtime via sys.path.\n# If you prefer, you can also install it:\npip install -e cellpose/\n```\n\n\u003e **GPU**: Make sure PyTorch with CUDA is installed for GPU acceleration.\n\u003e See \u003chttps://pytorch.org/get-started/locally/\u003e.\n\n## Annotation Platform (Private Submodule)\n\nThis repository includes a private submodule at `apps/annotation-platform`\nfor Good/Bad cell annotation.\n\n### Access requirements\n\n- You must have read access to the private GitHub repository.\n- Your local Git must be able to authenticate over SSH (`git@github.com:...`).\n\nIf submodule checkout fails, verify SSH access:\n\n```bash\nssh -T git@github.com\n```\n\n### Clone/update with submodules\n\nFresh clone:\n\n```bash\ngit clone --recurse-submodules \u003crepo-url\u003e\ncd cell_size\n```\n\nExisting clone:\n\n```bash\ngit submodule sync --recursive\ngit submodule update --init --recursive\n```\n\n### Annotation export contract\n\nThe classifier training pipeline expects feedback CSVs with at least:\n\n```csv\ndataset,image_path,mask_index,verdict\ndatasetA,img001,25,good\n```\n\nThis schema is consumed by `cell-size-train` (`src/cell_size/classifier/consensus.py`).\n\n## Quick Start\n\nSegment all `.tif` images in a directory (membrane mode, default):\n\n```bash\ncell-size data.data_dir=/path/to/images\n```\n\nSegment nuclei instead:\n\n```bash\ncell-size data.data_dir=/path/to/images segmentation=nucleus\n```\n\nUse multiple file types with resize and overlays:\n\n```bash\ncell-size \\\n  data.data_dir=/path/to/images \\\n  data.file_types='[\".tif\",\".jpg\",\".png\"]' \\\n  segmentation.resize=1000 \\\n  output.generate_overlays=true \\\n  output.compute_cell_areas=true \\\n  cell_type=FrogBlood\n```\n\nForce reprocessing of all images:\n\n```bash\ncell-size data.data_dir=/path/to/images force=true\n```\n\nHydra multirun (parameter sweep):\n\n```bash\ncell-size -m \\\n  data.data_dir=/path/to/images \\\n  segmentation.flow_threshold=0.3,0.4,0.5\n```\n\n## Configuration\n\nAll configuration lives in `src/cell_size/configs/` and follows Hydra conventions.\n\n### Main config (`src/cell_size/configs/config.yaml`)\n\n| Key         | Default     | Description                        |\n|-------------|-------------|------------------------------------|\n| `cell_type` | `\"Unknown\"` | Label for the Cell_Type CSV column |\n| `force`     | `false`     | Reprocess already-segmented images |\n\n### Data (`src/cell_size/configs/data/default.yaml`)\n\n| Key            | Default    | Description                                            |\n|----------------|------------|--------------------------------------------------------|\n| `data_dir`     | (required) | Path to the image dataset                              |\n| `file_types`   | `[\".tif\"]` | Image extensions to look for                           |\n| `recursive`    | `true`     | Scan subdirectories                                    |\n| `channels`     | `null`     | Channel indices to select (null = use all)             |\n| `pixel_to_um`  | `null`     | Manual µm/pixel value (null = auto-detect from metadata) |\n\n### Segmentation (`src/cell_size/configs/segmentation/`)\n\nTwo presets: `membrane` (default) and `nucleus`.\n\n| Key                   | membrane | nucleus | Description                           |\n|-----------------------|----------|---------|---------------------------------------|\n| `target`              | membrane | nucleus | Descriptive label                     |\n| `chan`                 | 0        | 1       | Primary channel for cellpose          |\n| `chan2`                | 0        | 0       | Secondary channel                     |\n| `flow_threshold`      | 0.4      | 0.4     | Flow error threshold                  |\n| `cellprob_threshold`  | 0.0      | 0.0     | Cell probability threshold            |\n| `tile_norm_blocksize` | 0        | 0       | Block size for tile normalisation (0 = global) |\n| `resize`              | 0        | 0       | Resize longest side before segmentation (0 = no resize) |\n| `min_cell_size`       | 15       | 15      | Minimum cell size in pixels           |\n\n### Model (`src/cell_size/configs/model/cpsam.yaml`)\n\n| Key                 | Default  | Description                              |\n|---------------------|----------|------------------------------------------|\n| `model_type`        | `\"cpsam\"` | Cellpose model name                     |\n| `custom_model_path` | `null`   | Path to a custom fine-tuned model        |\n| `gpu`               | `true`   | Use GPU (falls back to CPU if unavailable) |\n| `batch_size`        | `32`     | Batch size for model evaluation          |\n\n### Output (`src/cell_size/configs/output/default.yaml`)\n\n| Key                  | Default        | Description                                  |\n|----------------------|----------------|----------------------------------------------|\n| `mask_format`        | `\"tif\"`        | `\"tif\"` or `\"npy\"`                           |\n| `csv_path`           | `\"results.csv\"` | Catalog CSV file path (relative to data_dir) |\n| `generate_overlays`  | `false`        | Save outline overlay PNGs                    |\n| `generate_plots`     | `false`        | Save cell-area histogram PNGs                |\n| `compute_cell_areas` | `false`        | Write per-image cell area CSVs               |\n\n## Dual-Run Workflow (Membrane + Nucleus)\n\nTo measure both cell and nucleus sizes, run the batch segmentation twice on the\nsame dataset -- once for membrane and once for nucleus. The outputs are saved\nwith distinct names so they coexist in the same per-image folders:\n\n```bash\n# 1. Segment membranes (default naming: _mask.tif, _overlay.png, ...)\ncell-size data.data_dir=/path/to/images output.generate_overlays=true\n\n# 2. Segment nuclei (nucleus naming: _nucleus_mask.tif, _nucleus_overlay.png, ...)\ncell-size data.data_dir=/path/to/images segmentation=nucleus output.generate_overlays=true\n```\n\nMembrane outputs keep their original names (backward-compatible). Nucleus\noutputs use a `_nucleus_` infix. Each run also writes its own catalog CSV\n(`results.csv` for membrane, `results_nucleus.csv` for nucleus).\n\nDuring `cell-size-classify` inference, the pipeline automatically detects\nnucleus masks (`_nucleus_mask.tif`) alongside membrane masks and adds\ncell/nucleus morphology measurements to `filtered_areas.csv`:\n\n| Column                  | Description                              |\n|-------------------------|------------------------------------------|\n| `cell_axis_ratio`       | Cell long-axis / short-axis ratio        |\n| `nucleus_area_px`       | Nucleus area in pixels                   |\n| `nucleus_major_axis_px` | Nucleus long diameter (px)               |\n| `nucleus_minor_axis_px` | Nucleus short diameter (px)              |\n| `nucleus_axis_ratio`    | Nucleus long-axis / short-axis ratio     |\n| `nc_ratio`              | Nucleus-to-cell area ratio               |\n| `nucleus_area_um2`      | Nucleus area in µm² (if scale known)     |\n| `nucleus_major_axis_um` | Nucleus long diameter in µm              |\n| `nucleus_minor_axis_um` | Nucleus short diameter in µm             |\n\nCells with no matching nucleus get `NaN` for all nucleus columns. When\nmultiple nuclei overlap a cell, only the largest (by pixel overlap) is used.\n\n## Output Structure\n\nAfter processing both membrane and nucleus, each image folder contains:\n\n```\ndata_dir/\n  projectA/\n    image000/\n      image000.jpg                    # original image (moved here)\n      image000_mask.tif               # membrane segmentation mask\n      image000_overlay.png            # (optional) membrane overlay\n      image000_areas.csv              # (optional) membrane cell areas\n      image000_histogram.png          # (optional) membrane area histogram\n      image000_nucleus_mask.tif       # nucleus segmentation mask\n      image000_nucleus_overlay.png    # (optional) nucleus overlay\n      image000_nucleus_areas.csv      # (optional) nucleus areas\n      image000_nucleus_histogram.png  # (optional) nucleus area histogram\n    image001/\n      image001.tif\n      image001_mask.tif\n      image001_nucleus_mask.tif\n  results.csv                         # membrane catalog CSV\n  results_nucleus.csv                 # nucleus catalog CSV\n```\n\n### Catalog CSV format\n\n```csv\nRelative_Path,Image_Name,File_Type,Mask_Name,Resize,Cell_Type,Timestamp\nprojectA/image000,image000,jpg,image000_mask,1000,FrogBlood,2026-03-10T17:30:00\n```\n\n## Cell Quality Classifier\n\nThe classifier module automates the human-in-the-loop cell validation\nworkflow. Biologists review segmented cells in a web UI and label them as\n\"good\" (include in size estimation) or \"bad\" (exclude). The classifier learns\nfrom these labels and applies the same filtering automatically to new datasets.\n\n### Training\n\n```bash\ncell-size-train \\\n    feedback_csvs='[\"/path/to/feedback1.csv\", \"/path/to/feedback2.csv\"]' \\\n    data_dir=/path/to/segmented/data \\\n    output_dir=./classifier_output \\\n    classifier.encoder='timm/vit_small_patch16_dinov3.lvd1689m' \\\n    classifier.use_efficient_probing=true \\\n    classifier.efficient_probing.num_queries=32 \\\n    classifier.freeze_encoder=true \\\n    classifier.epochs=30\n```\n\nPipeline: merge feedback CSVs -\u003e majority-vote consensus -\u003e extract cell\ncrops -\u003e train/val/test split -\u003e train -\u003e evaluate -\u003e save checkpoint +\nconfusion matrix.\n\n### SLURM launcher (training only)\n\nFor running many training experiments on a SLURM cluster, use\n`scripts/launch_classifier_train_parallel.sh`. It submits one job per run,\nsupports `--dry-run`, and throttles submissions with `--max-concurrent` using\n`squeue`.\n\nMinimal example:\n\n```bash\nDATA_DIR=/path/to/segmented/data \\\nFEEDBACK_CSVS='[\"/path/to/feedback.csv\"]' \\\nOUTPUT_DIR=/path/to/output_dir \\\nbash scripts/launch_classifier_train_parallel.sh --dry-run --max-concurrent 20 \\\n  --encoders resnet18,'timm/vit_small_patch16_dinov3.lvd1689m' --freeze both --train-with-val false --lrs 0.001,0.0005 --thresholds 0.7 --cv both \\\n  classifier.epochs=30 classifier.batch_size=64\n```\n\nEach run writes into `OUTPUT_DIR/\u003crun_name\u003e/`:\n\n- `best_model.pt`\n- `confusion_matrix.png` (when test split exists)\n- `results.csv` (one-row summary)\n\nAdditionally, a central `OUTPUT_DIR/experiments.csv` is appended (one row per\nrun, safe under concurrent SLURM jobs).\n\n#### `experiments.csv` schema\n\nFixed columns (one row per run):\n\n```text\nrun_name,encoder,freeze_encoder,learning_rate,confidence_threshold,seed,cross_validation,k_folds,best_val_f1,test_accuracy,test_precision,test_recall,test_f1,best_checkpoint_path,confusion_matrix_path,slurm_job_id,status,started_at,finished_at,hostname,pid\n```\n\n### Inference\n\n```bash\ncell-size-classify \\\n    checkpoint=./classifier_output/best_model.pt \\\n    data_dir=/path/to/new/segmented/data \\\n    output_dir=./classify_output \\\n    classifier.confidence_threshold=0.7 \\\n    classifier.selective_rejection.enabled=true \\\n    classifier.selective_rejection.t_bad=0.09 \\\n    classifier.selective_rejection.t_good=0.51\n```\n\nPipeline: load model -\u003e classify every cell in the dataset -\u003e write\npredictions CSV (`good`/`bad`/`rejected`) -\u003e compute filtered areas (good cells only) -\u003e generate\nfiltered overlay images.\n\nIf you already have `classify_output/predictions.csv`, you can regenerate overlays without rerunning inference:\n\n```bash\ncell-size-classify-overlays \\\n    data_dir=/path/to/new/segmented/data \\\n    output_dir=./classify_output \\\n    predictions_csv=./classify_output/predictions.csv\n```\n\nIf you already have `filtered_areas.csv`, you can apply morphology QC without rerunning inference:\n\n```bash\ncell-size-qc-filter \\\n    input_csv=./classify_output/filtered_areas.csv \\\n    output_dir=./classify_output\n```\n\nQC keeps raw `filtered_areas.csv` unchanged and writes cleaned/rejected tables using the default rule\n`0.05 \u003c= nc_ratio \u003c= 0.50` with required nucleus measurements.\n\n### Classifier Configuration (`src/cell_size/configs/classifier/default.yaml`)\n\n| Key                          | Default      | Description                                             |\n|------------------------------|--------------|---------------------------------------------------------|\n| `crop_size`                  | `224`        | Cell crop resize resolution                             |\n| `crop_padding_pct`           | `0.2`        | Padding around bounding box (fraction)                  |\n| `crop_format`                | `\"png\"`      | Saved crop format (`png` or `jpg`)                      |\n| `mask_background`            | `false`      | Zero out pixels outside cell mask                       |\n| `crops_dir`                  | `\"crops\"`    | Output directory for extracted crops                    |\n| `split_ratio`                | `[0.7, 0.15, 0.15]` | Train / val / test split ratio                 |\n| `seed`                       | `42`         | Random seed for splitting and reproducibility           |\n| `encoder`                    | `\"resnet18\"` | Backbone: `resnet18`, `resnet50`, `vit_b_16`, `efficientnet_b0`, `squeezenet1_1`, or `timm/\u003cmodel_name\u003e` (e.g. `timm/vit_small_patch16_dinov3.lvd1689m`) |\n| `freeze_encoder`             | `false`      | Freeze backbone, train only classification head         |\n| `pretrained`                 | `true`       | Use ImageNet-pretrained weights                         |\n| `train_with_val`             | `false`      | Final-fit mode: train on `train+val`, and monitor/select checkpoint on `test` (incompatible with cross-validation) |\n| `use_mlp_head`               | `false`      | Use MLP head: `Linear(in,128)-\u003eReLU-\u003eLinear(128,32)-\u003eReLU-\u003eLinear(32,8)-\u003eReLU-\u003eLinear(8,1)` |\n| `use_efficient_probing`      | `false`      | Use efficient probing head on patch tokens (timm ViT encoders only; mutually exclusive with `use_mlp_head`) |\n| `efficient_probing.num_queries` | `32`      | Number of learnable query tokens for efficient probing  |\n| `efficient_probing.num_heads`   | `1`       | Number of attention heads in efficient probing          |\n| `efficient_probing.d_out`       | `1`       | Channel reduction factor (`C' = C / d_out`)            |\n| `efficient_probing.qkv_bias`    | `false`   | Enable bias in efficient probing value projection       |\n| `efficient_probing.qk_scale`    | `null`    | Optional custom QK scale (default uses head dim scaling) |\n| `epochs`                     | `50`         | Maximum training epochs                                 |\n| `batch_size`                 | `32`         | Training batch size                                     |\n| `learning_rate`              | `0.001`      | Adam learning rate                                      |\n| `weight_decay`               | `0.0001`     | Adam weight decay                                       |\n| `early_stopping_patience`    | `7`          | Epochs without improvement before stopping              |\n| `confidence_threshold`       | `0.7`        | Minimum confidence to label a cell as \"good\" (ignored when `selective_rejection.enabled=true`) |\n| `selective_rejection.enabled`| `false`      | Enable selective rejection with `good`/`bad`/`rejected` outputs |\n| `selective_rejection.t_bad`  | `0.09`       | If `p_good \u003c= t_bad`, verdict is `bad`                  |\n| `selective_rejection.t_good` | `0.51`       | If `p_good \u003e= t_good`, verdict is `good`; otherwise `rejected` |\n| `gpu`                        | `true`       | Use GPU if available                                    |\n| `wandb.enabled`              | `false`      | Enable Weights \u0026 Biases logging                         |\n| `wandb.project`              | `\"cell-quality\"` | WandB project name                                  |\n| `cross_validation.enabled`   | `false`      | Use k-fold cross-validation instead of single split     |\n| `cross_validation.k_folds`   | `5`          | Number of folds                                         |\n\n### Training Config (`src/cell_size/configs/train.yaml`)\n\n| Key              | Default                  | Description                              |\n|------------------|--------------------------|------------------------------------------|\n| `feedback_csvs`  | `[]`                     | List of feedback CSV file paths          |\n| `data_dir`       | `null`                   | Root of segmented dataset                |\n| `output_dir`     | `\"./classifier_output\"`  | Where to save crops, checkpoints, plots  |\n\n### Inference Config (`src/cell_size/configs/classify.yaml`)\n\n| Key                          | Default                | Description                               |\n|------------------------------|------------------------|-------------------------------------------|\n| `checkpoint`                 | `null`                 | Path to trained model checkpoint           |\n| `data_dir`                   | `null`                 | Root of segmented dataset to classify      |\n| `output_dir`                 | `\"./classify_output\"`  | Output directory for predictions           |\n| `compute_filtered_areas`     | `true`                 | Compute areas for good cells only          |\n| `generate_filtered_overlays` | `true`                 | Generate overlay images with filtering     |\n| `morphology_qc.enabled`      | `true`                 | Write QC-clean morphology outputs          |\n| `morphology_qc.require_nucleus` | `true`              | Exclude cells without nucleus measurements |\n| `morphology_qc.min_nc_ratio` | `0.05`                 | Minimum accepted nucleus/cell area ratio   |\n| `morphology_qc.max_nc_ratio` | `0.50`                 | Maximum accepted nucleus/cell area ratio   |\n| `pixel_to_um`                | `null`                 | Manual pixel-to-um scale for area calc     |\n\n### Feedback CSV Format\n\nThe feedback CSV (from the review web UI) must have these columns:\n\n```csv\ndataset,image_path,mask_index,verdict,reviewer_email,comment,reviewed_at\ndatasetA,img001,25,good,reviewer@example.com,\"Cell is circular\",2026-03-16T14:00:00\n```\n\n### Classifier Output\n\n```\nclassifier_output/\n  crops/\n    train/good/*.png\n    train/bad/*.png\n    val/good/*.png\n    val/bad/*.png\n    test/good/*.png\n    test/bad/*.png\n  best_model.pt           # best checkpoint (by val F1)\n  confusion_matrix.png    # test set confusion matrix\n\nclassify_output/\n  predictions.csv         # per-cell predictions with confidence + accepted + frog_id (verdict: good|bad|rejected)\n  filtered_areas.csv      # raw good-cell morphology (areas, diameters, ratios, frog_id)\n  frog_aggregated_metrics.csv  # raw one-row-per-frog aggregate\n  filtered_areas_qc.csv   # QC-clean morphology used by biology reports when present\n  filtered_areas_qc_rejected.csv  # QC-rejected rows with qc_reason\n  frog_aggregated_metrics_qc.csv  # QC-clean one-row-per-frog aggregate\n  morphology_qc_image_summary.csv\n  morphology_qc_frog_summary.csv\n  morphology_qc_threshold_sensitivity.csv\n  frog_aggregated_metrics_qc_comparison.csv\n  overlays/\n    img001_filtered_overlay.jpg   # good=green, bad=orange, rejected=magenta + nucleus boundaries (cyan)\n    img002_filtered_overlay.jpg\n```\n\nFrog IDs are parsed from image names using `TIFF_AH_\u003cfrog_id\u003e_\u003cimage_idx\u003e`\n(for example: `TIFF_AH_001_04 -\u003e 1`, `TIFF_AH_476_10 -\u003e 476`).\n\n### Combined report (classifier + biology, LaTeX/PDF)\n\nThe primary deliverable is **`notebooks/report.pdf`**: classifier validation\n(Parts A \u0026 B) plus compact biology results (Part C1–C6), Discussion, and Glossary.\n\n```bash\nconda activate cell-size\npython notebooks/build_report.py\npython notebooks/build_report.py --no-figures\npython notebooks/build_report.py --no-figures --skip-classifier-inference\npython notebooks/build_report.py --skip-fill          # compile existing report.md\n```\n\nThis regenerates classifier + Part C figures (`notebooks/figures/`), fills\n`notebooks/report.md` from `report.md.template`, and writes `report.tex` /\n`report.pdf`. Classifier metrics are cached in\n`classify_output/analysis/classifier_stats.json` (use `--skip-classifier-inference`\nwhen torch is unavailable).\n\n### Biology-only report (extended LaTeX/PDF)\n\nFor the extended biology deep-dive (Methods, yield, reference intervals, etc.),\nbuild **`notebooks/report_biology.pdf`** separately:\n\n```bash\nconda activate cell-size\npython notebooks/build_report_latex.py\n```\n\nThis regenerates figures (`notebooks/figures/`), fills\n`notebooks/report_biology.md` from `report_biology.md.template`, and writes\n`notebooks/report_biology.tex` / `report_biology.pdf`. Optional:\n`--no-figures`, `--tex-only`. Requires `xelatex` (TeX Live) for PDF;\nuses a built-in LaTeX converter if `pandoc` is not installed.\n\nAnalysis CSV exports: `classify_output/analysis/` (`reference_intervals.csv`,\n`frog_summary_report.csv`, `nc_ratio_mixed_models.csv`).\n\n## Interactive Demo (Gradio)\n\nA browser-based demo that runs the full pipeline end-to-end: upload an image,\nsegment cells, classify them as good/bad, and view filtered results.\n\n![Gradio demo screenshot](assets/demo.png)\n\n### Install\n\n```bash\npip install -e \".[demo]\"\n```\n\n### Launch\n\n```bash\n# Via CLI entry point\ncell-size-demo\n\n# Or directly\npython demo/app.py\n\n# With a public share link\ncell-size-demo --share\n\n# Custom host/port\ncell-size-demo --server-name 0.0.0.0 --server-port 8080\n```\n\nThen open `http://localhost:7860` in your browser.\n\n### What it does\n\n1. Upload a single microscopy image.\n2. Choose membrane or nucleus segmentation, adjust parameters via sliders.\n3. Check \"Also segment nuclei\" (enabled by default when using membrane mode)\n   to get nucleus measurements alongside cell measurements.\n4. Optionally provide a trained classifier checkpoint (`.pt` file).\n5. Click **Run Pipeline** to:\n   - Segment all cells (and nuclei if enabled) and show a numbered overlay.\n   - Classify each cell as good/bad (if checkpoint is provided).\n   - Show the filtered overlay (good cells in colour, bad cells greyed out,\n     nucleus boundaries in cyan).\n   - Display per-cell predictions table and filtered areas with diameters\n     and nucleus measurements (area, diameters, N/C ratio).\n   - Provide downloadable CSV files for both tables.\n\n## Interactive Embedding Explorer (Streamlit)\n\nAn interactive Streamlit app for visualizing classifier crop embeddings in 2D\nor 3D and inspecting individual crop images directly from the scatter plot.\n\n### Install\n\n```bash\npip install -e \".[streamlit]\"\n```\n\n### Launch\n\n```bash\n# Via CLI entry point\ncell-size-streamlit\n\n# Custom host/port\ncell-size-streamlit --server-name 0.0.0.0 --server-port 8501\n\n# Or directly with streamlit\nstreamlit run demo/streamlit_embedding_app.py\n```\n\nThen open `http://localhost:8501` in your browser.\n\n### Data requirements\n\nThe app expects crop data generated by classifier training in this layout:\n\n```text\nclassifier_output/\n  crops/\n    mask_bg_false/\n      train/{good,bad}/*.jpg\n      val/{good,bad}/*.jpg\n      test/{good,bad}/*.jpg\n    mask_bg_true/\n      train/{good,bad}/*.jpg\n      val/{good,bad}/*.jpg\n      test/{good,bad}/*.jpg\n```\n\nModel selection is checkpoint-based (`best_model.pt`) and discovered from a\nsearch root you can set in the UI.\n\n### What it does\n\n1. Select crop root, mask mode, split, and checkpoint.\n2. Choose embedding method: `PCA`, `t-SNE`, or `UMAP`.\n3. Switch between 2D and 3D interactive Plotly scatter views.\n4. Apply uncertainty reject band thresholds (`t_bad`, `t_good`) to label each\n   sample as accepted-good, accepted-bad, or rejected.\n5. Color points by true label, predicted label, accepted/rejected status, or\n   confusion class (`TP`, `TN`, `FP`, `FN`, `REJECT`).\n6. Click points (or select top uncertain points) to preview exact crop images\n   and metadata in the side panel.\n7. Export selected points as CSV.\n\n### Notes\n\n- Features and embeddings are computed on the fly from the currently selected\n  model/split/settings.\n- For efficient-probing checkpoints (`use_efficient_probing=true`), embeddings\n  are built from probe output features (`x_cls`) before the final binary layer.\n- By default, recomputation is manual (`Run / Refresh`) to avoid repeated heavy\n  reruns while changing controls; you can enable auto-recompute in the sidebar.\n- If `umap-learn` is unavailable, the UI disables UMAP and keeps PCA/t-SNE\n  available.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fturhancan97%2Fcell_size","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fturhancan97%2Fcell_size","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fturhancan97%2Fcell_size/lists"}