https://github.com/constantinpape/cluster_tools
Distributed segmentation for bio-image-analysis
- Host: GitHub
- URL: https://github.com/constantinpape/cluster_tools
- Owner: constantinpape
- License: mit
- Created: 2018-01-20T17:09:39.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2025-05-13T06:21:51.000Z (9 months ago)
- Last Synced: 2025-05-13T06:27:46.178Z (9 months ago)
- Topics: 3d-segmentation, bio-image-analysis, cluster-computing, connectomics, lifted-multicut, microscopy-images, multicut, mutex-watershed, segmentation, watershed
- Language: Python
- Homepage:
- Size: 1.71 MB
- Stars: 38
- Watchers: 8
- Forks: 14
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[cluster_tools on conda-forge](https://anaconda.org/conda-forge/cluster_tools)
# Cluster Tools
Workflows for distributed Bio Image Analysis and Segmentation.
Supports Slurm, LSF and local execution; easy to extend to other scheduling systems.
## Workflows
- [Hierarchical Multicut](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w1/Pape_Solving_Large_Multicut_ICCV_2017_paper.pdf) / [Hierarchical lifted Multicut](https://arxiv.org/abs/1905.10535)
- Distance Transform Watersheds
- Region Adjacency Graph
- Edge Feature Extraction from Boundary or Affinity Maps
- Agglomeration via (lifted) Multicut
- [Sparse lifted Multicut from biological priors](https://arxiv.org/abs/1905.10535)
- [Mutex Watershed](https://link.springer.com/chapter/10.1007/978-3-030-01225-0_34)
- Connected Components
- Downscaling and Pyramids
- [Paintera Format](https://github.com/saalfeldlab/paintera)
- [BigDataViewer Format](https://imagej.net/BigDataViewer)
- [Bigcat Format](https://github.com/saalfeldlab/bigcat)
- [Ilastik Prediction](https://www.ilastik.org/)
- Skeletonization
- Distributed Neural Network Prediction (originally implemented [here](https://github.com/constantinpape/simpleference))
- Validation with Rand Index and Variation of Information
## Installation
You can install the package via conda:
```
conda install -c conda-forge cluster_tools
```
To set up a development environment with all necessary dependencies, you can use the `environment.yml` file:
```
conda env create -f environment.yml
```
and then install the package in development mode via
```
pip install -e . --no-deps
```
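To check that the installation worked, a minimal import test is sufficient (this assumes nothing beyond the package name):
```
python -c "import cluster_tools"
```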
## Citation
If you use this software in a publication, please cite
```
Pape, Constantin, et al. "Solving large multicut problems for connectomics via domain decomposition." Proceedings of the IEEE International Conference on Computer Vision. 2017.
```
For the lifted multicut workflows, please cite
```
Pape, Constantin, et al. "Leveraging Domain Knowledge to improve EM image segmentation with Lifted Multicuts." arXiv preprint. 2019.
```
You can find code for the experiments in `publications/lifted_domain_knowledge`.
If you are using another algorithm that is not part of these two publications, please also cite the appropriate publication ([see the links here](https://github.com/constantinpape/cluster_tools#workflows)).
## Getting Started
This repository uses [luigi](https://github.com/spotify/luigi) for workflow management.
We support different cluster schedulers; so far these are:
- [`slurm`](https://slurm.schedmd.com/documentation.html)
- [`lsf`](https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_welcome/lsf_kc_ss.html)
- `local` (local execution based on `ProcessPool`)
The scheduler is selected via the keyword argument `target`.
Inter-process communication is achieved through files stored in a temporary folder, and
most workflows use [n5](https://github.com/saalfeldlab/n5) storage. You can use [z5](https://github.com/constantinpape/z5) to convert files to this format from Python.
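For example, converting an existing HDF5 volume to n5 could look like the following sketch (a minimal example assuming `h5py` and `z5py` are installed; the paths, dataset keys and chunk shape are placeholders):
```py
import h5py
import z5py

# read the source volume from hdf5 (placeholder path and dataset key)
with h5py.File('/path/to/raw.h5', 'r') as f:
    volume = f['data'][:]

# write it to an n5 container; the chunk shape should roughly match the block size
# you intend to use for the distributed processing
f_out = z5py.File('/path/to/raw.n5')
ds = f_out.create_dataset(
    'data', shape=volume.shape, chunks=(64, 64, 64),
    dtype=volume.dtype, compression='gzip'
)
ds[:] = volume
```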
Simplified, running a workflow from this repository looks like this:
```py
import json
import luigi
from cluster_tools import SimpleWorkflow # this is just a mock class, not actually part of this repository
# folder for temporary scripts and files
tmp_folder = 'tmp_wf'
# directory for configurations for workflow sub-tasks stored as json
config_dir = 'configs'
# get the default configurations for all sub-tasks
default_configs = SimpleWorkflow.get_config()
# the global config sets the shebang pointing to a python interpreter with all dependencies installed,
# as well as the group name and the block shape
global_config = default_configs['global']
shebang = '#! /path/to/bin/python'
global_config.update({'shebang': shebang, 'groupname': 'mygroup'})
with open('configs/global.config', 'w') as f:
    json.dump(global_config, f)
# run the example workflow with `max_jobs` number of jobs
max_jobs = 100
task = SimpleWorkflow(tmp_folder=tmp_folder, config_dir=config_dir,
                      target='slurm', max_jobs=max_jobs,
                      input_path='/path/to/input.n5', input_key='data',
                      output_path='/path/to/output.n5', output_key='data')
luigi.build([task])
```
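The sub-tasks of a workflow are configured in the same way as the global settings: each entry of `default_configs` corresponds to one sub-task, and a json file named `<task_name>.config` in the config directory overrides its defaults. A minimal sketch continuing the example above, assuming the mock `SimpleWorkflow` has a sub-task called `block_task` (the actual task names and config keys depend on the workflow; check the entries returned by `get_config()`):
```py
import json
import os

# pick one sub-task from the defaults and adjust its settings;
# 'block_task' is a hypothetical name and the keys below are typical resource
# settings - verify both against the entries of `default_configs`
task_config = default_configs['block_task']
task_config.update({'threads_per_job': 4, 'time_limit': 60})

with open(os.path.join(config_dir, 'block_task.config'), 'w') as f:
    json.dump(task_config, f)
```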
For a list of the available segmentation workflows, have a look at [workflows.py](https://github.com/constantinpape/cluster_tools/blob/master/cluster_tools/workflows.py).
Unfortunately, there is no proper documentation yet. For more details, have a look at the
[examples](https://github.com/constantinpape/cluster_tools/blob/master/example), in particular
[this example](https://github.com/constantinpape/cluster_tools/blob/master/example/multicut.py).
You can download the example data (also used for the tests) [here](https://drive.google.com/file/d/1E_Wpw9u8E4foYKk7wvx5RPSWvg_NCN7U/view?usp=sharing).