https://github.com/saalfeldlab/stitching-spark
Reconstruct big images from overlapping tiled images on a Spark cluster.
- Host: GitHub
- URL: https://github.com/saalfeldlab/stitching-spark
- Owner: saalfeldlab
- License: gpl-2.0
- Created: 2017-02-03T21:07:53.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2023-10-17T14:27:21.000Z (about 1 year ago)
- Last Synced: 2023-10-17T15:58:29.118Z (about 1 year ago)
- Language: Java
- Size: 2.4 MB
- Stars: 27
- Watchers: 12
- Forks: 8
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-janelia-software - Stitching Spark - Reconstruct large images from overlapping tiles on a Spark cluster (ImgLib2, BDV and N5 Ecosystem)
README
# stitching-spark
Reconstructing large microscopy images from overlapping image tiles on a high-performance Spark cluster. The code is based on the [Stitching plugin for Fiji](https://github.com/fiji/Stitching).
## Usage
### 1. Building the package
Clone the repository with submodules:
```bash
git clone --recursive https://github.com/saalfeldlab/stitching-spark.git
```

If you have already cloned the repository, fetch the submodules with:
```bash
git submodule update --init --recursive
```

The application can be executed on the Janelia cluster or locally. Build the package for the desired execution environment:
Compile for running on Janelia cluster
```bash
python build.py
```

Compile for running on local machine
```bash
python build-spark-local.py
```
The scripts for starting the application are located under `startup-scripts/spark-janelia` and `startup-scripts/spark-local`, and their usage is explained in the next steps.
If running locally, you can access the Spark job tracker at http://localhost:4040/ to monitor the progress of the tasks.
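For scripted monitoring, Spark also serves its standard monitoring REST API under `/api/v1` on the same port. The sketch below polls it with the Python standard library; it is not part of stitching-spark, the helper names are hypothetical, and the summary assumes the `status`, `numTasks` and `numCompletedTasks` fields of Spark's job records.

```python
import json
from urllib.request import urlopen

SPARK_UI = "http://localhost:4040"  # default UI address of a local Spark driver


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize_jobs(jobs):
    """Return (completed tasks, total tasks, ids of running jobs) for a list of Spark job records."""
    total = sum(job.get("numTasks", 0) for job in jobs)
    done = sum(job.get("numCompletedTasks", 0) for job in jobs)
    running = [job["jobId"] for job in jobs if job.get("status") == "RUNNING"]
    return done, total, running


def report_progress(base_url=SPARK_UI):
    """Print a one-line progress summary for every application known to this Spark UI."""
    for app in fetch_json(f"{base_url}/api/v1/applications"):
        jobs = fetch_json(f"{base_url}/api/v1/applications/{app['id']}/jobs")
        done, total, running = summarize_jobs(jobs)
        print(f"{app['name']}: {done}/{total} tasks completed, running jobs: {running}")
```

Call `report_progress()` while a local job is running; it raises a `URLError` if nothing is listening on port 4040.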
#### If running on public platforms such as AWS or Google Cloud:
* Compile with `python build.py`. This embeds the required dependencies into the final package, except for Spark, which is provided by the target platform at runtime.
* For running the pipeline, refer to the wiki pages [Running on AWS](https://github.com/saalfeldlab/stitching-spark/wiki/Running-on-Amazon-Web-Services) and [Running on Google Cloud](https://github.com/saalfeldlab/stitching-spark/wiki/Running-on-Google-Cloud)
* The currently used Spark version is **2.3.1**. Make sure you request the same version when submitting a job.

### 2. Preparing input tile configuration files
The application uses JSON to store metadata about tile configurations. These metadata include:
* list of all tiles in the acquisition grouped by channel
* position and size of each tile in pixels
* pixel resolution (physical size of the pixel/voxel) in microns
* image data type

Example of the JSON metadata file (one JSON file per image channel):
*ch0.json:*
```json
[
{
"index" : 0,
"file" : "FCF_CSMH__54383_20121206_35_C3_zb15_zt01_63X_0-0-0_R1_L086_20130108192758780.lsm.tif",
"position" : [0.0, 0.0, 0.0],
"size" : [991, 992, 880],
"pixelResolution" : [0.097,0.097,0.18],
"type" : "GRAY16"
},
{
"index" : 1,
"file" : "FCF_CSMH__54383_20121206_35_C3_zb15_zt01_63X_0-0-0_R1_L087_20130108192825183.lsm.tif",
"position" : [700, -694.0887500300357, -77.41783189603937],
"size" : [991, 992, 880],
"pixelResolution" : [0.097,0.097,0.18],
"type" : "GRAY16"
}
]
```

The application provides automated converters for commonly used formats, but in the general case the input metadata file needs to be converted or generated, depending on how you store metadata for your acquisitions.
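If no converter exists for your format, the metadata can be generated with a short script. The sketch below writes one channel's configuration in the format shown above for a hypothetical regular grid of tiles; the filename pattern, the grid layout and the fixed overlap in pixels are assumptions, and real positions should come from your acquisition metadata.

```python
import json


def make_tile(index, filename, position, size, pixel_resolution, data_type="GRAY16"):
    """Build one tile entry with the fields used in the example configuration above."""
    return {
        "index": index,
        "file": filename,
        "position": [float(p) for p in position],                 # in pixels
        "size": [int(s) for s in size],                           # in pixels
        "pixelResolution": [float(r) for r in pixel_resolution],  # microns per pixel/voxel
        "type": data_type,
    }


def grid_tiles(rows, cols, size, overlap_px, pixel_resolution, name_pattern="tile_r{row}_c{col}.tif"):
    """Hypothetical helper: nominal positions for a rows x cols grid with a fixed XY overlap in pixels."""
    step_x = size[0] - overlap_px
    step_y = size[1] - overlap_px
    tiles = []
    for row in range(rows):
        for col in range(cols):
            tiles.append(make_tile(
                index=len(tiles),
                filename=name_pattern.format(row=row, col=col),
                position=(col * step_x, row * step_y, 0),
                size=size,
                pixel_resolution=pixel_resolution,
            ))
    return tiles


def write_channel_config(path, tiles):
    """Write one channel's tile list as a JSON array (one file per channel)."""
    with open(path, "w") as f:
        json.dump(tiles, f, indent=2)
```

For example, `write_channel_config("ch0.json", grid_tiles(2, 3, (991, 992, 880), 100, (0.097, 0.097, 0.18)))` writes six tiles for a hypothetical 2 x 3 grid.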
#### Zeiss Z1
The parser requires an .mvl metadata file. Image tiles can be stored either in separate .czi files (one 4D image file per tile that includes all channels) or in a single .czi file.

Run on Janelia cluster
```
spark-janelia/parse-zeiss-z1-metadata.py \
-i \
-b \
-f \
-r
```

Run on local machine
```
spark-local/parse-zeiss-z1-metadata.py \
-i \
-b \
-f \
-r
```

This will create a single `tiles.json` file in the same directory as the existing .mvl file. Separate JSON tile configuration files for each channel will be created in the next step, when the images are split into channels during conversion.
#### ImageList.csv objective-scan acquisitions
The *ImageList.csv* metadata file lists image tile filenames in all channels and contains the stage and objective coordinates of each tile. Image tiles are expected to be stored as .tif files (separate files for each channel).

Objective coordinates from the metadata file are converted into pixel coordinates based on the provided physical size of a voxel (in microns) and an axis mapping that specifies flips and swaps between the two coordinate systems. For example, `-y,x,z` means that the objective X and Y axes are swapped and the objective Y axis is flipped.
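The conversion can be sketched as follows. The sketch assumes that the i-th entry of the mapping names the objective axis that feeds pixel axis i, that a leading `-` flips it, and that each pixel axis is scaled by its own voxel size; this reading matches the `-y,x,z` example but is not taken from the parser itself, and the function names are hypothetical.

```python
AXES = {"x": 0, "y": 1, "z": 2}


def parse_axis_mapping(mapping):
    """Parse e.g. "-y,x,z" into one (sign, source objective axis) pair per pixel axis."""
    pairs = []
    for token in mapping.split(","):
        token = token.strip().lower()
        sign = -1 if token.startswith("-") else 1
        pairs.append((sign, AXES[token.lstrip("+-")]))
    if sorted(src for _, src in pairs) != [0, 1, 2]:
        raise ValueError(f"axis mapping must be a permutation of x,y,z: {mapping!r}")
    return pairs


def objective_to_pixels(objective_um, voxel_size_um, mapping):
    """Convert objective coordinates (microns) into pixel coordinates using the axis mapping."""
    return [
        sign * objective_um[src] / voxel_size_um[dst]
        for dst, (sign, src) in enumerate(parse_axis_mapping(mapping))
    ]
```

With this reading, `objective_to_pixels([10.0, 20.0, 3.0], [0.5, 0.5, 1.0], "-y,x,z")` returns `[-40.0, 20.0, 3.0]`: pixel X comes from the flipped objective Y, and pixel Y from objective X.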
Run on Janelia cluster
```
spark-janelia/parse-imagelist-metadata.py \
-i \
-b \
-r \
-a \
[--skipMissingTiles to exclude non-existing tile images from configuration instead of raising an error]
```

Run on local machine
```
spark-local/parse-imagelist-metadata.py \
-i \
-b \
-r \
-a \
[--skipMissingTiles to exclude non-existing tile images from configuration instead of raising an error]
```

This will create a number of JSON configuration files (one per channel), named `488nm.json`, `560nm.json`, etc. if the corresponding laser wavelength can be parsed from the image filenames, or simply `c0.json`, `c1.json`, etc. otherwise.
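The naming rule can be illustrated with a small sketch. The `(\d{3})nm` filename pattern and the choice to fall back to indices for all channels when any wavelength is missing or duplicated are assumptions, not the parser's documented behavior.

```python
import re

WAVELENGTH = re.compile(r"(\d{3})nm")  # hypothetical filename convention, e.g. "..._488nm_..."


def channel_config_names(filenames):
    """Name one config per channel, given one representative image filename per channel."""
    matches = [WAVELENGTH.search(name) for name in filenames]
    wavelengths = [m.group(1) for m in matches if m]
    # Use wavelengths only if every channel has a distinct one; otherwise fall back to indices.
    if len(wavelengths) == len(filenames) and len(set(wavelengths)) == len(wavelengths):
        return [f"{w}nm.json" for w in wavelengths]
    return [f"c{i}.json" for i in range(len(filenames))]
```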
### 3. Conversion of image tiles into N5
The application requires all tiles in the acquisition to be converted into [N5](https://github.com/saalfeldlab/n5), which makes processing in the next steps faster. This also makes it easier to work with different image file formats.
#### .czi (Zeiss Z1) -> N5:
Run on Janelia cluster
```
spark-janelia/convert-czi-tiles-n5.py \
\
-i \
[--blockSize to override the default block size 128,128,64]
```

Run on local machine
```
spark-local/convert-czi-tiles-n5.py \
-i \
[--blockSize to override the default block size 128,128,64]
```

This will convert the images into N5 and create new tile configuration files that correspond to the converted tiles. The new configuration files will be named `c0-n5.json`, `c1-n5.json`, etc. and should be used as inputs in the next steps.
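To get a feel for what `--blockSize` controls, the sketch below counts how many N5 blocks one tile is split into. It uses the tile size from the `ch0.json` example and the default block size 128,128,64, and assumes that partial blocks at the tile border count as full blocks; the helper names are hypothetical.

```python
import math

DEFAULT_BLOCK_SIZE = (128, 128, 64)  # default value of --blockSize


def parse_block_size(text):
    """Parse a --blockSize style value such as "128,128,64"."""
    return tuple(int(v) for v in text.split(","))


def blocks_per_tile(tile_size, block_size=DEFAULT_BLOCK_SIZE):
    """Blocks along each axis (border blocks may be partial) and the total number of blocks."""
    per_axis = [math.ceil(s / b) for s, b in zip(tile_size, block_size)]
    return per_axis, math.prod(per_axis)
```

For a 991 x 992 x 880 tile, `blocks_per_tile((991, 992, 880))` returns `([8, 8, 14], 896)`; halving every block dimension multiplies the block count roughly eightfold.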
#### .tif -> N5:
Run on Janelia cluster
```
spark-janelia/convert-tiff-tiles-n5.py \
\
-i 488nm.json -i 560nm.json ... \
[--blockSize to override the default block size 128,128,64]
```

Run on local machine
```
spark-local/convert-tiff-tiles-n5.py \
-i 488nm.json -i 560nm.json ... \
[--blockSize to override the default block size 128,128,64]
```

This will convert the images into N5 and create new tile configuration files that correspond to the converted tiles. The new configuration files will be named `488nm-n5.json`, `560nm-n5.json`, etc. and should be used as inputs in the next steps.
### 4. Flatfield estimation
Run on Janelia cluster
```
spark-janelia/flatfield.py -i 488nm-n5.json -i 560nm-n5.json ...
```

Run on local machine
```
spark-local/flatfield.py -i 488nm-n5.json -i 560nm-n5.json ...
```

This will create a folder for each channel, such as `488nm-flatfield/`, next to the provided input files. After the application finishes, it will store two files, `S.tif` and `T.tif`, in each of the created folders (the brightfield and the offset, respectively).
The next steps will detect the flatfield folder and automatically use the estimated flatfields to perform flatfield correction on the fly.

The full list of available parameters for the flatfield script is available [here](https://github.com/saalfeldlab/stitching-spark/wiki/Flatfield-parameters).
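The exact correction formula is not spelled out here. A common model, assumed in the sketch below, treats each observed pixel as `S * true + T`, with the per-pixel brightfield `S` and offset `T`, so the corrected value is `(observed - T) / S`. Plain nested lists stand in for the images stored in `S.tif` and `T.tif`, and the function name is hypothetical.

```python
def correct_flatfield(image, scale, offset):
    """Assumed model: observed = scale * true + offset, hence true = (observed - offset) / scale."""
    corrected = []
    for row_v, row_s, row_t in zip(image, scale, offset):
        # Guard against zero scale values (e.g. masked pixels) instead of dividing by zero.
        corrected.append([(v - t) / s if s != 0 else 0.0 for v, s, t in zip(row_v, row_s, row_t)])
    return corrected
```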
### 5. Stitching
Run on Janelia cluster
```
spark-janelia/stitch.py -i 488nm-n5.json -i 560nm-n5.json ...
```

Run on local machine
```
spark-local/stitch.py -i 488nm-n5.json -i 560nm-n5.json ...
```

This will run the stitching, performing iterations until the solution can no longer be improved. The image channels are averaged on the fly before computing pairwise shifts, which yields higher correlations because the combined signal is denser.
As a result, it will create files `488nm-n5-final.json`, `560nm-n5-final.json`, etc. near the input tile configuration files.
It will also store a file named `optimizer.txt` containing statistics on the average and maximum errors, the number of retained tiles and edges in the final graph, and the cross-correlation and variance threshold values that were used to obtain the final solution.

The current stitching method is iterative and translation-based (it improves the solution by building a prediction model).
A pipeline incorporating a higher-order model is currently under development in the `split-tiles` branch.

The full list of available parameters for the stitch script is available [here](https://github.com/saalfeldlab/stitching-spark/wiki/Stitching-parameters).
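Why averaging channels before computing shifts helps can be illustrated with a simplified 1D sketch: the channels of each tile are averaged pixel-wise, and the integer shift with the highest Pearson correlation over the overlap is selected. The brute-force search and all names are assumptions for illustration only; the Fiji Stitching plugin this code is based on uses phase correlation, and the real pipeline works on 3D images.

```python
from statistics import fmean


def average_channels(channels):
    """Pixel-wise mean across channels (each channel is a flat list of equal length)."""
    return [fmean(values) for values in zip(*channels)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = fmean(a), fmean(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return num / den if den > 0 else 0.0


def best_shift(ref, mov, max_shift, min_overlap=3):
    """Find the integer shift s (mov[i] aligned with ref[i + s]) that maximizes correlation."""
    best_s, best_r = None, -1.0
    for s in range(-max_shift, max_shift + 1):
        pairs = [(ref[i + s], mov[i]) for i in range(len(mov)) if 0 <= i + s < len(ref)]
        if len(pairs) < min_overlap:
            continue
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r > best_r:
            best_s, best_r = s, r
    return best_s, best_r
```

Averaging first means a single correlation is computed on the combined signal per tile pair, rather than one per channel on sparser data.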
### 6. Exporting
Run on Janelia cluster
```
spark-janelia/export.py -i 488nm-n5-final.json -i 560nm-n5-final.json ...
```

Run on local machine
```
spark-local/export.py -i 488nm-n5-final.json -i 560nm-n5-final.json ...
```

This will generate an [N5](https://github.com/saalfeldlab/n5) export in the `export.n5/` folder. The export is fully compatible with [N5 Viewer](https://github.com/saalfeldlab/n5-viewer) and the [N5 Fiji plugin](https://github.com/saalfeldlab/n5-ij) for browsing.
The most common optional parameters are:
* `--blending`: smooths transitions between the tiles instead of making a hard cut between them
* `--fill`: fills the extra space with the background intensity value instead of 0

The full list of available parameters for the export script is available [here](https://github.com/saalfeldlab/stitching-spark/wiki/Export-parameters).
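The blending option is described only qualitatively. One common way to smooth seams, assumed in the 1D sketch below, is linear feathering: each tile's contribution is weighted by the pixel's distance to that tile's border, and positions covered by no tile receive a background value, similar in spirit to `--fill`. The names and the weighting function are illustrative and not the export's actual implementation.

```python
def border_weight(i, length):
    """Distance to the nearest tile edge, plus 1 so that edge pixels still get a nonzero weight."""
    return min(i, length - 1 - i) + 1


def blend_1d(tiles, out_len, background=0.0):
    """Feather-blend (offset, values) tiles into an output row of length out_len."""
    acc = [0.0] * out_len
    weight_sum = [0.0] * out_len
    for offset, values in tiles:
        for i, v in enumerate(values):
            x = offset + i
            if 0 <= x < out_len:
                w = border_weight(i, len(values))
                acc[x] += w * v
                weight_sum[x] += w
    # Uncovered positions get the background value instead of 0.
    return [a / w if w > 0 else background for a, w in zip(acc, weight_sum)]
```

In the overlap of two tiles, the output ramps gradually from one tile's intensity to the other's instead of jumping at a hard cut.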
### 7. Converting N5 export to slice TIFF
Run on Janelia cluster
```
spark-janelia/n5-slice-tiff.py \
\
-i \
[-s ] \
[--compress to enable LZW compression. Can be much slower than uncompressed] \
[--sameDir to prepend all filenames with channel and store all slices in the same directory] \
[--leadingZeroes to pad slice indices in filenames with leading zeroes]
```

Run on local machine
```
spark-local/n5-slice-tiff.py \
-i \
[-s ] \
[--compress to enable LZW compression. Can be much slower than uncompressed] \
[--sameDir to prepend all filenames with channel and store all slices in the same directory] \
[--leadingZeroes to pad slice indices in filenames with leading zeroes]
```

This will create a directory such as `slice-tiff-s0`, named after the requested scale level index, where the stitched N5 export will be converted into a series of slice TIFF images. The resulting TIFF images are XY slices.
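Leading zeroes matter for tools that load slices in filename order. The hypothetical helper below (the `ch{channel}_` prefix and `.tif` naming are assumptions, not the script's actual output pattern) shows how padding indices to the width of the largest index keeps lexicographic and numeric order identical.

```python
def slice_filenames(num_slices, channel=None, leading_zeroes=False):
    """Hypothetical filenames for XY slices 0..num_slices-1."""
    width = len(str(num_slices - 1)) if leading_zeroes else 0
    # Optional channel prefix, in the spirit of storing all channels in one directory.
    prefix = f"ch{channel}_" if channel is not None else ""
    return [f"{prefix}{str(z).zfill(width)}.tif" for z in range(num_slices)]
```

Without padding, a name-sorted listing places `10.tif` before `2.tif`; with padding, `02.tif` sorts before `10.tif` as expected.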