https://github.com/genentech/spex_spatial_transcriptomics

Last synced: 11 months ago
JSON representation
Host: GitHub
URL: https://github.com/genentech/spex_spatial_transcriptomics
Owner: Genentech
License: other
Created: 2024-03-08T20:17:04.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-03-17T03:16:07.000Z (over 1 year ago)
Last Synced: 2025-03-17T04:25:58.179Z (over 1 year ago)
Language: Jupyter Notebook
Size: 9.48 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project

README

          # Documentation: File and Folder Structure

Below is a description of the general file and folder structure in the project, along with how conda environments can be used, and how each element is typically utilized. **All executable scripts for stages should be named `app.py`** to maintain consistency.

## Project Structure

- **Project root**

    - `manifest.json` — The main project manifest (if used). It may contain common settings or link individual pipeline stages.

    - Other files not related to a specific stage.

- **Stage folders** (for example, `load_anndata`, `clustering`, `dimensionality_reduction`)

    1. **`manifest.json`** (inside the stage folder)

        - Describes the key parameters required by this stage.

        - Contains the stage name, its description, execution order (`stage`), types and requirements of input parameters (`params`), and information about returned data (`return`).

        - May specify dependencies (e.g., `depends_and_script` or `depends_or_script`) and the environments used (`conda`, `libs`, `conda_pip`).

            - **Conda usage**: If `conda` is specified, the system creates or uses a conda environment with the requested Python version and installs the listed libraries (`libs` via conda, `conda_pip` via pip in that conda environment). This isolation helps avoid library conflicts.

    2. **`app.py`** (the executable script for this stage)

        - This file name should **always** be `app.py` to maintain a consistent structure.

        - It contains the core business logic: reading data, transforming it, analyzing it, and producing output.

        - Typically, it defines a function (often `run(**kwargs)`) that:

            1. **Imports the necessary dependencies** (e.g., `scanpy`, `scvi`, `numpy`).

            2. **Reads parameters** from `kwargs`, which are provided from `manifest.json` (e.g., file path, analysis method, metrics).

            3. **Calls a helper function** or a series of functions that perform the main logic (e.g., data loading, clustering, dimensionality reduction, etc.).

            4. **Returns the result** in the format described in the manifest (usually a dictionary where the keys match the fields in `return`).

### Example `app.py` Structure

1. **Import libraries**

   ```python

   import scanpy as sc

   import numpy as np

   import pandas as pd

   # ...

   ```

2. **Define helper functions** (e.g., `reduce_dimensionality`, `cluster`, `load_data`)

   ```python

   def reduce_dimensionality(adata, method='pca', ...):

       # Dimensionality reduction logic

       return adata

   ```

3. **`run(**kwargs)` function**

   ```python

   def run(**kwargs):

       # Read arguments

       adata = kwargs.get('adata')

       method = kwargs.get('method', 'pca')

       # ...

       # Call a helper function

       out = reduce_dimensionality(adata, method=method)

       # Return the result

       return { 'adata': out }

   ```

## Main Purpose

1. **`manifest.json`** in each folder:

    - Defines which parameters the stage requires and what data it returns.

    - Specifies the execution order in the pipeline.

    - Allows you to determine which libraries (conda or pip) are needed for the stage.

    - May include version constraints for packages.

    - **Conda Environments**: If `conda` is specified, the system will create/use the indicated environment (for example, `python=3.11`) and install the specified libraries.

2. **`app.py`**:

    - Performs the main work — processes data using parameters obtained from `manifest.json`.

    - Produces output that subsequent stages can access.

    - Has a structure consisting of several steps:

        - Imports

        - Helper functions

        - `run(**kwargs)` function — the entry point.

## Example Project Structure

```text

project_root/

├── manifest.json               # Main (root) manifest, if present

├── load_anndata/

│   ├── manifest.json           # Manifest for the loading stage

│   └── app.py                  # Script performing data loading

├── clustering/

│   ├── manifest.json           # Manifest for the clustering stage

│   └── app.py                  # Script for clustering data

├── dimensionality_reduction/

│   ├── manifest.json           # Manifest for the dimensionality reduction stage

│   └── app.py                  # Script performing the analysis

└── other_folders_or_files      # Other files/folders in the project

```

## Usage Recommendations

- Store a maximum of one stage in **each folder** (with its own `manifest.json` and `app.py`).

- The **main manifest** can set the overall pipeline logic or serve as the entry point for the entire system.

- Each `app.py` should be as focused as possible, making the stage easier to test, modify, and reuse.

- Parameters in `manifest.json` should be described in as much detail as possible so that users understand what is required as input and what will be returned as output.

- **Conda Environments**: When `conda` is specified, each stage can be isolated in its own environment to avoid library version conflicts across different scripts.

# Documentation: File and Folder Structure

Below is a description of the general file and folder structure in the project, along with how conda environments can be used, and how each element is typically utilized. **All executable scripts for stages should be named `app.py`** to maintain consistency.

## Project Structure

- **Project root**

    - `manifest.json` — The main project manifest (if used). It may contain common settings or link individual pipeline stages.

    - Other files not related to a specific stage.

- **Stage folders** (for example, `load_anndata`, `clustering`, `dimensionality_reduction`)

    1. **`manifest.json`** (inside the stage folder)

        - Describes the key parameters required by this stage.

        - Contains the stage name, its description, execution order (`stage`), types and requirements of input parameters (`params`), and information about returned data (`return`).

        - May specify dependencies (e.g., `depends_and_script` or `depends_or_script`) and the environments used (`conda`, `libs`, `conda_pip`).

            - **Conda usage**: If `conda` is specified, the system creates or uses a conda environment with the requested Python version and installs the listed libraries (`libs` via conda, `conda_pip` via pip in that conda environment). This isolation helps avoid library conflicts.

    2. **`app.py`** (the executable script for this stage)

        - This file name should **always** be `app.py` to maintain a consistent structure.

        - It contains the core business logic: reading data, transforming it, analyzing it, and producing output.

        - Typically, it defines a function (often `run(**kwargs)`) that:

            1. **Imports the necessary dependencies** (e.g., `scanpy`, `scvi`, `numpy`).

            2. **Reads parameters** from `kwargs`, which are provided from `manifest.json` (e.g., file path, analysis method, metrics).

            3. **Calls a helper function** or a series of functions that perform the main logic (e.g., data loading, clustering, dimensionality reduction, etc.).

            4. **Returns the result** in the format described in the manifest (usually a dictionary where the keys match the fields in `return`).

### Example `app.py` Structure

1. **Import libraries**

   ```python

   import scanpy as sc

   import numpy as np

   import pandas as pd

   # ...

   ```

2. **Define helper functions** (e.g., `reduce_dimensionality`, `cluster`, `load_data`)

   ```python

   def reduce_dimensionality(adata, method='pca', ...):

       # Dimensionality reduction logic

       return adata

   ```

3. **`run(**kwargs)` function**

   ```python

   def run(**kwargs):

       # Read arguments

       adata = kwargs.get('adata')

       method = kwargs.get('method', 'pca')

       # ...

       # Call a helper function

       out = reduce_dimensionality(adata, method=method)

       # Return the result

       return { 'adata': out }

   ```

## Main Purpose

1. **`manifest.json`** in each folder:

    - Defines which parameters the stage requires and what data it returns.

    - Specifies the execution order in the pipeline.

    - Allows you to determine which libraries (conda or pip) are needed for the stage.

    - May include version constraints for packages.

    - **Conda Environments**: If `conda` is specified, the system will create/use the indicated environment (for example, `python=3.11`) and install the specified libraries.

2. **`app.py`**:

    - Performs the main work — processes data using parameters obtained from `manifest.json`.

    - Produces output that subsequent stages can access.

    - Has a structure consisting of several steps:

        - Imports

        - Helper functions

        - `run(**kwargs)` function — the entry point.

## Example Project Structure

```text

project_root/

├── manifest.json               # Main (root) manifest, if present

├── load_anndata/

│   ├── manifest.json           # Manifest for the loading stage

│   └── app.py                  # Script performing data loading

├── clustering/

│   ├── manifest.json           # Manifest for the clustering stage

│   └── app.py                  # Script for clustering data

├── dimensionality_reduction/

│   ├── manifest.json           # Manifest for the dimensionality reduction stage

│   └── app.py                  # Script performing the analysis

└── other_folders_or_files      # Other files/folders in the project

```

## Usage Recommendations

- Store a maximum of one stage in **each folder** (with its own `manifest.json` and `app.py`).

- The **main manifest** can set the overall pipeline logic or serve as the entry point for the entire system.

- Each `app.py` should be as focused as possible, making the stage easier to test, modify, and reuse.

- Parameters in `manifest.json` should be described in as much detail as possible so that users understand what is required as input and what will be returned as output.

- **Conda Environments**: When `conda` is specified, each stage can be isolated in its own environment to avoid library version conflicts across different scripts.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/genentech/spex_spatial_transcriptomics

Awesome Lists containing this project

README