{"id":50193897,"url":"https://github.com/GoekeLab/bambu-pipe","last_synced_at":"2026-06-11T08:00:36.870Z","repository":{"id":239521990,"uuid":"743357700","full_name":"GoekeLab/bambu-pipe","owner":"GoekeLab","description":"Transcript discovery and quantification for long read single cell and spatial transcriptomics data using Bambu","archived":false,"fork":false,"pushed_at":"2026-06-11T02:45:27.000Z","size":123645,"stargazers_count":22,"open_issues_count":10,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-06-11T04:17:33.600Z","etag":null,"topics":["bambu","genomics","isoform-quantification","long-read-rna-seq","long-read-sequencing","nanopore","nextflow","pacbio","pipeline","rna-seq","single-cell","spatial-transcriptomics","transcript-discovery","transcript-quantification","transcriptomics"],"latest_commit_sha":null,"homepage":"","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GoekeLab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-01-15T03:45:59.000Z","updated_at":"2026-05-29T02:05:19.000Z","dependencies_parsed_at":"2024-05-28T12:10:18.132Z","dependency_job_id":"20d51321-aa47-4015-8d2a-1addeeb301d8","html_url":"https://github.com/GoekeLab/bambu-pipe","commit_stats":null,"previous_names":["goekelab/bambu-singlecell-spatial","goekelab/bambu-pipe"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/GoekeLab/bambu-pipe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoekeLab%2Fbambu-pipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoekeLab%2Fbambu-pipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoekeLab%2Fbambu-pipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoekeLab%2Fbambu-pipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GoekeLab","download_url":"https://codeload.github.com/GoekeLab/bambu-pipe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoekeLab%2Fbambu-pipe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34188272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bambu","genomics","isoform-quantification","long-read-rna-seq","long-read-sequencing","nanopore","nextflow","pacbio","pipeline","rna-seq","single-cell","spatial-transcriptomics","transcript-discovery","transcript-quantification","transcriptomics"],"created_at":"2026-05-25T16:00:41.942Z","updated_at":"2026-06-11T08:00:36.819Z","avatar_url":"https://github.com/GoekeLab.png","language":"Nextflow","funding_links":[],"categories":["Pipelines"],"sub_categories":["Simulation"],"readme":"# **Context-Aware Transcript Quantification from Long-Read Single-Cell and Spatial Transcriptomics Data**\n### **Content**\n- [Overview](#overview)\n- [Installation](#installation)\n- [General Usage](#general-usage)\n- [Parameters](#parameters)\n- [Output](#output)\n- [Spatial Analysis](#spatial-analysis)\n- [Advanced Usage](#advanced-usage)\n- [Additional Information](#additional-information)\n- [Release History](#release-history)\n- [Citation](#citation)\n- [Contributors](#contributors)\n\n\n### **Overview**\n\n![Pipeline Overview](figures/metro.svg)\nThis pipeline performs context-aware transcript discovery and quantification from long-read single-cell and spatial transcriptomics data. The workflow is divided into three stages:\n\n**Preprocessing**\n1. (Optional) Quality score filtering with [Chopper](https://github.com/wdecoster/chopper)\n2. Barcode/UMI identification and demultiplexing with [Flexiplex](https://davidsongroup.github.io/flexiplex/)\n3. Primer removal with [Cutadapt](https://cutadapt.readthedocs.io/en/stable/)\n\n**Alignment**\n\n4. Genome alignment with [Minimap2](https://lh3.github.io/minimap2/minimap2.html)\n\n**Transcript Discovery and Quantification**\n\n5. Read class construction and transcript discovery with [Bambu](https://github.com/GoekeLab/bambu/tree/BambuDev) (performed jointly across all samples)\n6. (Optional) Transcript quantification with Bambu using one of two modes:\n   - Cluster-level EM: Gene expression-based cell clustering with [Seurat](https://github.com/satijalab/seurat) across all samples, followed by per-sample cluster-level transcript quantification\n   - Single-cell EM: Per-cell transcript quantification \n\n### **Installation** \nInstall the following dependencies before running the pipeline:\n- [Nextflow](https://www.nextflow.io/docs/latest/install.html) ≥ 26.04.0\n- [Docker](https://docs.docker.com/engine/install/ubuntu/) (or [Singularity](https://docs.sylabs.io/guides/3.0/user-guide/installation.html) if you do not have user permissions for Docker). \n\n### **General Usage** \nTo run the pipeline, you must provide a samplesheet, reference genome, and reference annotation file as input. The pipeline performs transcript discovery and quantification on either a single sample or multiple samples based on the number of samples specified in the samplesheet. Refer to the [Parameters](#parameters) and Samplesheet (CSV) sections below for more details. \n\n**Running the pipeline**\n\nUse the command below to run the pipeline on the test data provided in `examples/`\n``` \nnextflow run main.nf \\\n  --input examples/samplesheet_test_sc_fastq.csv \\\n  --genome examples/GRCh38.primary_assembly.genome.chr21.fa.gz \\\n  --annotation examples/gencode.v49.primary_assembly.annotation.chr21.gtf.gz \\\n  -profile singularity,hpc\n``` \n\n**Samplesheet (CSV)**\n\nThe pipeline requires a `.csv` formatted samplesheet to define the input data. This file is mandatory, regardless of the number of samples being processed. Each row in the samplesheet represents a single sample and its corresponding file path and metadata. \n\n*Required Columns*\n\nThe samplesheet must include the following columns:\n- `sample`: sample name (no spaces or non-alphanumeric characters)\n- `path`: path to the input file (`fastq` or `bam`)\n- `chemistry`: 10x library chemistry (see Supported 10x Library Chemistries below)\n- `technology`: sequencing technology (`ONT` or `PacBio`)\n\n\u003e **Note:** The first row of the samplesheet must be a header containing the exact column names: `sample`, `path`, `chemistry`, and `technology`.\n\n*Supported Input Formats*\n\nThe `path` column can point to the following file types:\n- `fastq`: Raw reads (compressed `.gz` or uncompressed)\n- `bam`: Demultiplexed and aligned reads\n\nFor more details on starting the pipeline directly from BAM, please refer to the [Advanced Usage](#advanced-usage) section.\n\n*Example Samplesheet (Single Sample)*\n```csv\nsample,path,chemistry,technology\n10x5v2_ONT_example,examples/10x5v2_ONT_example.fastq.gz,10x5v2,ONT\n```\n\n*Example Samplesheet (Multiple Samples)*\n```csv\nsample,path,chemistry,technology\n10x5v2_ONT_example,examples/10x5v2_ONT_example.fastq.gz,10x5v2,ONT\n10x5v2_PacBio_example,examples/10x5v2_PacBio_example.fastq.gz,10x5v2,PacBio\n10x5v3_ONT_example,examples/10x5v3_ONT_example_demultiplexed.bam,10x5v3,ONT\n```\n\n\u003e **Note:** Example samplesheets are provided in `examples/`. If all samples share the same library chemistry and/or sequencing technology, you may omit the `chemistry` and `technology` columns and use the `--chemistry` and `--technology` flags instead.\n\n\n*Supported 10x Library Chemistries*\n\nFor the following chemistries, the pipeline handles the full workflow — FASTQ preprocessing, genome alignment, and transcript discovery and quantification. Please specify the sample chemistry in the samplesheet as shown:\n- `10x3v2` (Single Cell 3' v2)\n- `10x3v3` (Single Cell 3' v3 \u0026 Next GEM Single Cell 3' v3.1)\n- `10x3v4` (GEM-X Single Cell 3' v4)\n- `10x5v2` (Single Cell 5' v2)\n- `10x5v3` (GEM-X Single Cell 5' v3)\n- `visium-v1` (Visium Spatial Gene Expression Slide 6.5 mm; serial prefix V1)\n- `visium-v2` (Visium Spatial Gene Expression Slide 6.5 mm; serial prefix V2)\n- `visium-v3` (Visium Spatial Gene Expression Slide 6.5 mm; serial prefix V3)\n- `visium-v4` (Visium CytAssist Spatial Gene Expression Slide 6.5 mm; serial prefix V4)\n- `visium-v5` (Visium CytAssist Spatial Gene Expression Slide 11mm; serial prefix V5)\n\n\u003e **Note:** Visium samples must be run one sample at a time. Multi-sample runs are not supported for Visium chemistries.\n\n*Custom Chemistry*\n\nIf your dataset uses a chemistry not listed above, or if you prefer to handle FASTQ preprocessing and genome alignment manually, provide a pre-processed, demultiplexed BAM file as input. See the [Advanced Usage](#advanced-usage) section for details.\n\n**Pipeline Configuration**\n\n*Nextflow Profiles*\n\nTo configure the executor and container, pass profile types via the `-profile` argument.\n\n- Container profiles:\n  - `singularity`: use Singularity images (recommended on HPC systems)\n  - `docker`: use Docker images\n\n- Executor profiles:\n  - `hpc`: execute on an HPC system (default executor: `slurm`; edit `process.executor` in `nextflow.config` to switch to `pbs`, `sge`, etc.)\n  - `local`: execute on a local machine with reduced resource limits — not recommended for full-size datasets\n\n### **Parameters**\n\n**Mandatory**\n- `--input` [string]: Path to the samplesheet `.csv` file \n- `--genome` [string]: Path to the reference genome `.fa`, `.fasta`, or `.fa.gz` file \n- `--annotation` [string]: Path to the reference annotation `.gtf`, `.gff`, `.gtf.gz`, or `.gff.gz` file \n\n**Optional**\n- `--output_dir` [string, default: 'output']: Path to the output directory\n- `--chemistry` [string, default: null]: Specify if all samples in the samplesheet share the same library chemistry \n- `--technology` [string, default: null]: Specify if all samples in the samplesheet share the same sequencing technology\n- `--bam_only` [boolean, default: false]: If true, stops the pipeline after genome alignment and saves BAM files only (see Advanced Usage section)\n- `--qscore_filtering` [boolean, default: true]: Enable or disable quality score filtering of reads\n- `--ndr` [float, default: null]: NDR threshold for Bambu transcript discovery. If not set, Bambu will recommend a suitable value\n- `--deduplicate_umis` [boolean, default: true]: If true, Bambu will perform UMI deduplication \n- `--quantification_mode` [string, default: \"EM_clusters\"]: Quantification mode for transcript counts. Available options are:\n  - \"no_quant\": Transcript quantification is not performed\n  - \"EM\": Performs transcript quantification for each cell/spatial coordinate\n  - \"EM_clusters\": Performs gene expression-based cell clustering using [Seurat](https://satijalab.org/seurat/), followed by transcript quantification at the cluster level\n- `--resolution` [float, default: 0.8]: Seurat clustering resolution\n\n### **Output**\nAll outputs from the pipeline are written to the directory specified by the `--output_dir` parameter. The pipeline produces per-sample alignment files and the combined transcript discovery and quantification results. \n\n*Output Structure*\n```\noutput/\n├── bam/                                \n│   ├── \u003csample\u003e_demultiplexed.bam\n│   └── \u003csample\u003e_demultiplexed.bam.bai\n│    # (one pair per sample for multi-sample runs)  \n│\n├── extended_annotations.gtf\n├── se_unique_counts.rds\n├── se_gene_counts.rds\n│\n│   # single-cell EM:\n├── se_transcript_counts_singlecell.rds\n│\n│   # clustered EM:\n├── seurat_obj.rds\n├── se_transcript_counts_clusters.rds\n├── se_gene_counts_clusters.rds\n│\n├── pipeline_info/\n│   ├── execution_timeline.html\n│   ├── execution_report.html\n│   ├── execution_trace.txt\n│   └── pipeline_dag.svg\n│\n└── software_versions.yml\n```\n\n**Description of the Output Files**\n| File | Description \n|---|---\n| \u003csample_name\u003e_demultiplexed.bam | BAM file containing demultiplexed, trimmed and aligned reads\n| \u003csample_name\u003e_demultiplexed.bam.bai | BAM index for the corresponding BAM file\n| extended_annotations.gtf | A `.gtf` file containing the novel transcripts discovered by Bambu as well as the reference annotations provided by the user.\n| seurat_obj.rds | A [SeuratObject](https://satijalab.github.io/seurat-object/reference/Seurat-class.html) containing normalised counts, PCA embeddings, and cluster assignments. For multi-sample runs, also contains Harmony-integrated embeddings corrected for sequencing technology and capture chemistry. UMAP has not been computed. Only produced when `--quantification_mode` is set to `EM_clusters`.\n| se_unique_counts.rds | A [RangedSummarizedExperiment](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object containing transcript-level unique counts at single-cell resolution, produced prior to EM quantification. Columns follow the `sampleName_barcode` naming convention.\n| se_gene_counts.rds | A RangedSummarizedExperiment object containing gene-level counts at single-cell resolution. Columns follow the `sampleName_barcode` naming convention.\n| se_transcript_counts_singlecell.rds | A RangedSummarizedExperiment object containing per-cell transcript counts after EM quantification. Columns follow the `sampleName_barcode` naming convention. Only produced when `--quantification_mode` is set to `EM`.\n| se_transcript_counts_clusters.rds | A RangedSummarizedExperiment object containing cluster-level transcript counts after EM quantification. Columns follow the `clusterId` naming convention for single-sample runs, and `sampleName_clusterId` for multi-sample runs. Only produced when `--quantification_mode` is set to `EM_clusters`.\n| se_gene_counts_clusters.rds | A RangedSummarizedExperiment object containing cluster-level gene counts. Columns follow the `clusterId` naming convention for single-sample runs, and `sampleName_clusterId` for multi-sample runs. Only produced when `--quantification_mode` is set to `EM_clusters`.\n| software_versions.yml | A YAML file listing the versions of all software tools used during the pipeline run.\n| execution_timeline.html | Pipeline execution timeline. See [Nextflow docs](https://www.nextflow.io/docs/latest/tracing.html#timeline-report).\n| execution_report.html | Resource and runtime report for the pipeline run. See [Nextflow docs](https://www.nextflow.io/docs/latest/tracing.html).\n| execution_trace.txt | Per-process execution trace. See [Nextflow docs](https://www.nextflow.io/docs/latest/tracing.html#trace-report).\n| pipeline_dag.svg | Workflow DAG diagram. See [Nextflow docs](https://www.nextflow.io/docs/latest/tracing.html#dag-visualisation).\n\n**Count Matrices**\n\nThe [RangedSummarizedExperiment](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object contains four distinct types of count matrices, which can be accessed in R using the `assays()` function. Depending on your analysis requirements you can choose from the following:\n- `counts`: expression estimates\n- `CPM`: sequencing depth normalised estimates\n- `fullLengthCounts`: estimates of read counts mapped as full length reads for each transcript\n- `uniqueCounts`: counts of reads that are uniquely mapped to each transcript \n\n\u003e **Note:** In `se_unique_counts.rds`, unique counts are stored under the `counts` assay, not `uniqueCounts`.\n\n\n### **Spatial Analysis**\nThe pipeline applies the same processing steps to both single-cell and spatial samples. However, for spatial data, the generated `SummarizedExperiment` object is appended with spatial mapping information, which is stored in `colData`.  \n\n**Example - Spatial Mapping Information (`visium-v*`)**:\n\nFor `visium-v*` samples, `colData` contains the spatial barcode and the corresponding X and Y spatial coordinates. \n\n| barcode            | x_coordinate | y_coordinate | \n|:---|:---|:---|\n| AAACAACGAATAGTTC | 17 | 1 |\n| AAACAAGTATCTCCCA  | 103 | 51 |\n| AAACAATCTACTAGCA | 44 | 4 |\n\n\n### **Visium HD Spatial Analysis (Under Development)**\n\nThis feature is still under development and will be released in a future update.\n\n\n### **Fusion Transcript Analysis (Under Development)**\nThis feature is still under development and will be released in a future update.\n\n\n### **Advanced Usage**\n\n**Minimal End-to-End Smoke Test**\n\nExample data and pre-configured profiles are provided in `examples/` to run the pipeline end-to-end automatically without preparing your own data. The commands below must be run from the project's root directory. Combine the profile `test_base` with one of the profiles below and a container profile (`singularity` or `docker`).\n\n| Profile | Description |\n|---|---|\n| `test_sc_fastq` | Single-cell, single-sample ONT run from raw reads |\n| `test_sc_bam` | Single-cell, single-sample ONT run from demultiplexed BAM |\n| `test_sc_multi` | Single-cell, multiple-sample run with different chemistries and technologies |\n| `test_visium` | Spatial (Visium), single-sample ONT run from raw reads |\n| `test_custom` | Custom chemistry, single-sample ONT run from demultiplexed BAM |\n\n```bash\n# Single-cell: test from FASTQ input\nnextflow run . -profile test_base,test_sc_fastq,singularity\n\n# Single-cell: test from BAM input\nnextflow run . -profile test_base,test_sc_bam,singularity\n\n# Single-cell: test with multiple samples (ONT + PacBio)\nnextflow run . -profile test_base,test_sc_multi,singularity\n\n# Spatial: test Visium from FASTQ input\nnextflow run . -profile test_base,test_visium,singularity\n\n# Custom chemistry: test from demultiplexed BAM\nnextflow run . -profile test_base,test_custom,singularity\n```\n\nThe output files from the smoke tests are written to `.smoke_test/\u003cprofile\u003e/output/`.\n\n**Running Pipeline with a Custom Chemistry or Pre-aligned BAM**\n\nIf your dataset uses a chemistry not listed under Supported 10x Library Chemistries, or if you prefer to perform FASTQ preprocessing and genome alignment manually, start the pipeline directly from a pre-processed, demultiplexed BAM file. The BAM file must have the barcode and UMI information encoded either in the `CB`/`UB` column, or in the read name using the format `\u003cbarcode\u003e_\u003cumi\u003e#\u003cread_id\u003e`.\n\n*Samplesheet (Custom Chemistry)*\n\nFor samples with a custom chemistry, set the `chemistry` field in the samplesheet to any descriptive string.\n\n```csv\nsample,path,chemistry,technology\ncustom_example,examples/custom_example.bam,my_custom_chemistry,ONT\n```\n\n**Stopping the Pipeline After Alignment**\n\nThe `--bam_only` flag stops the pipeline after genome alignment, saving BAM files to `output/bam/`. This is useful when you want to inspect the aligned reads or run downstream steps separately.\n\n```bash\nnextflow run main.nf \\\n  --input examples/samplesheet_test_sc_fastq.csv \\\n  --genome examples/GRCh38.primary_assembly.genome.chr21.fa.gz \\\n  --annotation examples/gencode.v49.primary_assembly.annotation.chr21.gtf.gz \\\n  --bam_only true \\\n  -profile singularity,hpc\n```\n\n**Starting the Pipeline Directly from BAM**\n\nIf you have already generated BAM files (e.g. from a previous run with `--bam_only true`), you can skip the preprocessing and alignment steps by pointing the `path` column directly at the BAM files:\n\n```csv\nsample,path,chemistry,technology\n10x5v2_ONT_example,examples/10x5v2_ONT_example_demultiplexed.bam,10x5v2,ONT\n```\n\n```bash\nnextflow run main.nf \\\n  --input examples/samplesheet_test_sc_bam.csv \\\n  --genome examples/GRCh38.primary_assembly.genome.chr21.fa.gz \\\n  --annotation examples/gencode.v49.primary_assembly.annotation.chr21.gtf.gz \\\n  -profile singularity,hpc\n```\n\n**Visualising Clustering Results**\n\nThe `seurat_obj.rds` output contains PCA embeddings and cluster assignments but does not include a UMAP. The examples below show how to compute UMAP and visualise clusters in R.\n\n\u003e **Note:** These examples use output generated from the smoke tests (`test_sc_fastq` for single sample, `test_sc_multi` for multiple samples), which are not representative of real datasets.\n\n*Single sample*\n```r\nlibrary(Seurat)\n\nobj \u003c- readRDS(\"examples/seurat_obj_single_sample.rds\")\ndim \u003c- min(15, ncol(obj[[\"pca\"]]))\nobj \u003c- RunUMAP(obj, dims = 1:dim, reduction = \"pca\")\nDimPlot(obj, reduction = \"umap\", label = TRUE)\n```\n\n*Multiple samples*\n\nFor multi-sample runs, UMAP is computed from the Harmony-corrected embeddings, and cells can be coloured by cluster, sample, or other metadata.\n```r\nlibrary(Seurat)\n\nobj \u003c- readRDS(\"examples/seurat_obj_multi_sample.rds\")\ndim \u003c- min(30, ncol(obj[[\"harmony\"]]))\nobj \u003c- RunUMAP(obj, dims = 1:dim, reduction = \"harmony\")\n\n# Colour by cluster\nDimPlot(obj, reduction = \"umap\", group.by = \"harmony_clusters\", label = TRUE)\n\n# Colour by sample\nDimPlot(obj, reduction = \"umap\", group.by = \"sample\")\n```\n\n**Manual Clustering (Under Development)**\n\nCurrently, cell clustering is performed automatically as part of the pipeline. In a future release, a tutorial will be provided that allows users to stop the pipeline after transcript discovery, perform their own custom clustering, and then resume the pipeline to run Bambu transcript quantification using their cluster assignments.\n\n### **Additional Information**\nUMI correction is done at the barcode level. The longest read for each unique barcode-UMI combination is kept for analysis.\n\n### **Release History** \n\n- v0.1-beta: 2025-May-19\n- v0.9-beta: 2026-May-11\n\n\n### **Citation**\nIf you use this pipeline, please cite our paper:\n\nSim, A., Ling, M. H., Chen, Y., Lu, H., See, Y. X., Perrin, A., Leng Agnes, O. B., Cao, E. Y., Chia, B., Liu, J., Wüstefeld, T., Shin, J. W., \u0026 Göke, J. (2025). Isoform-level discovery, quantification and fusion analysis from single-cell and spatial long-read RNA-seq data with Bambu-Clump. https://doi.org/10.1101/2024.12.30.630828\n\nThe following are citations for the other tools used in this pipeline:\n\n#### Chopper\nDe Coster Wouter, \u0026 Rademakers, R. (2023). NanoPack2: Population scale evaluation of long-read sequencing data. Bioinformatics, 39(5). https://doi.org/10.1093/bioinformatics/btad311\n\n#### Cutadapt\nMartin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), 10. https://doi.org/10.14806/ej.17.1.200\n\n#### Flexiplex\nCheng, O., Ling, M. H., Wang, C., Wu, S., Ritchie, M. E., Göke, J., Amin, N., \u0026 Davidson, N. M. (2024). Flexiplex: a versatile demultiplexer and search tool for omics data. Bioinformatics, 40(3). https://doi.org/10.1093/bioinformatics/btae102\n\n#### Minimap2\nLi, H. (2021). New strategies to improve minimap2 alignment accuracy. Bioinformatics, 37(23), 4572–4574. https://doi.org/10.1093/bioinformatics/btab705\n\n#### Samtools\nDanecek, P., Bonfield, J. K., Liddle, J., Marshall, J., Ohan, V., Pollard, M. O., Whitwham, A., Keane, T., McCarthy, S. A., Davies, R. M., \u0026 Li, H. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2). https://doi.org/10.1093/gigascience/giab008\n\n#### Seurat\nHao, Y., Stuart, T. A., Kowalski, M. H., Choudhary, S., Hoffman, P., Hartman, A., Srivastava, A., Molla, G., Shaista Madad, Fernandez-Granda, C., \u0026 Rahul Satija. (2023). Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nature Biotechnology. https://doi.org/10.1038/s41587-023-01767-y\n\n### **Contributors**\nThis package is developed and maintained by [Andre Sim](https://github.com/andredsim), [Chin Hao Lee](https://github.com/ch99l), [Min Hao Ling](https://github.com/lingminhao), and [Jonathan Goeke](https://github.com/jonathangoeke) at the Genome Institute of Singapore. If you wish to contribute, please leave an issue. Thank you.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGoekeLab%2Fbambu-pipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGoekeLab%2Fbambu-pipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGoekeLab%2Fbambu-pipe/lists"}