{"id":26675351,"url":"https://github.com/krasnitzlab/sgains","last_synced_at":"2025-03-26T03:17:46.996Z","repository":{"id":132509272,"uuid":"98500084","full_name":"KrasnitzLab/sgains","owner":"KrasnitzLab","description":"Sparse Genomic Analysis of Individual Nuclei by Sequencing","archived":false,"fork":false,"pushed_at":"2021-09-03T13:23:12.000Z","size":48009,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-05-09T07:53:05.939Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KrasnitzLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-07-27T06:11:33.000Z","updated_at":"2021-09-03T13:23:15.000Z","dependencies_parsed_at":"2023-06-08T03:45:19.583Z","dependency_job_id":null,"html_url":"https://github.com/KrasnitzLab/sgains","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KrasnitzLab%2Fsgains","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KrasnitzLab%2Fsgains/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KrasnitzLab%2Fsgains/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KrasnitzLab%2Fsgains/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KrasnitzLab","download_url":"https://codeload.github.com/KrasnitzLab/sgains/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","ki
nd":"github","repositories_count":245579719,"owners_count":20638679,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-26T03:17:46.247Z","updated_at":"2025-03-26T03:17:46.985Z","avatar_url":"https://github.com/KrasnitzLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sparse Genomic Analysis of Individual Nuclei by Sequencing (s-GAINS)\n\n[![DOI](https://zenodo.org/badge/98500084.svg)](https://zenodo.org/badge/latestdoi/98500084)\n\nThis document describes how to set up the `s-GAINS` pipeline tool and its basic commands.\n\nA short tutorial on how to use this tool can be found in\n[Example usage of `sGAINS` pipeline](docs/tutorial-navin2011.md)\n\n## Anaconda environment setup\n\n### Install Anaconda\n\n* Go to the Anaconda web site\n    [https://www.anaconda.com/distribution/](https://www.anaconda.com/distribution/)\n    and download the latest Anaconda installer for your operating system.\n\n* *s-GAINS* supports *Python 3.7* or greater, so you need to choose an\n    appropriate installer.\n
Note also that since *s-GAINS* uses the *bioconda*\n    channel, the only supported operating systems are those supported by\n    *bioconda* (at the time of this writing, Linux and Mac OS X).\n\n* Install Anaconda into a suitable place on your local machine, following the\n    instructions from\n    [https://docs.anaconda.com/anaconda/install/](https://docs.anaconda.com/anaconda/install/)\n\n### Create `sgains` Anaconda environment\n\n* After installing and activating *Anaconda*, create an environment to\n    use with the `sgains` pipeline:\n\n    ```bash\n    conda create -n sgains3\n    conda activate sgains3\n    ```\n\n### Install `sgains` anaconda package\n\n* *sGAINS* tools are distributed as a conda package through the `krasnitzlab`\n    Anaconda channel. To install *sGAINS* tools use:\n\n    ```bash\n    conda install -c defaults -c conda-forge -c krasnitzlab -c bioconda sgains\n    ```\n\n    This command should install all the packages and tools needed for the\n    proper functioning of `sgains-tools`.\n\n* After this command finishes, you should be able to use the\n    `sgains-tools` command:\n\n    ```bash\n    sgains-tools --help\n    ```\n\n### Install SCGV viewer package\n\nTo visualize the results of `sgains-tools` you may need the `SCGV` viewer.\n\nThe `SCGV` package is available from the `KrasnitzLab` Anaconda channel.\nYou can install it using the following command:\n\n```bash\nconda install -c krasnitzlab scgv\n```\n\n## Usage of sgains docker container\n\nInstead of setting up the `sgains` environment, you can use the `krasnitzlab/sgains`\ndocker container image to run the pipeline.\n
To do so, you need the *Docker*\ntools installed and configured on your computer (see the instructions\nin the official [*Docker* documentation](https://docs.docker.com)).\n\n### Download *s-GAINS* container image\n\nOnce you have Docker installed and configured, you can pull the `krasnitzlab/sgains`\ndocker container image with the `docker pull` command:\n\n```bash\ndocker pull krasnitzlab/sgains\n```\n\n### Run *s-GAINS* container in interactive mode\n\nYou can run the `sgains` container interactively by using:\n\n```bash\ndocker run -i -v /data/pathname:/data -t krasnitzlab/sgains /bin/bash\n```\n\nwhere `/data/pathname` is the full path to the folder on your local machine\nthat contains the data you want to process.\n\n### Run *s-GAINS* commands\n\nYou can use this docker container to run all subcommands of\n`sgains-tools` using the\nfollowing syntax:\n\n```bash\ndocker run -i -v /data/pathname:/data -t krasnitzlab/sgains sgains-tools \u003carg1\u003e \u003carg2\u003e ...\n```\n\nIn this way you can run any `sgains-tools` subcommand with the arguments\nyou need.\n\n## Usage of `sgains-tools` tool\n\nTo interact with the *s-GAINS* pipeline, you invoke the `sgains-tools` command with different\nparameters and subcommands.\n
You can list available options of `sgains-tools` using\n`-h` option:\n\n```bash\nsgains-tools -h\nusage: sgains-tools [-h] [-v] [-c path] [-n] [--force] [--parallel PARALLEL]\n                    [--sge]\n                    {genome,mappable-regions,bins,prepare,mapping,extract-10x,varbin,varbin-10x,scclust,process}\n                    ...\n\nsgains - sparse genomic analysis of individual nuclei by sequencing pipeline\nUSAGE\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -v, --verbose         set verbosity level [default: 0]\n  -c path, --config path\n                        configuration file (default: None)\n  -n, --dry-run         perform a trial run with no changes made (default:\n                        False)\n  --force, -F           allows overwriting nonempty results directory\n                        (default: False)\n  --parallel PARALLEL, -p PARALLEL\n                        number of task to run in parallel (default: 1)\n  --sge                 parallelilizes commands using SGE cluster manager\n                        (default: False)\n\nsGAINS subcommands:\n  {genome,mappable-regions,bins,prepare,mapping,extract-10x,varbin,varbin-10x,scclust,process}\n    genomeindex         builds appropriate hisat2 or bowtie index for the\n                        reference genome\n    mappable-regions    finds all mappable regions in specified genome\n    bins                calculates all bins boundaries for specified bins\n                        count and read length\n    prepare             combines all preparation steps ('genome', 'mappable-\n                        regions' and 'bins') into single command\n    mapping             performs mapping of cells reads to the reference\n                        genome\n    extract-10x         extracts cells reads from 10x Genomics datasets\n    varbin              applies varbin algorithm to count read mappings in\n                        each bin\n    varbin-10x          applies 
varbin algorithm to count read mappings in\n                        each bin to 10x Genomics datasets without realigning\n    scclust             segmentation and clustering based bin counts and\n                        preparation of the SCGV input data\n    process             combines all process steps ('mapping', 'varbin' and\n                        'scclust') into single command\n```\n\nThe `sgains-tools` tool supports a list of common options:\n\n* `--dry-run`, `-n` - instructs `sgains-tools` to perform a trial run,\n    displaying the commands that would be performed without actually\n    running them.\n\n* `--force`, `-F` - when `sgains-tools` is run, it checks whether the result files or\n    directories already exist and, if they do, stops without\n    making any changes. To override this behavior, use the `--force` option.\n\n* `--config`, `-c` - instructs `sgains-tools` which configuration file to use.\n\n* `--parallel`, `-p` - sets the number of tasks `sgains-tools` runs in parallel\n    (default: 1).\n\n* `--sge` - parallelizes execution using the SGE cluster manager.\n\n\n## Pipeline preparation\n\n### Usage of `genomeindex` subcommand\n\nThe `genomeindex` subcommand builds the appropriate hisat2 or bowtie index for the reference genome.\n
To\nlist the available options use:\n\n```bash\nsgains-tools genomeindex -h\nusage: sgains-tools genomeindex [-h] [--aligner-name ALIGNER_NAME]\n                                [--genome-version GENOME_VERSION]\n                                [--genome-pristine-dir GENOME_PRISTINE_DIR]\n                                [--genome-dir GENOME_DIR]\n                                [--genomeindex-prefix GENOMEINDEX_PREFIX]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\naligner group::\n  --aligner-name ALIGNER_NAME\n                        aligner to use in sGAINS subcommands (default: bowtie)\n\ngenome group::\n  --genome-version GENOME_VERSION\n                        version of reference genome to use (default: hg19)\n  --genome-pristine-dir GENOME_PRISTINE_DIR\n                        directory where clean copy of reference genome is\n                        located (default: None)\n  --genome-dir GENOME_DIR\n                        genome index working directory (default: None)\n  --genomeindex-prefix GENOMEINDEX_PREFIX\n                        genome index prefix (default: genomeindex)\n```\n\n### Usage of `mappable-regions` subcommand\n\nThis subcommand finds all uniquely mappable regions of the reference genome for a\ngiven read length.\n\nThis step is computationally expensive and can take days of CPU time.\n\nTo skip this step, you can use a file with precomputed mappable regions:\n\n* For Human Reference Genome **HG19** with read length **50bp**:\n    [hg19_R50_mappable_regions.txt.gz](https://github.com/KrasnitzLab/sgains/releases/download/1.0.0RC1/hg19_R50_mappable_regions.txt.gz)\n\nYou can download and unzip this file and use it in the following\nstages of the pipeline preparation.\n\nIf you want to build your own mappable regions file, you can use the `mappable-regions`\nsubcommand.\n
To run this command you will need genome index build from `genomeindex`\nsubommand.\n\nTo list the options available for this subcommand use:\n\n```bash\nsgains-tools mappable-regions -h\nusage: sgains-tools mappable-regions [-h] [--aligner-name ALIGNER_NAME]\n                                     [--genome-version GENOME_VERSION]\n                                     [--genome-pristine-dir GENOME_PRISTINE_DIR]\n                                     [--genome-dir GENOME_DIR]\n                                     [--genomeindex-prefix GENOMEINDEX_PREFIX]\n                                     [--mappable-read-length MAPPABLE_READ_LENGTH]\n                                     [--mappable-dir MAPPABLE_DIR]\n                                     [--mappable-file MAPPABLE_FILE]\n                                     [--mappable-aligner-options MAPPABLE_ALIGNER_OPTIONS]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\naligner group::\n  --aligner-name ALIGNER_NAME\n                        aligner to use in sGAINS subcommands (default: bowtie)\n\ngenome group::\n  --genome-version GENOME_VERSION\n                        version of reference genome to use (default: hg19)\n  --genome-pristine-dir GENOME_PRISTINE_DIR\n                        directory where clean copy of reference genome is\n                        located (default: None)\n  --genome-dir GENOME_DIR\n                        genome index working directory (default: None)\n  --genomeindex-prefix GENOMEINDEX_PREFIX\n                        genome index prefix (default: genomeindex)\n\nmappable_regions group::\n  --mappable-read-length MAPPABLE_READ_LENGTH\n                        read length to use for generation of mappable regions\n                        (default: 100)\n  --mappable-dir MAPPABLE_DIR\n                        directory where mappable regions working files are\n                        stored (default: None)\n  --mappable-file MAPPABLE_FILE\n                        
filename for mappable regions results (default:\n                        mappable_regions.txt)\n  --mappable-aligner-options MAPPABLE_ALIGNER_OPTIONS\n                        additional aligner options for use when computing\n                        uniquely mappable regions (default: )\n```\n\n### Usage of `bins` subcommand\n\nThe `bins` subcommand computes the bins boudaries.\n\nTo list options available for `bins` subcommand use:\n\n```bash\nsgains-tools bins -h\nusage: sgains-tools bins [-h] [--genome-version GENOME_VERSION]\n                         [--genome-pristine-dir GENOME_PRISTINE_DIR]\n                         [--genome-dir GENOME_DIR]\n                         [--genomeindex-prefix GENOMEINDEX_PREFIX]\n                         [--mappable-read-length MAPPABLE_READ_LENGTH]\n                         [--mappable-dir MAPPABLE_DIR]\n                         [--mappable-file MAPPABLE_FILE]\n                         [--mappable-aligner-options MAPPABLE_ALIGNER_OPTIONS]\n                         [--bins-count BINS_COUNT] [--bins-dir BINS_DIR]\n                         [--bins-file BINS_FILE]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\ngenome group::\n  --genome-version GENOME_VERSION\n                        version of reference genome to use (default: hg19)\n  --genome-pristine-dir GENOME_PRISTINE_DIR\n                        directory where clean copy of reference genome is\n                        located (default: None)\n  --genome-dir GENOME_DIR\n                        genome index working directory (default: None)\n  --genomeindex-prefix GENOMEINDEX_PREFIX\n                        genome index prefix (default: genomeindex)\n\nmappable_regions group::\n  --mappable-read-length MAPPABLE_READ_LENGTH\n                        read length to use for generation of mappable regions\n                        (default: 100)\n  --mappable-dir MAPPABLE_DIR\n                        directory where mappable regions working files 
are\n                        stored (default: None)\n  --mappable-file MAPPABLE_FILE\n                        filename for mappable regions results (default:\n                        mappable_regions.txt)\n  --mappable-aligner-options MAPPABLE_ALIGNER_OPTIONS\n                        additional aligner options for use when computing\n                        uniquely mappable regions (default: )\n\nbins group::\n  --bins-count BINS_COUNT\n                        number of bins (default: 10000)\n  --bins-dir BINS_DIR   bins working directory (default: None)\n  --bins-file BINS_FILE\n                        bins boundaries filename (default:\n                        bins_boundaries.txt)\n```\n\n## Processing sequence data\n\n### Use of `process` subcommand\n\n---\n\n**Please note, that to use `process` subcommands\n(`mapping`, `varbin`, `scclust` and `process`)\nyou need go through all the preparation steps.**\n\n---\n\nTo list the options available for `process` subcommand use:\n\n```bash\nsgains-tools process -h\nusage: sgains-tools process [-h] [--aligner-name ALIGNER_NAME]\n                            [--genome-version GENOME_VERSION]\n                            [--genome-pristine-dir GENOME_PRISTINE_DIR]\n                            [--genome-dir GENOME_DIR]\n                            [--genomeindex-prefix GENOMEINDEX_PREFIX]\n                            [--reads-dir READS_DIR]\n                            [--reads-suffix READS_SUFFIX]\n                            [--mapping-dir MAPPING_DIR]\n                            [--mapping-suffix MAPPING_SUFFIX]\n                            [--mapping-aligner-options MAPPING_ALIGNER_OPTIONS]\n                            [--bins-count BINS_COUNT] [--bins-dir BINS_DIR]\n                            [--bins-file BINS_FILE] [--varbin-dir VARBIN_DIR]\n                            [--varbin-suffix VARBIN_SUFFIX]\n                            [--scclust-case SCCLUST_CASE]\n                            [--scclust-dir 
SCCLUST_DIR]\n                            [--scclust-cytoband-file SCCLUST_CYTOBAND_FILE]\n                            [--scclust-nsim SCCLUST_NSIM]\n                            [--scclust-sharemin SCCLUST_SHAREMIN]\n                            [--scclust-fdrthres SCCLUST_FDRTHRES]\n                            [--scclust-nshare SCCLUST_NSHARE]\n                            [--scclust-climbtoshare SCCLUST_CLIMBTOSHARE]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\naligner group::\n  --aligner-name ALIGNER_NAME\n                        aligner to use in sGAINS subcommands (default: bowtie)\n\ngenome group::\n  --genome-version GENOME_VERSION\n                        version of reference genome to use (default: hg19)\n  --genome-pristine-dir GENOME_PRISTINE_DIR\n                        directory where clean copy of reference genome is\n                        located (default: None)\n  --genome-dir GENOME_DIR\n                        genome index working directory (default: None)\n  --genomeindex-prefix GENOMEINDEX_PREFIX\n                        genome index prefix (default: genomeindex)\n\nreads group::\n  --reads-dir READS_DIR\n                        data directory where sequencing reads are located\n                        (default: None)\n  --reads-suffix READS_SUFFIX\n                        reads files suffix pattern (default: .fastq.gz)\n\nmapping group::\n  --mapping-dir MAPPING_DIR\n                        data directory where mapping files are located\n                        (default: None)\n  --mapping-suffix MAPPING_SUFFIX\n                        mapping files suffix pattern (default: .rmdup.bam)\n  --mapping-aligner-options MAPPING_ALIGNER_OPTIONS\n                        additional aligner mapping options (default: )\n\nbins group::\n  --bins-count BINS_COUNT\n                        number of bins (default: 10000)\n  --bins-dir BINS_DIR   bins working directory (default: None)\n  --bins-file BINS_FILE\n            
            bins boundaries filename (default:\n                        bins_boundaries.txt)\n\nvarbin group::\n  --varbin-dir VARBIN_DIR\n                        varbin working directory (default: None)\n  --varbin-suffix VARBIN_SUFFIX\n                        varbin files suffix pattern (default: .varbin.txt)\n\nscclust group::\n  --scclust-case SCCLUST_CASE\n                        SCclust case name (default: None)\n  --scclust-dir SCCLUST_DIR\n                        SCclust working directory (default: None)\n  --scclust-cytoband-file SCCLUST_CYTOBAND_FILE\n                        location of cyto band description file (default: None)\n  --scclust-nsim SCCLUST_NSIM\n                        SCclust number of simulations (default: 150)\n  --scclust-sharemin SCCLUST_SHAREMIN\n                        SCclust sharemin parameter (default: 0.85)\n  --scclust-fdrthres SCCLUST_FDRTHRES\n                        SCclust fdrthres parameter (default: -3)\n  --scclust-nshare SCCLUST_NSHARE\n                        SCclust nshare parameter (default: 4)\n  --scclust-climbtoshare SCCLUST_CLIMBTOSHARE\n                        SCclust climbtoshare parameter (default: 5)\n```\n\n* The data created by the `process` subcommand are placed in a subdirectory,\n    whose name is specified with `--output-dir` option. This name will be used\n    when creating\n    the result directory structure.\n\n* The input for `process` subcommand are *FASTQ* files containing the reads for\n    each individual cell. All *FASTQ* files for given study are expected to be\n    located into single directory. You should specify this directory using\n    `--reads-dir` option.\n\n* The results from `process` subcommand are stored in the output data directory,\n    as specified using `--output-dir` option. The process subcommand will\n    create a directory and inside that directory it will create three additional\n    subdirectories - `mapping`,\n    `varbin` and `scclust`. 
These will contain intermediate results from the\n    respective pipeline stages.\n\n* The first `mapping` stage of the pipeline invokes the aligner (`bowtie` by default) to map reads from\n    *FASTQ* files. This stage needs the prefix of the genome index (use the\n    `--genomeindex-prefix` option to specify it) and the directory\n    where this index is located (use `--genome-dir` to pass this parameter).\n\n* If you need to pass additional options to the aligner to control read mapping,\n    you can use the `--mapping-aligner-options` option.\n\n* The `varbin` stage of the pipeline needs a bins boundaries file prepared in\n    advance. You can specify the bins boundaries file using the `--bins-dir` and `--bins-file` options.\n\n### Usage of `mapping` subcommand\n\nTo list the options available for `mapping` subcommand use:\n\n```bash\nsgains-tools mapping -h\nusage: sgains-tools mapping [-h] [--aligner-name ALIGNER_NAME]\n                            [--genome-version GENOME_VERSION]\n                            [--genome-pristine-dir GENOME_PRISTINE_DIR]\n                            [--genome-dir GENOME_DIR]\n                            [--genomeindex-prefix GENOMEINDEX_PREFIX]\n                            [--reads-dir READS_DIR]\n                            [--reads-suffix READS_SUFFIX]\n                            [--mapping-dir MAPPING_DIR]\n                            [--mapping-suffix MAPPING_SUFFIX]\n                            [--mapping-aligner-options MAPPING_ALIGNER_OPTIONS]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\naligner group::\n  --aligner-name ALIGNER_NAME\n                        aligner to use in sGAINS subcommands (default: bowtie)\n\ngenome group::\n  --genome-version GENOME_VERSION\n                        version of reference genome to use (default: hg19)\n  --genome-pristine-dir GENOME_PRISTINE_DIR\n                        directory where clean copy of reference genome is\n                        located (default: None)\n  --genome-dir GENOME_DIR\n        
                genome index working directory (default: None)\n  --genomeindex-prefix GENOMEINDEX_PREFIX\n                        genome index prefix (default: genomeindex)\n\nreads group::\n  --reads-dir READS_DIR\n                        data directory where sequencing reads are located\n                        (default: None)\n  --reads-suffix READS_SUFFIX\n                        reads files suffix pattern (default: .fastq.gz)\n\nmapping group::\n  --mapping-dir MAPPING_DIR\n                        data directory where mapping files are located\n                        (default: None)\n  --mapping-suffix MAPPING_SUFFIX\n                        mapping files suffix pattern (default: .rmdup.bam)\n  --mapping-aligner-options MAPPING_ALIGNER_OPTIONS\n                        additional aligner mapping options (default: )\n```\n\n### Use of `varbin` subcommand\n\nTo list the options available for `varbin` subcommand use:\n\n```bash\nsgains-tools varbin -h\nusage: sgains-tools varbin [-h] [--bins-count BINS_COUNT]\n                           [--bins-dir BINS_DIR] [--bins-file BINS_FILE]\n                           [--mapping-dir MAPPING_DIR]\n                           [--mapping-suffix MAPPING_SUFFIX]\n                           [--mapping-aligner-options MAPPING_ALIGNER_OPTIONS]\n                           [--varbin-dir VARBIN_DIR]\n                           [--varbin-suffix VARBIN_SUFFIX]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\nbins group::\n  --bins-count BINS_COUNT\n                        number of bins (default: 10000)\n  --bins-dir BINS_DIR   bins working directory (default: None)\n  --bins-file BINS_FILE\n                        bins boundaries filename (default:\n                        bins_boundaries.txt)\n\nmapping group::\n  --mapping-dir MAPPING_DIR\n                        data directory where mapping files are located\n                        (default: None)\n  --mapping-suffix MAPPING_SUFFIX\n              
          mapping files suffix pattern (default: .rmdup.bam)\n  --mapping-aligner-options MAPPING_ALIGNER_OPTIONS\n                        additional aligner mapping options (default: )\n\nvarbin group::\n  --varbin-dir VARBIN_DIR\n                        varbin working directory (default: None)\n  --varbin-suffix VARBIN_SUFFIX\n                        varbin files suffix pattern (default: .varbin.txt)\n```\n\n### Use of `scclust` subcommand\n\nTo list options available for `scclust` subcommand use:\n\n```bash\nsgains-tools scclust -h\nusage: sgains-tools scclust [-h] [--bins-count BINS_COUNT]\n                            [--bins-dir BINS_DIR] [--bins-file BINS_FILE]\n                            [--varbin-dir VARBIN_DIR]\n                            [--varbin-suffix VARBIN_SUFFIX]\n                            [--scclust-case SCCLUST_CASE]\n                            [--scclust-dir SCCLUST_DIR]\n                            [--scclust-cytoband-file SCCLUST_CYTOBAND_FILE]\n                            [--scclust-nsim SCCLUST_NSIM]\n                            [--scclust-sharemin SCCLUST_SHAREMIN]\n                            [--scclust-fdrthres SCCLUST_FDRTHRES]\n                            [--scclust-nshare SCCLUST_NSHARE]\n                            [--scclust-climbtoshare SCCLUST_CLIMBTOSHARE]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\nbins group::\n  --bins-count BINS_COUNT\n                        number of bins (default: 10000)\n  --bins-dir BINS_DIR   bins working directory (default: None)\n  --bins-file BINS_FILE\n                        bins boundaries filename (default:\n                        bins_boundaries.txt)\n\nvarbin group::\n  --varbin-dir VARBIN_DIR\n                        varbin working directory (default: None)\n  --varbin-suffix VARBIN_SUFFIX\n                        varbin files suffix pattern (default: .varbin.txt)\n\nscclust group::\n  --scclust-case SCCLUST_CASE\n                        SCclust case 
name (default: None)\n  --scclust-dir SCCLUST_DIR\n                        SCclust working directory (default: None)\n  --scclust-cytoband-file SCCLUST_CYTOBAND_FILE\n                        location of cyto band description file (default: None)\n  --scclust-nsim SCCLUST_NSIM\n                        SCclust number of simulations (default: 150)\n  --scclust-sharemin SCCLUST_SHAREMIN\n                        SCclust sharemin parameter (default: 0.85)\n  --scclust-fdrthres SCCLUST_FDRTHRES\n                        SCclust fdrthres parameter (default: -3)\n  --scclust-nshare SCCLUST_NSHARE\n                        SCclust nshare parameter (default: 4)\n  --scclust-climbtoshare SCCLUST_CLIMBTOSHARE\n                        SCclust climbtoshare parameter (default: 5)\n```\n\n## Configure the *s-GAINS* pipeline\n\nAn example *s-GAINS* pipeline configuration:\n\n```bash\naligner:\n    aligner_name: hisat2\n\ngenome:\n    genome_version: hg38\n    genome_pristine_dir: hg38_pristine\n    genome_dir: hg38\n    genomeindex_prefix: genomeindex\n\nmappable_regions:\n    mappable_read_length: 50\n    mappable_dir: hg38_R50\n    mappable_file: hisat2_hg38_R50_mappable_regions.txt\n    mappable_aligner_options: \"\"\n  \nbins:\n    bins_count: 20000\n    bins_dir: hg38_R50_B20k\n    bins_file: hg38_R50_B20k_bins_boundaries.txt\n\nreads:\n    reads_dir: navin_T10\n    reads_suffix: \".fastq.gz\"\n    \n\nmapping:\n    mapping_dir: navin_T10_hisat2/mapping\n    mapping_suffix: \".rmdup.bam\"\n    mapping_aligner_options: \"-3 0 -5 38\"\n\nvarbin:\n    varbin_dir: navin_T10_hisat2/varbin\n    varbin_suffix: \".varbin.r50_20k.txt\"\n\n\nscclust:\n    scclust_case: \"nyu007_hisat2\"\n    scclust_dir: \"navin_T10_hisat2/scclust\"\n    scclust_cytoband_file: cytoBand-hg38.txt\n    scclust_nsim: 150\n    scclust_sharemin: 0.85\n    scclust_fdrthres: -3\n    scclust_nshare: 4\n    scclust_climbtoshare: 5\n```\n\nEach section of this configuration file corresponds to the relevant `s-GAINS` 
tool\nsubcommand and sets values for the options of the subcommand.\n\nThe options passed from the command line override the options specified in the\nconfiguration file.\n\nTo pass configuration file to `sgains-tools` you should use `-c` or `--config` \noption. For example, if you want to use the config file for `mapping` subcommand\nyou should use:\n\n```bash\nsgains-tools -c sgains-hisat2-navin-T10.yml mapping -h\n```\n\nNote that the default values for various parameters of `mapping` subcommand would\nbe filled from the corresponding values specified into the configuration file:\n\n```bash\nusage: sgains-tools mapping [-h] [--aligner-name ALIGNER_NAME]\n                            [--genome-version GENOME_VERSION]\n                            [--genome-pristine-dir GENOME_PRISTINE_DIR]\n                            [--genome-dir GENOME_DIR]\n                            [--genomeindex-prefix GENOMEINDEX_PREFIX]\n                            [--reads-dir READS_DIR]\n                            [--reads-suffix READS_SUFFIX]\n                            [--mapping-dir MAPPING_DIR]\n                            [--mapping-suffix MAPPING_SUFFIX]\n                            [--mapping-aligner-options MAPPING_ALIGNER_OPTIONS]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\naligner group::\n  --aligner-name ALIGNER_NAME\n                        aligner to use in sGAINS subcommands (default: hisat2)\n\ngenome group::\n  --genome-version GENOME_VERSION\n                        version of reference genome to use (default: hg38)\n  --genome-pristine-dir GENOME_PRISTINE_DIR\n                        directory where clean copy of reference genome is\n                        located (default: /data/lubo/single-\n                        cell/test_data/hg38_pristine)\n  --genome-dir GENOME_DIR\n                        genome index working directory (default:\n                        /data/lubo/single-cell/test_data/hg38)\n  --genomeindex-prefix 
GENOMEINDEX_PREFIX\n                        genome index prefix (default: genomeindex)\n\nreads group::\n  --reads-dir READS_DIR\n                        data directory where sequencing reads are located\n                        (default: /data/lubo/single-cell/test_data/navin_T10)\n  --reads-suffix READS_SUFFIX\n                        reads files suffix pattern (default: .fastq.gz)\n\nmapping group::\n  --mapping-dir MAPPING_DIR\n                        data directory where mapping files are located\n                        (default: /data/lubo/single-\n                        cell/test_data/navin_T10_hisat2/mapping)\n  --mapping-suffix MAPPING_SUFFIX\n                        mapping files suffix pattern (default: .rmdup.bam)\n  --mapping-aligner-options MAPPING_ALIGNER_OPTIONS\n                        additional aligner mapping options (default: -3 0 -5\n                        38)\n\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrasnitzlab%2Fsgains","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrasnitzlab%2Fsgains","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrasnitzlab%2Fsgains/lists"}