{"id":21168471,"url":"https://github.com/jetbrains-research/span","last_synced_at":"2025-07-18T11:43:27.702Z","repository":{"id":38917463,"uuid":"159559994","full_name":"JetBrains-Research/span","owner":"JetBrains-Research","description":"SPAN Peak Analyzer","archived":false,"fork":false,"pushed_at":"2025-06-26T09:55:10.000Z","size":2644,"stargazers_count":10,"open_issues_count":2,"forks_count":1,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-07-09T19:49:01.560Z","etag":null,"topics":["bioinformatics","chip-seq","peak-caller"],"latest_commit_sha":null,"homepage":"https://doi.org/10.1093/bioinformatics/btab376","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JetBrains-Research.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.txt","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-11-28T20:22:10.000Z","updated_at":"2025-06-26T09:55:14.000Z","dependencies_parsed_at":"2024-01-23T15:29:47.239Z","dependency_job_id":"4c21eb31-d9bb-423e-9f48-6d96026bd4eb","html_url":"https://github.com/JetBrains-Research/span","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"purl":"pkg:github/JetBrains-Research/span","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fspan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fspan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fspan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/JetBrains-Research%2Fspan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JetBrains-Research","download_url":"https://codeload.github.com/JetBrains-Research/span/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetBrains-Research%2Fspan/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265753901,"owners_count":23823084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","chip-seq","peak-caller"],"created_at":"2024-11-20T15:13:44.118Z","updated_at":"2025-07-18T11:43:27.679Z","avatar_url":"https://github.com/JetBrains-Research.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![JetBrains Research](https://jb.gg/badges/research.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)\n[![license](https://img.shields.io/github/license/mashape/apistatus.svg)](https://opensource.org/licenses/MIT)\n[![tests](http://teamcity.jetbrains.com/app/rest/builds/buildType:(id:Biolabs_Span)/statusIcon.svg)](http://teamcity.jetbrains.com/viewType.html?buildTypeId=Biolabs_Span\u0026guest=1)\n[![DOI](https://zenodo.org/badge/159559994.svg)](https://doi.org/10.5281/zenodo.15131734)\n\nSPAN Peak Analyzer version 2.0\n==============================\n\n```           ,        ,\n      __.-'|'-.__.-'|'-.__\n    ='=====|========|====='=\n    ~_^~-^~~_~^-^~-~~^_~^~^~^\n```\n\n**SPAN Peak Analyzer version 2.0** is a universal HMM-based peak caller capable of processing a broad range of ChIP-seq, 
ATAC-seq,\nand single-cell ATAC-seq datasets of different quality.\u003cbr\u003e \n\nFeatures\n--------\n\n* Supports both narrow and broad footprint experiments (ChIP-seq, ATAC-seq, DNase-seq)\n* Produces robust results on datasets with different signal-to-noise ratios, including Ultra-Low-Input ChIP-seq\n* Produces highly consistent results in multi-replicate experimental setups\n* Tolerates a missing control experiment\n* Integrated into the JetBrains Research ChIP-seq\n  analysis [pipeline](https://github.com/JetBrains-Research/chipseq-smk-pipeline) from raw reads to visualization and\n  peak calling\n* Integrated with the [JBR](https://github.com/jetBrains-Research/jbr) Genome Browser; the uploaded data model allows for\n  interactive visualization and fine-tuning\n* _Experimentally_ supports multi-replicated mode and differential peak calling mode\n* In [semi-supervised mode](https://artyomovlab.wustl.edu/aging/tools), it can robustly handle multiple\n  replicates and noise by leveraging limited manual annotation information.\n\nLatest release\n------------------\nSee the [releases](https://github.com/JetBrains-Research/span/releases) section for up-to-date information.\n\nSPAN 2.0 enhancements compared to version 1.0\n------------------------------------------\n\nSPAN version 2.0 introduces several key improvements over the original semi-supervised SPAN 1.0, most notably eliminating the need for manual markup annotations.\nIt now operates in a **fully unsupervised mode** with robust default parameters.\u003cbr\u003e\nKey changes include:\n\n* **Automated Setup**: SPAN 2.0 no longer requires semi-supervised markup to function. 
It runs directly with improved default settings.\n* **Enhanced Preprocessing**: The data preprocessing pipeline has been redesigned, featuring better control regression and smarter initialization of HMM parameters.\n* **Constraint-Driven Model Fitting**: The HMM now includes adaptive constraints for noise floor and signal-to-noise ratio, enhancing robustness across datasets with variable quality.\n* **New Peak Detection Framework**: Peak identification now leverages post-model analysis and a unified strategy for extracting peaks from HMM output.\n* **Improved Replicates-model**: These enhancements significantly boost performance in replicate-based analyses.\n* **Expanded Applicability**: SPAN 2.0 is more effective for diverse data types, including ATAC-seq, CUT\u0026RUN, and CUT\u0026Tag, and supports explicit input format declaration, BigWig signal visualization, and summit calling for fine resolution.\n\nThe original SPAN 1.0, which required semi-supervised input, is described in:\u003cbr\u003e\n\u003ci\u003eShpynov O, Dievskii A, Chernyatchik R, Tsurinov P, Artyomov MN. Semi-supervised peak calling with SPAN and\nJBR Genome Browser. Bioinformatics. 2021 May 21. https://doi.org/10.1093/bioinformatics/btab376\u003c/i\u003e\n\nRequirements\n------------\n\nDownload and install [Java 8+](https://openjdk.org/install/).\n\nPeak calling\n------------\n\nTo analyze a single (possibly replicated) biological condition, use the `analyze` command. For details, run:\n\n```bash\n$ java -jar span.jar analyze --help\n```\n\nThe `\u003coutput.bed\u003e` file will contain predicted and FDR-controlled peaks in the\nENCODE [broadPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format13) (BED 6+3) format:\n\n```\n\u003cchromosome\u003e \u003cpeak start offset\u003e \u003cpeak end offset\u003e \u003cpeak_name\u003e \u003cscore\u003e . 
\u003ccoverage or fold/change\u003e \u003c-log p-value\u003e \u003c-log Q-value\u003e\n```\n\nExamples:\n\n* Regular peak calling\u003cbr\u003e\n  `java -Xmx8G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -p Results.peak`\n* Semi-supervised peak calling\u003cbr\u003e\n  `java -Xmx8G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -l Labels.bed -p Results.peak`\n* Model fitting only\u003cbr\u003e\n  `java -Xmx8G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -m Model.span`\n\nDifferential peak calling\n-------------------------\n\n_Experimental!_\nTo compare two (possibly replicated) biological conditions, use the `compare` command. See help for details:\n\n```bash\n$ java -jar span.jar compare --help\n```\n\nCommand line options\n-------------------------\n\n| Parameter                                               | Description                                                                                                                                                                                                                                                          | \n|---------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `-t, --treatment TREATMENT`\u003cbr/\u003e **required**           | Treatment file. Supported formats: BAM, BED, or BED.gz file. \u003cbr/\u003eIf multiple files are provided, they are treated as replicates. \u003cbr/\u003eMultiple files should be separated by commas: `-t A,B,C`. \u003cbr/\u003eMultiple files are processed as replicates on the model level. |\n| `-c, --control CONTROL`                                 | Control file. Multiple files should be separated by commas. 
\u003cbr/\u003eA single control file or a separate file for each treatment file is required. \u003cbr/\u003eFollow the instructions for `-t`, `--treatment`.                                                                 |\n| `-cs, --chrom.sizes CHROMOSOMES_SIZES`\u003cbr/\u003e**required** | Chromosome sizes file for the genome build used in TREATMENT and CONTROL files. \u003cbr/\u003eCan be downloaded at [UCSC](https://hgdownload.soe.ucsc.edu/downloads.html).                                                                                                    |\n| `-b, --bin BIN_SIZE`                                    | Peak analysis is performed on read coverage tiled into consecutive bins of configurable size.                                                                                                                                                                        |\n| `-f, --fdr FDR`                                         | False Discovery Rate cutoff to call significant regions.                                                                                                                                                                                                             |\n| `-p, --peaks PEAKS`                                     | Resulting peaks file in ENCODE broadPeak* (BED 6+3) format. \u003cbr\u003e If omitted, only the model fitting step is performed.                                                                                                                                               |\n| `-chr, --chromosomes CHROMOSOMES_LIST`                  | Chromosomes to process, multiple chromosomes should be separated by commas.                                                                                                                                                                                          |\n| `--format FORMAT`                                       | Reads file format. Supported: BAM, SAM, CRAM, BED. 
Text format can be in zip or gzip archive.\u003cbr\u003eIf not provided, guessed from file extensions.                                                                                                                      |\n| `--fragment FRAGMENT`                                   | Fragment size. If provided, reads are shifted appropriately. \u003cbr\u003eIf not provided, the shift is estimated from the data.\u003cbr\u003e`--fragment 0` is recommended for ATAC-Seq data processing.                                                                               |\n| `-kd, --keep-duplicates`                                | Keep duplicates. By default, SPAN filters out redundant reads aligned at the same genomic position.\u003cbr\u003eRecommended for bulk single cell ATAC-Seq data processing.                                                                                                    |\n| `--blacklist BLACKLIST_BED`                             | Blacklisted regions of the genome to be excluded from peak calling results.                                                                                                                                                                                          |\n| `--labels LABELS`                                       | Labels BED file. Used in semi-supervised peak calling.                                                                                                                                                                                                               |\n| `-m, --model MODEL`                                     | This option is used to specify SPAN model path. Required for further semi-supervised peak calling.                                                                                                                                                                   |\n| `-w, --workdir PATH`                                    | Path to the working directory. 
Used to save coverage and model cache.                                                                                                                                                                                                |\n| `--bigwig`                                              | Create beta-control corrected counts per million normalized track.                                                                                                                                                                                                   |\n| `--hmm-snr SNR`                                         | Fraction of coverage to estimate and guard signal to noise ratio, `0` to disable constraint check.                                                                                                                                                                   |\n| `--hmm-low LOW`                                         | Minimal low state mean threshold, guards against too broad peaks, `0` to disable constraint check.                                                                                                                                                                   |\n| `--sensitivity SENSITIVITY`                             | Configures log PEP threshold sensitivity for candidates selection.\u003cbr\u003eAutomatically estimated from the data, or during semi-supervised peak calling.                                                                                                                 |\n| `--gap GAP`                                             | Configures minimal gap between peaks.\u003cbr\u003eGenerally, not required, but used in semi-supervised peak calling.                                                                                                                                                          
|\n| `--summits`                                             | Calls summits within peaks.\u003cbr\u003eRecommended for ATAC-seq and single-cell ATAC-seq analysis.                                                                                                                                                                           |\n| `--f-light LIGHT`                                       | Lightest fragmentation threshold to apply compensation gap.\u003cbr\u003eNot available when `gap` is explicitly provided.                                                                                                                                                      |                  \n| `--f-hard HARD`                                         | Hardest fragmentation threshold to apply compensation gap.\u003cbr\u003eNot available when `gap` is explicitly provided.                                                                                                                                                       |                  \n| `--f-speed SPEED`                                       | Fragmentation acceleration threshold to compute gap.\u003cbr\u003eNot available when `gap` is explicitly provided.                                                                                                                                                             |                  \n| `--clip CLIP_TRESHOLD`                                  | Clip max threshold for fine-tune boundaries according to local signal, `0` to disable.                                                                                                                                                                               |\n| `--multiple TEST`                                       | Method applied for multiple hypothesis testing.\u003cbr/\u003e`BH` for Benjamini-Hochberg, `BF` for Bonferroni.                                                                                                                
                                                |\n| `-i, --iterations`                                      | Maximum number of iterations for Expectation Maximisation (EM) algorithm.                                                                                                                                                                                            |\n| `--tr, --threshold`                                     | Convergence threshold for EM algorithm, use `--debug` option to see detailed info.                                                                                                                                                                                   |\n| `--ext`                                                 | Save extended states information to model file.\u003cbr\u003eRequired for model visualization in JBR Genome Browser.                                                                                                                                                           |\n| `--deep-analysis`                                       | Perform additional track analysis - coverage (roughness) and creates multi-sensitivity bed track.                                                                                                                                                                    |\n| `--threads THREADS`                                     | Configure the parallelism level.                                                                                                                                                                                                                                     |\n| `-l, --log LOG`                                         | Path to log file, if not provided, it will be created in working directory.                                                                                                                                                                                      
    |\n| `-d, --debug`                                           | Print debug information, useful for troubleshooting.                                                                                                                                                                                                                 |\n| `-q, --quiet`                                           | Turn off standard output.                                                                                                                                                                                                                                            |\n| `-kc, --keep-cache`                                     | Keep cache files. By default SPAN creates cache files in working directory and cleans up.                                                                                                                                                                            |\n\nBuild from sources\n------------------\n\nClone [bioinf-commons](https://github.com/JetBrains-Research/bioinf-commons) library under the project root.\n\n  ```\n  git clone git@github.com:JetBrains-Research/bioinf-commons.git\n  ```\n\nLaunch the following command line to build SPAN jar:\n\n  ```\n  ./gradlew shadowJar\n  ```\n\nThe SPAN jar file will be generated in the folder `build/libs`.\n\nFAQ\n---\n\n* Q: What is the average running time?\u003cbr\u003e\n  A: SPAN is capable of processing a single ChIP-Seq track in less than 10 minutes on an average laptop.\n* Q: Which operating systems are supported?\u003cbr\u003e\n  A: SPAN is developed in modern [Kotlin](https://kotlinlang.org) programming language and can be executed on any\n  platform supported by Java.\n* Q: Where did you get this lovely span picture?\u003cbr\u003e\n  A: From [ascii.co.uk](https://ascii.co.uk), the original author goes by the name jgs.\n\nErrors Reporting\n-----------------\n\nUse [GitHub 
issues](https://github.com/JetBrains-Research/span/issues) to suggest new features or report bugs.\n\nAuthors\n-------\n\n[JetBrains Research BioLabs](https://research.jetbrains.org/groups/biolabs)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjetbrains-research%2Fspan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjetbrains-research%2Fspan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjetbrains-research%2Fspan/lists"}