https://github.com/satijalab/scchromhmm
🧬 🦀 A fast and efficient tool to perform a genome wide Single cell Chromatin State Analysis using multimodal histone modification data.
- Host: GitHub
- URL: https://github.com/satijalab/scchromhmm
- Owner: satijalab
- License: bsd-3-clause
- Created: 2021-07-16T01:08:09.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-09-09T19:27:49.000Z (over 3 years ago)
- Last Synced: 2024-05-12T08:40:22.628Z (7 months ago)
- Topics: chromatin-state, histone-modifications, single-cell
- Language: Rust
- Homepage:
- Size: 195 KB
- Stars: 24
- Watchers: 5
- Forks: 5
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# scChromHMM [![Rust](https://github.com/satijalab/scChromHMM/actions/workflows/rust.yml/badge.svg)](https://github.com/satijalab/scChromHMM/actions/workflows/rust.yml)
`scChromHMM` provides a suite of tools for rapid processing of single-cell histone modification data to perform chromatin state analysis of the genome within each single cell. It is an extension of the bulk ChromHMM framework: it consumes the HMM model learned by ChromHMM and performs chromatin state analysis by running the forward-backward algorithm for each single cell.
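For readers unfamiliar with the forward-backward algorithm mentioned above, the following is a minimal, generic sketch of how per-position state posteriors are obtained from a fixed HMM. It is not scChromHMM's implementation: the function name `forward_backward`, the helper `normalize`, and the choice to pass precomputed emission likelihoods are all assumptions made for illustration.

```rust
/// Posterior state probabilities for one observation sequence, computed
/// with the scaled forward-backward algorithm (illustrative sketch only).
///
/// `init[i]`     : probability of starting in state i
/// `trans[i][j]` : probability of moving from state i to state j
/// `emit[t][i]`  : likelihood of the observation at position t under state i
///
/// Returns `post[t][i]`, the probability of state i at position t given
/// the whole sequence. Each returned row sums to 1.
fn forward_backward(init: &[f64], trans: &[Vec<f64>], emit: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = init.len();
    let t_len = emit.len();
    if t_len == 0 {
        return Vec::new();
    }
    let mut alpha = vec![vec![0.0; n]; t_len];
    let mut beta = vec![vec![1.0; n]; t_len];

    // Forward pass; each step is rescaled to avoid numerical underflow.
    for i in 0..n {
        alpha[0][i] = init[i] * emit[0][i];
    }
    normalize(&mut alpha[0]);
    for t in 1..t_len {
        for j in 0..n {
            let s: f64 = (0..n).map(|i| alpha[t - 1][i] * trans[i][j]).sum();
            alpha[t][j] = s * emit[t][j];
        }
        normalize(&mut alpha[t]);
    }

    // Backward pass, rescaled in the same way.
    for t in (0..t_len - 1).rev() {
        for i in 0..n {
            let s: f64 = (0..n)
                .map(|j| trans[i][j] * emit[t + 1][j] * beta[t + 1][j])
                .sum();
            beta[t][i] = s;
        }
        normalize(&mut beta[t]);
    }

    // The posterior is proportional to alpha * beta at each position.
    (0..t_len)
        .map(|t| {
            let mut p: Vec<f64> = (0..n).map(|i| alpha[t][i] * beta[t][i]).collect();
            normalize(&mut p);
            p
        })
        .collect()
}

/// Rescale a vector in place so that it sums to 1 (no-op if the sum is 0).
fn normalize(v: &mut [f64]) {
    let s: f64 = v.iter().sum();
    if s > 0.0 {
        for x in v.iter_mut() {
            *x /= s;
        }
    }
}
```

Because the per-step scaling constants cancel when each row is renormalized, the result is the same posterior that an unscaled implementation would give, without underflow on long sequences.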
# Input Data
scChromHMM primarily requires four kinds of input files, defined as follows:
* _fragment files_: Fragment files contain the mapping locations of the sequencing read fragments on the genome. The basic format is similar to the one described by 10x: it is primarily a BED file with an additional column giving the cellular barcode of each mapped fragment. toy example: `example/h3k27ac_fragments.tsv.gz`
  * **NOTE** the tabix index of each fragment file is also needed. For a block-compressed (bgzip) fragment file, it can be generated with the command `tabix -f -p bed` followed by the path of the fragment file. toy example: `example/h3k27ac_fragments.tsv.gz.tbi`
* _hmm_model_: A tsv file containing the HMM model parameters. The default schema of this file is similar to the one generated by ChromHMM. toy example: `example/model_2.txt`.
* _anchors_: A tsv file with the list of anchors from the query data onto the reference data, along with their anchoring scores. toy example: `example/k27ac.txt`.
* _reference_cells_: A list of all the cellular barcodes (one per line) present in the reference dataset. toy example: `example/cells.txt`
# Compilation of the program
scChromHMM has been tested with stable release 1.52.1 of Rust, and the program can be compiled by using the command:
```{bash}
$ cargo build --release
```
# Running scChromHMM
Once compiled, the scChromHMM program can be run to generate the posterior probability distribution across the hidden states using the command:
```{bash}
$ target/release/schrom hmm -f -m -a -c -t -o
```
**Note**: The fragment files must be listed in the same order as the anchor files. A toy example can be run on the data in the `example` folder with the following command. An extra flag `--onlyone` has been added to run the toy example on a subsequence of chromosome 1.
```
RUST_BACKTRACE=full RUST_LOG="trace" /usr/bin/time target/release/schrom hmm -f example/h3k27ac_fragments.tsv.gz example/h3k27me3_fragments.tsv.gz example/h3k4me1_fragments.tsv.gz -m example/model_2.txt -a example/k27ac.txt example/k27me3.txt example/k4me1.txt -c example/cells.txt -t 10 -o output --onlyone
```
# State-wise "short" representation
The `hmm` subcommand of the scChromHMM tool generates cell-wise posterior probabilities for every reference cell across the genome. For each cell, the probabilities are stored in a binary format as a region-by-state matrix, where each region is a 200bp bin and each value is an integer in the range [0-100]. toy example: `output/chr1/L1_CCTCTAGTCGCTAAAC.bin`. Depending on the number of reference cells, the size of the posterior probability output can grow significantly, and some downstream analyses run faster on a region-by-cell matrix (one per state) than on region-by-state matrices (one per cell). The `transform` subcommand therefore converts the data into the "short" region-by-cell representation. The command to do that is as follows:
```{bash}
$ target/release/schrom transform -c -i -o
```
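Conceptually, this conversion is a re-indexing from one region-by-state matrix per cell to one region-by-cell matrix per state. The sketch below shows that re-indexing on in-memory data only. The function name `to_state_wise` and the nested `Vec` layout are assumptions for illustration; the on-disk layout of the `.bin` files is not described here, and this is not the tool's implementation.

```rust
/// Re-index posteriors from `per_cell[cell][region][state]` to
/// `per_state[state][region][cell]` (illustrative sketch only).
///
/// Values are the stored integers in the range [0, 100].
/// Every cell is assumed to have the same number of regions, and every
/// region exactly `n_states` entries; ragged input will panic.
fn to_state_wise(per_cell: &[Vec<Vec<u8>>], n_states: usize) -> Vec<Vec<Vec<u8>>> {
    let n_cells = per_cell.len();
    let n_regions = per_cell.first().map_or(0, |m| m.len());

    // One region-by-cell matrix per state, initialised to zero.
    let mut per_state = vec![vec![vec![0u8; n_cells]; n_regions]; n_states];

    for (c, matrix) in per_cell.iter().enumerate() {
        for (r, row) in matrix.iter().enumerate() {
            for (s, &value) in row.iter().enumerate() {
                per_state[s][r][c] = value;
            }
        }
    }
    per_state
}
```

With this layout, an analysis that needs a single state across all cells reads one matrix instead of opening one matrix per cell.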
The toy example can be run using the following command. **NOTE** An extra flag `--onlyone` has been added to run the toy example on a subsequence of chromosome 1.
```bash
$ mkdir short_output
$ RUST_BACKTRACE=full RUST_LOG="trace" /usr/bin/time target/release/schrom transform -c example/cells.txt -i output -o short_output --onlyone
```
# Importing the posterior probabilities into R
The state-wise, region-by-cell posterior probabilities of the toy example can be imported into the R environment using the following script:
```{R}
library(Rcpp)
sourceCpp("src-R/parse.cpp")
mat <- get_state("short_output/chr1/1.bin", "chr1", "short_output/chr1/cells.txt")
dim(mat)
# [1] 5001 7201
```
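When interpreting such a matrix, it can help to relate a row index back to a 200bp region and a stored integer in [0-100] back to a probability. The helpers below are a small sketch of that arithmetic in Rust. The names `bin_interval` and `to_probability`, the half-open interval convention, and the `offset` parameter (the start coordinate of row 0, which is not stated in this README) are all assumptions.

```rust
/// Width of one region in base pairs, as described for the binary output.
const BIN_SIZE: u64 = 200;

/// Half-open interval [start, end) covered by a given matrix row.
/// `offset` is a hypothetical start coordinate for row 0; pass 0 if the
/// matrix begins at the start of the sequence.
fn bin_interval(row: u64, offset: u64) -> (u64, u64) {
    let start = offset + row * BIN_SIZE;
    (start, start + BIN_SIZE)
}

/// Convert a stored integer in [0, 100] to a probability in [0.0, 1.0].
/// Values above 100 are clamped to 100.
fn to_probability(stored: u8) -> f64 {
    f64::from(stored.min(100)) / 100.0
}
```

For example, `bin_interval(2, 0)` covers base pairs 400 to 600, and `to_probability(25)` gives 0.25.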