https://github.com/nextomics/grandstr
- Host: GitHub
- URL: https://github.com/nextomics/grandstr
- Owner: Nextomics
- Created: 2021-06-01T05:35:03.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2021-06-22T05:40:57.000Z (almost 4 years ago)
- Last Synced: 2025-02-16T13:27:44.558Z (3 months ago)
- Language: Shell
- Size: 1.84 MB
- Stars: 4
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# GrandSTR
Estimates repeat counts of short tandem repeats (STRs) from long-read sequencing data and reports genotypes for known STRs.
## Dependencies
Python packages:
- python: 3.6 or higher
- pysam: 0.16.0 or higher
- sklearn (installed as scikit-learn): 0.24.1 or higher
- hmmlearn: 0.2.5 or higher
- edlib: 1.2.6 or higher; the edlib.so file is needed for the Python bindings.

Dependencies for install:
- cython: 0.29.21 or higher
## Install
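Before building, it can help to confirm that the installed packages meet the minimum versions listed under Dependencies. The following is a minimal sketch, not part of GrandSTR: the helper names (`version_tuple`, `meets_minimum`, `check_installed`) are hypothetical, the comparison ignores pre-release ordering, and `check_installed` assumes Python 3.8 or higher because it relies on `importlib.metadata`.

```python
import re

# Minimum versions copied from the Dependencies list.
# Keys are pip distribution names; "scikit-learn" provides the `sklearn` module.
MINIMUM_VERSIONS = {
    "pysam": "0.16.0",
    "scikit-learn": "0.24.1",
    "hmmlearn": "0.2.5",
    "edlib": "1.2.6",
    "Cython": "0.29.21",
}


def version_tuple(version):
    """Turn '0.29.21' into (0, 29, 21), keeping the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare dotted versions numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUM_VERSIONS):
    """Return {name: problem} for packages that are missing or too old."""
    from importlib import metadata  # assumption: Python 3.8+

    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = "found {}, need >= {}".format(installed, minimum)
    return problems
```

Calling `check_installed()` in the target environment returns an empty dictionary when every listed package is present and recent enough.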
To build align.so, GrandSTR_lib.so, utils_lib.so, deal_str.so, run:
```bash
python setup.py build_ext -i
```
## Usage & Examples
### Input files
- The input BAM file contains the reads aligned to the reference sequences. It is typically generated by minimap2 (version 2.17) with the parameters "-ax asm10 --MD -Y -L --secondary=no" for HiFi reads, or "-ax map-ont --MD -Y -L --secondary=no" for ONT reads. For example:
```bash
minimap2 -t 16 -ax asm10 --MD -Y -L --secondary=no hg19.fasta hifi.fastq 2> align.log | samtools view -Sb - | samtools sort - -o hifi.sorted.bam
```
- The input pa file is a comma-separated file that lists the coordinates of the STR regions in the reference and their repeat unit sequences. The required columns are STR name, chromosome, start coordinate, end coordinate, and repeat unit sequence. The remaining columns are optional. For example:
```
STR000009,1,691243,691307,CACCC,0,,downstream,LOC100288069
```
- The input FASTA file is the reference genome.
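The pa record layout described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not GrandSTR's own parser: the `StrRecord` type and its field names are hypothetical, the optional columns are kept as raw strings because the README does not define them, and the `start <= end` check is an assumed sanity check.

```python
from collections import namedtuple

# Hypothetical record type; the field names are illustrative only.
StrRecord = namedtuple("StrRecord", ["name", "chrom", "start", "end", "unit", "extra"])


def parse_pa_line(line):
    """Parse one comma-separated pa line into a StrRecord."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 5:
        raise ValueError(
            "expected at least 5 comma-separated columns, got {}".format(len(fields))
        )
    name, chrom, start, end, unit = fields[:5]
    start, end = int(start), int(end)  # raises ValueError on non-numeric coordinates
    if start > end:  # assumed sanity check, not documented in the README
        raise ValueError("start {} is after end {} for {}".format(start, end, name))
    if not unit:
        raise ValueError("empty repeat unit for {}".format(name))
    # Columns after the fifth are optional and left uninterpreted.
    return StrRecord(name, chrom, start, end, unit, fields[5:])


def read_pa(path):
    """Read every non-empty line of a pa file."""
    with open(path) as handle:
        return [parse_pa_line(line) for line in handle if line.strip()]
```

For the example line above, `parse_pa_line` gives a record with `name` "STR000009", `chrom` "1", `start` 691243, `end` 691307, and `unit` "CACCC", with the four trailing columns collected in `extra`.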
### Commands
- For a small number of input STRs in the pa file, add the "-em 0" parameter to the GrandSTR command. For example:
```bash
cd test/
samtools index hifi.sorted.bam
samtools faidx hg19.fasta
../GrandSTR test1.pa out1 -rf hg19.fasta -bf hifi.sorted.bam -em 0 -rt hifi
```
- For a large number of input STRs in the pa file, add the "-em 1" parameter to the GrandSTR command. For example:
```bash
cd test/
samtools index hifi.sorted.bam
../GrandSTR test1.pa out1 -rf hg19.fasta -bf hifi.sorted.bam -em 1 -rt hifi
```
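When GrandSTR is driven from a script, the two invocations above differ only in the `-em` value, so the command can be assembled programmatically. The sketch below is an illustration only: `grandstr_command` and `missing_indexes` are hypothetical helper names, it uses only the flags shown in the examples, and it accepts only the two `-em` values this README documents. The README does not say how many STRs count as "large", so the caller chooses the mode.

```python
import os
import shlex


def grandstr_command(pa_file, output, reference, bam, em_mode,
                     read_type="hifi", program="../GrandSTR"):
    """Build the argument list used in the README examples.

    `output` is the second positional argument ("out1" in the examples).
    `read_type` is passed through unchanged; only "hifi" appears in the README.
    """
    if em_mode not in (0, 1):
        # Assumption: the README documents only -em 0 and -em 1.
        raise ValueError("em_mode must be 0 or 1")
    return [
        program, pa_file, output,
        "-rf", reference,
        "-bf", bam,
        "-em", str(em_mode),
        "-rt", read_type,
    ]


def missing_indexes(reference, bam):
    """List the default index files from `samtools faidx` / `samtools index` that are absent."""
    expected = [reference + ".fai", bam + ".bai"]
    return [path for path in expected if not os.path.exists(path)]


def as_shell(args):
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in args)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the `test/` directory once `missing_indexes` returns an empty list.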