Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/brentp/hts-nim-tools
useful command-line tools written to showcase hts-nim
https://github.com/brentp/hts-nim-tools
bam bioinformatics genomics nim nim-lang vcf vcf-filtering
Last synced: about 2 months ago
JSON representation
useful command-line tools written to showcase hts-nim
- Host: GitHub
- URL: https://github.com/brentp/hts-nim-tools
- Owner: brentp
- License: mit
- Created: 2018-01-05T20:37:33.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-11-10T01:22:05.000Z (about 4 years ago)
- Last Synced: 2024-10-13T02:21:13.146Z (3 months ago)
- Topics: bam, bioinformatics, genomics, nim, nim-lang, vcf, vcf-filtering
- Language: Nim
- Homepage: https://github.com/brentp/hts-nim
- Size: 11.7 KB
- Stars: 49
- Watchers: 5
- Forks: 6
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# hts-nim-tools
This repository contains a number of tools created with [hts-nim](https://github.com/brentp/hts-nim/) intended
to serve as examples for using `hts-nim` as well as to be useful tools.These tools are:
```
hts-nim utility programs.
version: $version• bam-filter : filter BAM/CRAM/SAM files with a simple expression language
• count-reads : count BAM/CRAM reads in regions given in a BED file
• vcf-check : check regions of a VCF against a background for missing chunks
```each of these is described in more detail below.
# bam-filter
Use simple expressions to filter a BAM/CRAM file:
```
bam-filterUsage: bam-filter [options]
-t --threads number of BAM decompression threads [default: 0]
-f --fasta fasta file for use with CRAM files [default: $env_fasta].
```valid expressions may access the bam attibutes:
+ `mapq `/ `start `/ `pos `/ `end `/ `flag `/ `insert_size ` (where pos is the 1-based start)
+ `is_aligned` `is_read1` `is_read2` `is_supplementary` `is_secondary` `is_dup` `is_qcfail`
+ `is_reverse` `is_mate_reverse` `is_pair` `is_proper_pair` `is_mate_unmapped` `is_unmapped`to use aux tags, indicate them prefixed with 'tag_', e.g.:
tag_NM < 2. Any tag present in the bam can be used in this manner.
example:
```
bam-filter "tag_NM == 2 && tag_RG == 'SRR741410' && is_proper_pair" tests/HG02002.bam
```# count-reads
Count reads reports the number of reads overlapping each interval in a BED file.
```
count-readsUsage: count-reads [options]
Arguments:
the bed file containing regions in which to count reads.
the alignment file for which to calculate depth.Options:
-t --threads number of BAM decompression threads [default: 0]
-f --fasta fasta file for use with CRAM files [default: ].
-F --flag exclude reads with any of the bits in FLAG set [default: 1796]
-Q --mapq mapping quality threshold [default: 0]
-h --help show help
```This is output a line with a count of reads for each line in .
# vcf-check
`vcf-check` is useful as a quality control for large projects which have done variant calling in regions
where each region is called in parallel. With many regions, and large projects, some regions can error and
this might be unknown to the analyst.This tools takes a background VCF, such as gnomad, that has full genome (though in some cases, users will
instead want whole exome) coverage and uses that as an expectation of variants. **If the background has many
variants across a long stretch of genome where the query VCF has no variation, we can expect that region is
missed in the query VCF.**```
Check a VCF against a background to make sure that there are no large missing chunks.vcf-check
Usage: vcf-check [options]
Arguments:
population VCF/BCF with expected sites
query VCF/BCF to checkOptions:
-c --chunk chunk size for genome [default: 100000]
-m --maf allele frequency cutoff [default: 0.1]
```This will output a tab-delimited file of `chrom\tposition\tbackground-count\tquery-count`.
The user can find regions that might be problematic by plotting or with some simple `awk` commands.