Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/y9c/cpup
Convert mpileup format to base count tsv table
https://github.com/y9c/cpup
bioinformatics dna-modifications mpileup mutations rna-modifications samtools snp-genotyping
Last synced: about 1 month ago
JSON representation
Convert mpileup format to base count tsv table
- Host: GitHub
- URL: https://github.com/y9c/cpup
- Owner: y9c
- Created: 2020-12-21T17:14:16.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-12-19T03:29:34.000Z (about 1 year ago)
- Last Synced: 2023-12-19T08:59:55.096Z (about 1 year ago)
- Topics: bioinformatics, dna-modifications, mpileup, mutations, rna-modifications, samtools, snp-genotyping
- Language: C++
- Homepage:
- Size: 652 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# cpup
**Convert samtools mpileup result into base count table**
> Feature:
- multiple bam files are supported
- filter sites before output
- output by strandness
- group and count indels by pattern## Install
```bash
git clone https://github.com/y9c/cpup.git
cd ./cpup
make
```## Usage
Pipe the output of samtools mpileup into this tools!
```
Usage:
samtools mpileup -d 0 -Q 10 --reverse-del -l <.bed> -f <.fa> <.bam> | cpup-h, --help show help
-H, --headerless hide header
-S, --strandless ignore strand information
-s, --by-strand output by strand
-m, --major-strand output major strand only
-i, --indel append indel count
-e, --ends append read ends (5' | 3') count
-c, --count [] select count columns
-f, --filter [] filter sites
-F, --drop [] drop sites```
`samtools mpileup` can mpileup the mapping result site by site in the format
below. The 5th (8th, 11st, 14th, ...) column report the observed bases in each site.```
XII 455422 C 16 <<<<<<<<<<<<<<,, FFFFFFFFFFFFFFFF 4 <<,, FFFF
XII 455423 T 16 <<<<<<<<<<<<<<,, FFFFFFFFFFFFFFFF 4 <<,, FFFF
XII 455424 C 17 <<<<<<<<<<<<<<,,^$, FFFFFFFFFFFFFFFFE 4 <<,, FFFF
XII 455425 A 18 <<<<<<<<<<<<<<,,,^$, FFFFFFFFFFFFFFFFFF 4 <<,, FFFF
XII 455426 A 18 <<<<<<<<<<<<<<,,,, FFFFFFFFFFFFFFFFFF 4 <<,, FFFF
XII 455427 A 20 <<<<<<<<<<<<<<,,,,^$,^$, FFFFFFFFFFFFFFFFFFEE 4 <<,, FFFF
XII 455428 C 20 <<<<<<<<<<<<<<,,,,,, FFFFFFFFFFFFFFFFFFFF 4 <<,, FFFF
XII 455429 A 20 <<<<<<<<<<<<<<,,,,,, FFFFFFFFFFFFFFFFFFFF 4 <<,, FFFF
XII 455430 G 22 <<<<<<<<<<<<<<,,,,,,^$,^$, FFFFFFFFFFFFFFFFFFFFBB 4 <<,, FFFF
XII 455431 G 23 <<<<<<<<<<<<<<,,,,,,,,^$, FFFFFFFFFFFFFFFFFFFFEED 4 <<,, FFFF
...
````cpup` can convert the mpileup output into table below by parameters `-i -s -f mut:3`.
```
chr pos ref_base strand depth,a,c,g,t,n,gap,insert,delete,istat,dstat depth,a,c,g,t,n,gap,insert,delete,istat,dstat
XII 455496 C - 1409,17,0,0,0,0,2,0,2,,c:1|caa:1 139,3,0,0,0,0,1,0,2,,c:1|caa:1
XII 455498 T - 1454,0,6,2,0,0,407,0,244,,at:220|att:24 144,0,0,1,0,0,37,0,27,,at:23|att:4
XII 455499 T - 1496,0,1,12,0,0,250,0,12,,t:7|tt:5 151,0,0,2,0,0,29,0,1,,t:1
XII 455500 A - 1455,0,2,2,0,0,256,0,0,, 145,0,1,0,0,0,28,0,0,,
XII 729179 T + 223,0,3,0,0,0,33,0,0,, 56,0,0,0,0,0,14,0,0,,
XII 729182 C + 235,1,0,0,4,0,2,0,0,, 56,0,0,0,0,0,1,0,0,,
```> NOTE:
- `-Q` in samtools mpileup should not be set to zero, which might bring bug in counting overlapping reads.
- `-f` to check **any** (max) value greater than cutoff.
eg: `mut:3` for filter sites with **more than or equal to(>=)** 3 mutations in any sample.
You can set multile filters, like `mut:3,delete:2` to filter sites with >= 3 mutations in any samples, meanwhile, are should be >=2 delete events.
- `-F` to check **all** (min) value greater than cutoff.## Q&A?
- filter input base by its quality?
`samtools` can do it
- filter output site by cutoff?
`awk` can do it
- samtools mpileup is too slow?
`bcftools mpileup` might a better choice, if you are not confusing with tags (`-t AD`, `-t ADF`, `-t ADR`).