Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/maragkakislab/samql
SQL-like query language for the SAM/BAM file format
https://github.com/maragkakislab/samql
bam bioinformatics go sam sequencing sql
Last synced: 3 months ago
JSON representation
SQL-like query language for the SAM/BAM file format
- Host: GitHub
- URL: https://github.com/maragkakislab/samql
- Owner: maragkakislab
- License: mit
- Created: 2018-08-03T17:36:30.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-09-20T16:58:03.000Z (over 1 year ago)
- Last Synced: 2024-08-03T01:13:48.111Z (6 months ago)
- Topics: bam, bioinformatics, go, sam, sequencing, sql
- Language: Go
- Homepage:
- Size: 65.4 KB
- Stars: 26
- Watchers: 5
- Forks: 5
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-bio-go - samql - like query language for the SAM/BAM file format. (Proteomics and Genomics Analysis)
README
# samql [![Go Report Card](https://goreportcard.com/badge/github.com/maragkakislab/samql)](https://goreportcard.com/report/github.com/maragkakislab/samql) [![GoDoc](https://godoc.org/github.com/maragkakislab/samql?status.svg)](https://godoc.org/github.com/maragkakislab/samql)
SQL-like query language for the SAM/BAM file format## Install
Download the latest executable:
[Latest](https://github.com/maragkakislab/samql/releases/latest/).
Otherwise, to install the Go library and executable:
`go get github.com/maragkakislab/samql/...`
## Objective
To provide a command line utility and a clean API for filtering SAM/BAM files
using simple SQL-like commands. The samql command showcases the envisioned
functionality.## Usage
```bash
Filters a SAM/BAM file using the SQL clause provided
samql 1.7
Usage: samql [--where WHERE] [--count] [--sam] [--parr PARR] [--obam] INPUT [INPUT ...]Positional arguments:
INPUT file (- for STDIN)Options:
--where WHERE SQL clause to match records
--count, -c print only the count of matching records
--sam, -S interpret input as SAM, otherwise BAM
--parr PARR, -p PARR Number of cores for parallelization. Uses all available, if not provided.
--obam, -b Output BAM
--help, -h display this help and exit
--version display version and exit
```## Examples
```bash
# Simple
samql --where "RNAME = chr1" test.bam # Reference name is "chr1"
samql --where "QNAME = read1" test.bam # Query name is "read1"
samql --where "POS > 100" test.bam # Position (0-based) greater than 100
samql --where "REVERSE" test.bam # Negative strand
samql --where "FLAG & 16 = 16" test.bam # Ditto using flag arithmetics# More than one files
samql --where "REVERSE" test1.bam test2.bam # Reads are returned in the order of the files# Regex
samql --where "CIGAR =~ /^15M/" test.bam # Alignment starts with 15 matches# More complex
samql --where "RNAME = chr1 OR QNAME = read1 AND POS > 100" test.bam# Just counting
samql -c --where "RNAME = chr1" test.bam# Very complex
# Uniquely mapped reads, with first pair on chr1 after
# position 1000000 and second pair on chr1 or chrX that
# start with 15 matches/mismatches, are shorter than 75 nts
# or begin with an ATG and are located on the reverse strand.samql --where "(RNAME = 'chr1' AND POS > 1000000) AND \
(RNEXT = 'chr1' OR RNEXT = 'chrX') AND \
NH:i = 1 AND \
CIGAR =~ /^15M/ AND \
(LENGTH < 75 OR SEQ =~ /^ATG/) AND \
PAIRED AND REVERSE"
```## Keywords
```Go
QNAME // QNAME corresponds to the SAM record query name.
FLAG // FLAG corresponds to the SAM record alignment flag.
RNAME // RNAME corresponds to the SAM record reference name
POS // POS corresponds to the SAM record position (0-based).
MAPQ // MAPQ corresponds to the SAM record mapping quality.
CIGAR // CIGAR corresponds to the SAM record CIGAR string.
RNEXT // RNEXT corresponds to the reference name of the mate read.
PNEXT // PNEXT corresponds to the position of the mate read.
TLEN // TLEN corresponds to SAM record template length.
SEQ // SEQ corresponds to SAM record segment sequence.
QUAL // QUAL corresponds to SAM record quality.
LENGTH // LENGTH corresponds to the alignment length.
PAIRED // PAIRED corresponds to SAM flag 0x1.
PROPERPAIR // PROPERPAIR corresponds to SAM flag 0x2.
UNMAPPED // UNMAPPED corresponds to SAM flag 0x4.
MATEUNMAPPED // MATEUNMAPPED corresponds to SAM flag 0x8.
REVERSE // REVERSE corresponds to SAM flag 0x10.
MATEREVERSE // MATEREVERSE corresponds to SAM flag 0x20.
READ1 // READ1 corresponds to SAM flag 0x40.
READ2 // READ2 corresponds to SAM flag 0x80.
SECONDARY // SECONDARY corresponds to SAM flag 0x100.
QCFAIL // QCFAIL corresponds to SAM flag 0x200.
DUPLICATE // DUPLICATE corresponds to SAM flag 0x400.
SUPPLEMENTARY // SUPPLEMENTARY corresponds to SAM flag 0x800.
END // END corresponds to the alignment end.
```## API example
```Go
// Open github.com/biogo/hts/sam reader
f := os.Open("test.sam")
sr, _ := sam.NewReader(f)// Do the filtering
r := samql.NewReader(sr)
filter, _ := samql.Where("POS = 1")
r.Filters = append(r.Filters, filter)
for {
rec, err := r.Read()
if err != nil {
if err == io.EOF {
break
}
panic(err)
}// Do sth with rec
}
```