https://github.com/ryan-williams/samtools-helpers

# samtools-helpers
A few helper scripts for working with samtools.

## Installation
Put the path to this repo on your `$PATH`.

```sh
echo 'export PATH="$PATH:/path/to/samtools-helpers"' >> ~/.bashrc
```

For some handy aliases, `source` `.samtools-rc` in this repo:

```sh
echo 'source /path/to/samtools-helpers/.samtools-rc' >> ~/.bashrc
```

## Usage
The main useful scripts here are `samtools-view` (alias `sv`) and variants of it (`samtools-view-with-header` a.k.a. `svh`, `samtools-view-less` a.k.a. `svl`).

Each of them takes a `.sam` file, runs `samtools view` on it, and then makes the following improvements:

* converts the integer "bit flag" field (the second column) to a string of 12 `0`s and `1`s, one per flag bit
* formats the output as a table, so that e.g. longer vs. shorter read names in the first column don't mess up the alignment of subsequent columns.
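
As a rough illustration of these two steps (not necessarily how the scripts in this repo implement them), the flag conversion can be sketched with `awk`. The function name `flags_to_bits` is hypothetical, and the sketch assumes tab-separated SAM records on stdin:

```sh
# Hypothetical sketch, not the repo's actual implementation.
# Rewrites field 2 (the SAM FLAG integer) of each tab-separated record
# on stdin as 12 binary digits, most significant bit first.
flags_to_bits() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    flag = $2
    bits = ""
    # Peel off the lowest bit 12 times, prepending each digit.
    for (i = 0; i < 12; i++) {
      bits = (flag % 2) bits
      flag = int(flag / 2)
    }
    $2 = bits
    print
  }'
}
```

Assuming `column` is installed, something like `samtools view NA12878.sam | head -n 5 | flags_to_bits | column -t` would then approximate the table formatting as well.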

## Examples

#### First 5 non-header lines, using `samtools-view`:
```sh
$ sv 5 NA12878.sam
20FUKAAXX100202:3:6:15018:84106 000010100011 20 224759 60 101M = 225025 366 ACCCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA ?@BBBCEEDFEFEEEFDEEFEEEEBFEDEFCFDDEEFEDFDFEEEFEEEECEEFEEFCEFDEEFFEFEDEEEFFFDECEDCEFEEDDFFBFEFGEAEDCCC MD:Z:101 PG:Z:BWA RG:Z:20FUK.3 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHFHHHGHHHHHIIHHDHHHHHEHHHHH UQ:i:0
20GAVAAXX100126:8:62:5578:2527 001001010011 20 224759 60 101M = 224453 -406 ACCCAAAGCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA 834:/,1(:8::8::<98;-(-;>5?08/:;/+7<;=>?@:9>;==<=:<8<>?4>B>AABAAB@@;;<<=>===9>9?=9>=?==;=:;>>@3@;1 MD:Z:7T93 PG:Z:BWA RG:Z:20GAV.8 AM:i:25 NM:i:1 SM:i:37 MQ:i:60 OQ:Z:C4541/1.55555555544008??9?1514401555?AAA;5554444555?A?7AFEFFFFFFDF55555444454445555444@5@==5555555555 UQ:i:7
20FUKAAXX100202:4:47:20584:49257 000010100011 20 224761 60 101M = 225058 387 CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT ?ACDBBCEDFEDEFEEEFEDBECFBFEFCFDEEEFEDFDFEEEFEEEECEEFEEFCEFFEEFFEFEDEAEFFFAECEFCDFEEFBFFDBEEC:@6A?C4>B MD:Z:101 PG:Z:BWA RG:Z:20FUK.4 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHHDHHHHHIHHHHFHGIGHFE;D9BBD7AH UQ:i:0
20GAVAAXX100126:7:47:4730:37293 000010100011 20 224761 60 101M = 225073 412 CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT ?BB@BCBFDDECC=E@@DB;BDCFDE<BADD>?C?EDEB>@AC==DAE?E=CAC?;:>4=B676<17@@<:AA<;6 MD:Z:101 PG:Z:BWA RG:Z:20GAV.7 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:BBA>AB@BB@BA?>B==??7>@BBA@:6@@@@@@A@BAA>A?B@BA?=?>9=????@?@>>>@?67@<;??@>?@????@9:96=>2236-39=73@:652 UQ:i:0
20GAVAAXX100126:5:46:21151:39489 000001010011 20 224761 60 101M = 224465 -396 CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT >9<=BBB>BB>EFFEEEFEEECEFEEFDEFEEEFFEEFEEFDDEEEEDEEFFDDDDFFFDDFFDEFDEEDFFEEEEEEEEEFEEEEEFFEFEFEF=DED=A MD:Z:101 PG:Z:BWA RG:Z:20GAV.5 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:DBGGFDFCFFBHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHGH UQ:i:0
```
It's still on you to know [which of the 12 bits mean what](https://samtools.github.io/hts-specs/SAMv1.pdf), but it's a lot better than doing the binary conversion in your head!
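
For reference, here is a minimal sketch that names the bits set in an integer flag. The helper `describe_flag` is hypothetical and not part of this repo; the bit order follows the SAM specification and the names mirror those used by `samtools flags`. In the 12-digit string the rightmost digit is the lowest bit (`PAIRED`), and the leftmost is the highest (`SUPPLEMENTARY`).

```sh
# Hypothetical helper, not part of this repo.
# Prints the comma-separated names of the bits set in a SAM FLAG integer.
# Names are listed from the lowest bit (0x1) to the highest (0x800).
describe_flag() {
  flag=$1
  names=""
  for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
              READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
    # Test the current lowest bit, then shift right by one.
    if [ $((flag % 2)) -eq 1 ]; then
      names="${names:+$names,}$name"
    fi
    flag=$((flag / 2))
  done
  printf '%s\n' "$names"
}
```

For example, `describe_flag 163` prints `PAIRED,PROPER_PAIR,MREVERSE,READ2`, which corresponds to `000010100011` in the output above.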

#### First 5 non-header lines, using regular `samtools view`:
```sh
$ samtools view NA12878.sam | head -n 5
20FUKAAXX100202:3:6:15018:84106 163 20 224759 60 101M = 225025 366 ACCCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA ?@BBBCEEDFEFEEEFDEEFEEEEBFEDEFCFDDEEFEDFDFEEEFEEEECEEFEEFCEFDEEFFEFEDEEEFFFDECEDCEFEEDDFFBFEFGEAEDCCC MD:Z:101 PG:Z:BWA RG:Z:20FUK.3 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHFHHHGHHHHHIIHHDHHHHHEHHHHH UQ:i:0
20GAVAAXX100126:8:62:5578:2527 595 20 224759 60 101M = 224453 -406 ACCCAAAGCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA 834:/,1(:8::8::<98;-(-;>5?08/:;/+7<;=>?@:9>;==<=:<8<>?4>B>AABAAB@@;;<<=>===9>9?=9>=?==;=:;>>@3@;1 MD:Z:7T93 PG:Z:BWA RG:Z:20GAV.8 AM:i:25 NM:i:1 SM:i:37 MQ:i:60 OQ:Z:C4541/1.55555555544008??9?1514401555?AAA;5554444555?A?7AFEFFFFFFDF55555444454445555444@5@==5555555555 UQ:i:7
20FUKAAXX100202:4:47:20584:49257 163 20 224761 60 101M = 225058 387 CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT ?ACDBBCEDFEDEFEEEFEDBECFBFEFCFDEEEFEDFDFEEEFEEEECEEFEEFCEFFEEFFEFEDEAEFFFAECEFCDFEEFBFFDBEEC:@6A?C4>B MD:Z:101 PG:Z:BWA RG:Z:20FUK.4 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHHDHHHHHIHHHHFHGIGHFE;D9BBD7AH UQ:i:0
20GAVAAXX100126:7:47:4730:37293 163 20 224761 60 101M = 225073 412 CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT ?BB@BCBFDDECC=E@@DB;BDCFDE<BADD>?C?EDEB>@AC==DAE?E=CAC?;:>4=B676<17@@<:AA<;6 MD:Z:101 PG:Z:BWA RG:Z:20GAV.7 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:BBA>AB@BB@BA?>B==??7>@BBA@:6@@@@@@A@BAA>A?B@BA?=?>9=????@?@>>>@?67@<;??@>?@????@9:96=>2236-39=73@:652 UQ:i:0
20GAVAAXX100126:5:46:21151:39489 83 20 224761 60 101M = 224465 -396 CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT >9<=BBB>BB>EFFEEEFEEECEFEEFDEFEEEFFEEFEEFDDEEEEDEEFFDDDDFFFDDFFDEFDEEDFFEEEEEEEEEFEEEEEFFEFEFEF=DED=A MD:Z:101 PG:Z:BWA RG:Z:20GAV.5 AM:i:37 NM:i:0 SM:i:37 MQ:i:60 OQ:Z:DBGGFDFCFFBHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHGH UQ:i:0
```

Note the opaque integer flags in the second field, and the misalignment of some columns.

#### Entire `.sam` file without header:
```sh
sv NA12878.sam
# or:
samtools-view NA12878.sam
```

#### Entire `.sam` file with header:
```sh
svh NA12878.sam
# or:
samtools-view-with-header NA12878.sam
```