https://github.com/robaina/filtersam
Tools to filter SAM/BAM files by percent identity and percent of matched sequence
- Host: GitHub
- URL: https://github.com/robaina/filtersam
- Owner: Robaina
- License: apache-2.0
- Created: 2021-08-28T18:41:26.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2023-06-06T08:57:54.000Z (about 2 years ago)
- Last Synced: 2025-03-10T22:37:36.622Z (4 months ago)
- Topics: alignment, bioinformatics, computational-biology, genomics, python, samtools, sequence-alignment
- Language: Python
- Homepage:
- Size: 313 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff

## A Python tool to filter SAM/BAM files by percent identity or percent of matched sequence

[License](https://github.com/Robaina/filterSAM/blob/master/LICENSE)

[DOI](https://zenodo.org/badge/latestdoi/400865776)

Percent identity is computed as:
$$PI = 100 \frac{N_m}{N_m + N_i}$$
where $N_m$ is the number of matches and $N_i$ is the number of mismatches.
Percent of matched sequence is computed as:
$$PM = 100 \frac{N_m}{L}$$
where $L$ is the length of the query sequence.
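
As an illustration of the two formulas, here is a minimal Python sketch. The function names `percent_identity` and `percent_matched` are hypothetical and are not part of the filtersam API; returning `0.0` when the denominator is zero is also an assumption made for this example.

```python
def percent_identity(n_matches: int, n_mismatches: int) -> float:
    """PI = 100 * N_m / (N_m + N_i)."""
    aligned = n_matches + n_mismatches
    if aligned == 0:
        # Assumption for this sketch: no aligned bases means 0% identity.
        return 0.0
    return 100.0 * n_matches / aligned


def percent_matched(n_matches: int, query_length: int) -> float:
    """PM = 100 * N_m / L, where L is the query sequence length."""
    if query_length == 0:
        # Assumption for this sketch: an empty query gives 0% matched.
        return 0.0
    return 100.0 * n_matches / query_length


# A 120-base query aligned with 90 matches and 10 mismatches
# (the remaining 20 bases are neither matches nor mismatches).
pi = percent_identity(n_matches=90, n_mismatches=10)   # 90.0
pm = percent_matched(n_matches=90, query_length=120)   # 75.0
```

The example shows how the two measures can diverge: the same alignment has a PI of 90 but a PM of 75, because PM penalizes query bases that are not matched at all.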
## NOTES
1. Percent of matched sequence is an alternative definition of percent identity used in some cases, for instance, in [BLAST](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity).
2. BAM/SAM files must contain [MD tags](https://github.com/vsbuffalo/devnotes/wiki/The-MD-Tag-in-BAM-Files) in order to be filtered by percent identity. Aligners such as [BWA](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705234/) add an MD tag to each aligned query sequence. MD tags can also be generated with [samtools calmd](http://www.htslib.org/doc/samtools-calmd.html).
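
To show why the MD tag is needed, the sketch below counts matches and mismatches directly from an MD string. It is an illustration only and not necessarily how filtersam parses the tag: the helper name `count_matches_mismatches` is hypothetical, and skipping deletions (the `^` runs) when counting is an assumption of this example. Note that the MD tag does not record insertions or clipped bases.

```python
import re
from typing import Tuple

# An MD tag is a sequence of tokens: a run of matching bases (digits),
# a deletion from the reference ("^" followed by bases), or a single
# mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def count_matches_mismatches(md_tag: str) -> Tuple[int, int]:
    """Return (N_m, N_i) counted from an MD tag string."""
    n_matches = 0
    n_mismatches = 0
    for run, deletion, mismatch in MD_TOKEN.findall(md_tag):
        if run:
            n_matches += int(run)
        elif mismatch:
            n_mismatches += 1
        # Assumption: deleted reference bases count as neither
        # matches nor mismatches, so deletion tokens are skipped.
    return n_matches, n_mismatches


# "10A5^AC6": 10 matches, 1 mismatch, 5 matches, a 2-base deletion, 6 matches.
n_m, n_i = count_matches_mismatches("10A5^AC6")
identity = 100.0 * n_m / (n_m + n_i)
print(n_m, n_i, round(identity, 2))
```

For the example tag this gives 21 matches and 1 mismatch, so percent identity is 100 × 21 / 22.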
## Installation
```
pip install filtersam
```
## Usage
You can find a Jupyter notebook with usage examples [here](examples/examples.ipynb).
## Citation
If you use this software, please cite it as below:

Robaina-Estévez, S. (2022). filterSAM: filter sam/bam files by percent identity or percent of matched sequence (Version 0.0.11) [Computer software]. https://doi.org/10.5281/zenodo.7056278