https://github.com/robaina/filtersam
Tools to filter SAM/BAM files by percent identity and percent of matched sequence
alignment bioinformatics computational-biology genomics python samtools sequence-alignment
- Host: GitHub
- URL: https://github.com/robaina/filtersam
- Owner: Robaina
- License: apache-2.0
- Created: 2021-08-28T18:41:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-06T08:57:54.000Z (over 1 year ago)
- Last Synced: 2024-09-25T09:21:39.471Z (4 months ago)
- Topics: alignment, bioinformatics, computational-biology, genomics, python, samtools, sequence-alignment
- Language: Python
- Size: 313 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
README
![logo](assets/logo.png)
## A Python tool to filter SAM/BAM files by percent identity or percent of matched sequence
![PyPI](https://img.shields.io/pypi/v/filtersam)
![GitHub release (latest by date)](https://img.shields.io/github/v/release/Robaina/filterSAM)
[![GitHub license](https://img.shields.io/github/license/Robaina/filterSAM)](https://github.com/Robaina/filterSAM/blob/master/LICENSE)
![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4)
[![DOI](https://zenodo.org/badge/400865776.svg)](https://zenodo.org/badge/latestdoi/400865776)
Percent identity is computed as:
$$PI = 100 \frac{N_m}{N_m + N_i}$$
where $N_m$ is the number of matches and $N_i$ is the number of mismatches.
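The sketch below shows one way to read $N_m$ and $N_i$ off an MD tag value in Python. It assumes that digit runs count matched bases, that single letters count mismatches, and that `^`-prefixed runs (reference bases deleted from the read) are ignored. It only illustrates the formula and is not filterSAM's own implementation. The function names are hypothetical.

```python
import re

def md_counts(md: str) -> tuple[int, int]:
    """Return (N_m, N_i) for an MD tag value such as "10A5^AC6"."""
    n_m = n_i = 0
    # Each token is a digit run (matches), a '^'-prefixed deletion, or one mismatch letter.
    for digits, _deletion, mismatch in re.findall(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])", md):
        if digits:
            n_m += int(digits)
        elif mismatch:
            n_i += 1
        # Deleted reference bases count toward neither N_m nor N_i here.
    return n_m, n_i

def percent_identity(md: str) -> float:
    """PI = 100 * N_m / (N_m + N_i); returns 0.0 if the tag has no aligned bases."""
    n_m, n_i = md_counts(md)
    total = n_m + n_i
    return 100 * n_m / total if total else 0.0
```

For example, the MD value `10A5^AC6` gives 21 matches and 1 mismatch, so `percent_identity("10A5^AC6")` is about 95.45.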
Percent of matched sequence is computed as:
$$PM = 100 \frac{N_m}{L}$$
where $L$ is the length of the query sequence.
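Here is a minimal sketch of $PM$. It assumes that $L$ is the query length implied by the CIGAR string, so the operations `M`, `I`, `S`, `=` and `X` consume query bases and hard-clipped bases are not counted. It also assumes that $N_m$ is the sum of the digit runs in the MD tag. Both choices are assumptions for illustration, and the helper names are hypothetical, not part of the filterSAM API.

```python
import re

# CIGAR operations that consume query bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")

def query_length(cigar: str) -> int:
    """L: query length implied by a CIGAR string (hard clips excluded)."""
    return sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in QUERY_CONSUMING
    )

def md_matches(md: str) -> int:
    """N_m: sum of the digit runs in an MD tag value."""
    return sum(int(run) for run in re.findall(r"\d+", md))

def percent_matched(md: str, cigar: str) -> float:
    """PM = 100 * N_m / L."""
    return 100 * md_matches(md) / query_length(cigar)
```

Consider a read with MD tag `10A5^AC6` and CIGAR `5S16M2D6M`. It has $N_m = 21$ and $L = 27$, so $PM \approx 77.78$, while $PI \approx 95.45$. The five soft-clipped bases lower $PM$ but not $PI$.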
## NOTES
1. Percent of matched sequence is also used in some cases as an alternative definition of percent identity, for instance in BLAST (see [this discussion of sequence identity definitions](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity)).
2. BAM/SAM files must contain [MD tags](https://github.com/vsbuffalo/devnotes/wiki/The-MD-Tag-in-BAM-Files) to be filtered by percent identity. Aligners such as [BWA](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705234/) add an MD tag to each aligned sequence in a BAM file. MD tags can also be generated afterwards with [`samtools calmd`](http://www.htslib.org/doc/samtools-calmd.html).
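Percent identity depends on the MD tag, so any filter must decide how to handle records that lack one. The hypothetical sketch below works on plain-text SAM lines using only the standard library. It keeps header lines and drops alignments that either have no MD tag or have a $PI$ below a threshold. It is not filterSAM's implementation, and all names are illustrative. BAM input would first need converting to SAM, for instance with `samtools view -h`.

```python
import re
from typing import Iterable, Iterator, Optional

def get_md(fields: list[str]) -> Optional[str]:
    """Return the MD:Z value from a SAM record's optional fields (column 12 on), if any."""
    for field in fields[11:]:
        if field.startswith("MD:Z:"):
            return field[5:]
    return None

def percent_identity(md: str) -> float:
    """PI = 100 * N_m / (N_m + N_i), counting MD digit runs and mismatch letters."""
    n_m = sum(int(run) for run in re.findall(r"\d+", md))
    # Drop '^'-prefixed deletions so their letters are not counted as mismatches.
    n_i = len(re.findall(r"[A-Za-z]", re.sub(r"\^[A-Za-z]+", "", md)))
    total = n_m + n_i
    return 100 * n_m / total if total else 0.0

def filter_by_identity(lines: Iterable[str], min_pi: float) -> Iterator[str]:
    """Yield header lines and alignments whose PI is at least min_pi.

    Records without an MD tag (e.g. unmapped reads) are skipped because
    PI cannot be computed for them.
    """
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        md = get_md(line.rstrip("\n").split("\t"))
        if md is not None and percent_identity(md) >= min_pi:
            yield line
```

For example, `filter_by_identity(handle, 95)` yields the header and every alignment with $PI \ge 95$, where `handle` is an open SAM file.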
## Installation
```bash
pip install filtersam
```
## Usage
Usage examples are available in this [Jupyter notebook](examples/examples.ipynb).
## Citation
If you use this software, please cite it as follows:
Robaina-Estévez, S. (2022). filterSAM: filter sam/bam files by percent identity or percent of matched sequence (Version 0.0.11) [Computer software]. https://doi.org/10.5281/zenodo.7056278.