https://github.com/juke34/sapin
Summarize Alignment Pile by Nucleotide
- Host: GitHub
- URL: https://github.com/juke34/sapin
- Owner: Juke34
- License: gpl-3.0
- Created: 2023-06-23T15:33:07.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-06-24T16:36:56.000Z (5 months ago)
- Last Synced: 2025-06-24T17:43:36.882Z (5 months ago)
- Topics: alignment, bam
- Language: Python
- Homepage:
- Size: 8.39 MB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
[License: GPL v3](https://www.gnu.org/licenses/gpl-3.0)
# SAPiN
---------------------------
Summarize Alignment Pile by Nucleotide
## Table of Contents
* [Foreword](#foreword)
* [Install](#install)
* [Usage](#usage)
* [Output](#output)
* [Acknowledgement](#acknowledgement)
## Foreword
This tool summarizes BAM read alignments as a pileup of reads at each position, in a tabulated way. The result is more convenient than the mpileup format and contains extra information.
## Output
Here is an example of the output you would get with SAPiN:
```
SEQID POS REF QUAL A T G C N INS DEL IUPAC COV COV_ATGC MUT_RAT APOBEC ADAR REGION CODON NUC DESC
HPV42REF 118 C 38.28 0 16 0 1356 0 0 0 0 1374 1372 1.17 1.2 . AATGTCAGGTA CAG 1 gene:ID=gene-1;Name=E6@@mRNA:ID=nbis-rna-1;Parent=gene-1;Name=E6@@exon:ID=nbis-exon-1;Parent=nbis-rna-1;Name=E6@@CDS:ID=cds-1;Parent=nbis-rna-1;Name=E6
```
Here is a description of the different fields:
| Field | Optional | Type | Description |
| --- | --- | --- | --- |
| SEQID | | String | The ID of the landmark used to establish the coordinate system for the current feature. |
| POS | | Integer | The reference position, with the first base having position 1. |
| REF | | Character | The reference base. |
| QUAL | | Float | Mean Phred-scaled quality score for the sequenced position. |
| A | | Integer | Number of adenine nucleotides at the position. |
| T | | Integer | Number of thymine nucleotides at the position. |
| G | | Integer | Number of guanine nucleotides at the position. |
| C | | Integer | Number of cytosine nucleotides at the position. |
| N | | Integer | Number of unknown nucleotides (N) at the position. |
| INS | | Integer | Number of insertions at the position. |
| DEL | | Integer | Number of deletions at the position. |
| IUPAC | | Integer | Number of IUPAC nucleotide codes (other than A, T, G, C, N) at the position. |
| COV | | Integer | Coverage at the position (including INS, DEL, IUPAC). |
| COV_ATGC | | Integer | Coverage at the position counting A, T, G, C nucleotides only. |
| MUT_RAT | | Float | Mutation ratio as a percentage (number of mutated nucleotides / COV_ATGC * 100). In the example above: 16 / 1372 * 100 = 1.17. |
| APOBEC | | Float | Mutation ratio of C-to-T or G-to-A. Useful when studying transcriptomes. |
| ADAR | | Float | Mutation ratio of A-to-G or T-to-C. Useful when studying transcriptomes. |
| REGION | | String | Substring covering the position and 5 nucleotides on each side (11 nucleotides in the example above). Useful for building sequence patterns. |
| CODON | Only if GFF provided | String | Codon containing the position, in phase/frame (/!\ spliced CDS are not taken into account). |
| NUC | Only if GFF provided | Integer | 1, 2 or 3. Indicates which nucleotide of the CODON (previous column) is the one studied at the position. |
| DESC | Only if GFF provided | String | Feature types and attributes extracted from the GFF at the position. |
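As a minimal sketch of how this table could be consumed downstream, the Python snippet below reads a SAPiN table and recomputes the mutation ratio for one row. It assumes the file is tab-separated with the header line shown above (the sample is rebuilt here from the first 18 columns of the example row). The helper names `read_sapin_table` and `mutation_ratio` are hypothetical and not part of SAPiN, and the formula is inferred from the example row rather than taken from SAPiN's source code.

```python
import csv
import io


def read_sapin_table(handle):
    """Yield one dict per position from a tab-separated table with a header line."""
    yield from csv.DictReader(handle, delimiter="\t")


def mutation_ratio(row):
    """Percentage of A/T/G/C calls that differ from the reference base.

    Assumption: "mutated" means any A/T/G/C call that is not the REF base.
    """
    cov_atgc = int(row["COV_ATGC"])
    if cov_atgc == 0:
        return 0.0
    ref = row["REF"].upper()
    # If REF is not A/T/G/C, no call can match it in this sketch.
    matching = int(row[ref]) if ref in {"A", "T", "G", "C"} else 0
    return round((cov_atgc - matching) / cov_atgc * 100, 2)


# Example row from the output above, rebuilt as tab-separated text.
HEADER = ("SEQID POS REF QUAL A T G C N INS DEL IUPAC "
          "COV COV_ATGC MUT_RAT APOBEC ADAR REGION").split()
VALUES = ("HPV42REF 118 C 38.28 0 16 0 1356 0 0 0 0 "
          "1374 1372 1.17 1.2 . AATGTCAGGTA").split()
example = "\t".join(HEADER) + "\n" + "\t".join(VALUES) + "\n"

rows = list(read_sapin_table(io.StringIO(example)))
print(mutation_ratio(rows[0]))
```

For the example row this prints `1.17`, which agrees with the value in the MUT_RAT column.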
## Install
### Prerequisite
* python3
* pysam
* gffutils
* matplotlib
They should be automatically installed during SAPiN installation.
#### Installation with pip:
```bash
pip install git+https://github.com/Juke34/SAPiN.git
```
or, if you do not have administrative rights on your machine:
```bash
pip install --user git+https://github.com/Juke34/SAPiN.git
```
#### Installation with git:
Clone the repository:
```bash
git clone https://github.com/Juke34/SAPiN.git
```
Install:
```bash
# Move into the cloned repository before installing
cd SAPiN
python -m pip install .
```
#### Check installation
Executing:
```bash
sapin
```
or
```bash
sapin -h
```
will display some help.
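The same check can be scripted. The sketch below is only an illustration: it looks for the `sapin` command on the `PATH` and tests whether the prerequisites listed above can be imported. The function name `check_sapin_environment` is hypothetical and not part of SAPiN.

```python
import importlib.util
import shutil


def check_sapin_environment(modules=("pysam", "gffutils", "matplotlib")):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    # find_spec returns None when a top-level module cannot be located.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # shutil.which returns None when the executable is not on the PATH.
    status["sapin (command)"] = shutil.which("sapin") is not None
    return status


for name, found in check_sapin_environment().items():
    print(f"{name}: {'found' if found else 'missing'}")
```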
## Update
#### Update with pip:
```bash
pip install git+https://github.com/Juke34/SAPiN.git --upgrade
```
or, if you do not have administrative rights on your machine:
```bash
pip install --user git+https://github.com/Juke34/SAPiN.git --upgrade
```
#### Update with git:
Move into the repository folder and execute:
```bash
git pull
python -m pip install .
```
## Uninstall
```bash
pip uninstall sapin
```
## Usage
```bash
sapin -a t/reference.bam -f t/reference.fasta
```
**advanced:**
```bash
sapin -a t/reference.bam -f t/reference.fasta -g t/reference_agat.gff3 -cf 1000 -bqf 20 -p
```
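When SAPiN is called from a pipeline, the command line can be assembled in Python. The following is a minimal sketch, assuming `sapin` is installed and on the `PATH`; it only uses flags that appear in this README, and the helper names `build_sapin_command` and `run_sapin` are hypothetical.

```python
import subprocess


def build_sapin_command(bam, fasta, gff=None, cover_filter=None,
                        base_quality_filter=None, plot=False, output=None):
    """Assemble the argument list for a sapin call."""
    cmd = ["sapin", "-a", str(bam), "-f", str(fasta)]
    if gff is not None:
        cmd += ["-g", str(gff)]
    if cover_filter is not None:
        cmd += ["-cf", str(cover_filter)]
    if base_quality_filter is not None:
        cmd += ["-bqf", str(base_quality_filter)]
    if plot:
        cmd.append("-p")
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


def run_sapin(**kwargs):
    """Run sapin; check=True raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_sapin_command(**kwargs), check=True)


# Rebuild the advanced command shown above (nothing is executed here).
cmd = build_sapin_command("t/reference.bam", "t/reference.fasta",
                          gff="t/reference_agat.gff3", cover_filter=1000,
                          base_quality_filter=20, plot=True)
print(" ".join(cmd))
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems with paths that contain spaces.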
## Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| -a, --ali | String | Path to the BAM input file |
| -f, --fasta | String | Path to the reference fasta file used to align the reads against. |
| -g, --gff | String | Optional - Path to the reference gff |
| -o, --output | String | Path to the tsv output file |
| -p, --plot | Boolean | Plot the mutation ratio per position (sapin_plot.svg by default; if an output is provided, output.svg). |
| -q, --quiet | Boolean | Decrease verbosity |
| -v, --verbose | Boolean | Increase verbosity |
| -z, --gzip | Boolean | Gzip output file |
| -s, --shame | Boolean | Suppress the shameless plug |
| -cf, --cover_filter | Integer | Filter output to report only sites with coverage >= the given value |
| -bqf, --base_quality_filter | Integer | Filter output to report only sites with base quality >= the given value (default 0) |
| -mqf, --base_quality_filter | Integer | Filter output to report only sites with mapping quality >= the given value (default 0) |
| -mf, --mutation_filter | Integer | Filter output to report only sites where the mutation ratio >= the given value (default 0) |
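The coverage and mutation filters can also be reproduced after the fact on an existing table. The sketch below only mirrors the `>=` semantics described in the table; it is not SAPiN's internal implementation. It assumes that `-cf` compares against the COV column and `-mf` against MUT_RAT. The function `passes_filters` and the second row are hypothetical and serve only as illustration.

```python
def passes_filters(row, min_cov=0, min_mut_ratio=0.0):
    """Keep a row when coverage and mutation ratio both reach their thresholds."""
    return (int(row["COV"]) >= min_cov
            and float(row["MUT_RAT"]) >= min_mut_ratio)


rows = [
    {"POS": "118", "COV": "1374", "MUT_RAT": "1.17"},  # from the example output
    {"POS": "119", "COV": "640", "MUT_RAT": "0.0"},    # hypothetical row
]

kept = [row["POS"] for row in rows
        if passes_filters(row, min_cov=1000, min_mut_ratio=1.0)]
print(kept)  # ['118']
```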