Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/AdmiralenOla/PuppetMaster
Tools for manipulating sequencing data, multiple sequence alignments and phylogenetic trees
- Host: GitHub
- URL: https://github.com/AdmiralenOla/PuppetMaster
- Owner: AdmiralenOla
- License: gpl-2.0
- Created: 2015-10-07T12:53:57.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2021-06-17T15:07:10.000Z (over 3 years ago)
- Last Synced: 2024-08-02T11:20:32.158Z (4 months ago)
- Language: Python
- Size: 18.6 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-bacteria - PuppetMaster - [Python] - Variable sites extractor. (Software packages / Bacterial population genomics)
README
# PuppetMaster
Tools for manipulating sequencing data, multiple sequence alignments and phylogenetic trees

# VARIABLE SITES EXTRACTION
The aim of many microbial typing pipelines is to compare isolates at the SNP level. After multiple sequence alignment,
phylogenetic inferences can be made by comparing SNPs across isolates. However, since SNPs are usually called with respect
to a reference genome, one typically ends up with many redundant SNP sites when comparing isolates against each other.

In parsimony methods (as opposed to likelihood or distance methods), constant sites (i.e. sites where the isolates being
compared do not differ) can safely be excluded, as they are not informative for tree inference.

Similarly, sites where just one isolate differs from the rest, although variable, may not be of interest since they do not
contribute to tree discrimination. Finally, sites where all four bases are represented are also considered non-informative
and should be trimmed.

Furthermore, multiple sequence alignments are typically cluttered with gap characters "-" and ambiguous characters "N".
These sites are not valuable for phylogenetic inference, and should be trimmed.

This script allows the user to trim away N-containing columns, gapped columns and non-variable/non-informative sites
from a multiple sequence alignment FASTA file.

Author: Ola Brynildsrud
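
The trimming rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the PuppetMaster script itself: the function names `parse_fasta`, `keep_column` and `trim_alignment`, the keyword options and the toy alignment are all assumptions made for this example, and the real script's command-line interface may differ.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def keep_column(column, drop_n=True, drop_gaps=True,
                drop_singletons=True, drop_four_base=True):
    """Return True if an alignment column survives trimming.

    Hypothetical rules that mirror the README text, not the author's code.
    """
    if drop_n and "N" in column:
        return False
    if drop_gaps and "-" in column:
        return False
    counts = Counter(base for base in column if base in "ACGT")
    if len(counts) < 2:
        return False  # constant site: all isolates carry the same base
    if drop_four_base and len(counts) == 4:
        return False  # all four bases are represented
    if drop_singletons and sum(counts.values()) - max(counts.values()) == 1:
        return False  # exactly one isolate differs from the rest
    return True


def trim_alignment(records, **options):
    """Drop every column rejected by keep_column from all sequences."""
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences in an alignment must have equal length")
    kept = [i for i, col in enumerate(zip(*seqs)) if keep_column(col, **options)]
    return [(name, "".join(seq[i] for i in kept)) for name, seq in records]


# Toy alignment: columns are constant, informative, singleton,
# N-containing, gapped, four-base, informative (in that order).
example = """>iso1
ACGTAAG
>iso2
ACGN-CG
>iso3
ATGTAGC
>iso4
ATATCTC
"""

trimmed = trim_alignment(parse_fasta(example))
for name, seq in trimmed:
    print(f">{name}\n{seq}")
```

With the default options only the two informative columns remain, so the four isolates are reduced to `CG`, `CG`, `TC` and `TC`. Passing `drop_singletons=False` would additionally keep the column in which only `iso4` differs.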