https://github.com/sap218/acidoseq
A Python package for studying Acidobacteria
bacteria bioinformatics nanopore plotting python python3
- Host: GitHub
- URL: https://github.com/sap218/acidoseq
- Owner: sap218
- License: mit
- Created: 2018-08-02T16:25:17.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-24T11:28:54.000Z (over 7 years ago)
- Last Synced: 2025-12-21T22:59:48.398Z (2 months ago)
- Topics: bacteria, bioinformatics, nanopore, plotting, python, python3
- Language: Python
- Homepage:
- Size: 2.15 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# acidoseq
Studying Acidobacteria reads from a **Nanopore** metagenomic data-set | **Python v3.5** | [PyPI](https://pypi.org/project/acidoseq/) (see version)
Author __Samantha C Pendleton__, Data Science MSc Aberystwyth University, [Twitter](https://twitter.com/sap218) | [GitHub](https://github.com/sap218)
Follow the Twitter bot I created, [acido_bot](https://twitter.com/acido_bot), that dispenses daily facts about Acidobacteria!
The **GC** content of Acidobacteria genomes is consistent with their placement: species in the same subdivision have similar GC content (above 60\% for group V fragments and roughly 10\% lower for group III fragments), which reflects the diversity within the phylum [1].
The abundance of each subdivision correlates with pH, and the direction of the relationship depends on the subdivision: subdivisions 1, 2, 3, 12, 13 become less abundant as pH increases, whilst 4, 6, 7, 10, 11, 16, 17, 18, 22, 25 are sparse at low pH and become more abundant as pH increases [2].
This package takes a collection of reads and gathers the ones assigned as Acidobacteria from a Kaiju output. It produces various statistics and GC plots. Furthermore, the unclassified Acidobacteria reads are visualised by subdivision, based on the pH level of the soil sample.
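Since GC content is the quantity the plots are built on, a minimal sketch of how it can be computed may help. This is an illustration only, not the code used inside acidoseq; the function names `gc_content` and `mean_gc` are made up for this example, and ambiguous bases such as `N` are assumed to count towards the read length.

```python
def gc_content(sequence):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    # Assumption: every character (including N) counts towards the length.
    return 100.0 * gc / len(seq)


def mean_gc(sequences):
    """Return the mean GC percentage over a collection of sequences."""
    values = [gc_content(s) for s in sequences]
    return sum(values) / len(values) if values else 0.0


# Toy reads, not real data.
reads = ["ATGCGC", "GGCC", "ATAT"]
for read in reads:
    print(read, round(gc_content(read), 2))
print("mean", round(mean_gc(reads), 2))
```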
## Introduction
[**Kaiju**](http://kaiju.binf.ku.dk) output provides the taxon ID and the corresponding sequence; my package outputs the Acidobacteria species alongside annotation, plots, and information on the unclassified reads.
###### Prerequisite
* FASTA format of all the reads.
* Kaiju output after extracting the two columns: sequence ID and NCBI taxIDs.
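To show how the two prerequisite files relate, here is a rough sketch that reads the two-column CSV, collects the sequence IDs assigned to a set of taxon IDs, and pulls the matching records out of a FASTA file. It uses a plain-Python FASTA parser rather than `pysam`, so it is not how acidoseq itself does it. The function names and the taxon IDs in the demo are placeholders invented for this example.

```python
import csv
import io


def read_ids_for_taxa(csv_handle, wanted_taxa):
    """Collect sequence IDs whose taxon ID (column 2) is in wanted_taxa."""
    ids = set()
    for row in csv.reader(csv_handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        seq_id, taxon = row[0], row[1]
        if taxon in wanted_taxa:
            ids.add(seq_id)
    return ids


def _record_id(header):
    """First whitespace-delimited token of a FASTA header ('' if empty)."""
    parts = header.split()
    return parts[0] if parts else ""


def filter_fasta(fasta_handle, keep_ids):
    """Yield (header, sequence) for FASTA records whose ID is in keep_ids."""
    header, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None and _record_id(header) in keep_ids:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    # Do not forget the final record in the file.
    if header is not None and _record_id(header) in keep_ids:
        yield header, "".join(chunks)


# Toy inputs; the taxon IDs "111" and "222" are placeholders, not real taxa.
kaiju_csv = io.StringIO("read1,111\nread2,222\nread3,111\n")
fasta = io.StringIO(">read1\nACGT\nGG\n>read2\nTTTT\n>read3 extra info\nCCGG\n")

wanted = read_ids_for_taxa(kaiju_csv, {"111"})
for name, seq in filter_fasta(fasta, wanted):
    print(">" + name)
    print(seq)
```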
###### Dependencies
```
import os
import csv
import pysam
import collections
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import random
from termcolor import colored
from colorama import init
import click
```
`$ pip3 install matplotlib`
## Installation
**GitClone**
`$ git clone https://github.com/sap218/acidoseq.git`
**pip**
`$ pip install acidoseq`
**Kaiju**
I used columns 2 and 3 of the Kaiju output, which contain the sequence IDs and the NCBI taxon IDs.
1. Filter the output to keep only the classified reads `$ awk '$1 == "C"' kaiju.out > kaijuC.out`
2. Cut the columns `$ cut -f2,3 kaijuC.out > results.txt`
3. Convert the txt to csv (comma-delimited) `$ sed 's/\s\+/,/g' results.txt > result_seqid_taxon.csv`
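The three shell steps above can also be sketched in Python, which may be handy where `awk`, `cut`, and `sed` are not available. This is an illustration and not part of the acidoseq package; `kaiju_to_csv` is a made-up name. It assumes the Kaiju output is tab-separated with the classification flag, the read name, and the taxon ID in the first three columns. Unlike the `sed` step, it splits on tabs only, so a read name containing spaces stays in one field.

```python
import csv
import io


def kaiju_to_csv(kaiju_handle, csv_handle):
    """Write 'sequence ID,taxon ID' rows for classified reads only.

    Returns the number of rows written.
    """
    writer = csv.writer(csv_handle, lineterminator="\n")
    written = 0
    for line in kaiju_handle:
        fields = line.rstrip("\n").split("\t")
        # Keep only classified reads ("C" in column 1) with both columns present.
        if len(fields) < 3 or fields[0] != "C":
            continue
        writer.writerow([fields[1], fields[2]])
        written += 1
    return written


# Toy Kaiju-style lines; the read names and taxon IDs are placeholders.
kaiju_out = io.StringIO("C\tread1\t123\nU\tread2\t0\nC\tread3\t456\n")
result = io.StringIO()
count = kaiju_to_csv(kaiju_out, result)
print(count)
print(result.getvalue(), end="")
```

With real files, the two `io.StringIO` objects would be replaced by `open("kaiju.out")` and `open("result_seqid_taxon.csv", "w", newline="")`.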
## Map
If you are unsure of the pH of your soil samples, you may want to use the map script first; the default city is Aberystwyth.
Please **note**: because the Earth is spherical and maps are 2-dimensional, there will be some distortion when plotting locations.
`$ acidomap --city Birmingham`
## Usage
The CLI **needs** the Kaiju and FASTA files; all other options have defaults, e.g. pH = 5.
If no plot style is provided, or it is entered incorrectly, a random one is chosen.
Run as follows on **Linux** (find how to [run with other operating systems here](https://en.wikibooks.org/wiki/Python_Programming/Creating_Python_Programs)):
```
$ acidoseq --help
Usage: acidoseq [OPTIONS]
Options:
--taxdumptype TEXT Study "ALL" or only unclassified "U"?
--kaijufile TEXT Place edited Kaiju (csv) in directory for ease.
--fastapath TEXT Place FASTA in directory for ease.
--style TEXT ['seaborn-bright', 'seaborn-poster', 'seaborn-white',
'bmh', 'seaborn-darkgrid', 'seaborn-pastel',
'grayscale', '_classic_test', 'ggplot', 'seaborn-
whitegrid', 'seaborn-dark', 'seaborn-muted', 'seaborn-
colorblind', 'seaborn-ticks', 'Solarize_Light2',
'seaborn-notebook', 'dark_background', 'fast',
'seaborn', 'fivethirtyeight', 'seaborn-paper', 'seaborn-
dark-palette', 'seaborn-talk', 'classic', 'seaborn-
deep']
--plottype TEXT "span" range of GC means OR "line" average mean GC
--ph TEXT pH of soil, use map script for assistance.
--help Show this message and exit.
```
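The fallback for `--style` described above (a random style when none is given or the name is not recognised) could be sketched roughly as follows. This is an assumption about the behaviour, not the package's actual code; `choose_style` is a hypothetical helper. In a real script the list of valid names could come from `matplotlib.pyplot.style.available`; a short hard-coded list is used here so the example runs without matplotlib.

```python
import random


def choose_style(requested, available, rng=random):
    """Return the requested style if it is valid, otherwise a random one."""
    if requested in available:
        return requested
    # Missing or misspelled style: fall back to a random valid one.
    return rng.choice(sorted(available))


# A few of the style names listed in the help text above.
styles = ["ggplot", "seaborn", "bmh", "classic", "grayscale"]

print(choose_style("ggplot", styles))   # valid name is kept
print(choose_style("gplot", styles))    # typo: some random valid style
print(choose_style(None, styles))       # not provided: some random valid style
```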
###### Examples
`$ acidoseq --kaijufile result_seqid_taxon.csv --fastapath all.fa`
`$ acidoseq --taxdumptype ALL --kaijufile result_seqid_taxon.csv --fastapath all.fa --style ggplot --plottype span --ph 4.92`
`$ acidoseq --taxdumptype U --kaijufile result_seqid_taxon.csv --fastapath all.fa --style seaborn --plottype line --ph 7.14`
**Output**
* FASTA file: a collection of reads which were identified as Acidobacteria
* Plot of AT and GC ratio comparison with means
* In-depth plot of GC ratio with subdivisions labelled (regions with 'span' and means with 'line')
* Separate FASTA files of the unclassified reads assigned into subdivisions based on the pH, e.g. a file of sequences which reside in the subdivision 1 GC span if the pH is low
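To make the pH-based grouping a little more concrete, here is a small sketch of the relationship reported in [2]: which subdivisions would be expected to be more abundant at a given soil pH. The subdivision numbers are those listed at the top of this README. The cut-off between "low" and "high" pH is an assumption chosen for illustration and is not taken from the package, and `favoured_subdivisions` is a made-up name.

```python
# Subdivision numbers as listed at the top of this README, citing [2].
DECREASE_WITH_PH = frozenset({1, 2, 3, 12, 13})
INCREASE_WITH_PH = frozenset({4, 6, 7, 10, 11, 16, 17, 18, 22, 25})


def favoured_subdivisions(ph, cutoff):
    """Return the subdivisions expected to be more abundant at this pH.

    `cutoff` is a hypothetical threshold separating "low" from "high" pH.
    """
    if ph < cutoff:
        return sorted(DECREASE_WITH_PH)
    return sorted(INCREASE_WITH_PH)


ILLUSTRATIVE_CUTOFF = 6.0  # assumption for this example only

for ph in (4.92, 7.14):
    print(ph, favoured_subdivisions(ph, ILLUSTRATIVE_CUTOFF))
```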
## Acknowledgements
* **Amanda Clare**, senior lecturer, MSc supervisor at Aberystwyth University, [Twitter](https://twitter.com/afcaber) | [GitHub](https://github.com/amandaclare) | [Staff Profile](https://www.aber.ac.uk/en/cs/staff-profiles/listing/profile/afc/)
* **Sam Nicholls**, postdoc at University of Birmingham, [Twitter](https://twitter.com/samstudio8) | [GitHub](https://github.com/SamStudio8)
* **Arwyn Edwards**, senior lecturer at Aberystwyth University, provided the data-set, [Twitter](https://twitter.com/arwynedwards) | [Staff Profile](https://www.aber.ac.uk/en/ibers/staff-profiles/listing/profile/aye/)
## Thank you! :seedling:
Don't hesitate to create an issue or make a suggestion!
###### Todo List
- [x] Make available
- [x] Improve descriptions and comments
- [x] Look into command line interface
- [x] Fix code to output unclassified subdivisions based on pH
- [ ] Alter code so the input file can be the original Kaiju output
- [ ] Make available on Conda
###### References
[1] Quaiser, A., Ochsenreiter, T., Lanz, C., Schuster, S. C., Treusch, A. H., Eck, J., & Schleper, C. (2003). Acidobacteria form a coherent but highly diverse group within the bacterial domain: evidence from environmental genomics. Molecular microbiology, 50(2), 563-575.
[2] Eichorst, S. A., Breznak, J. A., & Schmidt, T. M. (2007). Isolation and characterization of soil bacteria that define Terriglobus gen. nov., in the phylum Acidobacteria. Applied and environmental microbiology, 73(8), 2708-2717.