https://github.com/sap218/acidoseq

A Python package for studying Acidobacteria

bacteria bioinformatics nanopore plotting python python3


# acidoseq

Studying Acidobacteria reads from a **Nanopore** metagenomic data-set | **Python v3.5** | [PyPI](https://pypi.org/project/acidoseq/) (see version)

Author __Samantha C Pendleton__, Data Science MSc Aberystwyth University, [Twitter](https://twitter.com/sap218) | [GitHub](https://github.com/sap218)

Follow the Twitter bot I created, [acido_bot](https://twitter.com/acido_bot), that dispenses daily facts about Acidobacteria!

The **GC** content of Acidobacteria genomes is consistent with their placement: species in the same subdivision have similar GC content (above 60% for group V fragments and roughly 10% lower for group III fragments), which displays the diversity within the phylum [1].
The relationship between subdivision abundance and pH depends on the subdivision: 1, 2, 3, 12 and 13 decrease in abundance as pH increases, whilst 4, 6, 7, 10, 11, 16, 17, 18, 22 and 25 are sparse at low pH and increase in abundance as pH increases [2].
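
As a rough illustration of the GC percentages quoted above, GC content can be computed as the share of G and C bases among the unambiguous bases of a read. This is a minimal sketch, not acidoseq's own code; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for the example.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a nucleotide sequence (0.0 if no A/C/G/T bases)."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the ratio.
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


print(gc_content("GGCCAATT"))  # 50.0
```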

This package takes a collection of reads and gathers the ones assigned as Acidobacteria in a Kaiju output. It reports summary statistics and GC plots. Furthermore, the unclassified Acidobacteria reads are visualised by subdivision, based on the pH level of the soil sample.
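
One way to picture the pH step is a lookup from subdivision number to the trend reported in [2]. The sketch below is illustrative only and is not how acidoseq assigns reads; the names `DECREASES_WITH_PH`, `INCREASES_WITH_PH` and `ph_trend` are hypothetical.

```python
# Subdivision numbers as listed in the text above, following [2].
DECREASES_WITH_PH = {1, 2, 3, 12, 13}
INCREASES_WITH_PH = {4, 6, 7, 10, 11, 16, 17, 18, 22, 25}


def ph_trend(subdivision: int) -> str:
    """Describe how a subdivision's abundance changes as pH increases."""
    if subdivision in DECREASES_WITH_PH:
        return "negative"
    if subdivision in INCREASES_WITH_PH:
        return "positive"
    # Subdivisions not mentioned in the text are left undetermined.
    return "unknown"


print(ph_trend(1))  # negative
print(ph_trend(6))  # positive
```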

## Introduction
[**Kaiju**](http://kaiju.binf.ku.dk) output provides the taxon ID and the corresponding sequence; my package outputs the Acidobacteria species alongside annotation, plots, and information on the unclassified reads.

###### Prerequisites
* FASTA format of all the reads.
* Kaiju output after extracting the two columns: sequence ID and NCBI taxIDs.

###### Dependencies
```
import os
import csv
import pysam
import collections
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import random
from termcolor import colored
from colorama import init
import click
```

`$ pip3 install matplotlib`

## Installation

**GitClone**

`$ git clone https://github.com/sap218/acidoseq.git`

**pip**

`$ pip install acidoseq`

**Kaiju**

I used columns 2 and 3 of the Kaiju output, which contain the sequence IDs and the NCBI taxon IDs.

1. Filter the output to keep only the classified reads: `$ awk '$1 == "C"' kaiju.out > kaijuC.out`
2. Cut the columns: `$ cut -f2,3 kaijuC.out > results.txt`
3. Convert the txt to csv (comma-delimited): `$ sed 's/\s\+/,/g' results.txt > result_seqid_taxon.csv`
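
The three shell steps above can also be sketched in Python for readers who prefer a single script. This is not part of acidoseq; the function names `classified_rows` and `kaiju_to_csv` are hypothetical, and the sketch assumes the Kaiju output is tab-separated with the classification flag, read ID and taxon ID in the first three columns, as the commands above imply.

```python
import csv


def classified_rows(lines):
    """Yield (read ID, taxon ID) for every line flagged "C" (classified)."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Skip blank or truncated lines and reads flagged "U" (unclassified).
        if len(fields) >= 3 and fields[0] == "C":
            yield fields[1], fields[2]


def kaiju_to_csv(kaiju_path, csv_path):
    """Write the classified rows as comma-delimited text; return the row count."""
    written = 0
    with open(kaiju_path, newline="") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in classified_rows(src):
            writer.writerow(row)
            written += 1
    return written
```

Calling `kaiju_to_csv("kaiju.out", "result_seqid_taxon.csv")` would then produce the same kind of two-column file as the commands above, assuming read IDs contain no whitespace.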

## Map
If you are unsure of the pH of your soil samples, you may want to use the map script first; the default city is Aberystwyth.

Please **note**: because the Earth is spherical and maps are two-dimensional, there will be some distortion when plotting locations.

`$ acidomap --city Birmingham`

## Usage
The CLI **needs** the Kaiju and FASTA files; all other options have defaults, e.g. pH = 5.

If no plot style is provided, or it is entered incorrectly, a random one is chosen.
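
The fallback described above could look roughly like the following. This is a sketch under assumptions rather than the package's actual code: `choose_style` is a hypothetical helper, and the list of valid styles is passed in explicitly (with matplotlib it would typically come from `matplotlib.pyplot.style.available`).

```python
import random


def choose_style(requested, available):
    """Return `requested` if it is a known style, otherwise a random known style."""
    if requested in available:
        return requested
    # Missing (None) or misspelled style names fall through to a random pick.
    return random.choice(list(available))


styles = ["ggplot", "bmh", "classic"]
print(choose_style("ggplot", styles))  # ggplot
print(choose_style("gplot", styles))   # one of the three styles, chosen at random
```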

Run as follows on **Linux** (find how to [run with other operating systems here](https://en.wikibooks.org/wiki/Python_Programming/Creating_Python_Programs)):

```
$ acidoseq --help
Usage: acidoseq [OPTIONS]

Options:
  --taxdumptype TEXT  Study "ALL" or only unclassified "U"?
  --kaijufile TEXT    Place edited Kaiju (csv) in directory for ease.
  --fastapath TEXT    Place FASTA in directory for ease.
  --style TEXT        ['seaborn-bright', 'seaborn-poster', 'seaborn-white',
                      'bmh', 'seaborn-darkgrid', 'seaborn-pastel',
                      'grayscale', '_classic_test', 'ggplot',
                      'seaborn-whitegrid', 'seaborn-dark', 'seaborn-muted',
                      'seaborn-colorblind', 'seaborn-ticks',
                      'Solarize_Light2', 'seaborn-notebook',
                      'dark_background', 'fast', 'seaborn',
                      'fivethirtyeight', 'seaborn-paper',
                      'seaborn-dark-palette', 'seaborn-talk', 'classic',
                      'seaborn-deep']
  --plottype TEXT     "span" range of GC means OR "line" average mean GC
  --ph TEXT           pH of soil, use map script for assistance.
  --help              Show this message and exit.
```

###### Examples

`$ acidoseq --kaijufile result_seqid_taxon.csv --fastapath all.fa`

`$ acidoseq --taxdumptype ALL --kaijufile result_seqid_taxon.csv --fastapath all.fa --style ggplot --plottype span --ph 4.92`

`$ acidoseq --taxdumptype U --kaijufile result_seqid_taxon.csv --fastapath all.fa --style seaborn --plottype line --ph 7.14`

**Output**
* FASTA file: a collection of reads which were identified as Acidobacteria
* Plot of AT and GC ratio comparison with means
* In-depth plot of GC ratio with subdivisions labelled (regions with 'span' and means with 'line')
* Separate FASTA files of the unclassified reads assigned into subdivisions based on the pH, e.g. a file of sequences which reside in the subdivision 1 GC span if the pH is low
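
The first output, a FASTA file of the reads identified as Acidobacteria, amounts to filtering the input FASTA by read ID. A minimal plain-Python sketch of that idea follows; the package itself lists `pysam` among its dependencies, so this is not its implementation, and `read_fasta` and `select_reads` are hypothetical names.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_reads(lines, wanted_ids):
    """Yield records whose ID (first word of the header) is in `wanted_ids`."""
    for header, sequence in read_fasta(lines):
        parts = header.split()
        read_id = parts[0] if parts else ""
        if read_id in wanted_ids:
            yield header, sequence
```

Here `wanted_ids` would be the set of read IDs whose taxon ID in the Kaiju CSV belongs to Acidobacteria; how those taxon IDs are resolved is left out of the sketch.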

## Acknowledgements
* **Amanda Clare**, senior lecturer, MSc supervisor at Aberystwyth University, [Twitter](https://twitter.com/afcaber) | [GitHub](https://github.com/amandaclare) | [Staff Profile](https://www.aber.ac.uk/en/cs/staff-profiles/listing/profile/afc/)
* **Sam Nicholls**, postdoc at University of Birmingham, [Twitter](https://twitter.com/samstudio8) | [GitHub](https://github.com/SamStudio8)
* **Arwyn Edwards**, senior lecturer at Aberystwyth University, provided the data-set, [Twitter](https://twitter.com/arwynedwards) | [Staff Profile](https://www.aber.ac.uk/en/ibers/staff-profiles/listing/profile/aye/)

## Thank you! :seedling:

Don't hesitate to create an issue or make a suggestion!

###### Todo List
- [x] Make available
- [x] Improve descriptions and comments
- [x] Look into command line interface
- [x] Fix code to output unclassified subdivisions based on pH
- [ ] Alter code so the input file can be the original Kaiju output
- [ ] Make available on Conda

###### References
[1] Quaiser, A., Ochsenreiter, T., Lanz, C., Schuster, S. C., Treusch, A. H., Eck, J., & Schleper, C. (2003). Acidobacteria form a coherent but highly diverse group within the bacterial domain: evidence from environmental genomics. Molecular microbiology, 50(2), 563-575.

[2] Eichorst, S. A., Breznak, J. A., & Schmidt, T. M. (2007). Isolation and characterization of soil bacteria that define Terriglobus gen. nov., in the phylum Acidobacteria. Applied and environmental microbiology, 73(8), 2708-2717.