https://github.com/sap218/acidoseq

A Python package for studying Acidobacteria
bacteria bioinformatics nanopore plotting python python3

Host: GitHub
URL: https://github.com/sap218/acidoseq
Owner: sap218
License: mit
Created: 2018-08-02T16:25:17.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2018-10-24T11:28:54.000Z (over 7 years ago)
Last Synced: 2025-12-21T22:59:48.398Z (6 months ago)
Topics: bacteria, bioinformatics, nanopore, plotting, python, python3
Language: Python
Homepage:
Size: 2.15 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# acidoseq

Studying Acidobacteria reads from a **Nanopore** metagenomic data-set | **Python v3.5** | [PyPI](https://pypi.org/project/acidoseq/) (see version)

Author __Samantha C Pendleton__, Data Science MSc Aberystwyth University, [Twitter](https://twitter.com/sap218) | [GitHub](https://github.com/sap218)

Follow the Twitter bot I created, [acido_bot](https://twitter.com/acido_bot), that dispenses daily facts about Acidobacteria!

The **GC** content of Acidobacteria genomes is consistent with their placement: species in the same subdivision have similar GC content (above 60\% for group V fragments and roughly 10\% lower for group III fragments), while the differences between subdivisions display the diversity within the phylum [1].
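
As a rough illustration of the measure discussed above, GC content can be computed as the share of G and C bases among the unambiguous bases of a read. This is a minimal sketch, not the package's own code; the function name `gc_content` is hypothetical.

```python
def gc_content(sequence):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    # Count only A, C, G and T so that N or gap characters do not skew the ratio.
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    total = gc + at
    if total == 0:
        # Empty or fully ambiguous sequence: report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * gc / total


print(gc_content("ATGCGCGGCC"))  # 8 of 10 bases are G or C, so this prints 80.0
```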

How the abundance of a subdivision correlates with pH depends on the subdivision: subdivisions 1, 2, 3, 12, 13 have a negative relationship as pH increases, whilst 4, 6, 7, 10, 11, 16, 17, 18, 22, 25 are sparse at low pH and have a positive relationship as pH increases [2].
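
The two groups of subdivisions can be written down as a small lookup. This is only a sketch of the relationship described above; the names `DECREASE_WITH_PH`, `INCREASE_WITH_PH` and `ph_trend` are hypothetical and are not taken from the package.

```python
# Subdivisions whose abundance falls as pH rises [2].
DECREASE_WITH_PH = {1, 2, 3, 12, 13}
# Subdivisions that are sparse at low pH and become more abundant as pH rises [2].
INCREASE_WITH_PH = {4, 6, 7, 10, 11, 16, 17, 18, 22, 25}


def ph_trend(subdivision):
    """Describe how a subdivision's abundance changes as pH increases."""
    if subdivision in DECREASE_WITH_PH:
        return "negative"
    if subdivision in INCREASE_WITH_PH:
        return "positive"
    # Subdivisions not listed in either group have no stated trend here.
    return "unknown"


print(ph_trend(1), ph_trend(6), ph_trend(5))
```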

This package takes a collection of reads and gathers the ones assigned as Acidobacteria in a Kaiju output. It reports various statistics and GC plots. Furthermore, the unclassified Acidobacteria reads are grouped into subdivisions based on the pH level of the soil sample and visualised.

## Introduction

[**Kaiju**](http://kaiju.binf.ku.dk) output provides the taxon ID and the corresponding sequence; my package outputs the Acidobacteria species alongside annotation, plots, and information on the unclassified reads.

###### Prerequisite

* FASTA format of all the reads.

* Kaiju output after extracting the two columns: sequence ID and NCBI taxIDs.

###### Dependencies

```
import os
import csv
import pysam
import collections
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import random
from termcolor import colored
from colorama import init
import click
```

`$ pip3 install matplotlib`

## Installation

**GitClone**

`$ git clone https://github.com/sap218/acidoseq.git`

**pip**

`$ pip install acidoseq`

**Kaiju**

I used columns 2 and 3 of the Kaiju output, which contain the sequence references and the NCBI taxon IDs.

1. Filter the output to keep only classified reads: `$ awk '$1 == "C"' kaiju.out > kaijuC.out`

2. Cut the two columns: `$ cut -f2,3 kaijuC.out > results.txt`

3. Convert the txt to csv (comma-delimited): `$ sed 's/\s\+/,/g' results.txt > result_seqid_taxon.csv`
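
If you prefer to stay in Python, the three shell steps above can be sketched as one function. This is an illustrative alternative, not part of the package: the names `classified_pairs` and `kaiju_to_csv` are hypothetical, the sample rows are made up, and the input is assumed to be the default tab-separated Kaiju output (status, sequence ID, taxon ID).

```python
import csv
import io


def classified_pairs(lines):
    """Yield (sequence ID, taxon ID) for every classified row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Column 1 is "C" (classified) or "U" (unclassified).
        if len(fields) >= 3 and fields[0] == "C":
            yield fields[1], fields[2]


def kaiju_to_csv(src, dst):
    """Write the classified rows of src to dst as CSV; return the row count."""
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for pair in classified_pairs(src):
        writer.writerow(pair)
        count += 1
    return count


# Made-up sample rows standing in for kaiju.out.
sample = io.StringIO("C\tread1\t57723\nU\tread2\t0\n")
out = io.StringIO()
kaiju_to_csv(sample, out)
print(out.getvalue(), end="")
```

With real files, pass open file handles instead of the `io.StringIO` objects, for example `open("kaiju.out")` and `open("result_seqid_taxon.csv", "w", newline="")`.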

## Map

If you are unsure of the pH of your soil samples, you may want to use the map script first - default city is Aberystwyth.

Please **note**: due to the fact that the Earth is spherical and maps are 2-dimensional, there will be some distortion when plotting locations.

`$ acidomap --city Birmingham`

## Usage

The CLI **needs** the Kaiju and FASTA files; all other options have defaults, e.g. pH = 5.

If no plot style is provided, or it is entered incorrectly, a random one is chosen.
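
The fallback for the plot style can be pictured with the sketch below. It only illustrates the behaviour described above and is not the package's code; `choose_style` is a hypothetical name and the list of styles is passed in by the caller.

```python
import random


def choose_style(requested, available):
    """Return the requested style if it is valid, otherwise a random one."""
    if requested in available:
        return requested
    # Missing or misspelled style: fall back to a random valid one.
    return random.choice(available)


styles = ["ggplot", "bmh", "classic"]
print(choose_style("ggplot", styles))
print(choose_style("not-a-style", styles))
```

With matplotlib installed, `matplotlib.pyplot.style.available` provides the list of valid style names.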

Run as follows on **Linux** (find how to [run with other operating systems here](https://en.wikibooks.org/wiki/Python_Programming/Creating_Python_Programs)):

```
$ acidoseq --help
Usage: acidoseq [OPTIONS]

Options:
  --taxdumptype TEXT  Study "ALL" or only unclassified "U"?
  --kaijufile TEXT    Place edited Kaiju (csv) in directory for ease.
  --fastapath TEXT    Place FASTA in directory for ease.
  --style TEXT        ['seaborn-bright', 'seaborn-poster', 'seaborn-white',
                      'bmh', 'seaborn-darkgrid', 'seaborn-pastel',
                      'grayscale', '_classic_test', 'ggplot',
                      'seaborn-whitegrid', 'seaborn-dark', 'seaborn-muted',
                      'seaborn-colorblind', 'seaborn-ticks', 'Solarize_Light2',
                      'seaborn-notebook', 'dark_background', 'fast',
                      'seaborn', 'fivethirtyeight', 'seaborn-paper',
                      'seaborn-dark-palette', 'seaborn-talk', 'classic',
                      'seaborn-deep']
  --plottype TEXT     "span" range of GC means OR "line" average mean GC
  --ph TEXT           pH of soil, use map script for assistance.
  --help              Show this message and exit.
```

###### Examples

`$ acidoseq --kaijufile result_seqid_taxon.csv --fastapath all.fa`

`$ acidoseq --taxdumptype ALL --kaijufile result_seqid_taxon.csv --fastapath all.fa --style ggplot --plottype span --ph 4.92`

`$ acidoseq --taxdumptype U --kaijufile result_seqid_taxon.csv --fastapath all.fa --style seaborn --plottype line --ph 7.14`

**Output**

* FASTA file: a collection of reads which were identified as Acidobacteria

* Plot of AT and GC ratio comparison with means

* In-depth plot of GC ratio with subdivisions labelled (regions with 'span' and means with 'line')

* Separate FASTA files of the unclassified reads assigned into subdivisions based on the pH, e.g. a file of sequences which reside in the subdivision 1 GC span if the pH is low
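
The first output, a FASTA file of the reads identified as Acidobacteria, comes down to keeping the FASTA records whose IDs were selected from the Kaiju table. The sketch below shows that idea with a plain-Python parser. It is an illustration under stated assumptions, not the package's implementation (which lists `pysam` among its dependencies); `read_fasta` and `select_reads` are hypothetical names and the sample reads are made up.

```python
def read_fasta(lines):
    """Yield (read ID, sequence) pairs from FASTA-formatted lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # The ID is the first word of the header line.
            header = line[1:].split()
            read_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def select_reads(fasta_lines, wanted_ids):
    """Return the records whose read ID is in wanted_ids."""
    wanted = set(wanted_ids)
    return [(rid, seq) for rid, seq in read_fasta(fasta_lines) if rid in wanted]


# Made-up sample standing in for all.fa.
fasta = [">r1 sample read\n", "ACGT\n", "GG\n", ">r2\n", "TTTT\n"]
for rid, seq in select_reads(fasta, {"r1"}):
    print(">" + rid)
    print(seq)
```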

## Acknowledgements

* **Amanda Clare**, senior lecturer, MSc supervisor at Aberystwyth University, [Twitter](https://twitter.com/afcaber) | [GitHub](https://github.com/amandaclare) | [Staff Profile](https://www.aber.ac.uk/en/cs/staff-profiles/listing/profile/afc/)

* **Sam Nicholls**, postdoc at University of Birmingham, [Twitter](https://twitter.com/samstudio8) | [GitHub](https://github.com/SamStudio8)

* **Arwyn Edwards**, senior lecturer at Aberystwyth University, provided the data-set, [Twitter](https://twitter.com/arwynedwards) | [Staff Profile](https://www.aber.ac.uk/en/ibers/staff-profiles/listing/profile/aye/)

## Thank you! :seedling:

Don't hesitate to create an issue or make a suggestion!

###### Todo List

- [x] Make available

- [x] Improve descriptions and comments

- [x] Look into command line interface

- [x] Fix code to output unclassified subdivisions based on pH

- [ ] Alter code so the input file can be the original Kaiju output

- [ ] Make available on Conda

###### References

[1] Quaiser, A., Ochsenreiter, T., Lanz, C., Schuster, S. C., Treusch, A. H., Eck, J., & Schleper, C. (2003). Acidobacteria form a coherent but highly diverse group within the bacterial domain: evidence from environmental genomics. Molecular microbiology, 50(2), 563-575.

[2] Eichorst, S. A., Breznak, J. A., & Schmidt, T. M. (2007). Isolation and characterization of soil bacteria that define Terriglobus gen. nov., in the phylum Acidobacteria. Applied and environmental microbiology, 73(8), 2708-2717.
