https://github.com/kpj/metagenompy
The NCBI taxonomy as a NetworkX graph with helper functions
- Host: GitHub
- URL: https://github.com/kpj/metagenompy
- Owner: kpj
- License: mit
- Created: 2021-01-04T09:20:49.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-04-12T06:08:48.000Z (about 2 years ago)
- Last Synced: 2025-03-30T10:32:50.724Z (about 2 months ago)
- Language: Python
- Size: 1.29 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# metagenompy
[PyPI](https://pypi.python.org/pypi/metagenompy)
[GitHub Actions](https://github.com/kpj/metagenompy/actions)

Your all-inclusive package for aggregating and visualizing metagenomic BLAST results.
## Installation
Note that the Python dependency `pygraphviz` requires the non-Python `graphviz` library, which must be installed separately. How to install it depends on your system. See the [pygraphviz docs](https://pygraphviz.github.io/documentation/stable/install.html#install) for details.
Here are a few common methods:

```bash
# conda
$ conda install pygraphviz

# Ubuntu and Debian
$ sudo apt-get install graphviz graphviz-dev

# Fedora and Red Hat
$ sudo dnf install graphviz graphviz-devel

# macOS
$ brew install graphviz
```

Afterwards, `metagenompy` can be installed using pip:
```bash
$ pip install metagenompy
```

## Usage
### NCBI taxonomy as NetworkX object
The core of `metagenompy` is the NCBI taxonomy represented as a NetworkX graph.
This means that all your favorite graph algorithms work right out of the box.

```python
import metagenompy
import networkx as nx

# load taxonomy
graph = metagenompy.generate_taxonomy_network(auto_download=True)

# print path from human to pineapple
for node in nx.shortest_path(graph.to_undirected(as_view=True), '9606', '4615'):
    print(node, graph.nodes[node])
## 9606 {'rank': 'species', 'authority': 'Homo sapiens Linnaeus, 1758', 'scientific_name': 'Homo sapiens', 'genbank_common_name': 'human', 'common_name': 'man'}
## 9605 {'rank': 'genus', 'authority': 'Homo Linnaeus, 1758', 'scientific_name': 'Homo', 'common_name': 'humans'}
## [..]
## 4614 {'rank': 'genus', 'authority': 'Ananas Mill., 1754', 'scientific_name': 'Ananas'}
## 4615 {'rank': 'species', 'authority': ['Ananas comosus (L.) Merr., 1917', 'Ananas lucidus Mill., 1754'], 'scientific_name': 'Ananas comosus', 'synonym': ['Ananas comosus var. comosus', 'Ananas lucidus'], 'genbank_common_name': 'pineapple'}
```

### Easy transformation and visualization of taxonomic tree
Extract taxonomic entities of interest and visualize their relations:
```python
import metagenompy
import matplotlib.pyplot as plt

# load and condense taxonomy to relevant ranks
graph = metagenompy.generate_taxonomy_network(auto_download=True)
metagenompy.condense_taxonomy(graph)

# highlight interesting nodes
graph_zoom = metagenompy.highlight_nodes(graph, [
    '9606',  # human
    '9685',  # cat
    '9615',  # dog
    '4615',  # pineapple
    '3747',  # strawberry
    '4113',  # potato
])

# visualize result
fig, ax = plt.subplots(figsize=(10, 10))
metagenompy.plot_network(graph_zoom, ax=ax, labels_kws=dict(font_size=10))
fig.tight_layout()
fig.savefig('taxonomy.pdf')
```
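To get a feel for what a "zoomed-in" taxonomy looks like without downloading the full NCBI data, the idea can be sketched on a hand-made toy tree using only NetworkX. This is not how `metagenompy.highlight_nodes` is implemented. The helper `ancestor_subgraph`, the toy node names, the omission of intermediate ranks, and the parent-to-child edge direction are all assumptions made for illustration.

```python
import networkx as nx

# Hand-made toy taxonomy with most intermediate ranks left out.
# Edges point from parent to child (an assumption of this sketch,
# not a statement about the graph metagenompy builds).
toy = nx.DiGraph()
toy.add_edges_from([
    ("root", "Eukaryota"),
    ("Eukaryota", "Metazoa"),
    ("Eukaryota", "Viridiplantae"),
    ("Metazoa", "Homo sapiens"),
    ("Metazoa", "Felis catus"),
    ("Viridiplantae", "Ananas comosus"),
    ("Viridiplantae", "Solanum tuberosum"),
])


def ancestor_subgraph(graph, nodes_of_interest):
    """Hypothetical helper: keep the chosen nodes plus all their ancestors."""
    keep = set(nodes_of_interest)
    for node in nodes_of_interest:
        keep |= nx.ancestors(graph, node)
    return graph.subgraph(keep).copy()


zoom = ancestor_subgraph(toy, ["Homo sapiens", "Ananas comosus"])
print(sorted(zoom.nodes))
# ['Ananas comosus', 'Eukaryota', 'Homo sapiens', 'Metazoa', 'Viridiplantae', 'root']

# Standard NetworkX algorithms apply directly to the toy tree as well.
lca = nx.lowest_common_ancestor(toy, "Homo sapiens", "Ananas comosus")
print(lca)
# Eukaryota
```

The resulting `zoom` graph is an ordinary `DiGraph`, so it can be passed to any NetworkX drawing or analysis routine.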
### Summary statistics for BLAST results
After blasting your reads against a [sequence database](ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/), generating summary reports using `metagenompy` is a blast.
```python
import metagenompy
import pandas as pd

# read BLAST results file with columns 'qseqid' and 'staxids'
df_blast = metagenompy.load_example_dataset()

df = (df_blast.set_index('qseqid')['staxids']
      .str.split(';')
      .explode()
      .dropna()
      .reset_index()
      .rename(columns={'staxids': 'taxid'})
)

df.head()
##   qseqid    taxid
## 0  read1  1811693
## 1  read2   327160
## 2  read3      821
## 3  read4  1871047
## 4  read5    69360

# classify taxons at multiple ranks
graph = metagenompy.generate_taxonomy_network(auto_download=True)

rank_list = ['species', 'genus', 'class', 'superkingdom']
df = metagenompy.classify_dataframe(
    graph, df,
    rank_list=rank_list
)

# aggregate read matches
agg_rank = 'genus'
df_agg = metagenompy.aggregate_classifications(df, agg_rank)

df_agg.head()
##            taxid                        species           genus                class superkingdom
## qseqid
## read1    1811693  Pelotomaculum sp. PtaB.Bin104   Pelotomaculum           Clostridia     Bacteria
## read10   2488860         Erythrobacter spongiae   Erythrobacter  Alphaproteobacteria     Bacteria
## read100    78398      Pectobacterium odoriferum  Pectobacterium  Gammaproteobacteria     Bacteria
## read101  1843082           Macromonas sp. BK-30      Macromonas   Betaproteobacteria     Bacteria
## read102  2665644      Paracoccus sp. YIM 132242      Paracoccus  Alphaproteobacteria     Bacteria

# visualize outcome
metagenompy.plot_piechart(df_agg)
```
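The reshaping step at the start of this example (splitting `staxids` on `;` and exploding to one taxon ID per row) can be tried in isolation with plain pandas. The sketch below uses an invented three-row frame; the read names and the variables `df_toy` and `df_long` are made up for illustration and are not part of the example dataset.

```python
import pandas as pd

# Invented BLAST-like table: one read has two taxon IDs, one has none.
df_toy = pd.DataFrame({
    "qseqid": ["readA", "readB", "readC"],
    "staxids": ["9606;9605", "4615", None],
})

df_long = (
    df_toy.set_index("qseqid")["staxids"]
    .str.split(";")                        # "9606;9605" -> ["9606", "9605"]
    .explode()                             # one row per (read, taxon ID) pair
    .dropna()                              # drop reads without any taxon ID
    .reset_index()
    .rename(columns={"staxids": "taxid"})
)

print(df_long)

# Number of taxon IDs that remain per read.
hits_per_read = df_long.groupby("qseqid").size()
print(hits_per_read.to_dict())
# {'readA': 2, 'readB': 1}
```

Note that the taxon IDs stay strings after splitting, which matches the string node labels such as `'9606'` used in the graph examples above.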