https://github.com/artoria2e5/gtdb-treebase
Script for converting a GTDB database to a treebase format in opentreeoflife.
- Host: GitHub
- URL: https://github.com/artoria2e5/gtdb-treebase
- Owner: Artoria2e5
- License: cc0-1.0
- Created: 2021-02-05T08:20:11.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-02-05T11:27:14.000Z (over 4 years ago)
- Last Synced: 2025-02-17T15:52:14.805Z (3 months ago)
- Language: Python
- Size: 7.81 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GTDB-Treebase
Script for converting a GTDB database to a treebase format in opentreeoflife.

## What we do now
* Map representative genomes to species
* Map suffixed genus names in species to conventional versions
* Trim off the taxon level prefix – Actually we can do it in the mapper
* Join up bacteria and archaea at a root for single-file upload

## Why some stuff isn't done yet
### More mapping to conventional (NCBI) names
In theory we could use the `auxillary_files/gtdb_vs_ncbi_*.xlsx` to do a deeper species
mapping, but that is quite risky. Mapping on the higher-level taxa feels pointless too.

## Stuff I really should do
* Instead of the taxonomy tsv, fetch the data from GenBank directly. In other words,
erase GTDB taxonomy and just use the hot mess. Will make OTU mapping a lot smoother...
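
For reference, the steps listed under "What we do now" (prefix trimming, genus-suffix mapping, and joining the two trees) might look roughly like the sketch below. This is not the script in this repository. The function names are hypothetical, and it assumes GTDB-style names with a one-letter rank prefix (`g__`, `s__`), an uppercase placeholder suffix on the genus (`_A`), and Newick input trees. Suffixes on species epithets are deliberately left untouched.

```python
import re

# Assumed GTDB conventions (not taken from this repository's code):
# ranks carry a one-letter prefix such as "g__" or "s__", and placeholder
# genera carry an uppercase suffix such as "_A".
RANK_PREFIX = re.compile(r"^[dpcofgs]__")
GENUS_SUFFIX = re.compile(r"^([A-Z][A-Za-z0-9-]*?)_[A-Z]+(?=\s|$)")


def strip_rank_prefix(name: str) -> str:
    """Remove a leading rank prefix, e.g. 'g__Bacillus_A' -> 'Bacillus_A'."""
    return RANK_PREFIX.sub("", name)


def conventional_name(name: str) -> str:
    """Drop the genus suffix, e.g. 's__Bacillus_A subtilis' -> 'Bacillus subtilis'."""
    return GENUS_SUFFIX.sub(r"\1", strip_rank_prefix(name))


def species_of(lineage: str) -> str:
    """Take the last rank of a semicolon-separated lineage as the species name."""
    ranks = lineage.split(";")
    return conventional_name(ranks[-1].strip())


def join_trees(bacteria_newick: str, archaea_newick: str) -> str:
    """Place two Newick trees under one new root (no branch length to the root)."""
    b = bacteria_newick.strip().rstrip(";")
    a = archaea_newick.strip().rstrip(";")
    return f"({b},{a});"
```

Keeping the prefix trimming inside `conventional_name` mirrors the note above that the trimming can happen in the mapper rather than as a separate pass.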