Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shenwei356/gtaxon
gTaxon - a fast cross-platform NCBI taxonomy data querying (gi2taxid, taxid2taxon, name2taxid, LCA) tool, with cmd client and REST API server for both local and remote server.
bioinformatics client golang lca restful server taxonomy
- Host: GitHub
- URL: https://github.com/shenwei356/gtaxon
- Owner: shenwei356
- License: mit
- Created: 2016-02-13T13:00:49.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-07-24T06:30:24.000Z (over 8 years ago)
- Last Synced: 2024-08-03T01:14:09.267Z (5 months ago)
- Topics: bioinformatics, client, golang, lca, restful, server, taxonomy
- Language: Go
- Homepage: http://github.com/shenwei356/gtaxon
- Size: 13.1 MB
- Stars: 15
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-bio-go - gTaxon - platform NCBI taxonomy data querying tool with (Sequence Analysis and Manipulation)
README
# gTaxon
## Introduction
gTaxon - a fast cross-platform NCBI taxonomy data querying tool,
with cmd client and REST API server for both local and remote server.
[http://github.com/shenwei356/gtaxon](http://github.com/shenwei356/gtaxon)

## Supported query types

| Query type | Function | Local/Remote |
|------------------|------------------------------------------|--------------|
| gi_taxid_nucl | query TaxId by Gi (nucl) | Both |
| gi_taxid_prot | query TaxId by Gi (prot) | Both |
| taxid2taxon | query Taxon by TaxId | Remote |
| name2taxid | query TaxId by Name | Remote |
| lca | query Lowest Common Ancestor by TaxIds | Remote |

## Features

- Easy to install. **Only ONE single executable binary file**.
No scary source compilation, no extra packages to install,
and no environment variables to configure.
- **Cross platform**. gTaxon is implemented in [golang](https://golang.org).
Executable binary files for the most popular operating systems (Linux, Mac OS X,
Windows, \*BSD) are available.
See [Release](https://github.com/shenwei356/gtaxon/releases) page.
- Supporting querying from **both LOCAL and REMOTE server** by REST API,
which is also easily called by various clients of other languages.
gTaxon has command-line client `gtaxon cli local` for local query and
`gtaxon cli remote` for remote query.
- **Fast**. See the Performance section.

## Performance

### gi2taxid
[Detail](https://github.com/shenwei356/gtaxon/blob/master/testdata/PERFORMANCE.md)
Notes:

1. The bolt database utilizes the operating system's page cache,
   so repeated queries are faster than the first query.
2. The "remote query" was actually run against the local host,
   so network latency is minimal.

| dataset | local query | remote query | remote query (repeated) |
|----------------|-----------------|------------------|-------------------------|
| small (0.25K) | 0.013 s | 0.013 s | 0.009 s |
| medium (25K) | 0.38 s | 0.57 s | 0.178 s |
| large (2.5M) | 17 s | 1 min 38 s | 20 s |

## Download && Install

Steps:
1. Download and uncompress the executable binary file for your operating system from the [Release](https://github.com/shenwei356/gtaxon/releases) page.
2. Rename it to `gtaxon.exe` (Windows) or `gtaxon` (other operating systems) for convenience, then run it from the command line. No compilation and no dependencies are needed.

You can also add the directory of the executable file to the environment variable `PATH`, so that you can run `gtaxon` from anywhere.

1. For Windows, the simplest way is to copy it to `C:\WINDOWS\system32`.
2. For Linux, simply copy it to `/usr/local/bin`, or add the directory containing gtaxon to the environment variable `PATH`:

        chmod a+x /PATH/OF/GTAXON/gtaxon
        echo export PATH=\$PATH:/PATH/OF/GTAXON >> ~/.bashrc

## Usage

### Loading data to database
1. Initializing the database.

        gtaxon db init

2. Importing data.

    Supported file types include:

    | Data type     | File                 |
    |---------------|----------------------|
    | gi_taxid_nucl | gi_taxid_nucl.dmp.gz |
    | gi_taxid_prot | gi_taxid_prot.dmp.gz |
    | nodes         | nodes.dmp            |
    | names         | names.dmp            |
    | divisions     | division.dmp         |
    | gencodes      | gencode.dmp          |

    For gi2taxid:

        # ~ 16 min for me
        gtaxon db import -f -t gi_taxid_prot gi_taxid_prot.dmp.gz

    For taxon query:

        gtaxon db import -f -t nodes nodes.dmp
        gtaxon db import -f -t names names.dmp
        gtaxon db import -f -t divisions division.dmp
        gtaxon db import -f -t gencodes gencode.dmp

### Querying from local (Only for gi2taxid)

- A few queries:

        gtaxon cli local -t gi_taxid_prot 139299181 139299182

- From a file:

        gtaxon cli local -t gi_taxid_prot -f gi_list_file

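If you want to run the local query from a Go program instead of a shell, one option is to wrap the command line shown above with `os/exec`. The following is only a minimal sketch: the helper name `localQueryArgs` is hypothetical, it assumes `gtaxon` is on your `PATH` and that the database has already been loaded, and it prints whatever the tool writes without trying to parse it, since the output format is not documented here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// localQueryArgs builds the argument list for
// `gtaxon cli local -t <db> <gi> <gi> ...` as shown in the README.
// The function name is made up for this sketch.
func localQueryArgs(db string, gis ...int64) []string {
	args := []string{"cli", "local", "-t", db}
	for _, gi := range gis {
		args = append(args, strconv.FormatInt(gi, 10))
	}
	return args
}

func main() {
	args := localQueryArgs("gi_taxid_prot", 139299181, 139299182)

	// Assumes the gtaxon binary is installed and reachable via PATH.
	out, err := exec.Command("gtaxon", args...).CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "running gtaxon failed:", err)
		return
	}
	// Print the raw output; no parsing is attempted.
	fmt.Print(string(out))
}
```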
### Querying from remote server
1. Starting the server.

        gtaxon server

2. Query TaxId by Gi (gi_taxid_nucl or gi_taxid_prot).

    - A few queries:

            gtaxon cli remote -t gi_taxid_prot 139299181 139299182

    - From a file:

            gtaxon cli remote -H 192.168.1.101 -P 8080 -t gi_taxid_prot -f gi_list_file

3. Query TaxId by Name (name2taxid).

    Limiting the name class and using a regular expression:

        gtaxon cli remote -t name2taxid --use-regexp --name-class "scientific name" sapiens
        [INFO] Query TaxId by Name from host: 127.0.0.1:8080
        sapiens 9606(Homo sapiens),1035824(Trichuris sp. ex Homo sapiens JP-2011),1573476(Homo sapiens/Rattus norvegicus xenograft),324570(Phrynium sapiense),63221(Homo sapiens neanderthalensis),1383439(Homo sapiens/Mus musculus xenograft),741158(Homo sapiens ssp. Denisova),399796(Macrobiotus sapiens),349050(Ficus casapiensis),1131344(Homo sapiens x Mus musculus hybrid cell line),270523(Tetragonula sapiens)

        gtaxon cli remote -t name2taxid --use-regexp --name-class "genbank common name" human mouse
        [INFO] Query TaxId by Name from host: 127.0.0.1:8080
        human 121226(Pediculus humanus capitis),121225(Pediculus humanus),51028(Enterobius vermicularis),121224(Pediculus humanus corporis),433352(Diplogonoporus grandis),36087(Trichuris trichiura),115427(Dermatobia hominis),9606(Homo sapiens)
        mouse 42410(Peromyscus eremicus),1595964(Apomys sacobianus),10105(Mus minutoides),221913(Pseudomys hermannsburgensis),240587(Thalpomys cerradensis),409025(Peromyscus melanocarpus) ...

4. Query Taxon by TaxId (taxid2taxon).

        gtaxon cli remote -t taxid2taxon 9
        # the result is similar to the result of example 5

5. Query Lowest Common Ancestor by TaxIds (lca).

        gtaxon cli remote -t lca 9606,63221
        [INFO] Query LCA by TaxIds from host: 127.0.0.1:8080
        Query TaxIDs: 9606,63221
        Taxon: {
          "TaxId": 9606,
          "ScientificName": "Homo sapiens",
          "OtherNames": [
            {
              "ClassCDE": "authority",
              "DispName": "Homo sapiens Linnaeus, 1758"
            },
            {
              "ClassCDE": "genbank common name",
              "DispName": "human"
            },
            {
              "ClassCDE": "common name",
              "DispName": "man"
            }
          ],
          "ParentTaxId": 9605,
          "Rank": "species",
          "Division": "Primates",
          "GeneticCode": {
            "GCId": 1,
            "GCName": "Standard"
          },
          "MitoGeneticCode": {
            "MGCId": 2,
            "MGCName": "Vertebrate Mitochondrial"
          },
          "Lineage": "cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Boreoeutheria; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo",
          "LineageEx": [
            {
              "TaxId": 131567,
              "ScientificName": "cellular organisms",
              "Rank": "no rank"
            },
            {
              "TaxId": 2759,
              "ScientificName": "Eukaryota",
              "Rank": "superkingdom"
            },
            {
              "TaxId": 33154,
              "ScientificName": "Opisthokonta",
              "Rank": "no rank"
            },
            {
              "TaxId": 33208,
              "ScientificName": "Metazoa",
              "Rank": "kingdom"
            },
            {
              "TaxId": 6072,
              "ScientificName": "Eumetazoa",
              "Rank": "no rank"
            },
            {
              "TaxId": 33213,
              "ScientificName": "Bilateria",
              "Rank": "no rank"
            },
            {
              "TaxId": 33511,
              "ScientificName": "Deuterostomia",
              "Rank": "no rank"
            },
            {
              "TaxId": 7711,
              "ScientificName": "Chordata",
              "Rank": "phylum"
            },
            {
              "TaxId": 89593,
              "ScientificName": "Craniata",
              "Rank": "subphylum"
            },
            {
              "TaxId": 7742,
              "ScientificName": "Vertebrata",
              "Rank": "no rank"
            },
            {
              "TaxId": 7776,
              "ScientificName": "Gnathostomata",
              "Rank": "no rank"
            },
            {
              "TaxId": 117570,
              "ScientificName": "Teleostomi",
              "Rank": "no rank"
            },
            {
              "TaxId": 117571,
              "ScientificName": "Euteleostomi",
              "Rank": "no rank"
            },
            {
              "TaxId": 8287,
              "ScientificName": "Sarcopterygii",
              "Rank": "no rank"
            },
            {
              "TaxId": 1338369,
              "ScientificName": "Dipnotetrapodomorpha",
              "Rank": "no rank"
            },
            {
              "TaxId": 32523,
              "ScientificName": "Tetrapoda",
              "Rank": "no rank"
            },
            {
              "TaxId": 32524,
              "ScientificName": "Amniota",
              "Rank": "no rank"
            },
            {
              "TaxId": 40674,
              "ScientificName": "Mammalia",
              "Rank": "class"
            },
            {
              "TaxId": 32525,
              "ScientificName": "Theria",
              "Rank": "no rank"
            },
            {
              "TaxId": 9347,
              "ScientificName": "Eutheria",
              "Rank": "no rank"
            },
            {
              "TaxId": 1437010,
              "ScientificName": "Boreoeutheria",
              "Rank": "no rank"
            },
            {
              "TaxId": 314146,
              "ScientificName": "Euarchontoglires",
              "Rank": "superorder"
            },
            {
              "TaxId": 9443,
              "ScientificName": "Primates",
              "Rank": "order"
            },
            {
              "TaxId": 376913,
              "ScientificName": "Haplorrhini",
              "Rank": "suborder"
            },
            {
              "TaxId": 314293,
              "ScientificName": "Simiiformes",
              "Rank": "infraorder"
            },
            {
              "TaxId": 9526,
              "ScientificName": "Catarrhini",
              "Rank": "parvorder"
            },
            {
              "TaxId": 314295,
              "ScientificName": "Hominoidea",
              "Rank": "superfamily"
            },
            {
              "TaxId": 9604,
              "ScientificName": "Hominidae",
              "Rank": "family"
            },
            {
              "TaxId": 207598,
              "ScientificName": "Homininae",
              "Rank": "subfamily"
            },
            {
              "TaxId": 9605,
              "ScientificName": "Homo",
              "Rank": "genus"
            }
          ]
        }

## Configuration file for Convenience

The default config file is `$HOME/.gtaxon.yaml`.

This is useful when querying a remote server:
you can type less by saving flags such as host and port in the config file.

See https://github.com/ogier/pflag

## REST APIs
1. gi2taxid

        http://127.0.0.1:8080/gi2taxid?db=gi_taxid_prot&gi=139299191111&gi=139299181&gi=139299175

2. name2taxid

        http://localhost:8080/name2taxid?regexp=true&class=genbank+common+name&name=human&name=mouse

3. taxid2taxon

        http://localhost:8080/taxid2taxon?taxid=9906&taxid=2

4. lca

        http://localhost:8080/lca?taxids=9606,63221&taxids=1,2

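To make the `lca` query more concrete, here is a rough sketch of what a lowest common ancestor lookup over a child-to-parent table can look like. It is not gTaxon's actual implementation. The tree is a hand-made toy that reuses a few TaxIds from this README and collapses the intermediate ranks between Eukaryota and Homo; placing 63221 directly under 9606 is an assumption that matches the `lca 9606,63221` result shown earlier. The names `toyTree`, `pathToRoot` and `lca` are made up for this example.

```go
package main

import "fmt"

// toyTree returns a child -> parent map for a tiny, hand-made taxonomy.
// Intermediate ranks are collapsed; this is illustrative data only.
func toyTree() map[int]int {
	return map[int]int{
		1:      1,      // root (its own parent)
		131567: 1,      // cellular organisms
		2:      131567, // Bacteria
		2759:   131567, // Eukaryota
		9605:   2759,   // Homo (ranks in between omitted)
		9606:   9605,   // Homo sapiens
		63221:  9606,   // Homo sapiens neanderthalensis
	}
}

// pathToRoot lists taxid followed by all of its ancestors, ending at the root.
func pathToRoot(parent map[int]int, taxid int) ([]int, error) {
	var path []int
	for {
		p, ok := parent[taxid]
		if !ok {
			return nil, fmt.Errorf("unknown TaxId %d", taxid)
		}
		path = append(path, taxid)
		if p == taxid { // reached the root
			return path, nil
		}
		if len(path) > len(parent) { // longer than the tree: must be a cycle
			return nil, fmt.Errorf("cycle detected near TaxId %d", taxid)
		}
		taxid = p
	}
}

// lca returns the lowest node that lies on the lineage of every query TaxId.
func lca(parent map[int]int, taxids ...int) (int, error) {
	if len(taxids) == 0 {
		return 0, fmt.Errorf("no TaxIds given")
	}
	// Count how many of the query lineages pass through each node.
	hits := make(map[int]int)
	for _, id := range taxids {
		path, err := pathToRoot(parent, id)
		if err != nil {
			return 0, err
		}
		for _, node := range path {
			hits[node]++
		}
	}
	// Walk upward from the first query; the first node shared by all is the LCA.
	path, _ := pathToRoot(parent, taxids[0])
	for _, node := range path {
		if hits[node] == len(taxids) {
			return node, nil
		}
	}
	return 0, fmt.Errorf("no common ancestor")
}

func main() {
	tree := toyTree()
	for _, q := range [][]int{{9606, 63221}, {9606, 2}, {1, 2}} {
		id, err := lca(tree, q...)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(q, "->", id)
	}
	// Prints:
	// [9606 63221] -> 9606
	// [9606 2] -> 131567
	// [1 2] -> 1
}
```

Because a TaxId counts as its own ancestor here, querying a species together with one of its subspecies returns the species itself, which is the behaviour visible in the CLI example.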
You can also write a client in your favorite programming language.
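As an illustration, a small Go client could build the query URLs with `net/url` and fetch them with `net/http`. This is a sketch under a few assumptions: the server is running at `127.0.0.1:8080` as in the examples above, the helper names (`gi2taxidURL`, `name2taxidURL`, `fetch`) are hypothetical, and the response body is printed as-is because its exact format is not described here. Note that `url.Values.Encode` sorts parameters by key, so the parameter order differs from the URLs listed above; this sketch assumes the server does not depend on that order.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

// gi2taxidURL builds a gi2taxid query URL with one "gi" parameter per GI.
func gi2taxidURL(base, db string, gis ...int64) string {
	q := url.Values{}
	q.Set("db", db)
	for _, gi := range gis {
		q.Add("gi", strconv.FormatInt(gi, 10))
	}
	return base + "/gi2taxid?" + q.Encode()
}

// name2taxidURL builds a name2taxid query URL with one "name" parameter per name.
func name2taxidURL(base, class string, useRegexp bool, names ...string) string {
	q := url.Values{}
	q.Set("regexp", strconv.FormatBool(useRegexp))
	q.Set("class", class)
	for _, n := range names {
		q.Add("name", n)
	}
	return base + "/name2taxid?" + q.Encode()
}

// fetch performs a GET request and returns the response body as a string.
func fetch(client *http.Client, u string) (string, error) {
	resp, err := client.Get(u)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("%s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	base := "http://127.0.0.1:8080" // assumed address of a running `gtaxon server`
	client := &http.Client{Timeout: 10 * time.Second}

	urls := []string{
		gi2taxidURL(base, "gi_taxid_prot", 139299181, 139299175),
		name2taxidURL(base, "genbank common name", true, "human", "mouse"),
	}
	for _, u := range urls {
		fmt.Println("GET", u)
		body, err := fetch(client, u)
		if err != nil {
			fmt.Fprintln(os.Stderr, "request failed:", err)
			continue
		}
		fmt.Println(body)
	}
}
```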
## Implementation details
API reference: [godoc](https://godoc.org/github.com/shenwei356/gtaxon/taxon)
- Programming language: [Go](https://golang.org)
- Database: [bolt](https://github.com/boltdb/bolt), an embedded key/value database for Go
- Web server: [gin](https://github.com/gin-gonic/gin), a fast HTTP web framework written in Go

## Caveats

- A 64-bit operating system is recommended.
- The `bolt` database utilizes the operating system's page cache, so a larger virtual memory is better.
- The database file size is 16 GB after loading gi_taxid_prot.dmp.gz.
- RAM usage is about 1.5 GB after starting the server.