https://github.com/tamerh/biobtree
A bioinformatics tool to search, map and retrieve identifiers, keywords and attributes
- Host: GitHub
- URL: https://github.com/tamerh/biobtree
- Owner: tamerh
- License: bsd-3-clause
- Created: 2019-01-14T11:04:23.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-02-25T07:11:30.000Z (about 3 years ago)
- Last Synced: 2024-06-19T04:22:56.891Z (almost 2 years ago)
- Topics: bioinformatics, genome, genomics, identifiers, mapping
- Language: Go
- Homepage:
- Size: 7.11 MB
- Stars: 16
- Watchers: 1
- Forks: 3
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE.lmdbgo.md
Awesome Lists containing this project
- awesome-medical-ai-skills - BioBTree v2: Mature biomedical graph database spanning 50+ primary data sources with native MCP support. Strong for identifier mapping and cross-database traversal across genes, proteins, compounds, diseases, pathways, and clinical data. (Biomedical Research & Genomics)
# Biobtree
Biobtree is a bioinformatics tool that maps bioinformatics datasets to one another
via identifiers and special keywords, with simple or advanced chain query capability.
## Features
* **Datasets** - supports a wide range of datasets such as `Ensembl` `Uniprot` `ChEMBL` `HMDB` `Taxonomy` `GO` `EFO` `HGNC` `ECO` `Uniparc` `Uniref`, plus tens more via cross references,
by retrieving the latest data from providers
* **MapReduce** - processes small or large datasets based on the user's selection and builds a uniform, B+ tree based local database via a specialized MapReduce based technique with efficient storage usage
* **Query** - allows simple or advanced chain queries between datasets with an intuitive syntax for writing RDF or graph like queries
* **Genome** - supports querying full Ensembl genome coordinates for `transcript`, `CDS`, `exon` and `utr` with several attributes, mapped datasets and identifiers such as `ortholog`, `paralog` or probe identifiers belonging to `Affymetrix` or `Illumina`
* **Protein** - Uniprot proteins including protein features with variations and mapped datasets.
* **Chemistry** - `ChEMBL` and `HMDB` datasets are supported for chemistry, disease and drug related analysis
* **Taxonomy & Ontologies** - `Taxonomy` `GO` `EFO` `ECO` data with mapping to other datasets and child and parent query capability
* **Your data** - Your custom data can be integrated with or without relation to other datasets
* **Web UI** - Web interface for easy explorations and examples
* **Web Services** - REST or gRPC services
* **R & Python** - [Bioconductor R](https://github.com/tamerh/biobtreeR) and [Python](https://github.com/tamerh/biobtreePy) wrapper packages for easier use from existing pipelines, with built-in databases
### Usage
First install the [latest](https://github.com/tamerh/biobtree/releases/latest) biobtree executable, available for Windows, Mac or Linux. Then extract the downloaded file to a new folder, open a terminal in that folder and start biobtree. Alternatively, the R and Python based [biobtreeR](https://github.com/tamerh/biobtreeR) and [biobtreePy](https://github.com/tamerh/biobtreePy) wrapper packages can be used instead of the executable for easier integration.
#### Starting biobtree with target datasets or genomes
```sh
# build ensembl genomes by tax id with uniprot&taxonomy datasets
biobtree --tax 595,984254 -d "uniprot,taxonomy" build
# build datasets only
biobtree -d "uniprot,taxonomy,hgnc" build
biobtree -d "hgnc,chembl,hmdb" build
# once data is built start web for using ws and ui
biobtree web
# to see all options and datasets use help
biobtree help
```
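When biobtree is driven from a script, the build commands above can be composed programmatically. The following is a minimal Python sketch that only assembles the argument list from the flags shown above (`--tax`, `-d`, `build`); the helper name `biobtree_build_args` is hypothetical and not part of biobtree, and the sketch assumes the `biobtree` executable is on the `PATH`.

```python
import shlex


def biobtree_build_args(datasets=(), tax_ids=(), executable="biobtree"):
    """Compose the argument list for a `biobtree ... build` call.

    Hypothetical helper: it only mirrors the flags shown in the
    README commands and does not run anything itself.
    """
    args = [executable]
    if tax_ids:
        # taxonomy ids are passed as one comma-separated value
        args += ["--tax", ",".join(str(t) for t in tax_ids)]
    if datasets:
        # dataset names are passed as one comma-separated value
        args += ["-d", ",".join(datasets)]
    args.append("build")
    return args


args = biobtree_build_args(datasets=["uniprot", "taxonomy"], tax_ids=[595, 984254])
print(shlex.join(args))
# prints: biobtree --tax 595,984254 -d uniprot,taxonomy build
```

The returned list can be handed to `subprocess.run(args, check=True)` to start the build; passing a list rather than a single string avoids shell quoting issues.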
#### Starting biobtree with built-in databases
```sh
# 4 built-in databases are provided with commonly studied datasets and organism genomes to speed up the database build process
# See the following function documentation for the content of each database
# https://github.com/tamerh/biobtreeR/blob/master/R/buildData.R
biobtree --pre-built 1 install
biobtree web
```
Built-in databases are updated regularly, at least for each Ensembl release, and all built-in database files along with configuration files are hosted in a separate GitHub [repository](https://github.com/tamerh/biobtree-conf).
### Web service endpoints
```ruby
# Meta
# dataset meta information
localhost:8888/ws/meta
# Search
# i is the only mandatory parameter
localhost:8888/ws/?i={terms}&s={dataset}&p={page}&f={filter}
# Mapping
# i and m are mandatory parameters
localhost:8888/ws/map/?i={terms}&m={mapfilter_query}&s={dataset}&p={page}
# Retrieve dataset entry. Both parameters are mandatory
localhost:8888/ws/entry/?i={identifier}&s={dataset}
# Retrieve entry with filtered mapping entries. Only page parameter is optional
localhost:8888/ws/filter/?i={identifier}&s={dataset}&f={filter_datasets}&p={page}
# Retrieve entry results with page index. All the parameters are mandatory
localhost:8888/ws/page/?i={identifier}&s={dataset}&p={page}&t={total}
```
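The mandatory and optional parameters listed above can be captured in a small client-side helper. The following is a minimal Python sketch, not part of biobtree: the names `ENDPOINTS` and `ws_url` are hypothetical, the `http://` scheme is an assumption, and the search term `tpi1` and identifier `P04637` are example values only. It builds request URLs and rejects calls that miss a mandatory parameter.

```python
from urllib.parse import urlencode

# Assumption: the web service runs locally on port 8888 over plain http.
BASE_URL = "http://localhost:8888/ws"

# name -> (path, mandatory parameters, optional parameters),
# transcribed from the endpoint listing above
ENDPOINTS = {
    "search": ("/", ("i",), ("s", "p", "f")),
    "map": ("/map/", ("i", "m"), ("s", "p")),
    "entry": ("/entry/", ("i", "s"), ()),
    "filter": ("/filter/", ("i", "s", "f"), ("p",)),
    "page": ("/page/", ("i", "s", "p", "t"), ()),
}


def ws_url(endpoint, **params):
    """Build a web service URL after checking the parameter names."""
    path, required, optional = ENDPOINTS[endpoint]
    missing = [k for k in required if k not in params]
    if missing:
        raise ValueError(f"{endpoint}: missing mandatory parameter(s) {missing}")
    unknown = [k for k in params if k not in required + optional]
    if unknown:
        raise ValueError(f"{endpoint}: unknown parameter(s) {unknown}")
    # urlencode percent-encodes the values, so terms may contain commas or spaces
    return f"{BASE_URL}{path}?{urlencode(params)}"


print(ws_url("search", i="tpi1"))
# prints: http://localhost:8888/ws/?i=tpi1
print(ws_url("entry", i="P04637", s="uniprot"))
# prints: http://localhost:8888/ws/entry/?i=P04637&s=uniprot
```

Keeping the parameter rules in one table means a change to an endpoint only needs a single edit, and the resulting URL can be passed to any HTTP client.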
### Publication
https://f1000research.com/articles/8-145
### Building source
biobtree is written in Go for the data processing part and Vue.js for the web application part. To build the biobtree executable, install go>=1.13 and run
```sh
go build
```
To build and serve the web application for development, run the following in the web directory
```sh
npm install
npm run serve
```
To build the web package run
```sh
npm run build
```