https://github.com/tamerh/biobtree

A bioinformatics tool to search, map and retrieve identifiers, keywords and attributes

bioinformatics genome genomics identifiers mapping


Host: GitHub
URL: https://github.com/tamerh/biobtree
Owner: tamerh
License: bsd-3-clause
Created: 2019-01-14T11:04:23.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2023-02-25T07:11:30.000Z (about 3 years ago)
Last Synced: 2024-06-19T04:22:56.891Z (almost 2 years ago)
Topics: bioinformatics, genome, genomics, identifiers, mapping
Language: Go
Homepage:
Size: 7.11 MB
Stars: 16
Watchers: 1
Forks: 3
Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE.lmdbgo.md

Awesome Lists containing this project

awesome-medical-ai-skills - BioBTree v2 | Mature biomedical graph database spanning 50+ primary data sources with native MCP support. Strong for identifier mapping and cross-database traversal across genes, proteins, compounds, diseases, pathways, and clinical data. | (Biomedical Research & Genomics)

README

# Biobtree

Biobtree is a bioinformatics tool for mapping bioinformatics datasets to one another
via identifiers and special keywords, with support for simple or advanced chain queries.

## Features

* **Datasets** - supports a wide range of datasets such as `Ensembl`, `Uniprot`, `ChEMBL`, `HMDB`, `Taxonomy`, `GO`, `EFO`, `HGNC`, `ECO`, `Uniparc` and `Uniref`, with tens more available via cross references,
by retrieving the latest data from providers

* **MapReduce** - processes small or large datasets based on the user's selection and builds a uniform, B+ tree based local database using a specialized MapReduce based technique with efficient storage usage

* **Query** - allows simple or advanced chain queries between datasets with an intuitive syntax for writing RDF or graph like queries

* **Genome** - supports querying full Ensembl genome coordinates for `transcript`, `CDS`, `exon` and `utr` with several attributes, mapped datasets and identifiers such as `ortholog`, `paralog` or probe identifiers belonging to `Affymetrix` or `Illumina`

* **Protein** - Uniprot proteins including protein features with variations and mapped datasets.

* **Chemistry** - `ChEMBL` and `HMDB` datasets are supported for chemistry, disease and drug related analysis

* **Taxonomy & Ontologies** - `Taxonomy`, `GO`, `EFO` and `ECO` data with mapping to other datasets and child and parent query capability

* **Your data** - Your custom data can be integrated with or without relation to other datasets

* **Web UI** - Web interface for easy explorations and examples

* **Web Services** - REST or gRPC services

* **R & Python** - [Bioconductor R](https://github.com/tamerh/biobtreeR) and [Python](https://github.com/tamerh/biobtreePy) wrapper packages, with built-in databases, for easier use from existing pipelines

### Usage

First install the [latest](https://github.com/tamerh/biobtree/releases/latest) biobtree executable, available for Windows, Mac or Linux. Then extract the downloaded file to a new folder, open a terminal in that folder and start biobtree. Alternatively, the R and Python based [biobtreeR](https://github.com/tamerh/biobtreeR) and [biobtreePy](https://github.com/tamerh/biobtreePy) wrapper packages can be used instead of the executable for easier integration.

#### Starting biobtree with target datasets or genomes
```sh

# build ensembl genomes by tax id with uniprot&taxonomy datasets
biobtree --tax 595,984254 -d "uniprot,taxonomy" build

# build datasets only
biobtree -d "uniprot,taxonomy,hgnc" build
biobtree -d "hgnc,chembl,hmdb" build

# once data is built start web for using ws and ui
biobtree web

# to see all options and datasets use help
biobtree help

```
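
When the executable is driven from a script rather than typed by hand, the build commands above can be assembled programmatically. The following is a minimal Python sketch, not part of biobtree itself: the helper name `biobtree_args` is hypothetical, and it only uses the `--tax` and `-d` flags and the `build` and `web` commands shown above. Actually running the result assumes the `biobtree` executable is on the `PATH`.

```python
import shlex


def biobtree_args(command, datasets=None, tax_ids=None):
    """Return an argv list for one of the biobtree invocations shown above.

    Hypothetical helper: it only joins the values with commas and places
    them behind the --tax and -d flags used in this README.
    """
    args = ["biobtree"]
    if tax_ids:
        args += ["--tax", ",".join(str(t) for t in tax_ids)]
    if datasets:
        args += ["-d", ",".join(datasets)]
    args.append(command)
    return args


build = biobtree_args("build", datasets=["uniprot", "taxonomy"], tax_ids=[595, 984254])
print(shlex.join(build))  # shell-ready form of the command

# To execute it (assumes biobtree is installed and on the PATH):
#   import subprocess
#   subprocess.run(build, check=True)
```

Passing the arguments as a list avoids shell quoting issues when dataset names or taxonomy ids come from user input.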

#### Starting biobtree with built-in databases

```sh
# 4 built-in databases are provided with commonly studied datasets and organism genomes to speed up the database build process
# Check the following function doc for the content of each database
# https://github.com/tamerh/biobtreeR/blob/master/R/buildData.R

biobtree --pre-built 1 install
biobtree web
```
The built-in databases are updated regularly, at least for each Ensembl release. All built-in database files, along with configuration files, are hosted in a separate GitHub [repository](https://github.com/tamerh/biobtree-conf).

### Web service endpoints
```ruby
# Meta
# dataset meta information
localhost:8888/ws/meta

# Search
# i is the only mandatory parameter
localhost:8888/ws/?i={terms}&s={dataset}&p={page}&f={filter}

# Mapping
# i and m are mandatory parameters
localhost:8888/ws/map/?i={terms}&m={mapfilter_query}&s={dataset}&p={page}

# Retrieve dataset entry. Both parameters are mandatory
localhost:8888/ws/entry/?i={identifier}&s={dataset}

# Retrieve entry with filtered mapping entries. Only page parameter is optional
localhost:8888/ws/filter/?i={identifier}&s={dataset}&f={filter_datasets}&p={page}

# Retrieve entry results with page index. All the parameters are mandatory
localhost:8888/ws/page/?i={identifier}&s={dataset}&p={page}&t={total}

```
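
The endpoints above differ mainly in which query parameters are mandatory, so a small client-side check can catch mistakes before a request is sent. The sketch below is an illustration under stated assumptions, not an official client: the function names are hypothetical, the `http://` scheme is assumed (the endpoints are listed without one), and the mapping query is passed through as an opaque string because its syntax is not described here. The response format is likewise not covered in this section, so `fetch` returns the raw body.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the web service listens on the address shown above, over plain HTTP.
BASE = "http://localhost:8888/ws"


def build_url(path, required, optional=None):
    """Build an endpoint URL, rejecting empty mandatory parameters."""
    missing = [k for k, v in required.items() if v in (None, "")]
    if missing:
        raise ValueError(f"missing mandatory parameter(s): {', '.join(missing)}")
    params = dict(required)
    # Optional parameters are only sent when a value was given.
    params.update({k: v for k, v in (optional or {}).items() if v is not None})
    return f"{BASE}/{path}?{urlencode(params)}"


def search_url(terms, dataset=None, page=None, filter_=None):
    # Search: i is the only mandatory parameter.
    return build_url("", {"i": terms}, {"s": dataset, "p": page, "f": filter_})


def map_url(terms, mapfilter_query, dataset=None, page=None):
    # Mapping: i and m are mandatory.
    return build_url("map/", {"i": terms, "m": mapfilter_query}, {"s": dataset, "p": page})


def entry_url(identifier, dataset):
    # Entry retrieval: both parameters are mandatory.
    return build_url("entry/", {"i": identifier, "s": dataset})


def fetch(url):
    """Return the raw response body; requires a running `biobtree web`."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

For example, `search_url("tpi1", dataset="hgnc")` produces a URL that can be passed to `fetch` once `biobtree web` is running; the term and dataset here are only illustrative values.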

### Publication
https://f1000research.com/articles/8-145

### Building source

biobtree is written in Go for the data processing part and Vue.js for the web application part. To build the biobtree executable, install go>=1.13 and run

```sh
go build
```

To build the web application for development, run the following in the web directory

```sh
npm install
npm run serve
```

To build the web package run

```sh
npm run build
```
