
https://github.com/itdaniher/friendly_chebi


### Friendly-ChEBI

#### Introduction

"Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds."

`friendly_ChEBI` is a project to make this archive of organic compounds trivially usable for both interactive lookups and programmatic (API) access with the
intent of facilitating new applications and reference resources.

The main purpose of this project is to serve as a chemical reference, allowing the end user to quickly check molecular mass and chemical structure from a
command line environment. The [whoosh](https://whoosh.readthedocs.io) project is used for ChEBI indexing and search. The
[docopt-ng](https://github.com/bazaar-projects/docopt-ng) project is used for the command line interface and documentation.

#### Usage:

To get you started fast, a recent copy of ChEBI is contained in this repo.
To use it, you'll need to install dependencies and build an index.

##### Setup:

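``` bash
sudo apt install python3-pip
sudo pip3 install pipenv
# Install the project and its dependencies into a pipenv-managed virtualenv
pipenv install .
# Build the whoosh index from the bundled copy of ChEBI
pipenv run build_index
# Search the index (see "Searching" below for the options)
pipenv run search [options] KEYWORDS...
```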

##### Searching:

For convenience, `search.py` can be invoked as `pipenv run search`.

```
Usage:
  search.py --field_help
  search.py [options] (KEYWORDS...)

Options:
  --search FIELD     Column to search. [default: names]
  --max_count COUNT  Maximum number of results to return. [default: 10]
  --sort_by FIELD    Sort results by this field name. [default: mass]
  --postfix_glob G   Use provided keyword as start of query. Appends G. [default: *]
  --prefix_glob G    Preface keyword with G for glob searching.
  --no_glob          Disables globbing.
  --index PATH       Specify path to the index made by ChEBI_Indexer.py. [default: index]
  --field_help       Returns available fieldnames for searching and sorting.
```

#### Examples

```
$ pipenv run search methylene blue

...H-phenothiazine in which the ring hydrogens at positions 3 and 7 have been replaced by dimethylamino groups.', 'formula': 'C16H19N3S', 'mass': 285.409, 'names': 'leucomethylene blue; N(3),N(3),N(7),N(7)-tetramethyl-10H-phenothiazine-3,7-diamine; Panatone; Reduced methylene blue'}>

```

```
$ pipenv run search --prefix_glob '*' --max_count 5 tryptamine

...
```

#### An aside on SDF

SDF is a somewhat terrible format: it's a pseudo-hierarchical key-value mapping with objects separated by the "$$$$" string. SDF was originally designed to distribute [Molfile](http://en.wikipedia.org/wiki/Molfile) connection table information, and EBI made use of its associated data functionality to distribute a large amount of incredibly useful molecular metadata in addition to the standard table.

The only parser I could find for the SDF format was part of the overcomplicated [OpenBabel](http://openbabel.org) project. I wanted to play with the information contained in the ChEBI database, but didn't want to deal with an absurdly complex program to get at it. An hour or four and a bit of Python later, I had a beautiful, albeit large, 22k-element list of dictionaries.
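For a feel of what such a parser involves, here is a minimal sketch of an SDF reader in plain Python. It is not the parser used by this project. It assumes the common layout where each record is a Molfile ending in `M  END`, followed by data items written as a `> <FIELD NAME>` header line, one or more value lines, and a blank line, with `$$$$` closing the record. The `parse_sdf` name, the `molfile` key, and the sample record are all hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional

# A data header looks like "> <ChEBI Name>" (extra text may follow the
# closing bracket); capture the field name inside the first angle brackets.
HEADER = re.compile(r"^>.*?<([^>]+)>")


def parse_sdf(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SDF record.

    The connection table (everything up to and including "M  END") is kept
    under the hypothetical key "molfile"; each "> <FIELD>" data item becomes
    a key whose value is its lines joined with newlines.
    """
    record: Dict[str, str] = {}
    molfile: List[str] = []
    in_data = False              # False while still inside the connection table
    field: Optional[str] = None
    values: List[str] = []

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "$$$$":
            # End of record: flush any pending field and emit the record.
            if field is not None:
                record[field] = "\n".join(values)
            record["molfile"] = "\n".join(molfile)
            yield record
            record, molfile, in_data, field, values = {}, [], False, None, []
        elif not in_data:
            molfile.append(line)
            if line.startswith("M  END"):
                in_data = True
        else:
            match = HEADER.match(line)
            if match:
                # New data item: flush the previous one first.
                if field is not None:
                    record[field] = "\n".join(values)
                field, values = match.group(1), []
            elif line == "":
                # A blank line terminates the current data item's value.
                if field is not None:
                    record[field] = "\n".join(values)
                field, values = None, []
            elif field is not None:
                values.append(line)


# A made-up, minimal record used only to exercise the parser.
SAMPLE = """\
example
  hypothetical header line

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ChEBI ID>
CHEBI:0000

> <Synonyms>
first name
second name

$$$$
"""

records = list(parse_sdf(SAMPLE.splitlines()))
print(records[0]["ChEBI ID"])  # → CHEBI:0000
```

Because `parse_sdf` only needs an iterable of lines, an open file handle for a real ChEBI SDF dump can be passed in directly, and `list(...)` over it produces the kind of list of dictionaries described above.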