https://github.com/irskep/enzyme2sqlite

Convert the ENZYME database from flat file to SQLite

Host: GitHub
URL: https://github.com/irskep/enzyme2sqlite
Owner: irskep
Created: 2011-10-03T04:29:37.000Z (about 14 years ago)
Default Branch: master
Last Pushed: 2011-10-03T07:26:40.000Z (about 14 years ago)
Last Synced: 2023-10-20T20:26:12.164Z (almost 2 years ago)
Language: Python
Homepage:
Size: 102 KB
Stars: 4
Watchers: 4
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: Readme.md


enzyme2sqlite
=============

This script converts the ENZYME enzyme nomenclature database from its flat file
format into a SQLite database (in my case, for use in an iPad app).

Example usage:

    python3 enzyme2sqlite.py enzyme.dat -o enzyme.sqlite

For more information, see the
[ENZYME project home page](http://enzyme.expasy.org/).

Parser Output
-------------

The `parse()` function returns a dictionary in this format:

    {
        'id': str,
        'names': [str],
        'alt_names': [str],
        'catalytic_activity': [str],
        'cofactors': [str],
        'comments': [str],
        'prosite_refs': [str],
        'db_refs': [[str, str], [str, str], ...],
    }

The name mappings should be obvious, but you can check `parse.ABBREV_NAMES`
to be sure.
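
To make the dictionary format concrete, here is a minimal sketch of how the two-letter line codes of the ENZYME flat file could map onto those keys. It is not the repository's parser: `LINE_CODES`, `parse_entry`, and the sample record are hypothetical, and the real mapping in `parse.ABBREV_NAMES` may differ.

```python
# Hypothetical mapping from ENZYME line codes to the dictionary keys above.
# The two-letter codes are part of the flat file format; the mapping itself
# is an assumption and may not match parse.ABBREV_NAMES.
LINE_CODES = {
    'DE': 'names',
    'AN': 'alt_names',
    'CA': 'catalytic_activity',
    'CF': 'cofactors',
    'CC': 'comments',
    'PR': 'prosite_refs',
    'DR': 'db_refs',
}


def parse_entry(lines):
    """Collect the lines of one entry into a dict shaped like the format above."""
    entry = {'id': None}
    for key in LINE_CODES.values():
        entry[key] = []
    for line in lines:
        # Columns 1-2 hold the line code; the text starts at column 6.
        code, text = line[:2], line[5:].strip()
        if code == 'ID':
            entry['id'] = text
        elif code == 'DR':
            # A DR line holds 'accession, name;' pairs.
            for pair in text.split(';'):
                if pair.strip():
                    entry['db_refs'].append([p.strip() for p in pair.split(',')])
        elif code in LINE_CODES:
            entry[LINE_CODES[code]].append(text)
    return entry


# Made-up record for illustration only; not copied from enzyme.dat.
sample = [
    'ID   9.9.9.9',
    'DE   Example enzyme.',
    'AN   Sample enzyme.',
    'CA   A + B = C.',
    'DR   X00001, EXMP_ONE;  X00002, EXMP_TWO;',
]
entry = parse_entry(sample)
print(entry['id'], entry['db_refs'])
```

The sketch keeps one list item per input line and does not join continuation lines, which a real parser would likely need to handle.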

Table Format
------------

Most of the multi-item columns store their items as a single string separated
by `-!-`. Creating a "proper schema" seemed like overkill for my particular
project, and `-!-` is a safe delimiter for the ENZYME data. So `['A', 'B']`
becomes `A-!-B`.

    table enzymes
    id:                     unchanged string
    names:                  strings separated by '-!-'
    alt_names:              strings separated by '-!-'
    catalytic_activity:     strings separated by '-!-'
    cofactors:              strings separated by '-!-'
    comments:               strings separated by '-!-'
    prosite_refs:           strings separated by ';'
    db_refs:                string pairs like 'a,b;c,d;e,f'
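
The encoding described above can be sketched in a few lines. This is an illustration of the delimiter scheme only, not the code in `sqlize`: the helper names `encode_item` and `decode_row` and the sample item are hypothetical, and the handling of empty lists is an assumption.

```python
SEP = '-!-'
# Columns that hold '-!-' separated strings, in table order.
LIST_KEYS = ['names', 'alt_names', 'catalytic_activity', 'cofactors', 'comments']


def encode_item(item):
    """Flatten one parsed dictionary into a tuple of column values."""
    row = [item['id']]
    row += [SEP.join(item[key]) for key in LIST_KEYS]
    row.append(';'.join(item['prosite_refs']))
    row.append(';'.join(','.join(pair) for pair in item['db_refs']))
    return tuple(row)


def split_or_empty(text, sep):
    # ''.split(sep) returns [''], so treat an empty column as an empty list.
    return text.split(sep) if text else []


def decode_row(row):
    """Rebuild the dictionary from a tuple of column values."""
    item = {'id': row[0]}
    for key, value in zip(LIST_KEYS, row[1:6]):
        item[key] = split_or_empty(value, SEP)
    item['prosite_refs'] = split_or_empty(row[6], ';')
    item['db_refs'] = [pair.split(',') for pair in split_or_empty(row[7], ';')]
    return item


# Made-up item for illustration only.
item = {
    'id': '9.9.9.9',
    'names': ['A', 'B'],
    'alt_names': [],
    'catalytic_activity': ['A + B = C.'],
    'cofactors': [],
    'comments': [],
    'prosite_refs': ['PDOC99999'],
    'db_refs': [['a', 'b'], ['c', 'd']],
}
row = encode_item(item)
print(row[1], row[7])
```

The round trip only works while no value contains its own delimiter, which is the trade-off of skipping a normalized schema.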

Example
-------

This is straight from the tests, where `test_item_1_1_1_2` is alcohol
dehydrogenase (NADP(+)) parsed from `enzyme.dat`:

    from parse import parse
    from sqlize import sqlize, desqlize_row

    with open('enzyme.dat', 'r') as f:
        data = parse(f)
        conn = sqlize(data, ':memory:')
        c = conn.cursor()
        for row in c.execute('select * from enzymes where id=?', ('1.1.1.2',)):
            assert desqlize_row(row) == test_item_1_1_1_2
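
A consumer that does not import this repository's helpers could read the table with the standard `sqlite3` module alone. The sketch below is an assumption-heavy illustration: the `create table` statement guesses that every column is `text`, and the inserted row is made up so that the snippet runs without `enzyme.dat`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # allows access to columns by name

# Assumed schema: column names follow the table format above, types are a guess.
conn.execute(
    'create table enzymes (id text, names text, alt_names text, '
    'catalytic_activity text, cofactors text, comments text, '
    'prosite_refs text, db_refs text)'
)
# Made-up row for illustration only.
conn.execute(
    'insert into enzymes values (?, ?, ?, ?, ?, ?, ?, ?)',
    ('9.9.9.9', 'Example enzyme.', 'Sample enzyme.-!-Demo enzyme.',
     'A + B = C.', '', '', 'PDOC99999', 'X00001,EXMP_ONE;X00002,EXMP_TWO'),
)

# Delimited columns can still be searched with LIKE, then split in Python.
row = conn.execute(
    'select * from enzymes where alt_names like ?', ('%Demo%',)
).fetchone()
alt_names = row['alt_names'].split('-!-')
db_refs = [pair.split(',') for pair in row['db_refs'].split(';')]
print(row['id'], alt_names, db_refs)
```

Substring matching with `LIKE` is the main way to query inside a delimited column, so lookups by `id` are exact while searches by name are not.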
