https://github.com/irskep/enzyme2sqlite
Convert the ENZYME database from flat file to SQLite
- Host: GitHub
- URL: https://github.com/irskep/enzyme2sqlite
- Owner: irskep
- Created: 2011-10-03T04:29:37.000Z (about 14 years ago)
- Default Branch: master
- Last Pushed: 2011-10-03T07:26:40.000Z (about 14 years ago)
- Last Synced: 2023-10-20T20:26:12.164Z (almost 2 years ago)
- Language: Python
- Homepage:
- Size: 102 KB
- Stars: 4
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: Readme.md
README
enzyme2sqlite
=============

This script parses the ENZYME enzyme nomenclature database from the flat file
format to a SQLite database (in my case, to use in an iPad app).

Example usage:

    python3 enzyme2sqlite.py enzyme.dat -o enzyme.sqlite

For more information, see the
[ENZYME project home page](http://enzyme.expasy.org/).

Parser Output
-------------

The `parse()` function returns a dictionary in this format:

    {
        'id': str,
        'names': [str],
        'alt_names': [str],
        'catalytic_activity': [str],
        'cofactors': [str],
        'comments': [str],
        'prosite_refs': [str],
        'db_refs': [[str, str], [str, str], ...],
    }

The name mappings should be obvious, but you can reference `parse.ABBREV_NAMES`
to be sure.

Table Format
------------

Most of the multi-item columns use strings separated by `-!-`. This is because
creating a "proper schema" seemed like overkill for my particular project, and
that is a safe delimiter for their data. So `['A', 'B']` becomes `A-!-B`.

    table enzymes
        id:                 unchanged string
        names:              strings separated by '-!-'
        alt_names:          strings separated by '-!-'
        catalytic_activity: strings separated by '-!-'
        cofactors:          strings separated by '-!-'
        comments:           strings separated by '-!-'
        prosite_refs:       strings separated by ';'
        db_refs:            string pairs like 'a,b;c,d;e,f'

Example
-------

This is straight from the tests, where `test_item_1_1_1_2` is alcohol
dehydrogenase (NADP(+)) parsed from `enzyme.dat`:

    from parse import parse
    from sqlize import sqlize, desqlize_row

    with open('enzyme.dat', 'r') as f:
        data = parse(f)
        conn = sqlize(data, ':memory:')
    c = conn.cursor()
    for row in c.execute('select * from enzymes where id=?', ('1.1.1.2',)):
        assert desqlize_row(row) == test_item_1_1_1_2
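
If you want to read the database without importing `sqlize`, the column encoding described under Table Format can be reversed with plain string splitting. The following is only a sketch of that idea, not the repository's implementation: `encode_item` and `decode_row` are hypothetical helpers, the untyped `create table` statement and the column order are assumptions based on the table listing, and the sample entry is placeholder data rather than a real ENZYME record.

```python
import sqlite3

# Columns stored as '-!-'-separated strings, per the Table Format section.
LIST_COLUMNS = ('names', 'alt_names', 'catalytic_activity',
                'cofactors', 'comments')
# Assumed column order; the real schema may differ.
COLUMNS = ('id',) + LIST_COLUMNS + ('prosite_refs', 'db_refs')


def encode_item(item):
    """Flatten one parsed entry (a dict) into a tuple of strings."""
    row = [item['id']]
    row += ['-!-'.join(item[key]) for key in LIST_COLUMNS]
    row.append(';'.join(item['prosite_refs']))
    row.append(';'.join(','.join(pair) for pair in item['db_refs']))
    return tuple(row)


def decode_row(row):
    """Rebuild the dict from a row; empty strings become empty lists."""
    item = {'id': row[0]}
    for key, value in zip(LIST_COLUMNS, row[1:6]):
        item[key] = value.split('-!-') if value else []
    item['prosite_refs'] = row[6].split(';') if row[6] else []
    item['db_refs'] = ([pair.split(',') for pair in row[7].split(';')]
                       if row[7] else [])
    return item


# Placeholder entry (hypothetical values, not taken from enzyme.dat).
item = {
    'id': '0.0.0.0',
    'names': ['A', 'B'],
    'alt_names': [],
    'catalytic_activity': ['X + Y = Z'],
    'cofactors': [],
    'comments': ['first comment', 'second comment'],
    'prosite_refs': ['P1', 'P2'],
    'db_refs': [['a', 'b'], ['c', 'd'], ['e', 'f']],
}

conn = sqlite3.connect(':memory:')
conn.execute('create table enzymes (%s)' % ', '.join(COLUMNS))
conn.execute('insert into enzymes values (?, ?, ?, ?, ?, ?, ?, ?)',
             encode_item(item))
row = conn.execute('select * from enzymes where id=?',
                   (item['id'],)).fetchone()
assert decode_row(row) == item
```

One consequence of the delimiter approach is that an empty list and a list holding a single empty string both encode to `''`, so the sketch decodes either as an empty list.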