https://github.com/biosustain/gnomic
A grammar for describing microbial genotypes and phenotypes
https://github.com/biosustain/gnomic
bioinformatics biology microbial-genomics python
Last synced: about 1 month ago
JSON representation
A grammar for describing microbial genotypes and phenotypes
- Host: GitHub
- URL: https://github.com/biosustain/gnomic
- Owner: biosustain
- License: other
- Created: 2015-12-11T14:07:16.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-09-04T11:44:04.000Z (over 7 years ago)
- Last Synced: 2025-04-06T10:44:49.582Z (about 1 month ago)
- Topics: bioinformatics, biology, microbial-genomics, python
- Language: Python
- Homepage: https://gnomic.readthedocs.io/
- Size: 220 KB
- Stars: 6
- Watchers: 9
- Forks: 2
- Open Issues: 4
-
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
README
Gnomic
======.. image:: https://travis-ci.org/biosustain/gnomic.svg?branch=master
:target: https://travis-ci.org/biosustain/gnomic.. image:: https://zenodo.org/badge/47830031.svg
:target: https://zenodo.org/badge/latestdoi/47830031Gnomic is a human– and computer–readable representation of microbial genotypes and phenotypes. The ``gnomic``
Python package contains a parser for the Gnomic grammar able to interpret changes over multiple generations.The first formal guidelines for microbial genetic nomenclature were drawn up in the 1960s. These traditional nomenclatures are too
ambiguous to be useful for modern computer-assisted genome engineering. The *gnomic* grammar is an improvement over existing nomenclatures, designed to be clear, unambiguous, computer–readable and describe genotypes at various levels of detail.Installation
------------.. code-block:: bash
pip install gnomic
Language grammar
----------------The grammar consists of a list of genotype or phenotype designations, separated by
spaces and/or commas. The designations are described using the following nomenclature:============================================================= ==================================
Designation Grammar expression
============================================================= ==================================
``feature`` deleted ``-feature``
``feature`` at ``locus`` deleted ``-feature@locus``
``feature`` inserted ``+feature``
``site`` replaced with ``feature`` ``site>feature``
``site`` (multiple integration) replaced with ``feature`` ``site>>feature``
``site`` at ``locus`` replaced with ``feature`` ``site@locus>feature``
``feature`` of ``organism`` ``organism/feature``
``feature`` with ``type`` ``type.feature``
``feature`` with variant ``feature(variant)``
``feature`` with list of variants ``feature(var1, var2)`` or ``feature(var1; var2)``
``feature`` with accession number ``feature#GB:123456``
``feature`` by accession number ``#GB:123456``
accession number ``#database:id`` or ``#id``
fusion of ``feature1`` and ``feature2`` ``feature1:feature2``
insertion of two fused features ``+feature1:feature2``
insertion of a list of features or fusions ``+{..insertables}``
fusion of a list and a feature ``{..insertables}:feature``
a non-integrated plasmid ``(plasmid)`` or ``(plasmid ...insertables)``
integrated plasmid vector with required insertion site ``site>(vector ..insertables)``
============================================================= ==================================Feature variants
^^^^^^^^^^^^^^^^Features may have one or more variants, separated by colon ";" or comma ",".
For example: ``geneX(cold-resistant; heat-resistant)``
Variants can either be identifiers (using the characters a-z, 0-9, "-" and "_") or be sequence variants following
the HGVS `Sequence Variant Nomenclature `_.For example: ``geneY(c.123G>T)``
Example usage
-------------In this example, we parse `"EcGeneA ΔsiteA::promoterB:EcGeneB ΔgeneC"` and `"ΔgeneA"` in *gnomic* syntax:
.. code-block:: python
>>> from gnomic import Genotype
>>> g1 = Genotype.parse('+Ec/geneA(variant) siteA>P.promoterB:Ec/geneB -geneC')
>>> g1.added_features
{Feature(organism='Ec', name='geneA', variant=('variant',)),
Feature(organism='Ec', name='geneB'),
Feature(type='P', name='promoterB')}
>>> g1.removed_features
{Feature(name='geneC'),
Feature(name='siteA')}>>> g2 = Genotype.parse('-geneA', parent=g1)
>>> g2.added_features
{Feature(type='P', name='promoterB'),
Feature(name='geneB', organism='Ec')}
>>> g2.removed_features
{Feature(name='siteA'),
Feature(name='geneC')}
>>> g2.changes()
(Change(multiple=False,
after=Fusion(annotations=(Feature(type='P', name='promoterB'), Feature(organism='Ec', name='geneB'))),
before=Feature(name='siteA')),
Change(multiple=False, before=Feature(name='geneC')))>>> g2.format()
'ΔsiteA→P.promoterB:Ec/geneB ΔgeneC'Development
-----------To rebuild the gnomic parser using `grako` (version 3.18.1), run:
::
grako gnomic-grammar/genotype.enbf -o gnomic/grammar.py -m Gnomic
References
------------ `Wikipedia — Bacterial genetic nomenclature `_
- `Journal of Bacteriology — Instructions to Authors `_
- `Human Genome Variation Society — Sequence Variant Nomenclature `_
- `Databases cross-referenced in UniProtKB `_
- `MIRIAM Registry `_