https://github.com/nmdp-bioinformatics/seq-ann

Python package for doing fast and accurate sequence annotation
https://github.com/nmdp-bioinformatics/seq-ann

bioinformatics gfe hla hla-typing ngs sequence-analysis

Last synced: 17 days ago
JSON representation

Python package for doing fast and accurate sequence annotation

Host: GitHub
URL: https://github.com/nmdp-bioinformatics/seq-ann
Owner: nmdp-bioinformatics
License: gpl-3.0
Created: 2017-11-06T18:57:31.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2023-05-22T22:28:03.000Z (over 2 years ago)
Last Synced: 2025-09-02T07:20:07.743Z (5 months ago)
Topics: bioinformatics, gfe, hla, hla-typing, ngs, sequence-analysis
Language: Python
Size: 22.4 MB
Stars: 5
Watchers: 12
Forks: 8
Open Issues: 7
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: CONTRIBUTING.rst
- License: COPYING

Awesome Lists containing this project

README

          ===============================

SeqAnn

===============================

.. image:: https://img.shields.io/travis/nmdp-bioinformatics/SeqAnn.svg

        :target: https://travis-ci.org/nmdp-bioinformatics/SeqAnn

.. image:: https://readthedocs.org/projects/seqann/badge/?version=latest

        :target: https://seqann.readthedocs.io/en/latest/?badge=latest

        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/nmdp-bioinformatics/SeqAnn/shield.svg

     :target: https://pyup.io/repos/github/nmdp-bioinformatics/SeqAnn/

     :alt: Updates

.. image:: https://img.shields.io/pypi/v/seqann.svg

        :target: https://pypi.python.org/pypi/seqann

.. image:: https://coveralls.io/repos/github/nmdp-bioinformatics/SeqAnn/badge.svg?branch=master

        :target: https://coveralls.io/github/nmdp-bioinformatics/SeqAnn?branch=master

Python package for annotating gene features

* Free software: LGPL 3.0

* Documentation: https://seqann.readthedocs.io.

* `Jupyter Notebook`_

Overview

---------

The ``seqann`` package allows 

users to annotate gene features in consensus sequences. Annotations can be created by passing consensus sequences to the ``annotate`` method in the

``BioSeqAnn`` class. No parameters are required when initalizing a ``BioSeqAnn`` class. However, annotations can be

created significantly faster when using a BioSQL database. When a BioSQL database is not provided the lastest `hla.dat`_ file is downloaded and parsed.

A BioSQL database containing all of IPD-IMGT/HLA is available on DockerHub_ and can be

run on any machine that has docker installed. 

Install

------------

.. code-block:: shell

    

    pip install seq-ann

Parameters

----------

Below are the list of parameters and the default values used when initalizing a ``BioSeqAnn`` object.

.. table::

    :widths: 10 10 10 50

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | Parameter   | Type              | Default | Description                                                                   |

    +=============+===================+=========+===============================================================================+

    | server      | ``BioSeqDatabase``| None    | A BioSQL database containing all of the sequence data from IPD-IMGT/HLA.      |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | dbversion   | ``str``           | Latest  | The IPD-IMGT/HLA or KIR database release.                                     |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | datfile     | ``str``           | None    | The IPD-IMGT/HLA or KIR dat file to use in place of the **server** parameter. |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | kir         | ``bool``          | False   | Flag for indicating the input sequences are from the KIR gene system.         |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | align       | ``bool``          | False   | Flag for producing the alignments along with the annotations.                 |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | verbose     | ``bool``          | False   | Flag for running in verbose mode.                                             |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | verbosity   | ``int``           | None    | Numerical value to indicate how verbose the output will be in verbose mode.   |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

    | debug       | ``Dict``          | None    | A dictionary containing a process names as the key and verbosity as the value |

    +-------------+-------------------+---------+-------------------------------------------------------------------------------+

Usage

---------

To annotated a sequence initialize a new ``BioSeqAnn`` object and then pass the sequence to the

``annotate`` method. The sequence must be a Biopython ``Seq``. The locus of the sequence is not required but it will improve the accuracy of the annotation.

  The packages `ncbi-blast+` and `clustalo` are required to be installed on your system.

  Set variables to BioSQL host/port if using BioSQL.

.. code-block:: shell

    export BIOSQLHOST="localhost"

    export BIOSQLPORT=3306

.. code-block:: python3

    from seqann import BioSeqAnn

    seqann = BioSeqAnn()

    ann = seqann.annotate(sequence, "HLA-A")

The annotation of sequence can be done with or without providing a ``BioSeqDatabase``. To use a BioSQL database 

initialize a ``BioSeqDatabase`` with the parameters that match the database you have running. If you are 

running the imgt_biosqldb from DockerHub_ then the following parameters we be the same. 

.. code-block:: python3

    from seqann import BioSeqAnn

    from BioSQL import BioSeqDatabase

    server = BioSeqDatabase.open_database(driver="pymysql", user="root",

                                          passwd="my-secret-pw", host="localhost",

                                          db="bioseqdb", port=3306)

    seqann = BioSeqAnn(server=server)

    ann = seqann.annotate(sequence, "HLA-A")

You may need to set environment variables: *BIOSQLHOST* (e.g. "localhost") and *BIOSQLPORT*  (e.g. 3306)  to your docker instance.

Annotations

------------

.. code-block:: shell

            {

                 'complete_annotation': True,

                 'annotation': {'exon_1': SeqRecord(seq=Seq('AGAGACTCTCCCG', SingleLetterAlphabet()), id='HLA:HLA00630', name='HLA:HLA00630', description='HLA:HLA00630 DQB1*03:04:01 597 bp', dbxrefs=[]),

                                'exon_2': SeqRecord(seq=Seq('AGGATTTCGTGTACCAGTTTAAGGCCATGTGCTACTTCACCAACGGGACGGAGC...GAG', SingleLetterAlphabet()), id='HLA:HLA00630', name='HLA:HLA00630', description='HLA:HLA00630 DQB1*03:04:01 597 bp', dbxrefs=[]),

                                'exon_3': SeqRecord(seq=Seq('TGGAGCCCACAGTGACCATCTCCCCATCCAGGACAGAGGCCCTCAACCACCACA...ATG', SingleLetterAlphabet()), id='HLA:HLA00630', name='', description='HLA:HLA00630', dbxrefs=[])},

                 'features': {'exon_1': SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(13), strand=1), type='exon_1'),

                              'exon_2': SeqFeature(FeatureLocation(ExactPosition(13), ExactPosition(283), strand=1), type='exon_2')

                              'exon_3': SeqFeature(FeatureLocation(ExactPosition(283), ExactPosition(503), strand=1), type='exon_3')},

                 'method': 'nt_search and clustalo',

                 'gfe': 'HLA-Aw2-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-4',

                 'seq': SeqRecord(seq=Seq('AGAGACTCTCCCGAGGATTTCGTGTACCAGTTTAAGGCCATGTGCTACTTCACC...ATG', SingleLetterAlphabet()), id='HLA:HLA00630', name='HLA:HLA00630', description='HLA:HLA00630 DQB1*03:04:01 597 bp', dbxrefs=[])

            }

Once a sequence has been annotated the gene features and their corresponding sequences are available in the returned ``Annotation`` object. If a full

annotation is not able to be produced then nothing will be returned. Below is an example showing how the features can be accessed and printed out.

.. code-block:: python3

    ann = seqann.annotate(sequence, "HLA-A")

    for feat in ann.annotation:

        print(feat, ann.gfe, str(ann.annotation[feat].seq), sep="\t")

Dependencies

------------

* `Clustal Omega`_ 1.2.0 or higher

* `Python 3.6`_

* blastn_

.. _DockerHub: https://hub.docker.com/r/nmdpbioinformatics/imgt_biosqldb/

.. _`GitHub page`: http://google.com

.. _`hla.dat`: https://github.com/ANHIG/IMGTHLA

.. _`Python 3.6`: https://www.python.org/downloads

.. _`Clustal Omega`: http://www.clustal.org/omega/

.. _blastn: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download

.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage

.. _`Jupyter Notebook`: https://github.com/nmdp-bioinformatics/SeqAnn/blob/master/notebook/Examples.ipynb

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/nmdp-bioinformatics/seq-ann

Awesome Lists containing this project

README