pandas-charm
============

|Build-Status| |Coverage-Status| |PyPI-Status| |License| |DOI-URI|

pandas-charm is a small Python package for getting character
matrices (alignments) into and out of pandas.
Use this library to make pandas interoperable with
BioPython and DendroPy.

Convert between the following objects:

* BioPython MultipleSeqAlignment <-> pandas DataFrame
* DendroPy CharacterMatrix <-> pandas DataFrame
* "Sequence dictionary" <-> pandas DataFrame

The code has been tested with Python 2.7, 3.5 and 3.6.

Source repository: https://github.com/jmenglund/pandas-charm

------------------------------------------

.. contents:: Table of contents
   :backlinks: none
   :local:

Installation
------------

For most users, the easiest way is probably to install the latest version
hosted on `PyPI <https://pypi.python.org/pypi/pandas-charm>`_:

.. code-block::

    $ pip install pandas-charm

The project is hosted at https://github.com/jmenglund/pandas-charm and
can also be installed using git:

.. code-block::

    $ git clone https://github.com/jmenglund/pandas-charm.git
    $ cd pandas-charm
    $ python setup.py install

You may consider installing pandas-charm and its required Python packages
within a virtual environment in order to avoid cluttering your system's
Python path. See for example the environment management system
conda or the package virtualenv.

Running the tests
-----------------

Testing is carried out with pytest:

.. code-block::

    $ pytest -v test_pandascharm.py

Test coverage can be calculated with Coverage.py using the following
commands:

.. code-block::

    $ coverage run -m pytest
    $ coverage report -m pandascharm.py

The code follows the style conventions in PEP 8, which can be checked
with pycodestyle:

.. code-block::

    $ pycodestyle pandascharm.py test_pandascharm.py setup.py

Usage
-----

The following examples show how to use pandas-charm. The examples are
written in Python 3, but pandas-charm should also work with
Python 2.7+. You need to install BioPython and/or DendroPy manually
before you start:

.. code-block::

    $ pip install biopython
    $ pip install dendropy

DendroPy CharacterMatrix to pandas DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: pycon

    >>> import pandas as pd
    >>> import pandascharm as pc
    >>> import dendropy
    >>> dna_string = '3 5\nt1 TCCAA\nt2 TGCAA\nt3 TG-AA\n'
    >>> print(dna_string)
    3 5
    t1 TCCAA
    t2 TGCAA
    t3 TG-AA

    >>> matrix = dendropy.DnaCharacterMatrix.get(
    ...     data=dna_string, schema='phylip')
    >>> df = pc.from_charmatrix(matrix)
    >>> df
      t1 t2 t3
    0  T  T  T
    1  C  G  G
    2  C  C  -
    3  A  A  A
    4  A  A  A

By default, characters are stored as rows and sequences as columns
in the DataFrame. If you want rows to hold sequences, just transpose
the matrix in pandas:

.. code-block:: pycon

    >>> df.transpose()
        0  1  2  3  4
    t1  T  C  C  A  A
    t2  T  G  C  A  A
    t3  T  G  -  A  A

pandas DataFrame to DendroPy CharacterMatrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: pycon

    >>> import pandas as pd
    >>> import pandascharm as pc
    >>> import dendropy
    >>> df = pd.DataFrame({
    ...     't1': ['T', 'C', 'C', 'A', 'A'],
    ...     't2': ['T', 'G', 'C', 'A', 'A'],
    ...     't3': ['T', 'G', '-', 'A', 'A']})
    >>> df
      t1 t2 t3
    0  T  T  T
    1  C  G  G
    2  C  C  -
    3  A  A  A
    4  A  A  A

    >>> matrix = pc.to_charmatrix(df, data_type='dna')
    >>> print(matrix.as_string('phylip'))
    3 5
    t1 TCCAA
    t2 TGCAA
    t3 TG-AA

BioPython MultipleSeqAlignment to pandas DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: pycon

    >>> from io import StringIO
    >>> import pandas as pd
    >>> import pandascharm as pc
    >>> from Bio import AlignIO
    >>> dna_string = '3 5\nt1 TCCAA\nt2 TGCAA\nt3 TG-AA\n'
    >>> f = StringIO(dna_string)  # make the string a file-like object
    >>> alignment = AlignIO.read(f, 'phylip-relaxed')
    >>> print(alignment)
    SingleLetterAlphabet() alignment with 3 rows and 5 columns
    TCCAA t1
    TGCAA t2
    TG-AA t3
    >>> df = pc.from_bioalignment(alignment)
    >>> df
      t1 t2 t3
    0  T  T  T
    1  C  G  G
    2  C  C  -
    3  A  A  A
    4  A  A  A

pandas DataFrame to BioPython MultipleSeqAlignment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: pycon

    >>> import pandas as pd
    >>> import pandascharm as pc
    >>> import Bio
    >>> df = pd.DataFrame({
    ...     't1': ['T', 'C', 'C', 'A', 'A'],
    ...     't2': ['T', 'G', 'C', 'A', 'A'],
    ...     't3': ['T', 'G', '-', 'A', 'A']})
    >>> df
      t1 t2 t3
    0  T  T  T
    1  C  G  G
    2  C  C  -
    3  A  A  A
    4  A  A  A

    >>> alignment = pc.to_bioalignment(df, alphabet='generic_dna')
    >>> print(alignment)
    SingleLetterAlphabet() alignment with 3 rows and 5 columns
    TCCAA t1
    TGCAA t2
    TG-AA t3

"Sequence dictionary" to pandas DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: pycon

    >>> import pandas as pd
    >>> import pandascharm as pc
    >>> d = {
    ...     't1': 'TCCAA',
    ...     't2': 'TGCAA',
    ...     't3': 'TG-AA'
    ... }
    >>> df = pc.from_sequence_dict(d)
    >>> df
      t1 t2 t3
    0  T  T  T
    1  C  G  G
    2  C  C  -
    3  A  A  A
    4  A  A  A

pandas DataFrame to "sequence dictionary"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: pycon

    >>> import pandas as pd
    >>> import pandascharm as pc
    >>> df = pd.DataFrame({
    ...     't1': ['T', 'C', 'C', 'A', 'A'],
    ...     't2': ['T', 'G', 'C', 'A', 'A'],
    ...     't3': ['T', 'G', '-', 'A', 'A']})
    >>> pc.to_sequence_dict(df)
    {'t1': 'TCCAA', 't2': 'TGCAA', 't3': 'TG-AA'}

The name
--------

The name pandas-charm combines the pandas library with an abbreviation of
CHARacter Matrix.

License
-------

pandas-charm is distributed under the
`MIT license <https://raw.githubusercontent.com/jmenglund/pandas-charm/master/LICENSE.txt>`_.

Citing
------

If you use results produced with this package in a scientific
publication, please mention the package name in the text and
cite the Zenodo DOI of this project:

|DOI-URI|

Choose your preferred citation style in the "Cite as" section on the Zenodo
page.

Author
------

Markus Englund, `orcid.org/0000-0003-1688-7112 <https://orcid.org/0000-0003-1688-7112>`_

.. |Build-Status| image:: https://travis-ci.org/jmenglund/pandas-charm.svg?branch=master
   :target: https://travis-ci.org/jmenglund/pandas-charm
   :alt: Build status
.. |Coverage-Status| image:: https://codecov.io/gh/jmenglund/pandas-charm/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/jmenglund/pandas-charm
   :alt: Coverage status
.. |PyPI-Status| image:: https://img.shields.io/pypi/v/pandas-charm.svg
   :target: https://pypi.python.org/pypi/pandas-charm
   :alt: PyPI status
.. |License| image:: https://img.shields.io/pypi/l/pandas-charm.svg
   :target: https://raw.githubusercontent.com/jmenglund/pandas-charm/master/LICENSE.txt
   :alt: License
.. |DOI-URI| image:: https://zenodo.org/badge/62513333.svg
   :target: https://zenodo.org/badge/latestdoi/62513333
   :alt: DOI