https://github.com/modernatx/seqlike

Unified biological sequence manipulation in Python
# SeqLike

SeqLike - flexible biological sequence objects in Python

## Introduction

SeqLike provides a single-object API that makes working with biological sequences in Python
more ergonomic. It handles anything _like a sequence_.

Built around the [Biopython SeqRecord class](https://biopython.org/wiki/SeqRecord),
SeqLikes abstract over the semantics of molecular biology (DNA -> RNA -> AA)
and data structures (strings, Seqs, SeqRecords, numerical encodings)
to allow manipulation of a biological sequence
at the level which is most computationally convenient.

## Code samples and examples

### Build data-type agnostic functions

```python
def f(seq: SeqLikeType, *args):
    seq = SeqLike(seq, seq_type="nt").to_seqrecord()
    # ...
```
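
To show the pattern without depending on SeqLike itself, here is a minimal pure-Python sketch of a data-type agnostic function. `SimpleRecord`, `SequenceInput`, `as_text` and `gc_content` are hypothetical names invented for this illustration; they are not part of SeqLike or Biopython. In real code, the `SeqLike(...)` constructor plays the role of `as_text`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class SimpleRecord:
    """Hypothetical stand-in for a record object that wraps a sequence."""

    id: str
    seq: str


# Hypothetical union of accepted inputs, playing the role of SeqLikeType.
SequenceInput = Union[str, bytes, SimpleRecord]


def as_text(seq: SequenceInput) -> str:
    """Coerce any accepted input to an upper-case string, once, at the boundary."""
    if isinstance(seq, SimpleRecord):
        seq = seq.seq
    elif isinstance(seq, bytes):
        seq = seq.decode("ascii")
    elif not isinstance(seq, str):
        raise TypeError(f"unsupported sequence type: {type(seq).__name__}")
    return seq.upper()


def gc_content(seq: SequenceInput) -> float:
    """Fraction of G/C characters; behaves the same for every accepted input type."""
    text = as_text(seq)
    return sum(base in "GC" for base in text) / len(text) if text else 0.0
```

Because the coercion happens in one place, the body of `gc_content` never needs to know which input type the caller passed.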

### Streamline conversion to/from ML-friendly representations

```python
prediction = model(aaSeqLike('MSKGEELFTG').to_onehot())
new_seq = ntSeqLike(generative_model.sample(), alphabet="-ACGTUN")
```
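
For readers unfamiliar with one-hot encodings, the following is a minimal sketch of the round trip between a string and a one-hot matrix over a fixed alphabet. It uses plain Python lists and is not SeqLike's implementation; `to_onehot` and `from_onehot` here are standalone illustrative functions, and only the alphabet string is taken from the example above.

```python
from typing import List, Sequence

ALPHABET = "-ACGTUN"  # same characters as the alphabet string in the example above


def to_onehot(seq: str, alphabet: str = ALPHABET) -> List[List[int]]:
    """Encode each character as a row with a single 1 at its alphabet index."""
    index = {char: i for i, char in enumerate(alphabet)}
    rows = []
    for char in seq:
        row = [0] * len(alphabet)
        row[index[char]] = 1  # raises KeyError if char is not in the alphabet
        rows.append(row)
    return rows


def from_onehot(rows: Sequence[Sequence[float]], alphabet: str = ALPHABET) -> str:
    """Decode by taking the highest-scoring column in each row (argmax)."""
    return "".join(
        alphabet[max(range(len(row)), key=row.__getitem__)] for row in rows
    )
```

Decoding by argmax means the same function also handles "soft" model outputs, where each row holds scores or probabilities rather than exact zeros and ones.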

### Interconvert between AA and NT forms of a sequence

Back-translation is conveniently built-in!

```python
s_nt = ntSeqLike("ATGTCTAAAGGTGAA")
s_nt[0:3] # ATG
s_nt.aa()[0:3] # MSK, nt->aa is well defined
s_nt.aa()[0:3].nt() # ATGTCTAAA, works because SeqLike now has both reps
s_nt[:-1].aa() # TypeError, len(s_nt) not a multiple of 3

s_aa = aaSeqLike("MSKGE")
s_aa.nt() # AttributeError, aa->nt is undefined w/o codon map
s_aa = aaSeqLike(s_aa, codon_map=random_codon_map)
s_aa.nt() # now works, backtranslated to e.g. ATGTCTAAAGGTGAA
s_aa[:1].nt() # ATG, codon_map is maintained
```
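
The asymmetry in the example (nt->aa always works, aa->nt needs a codon map) can be sketched in a few lines of plain Python. The five-entry `CODON_FOR` table below is a hypothetical map chosen only to match the sequences above, and `translate` and `back_translate` are illustrative functions, not SeqLike's API.

```python
# Hypothetical five-entry codon map, chosen to match the sequences above.
# A real codon map covers all 20 amino acids (and stop codons).
CODON_FOR = {"M": "ATG", "S": "TCT", "K": "AAA", "G": "GGT", "E": "GAA"}
AMINO_ACID_FOR = {codon: aa for aa, codon in CODON_FOR.items()}


def translate(nt: str) -> str:
    """nt -> aa: every codon maps to exactly one amino acid."""
    if len(nt) % 3 != 0:
        # This sketch raises ValueError; it does not mirror SeqLike's exception types.
        raise ValueError("nucleotide length must be a multiple of 3")
    return "".join(AMINO_ACID_FOR[nt[i : i + 3]] for i in range(0, len(nt), 3))


def back_translate(aa: str, codon_map: dict) -> str:
    """aa -> nt: only defined once one codon has been chosen per amino acid."""
    return "".join(codon_map[residue] for residue in aa)
```

Most amino acids are encoded by several codons, so `back_translate` cannot pick a nucleotide sequence without being told which codon to use. That choice is what the codon map supplies.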

### Easily plot multiple sequence alignments

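```python
import pandas as pd
from Bio import SeqIO

# Read every record from the FASTA file
seqs = list(SeqIO.parse("file.fasta", "fasta"))
df = pd.DataFrame(
    {
        "names": [s.name for s in seqs],
        "seqs": [aaSeqLike(s) for s in seqs],
    }
)
# Align the sequences, then plot the alignment
df["aligned"] = df["seqs"].seq.align()
df["aligned"].seq.plot()
```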

### Flexibly build and parse numerical sequence representations

```python
# Assume you have a dataframe with a column of 10 SeqLikes of length 90
df["seqs"].seq.to_onehot().shape # (10, 90, 23), padded if needed
```
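
As a rough illustration of where a three-dimensional shape like the one above comes from, here is a minimal pure-Python sketch that pads a batch of sequences to a common length and one-hot encodes it. It assumes padding with a gap character `-` on the right, which may differ from what SeqLike does; `onehot_batch`, `shape` and the six-character alphabet are hypothetical and exist only for this example.

```python
from typing import List, Tuple


def onehot_batch(
    seqs: List[str], alphabet: str, pad_char: str = "-"
) -> List[List[List[int]]]:
    """Right-pad every sequence to the longest length, then one-hot encode."""
    width = max(len(s) for s in seqs)
    padded = [s.ljust(width, pad_char) for s in seqs]
    # A character missing from the alphabet yields an all-zero row in this sketch.
    return [[[int(c == a) for a in alphabet] for c in s] for s in padded]


def shape(batch: List[List[List[int]]]) -> Tuple[int, int, int]:
    """(number of sequences, padded length, alphabet size)."""
    return (len(batch), len(batch[0]), len(batch[0][0]))


batch = onehot_batch(["MSK", "MSKGE"], alphabet="-EGKMS")
print(shape(batch))
```

The three axes correspond to the number of sequences, the padded sequence length, and the alphabet size.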

To see more in action,
please check out the [docs](https://modernatx.github.io/seqlike/)!

## Getting Started

Install the library with `pip` or `conda`.

**With pip**

```sh
pip install seqlike
```

**With conda**

```sh
conda install -c conda-forge seqlike
```
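
One way to confirm that the install worked is to ask Python for the installed version. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a helper name made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install worked, otherwise None.
print(installed_version("seqlike"))
```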

## Authors

- [@andrewgiessel](https://github.com/andrewgiessel)
- [@maxasauruswall](https://github.com/maxasauruswall)
- [@MihirMetkar](https://github.com/MihirMetkar)
- [@ndousis](https://github.com/ndousis)
- [@ericmjl](https://github.com/ericmjl)

## Support

- Questions about usage should be posed on [Stack Overflow with the #seqlike tag][SO].
- Bug reports and feature requests are managed using the [GitHub issue tracker][gh_issues].

[SO]: https://stackoverflow.com/questions/tagged/seqlike
[gh_issues]: https://github.com/modernatx/seqlike/issues

## Contributors ✨

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):



- Nasos Dousis 💻
- andrew giessel 💻
- Max Wall 💻 📖
- Eric Ma 💻 📖
- Mihir Metkar 🤔 💻
- Marcus Caron 📖
- pagpires 📖
- Sugato Ray 🚇 🚧
- Damien Farrell 💻
- Farbod Mahmoudinobar 💻
- Jacob Hayes 🚇

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!