https://github.com/modernatx/seqlike
Unified biological sequence manipulation in Python
biological-sequences biopython machine-learning sequence
- Host: GitHub
- URL: https://github.com/modernatx/seqlike
- Owner: modernatx
- License: apache-2.0
- Created: 2021-10-07T13:20:39.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-06-12T19:49:41.000Z (7 months ago)
- Last Synced: 2024-11-09T11:49:11.551Z (about 2 months ago)
- Topics: biological-sequences, biopython, machine-learning, sequence
- Language: Python
- Homepage: https://modernatx.github.io/seqlike
- Size: 1.22 MB
- Stars: 207
- Watchers: 8
- Forks: 21
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- top-pharma50 - **modernatx/seqlike** - Topics: `biological-sequences`, `biopython`, `machine-learning`, `sequence`. Stars: 202. Forks: 18. Language: Python. License: Apache License 2.0. Last update: 2024-02-16 13:13:05. (Ranked by starred repositories)
- top-life-sciences - **modernatx/seqlike** - Topics: `biological-sequences`, `biopython`, `machine-learning`, `sequence`. Stars: 202. Forks: 18. Language: Python. License: Apache License 2.0. Last update: 2024-02-16 13:13:05. (Ranked by starred repositories)
README
# SeqLike - flexible biological sequence objects in Python
## Introduction
A single object API that makes working with biological sequences in Python
more ergonomic. It'll handle anything _like a sequence_.

Built around the [Biopython SeqRecord class](https://biopython.org/wiki/SeqRecord),
SeqLikes abstract over the semantics of molecular biology (DNA -> RNA -> AA)
and data structures (strings, Seqs, SeqRecords, numerical encodings)
to allow manipulation of a biological sequence
at the level which is most computationally convenient.

## Code samples and examples
### Build data-type agnostic functions
```python
def f(seq: SeqLikeType, *args):
    seq = SeqLike(seq, seq_type="nt").to_seqrecord()
    # ...
```

### Streamline conversion to/from ML friendly representations
```python
prediction = model(aaSeqLike('MSKGEELFTG').to_onehot())
new_seq = ntSeqLike(generative_model.sample(), alphabet="-ACGTUN")
```

### Interconvert between AA and NT forms of a sequence
Back-translation is conveniently built-in!
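
Some background on why direction matters: translation is many-to-one, so NT -> AA is unambiguous, while AA -> NT needs an explicit choice of codon for each residue. The following is a minimal plain-Python sketch of that idea and does not use SeqLike's API. The five-codon `CODON_TABLE` and the helpers `translate` and `back_translate` are hypothetical names introduced for illustration, and the table covers only the codons in this example.

```python
# Illustrative only: a five-codon subset of the standard genetic code.
CODON_TABLE = {"ATG": "M", "TCT": "S", "AAA": "K", "GGT": "G", "GAA": "E"}


def translate(nt: str) -> str:
    """Translate a nucleotide string codon by codon (NT -> AA)."""
    if len(nt) % 3 != 0:
        raise ValueError("nucleotide length must be a multiple of 3")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))


def back_translate(aa: str, codon_map: dict) -> str:
    """Back-translate (AA -> NT) using an explicit residue-to-codon map."""
    return "".join(codon_map[residue] for residue in aa)


# One possible codon map: here simply the inverse of the subset above.
codon_map = {residue: codon for codon, residue in CODON_TABLE.items()}

protein = translate("ATGTCTAAAGGTGAA")
dna = back_translate(protein, codon_map)
```

Inverting the table works here only because each residue appears once in the subset. With the full genetic code most residues have several codons, which is why a codon map has to be supplied before back-translation is defined.
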
```python
s_nt = ntSeqLike("ATGTCTAAAGGTGAA")
s_nt[0:3] # ATG
s_nt.aa()[0:3] # MSK, nt->aa is well defined
s_nt.aa()[0:3].nt() # ATGTCTAAA, works because SeqLike now has both reps
s_nt[:-1].aa() # TypeError, len(s_nt) not a multiple of 3

s_aa = aaSeqLike("MSKGE")
s_aa.nt() # AttributeError, aa->nt is undefined w/o codon map
s_aa = aaSeqLike(s_aa, codon_map=random_codon_map)
s_aa.nt() # now works, backtranslated to e.g. ATGTCTAAAGGTGAA
s_aa[:1].nt() # ATG, codon_map is maintained
```

### Easily plot multiple sequence alignments
```python
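import pandas as pd
from Bio import SeqIO
from seqlike import aaSeqLike

# Read every record from the FASTA file
seqs = list(SeqIO.parse("file.fasta", "fasta"))
df = pd.DataFrame(
    {
        "names": [s.name for s in seqs],
        "seqs": [aaSeqLike(s) for s in seqs],
    }
)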
df["aligned"] = df["seqs"].seq.align()
df["aligned"].seq.plot()
```

### Flexibly build and parse numerical sequence representations
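
For intuition about the shapes involved: a batch of N sequences padded to a common length L over an alphabet of K symbols one-hot encodes to an (N, L, K) array. The sketch below shows this in plain Python, without SeqLike. The `ALPHABET`, the right-padding with `-`, and the `one_hot` helper are assumptions made for illustration and may not match how SeqLike orders its alphabet or pads sequences.

```python
# Hypothetical alphabet for illustration; "-" doubles as the padding symbol.
ALPHABET = "-ACGT"


def one_hot(seq, length, alphabet=ALPHABET, pad="-"):
    """Return a nested list of shape (length, len(alphabet)) with 0/1 entries."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    padded = seq.ljust(length, pad)  # right-pad shorter sequences
    rows = []
    for symbol in padded:
        row = [0] * len(alphabet)
        row[index[symbol]] = 1
        rows.append(row)
    return rows


seqs = ["ACGT", "AC", "GGA"]
max_len = max(len(s) for s in seqs)

# Encode every sequence to the same length so the batch is rectangular.
batch = [one_hot(s, max_len) for s in seqs]
shape = (len(batch), len(batch[0]), len(batch[0][0]))
```

Here `shape` has the form (number of sequences, padded length, alphabet size), the same layout as the `(10, 90, 23)` shape quoted in this section.
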
```python
# Assume you have a dataframe with a column of 10 SeqLikes of length 90
df["seqs"].seq.to_onehot().shape # (10, 90, 23), padded if needed
```

To see more in action,
please check out the [docs](https://modernatx.github.io/seqlike/)!

## Getting Started
Install the library with `pip` or `conda`.
**With pip**
```sh
pip install seqlike
```

**With conda**
```sh
conda install -c conda-forge seqlike
```

## Authors
- [@andrewgiessel](https://github.com/andrewgiessel)
- [@maxasauruswall](https://github.com/maxasauruswall)
- [@MihirMetkar](https://github.com/MihirMetkar)
- [@ndousis](https://github.com/ndousis)
- [@ericmjl](https://github.com/ericmjl)

## Support
- Questions about usage should be posed on [Stack Overflow with the #seqlike tag][SO].
- Bug reports and feature requests are managed using the [GitHub issue tracker][gh_issues].

[SO]: https://stackoverflow.com/questions/tagged/seqlike
[gh_issues]: https://github.com/modernatx/seqlike/issues

## Contributors ✨
Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
- Nasos Dousis: 💻
- andrew giessel: 💻
- Max Wall: 💻 📖
- Eric Ma: 💻 📖
- Mihir Metkar: 🤔 💻
- Marcus Caron: 📖
- pagpires: 📖
- Sugato Ray: 🚇 🚧
- Damien Farrell: 💻
- Farbod Mahmoudinobar: 💻
- Jacob Hayes: 🚇

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!