https://github.com/aaronkollasch/seqdesign-pytorch
Protein design and variant prediction using autoregressive generative models
- Host: GitHub
- URL: https://github.com/aaronkollasch/seqdesign-pytorch
- Owner: aaronkollasch
- License: mit
- Created: 2019-01-11T18:16:48.000Z (almost 6 years ago)
- Default Branch: v3
- Last Pushed: 2023-02-27T18:47:12.000Z (almost 2 years ago)
- Last Synced: 2023-03-01T00:11:50.033Z (almost 2 years ago)
- Language: Python
- Homepage:
- Size: 150 KB
- Stars: 11
- Watchers: 1
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# SeqDesign
SeqDesign is a generative, unsupervised model for biological sequences.
It is capable of learning functional constraints from unaligned sequences
in order to predict the effects of mutations and generate novel sequences,
including insertions and deletions. For more information,
check out the [bioRxiv preprint](https://doi.org/10.1101/757252).

This version of the codebase is compatible with Python 3 and PyTorch.
It also implements [Fast Wavenet](https://github.com/tomlepaine/fast-wavenet) generation.
A TensorFlow version is available [here](https://github.com/debbiemarkslab/SeqDesign).
## Installation
See [INSTALL.md](INSTALL.md).
## Examples
See the [examples](examples) directory for examples of
training, mutation effect prediction, and generation.
## Usage
Run each script with the `-h` argument to see additional arguments:
### Training
Given a fasta file of training sequences, run:
```shell script
run_autoregressive_fr --dataset .fa
```
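Before training, per-sequence weights can be attached to the fasta file with a short script. The following is a minimal sketch and not part of SeqDesign: the function name `add_fasta_weights` and the `weights` mapping are hypothetical, and it assumes the weight suffix described in the next paragraph (`:` followed by a number) goes at the very end of each header line.

```python
def add_fasta_weights(fasta_text, weights, default=1.0):
    """Append ':<weight>' to every header line of a fasta string.

    `weights` maps a header (without the leading '>') to a float;
    headers missing from the mapping get `default`.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            out.append(f">{name}:{weights.get(name, default)}")
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"


example = ">seqA\nMKTAYIAK\n>seqB\nMKSAYLAK\n"
print(add_fasta_weights(example, {"seqA": 0.5}), end="")
# >seqA:0.5
# MKTAYIAK
# >seqB:1.0
# MKSAYLAK
```

How the weights themselves are chosen is left open here; the sketch only shows the header rewrite.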
Sequences are uniformly weighted by default. To set sequence
weights, append `:` and a weight to each fasta header, e.g. `:1.0`.
### Mutation effect prediction
Deterministic:
```shell script
calc_logprobs_seqs_fr --sess --dropout-p 1.0 --num-samples 1 --input .fa --output .csv
```
Average of 500 samples:
```shell script
calc_logprobs_seqs_fr --sess --dropout-p 0.5 --num-samples 500 --input .fa --output .csv
```
### Sequence generation
```shell script
generate_sample_seqs_fr --sess
```
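Fast Wavenet speeds up autoregressive generation by caching each dilated convolution layer's past inputs in a queue, so a new step reuses earlier activations instead of recomputing them over the whole history. The toy below is a sketch of that caching idea only, not the repository's implementation: the scalar layers, `DILATIONS`, `WEIGHTS`, and all function and class names are made up for illustration.

```python
import math
from collections import deque

# Hypothetical toy network: three causal layers with kernel size 2.
DILATIONS = [1, 2, 4]
WEIGHTS = [(0.5, -0.3), (0.8, 0.2), (-0.4, 0.7)]  # (w_past, w_now) per layer


def naive_last_output(inputs):
    """Recompute every layer over the full history; return the last output."""
    layer_in = list(inputs)
    for d, (w_past, w_now) in zip(DILATIONS, WEIGHTS):
        layer_out = []
        for t, x in enumerate(layer_in):
            past = layer_in[t - d] if t - d >= 0 else 0.0  # zero padding
            layer_out.append(math.tanh(w_past * past + w_now * x))
        layer_in = layer_out
    return layer_in[-1]


class CachedStack:
    """Keeps, per layer, a queue of that layer's last `d` inputs."""

    def __init__(self):
        self.queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]

    def step(self, x):
        for q, (w_past, w_now) in zip(self.queues, WEIGHTS):
            past = q[0]   # this layer's input from d steps ago
            q.append(x)   # maxlen drops the oldest entry
            x = math.tanh(w_past * past + w_now * x)
        return x


def generate_naive(first_input, steps):
    history, outputs = [first_input], []
    for _ in range(steps):
        y = naive_last_output(history)
        outputs.append(y)
        history.append(y)  # the output becomes the next input
    return outputs


def generate_cached(first_input, steps):
    stack, x, outputs = CachedStack(), first_input, []
    for _ in range(steps):
        x = stack.step(x)
        outputs.append(x)
    return outputs


naive = generate_naive(1.0, 10)
cached = generate_cached(1.0, 10)
print(all(math.isclose(a, b) for a, b in zip(naive, cached)))
```

Both loops produce the same values, but the naive one redoes work proportional to the history length at every step, while the cached one does a fixed amount of work per layer.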
Use the `--fast-generation` argument for Fast Wavenet.
## Data availability
See the [examples](examples) directory to download training sequences,
mutation effect predictions, and generated sequences.