https://github.com/chonglc/seqrandomizer
A tool to generate a random viral protein sequence dataset
https://github.com/chonglc/seqrandomizer
Last synced: 3 months ago
JSON representation
A tool to generate a random viral protein sequence dataset
- Host: GitHub
- URL: https://github.com/chonglc/seqrandomizer
- Owner: ChongLC
- License: mit
- Created: 2023-04-29T23:07:39.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-02T22:28:03.000Z (about 2 years ago)
- Last Synced: 2025-01-28T18:15:56.410Z (4 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 610 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# seqRandomizer: A tool to generate random viral protein sequence datasets
[](https://colab.research.google.com/drive/1IwNPKaRKGgPzqiOBuEo8S0VbUpe3XqVh?usp=sharing)A random protein sequence dataset can be useful for various sequence analysis, in particular to evaluate and correct the analysis for background noise. Herein, we offer a tool that can generate a dataset of random viral protein sequences.
---
### Usage
`python seqRandomizer.py [-h] [-o OUTPUT] [-l SEQLEN] [-n SEQNUM]`In the usage case below, the `seqRandomizer` tool is applied to generate a random viral protein sequence dataset in the folder `result`, named `random_vprot.fasta` consisting of 1,000 sequences of length 1,000 amino acids. The amino acid composition of the random sequences is based on all reported viral sequence retrieved from the NCBI Protein database (as of May 2021; `allVirus080521.fasta`).
```
python seqRandomizer.py -o random_vprot.fasta -l 1000 -n 1000
```#### Command-line Arguments
| Argument | Parameter | Type | Required | Description |
|:--------:|-----------|--------- |:--------:|------------------------------------------------------- |
| -h | help | N/A |FALSE | Show this help message and exit |
| -o | output | String |TRUE | Path of the output file to be created |
| -l | seqlen | Integer |TRUE | The length of random protein sequences to be generated |
| -n | seqnum | Integer |TRUE | The number of random protein sequences to be generated |---
### Important Note for Users
* This seqRandomizer is specific to the [protocol describing the step-by-step utility of UNIQmin](https://www.biorxiv.org/content/10.1101/2022.08.09.503271v2.full).---
### Found a bug?
Or would like a feature added? Or maybe drop some feedback? Just open a new issue or send an email to us ([email protected]).