https://github.com/mojaveazure/snptoseq
Capture contextual sequences for a SNP
- Host: GitHub
- URL: https://github.com/mojaveazure/snptoseq
- Owner: mojaveazure
- Created: 2016-11-10T16:22:21.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-03-21T21:59:27.000Z (almost 8 years ago)
- Last Synced: 2024-10-15T09:11:17.263Z (2 months ago)
- Language: Python
- Size: 8.79 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# SNPtoSEQ.py
`SNPtoSEQ.py` is a simple Python program that captures the contextual sequence around each SNP and reports it in a BED and/or FASTA file.
```
$ ./SNPtoSEQ.py
usage: SNPtoSEQ.py -v VCF FILE -r REFERENCE FASTA FILE [-w WINDOW SIZE]
                   [-o OUTPUT NAME] [--no-fasta | --no-bed]

Capture SNP contextual sequences into BED and/or FASTA format

Input Options:
  Provide a VCF file of SNPs and a reference genome in FASTA format

  -v VCF FILE, --vcf VCF FILE
                        SNPs in VCF format
  -r REFERENCE FASTA FILE, --reference REFERENCE FASTA FILE
                        Reference genome in FASTA format

Window options:
  Select a one-sided window size to capture in contextual sequence; this
  size represents the amount of sequence on each side of the SNP to capture

  -w WINDOW SIZE, --window WINDOW SIZE
                        Set the window size, defaults to 120

Output Options:
  Provide an output name as well as choose if we suppress some output

  -o OUTPUT NAME, --outname OUTPUT NAME
                        Basename for the output file(s), defaults to the name
                        of '-v | --vcf' with modified extensions
  --no-fasta            Do we suppress FASTA output? Incompatible with '--no-bed'
  --no-bed              Do we suppress BED output? Incompatible with '--no-fasta'
```

## Inputs
`SNPtoSEQ.py` requires a VCF file with SNP information and a reference genome in FASTA format. Please ensure that the chromosome information in the VCF matches the sequence identifiers in the FASTA file. To check, use UNIX `grep` and `cut` to find the chromosome information in both files:
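```bash
# Chromosome names used in the VCF (header lines start with '#')
grep -v '^#' "${MY_VCF}" | cut -f 1 | sort -u
# Sequence identifiers in the reference FASTA
grep '>' "${MY_REFERENCE}"
```

## Outputs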
`SNPtoSEQ.py` creates one or two output files:
| File name | Contents |
| --------- | -------- |
| *output*.bed | BED file describing the SNP contextual sequence; suppressed with `--no-bed` |
| *output*.fasta | FASTA file containing the SNP contextual sequences; sequence identifiers are the SNP names; suppressed with `--no-fasta` |

## Dependencies
`SNPtoSEQ.py` depends on the following:
- [Python 3](https://www.python.org/)
- [BioPython](http://biopython.org/wiki/Biopython)
- [overload](https://pypi.python.org/pypi/overload)

BioPython and overload are available through [PyPI](https://pypi.python.org/pypi) and can be installed using [pip3](https://pip.pypa.io/en/latest/installing/) (included with Python 3.4 or greater).
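
## Example: window coordinates

To make the one-sided window and the BED/FASTA outputs concrete, here is a minimal sketch of how a window around a SNP could be turned into a BED interval and a contextual sequence. It is not taken from `SNPtoSEQ.py`: the names `BedInterval`, `snp_window`, `contextual_sequence`, and `fasta_record` are hypothetical, and clipping the window at the chromosome ends is an assumption. It relies only on the format conventions that VCF positions are 1-based and BED intervals are 0-based and half-open.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    """One BED record (hypothetical helper type, not from SNPtoSEQ.py)."""
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def snp_window(chrom: str, pos: int, snp_id: str,
               chrom_length: int, window: int = 120) -> BedInterval:
    """Return the interval covering `window` bases on each side of a SNP.

    `pos` is the 1-based POS field from the VCF. Clipping at the chromosome
    ends is an assumption of this sketch; the real tool may behave differently.
    """
    if not 1 <= pos <= chrom_length:
        raise ValueError(f"position {pos} is outside {chrom} (length {chrom_length})")
    snp_index = pos - 1                               # convert to 0-based
    start = max(0, snp_index - window)                # clip at the left end
    end = min(chrom_length, snp_index + window + 1)   # clip at the right end
    return BedInterval(chrom, start, end, snp_id)


def contextual_sequence(reference: str, interval: BedInterval) -> str:
    """Slice the contextual sequence out of a chromosome sequence string."""
    return reference[interval.start:interval.end]


def fasta_record(interval: BedInterval, sequence: str, width: int = 60) -> str:
    """Format a FASTA record whose identifier is the SNP name."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{interval.name}\n" + "\n".join(lines)


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"  # made-up 10-base chromosome
    interval = snp_window("chr1", pos=5, snp_id="snp_1",
                          chrom_length=len(toy_reference), window=2)
    print("\t".join(str(field) for field in interval))
    print(fasta_record(interval, contextual_sequence(toy_reference, interval)))
```

With the default window of 120, an unclipped interval spans 241 bases: 120 on each side plus the SNP itself.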