https://github.com/mojaveazure/pseudoscaffold_annotator
A Python package for creating an annotation file for a pseudoscaffold
- Host: GitHub
- URL: https://github.com/mojaveazure/pseudoscaffold_annotator
- Owner: mojaveazure
- Created: 2015-06-05T03:26:52.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-03-21T21:56:54.000Z (over 7 years ago)
- Last Synced: 2023-08-14T13:11:55.547Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 14.1 MB
- Stars: 0
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# pseudoscaffold_annotator
### A program to annotate an assembled pseudoscaffold
___

**If this program fails on you, revert to a previous release using the commands:**
```shell
git checkout -f
git reset --hard 25543ca66a9a1a7cc1ab8dc72f4c6e237e797478
```

This is a program for annotating an assembled pseudoscaffold using a reference genome and annotation file. Currently, it only supports GFF3 files as input, but it can output both GFF3 files and 3-column BED files. Increased support for the BED format will come later.
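To illustrate how the two output formats relate, here is a minimal sketch of turning one GFF3 feature line into a 3-column BED line. It is not taken from this program's source; the function name `gff3_to_bed3` is hypothetical. The only format facts it relies on are that GFF3 coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open.

```python
def gff3_to_bed3(gff3_line):
    """Convert one GFF3 feature line to a 3-column BED line.

    Returns None for blank lines and comment/directive lines.
    Hypothetical helper for illustration only.
    """
    line = gff3_line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns, got %d" % len(fields))
    seqid, start, end = fields[0], int(fields[3]), int(fields[4])
    # GFF3 start is 1-based inclusive; BED start is 0-based.
    # The end coordinate is numerically unchanged (inclusive -> exclusive).
    return "%s\t%d\t%d" % (seqid, start - 1, end)


example = "pseudoscaffold\t.\tgene\t101\t200\t.\t+\t.\tID=gene1\n"
bed_line = gff3_to_bed3(example)
```

For the example line, the feature spanning bases 101 to 200 in GFF3 becomes the BED interval starting at 100 and ending at 200.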
Running this program to annotate a pseudoscaffold is done using the following command:
```shell
./pseudoscaffold_annotator.py annotate -r REFERENCE_FASTA -a ORIGINAL_ANNOTATION -p PSEUDOSCAFFOLD_FASTA -o OUTFILE_NAME -c BLAST_CONFIG_FILE
```

The BLAST configuration file can be created using the following command:
```shell
./pseudoscaffold_annotator.py blast-config
```
Use the `-h` flag to see all options for configuring.

**IMPORTANT**

pseudoscaffold_annotator.py requires no new lines within the sequence of the pseudoscaffold. The following is not an allowed sequence:

    >pseudoscaffold
    ACTGTCAG
    GCTATCGA

The 'fix' subroutine removes new lines between sequence data, creating a FASTA file that reads like:

    >pseudoscaffold
    ACTGTCAGGCTATCGA

To fix a pseudoscaffold, run the following command:
```shell
./pseudoscaffold_annotator.py fix -p PSEUDOSCAFFOLD_FASTA -n FIXED_FASTA
```

This program requires Python 2.7 or higher, or Python 2.6 with the [`argparse`](https://pypi.python.org/pypi/argparse) module installed.
**NOTE: this has NOT been tested on Python 3.x**
Other dependencies include:
- [BEDTools](http://bedtools.readthedocs.org/en/latest/)
- [NCBI's BLAST+](http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
- [BioPython](http://biopython.org/wiki/Main_Page)

**NOTE: This program has only been tested with the _Morex_ (barley) genome; please use with caution.**
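For readers who want to see what the single-line sequence requirement described earlier amounts to, the following is a rough sketch of joining wrapped FASTA sequence lines. It is an assumption about the general technique, not the actual code of the `fix` subroutine, and the name `unwrap_fasta` is hypothetical. It uses only the standard library.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    Hypothetical helper for illustration; not the program's own 'fix' code.
    """
    sequence = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: flush the previous record's sequence first.
            if sequence:
                yield "".join(sequence)
                sequence = []
            yield line
        else:
            sequence.append(line)
    # Flush the sequence of the final record.
    if sequence:
        yield "".join(sequence)


wrapped = [">pseudoscaffold", "ACTGTCAG", "GCTATCGA"]
fixed = list(unwrap_fasta(wrapped))
print("\n".join(fixed))
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the list, so a large pseudoscaffold does not need to be read into memory all at once before its header lines are written out.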
## TODO
- Add support for extracting information from BED files
- Add parallelization support
- Add BED to GFF annotating capabilities
- Add GFF to BED annotating capabilities
- ~~Finish GFF to GFF annotation capabilities~~ DONE!
- Add BED to BED annotating capabilities