https://github.com/mojaveazure/pseudoscaffold_annotator

A Python package for creating an annotation file for a pseudoscaffold

Host: GitHub
URL: https://github.com/mojaveazure/pseudoscaffold_annotator
Owner: mojaveazure
Created: 2015-06-05T03:26:52.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2017-03-21T21:56:54.000Z (over 8 years ago)
Last Synced: 2025-01-31T15:40:37.388Z (5 months ago)
Language: Python
Homepage:
Size: 14.1 MB
Stars: 0
Watchers: 5
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md


# pseudoscaffold_annotator
### A program to annotate an assembled pseudoscaffold
___

**If this program fails for you, revert to a previous release with the following commands:**
```shell
git checkout -f
git reset --hard 25543ca66a9a1a7cc1ab8dc72f4c6e237e797478
```

This is a program for annotating an assembled pseudoscaffold using a reference genome and its annotation file. Currently, only GFF3 files are supported as input, but the output can be either a GFF3 file or a 3-column BED file. Broader support for the BED format will come later.
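
The main difference between the two output formats is the coordinate convention: GFF3 uses 1-based, inclusive coordinates, while BED uses 0-based, half-open coordinates. The sketch below shows that conversion for a single GFF3 feature line. It is a hypothetical helper (`gff3_line_to_bed3` is not part of this program) and it assumes the input contains only feature lines and `#` comments/directives, not an embedded `##FASTA` section.

```python
def gff3_line_to_bed3(line):
    """Convert one GFF3 feature line to a 3-column BED line (hypothetical helper).

    GFF3 coordinates are 1-based and inclusive; BED coordinates are
    0-based and half-open, so only the start changes (start - 1).
    Returns None for blank lines, comments, and ## directives.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError("GFF3 feature lines must have 9 tab-separated columns")
    seqid = fields[0]
    start = int(fields[3])  # column 4: 1-based start
    end = int(fields[4])    # column 5: inclusive end == BED exclusive end
    return "%s\t%d\t%d" % (seqid, start - 1, end)
```

For example, a gene spanning GFF3 positions 100 to 250 on `chr1H` becomes the BED interval `chr1H 99 250`.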

To annotate a pseudoscaffold, run the following command:

```shell
./pseudoscaffold_annotator.py annotate -r REFERENCE_FASTA -a ORIGINAL_ANNOTATION -p PSEUDOSCAFFOLD_FASTA -o OUTFILE_NAME -c BLAST_CONFIG_FILE
```
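
If you call the annotator from another Python script (for example, to annotate several pseudoscaffolds in turn), one option is to build the argument list explicitly and pass it to `subprocess`. This is a minimal sketch: `build_annotate_command` and `run_annotate` are hypothetical helpers, the script path defaults to the relative path used above, and only the flags shown in the command above are used.

```python
import subprocess


def build_annotate_command(reference, annotation, pseudoscaffold, outfile,
                           blast_config, script="./pseudoscaffold_annotator.py"):
    """Return the argument list for the `annotate` subcommand (hypothetical helper)."""
    return [
        script, "annotate",
        "-r", reference,       # reference genome FASTA
        "-a", annotation,      # original GFF3 annotation
        "-p", pseudoscaffold,  # pseudoscaffold FASTA (single-line sequences)
        "-o", outfile,         # output file name
        "-c", blast_config,    # BLAST configuration file
    ]


def run_annotate(*args, **kwargs):
    """Run the annotator and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(build_annotate_command(*args, **kwargs))
```

Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.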

The BLAST configuration file can be created using the following command:

```shell
./pseudoscaffold_annotator.py blast-config
```
Use the `-h` flag to see all configuration options.

**IMPORTANT**

pseudoscaffold_annotator.py requires each pseudoscaffold sequence to be on a single line, with no line breaks within the sequence. The following is **not** an allowed sequence:

```
>pseudoscaffold
ACTGTCAG
GCTATCGA
```

The `fix` subroutine removes line breaks within sequence data, creating a FASTA file that reads like:

```
>pseudoscaffold
ACTGTCAGGCTATCGA
```
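
To check whether a FASTA file already meets this requirement before running the annotator, a small check like the one below can help. It is a hypothetical helper, not part of this program; it accepts any iterable of lines, so an open file handle works, and it ignores blank lines.

```python
def sequences_are_single_line(lines):
    """Return True if every FASTA record has its sequence on one line.

    Hypothetical helper; `lines` may be a list of strings or an open file.
    """
    lines_since_header = None  # None until the first '>' header is seen
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            lines_since_header = 0
        elif lines_since_header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            lines_since_header += 1
            if lines_since_header > 1:
                return False  # a second sequence line means the record is wrapped
    return True
```

Usage: `with open("pseudoscaffold.fa") as handle: ok = sequences_are_single_line(handle)`, where the file name is a placeholder.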

To fix a pseudoscaffold, run the following command:

```shell
./pseudoscaffold_annotator.py fix -p PSEUDOSCAFFOLD_FASTA -n FIXED_FASTA
```
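
The transformation performed by `fix` can be sketched in a few lines of Python. This is an illustrative reimplementation under the assumption that `fix` only joins sequence lines; it is not the program's own code, and `unwrap_fasta` and `unwrap_fasta_file` are hypothetical names. Headers are kept as-is and each record's sequence lines are concatenated.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    chunks = []  # sequence lines of the current record
    for line in lines:
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)  # flush the previous record's sequence
                chunks = []
            yield line
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)  # flush the last record


def unwrap_fasta_file(in_path, out_path):
    """Write an unwrapped copy of `in_path` to `out_path`."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in unwrap_fasta(src):
            dst.write(line + "\n")
```

Because `unwrap_fasta` is a generator, only one record's sequence is held in memory at a time, which matters for large pseudoscaffolds.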

This program requires Python 2.7 or higher, or Python 2.6 with the [`argparse`](https://pypi.python.org/pypi/argparse) module installed.

**NOTE: this has NOT been tested on Python 3.x**

Other dependencies include:
- [BEDTools](http://bedtools.readthedocs.org/en/latest/)
- [NCBI's BLAST+](http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
- [BioPython](http://biopython.org/wiki/Main_Page)
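
Before a long run, it can be worth confirming that these dependencies are available. The sketch below checks for the `argparse` and `Bio` (BioPython) modules and looks for executables on `PATH`. The executable list is an assumption: `bedtools`, `blastn`, and `makeblastdb` are standard BEDTools and BLAST+ programs, but which ones this program actually calls is not stated here. The `PATH` lookup is POSIX-style and does not handle Windows `.exe` suffixes.

```python
import os

# Assumed executable names; adjust to match what the annotator actually calls.
REQUIRED_MODULES = ("argparse", "Bio")
REQUIRED_EXECUTABLES = ("bedtools", "blastn", "makeblastdb")


def find_executable(name):
    """Return the full path to `name` on PATH, or None if it is not found."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_dependencies():
    """Return the names of modules and executables that could not be found."""
    missing = []
    for module in REQUIRED_MODULES:
        try:
            __import__(module)
        except ImportError:
            missing.append(module)
    for program in REQUIRED_EXECUTABLES:
        if find_executable(program) is None:
            missing.append(program)
    return missing


if __name__ == "__main__":
    for name in missing_dependencies():
        print("Missing dependency: %s" % name)
```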

**NOTE: This program has only been tested with the _Morex_ (barley) genome; please use it with caution.**

## TODO

- Add support for extracting information from BED file
- Add parallelization support
- Add BED to GFF annotating capabilities
- Add GFF to BED annotating capabilities
- ~~Finish GFF to GFF annotation capabilities~~ DONE!
- Add BED to BED annotating capabilities
