https://github.com/benhg/isolate-single-gene
Isolate a single gene from genetic data.
- Host: GitHub
- URL: https://github.com/benhg/isolate-single-gene
- Owner: benhg
- Created: 2020-04-20T18:23:54.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-21T17:58:24.000Z (almost 5 years ago)
- Last Synced: 2024-11-01T22:42:58.756Z (3 months ago)
- Language: Python
- Size: 382 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
# GFF Feature Extractor
## Introduction and Motivation
It was surprisingly difficult to find a simple script for extracting the features specified in a GFFv3 file. This is meant to be a dead-simple CLI tool that extracts those features from a FASTA sequence database.
## Description
Given a FASTA-formatted database of sequences and a GFFv3 file that lists features and their locations within those sequences, this tool extracts each feature and writes it to a new FASTA file. Each output entry is titled with the ID of the input sequence, followed by a comma and the name of the extracted feature. Here's an example.
Input file:
```
>seq1
ACAGACTGCCGGTGATAAGCCGGAGGAAGGTGAGGATGACGTCAAGTCATCATGCCCCTTATGCCCTGGGCGACACACGTGCTACAATGGCCGGGACAAAGGGTCGCGATCCCGCGAGGGTGAGCTAACCCCAAAAACCCGTCCTCAGTTCGGATTGCAGGCTGCAACTCGCCTGCATGAAGCCGGAATCGCTAGTAATCGCCGGTCAGCCATACGGCGGTGAATTCGTTCCCGGGCCTTGTACACACCGCCCGTCACACTATGGGAGCTGGCCATGCCCGAAGTCGTTACCTTAACCGCAAGGAGGGGGATGCCGAAGGCAGGGCTAGTGACTGGAGTGAAGTCGTAACAAGGTAGCCGTACTGGAAGGTGCGGCTGGATCA
>seq2
ACAGACTGCCGGTGATAAGCCGGAGGAAGGTGAGGATGACGTCAAGTCATCATGCCCCTTATGCCCTGGGCGACACACGTGCTACAATGGCCGGGACAAAGGGTCGCGATCCCGCGAGGGTGAGCTAACCCCAAAAACCCGTCCTCAGTTCGGATTGCAGGCTGCAACTCGCCTGCATGAAGCCGGAATCGCTAGTAATCGCCGGTCAGCCATACGGCGGTGAATTCGTTCCCGGGCCTTGTACACACCGCCCGTCACACTATGGGAGCTGGCCATGCCCGAAGTCGTTACCTTAACCGCAAGGAGGGGGATGCCGAAGGCAGGGCTAGTGACTGGAGTGAAGTCGTAACAAGGTAGCCGTACTGGAAGGTGCGGCTGGATCA
```
Output file:
```
>seq1,18S_rRNA
GGGTCGCGATCCCGCGAGGGTGAGCTAACCC
>seq2,28S_rRNA
TAGTGACTGGAGTGAAGTCGTAACAAGGTAGCCGTA
```
## Usage
Simple, command-line based usage:
```
usage: gff_biopy.py [-h] --locations LOCATIONS --db DB --out OUT
                    [--filter FILTER]

required arguments:
  --locations LOCATIONS
                        Path to locations file in GFF3 format
  --db DB               Path to database file in FASTA format
  --out OUT             Path to output file in FASTA format (needs not exist)

optional arguments:
  -h, --help            show this help message and exit
  --filter FILTER       Name of type of feature to extract. Stored as a name
                        in the qualifiers section. Example: '18S_rRNA'
```
```
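For readers who want to see the idea behind the tool, here is a minimal, standard-library-only sketch of the extraction step. It is not the code in `gff_biopy.py`. The helper names (`read_fasta`, `parse_attributes`, `extract_features`) and the toy data are hypothetical, and it assumes the feature name is stored in the GFF3 `Name` attribute. It also ignores the strand column, so reverse-strand features are not reverse-complemented.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence ID to sequence string."""
    sequences, current_id, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            # Use the first whitespace-delimited token after '>' as the ID.
            current_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


def parse_attributes(column):
    """Turn a GFF3 attributes column ('key=value;key=value') into a dict."""
    pairs = (item.split("=", 1) for item in column.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def extract_features(fasta_handle, gff_handle, name_filter=None):
    """Yield (header, subsequence) for each feature, optionally filtered by name."""
    sequences = read_fasta(fasta_handle)
    for line in gff_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, start, end = columns[0], int(columns[3]), int(columns[4])
        # Assumption: the feature name is the Name attribute; fall back to the type column.
        name = parse_attributes(columns[8]).get("Name", columns[2])
        if name_filter is not None and name != name_filter:
            continue
        if seqid not in sequences:
            continue
        # GFF3 coordinates are 1-based and inclusive; Python slices are 0-based, end-exclusive.
        yield f"{seqid},{name}", sequences[seqid][start - 1:end]


# Toy data (hypothetical), small enough to check by hand.
FASTA_TEXT = ">seq1\nACGTACGTAC\n>seq2\nTTTTGGGGCC\n"
GFF_TEXT = (
    "##gff-version 3\n"
    "seq1\texample\trRNA\t3\t6\t.\t+\t.\tName=18S_rRNA\n"
    "seq2\texample\trRNA\t5\t8\t.\t+\t.\tName=28S_rRNA\n"
)

records = list(extract_features(io.StringIO(FASTA_TEXT), io.StringIO(GFF_TEXT)))
for header, subsequence in records:
    print(f">{header}\n{subsequence}")
```

Passing `name_filter="28S_rRNA"` to `extract_features` keeps only the matching feature, which mirrors what the `--filter` option is described as doing.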