https://github.com/vihaankulkarni29/geneaaextractor
- Host: GitHub
- URL: https://github.com/vihaankulkarni29/geneaaextractor
- Owner: vihaankulkarni29
- Created: 2025-06-06T17:09:34.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-06-08T17:11:49.000Z (5 months ago)
- Last Synced: 2025-08-04T08:27:04.116Z (3 months ago)
- Topics: amino, amino-acid-sequence, amino-acids, antimicrobial-resistance, bioinformatics-tool, biopython, extraction-data, google-colab, jupyter-notebook, python-3
- Language: Jupyter Notebook
- Homepage:
- Size: 17.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🧬 GeneAAExtractor
GeneAAExtractor is a lightweight Google Colab tool for extracting amino acid sequences of specific genes directly from GFF3 and FASTA files. It's designed for microbiologists, bioinformaticians, and AMR researchers working with genome annotations and isolate analysis.
## Features
- Extracts amino acid sequences for only the user-specified genes
- Works with `.gff3`, `.fasta`, and `.txt` gene list inputs
- Supports strand orientation and reverse complement logic
- Exports individual `.faa` files for each gene in the format: `GeneName IsolateName.faa`
- Automatically zips all extracted protein files for download
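The strand handling listed above can be illustrated with a minimal, standard-library-only sketch. It is not the notebook's actual code: the function names `reverse_complement` and `extract_protein` are hypothetical, and it assumes 1-based inclusive GFF3 coordinates and the standard codon table. Biopython's `Seq.reverse_complement()` and `Seq.translate()` cover the same steps, and bacterial genomes are usually translated with table 11, whose alternative start codons this sketch does not handle.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def extract_protein(genome, start, end, strand):
    """Translate the feature at start..end (1-based, inclusive, as in GFF3).

    Hypothetical helper: minus-strand features are reverse-complemented
    before translation, and translation stops at the first stop codon.
    """
    cds = genome[start - 1:end].upper()
    if strand == "-":
        cds = reverse_complement(cds)
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


forward = "CCATGAAATAGGG"
print(extract_protein(forward, 3, 11, "+"))
print(extract_protein(reverse_complement(forward), 3, 11, "-"))
```

Both calls translate the same gene, once from the plus strand and once from the minus strand of the reverse-complemented sequence, so they should print the same peptide.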
## Input Files
1. **GFF3 file**: Genome annotations
2. **FASTA file**: Genomic sequence
3. **TXT file**: List of gene names to extract (one per line; case-sensitive matching is optional)
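As a rough illustration of how the gene list and the GFF3 annotations fit together, the sketch below reads gene names one per line and collects matching `CDS` features. The helper names `load_gene_list` and `find_genes` are hypothetical, and the sketch assumes the gene name is stored in a `gene` (or `Name`) attribute in column 9, as annotation tools such as Prokka typically write it; the notebook may match genes differently.

```python
from urllib.parse import unquote


def load_gene_list(text):
    """Return the non-empty lines of the gene list, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_genes(gff3_text, wanted, case_sensitive=True):
    """Collect CDS features whose gene name is in `wanted` (hypothetical helper)."""
    norm = (lambda s: s) if case_sensitive else str.lower
    targets = {norm(gene) for gene in wanted}
    hits = []
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds semicolon-separated tag=value pairs.
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        name = unquote(attrs.get("gene") or attrs.get("Name") or "")
        if norm(name) in targets:
            hits.append({
                "gene": name,
                "seqid": cols[0],
                "start": int(cols[3]),  # 1-based, inclusive
                "end": int(cols[4]),
                "strand": cols[6],
            })
    return hits


example_gff = (
    "##gff-version 3\n"
    "contig1\tprokka\tCDS\t3\t11\t.\t+\t0\tID=x1;gene=acrA\n"
    "contig1\tprokka\tCDS\t20\t40\t.\t-\t0\tID=x2;gene=blaTEM\n"
)
print(find_genes(example_gff, load_gene_list("acra\n"), case_sensitive=False))
```

Passing `case_sensitive=False` lowercases both sides before comparing, which is one plausible reading of the optional case-sensitive matching described above.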
## 🧪 Output
A `.zip` archive containing one `.faa` file per gene:
- `acrA SSS08.faa`
- `blaTEM SSS08.faa`
- `dfrA12 SSS08.faa`
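One way such an archive could be assembled is sketched below using only the standard library. The function name `build_zip`, the FASTA header layout, and the 60-character line width are assumptions made for illustration; only the `GeneName IsolateName.faa` file naming comes from this README.

```python
import io
import zipfile


def build_zip(proteins, isolate):
    """Return the bytes of a zip with one .faa file per gene (hypothetical helper).

    `proteins` maps gene name to amino acid sequence.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for gene, sequence in proteins.items():
            # Wrap the sequence at 60 residues per line, a common FASTA convention.
            lines = [sequence[i:i + 60] for i in range(0, len(sequence), 60)]
            record = ">{} {}\n{}\n".format(gene, isolate, "\n".join(lines))
            archive.writestr("{} {}.faa".format(gene, isolate), record)
    return buffer.getvalue()


data = build_zip({"acrA": "MK", "blaTEM": "MSIQHFRVALIPFFAAFCLPVFA"}, "SSS08")
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())
```

Building the archive in memory keeps the sketch independent of the file system; in Colab the resulting bytes could be written to disk and offered for download.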
## How to Use
1. Open the tool in **Google Colab**
2. Upload your `.gff3`, `.fasta`, and `.txt` files
3. Enter your isolate name when prompted
4. The tool processes your genome and downloads the protein `.zip`
## 👩‍🔬 Example Use Case
```text
Input:
- ecoli_annotations.gff3
- ecoli_genome.fasta
- gene_list.txt (contains: acrA, acrB, blaTEM)
Output:
- acrA SSS08.faa
- acrB SSS08.faa
- blaTEM SSS08.faa
```
## 📦 Dependencies
- Python 3.7+
- Biopython
## License
MIT License: free to use and adapt for academic or research purposes.
## ✨ Acknowledgements
Developed with love for wet-lab researchers looking to automate their isolate curation workflows.