https://github.com/engkinandatama/ncbi-sequence-fetcher
NCBI Sequence Fetcher is a Python desktop app for downloading nucleotide sequences and extracting metadata from NCBI. It features an easy-to-use GUI, supports FASTA and GenBank formats, and helps researchers, students, and bioinformaticians efficiently collect DNA sequences and store metadata in Excel files.
- Host: GitHub
- URL: https://github.com/engkinandatama/ncbi-sequence-fetcher
- Owner: engkinandatama
- License: mit
- Created: 2025-05-12T20:21:40.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-05-12T21:03:42.000Z (about 1 month ago)
- Last Synced: 2025-05-12T21:42:36.818Z (about 1 month ago)
- Topics: academic, bioinformatics, bioinformatics-tool, data-scraping, fasta, genbank, metadata, metadata-extraction, molecular-biology, ncbi, nucleotide-sequences, python, tkinter
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🧬 NCBI Sequence Fetcher
A lightweight and user-friendly desktop application for downloading nucleotide sequences and extracting biological metadata directly from NCBI.
Built with Python and Tkinter, designed for researchers, students, and bioinformaticians.

---
### ⚠️ **Note**:
> **This repository is for personal and educational use only.**
> It is not currently open for external collaboration or contribution.
> Please use responsibly and cite NCBI appropriately when using downloaded data.

---
## 📸 GUI Preview
Coming soon. Stay tuned!
---
## ✨ Features
- **Direct URL Input**: Download GenBank/FASTA files from any valid NCBI nuccore link.
- **FASTA / GenBank Format Support**: Choose your preferred sequence format.
- 🧬 **Automatic Metadata Extraction**:
  - Accession, Organism, Strain, Taxonomy, Country, Collection Date, Length, etc.
- **Excel Export**:
  - All metadata saved in a clean Excel file (`ncbi_metadata.xlsx`)
- 🏷️ **Smart File Naming**:
  - Files saved with informative names: `Organism_Strain_Accession_Length_Feature.fasta`
- 🖥️ **GUI Based**:
  - No command line needed; simple Tkinter-based interface.

---
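The smart file naming scheme in the feature list can be sketched as a small helper. This is a minimal illustration, not the app's actual code: the function name `build_filename` and the character-cleaning rules (spaces become underscores, anything other than letters, digits, `.`, `_` and `-` is dropped) are assumptions.

```python
import re


def build_filename(organism: str, strain: str, accession: str,
                   length_bp: int, feature: str, ext: str = ".fasta") -> str:
    """Join metadata into an Organism_Strain_Accession_Length_Feature name.

    Hypothetical helper: the app's real naming logic may differ.
    """
    parts = [organism, strain, accession, f"{length_bp}bp", feature]
    cleaned = []
    for part in parts:
        part = (part or "").strip()
        if not part:
            continue  # skip empty fields, e.g. a record with no strain
        # Spaces become underscores; characters unsafe in file names are dropped.
        part = re.sub(r"\s+", "_", part)
        part = re.sub(r"[^A-Za-z0-9._-]", "", part)
        cleaned.append(part)
    return "_".join(cleaned) + ext
```

For example, `build_filename("Escherichia coli", "K12", "JN188370.1", 4500, "partial cds")` returns `Escherichia_coli_K12_JN188370.1_4500bp_partial_cds.fasta`, the same pattern used under Output Example. Skipping empty fields avoids names with doubled underscores when a record lacks a strain.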
## 🛠️ Technologies Used
- **Language**: Python 3.8+
- **Libraries**:
  - `requests`
  - `pandas`
  - `openpyxl`
  - `tkinter` (standard library; some Linux distributions ship it as a separate package, e.g. `python3-tk`)

---
## Installation & Usage
### Step 1: Clone the repository
```bash
git clone https://github.com/engkinandatama/ncbi-sequence-fetcher.git
cd ncbi-sequence-fetcher
```
### Step 2: Install dependencies
```bash
pip install -r requirements.txt
```
### ▶️ Step 3: Run the app
```bash
python ncbi_scraper.py
```

---
## 🧪 How It Works
1. **Paste a valid NCBI URL**, for example `https://www.ncbi.nlm.nih.gov/nuccore/JN188370.1`
2. **Choose the format**: `fasta` or `genbank`
3. **Select a destination folder** where the sequence file and Excel metadata will be saved
4. **Click `Download`**
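One way to turn the pasted link into a download is NCBI's E-utilities `efetch` endpoint. The README does not say which endpoint the app actually calls, so treat this as a sketch under that assumption: `accession_from_url`, `efetch_url`, and the `FORMATS` table are hypothetical names, while the `.fasta` and `.gb` extensions follow the "Behind the Scenes" list.

```python
import re
from urllib.parse import urlencode, urlparse

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Assumed mapping: GUI format choice -> (efetch rettype, file extension).
FORMATS = {
    "fasta": ("fasta", ".fasta"),
    "genbank": ("gb", ".gb"),
}


def accession_from_url(url: str) -> str:
    """Return the accession from a link like .../nuccore/JN188370.1."""
    # urlparse keeps query strings such as ?report=fasta out of the path.
    path = urlparse(url).path.rstrip("/")
    match = re.search(r"/nuccore/([A-Za-z0-9_.]+)$", path)
    if not match:
        raise ValueError(f"Not an NCBI nuccore URL: {url}")
    return match.group(1)


def efetch_url(url: str, fmt: str) -> str:
    """Build an efetch request URL for the record behind a nuccore link."""
    rettype, _ext = FORMATS[fmt]
    params = {
        "db": "nuccore",
        "id": accession_from_url(url),
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"
```

The returned URL could then be passed to `requests.get(...)` and the response text written to disk. NCBI's E-utilities guidelines ask clients to send no more than three requests per second without an API key, which matters when fetching many records.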
---
### Behind the Scenes:
The app will:
- **Fetch** the nucleotide data directly from NCBI
- **Save** the sequence locally as `.fasta` if FASTA format is selected, or `.gb` if GenBank format is selected
- 🧬 **Parse and extract metadata**: Accession, Organism, Strain, Country, Date, and more
- **Append** the metadata to an Excel file: `ncbi_metadata.xlsx`

---
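The metadata step can be approximated by scanning a GenBank flat-file record with regular expressions. This is a simplified sketch, not the app's parser: `parse_genbank_metadata` is a hypothetical name, it assumes one record per string, and it reads only the first occurrence of each qualifier. A dedicated parser such as Biopython's `SeqIO` is more robust for real data.

```python
import re


def parse_genbank_metadata(text: str) -> dict:
    """Pull a few metadata fields out of a single GenBank flat-file record.

    Hypothetical helper: a regex sketch, not a full GenBank parser.
    """
    meta = {}

    # LOCUS line: "LOCUS  <name>  <length> bp  <molecule> ..."
    locus = re.search(r"^LOCUS\s+\S+\s+(\d+)\s+bp", text, re.M)
    if locus:
        meta["Length"] = int(locus.group(1))

    # VERSION holds the versioned accession, e.g. "JN188370.1".
    version = re.search(r"^VERSION\s+(\S+)", text, re.M)
    if version:
        meta["Accession"] = version.group(1)

    # ORGANISM line, then indented taxonomy lines; the next keyword
    # (REFERENCE, FEATURES, ...) starts in column 0 and ends the lineage.
    org = re.search(r"^ +ORGANISM +(.+)\n((?: +\S.*\n)*)", text, re.M)
    if org:
        meta["Organism"] = org.group(1).strip()
        lineage = " ".join(org.group(2).split()).rstrip(".")
        if lineage:
            meta["Taxonomy"] = lineage

    # Source-feature qualifiers such as /strain="K-12". Values may wrap
    # across lines, so internal whitespace is collapsed. Some records use
    # /geo_loc_name where others use /country.
    qualifiers = {
        "Strain": ("strain",),
        "Country": ("geo_loc_name", "country"),
        "Collection Date": ("collection_date",),
    }
    for column, names in qualifiers.items():
        for name in names:
            match = re.search(rf'/{name}="([^"]*)"', text)
            if match:
                meta[column] = " ".join(match.group(1).split())
                break
    return meta
```

Each returned dict maps onto one row of `ncbi_metadata.xlsx`; with pandas, for example, `pd.DataFrame([meta])` could be concatenated onto the existing sheet before writing it back with `to_excel`.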
## Output Example
Once the process finishes, the output files are saved as follows:
```
Output_Folder/
├── Escherichia_coli_K12_JN188370.1_4500bp_partial_cds.fasta
└── ncbi_metadata.xlsx
```
- **FASTA / GenBank File**: Contains the nucleotide sequence downloaded from NCBI.
- **ncbi_metadata.xlsx**: An Excel file containing structured metadata for each downloaded GenBank entry.

---
## License
This project is licensed under the **MIT License**.
> **Disclaimer**:
> This tool is developed solely for **academic and personal research purposes**.
> Commercial use, bulk data scraping, or redistribution of NCBI content may **violate NCBI's usage policies** and is **strongly discouraged**.
>
> The developer **does not take any responsibility** for misuse, legal issues, or policy violations resulting from the use of this tool.
> **Users are fully responsible** for ensuring their usage complies with relevant terms, laws, and guidelines.
>
> See [NCBI's policies](https://www.ncbi.nlm.nih.gov/home/about/policies/) for more information.