https://github.com/chonglc/seqextractor
a python tool to extract multiple fasta sequence records from multiFASTA file based on a list of record ids
- Host: GitHub
- URL: https://github.com/chonglc/seqextractor
- Owner: ChongLC
- License: mit
- Created: 2022-12-25T18:50:52.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-10T13:22:42.000Z (10 months ago)
- Last Synced: 2025-03-10T14:28:37.421Z (10 months ago)
- Language: Python
- Size: 70.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# seqExtractor
A Python tool that extracts multiple FASTA sequence records from a multi-FASTA file, based on either a list of record IDs or a substring in the sequence header.
Written by: [Li Chuin Chong](https://github.com/ChongLC) and [Yeo Keat Ee](https://github.com/ee2110)
---
## Installation
1. Download only the Python script
```
wget https://raw.githubusercontent.com/ChongLC/seqExtractor/master/seqExtractor.py
```
2. Download the entire repo
`seqExtractor` requires the following packages:
- Biopython
- argparse
Install them with the following command:
```
pip install -r requirements.txt
```
---
## Usage
```
seqExtractor.py [-h] -i INPUT (-l ID_LIST | -s SUBSTRING) -o OUTPUT [-t THREADS] [-c] [-e]

Extract fasta sequence records from multiFASTA file based on a list of record ids or a substring in the sequence header

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Filename include extension of original FASTA file
  -l ID_LIST, --id_list ID_LIST
                        Filename include extension of the sequence ID list
  -s SUBSTRING, --substring SUBSTRING
                        Substring to search for in the sequence header
  -o OUTPUT, --output OUTPUT
                        Filename include extension of output FASTA file
  -t THREADS, --threads THREADS
                        Number of threads to use (default: 1)
  -c, --case_insensitive
                        Make the substring search case insensitive (default: False)
  -e, --exclude         Make the output FASTA file only contains excluded result (default: False)
```
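The interface described above can be sketched with `argparse`. This is only an illustration reconstructed from the help text, not the tool's actual source: the function name `build_parser` is hypothetical, and treating `-t` as an integer is an assumption. It mainly shows how the `(-l ID_LIST | -s SUBSTRING)` part of the usage line maps to a required mutually exclusive group, and includes the `-e` flag listed among the options.

```python
import argparse


def build_parser():
    """Rebuild the documented command-line interface (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="seqExtractor.py",
        description=(
            "Extract fasta sequence records from multiFASTA file based on "
            "a list of record ids or a substring in the sequence header"
        ),
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Filename include extension of original FASTA file")

    # Exactly one of -l / -s must be given: "(-l ID_LIST | -s SUBSTRING)".
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-l", "--id_list",
                      help="Filename include extension of the sequence ID list")
    mode.add_argument("-s", "--substring",
                      help="Substring to search for in the sequence header")

    parser.add_argument("-o", "--output", required=True,
                        help="Filename include extension of output FASTA file")
    # Assumption: the thread count is parsed as an integer.
    parser.add_argument("-t", "--threads", type=int, default=1,
                        help="Number of threads to use (default: 1)")
    parser.add_argument("-c", "--case_insensitive", action="store_true",
                        help="Make the substring search case insensitive")
    parser.add_argument("-e", "--exclude", action="store_true",
                        help="Write only the records that did not match")
    return parser


args = build_parser().parse_args(
    ["-i", "input.fasta", "-s", "Belgium", "-o", "output.fasta", "-t", "4"]
)
print(args.substring, args.threads, args.case_insensitive)  # Belgium 4 False
```

With a parser built this way, passing both `-l` and `-s`, or neither, is rejected before any file is read.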
There are two ways to use seqExtractor:
- Extract sequences based on a list of sequence IDs:
```
python seqExtractor.py -i input.fasta -l id_list.txt -o output.fasta -t 4
```
- Extract sequences based on a substring in the sequence header:
  - Case-sensitive (the default):
```
python seqExtractor.py -i input.fasta -s Belgium -o output.fasta -t 4
```
  - Case-insensitive:
```
python seqExtractor.py -i input.fasta -s belgium -o output.fasta -t 4 --case_insensitive
```
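To make the two selection modes concrete, here is a minimal, dependency-free sketch of the filtering logic. It is not seqExtractor's implementation (which uses Biopython): `read_fasta` and `select_records` are hypothetical helpers, and it assumes that a record ID is the first whitespace-delimited token of the header and that `--exclude` simply inverts the match. Multi-threading is left out.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(records, ids=None, substring=None,
                   case_insensitive=False, exclude=False):
    """Filter records by an ID set or by a header substring (hypothetical helper)."""
    if (ids is None) == (substring is None):
        raise ValueError("provide exactly one of ids or substring")
    if substring is not None and case_insensitive:
        substring = substring.lower()
    for header, seq in records:
        if ids is not None:
            # Assumption: the record ID is the first whitespace-delimited token.
            parts = header.split()
            hit = bool(parts) and parts[0] in ids
        else:
            target = header.lower() if case_insensitive else header
            hit = substring in target
        # With exclude=True, keep only the records that did NOT match.
        if hit != exclude:
            yield header, seq


fasta = io.StringIO(
    ">seq1 Belgium 2020\nACGT\nTTGA\n"
    ">seq2 belgium 2021\nGGCC\n"
    ">seq3 France 2020\nAATT\n"
)
records = list(read_fasta(fasta))


def names(selected):
    return [header.split()[0] for header, _ in selected]


print(names(select_records(records, ids={"seq1", "seq3"})))      # ['seq1', 'seq3']
print(names(select_records(records, substring="Belgium")))       # ['seq1']
print(names(select_records(records, substring="belgium",
                           case_insensitive=True)))              # ['seq1', 'seq2']
print(names(select_records(records, substring="Belgium",
                           exclude=True)))                       # ['seq2', 'seq3']
```

Holding the IDs in a `set` keeps each lookup constant-time, which matters when the ID list is long.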
---
## Motivation and goal
## Inspiration
Inspired by [faSomeRecords from Santiago Sanchez-Ramirez](https://github.com/santiagosnchez/faSomeRecords), which is itself based on the original faSomeRecords, first created in C++ as part of kentUtils.
---
## Found a bug?
Or would you like a feature added, or want to share some feedback? Open a new issue or email us at lichuinchong@gmail.com.