https://github.com/lh3/seqtk

Toolkit for processing sequences in FASTA/Q formats

Host: GitHub
URL: https://github.com/lh3/seqtk
Owner: lh3
License: mit
Created: 2012-03-23T23:24:13.000Z (about 13 years ago)
Default Branch: master
Last Pushed: 2024-08-10T13:41:49.000Z (9 months ago)
Last Synced: 2025-04-03T13:15:02.773Z (about 1 month ago)
Topics: bioinformatics, sequence-analysis
Language: C
Homepage:
Size: 178 KB
Stars: 1,447
Watchers: 62
Forks: 310
Open Issues: 66
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE

Awesome Lists containing this project

Awesome-Bioinformatics - Seqtk - Toolkit for processing sequences in FASTA/Q formats. (Next Generation Sequencing / Sequence Processing)
top-life-sciences - **lh3/seqtk** - Stars: 1332 · Forks: 310 · Language: C · License: MIT License · Last activity: 2023-10-24 15:01:39 (Ranked by starred repositories)

README

Introduction
------------

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or
FASTQ format. It seamlessly parses both FASTA and FASTQ files, which can
optionally be compressed with gzip. To install `seqtk`:
```sh
git clone https://github.com/lh3/seqtk.git
cd seqtk; make
```
The only library dependency is zlib.

Seqtk Examples
--------------

* Convert FASTQ to FASTA:

  ```sh
  seqtk seq -a in.fq.gz > out.fa
  ```

* Convert Illumina 1.3+ FASTQ (Phred+64 qualities, hence `-Q64`) to FASTA and mask bases with quality lower than 20 as lowercase (first command) or as `N` (second command):

  ```sh
  seqtk seq -aQ64 -q20 in.fq > out.fa
  seqtk seq -aQ64 -q20 -n N in.fq > out.fa
  ```

* Fold long FASTA/Q lines to 60 bases per line (`-l60`) and remove FASTA/Q header comments (`-C`):

  ```sh
  seqtk seq -Cl60 in.fa > out.fa
  ```

* Convert multi-line FASTQ to 4-line FASTQ:

  ```sh
  seqtk seq -l0 in.fq > out.fq
  ```

* Reverse complement FASTA/Q:

  ```sh
  seqtk seq -r in.fq > out.fq
  ```

* Extract sequences with names in file `name.lst`, one sequence name per line:

  ```sh
  seqtk subseq in.fq name.lst > out.fq
  ```

* Extract the sequence regions listed in BED file `reg.bed`:

  ```sh
  seqtk subseq in.fa reg.bed > out.fa
  ```

* Mask the regions listed in `reg.bed` as lowercase:

  ```sh
  seqtk seq -M reg.bed in.fa > out.fa
  ```

* Subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing):

  ```sh
  seqtk sample -s100 read1.fq 10000 > sub1.fq
  seqtk sample -s100 read2.fq 10000 > sub2.fq
  ```

* Trim low-quality bases from both ends using the Phred algorithm:

  ```sh
  seqtk trimfq in.fq > out.fq
  ```

* Trim 5bp from the left end of each read and 10bp from the right end:

  ```sh
  seqtk trimfq -b 5 -e 10 in.fa > out.fa
  ```

* Find telomere (TTAGGG)n repeats:

  ```sh
  seqtk telo seq.fa > telo.bed 2> telo.count
  ```
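
The quality masking done by `seqtk seq -aQ64 -q20` can be sketched with POSIX awk: each quality character is decoded as its ASCII code minus the offset (64 for Illumina 1.3+, 33 for standard FASTQ), and bases scoring below the cutoff are lowercased or replaced with a mask character. The function name `mask_fastq_to_fasta` is hypothetical, and the sketch assumes plain 4-line FASTQ, whereas seqtk itself also parses multi-line and gzipped input.

```sh
# Hypothetical helper sketching `seqtk seq -a -Q OFFSET -q MINQ [-n CHAR]`
# on 4-line FASTQ. Without MASKCHAR, low-quality bases are lowercased.
# Usage: mask_fastq_to_fasta OFFSET MINQ [MASKCHAR] < in.fq > out.fa
mask_fastq_to_fasta() {
    awk -v off="$1" -v minq="$2" -v mask="${3:-}" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
        NR % 4 == 1 { name = substr($0, 2) }      # header without the leading @
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 0 {                             # quality line
            out = ""
            for (i = 1; i <= length(seq); i++) {
                b = substr(seq, i, 1)
                q = ord[substr($0, i, 1)] - off   # decode the Phred score
                if (q < minq) b = (mask == "") ? tolower(b) : mask
                out = out b
            }
            print ">" name
            print out
        }'
}
```

For example, `mask_fastq_to_fasta 64 20 N < in.fq > out.fa` mirrors the intent of the second masking command in the list.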
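
Reverse complementing a FASTQ record reverses and complements the bases but only reverses the quality string, so each quality value stays attached to its base. A minimal awk sketch, assuming 4-line FASTQ and handling only A/C/G/T/N (a full implementation would also complement IUPAC ambiguity codes such as R and Y); `revcomp_fastq` is a hypothetical name:

```sh
# Hypothetical helper sketching `seqtk seq -r` on 4-line FASTQ: bases are
# reversed and complemented, qualities are only reversed.
# Only A/C/G/T/N are complemented; other symbols pass through unchanged.
revcomp_fastq() {   # usage: revcomp_fastq < in.fq > out.fq
    awk '
        function rev(s,    r, i) {
            r = ""
            for (i = length(s); i > 0; i--) r = r substr(s, i, 1)
            return r
        }
        BEGIN {
            split("A C G T N a c g t n", from, " ")
            split("T G C A N t g c a n", to, " ")
            for (i = 1; i <= 10; i++) comp[from[i]] = to[i]
        }
        NR % 4 == 2 {                       # sequence line
            s = rev($0); out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : c)
            }
            print out; next
        }
        NR % 4 == 0 { print rev($0); next } # quality line
        { print }                           # header and + lines
    '
}
```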
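
BED intervals are 0-based and half-open, so a line `chr1<TAB>2<TAB>5` covers the third to fifth bases of `chr1`. The hypothetical `bed_extract` helper below shows that coordinate conversion for single-line FASTA whose first header word matches the BED name; the header format it prints is illustrative and not necessarily what `seqtk subseq` writes.

```sh
# Hypothetical helper illustrating BED coordinates (0-based, half-open).
# Assumes single-line FASTA records; printed headers are illustrative only.
bed_extract() {   # usage: bed_extract reg.bed in.fa
    awk -F'\t' '
        NR == FNR { reg[$1] = reg[$1] " " $2 ":" $3; next }  # regions per name
        /^>/ { split(substr($0, 2), h, " "); name = h[1]; next }
        name in reg {
            n = split(reg[name], r, " ")
            for (i = 1; i <= n; i++) {
                split(r[i], se, ":")
                print (">" name ":" (se[1] + 1) "-" se[2])
                print substr($0, se[1] + 1, se[2] - se[1])     # awk strings are 1-based
            }
        }
    ' "$1" "$2"
}
```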
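
Using the same seed keeps read pairs together because a seeded sampler makes the same sequence of pseudo-random draws, so two files with the same number of records in the same order receive identical keep/drop decisions. The hypothetical `sample_fastq` below illustrates this with a fixed keep-probability per record; `seqtk sample` also accepts a fraction, whereas a fixed count such as 10000 needs a different scheme (for example, reservoir sampling). Its awk random generator differs from seqtk's, so it will not select the same reads as seqtk.

```sh
# Hypothetical helper: keep each 4-line FASTQ record with probability FRAC.
# With the same SEED, the i-th draw is identical for read1.fq and read2.fq,
# so mates are kept or dropped together.
sample_fastq() {   # usage: sample_fastq SEED FRAC < in.fq
    awk -v seed="$1" -v frac="$2" '
        BEGIN { srand(seed) }
        NR % 4 == 1 { keep = (rand() < frac) }   # one draw per record
        keep                                     # print all 4 lines if kept
    '
}
```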
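
Quality trimming of the kind `trimfq` performs can be sketched with the modified Mott scoring that Phred-style trimmers commonly use: each base scores `T - p`, where `p = 10^(-Q/10)` is its error probability and `T` is an error-rate threshold, and the read is cut to the maximum-scoring contiguous segment. This is a hypothetical awk sketch assuming Phred+33 4-line FASTQ; seqtk's actual defaults and edge cases (for example, a minimum retained length) may differ.

```sh
# Hypothetical helper: Mott-style trimming of 4-line Phred+33 FASTQ.
# Each base scores THRESH - 10^(-Q/10); the kept part of the read is the
# maximum-scoring contiguous segment (Kadane's algorithm).
mott_trim() {   # usage: mott_trim THRESH < in.fq > out.fq
    awk -v thr="$1" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        NR % 4 == 1 { hdr = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            best = 0; run = 0; start = 1; bb = 1; be = 0
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - 33
                run += thr - exp(-q / 10 * log(10))        # thr - 10^(-q/10)
                if (run <= 0) { run = 0; start = i + 1 }   # restart after a bad stretch
                else if (run > best) { best = run; bb = start; be = i }
            }
            print hdr; print substr(seq, bb, be - bb + 1)
            print plus; print substr($0, bb, be - bb + 1)
        }'
}
```

For instance, `mott_trim 0.05 < in.fq > out.fq` uses 0.05 as an assumed threshold; a read with no positive-scoring segment is trimmed to an empty sequence.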