{"id":13703839,"url":"https://github.com/lh3/seqtk","last_synced_at":"2025-05-15T04:04:34.066Z","repository":{"id":2812277,"uuid":"3813593","full_name":"lh3/seqtk","owner":"lh3","description":"Toolkit for processing sequences in FASTA/Q formats","archived":false,"fork":false,"pushed_at":"2024-08-10T13:41:49.000Z","size":182,"stargazers_count":1456,"open_issues_count":68,"forks_count":310,"subscribers_count":62,"default_branch":"master","last_synced_at":"2025-05-13T05:25:01.278Z","etag":null,"topics":["bioinformatics","sequence-analysis"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lh3.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-03-23T23:24:13.000Z","updated_at":"2025-05-10T14:10:55.000Z","dependencies_parsed_at":"2024-01-03T06:47:09.445Z","dependency_job_id":"eef78ce3-8e00-412c-b43a-d667d92f5060","html_url":"https://github.com/lh3/seqtk","commit_stats":{"total_commits":118,"total_committers":9,"mean_commits":13.11111111111111,"dds":0.0847457627118644,"last_synced_commit":"c9458bad2c355d29c721926c0d2cadc95e01eddc"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fseqtk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fseqtk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fseqtk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fseqtk/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lh3","download_url":"https://codeload.github.com/lh3/seqtk/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270641,"owners_count":22042858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","sequence-analysis"],"created_at":"2024-08-02T21:01:00.642Z","updated_at":"2025-05-15T04:04:29.035Z","avatar_url":"https://github.com/lh3.png","language":"C","funding_links":[],"categories":["Next Generation Sequencing","Ranked by starred repositories"],"sub_categories":["Sequence Processing"],"readme":"Introduction\n------------\n\nSeqtk is a fast and lightweight tool for processing sequences in the FASTA or\nFASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be\noptionally compressed by gzip. 
To install `seqtk`,\n```sh\ngit clone https://github.com/lh3/seqtk.git;\ncd seqtk; make\n```\nThe only library dependency is zlib.\n\nSeqtk Examples\n--------------\n\n* Convert FASTQ to FASTA:\n\n        seqtk seq -a in.fq.gz \u003e out.fa\n\n* Convert ILLUMINA 1.3+ FASTQ to FASTA and mask bases with quality lower than 20 to lowercase (the 1st command line) or to `N` (the 2nd):\n\n        seqtk seq -aQ64 -q20 in.fq \u003e out.fa\n        seqtk seq -aQ64 -q20 -n N in.fq \u003e out.fa\n\n* Fold long FASTA/Q lines and remove FASTA/Q comments:\n\n        seqtk seq -Cl60 in.fa \u003e out.fa\n\n* Convert multi-line FASTQ to 4-line FASTQ:\n\n        seqtk seq -l0 in.fq \u003e out.fq\n\n* Reverse complement FASTA/Q:\n\n        seqtk seq -r in.fq \u003e out.fq\n\n* Extract sequences with names in file `name.lst`, one sequence name per line:\n\n        seqtk subseq in.fq name.lst \u003e out.fq\n\n* Extract sequences in regions contained in file `reg.bed`:\n\n        seqtk subseq in.fa reg.bed \u003e out.fa\n\n* Mask regions in `reg.bed` to lowercase:\n\n        seqtk seq -M reg.bed in.fa \u003e out.fa\n\n* Subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing):\n\n        seqtk sample -s100 read1.fq 10000 \u003e sub1.fq\n        seqtk sample -s100 read2.fq 10000 \u003e sub2.fq\n\n* Trim low-quality bases from both ends using the Phred algorithm:\n\n        seqtk trimfq in.fq \u003e out.fq\n\n* Trim 5 bp from the left end of each read and 10 bp from the right end:\n\n        seqtk trimfq -b 5 -e 10 in.fa \u003e out.fa\n\n* Find telomere (TTAGGG)n repeats:\n\n        seqtk telo seq.fa \u003e telo.bed 2\u003e telo.count\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fseqtk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flh3%2Fseqtk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fseqtk/lists"}