{"id":13752153,"url":"https://github.com/lh3/readfq","last_synced_at":"2025-05-07T13:04:56.062Z","repository":{"id":47922634,"uuid":"2304737","full_name":"lh3/readfq","owner":"lh3","description":"Fast multi-line FASTA/Q reader in several programming languages","archived":false,"fork":false,"pushed_at":"2021-06-06T07:27:15.000Z","size":18,"stargazers_count":176,"open_issues_count":6,"forks_count":58,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-07T13:04:47.409Z","etag":null,"topics":["bioinformatics","sequence-analysis"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lh3.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-08-31T23:10:08.000Z","updated_at":"2025-04-03T07:27:29.000Z","dependencies_parsed_at":"2022-08-12T14:20:17.810Z","dependency_job_id":null,"html_url":"https://github.com/lh3/readfq","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Freadfq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Freadfq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Freadfq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Freadfq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lh3","download_url":"https://codeload.github.com/lh3/readfq/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252883225,"owners_count":21819159,"icon_url":"ht
tps://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","sequence-analysis"],"created_at":"2024-08-03T09:01:00.370Z","updated_at":"2025-05-07T13:04:56.045Z","avatar_url":"https://github.com/lh3.png","language":"C","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"Readfq is a collection of routines for parsing the FASTA/FASTQ format. It\nseamlessly parses both FASTA and multi-line FASTQ with a simple interface.\n\nReadfq was first implemented in a single C header file and then ported to Lua,\nPerl and Python as a single function of fewer than 50 lines. For users of scripting\nlanguages, I encourage you to copy and paste the function instead of using readfq\nas a library. It is always good to avoid unnecessary library dependencies.\n\nReadfq also strives for efficiency. The C implementation is among the fastest\n(if not the fastest). The Python and Perl implementations are several to tens\nof times faster than the official Bio* implementations. If you can speed up\nreadfq further, please let me know. I am not good at optimizing programs in\nscripting languages. Thank you.\n\nAs to licensing, the C implementation is distributed under the MIT license.\nImplementations in other languages are released without a license. Just copy\nand paste. You do not need to acknowledge me. 
The following shows a brief\nexample for each programming language:\n\n\n  # Perl\n  my @aux = undef; # this is for keeping intermediate data\n  while (my ($name, $seq, $qual) = readfq(\\*STDIN, \\@aux)) { print \"$seq\\n\"; }\n\n\n  # Python: generator function\n  for name, seq, qual in readfq(sys.stdin): print(seq)\n\n\n  -- Lua: closure\n  for name, seq, qual in readfq(io.stdin) do print(seq) end\n\n  /* Go */\n  package main\n\n  import (\n    \"fmt\"\n    \"bufio\"\n    \"os\"\n    \"github.com/drio/drio.go/bio/fasta\"\n  )\n\n  func main() {\n    var fqr fasta.FqReader\n    fqr.Reader = bufio.NewReader(os.Stdin)\n    for r, done := fqr.Iter(); !done; r, done = fqr.Iter() {\n      fmt.Println(r.Seq)\n    }\n  }\n\n  /* C */\n  #include \u003czlib.h\u003e\n  #include \u003cstdio.h\u003e\n  #include \"kseq.h\"\n  KSEQ_INIT(gzFile, gzread)\n\n  int main() {\n    gzFile fp;\n    kseq_t *seq;\n    fp = gzdopen(fileno(stdin), \"r\");\n    seq = kseq_init(fp);\n    while (kseq_read(seq) \u003e= 0) puts(seq-\u003eseq.s);\n    kseq_destroy(seq);\n    gzclose(fp);\n    return 0;\n  }\n\n\nSome naive benchmarks. To convert a FASTQ containing 25 million 100bp reads to FASTA,\nFASTX-Toolkit (parsing 4-line FASTQ only) takes 325.0 CPU seconds and EMBOSS' seqret\n247.8 seconds. My seqtk, which uses the kseq.h library, finishes the task in 24.6\nseconds, at least 10X faster. For retrieving 25k sequences by name from the same FASTQ,\nBioPython takes 963 seconds, while readfq.py takes 136 seconds; BioPerl takes more\nthan 40 minutes (killed), while readfq.pl takes 273 seconds. Seqtk takes 29 seconds.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Freadfq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flh3%2Freadfq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Freadfq/lists"}