{"id":13752098,"url":"https://github.com/lh3/biofast","last_synced_at":"2025-05-07T08:12:24.714Z","repository":{"id":46962976,"uuid":"261206752","full_name":"lh3/biofast","owner":"lh3","description":"Benchmarking programming languages/implementations for common tasks in Bioinformatics","archived":false,"fork":false,"pushed_at":"2021-12-09T14:10:44.000Z","size":127,"stargazers_count":185,"open_issues_count":6,"forks_count":26,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-05-07T08:12:19.221Z","etag":null,"topics":["bioinformatics"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lh3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-04T14:34:33.000Z","updated_at":"2025-04-14T16:30:21.000Z","dependencies_parsed_at":"2022-08-12T13:11:20.380Z","dependency_job_id":null,"html_url":"https://github.com/lh3/biofast","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fbiofast","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fbiofast/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fbiofast/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fbiofast/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lh3","download_url":"https://codeload.github.com/lh3/biofast/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252839295,"owners_count":218120
90,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics"],"created_at":"2024-08-03T09:00:59.461Z","updated_at":"2025-05-07T08:12:24.699Z","avatar_url":"https://github.com/lh3.png","language":"C","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"## Introduction\n\nBiofast is a small benchmark for evaluating the performance of programming\nlanguages and implementations on a few common tasks in the field of\nBioinformatics. It currently includes two benchmarks: [interval query](#bedcov)\nand [FASTQ parsing](#fqcnt). Please see also the companion [blog post][blog].\n\n## Results\n\n### Setup\n\nWe ran the tests on a CentOS 7 server with two EPYC 7301 CPUs and 1TB of memory.\nThe system comes with gcc-4.8.5, python-3.7.6, nim-1.2.0, julia-1.4.1, go-1.14.3,\nluajit-322db02 and k8-0.2.5. Relatively small libraries are included in the\n[lib directory](lib).\n\nWe tried to avoid other active processes when test programs were running.\nTiming on this page was obtained with [hyperfine][hyperfine], which reports\nCPU time averaged over at least ten rounds. Peak memory was often measured only\nonce as hyperfine doesn't report memory usage.\n\nFull results can be found in the [bedcov](bedcov) and [fqcnt](fqcnt)\ndirectories, respectively. This README only shows one implementation per\nlanguage. 
We exclude implementations that bind to C libraries and try to select the one\nimplementing an algorithm similar to the C version.\n\n### \u003ca name=\"bedcov\"\u003e\u003c/a\u003eComputing the depth and breadth of coverage from BED files\n\nIn this benchmark, we load one BED file into memory. We stream another BED file\nand compute coverage of each interval using the [cgranges algorithm][cgr] (see\nthe [C++ header][cppiitree] for algorithm details). The\noutput of all programs should be identical to that of \"[bedtools coverage][bedcov]\". In the\ntable below, \"t\" stands for CPU time in seconds and \"M\" for peak memory in\nmegabytes. Subscripts \"g2r\" and \"r2g\" correspond to the following two command\nlines, respectively:\n```sh\nbedcov ex-rna.bed ex-anno.bed  # g2r\nbedcov ex-anno.bed ex-rna.bed  # r2g\n```\nBoth input BED files can be found in `biofast-data-v1.tar.gz` from the\n[download page][dl].\n\n|Program | Language | t\u003csub\u003eg2r\u003c/sub\u003e (s) | M\u003csub\u003eg2r\u003c/sub\u003e (MB) | t\u003csub\u003er2g\u003c/sub\u003e (s) | M\u003csub\u003er2g\u003c/sub\u003e (MB) |\n|:-------|:---------|--------------------:|---------------------:|--------------------:|---------------------:|\n|[bedcov\\_c1\\_cgr.c](bedcov/bedcov_c1_cgr.c)          |C         |  5.2|  138.4 | 10.7|  19.1 |\n|[bedcov\\_cr1\\_klib.cr](bedcov/bedcov_cr1_klib.cr)    |Crystal   |  8.8|  319.6 | 14.8|  40.7 |\n|[bedcov\\_nim1\\_klib.nim](bedcov/bedcov_nim1_klib.nim)|Nim       | 16.6|  248.4 | 26.0|  34.1 |\n|[bedcov\\_jl1\\_klib.jl](bedcov/bedcov_jl1_klib.jl)    |Julia     | 25.9|  428.1 | 63.0| 257.0 |\n|[bedcov\\_go1.go](bedcov/bedcov_go1.go)               |Go        | 34.0|  318.9 | 21.8|  47.3 |\n|[bedcov\\_js1\\_cgr.js](bedcov/bedcov_js1_cgr.js)      |Javascript| 76.4| 2219.9 | 80.0| 316.8 |\n|[bedcov\\_lua1\\_cgr.lua](bedcov/bedcov_lua1_cgr.lua)  |LuaJIT    |174.7| 2668.0 |218.9| 364.6 |\n|[bedcov\\_py1\\_cgr.py](bedcov/bedcov_py1_cgr.py)      |PyPy    |17332.9| 1594.3 |5481.2|256.8 
|\n|[bedcov\\_py1\\_cgr.py](bedcov/bedcov_py1_cgr.py)      |Python |\u003e33770.4| 2317.6|\u003e20722.0|313.7|\n\n* For the full table and technical notes, see the [bedcov directory](bedcov).\n\n### \u003ca name=\"fqcnt\"\u003e\u003c/a\u003eFASTQ parsing\n\nIn this benchmark, we parse a 4-line FASTQ file consisting of 5,682,010\nrecords and report the number of records and the total length of sequences and\nquality. The input file is `M_abscessus_HiSeq.fq` in\n`biofast-data-v1.tar.gz` from the [download page][dl]. In the table below,\n\"t\u003csub\u003egzip\u003c/sub\u003e\" gives the CPU time in seconds for gzip'd input and\n\"t\u003csub\u003eplain\u003c/sub\u003e\" gives the time for raw input without compression.\n\n|Program | Language | t\u003csub\u003egzip\u003c/sub\u003e (s) | t\u003csub\u003eplain\u003c/sub\u003e (s) | Comments |\n|:-------|:---------|---------------------:|----------------------:|:---------|\n|[fqcnt\\_rs2\\_needletail.rs](fqcnt/fqcnt_rs2_needletail.rs)|Rust|  9.3|  0.8|[needletail][nt]; fasta/4-line fastq|\n|[fqcnt\\_c1\\_kseq.c](fqcnt/fqcnt_c1_kseq.c)          |C         |  9.7|  1.4|multi-line fasta/fastq|\n|[fqcnt\\_cr1\\_klib.cr](fqcnt/fqcnt_cr1_klib.cr)      |Crystal   |  9.7|  1.5|kseq.h port|\n|[fqcnt\\_nim1\\_klib.nim](fqcnt/fqcnt_nim1_klib.nim)  |Nim       | 10.5|  2.3|kseq.h port|\n|[fqcnt\\_jl1\\_klib.jl](fqcnt/fqcnt_jl1_klib.jl)      |Julia     | 11.2|  2.9|kseq.h port|\n|[fqcnt\\_js1\\_k8.js](fqcnt/fqcnt_js1_k8.js)          |Javascript| 17.5|  9.4|kseq.h port|\n|[fqcnt\\_go1.go](fqcnt/fqcnt_go1.go)                 |Go        | 19.1|  2.8|4-line only|\n|[fqcnt\\_lua1\\_klib.lua](fqcnt/fqcnt_lua1_klib.lua)  |LuaJIT    | 28.6| 27.2|partial kseq.h port|\n|[fqcnt\\_py2\\_rfq.py](fqcnt/fqcnt_py2_rfq.py)        |PyPy      | 28.9| 14.6|partial kseq.h port|\n|[fqcnt\\_py2\\_rfq.py](fqcnt/fqcnt_py2_rfq.py)        |Python    | 42.7| 19.1|partial kseq.h port|\n\n* For the full table and technical notes, see the [fqcnt 
directory](fqcnt).\n\n[dl]: https://github.com/lh3/biofast/releases/tag/biofast-data-v1\n[bp]: https://biopython.org/\n[fx.jl]: https://github.com/BioJulia/FASTX.jl\n[mappy]: https://github.com/lh3/minimap2/tree/master/python\n[pyfx]: https://github.com/lmdu/pyfastx\n[cgr]: https://github.com/lh3/cgranges\n[bedcov]: https://bedtools.readthedocs.io/en/latest/content/tools/coverage.html\n[blog]: http://lh3.github.io/2020/05/17/fast-high-level-programming-languages\n[cppiitree]: https://github.com/lh3/cgranges/blob/master/cpp/IITree.h\n[hyperfine]: https://github.com/sharkdp/hyperfine\n[nt]: https://github.com/onecodex/needletail\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fbiofast","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flh3%2Fbiofast","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fbiofast/lists"}