{"id":15645982,"url":"https://github.com/samuell/gccontent-benchmark","last_synced_at":"2025-04-27T06:51:20.132Z","repository":{"id":52850385,"uuid":"97119234","full_name":"samuell/gccontent-benchmark","owner":"samuell","description":"Benchmarking different languages for a simple bioinformatics task (Counting the GC fraction of DNA in a FASTA file)","archived":false,"fork":false,"pushed_at":"2023-01-28T13:27:47.000Z","size":898,"stargazers_count":58,"open_issues_count":8,"forks_count":17,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-30T08:32:08.738Z","etag":null,"topics":["benchmarking","bioinformatics","programming-languages"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samuell.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-13T12:16:15.000Z","updated_at":"2025-03-09T11:53:59.000Z","dependencies_parsed_at":"2023-02-15T16:30:41.699Z","dependency_job_id":null,"html_url":"https://github.com/samuell/gccontent-benchmark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samuell%2Fgccontent-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samuell%2Fgccontent-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samuell%2Fgccontent-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samuell%2Fgccontent-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samuell","d
ownload_url":"https://codeload.github.com/samuell/gccontent-benchmark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251099734,"owners_count":21536153,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","bioinformatics","programming-languages"],"created_at":"2024-10-03T12:10:51.190Z","updated_at":"2025-04-27T06:51:20.101Z","avatar_url":"https://github.com/samuell.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Comparing string processing performance of programming languages\n\n... using a simple bioinformatics task: Computing the GC fraction of DNA. 
It is based on the [GC content problem at Rosalind](http://rosalind.info/problems/gc/).\n\n## Usage\n\n```bash\nmake\ncat report.md\n```\n\nIf you have [pandoc](http://pandoc.org/) installed, you can also create an HTML report:\n\n```bash\nmake html-report\n\u003cbrowser\u003e report.html\n```\n\n## More info\n\nThis is a continuation of a previous benchmarking project, covered in [this blog post](http://saml.rilspace.com/moar-languagez-gc-content-in-python-d-fpc-c-and-c).\n\nThe idea is to compare the string processing performance of different programming languages\nby implementing a very small and very simple task: Read a [specific file](http://ftp.ensembl.org/pub/release-67/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa.gz)\ncontaining DNA sequence in the [FASTA format](https://en.wikipedia.org/wiki/FASTA_format),\nand compute the GC content in this file.\n\nTwo requirements apply:\n\n1. The file must be read line by line (since DNA files are in reality often\n   bigger than RAM, and this also helps make the implementations remotely\n   comparable)\n2. For each line, the program has to check if it starts with a `\u003e` character,\n   in which case it is a header row and should be skipped.\n\nThe FASTA file can contain DNA letters (A,C,G,T) or unknowns (N), or new-lines\n(Unix style `\\n` ones).\n\nThis is it. 
Please have a look in the Makefile, and the various implementations\nin the code directories, or send a pull request with your own implementation\n(if the language already exists, increase the number one step, so for a new Go\nimplementation, you would create a `golang.001` folder, optionally with some\ntag appended to it, like: `golang.001.table-optimized`, etc).\n\n## Results\u003ca name=\"current-results\"\u003e\n\nThese are some results (Execution times in seconds, smaller is better) from\nrunning some of the tests in the Makefile, on a Dell Inspiron laptop with an\nIntel(R) Core(TM) i7-8650U CPU @ 1.90GHz, with Xubuntu 18.04 Bionic LTS 64bit\nas operating system.\n\n(Below the tables are some more details about BIOS settings etc).\n\n| Language and implementation                            | Execution time (s) | Compiler or interpreter version                                                                                       |\n|--------------------------------------------------------|--------------------|-----------------------------------------------------------------------------------------------------------------------|\n| [rust.003.vectorized](rust.003.vectorized/src/main.rs) | 0.442              | rustc 1.52.0-nightly (152f66092 2021-02-17)                                                                           |\n| [rust.004.simd](rust.004.simd/src/main.rs)             | 0.445              | rustc 1.52.0-nightly (152f66092 2021-02-17)                                                                           |\n| [rust.002.bitshift](rust.002.bitshift/src/main.rs)     | 0.695              | rustc 1.52.0-nightly (152f66092 2021-02-17)                                                                           |\n| [rust.001](rust.001/src/main.rs)                       | 0.891              | rustc 1.52.0-nightly (152f66092 2021-02-17)                                                                           |\n| [c.001](c.001/gc.c)                        
            | 0.970              | gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0                                                                               |\n| [cpp.001](cpp.001/gc.)                                 | 1.025              | g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0                                                                               |\n| [d](d/gc.d)                                            | 1.215              | LDC - the LLVM D compiler (1.22.0): based on DMD v2.092.1                                                             |\n| [c](c/gc.c)                                            | 1.226              | gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0                                                                               |\n| [go.001.unroll](go.001.unroll/gc.go)                   | 1.616              | go version go1.15 linux/amd64                                                                                         |\n| [nim.003.zerocopy](nim.003.zerocopy/gc.nim)            | 1.660              | Nim Compiler Version 1.2.6 [Linux: amd64]                                                                             |\n| [nim.002](nim.002/gc.nim)                              | 1.703              | Nim Compiler Version 1.2.6 [Linux: amd64]                                                                             |\n| [julia](julia/gc.jl)                                   | 1.926              | julia version 1.5.3                                                                                                   |\n| [go](go/gc.go)                                         | 1.937              | go version go1.15 linux/amd64                                                                                         |\n| [c.003.ril](c.003.ril/gc.c)                            | 1.955              | gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0                                                                               |\n| 
[nim.001](nim.001/nim.001/gc.nim)                      | 2.281              | Nim Compiler Version 1.2.6 [Linux: amd64]                                                                             |\n| [crystal.002.peek](crystal.002.peek/gc.cr)             | 2.369              | Crystal 0.36.1 [c3a3c1823] (2021-02-02)  LLVM: 10.0.0                                                                 |\n| [pypy](pypy/gc.py)                                     | 2.677              | Python 2.7.13 (5.10.0+dfsg-3build2, Feb 06 2018, 18:37:50) [PyPy 5.10.0 with GCC 7.3.0]                               |\n| [cpp](cpp/gc.cpp)                                      | 2.832              | g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0                                                                               |\n| [nim](nim/gc.nim)                                      | 2.976              | Nim Compiler Version 1.2.6 [Linux: amd64]                                                                             |\n| [rust](rust/src/main.rs)                               | 3.195              | rustc 1.52.0-nightly (152f66092 2021-02-17)                                                                           |\n| [crystal](crystal/gc.cr)                               | 4.054              | Crystal 0.36.1 [c3a3c1823] (2021-02-02)  LLVM: 10.0.0                                                                 |\n| [ada](ada/gc.adb)                                      | 4.235              | GNAT Community 2020 (20200818-93)                                                                                     |\n| [java](java/java/gc.java)                              | 4.279              | openjdk version \"11.0.10\" 2021-01-19 OpenJDK Runtime Environment GraalVM CE 21.0.0.2 (build 11.0.10+8-jvmci-21.0-b06) |\n| [crystal.001.csp](crystal.001.csp/gc.cr)               | 4.283              | Crystal 0.36.1 [c3a3c1823] (2021-02-02)  LLVM: 10.0.0                                                          
       |\n| [java](java/gc.java)                                   | 4.284              | openjdk version \"11.0.10\" 2021-01-19 OpenJDK Runtime Environment GraalVM CE 21.0.0.2 (build 11.0.10+8-jvmci-21.0-b06) |\n| [cython](cython/gc.pyx)                                | 6.016              | Cython version 0.26.1                                                                                                 |\n| [fpc](fpc/gc.pas)                                      | 6.578              | Free Pascal Compiler version 3.0.4+dfsg-18ubuntu2 [2018/08/29] for x86_64                                             |\n| [node](node/gc.js)                                     | 6.836              | v15.9.0                                                                                                               |\n| [perl](perl/gc.pl)                                     | 7.323              | This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-gnu-thread-multi                            |\n| [python](python/gc.py)                                 | 8.855              | Python 3.7.0                                                                                                          |\n| [graalvm](graalvm/gc.java)                             | 11.734             | GraalVM Version 21.0.0.2 (Java Version 11.0.10+8-jvmci-21.0-b06)                                                      |\n\n## Results with relaxed constraints on reading line-by-line\n\nThe contributed versions below depart slightly from reading line-by-line (by\nsome definition of that requirement, which is clearly very hard to define):\n\n| Language                         | Execution time (s) | Compiler versions                           |\n|----------------------------------|-------------------:|---------------------------------------------|\n| [rust.007.rawio](rust.007.rawio) |              0.221 | rustc 1.52.0-nightly (152f66092 2021-02-17) |\n| [rust.005.rawio](rust.005.rawio) |       
       0.318 | rustc 1.52.0-nightly (152f66092 2021-02-17) |\n| [C.002.rawio](c.002.rawio/gc.c)  |              0.524 | gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0     |\n| [rust.006.rawio](rust.006.rawio) |              0.539 | rustc 1.52.0-nightly (152f66092 2021-02-17) |\n\n## More details about settings used when benchmarking\n\nThe following CPU options were turned off in BIOS, to try to avoid fluctuating\nCPU clock frequencies:\n\n- Performance \u003e Intel SpeedStep\n- Performance \u003e C-States Control\n- Performance \u003e Intel TurboBoost\n- Power Management \u003e Intel Speed Shift Technology\n\nBenchmarking was done with other GUI apps, networking and bluetooth turned off.\n\n## Incomplete list of contributions before merge to GitHub\n\nFor contributors after establishing the GitHub repo, see [this page here on GitHub](https://github.com/samuell/gccontent-benchmark/graphs/contributors).\n\nBelow is additionally an incomplete list of people who contributed to the code\nexamples while the benchmark was only [hosted on my old blog](https://github.com/samuell/gccontent-benchmark/graphs/contributors):\n\n- Daniel Spångberg (working at UPPMAX HPC center at the time) contributed\n  numerous, extremely fast implementations in C, including the one above (c),\n  which is constrained by the requirement to process the file line by line.\n- [Roger Peppe](https://github.com/rogpeppe)\n  ([twitter](https://twitter.com/rogpeppe)) contributed the fastest Go\n  implementation, including pointers in combination with a table lookup.\n- [Mario Ray Mahardhika (aka leledumbo)](https://github.com/leledumbo)\n  contributed the fastest FreePascal implementation, which is the one above\n  (fpc.000).\n- [Harald Achitz](https://www.linkedin.com/in/harald-achitz-860657139/)\n  provided the C++ implementation used above (cpp.000).\n- (Who is missing 
here?)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamuell%2Fgccontent-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamuell%2Fgccontent-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamuell%2Fgccontent-benchmark/lists"}