**TL;DR**: see the figure below. Note that nqueen and matmul are implemented in
all languages, but sudoku and bedcov are only implemented in some.

<img align="left" width="100%" src="https://i.ibb.co/zGgbmVG/template.png?v32">

## Table of Contents

- [Introduction](#intro)
- [Results](#result)
  - [Overall impression](#overall)
  - [Caveats](#caveat)
    - [Startup time](#startup)
    - [Elapsed time vs CPU time](#cputime)
  - [Subtle optimizations](#opt)
    - [Controlling memory layout](#memlayout)
    - [Optimizing inner loops](#matmul)
- [Discussions](#conclusion)
- [Appendix: Timing on Apple M1 MacBook Pro](#table)

## <a name="intro"></a>Introduction

Programming Language Benchmark v2 (plb2) evaluates the performance of 25
programming languages on four CPU-intensive tasks. It is a follow-up to
[plb][plb], which was conducted in 2011. In plb2, all implementations use the same
algorithm for each task, and their performance bottlenecks do not lie in
library functions.
We do not intend to compare different algorithms or the
quality of the standard libraries in these languages. Plb2 aims to
evaluate the performance of a language when you have to implement a new
algorithm in it yourself, which may happen if you can't find the algorithm in
existing libraries.

The four tasks in plb2 each take a few seconds for a fast implementation to
complete. The tasks are:

* **nqueen**: solving a [15-queens problem][8queen]. The algorithm was inspired
  by the second C implementation [from Rosetta Code][8qrc]. It involves nested
  loops and integer bit operations.

* **matmul**: multiplying two 1500x1500 square matrices.

* **sudoku**: solving 4000 hard [Sudokus][sudoku] (20 puzzles repeated 200
  times) using the [kudoku algorithm][kudoku]. This algorithm makes heavy use of
  small fixed-size arrays with somewhat complex logic.

* **bedcov**: finding the overlaps between two arrays of 1,000,000 intervals
  with [implicit interval trees][iitree]. The algorithm involves frequent
  array access in a pattern similar to binary search.

Every language has nqueen and matmul implementations; some languages lack
sudoku or bedcov implementations. Most programs were initially implemented
by me and a few were contributed by others. As I am mostly a C programmer,
implementations in other languages may be suboptimal. **Pull requests are welcome!**

## <a name="result"></a>Results

The figure at the top of the page summarizes the elapsed time of each implementation
measured on an Apple M1 MacBook Pro. [Hyperfine][hyperfine] was used for timing,
except for a few slow implementations, which were timed once, without repetition,
using the bash `time` command. In the figure, a plus sign "+" indicates [ahead-of-time
compilation][aot] (AOT). Exact timings can be found in the [table below](#table).
The figure was [programmatically generated](analysis) from the table.

### <a name="overall"></a>Overall impression

Programming language implementations in plb2 can be classified into three groups
depending on how and when compilation is done:

1. Purely interpreted (QuickJS, Perl and [CPython][cpy], the official Python
   implementation). Not surprisingly, these are among the slowest language
   implementations in this benchmark.

2. JIT compiled (Dart, Bun/Node, Java, Julia, LuaJIT, PHP, PyPy and Ruby3 with
   [YJIT][yjit]). They are generally faster than pure interpretation.
   Nonetheless, there is a large variance in this group. While PHP and Ruby3
   are faster than Perl and CPython, they are still an order of magnitude
   slower than PyPy. The two JavaScript engines (Bun and Node) and Julia
   perform well: they are about twice as fast as PyPy.

3. AOT compiled (the rest). Because they optimize binaries for specific hardware,
   these compilers tend to generate the fastest executables.

### <a name="caveat"></a>Caveats

#### <a name="startup"></a>Startup time

Some JIT-based language runtimes take up to ~0.3 seconds to compile and warm up.
We do not separate out this startup time. Nonetheless, because most
benchmarks run for several seconds, including the startup time does not greatly
affect the results.

#### <a name="cputime"></a>Elapsed time vs CPU time

Although no implementation uses multithreading, language runtimes may be doing
extra work, such as garbage collection, in a separate thread. In this case, the
CPU time (user plus system) may be longer than the elapsed wall-clock time. Julia,
in particular, takes noticeably more CPU time than wall-clock time even for the
simplest benchmark, nqueen. In plb2, we measure the elapsed wall-clock
time because that is the number users usually see.
The ranking by CPU time may be
slightly different.

### <a name="opt"></a>Subtle optimizations

#### <a name="memlayout"></a>Controlling memory layout

When implementing bedcov in Julia, C and many other compiled languages, it is
preferable to keep an array of objects in a contiguous memory block so that
adjacent objects are close in memory. This improves cache efficiency. In most
scripting languages, unfortunately, we have to store references to objects in an
array, at the cost of cache locality. The issue can be alleviated by cloning
the objects into a new array. This doubles the speed of PyPy and Bun.

#### <a name="matmul"></a>Optimizing inner loops

The bottleneck of matrix multiplication lies in the following nested loop:
```cpp
for (int i = 0; i < n; ++i)
    for (int k = 0; k < n; ++k)
        for (int j = 0; j < n; ++j)
            c[i][j] += a[i][k] * b[k][j];
```
Clearly, `c[i]`, `b[k]` and `a[i][k]` can be moved out of the inner
loop to reduce the frequency of matrix access. The Clang compiler can apply
this optimization on its own; with Clang, doing it by hand may even hurt performance.

However, **many other languages cannot optimize this nested loop.** If we
manually move `a[i][k]` to the loop above it, we can often improve their
performance. Some C/C++ programmers say compilers often optimize better than
humans, but this might not be the case in other languages.

## <a name="conclusion"></a>Discussions

The best-known and longest-running language benchmark is the [Computer
Language Benchmarks Game][clbg]. Plb2 differs in that it includes different
languages (e.g. Nim and Crystal), different language runtimes (e.g. PyPy and
LuaJIT) and new tasks; it also comes with more uniform
implementations and focuses more on the performance of the language itself,
without library functions.
**Plb2 complements the Computer Language Benchmarks Game.**

One important area that plb2 does not evaluate is the performance of memory
allocation and/or garbage collection. This may contribute more to practical
performance than machine-code generation does. Nonetheless, it is challenging to
design a realistic micro-benchmark to evaluate memory allocation. If the
built-in allocator in a language implementation does not work well, we can
implement a custom memory allocator just for the specific task, but this, in
my view, would not represent typical use cases.

When plb was conducted in 2011, half of the languages in the figure above were
immature or did not even exist. It is exciting to see that many of them have
reached the 1.0 milestone and are gaining popularity among modern programmers.
On the other hand, Python remains one of the two most used scripting languages
despite its poor performance. In my view, this is because PyPy is not
officially endorsed, while other JIT-based languages are not general or good
enough. Will there be a language to displace Python in the next decade?
I am
not optimistic.

## <a name="table"></a>Appendix: Timing on Apple M1 MacBook Pro

In the following table, a star "\*" indicates AOT compilation and a plus sign "+"
indicates JIT compilation. All timings are elapsed wall-clock times in seconds.

|Label    |Language  |Runtime|Version| Plot | nqueen | matmul | sudoku | bedcov |
|:--------|:---------|:------|:------|:----:|-------:|-------:|-------:|-------:|
|c:clang* |C         |Clang  |15.0.0 | Y    | 2.57   | 0.54   | 1.56   | 0.84   |
|cl:sbcl* |Lisp      |SBCL   |2.4.0  | Y    | 3.19   | 3.84   | 3.40   |        |
|codon\*  |Codon     |       |0.18.0 | N    | 2.82   | 2.49   | 3.08   |        |
|crystal* |Crystal   |       |1.10.0 | Y    | 3.28   | 2.45   | 3.14   | 0.87   |
|c#:.net* |C#        |.NET   |8.0.100| Y    | 2.82   | 1.38   | 1.62   | 0.99   |
|d:ldc2*  |D         |LDC2   |1.35.0 | Y    | 2.68   | 0.57   | 1.60   | 0.98   |
|dart:jit+|Dart      |(JIT)  |3.2.4  | Y    | 3.62   | 2.74   | 3.24   | 2.85   |
|elixir+  |Elixir    |       |1.15.7 | Y    | 26.17  | 67.39  |        |        |
|f90:gcc* |Fortran   |GCC    |13.2.0 | Y    | 2.67   | 0.51   | 1.84   |        |
|go*      |Go        |       |1.21.5 | Y    | 2.94   | 1.14   | 2.04   | 0.94   |
|java+    |Java      |OpenJDK|20.0.1 | Y    | 3.92   | 1.14   | 3.20   | 3.04   |
|js:bun+  |JavaScript|Bun    |1.0.20 | Y    | 3.11   | 1.75   | 3.07   | 2.32   |
|js:deno+ |JavaScript|Deno   |1.39.1 | N    | 4.00   | 3.06   | 4.04   | 2.50   |
|js:k8+   |JavaScript|k8     |1.0    | N    | 3.79   | 2.99   | 3.76   | 2.60   |
|js:node+ |JavaScript|Node   |21.5.0 | Y    | 3.73   | 2.88   | 3.77   | 2.45   |
|js:node  |JavaScript|Node-nojit|21.5.0|N   | 55.48  | 162.84 | 63.91  | 20.81  |
|js:qjs   |JavaScript|QuickJS|23-12-09|Y    | 59.04  | 135.66 | 67.55  | 37.56  |
|julia+   |Julia     |       |1.10.0 | Y    | 3.02   | 0.76   | 2.18   | 1.96   |
|luajit+  |Lua       |LuaJIT |2.1    | Y    | 5.31   | 2.66   | 4.48   | 10.52  |
|mojo*    |Mojo      |       |0.6.1  | Y    | 3.24   | 1.12   |        |        |
|nim*     |Nim       |       |2.0.2  | Y    | 2.57   | 0.56   | 1.64   | 1.07   |
|ocaml*   |OCaml     |       |4.14.1 | Y    | 3.56   | 2.14   |        |        |
|perl     |Perl      |       |5.34.1 | Y    | 158.34 | 158.01 | 90.78  |        |
|php+     |PHP       |       |8.3    | Y    | 48.15  | 71.20  |        |        |
|py:cpy   |Python    |CPython|3.11.7 | Y    | 159.97 | 117.81 | 52.88  | 42.84  |
|py:graal+|Python    |Graal EE|23.1.1| N    | 4.38   | 16.22  | 59.52  | 12.32  |
|py:pypy+ |Python    |PyPy   |7.3.14 | Y    | 6.91   | 4.89   | 8.82   | 6.27   |
|rb:crb+  |Ruby      |CRuby+yjit|3.3.0| Y   | 87.53  | 64.95  | 17.47  | 37.07  |
|rb:graal+|Ruby      |Graal EE|23.1.1| Y    | 6.54   | 4.10   | 4.11   | 5.23   |
|rust*    |Rust      |       |1.75.0 | Y    | 2.49   | 0.56   | 1.65   | 0.94   |
|scm:ch+  |Scheme    |Chez   |9.5.8  | Y    | 3.54   | 18.98  |        |        |
|swift*   |Swift     |       |5.9.0  | Y    | 2.92   | 0.56   | 1.78   | 1.21   |
|v*       |V         |       |0.4.3  | Y    | 2.55   | 0.57   | 1.59   | 1.23   |
|zig*     |Zig       |       |0.11.0 | Y    | 2.72   | 0.56   |        |        |

[plb]: https://github.com/attractivechaos/plb
[8queen]: https://en.wikipedia.org/wiki/Eight_queens_puzzle
[8qrc]: https://rosettacode.org/wiki/N-queens_problem#C
[sudoku]: https://en.wikipedia.org/wiki/Sudoku
[kudoku]: https://attractivechaos.github.io/plb/kudoku.html
[iitree]: https://academic.oup.com/bioinformatics/article/37/9/1315/5910546
[hyperfine]: https://github.com/sharkdp/hyperfine
[cpy]: https://en.wikipedia.org/wiki/CPython
[pypy]: https://www.pypy.org
[bun]: https://bun.sh
[luablog]: https://attractivechaos.wordpress.com/2011/01/23/amazed-by-luajit/
[yjit]: https://github.com/ruby/ruby/blob/master/doc/yjit/yjit.md
[aot]: https://en.wikipedia.org/wiki/Ahead-of-time_compilation
[clbg]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html
[axpy]: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_1
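
## Appendix: Sketch of bitmask N-queens counting

The **nqueen** task is described in the Introduction as involving nested loops and integer bit operations. The minimal C sketch below shows the common bitmask formulation that such solvers rely on: each set bit marks a column or diagonal that is already attacked. It is written recursively for brevity and is *not* the plb2 nqueen code, which was adapted from a Rosetta Code C version. The names `nq_rec` and `nqueen_count` and the board-size limit of 16 are assumptions of this sketch.

```c
#include <stdint.h>

/* Sketch only, not the plb2 nqueen code.
 * full: n low bits set; cols: occupied columns; ld/rd: squares attacked in
 * the current row along the two diagonal directions. Returns the number of
 * ways to complete the board from this state. */
static long long nq_rec(uint32_t full, uint32_t cols, uint32_t ld, uint32_t rd)
{
    if (cols == full)
        return 1; /* every column holds a queen: one complete solution */
    long long count = 0;
    uint32_t avail = full & ~(cols | ld | rd); /* safe squares in this row */
    while (avail) {
        uint32_t bit = avail & (~avail + 1u); /* isolate the lowest safe square */
        avail ^= bit;
        /* Place a queen; diagonal attacks move one column per row. */
        count += nq_rec(full, cols | bit, ((ld | bit) << 1) & full, (rd | bit) >> 1);
    }
    return count;
}

/* Hypothetical entry point: number of ways to place n non-attacking queens
 * on an n x n board, or -1 outside the range this sketch supports. */
long long nqueen_count(int n)
{
    if (n < 1 || n > 16)
        return -1;
    return nq_rec((1u << n) - 1u, 0, 0, 0);
}
```

For example, `nqueen_count(8)` returns 92, the number of solutions to the classic eight-queens puzzle.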
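
## Appendix: Sketch of contiguous vs. pointer-based layouts

The two layouts contrasted under [Controlling memory layout](#memlayout) can be expressed directly in C. In this minimal sketch, `interval_t` and all function names are hypothetical and are not plb2's bedcov types. An array of structs keeps neighboring objects adjacent. An array of pointers mimics how most scripting languages store object references. `clone_compact` is a C analogue of the cloning trick: it copies the pointed-to objects, in array order, into one contiguous block.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical interval record for illustration; not plb2's bedcov type. */
typedef struct {
    int32_t st, en; /* half-open interval [st, en) */
} interval_t;

/* Pointer-based layout: one heap object per element, similar to an array of
 * object references in a scripting language. Error handling omitted. */
interval_t **alloc_boxed(size_t n)
{
    interval_t **p = malloc(n * sizeof *p);
    for (size_t i = 0; i < n; ++i)
        p[i] = malloc(sizeof **p);
    return p;
}

void free_boxed(interval_t **p, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        free(p[i]);
    free(p);
}

/* Copy the objects, in array order, into a single contiguous block so that
 * element i + 1 sits right after element i in memory. */
interval_t *clone_compact(interval_t **boxed, size_t n)
{
    interval_t *out = malloc(n * sizeof *out);
    for (size_t i = 0; i < n; ++i)
        out[i] = *boxed[i];
    return out;
}

/* The same scan over both layouts: each boxed access goes through an extra
 * pointer to a possibly distant address, while the flat scan is sequential. */
long long total_len_boxed(interval_t **a, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i]->en - a[i]->st;
    return sum;
}

long long total_len_flat(const interval_t *a, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i].en - a[i].st;
    return sum;
}
```

In a scripting language, the analogous step would be building a new array of copied objects once the original array is final. Whether the copies actually end up adjacent depends on that runtime's allocator.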
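
## Appendix: Sketch of manual loop hoisting in matmul

The manual optimization described under [Optimizing inner loops](#matmul) can be illustrated with the minimal C sketch below. It is not the plb2 matmul code. The row-pointer matrix representation and the helper names `mat_new`, `mat_free`, `matmul_naive` and `matmul_hoisted` are hypothetical, chosen only for this example. `matmul_hoisted` reads `c[i]`, `a[i][k]` and `b[k]` once per iteration of the enclosing loops, so the inner loop only walks two contiguous rows.

```c
#include <stdlib.h>

/* Hypothetical helper: an n x n matrix stored as n separately allocated rows.
 * Error handling is omitted to keep the sketch short. */
double **mat_new(int n)
{
    double **m = malloc(n * sizeof *m);
    for (int i = 0; i < n; ++i)
        m[i] = calloc(n, sizeof **m); /* every row starts zeroed */
    return m;
}

void mat_free(double **m, int n)
{
    for (int i = 0; i < n; ++i)
        free(m[i]);
    free(m);
}

/* The loop as shown in the README: c += a * b, indexing the matrices each time. */
void matmul_naive(int n, double **a, double **b, double **c)
{
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
}

/* Same computation with loop-invariant loads moved out of the inner loop. */
void matmul_hoisted(int n, double **a, double **b, double **c)
{
    for (int i = 0; i < n; ++i) {
        double *ci = c[i];           /* row of c, fixed for this i */
        for (int k = 0; k < n; ++k) {
            double aik = a[i][k];    /* scalar, fixed for this (i, k) */
            const double *bk = b[k]; /* row of b, fixed for this k */
            for (int j = 0; j < n; ++j)
                ci[j] += aik * bk[j];
        }
    }
}
```

Both functions perform the same additions in the same order and should give the same results. The hoisted form is the one worth trying in languages whose compilers do not perform this transformation themselves.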