{"id":21244588,"url":"https://github.com/lac-dcc/jotai-benchmarks","last_synced_at":"2025-07-16T10:43:48.870Z","repository":{"id":42564552,"uuid":"433842166","full_name":"lac-dcc/jotai-benchmarks","owner":"lac-dcc","description":"Collection of executable benchmarks","archived":false,"fork":false,"pushed_at":"2023-12-01T22:01:21.000Z","size":57654,"stargazers_count":43,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-05T17:43:32.832Z","etag":null,"topics":["autotuning","benchmarking","clang","compilation","fuzzing","llvm","machinelearning"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lac-dcc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-12-01T13:40:16.000Z","updated_at":"2024-11-20T00:51:27.000Z","dependencies_parsed_at":"2023-11-27T23:46:28.760Z","dependency_job_id":null,"html_url":"https://github.com/lac-dcc/jotai-benchmarks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lac-dcc/jotai-benchmarks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lac-dcc%2Fjotai-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lac-dcc%2Fjotai-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lac-dcc%2Fjotai-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lac-dcc%2Fjotai-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lac-dcc","downlo
ad_url":"https://codeload.github.com/lac-dcc/jotai-benchmarks/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lac-dcc%2Fjotai-benchmarks/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264666021,"owners_count":23646570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autotuning","benchmarking","clang","compilation","fuzzing","llvm","machinelearning"],"created_at":"2024-11-21T01:28:59.110Z","updated_at":"2025-07-10T21:30:54.503Z","avatar_url":"https://github.com/lac-dcc.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# The Jotai Benchmark Collection\n\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"jotai drawing\" src=\"./assets/img/ProjBanner.png\" width=\"95%\" height=\"auto\"/\u003e\u003c/br\u003e\n\u003c/p\u003e\n\nJotai is a large collection of executable benchmarks mined from open source\nrepositories.\nEach benchmark consists of a single function written in C, plus a driver to run that function.\nTo know more about the benchmarks, you can watch this [video](https://youtu.be/_fWa2rTK3mY).\n\n## Running\n\nExecuting Jotai benchmarks is quite easy: just compile and run!\nEach executable program receives a single argument: an integer that specifies\nwhich input will be used to run that program.\nEvery benchmark has at least input 0 (e.g., `./file.exe 0`), but often they\nhave more inputs (1, 2, ...). 
\nFor instance, the following commands will compile and run `extr_A...al.c` with\nits first input:\n\n```\n$\u003e cd benchmarks/anghaLeaves/\n$\u003e clang extr_Arduinotestsdevicetest_libcmemmove1.c_mymemmove_Final.c\n$\u003e ./a.out 0\n```\n\nTo see all the inputs available for a benchmark, just run the benchmark\nwithout passing arguments to it.\nFor instance, still considering `extr_A...al.c`, we get:\n\n```\n$\u003e ./a.out\n\nUsage:\n    prog [OPTIONS] [ARGS]\n\n    ARGS:\n       0    big-arr\n       1    big-arr-10x\n```\n\nIn this example, the benchmark provides two inputs. Each one was produced with\na different set of constraints. Each set of constraints that we use has a name:\n`big-arr` and `big-arr-10x`, in the above example.\nWe have a domain-specific language to specify these constraints.\nWe are preparing a report about it, but if you want to know more, just write us\nan email.\n\n## CompilerGym\n\nAn ensemble of [18,761](https://compilergym.com/llvm/api.html#compiler_gym.envs.llvm.datasets.JotaiBenchDataset) benchmarks from the Jotai collection is available in the\n[CompilerGym](https://github.com/facebookresearch/CompilerGym) library.\nCompilerGym is a library of reinforcement learning environments for compilation\ntasks.\nCurrently, Jotai is the largest suite of executable benchmarks publicly\navailable in CompilerGym.\nTo use these benchmarks, check CompilerGym's [user guide](https://compilergym.com/index.html).\n\n### Deriving Statistics\n\nMost of the Jotai functions run for a very short time.\nIf you want to time them, be prepared to use solid statistical tools\n(t-tests, confidence intervals, p-values, etc.) to deal with high variance.\nIf you want more deterministic numbers, we recommend analyzing the\nbenchmarks using [CFGgrind](https://github.com/rimsa/CFGgrind).\nCFGgrind lets you count the number of instructions that were executed, the\nnumber of basic blocks that were visited, the number of executed conditional\nbranches that 
were not completely covered, etc.\nThe beauty of it is that these numbers are deterministic and lead to\nreproducible experiments.\n\n## Sample Results\n\nWe can extract lots of statistics from the Jotai benchmarks.\nFor instance, below we show results for 15,305 programs that contain\ninputs produced by a constraint set called \"BigArray\".\nThis strategy consists of assigning each pointer in the target function to a\nmemory region that is as large as the largest integer passed to the\nfunction.\nWe have eliminated from this evaluation speedups greater than 8.0x, as they\nare likely to be exceptional results.\nIn total, 105 benchmarks have been eliminated in this way.\nBelow, on the left, we show the\n[density distribution](https://en.wikipedia.org/wiki/Histogram) of speedups:\n\n![Results involving big array constraints](./assets/img/BigArrayDynResults.jpg?raw=true \"Sample Results\")\n\nWe observe a mean speedup of 2.72x, with a median of 2.57x. Regressions are also possible: the worst regression led to a slowdown of 2.08x (a speedup of 0.48x). In the middle of the above figure, we show a [quantile-quantile plot](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot). This Q-Q plot compares the distribution of speedups with the normal distribution. Through the Q-Q plot, we can see an excess of outliers, mostly on the right of the speedup mean. The gray area highlights the region where outliers would be expected in a normal distribution. Thus, the plot provides visual hints that the speedups obtained through optimization are not normally distributed. 
The [Shapiro-Wilk Normality Test](https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test) indicates that these results are unlikely to come from a normal distribution.\n\n## The Zen of Jotai\n\nJotai is a large dataset of executable programs.\nWhile producing these programs, we decided to stick to the following principles:\n\n- **Compile-and-run**: each benchmark comes in a separate file as an independent compilation unit, with all the drivers necessary to run it. All that must be done is to compile it and it will (hopefully) run.\n- **Well-defined**: every benchmark must run to termination if compiled with the command: `clang -g -O1 -fsanitize=address,undefined,signed-integer-overflow -fno-sanitize-recover=all`\n- **Deterministic**: random inputs are produced from the same seed using a library of our own making; hence, execution should be portable across different platforms.\n- **Harmless**: benchmarks do not invoke third-party functions. Thus, every instruction is [visible](https://homepages.dcc.ufmg.br/~fernando/publications/papers/AlvaresJCL21.pdf), and the benchmarks cannot exploit security vulnerabilities in the host system. The sole exception is that we have a small folder with programs that invoke functions from `math.h`.\n- **Extensible**: we are working on a DSL for the generation of random inputs. In this way, users of Jotai can produce more inputs to the benchmarks without having to hardcode the drivers.\n- **Observable**: we would like to have functions that return values (done!) or at least be able to print the values of local variables, like Whiro does (not done!).\n- **Clean**: every allocated memory chunk should be deallocated at the end of execution. 
In other words, none of the benchmarks cause memory leaks.\n\n## Browsing the repository\n\nBenchmarks are currently stored in two folders:\n\n- `anghaLeaves`: benchmark functions that do not call any other function\n- `anghaMath`: benchmark functions that call functions from `math.h`\n\nEach benchmark consists of a single file. This file contains a single function (which we call the *benchmark*) plus everything that you need to compile and run that function: input generators, forward declarations, the `main` function, code to generate random numbers, etc.\n\nThe `data` folder stores a few CSV files that we have produced for the benchmarks. This folder is subdivided into three directories:\n\n- `SPEC_CPU_2017`: results produced for [SPEC CPU2017](https://www.spec.org/cpu2017/), which we provide for comparing against our benchmarks.\n- `anghaLeaves`: results produced after observing the execution of the benchmarks in the `anghaLeaves` dataset.\n- `anghaMath`: results produced after observing the execution of the benchmarks in the `anghaMath` dataset.\n\nAdditionally, we have a `Scripts` folder with a few useful shell scripts that you can use to collect statistics for the Jotai programs.\n\n## Generating new benchmarks\n\nThe Jotai functions have been taken from the [AnghaBench](http://cuda.dcc.ufmg.br/angha/home) repository, and have been augmented with code to generate inputs for them.\nIf you want to experiment with Jotai’s tool to generate new benchmarks, the source code is available inside the `source` directory. 
The instructions on how to build and run Jotai are available [here](source/jotai/README.md).\nWe have also prepared a Docker container with everything that you need to run Jotai.\nTo use our container, check out this [video-tutorial](https://youtu.be/uLVR5N45lm0).\nThe tutorial contains instructions to build and run Jotai from source, or via the\ncontainer.\n\n\n## Technical Report\n\nA draft paper describing the methodology to generate benchmarks is available [here](https://raw.githubusercontent.com/lac-dcc/jotai-benchmarks/main/assets/doc/LaC_TechReport022022.pdf). This document describes ongoing work; hence, its contents might change in the future. To cite it:\n\n```\n@techreport{Kind22,\n  title = {Jotai: a Methodology for the Generation of Executable C Benchmarks},\n  author = {Cecilia Conde Kind and Michael Canesche and Fernando Magno Quintao Pereira},\n  year = {2022},\n  institution = {Universidade Federal de Minas Gerais},\n  number = {02-2022}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flac-dcc%2Fjotai-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flac-dcc%2Fjotai-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flac-dcc%2Fjotai-benchmarks/lists"}