{"id":20135788,"url":"https://github.com/replikativ/datahike-benchmark","last_synced_at":"2025-07-05T14:33:07.672Z","repository":{"id":141683033,"uuid":"276487896","full_name":"replikativ/datahike-benchmark","owner":"replikativ","description":"Measuring of datahike performance","archived":false,"fork":false,"pushed_at":"2021-09-29T11:20:06.000Z","size":168,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-01-13T09:37:18.742Z","etag":null,"topics":["benchmark","clojure","datahike"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/replikativ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-07-01T21:42:04.000Z","updated_at":"2021-08-26T16:49:36.000Z","dependencies_parsed_at":"2023-09-22T07:25:00.186Z","dependency_job_id":null,"html_url":"https://github.com/replikativ/datahike-benchmark","commit_stats":{"total_commits":24,"total_committers":1,"mean_commits":24.0,"dds":0.0,"last_synced_commit":"14099c14a97024e53c794487fa6ca5a9a960a454"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replikativ%2Fdatahike-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replikativ%2Fdatahike-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replikativ%2Fdatahike-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replikativ%2Fdatahike-benchmark/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/replikativ","download_url":"https://codeload.github.com/replikativ/datahike-benchmark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241582527,"owners_count":19985846,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","clojure","datahike"],"created_at":"2024-11-13T21:16:32.466Z","updated_at":"2025-03-02T22:41:23.135Z","avatar_url":"https://github.com/replikativ.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# datahike-benchmark\nCommandline tool to run benchmarks and create visualizations for datahike backend performance comparisons.\n\n## Prerequisites\n\nSet up [PostgreSQL](https://www.postgresql.org/) and [mysql](https://www.mysql.com/) databases using [docker compose](https://docs.docker.com/compose/):\n``` bash\ndocker-compose up postgres mysql\n```\n\nYou can clean up the containers with:\n``` bash\ndocker-compose down --volumes\n```\n\n### Time Measurements\n\nNo prerequisites.\n\nDepending on your choice, this tool uses a macro similar to the built in [time](https://clojuredocs.org/clojure.core/time) function or [criterium](https://github.com/hugoduncan/criterium) for the time measurements, none of them require further installations.\n\n### Space Measurements\n\nThe Java functions require no further installations.\n\nThe [clj-async-profiler](https://github.com/clojure-goes-fast/clj-async-profiler) based on a [JVM profiling tool](https://github.com/jvm-profiling-tools/async-profiler) used 
here to measure heap allocations requires HotSpot debug symbols. \n\nIn the Oracle JDK they are already embedded, but for OpenJDK the debug symbols have to be installed, e.g. by running\n\n``` bash\nsudo apt install openjdk-11-dbg\n```\n\nIt might also be necessary to set\n``` bash\necho 1 | sudo tee /proc/sys/kernel/perf_event_paranoid\necho 0 | sudo tee /proc/sys/kernel/kptr_restrict\n```\n\nIf problems persist, please check the [profiler GitHub page](https://github.com/clojure-goes-fast/clj-async-profiler) and let us know, so we can update our instructions.\n\n\n## Commandline Tool Usage \n\n``` bash\nclj -M:run [options] \n```\n\nOptions:\n |                                | Description                                                                      | Default value                                |\n |--------------------------------|----------------------------------------------------------------------------------|----------------------------------------------|\n | -e, --crash-on-error           | Stop when an error occurs                                                        | false                                        |\n | -D, --not-save-data            | Do not save raw benchmark output data                                            | false                                        |\n | -P, --not-save-plots           | Do not create plots                                                              | false                                        |\n | -a, --space-only               | Measure only heap allocations                                                    | false                                        |\n | -t, --time-only                | Measure only execution time                                                      | false                                        |\n | -c, --use-criterium            | Use criterium library for time measurements                                      | false                               
         |\n | -j, --use-perf                 | Use perf events for space measurements                                           | false                                        |\n | -n, --data-dir                 | Data directory                                                                   | \"./data\"                                     |\n | -p, --plot-dir                 | Plot directory                                                                   | \"./plots\"                                    |\n | -m, --error-dir                | Error directory                                                                  | \"./errors\"                                   |\n | -u, --save-to-db URI           | Save results to datahike database with given URI instead of file                 | nil                                          |\n | -s, --seed SEED                | Initial seed for data creation                                                   | (rand-int)                                   |\n | -g, --time-step STEP           | Step size for measurements in ms. Used for measuring space with Java.            | 5                                            |\n | -d, --space-step STEP          | Step size for measurements in kB. Used for measuring space with Profiler.        
| 5                                            |\n | -b, --only-database DBNAME     | Run benchmarks only for this database (library with backend); multi value        | #{}                                          |\n | -B, --except-database DBNAME   | Do not run benchmarks for this database (library with backend); multi value      | #{}                                          |\n | -l, --only-lib LIB             | Run benchmarks only for this library; multi value                                | #{}                                          |\n | -L, --except-lib LIB           | Do not run benchmarks for this library; multi value                              | #{}                                          |\n | -f, --only-function FUNCTION   | Function or database part to measure; multi value                                | #{}                                          |\n | -F, --except-function FUNCTION | Function or database part not to measure; multi value                            | #{}                                          |\n | -i, --iterations ITERATIONS    | Number of iterations of function measurements (ignored for criterium)            | {:connection 50, :transaction 10, :query 10} |\n | -x, --db-datom-count RANGE     | Range of numbers of datoms in database for which benchmarks should be run. Used in 'connection' and 'transaction'.   | :function-specific |\n | -y, --tx-datom-count RANGE     | Range of numbers of datoms in transaction for which benchmarks should be run. Used in 'transaction'.                 | :function-specific |\n | -z, --entity-count RANGE       | Range of numbers of entities in database for which benchmarks should be run. Used in 'random-query' and 'set-query'. | :function-specific |\n | -w, --ref-attr-count RANGE     | Range of numbers of attributes in entity for which benchmarks should be run. Used in 'random-query'.                 
| :function-specific |\n | -h, --help                     |                                                                                   |                                             |\n\nThe note 'multi value' indicates that the argument can be given multiple times. The values are aggregated into a set. \n\nRANGE must be given as a triple of integers 'start stop step', which is passed as input to the range function.\nExample: \n``` bash\nclj -M:run --db-datom-count \"0 101 25\" # (range 0 101 25) -\u003e [0 25 50 75 100]\n```\n\nITERATIONS must be given as a string of space-separated integers for \n  1. connection \n  2. transaction and \n  3. query measurements\n  \nExample: \n``` bash\nclj -M:run  --iterations \"1 50 10\" #  {:iterations {:connection 1, :transaction 50, :query 10}}\n```\n\nFUNCTION can be one of: \n- connection \n- transaction \n- random-query\n\nLIB can be one of: \n- datahike \n- datalevin\n- datascript\n- hitchhiker\n\nDBNAME can be one of:\n | id          | description                                          |\n |-------------|------------------------------------------------------|\n | dh-mem-hht  | datahike in-memory with hitchhiker-tree index        | \n | dh-mem-set  | datahike in-memory with persistent set index         |\n | dh-file     | datahike with file backend and hitchhiker-tree index |\n | dh-psql     | datahike with Postgres and hitchhiker-tree index     |\n | dh-mysql    | datahike with MySQL and hitchhiker-tree index        |\n | dh-h2       | datahike with H2 in-memory and hitchhiker-tree index |\n | dh-level    | datahike with LevelDB and hitchhiker-tree index      |\n | hht-dat     | hitchhiker-tree directly using raw values            |\n | hht-val     | hitchhiker-tree directly using datoms                |\n | datascript  | datascript                                           |\n | datalevin   | datalevin                                            |\n\nYou can see the results as csv files in `./data` and as charts 
in `./plots`.\n\n## Starting Full Run With Report Creation\n\nFor a full run, renew the volumes to get rid of old data, recreate the images, and run docker-compose:\n\n``` bash\ncd bin\n./recreate-docker-volumes.sh\ndocker-compose up --build --force-recreate\n```\n\n## Reproducing Errors\n\nIf an error occurs for a parameter configuration, by default the computation is not stopped; the failing configuration is skipped and an error log is created.\nThe part of the computation where the error occurred can then be run again by using the exact options from the error log.\nMost importantly, the seed has to be set as stated in the error log. The test data will then be exactly the same as in the faulty run, so you should be able to reproduce the error.\n\n\n## Measuring Restrictions\n\nconnection\n- The database *dat-mem* cannot be measured since reconnection after *db/release* is not possible. Therefore, connection on different database sizes cannot be compared.\n\nrandom-query\n- The *hitchhiker* library cannot be measured since queries are not applicable here.\n\n## Commercial support\n\nWe are happy to provide commercial support with\n[lambdaforge](https://lambdaforge.io). If you are interested in a particular\nfeature, please let us know.\n\n## License\n\nCopyright © 2020 Judith Massa\n\nLicensed under Eclipse Public License (see [LICENSE](LICENSE)).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freplikativ%2Fdatahike-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Freplikativ%2Fdatahike-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freplikativ%2Fdatahike-benchmark/lists"}