{"id":18011307,"url":"https://github.com/johnnyjayjay/persistent-data-structures-benchmark","last_synced_at":"2025-04-04T13:42:41.949Z","repository":{"id":138563240,"uuid":"347599425","full_name":"JohnnyJayJay/persistent-data-structures-benchmark","owner":"JohnnyJayJay","description":"A benchmark of different persistent data structures across JVM libraries and languages","archived":false,"fork":false,"pushed_at":"2021-04-27T09:29:08.000Z","size":271,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-09T23:27:35.109Z","etag":null,"topics":["benchmark","benchmarking","collections","data-structures","hacktoberfest","jvm","jvm-languages","performance"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JohnnyJayJay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-14T09:56:51.000Z","updated_at":"2021-12-23T17:02:17.000Z","dependencies_parsed_at":"2023-04-03T19:33:55.321Z","dependency_job_id":null,"html_url":"https://github.com/JohnnyJayJay/persistent-data-structures-benchmark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnnyJayJay%2Fpersistent-data-structures-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnnyJayJay%2Fpersistent-data-structures-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Joh
nnyJayJay%2Fpersistent-data-structures-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnnyJayJay%2Fpersistent-data-structures-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JohnnyJayJay","download_url":"https://codeload.github.com/JohnnyJayJay/persistent-data-structures-benchmark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247188870,"owners_count":20898597,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","benchmarking","collections","data-structures","hacktoberfest","jvm","jvm-languages","performance"],"created_at":"2024-10-30T03:09:04.437Z","updated_at":"2025-04-04T13:42:41.925Z","avatar_url":"https://github.com/JohnnyJayJay.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# persistent-collections-benchmark\n\nA set of benchmarks comparing the performance of various operations from four different JVM collections frameworks.\n\n[Jump to the results](#Results)\n\n## Goals\n\nThe goal of these benchmarks is to compare the computational performance of **addition**, **removal** and **lookup** on different types of \npersistent collections from three different JVM libraries/languages:\n\n- [clojure.lang](https://clojure.org/reference/data_structures#Collections) (Clojure)\n- [kotlinx.collections.immutable](https://github.com/Kotlin/kotlinx.collections.immutable) (Kotlin)\n- [pcollections](https://github.com/hrldcpr/pcollections) (Java)\n\nAnother goal is to see how 
persistent collections compare to the mutable\n[java.util Collections framework](https://docs.oracle.com/javase/tutorial/collections/intro/index.html).\n\n## Non-Goals\n### Claiming Authority\n**This benchmark does not aim to determine what collections are ultimately better than others in general.**\\\nIt does not emulate a real-world scenario, because the operations are each benchmarked individually. \nIn reality, most uses of collections involve addition, removal and lookup all at once and in a much \nless concentrated fashion. Also, the size of the collections is a significant factor when it comes to speed, \nso there may be notable differences in results when focusing more on collections of different sizes.\n\n### Benchmarking Undesirable Operations\nThis benchmark only considers the ways these collections are *supposed to be* used, i.e.,\nit examines the performance of the operations the respective collections were built and optimised for \n(e.g. value-/key-based lookup on sets/maps, random access on vectors/arraylists, \nhead/tail lookup on LinkedLists/queues/stacks).\nThus, there are no benchmarks measuring the performance of random access \non non-random-access collections, for example.\n\n### Benchmarking Memory Efficiency\nMemory use and efficiency are not measured.\n\n### Examining Concurrent Contexts\nThe benchmarks all run on a single thread and do not make a statement about performance in multi-threaded contexts. The reasons for this decision are that\n\n1. Writing correct concurrent code is hard, writing meaningful concurrent benchmarks is harder\n2. It is unclear how to compare Java's mutable collections to the persistent ones. Do we use locks for Java and atomic references for the others? 
\nIf so, does that accurately represent the way we write concurrent code and to what extent is that still comparable?\n\n### Comparing Collections of Different Types\nThe benchmark results are generally *not* suitable to compare different types of collections (e.g. Lists and Sets). The reason for that is that the benchmarks may work differently:\nFor example, in the Set lookup benchmarks, there is the additional overhead of creating random strings to measure lookup of nonexistent elements, while this is not done for lists\nand therefore may result in a significant difference.\\\nThe random string creation method is benchmarked independently if you want to get a vague impression nonetheless.\n\n### Benchmarking all Available Operations\nLastly, the benchmarks do not cover every operation. For instance, Clojure's vectors and sorted maps \nsupport a constant-time reversing operation that is not included.\n\n## Structure\n\nBelow you can see how many collections are benchmarked in each category for each library. The last column gives information about how long each benchmark takes.\n\n| Benchmark/Subject Benchmark count | Java | PCollections | Kotlin | Clojure |                                     |\n| --------------------------------- | ---- | ------------ | ------ | ------- | ----------------------------------- |\n| Addition                          | 8    | 5            | 5      | 7       | 5 * 1M ops Warmup, 10 * 1M ops Measurement |\n| Removal                           | 8    | 5            | 5      | 6       | 5 * 1M ops Warmup, 10 * 1M ops Measurement  |\n| Lookup                            | 8    | 5            | 5      | 7       | 5 * 10s Warmup, 5 * 10s Measurement |\n\nHere are all the collections that are benchmarked. 
Clojure is missing the removal benchmark for its `PersistentVector`, because it doesn't define an index-based removal operation.\n\n| Data Structure/Library Equivalent   | Java            | PCollections   | Kotlin                 | Clojure             |\n| ----------------------------------- | --------------- | -------------- | ---------------------- | ------------------- |\n| Random Access List                  | `ArrayList`     | `TreePVector`  | `PersistentVector`     | `PersistentVector`  |\n| Queue (FIFO)                        | `LinkedList`    | -              | -                      | `PersistentQueue`   |\n| Stack (LIFO)                        | `LinkedList`    | `ConsPStack`   | -                      | `PersistentList`    |\n| Unordered Set                       | `HashSet`       | `HashTreePSet` | `PersistentHashSet`    | `PersistentHashSet` |\n| Unordered Map                       | `HashMap`       | `HashTreePMap` | `PersistentHashMap`    | `PersistentHashMap` |\n| Linked Set (entry order)            | `LinkedHashSet` | -              | `PersistentOrderedSet` | -                   |\n| Linked Map (entry order)            | `LinkedHashMap` | -              | `PersistentOrderedMap` | -                   |\n| Sorted Set                          | `TreeSet`       | -              | -                      | `PersistentTreeSet` |\n| Sorted Map                          | `TreeMap`       | -              | -                      | `PersistentTreeMap` |\n| Bag (unordered, duplicate elements) | -               | `HashTreePBag` | -                      | -                   |\n\n- The addition benchmarks fill an empty collection with random strings each iteration\n\n- The removal benchmarks remove elements from a fresh collection each iteration\n\n- The lookup benchmarks look up random elements/indices from a collection\n\nEvery single benchmark is forked 3 times to make up for differences in VM configurations, randomness \nand other environmental factors.\n\n## 
How to run\n\nIf you do want to run everything, do the following:\n\n```\n$ git clone https://github.com/johnnyjayjay/persistent-data-structures-benchmark\n$ cd persistent-data-structures-benchmark\n$ ./gradlew jmh\n```\n\n## Results\nMany thanks to those who provided their computing power to run these benchmarks for my initial evaluation:\n\n- [Kaliber's results](./plot/results_kali.csv) (Ryzen 5 3600X, 16GB DDR4-3200)\n- [PiggyPiglet's results](./plot/results_piglet.csv) (i7-7700K, 32GB DDR4-3200)\n- [BomBardyGamer's results](./plot/results_bardy.csv) (Ryzen 7 3700X, ...)\n\nThe above results mostly match up and show similar patterns. Here is a simplified view of PiggyPiglet's results:\n\n![Results as a graph](./plot/plot_piglet.png)\n\nThis graph only shows the 4 most common data types in all 4 libraries:\n\n- `List` (random access list - included because it is the de facto default data structure for most applications)\n- `Stack` (aka cons, list - included because it is the simplest persistent data structure)\n- `Set` (unordered set)\n- `Map` (unordered map)\n\nLower scores are better, as they indicate execution time.\n\nThere are a couple of additional things to note about this data:\n\n- Kotlin does not provide a persistent stack implementation or similar, thus there is no data in those areas.\n- For the \"List removal\" benchmark, only PCollections results are available, because:\n  - Clojure does not have index-based removal on its vectors\n  - Java is too big of an outlier, taking around 20 seconds to remove 1M elements\n  - Kotlin is an even bigger outlier, in this case taking over 200 seconds. In the other results, it did not even terminate without timing out.\n- The black lines on top of the bars show the 99.9% error relative to the measurement result\n- While it is hard to make out, there are indeed removal and lookup results for Stacks. 
They just happen to be very efficient in those cases, to an extent where the difference between the tested libraries almost becomes insignificant.\n\n## Contributions\nWhether you find an issue in the benchmark code, want to improve the default settings or simply want to submit your own benchmark results: Feel free to open an issue or a pull request! The more people work on benchmarks like this, the more useful their results become.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnnyjayjay%2Fpersistent-data-structures-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohnnyjayjay%2Fpersistent-data-structures-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnnyjayjay%2Fpersistent-data-structures-benchmark/lists"}