{"id":28386003,"url":"https://github.com/unum-cloud/ucsb","last_synced_at":"2025-06-26T12:30:48.381Z","repository":{"id":39602613,"uuid":"463431083","full_name":"unum-cloud/ucsb","owner":"unum-cloud","description":"Wide NoSQL benchmark for RocksDB, LevelDB, Redis, WiredTiger and MongoDB extending the Yahoo Cloud Serving Benchmark","archived":false,"fork":false,"pushed_at":"2023-09-08T19:22:56.000Z","size":1769,"stargazers_count":56,"open_issues_count":3,"forks_count":6,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-30T15:35:13.992Z","etag":null,"topics":["benchmark","database","ebpf","google-benchmark","io-uring","leveldb","lsm-tree","mongodb","rocksdb","spdk","terabyte","valgrind","wiredtiger","ycsb"],"latest_commit_sha":null,"homepage":"https://unum-cloud.github.io/ucsb/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unum-cloud.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-02-25T07:00:36.000Z","updated_at":"2025-05-20T12:08:14.000Z","dependencies_parsed_at":"2024-01-12T00:27:05.157Z","dependency_job_id":"b2931e56-98c4-422f-8492-e68c5ef28f43","html_url":"https://github.com/unum-cloud/ucsb","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/unum-cloud/ucsb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fucsb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fucsb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fucsb/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fucsb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unum-cloud","download_url":"https://codeload.github.com/unum-cloud/ucsb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fucsb/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262067664,"owners_count":23253646,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","database","ebpf","google-benchmark","io-uring","leveldb","lsm-tree","mongodb","rocksdb","spdk","terabyte","valgrind","wiredtiger","ycsb"],"created_at":"2025-05-30T12:38:17.395Z","updated_at":"2025-06-26T12:30:48.375Z","avatar_url":"https://github.com/unum-cloud.png","language":"C++","funding_links":[],"categories":["Benchmarking"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eUnbranded Cloud Serving Benchmark\u003c/h1\u003e\n\u003ch3 align=\"center\"\u003e\nYahoo Cloud Serving Benchmark for NoSQL Databases\u003cbr/\u003e\nRefactored and Extended with Batch and Range Queries\u003cbr/\u003e\n\u003c/h3\u003e\n\u003cbr/\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://discord.gg/AxsU9mctAn\"\u003e\u003cimg height=\"25\" src=\"https://github.com/unum-cloud/ustore/raw/main/assets/icons/discord.svg\" alt=\"Discord\"\u003e\u003c/a\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\u003ca href=\"https://www.linkedin.com/company/unum-cloud/\"\u003e\u003cimg height=\"25\" 
src=\"https://github.com/unum-cloud/ustore/raw/main/assets/icons/linkedin.svg\" alt=\"LinkedIn\"\u003e\u003c/a\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\u003ca href=\"https://twitter.com/unum_cloud\"\u003e\u003cimg height=\"25\" src=\"https://github.com/unum-cloud/ustore/raw/main/assets/icons/twitter.svg\" alt=\"Twitter\"\u003e\u003c/a\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\u003ca href=\"https://unum.cloud/post\"\u003e\u003cimg height=\"25\" src=\"https://github.com/unum-cloud/ustore/raw/main/assets/icons/blog.svg\" alt=\"Blog\"\u003e\u003c/a\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\u003ca href=\"https://github.com/unum-cloud/ucset\"\u003e\u003cimg height=\"25\" src=\"https://github.com/unum-cloud/ustore/raw/main/assets/icons/github.svg\" alt=\"GitHub\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\nUnum Cloud Serving Benchmark is the grandchild of Yahoo Cloud Serving Benchmark, reimplemented in C++, with fewer mutexes or other bottlenecks, and with additional \"batch\" and \"range\" workloads, crafted specifically for the Big Data age!\n\n|                         | Present in YCSB | Present in UCSB |\n| :---------------------- | :-------------: | :-------------: |\n| Size of the dataset     |        ✅        |        ✅        |\n| DB configuration files  |        ✅        |        ✅        |\n| Workload specifications |        ✅        |        ✅        |\n| Tracking hardware usage |        ❌        |        ✅        |\n| Workload Isolation      |        ❌        |        ✅        |\n| Concurrency             |        ❌        |        ✅        |\n| Batch Operations        |        ❌        |        ✅        |\n| Bulk Operations         |        ❌        |        ✅        |\n| Support of Transactions |        ❌        |        ✅        |\n\nAs you may know, benchmarking databases is very complex.\nThere is too much control flow to tune, so instead of learning the names of a thousand CLI arguments, you'd use a 
[`run.py`](https://github.com/unum-cloud/UCSB/blob/main/run.py) script to launch the benchmarks.\nThe outputs will be placed in the `bench/results/` folder.\n\n```sh\ngit clone https://github.com/unum-cloud/ucsb.git \u0026\u0026 cd ucsb \u0026\u0026 ./run.py\n```\n\n## Supported Databases\n\nKey-Value Stores and NoSQL databases differ in supported operations.\nIncluding the ones queried by UCSB, like \"batch\" operations.\nWhen batches aren't natively supported, we simulate them with multiple single-entry operations.\n\n|                        | Bulk Scan | Batch Read | Batch Write | Integer Keys |\n| :--------------------- | :-------: | :--------: | :---------: | :----------: |\n|                        |           |            |             |              |\n| 💾 Embedded Databases   |           |            |             |              |\n| WiredTiger             |     ✅     |     ❌      |      ❌      |      ✅       |\n| LevelDB                |     ✅     |     ❌      |      ✅      |      ❌       |\n| RocksDB                |     ✅     |     ✅      |      ✅      |      ❓       |\n| LMDB                   |     ✅     |     ❌      |      ❌      |      ❌       |\n| UDisk                  |     ✅     |     ✅      |      ✅      |      ✅       |\n|                        |           |            |             |              |\n| 🖥️ Standalone Databases |           |            |             |              |\n| Redis                  |     ❌     |     ✅      |      ✅      |      ❌       |\n| MongoDB                |     ✅     |     ✅      |      ✅      |      ✅       |\n\nThere is also asymmetry elsewhere:\n\n* WiredTiger supports fixed-size integer keys.\n* LevelDB only supports variable length keys and values.\n* RocksDB has minimal support for [`fixed_key_len`](https://cs.github.com/facebook/rocksdb?q=fixed_key_len), incompatible with `BlockBasedTable`.\n* UDisk supports both fixed-size keys and values.\n\nJust like YCSB, we use 8-byte integer keys and 1000-byte 
values.\nBoth WiredTiger and UDisk were configured to use integer keys natively.\nThe RocksDB wrapper reverses the order of bytes in keys to use the native comparator.\nNone of the DBs was set to use fixed-size values, as only UDisk supports that.\n\n---\n\nRecent results:\n\n* 1 TB collections. Mar 22, 2022. [post](https://unum.cloud/post/2022-03-22-ucsb)\n* 10 TB collections. Sep 13, 2022. [post](https://unum.cloud/post/2022-09-13-ucsb-10tb/)\n\n---\n\n- [Supported Databases](#supported-databases)\n- [Yet Another Benchmark?](#yet-another-benchmark)\n- [Preset Workloads](#preset-workloads)\n- [Ways to Spoil a DBMS Benchmark](#ways-to-spoil-a-dbms-benchmark)\n  - [Durability vs Write Speed](#durability-vs-write-speed)\n  - [Strict vs Flexible RAM Limits](#strict-vs-flexible-ram-limits)\n  - [Dataset Size and NAND Modes](#dataset-size-and-nand-modes)\n  - [Slow Benchmarks for Fast Code](#slow-benchmarks-for-fast-code)\n  - [Incomplete Measurements](#incomplete-measurements)\n\n---\n\n## Yet Another Benchmark?\n\nYes.\nIn the DBMS world there are just 2 major benchmarks:\n\n* [YCSB](https://github.com/brianfrankcooper/YCSB) for NoSQL.\n* [TPC](https://www.tpc.org/) for SQL.\n\nWith YCSB everything seems simple - clone the repo, pick a DBMS, run the benchmark.\nThe TPC suite seems more \"enterprisey\", and after a few years in the industry, I still don't understand the procedure.\nMoreover, most SQL databases these days are built on top of other NoSQL solutions, so NoSQL is more foundational.\nSo naturally we used YCSB internally.\n\nWe were getting great numbers.\nAll was fine until it wasn't.\nWe looked under the hood and realized that the benchmark code itself was less efficient than the databases it was trying to evaluate, causing additional bottlenecks and affecting the measurements.\nSo just like others, we decided to port it to C++, refactor it, and share it with the world.\n\n## Preset Workloads\n\n* **∅**: imports monotonically increasing keys 🔄\n* **A**: 50% reads + 50% 
updates, all random\n* **C**: reads, all random\n* **D**: 95% reads + 5% inserts, all random\n* **E**: range scan 🔄\n* **✗**: batch read 🆕\n* **Y**: batch insert 🆕\n* **Z**: scans 🆕\n\nThe **∅** was previously implemented as one-by-one inserts, but some KVS support external construction of their internal representation files.\nThe **E** was [previously](https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloade) mixed with 5% insertions.\n\n## Ways to Spoil a DBMS Benchmark\n\n\u003e Unlike humans, [ACID](https://en.wikipedia.org/wiki/ACID) is one of the best things that can happen to DBMS 😁\n\n### Durability vs Write Speed\n\nLike all good things, ACID is unreachable, because of at least one property - Durability.\nAbsolute Durability is practically impossible and high Durability is expensive.\n\nMany high-performance DBs are designed as [Log Structured Merge Trees](https://en.wikipedia.org/wiki/Log-structured_merge-tree).\nIt's a design that essentially bans in-place file overwrites.\nInstead, it builds layers of immutable files arranged in a Tree-like order.\nThe problem is that until you have enough content to populate an entire top-level file, you keep data in RAM - in structures often called `MemTable`s.\n\n![LSM Tree](assets/lsm-tree.png)\n\nIf the lights go off, volatile memory will be discarded.\nSo a copy of every incoming write is generally appended to a Write-Ahead-Log (WAL).\nTwo problems here:\n\n1. You can't have a full write confirmation before appending to WAL. It's still a write to disk. A system call. A context switch to kernel space. If you want to avoid it with [`io_uring`](https://unixism.net/loti/what_is_io_uring.html) or [`SPDK`](https://spdk.io), be ready to change all the above logic to work in an async manner, but fast enough not to create a new bottleneck. Hint: [`std::async`](https://en.cppreference.com/w/cpp/thread/async) will not cut it.\n2. WAL is functionally stepping on the toes of a higher-level logic. 
Every wrapping DBMS generally implements such mechanisms itself, so it disables the WAL in the KVS to avoid extra stalls and replication. Example: [Yugabyte is a port](https://blog.yugabyte.com/how-we-built-a-high-performance-document-store-on-rocksdb/) of Postgres to RocksDB and disables the embedded WAL.\n\nWe generally disable WAL and benchmark the core.\nStill, you can tweak all of that in the UCSB configuration files yourself.\n\nFurthermore, as widely discussed, [flushing the data still may not guarantee its preservation on your SSD](https://twitter.com/xenadu02/status/1495693475584557056?s=20\u0026t=eG2cIbzMg_rTq379EkkHMQ).\nSo pick your ~~poison~~ hardware wisely and tune your benchmarks cautiously.\n\n### Strict vs Flexible RAM Limits\n\nWhen users specify a RAM limit for a KVS, they expect all of the required in-memory state to fit into that many bytes.\nIt would be too obvious for modern software, so here is one more problem.\n\nFast I/O is hard.\nThe faster you want it, the more abstractions you will need to replace.\n\n```mermaid\ngraph LR\n    Application --\u003e|libc| LIBC[Userspace Buffers]\n    Application --\u003e|mmap| PC[Page Cache]\n    Application --\u003e|mmap+O_DIRECT| BL[Block I/O Layer]\n    Application --\u003e|SPDK| DL[Device Layer]\n\n    LIBC --\u003e PC\n    PC --\u003e BL\n    BL --\u003e DL\n```\n\nGenerally, the OS keeps copies of the requested pages in its RAM cache.\nTo avoid it, enable [`O_DIRECT`](https://man7.org/linux/man-pages/man2/open.2.html).\nIt will slow down the app and require some more engineering.\nFor one, all the disk I/O will have to be aligned to page sizes, [generally 4KB](https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/advanced-topics/i-o-alignment-considerations), which includes both the address in the file and the address in the userspace buffers.\nSplit-loads should also be managed with extra code on your side.\nSo most KVS (except for UDisk, of course 
😂) solutions don't bother implementing very fast I/O, like `SPDK`.\nIn that case, they can't even know how much RAM the underlying OS has reserved for them.\nSo we have to configure them carefully and, ideally, add external constraints:\n\n```sh\nsystemd-run --scope -p MemoryLimit=100M /path/ucsb\n```\n\nNow a question.\nLet's say you want to [`mmap`](https://man7.org/linux/man-pages/man2/mmap.2.html) files and be done.\nAnyway, Linux can do a far better job at managing caches than most DBs.\nIn that case - the memory usage will always be very high but within the limits of that process.\nAs soon as we near the limit - the OS will drop the old caches.\nIs it better to use the least RAM or the most RAM until the limit?\n\nFor our cloud-first offering, we will favour the second option.\nIt will give the users the most value for their money on single-purpose instances.\n\nFurthermore, we allow and enable \"**Workload Isolation**\" in UCSB by default.\nIt will create a separate process and a separate address space for each workload of each DB.\nBetween workloads, we flush the whole system.\nThe caches filled during insertion benchmarks will be invalidated before the reads begin.\nThis makes the numbers more reliable but limits concurrent benchmarks to one.\n\n### Dataset Size and NAND Modes\n\nLarge capacity SSDs store multiple bits per cell.\nIf you are buying a Quad Level Cell SSD, you expect each cell to store 4 bits of relevant information.\nThat may be a false expectation.\n\n![SLC MLC vs TLC](assets/slc-mlc-tlc-shape.jpg)\n\nThe SSD can switch to SLC mode during intensive writes, where IO is faster, especially if a lot of space is available.\nIn the case of an 8 TB SSD, before we reach 2 TB used space, all [NAND](https://en.wikipedia.org/wiki/Flash_memory) arrays can, in theory, be populated with just one relevant bit.\n\n![SLC vs eMLC vs MLC vs TLC](assets/slc-mlc-tlc-specs.png)\n\nIf you are benchmarking the DBMS, not the SSD, ensure that you did all 
benchmarks within the same mode.\nIn our case, for a 1 TB workload on 8 TB drives, it's either:\n\n* starting with an empty drive,\n* starting with an 80% full drive.\n\n### Slow Benchmarks for Fast Code\n\nReturning to the topic of deficiencies in the original implementation, let's linger on the fact that it is implemented in Java, while all performant Key-Value Stores are implemented in C and C++.\nThis means that you would need some form of a “Foreign Function Interface” to interact with the KVS.\nThis immediately adds unnecessary work for our CPU, but it’s a minor problem compared to the rest.\n\nEvery language and its ecosystem has different priorities. Java focuses on the simplicity of development, while C++ trades it for higher performance.\n\n```java\nprivate static String getRowKey(String db, String table, String key) {\n    return db + \":\" + table + \":\" + key;\n}\n```\n\nThe above snippet is from the [Apples \u0026 SnowFlakes FoundationDB adapter inside YCSB](https://github.com/brianfrankcooper/YCSB/blob/ce3eb9ce51c84ee9e236998cdd2cefaeb96798a8/foundationdb/src/main/java/site/ycsb/db/foundationdb/FoundationDBClient.java#L100), but the pattern is identical across the entire repo.\nIt’s responsible for generating keys for queries, so it runs on the hot path.\nHere is what a modern recommended C++ version would look like:\n\n```cpp\n#include \u003cformat\u003e\n#include \u003cstring_view\u003e\n\nauto get_row_key(std::string_view db, std::string_view table, std::string_view key) {\n    return std::format(\"{}:{}:{}\", db, table, key);\n}\n```\n\nFrom Java 7 onwards, the [Java String Pool](https://www.baeldung.com/java-string-pool) lives in the Heap space, which is garbage collected by the JVM.\nThis code will produce a `StringBuilder`, a heap-allocated array of pointers to heap-allocated strings, later materializing in the final concatenated `String`.\nOf course, on-heap again.\nAnd if we know something about High-Performance Computing, the heap is expensive, but together with Garbage Collection and multithreading, it becomes 
completely intolerable.\nThe same applies to the C++ version.\nYes, we are doing only 1 allocation there, but it is also too slow to be called HPC.\nWe need to replace `std::format` with `std::format_to` and export the result into a reusable buffer.\n\n---\n\nIf one example is not enough, below is the [code snippet](https://github.com/brianfrankcooper/YCSB/blob/ce3eb9ce51c84ee9e236998cdd2cefaeb96798a8/core/src/main/java/site/ycsb/generator/ZipfianGenerator.java#L250), which produces random integers before packing them into a `String` key.\n\n```java\nlong nextLong(long itemcount) {\n    // from \"Quickly Generating Billion-Record Synthetic Databases\", Jim Gray et al, SIGMOD 1994\n    if (itemcount != countforzeta) {\n        synchronized (this) {\n            if (itemcount \u003e countforzeta) {\n                ...\n            } else {\n                ...\n            }\n        }\n    }\n\n    double u = ThreadLocalRandom.current().nextDouble();\n    double uz = u * zetan;\n    if (uz \u003c 1.0)\n        return base;\n    if (uz \u003c 1.0 + Math.pow(0.5, theta))\n        return base + 1;\n\n    long ret = base + (long) ((itemcount) * Math.pow(eta * u - eta + 1, alpha));\n    setLastValue(ret);\n    return ret;\n}\n```\n\nTo generate a `long`, YCSB is doing numerous operations on `double`-s, by far the most computationally expensive numeric type on modern computers (except for integer division).\nAside from that, this Pseudo-Random Generator contains four `if` statements and a `synchronized (this)` mutex.\nCreating random integers for most distributions generally takes under 50 CPU cycles, or about 10 nanoseconds.\nIn this implementation, every `if` branch may cost that much, and the mutex may cost orders of magnitude more.\nIf you are writing a benchmark, don't do that.\n\n### Incomplete Measurements\n\nIf you use Google Benchmark, you know about its [bunch of nifty tricks](/post/2022-03-04-gbench), like `DoNotOptimize` or the automatic resolution of the number of iterations at runtime.\nIt's 
widespread in micro-benchmarking, but it begs for extensions when you start profiling a DBMS.\nThe ones shipped with UCSB spawn a sibling process that samples usage statistics from the OS.\nLike `valgrind`, we read from `/proc/*` [files](https://man7.org/linux/man-pages/man5/proc.5.html) and aggregate stats like SSD I/O and overall RAM usage.\nThose are better than nothing, but they are far less accurate than what can be accomplished with eBPF.\nWe have a pending ticket for its implementation.\nDon't wait, contribute 🤗\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funum-cloud%2Fucsb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funum-cloud%2Fucsb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funum-cloud%2Fucsb/lists"}