{"id":13629836,"url":"https://github.com/utsaslab/pebblesdb","last_synced_at":"2025-04-17T09:36:44.425Z","repository":{"id":41094992,"uuid":"102760387","full_name":"utsaslab/pebblesdb","owner":"utsaslab","description":"The PebblesDB write-optimized key-value store (SOSP 17)","archived":false,"fork":false,"pushed_at":"2024-04-16T05:35:07.000Z","size":874,"stargazers_count":501,"open_issues_count":13,"forks_count":99,"subscribers_count":31,"default_branch":"master","last_synced_at":"2024-08-01T22:44:38.454Z","etag":null,"topics":["flsm","key-value-store","leveldb","sosp17"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/utsaslab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"contributing.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-07T16:26:34.000Z","updated_at":"2024-07-26T02:57:53.000Z","dependencies_parsed_at":"2024-08-01T22:52:07.606Z","dependency_job_id":null,"html_url":"https://github.com/utsaslab/pebblesdb","commit_stats":{"total_commits":69,"total_committers":8,"mean_commits":8.625,"dds":0.6231884057971014,"last_synced_commit":"703bd01bba47c586fc3fd07273452528826cd38e"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/utsaslab%2Fpebblesdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/utsaslab%2Fpebblesdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/utsaslab%2Fpebblesdb/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/utsaslab%2Fpebblesdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/utsaslab","download_url":"https://codeload.github.com/utsaslab/pebblesdb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223751571,"owners_count":17196664,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flsm","key-value-store","leveldb","sosp17"],"created_at":"2024-08-01T22:01:21.325Z","updated_at":"2024-11-08T20:32:14.648Z","avatar_url":"https://github.com/utsaslab.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"## PebblesDB\n\n[![Build Status](https://travis-ci.org/utsaslab/pebblesdb.svg?branch=master)](https://travis-ci.org/utsaslab/pebblesdb)\n[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)\n\n[PebblesDB](https://github.com/utsaslab/pebblesdb) is a write-optimized key-value store which is built using\nthe novel FLSM (Fragmented Log-Structured Merge Tree) data\nstructure. FLSM is a modification of the standard log-structured merge tree data structure which\naims at achieving higher write throughput and lower write\namplification without compromising on read throughput.\n\nPebblesDB is built by modifying\n[HyperLevelDB](https://github.com/rescrv/HyperLevelDB) which, in turn,\nis built on top of\n[LevelDB](https://github.com/google/leveldb). PebblesDB is API\ncompatible with HyperLevelDB and LevelDB. 
Thus, PebblesDB is a\n*drop-in* replacement for LevelDB and HyperLevelDB. The source code is available on [GitHub](https://github.com/utsaslab/pebblesdb). The full paper on\nPebblesDB can be found\n[here](http://www.cs.utexas.edu/~vijay/papers/sosp17-pebblesdb.pdf\n\"PebblesDB SOSP'17\"). The slides for the SOSP 17 talk, which explains the core ideas behind PebblesDB, can be found [here](http://www.cs.utexas.edu/~vijay/papers/pebblesdb-sosp17-slides.pdf). \n\nIf you are using LevelDB in your deployment, do consider trying out\nPebblesDB! PebblesDB can also be used to replace RocksDB as long as\nRocksDB-specific functionality, such as column families, is not used.\n\nPlease\n[cite](http://www.cs.utexas.edu/~vijay/bibtex/sosp17-pebblesdb.bib)\nthe following paper if you use PebblesDB: [PebblesDB: Building\nKey-Value Stores using Fragmented Log-Structured Merge\nTrees](http://www.cs.utexas.edu/~vijay/papers/sosp17-pebblesdb.pdf). Pandian\nRaju, Rohan Kadekodi, Vijay Chidambaram, Ittai Abraham. [SOSP\n17](https://www.sigops.org/sosp/sosp17/). [BibTeX](http://www.cs.utexas.edu/~vijay/bibtex/sosp17-pebblesdb.bib)\n\nThe [benchmarks\npage](https://github.com/utsaslab/pebblesdb/blob/master/benchmark.md)\nhas a list of experiments evaluating PebblesDB vs LevelDB,\nHyperLevelDB, and RocksDB. The summary is that PebblesDB outperforms\nthe other stores on write throughput, equals other stores on read\nthroughput, and incurs a penalty for small range queries on fully\ncompacted key-value stores. PebblesDB achieves **6x** the write throughput of RocksDB, while providing similar read throughput and performing 50% less IO. Please see the paper for more details.\n\nIf you would like to run MongoDB with PebblesDB as the storage engine, please check out [mongo-pebbles](https://github.com/utsaslab/mongo-pebbles), a modification of the mongo-rocks layer between RocksDB and MongoDB. \n___\n\n### Dependencies\n\nPebblesDB requires `libsnappy` and `libtool`. 
To install on Linux, please use\n`sudo apt-get install libsnappy-dev libtool`. For MacOSX, use `brew install snappy` and instead of `ldconfig`, use `update_dyld_shared_cache`.\n\nPebblesDB was built, compiled, and tested with g++-4.7, g++-4.9, and g++-5. It may not work with other versions of g++ and other C++ compilers. \n\n### Installation\n\nUsing Autotools:\n\n```\n$ cd pebblesdb/src\n$ autoreconf -i\n$ ./configure\n$ make\n$ make install\n$ ldconfig\n```\n\nUsing CMake:\n\n```shell\n$ mkdir -p build \u0026\u0026 cd build\n$ cmake .. \u0026\u0026 make install -j16\n```\n\n___\n\n### Running microbenchmark\n1. `cd pebblesdb/src/`\n2. `make db_bench`  (this only works if you are compiling using autotools, and have done `autoreconf` and `configure` before this step)\n3. `./db_bench --benchmarks=\u003clist-of-benchmarks\u003e --num=\u003cnumber-of-keys\u003e --value_size=\u003csize-of-value-in-bytes\u003e --reads=\u003cnumber-of-reads\u003e --db=\u003cdatabase-directory-path\u003e`  \nA complete set of parameters can be found in `db/db_bench.cc`  \n\nSample usage:  \n`./db_bench --benchmarks=fillrandom,readrandom --num=1000000 --value_size=1024 --reads=500000 --db=/tmp/pebblesdbtest-1000`\n\n\nUse `filter` benchmark property to print the filter policy statistics like memory usage.\n\n`./db_bench --benchmarks=fillrandom,readrandom,filter --num=1000000 --value_size=1024 --reads=500000 --db=/tmp/pebblesdbtest-1000`\n\n```\n    fillrandom   :     110.460 micros/op;    9.0 MB/s\n    readrandom   :       4.120 micros/op; (5000 of 10000 found)\n\n    Filter in-memory size: 0.024 MB\n    Count of filters: 1928\n```\n\n___\n\n### Optimizations in PebblesDB\n\nPebblesDB uses the FLSM data structure to logically arrange the sstables\non disk. FLSM helps in achieving high write throughput by reducing\nwrite amplification. But in FLSM, each guard can contain multiple\noverlapping sstables. 
Hence a read or seek over the database requires\nexamining one guard (multiple sstables) per level, thereby increasing\nthe read/seek latency. PebblesDB employs some optimizations to tackle\nthese challenges as follows:\n\n#### Read optimization\n\n* PebblesDB uses sstable-level bloom filters instead of the\n  block-level bloom filters used in HyperLevelDB or LevelDB. With this\n  optimization, even though a guard can contain multiple sstables,\n  PebblesDB effectively reads only one sstable from disk per level.\n\n* By default, this optimization is turned on, but it can be disabled\n  by commenting out the macro `#define FILE_LEVEL_FILTER` in\n  `db/version_set.h`. Remember to run `make db_bench` after making a\n  change.\n\n#### Seek optimization\n\nSstable-level bloom filters can't be used to reduce disk reads for\n`seek` operations since `seek` has to examine all files within a guard\neven if a file doesn't contain the key. To tackle this challenge,\nPebblesDB applies two optimizations:\n\n1. **Parallel seeks:** PebblesDB employs multiple threads to perform\n`seek()` operations on multiple files within a guard. Note that this\noptimization might only be helpful when the size of the data set is\nmuch larger than the RAM size because otherwise the overhead of thread\nsynchronization conceals the benefits obtained by using multiple\nthreads.  By default, this optimization is disabled. It can be\nenabled by uncommenting `#define SEEK_PARALLEL` in `db/version_set.h`.\n\n2. **Forced compaction:** When the workload is seek-heavy, PebblesDB\ncan be configured to do a seek-based forced compaction which aims to\nreduce the number of files within a guard. This can lead to an\nincrease in write IO, but this is a trade-off between write IO and\nseek throughput.  By default, this optimization is enabled. 
This can\nbe disabled by uncommenting `#define DISABLE_SEEK_BASED_COMPACTION` in\n`db/version_set.h`.\n\n___\n\n### Tuning PebblesDB\n\n* The amount of overhead PebblesDB has for read/seek workloads as well\n  as the amount of gain it has for write workloads depends on a single\n  parameter: `kMaxFilesPerGuardSentinel`, which determines the maximum\n  number of sstables that can be present within a single guard.\n\n* This parameter can be set in `db/dbformat.h` (default value:\n  2). Setting this parameter higher will favor write throughput while\n  setting it lower will favor read/seek throughput.\n\n---\n### Running YCSB Benchmarks\n\nThe Java Native Interface wrapper to PebblesDB is available [here](https://github.com/utsaslab/leveldbjni).\nPlease follow the instructions in the *Running YCSB Workloads with PebblesDB* section to run the YCSB benchmarks.\n\nThe YCSB bindings for PebblesDB can be found [here](https://github.com/utsaslab/YCSB/tree/master/pebblesdb).\n\n---\n### Improvements made after the SOSP paper\n\nThe following improvements were made to the codebase after the SOSP paper:\n\n- Add CMake build system support (Zeyuan Hu @xxks-kkk)\n- Add JNI Wrapper and support for running YCSB benchmarks (Abhijith Nair @abhijith97)\n- Accounting for memory used by bloom filters (Karuna Grewal @aakp10)\n\n\n---\n### Contact\n\nPlease contact us at `vijay@cs.utexas.edu` with any questions.  Drop\nus a note if you are using or plan to use PebblesDB in your company or\nuniversity.\n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Futsaslab%2Fpebblesdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Futsaslab%2Fpebblesdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Futsaslab%2Fpebblesdb/lists"}