RocksDB/LevelDB inspired key-value database in Go
https://github.com/cockroachdb/pebble

# Pebble [![Build Status](https://github.com/cockroachdb/pebble/actions/workflows/ci.yaml/badge.svg?branch=master)](https://github.com/cockroachdb/pebble/actions/workflows/ci.yaml) [![GoDoc](https://godoc.org/github.com/cockroachdb/pebble?status.svg)](https://godoc.org/github.com/cockroachdb/pebble) [Coverage](https://storage.googleapis.com/crl-codecover-public/pebble/index.html)

#### [Nightly benchmarks](https://cockroachdb.github.io/pebble/)

Pebble is a LevelDB/RocksDB inspired key-value store focused on
performance and internal usage by CockroachDB. Pebble inherits the
RocksDB file formats and a few extensions such as range deletion
tombstones, table-level bloom filters, and updates to the MANIFEST
format.

Pebble intentionally does not aspire to include every feature in RocksDB and
specifically targets the use case and feature set needed by CockroachDB:

* Block-based tables
* Checkpoints
* Indexed batches
* Iterator options (lower/upper bound, table filter)
* Level-based compaction
* Manual compaction
* Merge operator
* Prefix bloom filters
* Prefix iteration
* Range deletion tombstones
* Reverse iteration
* SSTable ingestion
* Single delete
* Snapshots
* Table-level bloom filters

RocksDB has a large number of features that are not implemented in
Pebble:

* Backups
* Column families
* Delete files in range
* FIFO compaction style
* Forward iterator / tailing iterator
* Hash table format
* Memtable bloom filter
* Persistent cache
* Pin iterator key / value
* Plain table format
* SSTable ingest-behind
* Sub-compactions
* Transactions
* Universal compaction style

***WARNING***: Pebble may silently corrupt data or behave incorrectly if
used with a RocksDB database that uses a feature Pebble doesn't
support. Caveat emptor!

## Production Ready

Pebble was introduced as an alternative storage engine to RocksDB in
CockroachDB v20.1 (released May 2020) and was used in production
successfully at that time. Pebble was made the default storage engine
in CockroachDB v20.2 (released Nov 2020). Pebble is being used in
production by users of CockroachDB at scale and is considered stable
and production ready.

## Advantages

Pebble offers several improvements over RocksDB:

* Faster reverse iteration via backwards links in the memtable's
  skiplist.
* Faster commit pipeline that achieves better concurrency.
* Seamless merged iteration of indexed batches. The mutations in the
  batch conceptually occupy another memtable level.
* L0 sublevels and flush splitting for concurrent compactions out of L0 and
  reduced read-amplification during heavy write load.
* Faster LSM edits in LSMs with large numbers of sstables through use of a
  copy-on-write B-tree to hold file metadata.
* Delete-only compactions that drop whole sstables that fall within the bounds
  of a range deletion.
* Block-property collectors and filters that enable iterators to skip tables,
  index blocks and data blocks that are irrelevant, according to user-defined
  properties over key-value pairs.
* Range keys API, allowing KV pairs defined over a range of keyspace with
  user-defined semantics and interleaved during iteration.
* Smaller, more approachable code base.

See the [Pebble vs RocksDB: Implementation
Differences](docs/rocksdb.md) doc for more details on implementation
differences.
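To make the delete-only compaction item above more concrete, here is a minimal sketch of the underlying check. It is a toy model using only the standard library, not Pebble's implementation: the `table` and `rangeDel` types and the `whollyDeleted` function are hypothetical names, and the sketch deliberately ignores sequence numbers and open snapshots, which a real engine has to account for before dropping a file.

```go
package main

import (
	"bytes"
	"fmt"
)

// table is a hypothetical stand-in for sstable metadata: the smallest and
// largest user keys stored in the file (both inclusive).
type table struct {
	name              string
	smallest, largest []byte
}

// rangeDel is a hypothetical range deletion covering the keys in [start, end).
type rangeDel struct {
	start, end []byte
}

// whollyDeleted returns the tables whose entire key span lies inside the
// range deletion. In this simplified model such tables hold no live keys,
// so they could be dropped without being read or rewritten.
func whollyDeleted(tables []table, rd rangeDel) []table {
	var out []table
	for _, t := range tables {
		// The end bound is exclusive, so largest must sort strictly before it.
		if bytes.Compare(rd.start, t.smallest) <= 0 && bytes.Compare(t.largest, rd.end) < 0 {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tables := []table{
		{"000001.sst", []byte("a"), []byte("c")},
		{"000002.sst", []byte("d"), []byte("f")},
		{"000003.sst", []byte("g"), []byte("k")},
	}
	rd := rangeDel{start: []byte("d"), end: []byte("h")}
	for _, t := range whollyDeleted(tables, rd) {
		fmt.Println(t.name) // prints 000002.sst
	}
}
```

Only `000002.sst` is reported: `000003.sst` overlaps the deleted range but extends past its end, so in this model it would still need an ordinary compaction to remove the covered keys.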

## RocksDB Compatibility

Pebble strives for forward compatibility with RocksDB 6.2.1 (the latest version
of RocksDB used by CockroachDB). Forward compatibility means that a DB generated
by RocksDB 6.2.1 can be upgraded for use by Pebble. Pebble versions in the `v1`
series may open DBs generated by RocksDB 6.2.1. Since its introduction, Pebble
has adopted various backwards-incompatible format changes that are gated behind
new 'format major versions'. The Pebble `master` branch does not support opening
DBs generated by RocksDB. DBs generated by RocksDB may only be used with recent
versions of Pebble after migrating them through format major version upgrades
using previous versions of Pebble. See the section on format major versions
below.

Even the RocksDB-compatible versions of Pebble only provide compatibility with
the subset of functionality and configuration used by CockroachDB. The scope of
RocksDB functionality and configuration is too large to adequately test and
document all the incompatibilities. The list below contains known
incompatibilities.

* Pebble's use of WAL recycling is only compatible with RocksDB's
  `kTolerateCorruptedTailRecords` WAL recovery mode. Older versions of
  RocksDB would automatically map incompatible WAL recovery modes to
  `kTolerateCorruptedTailRecords`. New versions of RocksDB will
  disable WAL recycling.
* Column families. Pebble does not support column families, nor does
  it attempt to detect their usage when opening a DB that may contain
  them.
* Hash table format. Pebble does not support the hash table sstable
  format.
* Plain table format. Pebble does not support the plain table sstable
  format.
* SSTable format version 3 and 4. Pebble does not support version 3
  and version 4 format sstables. The sstable format version is
  controlled by the `BlockBasedTableOptions::format_version` option.
  See [#97](https://github.com/cockroachdb/pebble/issues/97).

## Format major versions

Over time Pebble has introduced new physical file formats.
Backwards-incompatible changes are made through the introduction of 'format
major versions'. By default, when Pebble opens a database, it uses the lowest
supported version. In `v1`, this is `FormatMostCompatible`, which is
bi-directionally compatible with RocksDB 6.2.1 (with the caveats described
above).

Databases created by RocksDB or Pebble versions `v1` and earlier must be upgraded
to a compatible format major version before running newer Pebble versions. Newer
Pebble versions will refuse to open databases in no longer supported formats.

To opt into new formats, a user may set `FormatMajorVersion` on the
[`Options`](https://pkg.go.dev/github.com/cockroachdb/pebble#Options)
supplied to
[`Open`](https://pkg.go.dev/github.com/cockroachdb/pebble#Open), or
upgrade the format major version at runtime using
[`DB.RatchetFormatMajorVersion`](https://pkg.go.dev/github.com/cockroachdb/pebble#DB.RatchetFormatMajorVersion).
Format major version upgrades are permanent; there is no option to
return to an earlier format.

The table below outlines the history of format major versions, along with the
range of Pebble versions that support each format.

| Name | Value | Migration | Pebble support |
|------------------------------------|-------|------------|----------------|
| FormatMostCompatible | 1 | No | v1 |
| FormatVersioned | 3 | No | v1 |
| FormatSetWithDelete | 4 | No | v1 |
| FormatBlockPropertyCollector | 5 | No | v1 |
| FormatSplitUserKeysMarked | 6 | Background | v1 |
| FormatSplitUserKeysMarkedCompacted | 7 | Blocking | v1 |
| FormatRangeKeys | 8 | No | v1 |
| FormatMinTableFormatPebblev1 | 9 | No | v1 |
| FormatPrePebblev1Marked | 10 | Background | v1 |
| FormatSSTableValueBlocks | 12 | No | v1 |
| FormatFlushableIngest | 13 | No | v1, master |
| FormatPrePebblev1MarkedCompacted | 14 | Blocking | v1, master |
| FormatDeleteSizedAndObsolete | 15 | No | v1, master |
| FormatVirtualSSTables | 16 | No | v1, master |

Upgrading to a format major version with 'Background' in the migration
column may trigger background activity to rewrite physical file
formats, typically through compactions. Upgrading to a format major
version with 'Blocking' in the migration column will block until a
migration is complete. The database may continue to serve reads and
writes if upgrading a live database through
`RatchetFormatMajorVersion`, but the method call will not return until
the migration is complete.
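As a rough planning aid, the table above can be encoded as data to see which migrations an upgrade would cross. The sketch below is an illustration only and does not call Pebble: `formatVersion`, `versions`, and `migrationsBetween` are hypothetical names, the rows are copied from the table, and it assumes an upgrade passes through every intermediate format major version in order.

```go
package main

import "fmt"

// formatVersion mirrors one row of the format major version table.
type formatVersion struct {
	name      string
	value     int
	migration string // "No", "Background", or "Blocking"
}

// versions is copied from the table above, in ascending order of value.
var versions = []formatVersion{
	{"FormatMostCompatible", 1, "No"},
	{"FormatVersioned", 3, "No"},
	{"FormatSetWithDelete", 4, "No"},
	{"FormatBlockPropertyCollector", 5, "No"},
	{"FormatSplitUserKeysMarked", 6, "Background"},
	{"FormatSplitUserKeysMarkedCompacted", 7, "Blocking"},
	{"FormatRangeKeys", 8, "No"},
	{"FormatMinTableFormatPebblev1", 9, "No"},
	{"FormatPrePebblev1Marked", 10, "Background"},
	{"FormatSSTableValueBlocks", 12, "No"},
	{"FormatFlushableIngest", 13, "No"},
	{"FormatPrePebblev1MarkedCompacted", 14, "Blocking"},
	{"FormatDeleteSizedAndObsolete", 15, "No"},
	{"FormatVirtualSSTables", 16, "No"},
}

// migrationsBetween returns the versions in (from, to] whose migration
// column is not "No", i.e. the steps that rewrite data or block the caller.
func migrationsBetween(from, to int) []formatVersion {
	var out []formatVersion
	for _, v := range versions {
		if v.value > from && v.value <= to && v.migration != "No" {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Upgrading from FormatMostCompatible (1) to FormatVirtualSSTables (16).
	for _, v := range migrationsBetween(1, 16) {
		fmt.Printf("%-36s %2d %s\n", v.name, v.value, v.migration)
	}
}
```

For an upgrade from 1 to 16 this reports four steps: two Background migrations (values 6 and 10) and two Blocking migrations (values 7 and 14). A store already at `FormatFlushableIngest` (13) would cross only the Blocking step at 14.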

For reference, the table below lists the range of supported Pebble format major
versions for CockroachDB releases.

| CockroachDB release | Earliest supported | Latest supported |
|---------------------|------------------------------------|---------------------------|
| 20.1 through 21.1 | FormatMostCompatible | FormatMostCompatible |
| 21.2 | FormatMostCompatible | FormatSetWithDelete |
| 22.1 | FormatMostCompatible | FormatSplitUserKeysMarked |
| 22.2 | FormatMostCompatible | FormatPrePebblev1Marked |
| 23.1 | FormatSplitUserKeysMarkedCompacted | FormatFlushableIngest |
| 23.2 | FormatSplitUserKeysMarkedCompacted | FormatVirtualSSTables |

## Pedigree

Pebble is based on the incomplete Go version of LevelDB:

https://github.com/golang/leveldb

The Go version of LevelDB is based on the C++ original:

https://github.com/google/leveldb

Optimizations and inspiration were drawn from RocksDB:

https://github.com/facebook/rocksdb

## Getting Started

### Example Code

```go
package main

import (
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// Open (or create) a database in the "demo" directory with default options.
	db, err := pebble.Open("demo", &pebble.Options{})
	if err != nil {
		log.Fatal(err)
	}
	key := []byte("hello")
	// pebble.Sync requests that the write be synced to disk.
	if err := db.Set(key, []byte("world"), pebble.Sync); err != nil {
		log.Fatal(err)
	}
	// The returned value remains valid only until closer is closed.
	value, closer, err := db.Get(key)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s %s\n", key, value)
	if err := closer.Close(); err != nil {
		log.Fatal(err)
	}
	if err := db.Close(); err != nil {
		log.Fatal(err)
	}
}
```