https://github.com/tdooner/sstables

Experimental implementation of SSTables in Ruby
https://github.com/tdooner/sstables

Last synced: 3 months ago
JSON representation

Experimental implementation of SSTables in Ruby

Host: GitHub
URL: https://github.com/tdooner/sstables
Owner: tdooner
Created: 2016-02-27T08:14:19.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2016-02-27T08:27:30.000Z (over 9 years ago)
Last Synced: 2025-01-15T00:29:10.061Z (5 months ago)
Language: Ruby
Size: 6.84 KB
Stars: 0
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        Playing with SSTables

======================

I'm only [10 years late to the party here][bigtable], but since we are

deploying Cassandra at work and I didn't learn anything about SSTables or

Log-Structured-Merge in data structures class, a little bit of homework is

required.

This repository implements a basic key-value store with an API modelled after

LevelDB:

```ruby

> table = SSTable.new('/path/to/workdir')

> table.set 'foo', 'bar'

> table.get 'foo'

=> 'bar'

> table.flush                # (writes to disk)

> table2 = SSTable.new('/path/to/workdir')

> table2 = table.get 'foo'

=> 'bar'

> table2.delete 'foo'

> table2.get 'foo'

=> nil

```

This implementation has two limitations:

1. all keys & values must be strings

2. keys must not contain null bytes

Implementation Details

-----------------

Inside of the directory given as a parameter to `SSTable.new`, two files are

created:

* **index**

* **table**

The **index** file contains a serialization of a Ruby hash.  The keys are the

keys the user inserted into the SSTable and the values are the byte offset of

that entry in the SSTable.

The **table** file contains a list of entries in the format:

    4-byte int - Length of Key

    4-byte int - Length of Value

    n-byte utf8 - Key

    m-byte utf8 - Value

I chose to use the length headers as a hack for easy iteration over the table

file, although I'm pretty sure it would be possible to iterate over the file

using offsets found in the index file. This would be preferable because it

lessens the storage overhead per-kv-pair, but would either require sorting the

index offsets (to avoid tons of disk seeks when iterating) or sorting the keys

and thus offsets (as LevelDB does, but I don't yet implement).

When get/set/delete operations are performed, they are not immediately written

to disk. Rather, they are applied in-memory to a **memtable** (which is just a

combination of a Ruby Hash and a Set of keys to remove). When the user calls

`SSTable#flush` then the contents of the memtable are merged with the SSTable

on disk.

Testing

-----------------

```bash

gem install rspec

rspec

```

[bigtable]: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tdooner/sstables

Awesome Lists containing this project

README