https://github.com/nukep/protos-db

A set of filesystem-based databases for iterative prototyping for Node.js

Host: GitHub
URL: https://github.com/nukep/protos-db
Owner: nukep
Created: 2018-07-27T07:54:05.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2022-07-07T08:41:44.000Z (almost 4 years ago)
Last Synced: 2026-01-02T12:43:41.647Z (6 months ago)
Language: JavaScript
Homepage:
Size: 146 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 6
Metadata Files:
- Readme: README.md

# NOTE: Extremely unstable, work in progress

# What's this?

This is a filesystem-based database library I quickly wrote to assist with iterative prototyping. This is intended as a holdover for **Node.js** projects that don't yet implement a "real" database.

## Types of databases

So far, there are two types of databases:

* Blob database

* Time Record database (table-oriented, append-only, indexed by timestamp)

You should choose the database(s) that are most relevant to your use cases.

## Immutable, addressable data is stored as "Blobs"

Each blob is a gzip-compressed string that's addressable by its SHA1 hash.

Why on earth would you want this?

* a) To avoid persisting the same data twice, which saves disk space. This is desirable for sparse-yet-large data that you want to normalize.

* b) To compress file contents that contain lots of repetition, such as JSON.

Usage:

```js
const { createBlobDb } = require('protos-db')

const blobDb = createBlobDb('./path/to/my-blob-db')

blobDb.persistDataAsBlob("Hello World!").then(hash => {
  console.log(`Persisted Hash: ${hash}`)
})
// => Prints "Persisted Hash: 2ef7bde608ce5404e97d5f042f95f89f1c232871"

// later on in your program...
blobDb.readBlobDataAsString("2ef7bde608ce5404e97d5f042f95f89f1c232871").then(contents => {
  console.log(`Contents: ${contents}`)
})
// => Prints "Contents: Hello World!"
```
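For intuition, the idea of content-addressed, compressed blobs can be sketched with nothing but Node's built-in `crypto` and `zlib` modules. This is not protos-db's implementation: `sha1Hex` and `MemoryBlobStore` are hypothetical names, and the sketch keeps blobs in a `Map` rather than on disk. It only illustrates the two motivations listed above, deduplication and compression.

```js
const crypto = require('crypto')
const zlib = require('zlib')

// Hypothetical helper: hex-encoded SHA-1 digest of a UTF-8 string.
function sha1Hex(text) {
  return crypto.createHash('sha1').update(text, 'utf8').digest('hex')
}

// Hypothetical in-memory stand-in for a blob store (not the protos-db API).
class MemoryBlobStore {
  constructor() {
    this.blobs = new Map() // hash -> gzip-compressed Buffer
  }

  persist(text) {
    const hash = sha1Hex(text)
    // Identical content always hashes to the same key, so it is stored once.
    if (!this.blobs.has(hash)) {
      this.blobs.set(hash, zlib.gzipSync(Buffer.from(text, 'utf8')))
    }
    return hash
  }

  read(hash) {
    const compressed = this.blobs.get(hash)
    if (compressed === undefined) throw new Error(`Unknown blob: ${hash}`)
    return zlib.gunzipSync(compressed).toString('utf8')
  }
}

const store = new MemoryBlobStore()
const first = store.persist('hello world')
const second = store.persist('hello world') // duplicate, not stored again
console.log(first === second, store.blobs.size)

// Repetitive JSON usually compresses well: compare raw and stored sizes.
const json = JSON.stringify(
  Array.from({ length: 200 }, (_, i) => ({ id: i, status: 'ok' }))
)
const jsonHash = store.persist(json)
console.log(Buffer.byteLength(json), store.blobs.get(jsonHash).length)
```

Persisting the same string twice returns the same hash and leaves a single entry in the store, which is the deduplication property the blob database relies on.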

## Blob data structure, tree

A blob directory might look something like this:

```
my-blob-db
├── 2a
│   └── ae6c35c94fcfb415dbe95f408b9ce91ee846ed.gz
├── 87
│   └── 2e18a933e6c41dc5abe6c29a38b58959c8112b.gz
└── b8
    └── dfb080bc33fb564249e34252bf143d88fc018f.gz
```

The database above contains three blobs:

* `2aae6c35c94fcfb415dbe95f408b9ce91ee846ed`

* `872e18a933e6c41dc5abe6c29a38b58959c8112b`

* `b8dfb080bc33fb564249e34252bf143d88fc018f`

If you read the blob, you get a string:

```
$ zcat my-blob-db/2a/ae6c35c94fcfb415dbe95f408b9ce91ee846ed.gz
hello world
$
```

If you rehash the uncompressed string, you get the directory+filename back:

```
$ zcat my-blob-db/2a/ae6c35c94fcfb415dbe95f408b9ce91ee846ed.gz | sha1sum
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed  -
$
```
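The same integrity check can be sketched in Node.js. This is a minimal sketch that assumes the on-disk layout shown above (a two-character directory, then the remaining 38 characters plus `.gz`). `verifyBlob` is a hypothetical helper, not part of protos-db, and the demo writes its own sample blob into a temporary directory so that it runs standalone.

```js
const crypto = require('crypto')
const fs = require('fs')
const os = require('os')
const path = require('path')
const zlib = require('zlib')

// Hypothetical helper: decompress a blob file, rehash its contents, and
// compare the result with the hash encoded in its directory + file name.
function verifyBlob(blobFile) {
  const dir = path.basename(path.dirname(blobFile)) // e.g. "2a"
  const rest = path.basename(blobFile, '.gz')       // remaining 38 hex chars
  const contents = zlib.gunzipSync(fs.readFileSync(blobFile))
  const actual = crypto.createHash('sha1').update(contents).digest('hex')
  return actual === dir + rest
}

// Demo: create a sample blob in a temp directory, then verify it.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'blob-demo-'))
const sampleFile = path.join(root, '2a', 'ae6c35c94fcfb415dbe95f408b9ce91ee846ed.gz')
fs.mkdirSync(path.dirname(sampleFile), { recursive: true })
fs.writeFileSync(sampleFile, zlib.gzipSync(Buffer.from('hello world', 'utf8')))

console.log(verifyBlob(sampleFile))
```

A check like this could be handy when troubleshooting: a blob whose contents no longer hash to its own path has been corrupted or edited by hand.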

Notice how the first two characters of the SHA1 are used for the directory names.

This is inspired by Git and by CDNs that do the same. There are three reasons for it:

* The first reason is performance. Some file systems (for example Ext2, or Ext3 without directory indexing) scan the entries in a directory linearly: they start from the top and check them one by one until a match is found. Smaller directories keep those scans short.

* The second reason is to stay well within any per-directory limits the file system imposes. On Ext3, for example, a directory can hold only about 32,000 subdirectories.

* The last reason is for humans. If you're troubleshooting the database, you don't want to `ls` the directory and get spammed with tens of thousands of files up-front. By fanning out to 256 buckets, 10,000 files become about 39 per directory.
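The fan-out described above can be sketched as a pure function from hash to path. `blobPathForHash` is a hypothetical name, and the layout is inferred from the directory listing shown earlier rather than taken from protos-db's source.

```js
const path = require('path')

// Hypothetical helper: map a 40-character SHA-1 hex digest to its blob path.
function blobPathForHash(rootDir, hash) {
  if (!/^[0-9a-f]{40}$/.test(hash)) {
    throw new Error(`Not a SHA-1 hex digest: ${hash}`)
  }
  // The first two hex characters select one of 16 * 16 = 256 buckets.
  return path.join(rootDir, hash.slice(0, 2), `${hash.slice(2)}.gz`)
}

console.log(blobPathForHash('my-blob-db', '2aae6c35c94fcfb415dbe95f408b9ce91ee846ed'))

// With evenly distributed hashes, 10,000 blobs average about 39 per bucket.
const bucketCount = 16 ** 2
console.log(Math.round(10000 / bucketCount))
```

Because SHA-1 output is effectively uniform, the buckets fill at roughly the same rate, so no single directory grows much faster than the others.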

## Append-only data is stored as records in tables

All records are indexed by calendar day.

Usage:

```js
const { createRecordDb } = require('protos-db')

// We must additionally supply a function to extract the ISO 8601 timestamp.
// Every record in every table must provide an ISO 8601 timestamp somehow, e.g. as a field.
const recordDb = createRecordDb('./path/to/my-record-db', record => record.timestamp)

// There's new information about Marge Simpson!
recordDb.appendRecordToTable('marge-simpson-info', {
  timestamp: "2018-07-27T09:56:22Z",
  name: "Marge Simpson",
  address: "123 Fake Street",
  children: 3
}).then(() => {
  console.log(`Appended record`)
})

// later on in your program...
recordDb.readLatestRecord('marge-simpson-info').then(record => {
  console.log(`Record: ${JSON.stringify(record)}`)
})
// => Prints "Record: {"timestamp":"2018-07-27T09:56:22Z","name":"Marge Simpson","address":"123 Fake Street","children":3}"
```
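How records might be grouped by calendar day can be sketched as follows. The real storage format of protos-db is not described in this README, so everything below is an assumption for illustration: `dayKey` and `DayIndexedTable` are hypothetical names, days are taken in UTC, "latest" is taken to mean the greatest timestamp, and the table lives in memory.

```js
// Hypothetical helper: reduce an ISO 8601 timestamp to its UTC calendar day.
function dayKey(isoTimestamp) {
  const date = new Date(isoTimestamp)
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid timestamp: ${isoTimestamp}`)
  }
  return date.toISOString().slice(0, 10) // "YYYY-MM-DD"
}

// Hypothetical in-memory, append-only table indexed by calendar day.
class DayIndexedTable {
  constructor(getTimestamp) {
    this.getTimestamp = getTimestamp
    this.days = new Map() // "YYYY-MM-DD" -> records in append order
  }

  append(record) {
    const key = dayKey(this.getTimestamp(record))
    if (!this.days.has(key)) this.days.set(key, [])
    this.days.get(key).push(record)
  }

  latest() {
    if (this.days.size === 0) return undefined
    // "YYYY-MM-DD" keys sort chronologically as plain strings,
    // so only the newest day's records need to be inspected.
    const newestDay = [...this.days.keys()].sort().pop()
    const time = record => Date.parse(this.getTimestamp(record))
    return this.days
      .get(newestDay)
      .reduce((best, record) => (time(record) >= time(best) ? record : best))
  }
}

const table = new DayIndexedTable(record => record.timestamp)
table.append({ timestamp: '2018-07-27T09:56:22Z', children: 3 })
table.append({ timestamp: '2018-07-26T18:00:00Z', children: 2 })
console.log(dayKey('2018-07-27T09:56:22Z'), table.latest().children)
```

Indexing by day means a "latest record" lookup only has to examine the most recent day's records instead of the whole table.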
