Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/internalfx/regrid-spec
The official ReGrid specification for RethinkDB
https://github.com/internalfx/regrid-spec
Last synced: 17 days ago
JSON representation
The official ReGrid specification for RethinkDB
- Host: GitHub
- URL: https://github.com/internalfx/regrid-spec
- Owner: internalfx
- License: isc
- Created: 2016-02-04T20:32:34.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-07-05T18:04:03.000Z (over 8 years ago)
- Last Synced: 2024-10-03T12:20:20.199Z (about 1 month ago)
- Size: 11.7 KB
- Stars: 9
- Watchers: 6
- Forks: 2
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-starred - internalfx/regrid-spec - The official ReGrid specification for RethinkDB (others)
README
ReGrid Spec
===========This is the official spec for the [ReGrid](https://github.com/internalfx/regrid) Nodejs library.
ReGrid is meant to be an open specification. You are encouraged to implement a compatible driver in your chosen language.
- Authors [**Bryan Morris**](https://github.com/internalfx), [**Brian Chavez**](https://github.com/bchavez)
- Advisors [**Daniel Mewes**](https://github.com/danielmewes)ReGrid is a method of storing large files inside a RethinkDB database.
### Features
- **Reliable** - Files are replicated across the cluster, benefiting from RethinkDB's automatic failover.
- **Scalable** - Easily store large files in RethinkDB, distributed across the cluster.
- **Consistent** - Sha256 hashes are calculated when the file is written, and verified when read back out.### Contents
- [Document Structure](#document-structure)
- [Files Record](#files-record)
- [Chunks Record](#chunks-record)
- [API](#api)
- [Contructor(connectionOptions, bucketOptions)](#new-regridconnectionoptions-bucketoptions)
- [initBucket()](#bucketinitbucket)
- [createWriteStream(filename, options)](#bucketcreatewritestreamfilename-options)
- [createReadStreamById(file_id)](#createreadstreambyidfile_id)
- [createReadStreamByFilename(filename, options)](#createreadstreambyfilenamefilename-options)### Overview
When a file is written to ReGrid, a **files** record is written to a **files table**. Then the file is broken up into **chunks** which are written as separate records in a **chunks table**. Once all the chunks are written, the **files** record is updated to show that the file is `Complete`. The file is now ready for read operations.
### Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
Document Structure
==================#### Files record
```json
{
"id" : "",
"length" : "",
"chunkSizeBytes" : "",
"finishedAt" : "| Key | Description |
|---|---|
| id | a unique ID for this document. |
| length | the length of this stored file, in bytes. |
| chunkSizeBytes | the size, in bytes, of each data chunk of this file. This value is configurable by file. The default is 255KB (1024 * 255). |
| finishedAt | the date and time this file finished writing to ReGrid. The value of this field MUST be the datetime when the upload completed, not the datetime when it was begun. |
| startedAt | the date and time this file started writing to ReGrid. The value of this field MUST be the datetime when the upload started, not the datetime when it was finished. |
| deletedAt | the date and time this files status was set to `Deleted`. The value of this field MUST be the datetime when file was marked `Deleted`. |
| sha256 | SHA256 checksum for this user file, computed from the file’s data, stored as a hex string (lowercase). |
| filename | the name of this stored file; this does not need to be unique. |
| status | Status may be "Complete" or "Incomplete" or "Deleted". |
| metadata | any additional application data the user wishes to store. |#### Chunks record
```json
{
"id": "",
"file_id": "",
"num": "",
"data": ""
}
```| Key | Description |
|---|---|
| id | a unique ID for this document. |
| file_id | the id for this file (the id from the files table document). |
| num | the index number of this chunk, zero-based |
| data | a chunk of data from the user file |API
===**Note:** Code examples are offered to give a sense of the API design of ReGrid. Adapt to your chosen language as necessary.
### `new ReGrid(connectionOptions, bucketOptions)`
ReGrid drivers MUST provide a constructor to return a new `Bucket` instance, which exposes all the public API methods.
`connectionOptions` MAY be an existing connection, if that is more suitable to your chosen language.
###### Code Example
```javascript
var connectionOptions = {
// required connection options. Adapt to your chosen language.
}var bucketOptions = {
bucketName: 'fs',
chunkSizeBytes: 1024 * 255, // 255KB SHOULD be the default chunk size.
concurrency: 10 // OPTIONAL - useful if you are writing files asynchronously
}var bucket = new ReGrid(connectionOptions, bucketOptions)
bucket // a new bucket instance
```---
### `bucket.initBucket()`
ReGrid drivers MUST provied a method to create required tables and indexes.
##### Table Names
Two tables MUST be created for ReGrid to function, the 'files' table and the 'chunks' table. Tables MUST be a combination of the `bucketName` followed by an underscore and the table type. Given the default `bucketName` of 'fs' the files table MUST be named `fs_files` and the chunks table MUST be named `fs_chunks`
The driver MUST check whether the tables already exist before creating them. If creating the tables fails the driver MUST return an error.
##### Indexes
For efficient retrieval of files and chunks, a few indexes are required by ReGrid. Indexes MUST be named as shown below.
```javascript
r.table('').indexCreate('file_ix', [r.row('status'), r.row('filename'), r.row('finishedAt')])r.table('').indexCreate('chunk_ix', [r.row('file_id'), r.row('num')])
```The driver MUST check whether the indexes already exist before creating them. If creating the indexes fails the driver MUST return an error.
###### Code Example
```javascript
var bucket = new ReGrid(connectionOptions, bucketOptions)// Takes no arguments, and is asynchronous. Node.js ReGrid library returns a promise, adapt to your chosen language.
bucket.initBucket().then(function () {
// Tables and indexes MUST now be ready for use.
// use tableWait() and indexWait()
})
```---
### `bucket.createWriteStream(filename, options)`
Drivers SHOULD use their languages built-in stream abstraction. Otherwise, they MUST provide their own.
###### Code Example
```javascript
// An options object MAY be passed in. All fields are optional.
var options = {
chunkSizeBytes: 1024 * 255,
metadata: null
}bucket.createWriteStream(filename, options) // returns a stream
```---
### `createReadStreamById(file_id)`
Get a `readStream` by `id`
###### Code Example
```javascript
bucket.createReadStreamById(file_id) // returns a stream
```---
### `createReadStreamByFilename(filename, options)`
Get a `readStream` by `filename`. Since filenames are not unique, there can be multiple "revisions" of a file. A user may optionally specify a `revision` in the options object.
###### Code Example
```javascript
var options = {
// Revision numbers are defined as follows:
// 0 = the original stored file
// 1 = the first revision
// 2 = the second revision
// etc…
// -2 = the second most recent revision
// -1 = the most recent revision
// Defaults to -1 (the most recent revision).revision: -1
}bucket.createReadStreamByFilename(filename, options) // returns a stream
```### Finding Files
TODO
### Deleting Files
TODO
### Renaming Files
TODO
### Maintenance Operations
TODO