Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tristanls/k-bucket
Kademlia DHT K-bucket implementation as a binary tree
https://github.com/tristanls/k-bucket
dht
Last synced: 7 days ago
JSON representation
Kademlia DHT K-bucket implementation as a binary tree
- Host: GitHub
- URL: https://github.com/tristanls/k-bucket
- Owner: tristanls
- License: other
- Created: 2013-07-05T05:18:37.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2023-04-28T06:46:56.000Z (over 1 year ago)
- Last Synced: 2024-05-15T21:32:02.907Z (8 months ago)
- Topics: dht
- Language: JavaScript
- Size: 188 KB
- Stars: 155
- Watchers: 11
- Forks: 33
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-peer-to-peer - k-bucket - bucket implementation (Modules)
- awesome-peer-to-peer - k-bucket - bucket implementation (Modules)
- awesome-network-js - k-bucket - bucket implementation as a binary tree. (Protocols)
README
# k-bucket
_Stability: 2 - [Stable](https://github.com/tristanls/stability-index#stability-2---stable)_
[![NPM version](https://badge.fury.io/js/k-bucket.png)](http://npmjs.org/package/k-bucket)
Kademlia DHT K-bucket implementation as a binary tree.
## Contributors
[@tristanls](https://github.com/tristanls), [@mikedeboer](https://github.com/mikedeboer), [@deoxxa](https://github.com/deoxxa), [@feross](https://github.com/feross), [@nathanph](https://github.com/nathanph), [@allouis](https://github.com/allouis), [@fanatid](https://github.com/fanatid), [@robertkowalski](https://github.com/robertkowalski), [@nazar-pc](https://github.com/nazar-pc), [@jimmywarting](https://github.com/jimmywarting), [@achingbrain](https://github.com/achingbrain)
## Installation
npm install k-bucket
## Tests
npm test
## Usage
```javascript
const KBucket = require('k-bucket')const kBucket1 = new KBucket({
localNodeId: Buffer.from('my node id') // default: random data
})
// or without using Buffer (for example, in the browser)
const id = 'my node id'
const nodeId = new Uint8Array(id.length)
for (let i = 0, len = nodeId.length; i < len; ++i) {
nodeId[i] = id.charCodeAt(i)
}
const kBucket2 = new KBucket({
localNodeId: nodeId // default: random data
})
```## Overview
A [*Distributed Hash Table (DHT)*](http://en.wikipedia.org/wiki/Distributed_hash_table) is a decentralized distributed system that provides a lookup table similar to a hash table.
*k-bucket* is an implementation of a storage mechanism for keys within a DHT. It stores `contact` objects which represent locations and addresses of nodes in the decentralized distributed system. `contact` objects are typically identified by a SHA-1 hash, however this restriction is lifted in this implementation. Additionally, node ids of different lengths can be compared.
This Kademlia DHT k-bucket implementation is meant to be as minimal as possible. It assumes that `contact` objects consist only of `id`. It is useful, and necessary, to attach other properties to a `contact`. For example, one may want to attach `ip` and `port` properties, which allow an application to send IP traffic to the `contact`. However, this information is extraneous and irrelevant to the operation of a k-bucket.
### arbiter function
This *k-bucket* implementation implements a conflict resolution mechanism using an `arbiter` function. The purpose of the `arbiter` is to choose between two `contact` objects with the same `id` but perhaps different properties and determine which one should be stored. As the `arbiter` function returns the actual object to be stored, it does not need to make an either/or choice, but instead could perform some sort of operation and return the result as a new object that would then be stored. See [kBucket._update(node, index, contact)](#kbucket_updatenode-index-contact) for detailed semantics of which `contact` (`incumbent` or `candidate`) is selected.
For example, an `arbiter` function implementing a `vectorClock` mechanism would look something like:
```javascript
// contact example
var contact = {
id: Buffer.from('contactId'),
vectorClock: 0
};function arbiter(incumbent, candidate) {
if (incumbent.vectorClock > candidate.vectorClock) {
return incumbent;
}
return candidate;
};
```Alternatively, consider an arbiter that implements a Grow-Only-Set CRDT mechanism:
```javascript
// contact example
const contact = {
id: Buffer.from('workerService'),
workerNodes: {
'17asdaf7effa2': { host: '127.0.0.1', port: 1337 },
'17djsyqeryasu': { host: '127.0.0.1', port: 1338 }
}
};function arbiter(incumbent, candidate) {
// we create a new object so that our selection is guaranteed to replace
// the incumbent
const merged = {
id: incumbent.id, // incumbent.id === candidate.id within an arbiter
workerNodes: incumbent.workerNodes
}Object.keys(candidate.workerNodes).forEach(workerNodeId => {
merged.workerNodes[workerNodeId] = candidate.workerNodes[workerNodeId];
})return merged
}
```Notice that in the above case, the Grow-Only-Set assumes that each worker node has a globally unique id.
## Documentation
### KBucket
Implementation of a Kademlia DHT k-bucket used for storing contact (peer node) information.
For a step by step example of k-bucket operation you may find the following slideshow useful: [Distribute All The Things](https://docs.google.com/presentation/d/11qGZlPWu6vEAhA7p3qsQaQtWH7KofEC9dMeBFZ1gYeA/edit#slide=id.g1718cc2bc_0661).
KBucket starts off as a single k-bucket with capacity of _k_. As contacts are added, once the _k+1_ contact is added, the k-bucket is split into two k-buckets. The split happens according to the first bit of the contact node id. The k-bucket that would contain the local node id is the "near" k-bucket, and the other one is the "far" k-bucket. The "far" k-bucket is marked as _don't split_ in order to prevent further splitting. The contact nodes that existed are then redistributed along the two new k-buckets and the old k-bucket becomes an inner node within a tree data structure.
As even more contacts are added to the "near" k-bucket, the "near" k-bucket will split again as it becomes full. However, this time it is split along the second bit of the contact node id. Again, the two newly created k-buckets are marked "near" and "far" and the "far" k-bucket is marked as _don't split_. Again, the contact nodes that existed in the old bucket are redistributed. This continues as long as nodes are being added to the "near" k-bucket, until the number of splits reaches the length of the local node id.
As more contacts are added to the "far" k-bucket and it reaches its capacity, it does not split. Instead, the k-bucket emits a "ping" event (register a listener: `kBucket.on('ping', function (oldContacts, newContact) {...});` and includes an array of old contact nodes that it hasn't heard from in a while and requires you to confirm that those contact nodes still respond (literally respond to a PING RPC). If an old contact node still responds, it should be re-added (`kBucket.add(oldContact)`) back to the k-bucket. This puts the old contact on the "recently heard from" end of the list of nodes in the k-bucket. If the old contact does not respond, it should be removed (`kBucket.remove(oldContact.id)`) and the new contact being added now has room to be stored (`kBucket.add(newContact)`).
**Public API**
* [KBucket.arbiter(incumbent, candidate)](#kbucketarbiterincumbent-candidate)
* [KBucket.distance(firstId, secondId)](#kbucketdistancefirstid-secondid)
* [new KBucket(options)](#new-kbucketoptions)
* [kBucket.add(contact)](#kbucketaddcontact)
* [kBucket.closest(id [, n = Infinity])](#kbucketclosestid--n--infinity)
* [kBucket.count()](#kbucketcount)
* [kBucket.get(id)](#kbucketgetid)
* [kBucket.metadata](#kbucketmetadata)
* [kBucket.remove(id)](#kbucketremoveid)
* [kBucket.toArray()](#kbuckettoarray)
* [kBucket.toIterable()](#kbuckettoiterable)
* [Event 'added'](#event-added)
* [Event 'ping'](#event-ping)
* [Event 'removed'](#event-removed)
* [Event 'updated'](#event-updated)#### KBucket.arbiter(incumbent, candidate)
* `incumbent`: _Object_ Contact currently stored in the k-bucket.
* `candidate`: _Object_ Contact being added to the k-bucket.
* Return: _Object_ Contact to updated the k-bucket with.Default arbiter function for contacts with the same `id`. Uses `contact.vectorClock` to select which contact to update the k-bucket with. Contact with larger `vectorClock` field will be selected. If `vectorClock` is the same, `candidat` will be selected.
#### KBucket.distance(firstId, secondId)
* `firstId`: _Uint8Array_ Uint8Array containing first id.
* `secondId`: _Uint8Array_ Uint8Array containing second id.
* Return: _Integer_ The XOR distance between `firstId` and `secondId`.Default distance function. Finds the XOR distance between firstId and secondId.
#### new KBucket(options)
* `options`:
* `arbiter`: _Function_ _(Default: vectorClock arbiter)_
`function (incumbent, candidate) { return contact; }` An optional `arbiter` function that given two `contact` objects with the same `id` returns the desired object to be used for updating the k-bucket. For more details, see [arbiter function](#arbiter-function).
* `distance`: _Function_
`function (firstId, secondId) { return distance }` An optional `distance` function that gets two `id` Uint8Arrays and return distance (as number) between them.
* `localNodeId`: _Uint8Array_ An optional Uint8Array representing the local node id. If not provided, a local node id will be created via `crypto.randomBytes(20)`.
* `metadata`: _Object_ _(Default: {})_ Optional satellite data to include with the k-bucket. `metadata` property is guaranteed not be altered, it is provided as an explicit container for users of k-bucket to store implementation-specific data.
* `numberOfNodesPerKBucket`: _Integer_ _(Default: 20)_ The number of nodes that a k-bucket can contain before being full or split.
* `numberOfNodesToPing`: _Integer_ _(Default: 3)_ The number of nodes to ping when a bucket that should not be split becomes full. KBucket will emit a `ping` event that contains `numberOfNodesToPing` nodes that have not been contacted the longest.Creates a new KBucket.
#### kBucket.add(contact)
* `contact`: _Object_ The contact object to add.
* `id`: _Uint8Array_ Contact node id.
* Any satellite data that is part of the `contact` object will not be altered, only `id` is used.
* Return: _Object_ The k-bucket itself.Adds a `contact` to the k-bucket.
#### kBucket.closest(id [, n = Infinity])
* `id`: _Uint8Array_ Contact node id.
* `n`: _Integer_ _(Default: Infinity)_ The maximum number of closest contacts to return.
* Return: _Array_ Maximum of `n` closest contacts to the node id.Get the `n` closest contacts to the provided node id. "Closest" here means: closest according to the XOR metric of the `contact` node id.
#### kBucket.count()
* Return: _Number_ The number of contacts held in the tree
Counts the total number of contacts in the tree.
#### kBucket.get(id)
* `id`: _Uint8Array_ The ID of the `contact` to fetch.
* Return: _Object_ The `contact` if available, otherwise nullRetrieves the `contact`.
#### kBucket.metadata
* `metadata`: _Object_ _(Default: {})_
The `metadata` property serves as a container that can be used by implementations using k-bucket. One example is storing a timestamp to indicate the last time when a node in the bucket was responding to a ping.
#### kBucket.remove(id)
* `id`: _Uint8Array_ The ID of the `contact` to remove.
* Return: _Object_ The k-bucket itself.Removes `contact` with the provided `id`.
#### kBucket.toArray()
* Return: _Array_ All of the contacts in the tree, as an array
Traverses the tree, putting all the contacts into one array.
#### kBucket.toIterable()
* Return: _Iterable_ All of the contacts in the tree, as an iterable
Traverses the tree, yielding contacts as they are encountered.
#### kBucket._determineNode(node, id [, bitIndex = 0])
_**CAUTION: reserved for internal use**_
* `node`: internal object that has 2 leafs: left and right
* `id`: _Uint8Array_ Id to compare `localNodeId` with.
* `bitIndex`: _Integer_ _(Default: 0)_ The bit index to which bit to check in the `id` Uint8Array.
* Return: _Object_ left leaf if `id` at `bitIndex` is 0, right leaf otherwise.#### kBucket._indexOf(id)
_**CAUTION: reserved for internal use**_
* `node`: internal object that has 2 leafs: left and right
* `id`: _Uint8Array_ Contact node id.
* Return: _Integer_ Index of `contact` with provided `id` if it exists, -1 otherwise.Returns the index of the `contact` with provided `id` if it exists, returns -1 otherwise.
#### kBucket._split(node [, bitIndex])
_**CAUTION: reserved for internal use**_
* `node`: _Object_ node for splitting
* `bitIndex`: _Integer_ _(Default: 0)_ The bit index to which bit to check in the `id` Uint8Array.Splits the node, redistributes contacts to the new nodes, and marks the node that was split as an inner node of the binary tree of nodes by setting `self.contacts = null`. Also, marks the "far away" node as `dontSplit`.
#### kBucket._update(node, index, contact)
_**CAUTION: reserved for internal use**_
* `node`: internal object that has 2 leafs: left and right
* `index`: _Integer_ The index in the bucket where contact exists (index has already been computed in previous calculation).
* `contact`: _Object_ The contact object to update.
* `id`: _Uint8Array_ Contact node id
* Any satellite data that is part of the `contact` object will not be altered, only `id` is used.Updates the `contact` by using the `arbiter` function to compare the incumbent and the candidate. If `arbiter` function selects the old `contact` but the candidate is some new `contact`, then the new `contact` is abandoned. If `arbiter` function selects the old `contact` and the candidate is that same old `contact`, the `contact` is marked as most recently contacted (by being moved to the right/end of the bucket array). If `arbiter` function selects the new `contact`, the old `contact` is removed and the new `contact` is marked as most recently contacted.
#### Event: 'added'
* `newContact`: _Object_ The new contact that was added.
Emitted only when `newContact` was added to bucket and it was not stored in the bucket before.
#### Event: 'ping'
* `oldContacts`: _Array_ The array of contacts to ping.
* `newContact`: _Object_ The new contact to be added if one of old contacts does not respond.Emitted every time a contact is added that would exceed the capacity of a _don't split_ k-bucket it belongs to.
#### Event: 'removed'
* `contact`: _Object_ The contact that was removed.
Emitted when `contact` was removed from the bucket.
#### Event: 'updated'
* `oldContact`: _Object_ The contact that was stored prior to the update.
* `newContact`: _Object_ The new contact that is now stored after the update.Emitted when a previously existing ("previously existing" means `oldContact.id` equals `newContact.id`) contact was added to the bucket and it was replaced with `newContact`.
## Releases
[Current releases](https://github.com/tristanls/k-bucket/releases).
### Policy
We follow the semantic versioning policy ([semver.org](http://semver.org/)) with a caveat:
> Given a version number MAJOR.MINOR.PATCH, increment the:
>
>MAJOR version when you make incompatible API changes,
>MINOR version when you add functionality in a backwards-compatible manner, and
>PATCH version when you make backwards-compatible bug fixes.**caveat**: Major version zero is a special case indicating development version that may make incompatible API changes without incrementing MAJOR version.
## Sources
The implementation has been sourced from:
- [A formal specification of the Kademlia distributed hash table](http://maude.sip.ucm.es/kademlia/files/pita_kademlia.pdf)
- [Distributed Hash Tables (part 2)](https://web.archive.org/web/20140217064545/http://offthelip.org/?p=157)