https://github.com/poga/hyperspark
Decentralized data processing platform
- Host: GitHub
- URL: https://github.com/poga/hyperspark
- Owner: poga
- Created: 2016-10-17T15:32:27.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-03-01T22:01:32.000Z (about 8 years ago)
- Last Synced: 2025-03-18T14:05:57.365Z (about 2 months ago)
- Language: JavaScript
- Homepage:
- Size: 43 KB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-dat - hyperspark - decentralized data processing platform for dat archives, inspired by `spark` (Outdated / Other Related Dat Project Modules)
README
# Hyperspark
Hyperspark is a decentralized data processing tool for [Dat](http://dat-data.com), inspired by [Spark](https://spark.apache.org/).
Basically, it's just a fancy wrapper around [Dat Archive](https://datproject.org).
**This is a work in progress. Ideas and suggestions are welcome.**
### Goal
* Reuse intermediate data.
* Minimize bandwidth usage.
* Share computation power.

## How to use
#### Data owner
It's simple! Just share your data with dat: `dat .`
#### Data Scientist
Define your ideas with [transforms and actions](https://github.com/poga/dat-transform) without worrying about fetching and storing data.
#### Computation Provider
Run transformations defined by researchers. Cache and share intermediate data so everyone can re-use the knowledge without having their own computation cluster.
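The split between transforms and actions, and the reuse of intermediate data, can be illustrated with a small plain-JavaScript sketch. This is not the hyperspark or dat-transform API: `LazyDataset`, `compute`, and `runs` are hypothetical names made up for illustration, and the data lives in memory instead of a dat archive. Transforms only record work; the first action runs the recorded steps and keeps the result, so later actions reuse it.

```javascript
// Hypothetical stand-in for an RDD: transforms record work, actions run it.
class LazyDataset {
  constructor (source, steps = []) {
    this.source = source // function returning an array of input records
    this.steps = steps   // recorded transforms, not yet executed
    this.cached = null   // intermediate result, filled by the first action
    this.runs = 0        // how many times the pipeline was actually computed
  }

  // Transforms: return a new dataset with one more recorded step.
  map (fn) {
    return new LazyDataset(this.source, [...this.steps, rows => rows.map(fn)])
  }

  filter (fn) {
    return new LazyDataset(this.source, [...this.steps, rows => rows.filter(fn)])
  }

  // Runs the recorded steps once and keeps the result for later actions.
  compute () {
    if (this.cached === null) {
      this.runs += 1
      this.cached = this.steps.reduce((rows, step) => step(rows), this.source())
    }
    return this.cached
  }

  // Actions: trigger the computation and return a value.
  toArray () {
    return this.compute().slice()
  }

  count () {
    return this.compute().length
  }
}

const words = new LazyDataset(() => 'foo bar  baz bar'.split(' '))
const nonEmpty = words.filter(w => w !== '') // nothing has run yet
const lengths = nonEmpty.map(w => w.length)

console.log(nonEmpty.runs)      // 0
console.log(nonEmpty.toArray()) // [ 'foo', 'bar', 'baz', 'bar' ]
console.log(nonEmpty.count())   // 4
console.log(nonEmpty.runs)      // 1 (the second action reused the cached result)
console.log(lengths.toArray())  // [ 3, 3, 3, 3 ]
```

In this sketch the cache is local to one dataset object. The role of a computation provider described above would be to keep such intermediate results somewhere shareable so that other people can reuse them.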
---
## Synopsis
Define an RDD on a dat archive with [dat-transform](https://github.com/poga/dat-transform).
Word counting:
```js
const hs = require('hyperspark')
// kv builds a key-value pair; assumed here to be the helper exported by dat-transform
const { kv } = require('dat-transform')

var rdd = hs()

// define transforms
var result = rdd
  .splitBy(/[\n\s]/)
  .filter(x => x !== '')
  .map(word => kv(word, 1))

// actual run (action)
result.reduceByKey((x, y) => x + y)
  .toArray(res => {
    console.log(res) // [{bar: 2, baz: 1, foo: 1}]
  })
```

## Related Modules
* RDD-style data transformation with js. [dat-transform](https://github.com/poga/dat-transform)
* Analyze data inside dat archive with RDD-style API. [dat-ipynb](https://github.com/poga/dat-ipynb-demo), using [nel](https://github.com/poga/nel)
* Convert IPython Notebook to Markdown. [ipynb2md](https://github.com/poga/ipynb2md)
* Attach file to markdown with dat. [markdown-attachment-p2p](https://github.com/poga/markdown-attachment-p2p)