https://github.com/benmizrahi/duckspark
duckspark - A DuckDB based distributed data processing engine
data-engineering distributed-systems golang spark
- Host: GitHub
- URL: https://github.com/benmizrahi/duckspark
- Owner: benmizrahi
- License: apache-2.0
- Created: 2023-03-10T15:34:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-16T20:17:34.000Z (5 months ago)
- Last Synced: 2024-07-12T04:59:51.925Z (4 months ago)
- Topics: data-engineering, distributed-systems, golang, spark
- Language: Go
- Homepage:
- Size: 272 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
## duckspark.io - go bigger 🚀
duckspark is a library for distributed processing of large data sets. It lets you process data in parallel across multiple machines or nodes, for faster and more scalable data processing.
[![](https://skillicons.dev/icons?i=go)](https://skillicons.dev)
## Features
- Distributed processing: The library allows you to distribute the processing of large data sets across multiple nodes, leveraging the power of parallel processing.
- Scalability: You can scale data processing by adding more machines or nodes to the cluster, which lets you handle larger datasets or increase processing speed.
- Fault tolerance: The library incorporates fault tolerance mechanisms to handle failures or crashes in the cluster. It provides automatic recovery and resilience to ensure uninterrupted processing.
- Load balancing: It implements intelligent load balancing algorithms to distribute the workload evenly across nodes, optimizing resource utilization.
- Data partitioning: The library offers efficient data partitioning techniques, enabling parallel processing on smaller subsets of the data across different nodes.
- Simplified API: It provides a simple and intuitive API to facilitate the development of distributed data processing applications.

----
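
The partitioning and parallel-processing ideas listed under Features can be illustrated with a minimal Go sketch. This is not duckspark's API: `partition` and `countParallel` are hypothetical helper names introduced only for illustration, and goroutines stand in for cluster nodes. It assumes a simple hash-partitioning scheme, which may differ from what the library actually does.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// partition (hypothetical helper) assigns each key to one of n buckets
// by hashing it, so equal keys always land in the same bucket.
func partition(keys []string, n int) [][]string {
	parts := make([][]string, n)
	for _, k := range keys {
		h := fnv.New32a()
		h.Write([]byte(k))
		i := int(h.Sum32() % uint32(n))
		parts[i] = append(parts[i], k)
	}
	return parts
}

// countParallel (hypothetical helper) counts occurrences of each key,
// using one goroutine per partition as a stand-in for one node each.
func countParallel(keys []string, n int) map[string]int {
	parts := partition(keys, n)
	partial := make([]map[string]int, n)

	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p []string) {
			defer wg.Done()
			m := make(map[string]int)
			for _, k := range p {
				m[k]++
			}
			// Each goroutine writes only to its own slot, so no lock is needed.
			partial[i] = m
		}(i, p)
	}
	wg.Wait()

	// Merge the per-partition results into a single map.
	total := make(map[string]int)
	for _, m := range partial {
		for k, v := range m {
			total[k] += v
		}
	}
	return total
}

func main() {
	keys := []string{"a", "b", "a", "c", "b", "a"}
	fmt.Println(countParallel(keys, 3)) // prints "map[a:3 b:2 c:1]"
}
```

Because equal keys always hash to the same partition, each partial result is independent and the merge step is a plain union. A real cluster would also have to move data between machines and recover from node failures, which this sketch leaves out.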
## Building duckspark
```
apt install -y protobuf-compiler
```
```
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
```
```
protoc --go_out=. protos/*.proto
```
## Contributing
for information on how to get started contributing to the project.