duckspark - A DuckDB based distributed data processing engine
https://github.com/benmizrahi/duckspark

data-engineering distributed-systems golang spark


Host: GitHub
URL: https://github.com/benmizrahi/duckspark
Owner: benmizrahi
License: apache-2.0
Created: 2023-03-10T15:34:55.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-06-16T20:17:34.000Z (8 months ago)
Last Synced: 2024-11-16T12:46:54.230Z (3 months ago)
Topics: data-engineering, distributed-systems, golang, spark
Language: Go
Homepage:
Size: 272 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

## duckspark.io - go bigger 🚀

duckspark is a library for distributed processing of large data sets. It lets you process data in parallel across multiple machines or nodes, making data processing faster and more scalable.

[![](https://skillicons.dev/icons?i=go)](https://skillicons.dev)

## Features

- Distributed processing: The library allows you to distribute the processing of large data sets across multiple nodes, leveraging the power of parallel processing.

- Scalability: It supports scaling your data processing by adding more machines or nodes to the cluster, allowing you to handle larger datasets or increase processing speed.

- Fault tolerance: The library incorporates fault tolerance mechanisms to handle failures or crashes in the cluster. It provides automatic recovery and resilience to ensure uninterrupted processing.

- Load balancing: It implements intelligent load balancing algorithms to distribute the workload evenly across nodes, optimizing resource utilization.

- Data partitioning: The library offers efficient data partitioning techniques, enabling parallel processing on smaller subsets of the data across different nodes.

- Simplified API: It provides a simple and intuitive API to facilitate the development of distributed data processing applications.
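
To make the data partitioning and parallel processing ideas above concrete, here is a minimal, single-process sketch in Go. It is not duckspark's API: the function names `partitionByKey` and `countInParallel` are hypothetical, goroutines stand in for worker nodes, and hash partitioning with FNV-1a is just one common choice, assumed here for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// partitionByKey (hypothetical helper) assigns each key to one of n partitions
// using an FNV-1a hash, so equal keys always land in the same partition.
func partitionByKey(keys []string, n int) [][]string {
	parts := make([][]string, n)
	for _, k := range keys {
		h := fnv.New32a()
		h.Write([]byte(k)) // Write on a hash.Hash never returns an error
		idx := int(h.Sum32() % uint32(n))
		parts[idx] = append(parts[idx], k)
	}
	return parts
}

// countInParallel (hypothetical helper) counts key occurrences with one
// goroutine per partition, then merges the partial results. Each goroutine
// stands in for a worker node in a real cluster.
func countInParallel(parts [][]string) map[string]int {
	partials := make([]map[string]int, len(parts))
	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p []string) {
			defer wg.Done()
			counts := make(map[string]int)
			for _, k := range p {
				counts[k]++
			}
			partials[i] = counts // each goroutine writes only its own slot
		}(i, p)
	}
	wg.Wait()

	merged := make(map[string]int)
	for _, m := range partials {
		for k, v := range m {
			merged[k] += v
		}
	}
	return merged
}

func main() {
	keys := []string{"a", "b", "a", "c", "b", "a"}
	counts := countInParallel(partitionByKey(keys, 3))
	fmt.Println(counts["a"], counts["b"], counts["c"]) // prints "3 2 1"
}
```

Because equal keys always hash to the same partition, each worker can aggregate its subset independently and the merge step only has to combine the partial maps. A real distributed engine would additionally need to move partitions over the network and handle node failures, which this sketch leaves out.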

----

## Building duckspark

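Install the Protocol Buffers compiler:

```
apt install -y protobuf-compiler
```

Install the Go code generator plugin for `protoc`, replacing `<version>` with the release you need (for example `latest`):

```
go install google.golang.org/protobuf/cmd/protoc-gen-go@<version>
```

Generate the Go code from the `.proto` definitions:

```
protoc --go_out=. protos/*.proto
```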

## Contributing

See the repository at https://github.com/benmizrahi/duckspark for information on how to get started contributing to the project.