
https://github.com/declaredata/fuse_python

PySpark-compatible Python client for DeclareData Fuse Server: a blazing fast data processing engine and drop-in alternative to Spark clusters.

data-processing pyspark rust-lang spark

Last synced: 29 days ago

Host: GitHub
URL: https://github.com/declaredata/fuse_python
Owner: declaredata
License: mit
Created: 2024-11-25T22:21:52.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-01-12T19:29:25.000Z (29 days ago)
Last Synced: 2025-01-12T19:31:21.706Z (29 days ago)
Topics: data-processing, pyspark, rust-lang, spark
Language: Python
Homepage: https://declaredata.com
Size: 462 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE

README


[![PyPI - Version](https://img.shields.io/pypi/v/declaredata_fuse?label=PyPi%20Release&color=7E22CE)](https://pypi.org/project/declaredata_fuse/)

![Python Version](https://img.shields.io/python/required-version-toml?tomlFilePath=https://raw.githubusercontent.com/declaredata/fuse_python/refs/heads/main/pyproject.toml&label=Python%20Version&color=7E22CE)

[![License](https://img.shields.io/github/license/declaredata/fuse_python.svg?label=License&color=7E22CE)](https://github.com/declaredata/fuse_python/blob/main/LICENSE)

[![CI](https://github.com/declaredata/fuse_python/actions/workflows/python.yml/badge.svg?branch=main)](https://github.com/declaredata/fuse_python/actions)

[![Benchmark](https://github.com/declaredata/fuse_python/actions/workflows/bench.yml/badge.svg?branch=main&color=7E22CE)](https://github.com/declaredata/fuse_python/actions)



# DeclareData Fuse Client Bindings for Python

A Python client library for **DeclareData Fuse Server** that provides a PySpark-compatible API. Scale down your Spark clusters and speed up workloads with minimal changes to your PySpark code.

> DeclareData Fuse Server and this library are under active development. This is a pre-release version and may contain bugs or incomplete features. Please review and contribute to our [compatibility development status](https://github.com/declaredata/fuse_python/issues/6).

# Contents

- [Prerequisites](#prerequisites)

- [Components](#components)

- [Server Setup](#server-setup)

- [Python Client Installation](#python-client-installation)

- [Quick Start Guide](#quick-start-guide)

  - [Initialize a Session](#initialize-a-session)

  - [Basic Data Operations](#basic-data-operations)

- [Other Documentation 🚧 WIP](#other-documentation--wip)

- [Issue Reporting](#issue-reporting)

# Prerequisites

* Python 3.10 or higher

* 8 GB or more of available memory

* pip package manager

* Docker

* Port 8080 available (required for gRPC) and port 3000 available (optional, for the web interface)
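A quick local check of the Python version and port requirements listed above can save a failed `docker run`. The sketch below uses only the standard library; `port_is_free` and `check_prerequisites` are hypothetical helper names, and "free" here only means that a TCP bind on `127.0.0.1` succeeds right now, which is a best-effort approximation. The 8 GB memory requirement is not checked.

```python
import socket
import sys


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Address already in use (or not permitted): treat as not free
            return False
    return True


def check_prerequisites() -> dict:
    """Hypothetical helper: check the Python version and the two ports."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "port 8080 free (gRPC, required)": port_is_free(8080),
        "port 3000 free (web interface, optional)": port_is_free(3000),
    }


for name, ok in check_prerequisites().items():
    print("OK  " if ok else "FAIL", name)
```

The bind is released as soon as the `with` block exits, so running the check does not keep either port occupied.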

# Components

* [**DeclareData Fuse Server**](#server-setup): Blazing-fast, low-overhead, drop-in alternative to Apache Spark clusters that runs anywhere

* [**DeclareData Fuse Python**](#python-client-installation): Python client library providing PySpark-compatible APIs

# Server Setup

Run the Fuse server using Docker:

```bash
docker run -p 8080:8080 -p 3000:3000 ghcr.io/declaredata/fuse:latest
```

> **Note:** All images are published to our GitHub Package Docker repository, which can be found at [github.com/orgs/declaredata/packages/container/package/fuse](https://github.com/orgs/declaredata/packages/container/package/fuse).
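After starting the container, a script can wait until something is accepting connections on the published gRPC port before it creates a session. This is only a TCP reachability check: it does not speak gRPC or verify that the listener is the Fuse server. `wait_for_port` and its default timeout values are hypothetical choices for this sketch.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8080,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port()` returns `True` once the container is listening on `localhost:8080`, and `False` if nothing answers within 30 seconds.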

# Python Client Installation

Install from PyPI:

```bash
pip install declaredata_fuse
```

Update to the latest version:

```bash
pip install --upgrade declaredata_fuse
```
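To confirm which client version pip actually installed (for example, after an upgrade), the standard library's `importlib.metadata` can read the distribution metadata without importing the package. The helper name `installed_version` is hypothetical; it returns `None` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "declaredata_fuse") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("declaredata_fuse:", installed_version() or "not installed")
```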

# Quick Start Guide

## Initialize a Session

```python
from declaredata_fuse.session import FuseSession

# Connect to DeclareData Fuse Server (default: localhost:8080)
fs = FuseSession.builder.getOrCreate()
```
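Because the session is created through the same `builder.getOrCreate()` pattern that PySpark's `SparkSession` uses, one way to keep a script runnable against either backend is to pick the session class at import time. This is a sketch, not part of the library: `first_available` is a hypothetical helper, and the fallback assumes `pyspark` is installed when `declaredata_fuse` is not.

```python
import importlib
from typing import Any, Iterable, Optional, Tuple


def first_available(candidates: Iterable[Tuple[str, str]]) -> Optional[Any]:
    """Return the first (module, attribute) pair that imports successfully, else None."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or one of its parents) is not installed: try the next one
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    return None


# Prefer the Fuse client; fall back to PySpark if it is not installed.
Session = first_available([
    ("declaredata_fuse.session", "FuseSession"),
    ("pyspark.sql", "SparkSession"),
])

# Either class is then used through the same builder call:
# fs = Session.builder.getOrCreate()
```

Keeping the choice in one place means the rest of the script only refers to `Session`, which is the property a drop-in replacement relies on.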

## Basic Data Operations

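```python
import declaredata_fuse.functions as F

# Assumes `fs` is the session created above and that data.csv
# has year, state_abbr and population columns.

# Read CSV file
df = fs.read.csv("data.csv")
df.show(10)

# Filter data
df.filter(df.year >= 2000).show(10)

# Sort and select columns
df.sort(
    df.population, ascending=False
).select(
    df.year, df.state_abbr, df.population
).show(10)

# Group and aggregate: F.first keeps the first population value in each
# year group (it is not the maximum)
by_year = df.groupBy("year").agg(
    F.first("population").alias("first_population_of_year")
)

# Sort the aggregated result (not the original `df`) by the new column
by_year.sort(
    by_year.first_population_of_year, ascending=False
).show(10)
```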
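On a small file it can help to know exactly what the group-and-aggregate step should return. The plain-Python sketch below reproduces it with the standard `csv` module: it keeps the first `population` value seen for each `year` (first in file order; the server may not guarantee the same row order, and PySpark documents `first` as order-dependent) and sorts the result in descending order. It assumes a header row with `year` and `population` columns; the function name and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def first_population_by_year(f: TextIO, limit: int = 10) -> List[Tuple[str, int]]:
    """Mimic groupBy("year").agg(first("population")) followed by a descending sort."""
    first_seen: Dict[str, int] = {}
    for row in csv.DictReader(f):
        # setdefault keeps only the first population value for each year
        first_seen.setdefault(row["year"], int(row["population"]))
    ranked = sorted(first_seen.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


# Made-up sample rows: for 2000 the first value (300) is not the largest (500)
sample = io.StringIO(
    "year,state_abbr,population\n"
    "2000,AA,300\n"
    "2000,BB,500\n"
    "2010,BB,450\n"
)
print(first_population_by_year(sample))  # → [('2010', 450), ('2000', 300)]
```

Swapping `setdefault` for a running `max` would give the per-year maximum instead, which is a different aggregation.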

# Other Documentation 🚧 WIP

* Additional API documentation is available at [docs.declaredata.com](https://docs.declaredata.com)

* Usage examples can be found in the [`bench`](./bench/) directory

# Issue Reporting

Please report issues via our [GitHub Issues](https://github.com/declaredata/fuse_python/issues) page with the following information:

* Problem description

* Steps to reproduce

* Expected vs actual behavior

* Environment details (OS, Python version)

* Error messages or logs
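The environment details requested above can be collected with the standard library. This sketch reports the OS, Python version and installed client version; `environment_report` is a hypothetical helper, and details such as the Docker image tag you ran still need to be added by hand.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report() -> str:
    """Collect OS, Python and declaredata_fuse versions for a bug report."""
    try:
        fuse_version = version("declaredata_fuse")
    except PackageNotFoundError:
        fuse_version = "not installed"
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]} ({platform.python_implementation()})",
        f"declaredata_fuse: {fuse_version}",
    ]
    return "\n".join(lines)


print(environment_report())
```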

For security concerns, please email us directly.