https://github.com/hoaihuongbk/lakeops
A modern data lake operations toolkit working with multiple table formats (Delta, Iceberg, Parquet) and engines (Spark, Polars) via the same APIs.
- Host: GitHub
- URL: https://github.com/hoaihuongbk/lakeops
- Owner: hoaihuongbk
- License: MIT
- Created: 2025-02-03T06:53:38.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-02-21T08:22:37.000Z (21 days ago)
- Last Synced: 2026-02-21T13:37:02.431Z (21 days ago)
- Topics: data, data-operations, dataengineering, datalake
- Language: Python
- Homepage: https://hoaihuongbk.github.io/lakeops/
- Size: 700 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# LakeOps
[PyPI version](https://badge.fury.io/py/lakeops)
[PyPI](https://pypi.org/project/lakeops/)
[Tests](https://github.com/hoaihuongbk/lakeops/actions/workflows/test.yml)
[Coverage](https://codecov.io/gh/hoaihuongbk/lakeops)
A modern data lake operations toolkit that works with multiple table formats (Delta, Iceberg, Parquet) and engines (Spark, Polars) through the same APIs.
## Features
- Multi-format support: Delta, Iceberg, Parquet
- Multiple engine backends: Apache Spark, Polars (default)
- Storage operations: read, write
To learn more, read the [user guide](https://hoaihuongbk.github.io/lakeops/).
## Quick Start
### Installation
```bash
# Using pip
pip install lakeops

# Using uv
uv pip install lakeops

# Using poetry
poetry add lakeops
```
### Sample Usage
```python
from pyspark.sql import SparkSession

from lakeops import LakeOps
from lakeops.core.engine import SparkEngine

# Initialize a Spark session and create a LakeOps instance backed by it
spark = SparkSession.builder.getOrCreate()
engine = SparkEngine(spark)
ops = LakeOps(engine)

# Read data from a table path
df = ops.read("s3://local/test/table", format="parquet")

# Write data back to the same table path
ops.write(df, "s3://local/test/table", format="parquet")
```
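The sample above shows the core pattern: pass an engine into `LakeOps` and use one `read`/`write` surface regardless of backend. As a rough illustration of that engine-pluggable design, here is a minimal sketch in plain Python. Everything in it (`InMemoryEngine`, `Ops`) is hypothetical and written for this example; it is not LakeOps internals, just the general facade-over-engine shape the API suggests:

```python
# Illustrative sketch of an engine-pluggable read/write facade,
# loosely modeled on the LakeOps usage above. These classes are
# hypothetical, not the library's actual implementation.

class InMemoryEngine:
    """Toy engine that stores 'tables' in a dict instead of a data lake."""

    def __init__(self):
        self._tables = {}

    def read(self, path, format):
        return self._tables[(path, format)]

    def write(self, data, path, format):
        self._tables[(path, format)] = data


class Ops:
    """Facade exposing the same read/write API regardless of the engine behind it."""

    def __init__(self, engine):
        self.engine = engine

    def read(self, path, format="parquet"):
        return self.engine.read(path, format)

    def write(self, data, path, format="parquet"):
        self.engine.write(data, path, format)


# Swapping InMemoryEngine for a Spark- or Polars-backed engine would
# leave the calling code below unchanged -- the point of the design.
ops = Ops(InMemoryEngine())
ops.write([{"id": 1}], "s3://local/test/table", format="parquet")
rows = ops.read("s3://local/test/table", format="parquet")
```

The benefit of this shape is that format- and engine-specific details stay inside the engine object, so calling code written against one backend ports to another by changing only the constructor argument.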