https://github.com/hoaihuongbk/lakeops
A modern data lake operations toolkit working with multiple table formats (Delta, Iceberg, Parquet) and engines (Spark, Polars) via the same APIs.
- Host: GitHub
- URL: https://github.com/hoaihuongbk/lakeops
- Owner: hoaihuongbk
- License: MIT
- Created: 2025-02-03T06:53:38.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-02-21T08:22:37.000Z (21 days ago)
- Last Synced: 2026-02-21T13:37:02.431Z (21 days ago)
- Topics: data, data-operations, dataengineering, datalake
- Language: Python
- Homepage: https://hoaihuongbk.github.io/lakeops/
- Size: 700 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# LakeOps
[PyPI version](https://badge.fury.io/py/lakeops)
[PyPI](https://pypi.org/project/lakeops/)
[Tests](https://github.com/hoaihuongbk/lakeops/actions/workflows/test.yml)
[Coverage](https://codecov.io/gh/hoaihuongbk/lakeops)
A modern data lake operations toolkit that works with multiple table formats (Delta, Iceberg, Parquet) and engines (Spark, Polars) through the same APIs.
## Features
- Multi-format support: Delta, Iceberg, Parquet
- Multiple engine backends: Apache Spark, Polars (default)
- Storage operations: read, write
To learn more, read the [user guide](https://hoaihuongbk.github.io/lakeops/).
## Quick Start
### Installation
```bash
# Using pip
pip install lakeops

# Using uv
uv pip install lakeops

# Using poetry
poetry add lakeops
```
### Sample Usage
```python
from pyspark.sql import SparkSession

from lakeops import LakeOps
from lakeops.core.engine import SparkEngine

# Initialize a Spark session and create a LakeOps instance backed by it
spark = SparkSession.builder.getOrCreate()
engine = SparkEngine(spark)
ops = LakeOps(engine)

# Read data from a table path
df = ops.read("s3://local/test/table", format="parquet")

# Write data back to the same table path
ops.write(df, "s3://local/test/table", format="parquet")
```
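The sample above shows the core pattern: pass an engine into `LakeOps` and use one `read`/`write` surface regardless of backend. As a rough illustration of that engine-pluggable design, here is a minimal sketch in plain Python. Everything in it (`InMemoryEngine`, `Ops`) is hypothetical and written for this example; it is not LakeOps internals, just the general facade-over-engine shape the API suggests:

```python
# Illustrative sketch of an engine-pluggable read/write facade,
# loosely modeled on the LakeOps usage above. These classes are
# hypothetical, not the library's actual implementation.

class InMemoryEngine:
    """Toy engine that stores 'tables' in a dict instead of a data lake."""

    def __init__(self):
        self._tables = {}

    def read(self, path, format):
        return self._tables[(path, format)]

    def write(self, data, path, format):
        self._tables[(path, format)] = data


class Ops:
    """Facade exposing the same read/write API regardless of the engine behind it."""

    def __init__(self, engine):
        self.engine = engine

    def read(self, path, format="parquet"):
        return self.engine.read(path, format)

    def write(self, data, path, format="parquet"):
        self.engine.write(data, path, format)


# Swapping InMemoryEngine for a Spark- or Polars-backed engine would
# leave the calling code below unchanged -- the point of the design.
ops = Ops(InMemoryEngine())
ops.write([{"id": 1}], "s3://local/test/table", format="parquet")
rows = ops.read("s3://local/test/table", format="parquet")
```

The benefit of this shape is that format- and engine-specific details stay inside the engine object, so calling code written against one backend ports to another by changing only the constructor argument.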