https://github.com/postpayio/ness
A Python datalake client.
- Host: GitHub
- URL: https://github.com/postpayio/ness
- Owner: postpayio
- License: mit
- Created: 2021-11-09T01:04:03.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-12-16T09:30:28.000Z (about 3 years ago)
- Last Synced: 2025-05-19T21:13:29.983Z (8 months ago)
- Topics: datalake, pandas, s3
- Language: Python
- Homepage:
- Size: 69.3 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# Ness
A Python datalake client.
## Requirements
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
## Installation
```sh
pip install pyarrow ness
```
## Quickstart
```py
import ness
dl = ness.dl(bucket="mybucket", key="mydatalake")
df = dl.read("mytable")
```
## Sync
```py
# Sync all tables
dl.sync()
# Sync a single table
dl.sync("mytable")
# Sync and read a single table
df = dl.read("mytable", sync=True)
```
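Since the AWS CLI is a requirement, a sync can be pictured as mirroring an S3 prefix into a local directory. The sketch below illustrates that idea only and is not taken from the ness source: `build_sync_command`, the local directory, and the `s3://<bucket>/<key>/<table>` layout are assumptions. Only `aws s3 sync` and its `--profile` option are real AWS CLI features.

```python
import shlex
from typing import List, Optional


def build_sync_command(
    bucket: str,
    key: str,
    local_dir: str,
    table: Optional[str] = None,
    profile: str = "default",
) -> List[str]:
    """Build an `aws s3 sync` command for a whole data lake or one table.

    Hypothetical helper: the `s3://<bucket>/<key>/<table>` layout is an
    assumption made for illustration.
    """
    source = f"s3://{bucket}/{key}"
    target = local_dir.rstrip("/")
    if table is not None:
        # Narrow the sync to a single table's prefix.
        source = f"{source}/{table}"
        target = f"{target}/{table}"
    return ["aws", "s3", "sync", source, target, "--profile", profile]


if __name__ == "__main__":
    cmd = build_sync_command("mybucket", "mydatalake", "/tmp/ness", table="mytable")
    # Print the command rather than running it; hand the list to
    # subprocess.run(cmd, check=True) to execute it for real.
    print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting.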
## Format
Use `format` to specify the input data source format. The default is `parquet`:
```py
import ness
dl = ness.dl(bucket="mybucket", key="mydatalake", format="csv")
```
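How ness loads each format internally is not documented here. As a rough illustration only, a format name could be mapped to a pandas reader as follows. `READERS`, `reader_name`, and `load_table` are hypothetical names, while `pandas.read_parquet` and `pandas.read_csv` are real pandas functions.

```python
from typing import Dict

# Hypothetical mapping from a format name to a pandas reader function name.
READERS: Dict[str, str] = {"parquet": "read_parquet", "csv": "read_csv"}


def reader_name(fmt: str = "parquet") -> str:
    """Return the pandas reader function name for a format name."""
    try:
        return READERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None


def load_table(path: str, fmt: str = "parquet"):
    """Load a local file into a DataFrame.

    Requires pandas; reading parquet also needs an engine such as pyarrow.
    """
    import pandas as pd  # imported lazily so the mapping works without pandas

    return getattr(pd, reader_name(fmt))(path)
```

Failing early on an unknown format gives a clearer error than letting a reader choke on the wrong file type.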
## AWS Profile
Files are synced using the `default` AWS profile. Pass `profile` to use a different one:
```py
import ness
dl = ness.dl(bucket="mybucket", key="mydatalake", profile="myprofile")
```
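Before passing a profile name, it can help to check that the profile exists. The AWS CLI stores named profiles in `~/.aws/config` as `[profile NAME]` sections, with `[default]` for the default profile. The helper names below are hypothetical and not part of ness, and the sketch reads only the config file, not `~/.aws/credentials`.

```python
import configparser
from pathlib import Path
from typing import List, Optional


def parse_profiles(config_text: str) -> List[str]:
    """Extract profile names from the text of an AWS CLI config file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    names = []
    for section in parser.sections():
        if section == "default":
            names.append(section)
        elif section.startswith("profile "):
            # Named profiles look like "[profile myprofile]".
            names.append(section[len("profile "):].strip())
        # Other sections (for example "sso-session ...") are not profiles.
    return names


def list_aws_profiles(config_path: Optional[Path] = None) -> List[str]:
    """List profiles in the given config file, or ~/.aws/config by default."""
    path = config_path or Path.home() / ".aws" / "config"
    if not path.exists():
        return []
    return parse_profiles(path.read_text())


if __name__ == "__main__":
    print(list_aws_profiles())
```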
## Command Line
```
Usage: ness sync [OPTIONS] S3_URI

Options:
  --format TEXT   Data lake source format.
  --profile TEXT  AWS profile.
  --table TEXT    Table name to sync.
  --help          Show this message and exit.
```
```sh
ness sync bucket/key --table mytable
```
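To drive the command from a script, the argument list can be assembled from the options shown in the usage text above. This is a minimal sketch: `ness_sync_args` and `run_sync` are hypothetical helpers, and only the documented `--format`, `--profile`, and `--table` options are used.

```python
import subprocess
from typing import List, Optional


def ness_sync_args(
    s3_uri: str,
    table: Optional[str] = None,
    fmt: Optional[str] = None,
    profile: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `ness sync`, skipping unset options."""
    args = ["ness", "sync", s3_uri]
    for flag, value in (("--format", fmt), ("--profile", profile), ("--table", table)):
        if value is not None:
            args += [flag, value]
    return args


def run_sync(s3_uri: str, **options: Optional[str]) -> None:
    """Run `ness sync`; requires ness and the AWS CLI to be installed."""
    subprocess.run(ness_sync_args(s3_uri, **options), check=True)


if __name__ == "__main__":
    # Print the command without executing it.
    print(" ".join(ness_sync_args("bucket/key", table="mytable")))
```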