https://github.com/postpayio/ness


# Ness


A Python datalake client.

## Requirements

- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)

## Installation

```sh
pip install pyarrow ness
```

## Quickstart

```py
import ness

dl = ness.dl(bucket="mybucket", key="mydatalake")
df = dl.read("mytable")
```

## Sync

```py
# Sync all tables
dl.sync()

# Sync a single table
dl.sync("mytable")

# Sync and read a single table
df = dl.read("mytable", sync=True)
```
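
Because the AWS CLI is listed as a requirement, syncing presumably delegates to `aws s3 sync`. The sketch below shows roughly what an equivalent command could look like. It is an illustration, not Ness's actual implementation: the helper name `build_sync_command`, the local directory, and the S3 layout (`s3://bucket/key/table`) are all assumptions.

```py
from typing import List, Optional


def build_sync_command(
    bucket: str,
    key: str,
    local_dir: str,
    table: Optional[str] = None,
    profile: str = "default",
) -> List[str]:
    """Build an `aws s3 sync` command line (hypothetical helper, not part of Ness)."""
    source = f"s3://{bucket}/{key}"
    target = local_dir
    if table is not None:
        # Assumed layout: one prefix per table under the datalake key.
        source = f"{source}/{table}"
        target = f"{local_dir}/{table}"
    return ["aws", "s3", "sync", source, target, "--profile", profile]


cmd = build_sync_command("mybucket", "mydatalake", "/tmp/ness", table="mytable")
print(" ".join(cmd))
```

Returning the command as a list of arguments means it can be passed to `subprocess.run` without shell quoting issues.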

## Format

Use `format` to specify the input data source format. The default is `parquet`:

```py
import ness

dl = ness.dl(bucket="mybucket", key="mydatalake", format="csv")
```
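
How Ness maps `format` to a file reader is not documented here. One plausible approach, sketched below, is a lookup from the format name to the matching pandas reader. The names `READERS`, `reader_name`, and `read_file` are hypothetical, and the set of supported formats is assumed to be just the two mentioned above.

```py
# Hypothetical mapping from a `format` value to the pandas function that reads it.
READERS = {"parquet": "read_parquet", "csv": "read_csv"}


def reader_name(format: str = "parquet") -> str:
    """Return the name of the pandas reader for `format`."""
    try:
        return READERS[format]
    except KeyError:
        raise ValueError(f"Unsupported format: {format!r}") from None


def read_file(path: str, format: str = "parquet"):
    """Read one local file into a DataFrame using the selected reader."""
    import pandas as pd  # imported lazily so the lookup itself works without pandas

    return getattr(pd, reader_name(format))(path)


print(reader_name())
print(reader_name("csv"))
```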

## AWS Profile

Files are synced using the `default` AWS profile. To use a different one, pass `profile`:

```py
import ness

dl = ness.dl(bucket="mybucket", key="mydatalake", profile="myprofile")
```
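
Named profiles are commonly stored in the AWS shared credentials file (by default `~/.aws/credentials`), with one INI section per profile. The sketch below uses only the standard library to list the profile names in that file, which is a quick way to check that the value you pass as `profile` exists. `list_profiles` is a hypothetical helper, not part of Ness, and profiles defined only in `~/.aws/config` are not covered.

```py
import configparser
from pathlib import Path
from typing import List, Optional


def list_profiles(credentials_path: Optional[Path] = None) -> List[str]:
    """Return the profile names found in an AWS shared credentials file."""
    if credentials_path is None:
        credentials_path = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    # A missing file is ignored by `read`, which leaves the section list empty.
    parser.read(credentials_path)
    return parser.sections()


print(list_profiles())
```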

## Command Line

```
Usage: ness sync [OPTIONS] S3_URI

Options:
  --format TEXT   Data lake source format.
  --profile TEXT  AWS profile.
  --table TEXT    Table name to sync.
  --help          Show this message and exit.
```

```sh
ness sync bucket/key --table mytable
```
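
The `S3_URI` argument appears to combine the `bucket` and `key` values used in the Python API. To illustrate that correspondence, the sketch below splits such a URI at the first `/`. `split_s3_uri` is a hypothetical helper, and accepting an optional `s3://` prefix is an assumption: the help text above does not say whether the CLI accepts it.

```py
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "bucket/key" (optionally prefixed with "s3://") into (bucket, key)."""
    if uri.startswith("s3://"):
        uri = uri[len("s3://"):]
    bucket, _, key = uri.partition("/")
    key = key.strip("/")
    if not bucket or not key:
        raise ValueError(f"Expected 'bucket/key', got {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("mybucket/mydatalake")
print(bucket, key)
```

Under that assumption, `ness sync mybucket/mydatalake --table mytable` would correspond to `ness.dl(bucket="mybucket", key="mydatalake").sync("mytable")` in Python.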