Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/leehuwuj/lake-inspector
Inspect your lakehouse data using PyArrow
- Host: GitHub
- URL: https://github.com/leehuwuj/lake-inspector
- Owner: leehuwuj
- Created: 2023-03-07T18:28:56.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-09T02:43:29.000Z (almost 1 year ago)
- Last Synced: 2024-12-23T06:06:20.382Z (16 days ago)
- Topics: arrow, datalake, lakehouse, pyarrow
- Language: Python
- Homepage:
- Size: 447 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
![image info](./resources/lake-inspector.png)
# Example:
- Inspect a path in S3:
- Make sure your S3 credentials are set:
```shell
export AWS_ENDPOINT_URL=
export ACCESS_KEY=
export AWS_SECRET_ACCESS_KEY=
```
- Then run the s3-inspect.py script:
```shell
poetry run python s3-inspect.py lake-dev/dagster/test
```
-> Output to console (a sketch of how such metrics could be computed with PyArrow follows at the end of this example):
```
{'f1_partitions': ['lake-dev/dagster/test/date=20230315',
'lake-dev/dagster/test/date=20230316'],
'latest_partition': ['date',
'20230316',
'lake-dev/dagster/test/date=20230316'],
'number_of_files': 2,
'path': 'lake-dev/dagster/test',
'size': 23340}
```
- If you want to store the metrics as a file in S3 (a possible writer is sketched at the end of this example):
```shell
poetry run python s3-inspect.py --writer s3 --write-uri lake-dev/inspector/metrics/test-dataset/$(date +%s).json lake-dev/dagster/test
```
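
The implementation itself is not shown in the README, but the following is a minimal sketch of how metrics like the ones above could be gathered with PyArrow's `S3FileSystem`. The function name `inspect_path`, the partition-detection regex, and the reuse of the `AWS_ENDPOINT_URL`, `ACCESS_KEY`, and `AWS_SECRET_ACCESS_KEY` environment variables from the setup step are assumptions for illustration, not the project's actual code.
```python
import os
import posixpath
import re
from pprint import pprint

from pyarrow import fs


def inspect_path(path: str) -> dict:
    """Gather file count, total size, and Hive-style partition info for an S3 prefix."""
    # Hypothetical: credentials taken from the environment variables used in the
    # README's setup step; all are optional and fall back to PyArrow's defaults.
    s3 = fs.S3FileSystem(
        endpoint_override=os.environ.get("AWS_ENDPOINT_URL"),
        access_key=os.environ.get("ACCESS_KEY"),
        secret_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
    )
    # Recursively list every object under the prefix and keep only regular files.
    infos = s3.get_file_info(fs.FileSelector(path, recursive=True))
    files = [info for info in infos if info.type == fs.FileType.File]

    # Collect first-level Hive-style partition directories, e.g. ".../date=20230316".
    depth = len(path.rstrip("/").split("/"))
    partition_dirs = set()
    for info in files:
        segments = info.path.split("/")
        if re.fullmatch(r"[^=/]+=[^/]+", segments[depth]):
            partition_dirs.add("/".join(segments[: depth + 1]))
    partitions = sorted(partition_dirs)

    metrics = {
        "path": path,
        "number_of_files": len(files),
        "size": sum(info.size for info in files),
        "f1_partitions": partitions,
    }
    if partitions:
        # Split the newest partition directory name into its key and value.
        key, _, value = posixpath.basename(partitions[-1]).partition("=")
        metrics["latest_partition"] = [key, value, partitions[-1]]
    return metrics


if __name__ == "__main__":
    pprint(inspect_path("lake-dev/dagster/test"))
```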
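
Likewise, a hedged guess at what the `--writer s3` / `--write-uri` option might amount to: serialising the metrics dict to JSON and writing it back to S3 through the same filesystem handle. `write_metrics_to_s3` is an illustrative name, not necessarily the project's function.
```python
import json

from pyarrow import fs


def write_metrics_to_s3(s3: fs.S3FileSystem, write_uri: str, metrics: dict) -> None:
    """Serialise the metrics dict as JSON and store it at the given bucket/key path."""
    # Hypothetical writer: the project may format or name the output differently.
    with s3.open_output_stream(write_uri) as out:
        out.write(json.dumps(metrics, indent=2).encode("utf-8"))
```
With the command above, the destination key would be something like `lake-dev/inspector/metrics/test-dataset/<unix-timestamp>.json`, since `$(date +%s)` expands to the current Unix time.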