https://github.com/oohdark30/dell-pystarburst-demo
Python application to demonstrate using PyStarburst and ObjectLock with Dell Lakehouse
Last synced: 25 days ago
- Host: GitHub
- URL: https://github.com/oohdark30/dell-pystarburst-demo
- Owner: OohDark30
- License: apache-2.0
- Created: 2024-07-19T15:03:42.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-19T15:43:58.000Z (6 months ago)
- Last Synced: 2024-07-19T19:33:03.261Z (6 months ago)
- Language: Python
- Size: 20.1 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# dell-pystarburst-demo
dell-pystarburst-demo is a Python application that uses PyStarburst and Boto3 with the Dell Data Lakehouse to show how bucket versioning and object locking can prevent unintended data loss. The demo performs the following steps:

1. Create a session with the Dell Data Analytics Engine (DDAE) powered by Starburst
2. Create a schema and table in a Hive catalog that references an S3 bucket in the Dell Lakehouse
3. Load a Parquet table into the S3 Bucket in the Dell Lakehouse
4. Perform a query of the data
5. Delete the Parquet file using S3
   - Because the bucket has versioning and object lock enabled, the object version is "locked" and a delete marker is created instead of the data being removed
6. Re-query the data to show that it is no longer available
7. Issue a version-specific S3 object delete that removes the delete marker, restoring the deleted file
8. Run the query again to show that the data is returned
9. Clean up object versions, bucket, table, and schema
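
The delete and restore steps above can be sketched with Boto3 roughly as follows. This is a minimal illustration and not the repository's actual code: the helper names (`soft_delete`, `find_current_delete_marker`, `restore_object`, `run_demo`) and the endpoint, bucket, and key passed to `run_demo` are hypothetical. It assumes the bucket already has versioning and object lock enabled, and that the key has fewer versions than one `list_object_versions` page returns.

```python
def soft_delete(s3, bucket, key):
    """Delete without a VersionId; on a versioned bucket this only adds a delete marker."""
    response = s3.delete_object(Bucket=bucket, Key=key)
    # The response carries the VersionId of the newly created delete marker.
    return response.get("VersionId")


def find_current_delete_marker(s3, bucket, key):
    """Return the VersionId of the delete marker that is the latest version of key, or None."""
    # Simplification: only the first page of results is inspected.
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            return marker["VersionId"]
    return None


def restore_object(s3, bucket, key):
    """Remove the current delete marker so the previous object version is visible again."""
    marker_id = find_current_delete_marker(s3, bucket, key)
    if marker_id is None:
        return False  # nothing to restore
    # A version-specific delete that targets the marker itself, not the data.
    s3.delete_object(Bucket=bucket, Key=key, VersionId=marker_id)
    return True


def run_demo(endpoint_url, bucket, key):
    """Hypothetical driver; all three arguments are placeholders for your environment."""
    import boto3  # imported here so the helpers above work with any S3-like client

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    print("delete marker:", soft_delete(s3, bucket, key))
    print("restored:", restore_object(s3, bucket, key))
```

Because the helpers take the client as an argument, they can be exercised against a stub before pointing `run_demo` at a real object store. Queries issued between the two calls would be expected to fail to find the data and then succeed, as the steps describe.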