https://github.com/pprzetacznik/datalake-aws
Sample data lake pipeline on AWS implemented using Terraform
aws csv datalake parquet python terraform
- Host: GitHub
- URL: https://github.com/pprzetacznik/datalake-aws
- Owner: pprzetacznik
- Created: 2023-10-12T22:04:24.000Z
- Default Branch: master
- Last Pushed: 2023-10-26T19:08:38.000Z
- Last Synced: 2024-12-08T22:05:36.309Z
- Topics: aws, csv, datalake, parquet, python, terraform
- Language: HCL
- Homepage:
- Size: 133 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AWS Data Lake example
## Diagram
![AWS Data Lake diagram](diagram/aws_data_lake.png "Data Lake")
## Terraform workspaces structure
### persistance
This layer is extracted from the `datalake` workspace so that stored data is preserved when the serverless infrastructure is refactored.
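
As a minimal sketch of what such a layer might contain, the fragment below defines a storage bucket in its own workspace and guards it against accidental deletion. The resource name `raw`, the bucket name, and the output name `raw_bucket_name` are illustrative assumptions and are not taken from the repository.

```hcl
# Hypothetical persistence workspace. All names here are assumptions.
resource "aws_s3_bucket" "raw" {
  bucket = "example-datalake-raw" # assumed bucket name

  lifecycle {
    # Make `terraform destroy` fail for this bucket instead of deleting the data.
    prevent_destroy = true
  }
}

# Expose the bucket name so other workspaces can look it up.
output "raw_bucket_name" {
  value = aws_s3_bucket.raw.bucket
}
```

Keeping stateful resources in a separate workspace means the compute layer can be destroyed without Terraform ever planning changes to the stored data.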
### iam
IAM users and roles can be set up before the data lake infrastructure is provisioned.
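
A role defined in this workspace could look roughly like the following sketch. The role name, the output name, and the choice of AWS Lambda as the trusted service are assumptions made for illustration; the repository may define different principals.

```hcl
# Hypothetical IAM workspace. The trusted service and all names are assumptions.
data "aws_iam_policy_document" "assume_pipeline" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"] # assumed serverless runtime
    }
  }
}

resource "aws_iam_role" "pipeline" {
  name               = "example-datalake-pipeline" # assumed role name
  assume_role_policy = data.aws_iam_policy_document.assume_pipeline.json
}

# Expose the ARN so that later workspaces can attach the role to their resources.
output "pipeline_role_arn" {
  value = aws_iam_role.pipeline.arn
}
```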
### datalake
This infrastructure accounts for the majority of the billing costs and can be destroyed and recreated at any time.
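
One way a disposable workspace can refer to long-lived resources is the `terraform_remote_state` data source, sketched below. The S3 backend, the state bucket, key, and region, and the output name `raw_bucket_name` are all assumptions; the repository may wire its workspaces together differently.

```hcl
# Hypothetical lookup of outputs published by a separate, long-lived workspace.
# Backend type, bucket, key, region, and output name are assumptions.
data "terraform_remote_state" "persistance" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "persistance/terraform.tfstate"
    region = "eu-west-1"
  }
}

locals {
  # Resources in this workspace reference the bucket by name only,
  # so destroying this workspace never touches the bucket itself.
  raw_bucket_name = data.terraform_remote_state.persistance.outputs.raw_bucket_name
}
```

With this layout, running `terraform destroy` in the costly workspace removes only the resources it owns, and a later `terraform apply` recreates them against the same stored data.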
## TODO
* Cleaning up the code
* Extracting code from the workspaces' `main.tf` files into modules
* Versioning modules through Git tags
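
The last two items could eventually look like the sketch below, where a workspace's `main.tf` calls a module pinned to a Git tag through the `ref` query parameter. The repository URL, the `datalake` subdirectory, the tag `v1.0.0`, and the `raw_bucket_name` input variable are hypothetical placeholders.

```hcl
# Hypothetical module call pinned to a Git tag. URL, path, tag, and input are placeholders.
module "datalake" {
  source = "git::https://example.com/org/datalake-modules.git//datalake?ref=v1.0.0"

  raw_bucket_name = "example-datalake-raw" # assumed module input
}
```

Pinning `ref` to a tag keeps each workspace on a known module version until the tag in `source` is changed deliberately and `terraform init -upgrade` is run.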