https://github.com/vaibhavbansal26/stedi-human-balance-analytics_spark_datalakes
STEDI Human Balance Analytics (SPARK, DATA LAKES, AWS Glue, AWS Athena, AWS S3)
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/vaibhavbansal26/stedi-human-balance-analytics_spark_datalakes
- Owner: VaibhavBansal26
- Created: 2024-07-09T19:11:33.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-07-11T19:16:07.000Z (11 months ago)
- Last Synced: 2024-07-12T20:22:03.153Z (11 months ago)
- Language: Python
- Size: 4.57 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# stedi-human-balance-analytics_spark_datalakes
STEDI Human Balance Analytics (SPARK, DATA LAKES, AWS Glue, AWS Athena, AWS S3)

Goal: using AWS Glue, AWS S3, Python, and Spark, create or generate Python scripts that build a lakehouse solution in AWS.

All the Python scripts are in the "glueJob_python_scripts" folder.

**Landing Zone (Raw Data)**
Customer: 956

Accelerometer: 81273

Step Trainer: 28680

The customer landing data has multiple rows where shareWithResearchAsOfDate is blank.
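The blank values matter because the Trusted Zone keeps no rows with a blank shareWithResearchAsOfDate. The following is a minimal plain-Python sketch of that kind of filter, not the Glue job from this repository. The function name `keep_consenting`, the `customerName` field, and the sample rows are hypothetical; only the field name `shareWithResearchAsOfDate` comes from the data described above.

```python
def keep_consenting(customers):
    """Keep records whose shareWithResearchAsOfDate is present and non-blank."""
    trusted = []
    for record in customers:
        value = record.get("shareWithResearchAsOfDate")
        if value is None:
            continue  # field missing or null
        if isinstance(value, str) and value.strip() == "":
            continue  # field present but blank
        trusted.append(record)
    return trusted


# Hypothetical sample rows, for illustration only.
landing = [
    {"customerName": "A", "shareWithResearchAsOfDate": 1655564120767},
    {"customerName": "B", "shareWithResearchAsOfDate": None},
    {"customerName": "C"},
    {"customerName": "D", "shareWithResearchAsOfDate": ""},
]

trusted = keep_consenting(landing)
print(len(trusted))  # 1
```

In a Spark or Glue job the same condition would typically be expressed as a filter on the column rather than a Python loop.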

**Trusted Zone (Filtering, PII)**
Customer: 482

Accelerometer: 40981

Step Trainer: 14460
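As a quick sanity check, the Trusted Zone counts can be compared with the Landing Zone counts. The short sketch below uses only the row counts reported in this README; the dictionary and variable names are illustrative.

```python
# Row counts copied from the Landing Zone and Trusted Zone sections.
landing_counts = {"customer": 956, "accelerometer": 81273, "step_trainer": 28680}
trusted_counts = {"customer": 482, "accelerometer": 40981, "step_trainer": 14460}

# Percentage of landing rows that survive into the trusted zone.
retention = {
    table: round(100 * trusted_counts[table] / landing_counts[table], 1)
    for table in landing_counts
}

for table, pct in retention.items():
    print(f"{table}: {pct}% of landing rows kept")
```

Each table retains about 50.4% of its landing rows, so the three trusted tables shrink by the same proportion.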

Customer Trusted rows with a blank shareWithResearchAsOfDate: none.

**SELECT * FROM customer_trusted WHERE sharewithresearchasofdate IS NULL**: returns an empty table.
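The null check above can be reproduced locally as a rough sketch. This example uses Python's built-in `sqlite3` module as a stand-in for Athena, which is an assumption made purely for illustration; the `customername` column and the two sample rows are hypothetical, and only the table name and the `sharewithresearchasofdate` column come from the query above.

```python
import sqlite3

# In-memory database standing in for the Athena table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_trusted ("
    "customername TEXT, sharewithresearchasofdate INTEGER)"
)

# Hypothetical trusted rows: every row has a consent timestamp.
conn.executemany(
    "INSERT INTO customer_trusted VALUES (?, ?)",
    [("A", 1655564120767), ("B", 1655564306582)],
)

# Same validation query as in the README.
null_rows = conn.execute(
    "SELECT * FROM customer_trusted WHERE sharewithresearchasofdate IS NULL"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM customer_trusted").fetchone()[0]
conn.close()

print(null_rows)  # [] when no row has a null consent date
print(row_count)  # 2
```

An empty result combined with a non-zero row count indicates that the table has data and that none of it is missing the consent date.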

**Curated Zone**
Customer Curated: 482

Machine Learning Curated: 43681
