https://github.com/bhavanachitragar/zillow-data-analytics
A Python script extracts data from Zillow and stores it in an initial S3 bucket. Lambda functions then handle the flow: one copies the data to a processing bucket, and another transforms it from JSON to CSV. The final CSV data resides in a third S3 bucket, ready to be loaded into Amazon Redshift for analysis. QuickSight provides the visualizations.
airflow-dags ec2-instance etl-pipeline lambda-functions quicksight-dashboard redshift s3 zillow-api
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/bhavanachitragar/zillow-data-analytics
- Owner: bhavanachitragar
- Created: 2024-06-07T04:11:03.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-10T07:10:12.000Z (7 months ago)
- Last Synced: 2024-06-11T06:38:17.694Z (7 months ago)
- Topics: airflow-dags, ec2-instance, etl-pipeline, lambda-functions, quicksight-dashboard, redshift, s3, zillow-api
- Language: Python
- Homepage:
- Size: 66.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Zillow Data Analytics using AWS
![Architecture drawio (1)](https://github.com/bhavanachitragar/zillow-data-analytics/assets/91766461/d36ab26a-5f22-4866-8e1b-ddea2ed2bd98)
## This architecture leverages:
- Airflow: For scheduling and orchestration of the data pipeline tasks.
- EC2: For running the Python scripts for data extraction and transformation.
- Lambda Functions: For serverless, triggered processing of data transfer between S3 buckets.
- S3: For storing data at various stages of the pipeline.
- Redshift: For efficient data warehousing and analytics.
- QuickSight: For data visualization and exploration.

## Steps included:
1. Python Script: Extracts data from Zillow in JSON format and stores it in an S3 bucket.
2. S3 Bucket (Staging): Stores the initial extracted JSON data.
3. AWS Lambda Function 1 (Data Transfer): Triggers upon new data in the staging S3 bucket and copies the JSON data to the processing S3 bucket.
4. S3 Bucket (Processing): Holds the JSON data ready for further processing.
5. AWS Lambda Function 2 (Data Transformation): Triggers upon new data in the processing S3 bucket, reads the JSON data, converts it to CSV format, and stores the CSV data in a designated S3 bucket.
6. S3 Bucket (Transformed Data): Stores the final processed data in CSV format.
7. Amazon Redshift: Loads the CSV data from the transformed data S3 bucket for efficient data warehousing and analytics.
8. Amazon QuickSight: Connects to the Redshift data warehouse to visualize and analyze the Zillow data.

## Airflow
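The transformation in step 5 of the list above can be sketched as a small, locally testable function. This is a minimal sketch, not the repository's actual Lambda code: the top-level `results` key, the `COLUMNS` list, and the function names `json_to_csv`, `objects_in_event`, and `transform_event` are all assumptions made for illustration. The S3 client is passed in as a parameter so the same logic could run inside a Lambda handler (with a `boto3` S3 client) or be called from a task scheduled by Airflow.

```python
import csv
import io
import json
from urllib.parse import unquote_plus

# Hypothetical column selection; the fields the real script keeps are not documented here.
COLUMNS = ["zpid", "city", "price", "bedrooms", "bathrooms"]


def json_to_csv(raw_json, columns=COLUMNS):
    """Convert a JSON payload of listings into CSV text with a header row."""
    payload = json.loads(raw_json)
    # Assumption: the listings sit under a top-level "results" key.
    rows = payload.get("results", [])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the selected columns; missing fields become empty cells.
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()


def objects_in_event(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def transform_event(event, s3_client, target_bucket):
    """Read each JSON object named in the event and write a CSV copy to target_bucket."""
    written = []
    for bucket, key in objects_in_event(event):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        csv_text = json_to_csv(body)
        # Swap the file extension: "data/response.json" -> "data/response.csv".
        csv_key = key.rsplit(".", 1)[0] + ".csv"
        s3_client.put_object(
            Bucket=target_bucket, Key=csv_key, Body=csv_text.encode("utf-8")
        )
        written.append(csv_key)
    return written
```

Inside a Lambda handler this might be called as `transform_event(event, boto3.client("s3"), "<transformed-bucket-name>")`, where the bucket name is a placeholder.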
### DAG View
![Screenshot 2024-06-10 114256](https://github.com/bhavanachitragar/zillow-data-analytics/assets/91766461/deadc3d2-cdbe-4c4e-8b70-50c6e4cdac57)

## Redshift
### Transformed data is loaded into Amazon Redshift
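One common way to load CSV files from S3 into Redshift is the `COPY` command. The sketch below only builds the statement and does not connect to a cluster. The function name `build_copy_statement`, the table name, bucket, key, and IAM role ARN are hypothetical placeholders, and the repository may load the data differently.

```python
def build_copy_statement(table, bucket, key, iam_role_arn):
    """Return a Redshift COPY statement for a CSV object that has a header row."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "CSV IGNOREHEADER 1;"
    )


# Example with placeholder values (none of these names come from the repository).
statement = build_copy_statement(
    table="zillow_listings",
    bucket="example-transformed-bucket",
    key="response_data.csv",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
)
print(statement)
```

`IGNOREHEADER 1` skips the header row that the CSV conversion step writes. The resulting string could then be run through whichever SQL client the pipeline uses to reach Redshift.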
![Screenshot 2024-06-10 105008](https://github.com/bhavanachitragar/zillow-data-analytics/assets/91766461/4c4cd696-5982-4b7c-9c9a-bb5552eb87eb)

## QuickSight
### Creating visualizations and dashboards from data sources
![Screenshot 2024-06-10 123457](https://github.com/bhavanachitragar/zillow-data-analytics/assets/91766461/324d2a6f-8cfb-4fb7-8125-7c62ac140ab1)

-----------------------------------------------------------------------------------------
### Guided by: Opeyemi Olanipekun