https://github.com/cleberzumba/data-ingestion-and-analysis-with-azure-databricks
Data ingestion and analysis project using AWS and Azure technologies, integrated with the powerful Databricks platform.
- Host: GitHub
- URL: https://github.com/cleberzumba/data-ingestion-and-analysis-with-azure-databricks
- Owner: cleberzumba
- Created: 2024-07-26T18:00:44.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-21T14:46:55.000Z (4 months ago)
- Last Synced: 2024-11-08T07:42:52.901Z (3 months ago)
- Size: 6.76 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# San Francisco Fire Calls ETL and Analysis
This pipeline uses the San Francisco Fire Department's call event dataset and demonstrates:
* *An end-to-end data engineering pipeline covering the extraction, transformation, and loading (ETL) of large volumes of data, using PySpark for transformation and Spark SQL for queries. Caching was used to optimize query performance, and the data was analyzed to gain insights.*
* *How to answer questions by analyzing data using Spark SQL.*

This project involved:
* *Data Ingestion: Reading emergency call data from the San Francisco Fire Department stored in Amazon S3.*
* *Secure Storage: Transferring and storing the data in Azure Data Lake Storage (ADLS).*
* *Transformation and Analysis: Using Azure Databricks to transform, analyze, and store the transformed data.*

![image](images/etl-process-image.png)
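
The three steps above might be sketched in PySpark roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the bucket, storage account, container, and file names are hypothetical, the `CallType` column and space-free column names are assumed, and credentials for S3 and ADLS are presumed to be configured on the cluster already.

```python
"""Sketch of an S3 -> ADLS -> Spark SQL flow. All names below are hypothetical."""


def s3_uri(bucket: str, key: str) -> str:
    """Build an s3a:// URI (the scheme used by the Hadoop S3A connector)."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def adls_uri(container: str, account: str, path: str) -> str:
    """Build an abfss:// URI for Azure Data Lake Storage Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Example question: which call types are most common?
# Assumes the dataset exposes a column named CallType.
TOP_CALL_TYPES_SQL = """
    SELECT CallType, COUNT(*) AS calls
    FROM fire_calls
    WHERE CallType IS NOT NULL
    GROUP BY CallType
    ORDER BY calls DESC
    LIMIT 10
"""


def run_pipeline(spark, source: str, raw_target: str, curated_target: str):
    """Ingest a CSV, land it in ADLS, then transform, cache, and query it.

    `spark` is an existing SparkSession, such as the one predefined
    in a Databricks notebook.
    """
    # 1. Ingestion: read the raw CSV from S3.
    raw = spark.read.csv(source, header=True, inferSchema=True)

    # 2. Storage: persist an untouched copy in ADLS as Parquet.
    raw.write.mode("overwrite").parquet(raw_target)

    # 3. Transformation: drop rows without a call type and deduplicate.
    cleaned = raw.dropna(subset=["CallType"]).dropDuplicates()

    # 4. Caching: mark the cleaned data for in-memory reuse across queries.
    cleaned.cache()
    cleaned.createOrReplaceTempView("fire_calls")

    # 5. Store the transformed data back in ADLS.
    cleaned.write.mode("overwrite").parquet(curated_target)

    # 6. Analysis: answer a question with Spark SQL.
    return spark.sql(TOP_CALL_TYPES_SQL)
```

In a Databricks notebook, where `spark` is predefined, this could be invoked as `run_pipeline(spark, s3_uri("example-bucket", "sf-fire-calls.csv"), adls_uri("raw", "exampleaccount", "fire-calls/"), adls_uri("curated", "exampleaccount", "fire-calls/")).show(truncate=False)`. Keeping the URI builders separate from the Spark logic means the path handling can be checked without a cluster.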