https://github.com/cleberzumba/data-ingestion-and-analysis-with-azure-databricks

Data ingestion and analysis project using AWS and Azure technologies, integrated with the powerful Databricks platform.

# San Francisco Fire Calls ETL and Analysis

* This pipeline uses the San Francisco Fire Department's call event dataset and demonstrates:
  * *An end-to-end data engineering pipeline covering the extraction, transformation, and loading (ETL) of large volumes of data, using PySpark for transformations and Spark SQL for queries. Caching was used to optimize query performance, and the data was analyzed to gain insights.*
  * *How to answer questions by analyzing data with Spark SQL.*
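As a rough illustration of the transform, cache, and query steps described above, here is a minimal sketch. It is not the project's actual code: the column names (`CallType`, `CallDate`, `Delay`), the `MM/dd/yyyy` date format, the view name `fire_calls`, and both helper functions are assumptions made for the example.

```python
# Hypothetical sketch: column names, date format, and view name are assumed,
# not taken from the project's notebooks.

TOP_CALL_TYPES_SQL = """
    SELECT CallType, COUNT(*) AS calls
    FROM fire_calls
    WHERE CallType IS NOT NULL
    GROUP BY CallType
    ORDER BY calls DESC
    LIMIT 10
"""


def prepare_calls(df):
    """Parse the assumed raw string columns and cache the result for reuse."""
    parsed = df.selectExpr(
        "CallType",
        "to_date(CallDate, 'MM/dd/yyyy') AS CallDate",  # string -> date
        "CAST(Delay AS DOUBLE) AS Delay",               # string -> number
    )
    parsed.cache()   # mark the DataFrame for caching (lazy)
    parsed.count()   # run an action so the cache is actually populated
    return parsed


def top_call_types(spark, calls):
    """Register the DataFrame as a temp view and query it with Spark SQL."""
    calls.createOrReplaceTempView("fire_calls")
    return spark.sql(TOP_CALL_TYPES_SQL)
```

Given a `SparkSession` named `spark` and a raw DataFrame `raw`, calling `top_call_types(spark, prepare_calls(raw)).show()` would list the most frequent call types. Because `cache()` is lazy, the `count()` call is what triggers materialization, so later queries against the view can read from memory instead of recomputing the parse.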

* This project involved:
  * *Data Ingestion: Reading emergency call data from the San Francisco Fire Department stored in Amazon S3.*
  * *Secure Storage: Transferring and storing the data in Azure Data Lake Storage (ADLS).*
  * *Transformation and Analysis: Using Azure Databricks to transform, analyze, and store the transformed data.*
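The S3-to-ADLS hand-off listed above might look roughly like the following sketch. Everything named here is hypothetical: the bucket, container, and storage account values, the helper functions, the assumption that the source is CSV with a header row, and the choice of Parquet as the target format. Credential setup for S3 and ADLS is deliberately left out and would need to be configured on the cluster.

```python
# Hypothetical sketch: helper names, CSV input, and Parquet output are
# assumptions; storage credentials must already be configured on the cluster.


def s3_uri(bucket, key):
    """Build an s3a:// URI for an object in Amazon S3."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def adls_uri(container, account, path):
    """Build an abfss:// URI for a path in ADLS Gen2."""
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )


def copy_csv_to_adls(spark, source, target):
    """Read CSV from the source URI and write it to the target URI as Parquet."""
    raw = spark.read.csv(source, header=True, inferSchema=True)
    raw.write.mode("overwrite").parquet(target)
    return raw
```

For example, `copy_csv_to_adls(spark, s3_uri("my-bucket", "raw/fire_calls.csv"), adls_uri("bronze", "myaccount", "fire/calls"))` would land the raw file in a placeholder `bronze` container. Keeping the URI builders separate from the copy step makes the paths easy to check without a running cluster.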

![ETL process diagram](images/etl-process-image.png)