https://github.com/tyriek-cloud/nyc-dca-etl

Created an ETL pipeline to merge two CSV files (converted to JSON) into a Parquet file using Azure Data Factory. The data was extracted from NYC Open Data (https://opendata.cityofnewyork.us/), and I created a Blob container within an existing storage account.

azure azure-data-factory blob-storage data data-engineering etl-pipeline


Host: GitHub
URL: https://github.com/tyriek-cloud/nyc-dca-etl
Owner: Tyriek-cloud
Created: 2023-03-13T15:19:27.000Z (over 3 years ago)
Default Branch: DCA-ETL
Last Pushed: 2023-03-14T14:27:03.000Z (over 3 years ago)
Last Synced: 2025-04-08T16:19:13.684Z (about 1 year ago)
Topics: azure, azure-data-factory, blob-storage, data, data-engineering, etl-pipeline
Homepage: https://opendata.cityofnewyork.us/
Size: 27.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md


README

# NYC DCA ETL

This project was built in Azure Data Factory.

I began by extracting data from New York City Open Data: https://opendata.cityofnewyork.us/

From there, I created a Blob container within an existing storage account. I then set up Azure Data Factory to run a series of T-SQL transformations on the CSV files, with the goal of loading the result into a Parquet file. The data flow looks like this:

![image](https://user-images.githubusercontent.com/62261407/224825767-5fed9d29-175a-45cb-b914-6cea558afa56.png)
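As a rough local illustration of the merge step (not the Data Factory pipeline itself), the Python sketch below joins two small CSV inputs on a shared key and serializes the result as JSON, the intermediate format mentioned in the repository description. The column names (`license_id`, `business_name`, `result`), the sample rows, and the choice of an inner join are all assumptions made for the example; the actual NYC Open Data columns and join logic are defined in the data flow above. Writing the merged rows to Parquet would need a third-party library and is left out.

```python
import csv
import io
import json


def read_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_key(left_rows, right_rows, key):
    """Inner-join two lists of row dicts on a shared key column."""
    # Index the right-hand rows by key so each left row is matched in O(1).
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)

    merged = []
    for left in left_rows:
        for right in right_by_key.get(left[key], []):
            # On a column-name clash, the right-hand value wins.
            merged.append({**left, **right})
    return merged


# Hypothetical sample data; the real dataset columns are not listed in this README.
businesses_csv = "license_id,business_name\n1,Alpha Deli\n2,Beta Laundry\n"
inspections_csv = "license_id,result\n1,Pass\n1,Fail\n3,Pass\n"

merged_rows = merge_on_key(
    read_csv_rows(businesses_csv),
    read_csv_rows(inspections_csv),
    key="license_id",
)

# Serialize the merged rows as JSON, one object per joined row.
merged_json = json.dumps(merged_rows, indent=2)
print(merged_json)
```

Rows without a match on either side (here `license_id` 2 and 3) are dropped by the inner join; a different join type would change which rows reach the output.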

The ETL process ended with a Parquet file written to a generated blob in the container:

![image](https://user-images.githubusercontent.com/62261407/224824921-bf381f03-1c8f-4f73-bc84-ba38e659d3ab.png)
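One lightweight way to sanity-check an output file like this after downloading it is to look for the Parquet magic bytes: an unencrypted Parquet file begins and ends with the four bytes `PAR1`. The sketch below is only an illustration of that check, not something taken from this project; the helper name `looks_like_parquet` and the temporary demo files are hypothetical, and passing the check does not prove the file's contents are valid.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    size = os.path.getsize(path)
    # Smallest plausible layout: header magic + 4-byte footer length + footer magic.
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def write_temp(data):
    """Write bytes to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Placeholder files for the demo: the first only mimics the magic bytes
# and is not a real Parquet file; the second is plain CSV text.
fake_parquet = write_temp(b"PAR1" + b"\x00" * 16 + b"PAR1")
not_parquet = write_temp(b"license_id,result\n1,Pass\n")

print(looks_like_parquet(fake_parquet))
print(looks_like_parquet(not_parquet))
```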

I then went back into my container to look for any issues to troubleshoot. There were none, so I monitored activity in the container from a private dashboard:

![image](https://user-images.githubusercontent.com/62261407/225027680-e834c12a-e771-4284-bb86-89f2433afdbd.png)
![image](https://user-images.githubusercontent.com/62261407/225027790-1646533d-1a54-4b3c-abd2-552dfc35a519.png)
![image](https://user-images.githubusercontent.com/62261407/225027867-d180b4d3-7a78-4fb0-9275-c18ad4148b9e.png)
