https://github.com/tyriek-cloud/nyc-dca-etl
Created an ETL pipeline in Azure Data Factory that merges two CSV files (converted to JSON) into a Parquet file. The data was extracted from NYC Open Data (https://opendata.cityofnewyork.us/), and I created a blob container within an existing storage account.
- Host: GitHub
- URL: https://github.com/tyriek-cloud/nyc-dca-etl
- Owner: Tyriek-cloud
- Created: 2023-03-13T15:19:27.000Z (over 3 years ago)
- Default Branch: DCA-ETL
- Last Pushed: 2023-03-14T14:27:03.000Z (over 3 years ago)
- Last Synced: 2025-04-08T16:19:13.684Z (about 1 year ago)
- Topics: azure, azure-data-factory, blob-storage, data, data-engineering, etl-pipeline
- Homepage: https://opendata.cityofnewyork.us/
- Size: 27.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# NYC DCA ETL
This project is an ETL pipeline built in Azure Data Factory.
I began by extracting data from New York City Open Data: https://opendata.cityofnewyork.us/
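NYC Open Data datasets are also served through the Socrata Open Data API (SODA), so an extraction step like this one could be scripted instead of done by manual download. The sketch only builds a request URL. The helper name `soda_url`, the dataset ID `abcd-1234`, and the default row limit are hypothetical placeholders, not the datasets or settings this project used.

```python
from urllib.parse import urlencode

# Hypothetical helper: builds a SODA request URL for an NYC Open Data dataset.
# "abcd-1234" below is a placeholder dataset ID, not one used in this project.
def soda_url(dataset_id: str, fmt: str = "csv", limit: int = 1000) -> str:
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    # $limit caps the number of rows returned; keep "$" unescaped for readability.
    query = urlencode({"$limit": limit}, safe="$")
    return f"https://data.cityofnewyork.us/resource/{dataset_id}.{fmt}?{query}"

print(soda_url("abcd-1234", fmt="json", limit=500))
# prints https://data.cityofnewyork.us/resource/abcd-1234.json?$limit=500
```

Requesting `.json` instead of `.csv` returns the same rows as a JSON array, which is one way to obtain JSON versions of the source files.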
From there, I created a blob container within an existing storage account. Then I set up Azure Data Factory to run a series of T-SQL transformations on the CSV files, with the goal of loading the result into a Parquet file. The dataflow looks like this:

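The merge itself ran inside Azure Data Factory, but its core idea can be illustrated locally. This sketch assumes "merge" means stacking the rows of the two files and matching columns by name, similar in spirit to a union by name; if the pipeline instead joined the files on a key, the logic would differ. The function names, column names, and sample rows are made up, and the Parquet write (which would need a library such as pyarrow) is left out so the example runs with only the standard library.

```python
import csv
import io
import json

def csv_rows(text: str) -> list[dict]:
    # Parse CSV text into one dict per row, keyed by the header line.
    return list(csv.DictReader(io.StringIO(text)))

def union_by_name(a: list[dict], b: list[dict]) -> list[dict]:
    # Collect every column seen in either input, in first-seen order.
    columns: list[str] = []
    for row in a + b:
        for key in row:
            if key not in columns:
                columns.append(key)
    # Give every record the same columns; missing values become None (null in JSON).
    return [{col: row.get(col) for col in columns} for row in a + b]

# Hypothetical sample inputs standing in for the two source CSV files.
file_a = "license_nbr,business_name\nL001,Acme Deli\n"
file_b = "license_nbr,business_name,borough\nL002,Corner Hardware,Bronx\n"

merged = union_by_name(csv_rows(file_a), csv_rows(file_b))
print(json.dumps(merged, indent=2))
```

Filling absent columns with `None` keeps every output record the same shape, which matters for a columnar format like Parquet where each column needs a consistent schema.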
The ETL process produced a Parquet file, stored as a generated blob in the container:

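Every blob in Azure Blob Storage is addressable at `https://<account>.blob.core.windows.net/<container>/<blob>`, which is a quick way to locate generated output such as the Parquet file. This sketch builds that URL and checks the account and container names against Azure's documented naming rules. The helper `blob_url`, the account name `mystorageacct`, the container `nyc-dca`, and the blob path `output/dca.parquet` are hypothetical, not the names used in this project.

```python
import re
from urllib.parse import quote

# Azure naming rules: storage account names are 3-24 lowercase letters or digits;
# container names are 3-63 characters of lowercase letters, digits, and single
# hyphens, starting and ending with a letter or digit.
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"(?=.{3,63}\Z)[a-z0-9]+(?:-[a-z0-9]+)*")

def blob_url(account: str, container: str, blob_name: str) -> str:
    if not ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Percent-encode the blob path but keep "/" so virtual folders stay intact.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"

# Hypothetical names for illustration only.
print(blob_url("mystorageacct", "nyc-dca", "output/dca.parquet"))
```

The URL only identifies the blob; reading it still requires credentials or a SAS token unless the container allows public access.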
I then went back into the container to check for any issues to troubleshoot. There were none to resolve, so I monitored the container's activity in a private dashboard:


