https://github.com/mdh266/airflowetl
Blog post on ETL pipelines with Airflow
- Host: GitHub
- URL: https://github.com/mdh266/airflowetl
- Owner: mdh266
- License: mit
- Created: 2017-08-26T23:54:55.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2020-06-07T18:11:17.000Z (over 5 years ago)
- Last Synced: 2024-06-11T17:53:55.028Z (over 1 year ago)
- Topics: airflow, data-engineering, data-pipeline, database, etl, etl-pipeline, postgresql, python, schedule, sql
- Language: Jupyter Notebook
- Homepage: http://michael-harmon.com/blog/AirflowETL.html
- Size: 782 KB
- Stars: 21
- Watchers: 2
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# An Example ETL Pipeline With Airflow
In this blog post I want to go over the data engineering operations known as Extract, Transform, Load (ETL) and show how they can be automated and scheduled using Apache Airflow. You can see the source code for this project here.
*Extracting* data can be done in a multitude of ways, but one of the most common is to query a web API. If the query is successful, we receive data back from the API's server, often in the form of JSON. JSON can pretty much be thought of as semi-structured data, or as a dictionary whose keys and values are strings. Since the data is a dictionary of strings, we must *transform* it before storing or *loading* it into a database. Airflow is a platform to schedule and monitor workflows, and in this post I will show you how to use it to extract the daily weather in New York from the OpenWeatherMap API, convert the temperature to Celsius, and load the data into a simple PostgreSQL database.
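As a rough illustration of the transform step described above, here is a minimal sketch in plain Python. It is not the project's actual code: the sample payload is hypothetical and trimmed down, the function names are illustrative, and it assumes the API returns temperatures in Kelvin under `main.temp` alongside `name` and `dt` fields, which is OpenWeatherMap's default response shape as far as I know.

```python
import json
from datetime import datetime, timezone

# Hypothetical, trimmed-down response standing in for a real API call.
# Assumption: temperature is reported in Kelvin (OpenWeatherMap's default units).
SAMPLE_RESPONSE = (
    '{"name": "New York", "dt": 1503792000, '
    '"main": {"temp": 295.15, "humidity": 60}}'
)


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15


def transform(raw_json: str) -> dict:
    """Parse the raw JSON text and return a typed row ready for loading."""
    payload = json.loads(raw_json)
    return {
        "city": payload["name"],
        # "dt" is a Unix timestamp; keep it timezone-aware in UTC.
        "observed_at": datetime.fromtimestamp(payload["dt"], tz=timezone.utc),
        "temp_celsius": round(kelvin_to_celsius(float(payload["main"]["temp"])), 2),
    }


if __name__ == "__main__":
    print(transform(SAMPLE_RESPONSE))
```

The returned dictionary maps naturally onto the columns of a database table, so the load step can pass its values to a parameterized `INSERT` statement. In an Airflow pipeline, a function like `transform` would typically be wrapped in its own task so that it runs after the extract task and before the load task.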
## Requirements
To install the requirements (except for Python and PostgreSQL), type:
pip install -r requirements.t
You can see the actual blog post here.