https://github.com/vicnesterenko/apache-airflow-labs
Building data processing pipelines in Apache Airflow/ELT pattern
- Host: GitHub
- URL: https://github.com/vicnesterenko/apache-airflow-labs
- Owner: vicnesterenko
- Created: 2023-06-17T13:43:00.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-24T19:44:10.000Z (over 1 year ago)
- Last Synced: 2024-11-11T18:09:24.049Z (3 days ago)
- Topics: airflow, apache-airflow, kpi-fict, kpi-ua
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Apache_Airflow_labs
## Building data processing pipelines in Apache Airflow/ELT pattern
_This repository showcases the results of the labs I completed during my first semester at Igor Sikorsky Kyiv Polytechnic Institute, where I pursued a Master's degree in Informatics and Software Engineering 🎓_ ([faculty website](https://fiot.kpi.ua/))
_The labs primarily focus on Apache Airflow and demonstrate data processing pipelines built using the ELT pattern. Through this repository, I aim to share my practical experiences and learnings from these labs with others interested in data engineering and workflow automation using Apache Airflow._
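To make the ELT pattern concrete, here is a minimal sketch of its three stages in plain Python. It is not taken from the labs: the sample rows, the table names `staging_orders` and `daily_revenue`, and the use of an in-memory SQLite database as a stand-in for PostgreSQL are all assumptions made for illustration. The point it tries to show is the ordering: raw data is loaded first and transformed afterwards, inside the database.

```python
import sqlite3

# Hypothetical raw records, standing in for whatever source a lab would read.
RAW_ORDERS = [
    ("2023-06-01", "widget", "3", "4.50"),
    ("2023-06-01", "gadget", "1", "12.00"),
    ("2023-06-02", "widget", "2", "4.50"),
]


def extract():
    """Extract: return the source rows untouched (all values are still strings)."""
    return list(RAW_ORDERS)


def load(conn, rows):
    """Load: copy the raw rows into a staging table without reshaping them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_orders "
        "(order_date TEXT, product TEXT, quantity TEXT, unit_price TEXT)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def transform(conn):
    """Transform: cast and aggregate inside the database, after loading."""
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        "CREATE TABLE daily_revenue AS "
        "SELECT order_date, "
        "SUM(CAST(quantity AS INTEGER) * CAST(unit_price AS REAL)) AS revenue "
        "FROM staging_orders GROUP BY order_date ORDER BY order_date"
    )
    conn.commit()
    return conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()


def run_elt():
    """Run the three stages in ELT order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, extract())
        return transform(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for order_date, revenue in run_elt():
        print(order_date, revenue)
```

In an Airflow pipeline each of these functions would typically become its own task, with the database acting as the hand-off point between them.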
### Installation
1. Apache Airflow does not run natively on Windows, so on a Windows PC you can use the Ubuntu subsystem available at https://www.microsoft.com/en-us/p/ubuntu/9nblggh4msv6. Before installing it, enable developer mode in Windows Developer Settings and activate the Windows Subsystem for Linux component in Windows Features.
![Ubuntu Subsystem](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/337f98ea-2967-469c-b3d5-bac2362765d1)
2. Install the required packages by running the following commands:
```bash
# Refresh the package lists, then install the MySQL, Kerberos, and SASL development headers
sudo apt-get update
sudo apt-get install libmysqlclient-dev
sudo apt-get install libkrb5-dev
sudo apt-get install libsasl2-dev
# Install and start PostgreSQL
sudo apt-get install postgresql postgresql-contrib
sudo service postgresql start
# Edit the client authentication rules (see the screenshot below)
sudo nano /etc/postgresql/*/main/pg_hba.conf
```
![PostgreSQL Configuration](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/5cfedbe4-6189-4890-afa6-825180b4838c)
```bash
sudo service postgresql restart
sudo apt install python3-pip
pip install apache-airflow
```
![Install Apache Airflow](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/c119b70e-4842-4c4f-8ffb-8472d07d5409)
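```bash
sudo pip install apache-airflow
# Initialize the metadata database (Airflow 2.7+ deprecates this in favor of `airflow db migrate`)
airflow db init
```
![Initialize Airflow Database](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/a5162d61-3ba4-4f07-84f0-28980a32741b)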
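```bash
# build-dep is an apt-get subcommand, not a package; it needs deb-src entries in the apt sources
sudo apt-get build-dep python3-psycopg2
# The prebuilt wheel is the PostgreSQL driver Airflow connects through
pip install psycopg2-binary
```
![Install psycopg2-binary](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/f01e5968-c352-4dc0-aa72-9641f6d4b31a)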
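After these installation steps it can be useful to confirm that the two Python packages are actually visible to the interpreter you will run Airflow with. The following is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the `pip install` commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to `pip install` in the steps above.
REQUIRED_PACKAGES = ("apache-airflow", "psycopg2-binary")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Return {package name: installed version, or None if it is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If a package shows as missing even though `pip install` succeeded, the likely cause is that `pip` and `python3` point at different environments (for example, a user install versus a `sudo` install).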
### Setting up DAGs
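3. Place your DAGs in the `dags` folder of your Airflow home directory. In this setup, seen from Windows, the path is `C:/Users/vicwa/AppData/Local/Packages/CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc/LocalState/rootfs/home/vic/airflow/dags` (the user names `vicwa` and `vic` will differ on your machine).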
![DAGs Path](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/46ca20aa-926a-4116-8e46-32139f2a55b9)
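As an illustration of what a file in that folder might contain, here is a minimal DAG sketch with three tasks chained in ELT order. It is not one of the lab DAGs: the `dag_id`, the task names, and the placeholder callables are hypothetical, and the `schedule` parameter assumes Airflow 2.4 or newer. The `try`/`except` around the Airflow imports is only there so the sketch can be imported on a machine without Airflow; a real DAG file would import Airflow unconditionally at the top.

```python
from datetime import datetime


def extract():
    """Placeholder extract step; a real task would read from a data source."""
    return "extracted"


def load():
    """Placeholder load step; a real task would write raw data to the database."""
    return "loaded"


def transform():
    """Placeholder transform step; a real task would run SQL in the database."""
    return "transformed"


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = None  # Airflow is not installed; only the plain callables are usable

if DAG is not None:
    with DAG(
        dag_id="elt_lab_sketch",  # hypothetical name
        start_date=datetime(2023, 1, 1),
        schedule=None,  # run only when triggered manually (Airflow 2.4+)
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)

        # ELT ordering: extract, then load the raw data, then transform it
        extract_task >> load_task >> transform_task
```

Once the scheduler has parsed the file, the DAG should appear in the web UI under the chosen `dag_id`.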
### Creating the database
4. Open a `psql` session on the `airflow` database, from which the database objects can be created (the user name `vic` is specific to this setup):
```bash
psql -h 127.0.0.1 -d airflow -U vic
```
![Create Database](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/75ceeca1-0eac-427d-a31c-ab9a2faa6851)
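The README does not show how Airflow itself is pointed at this PostgreSQL database. One common approach, given here as an assumption and not as the author's configuration, is to set the `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` environment variable (Airflow 2.3+; older releases use `AIRFLOW__CORE__SQL_ALCHEMY_CONN`) to a SQLAlchemy URI. The sketch below builds such a URI from the same host, database, and user as the `psql` command; the helper name and the password value are placeholders.

```python
from urllib.parse import quote


def build_sqlalchemy_uri(user, password, host="127.0.0.1", port=5432, database="airflow"):
    """Build a PostgreSQL URI for the psycopg2 driver, escaping special characters."""
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # "change-me" is a placeholder, not a real credential
    uri = build_sqlalchemy_uri("vic", "change-me")
    print(f"export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN='{uri}'")
```

Percent-encoding the user name and password matters when they contain characters such as `@` or `/`, which would otherwise be read as part of the URI structure.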
### Running Airflow
5. Run the following commands in the Ubuntu console:
```bash
sudo service postgresql restart
airflow db init
# The webserver runs in the foreground on port 8080, so start the scheduler from a second console
airflow webserver -p 8080
airflow scheduler
sudo service postgresql restart
```
![Start Airflow Services](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/dd971cac-6da8-446c-8b99-9959ce434290)
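To check that both services came up, you can query the webserver's `/health` endpoint, which in Airflow 2.x reports the status of the metadata database and the scheduler as JSON. The following is a minimal sketch under that assumption; the URL matches the port used above, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Matches `airflow webserver -p 8080`; adjust if you chose another port.
HEALTH_URL = "http://localhost:8080/health"


def summarize_health(payload):
    """Map each component in a /health payload to its reported status string."""
    if not isinstance(payload, dict):
        return {}
    return {
        component: details.get("status", "unknown")
        for component, details in payload.items()
        if isinstance(details, dict)
    }


def fetch_health(url=HEALTH_URL, timeout=5):
    """Fetch and decode the JSON health report from a running webserver."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        statuses = summarize_health(fetch_health())
    except (OSError, ValueError) as exc:
        # Connection errors and non-JSON responses both end up here
        print(f"Could not read {HEALTH_URL}: {exc}")
    else:
        for component, status in statuses.items():
            print(f"{component}: {status}")
```

A scheduler reported as `unhealthy` while the metadata database is `healthy` usually means `airflow scheduler` is not running in its own console.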
### Lab1 Results
6. Results for Lab1:
![Result 1](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/6cf7000d-f5e1-4c8c-a1ab-3eeea626b964)
![Result 2](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/40104034-f325-47ad-9324-818564eef9c7)
![Result 3](https://github.com/vicnesterenko/Apache_Airflow_labs/assets/136901590/aa55f2eb-67ca-456f-8489-0f69549520c8)