# ✈️ Airline Company Dashboard
This project is an interactive dashboard for an airline company, integrating a data pipeline orchestrated with Airflow to handle data ingestion, transformation, and visualization.
The data used in this project is available through this [link](https://edu.postgrespro.com/demo-big-en.zip).
## 📋 Table of Contents
1. [Project Overview](#1-project-overview)
2. [Data Source](#2-data-source)
3. [Pipeline Architecture](#3-pipeline-architecture)
4. [Technologies Used](#4-technologies-used)
5. [How to Run the Project](#5-how-to-run-the-project)
6. [Makefile Commands](#6-makefile-commands)

---
## 1. 📊 Project Overview
The project consists of two main components:
1. **Data Pipeline**: Managed by Airflow, this pipeline extracts data from a PostgreSQL database, transforms it using DuckDB, and loads it into MongoDB Atlas for efficient storage.
2. **Streamlit Dashboard**: This application retrieves the data stored in MongoDB Atlas, processes it, and displays it in an interactive dashboard for real-time visualization.
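To make that second component concrete, here is a minimal sketch of what such a Streamlit app could look like. The `airline` database, `flight_stats` collection, and the `mongo_uri` secrets key are assumptions for illustration, not names taken from the repository:

```python
# Hypothetical sketch of the dashboard side; database, collection,
# and secrets key names are illustrative only.
import pandas as pd
import streamlit as st
from pymongo import MongoClient

st.title("✈️ Airline Company Dashboard")

# Connect to MongoDB Atlas using a URI stored in Streamlit secrets.
client = MongoClient(st.secrets["mongo_uri"])
collection = client["airline"]["flight_stats"]

# Load the pre-aggregated documents produced by the Airflow pipeline.
df = pd.DataFrame(list(collection.find({}, {"_id": 0})))

st.bar_chart(df, x="departure_airport", y="nb_flights")
```

Reading pre-aggregated documents keeps the dashboard responsive, since the heavy lifting has already been done by the pipeline.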
## 2. 🗄️ Data Source
The data used for this project is sourced from a PostgreSQL database and can be downloaded from the following [link](https://edu.postgrespro.com/demo-big-en.zip). The data includes information about flights, passengers, and various operational aspects of the airline.
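If you would rather script the download than fetch the archive manually, a small sketch could look like this (the `data` target directory is an arbitrary choice):

```python
# Download and unpack the demo dataset; the archive contains a SQL dump
# that can then be restored into PostgreSQL with psql.
import urllib.request
import zipfile

URL = "https://edu.postgrespro.com/demo-big-en.zip"
urllib.request.urlretrieve(URL, "demo-big-en.zip")
with zipfile.ZipFile("demo-big-en.zip") as archive:
    archive.extractall("data")  # arbitrary extraction directory
```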
## 3. 🛠️ Pipeline Architecture

The data pipeline follows these steps:
1. **Data Extraction (PostgreSQL)**: Airflow orchestrates the extraction of data from the PostgreSQL database.
2. **Transformation (DuckDB)**: DuckDB is used to perform fast and efficient data transformations.
3. **Loading (MongoDB Atlas)**: The transformed data is loaded into MongoDB Atlas, ready for visualization.
4. **Visualization (Streamlit)**: The Streamlit app connects to MongoDB, retrieves the data, processes it, and displays it in an interactive dashboard.
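The repository's actual DAG is not reproduced here, but a minimal sketch of what this flow could look like with the Airflow TaskFlow API follows. The connection ID, table, collection names, and the `MONGO_URI` environment variable are assumptions for illustration, not taken from the project:

```python
# Hypothetical sketch of the ETL DAG; connection IDs, table, and
# collection names are illustrative, not from the repository.
import os
from datetime import datetime

import duckdb
import pandas as pd
from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook
from pymongo import MongoClient


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def airline_etl():
    @task
    def extract() -> list[dict]:
        # Pull raw rows out of the PostgreSQL demo database.
        hook = PostgresHook(postgres_conn_id="postgres_default")
        df = hook.get_pandas_df("SELECT departure_airport FROM bookings.flights")
        return df.to_dict("records")  # JSON-safe payload for XCom

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Aggregate with DuckDB, which can query a pandas DataFrame in scope.
        flights = pd.DataFrame(rows)
        result = duckdb.sql(
            "SELECT departure_airport, COUNT(*) AS nb_flights "
            "FROM flights GROUP BY departure_airport"
        ).df()
        return result.to_dict("records")

    @task
    def load(stats: list[dict]) -> None:
        # Replace the dashboard collection with the fresh aggregates;
        # MONGO_URI is assumed to hold the Atlas connection string.
        coll = MongoClient(os.environ["MONGO_URI"])["airline"]["flight_stats"]
        coll.delete_many({})
        coll.insert_many(stats)

    load(transform(extract()))


airline_etl()
```

Passing plain lists of dicts between tasks keeps the XCom payloads JSON-serializable under Airflow's default settings.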
## 4. 💻 Technologies Used
The following technologies are utilized in this project:
- **Airflow**: A workflow orchestrator used to automate the ETL pipeline.
- **DuckDB**: An OLAP engine used for efficient data processing.
- **PostgreSQL**: A relational database, the source of the data.
- **MongoDB Atlas**: A NoSQL database for storing the processed data.
- **Streamlit**: A web interface to display the data in an interactive dashboard.
- **Docker & Docker Compose**: Used to containerize the services and manage orchestration.
## 5. 🚀 How to Run the Project
### Prerequisites
Make sure you have the following tools installed on your machine:
- **Docker**
- **Docker Compose**
- **Make**

### Installation Steps
1. Clone the GitHub repository:
```bash
git clone https://github.com/abrahamkoloboe27/Airflow-Pipeline-Dashboard-Compagnie-Aerienne
cd Airflow-Pipeline-Dashboard-Compagnie-Aerienne
```
2. Configure Airflow connections:
- After starting the services, navigate to the Airflow web interface.
- Go to **Admin > Connections** in Airflow.
- Add connections for PostgreSQL and MongoDB Atlas with the correct URI, login, and password.
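If you prefer to script this step rather than click through the UI, a sketch using Airflow's `Connection` model is shown below; every ID, host, and credential is a placeholder to replace with your own values:

```python
# Hypothetical script to register the two connections in Airflow's
# metadata database; all IDs, hosts, and credentials are placeholders.
from airflow.models import Connection
from airflow.utils.session import create_session

with create_session() as session:
    session.add(Connection(
        conn_id="postgres_default",
        conn_type="postgres",
        host="postgres",   # assumed docker-compose service name
        schema="demo",     # assumed demo database name
        login="airflow",
        password="airflow",
        port=5432,
    ))
    session.add(Connection(
        conn_id="mongo_default",
        conn_type="mongo",
        host="cluster0.example.mongodb.net",  # your Atlas cluster host
        login="atlas_user",
        password="atlas_password",
    ))
```

The same result can be achieved with the `airflow connections add` CLI command or `AIRFLOW_CONN_<CONN_ID>` environment variables.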
### Launching the Services
- **Build Docker Images**:
```bash
make build
```
This command builds the necessary Docker images for the services.
- **Start the Services**:
```bash
make up
```
This command starts Airflow, PostgreSQL, and the Streamlit app (MongoDB Atlas is a managed cloud service, so it runs remotely rather than as a local container).
- **Build and Start Services Simultaneously**:
```bash
make up-build
```
This command rebuilds the services if necessary and then starts them.
- **Stop the Services**:
```bash
make down
```
This command stops all running services.

## 6. 📜 Makefile Commands
The Makefile included in the project allows you to execute the following commands:
- **`make build`**: Builds the Docker images required for the project.
- **`make up`**: Starts the containerized services (Airflow, PostgreSQL, MongoDB, Streamlit).
- **`make up-build`**: Rebuilds the Docker images and starts the services.
- **`make down`**: Stops all the running services.

---
## 🎯 Conclusion
This project provides a comprehensive solution for data management and visualization for an airline company. It integrates a complete data pipeline that automates extraction, transformation, and loading (ETL) of data, while Streamlit provides an interactive environment for exploring and analyzing the data in real time.
---