https://github.com/valeqm/latam-ml-challenge
Implementing a machine learning model to predict flight delays at SCL airport, deploying it as a FastAPI service, and operationalizing it on Google Cloud with proper CI/CD pipelines.
https://github.com/valeqm/latam-ml-challenge
api fastapi gcp google-cloud-platform machine-learning python unit-testing unittest
Last synced: 6 months ago
JSON representation
Implementing a machine learning model to predict flight delays at SCL airport, deploying it as a FastAPI service, and operationalizing it on Google Cloud with proper CI/CD pipelines.
- Host: GitHub
- URL: https://github.com/valeqm/latam-ml-challenge
- Owner: valeqm
- Created: 2023-09-01T23:03:49.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-03T15:56:39.000Z (almost 2 years ago)
- Last Synced: 2025-04-09T20:53:02.595Z (6 months ago)
- Topics: api, fastapi, gcp, google-cloud-platform, machine-learning, python, unit-testing, unittest
- Language: Jupyter Notebook
- Homepage:
- Size: 1.67 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Software Engineer (ML & LLMs) Challenge
## Overview
Welcome to the **Software Engineer (ML & LLMs)** Application Challenge. In this, you will have the opportunity to get closer to a part of the reality of the role, and demonstrate your skills and knowledge in machine learning and cloud.
## Problem
A jupyter notebook (`exploration.ipynb`) has been provided with the work of a Data Scientist (from now on, the DS). The DS, trained a model to predict the probability of **delay** for a flight taking off or landing at SCL airport. The model was trained with public and real data, below we provide you with the description of the dataset:
|Column|Description|
|-----|-----------|
|`Fecha-I`|Scheduled date and time of the flight.|
|`Vlo-I`|Scheduled flight number.|
|`Ori-I`|Programmed origin city code.|
|`Des-I`|Programmed destination city code.|
|`Emp-I`|Scheduled flight airline code.|
|`Fecha-O`|Date and time of flight operation.|
|`Vlo-O`|Flight operation number of the flight.|
|`Ori-O`|Operation origin city code.|
|`Des-O`|Operation destination city code.|
|`Emp-O`|Airline code of the operated flight.|
|`DIA`|Day of the month of flight operation.|
|`MES`|Number of the month of operation of the flight.|
|`AÑO`|Year of flight operation.|
|`DIANOM`|Day of the week of flight operation.|
|`TIPOVUELO`|Type of flight, I =International, N =National.|
|`OPERA`|Name of the airline that operates.|
|`SIGLAORI`|Name city of origin.|
|`SIGLADES`|Destination city name.|In addition, the DS considered relevant the creation of the following columns:
|Column|Description|
|-----|-----------|
|`high_season`|1 if `Date-I` is between Dec-15 and Mar-3, or Jul-15 and Jul-31, or Sep-11 and Sep-30, 0 otherwise.|
|`min_diff`|difference in minutes between `Date-O` and `Date-I`|
|`period_day`|morning (between 5:00 and 11:59), afternoon (between 12:00 and 18:59) and night (between 19:00 and 4:59), based on `Date-I`.|
|`delay`|1 if `min_diff` > 15, 0 if not.|## Challenge
### Instructions
1. Create a repository in **github** and copy all the challenge content into it. Remember that the repository must be **public**.
2. Use the **main** branch for any official release that we should review. It is highly recommended to use [GitFlow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) development practices. **NOTE: do not delete your development branches.**
3. Please, do not change the structure of the challenge (names of folders and files).
4. All the documentation and explanations that you have to give us must go in the `challenge.md` file inside `docs` folder.5. To send your challenge, you must do a `POST` request to:
`https://advana-challenge-check-api-cr-k4hdbggvoq-uc.a.run.app/software-engineer`
This is an example of the `body` you must send:
```json
{
"name": "Juan Perez",
"mail": "juan.perez@example.com",
"github_url": "https://github.com/juanperez/latam-challenge.git",
"api_url": "https://juan-perez.api"
}
```
##### ***PLEASE, SEND THE REQUEST ONCE.***If your request was successful, you will receive this message:
```json
{
"status": "OK",
"detail": "your request was received"
}
```***NOTE: We recommend to send the challenge even if you didn't manage to finish all the parts.***
### Context:
We need to operationalize the data science work for the airport team. For this, we have decided to enable an `API` in which they can consult the delay prediction of a flight.
*We recommend reading the entire challenge (all its parts) before you start developing.*
### Part I
In order to operationalize the model, transcribe the `.ipynb` file into the `model.py` file:
- If you find any bug, fix it.
- The DS proposed a few models in the end. Choose the best model at your discretion, argue why. **It is not necessary to make improvements to the model.**
- Apply all the good programming practices that you consider necessary in this item.
- The model should pass the tests by running `make model-test`.> **Note:**
> - **You cannot** remove or change the name or arguments of **provided** methods.
> - **You can** change/complete the implementation of the provided methods.
> - **You can** create the extra classes and methods you deem necessary.### Part II
Deploy the model in an `API` with `FastAPI` using the `api.py` file.
- The `API` should pass the tests by running `make api-test`.
> **Note:**
> - **You cannot** use other framework.### Part III
Deploy the `API` in your favorite cloud provider (we recomend to use GCP).
- Put the `API`'s url in the `Makefile` (`line 26`).
- The `API` should pass the tests by running `make stress-test`.> **Note:**
> - **It is important that the API is deployed until we review the tests.**### Part IV
We are looking for a proper `CI/CD` implementation for this development.
- Create a new folder called `.github` and copy the `workflows` folder that we provided inside it.
- Complete both `ci.yml` and `cd.yml`(consider what you did in the previous parts).