# Exercise - understanding MLOps using GitHub Actions
## Requirements
- A GitHub account.
- Basic knowledge of Python.
- Python installed on your machine.

## Problem Statement
You have a Python script or notebook and you want any changes made to it to be tested automatically, so that you can be sure the changes do not break the code or worsen the performance of the model.
## Theory
### What's a GitHub Action?
GitHub Actions help you automate tasks within your software development life cycle. GitHub Actions are event-driven, meaning that you can run a series of commands after a specified event has occurred. For example, every time someone creates a pull request for a repository, you can automatically run a command that executes a software testing script.
### What's a workflow?
A workflow is a configurable automated process made up of one or more jobs. You must create a YAML file to define your workflow configuration. The file must be stored in the `.github/workflows` directory of your repository.
Example of a workflow file:
```yaml
name: Manually triggered workflow
on:
  workflow_dispatch:

jobs:
  check-bats-version:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '14'
      - run: npm install -g bats
      - run: bats -v
```

The above workflow file is triggered manually (`workflow_dispatch`).
- It defines a job called `check-bats-version` that runs on the latest version of Ubuntu.
- It defines a series of steps that the job should take:
- Checks out the repository.
- Sets up Node.js version 14.
- Installs `bats` (Bash Automated Testing System) globally.
- Runs the `bats -v` command to check the version of `bats`.

## Assignment
In this assignment, we'll set up a workflow that runs a test script whenever a push is made to the repository. The test script will test the performance of a model that predicts whether a credit card transaction is fraudulent or not.
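Before diving in, it helps to know what the tests later in this exercise assume about `app.py`: a `main()` function that trains the model on the generated data, saves it to `models/model.pkl`, prints the accuracy, and returns the score as a float. The sketch below is a hypothetical illustration of such a script; the scikit-learn classifier, the `is_fraud` column name, and other details are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical sketch of app.py -- the repository's actual implementation may differ.
# Assumes pandas and scikit-learn are available via requirements.txt, and that
# util/generate.py has produced data/credit_card_records.csv with an "is_fraud" label column.
import os
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def main():
    # Load the generated credit card transactions.
    df = pd.read_csv("data/credit_card_records.csv")
    X = df.drop(columns=["is_fraud"])
    y = df["is_fraud"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Train a simple classifier.
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)

    # Persist the model so the tests can verify the file exists.
    os.makedirs("models", exist_ok=True)
    with open("models/model.pkl", "wb") as f:
        pickle.dump(model, f)

    # Report and return the accuracy score.
    score = model.score(X_test, y_test)
    print(f"Model accuracy: {score}")
    return score


if __name__ == "__main__":
    main()
```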
### Step 0 (optional, if you want to test the code locally)
In this step, you'll run the code and the tests locally. You therefore need the following installed on your machine:
- Python
1. Clone/Fork this repository to your GitHub account.
1. Start the project by creating a virtual environment and installing dependencies:
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

For Windows users:
```bash
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

1. Generate a random dataset using the following code:
```bash
python util/generate.py
```

This command stores `credit_card_records.csv` in the `data` folder.
1. Run the app using the following command:
```bash
python app.py
```

It should output the following:
```bash
Model accuracy:
```

1. Run the tests using `pytest`:
```bash
pytest
```

It should output the following:
```bash
.
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
```

### Step 1 - your first workflow
The objective of this step is to get a first feel for what it's like to have GitHub Actions automate things for you.
1. Create a workflow file that runs the test script whenever you choose to run it manually.
Copy the file `solution/workflows/manual.yml` to `.github/workflows/manual.yml`.
Run the workflow by going to your repo on GitHub, clicking on the `Actions` tab, and then clicking on the `Run workflow` button.
You should see the workflow running and the `bats` version being printed in the logs.
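If you have the GitHub CLI (`gh`) installed, you can also trigger the same workflow from your terminal instead of the web UI (assuming the workflow file is named `manual.yml` and exists on your default branch):

```bash
# Trigger the manually dispatched workflow and list its recent runs
gh workflow run manual.yml
gh run list --workflow=manual.yml
```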
**What did you learn?**
You've learned how the GitHub Actions platform works and how to create a workflow that runs on demand. Next, we'll apply GitHub Actions to our specific use case.
### Step 2 - run the tests on a new pull request
A common way of working is to create a new branch for a new feature or bug fix, make the changes, and then create a pull request to merge the changes into the main branch. This way of working gives your colleagues a chance to review your code before it's merged.
Here's a great opportunity to let a GitHub Action run the tests for you whenever a new pull request is created. A failed test means you've introduced a bug or worsened the performance of the model.
1. Copy the file `solution/workflows/pull_request.yml` to `.github/workflows/pull_request.yml`.
Let's examine why this works:
```yml
on:
  pull_request:
    types: [opened, reopened]
```

The above code listens for pull requests that are opened or reopened and runs the job `check_code` if so.
Inspecting the job, we see the following definition:
```yml
runs-on: ubuntu-latest
steps:
  - uses: actions/checkout@v4
  - name: Set up Python
    uses: actions/setup-python@v4
    with:
      python-version: '3.x'
  - name: Install dependencies
    run: |
      python -m pip install --upgrade pip
      pip install -r requirements.txt
  - name: Test with pytest
    run: |
      pip install pytest pytest-cov
      pytest tests.py --doctest-modules --junitxml=junit/test-results.xml --cov=com --cov-report=xml --cov-report=html
```

Here's what the job does:
- It runs on the latest version of Ubuntu.
- It checks out the repository.
- It sets up Python 3.x.
- It installs dependencies.
- It runs the tests using `pytest`.

**What did you learn?**
If the tests fail, the pull request will indicate so. This is a great way to signal that the code should not be merged into the main branch, as it would break the code or worsen the performance of the model.
### Step 3 (optional) - compare the performance of the model
So far, our tests look like so:
```python
import os

import app

def test_model_file_created():
    app.main()  # Assuming the main function encapsulates the training logic
    assert os.path.exists('models/model.pkl')

def test_model_score():
    score = app.main()  # Assuming the main function returns the score
    assert isinstance(score, float)
    assert 0.0 <= score <= 1.0
```

These tests are great for ensuring the model is created and that the score is within a certain range. However, they don't test the performance of the model against earlier versions of the model. So how can we introduce such a mechanism?
One way to do this is to:
- Store the score of the model in a file. Such a file could look like so:
```json
[{
  "version": "1.0",
  "score": 0.8
},
{
  "version": "1.1",
  "score": 0.84
}]
```

Here we see that the model is improving with each version. To test whether we've improved or worsened the model, we train it and compare its score with the previously recorded score; in this case, we compare the score of the version we're working on with the score of version 1.1.
Here are the changes we need to make to the tests:
1. Create a file called `model_scores.json` in the root of the project.
1. Add the following code to the test file, `tests.py`:
```python
import json
import os

import app

def test_model_score():
    score = app.main()  # Assuming the main function returns the score
    assert isinstance(score, float)
    assert 0.0 <= score <= 1.0

    # Load the model scores
    with open('model_scores.json', 'r') as f:
        model_scores = json.load(f)

    # Get the latest model score
    latest_score = model_scores[-1]['score']

    # Compare the current score with the latest score
    assert score >= latest_score
```

Now, we've added a test that compares the current score with the latest recorded score. If the current score is worse than the latest score, the test will fail.
> NOTE: if the tests pass, you will see that reflected in the PR. If so, add a new entry to the JSON file, like so:
```json
{
  "version": "1.2",
  "score":
}
```

> TIP: if you're a developer, it's a good idea to tag this commit with a version number, e.g., `v1.2`, like so: `git tag v1.2`, and then push the tag to GitHub like so: `git push origin v1.2`. This way, you can easily find where in your code the model was improved or worsened.
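If you'd rather not edit `model_scores.json` by hand, a small helper script along these lines could append the entry for you. This is a hypothetical sketch for illustration only; the file name `record_score.py` and its command-line argument are made up and not part of the repository:

```python
# record_score.py -- hypothetical helper, not part of the repository.
# Appends the latest score returned by app.main() to model_scores.json.
import json
import sys

import app


def record_score(version, path="model_scores.json"):
    score = app.main()  # assumes main() trains the model and returns the score
    with open(path, "r") as f:
        scores = json.load(f)
    scores.append({"version": version, "score": score})
    with open(path, "w") as f:
        json.dump(scores, f, indent=2)


if __name__ == "__main__":
    # Usage: python record_score.py 1.2
    record_score(sys.argv[1])
```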
**What did you learn?**
You've learned how to compare the performance of the model with the previous version of the model. This is a great way to ensure that the model is improving and not worsening.
## Hand in
Send a link to your repository on GitHub to your teacher.