https://github.com/pyronear/pyro-mlops

Last synced: about 1 year ago
JSON representation

Host: GitHub
URL: https://github.com/pyronear/pyro-mlops
Owner: pyronear
License: apache-2.0
Created: 2023-05-03T07:00:09.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2025-02-14T11:09:11.000Z (over 1 year ago)
Last Synced: 2025-05-08T07:23:29.144Z (about 1 year ago)
Language: Python
Size: 155 MB
Stars: 6
Watchers: 4
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Pyronear: Machine Learning Pipeline for Wildfire Detection 🔥

![Pyronear Logo](./docs/assets/images/pyronear_logo.png)

Machine Learning Pipeline for Wildfire Detection.

![Pipeline Overview](./docs/assets/images/pipeline.png)

## Setup

### Dependencies

- [Poetry](https://python-poetry.org/): Python packaging and dependency
management - Install it with something like `pipx`
- [Git LFS](https://git-lfs.com/): Git Large File Storage replaces large
files such as jupyter notebooks with text pointers inside Git while
storing the file contents on a remote server like github.com
- [DVC](https://dvc.org/): Data Version Control - This will get
installed automatically
- [MLFlow](https://mlflow.org/): ML Experiment Tracking - This will get
installed automatically

### Install

#### Poetry

Follow the [official documentation](https://python-poetry.org/docs/) to install `poetry`.

#### Git LFS

Make sure [`git-lfs`](https://git-lfs.com/) is installed on your system.

Run the following command to check:

```sh
git lfs install
```

If not installed, one can install it with the following:

##### Linux

```sh
sudo apt install git-lfs
git-lfs install
```

##### Mac

```sh
brew install git-lfs
git-lfs install
```

##### Windows

Download and run the latest [windows installer](https://github.com/git-lfs/git-lfs/releases).

#### Project Dependencies

Create a virtualenv and install python version with conda - or use a
combination of pyenv and venv:

```sh
conda create -n pyronear-mlops python=3.12
```

Activate the virtual environment:

```sh
conda activate pyronear-mlops
```

Install python dependencies

```sh
poetry install
```

#### Data Dependencies

To get the data dependencies one can use DVC - To fully use this
repository you would need access to our DVC remote storage which is
currently reserved for Pyronear members. On request, you will be provided with
AWS credentials to access our remote storage.

Once setup, run the following command:

```sh
dvc pull
```

![Random batch sample from the dataset](./docs/assets/images/batch.jpg)

##### Setup S3 access

Create the following file `~/.aws/config`:

```toml
[profile pyronear]
region = eu-west-3
```

Add your credentials in the file `~/.aws/credentials` - replace `XXX`
with your access key id and your secret access key:

```toml
[pyronear]
aws_access_key_id = XXX
aws_secret_access_key = XXX
```

Make sure you use the AWS `pyronear` profile:

```bash
export AWS_PROFILE=pyronear
```

## Project structure and conventions

The project is organized following mostly the [cookie-cutter-datascience
guideline](https://drivendata.github.io/cookiecutter-data-science/#directory-structure).

### Data

All the data lives in the `data` folder and follows some [data engineering
conventions](https://docs.kedro.org/en/stable/faq/faq.html#what-is-data-engineering-convention).

### Library Code

The library code is available under the `pyronear_mlops` folder.

### Notebooks

The notebooks live in the `notebooks` folder. They are automatically synced to
the Git LFS storage.
Please follow [this
convention](https://drivendata.github.io/cookiecutter-data-science/#notebooks-are-for-exploration-and-communication)
to name your Notebooks.

`--.ipynb` - e.g., `0.3-mateo-visualize-distributions.ipynb`.

### Scripts

The scripts live in the `scripts` folder, they are
commonly CLI interfaces to the library
code.

## DVC

DVC is used to track and define data pipelines and make them
reproducible. See `dvc.yaml`.

To get an overview of the pipeline DAG:

```sh
dvc dag
```

To run the full pipeline:

```sh
dvc repro
```

## MLFlow

An MLFlow server is running when running ML experiments to track
hyperparameters and performances and to streamline model
selection.

To start the mlflow UI server, run the following command:

```sh
make mlflow_start
```

To stop the mlflow UI server, run the following command:

```sh
make mlflow_stop
```

To browse the different runs, open your browser and navigate to the URL:
[http://localhost:5000](http://localhost:5000)

## Contribute to the project

### New ML experiments

Follow the steps:

1. Work on a separate git branch: `git checkout -b "/"`
2. Modify and iterate on the code, then run `dvc repro`. It will rerun
parts of the pipeline that have been updated.
3. Commit your changes and open a Pull Request to get your changes
approved and merged.

### Run Random Hyperparameter Search

#### YOLOv8

Use the following commands to run random hyperparameter search:

```sh
make run_yolov8_hyperparameter_search
```

It will run 10 random training runs with hyperparameters drawn from the
hyperparameter space defined in
`pyronear_mlops/model/yolo/hyperparameters/yolov8.py`

#### YOLOv9

Use the following commands to run random hyperparameter search:

```sh
make run_yolov9_hyperparameter_search
```

It will run 10 random training runs with hyperparameters drawn from the
hyperparameter space defined in
`pyronear_mlops/model/yolo/hyperparameters/yolov9.py`

### Generate a benchmark CSV file

#### YOLOv8

```sh
make yolov8_benchmark
```

#### YOLOv9

```sh
make yolov9_benchmark
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/pyronear/pyro-mlops

Awesome Lists containing this project

README