{"id":30719606,"url":"https://github.com/pyronear/pyro-train","last_synced_at":"2025-09-03T10:05:52.672Z","repository":{"id":306735513,"uuid":"1008407061","full_name":"pyronear/pyro-train","owner":"pyronear","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-03T19:46:23.000Z","size":2844,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-03T20:44:30.527Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pyronear.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-25T13:50:40.000Z","updated_at":"2025-07-30T09:29:45.000Z","dependencies_parsed_at":"2025-07-27T10:20:20.048Z","dependency_job_id":"74f930e0-900e-4822-b0b5-3dc969260557","html_url":"https://github.com/pyronear/pyro-train","commit_stats":null,"previous_names":["pyronear/pyro-train"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pyronear/pyro-train","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyronear%2Fpyro-train","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyronear%2Fpyro-train/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyronear%2Fpyro-train/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyronear%2Fpyro-train/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pyronear","download_url":
"https://codeload.github.com/pyronear/pyro-train/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyronear%2Fpyro-train/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273019418,"owners_count":25031887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-03T10:05:49.071Z","updated_at":"2025-09-03T10:05:52.655Z","avatar_url":"https://github.com/pyronear.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pyronear: Machine Learning Pipeline for Wildfire Detection 🔥\n\nMachine Learning Training Pipeline for Wildfire Detection.\n\n[\u003cimg src=\"./docs/assets/images/ml_space.png\" /\u003e](https://www.earthtoolsmaker.org/spaces/early_forest_fire_detection/)\n\n![Pipeline Overview](./docs/assets/images/pipeline.png)\n\n## Data Pipeline\n\nThe whole repository is organized as a data pipeline that can be run to\ntrain the models and export them to the appropriate formats.\n\nThe Data pipeline is organized with a [dvc.yaml](./dvc.yaml) file.\n\n### DVC Stages\n\nThis section list and describes all the DVC stages that are defined in the\n[dvc.yaml](./dvc.yaml) file:\n\n- __build_model_input__: Generate model input for YOLO custom dataset training\nusing 
the provided raw dataset.\n- __train_yolo_baseline_small__: Train a YOLO baseline model on a subset of the\nfull dataset.\n- __train_yolo_baseline__: Train a YOLO baseline model on the full dataset.\n- __train_yolo_best__: Train the best YOLO model on the full dataset.\n- __build_manifest_yolo_best__: Build the manifest.yaml file to attach to the model.\n- __export_yolo_best__: Export the best YOLO model to different formats (ONNX, NCNN).\n\n## Setup\n\n### 🐍 Python dependencies\n\nInstall `uv` with `pipx`:\n\n```sh\npipx install uv\n```\n\nCreate a virtualenv and install the dependencies with `uv`:\n\n```sh\nuv sync\n```\n\nActivate the `uv` virtualenv:\n\n```sh\nsource .venv/bin/activate\n```\n\n### Git LFS\n\nMake sure [`git-lfs`](https://git-lfs.com/) is installed on your system.\n\nRun the following command to check:\n\n```sh\ngit lfs install\n```\n\nIf it is not installed, install it as follows:\n\n#### Linux\n\n```sh\nsudo apt install git-lfs\ngit-lfs install\n```\n\n#### Mac\n\n```sh\nbrew install git-lfs\ngit-lfs install\n```\n\n#### Windows\n\nDownload and run the latest [Windows installer](https://github.com/git-lfs/git-lfs/releases).\n\n\n### Data Dependencies\n\nThe data dependencies are managed with DVC. To fully use this\nrepository, you need access to our DVC remote storage, which is\ncurrently reserved for Pyronear members. On request, you will be provided with\nAWS credentials to access our remote storage.\n\nPull the data files needed for training the model:\n\n```sh\ndvc get . 
./data/03_model_input/\n```\n\nPull all the data files tracked by DVC using this command:\n\n```sh\ndvc pull\n```\n\n![Random batch sample from the dataset](./docs/assets/images/batch.jpg)\n\n##### Setup S3 access\n\nCreate the following file `~/.aws/config`:\n\n```toml\n[profile pyronear]\nregion = eu-west-3\n```\n\nAdd your credentials to the file `~/.aws/credentials`, replacing `XXX`\nwith your access key id and your secret access key:\n\n```toml\n[pyronear]\naws_access_key_id = XXX\naws_secret_access_key = XXX\n```\n\nMake sure you use the AWS `pyronear` profile:\n\n```bash\nexport AWS_PROFILE=pyronear\n```\n\n## Project structure and conventions\n\nThe project mostly follows the [cookie-cutter-datascience\nguideline](https://drivendata.github.io/cookiecutter-data-science/#directory-structure).\n\n### Data\n\nAll the data lives in the `data` folder and follows some [data engineering\nconventions](https://docs.kedro.org/en/stable/faq/faq.html#what-is-data-engineering-convention).\n\n### Library Code\n\nThe library code is available under the `pyronear_mlops` folder.\n\n### Notebooks\n\nThe notebooks live in the `notebooks` folder. They are automatically synced to\nthe Git LFS storage.\nPlease follow [this\nconvention](https://drivendata.github.io/cookiecutter-data-science/#notebooks-are-for-exploration-and-communication)\nto name your notebooks.\n\n`\u003cstep\u003e-\u003cghuser\u003e-\u003cdescription\u003e.ipynb` - e.g., `0.3-mateo-visualize-distributions.ipynb`.\n\n### Scripts\n\nThe scripts live in the `scripts` folder. They are\ncommonly command-line interfaces to the library\ncode.\n\n## DVC\n\nDVC is used to track and define data pipelines and make them\nreproducible. 
See `dvc.yaml`.\n\nTo get an overview of the pipeline DAG:\n\n```sh\ndvc dag\n```\n\nTo run the full pipeline:\n\n```sh\ndvc repro\n```\n\n## MLflow\n\nAn MLflow server is used when running ML experiments to track\nhyperparameters and performance and to streamline model\nselection.\n\nTo start the MLflow UI server, run the following command:\n\n```sh\nmake mlflow_start\n```\n\nTo stop the MLflow UI server, run the following command:\n\n```sh\nmake mlflow_stop\n```\n\nTo browse the different runs, open your browser and navigate to the URL:\n[http://localhost:5000](http://localhost:5000)\n\n## Test Suite\n\nRun the test suite with the following command:\n\n```sh\nmake run_test_suite\n```\n\n## Contribute to the project\n\n### New ML experiments\n\nFollow these steps:\n\n1. Work on a separate git branch: `git checkout -b \"\u003cuser\u003e/\u003cexperiment-name\u003e\"`\n2. Modify and iterate on the code, then run `dvc repro`. It reruns only the\n   parts of the pipeline that have changed.\n3. 
Commit your changes and open a Pull Request to get your changes\n   approved and merged.\n\n### Run Random Hyperparameter Search\n\nWe use random hyperparameter search to find the best set of hyperparameters for\nour models.\n\n#### Wide \u0026 Fast\n\nThe first stage optimizes for exploring all hyperparameter ranges.\nA [wide.yaml](./scripts/model/yolo/spaces/wide.yaml) hyperparameter config file\nis available for performing this type of search.\n\nIt is good practice to run this search on a small subset of the full dataset to\nquickly iterate over many different combinations of hyperparameters.\n\nRun the wide and fast hyperparameter search with:\n\n```sh\nmake run_yolo_wide_hyperparameter_search\n```\n\n#### Narrow \u0026 Deep\n\n\nThe second stage of the hyperparameter search runs narrower, more\nlocal searches around the good parameter combinations identified in stage 1.\nA [narrow.yaml](./scripts/model/yolo/spaces/narrow.yaml) hyperparameter config\nfile is available for this type of search.\n\nIt is good practice to run this search on the full dataset to get the actual\nmodel performance of the randomly drawn sets of hyperparameters.\n\nRun the narrow and deep hyperparameter search with:\n\n```sh\nmake run_yolo_narrow_hyperparameter_search\n```\n\n#### Custom\n\nAdapt and run this command to launch a specific hyperparameter space search:\n\n```sh\nuv run python ./scripts/model/yolo/hyperparameter_search.py \\\n   --data ./data/03_model_input/wildfire/full/datasets/data.yaml \\\n   --output-dir ./data/04_models/yolo/ \\\n   --experiment-name \"random_hyperparameter_search\" \\\n   --filepath-space-yaml ./scripts/model/yolo/spaces/default.yaml \\\n   --n 5 \\\n   --loglevel \"info\"\n```\n\nOne can adapt the hyperparameter space to search by adding a new `space.yaml`\nfile based on the [default.yaml](./scripts/model/yolo/spaces/default.yaml) file:\n\n```yaml\nmodel_type:\n  type: array\n  array_type: str\n  values:\n    - yolo11n.pt\n    - 
yolo11s.pt\n    - yolo12n.pt\n    - yolo12s.pt\nepochs:\n  type: space\n  space_type: int\n  space_config:\n    type: linear\n    start: 50\n    stop: 70\n    num: 10\npatience:\n  type: space\n  space_type: int\n  space_config:\n    type: linear\n    start: 10\n    stop: 50\n    num: 10\nbatch:\n  type: array\n  array_type: int\n  values:\n    - 16\n    - 32\n    - 64\n...\n```\n\n### Generate a benchmark CSV file\n\n```sh\nmake run_yolo_benchmark\n```\n\n## 🌎 Release a new Model to the world\n\nThe script to release a new version of the model is located in\n`./scripts/model/yolo/release.py`.\nMake sure to set your `GITHUB_ACCESS_TOKEN` as an environment variable in your shell\nbefore running the following script:\n\n```sh\nexport GITHUB_ACCESS_TOKEN=XXX\nuv run python ./scripts/release.py \\\n  --version v4.0.0 \\\n  --release-name \"dazzling dragonfly\" \\\n  --github-owner earthtoolsmaker \\\n  --github-repo pyro-train\n```\n\nThis will create a new release in the GitHub repository with the model\nartifacts such as its weights.\n\n__Note__: The current naming convention for releases is to use an adjective\npaired with an animal name starting with the same letter (e.g., artistic alpaca,\nwise wolf, ...).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyronear%2Fpyro-train","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpyronear%2Fpyro-train","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyronear%2Fpyro-train/lists"}