{"id":13641263,"url":"https://github.com/roboflow/maestro","last_synced_at":"2025-05-14T03:10:22.490Z","repository":{"id":209362617,"uuid":"723015178","full_name":"roboflow/maestro","owner":"roboflow","description":"streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL","archived":false,"fork":false,"pushed_at":"2025-05-12T17:23:03.000Z","size":11109,"stargazers_count":2559,"open_issues_count":19,"forks_count":204,"subscribers_count":35,"default_branch":"develop","last_synced_at":"2025-05-13T09:12:10.533Z","etag":null,"topics":["captioning","fine-tuning","florence-2","multimodal","objectdetection","paligemma","phi-3-vision","qwen2-vl","transformers","vision-and-language","vqa"],"latest_commit_sha":null,"homepage":"https://maestro.roboflow.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roboflow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-11-24T13:28:57.000Z","updated_at":"2025-05-13T07:30:59.000Z","dependencies_parsed_at":"2023-12-04T13:29:52.674Z","dependency_job_id":"b61be838-039c-42e3-b6a3-14e499cc00f2","html_url":"https://github.com/roboflow/maestro","commit_stats":{"total_commits":205,"total_committers":11,"mean_commits":"18.636363636363637","dds":"0.46829268292682924","last_synced_commit":"78d0ba01226eec7a026da4b6b8038b125a617529"},"previous_names":["roboflow/set-of-mark","roboflow/multimodal-maestro"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/roboflow%2Fmaestro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fmaestro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fmaestro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fmaestro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roboflow","download_url":"https://codeload.github.com/roboflow/maestro/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254059512,"owners_count":22007769,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["captioning","fine-tuning","florence-2","multimodal","objectdetection","paligemma","phi-3-vision","qwen2-vl","transformers","vision-and-language","vqa"],"created_at":"2024-08-02T01:01:19.277Z","updated_at":"2025-05-14T03:10:17.481Z","avatar_url":"https://github.com/roboflow.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\n  \u003ch1\u003emaestro\u003c/h1\u003e\n\n  \u003cdiv\u003e\n      \u003cimg\n        src=\"https://github.com/user-attachments/assets/c9416f1f-a2bf-4590-86da-d2fc89ba559b\"\n        width=\"80\"\n        height=\"40\"\n      /\u003e\n      \u003cimg\n        src=\"https://github.com/user-attachments/assets/75dc7214-e82a-498d-950e-c64d90218e49\"\n        width=\"80\"\n        height=\"40\"\n      /\u003e\n      \u003cimg\n        src=\"https://github.com/user-attachments/assets/5d265473-b938-4501-b894-6a44a6a28a8c\"\n        width=\"80\"\n        
height=\"40\"\n      /\u003e\n      \u003cimg\n        src=\"https://github.com/user-attachments/assets/b7ccdf39-ac77-4dbd-8608-0fa2d9dadf0a\"\n        width=\"80\"\n        height=\"40\"\n      /\u003e\n  \u003c/div\u003e\n\n  \u003cbr\u003e\n\n  [![version](https://badge.fury.io/py/maestro.svg)](https://badge.fury.io/py/maestro)\n  [![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/maestro/blob/develop/cookbooks/maestro_qwen2_5_vl_json_extraction.ipynb)\n  [![discord](https://img.shields.io/discord/1159501506232451173?logo=discord\u0026label=discord\u0026labelColor=fff\u0026color=5865f2\u0026link=https%3A%2F%2Fdiscord.gg%2FGbfgXGJ8Bk)](https://discord.gg/GbfgXGJ8Bk)\n\n\u003c/div\u003e\n\n## Hello\n\n**maestro** is a streamlined tool to accelerate the fine-tuning of multimodal models.\nBy encapsulating best practices from our core modules, maestro handles configuration,\ndata loading, reproducibility, and training loop setup. 
It currently offers ready-to-use\nrecipes for popular vision-language models such as **Florence-2**, **PaliGemma 2**, and\n**Qwen2.5-VL**.\n\n## Fine-tune VLMs for free\n\n| model, task and acceleration                                |                                                                                          open in colab                                                                                           |\n|:------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| Florence-2 (0.9B) object detection with LoRA (experimental) | [![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/maestro/blob/develop/cookbooks/maestro_florence_2_object_detection.ipynb) |\n| PaliGemma 2 (3B) JSON data extraction with LoRA             | [![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/maestro/blob/develop/cookbooks/maestro_paligemma_2_json_extraction.ipynb) |\n| Qwen2.5-VL (3B) JSON data extraction with QLoRA             | [![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/maestro/blob/develop/cookbooks/maestro_qwen2_5_vl_json_extraction.ipynb)  |\n| Qwen2.5-VL (7B) object detection with QLoRA (experimental)  | [![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/maestro/blob/develop/cookbooks/maestro_qwen2_5_vl_object_detection.ipynb) |\n\n## News\n\n- `2025/02/05` (`1.0.0`): This release introduces support for Florence-2, PaliGemma 2, and Qwen2.5-VL and includes LoRA, QLoRA, and graph freezing to keep hardware requirements in check. 
It offers a single CLI/SDK to reduce code complexity, and a consistent JSONL format to streamline data handling.\n\n## Quickstart\n\n### Install\n\nTo begin, install the model-specific dependencies. Since some models may have clashing requirements,\nwe recommend creating a dedicated Python environment for each model.\n\n```bash\npip install \"maestro[paligemma_2]\"\n```\n\n### CLI\n\nKick off fine-tuning with our command-line interface, which leverages the configuration\nand training routines defined in each model’s core module. Simply specify key parameters such as\nthe dataset location, number of epochs, batch size, optimization strategy, and metrics.\n\n```bash\nmaestro paligemma_2 train \\\n  --dataset \"dataset/location\" \\\n  --epochs 10 \\\n  --batch-size 4 \\\n  --optimization_strategy \"qlora\" \\\n  --metrics \"edit_distance\"\n```\n\n### Python\n\nFor greater control, use the Python API to fine-tune your models.\nImport the train function from the corresponding module and define your configuration\nin a dictionary. The core modules take care of reproducibility, data preparation,\nand training setup.\n\n```python\nfrom maestro.trainer.models.paligemma_2.core import train\n\nconfig = {\n    \"dataset\": \"dataset/location\",\n    \"epochs\": 10,\n    \"batch_size\": 4,\n    \"optimization_strategy\": \"qlora\",\n    \"metrics\": [\"edit_distance\"]\n}\n\ntrain(config)\n```\n\n## Contribution\n\nWe appreciate your input as we continue refining Maestro. Your feedback is invaluable in guiding our improvements. 
To\nlearn how you can help, please check out our [Contributing Guide](https://github.com/roboflow/maestro/blob/develop/CONTRIBUTING.md).\nIf you have any questions or ideas, feel free to start a conversation in our [GitHub Discussions](https://github.com/roboflow/maestro/discussions).\nThank you for being a part of our journey!\n","funding_links":[],"categories":["Prompts","Python","Multimodal Large Models","Object Detection Applications","Training"],"sub_categories":["Resource Transfer & Download","FineTune"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froboflow%2Fmaestro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froboflow%2Fmaestro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froboflow%2Fmaestro/lists"}