{"id":28633781,"url":"https://github.com/roboflow/rf100-vl","last_synced_at":"2025-09-06T11:38:22.765Z","repository":{"id":282785630,"uuid":"949615441","full_name":"roboflow/rf100-vl","owner":"roboflow","description":"Code from the paper \"Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models\"","archived":false,"fork":false,"pushed_at":"2025-06-02T14:28:21.000Z","size":8982,"stargazers_count":62,"open_issues_count":1,"forks_count":3,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-06-10T00:49:11.805Z","etag":null,"topics":["computer-vision","multimodal-datasets","object-detection","object-detection-benchmarks","rf100"],"latest_commit_sha":null,"homepage":"https://rf100-vl.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roboflow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-16T20:49:08.000Z","updated_at":"2025-06-02T14:28:26.000Z","dependencies_parsed_at":"2025-04-13T01:39:15.716Z","dependency_job_id":"cec557b6-5273-4dc9-b73c-41df522df016","html_url":"https://github.com/roboflow/rf100-vl","commit_stats":null,"previous_names":["roboflow/rf100vl","roboflow/rf100-vl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/roboflow/rf100-vl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Frf100-vl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Frf100-vl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/roboflow%2Frf100-vl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Frf100-vl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roboflow","download_url":"https://codeload.github.com/roboflow/rf100-vl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Frf100-vl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259495162,"owners_count":22866627,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","multimodal-datasets","object-detection","object-detection-benchmarks","rf100"],"created_at":"2025-06-12T15:39:09.402Z","updated_at":"2025-06-12T15:39:11.896Z","avatar_url":"https://github.com/roboflow.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch2\u003eRoboflow 100-VL:\u003cbr\u003eA Multi-Domain Object Detection\nBenchmark \u003cbr\u003efor Vision-Language Models\u003c/h2\u003e\n\nPeter Robicheaux \u003csup\u003e1†\u003c/sup\u003e\nMatvei Popov\u003csup\u003e1†\u003c/sup\u003e\nAnish Madan \u003csup\u003e2\u003c/sup\u003e\nIsaac Robinson \u003csup\u003e1\u003c/sup\u003e\n\nJoseph Nelson \u003csup\u003e1\u003c/sup\u003e\nDeva Ramanan \u003csup\u003e2\u003c/sup\u003e\nNeehar Peri \u003csup\u003e2\u003c/sup\u003e\n\n\u003ca target=\"_blank\" href=\"https://roboflow.com\"\u003eRoboflow\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\u003ca target=\"_blank\" href=\"https://www.cmu.edu/\"\u003eCarnegie Mellon 
University\u003c/a\u003e\n\n\u003cp class=\"first-authors\"\u003e† Equal Contribution\u003c/p\u003e\n\n\u003cdiv\u003e\n\u003c!-- \u003ca href=\"https://www.arxiv.org/pdf/2502.13130\" target=\"_blank\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/📄_Paper-arXiv-red?style=for-the-badge\" alt=\"Paper\" /\u003e\n\u003c/a\u003e\u0026nbsp; --\u003e\n\u003ca href=\"https://universe.roboflow.com/rf100-vl/\" target=\"_blank\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/🌐_Datasets-Roboflow_Universe-blue?style=for-the-badge\" alt=\"Datasets\" /\u003e\n\u003c/a\u003e\u0026nbsp;\n\u003ca href=\"https://rf100-vl.org\" target=\"_blank\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/🔗_Website-rf100--vl.org-green?style=for-the-badge\" alt=\"Website\" /\u003e\n\u003c/a\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\n\nIntroduced in the paper \"[Roboflow 100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models](https://arxiv.org/pdf/2505.20612)\", RF100-VL is a large-scale collection of 100 multi-modal datasets with diverse concepts not commonly found in VLM pre-training.\n\nThe benchmark includes images, with corresponding annotations, from seven domains: flora and fauna, sport, industry, document processing, laboratory imaging, aerial imagery, and miscellaneous datasets related to various use cases for which detection models are commonly used.\n\nYou can use RF100-VL to benchmark fully supervised, semi-supervised and few-shot object detection models, and Vision Language Models (VLMs) with localization capabilities.\n\n## Download RF100-VL\n\nTo download RF100-VL, first install the `rf100vl` pip package:\n\n```\npip install rf100vl\n```\n\nRF100-VL is hosted on Roboflow Universe, the world's largest repository of annotated computer vision datasets. You will need a free Roboflow Universe API key to download the dataset. 
[Learn how to find your API key]()\n\nExport your API key into an environment variable called `ROBOFLOW_API_KEY`:\n\n```\nexport ROBOFLOW_API_KEY=YOUR_KEY\n```\n\nSeveral helper functions are available to download RF100-VL and its subsets. These fall into two categories: functions that retrieve Dataset objects with the name of each project and its category (names starting with `get_`), and data downloaders (names starting with `download_`).\n\n| Data Loader Name               | Dataset Name           |\n|--------------------------------|------------------------|\n| `get_rf100vl_fsod_projects`      | RF100-VL-FSOD          |\n| `get_rf100vl_projects`           | RF100-VL               |\n| `get_rf20vl_fsod_projects`       | RF20-VL-FSOD           |\n| `get_rf20vl_full_projects`       | RF20-VL           |\n| `download_rf100vl_fsod`          | RF100-VL-FSOD          |\n| `download_rf100vl`               | RF100-VL               |\n| `download_rf20vl_fsod`           | RF20-VL-FSOD           |\n| `download_rf20vl_full`           | RF20-VL           |\n\nEach dataset object has its own `download` method.\n\nHere is an example showing how to download the full dataset:\n\n```python\nfrom rf100vl import download_rf100vl\n\ndownload_rf100vl(path=\"./rf100-vl/\")\n```\n\nThe datasets will be downloaded in COCO JSON format to a directory called `rf100-vl`. Every dataset will be in its own sub-folder.\n\n## CVPR 2025 Workshop Challenge: Few-Shot Object Detection from Annotator Instructions\n\n**Organized by:** Anish Madan, Neehar Peri, Deva Ramanan\n\n### Introduction\n\nThis challenge focuses on few-shot object detection (FSOD) with 10 examples of each class provided by a human annotator. Existing FSOD benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively. 
However, these benchmarks do not reflect how FSOD is deployed in practice.\n\nRather than pre-training on only a small number of base categories, we argue that it is more practical to download a foundational model (e.g., a vision-language model (VLM) pretrained on web-scale data) and fine-tune it for specific applications. We propose a new FSOD benchmark protocol that evaluates detectors pre-trained on any external dataset (not including the target dataset), and fine-tuned on K-shot annotations for each of C target classes.\n\nWe evaluate on a subset of 20 datasets from Roboflow-VL. Each dataset is independently evaluated using AP. Roboflow-VL includes datasets that are out-of-distribution from typical internet-scale pre-training data, making it particularly challenging for Foundational FSOD, even for VLMs.\n\n:rotating_light: Top performing teams can win cash prizes! :rotating_light:\n\n:1st_place_medal: 1st Place: $750\n\n:2nd_place_medal: 2nd Place: $500\n\n:3rd_place_medal: 3rd Place: $250\n\nTo be eligible for prizes, teams must submit a technical report, open source their code, and provide instructions on how to reproduce their results. Teams must also beat our best performing official baseline to be eligible for prizes. Many thanks to Roboflow for sponsoring prizes!\n\n### Benchmarking Protocols\n\n**Goal:** Develop robust object detectors using a few annotations provided through annotator instructions. The detector should detect object instances of interest in real-world testing images.\n\n**Environment for model development:**\n- **Pretraining:** Models are allowed to pre-train on any existing datasets.\n- **Fine-Tuning:** Models can fine-tune on 10 shots from each of RF20-VL-FSOD's datasets.\n- **Evaluation:** Models are evaluated on RF20-VL-FSOD's test set. 
Each dataset is evaluated independently.\n\n**Evaluation metrics:**\n- **AP:** Average precision, averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.\n\n### Submission Details\n\nSubmit a zip file with pickle files for each dataset. The name of each pickle file should match the name of each dataset. Each pickle file should use the following COCO format.\n\n```json\n[\n  {\n  \"image_id\": int,\n  \"instances\":\n    [{ \"image_id\": int,\n    \"category_id\": int,\n    \"bbox\": [x,y,width,height],\n    \"score\": float\n    },\n    {\"image_id\": int,\n    \"category_id\": int,\n    \"bbox\": [x,y,width,height],\n    \"score\": float }, ... ]\n  },\n  ...\n]\n```\n\nWe've provided a [sample submission](https://drive.google.com/file/d/1Pp8oAYMMnCxTzFa078NS3CuzdNlIEVzp/view) for your reference. Submissions should be uploaded to our [EvalAI server](https://eval.ai/web/challenges/challenge-page/2459/overview).\n\n### Official Baseline\n\nWe pre-train Detic on ImageNet21-K, COCO Captions, and LVIS. We evaluate this pre-trained model zero-shot on the datasets in RF20-VL.\n\nOur baseline code is available [here](https://github.com/anishmadan23/foundational_fsod/tree/fsod_rf20vl?tab=readme-ov-file).\n\n### Timeline\n\n- Submission opens: March 15th, 2025\n- Submission closes: June 8th, 2025, 11:59 pm Pacific Time\n- The top 3 participants on the leaderboard will be invited to give a talk at the workshop\n\n### References\n\n1. Madan et al. \"Revisiting Few-Shot Object Detection with Vision-Language Models\". Proceedings of the Conference on Neural Information Processing Systems. 2024\n2. Zhou et al. \"Detecting Twenty-Thousand Classes Using Image-Level Supervision\". Proceedings of the European Conference on Computer Vision. 2022\n\n## Acknowledgements\n\nThis work was supported in part by compute provided by NVIDIA, and the NSF GRFP (Grant No. 
DGE2140739).\n\n## License\n\nThe datasets that comprise RF100-VL are licensed under an [Apache 2.0 license](LICENSE).\n\n## Citation\nIf you find our paper and code repository useful, please cite us:\n```bib\n@article{robicheaux2025roboflow100vl,\n  title={Roboflow100-vl: A multi-domain object detection benchmark for vision-language models},\n  author={Robicheaux, Peter and Popov, Matvei and Madan, Anish and Robinson, Isaac and Nelson, Joseph and Ramanan, Deva and Peri, Neehar},\n  journal={arXiv preprint arXiv:2505.20612},\n  year={2025}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froboflow%2Frf100-vl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froboflow%2Frf100-vl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froboflow%2Frf100-vl/lists"}