{"id":18456423,"url":"https://github.com/pollen-robotics/pollen-vision","last_synced_at":"2025-04-05T18:05:41.192Z","repository":{"id":226396356,"uuid":"716039272","full_name":"pollen-robotics/pollen-vision","owner":"pollen-robotics","description":"Simple and unified interface to zero-shot computer vision models curated for robotics use cases.","archived":false,"fork":false,"pushed_at":"2025-03-27T10:56:28.000Z","size":89677,"stargazers_count":116,"open_issues_count":10,"forks_count":8,"subscribers_count":2,"default_branch":"develop","last_synced_at":"2025-03-27T11:35:05.686Z","etag":null,"topics":["computer-vision","grasping","object-detection","object-segmentation","robotics"],"latest_commit_sha":null,"homepage":"https://www.pollen-robotics.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pollen-robotics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-08T10:54:32.000Z","updated_at":"2025-03-27T10:53:14.000Z","dependencies_parsed_at":"2024-04-08T08:43:20.069Z","dependency_job_id":"d2133104-ee1d-418f-b82d-5a1402d9a827","html_url":"https://github.com/pollen-robotics/pollen-vision","commit_stats":null,"previous_names":["pollen-robotics/pollen-vision"],"tags_count":4,"template":false,"template_full_name":"pollen-robotics/python-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollen-robotics%2Fpollen-vision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollen-robotics%2Fpollen-vision/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/pollen-robotics%2Fpollen-vision/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollen-robotics%2Fpollen-vision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pollen-robotics","download_url":"https://codeload.github.com/pollen-robotics/pollen-vision/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247378138,"owners_count":20929296,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","grasping","object-detection","object-segmentation","robotics"],"created_at":"2024-11-06T08:11:29.980Z","updated_at":"2025-04-05T18:05:41.177Z","avatar_url":"https://github.com/pollen-robotics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003c!-- \u003cp align=\"center\" width=\"50%\"\u003e\n    \u003cimg width=\"33%\" src=\"assets/pollen_vision_logo.png\"\u003e\n\u003c/p\u003e --\u003e\n\n\u003cp align=\"center\" width=\"50%\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"assets/pollen_vision_logo.png\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"assets/pollen_vision_logo_light_theme.png\"\u003e\n    \u003cimg alt=\"Pollen vision library\" src=\"assets/pollen_vision_logo.png\" width=\"33%\"\u003e\n  \u003c/picture\u003e\n  \u003cbr/\u003e\n  \u003cbr/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cb\u003eSimple and unified interface to zero-shot computer vision models curated for 
robotics use cases.\u003c/b\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n  \u003ca target=\"_blank\" href=\"https://huggingface.co/spaces/pollen-robotics/pollen-vision-demo\"\u003e![Huggingface space](https://img.shields.io/badge/🤗-HuggingFace%20Space-cyan.svg)\u003c/a\u003e\n  \u003ca target=\"_blank\" href=\"https://drive.google.com/drive/folders/1Xx42Pk4exkS95iyD-5arHIYQLXyRWTXw?usp=drive_link\"\u003e![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)\u003c/a\u003e\n  \u003ca href=\"https://github.com/pollen-robotics/pollen-vision/blob/main/LICENSE\"\u003e\n    \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/huggingface/transformers.svg?color=blue\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/psf/black\"\u003e![Code style: black](https://github.com/pollen-robotics/pollen-vision/actions/workflows/lint.yml/badge.svg)\u003c/a\u003e\n  \u003ca href=\"\"\u003e![pytest](https://github.com/pollen-robotics/reachy2-sdk/actions/workflows/unit_tests.yml/badge.svg)\u003c/a\u003e\n  \u003ca href=\"\"\u003e\u003cimg src=\"https://img.shields.io/badge/Fabien-Approved-green\" alt=\"Fabien Approved\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\n\u003c!-- # Pollen Vision --\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n![demo](assets/pollen_vision_intro.gif)\n\n\u003c/div\u003e\n\nCheck out our [HuggingFace space](https://huggingface.co/pollen-robotics) for an online demo or try pollen-vision in a [Colab notebook](https://drive.google.com/drive/folders/1Xx42Pk4exkS95iyD-5arHIYQLXyRWTXw?usp=drive_link)!\n\n## Get started in very few lines of code!\nPerform zero-shot object detection and segmentation on a live video stream from your webcam with the following code:\n```python\nimport cv2\n\nfrom pollen_vision.vision_models.object_detection import OwlVitWrapper\nfrom pollen_vision.vision_models.object_segmentation import MobileSamWrapper\nfrom pollen_vision.utils import Annotator, get_bboxes\n\n\nowl = 
OwlVitWrapper()\nsam = MobileSamWrapper()\nannotator = Annotator()\n\ncap = cv2.VideoCapture(0)\n\nwhile True:\n    ret, frame = cap.read()\n    if not ret:  # stop if the camera returned no frame\n        break\n\n    predictions = owl.infer(\n        frame, [\"paper cups\"]\n    )  # zero-shot object detection | put your classes here\n    bboxes = get_bboxes(predictions)\n\n    masks = sam.infer(frame, bboxes=bboxes)  # zero-shot object segmentation\n    annotated_frame = annotator.annotate(frame, predictions, masks=masks)\n\n    cv2.imshow(\"frame\", annotated_frame)\n    if cv2.waitKey(1) \u0026 0xFF == ord(\"q\"):  # press q to quit\n        break\n\ncap.release()\ncv2.destroyAllWindows()\n```\n\u003cp align=\"center\"\u003e\n    \u003cimg width=\"20%\" src=\"https://github.com/pollen-robotics/pollen-vision/assets/6552564/9f162321-2226-48fc-86e5-eb47c8996ee9\"\u003e\n\u003c/p\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSupported models\u003c/summary\u003e\n\nWe continue to work on adding new models that could be useful for robotics perception applications.\n\nWe chose to focus on zero-shot models because they are easier to use and deploy. 
Zero-shot models can recognize objects or segment them based on text queries, without needing to be fine-tuned on annotated datasets.\n\nRight now, we support:\n#### Object detection\n- `Owl-Vit` for zero-shot object detection and localization\n- `Recognize-Anything` for zero-shot object detection (without localization)\n\n#### Object segmentation\n- `Mobile-SAM` for (fast) zero-shot object segmentation\n\n#### Monocular depth estimation\n- `Depth Anything` for (non-metric) monocular depth estimation\n\nBelow is an example of combining `Owl-Vit` and `Mobile-SAM` to detect and segment objects in a point cloud, all live.\n(Note: in this example, there is no temporal or spatial filtering of any kind; we display the raw outputs of the models, computed independently on each frame.)\n\nhttps://github.com/pollen-robotics/pollen-vision/assets/6552564/a5285627-9cba-4af5-aafb-6af3d1e6d40c\n\n\n\n\nWe also provide wrappers for the Luxonis cameras, which we use internally. They give easy access to the main features relevant to our robotics applications (RGB-D, onboard H.264 encoding and onboard stereo rectification).\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eInstallation\u003c/summary\u003e\n\n# Installation\n\n```\nNote: This package has been tested on Ubuntu 22.04 and macOS (with M1 Pro processor), with Python 3.10.\n```\n## Git LFS\nThis repository uses Git LFS to store large files. You need to install it before cloning the repository.\n\n### Ubuntu\n```console\nsudo apt-get install git-lfs\n```\n\n### macOS\n```console\nbrew install git-lfs\n```\n\n## One line installation\nYou can install the package directly from the repository without having to clone it first with:\n```console\npip install \"pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main\"\n```\n\n\u003e Note: here we install the package with the `vision` extra, which includes the vision models. 
You can also install the `depthai_wrapper` extra to use the Luxonis depthai wrappers.\n\n## Install from source\nClone this repository and then install the package either in \"production\" mode or \"dev\" mode.\n\n\u003e👉 We recommend using a virtual environment to avoid conflicts with other packages.\n\nAfter cloning the repository, you can either install everything with:\n```console\npip install .[all]\n```\nor install only the modules you want:\n```console\npip install .[depthai_wrapper]\npip install .[vision]\n```\nTo add \"dev\" mode dependencies (CI/CD, testing, etc.):\n```console\npip install -e .[dev]\n```\n\n## Luxonis depthai specific information\n\nIf this is the first time you are using Luxonis cameras on this computer, you need to set up the udev rules:\n```\necho 'SUBSYSTEM==\"usb\", ATTRS{idVendor}==\"03e7\", MODE=\"0666\"' | sudo tee /etc/udev/rules.d/80-movidius.rules\nsudo udevadm control --reload-rules \u0026\u0026 sudo udevadm trigger\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eGradio demo\u003c/summary\u003e\n\n# Gradio demo\n## Test the demo online\nA Gradio demo is available on Pollen Robotics' [Hugging Face space](https://huggingface.co/spaces/pollen-robotics/pollen-vision-demo). 
It lets you test the models on your own images without having to install anything.\n\n## Run the demo locally\nIf you want to run the demo locally, you can install the dependencies with the following command:\n```console\npip install pollen_vision[gradio]\n```\n\nYou can then run the demo locally on your machine with:\n```console\npython pollen-vision/gradio/app.py\n```\n\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003eExamples\u003c/summary\u003e\n\n# Examples\n\n## Vision models wrappers\nCheck our [example notebooks](examples/vision_models_examples/)!\n\n## Luxonis depthai wrappers\nCheck our [example scripts](examples/camera_wrappers_examples/)!\n\n\u003c/details\u003e\n\n\n[![Twitter URL](https://img.shields.io/twitter/url?url=https%3A%2F%2Ftwitter.com%2Fpollenrobotics)](https://twitter.com/pollenrobotics)\n[![Linkedin URL](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge\u0026logo=linkedin\u0026logoColor=white)](https://www.linkedin.com/company/pollen-robotics/mycompany/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpollen-robotics%2Fpollen-vision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpollen-robotics%2Fpollen-vision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpollen-robotics%2Fpollen-vision/lists"}