{"id":18852560,"url":"https://github.com/quva-lab/pin","last_synced_at":"2025-04-14T10:10:49.508Z","repository":{"id":222028418,"uuid":"729460692","full_name":"QUVA-Lab/PIN","owner":"QUVA-Lab","description":"Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs","archived":false,"fork":false,"pushed_at":"2025-01-14T17:22:20.000Z","size":24107,"stargazers_count":25,"open_issues_count":3,"forks_count":3,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-27T23:23:28.172Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://quva-lab.github.io/PIN/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QUVA-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-09T09:56:20.000Z","updated_at":"2024-12-04T10:09:22.000Z","dependencies_parsed_at":"2024-03-11T16:31:21.167Z","dependency_job_id":"84ad630c-4fb8-4abe-90c3-7c53d5e17215","html_url":"https://github.com/QUVA-Lab/PIN","commit_stats":null,"previous_names":["quva-lab/pin"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QUVA-Lab%2FPIN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QUVA-Lab%2FPIN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QUVA-Lab%2FPIN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QUVA-Lab%2FPIN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QUVA-Lab","downlo
ad_url":"https://codeload.github.com/QUVA-Lab/PIN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248860217,"owners_count":21173342,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T03:40:34.495Z","updated_at":"2025-04-14T10:10:49.477Z","avatar_url":"https://github.com/QUVA-Lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs\n\n[Project Page](https://quva-lab.github.io/PIN/)\n[ArXiv](https://arxiv.org/abs/2402.08657)\n\n\nThis is the official code repository. For a paper summary, check out our [project page](https://quva-lab.github.io/PIN/)!\n\n\u003e **Note:**\n\u003e More functionality, scripts, and pretrained PINs will be released in the coming weeks!\n\n## Installation\n\nTo create a conda environment for running PIN on OpenFlamingo, run\n\n```\nconda env create -f environment.yml\n```\n\nWe are currently working on incorporating BLIP-2 into a single environment file; please stay tuned.\n\n## Evaluation\n\nThe trained PIN on OpenFlamingo can be evaluated on COCO, PVOC and LVIS. To do so, please set up the corresponding datasets according to their respective repos or websites. Alternatively, for PVOC, one can set the `download` flag to true to download the dataset via our code. The metrics and visualizations are logged using wandb. 
The test script can be started using\n\n```\nsh scripts/test_OF_PIN.sh\n```\n\nafter adding the disk path to each dataset.\n\n## Training\n\nFirst, we need to set up the training data. For background images we use the [BG20k](https://github.com/JizhiziLi/animal-matting) dataset; please download it from their repo and save it to disk. Please copy the LVIS category list from [here](https://www.lvisdataset.org/dataset) to the utils folder. Synthetic images are generated following [XPaste](https://github.com/yoctta/XPaste). We create a synthetic dataset with 100 samples; after cleaning, around 60k objects remained. We will release our generated synthetic images soon and share a link here.\n\nAfter setting up the datasets, you can start a training run for PIN using\n\n```\nsh scripts/run_OF_PIN.sh\n```\n\nTraining metrics are logged using wandb.\n\n## Contact\n\nIf you have questions or find a bug, feel free to open a GitHub issue or send an email to m.l.dorkenwald at uva.nl.\n\n## BibTeX\n\n```\n@InProceedings{Dorkenwald_PIN_CVPR_2024,\n    author    = {Dorkenwald, Michael and Barazani, Nimrod and Snoek, Cees G. M. and Asano, Yuki M.},\n    title     = {PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2024},\n    pages     = {13548-13558}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquva-lab%2Fpin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquva-lab%2Fpin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquva-lab%2Fpin/lists"}