{"id":13691476,"url":"https://github.com/EvolvingLMMs-Lab/RelateAnything","last_synced_at":"2025-05-02T15:32:16.415Z","repository":{"id":154167548,"uuid":"627239988","full_name":"EvolvingLMMs-Lab/RelateAnything","owner":"EvolvingLMMs-Lab","description":"Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.","archived":false,"fork":false,"pushed_at":"2023-07-04T17:00:53.000Z","size":71172,"stargazers_count":454,"open_issues_count":6,"forks_count":21,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-24T20:17:07.035Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EvolvingLMMs-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-13T04:09:06.000Z","updated_at":"2025-03-26T10:04:55.000Z","dependencies_parsed_at":"2024-04-08T02:01:22.417Z","dependency_job_id":"e7dc8b81-fdb7-40ba-993d-3cde086ac7b9","html_url":"https://github.com/EvolvingLMMs-Lab/RelateAnything","commit_stats":null,"previous_names":["evolvinglmms-lab/relateanything"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvolvingLMMs-Lab%2FRelateAnything","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvolvingLMMs-Lab%2FRelateAnything/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvolvingLMMs-Lab%2FRelateAnything/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/EvolvingLMMs-Lab%2FRelateAnything/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EvolvingLMMs-Lab","download_url":"https://codeload.github.com/EvolvingLMMs-Lab/RelateAnything/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250909273,"owners_count":21506331,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T17:00:45.530Z","updated_at":"2025-05-02T15:32:16.409Z","avatar_url":"https://github.com/EvolvingLMMs-Lab.png","language":"Python","funding_links":[],"categories":["\u003cspan id=\"head3\"\u003e *3. 
Related Project* \u003c/span\u003e","Recent Works"],"sub_categories":["\u003cspan id=\"head2023\"\u003e *2023* \u003c/span\u003e"],"readme":"\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"assets/ram_logo.png\"  width=\"80%\" height=\"80%\"\u003e\n\u003c/p\u003e\n\n# RAM: Relate-Anything-Model\n\nThe following developers contributed equally to this project in their spare time; names are in alphabetical order.\n\n[Zujin Guo](https://scholar.google.com/citations?user=G8DPsoUAAAAJ\u0026hl=zh-CN), \n[Bo Li](https://brianboli.com/), \n[Jingkang Yang](https://jingkang50.github.io/), \n[Zijian Zhou](https://sites.google.com/view/zijian-zhou/home).\n\n**Affiliations: [MMLab@NTU](https://www.mmlab-ntu.com/)** \u0026 **[VisCom Lab, KCL/TongJi](https://viscom.nms.kcl.ac.uk/)**\n\n---\n\n🚀 🚀 🚀 This is a demo that combines Meta's [Segment-Anything](https://segment-anything.com/) model with the ECCV'22 paper: [Panoptic Scene Graph Generation](https://psgdataset.org/). \n\n🔥🔥🔥 Please star our codebases [OpenPSG](https://github.com/Jingkang50/OpenPSG) and [RAM](https://github.com/Luodian/RelateAnything) if you find them useful/interesting.\n\n[[`Huggingface Demo`](https://huggingface.co/spaces/mmlab-ntu/relate-anything-model)]\n\n[[`Gradio Demo (Faster)`](https://bf5e65e511446cbe60.gradio.live/)]\n\n[[`Dataset`](https://psgdataset.org/)]\n\nThe Relate Anything Model (RAM) takes an image as input and uses SAM to identify the object masks within the image. RAM can then analyze the relationship between any pair of object masks. \n\nThe object masks are generated using SAM. 
RAM was trained to detect the relationships between the object masks using the OpenPSG dataset, and the specifics of this method are outlined in a subsequent section.\n\n[![demo.png](https://i.postimg.cc/CKh8tSB4/demo.png)](https://postimg.cc/k2HDRryV)\n\n## Examples\n\nOur current demo supports:\n\n(1) generating arbitrary object masks and reasoning about the relationships between them. \n\n(2) given coordinates, generating object masks and reasoning about the relationships between the given objects and other objects in the image.\n\nWe will soon add support for detecting semantic labels of objects with the help of [OVSeg](https://github.com/facebookresearch/ov-seg).\n\nHere are some examples of the Relate Anything Model in action, covering playing soccer, dancing, and playing basketball.\n\n\u003c!-- ![](./assets/basketball.gif) --\u003e\n\n![](./assets/basketball.png)\n\n![](./assets/soccer.png)\n\n![](https://i.postimg.cc/43VkhRNp/shaking-hands.png)\n\n![](./assets/collie.png)\n\n![](https://i.postimg.cc/9QpRyK8w/coord.png)\n\n## Method\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"./assets/method.png\"  width=\"90%\"\u003e\n\u003c/p\u003e\n\nOur method is based on the winning solution of the PSG competition, with some modifications. The original report can be found [here](https://arxiv.org/abs/2302.02651).\n\n### Inference\nOur approach uses the Segment Anything Model (SAM) to identify and mask objects in an image. The model then extracts features for each segmented object. We use a Transformer module to enable interaction between the object features, allowing us to compute pairwise object relationships and categorize their interrelations.\n\n### Training\nWe train our model using [the PSG dataset](http://psgdataset.org/). For each training PSG image, SAM segments multiple objects, but only a few of them match the ground truth (GT) masks in PSG. 
We perform a simple matching between SAM's predictions and the GT masks based on their intersection-over-union (IOU) scores, so that (almost) every GT mask is assigned to a SAM mask. We then re-generate the relation map according to SAM's masks. With the GT data prepared, we train our model using cross entropy loss, as shown in the figure above.\n\n## Setup\n\nTo set up the environment, we use Conda to manage dependencies.\nTo specify the appropriate version of cudatoolkit to install on your machine, you can modify the environment.yml file, and then create the Conda environment by running the following command:\n\n```bash\nconda env create -f environment.yml\n```\n\nMake sure to use `segment_anything` in this repository, which includes the mask feature extraction operation.\n\nDownload the pretrained model\n1. SAM: [link](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth)\n2. RAM: [link](https://1drv.ms/u/s!AgCc-d5Aw1cumQapZwcaKob8InQm?e=qyMeTS)\n\nPlace these two models in `./checkpoints/` from the root directory.\n\n### **Gradio demo**\n- We also provide a UI for testing our method that is built with gradio. This demo also supports generating new directions on the fly! 
Running the following command in a terminal will launch the demo: \n    ```\n    python app.py\n    ```\n- This demo is also hosted on HuggingFace [here](https://huggingface.co/spaces/mmlab-ntu/relate-anything-model).\n\n\u003c!-- ## Developers\n\nWe have equally contributed to this project in our spare time, in alphabetical order.\n[Zujin Guo](https://scholar.google.com/citations?user=G8DPsoUAAAAJ\u0026hl=zh-CN), \n[Bo Li](https://brianboli.com/), \n[Jingkang Yang](https://jingkang50.github.io/), \n[Zijian Zhou](https://sites.google.com/view/zijian-zhou/home).\n\n**[MMLab@NTU](https://www.mmlab-ntu.com/)** \u0026 **[VisCom Lab, KCL](https://viscom.nms.kcl.ac.uk/)** --\u003e\n\n## Acknowledgement\n\nWe thank [Chunyuan Li](https://chunyuan.li/) for his help in setting up the demo.\n\n## Citation\nIf you find this project helpful for your research, please consider citing the following BibTeX entries.\n```BibTex\n@inproceedings{yang2022psg,\n    author = {Yang, Jingkang and Ang, Yi Zhe and Guo, Zujin and Zhou, Kaiyang and Zhang, Wayne and Liu, Ziwei},\n    title = {Panoptic Scene Graph Generation},\n    booktitle = {ECCV},\n    year = {2022}\n}\n\n@inproceedings{yang2023pvsg,\n    author = {Yang, Jingkang and Peng, Wenxuan and Li, Xiangtai and Guo, Zujin and Chen, Liangyu and Li, Bo and Ma, Zheng and Zhou, Kaiyang and Zhang, Wayne and Loy, Chen Change and Liu, Ziwei},\n    title = {Panoptic Video Scene Graph Generation},\n    booktitle = {CVPR},\n    year = {2023},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEvolvingLMMs-Lab%2FRelateAnything","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEvolvingLMMs-Lab%2FRelateAnything","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEvolvingLMMs-Lab%2FRelateAnything/lists"}