{"id":13456786,"url":"https://github.com/facebookresearch/ImageBind","last_synced_at":"2025-03-24T11:31:22.722Z","repository":{"id":163102994,"uuid":"618029110","full_name":"facebookresearch/ImageBind","owner":"facebookresearch","description":"ImageBind One Embedding Space to Bind Them All","archived":false,"fork":false,"pushed_at":"2024-07-31T18:44:13.000Z","size":2688,"stargazers_count":8548,"open_issues_count":88,"forks_count":801,"subscribers_count":100,"default_branch":"main","last_synced_at":"2025-03-18T23:41:26.431Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-23T15:52:47.000Z","updated_at":"2025-03-18T11:42:50.000Z","dependencies_parsed_at":"2024-10-09T13:40:45.924Z","dependency_job_id":"240e15e2-504d-464c-ae82-3d243420b468","html_url":"https://github.com/facebookresearch/ImageBind","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FImageBind","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FImageBind/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FImageBind/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FImageBind/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/ImageBind/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245260859,"owners_count":20586485,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T08:01:27.727Z","updated_at":"2025-03-24T11:31:20.676Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","funding_links":[],"categories":["Python","Project List","Multimodal Embedding Space","其他_机器视觉","Open Source Projects","Summary","Paper List","Repos","Voice \u0026 Multimodal (local) (16)","App","多模态嵌入 (Multimodal Embeddings)"],"sub_categories":["\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","Creative Uses of Generative AI Image Synthesis Tools","网络服务_其他","Seminal Papers","视频 (Video)"],"readme":"# ImageBind: One Embedding Space To Bind Them All\n\n**[FAIR, Meta AI](https://ai.facebook.com/research/)** \n\nRohit Girdhar*,\nAlaaeldin El-Nouby*,\nZhuang Liu,\nMannat Singh,\nKalyan Vasudev Alwala,\nArmand Joulin,\nIshan Misra*\n\nTo appear at CVPR 2023 (*Highlighted paper*)\n\n[[`Paper`](https://facebookresearch.github.io/ImageBind/paper)] [[`Blog`](https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/)] [[`Demo`](https://imagebind.metademolab.com/)] [[`Supplementary Video`](https://dl.fbaipublicfiles.com/imagebind/imagebind_video.mp4)] [[`BibTex`](#citing-imagebind)]\n\nPyTorch implementation and pretrained models for ImageBind. 
For details, see the paper: **[ImageBind: One Embedding Space To Bind Them All](https://facebookresearch.github.io/ImageBind/paper)**.\n\nImageBind learns a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. It enables novel emergent applications ‘out-of-the-box’ including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation.\n\n\n\n![ImageBind](https://user-images.githubusercontent.com/8495451/236859695-ffa13364-3e39-4d99-a8da-fbfab17f9a6b.gif)\n\n## ImageBind model\n\nEmergent zero-shot classification performance.\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003ctr\u003e\n    \u003cth\u003eModel\u003c/th\u003e\n    \u003cth\u003e\u003cspan style=\"color:blue\"\u003eIN1k\u003c/span\u003e\u003c/th\u003e\n    \u003cth\u003e\u003cspan style=\"color:purple\"\u003eK400\u003c/span\u003e\u003c/th\u003e\n    \u003cth\u003e\u003cspan style=\"color:green\"\u003eNYU-D\u003c/span\u003e\u003c/th\u003e\n    \u003cth\u003e\u003cspan style=\"color:LightBlue\"\u003eESC\u003c/span\u003e\u003c/th\u003e\n    \u003cth\u003e\u003cspan style=\"color:orange\"\u003eLLVIP\u003c/span\u003e\u003c/th\u003e\n    \u003cth\u003e\u003cspan style=\"color:purple\"\u003eEgo4D\u003c/span\u003e\u003c/th\u003e\n    \u003cth\u003edownload\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eimagebind_huge\u003c/td\u003e\n    \u003ctd align=\"right\"\u003e77.7\u003c/td\u003e\n    \u003ctd align=\"right\"\u003e50.0\u003c/td\u003e\n    \u003ctd align=\"right\"\u003e54.0\u003c/td\u003e\n    \u003ctd align=\"right\"\u003e66.9\u003c/td\u003e\n    \u003ctd align=\"right\"\u003e63.4\u003c/td\u003e\n    \u003ctd align=\"right\"\u003e25.0\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth\"\u003echeckpoint\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \n\u003c/table\u003e\n\n## Usage\n\nInstall pytorch 1.13+ and other 3rd party 
dependencies.\n\n```shell\nconda create --name imagebind python=3.10 -y\nconda activate imagebind\n\npip install .\n```\n\nFor Windows users, you might need to install `soundfile` for reading/writing audio files. (Thanks @congyue1977)\n\n```\npip install soundfile\n```\n\n\nExtract and compare features across modalities (e.g. Image, Text and Audio).\n\n```python\nfrom imagebind import data\nimport torch\nfrom imagebind.models import imagebind_model\nfrom imagebind.models.imagebind_model import ModalityType\n\ntext_list=[\"A dog.\", \"A car\", \"A bird\"]\nimage_paths=[\".assets/dog_image.jpg\", \".assets/car_image.jpg\", \".assets/bird_image.jpg\"]\naudio_paths=[\".assets/dog_audio.wav\", \".assets/car_audio.wav\", \".assets/bird_audio.wav\"]\n\ndevice = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n\n# Instantiate model\nmodel = imagebind_model.imagebind_huge(pretrained=True)\nmodel.eval()\nmodel.to(device)\n\n# Load data\ninputs = {\n    ModalityType.TEXT: data.load_and_transform_text(text_list, device),\n    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),\n    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),\n}\n\nwith torch.no_grad():\n    embeddings = model(inputs)\n\nprint(\n    \"Vision x Text: \",\n    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1),\n)\nprint(\n    \"Audio x Text: \",\n    torch.softmax(embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T, dim=-1),\n)\nprint(\n    \"Vision x Audio: \",\n    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, dim=-1),\n)\n\n# Expected output:\n#\n# Vision x Text:\n# tensor([[9.9761e-01, 2.3694e-03, 1.8612e-05],\n#         [3.3836e-05, 9.9994e-01, 2.4118e-05],\n#         [4.7997e-05, 1.3496e-02, 9.8646e-01]])\n#\n# Audio x Text:\n# tensor([[1., 0., 0.],\n#         [0., 1., 0.],\n#         [0., 0., 1.]])\n#\n# Vision x Audio:\n# tensor([[0.8070, 0.1088, 
0.0842],\n#         [0.1036, 0.7884, 0.1079],\n#         [0.0018, 0.0022, 0.9960]])\n\n```\n\n## Model card\nPlease see the [model card](model_card.md) for details.\n\n## License\n\nImageBind code and model weights are released under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for additional details.\n\n## Contributing\n\nSee [contributing](CONTRIBUTING.md) and the [code of conduct](CODE_OF_CONDUCT.md).\n\n## Citing ImageBind\n\nIf you find this repository useful, please consider giving it a star :star: and citing it:\n\n```\n@inproceedings{girdhar2023imagebind,\n  title={ImageBind: One Embedding Space To Bind Them All},\n  author={Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang\nand Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},\n  booktitle={CVPR},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FImageBind","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2FImageBind","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FImageBind/lists"}