{"id":48395791,"url":"https://github.com/naver-ai/pcme","last_synced_at":"2026-04-06T01:22:32.687Z","repository":{"id":51970589,"uuid":"377045157","full_name":"naver-ai/pcme","owner":"naver-ai","description":"Official Pytorch implementation of \"Probabilistic Cross-Modal Embedding\" (CVPR 2021)","archived":false,"fork":false,"pushed_at":"2024-03-01T12:15:43.000Z","size":2215,"stargazers_count":119,"open_issues_count":1,"forks_count":17,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-05-14T00:19:55.203Z","etag":null,"topics":["cross-modal-retrieval","cvpr2021","probabilistic-embeddings","probabilistic-machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/naver-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-06-15T05:22:12.000Z","updated_at":"2024-05-06T10:34:24.000Z","dependencies_parsed_at":"2024-03-01T13:42:13.206Z","dependency_job_id":null,"html_url":"https://github.com/naver-ai/pcme","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/naver-ai/pcme","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver-ai%2Fpcme","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver-ai%2Fpcme/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver-ai%2Fpcme/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver-ai%2Fpcme/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/naver-ai","download_url":"https://codeload.github.com/naver-ai/pcme/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver-ai%2Fpcme/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31455833,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"ssl_error","status_checked_at":"2026-04-05T21:22:51.943Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cross-modal-retrieval","cvpr2021","probabilistic-embeddings","probabilistic-machine-learning"],"created_at":"2026-04-06T01:22:32.635Z","updated_at":"2026-04-06T01:22:32.681Z","avatar_url":"https://github.com/naver-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Probabilistic Cross-Modal Embedding (PCME) CVPR 2021\n\nOfficial Pytorch implementation of PCME | [Paper](https://arxiv.org/abs/2101.05068)\n\n[Sanghyuk Chun](https://sanghyukchun.github.io/home/)\u003csup\u003e1\u003c/sup\u003e [Seong Joon Oh](https://seongjoonoh.com/)\u003csup\u003e1\u003c/sup\u003e Rafael Sampaio de Rezende\u003csup\u003e2\u003c/sup\u003e [Yannis Kalantidis](https://www.skamalas.com/)\u003csup\u003e2\u003c/sup\u003e Diane Larlus\u003csup\u003e2\u003c/sup\u003e\n\n\u003csup\u003e1\u003c/sup\u003e\u003csub\u003e[NAVER AI 
LAB](https://naver-career.gitbook.io/en/teams/clova-cic)\u003c/sub\u003e\u003cbr\u003e\n\u003csup\u003e2\u003c/sup\u003e\u003csub\u003e[NAVER LABS Europe](https://europe.naverlabs.com/)\u003c/sub\u003e\n\n\n\u003ca href=\"https://www.youtube.com/watch?v=J_DaqSLEcVk\"\u003e\u003cimg src=\"http://img.youtube.com/vi/J_DaqSLEcVk/0.jpg\" \nalt=\"VIDEO\" width=\"700\" border=\"10\" /\u003e\u003c/a\u003e\n\n\n## Updates\n\n- Jan 2024: [PCME++](https://openreview.net/forum?id=ft1mr3WlGM), the improved version of PCME, was accepted at ICLR 2024. Please use [naver-ai/pcmepp](https://github.com/naver-ai/pcmepp) for the improved version!\n- 16 Jul, 2022: Added the PCME CutMix-pretrained weights (used for the [ECCV Caption](https://github.com/naver-ai/eccv-caption) paper)\n- 23 Jun, 2021: Initial upload.\n\n## Installation\n\nInstall dependencies using the following commands.\n\n```\npip install cython \u0026\u0026 pip install -r requirements.txt\npython -c 'import nltk; nltk.download(\"punkt\", download_dir=\"/opt/conda/nltk_data\")'\ngit clone https://github.com/NVIDIA/apex \u0026\u0026 cd apex \u0026\u0026 pip install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\n```\n\n### Dockerfile\n\nYou can also use my Docker image:\n```\ndocker pull sanghyukchun/pcme:torch1.2-apex-dali\n```\n\nPlease add `--model__cache_dir /vector_cache` when you run the code with this image.\n\n## Configuration\n\nAll experiments are based on configuration files (see [config/coco](config/coco) and [config/cub](config/cub)).\nIf you want to change only a few options, instead of writing a new configuration file, you can override the configuration on the command line as follows:\n\n```\npython \u003ctrain | eval\u003e.py --dataloader__batch_size 32 --dataloader__eval_batch_size 8 --model__eval_method matching_prob\n```\n\nSee [config/parser.py](config/parser.py) for details.\n\n## Dataset preparation\n\n### COCO Caption\n\nWe followed the same split provided by 
[VSE++](http://www.cs.toronto.edu/~faghri/vsepp/data.tar).\nDataset splits can be found in [datasets/annotations](datasets/annotations).\n\nNote that we also need `instances_\u003ctrain | val\u003e2014.json` to compute the PMRP score.\n\n### CUB Caption\n\nDownload the images (CUB-200-2011) from [this link](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), and download the captions from [reedscot/cvpr2016](https://github.com/reedscot/cvpr2016).\nThe code takes the image path and the caption path as separate arguments.\n\n## Evaluate pretrained models\n\nNOTE: the current implementation of plausible match R-Precision (PMRP) is not efficient: \u003cbr\u003e\nIt first dumps all ranked items for each item to a local file, and then computes R-Precision. \u003cbr\u003e\nWe are planning to re-implement PMRP more efficiently as soon as possible.\n\n### COCO Caption\n\n```\n# Compute recall metrics\npython evaluate_recall_coco.py ./config/coco/pcme_coco.yaml \\\n    --dataset_root \u003cyour_dataset_path\u003e \\\n    --model_path model_last.pth \\\n    # --model__cache_dir /vector_cache # if you use my docker image\n```\n\n```\n# Compute plausible match R-Precision (PMRP) metric\npython extract_rankings_coco.py ./config/coco/pcme_coco.yaml \\\n    --dataset_root \u003cyour_dataset_path\u003e \\\n    --model_path model_last.pth \\\n    --dump_to \u003cdumped_ranking_file\u003e \\\n    # --model__cache_dir /vector_cache # if you use my docker image\n\npython evaluate_pmrp_coco.py --ranking_file \u003cdumped_ranking_file\u003e\n```\n\n| Method   | I2T 1K PMRP | I2T 1K R@1 | I2T ECCV mAP@R | T2I 1K PMRP | T2I 1K R@1 | T2I ECCV mAP@R | Model file |\n|----------|----------|---------|----------|----------|---------|----------|------------|\n| PCME     | 45.0     | 68.8    |   26.2   | 46.0     | 54.6    |   48.0   | [link](https://github.com/naver-ai/pcme/releases/download/v1.0.0/pcme_coco.pth) |\n| PCME (CutMix-pretrained) | 46.2 | 68.3 | 28.6 | 47.1 | 56.7 | 54.9 | 
[link](https://github.com/naver-ai/pcme/releases/download/v1.0.0/pcme_cutmix_coco.pth) |\n| PVSE K=1 | 40.3     | 66.7    |   23.4   | 41.8     | 53.5    |   44.6   | -          |\n| PVSE K=2 | 42.8     | 69.2    |   26.7   | 43.6     | 55.2    |   53.8   | -          |\n| VSRN     | 41.2     | 76.2    |   30.8   | 42.4     | 62.8    |   53.8   | -          |\n| VSRN + AOQ | 44.7   | 77.5    |   30.7   | 45.6     | 63.5    |   51.2   | -          |\n\nSee the [ECCV Caption dataset](https://github.com/naver-ai/eccv-caption) for more details on \"ECCV mAP@R\".\n- Paper: [ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO](https://arxiv.org/abs/2204.03359)\n- GitHub: [naver-ai/eccv-caption](https://github.com/naver-ai/eccv-caption)\n\n### CUB Caption\n\n```\npython evaluate_cub.py ./config/cub/pcme_cub.yaml \\\n    --dataset_root \u003cyour_dataset_path\u003e \\\n    --caption_root \u003cyour_caption_path\u003e \\\n    --model_path model_last.pth \\\n    # --model__cache_dir /vector_cache # if you use my docker image\n```\n\nNOTE: If you simply download the files from [reedscot/cvpr2016](https://github.com/reedscot/cvpr2016), then `caption_root` will be `cvpr2016_cub/text_c10`.\n\nIf you want to test other probabilistic distances, such as Wasserstein distance or KL-divergence, try the following command:\n\n```\npython evaluate_cub.py ./config/cub/pcme_cub.yaml \\\n    --dataset_root \u003cyour_dataset_path\u003e \\\n    --caption_root \u003cyour_caption_path\u003e \\\n    --model_path model_last.pth \\\n    --model__eval_method \u003cdistance_method\u003e \\\n    # --model__cache_dir /vector_cache # if you use my docker image\n```\n\nYou can choose `distance_method` from `['elk', 'l2', 'min', 'max', 'wasserstein', 'kl', 'reverse_kl', 'js', 'bhattacharyya', 'matmul', 'matching_prob']`.\n\n\n## How to train\n\nNOTE: we train each model with mixed-precision training (O2) on a single V100.\u003cbr\u003e\nSince the 
current code does not support multi-GPU training, if you use different hardware, the batch size should be reduced.\u003cbr\u003e\nPlease note that, as a result, the reported results may not be reproducible on hardware smaller than a V100.\n\n### COCO Caption\n\n```\npython train_coco.py ./config/coco/pcme_coco.yaml --dataset_root \u003cyour_dataset_path\u003e \\\n    # --model__cache_dir /vector_cache # if you use my docker image\n```\n\nIt takes about 46 hours on a single V100 with mixed-precision training.\n\n### CUB Caption\n\nWe use the CUB Caption dataset [(Reed, et al. 2016)](https://openaccess.thecvf.com/content_cvpr_2016/papers/Reed_Learning_Deep_Representations_CVPR_2016_paper.pdf) as a new cross-modal retrieval benchmark. Here, instead of matching the sparse paired image-caption pairs, we treat all image-caption pairs in the same class as **positive**. Since our split is based on the zero-shot learning benchmark [(Xian, et al. 2017)](https://openaccess.thecvf.com/content_cvpr_2017/papers/Xian_Zero-Shot_Learning_-_CVPR_2017_paper.pdf), we hold out 50 of the 200 bird classes for evaluation.\n\n- Reed, Scott, et al. \"Learning deep representations of fine-grained visual descriptions.\" Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.\n- Xian, Yongqin, Bernt Schiele, and Zeynep Akata. \"Zero-shot learning-the good, the bad and the ugly.\" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.\n\n#### hyperparameter search\n\nWe additionally use the cross-validation splits of (Xian, et al. 2017), namely 100 classes for training and 50 classes for validation. 
\n\n```\npython train_cub.py ./config/cub/pcme_cub.yaml \\\n    --dataset_root \u003cyour_dataset_path\u003e \\\n    --caption_root \u003cyour_caption_path\u003e \\\n    --dataset_name cub_trainval1 \\\n    # --model__cache_dir /vector_cache # if you use my docker image\n```\n\nYou can use `cub_trainval2` and `cub_trainval3` in the same way.\n\n#### training with full training classes\n\n```\npython train_cub.py ./config/cub/pcme_cub.yaml \\\n    --dataset_root \u003cyour_dataset_path\u003e \\\n    --caption_root \u003cyour_caption_path\u003e \\\n    # --model__cache_dir /vector_cache # if you use my docker image\n```\n\nIt takes about 4 hours on a single V100 with mixed-precision training.\n\n## How to cite\n\n```\n@inproceedings{chun2021pcme,\n    title={Probabilistic Embeddings for Cross-Modal Retrieval},\n    author={Chun, Sanghyuk and Oh, Seong Joon and De Rezende, Rafael Sampaio and Kalantidis, Yannis and Larlus, Diane},\n    year={2021},\n    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},\n}\n```\n\nPlease also consider citing [ECCV Caption](https://github.com/naver-ai/eccv-caption) and [PCME++](https://github.com/naver-ai/pcmepp).\n```\n@inproceedings{chun2022eccv_caption,\n    title={ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO}, \n    author={Chun, Sanghyuk and Kim, Wonjae and Park, Song and Chang, Minsuk and Oh, Seong Joon},\n    year={2022},\n    booktitle={European Conference on Computer Vision (ECCV)},\n}\n\n@inproceedings{chun2024pcmepp,\n    title={Improved Probabilistic Image-Text Representations},\n    author={Chun, Sanghyuk},\n    year={2024},\n    booktitle={International Conference on Learning Representations (ICLR)},\n}\n```\n\n## License\n\n```\nMIT License\n\nCopyright (c) 2021-present NAVER Corp.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation 
files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\nTHE SOFTWARE.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaver-ai%2Fpcme","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnaver-ai%2Fpcme","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaver-ai%2Fpcme/lists"}