{"id":27064873,"url":"https://github.com/3dlg-hcvc/tricolo","last_synced_at":"2025-04-05T17:19:30.000Z","repository":{"id":89028728,"uuid":"445647694","full_name":"3dlg-hcvc/tricolo","owner":"3dlg-hcvc","description":"[WACV 2024] TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval","archived":false,"fork":false,"pushed_at":"2024-03-23T22:54:05.000Z","size":7523,"stargazers_count":18,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-20T20:53:31.425Z","etag":null,"topics":["3d","computer-vision","multimodal-learning","natual-language-processing","pytorch","pytorch-lightning"],"latest_commit_sha":null,"homepage":"https://3dlg-hcvc.github.io/tricolo/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/3dlg-hcvc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-01-07T20:43:39.000Z","updated_at":"2024-04-17T16:43:02.000Z","dependencies_parsed_at":"2024-03-23T23:31:13.480Z","dependency_job_id":"1ce9e3ef-e40d-437f-90c8-116ee87c2efa","html_url":"https://github.com/3dlg-hcvc/tricolo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3dlg-hcvc%2Ftricolo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3dlg-hcvc%2Ftricolo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3dlg-hcvc%2Ftricolo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3dlg-hcvc%2Ftricolo/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/3dlg-hcvc","download_url":"https://codeload.github.com/3dlg-hcvc/tricolo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247370221,"owners_count":20927974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d","computer-vision","multimodal-learning","natual-language-processing","pytorch","pytorch-lightning"],"created_at":"2025-04-05T17:19:29.499Z","updated_at":"2025-04-05T17:19:29.991Z","avatar_url":"https://github.com/3dlg-hcvc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TriCoLo\n\n\u003ca href=\"https://pytorch.org/\"\u003e\u003cimg alt=\"PyTorch\" src=\"https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge\u0026logo=pytorch\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pytorchlightning.ai/\"\u003e\u003cimg alt=\"Lightning\" src=\"https://img.shields.io/badge/Lightning-792DE4?style=for-the-badge\u0026logo=pytorch-lightning\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003ca href=\"https://wandb.ai/site\"\u003e\u003cimg alt=\"WandB\" src=\"https://img.shields.io/badge/Weights_\u0026_Biases-FFBE00?style=for-the-badge\u0026logo=WeightsAndBiases\u0026logoColor=white\"\u003e\u003c/a\u003e\n\nThis repo is the official implementation for TriCoLo: **Tri**modal **Co**ntrastive **Lo**ss for Text to Shape Retrieval\n\n([*Paper*](https://arxiv.org/pdf/2201.07366.pdf)) ([*Project Page*](https://3dlg-hcvc.github.io/tricolo/))\n\n## Setup\n### Conda (recommended)\nWe recommend the use of 
[miniconda](https://docs.conda.io/en/latest/miniconda.html) to manage system dependencies.\n\n```shell\n# create and activate the conda environment\nconda create -n tricolo python=3.10\nconda activate tricolo\n\n# install PyTorch 2.0.1\nconda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia\n\n# install Python libraries\npip install .\n```\n\n### Pip (without conda)\n```shell\n# create and activate the virtual environment\nvirtualenv --no-download env\nsource env/bin/activate\n\n# install PyTorch 2.0.1\npip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2\n\n# install Python libraries\npip install .\n```\n\n## Data Preparation\n\n### ShapeNet\nDownload [ShapeNet](https://shapenet.org/) and place `ShapeNetCore.v2` in the `data/text2shape-data` folder.\n\n\n### Text2Shape (Chair \u0026 Table)\n\n1. Download [Text2Shape](http://text2shape.stanford.edu/) and place `shapenet.json` and `processed_captions_{train/val/test}.p` in the `text2shape-data/chair_table` folder.\n2. 
Download [ShapeNet solid voxels (Chair \u0026 Table)](http://text2shape.stanford.edu/):\n   ```shell\n   cd text2shape-data\n   mkdir chair_table\n   cd chair_table\n   wget http://text2shape.stanford.edu/dataset/shapenet/nrrd_256_filter_div_32_solid.zip\n   wget http://text2shape.stanford.edu/dataset/shapenet/nrrd_256_filter_div_64_solid.zip\n   wget http://text2shape.stanford.edu/dataset/shapenet/nrrd_256_filter_div_128_solid.zip\n   unzip nrrd_256_filter_div_32_solid.zip\n   unzip nrrd_256_filter_div_64_solid.zip\n   unzip nrrd_256_filter_div_128_solid.zip\n   ```\n   Finally, the dataset files should be organized as follows:\n   ```shell\n   tricolo\n   ├── data\n   │   ├── preprocess_all_data.py\n   │   ├── text2shape-data\n   │   │   ├── ShapeNetCore.v2\n   │   │   ├── chair_table\n   │   │   │   ├── nrrd_256_filter_div_32_solid\n   │   │   │   ├── nrrd_256_filter_div_64_solid\n   │   │   │   ├── nrrd_256_filter_div_128_solid\n   │   │   │   ├── processed_captions_train.p\n   │   │   │   ├── processed_captions_val.p\n   │   │   │   ├── processed_captions_test.p\n   │   │   │   ├── shapenet.json\n   ```\n\n3. Preprocess the dataset\n   ```shell\n   python data/preprocess_all_data.py data=text2shape_chair_table +cpu_workers={num_processes}\n   ```\n\n4. Precache the CLIP embeddings (optional)\n   ```shell\n   python extract_clip_feats.py data=text2shape_chair_table data.image_size=224\n   ```\n\n### Text2Shape (C13)\n1. 
Download [Text2Shape C13](https://aspis.cmpt.sfu.ca/projects/tricolo/data/c13.csv).\n\n## Training, Inference and Evaluation\nNote: Configuration files are managed by [Hydra](https://hydra.cc/); you can add or override any configuration attribute by passing it as a command-line argument.\n\n```shell\n# log in to WandB\nwandb login\n\n# train a model from scratch\n# available voxel_encoder_name: SparseCNNEncoder, null\n# available image_encoder_name: MVCNNEncoder, CLIPImageEncoder, null\n# available text_encoder_name: BiGRUEncoder, CLIPTextEncoder\n# available dataset_name: text2shape_chair_table, text2shape_c13\npython train.py data={dataset_name} model.voxel_encoder={voxel_encoder_name} \\\nmodel.image_encoder={image_encoder_name} model.text_encoder={text_encoder_name} \\\nexperiment_name={any_string}\n\n# train a model from a checkpoint\npython train.py data={dataset_name} model.voxel_encoder={voxel_encoder_name} \\\nmodel.image_encoder={image_encoder_name} model.text_encoder={text_encoder_name} \\\nexperiment_name={checkpoint_experiment_name} ckpt_name={checkpoint_file_name}\n\n# test a pretrained model\npython test.py data={dataset_name} model.voxel_encoder={voxel_encoder_name} \\\nmodel.image_encoder={image_encoder_name} model.text_encoder={text_encoder_name} \\\nexperiment_name={checkpoint_experiment_name} +ckpt_path={checkpoint_file_path}\n\n# evaluate inference results\n# currently unavailable\n```\n## Checkpoints\n\n| Modality | Dataset                    | Split  | RR@1  | RR@5  | NDCG@5 | Download                                                                                              |\n|:---------|:---------------------------|:-----|:------|:------|:-------|:------------------------------------------------------------------------------------------------------|\n| Tri(I+V) | Text2Shape (Chair \u0026 Table) | Val | 12.60 | 33.34 | 23.30  | [chair_table_tri.ckpt](https://aspis.cmpt.sfu.ca/projects/tricolo/checkpoints/chair_table_tri.ckpt)   |\n| Bi(I)    
| Text2Shape (Chair \u0026 Table) | Val | 11.67 | 30.63 | 21.49  | [chair_table_bi_i.ckpt](https://aspis.cmpt.sfu.ca/projects/tricolo/checkpoints/chair_table_bi_i.ckpt) |\n| Bi(V)    | Text2Shape (Chair \u0026 Table) | Val | 9.33  | 27.52 | 18.62  | [chair_table_bi_v.ckpt](https://aspis.cmpt.sfu.ca/projects/tricolo/checkpoints/chair_table_bi_v.ckpt) |\n| Tri(I+V) | Text2Shape (C13)           | Val | 12.96 | 34.87 | 24.19  | [c13_tri.ckpt](https://aspis.cmpt.sfu.ca/projects/tricolo/checkpoints/c13_tri.ckpt)                   |\n| Bi(I)    | Text2Shape (C13)           | Val | 11.89 | 33.48 | 22.96  | [c13_bi_i.ckpt](https://aspis.cmpt.sfu.ca/projects/tricolo/checkpoints/c13_bi_i.ckpt)                 |\n| Bi(V)    | Text2Shape (C13)           | Val | 9.73  | 29.24 | 19.69  | [c13_bi_v.ckpt](https://aspis.cmpt.sfu.ca/projects/tricolo/checkpoints/c13_bi_v.ckpt)                 |\n\n## Acknowledgements\n1. [ConVIRT](https://github.com/edreisMD/ConVIRT-pytorch): Our overall training framework is heavily based on the [ConVIRT](https://github.com/edreisMD/ConVIRT-pytorch) implementation. [*Paper*](https://arxiv.org/pdf/2010.00747.pdf)\n2. [MVCNN](https://github.com/jongchyisu/mvcnn_pytorch): The MVCNN implementation we use comes from [this repository](https://github.com/jongchyisu/mvcnn_pytorch). [*Paper*](https://arxiv.org/pdf/1505.00880.pdf)\n3. [Text2Shape](https://github.com/kchen92/text2shape/): We download the dataset and modify the evaluation code from the original [Text2Shape dataset](http://text2shape.stanford.edu/). [*Paper*](https://arxiv.org/pdf/1803.08495.pdf)\n\nWe thank the authors for their work and the implementations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F3dlg-hcvc%2Ftricolo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F3dlg-hcvc%2Ftricolo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F3dlg-hcvc%2Ftricolo/lists"}