{"id":31715235,"url":"https://github.com/daveredrum/scanrefer","last_synced_at":"2025-10-09T01:54:49.728Z","repository":{"id":38210738,"uuid":"235654890","full_name":"daveredrum/ScanRefer","owner":"daveredrum","description":"[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language","archived":false,"fork":false,"pushed_at":"2023-02-10T12:25:33.000Z","size":38154,"stargazers_count":211,"open_issues_count":8,"forks_count":28,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-04-28T09:32:18.244Z","etag":null,"topics":["3d","computer-vision","dataset","deep-learning","eccv","natural-language-processing","point-cloud","pytorch","visual-grounding"],"latest_commit_sha":null,"homepage":"https://daveredrum.github.io/ScanRefer/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daveredrum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-22T19:57:14.000Z","updated_at":"2024-04-18T06:32:50.000Z","dependencies_parsed_at":"2023-01-21T00:00:16.454Z","dependency_job_id":null,"html_url":"https://github.com/daveredrum/ScanRefer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/daveredrum/ScanRefer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveredrum%2FScanRefer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveredrum%2FScanRefer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveredrum%2FScanRefer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveredrum%2FScanRefer/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daveredrum","download_url":"https://codeload.github.com/daveredrum/ScanRefer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveredrum%2FScanRefer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000703,"owners_count":26082894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d","computer-vision","dataset","deep-learning","eccv","natural-language-processing","point-cloud","pytorch","visual-grounding"],"created_at":"2025-10-09T01:54:48.715Z","updated_at":"2025-10-09T01:54:49.719Z","avatar_url":"https://github.com/daveredrum.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"demo/ScanRefer.gif\" width=\"600px\"/\u003e\u003c/p\u003e\n\n## Introduction\n\nWe introduce the new task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. 
To address this task, we propose ScanRefer, where the core idea is to learn a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor then correlates the language expressions with the underlying geometric features of the 3D scan and facilitates the regression of the 3D bounding box of the target object. In order to train and benchmark our method, we introduce a new ScanRefer dataset, containing 51,583 descriptions of 11,046 objects from 800 [ScanNet](http://www.scan-net.org/) scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D.\n\nPlease also check out the project website [here](https://daveredrum.github.io/ScanRefer/).\n\nFor additional detail, please see the ScanRefer paper:  \n\"[ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language](https://arxiv.org/abs/1912.08830)\"  \nby [Dave Zhenyu Chen](https://www.niessnerlab.org/members/zhenyu_chen/profile.html), [Angel X. Chang](https://angelxuanchang.github.io/) and [Matthias Nießner](https://www.niessnerlab.org/members/matthias_niessner/profile.html)  \nfrom [Technical University of Munich](https://www.tum.de/en/) and [Simon Fraser University](https://www.sfu.ca/).\n\n## Changelog\n01/20/2023: Released annotated viewpoints for descriptions!\n\n11/11/2020: Updated paper with the improved results due to bug fixing.\n\n11/05/2020: Released pre-trained weights.\n\n08/08/2020: Fixed the issue with `lib/box_util.py`.\n\n08/03/2020: Fixed the issue with `lib/solver.py` and `script/eval.py`.\n\n06/16/2020: Fixed the issue with multiview features.\n\n01/31/2020: Fixed the issue with bad tokens.\n\n01/21/2020: Released the ScanRefer dataset.\n\n## :star2: Benchmark Challenge :star2:\nWe provide the ScanRefer Benchmark Challenge for benchmarking your model automatically on the hidden test set! 
Learn more at our [benchmark challenge website](http://kaldir.vc.in.tum.de/scanrefer_benchmark/).\nAfter training the model, please download [the benchmark data](http://kaldir.vc.in.tum.de/scanrefer_benchmark_data.zip) and put the unzipped `ScanRefer_filtered_test.json` under `data/`. Then, you can run the following script to generate predictions:\n```shell\npython scripts/predict.py --folder \u003cfolder_name\u003e --use_color\n```\nNote that the flags must match the ones set before training. The training information is stored in `outputs/\u003cfolder_name\u003e/info.json`. The generated predictions are stored in `outputs/\u003cfolder_name\u003e/pred.json`.\nTo submit the predictions, please compress `pred.json` as a .zip or .7z file and follow the [instructions](http://kaldir.vc.in.tum.de/scanrefer_benchmark/documentation) to upload your results.\n\n## Dataset\n\nIf you would like to access the ScanRefer dataset, please fill out [this form](https://forms.gle/aLtzXN12DsYDMSXX6). Once your request is accepted, you will receive an email with the download link.\n\n\u003e Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the [ScanNet Instructions](data/scannet/README.md) for more details.\n\nDownload the dataset by simply executing the wget command:\n```shell\nwget \u003cdownload_link\u003e\n```\n\n### Data format\n```\n\"scene_id\": [ScanNet scene id, e.g. \"scene0000_00\"],\n\"object_id\": [ScanNet object id (corresponds to \"objectId\" in ScanNet aggregation file), e.g. \"34\"],\n\"object_name\": [ScanNet object name (corresponds to \"label\" in ScanNet aggregation file), e.g. \"coffee_table\"],\n\"ann_id\": [description id, e.g. 
\"1\"],\n\"description\": [...],\n\"token\": [a list of tokens from the tokenized description] \n```\n\n## :star2: Annotated viewpoints :star2:\n\nYou can now download the viewpoints via \u003ca href=\"https://kaldir.vc.in.tum.de/annotated_cameras.zip\" target=\"_blank\"\u003ethis link\u003c/a\u003e. Once you've downloaded the dataset, you can also play around the viewpoints that are recorded during annotation.\n\n### Viewpoint format\n```\n\"scene_id\": [ScanNet scene id, e.g. \"scene0000_00\"],\n\"object_id\": [ScanNet object id (corresponds to \"objectId\" in ScanNet aggregation file), e.g. \"34\"],\n\"object_name\": [ScanNet object name (corresponds to \"label\" in ScanNet aggregation file), e.g. \"coffee_table\"],\n\"ann_id\": [description id, e.g. \"1\"],\n\"id\": \"\u003cscene_id\u003e-\u003cobject_id\u003e_\u003cann_id\u003e\"\n\"camera\": {\n    \"position\": [...] # camera position in the original ScanNet scene\n    \"rotation\": [...] # camera rotation in the original ScanNet scene\n    \"lookat\": [...] # the location that the camera is currently pointing at\n}\n```\n\n## Setup\n~~The code is tested on Ubuntu 16.04 LTS \u0026 18.04 LTS with PyTorch 1.2.0 CUDA 10.0 installed. There are some issues with the newer version (\u003e=1.3.0) of PyTorch. You might want to make sure you have installed the correct version. Otherwise, please execute the following command to install PyTorch:~~\n\nThe code is now compatiable with PyTorch 1.6! 
Please execute the following command to install PyTorch:\n\n```shell\nconda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch\n```\n\nInstall the necessary packages listed in `requirements.txt`:\n```shell\npip install -r requirements.txt\n```\nAfter all packages are properly installed, please run the following commands to compile the CUDA modules for the PointNet++ backbone:\n```shell\ncd lib/pointnet2\npython setup.py install\n```\n__Before moving on to the next step, please don't forget to set `CONF.PATH.BASE` in `lib/config.py` to the project root path.__\n\n### Data preparation\n1. Download the ScanRefer dataset and unzip it under `data/`. \n2. Download the preprocessed [GloVe embeddings (~990MB)](http://kaldir.vc.in.tum.de/glove.p) and put them under `data/`.\n3. Download the ScanNetV2 dataset and put (or link) `scans/` under (or to) `data/scannet/scans/` (Please follow the [ScanNet Instructions](data/scannet/README.md) for downloading the ScanNet dataset).\n\u003e After this step, there should be folders containing the ScanNet scene data under `data/scannet/scans/` with names like `scene0000_00`.\n4. Pre-process ScanNet data. A folder named `scannet_data/` will be generated under `data/scannet/` after running the following command. Roughly 3.8GB of free space is needed for this step:\n```shell\ncd data/scannet/\npython batch_load_scannet_data.py\n```\n\u003e After this step, you can check if the processed scene data is valid by running:\n\u003e ```shell\n\u003e python visualize.py --scene_id scene0000_00\n\u003e ```\n\u003c!-- 5. (Optional) Download the preprocessed [multiview features (~36GB)](http://kaldir.vc.in.tum.de/enet_feats.hdf5) and put it under `data/scannet/scannet_data/`. --\u003e\n5. (Optional) Pre-process the multiview features from ENet. \n\n    a. Download [the ENet pretrained weights (1.4MB)](http://kaldir.vc.in.tum.de/ScanRefer/scannetv2_enet.pth) and put them under `data/`.\n    \n    b. 
Download and decompress [the extracted ScanNet frames (~13GB)](http://kaldir.vc.in.tum.de/3dsis/scannet_train_images.zip).\n\n    c. Change the data paths in `config.py` marked with __TODO__ accordingly.\n\n    d. Extract the ENet features:\n    ```shell\n    python script/compute_multiview_features.py\n    ```\n\n    e. Project ENet features from ScanNet frames to point clouds; you need ~36GB to store the generated HDF5 database:\n    ```shell\n    python script/project_multiview_features.py --maxpool\n    ```\n    \u003e You can check whether the projections make sense by projecting the semantic labels from the images to the target point cloud:\n    \u003e ```shell\n    \u003e python script/project_multiview_labels.py --scene_id scene0000_00 --maxpool\n    \u003e ```\n\n## Usage\n### Training\nTo train the ScanRefer model with RGB values:\n```shell\npython scripts/train.py --use_color\n```\nFor more training options (like using preprocessed multiview features), please run `scripts/train.py -h`.\n\n### Evaluation\nTo evaluate the trained ScanRefer models, please find the folder under `outputs/` with the current timestamp and run:\n```shell\npython scripts/eval.py --folder \u003cfolder_name\u003e --reference --use_color --no_nms --force --repeat 5\n```\nNote that the flags must match the ones set before training. The training information is stored in `outputs/\u003cfolder_name\u003e/info.json`.\n\n### Visualization\nTo visualize the localization results predicted by the trained ScanRefer model in a specific scene, please find the corresponding folder under `outputs/` with the current timestamp and run:\n```shell\npython scripts/visualize.py --folder \u003cfolder_name\u003e --scene_id \u003cscene_id\u003e --use_color\n```\nNote that the flags must match the ones set before training. The training information is stored in `outputs/\u003cfolder_name\u003e/info.json`. 
The output `.ply` files will be stored under `outputs/\u003cfolder_name\u003e/vis/\u003cscene_id\u003e/`\n\n## Models\nFor reproducing our results in the paper, we provide the following training commands and the corresponding pre-trained models:\n\n\u003ctable\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccolgroup span=\"2\"\u003e\u003c/colgroup\u003e\n    \u003ccolgroup span=\"2\"\u003e\u003c/colgroup\u003e\n    \u003ccolgroup span=\"2\"\u003e\u003c/colgroup\u003e\n    \u003ccol\u003e\n    \u003ctr\u003e\n        \u003cth rowspan=2\u003eName\u003c/th\u003e\n        \u003cth rowspan=2\u003eCommand\u003c/th\u003e\n        \u003cth colspan=2 scope=\"colgroup\"\u003eUnique\u003c/th\u003e\n        \u003cth colspan=2 scope=\"colgroup\"\u003eMultiple\u003c/th\u003e\n        \u003cth colspan=2 scope=\"colgroup\"\u003eOverall\u003c/th\u003e\n        \u003cth rowspan=2\u003eWeights\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eAcc\u003c!-- --\u003e@\u003c!-- --\u003e0.25IoU\u003c/td\u003e\n        \u003ctd\u003eAcc\u003c!-- --\u003e@\u003c!-- --\u003e0.5IoU\u003c/td\u003e\n        \u003ctd\u003eAcc\u003c!-- --\u003e@\u003c!-- --\u003e0.25IoU\u003c/td\u003e\n        \u003ctd\u003eAcc\u003c!-- --\u003e@\u003c!-- --\u003e0.5IoU\u003c/td\u003e\n        \u003ctd\u003eAcc\u003c!-- --\u003e@\u003c!-- --\u003e0.25IoU\u003c/td\u003e\n        \u003ctd\u003eAcc\u003c!-- --\u003e@\u003c!-- --\u003e0.5IoU\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --no_lang_cls\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e63.98\u003c/td\u003e\t\t\t\t\t\n        \u003ctd\u003e43.57\u003c/td\u003e\n        \u003ctd\u003e29.28\u003c/td\u003e\n        \u003ctd\u003e18.99\u003c/td\u003e\n        \u003ctd\u003e36.01\u003c/td\u003e\n        \u003ctd\u003e23.76\u003c/td\u003e\n        \u003ctd\u003e\u003ca 
href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+rgb\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_color --no_lang_cls\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e63.24\u003c/td\u003e\t\t\t\t\t\n        \u003ctd\u003e41.78\u003c/td\u003e\n        \u003ctd\u003e30.06\u003c/td\u003e\n        \u003ctd\u003e19.23\u003c/td\u003e\n        \u003ctd\u003e36.5\u003c/td\u003e\n        \u003ctd\u003e23.61\u003c/td\u003e\n        \u003ctd\u003e\u003ca href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_COLOR.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+rgb+normals\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_color --use_normal --no_lang_cls\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e64.63\u003c/td\u003e\t\t\t\t\t\n        \u003ctd\u003e43.65\u003c/td\u003e\n        \u003ctd\u003e31.89\u003c/td\u003e\n        \u003ctd\u003e20.77\u003c/td\u003e\n        \u003ctd\u003e38.24\u003c/td\u003e\n        \u003ctd\u003e25.21\u003c/td\u003e\n        \u003ctd\u003e\u003ca href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_COLOR_NORMAL.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+multiview\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_multiview --no_lang_cls\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e77.2\u003c/td\u003e\t\t\t\t\t\n        \u003ctd\u003e52.69\u003c/td\u003e\n        \u003ctd\u003e32.08\u003c/td\u003e\n        \u003ctd\u003e19.86\u003c/td\u003e\n        \u003ctd\u003e40.84\u003c/td\u003e\n        \u003ctd\u003e26.23\u003c/td\u003e\n        \u003ctd\u003e\u003ca 
href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_MULTIVIEW.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+multiview+normals\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_multiview --use_normal --no_lang_cls\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e78.22\u003c/td\u003e\t\t\t\t\t\n        \u003ctd\u003e52.38\u003c/td\u003e\n        \u003ctd\u003e33.61\u003c/td\u003e\n        \u003ctd\u003e20.77\u003c/td\u003e\n        \u003ctd\u003e42.27\u003c/td\u003e\n        \u003ctd\u003e26.9\u003c/td\u003e\n        \u003ctd\u003e\u003ca href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_MULTIVIEW_NORMAL.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+lobjcls\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e64.31\u003c/td\u003e\t\t\t\t\t\t\t\t\t\t\n        \u003ctd\u003e44.04\u003c/td\u003e\n        \u003ctd\u003e30.77\u003c/td\u003e\n        \u003ctd\u003e19.44\u003c/td\u003e\n        \u003ctd\u003e37.28\u003c/td\u003e\n        \u003ctd\u003e24.22\u003c/td\u003e\n        \u003ctd\u003e\u003ca href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_LANGCLS.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+rgb+lobjcls\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_color\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e65.00\u003c/td\u003e\t\t\t\t\t\t\t\t\t\t\n        \u003ctd\u003e43.31\u003c/td\u003e\n        \u003ctd\u003e30.63\u003c/td\u003e\n        \u003ctd\u003e19.75\u003c/td\u003e\n        \u003ctd\u003e37.30\u003c/td\u003e\n        \u003ctd\u003e24.32\u003c/td\u003e\n        \u003ctd\u003e\u003ca 
href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_COLOR_LANGCLS.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+rgb+normals+lobjcls\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_color --use_normal\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e67.64\u003c/td\u003e\t\t\t\t\t\n        \u003ctd\u003e46.19\u003c/td\u003e\n        \u003ctd\u003e32.06\u003c/td\u003e\n        \u003ctd\u003e21.26\u003c/td\u003e\n        \u003ctd\u003e38.97\u003c/td\u003e\n        \u003ctd\u003e26.10\u003c/td\u003e\n        \u003ctd\u003e\u003ca href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_COLOR_LANGCLS.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+multiview+lobjcls\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_multiview\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e76.00\u003c/td\u003e\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n        \u003ctd\u003e50.40\u003c/td\u003e\n        \u003ctd\u003e34.05\u003c/td\u003e\n        \u003ctd\u003e20.73\u003c/td\u003e\n        \u003ctd\u003e42.19\u003c/td\u003e\n        \u003ctd\u003e26.50\u003c/td\u003e\n        \u003ctd\u003e\u003ca href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_MULTIVIEW_LANGCLS.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003exyz+multiview+normals+lobjcls\u003c/td\u003e\n        \u003ctd\u003e\u003cpre lang=\"shell\"\u003epython script/train.py --use_multiview --use_normal\u003c/pre\u003e\u003c/td\u003e\n        \u003ctd\u003e76.33\u003c/td\u003e\t\t\t\t\t\t\t\t\t\t\n        \u003ctd\u003e53.51\u003c/td\u003e\n        \u003ctd\u003e32.73\u003c/td\u003e\n        \u003ctd\u003e21.11\u003c/td\u003e\n        \u003ctd\u003e41.19\u003c/td\u003e\n        \u003ctd\u003e27.40\u003c/td\u003e\n        
\u003ctd\u003e\u003ca href=http://kaldir.vc.in.tum.de/scanrefer_pretrained_XYZ_MULTIVIEW_NORMAL_LANGCLS.zip\u003eweights\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \n\u003c/table\u003e\n\nIf you would like to try out the pre-trained models, please download the model weights and extract the folder to `outputs/`. Note that the results are higher than before because of a few iterations of code refactoring and bug fixing.\n\n## Citation\n\nIf you use the ScanRefer data or code in your work, please kindly cite our work and the original ScanNet paper:\n\n```bibtex\n@inproceedings{chen2020scanrefer,\n    title={Scanrefer: 3d object localization in rgb-d scans using natural language},\n    author={Chen, Dave Zhenyu and Chang, Angel X and Nie{\\ss}ner, Matthias},\n    booktitle={Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XX 16},\n    pages={202--221},\n    year={2020},\n    organization={Springer}\n}\n\n@inproceedings{dai2017scannet,\n    title={Scannet: Richly-annotated 3d reconstructions of indoor scenes},\n    author={Dai, Angela and Chang, Angel X and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Nie{\\ss}ner, Matthias},\n    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n    pages={5828--5839},\n    year={2017}\n}\n```\n\n## Acknowledgement\nWe would like to thank [facebookresearch/votenet](https://github.com/facebookresearch/votenet) for the 3D object detection codebase and [erikwijmans/Pointnet2_PyTorch](https://github.com/erikwijmans/Pointnet2_PyTorch) for the CUDA accelerated PointNet++ implementation.\n\n## License\nScanRefer is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License](LICENSE).\n\nCopyright (c) 2020 Dave Zhenyu Chen, Angel X. 
Chang, Matthias Nießner\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaveredrum%2Fscanrefer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaveredrum%2Fscanrefer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaveredrum%2Fscanrefer/lists"}