{"id":14497785,"url":"https://github.com/NVlabs/FoundationPose","last_synced_at":"2025-08-30T20:32:05.501Z","repository":{"id":212187080,"uuid":"730898383","full_name":"NVlabs/FoundationPose","owner":"NVlabs","description":"[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects","archived":false,"fork":false,"pushed_at":"2024-08-18T05:24:06.000Z","size":123579,"stargazers_count":1263,"open_issues_count":36,"forks_count":162,"subscribers_count":31,"default_branch":"main","last_synced_at":"2024-08-18T06:35:52.059Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://nvlabs.github.io/FoundationPose/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVlabs.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-12T23:15:54.000Z","updated_at":"2024-08-18T05:24:09.000Z","dependencies_parsed_at":"2024-05-10T17:47:53.701Z","dependency_job_id":"807e559b-b73c-4841-adac-2b43c26ff86a","html_url":"https://github.com/NVlabs/FoundationPose","commit_stats":null,"previous_names":["nvlabs/foundationpose"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFoundationPose","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFoundationPose/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFoundationPose/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFoundationPose/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVlabs","download_url":"https://codeload.github.com/NVlabs/FoundationPose/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":217594095,"owners_count":16201636,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-03T12:01:23.072Z","updated_at":"2025-08-30T20:32:05.495Z","avatar_url":"https://github.com/NVlabs.png","language":"Python","funding_links":[],"categories":["对象检测、分割","👁️ Computer Vision \u0026 Perception","6D Object Pose Estimation"],"sub_categories":["网络服务_其他","Methods"],"readme":"# FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects\n[[Paper]](https://arxiv.org/abs/2312.08344) [[Website]](https://nvlabs.github.io/FoundationPose/)\n\nThis is the official implementation of our paper, published at CVPR 2024 (Highlight).\n\nContributors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield\n\nWe present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. 
Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves comparable results to instance-level methods despite the reduced assumptions.\n\n\n\u003cimg src=\"assets/intro.jpg\" width=\"70%\"\u003e\n\n**🤖 For the ROS version, please check [Isaac ROS Pose Estimation](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_pose_estimation), which offers fast TensorRT inference and a C++ speed-up.**\n\n\\\n**🥇 No. 1 on the worldwide [BOP leaderboard](https://bop.felk.cvut.cz/leaderboards/pose-estimation-unseen-bop23/core-datasets/) (as of 2024/03) for model-based novel object pose estimation.**\n\u003cimg src=\"assets/bop.jpg\" width=\"80%\"\u003e\n\n## Demos\n\nRobotic Applications:\n\nhttps://github.com/NVlabs/FoundationPose/assets/23078192/aa341004-5a15-4293-b3da-000471fd74ed\n\n\nAR Applications:\n\nhttps://github.com/NVlabs/FoundationPose/assets/23078192/80e96855-a73c-4bee-bcef-7cba92df55ca\n\n\nResults on the YCB-Video dataset:\n\nhttps://github.com/NVlabs/FoundationPose/assets/23078192/9b5bedde-755b-44ed-a973-45ec85a10bbe\n\n\n\n# Bibtex\n```bibtex\n@InProceedings{foundationposewen2024,\nauthor        = {Bowen Wen and Wei Yang and Jan Kautz and Stan Birchfield},\ntitle         = {{FoundationPose}: Unified 6D Pose Estimation and Tracking of Novel Objects},\nbooktitle     = {CVPR},\nyear          = {2024},\n}\n```\n\nIf you find the model-free setup useful, please also consider citing:\n\n```bibtex\n@InProceedings{bundlesdfwen2023,\nauthor        = {Bowen Wen and Jonathan Tremblay and Valts Blukis and Stephen Tyree and Thomas M\\\"{u}ller and Alex Evans and Dieter Fox and Jan Kautz and Stan Birchfield},\ntitle         = {{BundleSDF}: {N}eural 
6-{DoF} Tracking and {3D} Reconstruction of Unknown Objects},\nbooktitle     = {CVPR},\nyear          = {2023},\n}\n```\n\n# Data prepare\n\n\n1) Download all network weights from [here](https://drive.google.com/drive/folders/1DFezOAD0oD1BblsXVxqDsl8fj0qzB82i?usp=sharing) and put them under the folder `weights/`. For the refiner, you will need `2023-10-28-18-33-37`. For scorer, you will need `2024-01-11-20-02-45`.\n\n1) [Download demo data](https://drive.google.com/drive/folders/1pRyFmxYXmAnpku7nGRioZaKrVJtIsroP?usp=sharing) and extract them under the folder `demo_data/`\n\n1) [Optional] Download our large-scale training data: [\"FoundationPose Dataset\"](https://drive.google.com/drive/folders/1s4pB6p4ApfWMiMjmTXOFco8dHbNXikp-?usp=sharing)\n\n1) [Optional] Download our preprocessed reference views [here](https://drive.google.com/drive/folders/1PXXCOJqHXwQTbwPwPbGDN9_vLVe0XpFS?usp=sharing) in order to run model-free few-shot version.\n\n# Env setup option 1: docker (recommended)\n  ```\n  cd docker/\n  docker pull wenbowen123/foundationpose \u0026\u0026 docker tag wenbowen123/foundationpose foundationpose  # Or to build from scratch: docker build --network host -t foundationpose .\n  bash docker/run_container.sh\n  ```\n\n\nIf it's the first time you launch the container, you need to build extensions. 
Run this command *inside* the Docker container.\n```\nbash build_all.sh\n```\n\nLater you can re-enter the container without rebuilding.\n```\ndocker exec -it foundationpose bash\n```\n\nFor more recent GPUs such as the 4090, refer to [this](https://github.com/NVlabs/FoundationPose/issues/27).\nIn short, do the following:\n```\ndocker pull shingarey/foundationpose_custom_cuda121:latest\n```\nThen modify the bash script to use this image instead of `foundationpose:latest`.\n\n\n# Env setup option 2: conda (experimental)\n\n- Set up the conda environment\n\n```bash\n# create conda environment\nconda create -n foundationpose python=3.9\n\n# activate conda environment\nconda activate foundationpose\n\n# Install Eigen3 3.4.0 under conda environment\nconda install conda-forge::eigen=3.4.0\nexport CMAKE_PREFIX_PATH=\"$CMAKE_PREFIX_PATH:/eigen/path/under/conda\"\n\n# install dependencies\npython -m pip install -r requirements.txt\n\n# Install NVDiffRast\npython -m pip install --quiet --no-cache-dir git+https://github.com/NVlabs/nvdiffrast.git\n\n# Kaolin (Optional, needed if running model-free setup)\npython -m pip install --quiet --no-cache-dir kaolin==0.15.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.0.0_cu118.html\n\n# PyTorch3D\npython -m pip install --quiet --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt200/download.html\n\n# Build extensions\nCMAKE_PREFIX_PATH=$CONDA_PREFIX/lib/python3.9/site-packages/pybind11/share/cmake/pybind11 bash build_all_conda.sh\n```\n\n\n# Run model-based demo\nThe paths are set by default in argparse. If you need to change the scene, you can pass the args accordingly. When running on the demo data, you should see the robot manipulating the mustard bottle. Pose estimation is conducted on the first frame, then it automatically switches to tracking mode for the rest of the video. 
The resulting visualizations will be saved to the `debug_dir` specified via argparse. (Note that the first run can be slower due to online compilation.)\n```\npython run_demo.py\n```\n\n\n\u003cimg src=\"assets/demo.jpg\" width=\"50%\"\u003e\n\n\nFeel free to try other objects (**no need to retrain**), such as the driller, by changing the paths in argparse.\n\n\u003cimg src=\"assets/demo_driller.jpg\" width=\"50%\"\u003e\n\n\n# Run on public datasets (LINEMOD, YCB-Video)\n\nFor this you first need to download the LINEMOD and YCB-Video datasets.\n\nTo run the model-based version on each of these two datasets, set the paths based on where you downloaded them. The results will be saved to the `debug` folder.\n```\npython run_linemod.py --linemod_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD --use_reconstructed_mesh 0\n\npython run_ycb_video.py --ycbv_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video --use_reconstructed_mesh 0\n```\n\nTo run the model-free few-shot version, you first need to train the Neural Object Field. `ref_view_dir` is the location where you downloaded the reference views in the \"Data prepare\" section above. Set the `dataset` flag to the dataset you are interested in.\n```\npython bundlesdf/run_nerf.py --ref_view_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video/bowen_addon/ref_views_16 --dataset ycbv\n```\n\nThen run a command similar to the model-based version, with some small modifications. 
Here we are using YCB-Video as an example:\n```\npython run_ycb_video.py --ycbv_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video --use_reconstructed_mesh 1 --ref_view_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video/bowen_addon/ref_views_16\n```\n\n# Troubleshooting\n\n\n- For more recent GPUs such as the 4090, refer to [this](https://github.com/NVlabs/FoundationPose/issues/27).\n\n- For setting up on Windows, refer to [this](https://github.com/NVlabs/FoundationPose/issues/148).\n\n- If you are getting unreasonable results, check [this](https://github.com/NVlabs/FoundationPose/issues/44#issuecomment-2048141043) and [this](https://github.com/030422Lee/FoundationPose_manual)\n\n# Training data download\nOur training data include scenes using 3D assets from GSO and Objaverse, rendered with high-quality photorealism and large domain randomization. Each data point includes **RGB, depth, object pose, camera pose, instance segmentation, 2D bounding box**. [[Google Drive]](https://drive.google.com/drive/folders/1s4pB6p4ApfWMiMjmTXOFco8dHbNXikp-?usp=sharing).\n\n\u003cimg src=\"assets/train_data_vis.png\" width=\"80%\"\u003e\n\n- To parse the camera params including extrinsics and intrinsics\n  ```\n  import json\n  import numpy as np\n\n  # `base_dir` is the folder that contains `camera_params/`; set it to your download location\n  # Flip the y and z axes to convert between the OpenGL and OpenCV camera conventions\n  glcam_in_cvcam = np.array([[1,0,0,0],\n                          [0,-1,0,0],\n                          [0,0,-1,0],\n                          [0,0,0,1]]).astype(float)\n  with open(f'{base_dir}/camera_params/camera_params_000000.json','r') as ff:\n    camera_params = json.load(ff)\n  # Read the resolution only after the JSON has been loaded\n  W, H = camera_params[\"renderProductResolution\"]\n  world_in_glcam = np.array(camera_params['cameraViewTransform']).reshape(4,4).T\n  cam_in_world = np.linalg.inv(world_in_glcam)@glcam_in_cvcam\n  world_in_cam = np.linalg.inv(cam_in_world)\n  focal_length = camera_params[\"cameraFocalLength\"]\n  horiz_aperture = camera_params[\"cameraAperture\"][0]\n  vert_aperture = H / W * horiz_aperture\n  focal_y = H * focal_length / vert_aperture\n  focal_x = W * focal_length / 
horiz_aperture\n  center_y = H * 0.5\n  center_x = W * 0.5\n\n  fx, fy, cx, cy = focal_x, focal_y, center_x, center_y\n  K = np.eye(3)\n  K[0,0] = fx\n  K[1,1] = fy\n  K[0,2] = cx\n  K[1,2] = cy\n  ```\n\n\n\n# Notes\nDue to legal restrictions on Stable Diffusion, which is trained on the LAION dataset, we are not able to release the diffusion-based texture-augmented data, nor the pretrained weights that use it. We thus release the version trained without diffusion-augmented data. Slight performance degradation is expected.\n\n# Acknowledgement\n\nWe would like to thank Jeff Smith for helping with the code release; the NVIDIA Isaac Sim and Omniverse team for the support on synthetic data generation; Tianshi Cao for the valuable discussions. Finally, we are also grateful for the positive feedback and constructive suggestions from the reviewers and AC at CVPR.\n\n\u003cimg src=\"assets/cvpr_review.png\" width=\"100%\"\u003e\n\n\n# License\nThe code and data are released under the NVIDIA Source Code License. Copyright © 2024, NVIDIA Corporation. All rights reserved.\n\n\n# Contact\nFor questions, please contact [Bowen Wen](https://wenbowen123.github.io/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FFoundationPose","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNVlabs%2FFoundationPose","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FFoundationPose/lists"}