{"id":13563256,"url":"https://github.com/tjiiv-cprg/EPro-PnP","last_synced_at":"2025-04-03T19:32:46.534Z","repository":{"id":37477806,"uuid":"471963739","full_name":"tjiiv-cprg/EPro-PnP","owner":"tjiiv-cprg","description":"[CVPR 2022 Oral, Best Student Paper] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation","archived":false,"fork":false,"pushed_at":"2023-07-30T15:29:07.000Z","size":13022,"stargazers_count":1130,"open_issues_count":54,"forks_count":108,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-04-03T17:12:26.353Z","etag":null,"topics":["3d-object-detection","6dof","cvpr","gauss-newton","levenberg-marquardt","monocular","perspective-n-point","pose-estimation","pytorch"],"latest_commit_sha":null,"homepage":"https://www.youtube.com/watch?v=TonBodQ6EUU","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tjiiv-cprg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-03-20T11:48:33.000Z","updated_at":"2025-04-01T02:51:09.000Z","dependencies_parsed_at":"2024-01-14T03:48:35.193Z","dependency_job_id":"17ea5613-4cfb-4a3c-b6cf-071967737298","html_url":"https://github.com/tjiiv-cprg/EPro-PnP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tjiiv-cprg%2FEPro-PnP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tjiiv-cprg%2FEPro-PnP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tjiiv-cprg%2FEPro-PnP/r
eleases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tjiiv-cprg%2FEPro-PnP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tjiiv-cprg","download_url":"https://codeload.github.com/tjiiv-cprg/EPro-PnP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247065470,"owners_count":20877785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","6dof","cvpr","gauss-newton","levenberg-marquardt","monocular","perspective-n-point","pose-estimation","pytorch"],"created_at":"2024-08-01T13:01:16.920Z","updated_at":"2025-04-03T19:32:46.479Z","avatar_url":"https://github.com/tjiiv-cprg.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# EPro-PnP\n\n📢 **NEWS:** We have released [EPro-PnP-v2](https://github.com/tjiiv-cprg/EPro-PnP-v2). A new updated preprint can be found on [arXiv](https://arxiv.org/abs/2303.12787).\n\n**EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation**\n\u003cbr\u003e\nIn CVPR 2022 (Oral, **Best Student Paper**). 
[[paper](https://arxiv.org/pdf/2203.13254.pdf)][[video](https://www.youtube.com/watch?v=TonBodQ6EUU)]\n\u003cbr\u003e\n[Hansheng Chen](https://lakonik.github.io/)\\*\u003csup\u003e1,2\u003c/sup\u003e, [Pichao Wang](https://wangpichao.github.io/)†\u003csup\u003e2\u003c/sup\u003e, [Fan Wang](https://scholar.google.com/citations?user=WCRGTHsAAAAJ\u0026hl=en)\u003csup\u003e2\u003c/sup\u003e, [Wei Tian](https://scholar.google.com/citations?user=aYKQn88AAAAJ\u0026hl=en)†\u003csup\u003e1\u003c/sup\u003e, [Lu Xiong](https://www.researchgate.net/scientific-contributions/Lu-Xiong-71708073)\u003csup\u003e1\u003c/sup\u003e, [Hao Li](https://scholar.google.com/citations?user=pHN-QIwAAAAJ\u0026hl=zh-CN)\u003csup\u003e2\u003c/sup\u003e\n\n\u003csup\u003e1\u003c/sup\u003eTongji University, \u003csup\u003e2\u003c/sup\u003eAlibaba Group\n\u003cbr\u003e\n\\*Part of work done during an internship at Alibaba Group.\n\u003cbr\u003e\n†Corresponding Authors: Pichao Wang, Wei Tian.\n\n## Introduction\n\nEPro-PnP is a probabilistic Perspective-n-Points (PnP) layer for end-to-end 6DoF pose estimation networks. 
Broadly speaking, it is essentially a continuous counterpart of the widely used categorical Softmax layer, and is theoretically generalizable to other learning models with nested \u003c!-- $\\mathrm{arg\\,min}$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?%5Cmathrm%7Barg%5C%2Cmin%7D\"\u003e optimization.\n\n\u003cimg src=\"intro.png\" width=\"500\"  alt=\"\"/\u003e\n\nGiven the layer input: an \u003c!-- $N$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?N\"\u003e-point correspondence set \u003c!-- $X = \\left\\{x^\\text{3D}_i,x^\\text{2D}_i,w^\\text{2D}_i\\,\\middle|\\,i=1\\cdots N\\right\\}$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?X%20%3D%20%5Cleft%5C%7Bx%5E%5Ctext%7B3D%7D_i%2Cx%5E%5Ctext%7B2D%7D_i%2Cw%5E%5Ctext%7B2D%7D_i%5C%2C%5Cmiddle%7C%5C%2Ci%3D1%5Ccdots%20N%5Cright%5C%7D\"\u003e consisting of 3D object coordinates \u003c!-- $x^\\text{3D}_i \\in \\mathbb{R}^3$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?x%5E%5Ctext%7B3D%7D_i%20%5Cin%20%5Cmathbb%7BR%7D%5E3\"\u003e, 2D image coordinates \u003c!-- $x^\\text{2D}_i \\in \\mathbb{R}^2$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?x%5E%5Ctext%7B2D%7D_i%20%5Cin%20%5Cmathbb%7BR%7D%5E2\"\u003e, and 2D weights \u003c!-- $w^\\text{2D}_i \\in \\mathbb{R}^2_+ $ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?w%5E%5Ctext%7B2D%7D_i%20%5Cin%20%5Cmathbb%7BR%7D%5E2_%2B\"\u003e, a conventional PnP solver searches for an optimal pose \u003c!-- $y^\\ast$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" 
src=\"https://latex.codecogs.com/svg.latex?y%5E%5Cast\"\u003e (rigid transformation in SE(3)) that minimizes the weighted reprojection error. Previous work tries to backpropagate through the PnP operation, yet \u003c!-- $y^\\ast$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?y%5E%5Cast\"\u003e is inherently non-differentiable due to the inner \u003c!-- $\\mathrm{arg\\,min}$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?%5Cmathrm%7Barg%5C%2Cmin%7D\"\u003e operation. This leads to convergence issues if all the components in \u003c!-- $X$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?X\"\u003e must be learned by the network.\n\nIn contrast, our probabilistic PnP layer outputs a posterior distribution of pose, whose probability density \u003c!-- $p(y|X)$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?p(y%7CX)\"\u003e can be derived for proper backpropagation. The distribution is approximated via Monte Carlo sampling. 
With EPro-PnP, the correspondences \u003c!-- $X$ --\u003e \u003cimg style=\"transform: translateY(0.1em); background: white;\" src=\"https://latex.codecogs.com/svg.latex?X\"\u003e can be learned from scratch altogether by minimizing the KL divergence between the predicted and target\npose distribution.\n\n## Models\n\n### V1 models in this repository\n\n#### **[EPro-PnP-6DoF](EPro-PnP-6DoF) for 6DoF pose estimation**\u003cbr\u003e\n  \u003cimg src=\"EPro-PnP-6DoF/viz.gif\" width=\"500\" alt=\"\"/\u003e\n\n#### **[EPro-PnP-Det](EPro-PnP-Det) for 3D object detection**\n\n  \u003cimg src=\"EPro-PnP-Det/resources/viz.gif\" width=\"500\" alt=\"\"/\u003e\n\n### New V2 models\n\n#### **[EPro-PnP-Det v2](https://github.com/tjiiv-cprg/EPro-PnP-v2/tree/main/EPro-PnP-Det_v2): state-of-the-art monocular 3D object detector**\n\nMain differences to [v1b](EPro-PnP-Det):\n\n- Use GaussianMixtureNLLLoss as auxiliary coordinate regression loss\n- Add auxiliary depth and bbox losses\n  \nAt the time of submission (Aug 30, 2022), EPro-PnP-Det v2 **ranks 1st** among all camera-based single-frame object detection models on the [official nuScenes benchmark](https://www.nuscenes.org/object-detection?externalData=no\u0026mapData=no\u0026modalities=Camera) (test split, without extra data).\n\n| Method                                                   | TTA | Backbone |    NDS    |    mAP    |   mATE    |   mASE    |   mAOE    |   mAVE    |   mAAE    | Schedule |\n|:---------------------------------------------------------|:---:|:---------|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:--------:|\n| EPro-PnP-Det v2 (ours)                                   |  Y  | R101     | **0.490** |   0.423   |   0.547   | **0.236** | **0.302** |   1.071   |   0.123   |  12 ep   |\n| [PETR](https://github.com/megvii-research/petr)          |  N  | Swin-B   |   0.483   | **0.445** |   0.627   |   0.249   |   0.449   |   0.927   |   0.141   |  24 ep   |\n| 
[BEVDet-Base](https://github.com/HuangJunJie2017/BEVDet) |  Y  | Swin-B   |   0.482   |   0.422   | **0.529** | **0.236** |   0.395   |   0.979   |   0.152   |  20 ep   |\n| EPro-PnP-Det v2 (ours)                                   |  N  | R101     |   0.481   |   0.409   |   0.559   |   0.239   |   0.325   |   1.090   | **0.115** |  12 ep   |\n| [PolarFormer](https://github.com/fudan-zvg/PolarFormer)  |  N  | R101     |   0.470   |   0.415   |   0.657   |   0.263   |   0.405   | **0.911** |   0.139   |  24 ep   |\n| [BEVFormer-S](https://github.com/zhiqi-li/BEVFormer)     |  N  | R101     |   0.462   |   0.409   |   0.650   |   0.261   |   0.439   |   0.925   |   0.147   |  24 ep   |\n| [PETR](https://github.com/megvii-research/petr)          |  N  | R101     |   0.455   |   0.391   |   0.647   |   0.251   |   0.433   |   0.933   |   0.143   |  24 ep   |\n| [EPro-PnP-Det v1](EPro-PnP-Det_v2)                       |  Y  | R101     |   0.453   |   0.373   |   0.605   |   0.243   |   0.359   |   1.067   |   0.124   |  12 ep   | \n| [PGD](https://github.com/open-mmlab/mmdetection3d)       |  Y  | R101     |   0.448   |   0.386   |   0.626   |   0.245   |   0.451   |   1.509   |   0.127   | 24+24 ep |\n| [FCOS3D](https://github.com/open-mmlab/mmdetection3d)    |  Y  | R101     |   0.428   |   0.358   |   0.690   |   0.249   |   0.452   |   1.434   |   0.124   |    -     |\n\n#### **[EPro-PnP-6DoF v2](https://github.com/tjiiv-cprg/EPro-PnP-v2/tree/main/EPro-PnP-6DoF_v2) for 6DoF pose estimation**\u003cbr\u003e\n\nMain differences to [v1b](EPro-PnP-6DoF):\n\n- Fix w2d scale handling **(very important)**\n- Improve network initialization\n- Adjust loss weights\n\nWith these updates the v2 model can be trained **without 3D models** to achieve better performance (ADD 0.1d = 93.83) than [GDRNet](https://github.com/THU-DA-6D-Pose-Group/GDR-Net) (ADD 0.1d = 93.6), unleashing the full potential of simple end-to-end training.\n\n## Use EPro-PnP in Your Own Model\n\nWe provide a 
[demo](demo/fit_identity.ipynb) on the usage of the EPro-PnP layer.\n\n## Citation\n\nIf you find this project useful in your research, please consider citing:\n\n```\n@inproceedings{epropnp, \n  author = {Hansheng Chen and Pichao Wang and Fan Wang and Wei Tian and Lu Xiong and Hao Li}, \n  title = {EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation}, \n  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, \n  year = {2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftjiiv-cprg%2FEPro-PnP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftjiiv-cprg%2FEPro-PnP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftjiiv-cprg%2FEPro-PnP/lists"}