{"id":21204525,"url":"https://github.com/maudzung/rtm3d","last_synced_at":"2025-04-09T22:19:27.900Z","repository":{"id":44430474,"uuid":"286189464","full_name":"maudzung/RTM3D","owner":"maudzung","description":"Unofficial PyTorch implementation of \"RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving\" (ECCV 2020)","archived":false,"fork":false,"pushed_at":"2020-08-18T05:09:01.000Z","size":4938,"stargazers_count":291,"open_issues_count":19,"forks_count":63,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-02T20:09:35.011Z","etag":null,"topics":["3d-object-detection","autonomous-driving","autonomous-vehicles","centernet","kitti-dataset","monocular-images","pytorch","pytorch-implementation","real-time","rtm3d","self-driving-car"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2001.03343.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maudzung.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-09T07:34:22.000Z","updated_at":"2025-01-11T04:51:12.000Z","dependencies_parsed_at":"2022-07-16T14:30:33.729Z","dependency_job_id":null,"html_url":"https://github.com/maudzung/RTM3D","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maudzung%2FRTM3D","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maudzung%2FRTM3D/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maudzung%2FRTM3D/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maudzung%2FRTM3D/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maudzung","download_url":"https://codeload.github.com/maudzung/RTM3D/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248119510,"owners_count":21050780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","autonomous-driving","autonomous-vehicles","centernet","kitti-dataset","monocular-images","pytorch","pytorch-implementation","real-time","rtm3d","self-driving-car"],"created_at":"2024-11-20T20:36:12.550Z","updated_at":"2025-04-09T22:19:27.866Z","avatar_url":"https://github.com/maudzung.png","language":"Python","readme":"# RTM3D-PyTorch\n\n[![python-image]][python-url]\n[![pytorch-image]][pytorch-url]\n\nThe PyTorch Implementation of the paper: \n[RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving](https://arxiv.org/pdf/2001.03343.pdf) (ECCV 2020)\n\n---\n\n## Demonstration\n\n![demo](./docs/demo.gif)\n\n## Features\n- [x] Realtime 3D object detection based on a monocular RGB image\n- [x] Support [distributed data parallel 
## 2. Getting Started
### 2.1. Requirement

```shell script
pip install -U -r requirements.txt
```

### 2.2. Data Preparation
Download the 3D KITTI detection dataset from [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d).

The downloaded data includes:

- Training labels of the object data set _**(5 MB)**_
- Camera calibration matrices of the object data set _**(16 MB)**_
- **Left color images** of the object data set _**(12 GB)**_
- **Right color images** of the object data set _**(12 GB)**_

Please make sure that you arrange the source code and dataset directories as shown in the folder structure below.

### 2.3. RTM3D architecture

![architecture](./docs/rtm3d_architecture.png)

The model takes **only RGB images** as input and outputs the `main center heatmap`, `vertexes heatmap`, and `vertexes coordinate`, which are used as the basis for estimating the `3D bounding box`.
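As a rough illustration of these outputs, the detection head can be thought of as producing a dictionary of tensors. The names, channel counts, and the output stride of 4 below are assumptions borrowed from CenterNet-style detectors, not the repository's exact head definition:

```python
import torch

# Hypothetical output shapes for a batch of B images of size (H, W),
# assuming a CenterNet-style output stride of 4 (illustrative only).
B, H, W = 2, 384, 1280
outputs = {
    'main_center_heatmap': torch.zeros(B, 3, H // 4, W // 4),   # one channel per object class (assumed 3)
    'vertexes_heatmap':    torch.zeros(B, 8, H // 4, W // 4),   # one channel per 3D-box vertex (assumed 8)
    'vertexes_coordinate': torch.zeros(B, 16, H // 4, W // 4),  # (x, y) offsets for 8 vertexes
    # ...plus regression heads such as depth, dimension, and orientation...
}
for name, tensor in outputs.items():
    print(name, tuple(tensor.shape))
```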
### 2.4. How to run

#### 2.4.1. Visualize the dataset

```shell script
cd src/data_process
```

- To visualize camera images with 3D boxes, execute:

```shell script
python kitti_dataset.py
```

Then press **n** to see the next sample, or press **Esc** to quit.


#### 2.4.2. Inference

Download the trained model from [**_here_**](https://drive.google.com/drive/folders/1lKOLHhWZasoC7cKNLcB714LBDS91whCr?usp=sharing) (will be released),
then put it into `${ROOT}/checkpoints/` and execute:

```shell script
python test.py --gpu_idx 0 --arch resnet_18 --pretrained_path ../checkpoints/rtm3d_resnet_18.pth
```

#### 2.4.3. Evaluation

```shell script
python evaluate.py --gpu_idx 0 --arch resnet_18 --pretrained_path <PATH>
```

#### 2.4.4. Training

##### 2.4.4.1. Single machine, single gpu

```shell script
python train.py --gpu_idx 0 --arch <ARCH> --batch_size <N> --num_workers <N>...
```

##### 2.4.4.2. Multi-processing Distributed Data Parallel Training
You should always use the `nccl` backend for multi-processing distributed training, since it currently provides the best distributed training performance. A sketch of how the flags below map onto `torch.distributed` is shown after the commands.

- **Single machine (node), multiple GPUs**

```shell script
python train.py --dist-url 'tcp://127.0.0.1:29500' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0
```

- **Two machines (two nodes), multiple GPUs**

_**First machine**_

```shell script
python train.py --dist-url 'tcp://IP_OF_NODE1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 0
```

_**Second machine**_

```shell script
python train.py --dist-url 'tcp://IP_OF_NODE2:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 1
```

To reproduce the results, you can run the bash shell script:

```bash
./train.sh
```
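The distributed flags used above follow PyTorch's standard multiprocessing DDP recipe. The following is a minimal, illustrative sketch of what they correspond to, assuming an `args` namespace built from those flags; it is not the repository's actual `train.py`:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(local_rank, ngpus_per_node, args):
    # Global rank = node rank (--rank) * GPUs per node + local GPU index
    global_rank = args.rank * ngpus_per_node + local_rank
    dist.init_process_group(
        backend=args.dist_backend,                    # 'nccl' for GPU training
        init_method=args.dist_url,                    # e.g. 'tcp://IP_OF_NODE1:FREEPORT'
        world_size=args.world_size * ngpus_per_node,  # total number of processes
        rank=global_rank,
    )
    torch.cuda.set_device(local_rank)
    # ...build the model, wrap it in DistributedDataParallel, then train...


def launch(args):
    # --multiprocessing-distributed spawns one process per GPU on this node
    ngpus_per_node = torch.cuda.device_count()
    mp.spawn(worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))


if __name__ == '__main__':
    from argparse import Namespace
    # Example matching the single-node command above (illustrative values only)
    launch(Namespace(rank=0, world_size=1,
                     dist_backend='nccl', dist_url='tcp://127.0.0.1:29500'))
```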
#### Tensorboard

- To track the training progress, go to the `logs/` folder and run:

```shell script
cd logs/<saved_fn>/tensorboard/
tensorboard --logdir=./
```

- Then go to [http://localhost:6006/](http://localhost:6006/)


## Contact

If you think this work is useful, please give me a star! <br>
If you find any errors or have any suggestions, please contact me (**Email:** `nguyenmaudung93.kstn@gmail.com`). <br>
Thank you!


## Citation

```bash
@article{RTM3D,
  author = {Peixuan Li and Huaici Zhao and Pengfei Liu and Feidao Cao},
  title = {RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving},
  year = {2020},
  conference = {ECCV 2020},
}
@misc{RTM3D-PyTorch,
  author =       {Nguyen Mau Dung},
  title =        {{RTM3D-PyTorch: PyTorch Implementation of the RTM3D paper}},
  howpublished = {\url{https://github.com/maudzung/RTM3D-PyTorch}},
  year =         {2020}
}
```

## References

[1] CenterNet: [Objects as Points paper](https://arxiv.org/abs/1904.07850), [PyTorch Implementation](https://github.com/xingyizhou/CenterNet)

## Folder structure

```
${ROOT}
└── checkpoints/
    ├── rtm3d_resnet_18.pth
    ├── rtm3d_fpn_resnet_18.pth
└── dataset/
    └── kitti/
        ├── ImageSets/
        │   ├── test.txt
        │   ├── train.txt
        │   └── val.txt
        ├── training/
        │   ├── image_2/ (left color camera)
        │   ├── image_3/ (right color camera)
        │   ├── calib/
        │   ├── label_2/
        └── testing/
        │   ├── image_2/ (left color camera)
        │   ├── image_3/ (right color camera)
        │   ├── calib/
        └── classes_names.txt
└── src/
    ├── config/
    │   ├── train_config.py
    │   └── kitti_config.py
    ├── data_process/
    │   ├── kitti_dataloader.py
    │   ├── kitti_dataset.py
    │   └── kitti_data_utils.py
    ├── models/
    │   ├── fpn_resnet.py
    │   ├── resnet.py
    │   ├── model_utils.py
    └── utils/
    │   ├── evaluation_utils.py
    │   ├── logger.py
    │   ├── misc.py
    │   ├── torch_utils.py
    │   ├── train_utils.py
    ├── evaluate.py
    ├── test.py
    ├── train.py
    └── train.sh
├── README.md
└── requirements.txt
```


## Usage

```
usage: train.py [-h] [--seed SEED] [--saved_fn FN] [--root-dir PATH]
                [--arch ARCH] [--pretrained_path PATH] [--head_conv HEAD_CONV]
                [--hflip_prob HFLIP_PROB]
                [--use_left_cam_prob USE_LEFT_CAM_PROB] [--dynamic-sigma]
                [--no-val] [--num_samples NUM_SAMPLES]
                [--num_workers NUM_WORKERS] [--batch_size BATCH_SIZE]
                [--print_freq N] [--tensorboard_freq N] [--checkpoint_freq N]
                [--start_epoch N] [--num_epochs N] [--lr_type LR_TYPE]
                [--lr LR] [--minimum_lr MIN_LR] [--momentum M] [-wd WD]
                [--optimizer_type OPTIMIZER] [--steps [STEPS [STEPS ...]]]
                [--world-size N] [--rank N] [--dist-url DIST_URL]
                [--dist-backend DIST_BACKEND] [--gpu_idx GPU_IDX] [--no_cuda]
                [--multiprocessing-distributed] [--evaluate]
                [--resume_path PATH] [--K K]

The Implementation of RTM3D using PyTorch

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           reproduce the results with a random seed
  --saved_fn FN         the name used for saving logs, models, ...
  --root-dir PATH       the ROOT working directory
  --arch ARCH           the name of the model architecture
  --pretrained_path PATH
                        the path of the pretrained checkpoint
  --head_conv HEAD_CONV
                        conv layer channels for the output head: 0 for no
                        conv layer, -1 for the default setting (64 for
                        resnets and 256 for dla)
  --hflip_prob HFLIP_PROB
                        the probability of a horizontal flip
  --use_left_cam_prob USE_LEFT_CAM_PROB
                        the probability of using the left camera
  --dynamic-sigma       if set, compute sigma based on Amax, Amin, then
                        generate the heatmap; otherwise compute the radius
                        as CenterNet did
  --no-val              if set, don't evaluate the model on the val set
  --num_samples NUM_SAMPLES
                        take a subset of the dataset to run and debug
  --num_workers NUM_WORKERS
                        number of threads for loading data
  --batch_size BATCH_SIZE
                        mini-batch size (default: 16); this is the total
                        batch size of all GPUs on the current node when
                        using Data Parallel or Distributed Data Parallel
  --print_freq N        print frequency (default: 50)
  --tensorboard_freq N  frequency of saving tensorboard logs (default: 50)
  --checkpoint_freq N   frequency of saving checkpoints (default: 5)
  --start_epoch N       the starting epoch
  --num_epochs N        number of total epochs to run
  --lr_type LR_TYPE     the type of learning rate scheduler (cosin or
                        multi_step)
  --lr LR               initial learning rate
  --minimum_lr MIN_LR   minimum learning rate during training
  --momentum M          momentum
  -wd WD, --weight_decay WD
                        weight decay (default: 1e-6)
  --optimizer_type OPTIMIZER
                        the type of optimizer; it can be sgd or adam
  --steps [STEPS [STEPS ...]]
                        number of burn-in steps
  --world-size N        number of nodes for distributed training
  --rank N              node rank for distributed training
  --dist-url DIST_URL   url used to set up distributed training
  --dist-backend DIST_BACKEND
                        distributed backend
  --gpu_idx GPU_IDX     GPU index to use
  --no_cuda             if set, cuda is not used
  --multiprocessing-distributed
                        use multi-processing distributed training to launch
                        N processes per node, which has N GPUs; this is the
                        fastest way to use PyTorch for either single node or
                        multi node data parallel training
  --evaluate            only evaluate the model, do not train
  --resume_path PATH    the path of the resumed checkpoint
  --K K                 the number of top K
```
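The `--dynamic-sigma` option documented above controls how the target heatmaps (mentioned in the modifications section) are built. Below is a minimal, illustrative sketch of splatting a CenterNet-style Gaussian peak onto a keypoint heatmap; the helper name and shapes are hypothetical, not the repository's code:

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat a 2D Gaussian peak onto `heatmap` at `center` (hypothetical helper).

    heatmap: (H, W) float array for one keypoint class
    center:  (cx, cy) keypoint location in heatmap coordinates
    sigma:   Gaussian std; CenterNet derives it from the object size, while
             --dynamic-sigma derives it from Amax/Amin as in the RTM3D paper
    """
    h, w = heatmap.shape
    radius = int(3 * sigma)
    cx, cy = int(center[0]), int(center[1])

    # Clip the Gaussian window to the heatmap borders
    x0, x1 = max(0, cx - radius), min(w, cx + radius + 1)
    y0, y1 = max(0, cy - radius), min(h, cy + radius + 1)

    ys, xs = np.ogrid[y0:y1, x0:x1]
    gaussian = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    # Keep the element-wise maximum so overlapping objects don't erase each other's peaks
    np.maximum(heatmap[y0:y1, x0:x1], gaussian, out=heatmap[y0:y1, x0:x1])
    return heatmap


# Example: one 96x320 heatmap with a peak at (x=160, y=48)
hm = np.zeros((96, 320), dtype=np.float32)
draw_gaussian(hm, center=(160, 48), sigma=2.0)
```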
[python-image]: https://img.shields.io/badge/Python-3.6-ff69b4.svg
[python-url]: https://www.python.org/
[pytorch-image]: https://img.shields.io/badge/PyTorch-1.5-2BAF2B.svg
[pytorch-url]: https://pytorch.org/