{"id":18972484,"url":"https://github.com/junweiliang/object_detection_tracking","last_synced_at":"2025-04-05T08:08:09.119Z","repository":{"id":37492936,"uuid":"188732917","full_name":"JunweiLiang/Object_Detection_Tracking","owner":"JunweiLiang","description":"Out-of-the-box code and models for CMU's object detection and tracking system for multi-camera surveillance videos. Speed optimized Faster-RCNN model. Tensorflow based. Also supports EfficientDet. WACVW'20","archived":false,"fork":false,"pushed_at":"2022-12-16T05:20:31.000Z","size":45599,"stargazers_count":443,"open_issues_count":4,"forks_count":128,"subscribers_count":19,"default_branch":"master","last_synced_at":"2024-02-14T21:38:07.314Z","etag":null,"topics":["activity-detection","computer-vision","deep-learning","detection-tracking","efficientdet","maskrcnn","multi-camera","multi-camera-tracking","multi-camera-vehicle-reid","object-detection","reid","surveillance-videos","tracking","tracking-detection","video-object-detection","video-object-tracking"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JunweiLiang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-26T21:23:44.000Z","updated_at":"2024-02-13T13:27:35.000Z","dependencies_parsed_at":"2023-01-29T09:45:58.285Z","dependency_job_id":null,"html_url":"https://github.com/JunweiLiang/Object_Detection_Tracking","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2FObject_Detection_Tracking","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2FObject_Detection_Tracking/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2FObject_Detection_Tracking/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2FObject_Detection_Tracking/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JunweiLiang","download_url":"https://codeload.github.com/JunweiLiang/Object_Detection_Tracking/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247305934,"owners_count":20917208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["activity-detection","computer-vision","deep-learning","detection-tracking","efficientdet","maskrcnn","multi-camera","multi-camera-tracking","multi-camera-vehicle-reid","object-detection","reid","surveillance-videos","tracking","tracking-detection","video-object-detection","video-object-tracking"],"created_at":"2024-11-08T15:08:57.903Z","updated_at":"2025-04-05T08:08:09.099Z","avatar_url":"https://github.com/JunweiLiang.png","language":"Python","readme":"# CMU Object Detection \u0026 Tracking for Surveillance Video Activity Detection\n\nThis repository contains the code and models for object detection and tracking from the CMU [DIVA](https://www.iarpa.gov/index.php/research-programs/diva) system. 
Our system (INF \u0026 MUDSML) achieves the **best performance** on the ActEv [leaderboard](https://actev.nist.gov/prizechallenge#tab_leaderboard) ([Cached](images/actev-prizechallenge-06-2019.png)).\n\nIf you find this code useful in your research, please cite\n\n```\n@inproceedings{chen2019minding,\n  title={Minding the Gaps in a Video Action Analysis Pipeline},\n  author={Chen, Jia and Liu, Jiang and Liang, Junwei and Hu, Ting-Yao and Ke, Wei and Barrios, Wayner and Huang, Dong and Hauptmann, Alexander G},\n  booktitle={2019 IEEE Winter Applications of Computer Vision Workshops (WACVW)},\n  pages={41--46},\n  year={2019},\n  organization={IEEE}\n}\n@inproceedings{liu2020wacv,\n  author = {Liu, Wenhe and Kang, Guoliang and Huang, Po-Yao and Chang, Xiaojun and Qian, Yijun and Liang, Junwei and Gui, Liangke and Wen, Jing and Chen, Peng},\n  title = {Argus: Efficient Activity Detection System for Extended Video Analysis},\n  booktitle = {The IEEE Winter Conference on Applications of Computer Vision (WACV) Workshops},\n  month = {March},\n  year = {2020}\n}\n```\n\n\n## Introduction\nWe apply state-of-the-art object detection and tracking algorithms to surveillance videos. Our best object detection model uses Faster RCNN with a Resnet-101 backbone, dilated CNN, and FPN. The tracking algorithm (Deep SORT) uses ROI features from the object detection model. The ActEV-trained models work well for small-object detection in outdoor scenes. 
For indoor cameras, COCO trained models are better.\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003cdiv style=\"\"\u003e\n      \u003cimg src=\"images/Person_vis_video.gif\" height=\"300px\" /\u003e\n      \u003cimg src=\"images/Vehicle_vis_video.gif\" height=\"300px\" /\u003e\n  \u003c/div\u003e\n\u003c/div\u003e\n\nAlso supports multi-camera tracking and ReID:\n\n\u003cdiv align=\"center\"\u003e\n  \u003cdiv style=\"\"\u003e\n      \u003cimg src=\"images/multi-camera-reid.gif\" height=\"450px\" /\u003e\n      \u003cimg src=\"images/vehicle_multi_reid.gif\" height=\"350px\" /\u003e\n      \u003cimg src=\"images/person_multi_reid2.gif\" height=\"338px\" /\u003e\n  \u003c/div\u003e\n\u003c/div\u003e\n\n## Updates\n+ [12/2022] Due to CMU shutting down some servers, I have updated the model/data links. Some models are no longer available.\n+ [05/2021] Added multi-camera tracking and ReID. [Instruction.](#tracking-with-tmot-algo-and-ReID)\n+ [05/2021] Added [TMOT](https://github.com/Zhongdao/Towards-Realtime-MOT) tracking and single video ReID. [Instruction.](#tracking-with-tmot-algo-and-ReID)\n+ [12/2020] Added [multi-thread inferencing](#multi-thread-inferencing), another \\~25% speed up.\n+ [12/2020] Added [multiple-image batch inferencing](#multiple-image-batch-inferencing), \\~30% speed up.\n\n+ [10/2020] Added experiments comparing EfficientDet and MaskRCNN on VIRAT and AVA-Kinetics [here](COMMANDS.md#10-2020-comparing-efficientdet-with-maskrcnn-on-video-datasets).\n\n+ [05/2020] Added [EfficientDet (CVPR 2020)](https://github.com/google/automl/tree/master/efficientdet) for inferencing. The D7 model is reported to be more than 12 mAP better than the Resnet-50 FPN model we used. Modified to be more efficient and tested with Python 2 \u0026 3 and TF 1.15. 
See example commands and notes [here](COMMANDS.md).\n\n+ [02/2020] We used a Resnet-50 FPN model trained on MS-COCO for [MEVA](http://mevadata.org/) activity detection and got a competitive pAUDC of [0.49](images/inf_actev_0.49audc_02-2020.png) on the [leaderboard](https://actev.nist.gov/sdl) with a total processing speed of 0.64x real-time on a 4-GPU machine. The object detection module's processing speed is about 0.125x real-time. \\[[Frozen Model](https://precognition.team/shares/diva_obj_detect_models/models/obj_coco_resnet50_partial_tfv1.14_1280x720_rpn300.pb)\\] \\[[Example Command](COMMANDS.md)\\]\n\n+ [01/2020] We discovered a problem with using OpenCV to extract frames from avi videos. Some avi videos have duplicate frames that are not physically present in the files; they exist only as instructions to repeat previous frames. The problem is that OpenCV skips these frames without warning, according to [this bug report](https://github.com/opencv/opencv/issues/9053) and [this discussion](https://stackoverflow.com/questions/44488636/opencv-reading-frames-from-videocapture-advances-the-video-to-bizarrely-wrong-l/44551037). Therefore, with OpenCV you may get fewer frames, which makes the frame indices of the detection results incorrect. Solutions: 1. convert the avi videos to mp4 format; 2. use the MoviePy or PyAV loader, though they are 10% ~ 30% slower than OpenCV frame extraction. 
See `obj_detect_tracking.py` for implementation.\n\n## Dependencies\nThe latest inferencing code is tested with Tensorflow-GPU==1.15 and Python 2/3.\n\nOther dependencies: numpy; scipy; sklearn; cv2; matplotlib; pycocotools\n\n## Code Overview\n- `obj_detect_tracking.py`: Inference code for object detection \u0026 tracking.\n- `models.py`: Main model definition.\n- `nn.py`: Some layer definitions.\n- `main.py`: Code I used for training and testing experiments.\n- `eval.py`: Code I used for getting mAP/mAR.\n- `vis_json.py`: Visualize the JSON outputs.\n- `get_frames_resize.py`: Code for extracting frames from videos.\n- `utils.py`: Helper classes, e.g., for moving averages of losses and GPU usage.\n\n## Inferencing\n1. First download some test videos and the v3 model (the v4-v6 models are unverified, as we don't have a test set with ground truth):\n```\n$ wget https://precognition.team/shares/diva_obj_detect_models/v1-val_testvideos.tgz\n$ tar -zxvf v1-val_testvideos.tgz\n$ ls v1-val_testvideos \u003e v1-val_testvideos.lst\n$ wget https://precognition.team/shares/diva_obj_detect_models/models/obj_v3_model.tgz\n$ tar -zxvf obj_v3_model.tgz\n```\n\n2. 
Run object detection \u0026 tracking on the test videos:\n```\n$ python obj_detect_tracking.py --model_path obj_v3_model --version 3 --video_dir v1-val_testvideos \\\n--video_lst_file v1-val_testvideos.lst --frame_gap 1 --get_tracking \\\n--tracking_dir test_track_out\n```\nTo get the object detection output in COCO JSON format, add `--out_dir test_json_out`; to get the bounding box visualization, add `--visualize --vis_path test_vis_out`.\nTo speed it up, try `--frame_gap 8`; the tracks between detection frames will be linearly interpolated.\nThe tracking results will be in `test_track_out/`, in MOTChallenge format.\n\nTo run with **EfficientDet** models, download a checkpoint from the official [repo](https://github.com/google/automl/tree/master/efficientdet) or [my-d0-snapshot](https://precognition.team/shares/diva_obj_detect_models/models/efficientdet-d0.tar.gz). Then run with `--is_efficientdet` and `--efficientdet_modelname efficientdet-d0`.\n\n\n3. You can also run inferencing with a frozen graph (see [this](SPEED.md) for instructions on how to pack the model). Change `--model_path obj_v3.pb` and add `--is_load_from_pb`. It is about 30% faster. For running on the [MEVA](http://mevadata.org/) dataset (avi videos \u0026 indoor scenes) or with [EfficientDet](https://github.com/google/automl/tree/master/efficientdet) models, see examples [here](COMMANDS.md).\n\n4. You can also run object detection on a list of images. Suppose you have a file list `imgs.lst` with absolute paths to images. Run with a COCO-trained MaskRCNN model:\n```\n# get model from Tensorpack\n$ wget http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R101FPN1x.npz\n$ python obj_detect_imgs.py --model_path COCO-MaskRCNN-R101FPN1x.npz --version 2 \\\n--img_lst imgs.lst --out_dir detection_out_maskrcnn --max_size 480 \\\n--short_edge_size 320 --is_coco_model --visualize --vis_path detection_vis_maskrcnn\n```\nAdjust the image input size as you wish. 
Run with a COCO-trained EfficientDet model:\n```\n$ wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-d0.tar.gz; tar -zxvf efficientdet-d0.tar.gz\n$ python obj_detect_imgs.py --model_path efficientdet-d0/ --version 2 \\\n--is_efficientdet --efficientdet_modelname efficientdet-d0 --img_lst imgs.lst \\\n --out_dir detection_out_d0 --max_size 480 --short_edge_size 320 --is_coco_model \\\n --visualize --vis_path detection_vis_d0\n```\n\n\n### Visualization\nTo visualize the tracking results:\n```\n# Put \"Person/Vehicle\" track visualizations into the same video\n$ ls $PWD/v1-val_testvideos/* \u003e v1-val_testvideos.abs.lst\n$ python get_frames_resize.py v1-val_testvideos.abs.lst v1-val_testvideos_frames/ --use_2level\n$ python tracks_to_json.py test_track_out/ v1-val_testvideos.abs.lst test_track_out_json\n$ python vis_json.py v1-val_testvideos.abs.lst v1-val_testvideos_frames/ test_track_out_json/ test_track_out_vis\n# then use ffmpeg to make videos\n$ ffmpeg -framerate 30 -i test_track_out_vis/VIRAT_S_000205_05_001092_001124/VIRAT_S_000205_05_001092_001124_F_%08d.jpg vis_video.mp4\n```\nNow you have the tracking visualization videos for both the \"Person\" and \"Vehicle\" classes.\n\n\n## Multiple-Image Batch Inferencing\n\n1. First download some test videos:\n```\n$ wget https://precognition.team/shares/diva_obj_detect_models/meva_outdoor_test.tgz\n$ tar -zxvf meva_outdoor_test.tgz\n$ ls meva_outdoor_test \u003e meva_outdoor_test.lst\n```\n\n2. Get the COCO-trained MaskRCNN model from Tensorpack:\n```\n$ wget http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50FPN2x.npz\n```\n\n3. 
Run object detection \u0026 tracking on the test videos with the batch_size=8 code:\n```\n$ python obj_detect_tracking_multi.py --model_path COCO-MaskRCNN-R50FPN2x.npz --version 2 \\\n--video_dir meva_outdoor_test --video_lst_file meva_outdoor_test.lst --frame_gap 8 \\\n--get_tracking --tracking_dir fpnr50_multib4_trackout_1280x720 --gpuid_start 0 --max_size \\\n1280 --short_edge_size 720 --is_coco --use_lijun --im_batch_size 8 --log\n```\n\nThis should be \\~30% faster than the original batch_size=1 code:\n```\n$ python obj_detect_tracking.py --model_path COCO-MaskRCNN-R50FPN2x.npz --version 2 \\\n--video_dir meva_outdoor_test --video_lst_file meva_outdoor_test.lst --frame_gap 8 \\\n--get_tracking --tracking_dir fpnr50_b1_trackout_1280x720 --gpuid_start 0 --max_size 1280 \\\n--short_edge_size 720 --is_coco --use_lijun --im_batch_size 1 --log\n```\nYou can visualize the results according to [these instructions](#visualization).\nSpeed experiments are recorded [here](./SPEED.md#122020-multiple-image-batch-processing).\n\n## Multi-Thread Inferencing\n\nAdded a queue and multi-threading to parallelize CPU and GPU work. Run object detection \u0026 tracking with multi-thread processing on videos:\n```\n$ python obj_detect_tracking_multi_queuer.py --model_path COCO-MaskRCNN-R50FPN2x.npz --version 2 \\\n--video_dir meva_outdoor_test --video_lst_file meva_outdoor_test.lst --frame_gap 8 \\\n--get_tracking --tracking_dir fpnr50_multib8thread_trackout_1280x720 --gpuid_start 0 --max_size \\\n1280 --short_edge_size 720 --is_coco --use_lijun --im_batch_size 8 --log --prefetch 10\n```\nThis should be 20-30% faster than the single-thread version. 
Speed experiments are recorded [here](./SPEED.md#122020-multiple-image-batch-processing).\n\nFor object detection on a list of images, we can have a lot more threads, similar to PyTorch's [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader):\n```\n$ python obj_detect_imgs_multi_queuer.py --model_path COCO-MaskRCNN-R50FPN2x.npz --version 2 \\\n--resnet50 --img_lst imgs.lst --out_dir obj_jsons/ --max_size 1920 --short_edge_size 1080 \\\n--is_coco_model --im_batch_size 8 --log --prefetch 10 --num_cpu_worker 4\n```\n\n## Tracking with TMOT Algo and ReID\nAn alternative to Deep SORT that also uses a Kalman filter. Check out their [paper](https://arxiv.org/pdf/1909.12605v1.pdf).\n\nRun! Note that by default this outputs the original detection boxes instead of the KF-predicted/smoothed boxes.\n```\n$ python obj_detect_tracking_multi_queuer_tmot.py --model_path COCO-MaskRCNN-R50FPN2x.npz --version 2 \\\n--video_dir meva_outdoor_test --video_lst_file meva_outdoor_test.lst --frame_gap 8 \\\n--get_tracking --tracking_dir fpnr50_multib8thread_trackout_1280x720_tmot --gpuid_start 0 --max_size \\\n1280 --short_edge_size 720 --is_coco --use_lijun --im_batch_size 8 --log --prefetch 10\n```\n\nIf you want fewer ID switches (10-20% fewer), you can run ReID (Person \u0026 Vehicle) on the tracking results:\n```\n$ python single_video_reid.py fpnr50_multib8thread_trackout_1280x720_tmot/ meva_outdoor_test.lst \\\nmeva_outdoor_test/ fpnr50_multib8thread_trackout_1280x720_tmot_reid/ --gpuid 0 \\\n--vehicle_reid_model reid_models/vehicle_reid_res101.pth \\\n--person_reid_model reid_models/person_reid_osnet_market.pth --use_lijun\n```\nWe use a person-ReID model trained with the [TorchReID repo](https://kaiyangzhou.github.io/deep-person-reid/MODEL_ZOO) and a vehicle-ReID model from the AI City Challenge 2020 winner's [repo](https://github.com/KevinQian97/ELECTRICITY-MTMC).\n\nNow we support multi-camera tracking and ReID as well. 
The tracks in each video will be compared based on spatial and feature constraints with bipartite matching. But I have only tested this on the [MEVA dataset](https://gitlab.kitware.com/meva/meva-data-repo)([Link2](http://mevadata.org/)), as it requires camera models for spatial constraints. More detailed instructions in the future.\n\n```\n$ python multi_video_reid.py fpnr50_multib8thread_trackout_1280x720_tmot_reid/ \\\ncamera_group.json meva-data-repo/metadata/camera-models/krtd/ top_down_north_up.json \\\nvideos/ multi_reid_out/ --gpuid 0 --vehicle_reid_model reid_models/vehicle_reid_res101.pth \\\n --person_reid_model reid_models/person_reid_osnet_market.pth --use_lijun \\\n --feature_box_num 100 --feature_box_gap 3 --spatial_dist_tol 100\n```\n\n## Models\nThese are the models you can use for inferencing. The original ActEv annotations can be downloaded from [here](https://next.cs.cmu.edu/data/actev-v1-drop4-yaml.tgz). I will add instruction for training and testing if requested. Click to download each model.\n\n\u003ctable\u003e\n  \u003ctr\u003e\n  \t\u003ctd colspan=\"6\"\u003e\n  \t\t\u003c!--\u003ca href=\"https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/obj_v2_model.tgz\"\u003eObject v2\u003c/a\u003e--\u003e\n      Object v2\n  \t: Trained on v1-train\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEval on v1-val\u003c/td\u003e\n    \u003ctd\u003ePerson\u003c/td\u003e\n    \u003ctd\u003eProp\u003c/td\u003e\n    \u003ctd\u003ePush_Pulled_Object\u003c/td\u003e\n    \u003ctd\u003eVehicle\u003c/td\u003e\n    \u003ctd\u003eMean\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.831\u003c/td\u003e\n    \u003ctd\u003e0.405\u003c/td\u003e\n    \u003ctd\u003e0.682\u003c/td\u003e\n    \u003ctd\u003e0.982\u003c/td\u003e\n    \u003ctd\u003e0.725\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAR\u003c/td\u003e\n    
\u003ctd\u003e0.906\u003c/td\u003e\n    \u003ctd\u003e0.915\u003c/td\u003e\n    \u003ctd\u003e0.899\u003c/td\u003e\n    \u003ctd\u003e0.983\u003c/td\u003e\n    \u003ctd\u003e0.926\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n  \t\u003ctd colspan=\"6\"\u003e\n  \t\t\u003ca href=\"https://precognition.team/shares/diva_obj_detect_models/models/obj_v3_model.tgz\"\u003eObject v3\u003c/a\u003e\n      \u003c!--(\u003ca href=\"https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/obj_v3.pb\"\u003eFrozen Graph for tf v1.13\u003c/a\u003e)--\u003e\n  \t: Trained on v1-train, Dilated CNN\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEval on v1-val\u003c/td\u003e\n    \u003ctd\u003ePerson\u003c/td\u003e\n    \u003ctd\u003eProp\u003c/td\u003e\n    \u003ctd\u003ePush_Pulled_Object\u003c/td\u003e\n    \u003ctd\u003eVehicle\u003c/td\u003e\n    \u003ctd\u003eMean\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.836\u003c/td\u003e\n    \u003ctd\u003e0.448\u003c/td\u003e\n    \u003ctd\u003e0.702\u003c/td\u003e\n    \u003ctd\u003e0.984\u003c/td\u003e\n    \u003ctd\u003e0.742\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAR\u003c/td\u003e\n    \u003ctd\u003e0.911\u003c/td\u003e\n    \u003ctd\u003e0.910\u003c/td\u003e\n    \u003ctd\u003e0.895\u003c/td\u003e\n    \u003ctd\u003e0.985\u003c/td\u003e\n    \u003ctd\u003e0.925\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n  \t\u003ctd colspan=\"6\"\u003e\n  \t\t\u003c!--\u003ca href=\"https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/obj_v4_model.tgz\"\u003eObject v4\u003c/a\u003e--\u003e\n      Object v4\n  \t: Trained on v1-train \u0026 v1-val, Dilated CNN, Class-agnostic\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEval on v1-val\u003c/td\u003e\n    
\u003ctd\u003ePerson\u003c/td\u003e\n    \u003ctd\u003eProp\u003c/td\u003e\n    \u003ctd\u003ePush_Pulled_Object\u003c/td\u003e\n    \u003ctd\u003eVehicle\u003c/td\u003e\n    \u003ctd\u003eMean\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.961\u003c/td\u003e\n    \u003ctd\u003e0.960\u003c/td\u003e\n    \u003ctd\u003e0.971\u003c/td\u003e\n    \u003ctd\u003e0.985\u003c/td\u003e\n    \u003ctd\u003e0.969\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAR\u003c/td\u003e\n    \u003ctd\u003e0.979\u003c/td\u003e\n    \u003ctd\u003e0.984\u003c/td\u003e\n    \u003ctd\u003e0.989\u003c/td\u003e\n    \u003ctd\u003e0.985\u003c/td\u003e\n    \u003ctd\u003e0.984\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n  \t\u003ctd colspan=\"6\"\u003e\n  \t\t\u003ca href=\"https://precognition.team/shares/diva_obj_detect_models/models/obj_v5_model.tgz\"\u003eObject v5\u003c/a\u003e\n  \t: Trained on v1-train \u0026 v1-val, Dilated CNN, Class-agnostic\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEval on v1-val\u003c/td\u003e\n    \u003ctd\u003ePerson\u003c/td\u003e\n    \u003ctd\u003eProp\u003c/td\u003e\n    \u003ctd\u003ePush_Pulled_Object\u003c/td\u003e\n    \u003ctd\u003eVehicle\u003c/td\u003e\n    \u003ctd\u003eMean\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.969\u003c/td\u003e\n    \u003ctd\u003e0.981\u003c/td\u003e\n    \u003ctd\u003e0.985\u003c/td\u003e\n    \u003ctd\u003e0.988\u003c/td\u003e\n    \u003ctd\u003e0.981\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAR\u003c/td\u003e\n    \u003ctd\u003e0.983\u003c/td\u003e\n    \u003ctd\u003e0.994\u003c/td\u003e\n    \u003ctd\u003e0.995\u003c/td\u003e\n    \u003ctd\u003e0.989\u003c/td\u003e\n    \u003ctd\u003e0.990\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003ctable\u003e\n  
\u003ctr\u003e\n  \t\u003ctd colspan=\"6\"\u003e\n  \t\t\u003c!--\u003ca href=\"https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/obj_v6_model.tgz\"\u003eObject v6\u003c/a\u003e--\u003e\n      Object v6\n  \t: Trained on v1-train \u0026 v1-val, Squeeze-Excitation CNN, Class-agnostic\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEval on v1-val\u003c/td\u003e\n    \u003ctd\u003ePerson\u003c/td\u003e\n    \u003ctd\u003eProp\u003c/td\u003e\n    \u003ctd\u003ePush_Pulled_Object\u003c/td\u003e\n    \u003ctd\u003eVehicle\u003c/td\u003e\n    \u003ctd\u003eMean\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.973\u003c/td\u003e\n    \u003ctd\u003e0.986\u003c/td\u003e\n    \u003ctd\u003e0.990\u003c/td\u003e\n    \u003ctd\u003e0.987\u003c/td\u003e\n    \u003ctd\u003e0.984\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAR\u003c/td\u003e\n    \u003ctd\u003e0.984\u003c/td\u003e\n    \u003ctd\u003e0.994\u003c/td\u003e\n    \u003ctd\u003e0.996\u003c/td\u003e\n    \u003ctd\u003e0.988\u003c/td\u003e\n    \u003ctd\u003e0.990\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"6\"\u003e\n      \u003c!--\u003ca href=\"https://aladdin-eax.inf.cs.cmu.edu/shares/diva_obj_detect_models/models/obj_coco_tfv1.14.pb\"\u003eObject COCO\u003c/a\u003e--\u003e\n      Object COCO\n    : COCO trained Resnet-101 FPN model. 
Better for indoor scenes.\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEval on v1-val\u003c/td\u003e\n    \u003ctd\u003ePerson\u003c/td\u003e\n    \u003ctd\u003eBike\u003c/td\u003e\n    \u003ctd\u003ePush_Pulled_Object\u003c/td\u003e\n    \u003ctd\u003eVehicle\u003c/td\u003e\n    \u003ctd\u003eMean\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.378\u003c/td\u003e\n    \u003ctd\u003e0.398\u003c/td\u003e\n    \u003ctd\u003eN/A\u003c/td\u003e\n    \u003ctd\u003e0.947\u003c/td\u003e\n    \u003ctd\u003eN/A\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAR\u003c/td\u003e\n    \u003ctd\u003e0.585\u003c/td\u003e\n    \u003ctd\u003e0.572\u003c/td\u003e\n    \u003ctd\u003eN/A\u003c/td\u003e\n    \u003ctd\u003e0.965\u003c/td\u003e\n    \u003ctd\u003eN/A\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"6\"\u003e\n      \u003ca href=\"https://precognition.team/shares/diva_obj_detect_models/models/obj_coco_resnet50_partial_tfv1.14_1920x1080_rpn300.pb\"\u003eObject COCO partial\u003c/a\u003e\n    : Same model as above with only Person/Vehicle/Bike classes. Save time on NMS. 
Use it with `--use_partial_classes`\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nActivity Box Experiments:\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"8\"\u003e\n      \u003ca href=\"https://drive.google.com/open?id=1SWvdHJcTDgxgEkX3l0UbhmzU47GYTHYJ\"\u003eBUPT-MCPRL\u003c/a\u003e at the \u003ca href=\"http://activity-net.org/challenges/2019/program.html\"\u003eActivityNet\u003c/a\u003e Workshop, CVPR 2019: 3D Faster-RCNN (Numbers taken from their slides)\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEvaluation\u003c/td\u003e\n    \u003ctd\u003ePerson-Vehicle\u003c/td\u003e\n    \u003ctd\u003ePull\u003c/td\u003e\n    \u003ctd\u003eRiding\u003c/td\u003e\n    \u003ctd\u003eTalking\u003c/td\u003e\n    \u003ctd\u003eTransport_HeavyCarry\u003c/td\u003e\n    \u003ctd\u003eVehicle-Turning\u003c/td\u003e\n    \u003ctd\u003eactivity_carrying\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.232\u003c/td\u003e\n    \u003ctd\u003e0.38\u003c/td\u003e\n    \u003ctd\u003e0.468\u003c/td\u003e\n    \u003ctd\u003e0.258\u003c/td\u003e\n    \u003ctd\u003e0.183\u003c/td\u003e\n    \u003ctd\u003e0.278\u003c/td\u003e\n    \u003ctd\u003e0.235\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"8\"\u003e\n      Our Actbox v1: Trained on v1-train, Dilated CNN, Class-agnostic\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eEval on v1-val\u003c/td\u003e\n    \u003ctd\u003ePerson-Vehicle\u003c/td\u003e\n    \u003ctd\u003ePull\u003c/td\u003e\n    \u003ctd\u003eRiding\u003c/td\u003e\n    \u003ctd\u003eTalking\u003c/td\u003e\n    \u003ctd\u003eTransport_HeavyCarry\u003c/td\u003e\n    \u003ctd\u003eVehicle-Turning\u003c/td\u003e\n    \u003ctd\u003eactivity_carrying\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAP\u003c/td\u003e\n    \u003ctd\u003e0.378\u003c/td\u003e\n    
\u003ctd\u003e0.582\u003c/td\u003e\n    \u003ctd\u003e0.435\u003c/td\u003e\n    \u003ctd\u003e0.497\u003c/td\u003e\n    \u003ctd\u003e0.438\u003c/td\u003e\n    \u003ctd\u003e0.403\u003c/td\u003e\n    \u003ctd\u003e0.425\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eAR\u003c/td\u003e\n    \u003ctd\u003e0.780\u003c/td\u003e\n    \u003ctd\u003e0.973\u003c/td\u003e\n    \u003ctd\u003e0.942\u003c/td\u003e\n    \u003ctd\u003e0.876\u003c/td\u003e\n    \u003ctd\u003e0.901\u003c/td\u003e\n    \u003ctd\u003e0.899\u003c/td\u003e\n    \u003ctd\u003e0.899\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\n## Training \u0026 Testing\nInstructions for training a new object detection model are [here](TRAINING.md).\n\n## Training \u0026 Testing (Activity Box)\nInstructions for training a new frame-level activity detection model are [here](ACTIVITY_BOX.md).\n\n## Speed Optimization\n**TL;DR**:\n- TF v1.10 -\u003e v1.13 (CUDA 9 \u0026 cuDNN v7.1 -\u003e CUDA 10 \u0026 cuDNN v7.4) ~ +9% faster\n- Use frozen graph ~ +30% faster\n- Use TensorRT (FP32/FP16) optimized graph ~ +0% faster\n- Use TensorRT (INT8) optimized graph ?\n\nExperiments are recorded [here](SPEED.md).\n\n## Other things I have tried\nThese are my experiences from working on this [surveillance dataset](https://actev.nist.gov/):\n1. FPN provides a significant improvement over a non-FPN backbone;\n2. Dilated CNN in the backbone also helps, but the effect of the Squeeze-Excitation block is unclear (see model obj_v6);\n3. Deformable CNN in the backbone seems to achieve the same improvement as dilated CNN, but [my implementation](nn.py#L1375) is way too slow.\n4. Cascade RCNN doesn't help (IOU=0.5). I'm using IOU=0.5 in my evaluation since the original annotations are not \"tight\" bounding boxes.\n5. Decoupled RCNN (using a separate Resnet-101 for box classification) slightly improves AP (Person: 0.836 -\u003e 0.837) but takes 7x more time.\n6. 
SoftNMS shows mixed results and adds 5% more computation time to the system (since I used the CPU version), so I don't use it.\n7. Tried [Mix-up](https://arxiv.org/abs/1710.09412) by randomly mixing ground-truth bounding boxes from different frames. It doesn't improve performance.\n8. Focal loss doesn't help.\n9. [Relation Network](https://arxiv.org/abs/1711.11575) does not improve performance, and the model is huge ([my implementation](nn.py#L100)).\n10. ResNeXt does not show significant improvement on this dataset.\n\n## TODO\n+ ~~Use Python Queue and a separate thread for frame extraction~~ (Done!)\n+ ~~Make batch_size \u003e 1 for inferencing~~ (Done!)\n+ Make batch_size \u003e 1 for training\n\n## Acknowledgements\nI made this code by studying the nice example in [Tensorpack](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN). The EfficientDet part is modified from the official [repo](https://github.com/google/automl/tree/master/efficientdet).\n\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunweiliang%2Fobject_detection_tracking","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjunweiliang%2Fobject_detection_tracking","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunweiliang%2Fobject_detection_tracking/lists"}