# Robust and efficient post-processing for Video Object Detection (REPP)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/robust-and-efficient-post-processing-for/video-object-detection-on-imagenet-vid)](https://paperswithcode.com/sota/video-object-detection-on-imagenet-vid?p=robust-and-efficient-post-processing-for)

[[Paper](https://arxiv.org/abs/2009.11050)] [[Supplementary video](https://youtu.be/_awoB6NfnL0)]

__REPP__ is a learning-based post-processing method to improve video object detections from any object detector.
REPP links detections across frames by evaluating their similarity. It then refines their classification and location to suppress false positives and recover missed detections.

<p align="center"><img src="./figures/pipeline.png" alt="Post-processing pipeline" width="450"/></p>

REPP improves video detections for both image and video object detectors, and it adds only a light computational overhead.

<p align="center"><img src="./figures/results_table.png" alt="Results" width="1000"/></p>


## Installation

REPP has been tested with Python 3.6.

Its dependencies are listed in the _repp_requirements.txt_ file.

```
pip install -r repp_requirements.txt
```


## Quick usage guide

Video detections must be stored with pickle as tuples `(video_name, {frame_dets})`, as follows:

```
("video_name", {"000001": [ det_1, det_2, ..., det_N ],
                "000002": [ det_1, det_2, ..., det_M ],
                ...})
```

If the stored predictions file contains detections for different videos, they must be saved as a stream of tuples with the above format.

Each detection must have the following format:

```
det_1: {'image_id': image_id,     # Same as used in ILSVRC, if applicable
        'bbox': [ x_min, y_min, width, height ],
        'scores': scores,         # Vector of class confidence scores
        'bbox_center': (x,y) }    # Relative bounding box center
```

_bbox_center_ coordinates are bounded by 0 and 1. They refer to the center of the detection after the image has been padded vertically or horizontally to fit a square shape.
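The format above can be produced with a few small helpers. The following is a minimal, standard-library sketch, not code from this repository. `relative_center`, `make_detection` and `save_predictions` are hypothetical names. `relative_center` assumes that the square padding is split evenly on both sides of the image (letterbox style). Confirm this against _demos/YOLOv3/get_repp_predictions.py_ before relying on it. Multiple videos are written as consecutive `pickle.dump` calls, which gives the stream of tuples described above.

```python
import pickle
from typing import Dict, List, Sequence, Tuple


def relative_center(bbox: Sequence[float], img_w: int, img_h: int) -> Tuple[float, float]:
    """Center of an [x_min, y_min, width, height] box, relative to the image
    after it has been padded to a square.

    ASSUMPTION: padding is split evenly on both sides (letterbox style).
    """
    x_min, y_min, width, height = bbox
    side = max(img_w, img_h)
    # Offsets introduced by the (assumed) centered padding
    pad_x = (side - img_w) / 2
    pad_y = (side - img_h) / 2
    cx = (x_min + width / 2 + pad_x) / side
    cy = (y_min + height / 2 + pad_y) / side
    return cx, cy


def make_detection(image_id, bbox: Sequence[float], scores,
                   img_w: int, img_h: int) -> Dict:
    """Build one detection dict with the keys listed in the Quick usage guide.

    `scores` is stored as given (e.g. a list or a NumPy array of class scores).
    """
    return {
        "image_id": image_id,
        "bbox": list(bbox),
        "scores": scores,
        "bbox_center": relative_center(bbox, img_w, img_h),
    }


def save_predictions(path: str,
                     videos: List[Tuple[str, Dict[str, List[Dict]]]]) -> None:
    """Write one (video_name, {frame_id: [dets]}) tuple per pickle.dump call,
    producing a stream of tuples for multi-video files."""
    with open(path, "wb") as f:
        for video_name, frame_dets in videos:
            pickle.dump((video_name, frame_dets), f)
```

For a 640x480 frame, `relative_center([100, 50, 200, 100], 640, 480)` returns `(0.3125, 0.28125)`, because 80 pixels of padding are assumed above and below the image.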

Check [this code](https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection/blob/master/demos/YOLOv3/get_repp_predictions.py) for more insight into the predictions format.

Post-processed detections can be saved in COCO format, IMDB format, or both (`--store_coco`, `--store_imdb`).


```
python REPP.py --repp_cfg ./REPP_cfg/cfg.json --predictions_file predictions_file.pckl --store_coco --store_imdb
```

For the REPP configuration file, you can use either _fgfa_repp_cfg.json_ or _yolo_repp_cfg.json_. The first works better with high-performing detectors such as SELSA or FGFA. The second works better with lower-quality detectors.

We recommend setting _appearance_matching_ to false in the config file. Appearance matching requires non-trivial training of extra models and is not necessary for the performance boost. If needed, you can tune the following config parameters:

* _min_tubelet_score_ and _min_pred_score_: thresholds used to suppress low-scoring detections. Higher values speed up the post-processing.
* _clf_thr_: threshold used to suppress low-scoring detection links. Lower values lead to more false positives, and higher values lead to fewer detections.
* _recoordinate_std_: higher values lead to more aggressive recoordinating, and lower values lead to smoother recoordinating.

Below you will find instructions to compute predictions for any video with YOLOv3 and apply REPP.


## Demos

To reproduce the results of the paper, you can download the predictions of the different models from the following [link](https://unizares-my.sharepoint.com/:u:/g/personal/asabater_unizar_es/EdtTvM9EklBCsAeLIhwZSvIB4XZQPLTo3h4QU2QFKpb92w?e=dNGY6I). Place them in the project folder, following the structure of the downloaded zip file.
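Before running the demos, you may want to inspect a predictions file. This sketch assumes that the file follows the stream-of-tuples format from the Quick usage guide. Under that assumption, the tuples can be read back with repeated `pickle.load` calls until `EOFError` is raised. The helper names are hypothetical. Unpickling can execute arbitrary code, so only load files from a source you trust.

```python
import pickle
from typing import Dict, Iterator, List, Tuple


def iter_video_predictions(path: str) -> Iterator[Tuple[str, Dict[str, List[dict]]]]:
    """Yield (video_name, {frame_id: [detections]}) tuples from a file holding
    one or more consecutively pickled tuples.

    Only use this on trusted files: pickle.load can run arbitrary code.
    """
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # End of the stream: no more pickled tuples
                return


def summarize(path: str) -> Dict[str, int]:
    """Count the detections of each video, e.g. to sanity-check a downloaded file."""
    counts = {}
    for video_name, frame_dets in iter_video_predictions(path):
        counts[video_name] = sum(len(dets) for dets in frame_dets.values())
    return counts
```

For example, `summarize('./demos/YOLOv3/predictions/base_preds.pckl')` returns a dictionary that maps each video name to its number of detections.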

The ImageNet VID dataset must be downloaded and stored with the following folder structure:

```
/path/to/dataset/ILSVRC2015/
/path/to/dataset/ILSVRC2015/Annotations/DET
/path/to/dataset/ILSVRC2015/Annotations/VID
/path/to/dataset/ILSVRC2015/Data/DET
/path/to/dataset/ILSVRC2015/Data/VID
/path/to/dataset/ILSVRC2015/ImageSets
```

The following commands apply the REPP post-processing. They then evaluate the results by computing the mean Average Precision (mAP) for slow, medium and fast object motion:

```
# YOLO
python REPP.py --repp_cfg ./REPP_cfg/yolo_repp_cfg.json --predictions_file './demos/YOLOv3/predictions/base_preds.pckl' --evaluate --annotations_filename ./data_annotations/annotations_val_ILSVRC.txt  --path_dataset /path/to/dataset/ILSVRC2015/ --store_coco --store_imdb
> {'mAP_total': 0.7506216640807263, 'mAP_slow': 0.825347229618856, 'mAP_medium': 0.742908326433008, 'mAP_fast': 0.5657881762511975}

# FGFA
python REPP.py --repp_cfg ./REPP_cfg/fgfa_repp_cfg.json --predictions_file './demos/Flow-Guided-Feature-Aggregation/predictions/base_preds.pckl' --evaluate --annotations_filename ./data_annotations/annotations_val_ILSVRC.txt --path_dataset /path/to/dataset/ILSVRC2015/ --store_coco --store_imdb
> {'mAP_total': 0.8009014265948871, 'mAP_slow': 0.8741923949671497, 'mAP_medium': 0.7909183123072739, 'mAP_fast': 0.6137783055850773}

# SELSA
python REPP.py --repp_cfg ./REPP_cfg/selsa_repp_cfg.json --predictions_file './demos/Sequence-Level-Semantics-Aggregation/predictions/old_preds.pckl' --evaluate --annotations_filename ./data_annotations/annotations_val_ILSVRC.txt --path_dataset /path/to/dataset/ILSVRC2015/ --store_coco --store_imdb
> {'mAP_total': 0.8421329795837483, 'mAP_slow': 0.8871784038276325, 'mAP_medium': 0.8332090469178383, 'mAP_fast': 0.7109387713303483}
```

Instead of downloading the base predictions, you can also compute them.
To do so, you must __install the proper dependencies__ for each model as specified in the original model repositories ([YOLOv3](https://github.com/AlbertoSabater/Robust-and-efficient-post-processing-for-video-object-detection/tree/master/demos/YOLOv3), [FGFA](https://github.com/guanfuchen/Flow-Guided-Feature-Aggregation), [SELSA](https://github.com/happywu/Sequence-Level-Semantics-Aggregation)). You must also download their weights and config files from the following [link](https://unizares-my.sharepoint.com/:u:/g/personal/asabater_unizar_es/Ecbuh0leCgdPg0Skl0LYoAYBgAURhldr-6Ng5cgSxBGYvA?e=Ufvn3Q). Place them in the project folder, following the structure of the downloaded zip file. Then run the following commands, starting each one from the repository root:

```
# YOLO
cd demos/YOLOv3/
python get_repp_predictions.py --yolo_path ./pretrained_models/ILSVRC/1203_1758_model_8/ --repp_format --add_appearance --from_annotations ../../data_annotations/annotations_val_ILSVRC.txt --dataset_path /path/to/dataset/ILSVRC2015/Data/VID/

# FGFA
cd demos/Flow-Guided-Feature-Aggregation/fgfa_rfcn/
python get_repp_predictions.py --det_path '/path/to/dataset/ILSVRC2015/'

# SELSA
cd demos/Sequence-Level-Semantics-Aggregation/
python experiments/selsa/get_repp_predictions.py --dataset_path '/path/to/dataset/ILSVRC2015/'
```


## REPP applied to custom videos

REPP can also be applied to the predictions of any video, as long as they follow the REPP format described above.
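REPP only depends on the predictions format. A quick sanity check on custom detections can therefore catch format errors before post-processing. The sketch below is a hypothetical helper, and REPP is not assumed to run these checks itself. The checks mirror the documented detection format. `num_classes` is whatever length your detector's score vectors have.

```python
from typing import Dict, List


def check_detection(det: dict, num_classes: int) -> List[str]:
    """Return the problems found in one detection dict (empty list if none).

    Hypothetical helper: the checks mirror the documented format, not REPP's code.
    """
    required = ("image_id", "bbox", "scores", "bbox_center")
    problems = [f"missing key '{key}'" for key in required if key not in det]
    if problems:
        return problems

    bbox = det["bbox"]
    if len(bbox) != 4:
        problems.append("bbox must be [x_min, y_min, width, height]")
    elif bbox[2] <= 0 or bbox[3] <= 0:
        problems.append("bbox width and height must be positive")

    if len(det["scores"]) != num_classes:
        problems.append(
            f"expected {num_classes} class scores, got {len(det['scores'])}")

    center = det["bbox_center"]
    if len(center) != 2 or not all(0.0 <= c <= 1.0 for c in center):
        problems.append("bbox_center must be (x, y) with both values in [0, 1]")
    return problems


def check_video(frame_dets: Dict[str, List[dict]], num_classes: int) -> List[str]:
    """Check all detections of one video; each problem is prefixed with the
    frame id and the detection index."""
    problems = []
    for frame_id, dets in frame_dets.items():
        for i, det in enumerate(dets):
            problems.extend(
                f"{frame_id}[{i}]: {p}" for p in check_detection(det, num_classes))
    return problems
```

An empty list from `check_video` means every detection has the documented keys and value ranges. It does not guarantee that the values are semantically correct, such as the padding convention used for _bbox_center_.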
The following code shows how to compute YOLOv3 predictions for any video and apply the REPP post-processing.

```
# Extract YOLOv3 predictions
cd demos/YOLOv3/
python get_repp_predictions.py --yolo_path ./pretrained_models/ILSVRC/1203_1758_model_8/ --repp_format --add_appearance --from_video ./test_images/video_1.mp4

# Apply REPP
cd ../..
python REPP.py --repp_cfg ./REPP_cfg/yolo_repp_cfg.json --predictions_file './demos/YOLOv3/predictions/preds_repp_app_video_1.pckl' --store_coco
```


## REPP matching model training on ILSVRC

This project includes trained linking models that perform detection matching both with and without appearance descriptors. These models were trained with data from ImageNet VID, but they can also improve detections on other datasets and custom videos. The Logistic Regression models were trained with the following steps, which can be adapted to any custom dataset:

1. Generate annotations for the Logistic Regression training, based on (Anchor, Positive, Negative) triplets:
   ```
   python create_triplet_ilsvrc_annotations.py --path_dataset '/path/to/dataset/ILSVRC2015/'
   ```
2. Generate matching features from the annotations:
   ```
   python clf_dataset_generation.py --path_dataset '/path/to/dataset/ILSVRC2015/' --add_appearance
   ```
3. Train and store the Logistic Regression model:
   ```
   python train_clf_model.py --add_appearance
   ```

These steps include appearance features computed with a pretrained YOLOv3 model. If you are going to use a different dataset or detection model, we recommend omitting the _--add_appearance_ parameter.


## Citation
```
@inproceedings{sabater2020repp,
  title={Robust and efficient post-processing for Video Object Detection},
  author={Alberto Sabater and Luis Montesano and Ana C. Murillo},
  booktitle={International Conference on Intelligent Robots and Systems (IROS)},
  year={2020}
}
```