{"id":13643147,"url":"https://github.com/chumingqian/Model_Compression_For_YOLOV3-V4","last_synced_at":"2025-04-20T21:33:16.539Z","repository":{"id":39159118,"uuid":"318916993","full_name":"chumingqian/Model_Compression_For_YOLOV3-V4","owner":"chumingqian","description":"In this  repository  using the sparse training, group channel pruning and  knowledge distilling for  YOLOV4,","archived":false,"fork":false,"pushed_at":"2023-05-11T06:59:59.000Z","size":9671,"stargazers_count":30,"open_issues_count":1,"forks_count":9,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-09T14:42:10.245Z","etag":null,"topics":["knowledge-distillation","network-pruning","yolov3","yolov4","yolov4-pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chumingqian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-12-06T00:20:11.000Z","updated_at":"2024-09-23T14:40:28.000Z","dependencies_parsed_at":"2024-01-14T12:28:10.632Z","dependency_job_id":null,"html_url":"https://github.com/chumingqian/Model_Compression_For_YOLOV3-V4","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumingqian%2FModel_Compression_For_YOLOV3-V4","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumingqian%2FModel_Compression_For_YOLOV3-V4/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumingqian%2FModel_Compression_For_YOLOV3-V4/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chumingqian%2FModel_Compression_For_YOLOV3-V4/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chumingqian","download_url":"https://codeload.github.com/chumingqian/Model_Compression_For_YOLOV3-V4/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249965566,"owners_count":21352926,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["knowledge-distillation","network-pruning","yolov3","yolov4","yolov4-pytorch"],"created_at":"2024-08-02T01:01:42.603Z","updated_at":"2025-04-20T21:33:12.341Z","avatar_url":"https://github.com/chumingqian.png","language":"Python","funding_links":[],"categories":["Lighter and Deployment Frameworks"],"sub_categories":[],"readme":"# Model_Compression_For_YOLOV4\nThis repository applies dynamic sparse training (a variable sparsity rate s, which can speed up the sparse training process), channel pruning, and knowledge distillation to YOLOV3 and YOLOV4.\n\nIf you like it, give it a star. \n\n\n\n\n# 1. YOLOv3v4 -ModelCompression-Training\n\nThis repository mainly includes three parts.\n\n- Part1.  Common training and sparse training (preparation for channel pruning) on object detection datasets (COCO2017, VOC, Oxford Hand).\n\n- Part2.  General model compression algorithms, including pruning and knowledge distillation.\n\n- Part3.  
A brief introduction to network quantization.\n\nThe YOLOv3 source code is based on the PyTorch implementation [ultralytics/yolov3](https://github.com/ultralytics/yolov3).\n\nFor the YOLOV4 PyTorch version, try https://github.com/Tianxiaomo/pytorch-YOLOv4.\n\n\n# 2.  Datasets and  Environment Requirements\nTo make a COCO or VOC dataset for this project, see [dataset_for_Ultralytics_training](https://github.com/chumingqian/Make_Dataset-for-Ultralytics-yolov3v4).\n\nThe environment requires PyTorch \u003e= 1.1.0; see ./requiremnts.txt, or refer to the ./requirements.txt of [ultralytics/yolov3](https://github.com/ultralytics/yolov3).\n\n---------\n  \n# 3.   Part1. Common training and sparse training\n\nSparse training on object detection datasets prepares the network for channel pruning.\n\n## 3.1 For common training, use the following command:\n\n   `python3 train.py --data ...  --cfg ...  -pt  --weights ...  --img_size ... --batch-size ... --epochs  ... `\n\n   `-pt` means that the pretrained model's weights are used.\n\n## 3.2 For sparse training, use:\n```bash\npython3 train.py --data ... -sr --s 0.001 --prune 0  -pt --weights ... --cfg ... --img_size ...  --batch-size 32  --epochs ...\n```\n\n## 3.3 Parameter explanation:\n\n`-sr`: enables sparse training, `--s`: specifies the sparsity factor, `--prune`: specifies the sparsity type.\n\n`--prune 0`: sparsity for normal pruning and regular pruning.\n\n`--prune 1`: sparsity for shortcut pruning.\n\n`--prune 2`: sparsity for layer pruning.\n\n- See section 4.1 for details on the pruning types.\n\n\n## 3.4 Notice for the sparse training:\n\n- Sparse training is used before pruning because we need to identify the unimportant channels in the network; sparse training lets us select these channels so that they can be pruned.\n    \n- Some networks are trained on only a few classes, such as 1-5 classes. 
In that case there may be little difference between pruning with and without prior sparse training.\n  \n- When there are more than 10 training classes, sparse training plays an important role. In this situation, pruning channels directly without sparse training causes irreparable damage to the network's accuracy, and later fine-tuning or distillation has little effect. In contrast, sparse training followed by pruning may temporarily reduce the network's accuracy, but fine-tuning or distilling the pruned network recovers it.\n   \n   \n```bash\npython3 train.py --data ... -sr --s 0.001 --prune 0  -pt --weights ... --cfg ... --img_size ...  --batch-size 32  --epochs ...\n```\n\n\n\n## 3.5  Original weight distribution and weight distribution after sparse training\n\n\n\u003cdiv align=\"center\"\u003e\n   \u003cimg src=\"./image_in_readme/01_original_distribution.png\"  height=200\u003e\u003cimg src=\"./image_in_readme/02_sparse_training.png\" height=200\u003e\n   \u003cdiv style=\"color:orange; border-bottom: 1px solid #d9d9d9;\n    display: inline-block;\n    color: #999;\n    padding: 2px;\"\n    \u003e Fig1: left: the original weight distribution; right: the weight distribution after sparse training \u003c/div\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\n\u003ccenter\u003e\n    \u003cimg style=\"border-radius: 0.3125em;\n    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);\" \n    src=\"./image_in_readme/03_sparse_training.png\" width = \"65%\" alt=\"\"/\u003e \n    \u003cdiv style=\"color:orange; border-bottom: 1px solid #d9d9d9;\n    display: inline-block;\n    color: #999;\n    padding: 2px;\"\n    \u003e Fig2: The mAP during sparse training on the VOC dataset \u003c/div\u003e\n    \u003cbr\u003e\n\u003c/center\u003e\n\n\n\n  \n\n## 3.6 Testing and detection commands:\n\n`python3 test.py --data ... --cfg ... 
`: Tests mAP@0.5.\n\n`python3 detect.py --data ... --cfg ... --source ...`: Runs detection on a single image or video. The default source is data/samples, and the results are saved under output.\n\n-------\n\n\n#  4. Part2 Model compression algorithm \n\nThis part covers pruning and knowledge distillation.\n\n## 4.1 Channel pruning types \n|\u003ccenter\u003eMethod\u003c/center\u003e |\u003ccenter\u003eAdvantage\u003c/center\u003e|\u003ccenter\u003eDisadvantage\u003c/center\u003e |\n| --- | --- | --- |\n|Normal pruning        |Does not prune shortcut layers. It gives a considerable and stable compression rate and requires no fine-tuning.|The compression rate is not extreme.  |\n|Shortcut pruning      |Very high compression rate.  |Fine-tuning is necessary.  |\n|Slimming              |Uses a shortcut fusion method to improve pruning precision.|Best way for shortcut pruning|\n|Regular pruning       |Designed for hardware deployment: the number of filters after pruning is a multiple of 2, no fine-tuning is needed, and tiny-yolov3 and the Mobilenet series are supported.|Part of the compression ratio is sacrificed for regularization. |\n|Layer pruning         |Uses the ResBlock as the basic unit for pruning, which is conducive to hardware deployment. |It can only prune the backbone. |\n|Layer-channel pruning |Applies channel pruning first and then layer pruning, giving a very high pruning rate. |Accuracy may be affected. |\n\n## 4.2  Pruning the network command:\n\n- For the channel pruning types:\n```bash\n# Use one of: normal_prune.py, regular_prune.py, shortcut_prune.py, slim_prune.py\npython3 normal_prune.py --cfg ... --data ... --weights ... --percent ...\n```\n\n- For layer pruning (which is actually based on channel pruning):\n```bash\npython3 layer_prune.py --cfg ... --data ... --weights ... --shortcut ...\n\npython3 layer_channel_prune.py --cfg ... --data ... --weights ... --shortcut ... 
--percent ...\n```\n\n- Note that increasing the percent value gives more compression, but if the sparsity is insufficient and the percent value is too high, the program will report an error.\n\n\n\n## 4.3 Network  Knowledge  distillation:\n\n\n### 4.3.1 Important Notice \n- For the pruned model, we can fine-tune for 20 to 50 epochs to recover its accuracy.\n- After that, we use the pruned and fine-tuned model as the student network and the original network (before sparse training) as the teacher network for knowledge distillation.\n\n\n- The basic distillation method, [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531), was proposed by Hinton et al. in 2015 and is partially adapted here for detection networks.\n\n### 4.3.2 Knowledge distillation command: add the `--t_cfg`, `--t_weights`, and `--KDstr` options:\n\n```bash\npython train.py --data ... --batch-size ... --weights ... --cfg ... --img-size ... --epochs ... --t_cfg ... --t_weights ...\n```\n \n`--t_cfg`: cfg file of the teacher model   `--t_weights`: weights file of the teacher model    `--KDstr`: KD strategy\n\n\n    `--KDstr 1` The KL loss is computed directly from the outputs of the teacher network and the student network and added to the overall loss.\n    `--KDstr 2` Box loss and class loss are treated separately, and the student does not learn directly from the teacher. L2 distance is calculated for the student and for the teacher, each against the GT. 
When the student's distance is greater than the teacher's, an additional loss between the student and the GT is added.\n    `--KDstr 3` Box loss and class loss are treated separately, and the student learns directly from the teacher.\n    `--KDstr 4` The KD loss is divided into three parts: box loss, class loss, and feature loss.\n    `--KDstr 5` Builds on KDstr 4 by adding a fine-grained mask to the feature.\n\n\nUsually, the original model (or the unpruned but sparse-trained model) is used as the teacher model, and the compressed model is used as the student model for distillation training, to improve the student network's mAP.\n\n-----------\n\n\n\n\n# 5 Part3. A brief introduction to network quantization\n\n## 5.1 Because the model weights are quantized from FP32 to INT8:\n\n- Most personal PCs cannot run the quantized model with the INT8 data type directly.\n\n- Quantization methods also usually depend on a specific hardware platform: for example, Xilinx uses Vitis AI to quantize the model and deploy it on Zynq and Zynq UltraScale+ boards (such as PYNQ-Z2, Ultra96-V2, ZCU104), and Nvidia uses TensorRT to quantize the model and deploy it on Jetson devices (such as Nano, TX1, TX2).\n\n- For reference, see [On Ultra_96_v2](https://github.com/chumingqian/Deploy_Yolov4_On_Ultra96_v2) and [On Jetson Nano](https://github.com/chumingqian/Deploy_Yolov4_On_Jetson_Nano), where we use these tools to quantize our pruned yolov4 network and deploy it on the target hardware.\n\n\n## 5.2 PyTorch 1.8 introduced the \"torch.fx\" module:\n\n- This is a good opportunity for us to research quantization.\n\n- We hope to release another repository that focuses on quantization. God bless us.\n\n\n\n## 5.3 Quantization commands:\n`--quantized 2` Dorefa quantization method\n\n```bash\npython train.py --data ... --batch-size ... --weights ... --cfg ... --img-size ... --epochs ... 
--quantized 2\n```\n\n`--quantized 1` Google quantization method\n\n```bash\npython train.py --data ... --batch-size ... --weights ... --cfg ... --img-size ... --epochs ... --quantized 1\n```\n\n`--BN_Flod` trains with BN folding, `--FPGA` uses power-of-2 quantization for FPGA.\n\n\n\n\n\n# Reference: \n----\n-Papers:\nThe pruning method based on BN layers comes from [Learning Efficient Convolutional Networks through Network Slimming](https://arxiv.org/abs/1708.06519).\n\nPruning without fine-tuning: [Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers](https://arxiv.org/pdf/1802.00124.pdf).\n\nAttention transfer distillation: [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/abs/1612.03928).\n\n-Repositories:\nFor channel pruning based on BN layers for yolov3 and yolov4, we recommend the following repositories:\n\nhttps://github.com/tanluren/yolov3-channel-and-layer-pruning\n\n[coldlarry/YOLOv3-complete-pruning](https://github.com/coldlarry/YOLOv3-complete-pruning)\n \nhttps://github.com/SpursLipu/YOLOv3v4-ModelCompression-MultidatasetTraining-Multibackbone\n\nThanks  for  your contributions.  
\n\n\n\n# Further reading\nHere is our paper [Group channel pruning and spatial attention distilling for object detection](https://link.springer.com/article/10.1007/s10489-022-03293-x)\n\n\n@article{chu2022group,\n  \n  title={Group channel pruning and spatial attention distilling for object detection},\n  \n  author={Chu, Yun and Li, Pu and Bai, Yong and Hu, Zhuhua and Chen, Yongqing and Lu, Jiafeng},\n  \n  journal={Applied Intelligence},\n  \n  pages={1--19},\n  year={2022},\n  \n  publisher={Springer}\n}\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchumingqian%2FModel_Compression_For_YOLOV3-V4","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchumingqian%2FModel_Compression_For_YOLOV3-V4","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchumingqian%2FModel_Compression_For_YOLOV3-V4/lists"}