{"id":22270535,"url":"https://github.com/dk-liang/fidtm","last_synced_at":"2025-07-21T11:35:42.848Z","repository":{"id":45690143,"uuid":"317765919","full_name":"dk-liang/FIDTM","owner":"dk-liang","description":"Focal Inverse Distance Transform Maps for Crowd Localization [IEEE TMM]","archived":false,"fork":false,"pushed_at":"2023-03-30T03:32:30.000Z","size":26282,"stargazers_count":149,"open_issues_count":4,"forks_count":41,"subscribers_count":6,"default_branch":"master","last_synced_at":"2023-11-07T14:24:02.193Z","etag":null,"topics":["crowd","crowd-counting","crowd-localization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dk-liang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-02T06:02:38.000Z","updated_at":"2023-11-03T05:35:44.000Z","dependencies_parsed_at":"2023-01-26T05:01:45.591Z","dependency_job_id":null,"html_url":"https://github.com/dk-liang/FIDTM","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dk-liang%2FFIDTM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dk-liang%2FFIDTM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dk-liang%2FFIDTM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dk-liang%2FFIDTM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dk-liang","download_url":"https://codeload.github.com/dk-liang/FIDTM/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","ki
nd":"github","repositories_count":227914251,"owners_count":17839245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crowd","crowd-counting","crowd-localization"],"created_at":"2024-12-03T12:08:45.845Z","updated_at":"2024-12-03T12:08:46.605Z","avatar_url":"https://github.com/dk-liang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Focal Inverse Distance Transform Map\n[[Project page](https://dk-liang.github.io/FIDTM/)] [[paper](https://arxiv.org/abs/2102.07925)]   \nAn official implementation of \"Focal Inverse Distance Transform Maps for Crowd Localization\" (accepted by IEEE TMM).   
\nWe propose a novel label, the Focal Inverse Distance Transform (FIDT) map, which represents the location of each head.\n\n## News\nWe now provide the predicted-coordinate txt files so that other researchers can fairly evaluate localization performance.\n## Overview\n![avatar](./image/overview.png)\n\n# Visualizations\nComparison with the density map\n![avatar](./image/fidtmap.png)\n\nVisualizations of bounding boxes\n![avatar](./image/bounding_boxes.jpeg)\n\n# Progress\n- [x] Testing Code (2021.3.16)\n- [x] Training baseline code (2021.4.29)\n- [x] Pretrained model\n  - [x] ShanghaiA  (2021.3.16)\n  - [x] ShanghaiB  (2021.3.16)\n  - [x] UCF_QNRF (2021.4.29)\n  - [x] JHU-Crowd++ (2021.4.29)\n  - [x] NWPU-Crowd++ (2021.4.29)\n- [x] Bounding box visualizations (2021.3.24)\n- [x] Video demo (2021.3.29)\n- [x] Predicted coordinates txt file (2021.8.20)\n# Environment\n\n\tpython \u003e=3.6 \n\tpytorch \u003e=1.4\n\topencv-python \u003e=4.0\n\tscipy \u003e=1.4.0\n\th5py \u003e=2.10\n\tpillow \u003e=7.0.0\n\timageio \u003e=1.18\n\tnni \u003e=2.0 (python3 -m pip install --upgrade nni)\n\n# Datasets\n\n- Download the ShanghaiTech dataset from [Baidu-Disk](https://pan.baidu.com/s/15WJ-Mm_B_2lY90uBZbsLwA), password: cjnx; or [Google-Drive](https://drive.google.com/file/d/1CkYppr_IqR1s6wi53l2gKoGqm7LkJ-Lc/view?usp=sharing)\n- Download the UCF-QNRF dataset from [here](https://www.crcv.ucf.edu/data/ucf-qnrf/)\n- Download the JHU-CROWD++ dataset from [here](http://www.crowd-counting.com/)\n- Download the NWPU-CROWD dataset from [Baidu-Disk](https://pan.baidu.com/s/1VhFlS5row-ATReskMn5xTw), password: 3awa; or [Google-Drive](https://drive.google.com/file/d/1drjYZW7hp6bQI39u7ffPYwt4Kno9cLu8/view?usp=sharing)\n\n# Generate FIDT Ground-Truth\n\n```\ncd data\npython fidt_generate_xx.py\n```\n\n“xx” is the dataset name: sh, jhu, qnrf, or nwpu. 
You should change the dataset path.\n\n# Model\n\nDownload the pretrained model from [Baidu-Disk](https://pan.baidu.com/s/1SaPppYrkqdWeHueNlcvUJw), password: gqqm, or [OneDrive](https://1drv.ms/u/s!Ak_WZsh5Fl0lhCneubkIv1mTllAZ?e=0zMHSM)\n\n# Quick test\n\n```\ngit clone https://github.com/dk-liang/FIDTM.git\n```\nDownload the dataset and model  \nGenerate the FIDT map ground truth  \nGenerate the image file list:\n\n```\npython make_npydata.py\n```\n\n**Test example:**\n\n```\npython test.py --dataset ShanghaiA --pre ./model/ShanghaiA/model_best.pth --gpu_id 0\npython test.py --dataset ShanghaiB --pre ./model/ShanghaiB/model_best.pth --gpu_id 1  \npython test.py --dataset UCF_QNRF --pre ./model/UCF_QNRF/model_best.pth --gpu_id 2  \npython test.py --dataset JHU --pre ./model/JHU/model_best.pth --gpu_id 3  \n```\n**If you want to generate bounding boxes,**\n\n```\npython test.py --test_dataset ShanghaiA --pre model_best.pth  --visual True\n(remember to change the dataset path in test.py)  \n```\n**If you want to test a video,**\n\n```\npython video_demo.py --pre model_best.pth  --video_path demo.mp4\n(the output video is saved to ./demo.avi; by default, the video is downscaled by a factor of two for inference. You can change the input size in video_demo.py)\n```\n![avatar](./image/demo.jpeg)\nWatch the video demonstration on [bilibili](https://www.bilibili.com/video/BV17v41187fs?from=search\u0026seid=12553003238808495181) or [YouTube](https://youtu.be/YdH6YpHywM4). 
The original demo video can be downloaded from [Baidu-Disk](https://pan.baidu.com/s/1-PD2no_1VPBV-tEa7uLObA), password: cebh\n\nMore configuration options are provided in config.py\n# Evaluating localization performance\n| ShanghaiTech Part A   | Precision | Recall | F1-measure |\n| :-------------------- | :-------- | :----- | ---------- |\n| σ=4                   | 59.1%     | 58.2%  | 58.6%      |\n| σ=8                   | 78.1%     | 77.0%  | 77.6%      |\n\n| ShanghaiTech Part B   | Precision | Recall | F1-measure |\n| :-------------------- | :-------- | :----- | ---------- |\n| σ=4                   | 64.9%     | 64.5%  | 64.7%      |\n| σ=8                   | 83.9%     | 83.2%  | 83.5%      |\n\n| JHU_Crowd++ \u003cbr /\u003e(test set) | Precision | Recall | F1-measure |\n| :-------------------: | :-------: | :----: | :--------: |\n| σ=4                   | 38.9% | 38.7% | 38.8% |\n| σ=8                   | 62.5% | 62.4% | 62.4% |\n\n| UCF_QNRF | Av.Precision | Av.Recall | Av. F1-measure |\n| :-------------------- | :-------- | :----- | ---------- |\n| σ=1....100                   | 84.49% | 80.10% | 82.23% |\n\n| NWPU-Crowd (val set) | Precision | Recall | F1-measure |\n| :-------------------- | :-------- | :----- | ---------- |\n| σ=σ_l               | 82.2% | 75.9% | 78.9% |\n| σ=σ_s | 76.7% | 70.9% | 73.7% |\n\n\n**Evaluation example:**  \n\nFor ShanghaiTech, JHU-Crowd (test set), and NWPU-Crowd (val set):\n\n```\ncd ./local_eval\npython eval.py ShanghaiA  \npython eval.py ShanghaiB\npython eval.py JHU  \npython eval.py NWPU\n```\nFor the UCF-QNRF dataset:\n```\npython eval_qnrf.py --data_path path/to/UCF-QNRF_ECCV18 \n```\nFor NWPU-Crowd (test set), please submit nwpu_pred_fidt.txt to the [website](https://www.crowdbenchmark.com/nwpucrowdloc.html).\n\nWe also provide the predicted-coordinate txt files in './local_eval/point_files/', which you can use to fairly evaluate other localization metrics.   
\n\n (We hope the community will provide predicted-coordinate files to help other researchers fairly evaluate localization performance.)\n\n**Tips**:  \nThe GT format is:\n\n```\n1 total_count x1 y1 4 8 x2 y2 4 8 ..... \n2 total_count x1 y1 4 8 x2 y2 4 8 .....\n```\nThe predicted format is:\n```\n1 total_count x1 y1 x2 y2.....\n2 total_count x1 y1 x2 y2.....\n```\nThe evaluation code is modified from [NWPU](https://github.com/gjy3035/NWPU-Crowd-Sample-Code-for-Localization).\n\n\n# Training\n\nThe training strategy is very simple. You can replace the density map with the FIDT map in any regressor for training. \n\nIf you want to train based on HRNET (borrowed from the IIM code, [link](https://github.com/taohan10200/IIM/tree/main/model/HR_Net)), please first download the ImageNet pre-trained models from the official [link](https://onedrive.live.com/?authkey=!AKvqI6pBZlifgJk\u0026cid=F7FD0B7F26543CEB\u0026id=F7FD0B7F26543CEB!116\u0026parId=F7FD0B7F26543CEB!105\u0026action=locate), and replace the pre-trained model path in HRNET/congfig.py (__C.PRE_HR_WEIGHTS). \n\nHere, we provide the training baseline code:\n\n**Training baseline example:**\n\n```\npython train_baseline.py --dataset ShanghaiA --crop_size 256 --save_path ./save_file/ShanghaiA \npython train_baseline.py --dataset ShanghaiB --crop_size 256 --save_path ./save_file/ShanghaiB  \npython train_baseline.py --dataset UCF_QNRF --crop_size 512 --save_path ./save_file/QNRF\npython train_baseline.py --dataset JHU --crop_size 512 --save_path ./save_file/JHU\n```\nFor ShanghaiTech, you can train on a single GPU with 8 GB of memory. For the other datasets, please use a single GPU with 24 GB of memory or multiple GPUs. \n\n**Improvements**\nWe have not studied the effect of some hyper-parameters. Thus, the results may be further improved by tuning the learning rate, batch size, crop size, and data augmentation. 
\n\n# Reference\nIf you find this project useful for your research, please cite:\n```\n@article{liang2022focal,\n  title={Focal inverse distance transform maps for crowd localization},\n  author={Liang, Dingkang and Xu, Wei and Zhu, Yingying and Zhou, Yu},\n  journal={IEEE Transactions on Multimedia},\n  year={2022},\n  publisher={IEEE}\n}\n```\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdk-liang%2Ffidtm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdk-liang%2Ffidtm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdk-liang%2Ffidtm/lists"}