{"id":23224767,"url":"https://github.com/bryanbocao/vitag","last_synced_at":"2025-04-05T17:22:38.874Z","repository":{"id":51823033,"uuid":"520645299","full_name":"bryanbocao/vitag","owner":"bryanbocao","description":"Repository of the paper ViTag in SECON 2022 and demo (Best Demo Award).","archived":false,"fork":false,"pushed_at":"2024-03-07T18:09:26.000Z","size":411,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-11T14:25:41.759Z","etag":null,"topics":["deep-learning","multimodal","multimodal-association","multimodal-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bryanbocao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-08-02T20:47:46.000Z","updated_at":"2023-07-31T20:00:59.000Z","dependencies_parsed_at":"2023-01-19T02:30:17.572Z","dependency_job_id":"51bbd9d9-a765-462f-926e-cbcda3caa8b3","html_url":"https://github.com/bryanbocao/vitag","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryanbocao%2Fvitag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryanbocao%2Fvitag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryanbocao%2Fvitag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bryanbocao%2Fvitag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bryanbocao","d
ownload_url":"https://codeload.github.com/bryanbocao/vitag/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247370873,"owners_count":20928102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","multimodal","multimodal-association","multimodal-learning"],"created_at":"2024-12-18T23:44:17.584Z","updated_at":"2025-04-05T17:22:38.811Z","avatar_url":"https://github.com/bryanbocao.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ViTag\n\nRepository of our paper accepted in [SECON 2022](https://secon2022.ieee-secon.org/program/):\n\n**Bryan Bo Cao**, Abrar Alali, Hansi Liu, Nicholas Meegan, Marco Gruteser, Kristin Dana, Ashwin Ashok, Shubham Jain, **ViTag: Online WiFi Fine Time Measurements Aided Vision-Motion Identity Association in Multi-person Environments**, 2022 19th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON).\n\n**Bryan Bo Cao**, Abrar Alali, Hansi Liu, Nicholas Meegan, Marco Gruteser, Kristin Dana, Ashwin Ashok, Shubham Jain, **Demo: Tagging Vision with Smartphone Identities by Vision2Phone Translation**, 2022 19th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON).\nReceived **[Best Demonstration Award](https://secon2022.ieee-secon.org/program/)**\n\n# Vi-Fi Dataset\n\n**New** 01/16/2024: We released the synchronized version (**RAN4model_dfv4p4**) of our data for future usage. This version lets you start your research without preprocessing the raw data again. 
Check out the details in the [DATA.md](https://github.com/bryanbocao/vitag/blob/main/DATA.md) file.\n\n[Official Dataset (Raw Data) link](https://sites.google.com/winlab.rutgers.edu/vi-fidataset/home)\n\n[paperswithcode link](https://paperswithcode.com/dataset/vi-fi-multi-modal-dataset)\n\n## Abstract\nWe demonstrate our system _ViTag_ to associate user identities across multimodal data from cameras and smartphones. _ViTag_ associates a sequence of vision tracker generated bounding boxes with Inertial Measurement Unit (IMU) data and Wi-Fi Fine Time Measurements (FTM) from smartphones. Our system first performs cross-modal translation using a multimodal LSTM encoder-decoder network (_X-Translator_) that translates one modality to another, e.g. reconstructing IMU and FTM readings purely from camera bounding boxes. Next, an association module finds identity matches between camera and phone domains, where the translated modality is then matched with the observed data from the same modality. Our system performs in real-world indoor and outdoor environments and achieves an average Identity Precision Accuracy (IDP) of 88.39% on a 1 to 3 seconds window. Further study on modalities within the phone domain shows the FTM can improve association performance by 12.56% on average.\n\n## Motivation\nAssociating visually detected subjects with corresponding phone identifiers using multimodal data.\n\u003cimg src=\"https://user-images.githubusercontent.com/14010288/182496142-ad216041-b6b4-427c-bc37-d9e40fb5b56b.jpg\" width=\"420\"\u003e\n\n## System Overview\nThe system first translates data from camera domain to phone domain using the proposed model _X-Translator_, then it finds the matching between the reconstructed and observed phone data. Vision tracklets (_T\u003csub\u003ec\u003c/sub\u003e_) are fed into _X-Translator_ to reconstruct the corresponding phone tracklets for IMU (_T\u003csub\u003ei\u003c/sub\u003e'_) and FTM (_T\u003csub\u003ef\u003c/sub\u003e'_). 
We demonstrate _ViTag_’s association capacity by visualizing the estimated Vision-Phone correspondences in the video stream.\n![system_overview](https://user-images.githubusercontent.com/14010288/182496181-bef70770-8aea-422b-845b-bfba84293240.jpg)\n\n## X-Translator\n_X-Translator_ architecture: A bidirectional LSTM based encoder-decoder model. Encoders are used to learn unimodal representations from vision tracklets (_T\u003csub\u003ec\u003c/sub\u003e_) and IMU data (_T\u003csub\u003ei\u003c/sub\u003e_). A joint representation is then learned for the two modalities, implemented by an element-wise summation layer. In the final layer, Decoders translate one modality to another.\n![X-Translator](https://user-images.githubusercontent.com/14010288/182496123-443ffd2c-278a-4900-8419-613da327bce2.jpg)\n\n## Result\n| Method | PDR+PA [1], [2], [4] | Vi-Fi [3] | ViTag |\n| ------------- | ------------- |  ------------- |  ------------- |\n| Avg. IDP | 38.41% | 82.93% | **88.39%** |\n\u003cimg src=\"https://user-images.githubusercontent.com/14010288/182496241-0fe0adda-7897-46e4-815a-2b9da495a434.png\" width=\"500\"\u003e\n\n## References\n[1] J. C. Gower. Generalized procrustes analysis. Psychometrika, 40(1):33–51, 1975. \u003cbr/\u003e\n[2] W. Krzanowski. Principles of multivariate analysis, volume 23. OUP Oxford, 2000. \u003cbr/\u003e\n[3] H. Liu, A. Alali, M. Ibrahim, B. B. Cao, N. Meegan, H. Li, M. Gruteser, S. Jain, K. Dana, A. Ashok, et al. Vi-fi: Associating moving subjects across vision and wireless sensors. \u003cbr/\u003e\n[4] B. Wang, X. Liu, B. Yu, R. Jia, and X. Gan. Pedestrian dead reckoning based on motion mode recognition using a smartphone. 
Sensors, 18(6):1811, 2018.\n\n---\n\n# Code Instructions\n## Environments\n#### Ubuntu\n```\n18.04.1 LTS\n```\n#### NVIDIA\nNVIDIA-SMI \u0026 Driver Version ```460.32.03```\n\nCUDA Version ```11.2```\n\n#### TensorFlow\n```\ntensorflow              2.3.0\ntensorflow-estimator    2.3.0\ntensorflow-gpu          2.3.0\n```\n#### Keras\n```\nKeras                   2.4.3\nKeras-Applications      1.0.8\nKeras-Preprocessing     1.1.2\n```\n\n## Conda Environment\n```\nconda create --name via --file via.txt\nconda activate via\n```\n\nThis repository contains two main steps: (1) [Data Conversion](https://github.com/bryanbocao/vitag/tree/main/src/data_converters) and (2) [Model](https://github.com/bryanbocao/vitag/tree/main/src/model). **(1) Data Conversion** preprocesses and synchronizes the raw data to prepare a synchronized dataset for use by **(2) Model**. We provide the processed (RAN4model_dfv3) datasets so that you can skip the first part and run the model scripts of the second part directly.\n\n## Dataset for Model - dfv3\nDownload ```RAN4model``` [[Google Drive](https://drive.google.com/file/d/119aXAow_svc8NuIGbsXDoZgDTa_kyVzT/view?usp=sharing)] [[OneDrive](https://1drv.ms/u/s!AqkVlEZgdjnYi1NHD5LPJeSQ3XiO?e=vcx43N)] and follow the folder structure:\n```\nvitag\n  |-Data\n     |-checkpoints\n     |-datasets\n        |-RAN4model\n  |-src\n     |-...\n```\n\n## Training\n__Indoor-scene0__ ```src/model/BBX5in_IMU19_FTM2/indoor```\n```\ncd src/model/BBX5in_IMU19_FTM2/indoor\n```\nTrain from scratch\n```\npython3 train.py -l mse -tsid_idx 0\n```\nResume training from last checkpoint\n```\npython3 train.py -l mse -tsid_idx 0 -rt\n```\n\n__Outdoor-scene1__ ```src/model/BBX5in_IMU19_FTM2/outdoor```\nTrain from scratch\n```\npython3 train_rand_ss_scene1.py -l mse -tt rand_ss -tsid_idx 0\n```\nResume training from last checkpoint\n```\npython3 train_rand_ss_scene1.py -l mse -tt rand_ss -tsid_idx 0 -rt\n```\nwhere \n```\n-l: training loss function of mse\n-tsid_idx: 
training sequences by testing sequence id index (0-indexed). For instance, \n  when -tsid_idx 2, the sequence id with index 2 will be used for testing \n  while the remaining 14 sequences in the lab are used for training.\n```\n\n## Testing for Association\n\nNote that the Euclidean distance function (eucl) is used for all modalities by default.\n\n### ViTag Models\nDownload [checkpoints](https://drive.google.com/drive/folders/1ETKgtK0Vs0Y8zCcEIfw8-2EyMOgHPIyL?usp=sharing) and follow the folder structure:\n```\nvitag\n  |-Data\n     |-checkpoints\n       |-X22_indoor_BBX5in_IMU19_FTM2_test_idx_0\n       |-...\n       |-X22_outdoor_BBX5in_IMU19_FTM2_rand_ss_scene1_seq_0\n       |-...\n     |-datasets\n  |-src\n     |-...\n```\nPlease add ```-bw``` to load these best weights to reproduce results.\n\n__indoor__ ```src/model/BBX5in_IMU19_FTM2/indoor```\n```\npython3 eval_w_vis_save_pred.py -fps 10 -k 10 -wd_ls 1 1 5 0 -bw -tsid_idx 0\n```\n__outdoor__ ```src/model/BBX5in_IMU19_FTM2/outdoor/eval_w_trained_scene1```\n\nrand_ss \u0026 eucl (by default)\n```\npython3 eval_w_vis_save_pred.py -fps 10 -k 10 -tt rand_ss -wd_ls 1 1 1 0 -bw -tsid_idx 0\n```\nrand_ss \u0026 Bhattacharyya distance for FTM\n```\npython3 eval_w_vis_save_pred.py -fps 10 -k 10 -tt rand_ss -wd_ls 1 1 1 0 -bw -f_d b -tsid_idx 0\n```\ncrowded \u0026 eucl (by default)\n```\npython3 eval_w_vis_save_pred.py -fps 10 -k 10 -tt crowded -wd_ls 1 1 1 0 -bw -tsid_idx 0\n```\ncrowded \u0026 Bhattacharyya distance for FTM\n```\npython3 eval_w_vis_save_pred.py -fps 10 -k 10 -tt crowded -wd_ls 1 1 1 0 -bw -f_d b -tsid_idx 0\n```\nwhere \n```\n-fps: frame rate\n-k: window length (# of frames in a window)\n-wd_ls: list of weights of distances for different modalities in this order -- \n  0: BBX5, 1: IMU, 2: FTM, 3: D_FTM\n-bw: load best weights; results can be reproduced with the provided weights\n-tsid_idx: testing sequence index\n-tt: test type of rand_ss or crowded.\n-f_d: distance function for FTM, eucl (by default), b - bhattacharyya 
distance\n```\n### Result Interpretation\n```\n20201223_142404_f_d_eucl_12_28_21_12_08_49_w_ls_1_1_5_0_gd_cumu_Cam_IDP_0.4505_gd_cumu_Phone_IDP_0.6640_hg_cumu_Cam_IDP_0.6915_hg_cumu_Phone_IDP_0.9426\n```\nwhere\n```\nf_d_eucl: Euclidean distance is used for FTM modality.\nw_ls: list of weights of distances for different modalities in this order -- \n  0: BBX5, 1: IMU, 2: FTM, 3: D_FTM\ngd: Greedy-Matching\nhg: Hungarian\ncumu_Cam_IDP: Cumulative IDP in Camera domain\ncumu_Phone_IDP: Cumulative IDP in Phone domain\n```\nWe report ```hg_cumu_Phone_IDP``` in the paper. In the above example, we achieved an IDP of ```0.9426``` on test sequence index ```3``` in the ```indoor``` dataset.\n\n\n### Baseline - Pedestrian Dead Reckoning + Procrustes Analysis (PDR + PA)\n__indoor__ ```src/model/BBX5in_IMU19_FTM2/indoor```\n```\npython3 eval_prct.py -fps 10 -k 10 -tsid_idx 0\n```\nTest with noise level ```-nl 0.1```\n```\npython3 eval_prct.py -fps 10 -k 10 -tsid_idx 0 -nl 0.1\n```\n__outdoor__ ```src/model/BBX5in_IMU19_FTM2/outdoor/eval_w_trained_scene1```\n\nrand_ss\n```\npython3 eval_prct.py -fps 10 -tt rand_ss -k 10 -tsid_idx 2\n```\ncrowded\n```\npython3 eval_prct.py -fps 10 -tt crowded -k 10 -tsid_idx 2\n```\nTest with noise level ```-nl 0.1```\n```\npython3 eval_prct.py -fps 10 -tt rand_ss -k 10 -tsid_idx 0 -nl 0.1\n```\n\n---\n\n# Citation\nViTag BibTeX:\n```\n@inproceedings{cao2022vitag,\n  title={ViTag: Online WiFi Fine Time Measurements Aided Vision-Motion Identity Association in Multi-person Environments},\n  author={Cao, Bryan Bo and Alali, Abrar and Liu, Hansi and Meegan, Nicholas and Gruteser, Marco and Dana, Kristin and Ashok, Ashwin and Jain, Shubham},\n  booktitle={2022 19th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)},\n  pages={19--27},\n  year={2022},\n  organization={IEEE}\n}\n```\n\nVi-Fi (dataset) BibTeX:\n```\n@inproceedings{liu2022vi,\n  title={Vi-Fi: Associating Moving Subjects across Vision and Wireless Sensors},\n  
author={Liu, Hansi and Alali, Abrar and Ibrahim, Mohamed and Cao, Bryan Bo and Meegan, Nicholas and Li, Hongyu and Gruteser, Marco and Jain, Shubham and Dana, Kristin and Ashok, Ashwin and others},\n  booktitle={2022 21st ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN)},\n  pages={208--219},\n  year={2022},\n  organization={IEEE}\n}\n```\n```\n@misc{vifisite,\n  author        = \"Hansi Liu\",\n  title         = \"Vi-Fi Dataset\",\n  month         = \"Dec. 05,\", \n  year          = \"2022 [Online]\",\n  url           = \"https://sites.google.com/winlab.rutgers.edu/vi-fidataset/home\"\n}\n```\n\n[Reality-Aware Networks Project Website](https://ashwinashok.github.io/realityawarenetworks/)\n\n# Acknowledgement\nThis research has been supported by the National Science Foundation (NSF) under Grant Nos. CNS-2055520, CNS-1901355, CNS-1901133. \nWe thank Rashed Rahman, Shardul Avinash, Abbaas Alif, Bhagirath Tallapragada and Kausik Amancherla for their help with data labeling.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbryanbocao%2Fvitag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbryanbocao%2Fvitag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbryanbocao%2Fvitag/lists"}