{"id":18972502,"url":"https://github.com/junweiliang/aicity_action","last_synced_at":"2025-04-19T16:13:34.419Z","repository":{"id":45767150,"uuid":"482679984","full_name":"JunweiLiang/aicity_action","owner":"JunweiLiang","description":"Code and model for the AI City Challenge (CVPR 2022) Track 3 Action Detection (Naturalistic Driving Action Recognition)","archived":false,"fork":false,"pushed_at":"2023-07-22T04:16:52.000Z","size":1463,"stargazers_count":28,"open_issues_count":0,"forks_count":7,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T09:51:18.636Z","etag":null,"topics":["action-recognition","ai","ai-city-challenge","computer-vision"],"latest_commit_sha":null,"homepage":"https://www.aicitychallenge.org/2022-challenge-winners/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JunweiLiang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-18T01:44:54.000Z","updated_at":"2024-11-23T08:24:18.000Z","dependencies_parsed_at":"2022-09-06T00:20:55.028Z","dependency_job_id":null,"html_url":"https://github.com/JunweiLiang/aicity_action","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2Faicity_action","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2Faicity_action/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2Faicity_action/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JunweiLiang%2Faicity_action/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JunweiLiang","download_url":"https://codeload.github.com/JunweiLiang/aicity_action/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249223935,"owners_count":21232833,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["action-recognition","ai","ai-city-challenge","computer-vision"],"created_at":"2024-11-08T15:08:58.454Z","updated_at":"2025-04-16T09:33:07.711Z","avatar_url":"https://github.com/JunweiLiang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Modified PySlowFast (with MViT v2) for AI City Challenge\n\n## Introduction\n\n  The AI City Challenge's [Naturalistic Driving Action Recognition](https://www.aicitychallenge.org/2022-challenge-tracks/) intends to temporally localize driver actions given multi-view video streams. \n  Our system, the Stargazer system, achieves **second-place** performance on the [public leaderboard](https://arxiv.org/pdf/2204.10380.pdf) and third-place in the [final test](https://www.aicitychallenge.org/2022-challenge-winners/). \n  Our system is based on the [improved multi-scale vision transformers](https://arxiv.org/abs/2112.01526) and large-scale pretraining on the Kinetics-700 dataset. 
Our CVPR workshop paper detailing the designs is [here](https://openaccess.thecvf.com/content/CVPR2022W/AICity/papers/Liang_Stargazer_A_Transformer-Based_Driver_Action_Detection_System_for_Intelligent_Transportation_CVPRW_2022_paper.pdf).\n\n  \u003cdiv align=\"center\"\u003e\n    \u003cdiv style=\"\"\u003e\n        \u003cimg src=\"figures/aicity_figure1.png\" height=\"300px\" /\u003e\n        \u003cimg src=\"figures/leaderboard_042022.png\" height=\"300px\" /\u003e\n    \u003c/div\u003e\n  \u003c/div\u003e\n\n## Citations\nIf you find this code useful in your research, please cite:\n\n```\n@inproceedings{liang2022stargazer,\n  title={Stargazer: A transformer-based driver action detection system for intelligent transportation},\n  author={Liang, Junwei and Zhu, He and Zhang, Enwei and Zhang, Jun},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={3160--3167},\n  year={2022}\n}\n```\n\n## Requirements\n  + ffmpeg \u003e= 3.4 for cutting the videos into clips for training.\n  + python 3.8, tqdm, decord, opencv, pyav, pytorch\u003e=1.9.0, fairscale\n\n## Data Preparation\n  1. Put all videos into a single directory, `data/A1_A2_videos/`. There should be 60 \".MP4\" files under this directory.\n  2. Download our processed annotations from [here](https://drive.google.com/file/d/1-Xj0HsYJqsA_mdrTBUijp4GHSr8Zrin6/view?usp=sharing). This file simply re-formats the original annotations. Put it under `data/annotations/`.\n\n  3. 
Generate files for training on A1 videos.\n\n     + Get the processed annotations and the video-cutting commands\n\n       ```\n       $ python scripts/aicity_convert_anno.py data/annotations/annotation_A1.edited.csv \\\n       data/A1_A2_videos/ data/annotations/processed_anno_original.csv \\\n       A1_cut.sh data/A1_clips/ --resolution=-2:540\n       ```\n       The `processed_anno_original.csv` should have 1115 lines.\n\n     + Cut the videos (you can also run the script directly with bash).\n\n       ```\n       $ mkdir data/A1_clips\n       $ parallel -j 4 \u003c A1_cut.sh\n       ```\n\n     + Make annotation splits (without empty segments, see paper for details)\n\n       ```\n       $ python scripts/aicity_split_anno.py data/annotations/processed_anno_original.csv \\\n       data/annotations/pyslowfast_anno_na0 --method 1\n       ```\n\n     + Make annotation splits (with empty segments)\n\n       ```\n       $ python scripts/aicity_split_anno.py data/annotations/processed_anno_original.csv \\\n       data/annotations/pyslowfast_anno_naempty0 --method 2\n       ```\n\n     + Make annotation files for training on the whole A1 set\n\n       ```\n       $ mkdir data/annotations/pyslowfast_anno_na0/full\n       $ cat data/annotations/pyslowfast_anno_na0/splits_1/train.csv \\\n       data/annotations/pyslowfast_anno_na0/splits_1/val.csv \\\n       \u003e data/annotations/pyslowfast_anno_na0/full/train.csv\n       $ cp data/annotations/pyslowfast_anno_na0/splits_1/val.csv \\\n       data/annotations/pyslowfast_anno_na0/full/\n       ```\n\n     + Download the pre-trained K700 checkpoint from [here](https://drive.google.com/file/d/1wn1392Kn6CFxcSH6lJpqZky9-PJxqTlY/view?usp=sharing). Put the `k700_train_mvitV2_full_16x4_fromscratch_e200_448.pyth` under `models/`. 
This model achieves 71.91% top-1 accuracy on the Kinetics-700 validation set.\n\n## Training\n  Train using the 16x4, 448 crop K700 pretrained model on A1 videos for 200 epochs, as in the paper.\n  Here we test it on a machine with 3 GPUs (11 GB memory per GPU). The code base supports multi-machine training as well.\n\n  First we need to add the code file path (root path) to PYTHONPATH:\n\n  ```\n    $ export PYTHONPATH=$PWD/:$PYTHONPATH;\n  ```\n\n  Remove `Dashboard_User_id_24026_NoAudio_3.24026.533.535.MP4` from `data/annotations/pyslowfast_anno_na0/full/train.csv`.\n\n  Train:\n\n  ```\n    $ mkdir -p exps/aicity_train\n    $ cd exps/aicity_train\n    $ python ../../tools/run_net.py --cfg ../../configs/MVITV2_FULL_B_16x4_CONV_448.yaml \\\n    TRAIN.CHECKPOINT_FILE_PATH ../../models/k700_train_mvitV2_full_16x4_fromscratch_e200_448.pyth \\\n    DATA.PATH_PREFIX ../../data/A1_clips \\\n    DATA.PATH_TO_DATA_DIR ../../data/annotations/pyslowfast_anno_na0/full \\\n    TRAIN.ENABLE True TRAIN.BATCH_SIZE 3 NUM_GPUS 3 TEST.BATCH_SIZE 3 TEST.ENABLE False \\\n    DATA_LOADER.NUM_WORKERS 8 SOLVER.BASE_LR 0.000005 SOLVER.WARMUP_START_LR 1e-7 \\\n    SOLVER.WARMUP_EPOCHS 30.0 SOLVER.COSINE_END_LR 1e-7 SOLVER.MAX_EPOCH 200 LOG_PERIOD 1000 \\\n    TRAIN.CHECKPOINT_PERIOD 100 TRAIN.EVAL_PERIOD 200 USE_TQDM True \\\n    DATA.DECODING_BACKEND decord DATA.TRAIN_CROP_SIZE 448 DATA.TEST_CROP_SIZE 448 \\\n    TRAIN.AUTO_RESUME True TRAIN.CHECKPOINT_EPOCH_RESET True \\\n    TRAIN.MIXED_PRECISION False MODEL.ACT_CHECKPOINT True \\\n    TENSORBOARD.ENABLE False TENSORBOARD.LOG_DIR tb_log \\\n    MIXUP.ENABLE False MODEL.LOSS_FUNC cross_entropy \\\n    MODEL.DROPOUT_RATE 0.5 MVIT.DROPPATH_RATE 0.4 \\\n    SOLVER.OPTIMIZING_METHOD adamw\n  ```\n\n  The model that ranks No.2 on the leaderboard was trained on 2x8 A100 GPUs with a global batch size of 64 and a learning rate of 1e-4 (also with gradient checkpointing but without mixed-precision training). 
So for a 3-GPU run, we use a batch size of 3 and a learning rate of 0.000005, following the linear scaling rule (1e-4 * 3 / 64 is roughly 5e-6). However, to reproduce our results, a batch size close to 64 is recommended.\n\n  To run this code on multiple machines with PyTorch DDP, add `--init_method \"tcp://${MAIN_IP}:${PORT}\" --num_shards ${NUM_MACHINE} --shard_id ${INDEX}` to the command. `${MAIN_IP}` is the IP of the root node. `${INDEX}` is the node's index.\n\n## Inference\n  To get the submission file for a test dataset, we need the model, the threshold file, the videos, and the video_ids.csv.\n\n  1. Get the model\n\n     Follow the Training process or download our checkpoint from [here](https://drive.google.com/file/d/12LQ_2iZZyFJcUjJ6zpU1CcHCbYEmoGJs/view?usp=sharing). Put the model under `models/`. This is the model that ranks No.2 on the A2 leaderboard.\n\n  2. Get the thresholds. Put them under `thresholds/`.\n\n     + Best public leaderboard threshold from [here](https://drive.google.com/file/d/1_TqeoV7MEuVp0LzlN99t3Kj5TvG-1Ry5/view?usp=sharing). (Empirically searched)\n\n     + Best general leaderboard threshold from [here](https://drive.google.com/file/d/1xu3heJctorJ5QDyXCL2z81cUb3B3cwoN/view?usp=sharing). (Grid searched)\n\n     + A1 pyslowfast_anno_naempty0/splits_1 trained and empirically searched from [here](https://drive.google.com/file/d/14gBk-mckw3eKKGu-rJtW2crn_z-4f9ug/view?usp=sharing).\n\n  3. 
Run sliding-window classification (single GPU).\n\n     Given a list of video names and the path to the videos, run the model.\n     The 16x4, 448 model with batch_size=1 takes 5 GB of GPU memory to run.\n\n     ```\n      # cd back to the root path\n      $ python scripts/run_action_classification_temporal_inf.py A2_videos.lst data/A1_A2_videos/ \\\n      models/aicity_train_mvitV2_16x4_fromk700_e200_lr0.0001_yeswarmup_nomixup_dp0.5_dpr0.4_adamw_na0_full_448.pyth \\\n      test/16x4_s16_448_full_na0_A2test \\\n      --model_dataset aicity --frame_length 16 --frame_stride 4 --proposal_length 64 \\\n      --proposal_stride 16 --video_fps 30.0  --frame_size 448 \\\n      --pyslowfast_cfg configs/Aicity/MVITV2_FULL_B_16x4_CONV_448.yaml \\\n      --batch_size 1 --num_cpu_workers 4\n     ```\n\n  4. Run post-processing with the given threshold file to get the submission files.\n\n     ```\n     $ python scripts/aicity_inf.py test/16x4_s16_448_full_na0_A2test thresholds/public_leaderboard_thres.txt \\\n     A2_video_ids.csv test/16x4_s16_448_full_na0_A2test.txt --agg_method avg \\\n     --chunk_sort_base_single_vid score --chunk_sort_base_multi_vid length --use_num_chunk 1\n     ```\n\n     The submission file is `test/16x4_s16_448_full_na0_A2test.txt`. This should reach F1=0.3295 on the A2 test set, matching the leaderboard.\n\n## Acknowledgement\n  This code base borrows heavily from the [PySlowFast](https://github.com/facebookresearch/SlowFast) code base.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunweiliang%2Faicity_action","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjunweiliang%2Faicity_action","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunweiliang%2Faicity_action/lists"}