{"id":32973329,"url":"https://github.com/MichiganCOG/M-PACT","last_synced_at":"2025-11-16T00:00:58.599Z","repository":{"id":100217527,"uuid":"129807977","full_name":"MichiganCOG/M-PACT","owner":"MichiganCOG","description":"A one stop shop for all of your activity recognition needs.","archived":false,"fork":false,"pushed_at":"2019-05-24T16:53:35.000Z","size":479955,"stargazers_count":106,"open_issues_count":2,"forks_count":24,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-04-18T11:17:10.122Z","etag":null,"topics":["action-recognition","activity-recognition","c3d","computer-vision","convnet","deep-learning","i3d","inception","lrcn","lrcn-network","m-pact","mpact","resnet","resnet-50","temporal-segment-networks","tensorflow","tfrecords","tsn","video"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MichiganCOG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-04-16T21:28:10.000Z","updated_at":"2024-01-13T19:53:59.000Z","dependencies_parsed_at":"2023-05-12T22:00:16.367Z","dependency_job_id":null,"html_url":"https://github.com/MichiganCOG/M-PACT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MichiganCOG/M-PACT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MichiganCOG%2FM-PACT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MichiganCOG%2FM-PACT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MichiganCOG%2FM-PACT/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/MichiganCOG%2FM-PACT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MichiganCOG","download_url":"https://codeload.github.com/MichiganCOG/M-PACT/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MichiganCOG%2FM-PACT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284640341,"owners_count":27039411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-15T02:00:06.050Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["action-recognition","activity-recognition","c3d","computer-vision","convnet","deep-learning","i3d","inception","lrcn","lrcn-network","m-pact","mpact","resnet","resnet-50","temporal-segment-networks","tensorflow","tfrecords","tsn","video"],"created_at":"2025-11-13T05:00:41.535Z","updated_at":"2025-11-16T00:00:58.586Z","avatar_url":"https://github.com/MichiganCOG.png","language":"Python","funding_links":[],"categories":["Action Recognition and Video Understanding"],"sub_categories":["Video Representation"],"readme":"# [M-PACT: Michigan Platform for Activity Classification in Tensorflow](https://arxiv.org/abs/1804.05879)\n\nThis python framework provides modular access to common activity recognition models for the use of baseline comparisons between the current state of the art and custom 
models.\n\u003cbr\u003eThis README will walk you through the process of installing dependencies, downloading and formatting datasets, testing the framework, and expanding the framework to train your own models.\n\nThis repository holds the code and models for the paper \u003cbr\u003e\n[**M-PACT: Michigan Platform for Activity Classification in Tensorflow**](https://arxiv.org/abs/1804.05879), [Eric Hofesmann](https://github.com/ehofesmann), [Madan Ravi Ganesh](https://github.com/zeonzir), and [Jason J. Corso](http://web.eecs.umich.edu/~jjcorso/), arXiv, April 2018.\n\n**ATTENTION**: Please cite the arXiv paper introducing this platform when releasing any work that uses this code.\n\u003cbr\u003e Link: https://arxiv.org/abs/1804.05879\n\n\n### Implemented Models' Classification Accuracy:\n\n|  Model Architecture  |      Dataset (Split 1)      |  M-PACT Accuracy (%)  |  Original Authors' Accuracy (%) |  \n|:----------:|:------:| :----:| :----:|\n| I3D | HMDB51 | [68.10](#i3d-training-hmdb51) |  74.80* |\n| C3D | HMDB51 | [51.90](#c3d-training-hmdb51) | 50.30* |\n| TSN | HMDB51 | [51.70](#tsn-training-hmdb51) |  54.40 |\n| ResNet50 + LSTM |   HMDB51   | [43.86](#resnet50-lstm-training-hmdb51) |  43.90  |\n|||||\n| I3D | UCF101 |  [92.55](#i3d-training-ucf101)  |  95.60* |\n| C3D | UCF101 |  [93.66](#c3d-fine-tuning-ucf101)   |  82.30* |\n| TSN | UCF101 |  [85.25](#tsn-training-ucf101)   |  85.50 |\n| ResNet50 + LSTM |   UCF101   |  [80.20](#resnet50-lstm-training-ucf101)  |  84.30 |\n\n(*) Indicates that results are shown across all three splits.\n\n## Table of Contents\n\n* [Introduction and Setup](#introduction-and-setup)\n\t*  [Requirements](#requirements)\n\t*  [Configuring Datasets](#configuring-datasets)\n\t*  [Using the Framework](#using-the-framework)\n\t*  [Framework File Structure](#framework-file-structure)\n\t*  [Examples of Common Uses](#examples-of-common-uses)\n* [Add Custom Components](#add-custom-components)\n\t* [Adding a 
Model](#adding-a-model)\n\t* [Adding a Dataset](#adding-a-dataset)\n* [Results](#expected-results)\n* [Version History](#version-history)\n* [Acknowledgements](#acknowledgements)\n* [Code Acknowledgements](#code-acknowledgements)\n* [References](#references)\n\n## Introduction and Setup\n\n### Common Datasets:\n\n* HMDB51\n* UCF101\n* Kinetics\n* Moments in Time\n\n\n### Requirements\n\n#### Hardware and Software:\n* NVIDIA graphics card\n* Ubuntu 16.04\n* Python 2.7\n* CUDA\n* cuDNN\n* Gflags\n\n#### Python Dependencies (All can be installed using pip):\n* [TensorFlow 1.2.1](https://www.tensorflow.org/install/install_linux)\n* [NumPy](https://askubuntu.com/questions/868599/how-to-install-scipy-and-numpy-on-ubuntu-16-04?utm_medium=organic\u0026utm_source=google_rich_qa\u0026utm_campaign=google_rich_qa)\n* [scikit-learn](http://scikit-learn.org/stable/install.html)\n* [h5py](http://docs.h5py.org/en/2.7.1/build.html)\n* [OpenCV](https://pypi.org/project/opencv-python/) (Only needed to convert datasets to TFRecords; other video reading libraries can be used instead)\n\n### Configuring Datasets\n\nDatasets are not included with this framework. To use it, each dataset must be downloaded and converted to TFRecords format. Converting dataset videos into TFRecords binary files allows for optimized TensorFlow data loading and processing.\n\nInstructions for importing and configuring datasets can be found in the section [Adding a Dataset](#adding-a-dataset).\n\n\n\n### Using the framework\n\nFrom the root directory, training and testing are done through `train.py` and `test.py`.\nThe implemented models can be used once their weights have been acquired.\nDownload the weights and mean files by running the script `sh scripts/shell/download_weights.sh`.\n\nNOTE: The download links may not work for users in China. 
Alternative downloads can be found here: http://academictorrents.com/details/dcea7fa53925b31215bd8437d2f0805253d6b00f\nand https://app.nihaocloud.com/d/fb0c387c9a644f86b257/\n\nEx. Train ResNet50+LSTM on HMDB51 using 4 GPUs\n\n```\npython train.py  --model resnet  --dataset HMDB51  --numGpus 4  --load 0  --size 224  --inputDims 50  --outputDims 51  --seqLength 50  --dropoutRate 0.5  --expName example_1  --numVids 3570  --lr 0.01  --nEpochs 30  --baseDataPath /data  --fName trainlist  --optChoice adam\n```\n\n\nThe parameters for training are:\n\n```\npython  train.py \\\n\n--model             The model architecture to be used (i3d, c3d, tsn, resnet)   **REQUIRED**\n\n--dataset           The dataset to use for training (UCF101, HMDB51)    **REQUIRED**\n\n--size              Size of the input frame into network, sets both height and width (224 for ResNet, I3D, TSN and 112 for C3D) **REQUIRED**\n\n--inputDims         Input dimensions (number of frames to pass into model)  **REQUIRED**\n\n--outputDims        Output dimensions (number of classes in dataset)    **REQUIRED**\n\n--seqLength         Sequence length when output from model (50 for ResNet50, 250 for TSN, 1 for I3D and C3D)    **REQUIRED**\n\n--expName           Experiment name **REQUIRED**\n\n--baseDataPath      The path to where all datasets are stored (Ex. 
For HMDB51, this directory should then contain tfrecords_HMDB51/Split1/trainlist/exampleVidName.tfrecords)   **REQUIRED**\n\n--fName\t\t\t    Which dataset list to use (trainlist, testlist, vallist)    **REQUIRED**\n\n--numGpus           Number of GPUs to train on over a single node (default 1)\n\n--train             1 or 0, whether to set up the model in training or testing format (default 1)\n\n--load              1 or 0, whether to load the current trained checkpoints with the same experiment name or to train from randomly initialized weights\n\n--modelAlpha        Optional resampling factor for constant value resampling, or for initializing other resampling strategies, mainly during training.\n\n--inputAlpha        Resampling factor for constant value resampling of input video, used mainly for testing models.\n\n--dropoutRate       Probability of keeping inputs of the model's dropout layers (default 0.5)\n\n--freeze            Freeze weights during training of any layers within the model that have the option manually set. (default 0)\n\n--numVids           Number of videos to train on within the specified split\n\n--lr                Initial learning rate (Default 0.001)\n\n--wd                Weight decay value for training layers (Default 0.0)\n\n--lossType          String defining loss type associated with chosen model (multiple losses are optionally defined in model)\n\n--nEpochs           Number of epochs to train over (default 1)\n\n--split             Dataset split to use (default 1)\n\n--saveFreq\t\t    Frequency in epochs to save model checkpoints (default 1, i.e. every epoch)\n\n--returnLayer\t    Which model layers are returned by the model's inference during testing. 
('logits' during training)\n\n--optChoice         String indicating the optimizer choice (Default sgd)\n\n--gradClipValue     Value of normalized gradient at which to clip (Default 5.0)\n\n--clipLength        Length of clips to cut video into (default -1 indicates using the entire video as one clip)\n\n--videoOffset       (none or random) indicating where to begin selecting video clips assuming clipOffset is none\n\n--clipOffset        (none or random) indicating if clips are selected sequentially or randomly\n\n--numClips          Number of clips to break video into, -1 indicates breaking the video into the maximum number of clips based on clipLength, clipOverlap, and clipOffset\n\n--clipStride        Number of frames that overlap between clips, 0 indicates no overlap and -1 indicates clips are randomly selected and not sequential\n\n--batchSize         Number of clips to load into the model each step (default 1)\n\n--metricsDir        Name of subdirectory within experiment to store metrics. Unique directory names allow for parallel testing.\n\n--metricsMethod     Which method to use to calculate accuracy metrics. During training, only used to set up the correct file structure. (avg_pooling, last_frame, svm, svm_train or extract_features)\n\n--preprocMethod     Which preprocessing method to use, allows for the use of multiple preprocessing files per model architecture\n\n--randomInit        Randomly initialize model weights, not loading from any files (default False)\n\n--shuffleSeed       Seed integer for random shuffle of files in load_dataset function\n\n--preprocDebugging  Boolean indicating whether to load videos and clips in a queue or to load them directly for debugging. Errors in preprocessing setup will not show up properly otherwise. (Default 0)\n\n--loadedCheckpoint  Specify the step of the saved model checkpoint that will be loaded for testing. 
Defaults to the most recently saved checkpoint.\n\n--gpuList           List of GPU IDs to be used\n\n--lrboundary        List of boundary epochs at which lr will be updated\n\n--lrvalues          List of lr multiplier values, length of list must equal lrboundary\n\n--loadWeights       String which can be used to specify the default weights to load.\n\n--verbose           Boolean switch to display all print statements or not\n```\n\n\nThe parameters for testing are:\n\n```\npython  test.py \\\n\n--model             The model architecture to be used (i3d, c3d, tsn, resnet)   **REQUIRED**\n\n--dataset           The dataset to use (UCF101, HMDB51) **REQUIRED**\n\n--size              Size of the input frame into network, sets both height and width (224 for ResNet, I3D, TSN and 112 for C3D) **REQUIRED**\n\n--inputDims         Input dimensions (number of frames to pass into model)  **REQUIRED**\n\n--outputDims        Output dimensions (number of classes in dataset) **REQUIRED**\n\n--seqLength         Sequence length when output from model (50 for ResNet50, 250 for TSN, 1 for I3D and C3D)    **REQUIRED**\n\n--expName           Experiment name **REQUIRED**\n\n--numVids           Number of videos to test on within the split   **REQUIRED**\n\n--fName\t\t\t    Which dataset list to use (trainlist, testlist, vallist)    **REQUIRED**\n\n--loadedDataset\t    Dataset that the model was trained on. This is to be used when testing a model on a different dataset than it was trained on.   
**REQUIRED**\n\n--train             0 or 1, whether to set up the model in testing or training format (default 0)\n\n--load              1 or 0, whether to load the current trained checkpoints with the same experiment name or to test from the default weights (default 1)\n\n--modelAlpha        Optional resampling factor for constant value resampling, or for initializing other resampling strategies, mainly during training.\n\n--inputAlpha        Resampling factor for constant value resampling of input video, used mainly for testing models.\n\n--dropoutRate       Probability of keeping inputs of the model's dropout layers (default 0.5)\n\n--freeze            Freeze weights during training of any layers within the model that have the option manually set. (default 0)\n\n--split             Dataset split to use (default 1)\n\n--baseDataPath      The path to where all datasets are stored (Ex. For HMDB51, this directory should then contain tfrecords_HMDB51/Split1/testlist/exampleVidName.tfrecords)\n\n--returnLayer\t    String indicating which layer to apply 'metricsMethod' on (default ['logits'])\n\n--gpuList           List of GPU device IDs to be used, must be \u003c= 1 for testing.\n\n--clipLength        Length of clips to cut video into, -1 indicates using the entire video as one clip\n\n--videoOffset       (none or random) indicating where to begin selecting video clips assuming clipOffset is none\n\n--clipOffset        (none or random) indicating if clips are selected sequentially or randomly\n\n--numClips          Number of clips to break video into, -1 indicates breaking the video into the maximum number of clips based on clipLength, clipOverlap, and clipOffset\n\n--clipStride        Number of frames that overlap between clips, 0 indicates no overlap and -1 indicates clips are randomly selected and not sequential\n\n--metricsMethod     Which method to use to calculate accuracy metrics. 
(avg_pooling, last_frame, svm, svm_train or extract_features)\n\n--preprocMethod     Which preprocessing method to use, allows for the use of multiple preprocessing files per model architecture\n\n--batchSize         Number of clips to load into the model each step (default 1)\n\n--metricsDir        Name of subdirectory within experiment to store metrics. Unique directory names allow for parallel testing.\n\n--loadedCheckpoint  Specify the step of the saved model checkpoint that will be loaded for testing. (Defaults to most recent checkpoint)\n\n--randomInit        Randomly initialize model weights, not loading from any files (Default 0)\n\n--avgClips          Boolean indicating whether to average predictions across clips (Default 0)\n\n--useSoftmax        Boolean indicating whether to apply softmax to the model's output (Default 1)\n\n--preprocDebugging  Boolean indicating whether to load videos and clips in a queue or to load them directly for debugging. Errors in preprocessing setup will not show up properly otherwise. (Default 0)\n\n--loadWeights       String which can be used to specify the default weights to load.\n\n--verbose           Boolean switch to display all print statements or not\n```\n\nEx. 
Test C3D on UCF101 split 1\n\n```\npython test.py --model c3d --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 16 --outputDims 101 --seqLength 1 --size 112  --expName example_2 --numClips 1 --clipLength 16 --clipOffset random --numVids 3783 --split 1 --baseDataPath /data --fName testlist\n```\n\n### Framework File Structure\n```\n/tf-activity-recognition-framework\n    train.py\n    test.py\n    create_model.py\n    load_a_video.py\n\n    /models\n        /model_name\n            modelname_model.py\n            default_preprocessing.py\n            model_weights.npy shortcut to ../weights/model_weights.npy (Optional)\n\n        /weights\n            model_weights.npy\n\n    /results\n        /model_name\n            /dataset_name\n                /preprocessing_method\n                    /experiment_name\n                        /checkpoints\n                            checkpoint\n                            checkpoint-100.npy\n                            checkpoint-100.dat\n                        /metrics_method\n                            testing_results.npy\n\n    /logs\n        /model_name\n            /dataset_name\n                /preprocessing_method\n                    /metrics_method\n                        /experiment_name\n                            tensorboard_log\n\n    /scripts\n        /shell\n            download_weights.sh\n\n    /utils\n        generate_tfrecords_dataset.py\n        convert_checkpoint.py\n        checkpoint_utils.py\n        layers_utils.py\n        metrics_utils.py\n        preprocessing_utils.py\n        sys_utils.py\n        logger.py\n```\n`train.py` - Train a model\n\n`test.py` - Test a model\n\n`create_model.py` - Create the model and preprocessing files for your custom model, including the functions that need to be filled in, as described in [Adding a Model](#adding-a-model)\n\n`load_a_video.py` - Load a video using the M-PACT input pipeline to verify that a dataset was converted properly.\n\n\nmodels - Includes the model 
class and video preprocessing required for that model\n\nresults - Saved model weights at specified checkpoints\n\nlogs - TensorBoard logs\n\nscripts - Scripts to set up the platform, e.g. downloading the necessary weights\n\nutils - Python programs containing functions commonly used across other modules in this platform\n\n\n### Examples of Common Uses\n\n#### Testing using existing models\n\n\n#### Training models from scratch\n\n\n## Add Custom Components\n\n### Adding a model\n\n\n##### Step 1: Create Model Directory Structure\n\nRun the Python program `create_model.py`:\n```\npython create_model.py --modelName MyModel\n```\n\n\n##### Step 2: Add Model Functions\n\nNavigate to the model file:\n```\n/models/mymodel/mymodel_model.py\n```\n\nRequired functions to fill in:\n\ninference():\n```\n    def inference(self, inputs, is_training, input_dims, output_dims, seq_length, batch_size, scope, dropout_rate = 0.5, return_layer=['logits'], weight_decay=0.0):\n        \"\"\"\n        Args:\n            :inputs:       Input to model of shape [BatchSize x Frames x Height x Width x Channels]\n            :is_training:  Boolean variable indicating phase (TRAIN OR TEST)\n            :input_dims:   Length of input sequence\n            :output_dims:  Integer indicating total number of classes in final prediction\n            :seq_length:   Length of output sequence from the model\n            :scope:        Scope name for current model instance\n            :dropout_rate: Probability of keeping inputs of dropout layers\n            :return_layer: List of strings matching names of layers in current model\n            :weight_decay: Double value of weight decay\n            :batch_size:   Number of videos or clips to process at a time\n\n        Return:\n            :layers[return_layer]: List of the requested layers' output tensors\n        \"\"\"\n\n        ############################################################################\n        #                       Add 
MODELNAME Network Layers HERE                  #\n        ############################################################################\n\n        if self.verbose:\n            print('Generating MODELNAME network layers')\n\n        # END IF\n\n        with tf.name_scope(scope, 'MODELNAME', [inputs]):\n            layers = {}\n\n            ########################################################################################\n            #        TODO: Add any desired layers from layers_utils to this layers dictionary      #\n            #                                                                                      #\n            #       EX: layers['conv1'] = conv3d_layer(input_tensor=inputs,                        #\n            #           filter_dims=[dim1, dim2, dim3, dim4],                                      #\n            #           name=NAME,                                                                 #\n            #           weight_decay = wd)                                                         #\n            ########################################################################################\n\n\n            ########################################################################################\n            #       TODO: Final Layer must be 'logits'                                             #\n            #                                                                                      #\n            #  EX:  layers['logits'] = [fully_connected_layer(input_tensor=layers['previous'],     #\n            #                                         out_dim=output_dims, non_linear_fn=None,     #\n            #                                         name='out', weight_decay=weight_decay)]      #\n            ########################################################################################\n\n            layers['logits'] = # TODO Every model must return a layer named 'logits'\n\n            layers['logits'] = 
tf.reshape(layers['logits'], [batch_size, seq_length, output_dims])\n\n        # END WITH\n\n        return [layers[x] for x in return_layer]\n```\n\npreprocess_tfrecords():\n```\n    def preprocess_tfrecords(self, input_data_tensor, frames, height, width, channel, input_dims, output_dims, seq_length, size, label, istraining, video_step):\n        \"\"\"\n        Args:\n            :input_data_tensor:     Data loaded from tfrecords containing either video or clips\n            :frames:                Number of frames in loaded video or clip\n            :height:                Pixel height of loaded video or clip\n            :width:                 Pixel width of loaded video or clip\n            :channel:               Number of channels in video or clip, usually 3 (RGB)\n            :input_dims:            Number of frames used in input\n            :output_dims:           Integer number of classes in current dataset\n            :seq_length:            Length of output sequence\n            :size:                  List detailing values of height and width for final frames\n            :label:                 Label for loaded data\n            :istraining:            Boolean value indicating phase (TRAIN OR TEST)\n            :video_step:            TensorFlow variable indicating the total number of videos (not clips) that have been loaded\n        \"\"\"\n\n        #####################################################\n        # TODO: Add more preprocessing arguments if desired #\n        #####################################################\n\n        return preprocess(input_data_tensor, frames, height, width, channel, input_dims, output_dims, seq_length, size, label, istraining, video_step, self.input_alpha)\n\n```\n\nloss():\n```\n    \"\"\" Function to return loss calculated on given network \"\"\"\n    def loss(self, logits, labels, loss_type):\n        \"\"\"\n        Args:\n           :logits:     Unscaled logits returned from final layer in model\n         
  :labels:     True labels corresponding to loaded data\n           :loss_type:  Allow for multiple losses that can be selected at run time. Implemented through if statements\n        \"\"\"\n\n        ####################################################################################\n        #  TODO: ADD CUSTOM LOSS HERE, DEFAULT IS CROSS ENTROPY LOSS                       #\n        #                                                                                  #\n        #   EX: labels = tf.cast(labels, tf.int64)                                         #\n        #       cross_entropy_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, #\n        #                                                            logits=logits)        #\n        #        return cross_entropy_loss                                                 #\n        ####################################################################################\n```\n\n(Optional) load_default_weights():\n```\n    def load_default_weights(self):\n        \"\"\"\n        return: Numpy dictionary containing the names and values of the weight tensors used to initialize this model\n        \"\"\"\n\n        ############################################################################\n        # TODO: Add default model weights to models/weights/ and import them here  #\n        #                          ( OPTIONAL )                                    #\n        #                                                                          #\n        # EX: return np.load('models/weights/model_weights.npy')                   #\n        #                                                                          #\n        ############################################################################\n```\n\n\n\n##### Step 3: Add Model Preprocessing Steps\nNavigate to the preprocessing file:\n```\n/models/mymodel/default_preprocessing.py\n```\n\nRequired functions to fill in:\n\npreprocess():\n```\ndef 
preprocess(input_data_tensor, frames, height, width, channel, input_dims, output_dims, seq_length, size, label, istraining, video_step, input_alpha=1.0):\n    \"\"\"\n    Preprocessing function corresponding to the chosen model\n    Args:\n        :input_data_tensor: Raw input data\n        :frames:            Total number of frames\n        :height:            Height of frame\n        :width:             Width of frame\n        :channel:           Total number of color channels\n        :input_dims:        Number of frames to be provided as input to model\n        :output_dims:       Total number of labels\n        :seq_length:        Number of frames expected as output of model\n        :size:              Output size of preprocessed frames\n        :label:             Label of current sample\n        :istraining:        Boolean indicating training or testing phase\n        :video_step:        TensorFlow variable indicating the total number of videos (not clips) that have been loaded\n        :input_alpha:       Resampling factor for constant value resampling of the input video\n\n    Return:\n        Preprocessed input data tensor\n    \"\"\"\n\n    # Allow for resampling of input during testing for evaluation of the model's stability over video speeds\n    input_data_tensor = tf.cast(input_data_tensor, tf.float32)\n    input_data_tensor = resample_input(input_data_tensor, frames, frames, input_alpha)\n\n    # Apply preprocessing related to individual frames (cropping, flipping, resizing, etc.)\n    input_data_tensor = tf.map_fn(lambda img: preprocess_image(img, size[0], size[1], is_training=istraining, resize_side_min=size[0]), input_data_tensor)\n\n\n    ##########################################################################################################################\n    #                                                                                                                        #\n    # TODO: Add any video related preprocessing (looping, resampling, etc.... 
Options found in utils/preprocessing_utils.py) #\n    #                                                                                                                        #\n    ##########################################################################################################################\n\n\n    return input_data_tensor\n```\n\npreprocess_for_train():\n```\ndef preprocess_for_train(image, output_height, output_width, resize_side):\n    \"\"\"Preprocesses the given image for training.\n    Args:\n    image: A `Tensor` representing an image of arbitrary size.\n    output_height: The height of the image after preprocessing.\n    output_width: The width of the image after preprocessing.\n    resize_side: The smallest side of the image for aspect-preserving resizing.\n    Returns:\n    A preprocessed image.\n    \"\"\"\n\n    ############################################################################\n    #             TODO: Add preprocessing done during training phase           #\n    #         Preprocessing option found in utils/preprocessing_utils.py       #\n    #                                                                          #\n    #  EX:    image = aspect_preserving_resize(image, resize_side)             #\n    #         image = central_crop([image], output_height, output_width)[0]    #\n    #         image.set_shape([output_height, output_width, 3])                #\n    #         image = tf.to_float(image)                                       #\n    #         return image                                                     #\n    ############################################################################\n```\n\npreprocess_for_eval():\n```\ndef preprocess_for_eval(image, output_height, output_width, resize_side):\n    \"\"\"Preprocesses the given image for evaluation.\n    Args:\n    image: A `Tensor` representing an image of arbitrary size.\n    output_height: The height of the image after preprocessing.\n    output_width: The 
width of the image after preprocessing.\n    resize_side: The smallest side of the image for aspect-preserving resizing.\n    Returns:\n    A preprocessed image.\n    \"\"\"\n\n    ############################################################################\n    #            TODO: Add preprocessing done during evaluation phase          #\n    #         Preprocessing option found in utils/preprocessing_utils.py       #\n    #                                                                          #\n    #  EX:    image = aspect_preserving_resize(image, resize_side)             #\n    #         image = central_crop([image], output_height, output_width)[0]    #\n    #         image.set_shape([output_height, output_width, 3])                #\n    #         image = tf.to_float(image)                                       #\n    #         return image                                                     #\n    ############################################################################\n```\n\n\n### Adding a dataset\n\nAdding a new dataset requires that the videos be converted to TFRecords and stored in a specific format. 
A tfrecord is simply a method of storing a video and information about the video in a binary file that is easily imported into TensorFlow graphs.\n\nEach tfrecord contains a dictionary with the following information from the original video:\n\n* Label - Action class the video belongs to (type int64)\n* Data - RGB or optical flow values for the entire video (type bytes)\n* Frames - Total number of frames in the video (type int64)\n* Height - Frame height in pixels (type int64)\n* Width - Frame width in pixels (type int64)\n* Channels - Number of channels (3 for RGB) (type int64)\n* Name - Name of the video (type bytes)\n\n\nWe provide a script that converts a dataset to tfrecords using OpenCV, as long as the dataset is stored in the following file structure:\n```\n/dataset\n    /action_class\n        /video1.avi\n```\n\n\nThe TFRecords for each dataset must also be stored in a specific file structure, for example for HMDB51:\n```\n/tfrecords_HMDB51\n\t/Split1\n\t\t/trainlist\n\t\t\tvidName1.tfrecords\n\t\t\tvidName2.tfrecords\n\t\t/testlist\n\t\t/vallist\n\t/Split2\n\t/Split3\n```\nThe videos must therefore be arranged into this file structure, either before or after they are converted.\nA vallist is not required; only a trainlist and a testlist stored inside the folder 'Split1' are needed.\nEven if only one split is used, its folder must still be named 'Split1'.\n\n\nYou can also convert your dataset to tfrecords manually if needed.\nThe following code snippet is an example of how to convert a single video to a tfrecord, given the video data as a numpy array.\n```\nimport os\n\nimport numpy as np\nimport tensorflow as tf\n\n\ndef save_tfrecords(data, label, vidname, save_dir):\n    # Accept a numpy array or python list of shape (frames, height, width, channels)\n    data = np.asarray(data)\n    filename = os.path.join(save_dir, vidname + '.tfrecords')\n    # tf.python_io.TFRecordWriter is the TensorFlow 1.x API (tf.io.TFRecordWriter in TF 2.x)\n    writer = tf.python_io.TFRecordWriter(filename)\n\n    features = {}\n    features['Label'] = _int64(label)\n    features['Data'] = _bytes(data.tobytes())\n    features['Frames'] = _int64(data.shape[0])\n    features['Height'] = 
_int64(data.shape[1])\n    features['Width'] = _int64(data.shape[2])\n    features['Channels'] = _int64(data.shape[3])\n    # BytesList expects bytes, so encode the name string\n    features['Name'] = _bytes(str(vidname).encode('utf-8'))\n\n    example = tf.train.Example(features=tf.train.Features(feature=features))\n    writer.write(example.SerializeToString())\n    writer.close()\n\n\ndef _int64(value):\n    # Wrap a single integer as an int64 feature\n    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))\n\n\ndef _bytes(value):\n    # Wrap a single byte string as a bytes feature\n    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))\n```\nAs a prerequisite, the video must first be loaded as a numpy array or python list of floats/ints. This can be done in a number of ways, for example with OpenCV, matplotlib, or any other preferred method.\n\n\n## Expected Results\n\n### Accuracies of Models\nYou can verify your installation by comparing your output with the following expected test results for the various models trained on each dataset.\n\n|  Model Architecture  |      Dataset (Split 1)      |  M-PACT Accuracy (%)  |  Original Authors' Accuracy (%) |\n|:----------:|:------:| :----:| :----:|\n| I3D | HMDB51 | [68.10](#i3d-training-hmdb51) |  74.80* |\n| C3D | HMDB51 | [51.90](#c3d-training-hmdb51) | 50.30* |\n| TSN | HMDB51 | [51.70](#tsn-training-hmdb51) |  54.40 |\n| ResNet50 + LSTM |   HMDB51   | [43.86](#resnet50-lstm-training-hmdb51) |  43.90  |\n|||||\n| I3D | UCF101 |  [92.55](#i3d-training-ucf101)  |  95.60* |\n| C3D | UCF101 |  [93.66](#c3d-fine-tuning-ucf101)   |  82.30* |\n| TSN | UCF101 |  [85.25](#tsn-training-ucf101)   |  85.50 |\n| ResNet50 + LSTM |   UCF101   |  [80.20](#resnet50-lstm-training-ucf101)  |  84.30 |\n\n(*) Indicates that results are reported across all three splits.\n\n### Commands to Execute Model Training and Testing\n\n#### ResNet50 + LSTM Training (HMDB51)\n```\npython train.py --model resnet --inputDims 50 --outputDims 51 --dataset HMDB51 --load 0 --fName trainlist --seqLength 50 --size 224 --baseDataPath /z/dat --train 1 --numGpus 4 --expName 
resnet_half_loss_HMDB51 --numVids 3570 --split 1 --wd 0.0 --lr 0.001 --nEpochs 30 --saveFreq 1 --dropoutRate 0.5 --freeze 1 --lossType half_loss\n```\n#### ResNet50 + LSTM Testing (HMDB51)\n```\npython test.py --model resnet --dataset HMDB51 --loadedDataset HMDB51 --train 0 --load 1 --inputDims 50 --outputDims 51 --seqLength 50 --size 224 --expName resnet_half_loss_HMDB51 --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --freeze 1\n```\n#### ResNet50 + LSTM Training (UCF101)\n```\npython train.py --model resnet --inputDims 50 --outputDims 101 --dataset UCF101 --load 0 --fName trainlist --seqLength 50 --size 224 --baseDataPath /z/dat --train 1 --numGpus 4 --expName resnet_half_loss_UCF101 --numVids 9537 --split 1 --wd 0.0 --lr 0.001 --nEpochs 11 --saveFreq 1 --dropoutRate 0.5 --freeze 1 --lossType half_loss\n\n```\n#### ResNet50 + LSTM Testing (UCF101)\n```\npython test.py --model resnet --dataset UCF101 --loadedDataset UCF101 --train 0 --load 1 --inputDims 50 --outputDims 101 --seqLength 50 --size 224 --expName resnet_half_loss_UCF101 --numVids 3783 --split 1 --baseDataPath /z/dat --metricsMethod last_frame --fName testlist --freeze 1\n```\n\n#### I3D Training (HMDB51)\n```\npython train.py --model i3d --inputDims 64 --outputDims 51 --dataset HMDB51 --load 0 --expName i3d_HMDB51 --numVids 3570 --fName trainlist --seqLength 1 --size 224 --numGpus 4 --train 1 --split 1 --wd 0.0 --lr 0.01 --nEpochs 30 --baseDataPath /z/dat --saveFreq 1 --dropoutRate 0.5 --gradClipValue 100.0 --optChoice adam --batchSize 16\n```\n#### I3D Testing (HMDB51)\n```\npython test.py --model i3d --numGpus 1 --dataset HMDB51 --loadedDataset HMDB51 --train 0 --load 1 --inputDims 250 --outputDims 51 --seqLength 1 --size 224  --expName i3d_0_5_crop_0_5_HMDB51 --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1 --loadedCheckpoint 837 --metricsDir checkpoint_837\n```\n* Currently best performing checkpoint - **837**\n\n#### I3D Training (UCF101)\n```\npython 
train.py --model i3d --inputDims 64 --outputDims 101 --dataset UCF101 --load 0 --expName i3d_UCF101 --numVids 9537 --fName trainlist --seqLength 1 --size 224 --numGpus 4 --train 1 --split 1 --wd 0.0 --lr 0.01 --nEpochs 11 --baseDataPath /z/dat --saveFreq 1 --dropoutRate 0.5 --gradClipValue 100.0 --optChoice adam --batchSize 10\n```\n#### I3D Testing (UCF101)\n```\npython test.py --model i3d --numGpus 1 --dataset UCF101 --loadedDataset UCF101 --train 0 --load 1 --inputDims 250 --outputDims 101 --seqLength 1 --size 224 --expName i3d_UCF101 --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1 --loadedCheckpoint 2146 --metricsDir checkpoint_2146\n```\n* Currently best performing checkpoint - **2146**\n\n\n#### C3D Training (HMDB51)\n```\npython train.py --model c3d --numGpus 4 --dataset HMDB51 --load 0 --inputDims 16 --outputDims 51 --seqLength 1 --size 112  --expName c3d_HMDB51 --numClips 5 --clipLength 16 --clipOffset random --numVids 3570 --split 1 --wd 0.0005 --lr 0.0001 --nEpochs 41 --baseDataPath /z/dat --fName trainlist --saveFreq 1 --verbose 1 --batchSize 10\n```\n#### C3D Testing (HMDB51)\n```\npython test.py --model c3d --dataset HMDB51 --loadedDataset HMDB51 --load 1 --inputDims 16 --outputDims 51 --seqLength 1 --size 112  --expName c3d_HMDB51 --numClips 1 --clipLength 16 --clipOffset random --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1\n```\n#### C3D Training (UCF101)\n(NOTE: Results not shown)\n```\npython train.py --model c3d --numGpus 4 --dataset UCF101 --load 0 --inputDims 16 --outputDims 101 --seqLength 1 --size 112  --expName c3d_sports1m_UCF101 --numClips 5 --clipLength 16 --clipOffset random --numVids 9537 --split 1 --wd 0.0005 --lr 0.0001 --nEpochs 10 --baseDataPath /z/dat --fName trainlist --saveFreq 1 --verbose 1 --batchSize 10\n\n```\n#### C3D Testing (UCF101)\n(NOTE: Results not shown)\n```\npython test.py --model c3d --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 16 
--outputDims 101 --seqLength 1 --size 112  --expName c3d_sports1m_UCF101 --numClips 1 --clipLength 16 --clipOffset random --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1\n```\n#### C3D Fine-tuning (UCF101)\n(NOTE: The best result is shown: 93.55% when fine-tuning the model \"C3D UCF101 split1\" from [https://github.com/hx173149/C3D-tensorflow](https://github.com/hx173149/C3D-tensorflow))\n```\npython train.py --model c3d --numGpus 4 --dataset UCF101 --train 1 --load 0 --inputDims 16 --outputDims 101 --seqLength 1 --size 112  --expName c3d_finetune_UCF101 --numClips 5 --clipLength 16 --clipOffset random --numVids 9537 --split 1 --wd 0.0005 --lr 0.0001 --nEpochs 10 --baseDataPath /z/dat --fName trainlist --saveFreq 1 --verbose 1 --batchSize 10 --loadWeights Sports1M_finetune_UCF101\n\n```\n#### C3D Fine-tuned Testing (UCF101)\n(NOTE: The best result is shown: 93.55% when fine-tuning the model \"C3D UCF101 split1\" from [https://github.com/hx173149/C3D-tensorflow](https://github.com/hx173149/C3D-tensorflow))\n```\npython test.py --model c3d --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 16 --outputDims 101 --seqLength 1 --size 112  --expName c3d_finetune_UCF101 --numClips 1 --clipLength 16 --clipOffset random --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1\n```\n#### TSN Training (HMDB51)\n```\npython train.py --model tsn --dataset HMDB51 --loadedDataset HMDB51 --numGpus 4 --load 0 --inputDims 3 --outputDims 51 --batchSize 56 --seqLength 3 --size 224  --expName tsn_HMDB51 --numVids 3570 --lr 0.001 --wd 0.0005 --nEpochs 40 --split 1 --baseDataPath /z/dat --fName trainlist --gradClipVal 20 --optChoice momentum\n```\n#### TSN Testing (HMDB51) (NOTE: Results do not use our trained model. 
They use weights from the original authors.)\n```\npython test.py --model tsn --dataset HMDB51 --loadedDataset HMDB51 --load 1 --inputDims 250 --outputDims 51 --seqLength 250 --size 224  --expName tsn_HMDB51 --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --loadWeights pretrained_HMDB51\n```\n\n#### TSN Training (UCF101)\n```\npython train.py --model tsn --dataset UCF101 --loadedDataset UCF101 --numGpus 4 --load 0 --inputDims 3 --outputDims 101 --batchSize 56 --seqLength 3 --size 224  --expName tsn_UCF101 --numVids 9537 --lr 0.001 --wd 0.0005 --nEpochs 80 --split 1 --baseDataPath /z/dat --fName trainlist --gradClipVal 20 --optChoice momentum\n```\n#### TSN Testing (UCF101) (NOTE: Results do not use our trained model. They use weights from the original authors.)\n```\npython test.py --model tsn --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 250 --outputDims 101 --seqLength 250 --size 224  --expName tsn_UCF101 --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist  --loadWeights pretrained_UCF101\n```\n\n\n## Version History\n\n\n### Current Version: 3.0\n\n#### Version 3.0 (GitHub Release)\nAutomated the generation of model and preprocessing files, as well as model importing. Provided weights and mean files for download. Matched the original authors' performance for most models (C3D, TSN, ResNet50+LSTM, I3D) on the UCF101 and HMDB51 datasets.\n\n#### Version 2.0\nImplemented TFRecords-based data loading to replace HDF5 files for increased performance. Training has been updated to allow models to be trained on multiple GPUs concurrently. Parallel data loading has been incorporated using TFRecords queues to maximize use of the available GPUs. The TensorFlow saver checkpoints have been replaced with a custom version which reads and writes model weights directly to numpy arrays. This allows existing model weights from other sources to be imported into this framework more easily. 
Validation is currently not compatible with this TFRecords framework.\n\n#### Version 1.0\nInitial release. Using pre-generated HDF5 files, test the LRCN model on the UCF101 dataset and train ResNet and VGG16 models on the HMDB51 dataset. TensorBoard supported; single-processor, single-GPU implementation with the ability to cancel and resume training every 50 steps. Documentation includes a basic overview and an example of training and testing commands.\n\n### Future features:\n\n* Include validation during training\n* Add training and testing on optical flow\n\n## Acknowledgements\nSupported by the Intelligence Advanced Research Projects Activity (IARPA) via\nDepartment of Interior/ Interior Business Center (DOI/IBC) contract number\nD17PC00341. The U.S. Government is authorized to reproduce and distribute\nreprints for Governmental purposes notwithstanding any copyright annotation\nthereon. Disclaimer: The views and conclusions contained herein are those of\nthe authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or\nthe U.S. Government.\nThis work was also partially supported by NIST 60NANB17D191 and ARO\nW911NF-15-1-0354.\n\n## Code Acknowledgements\nWe would like to thank the following contributors for helping shape our platform and for their invaluable input in reaching its current level of performance:\n- [Kyle Min](https://sites.google.com/umich.edu/kylemin/home)\n- Nadha Gafoor\n- [A. J. Piergiovanni](https://github.com/piergiaj)\n\n\n## References\n[1] D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, [*Learning Spatiotemporal Features with 3D Convolutional Networks*](https://arxiv.org/pdf/1412.0767), ICCV 2015\n\n[2] J. Carreira, A. Zisserman, [*Quo vadis, action recognition? A new model and the Kinetics dataset*](http://openaccess.thecvf.com/content_cvpr_2017/papers/Carreira_Quo_Vadis_Action_CVPR_2017_paper.pdf), CVPR 2017\n\n[3] L. Wang, Y. Xiong, Z. Wang, Y. 
Qiao, D. Lin, X. Tang, L. Van Gool, [*Temporal segment networks: Towards good practices for deep action recognition*](https://arxiv.org/pdf/1608.00859), ECCV 2016\n\n[4] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell, [*Long-term recurrent convolutional networks for visual recognition and description*](http://openaccess.thecvf.com/content_cvpr_2015/papers/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.pdf), CVPR 2015\n\n[5] K. He, X. Zhang, S. Ren, J. Sun, [*Deep residual learning for image recognition*](http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf), CVPR 2016\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMichiganCOG%2FM-PACT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMichiganCOG%2FM-PACT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMichiganCOG%2FM-PACT/lists"}