# Multi-Faceted Moment Localizer
A PyTorch implementation of [Multi-Faceted Moment Localizer](https://arxiv.org/abs/2006.10260), an improved version of [MAC](https://arxiv.org/pdf/1811.08925.pdf) with additional features for better moment localization.
Newly introduced features include:
- Semantic segmentation features from the ADE20K-pretrained [MobileNetV2dilated + C1_deepsup](https://github.com/CSAILVision/semantic-segmentation-pytorch) model.
- Video captioning features
- [BERT](https://arxiv.org/abs/1810.04805) Sentence Embeddings

![Architecture Diagram](doc/ArchDiag.png)
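The sketch below illustrates one way to obtain BERT sentence embeddings for a query using the `pytorch_pretrained_bert` package installed in the setup steps further down. It is only an illustration; the exact layer and pooling used inside the localizer may differ, and the query string is made up.

```python
# Illustrative sketch (not the repository's feature-extraction code):
# obtaining a BERT sentence embedding for a natural-language query.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

query = "the person opens the door and walks in"  # example query
tokens = ['[CLS]'] + tokenizer.tokenize(query) + ['[SEP]']
token_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # With output_all_encoded_layers=False, only the final hidden layer is returned.
    encoded_layers, pooled = model(token_ids, output_all_encoded_layers=False)

# One common choice: mean-pool the final hidden layer over tokens.
sentence_embedding = encoded_layers.mean(dim=1)  # shape: (1, 768)
```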

## Training the model
1) Download "Object Segmentation Features" and "Video Understanding Features" (video captioning features) from the following link and extract them to the `./data` directory.
- Link: https://drive.google.com/drive/folders/1z46eCEicPI1HQzDmczC1uXD0HcZT2a-P?usp=sharing
2) Download the [c3d visual features](https://drive.google.com/open?id=1vFxDw4AkGVgfILH-6xaHofLZ7PbWwFC2), [c3d visual activity concepts](https://drive.google.com/open?id=1biKPDmb7hbzowKLMIRSTLE0w_tWbGPAe), and [ref_info](https://drive.google.com/open?id=16rFGu9rnhnH-WQeUmN7VtMgljrhGspll) provided by the authors of MAC, and extract them to the `./data` directory.
3) The `./data` directory should now contain the following subdirectories: `all_fc6_unit16_overlap0.5`, `clip_object_features_test`, `clip_object_features_train`, `ref_info`, `test_softmax`, `train_softmax`, `video_understanding_features_test`, `video_understanding_features_train` (a sanity-check sketch follows this list).
4) Create a Python 2 Conda environment with `pytorch 0.4.1` and `torchvision`. Then install the following dependencies using pip (`pickle` ships with the Python standard library, so it does not need to be installed):
- `pip install pytorch_pretrained_bert`
- `pip install numpy`
5) Start training with `python train.py`.
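Before launching training, a quick check of the `./data` layout can catch missing downloads. The snippet below is only an illustrative helper, not part of the released code; the directory names are the ones listed in step 3.

```python
# Illustrative sanity check (not part of the released code): verify that
# ./data contains the subdirectories expected after steps 1-3.
import os

EXPECTED = [
    'all_fc6_unit16_overlap0.5',
    'clip_object_features_test', 'clip_object_features_train',
    'ref_info', 'test_softmax', 'train_softmax',
    'video_understanding_features_test', 'video_understanding_features_train',
]

missing = [d for d in EXPECTED if not os.path.isdir(os.path.join('data', d))]
if missing:
    raise RuntimeError('Missing data subdirectories: %s' % ', '.join(missing))
print('All expected data subdirectories are present.')
```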

## Testing Pretrained Model
1) Follow steps 1-4 of "Training the model" section.
2) Download the pre-trained model and place it in the `./checkpoints` directory:
- Pre-trained model with BERT Sentence Embeddings, GloVe VO Embedding, Object Segmentation Features and Video Captioning Features: [Link](https://drive.google.com/file/d/1A2NB2fyS_TTtOvaKbr6AnmKSYA5IYddr/view?usp=sharing).
3) Expected final output:
```bash
[dr-obj 0.50][dr-act 0.00][sr 0.005][csr: 0.005] best_R1_IOU5: 0.319
[dr-obj 0.50][dr-act 0.00][sr 0.005][csr: 0.005] best_R5_IOU5: 0.652
```
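`best_R1_IOU5` and `best_R5_IOU5` report Recall@1 and Recall@5 with a temporal IoU threshold of 0.5, the standard moment-localization metrics also used by MAC. The snippet below is a generic illustration of the temporal IoU computation, not the repository's evaluation code; the segment values are made up.

```python
# Generic illustration of the temporal IoU behind the R@k, IoU=0.5 metrics.
def temporal_iou(pred, gt):
    """IoU between two (start, end) segments given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

# A query counts as correct for R@1, IoU=0.5 if the top-ranked predicted
# segment overlaps the ground-truth moment with IoU >= 0.5.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # prints 0.5
```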
## Contributors
- [Madhawa Vidanapathirana](https://github.com/madhawav)
- [Supriya Pandhre](https://github.com/supriya-pandhre)
- [Sonia Raychaudhuri](https://github.com/sonia-raychaudhuri)
- [Anjali Khurana](https://github.com/anjali-khurana28)

## Notes
This is not the code we used to identify the model's hyper-parameters; it is a simplified version of the code released to the public.

Code for extracting the object segmentation and video captioning features will be released in the future.

## Acknowledgements
This code builds on the PyTorch implementation of MAC available [here](https://github.com/WuJie1010/Temporally-language-grounding).