https://github.com/Alvin-Zeng/Awesome-Temporal-Action-Localization

A curated list of temporal action localization/detection and related area (e.g. temporal action proposal) resources.

# Awesome Temporal Action Localization: [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
![Update](https://img.shields.io/github/last-commit/Alvin-Zeng/Awesome-Temporal-Action-Localization?color=green&label=last-updated&logo=update&style=flat-squre) [![Contributor](https://img.shields.io/static/v1?label=by&message=bolixinyu&color=blue&style=flat-squre)](https://github.com/bolixinyu)

A curated list of temporal action localization/detection and related area (e.g. temporal action proposal) resources.

## Contents
- [Temporal Action Localization](#tal)
  - [Paper](#tal-paper)
    - [2021](#tal-2021) - [2020](#tal-2020) - [2019](#tal-2019) - [2018](#tal-2018) - [2017](#tal-2017) - [2016](#tal-2016)
  - [Dataset](#tal-data)
  - [Benchmark Results](#tal-result)
    - [THUMOS14](#tal-result-thumos14) - [ActivityNet v1.3](#tal-result-activitynet13)
- [Weakly Supervised Temporal Action Localization](#wstal)
  - [Paper](#wstal-paper)
    - [2021](#wstal-2021) - [2020](#wstal-2020) - [2019](#wstal-2019) - [2018](#wstal-2018) - [2017](#wstal-2017)
  - [Dataset](#wstal-data)
  - [Benchmark Results](#wstal-result)
    - [THUMOS14](#wstal-thumos14) - [ActivityNet v1.3](#wstal-activitynet13) - [ActivityNet v1.2](#wstal-activitynet12)

---

Contributors:
SCUT: Runhao Zeng, Zeng You, Xinyu Sun
NPU: Le Yang

## **Temporal Action Localization**

## Paper

### 2022
- [[TALLFormer]](#12204) [**TALLFormer: Temporal Action Localization with Long-memory Transformer**](http://arxiv.org/abs/2204.01680) - Feng Cheng et al, `ECCV 2022`. [[code]](https://github.com/klauscc/TALLFormer)
- [[ActionFormer]](#12203) [**ActionFormer: Localizing Moments of Actions with Transformers**](http://arxiv.org/abs/2202.07925) - Chenlin Zhang et al, `ECCV 2022`. [[code]](https://github.com/happyharrycn/actionformer_release)
- [[RCL]](#12202) [**RCL: Recurrent Continuous Localization for Temporal Action Detection**](http://arxiv.org/abs/2103.03027) - Qiang Wang et al, `CVPR 2022`.
- [[DCAN]](#12201) [**DCAN: Improving Temporal Action Detection via Dual Context Aggregation**](https://www.aaai.org/AAAI22Papers/AAAI-7134.ChenG.pdf.pdf) - Guo Chen et al, `AAAI 2022`.
- [[TadTR]](#12108) [**End-to-end Temporal Action Detection with Transformer**](https://arxiv.org/pdf/2106.10271.pdf) - Xiaolong Liu et al, `TIP 2022`. [[code]]()

### 2021
- [[RTD-Net]](#12117) [**Relaxed Transformer Decoders for Direct Action Proposal Generation**](https://arxiv.org/pdf/2102.01894) - Jing Tan et al, `ICCV 2021`. [[code]](https://github.com/MCG-NJU/RTD-Action)
- [[LoFi]](#12116) [**Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization**](http://arxiv.org/abs/2103.15233) - Mengmeng Xu et al, `NeurIPS 2021`.
- [[ATAG]](#12115) [**Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation**](http://arxiv.org/abs/2103.16024) - Shuning Chang et al, `arXiv 2021`.
- [[AEI]](#12114) [**AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation**](http://arxiv.org/abs/2110.11474) - Khoa Vo et al, `BMVC 2021`.
- [[GCM]](#12113) [**Graph Convolutional Module for Temporal Action Localization in Videos**](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9459486) - Runhao Zeng et al, `TPAMI 2021`. [[code]](https://github.com/Alvin-Zeng/GCM)
- [[AVFusion]](#12112) [**Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization**](https://arxiv.org/pdf/2106.14118v1.pdf) - Bagchi et al, `arXiv 2021`. [[code]]()
- [[ContextLoc]](#12111) [**Enriching Local and Global Contexts for Temporal Action Localization**](https://arxiv.org/pdf/2107.12960.pdf) - Zixin Zhu et al, `ICCV 2021`.
- [[CSA]](#12110) [**Class Semantics-based Attention for Action Detection**](https://arxiv.org/pdf/2109.02613.pdf) - Deepak Sridhar et al, `ICCV 2021`.
- [[TCANet]](#12109) [**Temporal Context Aggregation Network for Temporal Action Proposal Refinement**](https://arxiv.org/pdf/2103.13141.pdf) - Zhiwu Qing et al, `CVPR 2021`.
- [[Multi-Task TAD]](#12107) [**Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations**](https://openaccess.thecvf.com/content/CVPR2021/papers/Li_Three_Birds_with_One_Stone_Multi-Task_Temporal_Action_Detection_via_CVPR_2021_paper.pdf) - Zhihui Li et al, `CVPR 2021`.
- [[Coarse-Fine Networks]](#12106) [**Coarse-Fine Networks for Temporal Activity Detection in Videos**](http://arxiv.org/abs/2103.01302) - Kahatapitiya et al, `CVPR 2021`.
- [[AFSD]](#12105) [**Learning Salient Boundary Feature for Anchor-free Temporal Action Localization**](https://arxiv.org/abs/2103.13137) - Chuming Lin et al, `CVPR 2021`. [[code]]()
- [[MUSES]](#12104) [**Multi-shot temporal event localization: A Benchmark**](https://arxiv.org/pdf/2012.09434.pdf) - Xiaolong Liu et al, `CVPR 2021`.
- [[SALAD]](#12103) [**SALAD: Self-Assessment Learning for Action Detection**](https://openaccess.thecvf.com/content/WACV2021/html/Vaudaux-Ruth_SALAD_Self-Assessment_Learning_for_Action_Detection_WACV_2021_paper.html) - Guillaume Vaudaux-Ruth et al, `WACV 2021`.
- [[RTD-Net]](#12102) [**Relaxed Transformer Decoders for Direct Action Proposal Generation**](https://arxiv.org/pdf/2102.01894) - Jing Tan et al, `arXiv 2021`. [[code]](https://github.com/MCG-NJU/RTD-Action)
- [[AGT]](#12101) [**Activity Graph Transformer for Temporal Action Localization**](https://arxiv.org/pdf/2101.08540.pdf) - Megha Nawhal et al, `arXiv 2021`.

### 2020
- [[VSGN]](#12011) [**Video Self-Stitching Graph Network for Temporal Action Localization**](https://arxiv.org/pdf/2011.14598.pdf) - Chen Zhao et al, `ICCV 2021`.
- [[UFA]](#12010) [**Temporal Action Detection with Multi-level Supervision**](https://arxiv.org/pdf/2011.11893.pdf) - Baifeng Shi et al, `arXiv 2020`.
- [[TSP]](#12009) [**TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks**](https://arxiv.org/pdf/2011.11479) - Humam Alwassel et al, `arXiv 2020`.
- [[BSP]](#12008) [**Boundary-sensitive Pre-training for Temporal Localization in Videos**](https://arxiv.org/pdf/2011.10830.pdf) - Mengmeng Xu et al, `arXiv 2020`.
- [[VAN]](#12007) [**Temporal Action Localization with Variance-Aware Networks**](https://arxiv.org/pdf/2008.11254.pdf) - Ting-Ting Xie et al, `arXiv 2020`.
- [[TSI]](#12006) [**TSI: Temporal Scale Invariant Network for Action Proposal Generation**](https://openaccess.thecvf.com/content/ACCV2020/html/Liu_TSI_Temporal_Scale_Invariant_Network_for_Action_Proposal_Generation_ACCV_2020_paper.html) - Shuming Liu et al, `ACCV 2020`. [[code]](https://github.com/sming256/TSI)
- [[BU-TAL]](#12005) [**Bottom-Up Temporal Action Localization with Mutual Regularization**](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123530528.pdf) - Peisen Zhao et al, `ECCV 2020`.
- [[DBG]](#12004) [**Fast Learning of Temporal Action Proposal via Dense Boundary Generator**](https://arxiv.org/pdf/1911.04127) - Chuming Lin et al, `AAAI 2020`. [[code]]()
- [[G-TAD]](#12003) [**G-TAD: Sub-Graph Localization for Temporal Action Detection**](https://arxiv.org/abs/1911.11462) - Mengmeng Xu et al, `CVPR 2020`. [[code]]()
- [[PBRNet]](#12002) [**Progressive Boundary Refinement Network for Temporal Action Detection**](https://aaai.org/Papers/AAAI/2020GB/AAAI-LiuQ.4870.pdf) - Qinying Liu et al, `AAAI 2020`.
- [[AGCN]](#12001) [**Graph Attention based Proposal 3D ConvNets for Action Detection**](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-LiJ.1424.pdf) - Jun Li et al, `AAAI 2020`.

### 2019
- [[PGCN]](#11906) [**Graph Convolutional Networks for Temporal Action Localization**](http://openaccess.thecvf.com/content_ICCV_2019/papers/Zeng_Graph_Convolutional_Networks_for_Temporal_Action_Localization_ICCV_2019_paper.pdf) - Runhao Zeng et al, `ICCV 2019`. [[code]]()
- [[RAM]](#11905) [**Relation Attention for Temporal Action Localization**](https://ieeexplore.ieee.org/document/8933113) - Peihao Chen et al, `TMM 2019`.
- [[BMN]](#11904) [**BMN: Boundary-Matching Network for Temporal Action Proposal Generation**](http://openaccess.thecvf.com/content_ICCV_2019/papers/Lin_BMN_Boundary-Matching_Network_for_Temporal_Action_Proposal_Generation_ICCV_2019_paper.pdf) - Tianwei Lin et al, `ICCV 2019`.
- [[GTAN]](#11903) [**Gaussian Temporal Awareness Networks for Action Localization**](https://arxiv.org/abs/1909.03877) - Fuchen Long et al, `CVPR 2019`.
- [[DBS]](#11902) [**Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos**](https://www.aaai.org/ojs/index.php/AAAI/article/view/4846) - Zhanning Gao et al, `AAAI 2019`.
- [[C-TCN]](#11901) [**Deep Concept-wise Temporal Convolutional Networks for Action Localization**](https://arxiv.org/abs/1908.09442) - Xin Li et al, `arXiv 2019`.

### 2018
- [[TAL-Net]](#11805) [**Rethinking the Faster R-CNN Architecture for Temporal Action Localization**](https://arxiv.org/abs/1804.07667) - Yuwei Chao et al, `CVPR 2018`.
- [[BSN]](#11804) [**BSN: Boundary Sensitive Network for Temporal Action Proposal Generation**](https://arxiv.org/abs/1806.02964) - Tianwei Lin et al, `ECCV 2018`. [[code]]()
- [[Action-Search]](#11803) [**Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization**](https://arxiv.org/abs/1706.04269) - Humam Alwassel et al, `ECCV 2018`. [[code]]()
- [[TPC]](#11802) [**Exploring Temporal Preservation Networks for Precise Temporal Action Localization**](https://arxiv.org/abs/1708.03280) - Ke Yang et al, `AAAI 2018`.
- [[Self-Ad]](#11801) [**A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning**](https://arxiv.org/abs/1706.07251) - Jingjia Huang et al, `AAAI 2018`.

### 2017
- [[SSN]](#11708) [**Temporal Action Detection with Structured Segment Networks**](https://arxiv.org/abs/1704.06228) - Yue Zhao et al, `ICCV 2017`. [[code]]()
- [[R-C3D]](#11707) [**R-C3D: Region Convolutional 3D Network for Temporal Activity Detection**](https://arxiv.org/abs/1703.07814) - Huijuan Xu et al, `ICCV 2017`. [[code]]()
- [[TCN]](#11706) [**Temporal Context Network for Activity Localization in Videos**](https://arxiv.org/abs/1708.02349) - Xiyang Dai et al, `ICCV 2017`.
- [[TURN]](#11705) [**TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals**](https://arxiv.org/abs/1703.06189) - Jiyang Gao et al, `ICCV 2017`. [[code]]()
- [[SST]](#11704) [**SST: Single-Stream Temporal Action Proposals**](https://ieeexplore.ieee.org/abstract/document/8100158) - Shyamal Buch et al, `ICCV 2017`.
- [[CDC]](#11703) [**CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos**](https://arxiv.org/abs/1703.01515) - Zheng Shou et al, `CVPR 2017`. [[code]]()
- [[SCC]](#11702) [**SCC: Semantic Context Cascade for Efficient Action Detection**](https://ieeexplore.ieee.org/document/8099821) - Fabian Caba Heilbron et al, `CVPR 2017`.
- [[SMS]](#11701) [**Temporal Action Localization by Structured Maximal Sums**](https://arxiv.org/abs/1704.04671) - Zehuan Yuan et al, `CVPR 2017`.

### 2016
- [[S-CNN]](#11605) [**Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs**](https://arxiv.org/abs/1601.02129) - Zheng Shou et al, `CVPR 2016`. [[code]]()
- [[PSDF]](#11604) [**Temporal Action Localization with Pyramid of Score Distribution Features**](https://ieeexplore.ieee.org/abstract/document/7780706) - Jun Yuan et al, `CVPR 2016`.
- [[FG]](#11603) [**End-to-end Learning of Action Detection from Frame Glimpses in Videos**](https://arxiv.org/abs/1511.06984) - Serena Yeung et al, `CVPR 2016`.
- [[SLM]](#11602) [**Temporal Action Detection Using a Statistical Language Model**](https://ieeexplore.ieee.org/document/7780710) - Alexander Richard et al, `CVPR 2016`.
- [[DAPs]](#11601) [**DAPs: Deep Action Proposals for Action Understanding**](https://link.springer.com/chapter/10.1007%2F978-3-319-46487-9_47) - Victor Escorcia et al, `ECCV 2016`.

## Dataset

- [THUMOS14](https://www.crcv.ucf.edu/THUMOS14/)
- [ActivityNet v1.3](http://activity-net.org/download.html)

## Benchmark Results

#### THUMOS14

| Method | Conference | IoU=0.1 | IoU=0.2 | IoU=0.3 | IoU=0.4 | IoU=0.5 | IoU=0.6 | IoU=0.7 |
| :---------------------------------------------: | :-----------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| [DAPs](#21601) | ECCV-2016 | - | - | - | - | 13.9 | - | - |
| [SLM](#21602) | CVPR-2016 | 39.7 | 35.7 | 30.0 | 23.2 | 15.2 | - | - |
| [FG](#21603) | CVPR-2016 | 48.9 | 44.0 | 36.0 | 26.4 | 17.1 | - | - |
| [SMS](#21701) | CVPR-2017 | 51.0 | 45.2 | 36.5 | 27.8 | 17.8 | - | - |
| [PSDF](#21604) | CVPR-2016 | 51.4 | 42.6 | 33.6 | 26.1 | 18.8 | - | - |
| [S-CNN](#21605) | CVPR-2016 | 47.7 | 43.5 | 36.3 | 28.7 | 19.0 | 10.3 | 5.3 |
| [SST](#21704) | ICCV-2017 | - | - | - | - | 23.0 | - | - |
| [CDC](#21703) | CVPR-2017 | - | - | 40.1 | 29.4 | 23.3 | 13.1 | 7.9 |
| [TURN](#21705) | ICCV-2017 | 54.0 | 50.9 | 44.1 | 34.9 | 25.6 | - | - |
| [TCN](#21706) | ICCV-2017 | - | - | - | 33.3 | 25.6 | 15.9 | 9.0 |
| [Self-Ad](#21801) | AAAI-2018 | - | - | - | - | 27.7 | - | - |
| [TPC](#21802) | AAAI-2018 | - | - | 44.1 | 37.1 | 28.2 | 20.6 | 12.7 |
| [R-C3D](#21707) | ICCV-2017 | 54.5 | 51.5 | 44.8 | 35.6 | 28.9 | - | - |
| [SSN](#21708) | ICCV-2017 | 66.0 | 59.4 | 51.9 | 41.0 | 29.8 | - | - |
| [Action-Search](#21803) | ECCV-2018 | - | - | 51.8 | 42.4 | 30.8 | 20.2 | 11.1 |
| [DBS](#21902) | AAAI-2019 | 56.7 | 54.7 | 50.6 | 43.1 | 34.3 | 24.4 | 14.7 |
| [BSN](#21804) | ECCV-2018 | - | - | 53.5 | 45.0 | 36.9 | 28.4 | 20.0 |
| [AGCN](#22001) | AAAI-2020 | 59.3 | 59.6 | 57.1 | 51.6 | 38.6 | 28.9 | 17.0 |
| [GTAN](#21903) | CVPR-2019 | 69.1 | 63.7 | 57.8 | 47.2 | 38.8 | - | - |
| [BMN](#21904) | ICCV-2019 | - | - | 56.0 | 47.4 | 38.8 | 29.7 | 20.5 |
| [DBG](#22004) | AAAI-2020 | - | - | 57.8 | 49.4 | 39.8 | 30.2 | 21.7 |
| [TSI](#22006) | ACCV-2020 | - | - | 61.0 | 52.1 | 42.6 | 33.2 | 22.4 |
| [TAL-Net](#21805) | CVPR-2018 | 59.8 | 57.1 | 53.2 | 48.5 | 42.8 | 33.8 | 20.8 |
| [RAM](#21905) | TMM-2019 | 65.4 | 63.1 | 58.8 | 52.7 | 43.7 | - | - |
| [TCANet](#22109) | CVPR-2021 | - | - | 60.6 | 53.2 | 44.6 | 36.8 | 26.7 |
| [SALAD](#22103) | WACV-2021 | 73.3 | 70.7 | 65.7 | 57.0 | 44.6 | - | - |
| [AEI](#22114) | BMVC-2021 | - | - | 58.7 | 52.7 | 44.7 | 35.9 | 23.4 |
| [RTD-Net](#22117) | ICCV-2021 | - | - | 58.5 | 53.1 | 45.1 | 36.4 | 25.0 |
| [BU-TAL](#22005) | ECCV-2020 | - | - | 53.9 | 50.7 | 45.4 | 38.0 | 28.5 |
| [PGCN](#21906) | ICCV-2019 | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | - | - |
| [CSA](#22110) | ICCV-2021 | - | - | 64.4 | 58.0 | 49.2 | 38.2 | 27.8 |
| [PBRNet](#22002) | AAAI-2020 | - | - | 58.5 | 54.6 | 51.3 | 41.8 | 29.5 |
| [G-TAD](#22003) | CVPR-2020 | - | - | 66.4 | 60.4 | 51.6 | 37.6 | 22.9 |
| [GCM](#22113) | TPAMI-2021 | 72.5 | 70.9 | 66.5 | 60.8 | 51.9 | - | - |
| [VSGN](#22011) | ICCV-2021 | - | - | 66.7 | 60.4 | 52.4 | 41.0 | 30.4 |
| [RCL](#22202) | CVPR-2022 | - | - | 70.1 | 62.3 | 52.9 | 42.7 | 30.7 |
| [DCAN](#22201) | AAAI-2022 | - | - | 68.2 | 62.7 | 54.1 | 43.9 | 32.6 |
| [ContextLoc](#22111) | ICCV-2021 | - | - | 68.3 | 63.8 | 54.3 | 41.8 | 26.2 |
| [Multi-Task TAD](#22107)| CVPR-2021 | - | - | 63.2 | 58.5 | 54.8 | 44.3 | 32.4 |
| [AFSD](#22105) | CVPR-2021 | - | - | 67.3 | 62.4 | 55.5 | 43.7 | 31.1 |
| [MUSES](#22104) | CVPR-2021 | - | - | 68.9 | 64.0 | 56.9 | 46.3 | 31.0 |
| [TALLFormer](#22204) | ECCV-2022 | - | - | 68.4 | - | 57.6 | - | 30.8 |
| [TadTR](#22108) | TIP-2022 | - | - | 74.8 | 69.1 | 60.1 | 46.6 | 32.8 |
| [ActionFormer](#22203) | ECCV-2022 | - | - | 82.1 | 77.8 | 71.0 | 59.4 | 43.9 |

| Method | Conference | IoU=0.1 | IoU=0.2 | IoU=0.3 | IoU=0.4 | IoU=0.5 | IoU=0.6 | IoU=0.7 |
| :---------------------------------------------: | :-----------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| [UFA](#22010) | arXiv | - | - | 45.6 | 36.4 | 26.2 | 15.5 | 7.1 |
| [VAN](#22007) | arXiv | - | - | 55.0 | 48.6 | 39.2 | 26.9 | 15.0 |
| [ATAG](#22115) | arXiv | - | - | 62.0 | 53.1 | 47.3 | 38.0 | 28.0 |
| [AGT](#22101) | arXiv | 72.1 | 69.8 | 65.0 | 58.1 | 50.2 | - | - |
| [RTD-Net](#22102) | arXiv | - | - | 68.3 | 62.3 | 51.9 | 38.8 | 23.7 |
| [C-TCN](#21901) | arXiv | 72.2 | 71.4 | 68.0 | 62.3 | 52.1 | - | - |
| [TSP](#22009) | arXiv | - | - | 69.1 | 63.3 | 53.5 | 40.4 | 26.0 |
| [AVFusion](#22112) | arXiv | - | - | 70.2 | 65.0 | 57.2 | 45.4 | 28.9 |

#### ActivityNet v1.3

| Method | Conference | IoU=0.5 | IoU=0.75 | IoU=0.95 | Avg |
| :--------------------: | :-----------: | :---------: | :----------: | :----------: | :---------: |
| [R-C3D](#21707) | ICCV-2017 | 26.8 | - | - | - |
| [AGCN](#22001) | AAAI-2020 | 30.4 | - | - | - |
| [SCC](#21702) | CVPR-2017 | 39.9 | 18.7 | 4.7 | 19.3 |
| [TAL-Net](#21805) | CVPR-2018 | 38.23 | 18.30 | 1.30 | 20.22 |
| [RAM](#21905) | TMM-2019 | 36.99 | 23.10 | 3.34 | 23.03 |
| [TCN](#21706) | ICCV-2017 | 37.49 | 23.47 | 4.47 | 23.58 |
| [CDC](#21703) | CVPR-2017 | 45.3 | 26.0 | 0.2 | 23.8 |
| [DBS](#21902) | AAAI-2019 | 43.2 | 25.8 | 6.1 | 26.1 |
| [PGCN](#21906) | ICCV-2019 | 42.90 | 28.14 | 2.47 | 26.99 |
| [SSN](#21708) | ICCV-2017 | 43.26 | 28.70 | 5.63 | 28.28 |
| [BU-TAL](#22005) | ECCV-2020 | 43.47 | 33.91 | 9.21 | 30.12 |
| [BSN](#21804) | ECCV-2018 | 46.45 | 29.96 | 8.02 | 30.03 |
| [RTD-Net](#22117) | ICCV-2021 | 47.21 | 30.68 | 8.61 | 30.83 |
| [SALAD](#22103) | WACV-2021 | 51.72 | 31.21 | 3.33 | 31.02 |
| [BMN](#21904) | ICCV-2019 | 50.07 | 34.78 | 8.29 | 33.85 |
| [MUSES](#22104) | CVPR-2021 | 50.02 | 34.97 | 6.57 | 33.99 |
| [G-TAD](#22003) | CVPR-2020 | 50.36 | 34.60 | 9.02 | 34.09 |
| [TSI](#22006) | ACCV-2020 | 51.18 | 35.02 | 6.59 | 34.15 |
| [ContextLoc](#22111) | ICCV-2021 | 56.01 | 35.19 | 3.55 | 34.23 |
| [GCM](#22113) | TPAMI-2021 | 51.03 | 35.17 | 7.44 | 34.24 |
| [LoFi](#22116) | NeurIPS-2021 | 50.68 | 35.16 | 8.16 | 34.49 |
| [GTAN](#21903) | CVPR-2019 | 52.61 | 34.14 | 8.91 | 34.31 |
| [RCL](#22202) | CVPR-2022 | 51.74 | 35.27 | 8.03 | 34.39 |
| [AFSD](#22105) | CVPR-2021 | 52.38 | 35.27 | 6.47 | 34.39 |
| [AEI](#22114) | BMVC-2021 | 52.3 | 34.5 | 9.7 | 34.7 |
| [PBRNet](#22002) | AAAI-2020 | 53.96 | 34.97 | 8.98 | 35.01 |
| [Multi-Task TAD](#22107)| CVPR-2021 | 57.8 | 37.6 | 9.6 | 35.0 |
| [DCAN](#22201) | AAAI-2022 | 51.78 | 35.98 | 9.45 | 35.39 |
| [TCANet](#22109) | CVPR-2021 | 52.27 | 36.73 | 6.86 | 35.52 |
| [CSA](#22110) | ICCV-2021 | 51.88 | 36.88 | 8.74 | 35.69 |

| Method | Conference | IoU=0.5 | IoU=0.75 | IoU=0.95 | IoU=Avg |
| :--------------------: | :-----------: | :---------: | :----------: | :----------: | :---------: |
| [RTD-Net](#22102) | arXiv | 46.4 | 30.4 | 8.6 | 30.5 |
| [C-TCN](#21901) | arXiv | 47.6 | 31.9 | 6.2 | 31.1 |
| [TadTR](#22108) | arXiv | 47.57 | 31.65 | 7.98 | 31.32 |
| [BSP](#22008) | arXiv | 50.1 | 34.7 | 7.9 | 34.0 |
| [ATAG](#22115) | arXiv | 50.92 | 35.35 | 9.71 | 34.68 |
| [VSGN](#22011) | arXiv | 52.4 | 36.0 | 8.4 | 35.1 |
| [ActionFormer](#22203) | arXiv | 53.5 | 36.2 | 7.7 | 35.6 |
| [TALLFormer](#22204) | arXiv | 54.1 | 36.2 | 7.9 | 35.6 |
| [TSP](#22009) | arXiv | 51.3 | 37.2 | 9.3 | 35.8 |
| [AVFusion](#22112) | arXiv | 52.73 | 37.78 | 9.39 | 36.63 |

---

## **Weakly Supervised Temporal Action Localization**

## Paper

### 2021
- [[BackTAL]](#32111) [**Background-Click Supervision for Temporal Action Localization**](https://arxiv.org/pdf/2111.12449.pdf) - Le Yang et al, `TPAMI 2021`. [[code]](https://github.com/VividLe/BackTAL)
- [[ACSNet]](#32110) [**ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization**](https://arxiv.org/pdf/2103.15088.pdf) - Ziyi Liu et al, `AAAI 2021`.
- [[AMS]](#32109) [**Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization**](https://arxiv.org/pdf/2104.02357.pdf) - Chen Ju et al, `arXiv 2021`.
- [[AUMN]](#32108) [**Action Unit Memory Network for Weakly Supervised Temporal Action Localization**](https://dl.acm.org/doi/pdf/10.1145/3474085.3475261) - Wang Luo et al, `CVPR 2021`.
- [[CSCL]](#32107) [**Weakly-Supervised Temporal Action Localization via Cross-Stream Collaborative Learning**](https://dl.acm.org/doi/pdf/10.1145/3474085.3475261) - Yuan Ji et al, `ACM MM 2021`.
- [[RefineLoc]](#32106) [**RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization**](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9423165) - Alejandro Pardo et al, `WACV 2021`. [[code]](https://github.com/HumamAlwassel/RefineLoc)
- [[UM-Net]](#32105) [**Weakly-supervised Temporal Action Localization by Uncertainty Modeling**](https://ojs.aaai.org/index.php/AAAI/article/download/16280/16087) - Pilhyeon Lee et al, `AAAI 2021`.
- [[CoLA]](#32104) [**CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning**](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9578494&tag=1) - Can Zhang et al, `CVPR 2021`.
- [[ActShufNet]](#32103) [**Action Shuffling for Weakly Supervised Temporal Localization**](https://arxiv.org/pdf/2105.04208.pdf) - Xiao-Yu Zhang et al, `arXiv 2021`.
- [[$\mathrm{CO_2-Net}$]](#32102) [**Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization**](https://dl.acm.org/doi/pdf/10.1145/3474085.3475298) - Fa-Ting Hong et al, `ACM MM 2021`.
- [[HAM-Net]](#32101) [**A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization**](https://arxiv.org/pdf/2101.00545) - Ashraful Islam et al, `AAAI 2021`. [[code]](https://github.com/asrafulashiq/hamnet)

### 2020
- [[ECM]](#32013) [**Equivalent Classification Mapping for Weakly Supervised Temporal Action Localization**](https://arxiv.org/pdf/2008.07728.pdf) - Tao Zhao et al, `arXiv 2020`.
- [[TCA]](#32012) [**Learning Temporal Co-Attention Models for Unsupervised Video Action Localization**](https://openaccess.thecvf.com/content_CVPR_2020/html/Gong_Learning_Temporal_Co-Attention_Models_for_Unsupervised_Video_Action_Localization_CVPR_2020_paper.html) - Guoqiang Gong et al, `CVPR 2020`.
- [[EM-MIL]](#32011) [**Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning**](https://arxiv.org/abs/2004.00163) - Zhekun Luo et al, `ECCV 2020`.
- [[SF-Net]](#32010) [**SF-Net: Single-Frame Supervision for Temporal Action Localization**](https://arxiv.org/abs/2003.06845) - Fan Ma et al, `ECCV 2020`. [[code]]()
- [[A2CL-PT]](#32009) [**Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization**](https://arxiv.org/abs/2007.06643) - Kyle Min et al, `ECCV 2020`.
- [[TSCN]](#32008) [**Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization**](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123510035.pdf) - Yuanhao Zhai et al, `ECCV 2020`.
- [[ActionBytes]](#32007) [**ActionBytes: Learning from Trimmed Videos to Localize Actions**](https://openaccess.thecvf.com/content_CVPR_2020/papers/Jain_ActionBytes_Learning_From_Trimmed_Videos_to_Localize_Actions_CVPR_2020_paper.pdf) - Mihir Jain et al, `CVPR 2020`.
- [[DGAM]](#32006) [**Weakly-Supervised Action Localization by Generative Attention Modeling**](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shi_Weakly-Supervised_Action_Localization_by_Generative_Attention_Modeling_CVPR_2020_paper.pdf) - Baifeng Shi et al, `CVPR 2020`.
- [[RPN]](#32005) [**Relational Prototypical Network for Weakly Supervised Temporal Action Localization**](https://ojs.aaai.org//index.php/AAAI/article/view/6760) - Linjiang Huang et al, `AAAI 2020`.
- [[BaSNet]](#32004) [**Background Suppression Network for Weakly-supervised Temporal Action Localization**](https://arxiv.org/abs/1911.09963) - Pilhyeon Lee et al, `AAAI 2020`.
- [[DML]](#32003) [**Weakly Supervised Temporal Action Localization Using Deep Metric Learning**](https://arxiv.org/abs/2001.07793) - Ashraful Islam et al, `WACV 2020`.
- [[MCASL]](#32002) [**Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks**](https://arxiv.org/abs/2002.01449) - Maheen Rashid et al, `WACV 2020`.
- [[WSGN]](#32001) [**Weakly Supervised Gaussian Networks for Action Detection**](https://arxiv.org/abs/1904.07774) - Basura Fernando et al, `WACV 2020`.

### 2019
- [[MAAN]](#31908) [**Marginalized Average Attentional Network for Weakly Supervised Learning**](https://openreview.net/pdf?id=HkljioCcFQ) - Yuan Yuan et al, `ICLR 2019`.
- [[IWO-Net]](#31907) [**Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization**](https://ieeexplore.ieee.org/document/8737877) - Runhao Zeng et al, `TIP 2019`.
- [[3C-Net]](#31906) [**3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization**](https://ieeexplore.ieee.org/document/8737877) - Sanath Narayan et al, `ICCV 2019`. [[code]]()
- [[BM]](#31905) [**Weakly-supervised Action Localization with Background Modeling**](http://openaccess.thecvf.com/content_ICCV_2019/papers/Nguyen_Weakly-Supervised_Action_Localization_With_Background_Modeling_ICCV_2019_paper.pdf) - Phuc Xuan Nguyen et al, `ICCV 2019`.
- [[TSM]](#31904) [**Temporal Structure Mining for Weakly Supervised Action Detection**](http://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Temporal_Structure_Mining_for_Weakly_Supervised_Action_Detection_ICCV_2019_paper.pdf) - Tan Yu et al, `ICCV 2019`.
- [[CleanNet]](#31903) [**Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks**](https://openaccess.thecvf.com/content_ICCV_2019/html/Liu_Weakly_Supervised_Temporal_Action_Localization_Through_Contrast_Based_Evaluation_Networks_ICCV_2019_paper.html) - Ziyi Liu et al, `ICCV 2019`.
- [[CMCS]](#31902) [**Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization**](http://openaccess.thecvf.com/content_CVPR_2019/html/Liu_Completeness_Modeling_and_Context_Separation_for_Weakly_Supervised_Temporal_Action_CVPR_2019_paper.html) - Daochang Liu et al, `CVPR 2019`.
- [[STAR]](#31901) [**Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection**](https://arxiv.org/abs/1811.07460) - Yunlu Xu et al, `AAAI 2019`.

### 2018
- [[W-TALC]](#31804) [**W-TALC: Weakly-supervised Temporal Activity Localization and Classification**](https://arxiv.org/abs/1807.10418) - Sujoy Paul et al, `ECCV 2018`. [[code]]()
- [[AutoLoc]](#31803) [**AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos**](https://arxiv.org/abs/1807.08333) - Zheng Shou et al, `ECCV 2018`. [[code]]()
- [[STPN]](#31802) [**Weakly Supervised Action Localization by Sparse Temporal Pooling Network**](https://arxiv.org/abs/1712.05080) - Phuc Nguyen et al, `CVPR 2018`.
- [[One-Shot]](#31801) [**One-Shot Action Localization by Learning Sequence Matching Network**](https://ieeexplore.ieee.org/document/8578255) - Hongtao Yang et al, `CVPR 2018`.

### 2017
- [[UNet]](#31702) [**UntrimmedNets for Weakly Supervised Action Recognition and Detection**](https://arxiv.org/abs/1703.03329) - Limin Wang et al, `CVPR 2017`. [[code]]()
- [[H&S]](#31701) [**Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization**](https://arxiv.org/abs/1704.04232) - Krishna Kumar Singh et al, `ICCV 2017`.

## Dataset

- [THUMOS14](https://www.crcv.ucf.edu/THUMOS14/)
- [ActivityNet v1.3](http://activity-net.org/download.html)

## Benchmark Results

#### THUMOS14

| Method | Conference | IoU=0.1 | IoU=0.2 | IoU=0.3 | IoU=0.4 | IoU=0.5 | IoU=0.6 | IoU=0.7 |
| :--------------------------------------------: | :-----------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| [H&S](#41701) | ICCV-2017 | 36.44 | 27.84 | 19.49 | 12.66 | 6.84 | - | - |
| [UNet](#41702) | CVPR-2017 | 44.4 | 37.7 | 28.2 | 21.1 | 13.7 | - | - |
| [One-Shot](#41801) | CVPR-2018 | - | - | - | - | 14.7 | - | - |
| [STPN](#41802) | CVPR-2018 | 52.0 | 44.7 | 35.5 | 25.8 | 16.9 | 9.9 | 4.3 |
| [MAAN](#41908) | ICLR-2019 | 59.8 | 50.8 | 41.1 | 30.6 | 20.3 | 12.0 | 6.9 |
| [IWO-Net](#41907) | TIP-2019 | 57.6 | 48.9 | 38.9 | 29.3 | 20.5 | - | - |
| [WSGN](#42001) | WACV-2020 | 55.3 | 47.6 | 38.9 | 30.0 | 21.1 | - | - |
| [AutoLoc](#41803) | ECCV-2018 | - | - | 35.8 | 29.0 | 21.2 | 13.4 | 5.8 |
| [W-TALC](#41804) | ECCV-2018 | 55.2 | 49.6 | 40.1 | 31.1 | 22.8 | - | 7.6 |
| [STAR](#41901) | AAAI-2019 | 68.8 | 60.0 | 48.7 | 34.7 | 23.0 | - | - |
| [RefineLoc](#42106) | WACV-2021 | - | - | 40.8 | 32.7 | 23.1 | 13.3 | 5.3 |
| [CMCS](#41902) | CVPR-2019 | 57.4 | 50.8 | 41.2 | 32.1 | 23.1 | 15.0 | 7.0 |
| [CleanNet](#41903) | ICCV-2019 | - | - | 44.4 | 36.3 | 27.1 | 17.3 | 7.3 |
| [TSM](#41904) | ICCV-2019 | - | - | 39.5 | - | 24.5 | - | 7.1 |
| [MCASL](#42002) | WACV-2020 | 63.7 | 56.9 | 47.3 | 36.4 | 26.1 | - | - |
| [3C-Net](#41906) | ICCV-2019 | 59.1 | 53.5 | 44.2 | 34.1 | 26.6 | - | 8.1 |
| [BM](#41905) | ICCV-2019 | 60.4 | 56.0 | 46.6 | 37.5 | 26.8 | 17.6 | 9.0 |
| [BaSNet](#42004) | AAAI-2020 | 58.2 | 52.3 | 44.6 | 36.0 | 27.0 | 18.6 | 10.4 |
| [RPN](#42005) | AAAI-2020 | 62.3 | 57.0 | 48.2 | 37.2 | 27.9 | 16.7 | 8.1 |
| [TSCN](#42008) | ECCV-2020 | 63.4 | 57.6 | 47.8 | 37.7 | 28.7 | 19.4 | 10.2 |
| [DGAM](#42006) | CVPR-2020 | 60.0 | 54.2 | 46.8 | 38.2 | 28.8 | 19.8 | 11.5 |
| [ActionBytes](#42007) | CVPR-2020 | - | - | 43.0 | 35.8 | 29.0 | - | 9.5 |
| [SF-Net](#42010) | ECCV-2020 | 71.0 | 63.4 | 53.2 | 40.7 | 29.3 | 18.4 | 9.6 |
| [DML](#42003) | WACV-2020 | 62.3 | - | 46.8 | - | 29.6 | - | 9.7 |
| [A2CL-PT](#42009) | ECCV-2020 | 61.2 | 56.1 | 48.1 | 39.0 | 30.1 | 19.2 | 10.6 |
| [TCA](#42012) | CVPR-2020 | - | - | 46.9 | 38.9 | 30.1 | 19.8 | 10.4 |
| [EM-MIL](#42011) | ECCV-2020 | 59.1 | 52.7 | 45.5 | 36.8 | 30.5 | 22.7 | 16.4 |
| [HAM-Net](#42101) | AAAI-2021 | 65.4 | 59.0 | 50.3 | 41.1 | 31.0 | 20.7 | 11.2 |
| [CoLA](#42104) | CVPR-2021 | 66.2 | 59.5 | 51.5 | 41.9 | 32.2 | 22.0 | 13.1 |
| [ACSNet](#42110) | AAAI-2021 | - | - | 51.4 | 42.7 | 32.4 | 22.0 | 11.7 |
| [AUMN](#42108) | CVPR-2021 | 66.2 | 61.9 | 54.9 | 44.4 | 33.3 | 20.5 | 9.0 |
| [CSCL](#42107) | ACM MM-2021 | 68.0 | 61.8 | 52.7 | 43.3 | 33.4 | 21.8 | 12.3 |
| [UM-Net](#42105) | AAAI-2021 | 67.5 | 61.2 | 52.3 | 43.4 | 33.7 | 22.9 | 12.1 |
| [BackTAL](#42111) | TPAMI-2021 | - | - | 54.4 | 45.5 | 36.3 | 26.2 | 14.8 |
| [$\mathrm{CO_2-Net}$](#42102) | ACM MM-2021 | 70.1 | 63.6 | 54.5 | 45.7 | 38.3 | 26.4 | 13.4 |

| Method | Conference | IoU=0.1 | IoU=0.2 | IoU=0.3 | IoU=0.4 | IoU=0.5 | IoU=0.6 | IoU=0.7 |
| :--------------------------------------------: | :-----------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| [ECM](#42013) | arXiv | 62.6 | 55.1 | 46.5 | 38.2 | 29.1 | 19.5 | 10.9 |
| [ActShufNet](#42103) | arXiv | 63.44 | 57.92 | 48.46 | 40.01 | 31.12 | 22.01 | 11.26 |
| [AMS](#42109) | arXiv | 69.1 | 62.3 | 52.7 | 42.8 | 33.1 | 23.1 | 13.0 |

#### ActivityNet v1.3

| Method | Conference | IoU=0.5 | IoU=0.75 | IoU=0.95 | IoU=Avg |
| :-------------------: | :-----------: | :---------: | :----------: | :----------: | :---------: |
| [STPN](#41802) | CVPR-2018 | 29.3 | 16.9 | 2.6 | 20.07 |
| [IWO-Net](#41907) | TIP-2019 | 29.8 | 17.6 | 4.7 | - |
| [TSM](#41904) | ICCV-2019 | 30.3 | 19.0 | 4.5 | - |
| [STAR](#41901) | AAAI-2019 | 31.1 | 18.8 | 4.7 | - |
| [CMCS](#41902) | CVPR-2019 | 34.0 | 20.9 | 5.7 | 21.2 |
| [CleanNet](#41903) | ICCV-2019 | 36.7 | 20.4 | 4.5 | 21.4 |
| [TSCN](#42008) | ECCV-2020 | 35.3 | 21.4 | 5.3 | 21.7 |
| [BaSNet](#42004) | AAAI-2020 | 34.5 | 22.5 | 4.9 | 22.2 |
| [MAAN](#41908) | ICLR-2019 | 33.7 | 21.9 | 5.5 | - |
| [BM](#41905) | ICCV-2019 | 36.4 | 19.2 | 2.9 | - |
| [A2CL-PT](#42009) | ECCV-2020 | 36.8 | 22.0 | 5.2 | 22.5 |
| [AUMN](#42108) | CVPR-2021 | 38.3 | 23.5 | 5.2 | 23.5 |
| [UM-Net](#42105) | AAAI-2021 | 37.0 | 23.9 | 5.7 | 23.7 |

| Method | Conference | IoU=0.5 | IoU=0.75 | IoU=0.95 | IoU=Avg |
| :-------------------: | :-----------: | :---------: | :----------: | :----------: | :---------: |
| [ECM](#42013) | arXiv | 36.7 | 23.6 | 5.9 | 23.5 |
| [ActShufNet](#42103) | arXiv | 36.3 | 23.5 | 5.8 | 23.6 |

#### ActivityNet v1.2

| Method | Conference | IoU=0.5 | IoU=0.75 | IoU=0.95 | IoU=Avg |
| :-------------------: | :-----------: | :---------: | :----------: | :----------: | :---------: |
| [UNet](#41702) | CVPR-2017 | 7.4 | 3.2 | 0.7 | - |
| [AutoLoc](#41803) | ECCV-2018 | 27.3 | 15.1 | 3.3 | - |
| [TSM](#41904) | ICCV-2019 | 28.3 | 17.0 | 3.5 | - |
| [MCASL](#42002) | WACV-2020 | 29.4 | - | - | - |
| [STAR](#41901) | AAAI-2019 | 31.1 | 18.8 | 4.7 | - |
| [DML](#42003) | WACV-2020 | 35.2 | - | - | - |
| [W-TALC](#41804) | ECCV-2018 | 37.0 | - | - | 18.0 |
| [3C-Net](#41906) | ICCV-2019 | 37.2 | - | - | - |
| [CMCS](#41902) | CVPR-2019 | 36.8 | 22.0 | 5.6 | 22.4 |
| [RefineLoc](#42106) | WACV-2021 | 38.7 | 22.6 | 5.5 | 23.2 |
| [RPN](#42005) | AAAI-2020 | 37.6 | 23.9 | 5.4 | 23.3 |
| [CleanNet](#41903) | ICCV-2019 | 40.5 | 22.3 | 5.2 | 23.4 |
| [TSCN](#42008) | ECCV-2020 | 37.6 | 23.7 | 5.7 | 23.6 |
| [ACSNet](#42110) | AAAI-2021 | 36.3 | 24.2 | 5.8 | 23.9 |
| [BaSNet](#42004) | AAAI-2020 | 38.5 | 24.2 | 5.6 | 24.3 |
| [ActionBytes](#42007) | CVPR-2020 | 39.4 | - | - | - |
| [EM-MIL](#42011) | ECCV-2020 | 37.4 | - | - | - |
| [TCA](#42012) | CVPR-2020 | 40.0 | 25.0 | 4.6 | 24.6 |
| [HAM-Net](#42101) | AAAI-2021 | 41.0 | 24.8 | 5.3 | 25.1 |
| [AUMN](#42108) | CVPR-2021 | 42.0 | 25.0 | 5.6 | 25.5 |
| [UM-Net](#42105) | AAAI-2021 | 41.2 | 25.6 | 6.0 | 25.9 |
| [CoLA](#42104) | CVPR-2021 | 42.7 | 25.7 | 5.8 | 26.1 |
| [$\mathrm{CO_2-Net}$](#42102) | ACM MM-2021 | 43.3 | 26.3 | 5.2 | 26.4 |
| [CSCL](#42107) | ACM MM-2021 | 43.8 | 26.9 | 5.6 | 26.9 |
| [BackTAL](#42111) | TPAMI-2021 | 41.5 | 27.3 | 4.7 | 27.0 |

| Method | Conference | IoU=0.5 | IoU=0.75 | IoU=0.95 | IoU=Avg |
| :-------------------: | :-----------: | :---------: | :----------: | :----------: | :---------: |
| [AMS](#42109) | arXiv | 40.7 | 23.7 | 5.8 | 24.6 |
| [ActShufNet](#42103) | arXiv | 41.2 | 24.9 | 5.9 | 25.0 |