https://github.com/DirtyHarryLYL/HOI-Learning-List
A list of Human-Object Interaction Learning.
- Host: GitHub
- URL: https://github.com/DirtyHarryLYL/HOI-Learning-List
- Owner: DirtyHarryLYL
- Created: 2020-03-11T16:51:25.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2025-01-30T10:45:30.000Z (3 months ago)
- Last Synced: 2025-03-09T07:25:35.252Z (about 1 month ago)
- Topics: action-recognition, activity-recognition, behavior-analysis, human-object-interaction
- Homepage:
- Size: 327 KB
- Stars: 613
- Watchers: 37
- Forks: 58
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesomeai - Human-Object Interaction (HOI)
- awesome-ai-awesomeness - Human-Object Interaction (HOI)
README
# HOI-Learning-List
Some recent (2015-now) Human-Object Interaction Learning studies. If you find any errors or problems, please don't hesitate to let me know. For a list of Transformer-based vision works, see https://github.com/DirtyHarryLYL/Transformer-in-Vision.
## Image Dataset/Benchmark
- BRIGHT (arXiv 2025.1), re-balanced dataset based on HICO-DET [[Paper]](https://arxiv.org/pdf/2501.16724)
- CAL (arXiv 2024.10), a contact-driven affordance learning dataset [[Paper]](https://arxiv.org/pdf/2410.11363), [[Project]](https://github.com/lhc1224/VCR-Net)
- SynHOI (arXiv 2023.5), synthetic HOI data [[Paper]](https://arxiv.org/pdf/2305.12252.pdf)
- HICO-DET-SG, V-COCO-SG (new splits of HICO-DET and V-COCO) [[Paper]](https://arxiv.org/pdf/2305.09948.pdf), [[Code]](https://github.com/FujitsuResearch/hoi_sg)
- Bongard-HOI [[Paper]](https://arxiv.org/pdf/2205.13803.pdf), [[Code]](https://github.com/NVlabs/Bongard-HOI)
- SWiG-HOI, [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Wang_Discovering_Human_Interactions_With_Large-Vocabulary_Objects_via_Query_and_Multi-Scale_ICCV_2021_paper.pdf), [[Website]](https://github.com/scwangdyd/large_vocabulary_hoi_detection)
- New Metric: mPD, [[Paper]](https://arxiv.org/pdf/2202.09492.pdf), [[Code]](https://github.com/Foruck/OC-Immunity)
- DIABOLO [[Paper]](https://arxiv.org/pdf/2201.02396.pdf), [[Website]](https://kalisteo.cea.fr/)
- HOI-COCO (CVPR2021) [[Website]](https://github.com/zhihou7/HOI-CL)
- PaStaNet-HOI (TPAMI2021) [[Benchmark]](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network/tree/master/PaStaNet-HOI_Benckmark)
- HAKE (CVPR2020) [[YouTube]](https://t.co/hXiAYPXEuL?amp=1) [[bilibili]](https://www.bilibili.com/video/BV1s54y1Y76s) [[Website]](http://hake-mvig.cn/home/) [[Paper]](https://arxiv.org/pdf/2004.00945.pdf) [[HAKE-Action-Torch]](https://github.com/DirtyHarryLYL/HAKE-Action-Torch) [[HAKE-Action-TF]](https://github.com/DirtyHarryLYL/HAKE-Action)
- Ambiguous-HOI (CVPR2020) [[Website]](https://github.com/DirtyHarryLYL/DJ-RN) [[Paper]](https://arxiv.org/pdf/2004.08154.pdf)
- HICO-DET (WACV2018) [[Website]](http://www-personal.umich.edu/~ywchao/hico/) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_wacv2018.pdf)
- HCVRD (AAAI2018) [[Website]](https://bitbucket.org/jingruixiaozhuang/hcvrd-a-benchmark-for-large-scale-human-centered-visual/src/master/) [[Paper]](https://pdfs.semanticscholar.org/c94f/1aaf62f87d97dd579cb6451cb9149fb4967d.pdf)
- V-COCO (May 2015) [[Website]](https://github.com/s-gupta/v-coco) [[Paper]](https://arxiv.org/pdf/1505.04474.pdf)
- HICO (ICCV2015) [[Website]](http://www-personal.umich.edu/~ywchao/hico/) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_iccv2015.pdf)
- Visual Genome [[Website]](https://visualgenome.org/) [[Paper]](https://arxiv.org/abs/1602.07332)
- PIC [[Website]](http://picdataset.com/challenge/index/)
More...
### Video HOI Datasets
- MOMA [[Paper]](https://proceedings.neurips.cc/paper/2021/file/95688ba636a4720a85b3634acfec8cdd-Paper.pdf), [[Project]](https://github.com/StanfordVL/moma)
- MPHOI-72 (RGB-D), [[Paper]](https://arxiv.org/pdf/2207.09425.pdf), [[Project]](https://github.com/tanqiu98/2G-GCN)
- VidHOI [[Paper]](https://arxiv.org/pdf/2105.11731.pdf), [[Project]](https://github.com/coldmanck/VidHOI)
- AVA [[Website]](http://research.google.com/ava/), HOIs (human-object, human-human), and pose (body motion) actions
- Action Genome [[Website]](https://www.actiongenome.org/), action verbs and spatial relationships
- CAD120 [[Paper]](https://arxiv.org/pdf/1210.1207.pdf), [[Website]](http://pr.cs.cornell.edu/humanactivities/)
- Sth-else [[Paper]](https://arxiv.org/abs/1912.09930), [[Website]](https://github.com/joaanna/something_else)
### 3D HOI Datasets
- ParaHome, [[Paper]](https://arxiv.org/pdf/2401.10232.pdf), [[Project]](https://jlogkim.github.io/parahome/)
- BallPlay, [[Paper]](https://arxiv.org/pdf/2312.04393.pdf), [[Project]](https://wyhuai.github.io/physhoi-page/)
- COINS, [[Paper]](https://drive.google.com/file/d/1LpJe1RiDsB49tQwUFWMUorjBTQfvXzvW/view?usp=sharing), [[Project]](https://zkf1997.github.io/COINS/index.html)
- COUCH, [[Paper]](https://arxiv.org/pdf/2205.00541.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/couch/)
- HULC, [[Paper]](https://vcai.mpi-inf.mpg.de/projects/HULC/data/paper_light.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/HULC/)
- CHAIRS, [[Paper]](https://arxiv.org/pdf/2212.10621.pdf), [[Project]](https://jnnan.github.io/project/chairs/)
- GRAB, [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123490562.pdf), [[Project]](https://grab.is.tue.mpg.de/)
- HUMANISE, [[Paper]](https://silvester.wang/HUMANISE/paper.pdf), [[Project]](https://silvester.wang/HUMANISE/)
- BEHAVE, [[Paper]](https://arxiv.org/pdf/2204.06950.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/behave/)
- GraviCap, [[Paper]](https://arxiv.org/pdf/2108.08844.pdf), [[Project]](https://4dqv.mpi-inf.mpg.de/GraviCap/)
## Survey
- Human object interaction detection: Design and survey (Image and Vision Computing 2022), [[Paper]](https://www.sciencedirect.com/science/article/abs/pii/S0262885622002463)
## Method
### HOI Image Generation
- AnchorCrafter (arXiv 2024) [[Paper]](https://arxiv.org/pdf/2411.17383), [[Project]](https://cangcz.github.io/Anchor-Crafter/)
- ReCorD (ACM MM 2024) [[Paper]](https://arxiv.org/pdf/2407.17911), [[Project]](https://alberthkyhky.github.io/ReCorD/)
- InteractDiffusion (CVPR 2024) [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/html/Hoe_InteractDiffusion_Interaction_Control_in_Text-to-Image_Diffusion_Models_CVPR_2024_paper.html), [[Project]](https://jiuntian.github.io/interactdiffusion/)
- Person in Place (CVPR 2024) [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Person_in_Place_Generating_Associative_Skeleton-Guidance_Maps_for_Human-Object_Interaction_CVPR_2024_paper.pdf), [[Code]](https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE)
- VirtualModel (arXiv 2024.5) [[Paper]](https://arxiv.org/pdf/2405.09985) [[Code]](https://aigcdesigngroup.github.io/replace-anything/)
- Exploiting Relationship for Complex-scene Image Generation (arXiv 2021.04) [[Paper]](https://arxiv.org/pdf/2104.00356.pdf)
- Specifying Object Attributes and Relations in Interactive Scene Generation (arXiv 2019.11) [[Paper]](https://arxiv.org/pdf/1909.05379.pdf)
### HOI Recognition: Image-based, to recognize all the HOIs in one image.
- RAM++ (arXiv'23) [[Paper]](https://arxiv.org/pdf/2310.15200.pdf), [[Code]](https://github.com/xinyu1205/recognize-anything)
- OpenTAP (ECCV'22) [[Paper]](https://link.springer.com/content/pdf/10.1007/978-3-031-19806-9_12.pdf), [[Code]](https://vkhoi.github.io/TAP)
- RelViT (ICLR'22) [[Paper]](https://arxiv.org/pdf/2204.11167.pdf), [[Code]](https://github.com/NVlabs/RelViT)
- DEFR (arXiv 2021.12) [[Paper]](https://arxiv.org/pdf/2112.06392.pdf)
- Interaction Compass (ICCV 2021) [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Huynh_Interaction_Compass_Multi-Label_Zero-Shot_Learning_of_Human-Object_Interactions_via_Spatial_ICCV_2021_paper.pdf)
- DEFR-CLIP (arXiv 2021.07) [[Paper]](https://arxiv.org/pdf/2107.13083.pdf)
- PaStaNet: Toward Human Activity Knowledge Engine (CVPR2020) [[Code]](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action) [[Data]](https://github.com/DirtyHarryLYL/HAKE) [[Paper]](https://arxiv.org/pdf/2004.00945.pdf) [[YouTube]](https://t.co/hXiAYPXEuL?amp=1) [[bilibili]](https://www.bilibili.com/video/BV1s54y1Y76s)
- Pairwise (ECCV2018) [[Paper]](http://openaccess.thecvf.com/content_ECCV_2018/papers/Haoshu_Fang_Pairwise_Body-Part_Attention_ECCV_2018_paper.pdf)
- Attentional Pooling for Action Recognition (NIPS2017) [[Code]](https://github.com/rohitgirdhar/AttentionalPoolingAction) [[Paper]](https://arxiv.org/pdf/1711.01467.pdf)
- Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering (ECCV2016) [[Code]](https://uofi.box.com/s/yflrqbser1r5m3iez1satkprawmsouag) [[Paper]](https://arxiv.org/pdf/1604.04808.pdf)
- Contextual Action Recognition with R\*CNN (ICCV2015) [[Code]](https://github.com/gkioxari/RstarCNN) [[Paper]](https://arxiv.org/pdf/1505.01197.pdf)
- HOCNN (ICCV2015) [[Code]](https://github.com/ywchao/hico_benchmark) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_iccv2015.pdf)
- SGAP-Net (AAAI2020) [[Paper]](https://aaai.org/Papers/AAAI/2020GB/AAAI-JiZ.4799.pdf)
More...
#### Unseen or zero-shot learning (image-level recognition).
- HTS (ICIP 2023) [[Paper]](https://ieeexplore.ieee.org/abstract/document/10222927)
- ICompass (ICCV2021) [[Paper]](https://hbdat.github.io/pubs/iccv21_relation_direction_final.pdf), [[Code]](https://github.com/hbdat/iccv21_relational_direction)
- Compositional Learning for Human Object Interaction (ECCV2018) [[Paper]](http://openaccess.thecvf.com/content_ECCV_2018/papers/Keizo_Kato_Compositional_Learning_of_ECCV_2018_paper.pdf)
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [[Paper]](https://arxiv.org/pdf/2009.01039.pdf)
More...
#### HOI for Robotics.
- HOI4ABOT: Human-Object Interaction Anticipation for Assistive roBOTs (CoRL 2023) [[Paper]](https://openreview.net/forum?id=rYZBdBytxBx), [[Project]](https://evm7.github.io/HOI4ABOT_page/)
- Human–object interaction prediction in videos through gaze following (CVIU 2023) [[Paper]](https://arxiv.org/abs/2306.03597), [[Project]](https://evm7.github.io/HOIGaze-page/)
### HOI Detection: Instance-based, to detect the human-object pairs and classify the interactions.
- DSU (arXiv 2025), [[Paper]](https://arxiv.org/pdf/2501.11653), [[Project]](https://tau-vailab.github.io/Dynamic-Scene-Understanding/)
- EZ-HOI (NeurIPS 2024), [[Paper]](https://arxiv.org/pdf/2410.23904), [[Code]](https://github.com/ChelsieLei/EZ-HOI)
- HOIGen (ACM MM 2024), [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3664647.3680927), [[Code]](https://github.com/soberguo/HOIGen)
- DiffusionHOI (NeurIPS 2024), [[Paper]](https://arxiv.org/pdf/2410.20155), [[Code]](https://github.com/0liliulei/DiffusionHOI)
- CEFA (ACM MM 2024), [[Paper]](https://arxiv.org/pdf/2407.21438), [[Code]](https://github.com/LijunZhang01/CEFA)
- CO-HOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2410.15657)
- HOIGen (ACM MM 2024), [[Paper]](https://arxiv.org/pdf/2408.05974), [[Code]](https://github.com/soberguo/HOIGen)
- CMMP (ECCV 2024), [[Paper]](https://arxiv.org/pdf/2408.02484), [[Code]](https://github.com/ltttpku/CMMP)
- CycleHOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2407.11433)
- GeoHOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2406.18691), [[Code]](https://github.com/zhumanli/GeoHOI)
- SICHOI (CVPR 2024), [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)
- Pose-Aware (CVPR 2024), [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)
- BCOM (CVPR 2024), [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)
- MP-HOI (CVPR 2024), [[Paper]](https://arxiv.org/pdf/2406.07221), [[Project]](https://mp-hoi.github.io/)
- DP-HOI (CVPR 2024), [[Paper]](https://arxiv.org/pdf/2404.01725.pdf), [[Code]](https://github.com/xingaoli/DP-HOI)
- DPADN (AAAI 2024), [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/27949), [[Code]](https://github.com/PRIS-CV/DPADN)
- KI2HOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2403.07246.pdf)
- SCTC (AAAI 2024), [[Paper]](https://arxiv.org/pdf/2401.05676.pdf)
- OBPA-Net (PRCV 2023), [[Paper]](https://link.springer.com/chapter/10.1007/978-981-99-8555-5_30), [[Code]](https://github.com/zhuang1iu/OBPA-NET)
- MLKD (WACV2024), [[Paper]](https://arxiv.org/pdf/2309.05069.pdf), [[Code]](https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP)
- SBM (PRCV2023), [[Paper]](https://link.springer.com/chapter/10.1007/978-981-99-8429-9_5)
- UnionDet (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2312.12664.pdf)
- SCA (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2312.01713.pdf)
- HCVC (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2311.16475.pdf)
- GFIN (NN 2023), [[Paper]](https://www.sciencedirect.com/science/article/pii/S0893608023006251?via%3Dihub#fig1)
- LOGICHOI (NeurIPS 2023), [[Paper]](https://openreview.net/pdf?id=QjI36zxjbW), [[Code]](https://github.com/weijianan1/LogicHOI)
- CLIP4HOI (NeurIPS 2023), [[Paper]](https://openreview.net/pdf?id=nqIIWnwe73)
- UniHOI (NeurIPS 2023), [[Paper]](https://arxiv.org/pdf/2311.03799.pdf), [[Code]](https://github.com/Caoyichao/UniHOI)
- SG2HOI (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2311.01755.pdf)
- SQAB (Displays 2023), [[Paper]](https://www.sciencedirect.com/science/article/pii/S0141938223002044?via%3Dihub#tbl1)
- Multi-Step (ACMMM 2023), [[Paper]](https://dl.acm.org/doi/10.1145/3581783.3612581)
- PDN (PR 2023), [[Paper]](https://www.sciencedirect.com/science/article/pii/S0031320323007185?via%3Dihub)
- ICDT (ICANN 2023), [[Paper]](https://link.springer.com/content/pdf/10.1007/978-3-031-44223-0_35.pdf), [[Code]](https://github.com/bingnanG/ICDT)
- ScratchHOI (ICIP 2023), [[Paper]](https://ieeexplore.ieee.org/abstract/document/10222323)
- ADA-CM (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2309.03696.pdf), [[Code]](https://github.com/ltttpku/ADA-CM)
- HODN (TMM 2023), [[Paper]](https://arxiv.org/pdf/2308.10158.pdf)
- RLIPv2 (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.09351.pdf), [[Code]](https://github.com/JacobYuan7/RLIPv2)
- AGER (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.08370.pdf), [[Code]](https://github.com/six6607/AGER)
- Diagnosing Human-object Interaction Detectors (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2308.08529.pdf), [[Code]](https://github.com/neu-vi/Diag-HOI)
- Compo (ICME 2023), [[Paper]](https://arxiv.org/pdf/2308.05961.pdf)
- PViC (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.06202.pdf), [[Code]](https://github.com/fredzzhang/pvic)
- VIL (ACM MM 2023), [[Paper]](https://arxiv.org/pdf/2308.02606.pdf)
- RmLR (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2307.13529.pdf)
- PSN (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2307.10499.pdf)
- SOV-STG (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2307.02291.pdf), [[Code]](https://github.com/cjw2021/SOV-STG)
- Shikra (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2306.15195.pdf), [[Code]](https://github.com/shikras/shikra)
- HOKEM (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2306.14260.pdf)
- SQA (ICASSP 2023), [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10096029), [[Code]](https://github.com/nmbzdwss/SQA)
- OpenCat (CVPR 2023), [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)
- DiffHOI (arXiv 2023.5), [[Paper]](https://arxiv.org/pdf/2305.12252.pdf)
- ViPLO (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2304.08114.pdf), [[Code]](https://github.com/Jeeseung-Park/ViPLO)
- MUREN (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2304.04997.pdf), [[Project]](http://cvlab.postech.ac.kr/research/MUREN/)
- HOICLIP (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2303.15786.pdf), [[Code]](https://github.com/Artanic30/HOICLIP)
- CQL (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2303.14005.pdf), [[Code]](https://github.com/charles-xie/CQL)
- UniVRD (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2303.08998.pdf)
- SKGHOI (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2303.04253.pdf)
- Weakly-HOI-CLIP (ICLR 2023), [[Paper]](https://arxiv.org/pdf/2303.01313.pdf), [[Code]](https://github.com/bobwan1995/Weakly-HOI)
- FGAHOI (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2301.04019.pdf), [[Code]](https://github.com/xiaomabufei/FGAHOI)
- PR-Net (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2301.03510.pdf)
- PQNet (MMAsia 2022), [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3551626.3564944)
- MHOI (TCSVT 2022), [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9927451)
- RLIP (NeurIPS 2022), [[Paper]](https://arxiv.org/pdf/2209.01814.pdf), [[Code]](https://github.com/JacobYuan7/RLIP)
- PartMap (ECCV2022), [[Paper]](https://arxiv.org/pdf/2207.14192v1.pdf), [[Code]](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)
- K-BAN (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2207.07979.pdf)
- SGCN4HOI (IEEE SMC 2022), [[Paper]](https://arxiv.org/pdf/2207.05733.pdf)
- HQM (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2207.05293.pdf), [[Code]](https://github.com/MuchHair/HQM)
- ODM (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2207.02400.pdf)
- SDT (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2207.01869.pdf)
- DOQ (CVPR 2022), [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Qu_Distillation_Using_Oracle_Queries_for_Transformer-Based_Human-Object_Interaction_Detection_CVPR_2022_paper.pdf), [[Code]](https://github.com/SherlockHolmes221/DOQ)
- STIP (CVPR 2022), [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_Exploring_Structure-Aware_Transformer_Over_Interaction_Proposals_for_Human-Object_Interaction_Detection_CVPR_2022_paper.pdf)
- DT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.09290.pdf)
- IF (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.07718.pdf), [[Code]](https://github.com/Foruck/Interactiveness-Field)
- CPC (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.04836.pdf), [[Code]](https://github.com/mlvlab/CPChoi)
- CATN (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.04911.pdf)
- SSRT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.00746.pdf)
- GEN-VLKT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2203.13954.pdf), [[Code]](https://github.com/YueLiao/gen-vlkt)
- MSTR (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2203.14709.pdf)
- Iwin (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2203.10537.pdf)
- RGBM (arXiv 2022.2), [[Paper]](https://arxiv.org/pdf/2202.11998.pdf)
- GPV-2 (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2202.02317.pdf), [[Project]](https://prior.allenai.org/projects/gpv2)
- OC-Immunity (AAAI 2022) [[Paper]](https://arxiv.org/pdf/2202.09492.pdf), [[Code]](https://github.com/Foruck/OC-Immunity)
- OCN (AAAI 2022) [[Paper]](https://arxiv.org/pdf/2202.00259.pdf), [[Code]](https://github.com/JacobYuan7/OCN-HOI-Benchmark)
- QAHOI (arXiv 2021) [[Paper]](https://arxiv.org/pdf/2112.08647.pdf), [[Code]](https://github.com/cjw2021/QAHOI)
- PhraseHOI (AAAI 2022) [[Paper]](https://arxiv.org/pdf/2112.07383.pdf)
- DEFR (arXiv 2021.12) [[Paper]](https://arxiv.org/pdf/2112.06392.pdf)
- UPT (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2112.01838.pdf), [[Code]](https://github.com/fredzzhang/upt)
- HRNet (TIP 2021) [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9552553)
- ACP++ (TIP 2021) [[Paper]](https://arxiv.org/pdf/2109.04047.pdf), [[Code]](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)
- SG2HOI (ICCV 2021) [[Paper]](https://arxiv.org/pdf/2108.08584.pdf)
- CDN (NeurIPS 2021) [[Paper]](https://arxiv.org/pdf/2108.05077.pdf), [[Code]](https://github.com/YueLiao/CDN)
- GTNet (arXiv 2021.8) [[Paper]](https://arxiv.org/pdf/2108.00596.pdf), [[Code]](https://github.com/UCSB-VRL/GTNet)
- HOI-MO-Net (IVC 2021) [[Paper]](https://www.sciencedirect.com/science/article/pii/S0262885621001670?via%3Dihub#tbl0005)
- IPGN (TIP 2021.7) [[Paper]](https://ieeexplore.ieee.org/document/9489275)
- SCG (ICCV 2021, SAG, v2) [[Paper]](https://arxiv.org/pdf/2012.06060.pdf), [[Code]](https://github.com/fredzzhang/spatially-conditioned-graphs)
- Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior (arXiv) [[Paper]](https://arxiv.org/pdf/2105.03089.pdf)
- PST (ICCV2021) [[Paper]](https://arxiv.org/pdf/2105.02170.pdf)
- RR-Net (arXiv 2021.5) [[Paper]](https://arxiv.org/pdf/2104.15015.pdf)
- HOTR (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.13682.pdf), [[Code]](https://github.com/kakaobrain/HOTR)
- GGNet (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.05269.pdf), [[Code]](https://github.com/SherlockHolmes221/GGNet)
- ATL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.02867.pdf), [[Code]](https://github.com/zhihou7/HOI-CL)
- FCL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.08214.pdf), [[Code]](https://github.com/zhihou7/FCL)
- AS-Net (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.05983.pdf), [[Code]](https://github.com/yoyomimi/AS-Net)
- End-to-End Human Object Interaction Detection with HOI Transformer (CVPR2021), [[Paper]](https://arxiv.org/pdf/2103.04503.pdf), [[Code]](https://github.com/bbepoch/HoiTransformer)
- QPIC (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.05399.pdf), [[Code]](https://github.com/hitachi-rd-cv/qpic)
- TIN (TPAMI2021) [[Paper]](https://arxiv.org/pdf/2101.10292.pdf), [[Code]](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)
- IDN (NeurIPS2020) [[Paper]](https://arxiv.org/pdf/2010.16219.pdf) [[Code]](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))
- DIRV (AAAI2021) [[Paper]](https://arxiv.org/pdf/2010.01005.pdf)
- DecAug (AAAI2021) [[Paper]](https://arxiv.org/pdf/2010.01007.pdf)
- OSGNet (IEEE Access) [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9360596)
- PFNet (CVM) [[Paper]](https://link.springer.com/content/pdf/10.1007/s41095-020-0188-2.pdf)
- UniDet (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600494.pdf)
- DRG (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123570681.pdf) [[Code]](https://github.com/vt-vl-lab/DRG)
- FCMNet (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590239.pdf)
- Contextual Heterogeneous Graph Network for Human-Object Interaction Detection (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620239.pdf)
- PD-Net (ECCV2020) [[Paper-1]](https://www.researchgate.net/publication/343536295_Polysemy_Deciphering_Network_for_Human-Object_Interaction_Detection) [[Paper-2]](https://arxiv.org/pdf/2008.02918.pdf) [[Code]](https://github.com/MuchHair/PD-Net)
- VCL (ECCV2020) [[Paper]](https://arxiv.org/pdf/2007.12407.pdf) [[Code]](https://github.com/zhihou7/VCL)
- ACP (ECCV2020) [[Paper]](https://arxiv.org/pdf/2007.08728.pdf) [[Code]](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)
- ConsNet (ACMMM2020) [[Paper]](https://arxiv.org/pdf/2008.06254.pdf) [[Code]](https://github.com/yeliudev/ConsNet), **HICO-DET Python API**: A general Python toolkit for the HICO-DET dataset, including APIs for data loading & processing, human-object pair IoU & NMS calculation, and standard evaluation (a minimal pair-IoU sketch is given after this list). [[Code]](https://github.com/yeliudev/ConsNet) [[Documentation]](https://consnet.readthedocs.io/)
- Action-Guided Attention Mining and Relation Reasoning Network for Human-Object Interaction Detection (IJCAI2020) [[Paper]](https://www.ijcai.org/Proceedings/2020/0154.pdf)
- PaStaNet (CVPR2020) [[Code]](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action) [[Data]](https://github.com/DirtyHarryLYL/HAKE) [[Paper]](https://arxiv.org/pdf/2004.00945.pdf) [[YouTube]](https://t.co/hXiAYPXEuL?amp=1) [[bilibili]](https://www.bilibili.com/video/BV1s54y1Y76s)
- DJ-RN (CVPR2020) [[Code]](https://github.com/DirtyHarryLYL/DJ-RN) [[Paper]](https://arxiv.org/pdf/2004.08154.pdf)
- Cascaded Human-Object Interaction Recognition (CVPR2020) [[Code]](https://github.com/tfzhou/C-HOI) [[Paper]](https://arxiv.org/pdf/2003.04262.pdf)
- PPDM (CVPR2020) [[Code]](https://github.com/YueLiao/PPDM) [[Paper]](https://arxiv.org/pdf/1912.12898.pdf)
- IP-Net (CVPR2020) [[Code]](https://github.com/vaesl/IP-Net) [[Paper]](https://arxiv.org/pdf/2003.14023.pdf)
- VSGNet (CVPR2020) [[Code]](https://github.com/ASMIftekhar/VSGNet) [[Paper]](https://arxiv.org/pdf/2003.05541.pdf)
- HOID (CVPR2020) [[Code]](https://github.com/scwangdyd/zero_shot_hoi) [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2020/05225.pdf)
- Diagnosing Rarity in Human-Object Interaction Detection (CVPRW2020) [[Paper]](https://arxiv.org/pdf/2006.05728.pdf)
- MLCNet (ICMR2020) [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3372278.3390671)
- SIGN (ICME2020) [[Paper]](https://ieeexplore.ieee.org/ielx7/9099125/9102711/09102755.pdf)
- In-GraphNet (IJCAI-PRICAI 2020) [[Paper]](https://arxiv.org/pdf/2007.06925.pdf)
- PMFNet(ICCV2019) [[Code]](https://github.com/bobwan1995/PMFNet) [[Paper]](https://arxiv.org/abs/1909.08453)
- No-Frills (ICCV2019) [[Code]](https://github.com/BigRedT/no_frills_hoi_det) [[Paper]](http://tanmaygupta.info/assets/img/no_frills/paper.pdf)
- Analogy (ICCV2019) [[Code]](https://github.com/jpeyre/analogy) [[Paper]](https://www.di.ens.fr/willow/research/analogy/paper.pdf)
- RPNN (ICCV2019) [[Paper]](http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhou_Relation_Parsing_Neural_Network_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)
- Deep Contextual Attention for Human-Object Interaction Detection (ICCV2019) [[Paper]](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Contextual_Attention_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)
- Interactiveness (CVPR2019) [[Code]](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network) [[Paper]](https://arxiv.org/pdf/1811.08264.pdf)
- Turbo (AAAI2019) [[Paper]](https://arxiv.org/pdf/1903.06355.pdf)
- GPNN (ECCV2018) [[Code]](https://github.com/SiyuanQi/gpnn) [[Paper]](https://arxiv.org/pdf/1808.07962.pdf)
- iCAN (BMVC2018) [[Code]](https://github.com/vt-vl-lab/iCAN) [[Paper]](https://arxiv.org/pdf/1808.10437.pdf)
- InteractNet (CVPR2018) [[Paper]](https://arxiv.org/pdf/1704.07333.pdf)
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [[Paper]](http://vision.stanford.edu/pdf/shen2018wacv.pdf)
- HO-RCNN (WACV2018) [[Code]](https://github.com/ywchao/ho-rcnn) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_wacv2018.pdf)
- VS-GATs (Mar. 2020) [[Paper]](https://arxiv.org/pdf/2001.02302.pdf)
- Classifying All Interacting Pairs in a Single Shot (Jan. 2020) [[Paper]](https://arxiv.org/pdf/2001.04360.pdf)
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May. 2020) [[Paper]](https://arxiv.org/pdf/2005.11406.pdf)
- PMN (Jul. 2020) [[Paper]](https://arxiv.org/pdf/2008.02042.pdf) [[Code]](https://github.com/birlrobotics/PMN)
- SAG (Dec 2020) [[Paper]](https://arxiv.org/pdf/2012.06060.pdf) [[Code]](https://github.com/fredzzhang/spatio-attentive-graphs)
- SABRA (Dec 2020) [[Paper]](https://arxiv.org/pdf/2012.12510.pdf)
More...
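The HICO-DET-style evaluation mentioned in the ConsNet entry above matches a predicted human-object pair to a ground-truth pair of the same HOI category only when both the human box and the object box reach IoU ≥ 0.5 with their ground-truth counterparts. Below is a minimal sketch of that pair-IoU check in plain NumPy; it is illustrative only and is not the ConsNet API.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def pair_iou(pred_pair, gt_pair):
    """Pair IoU = min(human-box IoU, object-box IoU).

    Requiring this value to be >= 0.5 is equivalent to requiring both the
    human and the object box to individually pass the 0.5 threshold.
    """
    h_iou = box_iou(pred_pair[0], gt_pair[0])
    o_iou = box_iou(pred_pair[1], gt_pair[1])
    return min(h_iou, o_iou)

# Illustrative boxes: a predicted (human, object) pair vs. a ground-truth pair.
pred = (np.array([10, 10, 110, 210]), np.array([120, 40, 220, 140]))
gt = (np.array([12, 8, 112, 205]), np.array([118, 45, 225, 150]))
print(pair_iou(pred, gt) >= 0.5)  # True -> counted as a correct detection
```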
#### Unseen or zero/low-shot or weakly-supervised learning (instance-level detection).
- HOIGen (ACM MM 2024), [[Paper]](https://arxiv.org/pdf/2408.05974), [[Code]](https://github.com/soberguo/HOIGen)
- CMD-SE (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2404.06194.pdf), [[Code]](https://github.com/ltttpku/CMD-SE-release)
- CLIP4HOI (NeurIPS 2023), [[Paper]](https://openreview.net/pdf?id=nqIIWnwe73)
- UniHOI (NeurIPS 2023), [[Paper]](https://arxiv.org/pdf/2311.03799.pdf), [[Code]](https://github.com/Caoyichao/UniHOI)
- Lu et al. (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2309.05069.pdf), [[Code]](https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP)
- CDT (TNNLS 2023), [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10242152)
- RLIPv2 (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.09351.pdf), [[Code]](https://github.com/JacobYuan7/RLIPv2)
- Unal et al. (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2303.05546.pdf)
- RLIP (NeurIPS 2022), [[Paper]](https://arxiv.org/pdf/2209.01814.pdf), [[Code]](https://github.com/JacobYuan7/RLIP)
- THID (CVPR 2022), [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2022/CVPR2022_4126.pdf), [[Code]](https://github.com/scwangdyd/promting_hoi)
- EoID (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2204.03541.pdf), [[Code]](https://github.com/mrwu-mac/EoID)
- SCL (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2203.14272.pdf), [[Code]](https://github.com/zhihou7/HOI-CL)
- GEN-VLKT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2203.13954.pdf), [[Code]](https://github.com/YueLiao/gen-vlkt)
- OC-Immunity (AAAI 2022), [[Paper]](https://arxiv.org/pdf/2202.09492.pdf), [[Code]](https://github.com/Foruck/OC-Immunity)
- Align-Former (BMVC 2021), [[Paper]](https://arxiv.org/pdf/2112.00492.pdf)
- CHOID (ICCV2021) [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2021/ICCV2021_sucheng.pdf), [[Code]](https://github.com/scwangdyd/large_vocabulary_hoi_detection)
- DGIG-Net (TOC2021) [[Paper]](https://ieeexplore.ieee.org/abstract/document/9352497)
- ATL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.02867.pdf), [[Code]](https://github.com/zhihou7/HOI-CL)
- FCL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.08214.pdf), [[Code]](https://github.com/zhihou7/FCL)
- Detecting Human-Object Interaction with Mixed Supervision (WACV 2021) [[Paper]](https://arxiv.org/pdf/2011.04971v1.pdf)
- ConsNet (ACMMM2020) [[Paper]](https://arxiv.org/pdf/2008.06254.pdf) [[Code]](https://github.com/yeliudev/ConsNet)
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [[Paper]](https://arxiv.org/pdf/2009.01039.pdf)
- VCL (ECCV2020) [[Paper]](https://arxiv.org/pdf/2007.12407.pdf) [[Code]](https://github.com/zhihou7/VCL)
- HOID (CVPR2020) [[Code]](https://github.com/scwangdyd/zero_shot_hoi) [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2020/05225.pdf)
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May. 2020) [[Paper]](https://arxiv.org/pdf/2005.11406.pdf)
- Analogy (ICCV2019) [[Code]](https://github.com/jpeyre/analogy) [[Paper]](https://www.di.ens.fr/willow/research/analogy/paper.pdf)
- Functional (AAAI2020) [[Paper]](https://arxiv.org/pdf/1904.03181.pdf)
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (2018) [[Paper]](http://vision.stanford.edu/pdf/shen2018wacv.pdf)
More...
### Video HOI methods
- SPDTP (arXiv, Jun 2022), [[Paper]](https://arxiv.org/pdf/2206.03061.pdf)
- V-HOI (arXiv, Jun 2022), [[Paper]](https://arxiv.org/pdf/2206.01908.pdf)
- Detecting Human-Object Relationships in Videos (ICCV2021) [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Ji_Detecting_Human-Object_Relationships_in_Videos_ICCV_2021_paper.pdf)
- STIGPN (Aug 2021), [[Paper]](https://arxiv.org/pdf/2108.08633.pdf), [[Code]](https://github.com/GuangmingZhu/STIGPN)
- VidHOI (May 2021), [[Paper]](https://arxiv.org/pdf/2105.11731.pdf)
- LIGHTEN (ACMMM2020) [[Paper]](https://www.cse.iitb.ac.in/~rdabral/docs/acm_lighten.pdf) [[Code]](https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI)
- Generating Videos of Zero-Shot Compositions of Actions and Objects (Jul 2020), HOI GAN, [[Paper]](https://arxiv.org/pdf/1912.02401.pdf)
- Grounded Human-Object Interaction Hotspots from Video (ICCV2019) [[Code]](https://github.com/Tushar-N/interaction-hotspots) [[Paper]](https://arxiv.org/pdf/1812.04558.pdf)
- GPNN (ECCV2018) [[Code]](https://github.com/SiyuanQi/gpnn) [[Paper]](https://arxiv.org/pdf/1808.07962.pdf)
More...
### 3D HOI Reconstruction/Generation/Understanding
- P3HAOI (AAAI 2024) [[Paper]](https://arxiv.org/pdf/2312.10714.pdf), [[Project]](https://mvig-rhos.com/p3haoi)
- IM-HOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.08869.pdf), [[Project]](https://afterjourney00.github.io/IM-HOI.github.io/)
- Ins-HOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.09641.pdf), [[Project]](https://jiajunzhang16.github.io/ins-hoi/)
- HDM (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.07063.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/procigen-hdm/)
- HOI-Diff (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.06553.pdf), [[Project]](https://neu-vi.github.io/HOI-Diff/)
- CHOIS (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.03913.pdf), [[Project]](https://lijiaman.github.io/projects/chois/)
- MOB (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.02700.pdf), [[Project]](https://foruck.github.io/occu-page/)
- PhysHOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.04393.pdf), [[Project]](https://wyhuai.github.io/physhoi-page/)
- physfullbody-grasp (3DV 2024) [[Paper]](https://arxiv.org/pdf/2309.07907.pdf), [[Project]](https://eth-ait.github.io/phys-fullbody-grasp/)
- MIME (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2212.04360.pdf), [[Project]](https://mime.is.tue.mpg.de/)
- TOHO (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2303.13129.pdf)
- GenZI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2311.17737.pdf), [[Project]](https://craigleili.github.io/projects/genzi/)
- CG-HOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2311.16097.pdf), [[Project]](https://www.christian-diller.de/projects/cg-hoi/)
- OMOMO (TOG 2023) [[Paper]](https://arxiv.org/pdf/2309.16237.pdf)
- HOPS (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2205.02830.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/hops/)
- UniHSI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2309.07918.pdf), [[Project]](https://xizaoqu.github.io/unihsi/)
- NIFTY (arXiv 2023) [[Paper]](https://nileshkulkarni.github.io/nifty/assets/paper.pdf), [[Project]](https://nileshkulkarni.github.io/nifty/)
- IMoS (EUROGRAPHICS 2023) [[Paper]](https://arxiv.org/pdf/2212.07555.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/IMoS/)
- SceneDiffuser (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2301.06015.pdf), [[Project]](https://scenediffuser.github.io/)
- Locomotion-Action-Manipulation (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2301.02667.pdf), [[Project]](https://jiyewise.github.io/projects/LAMA/)
- ROAM (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2308.12969.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/ROAM/)
- Object pop-up (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2306.00777.pdf), [[Code]](https://github.com/ptrvilya/object-popup)
- StackFLOW (IJCAI 2023) [[Paper]](https://www.ijcai.org/proceedings/2023/0100.pdf), [[Code]](https://github.com/huochf/StackFLOW)
- InterDiff (ICCV2023) [[Paper]](https://arxiv.org/pdf/2308.16905.pdf), [[Project]](https://sirui-xu.github.io/InterDiff/)
- Wang et al. (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2209.02485.pdf)
- Haresh et al. (3DV 2022) [[Paper]](https://arxiv.org/pdf/2209.05612.pdf), [[Project]](https://3dlg-hcvc.github.io/3dhoi/)
- COUCH (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2205.00541.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/couch/)
- HULC (ECCV 2022) [[Paper]](https://vcai.mpi-inf.mpg.de/projects/HULC/data/paper_light.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/HULC/)
- AROS (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2210.11725.pdf)
- MoCapDeform (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2208.08439.pdf)
- SUMMON (SIGGRAPH Asia 2022) [[Paper]](https://arxiv.org/pdf/2301.01424.pdf), [[Project]](https://lijiaman.github.io/projects/summon/)
- HUMANISE (arXiv 2022) [[Paper]](https://silvester.wang/HUMANISE/paper.pdf), [[Project]](https://silvester.wang/HUMANISE/)
- CHAIRS (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2212.10621.pdf), [[Project]](https://jnnan.github.io/project/chairs/)
- NeuralDome (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2212.07626.pdf)
- ARCTIC (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2204.13662.pdf), [[Project]](https://arctic.is.tue.mpg.de/)
- IMoS (EUROGRAPHICS 2023) [[Paper]](https://arxiv.org/pdf/2212.07555.pdf)
- COINS (ECCV 2022) [[Paper]](https://drive.google.com/file/d/1LpJe1RiDsB49tQwUFWMUorjBTQfvXzvW/view?usp=sharing), [[Project]](https://zkf1997.github.io/COINS/index.html)
- Pose2Room (ECCV 2022) [[Paper]](https://arxiv.org/pdf/2112.03030.pdf), [[Project]](https://yinyunie.github.io/pose2room-page/)
- RICH (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2206.09553.pdf), [[Project]](https://rich.is.tue.mpg.de/)
- MOVER (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2203.03609.pdf), [[Project]](https://mover.is.tue.mpg.de/)
- SAGA (ECCV 2022) [[Paper]](https://arxiv.org/abs/2112.10103), [[Project]](https://jiahaoplus.github.io/SAGA/saga.html)
- GOAL (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2112.11454.pdf), [[Project]](https://goal.is.tuebingen.mpg.de/)
- BEHAVE (CVPR 2022) [[Paper]](https://virtualhumans.mpi-inf.mpg.de/papers/bhatnagar22behave/behave.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/behave/)
- CHORE (ECCV 2022) [[Project]](https://virtualhumans.mpi-inf.mpg.de/chore/), [[Paper]](https://arxiv.org/pdf/2204.02445.pdf)
- POSA (CVPR 2021) [[Paper]](https://arxiv.org/pdf/2012.11581.pdf), [[Project]](https://posa.is.tue.mpg.de/)
- GraviCap (ICCV 2021) [[Paper]](https://arxiv.org/pdf/2108.08844.pdf), [[Project]](https://4dqv.mpi-inf.mpg.de/GraviCap/)
- D3D-HOI (arXiv 2021) [[Paper]](https://arxiv.org/pdf/2108.08420.pdf), [[Project]](https://github.com/facebookresearch/d3d-hoi)
- PSI (CVPR 2020) [[Paper]](https://ps.is.mpg.de/uploads_file/attachment/attachment/575/1912.02923.pdf), [[Code]](https://github.com/yz-cnsdqz/PSI-release)
- DJ-RN (CVPR 2020) [[Paper]](https://arxiv.org/pdf/2004.08154.pdf), [[Code]](https://github.com/DirtyHarryLYL/DJ-RN)
- PLACE (3DV 2020) [[Paper]](https://arxiv.org/pdf/2008.05570.pdf), [[Project]](https://sanweiliti.github.io/PLACE/PLACE.html)
- GRAB (ECCV 2020) [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123490562.pdf), [[Project]](https://grab.is.tue.mpg.de/)
- Holistic++ (ICCV 2019) [[Paper]](https://arxiv.org/pdf/1909.01507.pdf)
- PROX (ICCV 2019) [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Hassan_Resolving_3D_Human_Pose_Ambiguities_With_3D_Scene_Constraints_ICCV_2019_paper.pdf), [[Project]](https://prox.is.tue.mpg.de/)
## Result
### [PaStaNet-HOI](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network):
Proposed by TIN (TPAMI version, Transferable Interactiveness Network).
It is built on HAKE data and includes 110K+ images and 520 HOIs (excluding the 80 "no_interaction" HOIs of HICO-DET to avoid incomplete labeling).
It has a more severe long-tailed data distribution and is thus more difficult.
#### Detector: COCO pre-trained
|Method| mAP |
|:---:|:---:|
|iCAN|11.00|
|iCAN+NIS|13.13|
|[TIN](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| **15.38**|
### HICO-DET:
#### 1) Detector: COCO pre-trained
|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)| Full(ko) | Rare(ko) | Non-Rare(ko) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|[Shen et al.](http://vision.stanford.edu/pdf/shen2018wacv.pdf)| WACV2018 | 6.46 | 4.24 | 7.12| - | - | - |
|[HO-RCNN](http://www-personal.umich.edu/~ywchao/publications/chao_wacv2018.pdf)| WACV2018 | 7.81| 5.37| 8.54| 10.41| 8.94 | 10.85 |
|[InteractNet](https://arxiv.org/pdf/1704.07333.pdf)| CVPR2018 | 9.94| 7.16 | 10.77| - | - |-|
|[Turbo](https://arxiv.org/pdf/1903.06355.pdf)|AAAI2019|11.40| 7.30| 12.60|- | - |-|
|[GPNN](https://arxiv.org/pdf/1808.07962.pdf)| ECCV2018 | 13.11 | 9.34 | 14.23| - | - |-|
|[Xu et al.](https://www-users.cs.umn.edu/~qzhao/publications/pdf/xu2019cvpr.pdf)|CVPR2019|14.70 |13.26| 15.13|-|-|-|
|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 14.84| 10.45 | 16.15 | 16.26 | 11.33| 17.73 |
|[Wang et al.](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Contextual_Attention_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)|ICCV2019|16.24 |11.16| 17.75| 17.73| 12.78| 19.21|
|[Lin et al.](https://www.ijcai.org/Proceedings/2020/0154.pdf)|IJCAI2020|16.63 |11.30| 18.22| 19.22| 14.56| 20.61|
|[Functional](https://arxiv.org/pdf/1904.03181.pdf) (suppl)|AAAI2020|16.96| 11.73 |18.52| -|-|-|
|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 17.03 | 13.42| 18.11| 19.17| 15.51|20.26|
|[No-Frills](http://tanmaygupta.info/assets/img/no_frills/paper.pdf)| ICCV2019 | 17.18 |12.17| 18.68 |-|-|-|
|[RPNN](http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhou_Relation_Parsing_Neural_Network_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)|ICCV2019|17.35| 12.78| 18.71|-|-|-|
|[PMFNet](https://arxiv.org/pdf/1909.08453.pdf)| ICCV2019 | 17.46| 15.65| 18.00| 20.34| 17.47| 21.20|
|[SIGN](https://ieeexplore.ieee.org/ielx7/9099125/9102711/09102755.pdf)|ICME2020|17.51| 15.31 |18.53 |20.49| 17.53| 21.51|
|[Interactiveness-optimized](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| CVPR2019 | 17.54 |13.80 |18.65| 19.75| 15.70| 20.96|
|[Liu et al.](https://arxiv.org/pdf/2105.03089.pdf)|arXiv|17.55 |20.61|-|-|-|-|
|[Wang et al.](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620239.pdf)|ECCV2020|17.57 |16.85| 17.78| 21.00| 20.74| 21.08|
|[UnionDet](https://arxiv.org/pdf/2312.12664.pdf)|arXiv2023|17.58| 11.72 |19.33| 19.76| 14.68| 21.27|
|[In-GraphNet](https://arxiv.org/pdf/2007.06925.pdf)|IJCAI-PRICAI 2020|17.72 |12.93 |19.31|-|-|-|
|[HOID](https://github.com/scwangdyd/zero_shot_hoi)|CVPR2020| 17.85 |12.85 |19.34|-|-|-|
|[MLCNet](https://dl.acm.org/doi/pdf/10.1145/3372278.3390671)| ICMR2020| 17.95 |16.62 |18.35|22.28 |20.73 |22.74|
|[SAG](https://github.com/fredzzhang/spatio-attentive-graphs)|arXiv| 18.26 |13.40 |19.71|-|-|-|
|[Sarullo et al.](https://arxiv.org/pdf/2009.01039.pdf)|arXiv|18.74|-|-|-|-|-|
|[DRG](https://github.com/vt-vl-lab/DRG)|ECCV2020|19.26 |17.74 |19.71 |23.40 |21.75 |23.89|
|[Analogy](https://www.di.ens.fr/willow/research/analogy/paper.pdf)| ICCV2019 | 19.40 |14.60| 20.90|-|-|-|
|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|19.43 |16.55| 20.29| 22.00| 19.09| 22.87|
|[VS-GATs](https://arxiv.org/pdf/2001.02302.pdf)|arXiv|19.66 |15.79 |20.81|-|-|-|
|[VSGNet](https://github.com/ASMIftekhar/VSGNet)|CVPR2020|19.80 |16.05| 20.91|-|-|-|
|[PFNet](https://link.springer.com/content/pdf/10.1007/s41095-020-0188-2.pdf)|CVM|20.05 |16.66 |21.07| 24.01| 21.09| 24.89|
|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021 |20.08| 15.57| 21.43|-|-|-|
|[FCMNet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590239.pdf)|ECCV2020|20.41 |17.34| 21.56| 22.04 |18.97| 23.12|
|[ACP](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)|ECCV2020|20.59 |15.92| 21.98|-|-|-|
|[PD-Net](https://github.com/MuchHair/PD-Net)|ECCV2020|20.81 |15.90| 22.28| 24.78| 18.88| 26.54|
|[SG2HOI](https://arxiv.org/pdf/2108.08584.pdf)|ICCV2021|20.93 |18.24| 21.78| 24.83| 20.52| 25.32|
|[TIN-PAMI](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)|TPAMI2021|20.93|18.95| 21.32| 23.02| 20.96| 23.42|
|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021|21.07 |16.79| 22.35|-|-|-|
|[PMN](https://github.com/birlrobotics/PMN)|arXiv|21.21 |17.60| 22.29|-|-|-|
|[IPGN](https://ieeexplore.ieee.org/document/9489275)|TIP2021|21.26|18.47|22.07|-|-|-|
|[DJ-RN](https://github.com/DirtyHarryLYL/DJ-RN)| CVPR2020 | 21.34|18.53|22.18|23.69|20.64|24.60|
|[OSGNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9360596)|IEEE Access|21.40 |18.12| 22.38|-|-|-|
|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|21.48 |16.85| 22.86| 24.29| 19.09| 25.85|
|[SCG+ODM](https://arxiv.org/pdf/2207.02400.pdf)|ECCV2022|21.50| 17.59| 22.67|-|-|-|
|[DIRV](https://arxiv.org/pdf/2010.01005.pdf)| AAAI2021|21.78| 16.38| 23.39| 25.52| 20.84| 26.92|
|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021| 21.85| 18.11 |22.97|-|-|-|
|[HRNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9552553)|TIP2021|21.93 |16.30| 23.62| 25.22| 18.75| 27.15|
|[ConsNet](https://github.com/yeliudev/ConsNet)|ACMMM2020|22.15|17.55|23.52|26.57|20.8|28.3|
|[SKGHOI](https://arxiv.org/pdf/2303.04253.pdf)|arXiv2023|22.61 |15.87| 24.62|-|-|-|
|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|23.36|22.47|23.63|26.43|25.01|26.85|
|[QAHOI-Res50](https://github.com/cjw2021/QAHOI)|arXiv2021|24.35 |16.18| 26.80|-|-|-|
|[DOQ](https://github.com/SherlockHolmes221/DOQ)|CVPR2022|25.97 |26.09| 25.93|-|-|-|
|[STIP](https://github.com/zyong812/STIP)|CVPR2022|**28.81**| **27.55**| **29.18**| **32.28**| **31.07**| **32.64**|
#### 2) Detector: pre-trained on COCO, fine-tuned on HICO-DET train set (with GT human-object pair boxes) or one-stage detector (point-based, transformer-based)
The fine-tuned detector learns to **detect only the interactive humans and objects** (i.e., those with interactiveness), thus suppressing many wrong pairings (non-interactive human-object pairs) and boosting performance.
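A rough sketch of such a two-stage pipeline is given below: every detected human is paired with every detected object, each pair is scored for interactiveness, low-scoring (likely non-interactive) pairs are suppressed, and only the surviving pairs are passed to the verb classifier. The `detector`, `interactiveness_net`, and `hoi_classifier` callables and the threshold are placeholders, not any specific paper's implementation.

```python
from itertools import product

def detect_hois(image, detector, interactiveness_net, hoi_classifier, inter_thresh=0.1):
    """Two-stage HOI detection with interactiveness-based pair suppression (sketch)."""
    humans, objects = detector(image)       # each: list of (box, score) detections
    results = []
    for h, o in product(humans, objects):   # exhaustive human-object pairing
        s_inter = interactiveness_net(image, h, o)
        if s_inter < inter_thresh:          # non-interaction suppression:
            continue                        # drop likely wrong pairings early
        verb_scores = hoi_classifier(image, h, o)   # dict: verb -> score
        for verb, s_verb in verb_scores.items():
            # Final HOI score combines detection, interactiveness and verb scores.
            results.append((h[0], o[0], verb, h[1] * o[1] * s_inter * s_verb))
    return results
```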
|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)| Full(ko) | Rare(ko) | Non-Rare(ko) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|[UniDet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600494.pdf)|ECCV2020|17.58 |11.72 |19.33 |19.76 |14.68 |21.27|
|[IP-Net](https://arxiv.org/pdf/2003.14023.pdf) | CVPR2020| 19.56 |12.79| 21.58 |22.05 |15.77 |23.92|
|[RR-Net](https://arxiv.org/pdf/2104.15015.pdf)|arXiv|20.72 |13.21 |22.97| -|-|-|
|[PPDM](https://arxiv.org/pdf/1912.12898v1.pdf) (paper) |CVPR2020|21.10 |14.46| 23.09| -|-|-|
|[PPDM](https://github.com/YueLiao/PPDM) (github-hourglass104) |CVPR2020|21.73/21.94 |13.78/13.97 |24.10/24.32 |24.58/24.81| 16.65/17.09| 26.84/27.12|
|[Functional](https://arxiv.org/pdf/1904.03181.pdf) |AAAI2020|21.96 |16.43|23.62| -|-|-|
|[SABRA-Res50](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 23.48| 16.39| 25.59| 28.79| 22.75| 30.54|
|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|23.63 |17.21 |25.55 |25.98 |19.12 |28.03|
|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021| 23.67| 17.64| 25.47| 26.01| 19.60| 27.93|
|[PST](https://arxiv.org/pdf/2105.02170.pdf)|ICCV2021|23.93| 14.98| 26.60| 26.42| 17.61| 29.05|
|[SABRA-Res50FPN](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 24.12 |15.91| 26.57| 29.65| 22.92| 31.65|
|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021|24.50 |18.53| 26.28| 27.23| 21.27| 29.00|
|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|24.58 |20.33| 25.86| 27.89| 23.64| 29.16|
|[FCL](https://github.com/zhihou7/FCL)|CVPR2021|24.68| 20.03| 26.07| 26.80| 21.61| 28.35|
|[HOTR](https://github.com/kakaobrain/HOTR)|CVPR2021|25.10| 17.34| 27.42| -|-|-|
|[FCL+VCL](https://github.com/zhihou7/FCL)|CVPR2021|25.27| 20.57| 26.67| 27.71| 22.34| 28.93|
|[OC-Immunity](https://github.com/Foruck/OC-Immunity)|AAAI2022|25.44| 23.03| 26.16| 27.24| 24.32| 28.11|
|[ConsNet-F](https://github.com/yeliudev/ConsNet)|ACMMM2020|25.94|19.35|27.91|30.34|23.4|32.41|
|[SABRA-Res152](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 26.09 |16.29| 29.02| 31.08| 23.44| 33.37|
|[QAHOI-Res50](https://github.com/cjw2021/QAHOI)|arXiv2021|26.18 |18.06| 28.61|-|-|-|
|[Zou et al.](https://github.com/bbepoch/HoiTransformer)|CVPR2021|26.61 |19.15| 28.84| 29.13| 20.98| 31.57|
|[SKGHOI](https://arxiv.org/pdf/2303.04253.pdf)|arXiv2023|26.95 |21.28 |28.56| -|-|-|
|[RGBM](https://arxiv.org/pdf/2202.11998.pdf)|arXiv2022|27.39| 21.34 |29.20 |30.87 |24.20 |32.87|
|[GTNet](https://github.com/UCSB-VRL/GTNet)|arXiv|28.03 |22.73| 29.61| 29.98| 24.13| 31.73|
|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|28.83| 20.29| 31.31| 31.05| 21.41| 33.93|
|[AS-Net](https://github.com/yoyomimi/AS-Net)|CVPR2021|28.87 |24.25 |30.25 |31.74 |27.07|33.14|
|[QPIC-Res50](https://github.com/hitachi-rd-cv/qpic)|CVPR2021| 29.07 |21.85 |31.23 |31.68 |24.14 |33.93|
|[GGNet](https://github.com/SherlockHolmes221/GGNet)|CVPR2021|29.17 |22.13 |30.84 |33.50| 26.67 |34.89|
|[QPIC-CPC](https://arxiv.org/pdf/2204.04836.pdf)|CVPR2022|29.63 |23.14 |31.57|-|-|-|
|[QPIC-Res101](https://github.com/hitachi-rd-cv/qpic)|CVPR2021|29.90 |23.92 |31.69 |32.38 |26.06 |34.27|
|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021| 29.26 | 24.61 | 30.65 | 32.87 | 27.89 | 34.35 |
|[MHOI](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9927451)|TCSVT2022|29.67 |24.37 |31.25 |31.87 |27.28 |33.24|
|[PhraseHOI](https://arxiv.org/pdf/2112.07383.pdf)|AAAI2022|30.03 |23.48 |31.99 |33.74 |27.35 |35.64|
|[CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10242152) | TNNLS 2023|30.48|25.48|32.37|-|-|-|
|[SQAB](https://www.sciencedirect.com/science/article/pii/S0141938223002044?via%3Dihub#tbl1)|Displays2023|30.82 |24.92| 32.58| 33.58| 27.19| 35.49|
|[MSTR](https://arxiv.org/pdf/2203.14709.pdf)|CVPR2022|31.17| 25.31| 32.92| 34.02| 28.83| 35.57|
|[SSRT](https://arxiv.org/pdf/2204.00746.pdf)|CVPR2022|31.34 |24.31 |33.32|-|-|-|
|[OCN](https://github.com/JacobYuan7/OCN-HOI-Benchmark)|AAAI2022|31.43| 25.80| 33.11| -|-|-|
|[SCG+ODM](https://arxiv.org/pdf/2207.02400.pdf)|ECCV2022|31.65 |24.95| 33.65|-|-|-|
|[DT](https://arxiv.org/pdf/2204.09290.pdf)|CVPR2022|31.75| 27.45| 33.03| 34.50| 30.13| 35.81|
|[ParSe (COCO)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|31.79| 26.36 |33.41|-|-|-|
|[CATN (w/ Bert)](https://arxiv.org/pdf/2204.04911.pdf)|CVPR2022|31.86| 25.15| 33.84| 34.44| 27.69| 36.45|
|[SQA](https://github.com/nmbzdwss/SQA)|ICASSP2023|31.99 |29.88| 32.62| 35.12| 32.74| 35.84|
|[CDN](https://github.com/YueLiao/CDN)|NeurIPS2021|32.07| 27.19| 33.53| 34.79| 29.48| 36.38|
|[STIP](https://github.com/zyong812/STIP)|CVPR2022|32.22| 28.15| 33.43| 35.29| 31.43| 36.45|
|[DEFR](https://arxiv.org/pdf/2112.06392.pdf)|arXiv2021| 32.35 |33.45| 32.02|-|-|-|
|[PQNet-L](https://dl.acm.org/doi/pdf/10.1145/3551626.3564944)|MMAsia2022|32.45 |27.80 |33.84 |35.28 |30.72 |36.64|
|[CDN-s+HQM](https://arxiv.org/pdf/2207.05293.pdf)|ECCV2022|32.47| 28.15| 33.76|-|-|-|
|[UPT](https://github.com/fredzzhang/upt)|CVPR2022|32.62| 28.62| 33.81| 36.08| 31.41| 37.47|
|[OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)|CVPR2023|32.68 |28.42| 33.75|-|-|-|
|[Iwin](https://arxiv.org/pdf/2203.10537.pdf)|ECCV2022|32.79 |27.84| 35.40| 35.84| 28.74| 36.09|
|[RLIP-ParSe (VG+COCO)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|32.84|26.85 |34.63|-|-|-|
|[PR-Net](https://arxiv.org/pdf/2301.03510.pdf)|arXiv2023|32.86 |28.03| 34.30|-|-|-|
|[MUREN](http://cvlab.postech.ac.kr/research/MUREN/)|CVPR2023|32.87 |28.67| 34.12| 35.52| 30.88| 36.91|
|[SDT](https://arxiv.org/pdf/2207.01869.pdf)|arXiv2022|32.97| 28.49| 34.31| 36.32| 31.90| 37.64|
|[HODN](https://arxiv.org/pdf/2308.10158.pdf)|TMM2023| 33.14 |28.54| 34.52| 35.86| 31.18| 37.26|
|[SG2HOI](https://arxiv.org/pdf/2311.01755.pdf)|arXiv2023|33.14 |29.27| 35.72| 35.73| 32.01| 36.43|
|[PDN](https://www.sciencedirect.com/science/article/pii/S0031320323007185?via%3Dihub)|PR2023|33.18 |27.95| 34.75| 35.86| 30.57| 37.43|
|[DOQ](https://github.com/SherlockHolmes221/DOQ)|CVPR2022|33.28 |29.19| 34.50|-|-|-|
|[IF](https://github.com/Foruck/Interactiveness-Field)|CVPR2022|33.51 |30.30 |34.46 |36.28 |33.16 |37.21|
|[ICDT](https://github.com/bingnanG/ICDT)|ICANN2023|34.01 |27.60 |35.92 |36.29 |29.88 |38.21|
|[PSN](https://arxiv.org/pdf/2307.10499.pdf)|arXiv2023|34.02 |29.44| 35.39|-|-|-|
|[KI2HOI](https://arxiv.org/pdf/2403.07246.pdf)|arXiv2024|34.20 |32.26| 36.10| 37.85| 35.89| 38.78|
|[VIL+](https://arxiv.org/pdf/2308.02606.pdf)|ACMMM2023|34.21| 30.58| 35.30| 37.67| 34.88| 38.50|
|[Multi-Step](https://dl.acm.org/doi/10.1145/3581783.3612581)|ACMMM2023|34.42 |30.03| 35.73| 37.71| 33.74| 38.89|
|[OBPA-Net](https://github.com/zhuang1iu/OBPA-NET)|PRCV2023|34.63 |32.83| 35.16| 36.78| 35.38| 38.04|
|[MLKD](https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP)|WACV2024|34.69 |31.12| 35.74|-|-|-|
|[HOICLIP](https://github.com/Artanic30/HOICLIP)|CVPR2023|34.69 |31.12| 35.74| 37.61| 34.47| 38.54|
|[PViC w/ detr](https://github.com/fredzzhang/pvic)|ICCV2023| 34.69 |32.14| 35.45| 38.14| 35.38| 38.97|
|[GEN-VLKT+SCA](https://arxiv.org/pdf/2312.01713.pdf)|arXiv2023|34.79 |31.80| 35.68|-|-|-|
|[HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|34.84 |34.52| 34.94|-|-|-|
|[SBM](https://link.springer.com/chapter/10.1007/978-981-99-8429-9_5)|PRCV2023|34.92 |31.67| 35.85| 38.79| 35.43| 39.60|
|[GEN-VLKT (w/ CLIP)](https://github.com/YueLiao/gen-vlkt)|CVPR2022|34.95 |31.18| 36.08| 38.22| 34.36| 39.37|
|[SOV-STG (res101)](https://arxiv.org/pdf/2307.02291.pdf)|arXiv2023|35.01 |30.63 |36.32 |37.60 |32.77 |39.05|
|[GeoHOI](https://github.com/zhumanli/GeoHOI)|arXiv2024| 35.05 |33.01| 35.71| 37.12| 34.79| 37.97|
|[PartMap](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)|ECCV2022|35.15 |33.71| 35.58| 37.56| 35.87| 38.06|
|[GFIN](https://www.sciencedirect.com/science/article/pii/S0893608023006251?via%3Dihub#fig1)|NN2023|35.28 |31.91| 36.29| 38.80| 35.48| 39.79|
|[CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|35.33 | 33.95| 35.74| 37.19| 35.27| 37.77|
|[LOGICHOI](https://github.com/weijianan1/LogicHOI)|NeurIPS2023|35.47 |32.03| 36.22| 38.21| 35.29| 39.03|
|[QAHOI-Swin-Large-ImageNet-22K](https://github.com/cjw2021/QAHOI)|arXiv2021|35.78 |29.80 |37.56 |37.59 |31.66 |39.36|
|[DPADN](https://github.com/PRIS-CV/DPADN)|AAAI2024|35.91 |35.82| 35.94| 38.99| 39.61| 38.80|
|[-L + CQL](https://arxiv.org/pdf/2303.14005.pdf)|CVPR2023|36.03| 33.16| 36.89| 38.82| 35.51| 39.81|
|[HOICLIP+DP-HOI](https://github.com/xingaoli/DP-HOI)|CVPR2024|36.56 |34.36| 37.22|-|-|-|
|[AGER](https://github.com/six6607/AGER)|ICCV2023|36.75 |33.53| 37.71| 39.84| 35.58| 40.23|
|[FGAHOI](https://github.com/xiaomabufei/FGAHOI)|arXiv2023|37.18 |30.71| 39.11| 38.93| 31.93| 41.02|
|[ViPLO](https://arxiv.org/pdf/2304.08114.pdf)|CVPR2023|37.22 |35.45| 37.75| 40.61| 38.82| 41.15|
|[RmLR](https://arxiv.org/pdf/2307.13529.pdf)|ICCV2023|37.41 |28.81| 39.97| 38.69| 31.27| 40.91|
|[HCVC](https://arxiv.org/pdf/2311.16475.pdf)|arXiv2023|37.54 |37.01| 37.78| 39.98| 39.01| 40.32|
|[ADA-CM](https://github.com/ltttpku/ADA-CM)|ICCV2023|38.40 |37.52| 38.66|-|-|-|
|[UniVRD w/ extra data+VLM](https://arxiv.org/pdf/2303.08998.pdf)|arXiv2023|38.61| 33.39| 40.16|-|-|-|
|[SCTC](https://arxiv.org/pdf/2401.05676.pdf)|AAAI2024|39.12 |36.09| 39.87|-|-|-|
|[BCOM](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)|CVPR2024|39.34 |39.90| 39.17| 42.24| 42.86| 42.05|
|[UniHOI](https://github.com/Caoyichao/UniHOI)|NeurIPS2023|40.95 |40.27 |41.32| 43.26| 43.12| 43.25|
|[DiffHOI w/ syn data](https://arxiv.org/pdf/2305.12252.pdf)|arXiv2023|41.50| 39.96| 41.96| 43.62| 41.41| 44.28|
|[DiffusionHOI](https://arxiv.org/pdf/2410.20155)|NeurIPS2024|42.54 |42.95 |42.35 |44.91 |45.18 |44.83|
|[SOV-STG (swin-l)](https://arxiv.org/pdf/2307.02291.pdf)|arXiv2023|43.35| 42.25| 43.69|45.53|43.62| 46.11|
|[PViC w/ h-detr (swin-l)](https://github.com/fredzzhang/pvic)|ICCV2023|44.32| 44.61| 44.24| 47.81| 48.38| 47.64|
|[MP-HOI](https://mp-hoi.github.io/)|CVPR2024|44.53|44.48|44.55|-|-|-|
|[SICHOI](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)|CVPR2024|45.04 |45.61 |44.88 |48.16 |48.37 |48.09|
|[RLIPv2-ParSeDA w/ extra data](https://github.com/JacobYuan7/RLIPv2)|ICCV2023|45.09| 43.23|45.64|-|-|-|
|[CycleHOI](https://arxiv.org/pdf/2407.11433)|arXiv2024|45.71| 46.14| 45.52| 49.23| 49.87| 48.96|
|[Pose-Aware](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)|CVPR2024|46.01 |46.74 |45.80|**49.50** |**50.59** |**49.18**|
|[PViC+](https://tau-vailab.github.io/Dynamic-Scene-Understanding/)|arXiv2025|**46.49**| **47.43**| **46.21**|-|-|-|
#### 3) Ground Truth human-object pair boxes (only evaluating HOI recognition)
|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)|
|:---:|:---:|:---:|:---:|:---:|
|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 33.38| 21.43 |36.95|
|[Interactiveness](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| CVPR2019 |34.26|22.90 |37.65|
|[Analogy](https://www.di.ens.fr/willow/research/analogy/paper.pdf)| ICCV2019 |34.35 | 27.57 |36.38|
|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021|43.32 |33.84| 46.15|
|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|43.98|40.27|45.09|
|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021|44.27| 35.52| 46.89|
|[FCL](https://github.com/zhihou7/FCL)|CVPR2021|45.25|36.27 |47.94|
|[GTNet](https://github.com/UCSB-VRL/GTNet)|arXiv|46.45 |35.10 |49.84|
|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021|51.53| 41.01| 54.67|
|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|52.99 |34.91| 58.40|
|[ConsNet](https://github.com/yeliudev/ConsNet)|ACMMM2020|53.04|38.79|57.3|
|[ViPLO](https://arxiv.org/pdf/2304.08114.pdf)|CVPR2023|**62.09** |**59.26** |**62.93**|

#### 4) [Interactiveness](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network/tree/TIN-PAMI) detection (interactive or not + pair box detection):
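
Interactiveness detection only asks whether a human-object pair interacts at all, together with the pair boxes. As in the detection settings above, a predicted pair is usually counted as correct only when both boxes overlap a matching ground-truth pair with IoU ≥ 0.5. Below is a minimal sketch of that pair-matching rule; the box format and dictionary keys are illustrative assumptions, not code from any listed method.

```python
# A minimal sketch (not official evaluation code) of the usual HOI pair-matching
# rule: a predicted pair is a true positive only if both its human box and its
# object box overlap a ground-truth pair of the same class with IoU >= 0.5.
# Box format assumed here: (x1, y1, x2, y2); the dict keys are illustrative.

def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def pair_matches(pred, gt, thresh=0.5):
    """Both the human and the object box must clear the IoU threshold."""
    return (box_iou(pred["human_box"], gt["human_box"]) >= thresh
            and box_iou(pred["object_box"], gt["object_box"]) >= thresh)

# Example: a slightly shifted prediction still matches the ground-truth pair.
pred = {"human_box": (10, 10, 110, 210), "object_box": (120, 50, 220, 150)}
gt = {"human_box": (12, 8, 112, 205), "object_box": (118, 52, 225, 148)}
print(pair_matches(pred, gt))  # True
```
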
|Method| Pub | HICO-DET | V-COCO |
|:---:|:---:|:---:|:---:|
|[TIN++](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network/tree/TIN-PAMI)|TPAMI2022| 14.35| 29.36|
|[PPDM](https://github.com/YueLiao/PPDM)|CVPR2020|27.34 |-|
|[QPIC](https://github.com/hitachi-rd-cv/qpic)| CVPR2021| 32.96 |38.33|
|[CDN](https://github.com/YueLiao/CDN)| NeurIPS2021| 33.55 |40.13|
|[PartMap](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)|ECCV2022|**38.74** |**43.61**|

#### 5) Enhanced with HAKE:
|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)| Full(ko) | Rare(ko) | Non-Rare(ko) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 14.84 |10.45 |16.15| 16.26 |11.33| 17.73|
|[iCAN + HAKE-HICO-DET](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action)| CVPR2020 | 19.61 (**+4.77**) |17.29 |20.30| 22.10| 20.46| 22.59|
|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 17.03 | 13.42| 18.11| 19.17| 15.51|20.26|
|[Interactiveness + HAKE-HICO-DET](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action)| CVPR2020 | 22.12 (**+5.09**)|20.19|22.69|24.06|22.19|24.62|
|[Interactiveness + HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action)| CVPR2020 | 22.66 (**+5.63**)|21.17|23.09|24.53|23.00|24.99|

#### 6) Zero-Shot HOI detection:
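
In the unseen-combination (UC) settings below, part of HICO-DET's 600 verb-object combinations is held out of training: RF-UC holds out rare combinations and NF-UC holds out non-rare (frequent) ones, commonly 120 of the 600. A minimal sketch of building such a split from per-category training counts is given below; the category names, counts, and hold-out size are illustrative assumptions rather than any paper's official split.

```python
# Illustrative sketch of the rare-first (RF-UC) vs non-rare-first (NF-UC) splits:
# hold out the rarest or the most frequent verb-object combinations as unseen.
# `train_counts`, the category names and the hold-out size are example assumptions.

def make_uc_split(train_counts, num_unseen, rare_first=True):
    """Return (unseen, seen) HOI category lists based on training frequency."""
    ordered = sorted(train_counts, key=train_counts.get, reverse=not rare_first)
    return ordered[:num_unseen], ordered[num_unseen:]

train_counts = {
    ("ride", "bicycle"): 950,
    ("wash", "bicycle"): 12,
    ("hold", "cup"): 430,
    ("inspect", "cup"): 7,
}
unseen, seen = make_uc_split(train_counts, num_unseen=2, rare_first=True)
print(unseen)  # RF-UC style: the rarest combinations, here [('inspect', 'cup'), ('wash', 'bicycle')]
```
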
##### Unseen action-object combination scenario (UC)
| Method | Pub | Detector | Unseen(def)|Seen(def) | Full(def) |
|:---:|:---:|:---:|:---:|:---:|:---:|
| [Shen et al.](http://vision.stanford.edu/pdf/shen2018wacv.pdf) | WACV2018 | COCO | 5.62 | - | 6.26 |
| [Functional](https://arxiv.org/pdf/1904.03181.pdf) | AAAI2020 | HICO-DET | 11.31 ± 1.03 | 12.74 ± 0.34 | 12.45 ± 0.16 |
| [ConsNet](https://github.com/yeliudev/ConsNet) | ACMMM2020 | COCO | 16.99 ± 1.67 | 20.51 ± 0.62 | 19.81 ± 0.32 |
| [CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10242152) | TNNLS 2023| - | 18.06 | 23.34 | 20.72 |
| [EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|-|23.01±1.54|30.39±0.40|28.91±0.27|
| [HOICLIP](https://github.com/Artanic30/HOICLIP)|CVPR2023|-|25.53|34.85|32.99|
| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) | arXiv2024|-|27.43|**35.76**|**34.56**|
| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73)|NeurIPS2023|-|27.71|33.25|32.11|
| [HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|-|**30.26**| 34.23|33.44|
||
| [VCL](https://github.com/zhihou7/VCL) (NF-UC)| ECCV2020 | HICO-DET | 16.22 | 18.52 | 18.06 |
| [ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL) (NF-UC)|CVPR2021| HICO-DET | 18.25|18.78| 18.67|
| [FCL](https://github.com/zhihou7/FCL) (NF-UC)| CVPR2021 | HICO-DET | 18.66 | 19.55 | 19.37 |
| [RLIP-ParSe](https://github.com/JacobYuan7/RLIP) (NF-UC)|NeurIPS2022|COCO, VG|20.27| 27.67| 26.19|
| [SCL](https://arxiv.org/pdf/2203.14272) | arXiv | HICO-DET | 21.73 | 25.00 | 24.34 |
| [OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)(NF-UC)| CVPR2023 | HICO-DET |23.25 |28.04 |27.08|
| [GEN-VLKT*](https://arxiv.org/pdf/2203.13954.pdf) (NF-UC)| CVPR2022 | HICO-DET | 25.05 | 23.38 | 23.71 |
| [EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385) (NF-UC)|AAAI2023|HICO-DET|26.77|26.66|26.69|
| [HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) (NF-UC)| CVPR2023|HICO-DET|26.39 |28.10| 27.75|
| [LOGICHOI](https://github.com/weijianan1/LogicHOI) (NF-UC)|NeurIPS2023|-|26.84 |27.86 |27.95|
| [Wu et al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) (NF-UC)|AAAI2024|-| 27.35 |22.09| 23.14|
| [UniHOI](https://github.com/Caoyichao/UniHOI) (NF-UC)|NeurIPS2023|-|28.45 | 32.63 | 31.79 |
| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) (NF-UC)| arXiv2024|-|28.89|28.31|27.77|
| [DiffHOI w/ syn data](https://arxiv.org/pdf/2305.12252.pdf) (NF-UC)|arXiv2023| HICO-DET + syn data| 29.45 | 31.68 |31.24|
| [HCVC](https://arxiv.org/pdf/2311.16475.pdf) (NF-UC)|arXiv2023|-|28.44 |31.35| 30.77|
| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) (NF-UC)|NeurIPS2023|-|31.44|28.26|28.90|
| [HOIGen](https://arxiv.org/pdf/2408.05974) (NF-UC)|ACMMM2024|-|**33.98**| **32.86**|**33.08** |
||
| [VCL](https://github.com/zhihou7/VCL) (RF-UC)| ECCV2020 | HICO-DET | 10.06 | 24.28 | 21.43 |
| [ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL) (RF-UC)|CVPR2021| HICO-DET |9.18|24.67|21.57|
| [FCL](https://github.com/zhihou7/FCL) (RF-UC)| CVPR2021 | HICO-DET | 13.16 | 24.23 | 22.01 |
| [SCL](https://arxiv.org/pdf/2203.14272) (RF-UC) | arXiv | HICO-DET | 19.07 | 30.39 | 28.08 |
| [RLIP-ParSe](https://github.com/JacobYuan7/RLIP) (RF-UC)|NeurIPS2022|COCO, VG|19.19 |33.35| 30.52|
| [GEN-VLKT*](https://arxiv.org/pdf/2203.13954.pdf) (RF-UC)| CVPR2022 | HICO-DET | 21.36| 32.91 | 30.56 |
| [OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)(RF-UC)| CVPR2023 | HICO-DET |21.46 |33.86| 31.38|
| [Wu et al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) (RF-UC)|AAAI2024|-|23.32| 30.09| 28.53|
| [HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) (RF-UC)| CVPR2023|HICO-DET| 25.53 |34.85| 32.99|
| [LOGICHOI](https://github.com/weijianan1/LogicHOI) (RF-UC)|NeurIPS2023|-|25.97 |34.93| 33.17|
| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) (RF-UC)| arXiv2024|-|26.33|35.79|34.10|
| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) (RF-UC)|NeurIPS2023|-|28.47|35.48|34.08|
| [UniHOI](https://github.com/Caoyichao/UniHOI) (RF-UC)|NeurIPS2023|-|28.68 | 33.16 | 32.27 |
| [DiffHOI w/ syn data](https://arxiv.org/pdf/2305.12252.pdf) (RF-UC)|arXiv2023| HICO-DET + syn data| 28.76 | 38.01 |36.16|
| [HCVC](https://arxiv.org/pdf/2311.16475.pdf) (RF-UC)|arXiv2023|-|30.95 |37.16 |35.87|
| [HOIGen](https://arxiv.org/pdf/2408.05974) (RF-UC)|ACMMM2024|-|31.01| 34.57|33.86 |
| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2) (RF-UC)| ICCV2023| VG, COCO, O365 | **31.23** | **45.01** | **42.26**|

- \* indicates large Visual-Language model pretraining, e.g., CLIP.
- For the details of each setting, please refer to the corresponding publications. This summary is unofficial and may miss some works.

##### Zero-shot* HOI detection without fine-tuning (NF)
| Method | Pub | Backbone | Dataset | Detector | Full | Rare | Non-Rare |
| ---------- | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| [RLIP-ParSeD](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 13.92 | 11.20 | 14.73 |
| [RLIP-ParSe](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 15.40 | 15.08 | 15.50 |
| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2)| ICCV2023 | Swin-L | VG+COCO+O365| DDETR | **23.29** | **27.97** | **21.90** |
- \* indicates a formulation that assesses the generalization of a pre-trained model to unseen distributions, proposed in [RLIP](https://arxiv.org/pdf/2209.01814.pdf); *zero-shot* follows the terminology of CLIP.

##### Unseen object scenario (UO)
| Method | Pub | Detector | Unseen(def) | Seen(def) | Full(def) |
|:---:|:---:|:---:|:---:|:---:|:---:|
| [Functional](https://arxiv.org/pdf/1904.03181.pdf) | AAAI2020 | HICO-DET | 11.22 | 14.36 | 13.84 |
| [FCL](https://github.com/zhihou7/FCL) | CVPR2021 | HICO-DET | 15.54 | 20.74 | 19.87 |
| [ConsNet](https://github.com/yeliudev/ConsNet) | ACMMM2020 | COCO | 19.27 | 20.99 | 20.71 |
| [Wu et al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) |AAAI2024|-|27.05|27.87|27.73|
| [ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021|- |15.11| 21.54|20.47|
| [GEN-VLKT](https://arxiv.org/pdf/2203.13954.pdf)|CVPR2022| - | 10.51|28.92|25.63|
| [LOGICHOI](https://github.com/weijianan1/LogicHOI) |NeurIPS2023|-|15.67| 30.42| 28.23|
| [HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) | CVPR2023|-|16.20|30.99|28.53|
| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) | arXiv2024|-|16.50|31.70|28.84|
| [HCVC](https://arxiv.org/pdf/2311.16475.pdf)| arXiv2023|-|16.78 |**33.31** |30.53|
| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|-|31.79|32.73|32.58|
| [HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|-| **36.35**|32.90|**33.48** |

##### Unseen action scenario (UA)
| Method | Pub | Detector | Full(def) | Seen(def) | Unseen(def)|
|:---:|:---:|:---:|:---:|:---:|:---:|
| [ConsNet](https://github.com/yeliudev/ConsNet) | ACMMM2020 | COCO | 19.04 | 20.02 | 14.12 |
| [CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10242152) | TNNLS 2023|- | 19.68 | 21.45 | 15.17|
| [Wu et al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) |AAAI2024|-|26.43|28.13|17.92|
| [EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|-|**29.22**| **30.46**| **23.04**|

##### Unseen verb scenario (UV), results from EoID
| Method | Pub | Detector | Unseen(def)|Seen(def) | Full(def) |
|:---:|:---:|:---:|:---:|:---:|:---:|
|[HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|-| 20.27|34.31|32.34 |
|[GEN-VLKT](https://arxiv.org/pdf/2203.13954.pdf)|CVPR2022| - | 20.96 |30.23 |28.74 |
|[EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|-| 22.71 |30.73 |29.61|
|[HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) | CVPR2023|-|24.30|32.19|31.09|
|[LOGICHOI](https://github.com/weijianan1/LogicHOI) |NeurIPS2023|-|24.57 |31.88 |30.77|
|[HCVC](https://arxiv.org/pdf/2311.16475.pdf)| arXiv2023|-|24.69 |36.11| 34.51|
|[KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) | arXiv2024|-|25.20|32.95|31.85|
|[CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|-|26.02|31.14|30.42|
|[UniHOI](https://github.com/Caoyichao/UniHOI)|NeurIPS2023|-|**26.05**|**36.78**|**34.68**|

##### Another setting
| Method | Pub | Unseen| Seen | Full |
|:---:|:---:|:---:|:---:|:---:|
|[Shen et al.](http://vision.stanford.edu/pdf/shen2018wacv.pdf)|WACV2018| 5.62| - |6.26|
|[Functional](https://arxiv.org/pdf/1904.03181.pdf)|AAAI2020 |10.93 |12.60 |12.26|
|[VCL](https://github.com/zhihou7/VCL)|ECCV2020 |10.06| 24.28| 21.43|
|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021 |9.18 |24.67 |21.57|
|[FCL](https://github.com/zhihou7/FCL)| CVPR2021 |13.16 |24.23 |22.01|
|[THID (w/ CLIP)](https://github.com/scwangdyd/promting_hoi)|CVPR2022 |15.53 |24.32 |22.96|
|[EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|**22.04**|31.39|29.52|
|[GEN-VLKT](https://arxiv.org/pdf/2203.13954.pdf)|CVPR2022|21.36|**32.91**|**30.56**|

#### 7) Few-Shot HOI detection:
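
The few-shot rows below fine-tune on only 1% or 10% of the HICO-DET training images. Purely for illustration, the sketch below samples such a fraction with a fixed seed; the actual low-data splits come from the respective papers, and the image count used here is only approximate.

```python
# Illustrative only: draw a fixed fraction of training image ids with a fixed seed.
# The official 1%/10% few-shot splits are released by the respective papers; this
# sketch just shows what "fine-tuning on 1% of the data" refers to.
import random

def sample_fraction(image_ids, fraction, seed=0):
    rng = random.Random(seed)
    k = max(1, int(len(image_ids) * fraction))
    return rng.sample(image_ids, k)

train_ids = list(range(38_118))  # roughly the number of HICO-DET training images
subset_1pct = sample_fraction(train_ids, 0.01)
print(len(subset_1pct))  # 381
```
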
##### 1% HICO-DET Data used in fine-tuning
| Method | Pub | Backbone | Dataset | Detector | Data |Full | Rare | Non-Rare |
| ---------- | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| [RLIP-ParSeD](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 1% | 18.30 | 16.22 | 18.92 |
| [RLIP-ParSe](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 1% | 18.46 | 17.47 | 18.76 |
| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2)| ICCV2023 | Swin-L | VG+COCO+O365| DDETR | 1% | **32.22** | **31.89** | **32.32** |

##### 10% HICO-DET Data used in fine-tuning
| Method | Pub | Backbone | Dataset | Detector | Data |Full | Rare | Non-Rare |
| ---------- | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| [RLIP-ParSeD](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 10% | 22.09 | 15.89 | 23.94 |
| [RLIP-ParSe](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 10% | 22.59 | 20.16 | 23.32 |
| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2)| ICCV2023 | Swin-L | VG+COCO+O365| DDETR | 10% | **37.46** | **34.75** | **38.27** |

#### 8) Weakly-supervised HOI detection:
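
In the weakly-supervised setting, only image-level HOI labels are available and no box-level pair annotations are used. A common general strategy (not the specific method of any entry below) is multiple-instance learning: per-pair scores are pooled into one image-level score per HOI class and supervised with the image label, roughly as in this NumPy sketch with assumed shapes and names.

```python
# Illustrative MIL-style pooling for weak supervision: per-pair HOI scores are
# max-pooled over all candidate human-object pairs to one image-level score per
# class, then trained against the image-level multi-hot label. Shapes/names assumed.
import numpy as np

def image_level_scores(pair_logits):
    """(num_pairs, num_hoi_classes) logits -> (num_hoi_classes,) image-level probabilities."""
    pair_probs = 1.0 / (1.0 + np.exp(-pair_logits))  # per-pair sigmoid
    return pair_probs.max(axis=0)                    # an HOI is present if any pair supports it

def bce_loss(image_probs, image_labels, eps=1e-6):
    """Binary cross-entropy against the image-level multi-hot label."""
    p = np.clip(image_probs, eps, 1 - eps)
    return float(-np.mean(image_labels * np.log(p) + (1 - image_labels) * np.log(1 - p)))

pair_logits = np.array([[2.0, -3.0, -1.0],    # 2 candidate pairs, 3 HOI classes
                        [-2.0, -2.5, 0.5]])
image_labels = np.array([1.0, 0.0, 1.0])
print(bce_loss(image_level_scores(pair_logits), image_labels))
```
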
| Method | Pub | Backbone | Dataset | Detector |Full | Rare | Non-Rare |
| ---------- | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| [Explanation-HOI](https://github.com/baldassarreFe/ws-vrd)| ECCV2020 | ResNeXt101 | COCO | FRCNN | 10.63 |8.71 |11.20|
| [MX-HOI](https://openaccess.thecvf.com/content/WACV2021/papers/Kumaraswamy_Detecting_Human-Object_Interaction_With_Mixed_Supervision_WACV_2021_paper.pdf)| WACV2021 | ResNet-101 | COCO | FRCNN | 16.14 |12.06 |17.50|
| [PPR-FCN (from Weakly-HOI-CLIP)](https://arxiv.org/pdf/1708.01956.pdf)| ICCV2017 | ResNet-50, CLIP | COCO | FRCNN | 17.55 |15.69 | 18.41|
| [Align-Former](https://www.bmvc2021-virtualconference.com/assets/papers/0054.pdf)| BMVC2021 | ResNet-101 | - | - | 20.85 |18.23 |21.64|
| [Weakly-HOI-CLIP](https://arxiv.org/pdf/2303.01313.pdf) | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | 25.70 |**24.52**| 26.05|
| [OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)|CVPR2023|DETR|-|-|**25.82** |24.35 |**26.19**|

### [Ambiguous-HOI](https://github.com/DirtyHarryLYL/DJ-RN)
#### Detector: COCO pre-trained
|Method| mAP |
|:---:|:---:|
|[iCAN](https://github.com/vt-vl-lab/iCAN)| 8.14 |
|[Interactiveness](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| 8.22 |
|[Analogy(reproduced)](https://github.com/jpeyre/analogy)| 9.72 |
|[DJ-RN](https://github.com/DirtyHarryLYL/DJ-RN)| 10.37|
|[OC-Immunity](https://github.com/Foruck/OC-Immunity)|**10.45**|

### [SWiG-HOI](https://github.com/scwangdyd/large_vocabulary_hoi_detection)
| Method | Pub| Non-Rare| Unseen| Seen | Full |
|:---:|:---:|:---:|:---:|:---:|:---:|
|[JSR](https://prior.allenai.org/projects/gsr)| ECCV2020| 10.01| 6.10| 2.34| 6.08|
|[CHOID](https://github.com/scwangdyd/)|ICCV2021|10.93 |6.63 |2.64 |6.64|
|[QPIC](https://github.com/hitachi-rd-cv/qpic)| CVPR2021| 16.95| 10.84| 6.21| 11.12|
|[THID (w/ CLIP)](https://github.com/scwangdyd/promting_hoi)|CVPR2022 |**17.67**| **12.82**| **10.04**| **13.26**|

### V-COCO: Scenario 1
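
V-COCO reports AP(role): a prediction counts when the person box and the predicted role (object) box both match a ground-truth pair with IoU ≥ 0.5 for the correct action. In Scenario 1, cases with no annotated role object (e.g. fully occluded) additionally require an empty role box, while Scenario 2 ignores the role box in those cases. The sketch below illustrates the Scenario-1 role check under these assumptions; box format and names are illustrative.

```python
# Illustrative check for V-COCO AP(role), Scenario 1: if the ground-truth role
# object is not annotated, the predicted role box must be empty to count; otherwise
# it must overlap the ground-truth role box with IoU >= 0.5. Box format (x1, y1, x2, y2).

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

EMPTY_BOX = (0.0, 0.0, 0.0, 0.0)

def role_matches_scenario1(pred_role_box, gt_role_box, thresh=0.5):
    if gt_role_box is None:                       # role object missing in the annotation
        return tuple(pred_role_box) == EMPTY_BOX  # Scenario 1: an empty prediction is required
    return box_iou(pred_role_box, gt_role_box) >= thresh

print(role_matches_scenario1(EMPTY_BOX, None))                 # True
print(role_matches_scenario1((5, 5, 50, 50), (0, 0, 48, 52)))  # True (IoU above 0.5)
```
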
#### 1) Detector: COCO pre-trained or one-stage detector
|Method| Pub | AP(role) |
|:---:|:---:|:---:|
|[Gupta et al.](https://arxiv.org/pdf/1505.04474.pdf)|arXiv| 31.8|
|[InteractNet](https://arxiv.org/pdf/1704.07333.pdf)|CVPR2018|40.0|
|[Turbo](https://arxiv.org/pdf/1903.06355.pdf)|AAAI2019|42.0|
|[GPNN](https://arxiv.org/pdf/1808.07962.pdf)|ECCV2018|44.0|
|[UniVRD w/ extra data+VLM](https://arxiv.org/pdf/2303.08998.pdf)|arXiv2023|45.19|
|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 45.3|
|[Xu et al.](https://www-users.cs.umn.edu/~qzhao/publications/pdf/xu2019cvpr.pdf)| CVPR2019| 45.9|
|[Wang et al.](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Contextual_Attention_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)| ICCV2019|47.3|
|[UniDet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600494.pdf)|ECCV2020|47.5|
|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 47.8|
|[Lin et al.](https://www.ijcai.org/Proceedings/2020/0154.pdf)|IJCAI2020|48.1|
|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|48.3|
|[Zhou et al.](https://arxiv.org/pdf/2003.04262.pdf) |CVPR2020|48.9|
|[In-GraphNet](https://arxiv.org/pdf/2007.06925.pdf)|IJCAI-PRICAI 2020|48.9|
|[Interactiveness-optimized](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| CVPR2019 | 49.0|
|[TIN-PAMI](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)|TPAMI2021|49.1|
|[IP-Net](https://arxiv.org/pdf/2003.14023.pdf)|CVPR2020|51.0|
|[DRG](https://github.com/vt-vl-lab/DRG)|ECCV2020|51.0|
|[RGBM](https://arxiv.org/pdf/2202.11998.pdf)|arXiv2022|51.7|
|[VSGNet](https://arxiv.org/pdf/2003.05541.pdf)|CVPR2020|51.8|
|[PMN](https://github.com/birlrobotics/PMN)|arXiv|51.8|
|[PMFNet](https://arxiv.org/pdf/1909.08453.pdf)|ICCV2019|52.0|
|[Liu et al.](https://arxiv.org/pdf/2105.03089.pdf)|arXiv|52.28|
|[FCL](https://github.com/zhihou7/FCL)|CVPR2021|52.35|
|[PD-Net](https://github.com/MuchHair/PD-Net)|ECCV2020|52.6|
|[Wang et al.](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620239.pdf)|ECCV2020|52.7|
|[PFNet](https://link.springer.com/content/pdf/10.1007/s41095-020-0188-2.pdf)|CVM|52.8|
|[Zou et al.](https://github.com/bbepoch/HoiTransformer)|CVPR2021|52.9|
|[SIGN](https://ieeexplore.ieee.org/ielx7/9099125/9102711/09102755.pdf)|ICME2020|53.1|
|[ACP](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)|ECCV2020|52.98 (53.23)|
|[FCMNet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590239.pdf)|ECCV2020|53.1|
|[HRNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9552553)|TIP2021|53.1|
|[SGCN4HOI](https://arxiv.org/pdf/2207.05733.pdf)|IEEESMC2022|53.1|
|[ConsNet](https://arxiv.org/pdf/2008.06254.pdf)|ACMMM2020|53.2|
|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|53.3|
|[SG2HOI](https://arxiv.org/pdf/2108.08584.pdf)|ICCV2021|53.3|
|[OSGNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9360596)|IEEE Access|53.43|
|[SABRA-Res50](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 53.57|
|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|53.70|
|[IPGN](https://ieeexplore.ieee.org/document/9489275)|TIP2021|53.79|
|[AS-Net](https://github.com/yoyomimi/AS-Net)|CVPR2021|53.9|
|[RR-Net](https://arxiv.org/pdf/2104.15015.pdf)|arXiv|54.2|
|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021|54.2|
|[HOKEM](https://arxiv.org/pdf/2306.14260.pdf)|arXiv2023|54.6|
|[SABRA-Res50FPN](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 54.69|
|[GGNet](https://github.com/SherlockHolmes221/GGNet)|CVPR2021|54.7|
|[MLCNet](https://dl.acm.org/doi/pdf/10.1145/3372278.3390671)| ICMR2020|55.2|
|[HOTR](https://github.com/kakaobrain/HOTR)|CVPR2021|55.2|
|[DIRV](https://arxiv.org/pdf/2010.01005.pdf)|AAAI2021|56.1|
|[UnionDet](https://arxiv.org/pdf/2312.12664.pdf)|arXiv2023|56.2|
|[SABRA-Res152](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 56.62|
|[PhraseHOI](https://arxiv.org/pdf/2112.07383.pdf)|AAAI2022|57.4|
|[GTNet](https://github.com/UCSB-VRL/GTNet)|arXiv|58.29|
|[QPIC-Res101](https://github.com/hitachi-rd-cv/qpic)|CVPR2021|58.3|
|[ADA-CM](https://github.com/ltttpku/ADA-CM)|ICCV2023|58.57|
|[QPIC-Res50](https://github.com/hitachi-rd-cv/qpic)|CVPR2021| 58.8|
|[ICDT](https://github.com/bingnanG/ICDT)|ICANN2023|59.4|
|[CATN (w/ fastText)](https://arxiv.org/pdf/2204.04911.pdf)|CVPR2022|60.1|
|[FGAHOI](https://github.com/xiaomabufei/FGAHOI)|arXiv2023|60.5|
|[Iwin](https://arxiv.org/pdf/2203.10537.pdf)|ECCV2022|60.85|
|[UPT-ResNet-101-DC5](https://github.com/fredzzhang/upt)|CVPR2022| 61.3|
|[CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10242152) | TNNLS 2023|61.43|
|[SBM](https://link.springer.com/chapter/10.1007/978-981-99-8429-9_5)|PRCV2023|61.5|
|[SDT](https://arxiv.org/pdf/2207.01869.pdf)|arXiv2022|61.8|
|[OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)|CVPR2023|61.9|
|[MSTR](https://arxiv.org/pdf/2203.14709.pdf)|CVPR2022|62.0|
|[ViPLO](https://arxiv.org/pdf/2304.08114.pdf)|CVPR2023|62.2|
|[Multi-Step](https://dl.acm.org/doi/10.1145/3581783.3612581)|ACMMM2023|62.4|
|[PViC w/ detr](https://github.com/fredzzhang/pvic)|ICCV2023|62.8|
|[PR-Net](https://arxiv.org/pdf/2301.03510.pdf)|arXiv2023|62.9|
|[IF](https://github.com/Foruck/Interactiveness-Field)|CVPR2022|63.0|
|[PartMap](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)|ECCV2022|63.0|
|[QPIC-CPC](https://arxiv.org/pdf/2204.04836.pdf)|CVPR2022|63.1|
|[DOQ](https://github.com/SherlockHolmes221/DOQ)|CVPR2022|63.5|
|[HOICLIP](https://github.com/Artanic30/HOICLIP)|CVPR2023|63.5|
|[GEN-VLKT (w/ CLIP)](https://github.com/YueLiao/gen-vlkt)|CVPR2022|63.58|
|[SG2HOI](https://arxiv.org/pdf/2311.01755.pdf)|arXiv2023|63.6|
|[QPIC+HQM](https://arxiv.org/pdf/2207.05293.pdf)|ECCV2022|63.6|
|[SOV-STG](https://arxiv.org/pdf/2307.02291.pdf)|arXiv2023|63.9|
|[KI2HOI](https://arxiv.org/pdf/2403.07246.pdf)|arXiv2024|63.9|
|[CDN](https://github.com/YueLiao/CDN)|NeurIPS2021|63.91|
|[PViC w/ h-detr (swin-l)](https://github.com/fredzzhang/pvic)|ICCV2023|64.1|
|[OBPA-Net](https://github.com/zhuang1iu/OBPA-NET)|PRCV2023|64.1|
|[RmLR](https://arxiv.org/pdf/2307.13529.pdf)|ICCV2023|64.17|
|[RLIP-ParSe (COCO+VG)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|64.2|
|[LOGICHOI](https://github.com/weijianan1/LogicHOI)|NeurIPS2023|64.4|
|[MHOI](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9927451)|TCSVT2022|64.5|
|[GEN-VLKT+SCA](https://arxiv.org/pdf/2312.01713.pdf)|arXiv2023|64.5|
|[PDN](https://www.sciencedirect.com/science/article/pii/S0031320323007185?via%3Dihub)|PR2023|64.7|
|[ParSe (COCO)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|64.8|
|[SSRT](https://arxiv.org/pdf/2204.00746.pdf)|CVPR2022|65.0|
|[SQAB](https://www.sciencedirect.com/science/article/pii/S0141938223002044?via%3Dihub#tbl1)|Displays2023|65.0|
|[OCN](https://github.com/JacobYuan7/OCN-HOI-Benchmark)|AAAI2022|65.3|
|[SQA](https://github.com/nmbzdwss/SQA)|ICASSP2023|65.4|
|[AGER](https://github.com/six6607/AGER)|ICCV2023|65.68|
|[DiffHOI](https://arxiv.org/pdf/2305.12252.pdf)|arXiv2023|65.7|
|[BCOM](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)|CVPR2024|65.8|
|[PSN](https://arxiv.org/pdf/2307.10499.pdf)|arXiv2023|65.9|
|[DPADN](https://github.com/PRIS-CV/DPADN)|AAAI2024|62.62|
|[Pose-Aware](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)|CVPR2024|63.0|
|[CO-HOI](https://arxiv.org/pdf/2410.15657)|arXiv2024|65.44|
|[STIP](https://github.com/zyong812/STIP)|CVPR2022|66.0|
|[DT](https://arxiv.org/pdf/2204.09290.pdf)|CVPR2022|66.2|
|[MP-HOI](https://mp-hoi.github.io/)|CVPR2024|66.2|
|[CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|66.3|
|[GENs+DP-HOI](https://github.com/xingaoli/DP-HOI)|CVPR2024|66.6|
|[GEN-VLKT-L + CQL](https://arxiv.org/pdf/2303.14005.pdf)|CVPR2023|66.8|
|[CycleHOI](https://arxiv.org/pdf/2407.11433)|arXiv2024|66.8|
|[HODN](https://arxiv.org/pdf/2308.10158.pdf)|TMM2023| 67.0|
|[DiffusionHOI](https://arxiv.org/pdf/2410.20155)|NeurIPS2024|67.1|
|[VIL+DisTR](https://arxiv.org/pdf/2308.02606.pdf)|ACMMM2023|67.6|
|[UniHOI](https://github.com/Caoyichao/UniHOI)|NeurIPS2023|68.05|
|[SCTC](https://arxiv.org/pdf/2401.05676.pdf)|AAAI2024|68.2|
|[HCVC](https://arxiv.org/pdf/2311.16475.pdf)|arXiv2023|68.4|
|[MUREN](http://cvlab.postech.ac.kr/research/MUREN/)|CVPR2023|68.8|
|[GeoHOI](https://github.com/zhumanli/GeoHOI)|arXiv2024| 69.4|
|[GFIN](https://www.sciencedirect.com/science/article/pii/S0893608023006251?via%3Dihub#fig1)|NN2023|70.1|
|[SICHOI](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)|CVPR2024|71.1|
|[RLIPv2-ParSeDA w/ extra data](https://github.com/JacobYuan7/RLIPv2)|ICCV2023|**72.1**|

#### 2) Enhanced with HAKE:
|Method| Pub | AP(role) |
|:---:|:---:|:---:|
|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 45.3|
|[iCAN + HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action) (transfer learning)| CVPR2020 | 49.2 (**+3.9**)|
|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 47.8|
|[Interactiveness + HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action) (transfer learning)| CVPR2020 | 51.0 (**+3.2**)|

#### 3) Weakly-supervised HOI detection:
| Method | Pub | Backbone | Dataset | Detector | AP(role)-S1 |AP(role)-S2 |
| ---------- | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| [Weakly-HOI-CLIP](https://arxiv.org/pdf/2303.01313.pdf) | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | **44.74**|**49.97**|

### [HOI-COCO](https://github.com/zhihou7/HOI-CL):
Based on V-COCO.

| Method | Pub | Full | Seen | Unseen|
|:---:|:---:|:---:|:---:|:---:|
|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|23.53 |8.29| 35.36|
|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021|23.40 |8.01 |35.34|

### HICO
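
Unlike HICO-DET, HICO is image-level HOI recognition: each image receives a score for every HOI category and the mAP below averages per-category average precision over images. Below is a from-scratch sketch of per-class AP, illustrative only and not the official HICO evaluation code.

```python
# Illustrative per-class average precision over image-level scores, as used for
# image-level HOI recognition (mAP = mean of per-class AP over the 600 categories).
# Not the official HICO evaluation code; one score and one 0/1 label per image.
import numpy as np

def average_precision(scores, labels):
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels, dtype=float)[order]
    precision = np.cumsum(labels) / np.arange(1, len(labels) + 1)
    # AP = mean of the precision values at the ranks of the positive images
    return float((precision * labels).sum() / max(labels.sum(), 1.0))

scores = [0.9, 0.8, 0.3, 0.75, 0.1]
labels = [1, 0, 1, 1, 0]
print(round(average_precision(scores, labels), 3))  # 0.806
```
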
#### 1) Default
|Method| mAP |
|:---:|:---:|
|[R\*CNN](https://arxiv.org/pdf/1505.01197.pdf) | 28.5 |
|[Girdhar et al.](https://arxiv.org/pdf/1711.01467.pdf) |34.6|
|[Mallya et al.](https://arxiv.org/pdf/1604.04808.pdf) |36.1|
|[RAM++ LLM](https://github.com/xinyu1205/recognize-anything) | 37.6|
|[Pairwise](http://openaccess.thecvf.com/content_ECCV_2018/papers/Haoshu_Fang_Pairwise_Body-Part_Attention_ECCV_2018_paper.pdf) |39.9|
|[RelViT](https://arxiv.org/pdf/2204.11167.pdf)|40.12|
|[DEFR-base](https://arxiv.org/pdf/2107.13083.pdf)|44.1|
|[OpenTAP](https://vkhoi.github.io/TAP)|51.7|
|[DEFR-CLIP](https://arxiv.org/pdf/2107.13083.pdf)|60.5|
|[HTS](https://ieeexplore.ieee.org/abstract/document/10222927)|60.5|
|[DEFR/16 CLIP](https://arxiv.org/pdf/2112.06392.pdf)|**65.6**|

#### 2) Enhanced with HAKE:
|Method| mAP |
|:---:|:---:|
|[Mallya et al.](https://arxiv.org/pdf/1604.04808.pdf) |36.1|
|[Mallya et al.+HAKE-HICO](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action) |45.0 (**+8.9**)|
|[Pairwise](http://openaccess.thecvf.com/content_ECCV_2018/papers/Haoshu_Fang_Pairwise_Body-Part_Attention_ECCV_2018_paper.pdf) |39.9|
|[Pairwise+HAKE-HICO](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action)|45.9 (**+6.0**)|
|[Pairwise+HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action)|46.3 (**+6.4**)|