{"id":13444372,"url":"https://github.com/DirtyHarryLYL/HOI-Learning-List","last_synced_at":"2025-03-20T18:32:32.464Z","repository":{"id":39637176,"uuid":"246628403","full_name":"DirtyHarryLYL/HOI-Learning-List","owner":"DirtyHarryLYL","description":"A list of Human-Object Interaction Learning.","archived":false,"fork":false,"pushed_at":"2025-01-30T10:45:30.000Z","size":335,"stargazers_count":613,"open_issues_count":1,"forks_count":58,"subscribers_count":37,"default_branch":"master","last_synced_at":"2025-03-09T07:25:35.252Z","etag":null,"topics":["action-recognition","activity-recognition","behavior-analysis","human-object-interaction"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DirtyHarryLYL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-11T16:51:25.000Z","updated_at":"2025-03-06T03:00:19.000Z","dependencies_parsed_at":"2024-03-28T07:38:40.091Z","dependency_job_id":"8555344e-5b95-4955-8ca8-f51fd42d30d2","html_url":"https://github.com/DirtyHarryLYL/HOI-Learning-List","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DirtyHarryLYL%2FHOI-Learning-List","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DirtyHarryLYL%2FHOI-Learning-List/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DirtyHarryLYL%2FHOI-Learning-List/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/DirtyHarryLYL%2FHOI-Learning-List/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DirtyHarryLYL","download_url":"https://codeload.github.com/DirtyHarryLYL/HOI-Learning-List/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244670575,"owners_count":20491015,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["action-recognition","activity-recognition","behavior-analysis","human-object-interaction"],"created_at":"2024-07-31T04:00:21.168Z","updated_at":"2025-03-20T18:32:32.456Z","avatar_url":"https://github.com/DirtyHarryLYL.png","language":null,"funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"# HOI-Learning-List\nSome recent (2015-now) Human-Object Interaction Learning studies. 
If you find any errors or problems, please don't hesitate to let me know.\n\nA list of Transformer-based vision works: https://github.com/DirtyHarryLYL/Transformer-in-Vision.\n\n## Image Dataset/Benchmark\n\n- BRIGHT (arXiv 2025.1), re-balanced dataset based on HICO-DET [[Paper]](https://arxiv.org/pdf/2501.16724)\n\n- CAL (arXiv 2024.10), a contact-driven affordance learning dataset [[Paper]](https://arxiv.org/pdf/2410.11363), [[Project]](https://github.com/lhc1224/VCR-Net)\n\n- SynHOI (arXiv 2023.5), synthetic HOI data [[Paper]](https://arxiv.org/pdf/2305.12252.pdf)\n\n- HICO-DET-SG, V-COCO-SG (new splits of HICO-DET and V-COCO) [[Paper]](https://arxiv.org/pdf/2305.09948.pdf), [[Code]](https://github.com/FujitsuResearch/hoi_sg)\n\n- Bongard-HOI [[Paper]](https://arxiv.org/pdf/2205.13803.pdf), [[Code]](https://github.com/NVlabs/Bongard-HOI)\n\n- SWiG-HOI, [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Wang_Discovering_Human_Interactions_With_Large-Vocabulary_Objects_via_Query_and_Multi-Scale_ICCV_2021_paper.pdf), [[Website]](https://github.com/scwangdyd/large_vocabulary_hoi_detection)\n\n- New Metric: mPD, [[Paper]](https://arxiv.org/pdf/2202.09492.pdf), [[Code]](https://github.com/Foruck/OC-Immunity)\n\n- DIABOLO [[Paper]](https://arxiv.org/pdf/2201.02396.pdf), [[Website]](https://kalisteo.cea.fr/)\n\n- HOI-COCO (CVPR2021) [[Website]](https://github.com/zhihou7/HOI-CL)\n\n- PaStaNet-HOI (TPAMI2021) [[Benchmark]](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network/tree/master/PaStaNet-HOI_Benckmark)\n\n- HAKE (CVPR2020) [[YouTube]](https://t.co/hXiAYPXEuL?amp=1) [[bilibili]](https://www.bilibili.com/video/BV1s54y1Y76s) [[Website]](http://hake-mvig.cn/home/) [[Paper]](https://arxiv.org/pdf/2004.00945.pdf) [[HAKE-Action-Torch]](https://github.com/DirtyHarryLYL/HAKE-Action-Torch) [[HAKE-Action-TF]](https://github.com/DirtyHarryLYL/HAKE-Action)\n\n- Ambiguous-HOI (CVPR2020) [[Website]](https://github.com/DirtyHarryLYL/DJ-RN) 
[[Paper]](https://arxiv.org/pdf/2004.08154.pdf)\n\n- HICO-DET (WACV2018) [[Website]](http://www-personal.umich.edu/~ywchao/hico/) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_wacv2018.pdf)\n\n- HCVRD (AAAI2018) [[Website]](https://bitbucket.org/jingruixiaozhuang/hcvrd-a-benchmark-for-large-scale-human-centered-visual/src/master/) [[Paper]](https://pdfs.semanticscholar.org/c94f/1aaf62f87d97dd579cb6451cb9149fb4967d.pdf)\n\n- V-COCO (May 2015) [[Website]](https://github.com/s-gupta/v-coco) [[Paper]](https://arxiv.org/pdf/1505.04474.pdf)\n\n- HICO (ICCV2015) [[Website]](http://www-personal.umich.edu/~ywchao/hico/) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_iccv2015.pdf)\n\n- OpenImage [[Website]](https://visualgenome.org/) [[Paper]](https://arxiv.org/abs/1602.07332)\n\n- PIC [[Website]](http://picdataset.com/challenge/index/)\n\nMore...\n\n### Video HOI Datasets\n\n- MOMA [[Paper]](https://proceedings.neurips.cc/paper/2021/file/95688ba636a4720a85b3634acfec8cdd-Paper.pdf), [[Project]](https://github.com/StanfordVL/moma)\n\n- MPHOI-72 RGB-D, [[Paper]](https://arxiv.org/pdf/2207.09425.pdf), [[Project]](https://github.com/tanqiu98/2G-GCN)\n\n- VidHOI [[Paper]](https://arxiv.org/pdf/2105.11731.pdf), [[Project]](https://github.com/coldmanck/VidHOI)\n\n- AVA [[Website]](http://research.google.com/ava/), HOIs (human-object, human-human), and pose (body motion) actions\n\n- Action Genome [[Website]](https://www.actiongenome.org/), action verbs and spatial relationships\n\n- CAD120 [[Paper]](https://arxiv.org/pdf/1210.1207.pdf), [[Website]](http://pr.cs.cornell.edu/humanactivities/)\n\n- Sth-else [[Paper]](https://arxiv.org/abs/1912.09930), [[Website]](https://github.com/joaanna/something_else)\n\n### 3D HOI Datasets\n\n- ParaHome, [[Paper]](https://arxiv.org/pdf/2401.10232.pdf), [[Project]](https://jlogkim.github.io/parahome/)\n\n- BallPlay, [[Paper]](https://arxiv.org/pdf/2312.04393.pdf), 
[[Project]](https://wyhuai.github.io/physhoi-page/)\n\n- COINS, [[Paper]](https://drive.google.com/file/d/1LpJe1RiDsB49tQwUFWMUorjBTQfvXzvW/view?usp=sharing), [[Project]](https://zkf1997.github.io/COINS/index.html)\n\n- COUCH, [[Paper]](https://arxiv.org/pdf/2205.00541.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/couch/)\n\n- HULC, [[Paper]](https://vcai.mpi-inf.mpg.de/projects/HULC/data/paper_light.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/HULC/)\n\n- CHAIRS, [[Paper]](https://arxiv.org/pdf/2212.10621.pdf), [[Project]](https://jnnan.github.io/project/chairs/)\n\n- GRAB, [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123490562.pdf), [[Project]](https://grab.is.tue.mpg.de/)\n\n- HUMANISE, [[Paper]](https://silvester.wang/HUMANISE/paper.pdf), [[Project]](https://silvester.wang/HUMANISE/)\n\n- BEHAVE, [[Paper]](https://arxiv.org/pdf/2204.06950.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/behave/)\n\n- GraviCap, [[Paper]](https://arxiv.org/pdf/2108.08844.pdf), [[Project]](https://4dqv.mpi-inf.mpg.de/GraviCap/)\n\n## Survey\n\n- Human object interaction detection: Design and survey (Image and Vision Computing 2022), [[Paper]](https://www.sciencedirect.com/science/article/abs/pii/S0262885622002463)\n\n## Method\n\n### HOI Image Generation\n\n- AnchorCrafter (arXiv 2024) [[Paper]](https://arxiv.org/pdf/2411.17383), [[Project]](https://cangcz.github.io/Anchor-Crafter/)\n\n- ReCorD (ACM MM 2024) [[Paper]](https://arxiv.org/pdf/2407.17911), [[Project]](https://alberthkyhky.github.io/ReCorD/)\n\n- InteractDiffusion (CVPR 2024) [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/html/Hoe_InteractDiffusion_Interaction_Control_in_Text-to-Image_Diffusion_Models_CVPR_2024_paper.html), [[Project]](https://jiuntian.github.io/interactdiffusion/)\n\n- Person in Place (CVPR 2024) 
[[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Person_in_Place_Generating_Associative_Skeleton-Guidance_Maps_for_Human-Object_Interaction_CVPR_2024_paper.pdf), [[Code]](https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE)\n\n- VirtualModel (arXiv 2024.5) [[Paper]](https://arxiv.org/pdf/2405.09985) [[Code]](https://aigcdesigngroup.github.io/replace-anything/)\n\n- Exploiting Relationship for Complex-scene Image Generation (arXiv 2021.04) [[Paper]](https://arxiv.org/pdf/2104.00356.pdf)\n\n- Specifying Object Attributes and Relations in Interactive Scene Generation (arXiv 2019.11) [[Paper]](https://arxiv.org/pdf/1909.05379.pdf)\n\n### HOI Recognition: Image-based, to recognize all the HOIs in one image.\n\n- RAM++ (arXiv'23) [[Paper]](https://arxiv.org/pdf/2310.15200.pdf), [[Code]](https://github.com/xinyu1205/recognize-anything)\n\n- OpenTAP (ECCV'22) [[Paper]](https://link.springer.com/content/pdf/10.1007/978-3-031-19806-9_12.pdf), [[Code]](https://vkhoi.github.io/TAP)\n\n- RelViT (ICLR'22) [[Paper]](https://arxiv.org/pdf/2204.11167.pdf), [[Code]](https://github.com/NVlabs/RelViT)\n\n- DEFR (arXiv 2021.12) [[Paper]](https://arxiv.org/pdf/2112.06392.pdf)\n\n- Interaction Compass (ICCV 2021) [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Huynh_Interaction_Compass_Multi-Label_Zero-Shot_Learning_of_Human-Object_Interactions_via_Spatial_ICCV_2021_paper.pdf)\n\n- DEFR-CLIP (arXiv 2021.07) [[Paper]](https://arxiv.org/pdf/2107.13083.pdf)\n\n- PaStaNet: Toward Human Activity Knowledge Engine (CVPR2020) [[Code]](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action) [[Data]](https://github.com/DirtyHarryLYL/HAKE) [[Paper]](https://arxiv.org/pdf/2004.00945.pdf) [[YouTube]](https://t.co/hXiAYPXEuL?amp=1) [[bilibili]](https://www.bilibili.com/video/BV1s54y1Y76s)\n\n- Pairwise (ECCV2018) 
[[Paper]](http://openaccess.thecvf.com/content_ECCV_2018/papers/Haoshu_Fang_Pairwise_Body-Part_Attention_ECCV_2018_paper.pdf)\n\n- Attentional Pooling for Action Recognition (NIPS2017) [[Code]](https://github.com/rohitgirdhar/AttentionalPoolingAction) [[Paper]](https://arxiv.org/pdf/1711.01467.pdf)\n\n- Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering (ECCV2016) [[Code]](https://uofi.box.com/s/yflrqbser1r5m3iez1satkprawmsouag) [[Paper]](https://arxiv.org/pdf/1604.04808.pdf)\n\n- Contextual Action Recognition with R\\*CNN (ICCV2015) [[Code]](https://github.com/gkioxari/RstarCNN) [[Paper]](https://arxiv.org/pdf/1505.01197.pdf)\n\n- HOCNN (ICCV2015) [[Code]](https://github.com/ywchao/hico_benchmark) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_iccv2015.pdf)\n\n- SGAP-Net (AAAI2020) [[Paper]](https://aaai.org/Papers/AAAI/2020GB/AAAI-JiZ.4799.pdf)\n\nMore...\n\n#### Unseen or zero-shot learning (image-level recognition).\n\n- HTS (ICIP 2023) [[Paper]](https://ieeexplore.ieee.org/abstract/document/10222927)\n\n- ICompass (ICCV2021) [[Paper]](https://hbdat.github.io/pubs/iccv21_relation_direction_final.pdf), [[Code]](https://github.com/hbdat/iccv21_relational_direction)\n\n- Compositional Learning for Human Object Interaction (ECCV2018) [[Paper]](http://openaccess.thecvf.com/content_ECCV_2018/papers/Keizo_Kato_Compositional_Learning_of_ECCV_2018_paper.pdf)\n\n- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 
2020) [[Paper]](https://arxiv.org/pdf/2009.01039.pdf)\n\nMore...\n\n#### HOI for Robotics.\n\n- HOI4ABOT: Human-Object Interaction Anticipation for Assistive roBOTs (CORL 2023) [[Paper]](https://openreview.net/forum?id=rYZBdBytxBx), [[Project]](https://evm7.github.io/HOI4ABOT_page/)\n\n- Human–object interaction prediction in videos through gaze following (CVIU 2023) [[Paper]](https://arxiv.org/abs/2306.03597), [[Project]](https://evm7.github.io/HOIGaze-page/)\n\n### HOI Detection: Instance-based, to detect the human-object pairs and classify the interactions.\n\n- DSU (arXiv 2025), [[Paper]](https://arxiv.org/pdf/2501.11653), [[Project]](https://tau-vailab.github.io/Dynamic-Scene-Understanding/)\n\n- EZ-HOI (NeurIPS 2024), [[Paper]](https://arxiv.org/pdf/2410.23904), [[Code]](https://github.com/ChelsieLei/EZ-HOI)\n\n- HOIGen (ACM MM 2024), [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3664647.3680927), [[Code]](https://github.com/soberguo/HOIGen)\n\n- DiffusionHOI (NeurIPS 2024), [[Paper]](https://arxiv.org/pdf/2410.20155), [[Code]](https://github.com/0liliulei/DiffusionHOI)\n\n- CEFA (ACM MM 2024), [[Paper]](https://arxiv.org/pdf/2407.21438), [[Code]](https://github.com/LijunZhang01/CEFA)\n\n- CO-HOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2410.15657)\n\n- HOIGen (ACM MM 2024), [[Paper]](https://arxiv.org/pdf/2408.05974), [[Code]](https://github.com/soberguo/HOIGen)\n\n- CMMP (ECCV 2024), [[Paper]](https://arxiv.org/pdf/2408.02484), [[Code]](https://github.com/ltttpku/CMMP)\n\n- CycleHOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2407.11433)\n\n- GeoHOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2406.18691), [[Code]](https://github.com/zhumanli/GeoHOI)\n\n- SICHOI (CVPR 2024), [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)\n\n- Pose-Aware (CVPR 2024), 
[[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)\n\n- BCOM (CVPR 2024), [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)\n\n- MP-HOI (CVPR 2024), [[Paper]](https://arxiv.org/pdf/2406.07221), [[Project]](https://mp-hoi.github.io/)\n\n- DP-HOI (CVPR 2024), [[Paper]](https://arxiv.org/pdf/2404.01725.pdf), [[Code]](https://github.com/xingaoli/DP-HOI)\n\n- DPADN (AAAI 2024), [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/27949), [[Code]](https://github.com/PRIS-CV/DPADN)\n\n- KI2HOI (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2403.07246.pdf)\n\n- SCTC (AAAI 2024), [[Paper]](https://arxiv.org/pdf/2401.05676.pdf)\n\n- OBPA-Net (PRCV 2023), [[Paper]](https://link.springer.com/chapter/10.1007/978-981-99-8555-5_30), [[Code]](https://github.com/zhuang1iu/OBPA-NET)\n\n- MLKD (WACV2024), [[Paper]](https://arxiv.org/pdf/2309.05069.pdf), [[Code]](https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP)\n\n- SBM (PRCV2023), [[Paper]](https://link.springer.com/chapter/10.1007/978-981-99-8429-9_5)\n\n- UnionDet (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2312.12664.pdf)\n\n- SCA (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2312.01713.pdf)\n\n- HCVC (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2311.16475.pdf)\n\n- GFIN (NN 2023), [[Paper]](https://www.sciencedirect.com/science/article/pii/S0893608023006251?via%3Dihub#fig1)\n\n- LOGICHOI (NeurIPS 2023), [[Paper]](https://openreview.net/pdf?id=QjI36zxjbW), [[Code]](https://github.com/weijianan1/LogicHOI)\n\n- CLIP4HOI (NeurIPS 2023), [[Paper]](https://openreview.net/pdf?id=nqIIWnwe73)\n\n- UniHOI (NeurIPS 2023), [[Paper]](https://arxiv.org/pdf/2311.03799.pdf), [[Code]](https://github.com/Caoyichao/UniHOI)\n\n- SG2HOI (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2311.01755.pdf)\n\n- SQAB 
(Displays 2023), [[Paper]](https://www.sciencedirect.com/science/article/pii/S0141938223002044?via%3Dihub#tbl1)\n\n- Mult-Step (ACMMM 2023), [[Paper]](https://dl.acm.org/doi/10.1145/3581783.3612581)\n\n- PDN (PR 2023), [[Paper]](https://www.sciencedirect.com/science/article/pii/S0031320323007185?via%3Dihub)\n\n- ICDT (ICANN 2023), [[Paper]](https://link.springer.com/content/pdf/10.1007/978-3-031-44223-0_35.pdf), [[Code]](https://github.com/bingnanG/ICDT)\n\n- ScratchHOI (ICIP 2023), [[Paper]](https://ieeexplore.ieee.org/abstract/document/10222323)\n\n- ADA-CM (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2309.03696.pdf), [[Code]](https://github.com/ltttpku/ADA-CM)\n\n- HODN (TMM 2023), [[Paper]](https://arxiv.org/pdf/2308.10158.pdf)\n\n- RLIPv2 (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.09351.pdf), [[Code]](https://github.com/JacobYuan7/RLIPv2)\n\n- AGER (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.08370.pdf), [[Code]](https://github.com/six6607/AGER)\n\n- Diagnosing Human-object Interaction Detectors (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2308.08529.pdf), [[Code]](https://github.com/neu-vi/Diag-HOI)\n\n- Compo (ICME 2023), [[Paper]](https://arxiv.org/pdf/2308.05961.pdf)\n\n- PViC (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.06202.pdf), [[Code]](https://github.com/fredzzhang/pvic)\n\n- VIL (ACM MM 2023), [[Paper]](https://arxiv.org/pdf/2308.02606.pdf)\n\n- RmLR (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2307.13529.pdf)\n\n- PSN (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2307.10499.pdf)\n\n- SOV-STG (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2307.02291.pdf), [[Code]](https://github.com/cjw2021/SOV-STG)\n\n- Shikra (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2306.15195.pdf), [[Code]](https://github.com/shikras/shikra)\n\n- HOKEM (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2306.14260.pdf)\n\n- SQA (ICASSP 2023), [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=10096029), 
[[Code]](https://github.com/nmbzdwss/SQA)\n\n- OpenCat (CVPR 2023), [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)\n\n- DiffHOI (arXiv 2023.5), [[Paper]](https://arxiv.org/pdf/2305.12252.pdf)\n\n- ViPLO (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2304.08114.pdf), [[Code]](https://github.com/Jeeseung-Park/ViPLO)\n\n- MUREN (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2304.04997.pdf), [[Project]](http://cvlab.postech.ac.kr/research/MUREN/)\n\n- HOICLIP (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2303.15786.pdf), [[Code]](https://github.com/Artanic30/HOICLIP)\n\n- CQL (CVPR 2023), [[Paper]](https://arxiv.org/pdf/2303.14005.pdf), [[Code]](https://github.com/charles-xie/CQL)\n\n- UniVRD (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2303.08998.pdf)\n\n- SKGHOI (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2303.04253.pdf)\n\n- Weakly-HOI-CLIP (ICLR 2023), [[Paper]](https://arxiv.org/pdf/2303.01313.pdf), [[Code]](https://github.com/bobwan1995/Weakly-HOI)\n\n- FGAHOI (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2301.04019.pdf), [[Code]](https://github.com/xiaomabufei/FGAHOI)\n\n- PR-Net (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2301.03510.pdf)\n\n- PQNet (MMAsia 2022), [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3551626.3564944)\n\n- MHOI (TCSVT 2022), [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9927451)\n\n- RLIP (NeurIPS 2022), [[Paper]](https://arxiv.org/pdf/2209.01814.pdf), [[Code]](https://github.com/JacobYuan7/RLIP)\n\n- PartMap (ECCV2022), [[Paper]](https://arxiv.org/pdf/2207.14192v1.pdf), [[Code]](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)\n\n- K-BAN (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2207.07979.pdf)\n\n- SGCN4HOI (IEEE SMC 2022), [[Paper]](https://arxiv.org/pdf/2207.05733.pdf)\n\n- HQM (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2207.05293.pdf), 
[[Code]](https://github.com/MuchHair/HQM)\n\n- ODM (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2207.02400.pdf)\n\n- SDT (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2207.01869.pdf)\n\n- DOQ (CVPR 2022), [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Qu_Distillation_Using_Oracle_Queries_for_Transformer-Based_Human-Object_Interaction_Detection_CVPR_2022_paper.pdf), [[Code]](https://github.com/SherlockHolmes221/DOQ)\n\n- STIP (CVPR 2022), [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_Exploring_Structure-Aware_Transformer_Over_Interaction_Proposals_for_Human-Object_Interaction_Detection_CVPR_2022_paper.pdf)\n\n- DT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.09290.pdf)\n\n- IF (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.07718.pdf), [[Code]](https://github.com/Foruck/Interactiveness-Field)\n\n- CPC (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.04836.pdf), [[Code]](https://github.com/mlvlab/CPChoi)\n\n- CATN (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.04911.pdf)\n\n- SSRT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2204.00746.pdf)\n\n- GEN-VLKT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2203.13954.pdf), [[Code]](https://github.com/YueLiao/gen-vlkt)\n\n- MSTR (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2203.14709.pdf)\n\n- Iwin (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2203.10537.pdf)\n\n- RGBM (arXiv 2022.2), [[Paper]](https://arxiv.org/pdf/2202.11998.pdf)\n\n- GPV-2 (ECCV 2022), [[Paper]](https://arxiv.org/pdf/2202.02317.pdf), [[Project]](https://prior.allenai.org/projects/gpv2)\n\n- OC-Immunity (AAAI 2022) [[Paper]](https://arxiv.org/pdf/2202.09492.pdf), [[Code]](https://github.com/Foruck/OC-Immunity)\n\n- OCN (AAAI 2022) [[Paper]](https://arxiv.org/pdf/2202.00259.pdf), [[Code]](https://github.com/JacobYuan7/OCN-HOI-Benchmark)\n\n- QAHOI (arXiv 2021) [[Paper]](https://arxiv.org/pdf/2112.08647.pdf), [[Code]](https://github.com/cjw2021/QAHOI)\n\n- PhraseHOI (AAAI 2022) 
[[Paper]](https://arxiv.org/pdf/2112.07383.pdf)\n\n- DEFR (arXiv 2021.12) [[Paper]](https://arxiv.org/pdf/2112.06392.pdf)\n\n- UPT (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2112.01838.pdf), [[Code]](https://github.com/fredzzhang/upt)\n\n- HRNet (TIP 2021) [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=9552553)\n\n- ACP++ (TIP 2021) [[Paper]](https://arxiv.org/pdf/2109.04047.pdf), [[Code]](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)\n\n- SG2HOI (ICCV 2021) [[Paper]](https://arxiv.org/pdf/2108.08584.pdf)\n\n- CDN (NeurIPS 2021) [[Paper]](https://arxiv.org/pdf/2108.05077.pdf), [[Code]](https://github.com/YueLiao/CDN)\n\n- GTNet (arXiv 2021.8) [[Paper]](https://arxiv.org/pdf/2108.00596.pdf), [[Code]](https://github.com/UCSB-VRL/GTNet)\n\n- HOI-MO-Net (IVC 2021) [[Paper]](https://www.sciencedirect.com/science/article/pii/S0262885621001670?via%3Dihub#tbl0005)\n\n- IPGN (TIP 2021.7) [[Paper]](https://ieeexplore.ieee.org/document/9489275)\n\n- SCG (ICCV 2021, SAG, v2) [[Paper]](https://arxiv.org/pdf/2012.06060.pdf), [[Code]](https://github.com/fredzzhang/spatially-conditioned-graphs)\n\n- Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior (arXiv) [[Paper]](https://arxiv.org/pdf/2105.03089.pdf)\n\n- PST (ICCV2021) [[Paper]](https://arxiv.org/pdf/2105.02170.pdf)\n\n- RR-Net (arXiv 2021.5) [[Paper]](https://arxiv.org/pdf/2104.15015.pdf)\n\n- HOTR (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.13682.pdf), [[Code]](https://github.com/kakaobrain/HOTR)\n\n- GGNet (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.05269.pdf), [[Code]](https://github.com/SherlockHolmes221/GGNet)\n\n- ATL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.02867.pdf), [[Code]](https://github.com/zhihou7/HOI-CL)\n\n- FCL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.08214.pdf), [[Code]](https://github.com/zhihou7/FCL)\n\n- AS-Net (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.05983.pdf), 
[[Code]](https://github.com/yoyomimi/AS-Net)\n\n- End-to-End Human Object Interaction Detection with HOI Transformer (CVPR2021), [[Paper]](https://arxiv.org/pdf/2103.04503.pdf), [[Code]](https://github.com/bbepoch/HoiTransformer)\n\n- QPIC (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.05399.pdf), [[Code]](https://github.com/hitachi-rd-cv/qpic)\n\n- TIN (TPAMI2021) [[Paper]](https://arxiv.org/pdf/2101.10292.pdf), [[Code]](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)\n\n- IDN (NeurIPS2020) [[Paper]](https://arxiv.org/pdf/2010.16219.pdf) [[Code]](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))\n\n- DIRV (AAAI2021) [[Paper]](https://arxiv.org/pdf/2010.01005.pdf)\n\n- DecAug (AAAI2021) [[Paper]](https://arxiv.org/pdf/2010.01007.pdf)\n\n- OSGNet (IEEE Access) [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9360596)\n\n- PFNet (CVM) [[Paper]](https://link.springer.com/content/pdf/10.1007/s41095-020-0188-2.pdf)\n\n- UniDet (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600494.pdf)\n\n- DRG (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123570681.pdf) [[Code]](https://github.com/vt-vl-lab/DRG)\n\n- FCMNet (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590239.pdf)\n\n- Contextual Heterogeneous Graph Network for Human-Object Interaction Detection (ECCV2020) [[Paper]](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620239.pdf)\n\n- PD-Net (ECCV2020) [[Paper-1]](https://www.researchgate.net/publication/343536295_Polysemy_Deciphering_Network_for_Human-Object_Interaction_Detection) [[Paper-2]](https://arxiv.org/pdf/2008.02918.pdf) [[Code]](https://github.com/MuchHair/PD-Net)\n\n- VCL (ECCV2020) [[Paper]](https://arxiv.org/pdf/2007.12407.pdf) [[Code]](https://github.com/zhihou7/VCL)\n\n- ACP (ECCV2020) [[Paper]](https://arxiv.org/pdf/2007.08728.pdf) 
[[Code]](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)\n\n- ConsNet (ACMMM2020) [[Paper]](https://arxiv.org/pdf/2008.06254.pdf) [[Code]](https://github.com/yeliudev/ConsNet), **HICO-DET Python API**: A general Python toolkit for the HICO-DET dataset, including APIs for data loading \u0026 processing, human-object pair IoU \u0026 NMS calculation, and standard evaluation. [[Code]](https://github.com/yeliudev/ConsNet) [[Documentation]](https://consnet.readthedocs.io/)\n\n- Action-Guided Attention Mining and Relation Reasoning Network for Human-Object Interaction Detection (IJCAI2020) [[Paper]](https://www.ijcai.org/Proceedings/2020/0154.pdf)\n\n- PaStaNet (CVPR2020) [[Code]](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action) [[Data]](https://github.com/DirtyHarryLYL/HAKE) [[Paper]](https://arxiv.org/pdf/2004.00945.pdf) [[YouTube]](https://t.co/hXiAYPXEuL?amp=1) [[bilibili]](https://www.bilibili.com/video/BV1s54y1Y76s)\n\n- DJ-RN (CVPR2020) [[Code]](https://github.com/DirtyHarryLYL/DJ-RN) [[Paper]](https://arxiv.org/pdf/2004.08154.pdf)\n\n- Cascaded Human-Object Interaction Recognition (CVPR2020) [[Code]](https://github.com/tfzhou/C-HOI) [[Paper]](https://arxiv.org/pdf/2003.04262.pdf)\n\n- PPDM (CVPR2020) [[Code]](https://github.com/YueLiao/PPDM) [[Paper]](https://arxiv.org/pdf/1912.12898.pdf)\n\n- IP-Net (CVPR2020) [[Code]](https://github.com/vaesl/IP-Net) [[Paper]](https://arxiv.org/pdf/2003.14023.pdf)\n\n- VSGNet (CVPR2020) [[Code]](https://github.com/ASMIftekhar/VSGNet) [[Paper]](https://arxiv.org/pdf/2003.05541.pdf)\n\n- HOID (CVPR2020) [[Code]](https://github.com/scwangdyd/zero_shot_hoi) [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2020/05225.pdf)\n\n- Diagnosing Rarity in Human-Object Interaction Detection (CVPRW2020) [[Paper]](https://arxiv.org/pdf/2006.05728.pdf)\n\n- MLCNet (ICMR2020) [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3372278.3390671)\n\n- SIGN (ICME2020) 
[[Paper]](https://ieeexplore.ieee.org/ielx7/9099125/9102711/09102755.pdf)\n\n- In-GraphNet (IJCAI-PRICAI 2020) [[Paper]](https://arxiv.org/pdf/2007.06925.pdf)\n\n- PMFNet (ICCV2019) [[Code]](https://github.com/bobwan1995/PMFNet) [[Paper]](https://arxiv.org/abs/1909.08453)\n\n- No-Frills (ICCV2019) [[Code]](https://github.com/BigRedT/no_frills_hoi_det) [[Paper]](http://tanmaygupta.info/assets/img/no_frills/paper.pdf)\n\n- Analogy (ICCV2019) [[Code]](https://github.com/jpeyre/analogy) [[Paper]](https://www.di.ens.fr/willow/research/analogy/paper.pdf)\n\n- RPNN (ICCV2019) [[Paper]](http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhou_Relation_Parsing_Neural_Network_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)\n\n- Deep Contextual Attention for Human-Object Interaction Detection (ICCV2019) [[Paper]](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Contextual_Attention_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)\n\n- Interactiveness (CVPR2019) [[Code]](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network) [[Paper]](https://arxiv.org/pdf/1811.08264.pdf)\n\n- Turbo (AAAI2019) [[Paper]](https://arxiv.org/pdf/1903.06355.pdf)\n\n- GPNN (ECCV2018) [[Code]](https://github.com/SiyuanQi/gpnn) [[Paper]](https://arxiv.org/pdf/1808.07962.pdf)\n\n- iCAN (BMVC2018) [[Code]](https://github.com/vt-vl-lab/iCAN) [[Paper]](https://arxiv.org/pdf/1808.10437.pdf)\n\n- InteractNet (CVPR2018) [[Paper]](https://arxiv.org/pdf/1704.07333.pdf)\n\n- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [[Paper]](http://vision.stanford.edu/pdf/shen2018wacv.pdf)\n\n- HO-RCNN (WACV2018) [[Code]](https://github.com/ywchao/ho-rcnn) [[Paper]](http://www-personal.umich.edu/~ywchao/publications/chao_wacv2018.pdf)\n\n- VS-GATs (Mar. 2020) [[Paper]](https://arxiv.org/pdf/2001.02302.pdf)\n\n- Classifying All Interacting Pairs in a Single Shot (Jan. 
2020) [[Paper]](https://arxiv.org/pdf/2001.04360.pdf)\n\n- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May 2020) [[Paper]](https://arxiv.org/pdf/2005.11406.pdf)\n\n- PMN (Jul. 2020) [[Paper]](https://arxiv.org/pdf/2008.02042.pdf) [[Code]](https://github.com/birlrobotics/PMN)\n\n- SAG (Dec 2020) [[Paper]](https://arxiv.org/pdf/2012.06060.pdf) [[Code]](https://github.com/fredzzhang/spatio-attentive-graphs)\n\n- SABRA (Dec 2020) [[Paper]](https://arxiv.org/pdf/2012.12510.pdf)\n\nMore...\n\n#### Unseen or zero/low-shot or weakly-supervised learning (instance-level detection).\n\n- HOIGen (ACM MM 2024), [[Paper]](https://arxiv.org/pdf/2408.05974), [[Code]](https://github.com/soberguo/HOIGen)\n\n- CMD-SE (arXiv 2024), [[Paper]](https://arxiv.org/pdf/2404.06194.pdf), [[Code]](https://github.com/ltttpku/CMD-SE-release)\n\n- CLIP4HOI (NeurIPS 2023), [[Paper]](https://openreview.net/pdf?id=nqIIWnwe73)\n\n- UniHOI (NeurIPS 2023), [[Paper]](https://arxiv.org/pdf/2311.03799.pdf), [[Code]](https://github.com/Caoyichao/UniHOI)\n\n- Lu et al. (arXiv 2023), [[Paper]](https://arxiv.org/pdf/2309.05069.pdf), [[Code]](https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP)\n\n- CDT (TNNLS 2023), [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=10242152)\n\n- RLIPv2 (ICCV 2023), [[Paper]](https://arxiv.org/pdf/2308.09351.pdf), [[Code]](https://github.com/JacobYuan7/RLIPv2)\n\n- Unal et al. 
(arXiv 2023), [[Paper]](https://arxiv.org/pdf/2303.05546.pdf)\n\n- RLIP (NeurIPS 2022), [[Paper]](https://arxiv.org/pdf/2209.01814.pdf), [[Code]](https://github.com/JacobYuan7/RLIP)\n\n- THID (CVPR 2022), [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2022/CVPR2022_4126.pdf), [[Code]](https://github.com/scwangdyd/promting_hoi)\n\n- EoID (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2204.03541.pdf), [[Code]](https://github.com/mrwu-mac/EoID)\n\n- SCL (arXiv 2022), [[Paper]](https://arxiv.org/pdf/2203.14272.pdf), [[Code]](https://github.com/zhihou7/HOI-CL)\n\n- GEN-VLKT (CVPR 2022), [[Paper]](https://arxiv.org/pdf/2203.13954.pdf), [[Code]](https://github.com/YueLiao/)\n\n- OC-Immunity (AAAI 2022), [[Paper]](https://arxiv.org/pdf/2202.09492.pdf), [[Code]](https://github.com/Foruck/OC-Immunity)\n\n- Align-Former (BMVC 2021), [[Paper]](https://arxiv.org/pdf/2112.00492.pdf)\n\n- CHOID (ICCV2021) [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2021/ICCV2021_sucheng.pdf), [[Code]](https://github.com/scwangdyd/large_vocabulary_hoi_detection)\n\n- DGIG-Net (TOC2021) [[Paper]](https://ieeexplore.ieee.org/abstract/document/9352497)\n\n- ATL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2104.02867.pdf), [[Code]](https://github.com/zhihou7/HOI-CL)\n\n- FCL (CVPR2021) [[Paper]](https://arxiv.org/pdf/2103.08214.pdf), [[Code]](https://github.com/zhihou7/FCL)\n\n- Detecting Human-Object Interaction with Mixed Supervision (2021) [[Paper]](https://arxiv.org/pdf/2011.04971v1.pdf)\n\n- ConsNet (ACMMM2020) [[Paper]](https://arxiv.org/pdf/2008.06254.pdf) [[Code]](https://github.com/yeliudev/ConsNet)\n\n- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 
2020) [[Paper]](https://arxiv.org/pdf/2009.01039.pdf)\n\n- VCL (ECCV2020) [[Paper]](https://arxiv.org/pdf/2007.12407.pdf) [[Code]](https://github.com/zhihou7/VCL)\n\n- HOID (CVPR2020) [[Code]](https://github.com/scwangdyd/zero_shot_hoi) [[Paper]](https://cse.buffalo.edu/~jsyuan/papers/2020/05225.pdf)\n\n- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May. 2020) [[Paper]](https://arxiv.org/pdf/2005.11406.pdf)\n\n- Analogy (ICCV2019) [[Code]](https://github.com/jpeyre/analogy) [[Paper]](https://www.di.ens.fr/willow/research/analogy/paper.pdf)\n\n- Functional (AAAI2020) [[Paper]](https://arxiv.org/pdf/1904.03181.pdf)\n\n- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (2018) [[Paper]](http://vision.stanford.edu/pdf/shen2018wacv.pdf)\n\nMore...\n\n### Video HOI methods\n\n- SPDTP (arXiv, Jun 2022), [[Paper]](https://arxiv.org/pdf/2206.03061.pdf)\n\n- V-HOI (arXiv, Jun 2022), [[Paper]](https://arxiv.org/pdf/2206.01908.pdf)\n\n- Detecting Human-Object Relationships in Videos (ICCV2021) [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Ji_Detecting_Human-Object_Relationships_in_Videos_ICCV_2021_paper.pdf)\n\n- STIGPN (Aug 2021), [[Paper]](https://arxiv.org/pdf/2108.08633.pdf), [[Code]](https://github.com/GuangmingZhu/STIGPN)\n\n- VidHOI (May 2021), [[Paper]](https://arxiv.org/pdf/2105.11731.pdf)\n\n- LIGHTEN (ACMMM2020) [[Paper]](https://www.cse.iitb.ac.in/~rdabral/docs/acm_lighten.pdf) [[Code]](https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI)\n\n- Generating Videos of Zero-Shot Compositions of Actions and Objects (Jul 2020), HOI GAN, [[Paper]](https://arxiv.org/pdf/1912.02401.pdf)\n\n- Grounded Human-Object Interaction Hotspots from Video (ICCV2019) [[Code]](https://github.com/Tushar-N/interaction-hotspots) [[Paper]](https://arxiv.org/pdf/1812.04558.pdf)\n\n- GPNN (ECCV2018) [[Code]](https://github.com/SiyuanQi/gpnn) 
[[Paper]](https://arxiv.org/pdf/1808.07962.pdf)\n\nMore...\n\n### 3D HOI Reconstruction/Generation/Understanding\n\n- P3HAOI (AAAI 2024) [[Paper]](https://arxiv.org/pdf/2312.10714.pdf), [[Project]](https://mvig-rhos.com/p3haoi)\n\n- IM-HOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.08869.pdf), [[Project]](https://afterjourney00.github.io/IM-HOI.github.io/)\n\n- Ins-HOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.09641.pdf), [[Project]](https://jiajunzhang16.github.io/ins-hoi/)\n\n- HDM (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.07063.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/procigen-hdm/)\n\n- HOI-Diff (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.06553.pdf), [[Project]](https://neu-vi.github.io/HOI-Diff/)\n\n- CHOIS (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.03913.pdf), [[Project]](https://lijiaman.github.io/projects/chois/)\n\n- MOB (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.02700.pdf), [[Project]](https://foruck.github.io/occu-page/)\n\n- PhysHOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2312.04393.pdf), [[Project]](https://wyhuai.github.io/physhoi-page/)\n\n- physfullbody-grasp (3DV 2024) [[Paper]](https://arxiv.org/pdf/2309.07907.pdf), [[Project]](https://eth-ait.github.io/phys-fullbody-grasp/)\n\n- MIME (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2212.04360.pdf), [[Project]](https://mime.is.tue.mpg.de/)\n\n- TOHO (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2303.13129.pdf)\n\n- GenZI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2311.17737.pdf), [[Project]](https://craigleili.github.io/projects/genzi/)\n\n- CG-HOI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2311.16097.pdf), [[Project]](https://www.christian-diller.de/projects/cg-hoi/)\n\n- OMOMO (TOG 2023) [[Paper]](https://arxiv.org/pdf/2309.16237.pdf)\n\n- HOPS (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2205.02830.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/hops/)\n\n- UniHSI (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2309.07918.pdf), 
[[Project]](https://xizaoqu.github.io/unihsi/)\n\n- NIFTY (arXiv 2023) [[Paper]](https://nileshkulkarni.github.io/nifty/assets/paper.pdf), [[Project]](https://nileshkulkarni.github.io/nifty/)\n\n- IMoS (EUROGRAPHICS 2023) [[Paper]](https://arxiv.org/pdf/2212.07555.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/IMoS/)\n\n- SceneDiffuser (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2301.06015.pdf), [[Project]](https://scenediffuser.github.io/)\n\n- Locomotion-Action-Manipulation (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2301.02667.pdf), [[Project]](https://jiyewise.github.io/projects/LAMA/)\n\n- ROAM (arXiv 2023) [[Paper]](https://arxiv.org/pdf/2308.12969.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/ROAM/)\n\n- Object pop-up (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2306.00777.pdf), [[Code]](https://github.com/ptrvilya/object-popup)\n\n- StackFLOW (IJCAI 2023) [[Paper]](https://www.ijcai.org/proceedings/2023/0100.pdf), [[Code]](https://github.com/huochf/StackFLOW)\n\n- InterDiff (ICCV2023) [[Paper]](https://arxiv.org/pdf/2308.16905.pdf), [[Project]](https://sirui-xu.github.io/InterDiff/)\n\n- Wang et al. (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2209.02485.pdf)\n\n- Haresh et al. 
(3DV 2022) [[Paper]](https://arxiv.org/pdf/2209.05612.pdf), [[Project]](https://3dlg-hcvc.github.io/3dhoi/)\n\n- COUCH (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2205.00541.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/couch/)\n\n- HULC (ECCV 2022) [[Paper]](https://vcai.mpi-inf.mpg.de/projects/HULC/data/paper_light.pdf), [[Project]](https://vcai.mpi-inf.mpg.de/projects/HULC/)\n\n- AROS (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2210.11725.pdf)\n\n- MoCapDeform (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2208.08439.pdf)\n\n- SUMMON (SIGGRAPH Asia 2022) [[Paper]](https://arxiv.org/pdf/2301.01424.pdf), [[Project]](https://lijiaman.github.io/projects/summon/)\n\n- HUMANISE (arXiv 2022) [[Paper]](https://silvester.wang/HUMANISE/paper.pdf), [[Project]](https://silvester.wang/HUMANISE/)\n\n- CHAIRS (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2212.10621.pdf), [[Project]](https://jnnan.github.io/project/chairs/)\n\n- NeuralDome (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2212.07626.pdf)\n\n- ARCTIC (arXiv 2022) [[Paper]](https://arxiv.org/pdf/2204.13662.pdf), [[Project]](https://arctic.is.tue.mpg.de/) \n\n- IMoS (EUROGRAPHICS 2023) [[Paper]](https://arxiv.org/pdf/2212.07555.pdf)\n\n- COINS (ECCV 2022) [[Paper]](https://drive.google.com/file/d/1LpJe1RiDsB49tQwUFWMUorjBTQfvXzvW/view?usp=sharing), [[Project]](https://zkf1997.github.io/COINS/index.html)\n\n- Pose2Room (ECCV 2022) [[Paper]](https://arxiv.org/pdf/2112.03030.pdf), [[Project]](https://yinyunie.github.io/pose2room-page/)\n\n- RICH (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2206.09553.pdf), [[Project]](https://rich.is.tue.mpg.de/)\n\n- MOVER (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2203.03609.pdf), [[Project]](https://mover.is.tue.mpg.de/)\n\n- SAGA (ECCV 2022) [[Paper]](https://arxiv.org/abs/2112.10103), [[Project]](https://jiahaoplus.github.io/SAGA/saga.html)\n\n- GOAL (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2112.11454.pdf), [[Project]](https://goal.is.tuebingen.mpg.de/)\n\n- BEHAVE (CVPR 
2022) [[Paper]](https://virtualhumans.mpi-inf.mpg.de/papers/bhatnagar22behave/behave.pdf), [[Project]](https://virtualhumans.mpi-inf.mpg.de/behave/)\n\n- CHORE (ECCV 2022) [[Project]](https://virtualhumans.mpi-inf.mpg.de/chore/), [[Paper]](https://arxiv.org/pdf/2204.02445.pdf)\n\n- POSA (CVPR 2021) [[Paper]](https://arxiv.org/pdf/2012.11581.pdf), [[Project]](https://posa.is.tue.mpg.de/)\n\n- GraviCap (ICCV 2021) [[Paper]](https://arxiv.org/pdf/2108.08844.pdf), [[Project]](https://4dqv.mpi-inf.mpg.de/GraviCap/)\n\n- D3D-HOI (arXiv 2021) [[Paper]](https://arxiv.org/pdf/2108.08420.pdf), [[Project]](https://github.com/facebookresearch/d3d-hoi)\n\n- PSI (CVPR 2020) [[Paper]](https://ps.is.mpg.de/uploads_file/attachment/attachment/575/1912.02923.pdf), [[Code]](https://github.com/yz-cnsdqz/PSI-release)\n\n- DJ-RN (CVPR 2020) [[Paper]](https://arxiv.org/pdf/2004.08154.pdf), [[Code]](https://github.com/DirtyHarryLYL/DJ-RN)\n\n- PLACE (3DV 2020) [[Paper]](https://arxiv.org/pdf/2008.05570.pdf), [[Project]](https://sanweiliti.github.io/PLACE/PLACE.html)\n\n- GRAB (ECCV 2020) [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123490562.pdf), [[Project]](https://grab.is.tue.mpg.de/)\n\n- Holistic++ (ICCV 2019) [[Paper]](https://arxiv.org/pdf/1909.01507.pdf), [[Code]]()\n\n- PROX (ICCV 2019) [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Hassan_Resolving_3D_Human_Pose_Ambiguities_With_3D_Scene_Constraints_ICCV_2019_paper.pdf), [[Project]](https://prox.is.tue.mpg.de/)\n\n## Result\n\n### [PaStaNet-HOI](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network):\nProposed by TIN (TPAMI version, Transferable Interactiveness Network).\nIt is built on HAKE data, includes 110K+ images and 520 HOIs (without the 80 \"no_interaction\" HOIs of HICO-DET to avoid the incomplete labeling). 
\nIt has a more severely long-tailed data distribution and is therefore more difficult.\n\n#### Detector: COCO pre-trained\n|Method| mAP |\n|:---:|:---:|\n|iCAN|11.00|\n|iCAN+NIS|13.13|\n|[TIN](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| **15.38**|\n\n\n### HICO-DET:\n\n#### 1) Detector: COCO pre-trained\n|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)| Full(ko) | Rare(ko) | Non-Rare(ko) |\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n|[Shen et al.](http://vision.stanford.edu/pdf/shen2018wacv.pdf)| WACV2018 |  6.46 | 4.24 | 7.12| - | - | - |\n|[HO-RCNN](http://www-personal.umich.edu/~ywchao/publications/chao_wacv2018.pdf)| WACV2018 | 7.81|  5.37|  8.54|  10.41|  8.94 | 10.85 |\n|[InteractNet](https://arxiv.org/pdf/1704.07333.pdf)| CVPR2018 |  9.94|  7.16 | 10.77| - | - |-|\n|[Turbo](https://arxiv.org/pdf/1903.06355.pdf)|AAAI2019|11.40| 7.30| 12.60|- | - |-|\n|[GPNN](https://arxiv.org/pdf/1808.07962.pdf)| ECCV2018 |  13.11 | 9.34 | 14.23| - | - |-|\n|[Xu et al.](https://www-users.cs.umn.edu/~qzhao/publications/pdf/xu2019cvpr.pdf)|ICCV2019|14.70 |13.26| 15.13|-|-|-|\n|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 14.84|  10.45 | 16.15 | 16.26  | 11.33| 17.73 |\n|[Wang et al.](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Contextual_Attention_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)|ICCV2019|16.24 |11.16| 17.75| 17.73| 12.78| 19.21|\n|[Lin et. 
al](https://www.ijcai.org/Proceedings/2020/0154.pdf)|IJCAI2020|16.63 |11.30| 18.22| 19.22| 14.56| 20.61|\n|[Functional](https://arxiv.org/pdf/1904.03181.pdf) (suppl)|AAAI2020|16.96| 11.73 |18.52| -|-|-|\n|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 17.03 | 13.42| 18.11| 19.17| 15.51|20.26|\n|[No-Frills](http://tanmaygupta.info/assets/img/no_frills/paper.pdf)| ICCV2019 | 17.18 |12.17| 18.68 |-|-|-|\n|[RPNN](http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhou_Relation_Parsing_Neural_Network_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)|ICCV2019|17.35| 12.78| 18.71|-|-|-|\n|[PMFNet](https://arxiv.org/pdf/1909.08453.pdf)| ICCV2019 | 17.46| 15.65| 18.00| 20.34| 17.47| 21.20|\n|[SIGN](https://ieeexplore.ieee.org/ielx7/9099125/9102711/09102755.pdf)|ICME2020|17.51| 15.31 |18.53 |20.49| 17.53| 21.51|\n|[Interactiveness-optimized](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| CVPR2019 | 17.54\t|13.80\t|18.65|\t19.75|\t15.70|\t20.96|\n|[Liu et.al.](https://arxiv.org/pdf/2105.03089.pdf)|arXiv|17.55 |20.61|-|-|-|-|\n|[Wang et al.](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620239.pdf)|ECCV2020|17.57 |16.85| 17.78| 21.00| 20.74| 21.08|\n|[UnionDet](https://arxiv.org/pdf/2312.12664.pdf)|arXiv2023|17.58| 11.72 |19.33| 19.76| 14.68| 21.27|\n|[In-GraphNet](https://arxiv.org/pdf/2007.06925.pdf)|IJCAI-PRICAI 2020|17.72 |12.93 |19.31|-|-|-|\n|[HOID](https://github.com/scwangdyd/zero_shot_hoi)|CVPR2020| 17.85 |12.85 |19.34|-|-|-|\n|[MLCNet](https://dl.acm.org/doi/pdf/10.1145/3372278.3390671)| ICMR2020| 17.95 |16.62 |18.35|22.28 |20.73 |22.74|\n|[SAG](https://github.com/fredzzhang/spatio-attentive-graphs)|arXiv| 18.26 |13.40 |19.71|-|-|-|\n|[Sarullo et al.](https://arxiv.org/pdf/2009.01039.pdf)|arXiv|18.74|-|-|-|-|-|\n|[DRG](https://github.com/vt-vl-lab/DRG)|ECCV2020|19.26 |17.74 |19.71 |23.40 |21.75 |23.89|\n|[Analogy](https://www.di.ens.fr/willow/research/analogy/paper.pdf)| ICCV2019 | 19.40 |14.60| 
20.90|-|-|-|\n|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|19.43 |16.55| 20.29| 22.00| 19.09| 22.87|\n|[VS-GATs](https://arxiv.org/pdf/2001.02302.pdf)|arXiv|19.66 |15.79 |20.81|-|-|-|\n|[VSGNet](https://github.com/ASMIftekhar/VSGNet)|CVPR2020|19.80 |16.05| 20.91|-|-|-|\n|[PFNet](https://link.springer.com/content/pdf/10.1007/s41095-020-0188-2.pdf)|CVM|20.05 |16.66 |21.07| 24.01| 21.09| 24.89|\n|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021 |20.08| 15.57| 21.43|-|-|-|\n|[FCMNet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590239.pdf)|ECCV2020|20.41 |17.34| 21.56| 22.04 |18.97| 23.12|\n|[ACP](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)|ECCV2020|20.59 |15.92| 21.98|-|-|-|\n|[PD-Net](https://github.com/MuchHair/PD-Net)|ECCV2020|20.81 |15.90| 22.28| 24.78| 18.88| 26.54|\n|[SG2HOI](https://arxiv.org/pdf/2108.08584.pdf)|ICCV2021|20.93 |18.24| 21.78| 24.83| 20.52| 25.32|\n|[TIN-PAMI](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)|TAPMI2021|20.93|18.95|\t21.32|\t23.02|\t20.96|\t23.42|\n|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021|21.07 |16.79| 22.35|-|-|-|\n|[PMN](https://github.com/birlrobotics/PMN)|arXiv|21.21 |17.60| 22.29|-|-|-|\n|[IPGN](https://ieeexplore.ieee.org/document/9489275)|TIP2021|21.26|18.47|22.07|-|-|-|\n|[DJ-RN](https://github.com/DirtyHarryLYL/DJ-RN)| CVPR2020 | 21.34|18.53|22.18|23.69|20.64|24.60|\n|[OSGNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9360596)|IEEE Access|21.40 |18.12| 22.38|-|-|-|\n|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|21.48 |16.85| 22.86| 24.29| 19.09| 25.85|\n|[SCG+ODM](https://arxiv.org/pdf/2207.02400.pdf)|ECCV2022|21.50| 17.59| 22.67|-|-|-|\n|[DIRV](https://arxiv.org/pdf/2010.01005.pdf)| AAAI2021|21.78| 16.38| 23.39| 25.52| 20.84| 26.92|\n|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021| 21.85| 18.11 |22.97|-|-|-|\n|[HRNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=9552553)|TIP2021|21.93 
|16.30| 23.62| 25.22| 18.75| 27.15|\n|[ConsNet](https://github.com/yeliudev/ConsNet)|ACMMM2020|22.15|17.55|23.52|26.57|20.8|28.3|\n|[SKGHOI](https://arxiv.org/pdf/2303.04253.pdf)|arXiv2023|22.61 |15.87| 24.62|-|-|-|\n|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|23.36|22.47|23.63|26.43|25.01|26.85|\n|[QAHOI-Res50](https://github.com/cjw2021/QAHOI)|arXiv2021|24.35 |16.18| 26.80|-|-|-|\n|[DOQ](https://github.com/SherlockHolmes221/DOQ)|CVPR2022|25.97 |26.09| 25.93|-|-|-|\n|[STIP](https://github.com/zyong812/STIP)|CVPR2022|**28.81**| **27.55**| **29.18**| **32.28**| **31.07**| **32.64**|\n\n#### 2) Detector: pre-trained on COCO, fine-tuned on HICO-DET train set (with GT human-object pair boxes) or one-stage detector (point-based, transformer-based)\nThe fine-tuned detector learns to **only detect the interactive humans and objects** (with interactiveness), which suppresses many wrong pairings (non-interactive human-object pairs) and boosts performance.\n|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)| Full(ko) | Rare(ko) | Non-Rare(ko) |\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n|[UniDet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600494.pdf)|ECCV2020|17.58 |11.72 |19.33 |19.76 |14.68 |21.27|\n|[IP-Net](https://arxiv.org/pdf/2003.14023.pdf) | CVPR2020| 19.56 |12.79| 21.58 |22.05 |15.77 |23.92|\n|[RR-Net](https://arxiv.org/pdf/2104.15015.pdf)|arXiv|20.72 |13.21 |22.97| -|-|-|\n|[PPDM](https://arxiv.org/pdf/1912.12898v1.pdf) (paper) |CVPR2020|21.10 |14.46| 23.09| -|-|-|\n|[PPDM](https://github.com/YueLiao/PPDM) (github-hourglass104) |CVPR2020|21.73/21.94	|13.78/13.97	|24.10/24.32	|24.58/24.81|	16.65/17.09|	26.84/27.12|\n|[Functional](https://arxiv.org/pdf/1904.03181.pdf) |AAAI2020|21.96 |16.43|23.62| -|-|-|\n|[SABRA-Res50](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 23.48| 16.39| 25.59| 28.79| 22.75| 
30.54|\n|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|23.63 |17.21 |25.55 |25.98 |19.12 |28.03|\n|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021| 23.67| 17.64| 25.47| 26.01| 19.60| 27.93|\n|[PST](https://arxiv.org/pdf/2105.02170.pdf)|ICCV2021|23.93| 14.98| 26.60| 26.42| 17.61| 29.05|\n|[SABRA-Res50FPN](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 24.12 |15.91| 26.57| 29.65| 22.92| 31.65|\n|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021|24.50 |18.53| 26.28| 27.23| 21.27| 29.00|\n|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|24.58\t|20.33|\t25.86|\t27.89|\t23.64|\t29.16|\n|[FCL](https://github.com/zhihou7/FCL)|CVPR2021|24.68| 20.03| 26.07| 26.80| 21.61| 28.35|\n|[HOTR](https://github.com/kakaobrain/HOTR)|CVPR2021|25.10| 17.34| 27.42| -|-|-|\n|[FCL+VCL](https://github.com/zhihou7/FCL)|CVPR2021|25.27| 20.57| 26.67| 27.71| 22.34| 28.93|\n|[OC-Immunity](https://github.com/Foruck/OC-Immunity)|AAAI2022|25.44| 23.03| 26.16| 27.24| 24.32| 28.11|\n|[ConsNet-F](https://github.com/yeliudev/ConsNet)|ACMMM2020|25.94|19.35|27.91|30.34|23.4|32.41|\n|[SABRA-Res152](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 26.09 |16.29| 29.02| 31.08| 23.44| 33.37|\n|[QAHOI-Res50](https://github.com/cjw2021/QAHOI)|arXiv2021|26.18 |18.06| 28.61|-|-|-|\n|[Zou et al.](https://github.com/bbepoch/HoiTransformer)|CVPR2021|26.61 |19.15| 28.84| 29.13| 20.98| 31.57|\n|[SKGHOI](https://arxiv.org/pdf/2303.04253.pdf)|arXiv2023|26.95 |21.28 |28.56| -|-|-|\n|[RGBM](https://arxiv.org/pdf/2202.11998.pdf)|arXiv2022|27.39| 21.34 |29.20 |30.87 |24.20 |32.87|\n|[GTNet](https://github.com/UCSB-VRL/GTNet)|arXiv|28.03 |22.73| 29.61| 29.98| 24.13| 31.73|\n|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|28.83| 20.29| 31.31| 31.05| 21.41| 33.93|\n|[AS-Net](https://github.com/yoyomimi/AS-Net)|CVPR2021|28.87 |24.25 |30.25 |31.74 |27.07|33.14|\n|[QPIC-Res50](https://github.com/hitachi-rd-cv/qpic)|CVPR2021| 29.07 |21.85 |31.23 
|31.68 |24.14 |33.93|\n|[GGNet](https://github.com/SherlockHolmes221/GGNet)|CVPR2021|29.17 |22.13 |30.84 |33.50| 26.67 |34.89|\n|[QPIC-CPC](https://arxiv.org/pdf/2204.04836.pdf)|CVPR2022|29.63 |23.14 |31.57|-|-|-|\n|[QPIC-Res101](https://github.com/hitachi-rd-cv/qpic)|CVPR2021|29.90 |23.92 |31.69 |32.38 |26.06 |34.27|\n|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021| 29.26 | 24.61 | 30.65 | 32.87 | 27.89 | 34.35 |\n|[MHOI](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9927451)|TCSVT2022|29.67 |24.37 |31.25 |31.87 |27.28 |33.24|\n|[PhraseHOI](https://arxiv.org/pdf/2112.07383.pdf)|AAAI2022|30.03 |23.48 |31.99 |33.74 |27.35 |35.64|\n|[CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=10242152) | TNNLS 2023|30.48|25.48|32.37|-|-|-|\n|[SQAB](https://www.sciencedirect.com/science/article/pii/S0141938223002044?via%3Dihub#tbl1)|Displays2023|30.82\t|24.92|\t32.58|\t33.58|\t27.19|\t35.49|\n|[MSTR](https://arxiv.org/pdf/2203.14709.pdf)|CVPR2022|31.17| 25.31| 32.92| 34.02| 28.83| 35.57|\n|[SSRT](https://arxiv.org/pdf/2204.00746.pdf)|CVPR2022|31.34 |24.31 |33.32|-|-|-|\n|[OCN](https://github.com/JacobYuan7/OCN-HOI-Benchmark)|AAAI2022|31.43| 25.80| 33.11| -|-|-|\n|[SCG+ODM](https://arxiv.org/pdf/2207.02400.pdf)|ECCV2022|31.65 |24.95| 33.65|-|-|-|\n|[DT](https://arxiv.org/pdf/2204.09290.pdf)|CVPR2022|31.75| 27.45| 33.03| 34.50| 30.13| 35.81|\n|[ParSe (COCO)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|31.79| 26.36 |33.41|-|-|-|\n|[CATN (w/ Bert)](https://arxiv.org/pdf/2204.04911.pdf)|CVPR2022|31.86| 25.15| 33.84| 34.44| 27.69| 36.45|\n|[SQA](https://github.com/nmbzdwss/SQA)|ICASSP2023|31.99 |29.88| 32.62| 35.12| 32.74| 35.84|\n|[CDN](https://github.com/YueLiao/CDN)|NeurIPS2021|32.07| 27.19| 33.53| 34.79| 29.48| 36.38|\n|[STIP](https://github.com/zyong812/STIP)|CVPR2022|32.22| 28.15| 33.43| 35.29| 31.43| 36.45|\n|[DEFR](https://arxiv.org/pdf/2112.06392.pdf)|arXiv2021| 32.35 |33.45| 
32.02|-|-|-|\n|[PQNet-L](https://dl.acm.org/doi/pdf/10.1145/3551626.3564944)|mmasia2022|32.45 |27.80 |33.84 |35.28 |30.72 |36.64|\n|[CDN-s+HQM](https://arxiv.org/pdf/2207.05293.pdf)|ECCV2022|32.47| 28.15| 33.76|-|-|-|\n|[UPT](https://github.com/fredzzhang/upt)|CVPR2022|32.62| 28.62| 33.81| 36.08| 31.41| 37.47|\n|[OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)|CVPR2023|32.68 |28.42| 33.75|-|-|-|\n|[Iwin](https://arxiv.org/pdf/2203.10537.pdf)|ECCV2022|32.79 |27.84| 35.40| 35.84| 28.74| 36.09|\n|[RLIP-ParSe (VG+COCO)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|32.84|26.85 |34.63|-|-|-|\n|[PR-Net](https://arxiv.org/pdf/2301.03510.pdf)|arXiv2023|32.86 |28.03| 34.30|-|-|-|\n|[MUREN](http://cvlab.postech.ac.kr/research/MUREN/)|CVPR2023|32.87 |28.67| 34.12| 35.52| 30.88| 36.91|\n|[SDT](https://arxiv.org/pdf/2207.01869.pdf)|arXiv2022|32.97| 28.49| 34.31| 36.32| 31.90| 37.64|\n|[HODN](https://arxiv.org/pdf/2308.10158.pdf)|TMM2023| 33.14 |28.54| 34.52| 35.86| 31.18| 37.26|\n|[SG2HOI](https://arxiv.org/pdf/2311.01755.pdf)|arxXiv2023|33.14 |29.27| 35.72| 35.73| 32.01| 36.43|\n|[PDN](https://www.sciencedirect.com/science/article/pii/S0031320323007185?via%3Dihub)|PR2023|33.18\t|27.95|\t34.75|\t35.86|\t30.57|\t37.43|\n|[DOQ](https://github.com/SherlockHolmes221/DOQ)|CVPR2022|33.28 |29.19| 34.50|-|-|-|\n|[IF](https://github.com/Foruck/Interactiveness-Field)|CVPR2022|33.51 |30.30 |34.46 |36.28 |33.16 |37.21|\n|[ICDT](https://github.com/bingnanG/ICDT)|ICANN2023|34.01 |27.60 |35.92 |36.29 |29.88 |38.21|\n|[PSN](https://arxiv.org/pdf/2307.10499.pdf)|arXiv2023|34.02 |29.44| 35.39|-|-|-|\n|[KI2HOI](https://arxiv.org/pdf/2403.07246.pdf)|arXiv2024|34.20 |32.26| 36.10| 37.85| 35.89| 38.78| \n|[VIL+](https://arxiv.org/pdf/2308.02606.pdf)|ACMMM2023|34.21| 30.58| 35.30| 37.67| 34.88| 
38.50|\n|[Multi-Step](https://dl.acm.org/doi/10.1145/3581783.3612581)|ACMMM2023|34.42 |30.03| 35.73| 37.71| 33.74| 38.89|\n|[OBPA-Net](https://github.com/zhuang1iu/OBPA-NET)|PRCV2023|34.63 |32.83| 35.16| 36.78| 35.38| 38.04|\n|[MLKD](https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP)|WACV2024|34.69 |31.12| 35.74|-|-|-|\n|[HOICLIP](https://github.com/Artanic30/HOICLIP)|CVPR2023|34.69 |31.12| 35.74| 37.61| 34.47| 38.54|\n|[PViC w/ detr](https://github.com/fredzzhang/pvic)|ICCV2023| 34.69 |32.14| 35.45| 38.14| 35.38| 38.97|\n|[GEN-VLKT+SCA](https://arxiv.org/pdf/2312.01713.pdf)|arXiv2023|34.79 |31.80| 35.68|-|-|-|\n|[HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|34.84\t|34.52|\t34.94|-|-|-|\n|[SBM](https://link.springer.com/chapter/10.1007/978-981-99-8429-9_5)|PRCV2023|34.92 |31.67| 35.85| 38.79| 35.43| 39.60|\n|[ (w/ CLIP)](https://github.com/YueLiao/)|CVPR2022|34.95 |31.18| 36.08| 38.22| 34.36| 39.37|\n|[SOV-STG (res101)](https://arxiv.org/pdf/2307.02291.pdf)|arXiv2023|35.01 |30.63 |36.32 |37.60 |32.77 |39.05|\n|[GeoHOI](https://github.com/zhumanli/GeoHOI)|arXiv2024| 35.05 |33.01| 35.71| 37.12| 34.79| 37.97|\n|[PartMap](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)|ECCV2022|35.15 |33.71| 35.58| 37.56| 35.87| 38.06|\n|[GFIN](https://www.sciencedirect.com/science/article/pii/S0893608023006251?via%3Dihub#fig1)|NN2023|35.28\t|31.91|\t36.29|\t38.80|\t35.48|\t39.79|\n|[CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|35.33 | 33.95| 35.74| 37.19| 35.27| 37.77|\n|[LOGICHOI](https://github.com/weijianan1/LogicHOI)|NeurIPS2023|35.47 |32.03| 36.22| 38.21| 35.29| 39.03|\n|[QAHOI-Swin-Large-ImageNet-22K](https://github.com/cjw2021/QAHOI)|arXiv2021|35.78 |29.80 |37.56 |37.59 |31.66 |39.36|\n|[DPADN](https://github.com/PRIS-CV/DPADN)|AAAI2024|35.91 |35.82| 35.94| 38.99| 39.61| 38.80|\n|[-L + CQL](https://arxiv.org/pdf/2303.14005.pdf)|CVPR2023|36.03| 33.16| 36.89| 38.82| 35.51| 
39.81|\n|[HOICLIP+DP-HOI](https://github.com/xingaoli/DP-HOI)|CVPR2024|36.56 |34.36| 37.22|-|-|-|\n|[AGER](https://github.com/six6607/AGER)|ICCV2023|36.75 |33.53| 37.71| 39.84| 35.58| 40.23|\n|[FGAHOI](https://github.com/xiaomabufei/FGAHOI)|arXiv2023|37.18 |30.71| 39.11| 38.93| 31.93| 41.02|\n|[ViPLO](https://arxiv.org/pdf/2304.08114.pdf)|CVPR2023|37.22 |35.45| 37.75| 40.61| 38.82| 41.15| \n|[RmLR](https://arxiv.org/pdf/2307.13529.pdf)|ICCV2023|37.41 |28.81| 39.97| 38.69| 31.27| 40.91|\n|[HCVC](https://arxiv.org/pdf/2311.16475.pdf)|arXiv2023|37.54 |37.01| 37.78| 39.98| 39.01| 40.32|\n|[ADA-CM](https://github.com/ltttpku/ADA-CM)|ICCV2023|38.40\t|37.52|\t38.66|-|-|-|\n|[UniVRD w/ extra data+VLM](https://arxiv.org/pdf/2303.08998.pdf)|arXiv2023|38.61| 33.39| 40.16|-|-|-|\n|[SCTC](https://arxiv.org/pdf/2401.05676.pdf)|AAAI2024|39.12 |36.09| 39.87|-|-|-|\n|[BCOM](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)|CVPR2024|39.34 |39.90| 39.17| 42.24| 42.86| 42.05|\n|[UniHOI](https://github.com/Caoyichao/UniHOI)|NeurIPS2023|40.95 |40.27 |41.32| 43.26| 43.12| 43.25|\n|[DiffHOI w/ syn data](https://arxiv.org/pdf/2305.12252.pdf)|arXiv2023|41.50| 39.96| 41.96| 43.62| 41.41| 44.28|\n|[DiffusionHOI](https://arxiv.org/pdf/2410.20155)|NeurIPS2024|42.54 |42.95 |42.35 |44.91 |45.18 |44.83|\n|[SOV-STG (swin-l)](https://arxiv.org/pdf/2307.02291.pdf)|arXiv2023|43.35| 42.25| 43.69|45.53|43.62| 46.11|\n|[PViC w/ h-detr (swin-l)](https://github.com/fredzzhang/pvic)|ICCV2023|44.32| 44.61| 44.24| 47.81| 48.38| 47.64|\n|[MP-HOI](https://mp-hoi.github.io/)|CVPR2024|44.53|44.48|44.55|-|-|-|\n|[SICHOI](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)|CVPR2024|45.04 |45.61 |44.88 |48.16 |48.37 |48.09|\n|[RLIPv2-ParSeDA w/ extra 
data](https://github.com/JacobYuan7/RLIPv2)|ICCV2023|45.09| 43.23|45.64|-|-|-|\n|[CycleHOI](https://arxiv.org/pdf/2407.11433)|arXiv2024|45.71| 46.14| 45.52| 49.23| 49.87| 48.96|\n|[Pose-Aware](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)|CVPR2024|46.01 |46.74 |45.80|**49.50** |**50.59** |**49.18**|\n|[PViC+](https://tau-vailab.github.io/Dynamic-Scene-Understanding/)|arXiv2025|**46.49**| **47.43**| **46.21**|-|-|-|\n\n#### 3) Ground Truth human-object pair boxes (only evaluating HOI recognition)\n|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)|\n|:---:|:---:|:---:|:---:|:---:|\n|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 33.38|  21.43 |36.95|\n|[Interactiveness](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| CVPR2019 |34.26|22.90 |37.65|\n|[Analogy](https://www.di.ens.fr/willow/research/analogy/paper.pdf)| ICCV2019 |34.35 | 27.57 |36.38|\n|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021|43.32 |33.84| 46.15|\n|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|43.98|40.27|45.09|\n|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021|44.27| 35.52| 46.89|\n|[FCL](https://github.com/zhihou7/FCL)|CVPR2021|45.25|36.27 |47.94|\n|[GTNet](https://github.com/UCSB-VRL/GTNet)|arXiv|46.45 |35.10 |49.84|\n|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021|51.53| 41.01| 54.67|\n|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|52.99 |34.91| 58.40|\n|[ConsNet](https://github.com/yeliudev/ConsNet)|ACMMM2020|53.04|38.79|57.3|\n|[ViPLO](https://arxiv.org/pdf/2304.08114.pdf)|CVPR2023|**62.09** |**59.26** |**62.93**|\n\n\n#### 4) [Interactiveness](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network/tree/TIN-PAMI) detection (interactive or not + pair box detection):\n|Method| Pub | HICO-DET | V-COCO 
|\n|:---:|:---:|:---:|:---:|\n|[TIN++](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network/tree/TIN-PAMI)|TPAMI2022| 14.35| 29.36|\n|[PPDM](https://github.com/YueLiao/PPDM)|CVPR2020|27.34 |-|\n|[QPIC](https://github.com/hitachi-rd-cv/qpic)| CVPR2021| 32.96 |38.33|\n|[CDN](https://github.com/YueLiao/CDN)| NeurIPS2021| 33.55 |40.13|\n|[PartMap](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)|ECCV2022|**38.74** |**43.61**|\n\n#### 5) Enhanced with HAKE:\n|Method| Pub|Full(def) | Rare(def) | Non-Rare(def)| Full(ko) | Rare(ko) | Non-Rare(ko) |\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 14.84	|10.45	|16.15|	16.26	|11.33|	17.73|\n|[iCAN + HAKE-HICO-DET](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action)| CVPR2020 | 19.61 (**+4.77**)	|17.29	|20.30|	22.10|	20.46|	22.59|\n|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 17.03 | 13.42| 18.11| 19.17| 15.51|20.26|\n|[Interactiveness + HAKE-HICO-DET](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action)| CVPR2020 | 22.12 (**+5.09**)|20.19|22.69|24.06|22.19|24.62|\n|[Interactiveness + HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action)| CVPR2020 | 22.66 (**+5.63**)|21.17|23.09|24.53|23.00|24.99|\n\n#### 6) Zero-Shot HOI detection:\n\n##### Unseen action-object combination scenario (UC)\n\n| Method | Pub | Detector |  Unseen(def)|Seen(def) |  Full(def) |\n|:---:|:---:|:---:|:---:|:---:|:---:|\n| [Shen et al.](http://vision.stanford.edu/pdf/shen2018wacv.pdf) | WACV2018 | COCO |  5.62 |  - | 6.26 |\n| [Functional](https://arxiv.org/pdf/1904.03181.pdf) | AAAI2020 | HICO-DET | 11.31 ± 1.03 | 12.74 ± 0.34 |  12.45 ± 0.16 | \n| [ConsNet](https://github.com/yeliudev/ConsNet) | ACMMM2020 | COCO |  16.99 ± 1.67 | 20.51 ± 0.62 | 19.81 ± 0.32 | \n| 
[CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=10242152) | TNNLS 2023| - | 18.06 | 23.34 | 20.72 |\n| [EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|-|23.01±1.54|30.39±0.40|28.91±0.27|\n| [HOICLIP](https://github.com/Artanic30/HOICLIP)|CVPR2023|-|25.53|34.85|32.99|\n| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) | arXiv2024|-|27.43|**35.76**|**34.56**|\n| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73)|NeurIPS2023|-|27.71|33.25|32.11|\n| [HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|-|**30.26**| 34.23|33.44|\n||\n| [VCL](https://github.com/zhihou7/VCL) (NF-UC)| ECCV2020 | HICO-DET | 16.22 | 18.52 |  18.06 | \n| [ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL) (NF-UC)|CVPR2021| HICO-DET | 18.25|18.78| 18.67|\n| [FCL](https://github.com/zhihou7/FCL) (NF-UC)| CVPR2021 | HICO-DET | 18.66 | 19.55 |  19.37 | \n| [RLIP-ParSe](https://github.com/JacobYuan7/RLIP) (NF-UC)|NeurIPS2022|COCO, VG|20.27| 27.67| 26.19|\n| [SCL](https://arxiv.org/pdf/2203.14272) | arXiv |  HICO-DET | 21.73 |  25.00 |  24.34 |\n| [OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf) (NF-UC)| CVPR2023 | HICO-DET |23.25 |28.04 |27.08|\n| [GEN-VLKT*](https://arxiv.org/pdf/2203.13954.pdf) (NF-UC)| CVPR2022 | HICO-DET | 25.05 | 23.38 | 23.71 |\n| [EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385) (NF-UC)|AAAI2023|HICO-DET|26.77|26.66|26.69|\n| [HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) (NF-UC)| CVPR2023|HICO-DET|26.39 |28.10| 27.75|\n| [LOGICHOI](https://github.com/weijianan1/LogicHOI) (NF-UC)|NeurIPS2023|-|26.84 |27.86 |27.95|\n| [Wu et al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) (NF-UC)|AAAI2024|-| 27.35 |22.09| 23.14|\n| [UniHOI](https://github.com/Caoyichao/UniHOI) (NF-UC)|NeurIPS2023|-|28.45 | 32.63 | 31.79 |\n| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) (NF-UC)| 
arXiv2024|-|28.89|28.31|27.77|\n| [DiffHOI w/ syn data](https://arxiv.org/pdf/2305.12252.pdf) (NF-UC)|arXiv2023| HICO-DET + syn data| 29.45 | 31.68 |31.24|\n| [HCVC](https://arxiv.org/pdf/2311.16475.pdf) (NF-UC)|arXiv2023|-|28.44 |31.35| 30.77|\n| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) (NF-UC)|NeurIPS2023|-|31.44|28.26|28.90|\n| [HOIGen](https://arxiv.org/pdf/2408.05974) (NF-UC)|ACMMM2024|-|**33.98**| **32.86**|**33.08** |\n||\n| [VCL](https://github.com/zhihou7/VCL) (RF-UC)| ECCV2020 | HICO-DET | 10.06 | 24.28 | 21.43 |\n| [ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL) (RF-UC)|CVPR2021| HICO-DET |9.18|24.67|21.57|\n| [FCL](https://github.com/zhihou7/FCL) (RF-UC)| CVPR2021 | HICO-DET | 13.16 | 24.23 | 22.01 |\n| [SCL](https://arxiv.org/pdf/2203.14272) (RF-UC) | arXiv | HICO-DET | 19.07 |  30.39 | 28.08 |\n| [RLIP-ParSe](https://github.com/JacobYuan7/RLIP) (RF-UC)|NeurIPS2022|COCO, VG|19.19 |33.35| 30.52|\n| [GEN-VLKT*](https://arxiv.org/pdf/2203.13954.pdf) (RF-UC)| CVPR2022 | HICO-DET | 21.36| 32.91 | 30.56 |\n| [OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf) (RF-UC)| CVPR2023 | HICO-DET |21.46 |33.86| 31.38|\n| [Wu et.al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) (RF-UC)|AAAI2024|-|23.32| 30.09| 28.53|\n| [HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) (RF-UC)| CVPR2023|HICO-DET| 25.53 |34.85| 32.99|\n| [LOGICHOI](https://github.com/weijianan1/LogicHOI) (RF-UC)|NeurIPS2023|-|25.97 |34.93| 33.17|\n| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) (RF-UC)| arXiv2024|-|26.33|35.79|34.10|\n| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|-|28.47|35.48|34.08|\n| [UniHOI](https://github.com/Caoyichao/UniHOI) (RF-UC)|NeurIPS2023|-|28.68 | 33.16 | 32.27 |\n| [DiffHOI w/ syn data](https://arxiv.org/pdf/2305.12252.pdf) (RF-UC)|arXiv2023| HICO-DET + syn data| 28.76 | 38.01 |36.16|\n| 
[HCVC](https://arxiv.org/pdf/2311.16475.pdf) (RF-UC)|arXiv2023|-|30.95 |37.16 |35.87|\n| [HOIGen](https://arxiv.org/pdf/2408.05974) (RF-UC)|ACMMM2024|-|31.01| 34.57|33.86 |\n| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2) (RF-UC)| ICCV2023| VG, COCO, O365 | **31.23** | **45.01** | **42.26**|\n\n- \\* indicates large vision-language model pretraining, e.g., CLIP.\n- For details of each setting, please refer to the corresponding publications. This list is unofficial and may be missing some publications.\n\n##### Zero-shot* HOI detection without fine-tuning (NF)\n| Method | Pub | Backbone | Dataset | Detector | Full | Rare | Non-Rare |\n| ---------- | :-----------:  | :-----------:  | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |\n| [RLIP-ParSeD](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 13.92 | 11.20 | 14.73 |\n| [RLIP-ParSe](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 15.40 | 15.08 | 15.50 |\n| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2)| ICCV2023 | Swin-L | VG+COCO+O365| DDETR | **23.29** | **27.97** | **21.90** |\n- \\* indicates a formulation that assesses the generalization of a pre-training model to unseen distributions, proposed in [RLIP](https://arxiv.org/pdf/2209.01814.pdf). 
*zero-shot* follows the terminology from CLIP.\n\n##### Unseen object scenario (UO)\n\n| Method | Pub | Detector | Full(def) | Seen(def) | Unseen(def)|\n|:---:|:---:|:---:|:---:|:---:|:---:|\n| [Functional](https://arxiv.org/pdf/1904.03181.pdf) | AAAI2020 | HICO-DET | 13.84 | 14.36 | 11.22 |\n| [FCL](https://github.com/zhihou7/FCL) | CVPR2021 | HICO-DET | 19.87 | 20.74 | 15.54 |\n| [ConsNet](https://github.com/yeliudev/ConsNet) | ACMMM2020 | COCO | 20.71 | 20.99 | 19.27 |\n| [Wu et.al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) |AAAI2024|-|27.73|27.87|27.05|\n| [ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021|- |15.11| 21.54|20.47|\n| [GEN-VLKT](https://arxiv.org/pdf/2203.13954.pdf)|CVPR2022| - | 10.51|28.92|25.63|\n| [LOGICHOI](https://github.com/weijianan1/LogicHOI) |NeurIPS2023|-|15.67| 30.42| 28.23|\n| [HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) | CVPR2023|-|16.20|30.99|28.53|\n| [KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) | arXiv2024|-|16.50|31.70|28.84|\n| [HCVC](https://arxiv.org/pdf/2311.16475.pdf)| arXiv2023|-|16.78 |33.31 |30.53|\n| [CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|-|31.79|32.73|32.58|\n| [HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|-| **36.35**|**32.90**|**33.48** |\n\n\n##### Unseen action scenario (UA)\n\n| Method | Pub | Detector | Full(def) | Seen(def) | Unseen(def)|\n|:---:|:---:|:---:|:---:|:---:|:---:|\n| [ConsNet](https://github.com/yeliudev/ConsNet) | ACMMM2020 | COCO | 19.04 | 20.02 | 14.12 |\n| [CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=10242152) | TNNLS 2023|- | 19.68 | 21.45 | 15.17|\n| [Wu et.al.](https://ojs.aaai.org/index.php/AAAI/article/view/28422) |AAAI2024|-|26.43|28.13|17.92|\n| [EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|-|**29.22**| **30.46**| **23.04**|\n\n##### Unseen action scenario (UV), results from EoID\n\n| Method | Pub | Detector |  Unseen(def)|Seen(def) |  Full(def) 
|\n|:---:|:---:|:---:|:---:|:---:|:---:|\n|[HOIGen](https://arxiv.org/pdf/2408.05974)|ACMMM2024|-| 20.27|34.31|32.34 |\n|[GEN-VLKT](https://arxiv.org/pdf/2203.13954.pdf)|CVPR2022| - |  20.96 |30.23 |28.74 |\n|[EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|-| 22.71 |30.73 |29.61|\n|[HOICLIP](https://arxiv.org/pdf/2303.15786.pdf) | CVPR2023|-|24.30|32.19|31.09|\n|[LOGICHOI](https://github.com/weijianan1/LogicHOI) |NeurIPS2023|-|24.57 |31.88 |30.77|\n|[HCVC](https://arxiv.org/pdf/2311.16475.pdf)| arXiv2023|-|24.69 |36.11| 34.51|\n|[KI2HOI](https://arxiv.org/pdf/2403.07246.pdf) | arXiv2024|-|25.20|32.95|31.85|\n|[CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|-|26.02|31.14|30.42|\n|[UniHOI](https://github.com/Caoyichao/UniHOI)|NeurIPS2023|-|**26.05**|**36.78**|**34.68**|\n\n##### Another setting\n| Method | Pub | Unseen| Seen | Full | \n|:---:|:---:|:---:|:---:|:---:|\n|[Shen et. al.](http://vision.stanford.edu/pdf/shen2018wacv.pdf)|WACV2018| 5.62| - |6.26|\n|[Functional](https://arxiv.org/pdf/1904.03181.pdf)|AAAI2020 |10.93 |12.60 |12.26|\n|[VCL](https://github.com/zhihou7/VCL)|ECCV2020 |10.06| 24.28| 21.43|\n|[ATL](https://github.com/zhihou7/HOI-CL)|CVPR2021 |9.18 |24.67 |21.57|\n|[FCL](https://github.com/zhihou7/FCL)| CVPR2021 |13.16 |24.23 |22.01|\n|[THID (w/ CLIP)](https://github.com/scwangdyd/promting_hoi)|CVPR2022 |15.53 |24.32 |22.96|\n|[EoID](https://ojs.aaai.org/index.php/AAAI/article/view/25385)|AAAI2023|**22.04**|31.39|29.52|\n|[GEN-VLKT](https://arxiv.org/pdf/2203.13954.pdf)|CVPR2022|21.36|**32.91**|**30.56**|\n\n#### 7) Few-Shot HOI detection:\n##### 1% HICO-Det Data used in fine-tuning\n| Method | Pub | Backbone | Dataset | Detector | Data |Full | Rare | Non-Rare |\n| ---------- | :-----------:  | :-----------:  | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |\n| [RLIP-ParSeD](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 1% 
| 18.30 | 16.22 | 18.92 |\n| [RLIP-ParSe](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 1% | 18.46 | 17.47 | 18.76 |\n| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2)| ICCV2023 | Swin-L | VG+COCO+O365| DDETR | 1% | **32.22** | **31.89** | **32.32** |\n\n##### 10% HICO-Det Data used in fine-tuning\n| Method | Pub | Backbone | Dataset | Detector | Data |Full | Rare | Non-Rare |\n| ---------- | :-----------:  | :-----------:  | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |\n| [RLIP-ParSeD](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 10% | 22.09 | 15.89 | 23.94 |\n| [RLIP-ParSe](https://arxiv.org/pdf/2209.01814.pdf) | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 10% | 22.59 | 20.16 | 23.32 |\n| [RLIPv2-ParSeDA](https://github.com/JacobYuan7/RLIPv2)| ICCV2023 | Swin-L | VG+COCO+O365| DDETR | 10% | **37.46** | **34.75** | **38.27** |\n\n#### 8) Weakly-supervised HOI detection:\n| Method | Pub | Backbone | Dataset | Detector |Full | Rare | Non-Rare |\n| ---------- | :-----------:  | :-----------:  | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |\n| [Explanation-HOI](https://github.com/baldassarreFe/ws-vrd)| ECCV2020 | ResNeXt101 | COCO | FRCNN | 10.63 |8.71 |11.20|\n| [MX-HOI](https://openaccess.thecvf.com/content/WACV2021/papers/Kumaraswamy_Detecting_Human-Object_Interaction_With_Mixed_Supervision_WACV_2021_paper.pdf)| WACV2021 | ResNet-101 | COCO | FRCNN | 16.14 |12.06 |17.50|\n| [PPR-FCN (from Weakly-HOI-CLIP)](https://arxiv.org/pdf/1708.01956.pdf)| ICCV2017 | ResNet-50, CLIP | COCO | FRCNN | 17.55 |15.69 | 18.41|\n| [Align-Former](https://www.bmvc2021-virtualconference.com/assets/papers/0054.pdf)| BMVC2021 | ResNet-101 | - | - | 20.85 |18.23 |21.64|\n| [Weakly-HOI-CLIP](https://arxiv.org/pdf/2303.01313.pdf) | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | 25.70 |**24.52**| 26.05|\n| 
[OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)|CVPR 2023|DETR|-|-|**25.82** |24.35 |**26.19**|\n\n### [Ambiguous-HOI](https://github.com/DirtyHarryLYL/DJ-RN)\n#### Detector: COCO pre-trained\n|Method| mAP |\n|:---:|:---:|\n|[iCAN](https://github.com/vt-vl-lab/iCAN)| 8.14 |\n|[Interactiveness](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| 8.22 |\n|[Analogy(reproduced)](https://github.com/jpeyre/analogy)| 9.72 |\n|[DJ-RN](https://github.com/DirtyHarryLYL/DJ-RN)| 10.37|\n|[OC-Immunity](https://github.com/Foruck/OC-Immunity)|**10.45**|\n\n### [SWiG-HOI](https://github.com/scwangdyd/large_vocabulary_hoi_detection)\n| Method | Pub| Non-Rare| Unseen| Seen | Full | \n|:---:|:---:|:---:|:---:|:---:|:---:|\n|[JSR](https://prior.allenai.org/projects/gsr)| ECCV2020| 10.01| 6.10| 2.34| 6.08|\n|[CHOID](https://github.com/scwangdyd/)|ICCV2021|10.93 |6.63 |2.64 |6.64|\n|[QPIC](https://github.com/hitachi-rd-cv/qpic)| CVPR2021| 16.95| 10.84| 6.21| 11.12|\n|[THID (w/ CLIP)](https://github.com/scwangdyd/promting_hoi)|CVPR2022 |**17.67**| **12.82**| **10.04**| **13.26**|\n\n\n### V-COCO: Scenario1\n\n#### 1) Detector: COCO pre-trained or one-stage detector\n|Method| Pub | AP(role) |\n|:---:|:---:|:---:|\n|[Gupta et al.](https://arxiv.org/pdf/1505.04474.pdf)|arXiv| 31.8|\n|[InteractNet](https://arxiv.org/pdf/1704.07333.pdf)|CVPR2018|40.0|\n|[Turbo](https://arxiv.org/pdf/1903.06355.pdf)|AAAI2019|42.0|\n|[GPNN](https://arxiv.org/pdf/1808.07962.pdf)|ECCV2018|44.0|\n|[UniVRD w/ extra data+VLM](https://arxiv.org/pdf/2303.08998.pdf)|arXiv2023|45.19|\n|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 45.3| \n|[Xu et. al](https://www-users.cs.umn.edu/~qzhao/publications/pdf/xu2019cvpr.pdf)| CVPR2019| 45.9|\n|[Wang et. 
al.](http://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Contextual_Attention_for_Human-Object_Interaction_Detection_ICCV_2019_paper.pdf)| ICCV2019|47.3|\n|[UniDet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600494.pdf)|ECCV2020|47.5|\n|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 47.8| \n|[Lin et. al](https://www.ijcai.org/Proceedings/2020/0154.pdf)|IJCAI2020|48.1|\n|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|48.3|\n|[Zhou et. al.](https://arxiv.org/pdf/2003.04262.pdf) |CVPR2020|48.9|\n|[In-GraphNet](https://arxiv.org/pdf/2007.06925.pdf)|IJCAI-PRICAI 2020|48.9|\n|[Interactiveness-optimized](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)| CVPR2019 | 49.0|\n|[TIN-PAMI](https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network)|TPAMI2021|49.1|\n|[IP-Net](https://arxiv.org/pdf/2003.14023.pdf)|CVPR2020|51.0|\n|[DRG](https://github.com/vt-vl-lab/DRG)|ECCV2020|51.0|\n|[RGBM](https://arxiv.org/pdf/2202.11998.pdf)|arXiv2022|51.7|\n|[VSGNet](https://arxiv.org/pdf/2003.05541.pdf)|CVPR2020|51.8|\n|[PMN](https://github.com/birlrobotics/PMN)|arXiv|51.8|\n|[PMFNet](https://arxiv.org/pdf/1909.08453.pdf)|ICCV2019|52.0|\n|[Liu et.al.](https://arxiv.org/pdf/2105.03089.pdf)|arXiv|52.28|\n|[FCL](https://github.com/zhihou7/FCL)|CVPR2021|52.35|\n|[PD-Net](https://github.com/MuchHair/PD-Net)|ECCV2020|52.6|\n|[Wang et.al.](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620239.pdf)|ECCV2020|52.7|\n|[PFNet](https://link.springer.com/content/pdf/10.1007/s41095-020-0188-2.pdf)|CVM|52.8|\n|[Zou et al.](https://github.com/bbepoch/HoiTransformer)|CVPR2021|52.9|\n|[SIGN](https://ieeexplore.ieee.org/ielx7/9099125/9102711/09102755.pdf)|ICME2020|53.1|\n|[ACP](https://github.com/Dong-JinKim/ActionCooccurrencePriors/)|ECCV2020|52.98 
(53.23)|\n|[FCMNet](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590239.pdf)|ECCV2020|53.1|\n|[HRNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=9552553)|TIP2021|53.1|\n|[SGCN4HOI](https://arxiv.org/pdf/2207.05733.pdf)|IEEESMC2022|53.1|\n|[ConsNet](https://arxiv.org/pdf/2008.06254.pdf)|ACMMM2020|53.2|\n|[IDN](https://github.com/DirtyHarryLYL/HAKE-Action-Torch/tree/IDN-(Integrating-Decomposing-Network))|NeurIPS2020|53.3|\n|[SG2HOI](https://arxiv.org/pdf/2108.08584.pdf)|ICCV2021|53.3|\n|[OSGNet](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9360596)|IEEE Access|53.43|\n|[SABRA-Res50](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 53.57|\n|[K-BAN](https://arxiv.org/pdf/2207.07979.pdf)|arXiv2022|53.70|\n|[IPGN](https://ieeexplore.ieee.org/document/9489275)|TIP2021|53.79|\n|[AS-Net](https://github.com/yoyomimi/AS-Net)|CVPR2021|53.9|\n|[RR-Net](https://arxiv.org/pdf/2104.15015.pdf)|arXiv|54.2|\n|[SCG](https://github.com/fredzzhang/spatially-conditioned-graphs)|ICCV2021|54.2|\n|[HOKEM](https://arxiv.org/pdf/2306.14260.pdf)|arXiv2023|54.6|\n|[SABRA-Res50FPN](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 54.69|\n|[GGNet](https://github.com/SherlockHolmes221/GGNet)|CVPR2021|54.7|\n|[MLCNet](https://dl.acm.org/doi/pdf/10.1145/3372278.3390671)| ICMR2020|55.2|\n|[HOTR](https://github.com/kakaobrain/HOTR)|CVPR2021|55.2|\n|[DIRV](https://arxiv.org/pdf/2010.01005.pdf)|AAAI2021|56.1|\n|[UnionDet](https://arxiv.org/pdf/2312.12664.pdf)|arXiv2023|56.2|\n|[SABRA-Res152](https://arxiv.org/pdf/2012.12510.pdf)| arXiv| 56.62|\n|[PhraseHOI](https://arxiv.org/pdf/2112.07383.pdf)|AAAI2022|57.4|\n|[GTNet](https://github.com/UCSB-VRL/GTNet)|arXiv|58.29|\n|[QPIC-Res101](https://github.com/hitachi-rd-cv/qpic)|CVPR2021|58.3|\n|[ADA-CM](https://github.com/ltttpku/ADA-CM)|ICCV2023|58.57|\n|[QPIC-Res50](https://github.com/hitachi-rd-cv/qpic)|CVPR2021| 58.8|\n|[ICDT](https://github.com/bingnanG/ICDT)|ICANN2023|59.4|\n|[CATN (w/ 
fastText)](https://arxiv.org/pdf/2204.04911.pdf)|CVPR2022|60.1|\n|[FGAHOI](https://github.com/xiaomabufei/FGAHOI)|arXiv2023|60.5|\n|[Iwin](https://arxiv.org/pdf/2203.10537.pdf)|ECCV2022|60.85|\n|[UPT-ResNet-101-DC5](https://github.com/fredzzhang/upt)|CVPR2022| 61.3|\n|[CDT](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=10242152) | TNNLS 2023|61.43|\n|[SBM](https://link.springer.com/chapter/10.1007/978-981-99-8429-9_5)|PRCV2023|61.5|\n|[SDT](https://arxiv.org/pdf/2207.01869.pdf)|arXiv2022|61.8|\n|[OpenCat](https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf)|CVPR2023|61.9|\n|[MSTR](https://arxiv.org/pdf/2203.14709.pdf)|CVPR2022|62.0|\n|[ViPLO](https://arxiv.org/pdf/2304.08114.pdf)|CVPR2023|62.2|\n|[Multi-Step](https://dl.acm.org/doi/10.1145/3581783.3612581)|ACMMM2023|62.4|\n|[PViC w/ detr](https://github.com/fredzzhang/pvic)|ICCV2023|62.8|\n|[PR-Net](https://arxiv.org/pdf/2301.03510.pdf)|arXiv2023|62.9|\n|[IF](https://github.com/Foruck/Interactiveness-Field)|CVPR2022|63.0|\n|[PartMap](https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness)|ECCV2022|63.0|\n|[QPIC-CPC](https://arxiv.org/pdf/2204.04836.pdf)|CVPR2022|63.1|\n|[DOQ](https://github.com/SherlockHolmes221/DOQ)|CVPR2022|63.5|\n|[HOICLIP](https://github.com/Artanic30/HOICLIP)|CVPR2023|63.5|\n|[GEN-VLKT (w/ CLIP)](https://github.com/YueLiao/gen-vlkt)|CVPR2022|63.58|\n|[SG2HOI](https://arxiv.org/pdf/2311.01755.pdf)|arXiv2023|63.6|\n|[QPIC+HQM](https://arxiv.org/pdf/2207.05293.pdf)|ECCV2022|63.6|\n|[SOV-STG](https://arxiv.org/pdf/2307.02291.pdf)|arXiv2023|63.9|\n|[KI2HOI](https://arxiv.org/pdf/2403.07246.pdf)|arXiv2024|63.9|\n|[CDN](https://github.com/YueLiao/CDN)|NeurIPS2021|63.91|\n|[PViC w/ h-detr 
(swin-l)](https://github.com/fredzzhang/pvic)|ICCV2023|64.1|\n|[OBPA-Net](https://github.com/zhuang1iu/OBPA-NET)|PRCV2023|64.1|\n|[RmLR](https://arxiv.org/pdf/2307.13529.pdf)|ICCV2023|64.17|\n|[RLIP-ParSe (COCO+VG)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|64.2|\n|[LOGICHOI](https://github.com/weijianan1/LogicHOI)|NeurIPS2023|64.4|\n|[MHOI](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9927451)|TCSVT2022|64.5|\n|[GEN-VLKT+SCA](https://arxiv.org/pdf/2312.01713.pdf)|arXiv2023|64.5|\n|[PDN](https://www.sciencedirect.com/science/article/pii/S0031320323007185?via%3Dihub)|PR2023|64.7|\n|[ParSe (COCO)](https://github.com/JacobYuan7/RLIP)|NeurIPS2022|64.8|\n|[SSRT](https://arxiv.org/pdf/2204.00746.pdf)|CVPR2022|65.0|\n|[SQAB](https://www.sciencedirect.com/science/article/pii/S0141938223002044?via%3Dihub#tbl1)|Displays2023|65.0|\n|[OCN](https://github.com/JacobYuan7/OCN-HOI-Benchmark)|AAAI2022|65.3|\n|[SQA](https://github.com/nmbzdwss/SQA)|ICASSP2023|65.4|\n|[AGER](https://github.com/six6607/AGER)|ICCV2023|65.68|\n|[DiffHOI](https://arxiv.org/pdf/2305.12252.pdf)|arXiv2023|65.7|\n|[BCOM](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)|CVPR2024|65.8|\n|[PSN](https://arxiv.org/pdf/2307.10499.pdf)|arXiv2023|65.9|\n|[DPADN](https://github.com/PRIS-CV/DPADN)|AAAI2024|62.62|\n|[Pose-Aware](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)|CVPR2024|63.0|\n|[CO-HOI](https://arxiv.org/pdf/2410.15657)|arXiv2024|65.44|\n|[STIP](https://github.com/zyong812/STIP)|CVPR2022|66.0|\n|[DT](https://arxiv.org/pdf/2204.09290.pdf)|CVPR2022|66.2|\n|[MP-HOI](https://mp-hoi.github.io/)|CVPR2024|66.2|\n|[CLIP4HOI](https://openreview.net/pdf?id=nqIIWnwe73) |NeurIPS2023|66.3|\n|[GENs+DP-HOI](https://github.com/xingaoli/DP-HOI)|CVPR2024|66.6|\n|[GEN-VLKT-L + 
CQL](https://arxiv.org/pdf/2303.14005.pdf)|CVPR2023|66.8|\n|[CycleHOI](https://arxiv.org/pdf/2407.11433)|arXiv2024|66.8|\n|[HODN](https://arxiv.org/pdf/2308.10158.pdf)|TMM2023| 67.0|\n|[DiffusionHOI](https://arxiv.org/pdf/2410.20155)|NeurIPS2024|67.1|\n|[VIL+DisTR](https://arxiv.org/pdf/2308.02606.pdf)|ACMMM2023|67.6|\n|[UniHOI](https://github.com/Caoyichao/UniHOI)|NeurIPS2023|68.05|\n|[SCTC](https://arxiv.org/pdf/2401.05676.pdf)|AAAI2024|68.2|\n|[HCVC](https://arxiv.org/pdf/2311.16475.pdf)|arXiv2023|68.4|\n|[MUREN](http://cvlab.postech.ac.kr/research/MUREN/)|CVPR2023|68.8|\n|[GeoHOI](https://github.com/zhumanli/GeoHOI)|arXiv2024| 69.4|\n|[GFIN](https://www.sciencedirect.com/science/article/pii/S0893608023006251?via%3Dihub#fig1)|NN2023|70.1|\n|[SICHOI](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)|CVPR2024|71.1|\n|[RLIPv2-ParSeDA w/ extra data](https://github.com/JacobYuan7/RLIPv2)|ICCV2023|**72.1**|\n\n#### 2) Enhanced with HAKE:\n|Method| Pub | AP(role) |\n|:---:|:---:|:---:|\n|[iCAN](https://arxiv.org/pdf/1808.10437.pdf)| BMVC2018 | 45.3| \n|[iCAN + HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action) (transfer learning)| CVPR2020 | 49.2 (**+3.9**)|\n|[Interactiveness](https://arxiv.org/pdf/1811.08264.pdf)| CVPR2019 | 47.8| \n|[Interactiveness + HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Instance-level-HAKE-Action) (transfer learning)| CVPR2020 | 51.0 (**+3.2**)|\n\n#### 3) Weakly-supervised HOI detection:\n| Method | Pub | Backbone | Dataset | Detector | AP(role)-S1 |AP(role)-S2 |\n| ---------- | :-----------:  | :-----------:  | :-----------: | :-----------: | :-----------: | :-----------: |\n| [Weakly-HOI-CLIP](https://arxiv.org/pdf/2303.01313.pdf) | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | **44.74**|**49.97**|\n\n### [HOI-COCO](https://github.com/zhihou7/HOI-CL): \nbased on V-COCO\n\n| 
Method | Pub | Full | Seen | Unseen|\n|:---:|:---:|:---:|:---:|:---:|\n|[VCL](https://github.com/zhihou7/VCL)|ECCV2020|23.53 |8.29| 35.36|\n|[ATL(w/ COCO)](https://github.com/zhihou7/HOI-CL)|CVPR2021|23.40 |8.01 |35.34|\n\n\n### HICO\n\n#### 1) Default\n|Method| mAP |\n|:---:|:---:|\n|[R\\*CNN](https://arxiv.org/pdf/1505.01197.pdf) | 28.5 |\n|[Girdhar et.al.](https://arxiv.org/pdf/1711.01467.pdf) |34.6|\n|[Mallya et.al.](https://arxiv.org/pdf/1604.04808.pdf) |36.1|\n|[RAM++ LLM](https://github.com/xinyu1205/recognize-anything) | 37.6|\n|[Pairwise](http://openaccess.thecvf.com/content_ECCV_2018/papers/Haoshu_Fang_Pairwise_Body-Part_Attention_ECCV_2018_paper.pdf) |39.9|\n|[RelViT](https://arxiv.org/pdf/2204.11167.pdf)|40.12|\n|[DEFR-base](https://arxiv.org/pdf/2107.13083.pdf)|44.1|\n|[OpenTAP](https://vkhoi.github.io/TAP)|51.7|\n|[DEFR-CLIP](https://arxiv.org/pdf/2107.13083.pdf)|60.5|\n|[HTS](https://ieeexplore.ieee.org/abstract/document/10222927)|60.5|\n|[DEFR/16 CLIP](https://arxiv.org/pdf/2112.06392.pdf)|**65.6**|\n\n#### 2) Enhanced with HAKE:\n|Method| mAP |\n|:---:|:---:|\n|[Mallya et.al.](https://arxiv.org/pdf/1604.04808.pdf) |36.1|\n|[Mallya et.al.+HAKE-HICO](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action) |45.0 (**+8.9**)|\n|[Pairwise](http://openaccess.thecvf.com/content_ECCV_2018/papers/Haoshu_Fang_Pairwise_Body-Part_Attention_ECCV_2018_paper.pdf) |39.9|\n|[Pairwise+HAKE-HICO](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action)|45.9 (**+6.0**)|\n|[Pairwise+HAKE-Large](https://github.com/DirtyHarryLYL/HAKE-Action/tree/Image-level-HAKE-Action)|46.3 (**+6.4**)|\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDirtyHarryLYL%2FHOI-Learning-List","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDirtyHarryLYL%2FHOI-Learning-List","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDirtyHarryLYL%2FHOI-Learning-List/lists"}