{"id":19320156,"url":"https://github.com/52cv/iccv-2023-papers","last_synced_at":"2026-01-25T08:31:25.102Z","repository":{"id":184392857,"uuid":"668508883","full_name":"52CV/ICCV-2023-Papers","owner":"52CV","description":null,"archived":false,"fork":false,"pushed_at":"2023-11-14T02:38:35.000Z","size":1676,"stargazers_count":248,"open_issues_count":0,"forks_count":12,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-02-24T05:14:38.369Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/52CV.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-20T01:42:22.000Z","updated_at":"2025-02-21T19:58:27.000Z","dependencies_parsed_at":"2023-11-14T03:28:35.145Z","dependency_job_id":"b47d7e6b-b68f-4c8e-b113-025ed2e8ba68","html_url":"https://github.com/52CV/ICCV-2023-Papers","commit_stats":null,"previous_names":["52cv/iccv-2023-papers"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/52CV/ICCV-2023-Papers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2023-Papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2023-Papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2023-Papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2023-Papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/52CV","download_url":"https://codeload.
github.com/52CV/ICCV-2023-Papers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2023-Papers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28749303,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-25T08:31:04.260Z","status":"ssl_error","status_checked_at":"2026-01-25T08:30:28.859Z","response_time":113,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T01:27:19.586Z","updated_at":"2026-01-25T08:31:25.079Z","avatar_url":"https://github.com/52CV.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# ICCV-2023-Papers\n![Alt text](af0c53186833a908a200f58867b6dcf.png)\n\n## Official website: https://iccv2023.thecvf.com/\n\n### Workshops :bell:: October 2 to 3, 2023\u003cbr\u003e\n### Main conference :bell:: October 4 to 6, 2023\n\n## Survey papers from past years, sorted by category: click here ↘️[CV-Surveys](https://github.com/52CV/CV-Surveys) (under construction)\n\n## 2024 papers sorted by category: click here\n↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)\n\n## 2023 papers sorted by category: click here\n↘️[CVPR-2023-Papers](https://github.com/52CV/CVPR-2023-Papers)\n↘️[WACV-2023-Papers](https://github.com/52CV/WACV-2023-Papers)\n↘️[ICCV-2023-Papers](https://github.com/52CV/ICCV-2023-Papers)\n\n## [2022 papers sorted by category: click here](#000)\n## [2021 papers sorted by category: click here](#00)\n## [2020 papers sorted by category: click here](#0)\n\n## 
Table of Contents\n\n|:cat:|:dog:|:tiger:|:wolf:|\n|------|------|------|------|\n|[1.Others(其它)](#1)|[2.3D(三维重建\\三维视觉)](#2)|[3.Object Detection(目标检测)](#3)|[4.Object Tracking(目标跟踪)](#4)|\n|[5.Biometric Recognition(生物特征识别)](#5)|[6.Face(人脸)](#6)|[7.Image Processing(低层图像处理、质量评价)](#7)|[8.Image Segmentation(图像分割)](#8)|\n|[9.Image Classification(图像分类)](#9)|[10.Image Synthesis(图像合成)](#10)|[11.Image/Video Editing(图像/视频编辑)](#11)|[12.Medical Image(医学影像)](#12)|\n|[13.Image Captions(图像字幕)](#13)|[14.Image/Video Compression(图像/视频压缩)](#14)|[15.Image/Video Retrieval(图像/视频检索)](#15)|[16.Super-Resolution(超分辨率)](#16)|\n|[17.GAN](#17)|[18.Pose](#18)|[19.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)](#19)|[20.Reid](#20)|\n|[21.Point Cloud(点云)](#21)|[22.OCR](#22)|[23.Optical Flow Estimation(光流估计)](#23)|[24.Few/Zero-Shot Learning/Domain Generalization/Adaptation(小/零样本/域泛化/域适应)](#24)|\n|[25.Model Compression/KD/Pruning(模型压缩/知识蒸馏/剪枝)](#25)|[26.ML(机器学习)](#26)|[27.Self/Semi-Supervised Learning](#27)|[28.Style Transfer(风格迁移)](#28)|\n|[29.Autonomous vehicles(自动驾驶)](#29)|[30.SLAM/AR/VR/Robotics(增强/虚拟现实/机器人)](#30)|[31.HOI(人物交互)](#31)|[32.Sign Language Recognition(手语)](#32)|\n|[33.Video](#33)|[34.Action Detection](#34)|[35.Human Motion Prediction(人体运动预测)](#35)|[36.Vision Question Answering(视觉问答)](#36)|\n|[37.Object Pose Estimation(物体姿势估计)](#37)|[38.Vision-Language(视觉语言)](#38)|[39.Keypoint Detection(关键点检测)](#39)|[40.Anomaly Detection(异常检测)](#40)|\n|[41.Vision Transformers](#41)|[42.Dataset/Benchmark](#42)|[43.Neural Radiance Fields](#43)|[44.Rendering(渲染)](#44)|\n|[45.Scene Graph Generation(场景图合成)](#45)|[46.Edge Detection](#46)|[47.Image-to-Image Translation](#47)|[48.Image Reconstruction](#48)|\n|[49.Image Fusion(图像融合)](#49)|[50.Image Matching(图像匹配)](#50)|[51.Visual Place Recognition](#51)|[52.View Synthesis(视图合成)](#52)|\n|[53.Computed Imaging(计算成像，如光学、几何、光场成像等)](#53)|[54.Gaze Estimation](#54)|[55.sound(语音)](#55)|[56.NAS](#56)|\n|[57.Semantic Scene Completion(语义场景补全)](#57)|[58.scene flow 
estimation(场景流估计)](#58)|[59.Copyright Protection(版权保护/信息安全)](#59)|[60.Visual Localization(视觉定位)](#60)|\n|[61.Natural Language Processing(NLP)](#61)|[62.Group Affect Recognition(群体情感识别)](#62)|[63.Industrial Defect Detectors](#63)|[64.Scene Understanding(场景理解)](#64)|\n|[65.Deepfake Detectors](#65)|[66.Graph Neural Networks(图神经网络)](#66)|[67.Open Set Recognition(开集识别)](#67)|[68.Clustering(聚类)](#68)|\n|[69.Affordance Learning(启示学习)](#69)|[70.Active Learning(主动学习)](#70)|[71.Data Augmentation(数据增强)](#71)|[72.Dense Prediction(密集预测)](#72)|\n|[73.Spiking Neural Networks](#73)|\n\n\n## 💥💥💥ICCV 2023 Award-Winning Papers\n### Best Paper Award (Marr Prize)\n* [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/pdf/2302.05543.pdf)\u003cbr\u003e:star:[code](https://github.com/lllyasviel/ControlNet)\n* [Passive Ultra-Wideband Single-Photon Imaging](https://appleswithacapitala.github.io/static/docs/paper.pdf)\u003cbr\u003e:star:[code](https://appleswithacapitala.github.io/)\n### Best Paper Honorable Mention\n* [Segment Anything](https://arxiv.org/abs/2304.02643)\u003cbr\u003e:house:[project](https://segment-anything.com/)\n### Best Student Paper Award\n* [Tracking Everything Everywhere All at Once](https://browse.arxiv.org/pdf/2306.05422.pdf)\u003cbr\u003e:house:[project](https://github.com/qianqianwang68/omnimotion)\n\n\u003cbr\u003e:thumbsup:[ICCV 2023 dataset roundup (underwater images and video, shadow removal, object detection/tracking/segmentation, interaction, super-resolution, and more)](https://mp.weixin.qq.com/s/XK943x4INOGHzD5Kvvo_Hw)\n\u003cbr\u003e:thumbsup:[ICCV 2023 dataset roundup (animal and human pose, autonomous driving, remote sensing, snow removal, faces, VOS, and more)](https://zhuanlan.zhihu.com/p/660344176)\n\n\u003ca name=\"78\"/\u003e\n\n## 78.Sketch\n* [CLIPascene: Scene Sketching with Different Types and Levels of Abstraction](http://arxiv.org/abs/2211.17256)\n\n\u003ca name=\"73\"/\u003e\n\n## 73.Spiking Neural Networks\n* [RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks](http://arxiv.org/abs/2308.06787v1)\n* [Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural 
Networks](https://openaccess.thecvf.com/content/ICCV2023/papers/Meng_Towards_Memory-_and_Time-Efficient_Backpropagation_for_Training_Spiking_Neural_Networks_ICCV_2023_paper.pdf)\n* [SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_SSF_Accelerating_Training_of_Spiking_Neural_Networks_with_Stabilized_Spiking_ICCV_2023_paper.pdf)\n* [Inherent Redundancy in Spiking Neural Networks](http://arxiv.org/abs/2308.08227v1)\u003cbr\u003e:star:[code](https://github.com/BICLab/ASA-SNN)\n* [Membrane Potential Batch Normalization for Spiking Neural Networks](http://arxiv.org/abs/2308.08359v1)\u003cbr\u003e:star:[code](https://github.com/yfguo91/MPBN)\n* [Unleashing the Potential of Spiking Neural Networks with Dynamic Confidence](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Unleashing_the_Potential_of_Spiking_Neural_Networks_with_Dynamic_Confidence_ICCV_2023_paper.pdf)\n* [Temporal-Coded Spiking Neural Networks with Dynamic Firing Threshold: Learning with Event-Driven Backpropagation](https://openaccess.thecvf.com/content/ICCV2023/papers/Wei_Temporal-Coded_Spiking_Neural_Networks_with_Dynamic_Firing_Threshold_Learning_with_ICCV_2023_paper.pdf)\n* [Efficient Converted Spiking Neural Network for 3D and 2D Classification](https://openaccess.thecvf.com/content/ICCV2023/papers/Lan_Efficient_Converted_Spiking_Neural_Network_for_3D_and_2D_Classification_ICCV_2023_paper.pdf)\n\n\u003ca name=\"72\"/\u003e\n\n## 72.Dense Prediction(密集预测)\n* [Multi-Task Learning with Knowledge Distillation for Dense Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_Multi-Task_Learning_with_Knowledge_Distillation_for_Dense_Prediction_ICCV_2023_paper.pdf)\n* [Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D 
Camera](https://openaccess.thecvf.com/content/ICCV2023/papers/Cai_Consistent_Depth_Prediction_for_Transparent_Object_Reconstruction_from_RGB-D_Camera_ICCV_2023_paper.pdf)\n* [EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Cai_EfficientViT_Lightweight_Multi-Scale_Attention_for_High-Resolution_Dense_Prediction_ICCV_2023_paper.pdf)\n\n\u003ca name=\"71\"/\u003e\n\n## 71.Data Augmentation(数据增强)\n* [HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness](http://arxiv.org/abs/2307.11823v1)\n* [MixBag: Bag-Level Data Augmentation for Learning from Label Proportions](http://arxiv.org/abs/2308.08822v1)\n* [When to Learn What: Model-Adaptive Data Augmentation Curriculum](http://arxiv.org/abs/2309.04747)\n* [Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning](http://arxiv.org/abs/2308.06038)\n\n\u003ca name=\"70\"/\u003e\n\n## 70.Active Learning(主动学习)\n* [TiDAL: Learning Training Dynamics for Active Learning](http://arxiv.org/abs/2210.06788)\n* [HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling](http://arxiv.org/abs/2301.10460)\n* [Knowledge-Aware Federated Active Learning with Non-IID Data](http://arxiv.org/abs/2211.13579)\n* [Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Skip-Plan_Procedure_Planning_in_Instructional_Videos_via_Condensed_Action_Space_ICCV_2023_paper.pdf)\n\n\u003ca name=\"69\"/\u003e\n\n## 69.Affordance Learning(启示学习)\n* [MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects](https://openaccess.thecvf.com/content/ICCV2023/papers/Liang_MAAL_Multimodality-Aware_Autoencoder-Based_Affordance_Learning_for_3D_Articulated_Objects_ICCV_2023_paper.pdf)\n\n\u003ca name=\"68\"/\u003e\n\n## 68.Clustering(聚类)\n* [Deep Multiview Clustering by Contrasting Cluster 
Assignments](http://arxiv.org/abs/2304.10769)\n* [Cross-modal Scalable Hyperbolic Hierarchical Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Long_Cross-modal_Scalable_Hierarchical_Clustering_in_Hyperbolic_space_ICCV_2023_paper.pdf)\n* [Cross-view Topology Based Consistent and Complementary Information for Deep Multi-view Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Dong_Cross-view_Topology_Based_Consistent_and_Complementary_Information_for_Deep_Multi-view_ICCV_2023_paper.pdf)\n* [MHCN: A Hyperbolic Neural Network Model for Multi-view Hierarchical Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_MHCN_A_Hyperbolic_Neural_Network_Model_for_Multi-view_Hierarchical_Clustering_ICCV_2023_paper.pdf)\n* [Stable Cluster Discrimination for Deep Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Qian_Stable_Cluster_Discrimination_for_Deep_Clustering_ICCV_2023_paper.pdf)\n* [Unsupervised Manifold Linearizing and Clustering](http://arxiv.org/abs/2301.01805)\n* [Anchor Structure Regularization Induced Multi-view Subspace Clustering via Enhanced Tensor Rank Minimization](https://openaccess.thecvf.com/content/ICCV2023/papers/Ji_Anchor_Structure_Regularization_Induced_Multi-view_Subspace_Clustering_via_Enhanced_Tensor_ICCV_2023_paper.pdf)\n* [Surface Normal Clustering for Implicit Representation of Manhattan Scenes](http://arxiv.org/abs/2212.01331)\n\n\u003ca name=\"67\"/\u003e\n\n## 67.Open Set Recognition(开集识别)\n* [FedPD: Federated Open Set Recognition with Parameter Disentanglement](https://openaccess.thecvf.com/content/ICCV2023/papers/Yang_FedPD_Federated_Open_Set_Recognition_with_Parameter_Disentanglement_ICCV_2023_paper.pdf)\n\n\u003ca name=\"66\"/\u003e\n\n## 66.Graph Neural Networks(图神经网络)\n* [VertexSerum: Poisoning Graph Neural Networks for Link Inference](http://arxiv.org/abs/2308.01469)\n* [Learning Adaptive Neighborhoods for Graph Neural Networks](http://arxiv.org/abs/2307.09065)\n* [Vision 
HGNN: An Image is More than a Graph of Nodes](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Vision_HGNN_An_Image_is_More_than_a_Graph_of_ICCV_2023_paper.pdf)\n\n\u003ca name=\"65\"/\u003e\n\n## 65.Deepfake Detectors\n* [Towards Understanding the Generalization of Deepfake Detectors from a Game-Theoretical View](https://openaccess.thecvf.com/content/ICCV2023/papers/Yao_Towards_Understanding_the_Generalization_of_Deepfake_Detectors_from_a_Game-Theoretical_ICCV_2023_paper.pdf)\n* [Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning](http://arxiv.org/abs/2309.05911)\n* [SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes](http://arxiv.org/abs/2211.11296)\n* [UCF: Uncovering Common Features for Generalizable Deepfake Detection](http://arxiv.org/abs/2304.13949)\n\n\u003ca name=\"64\"/\u003e\n\n## 64.Scene Understanding(场景理解)\n* [Shape Anchor Guided Holistic Indoor Scene Understanding](http://arxiv.org/abs/2309.11133)\n* [Efficient Computation Sharing for Multi-Task Visual Scene Understanding](http://arxiv.org/abs/2303.09663)\n* [Ordered Atomic Activity for Fine-grained Interactive Traffic Scenario Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Agarwal_Ordered_Atomic_Activity_for_Fine-grained_Interactive_Traffic_Scenario_Understanding_ICCV_2023_paper.pdf)\n* [Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting](http://arxiv.org/abs/2304.03763)\n* [Human-centric Scene Understanding for 3D Large-scale Scenarios](http://arxiv.org/abs/2307.14392)\n\n\u003ca name=\"63\"/\u003e\n\n## 63.Industrial Defect Detectors\n* Industrial defect localization\n  * [Removing Anomalies as Noises for Industrial Defect Localization](https://openaccess.thecvf.com/content/ICCV2023/papers/Lu_Removing_Anomalies_as_Noises_for_Industrial_Defect_Localization_ICCV_2023_paper.pdf)\n* Industrial anomaly detection\n  * [PNI : Industrial Anomaly Detection using Position and Neighborhood Information](http://arxiv.org/abs/2211.12634)\n  * 
[FastRecon: Few-shot Industrial Anomaly Detection via Fast Feature Reconstruction](https://openaccess.thecvf.com/content/ICCV2023/papers/Fang_FastRecon_Few-shot_Industrial_Anomaly_Detection_via_Fast_Feature_Reconstruction_ICCV_2023_paper.pdf)\n* Crack detection\n  * [The Devil is in the Crack Orientation: A New Perspective for Crack Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_The_Devil_is_in_the_Crack_Orientation_A_New_Perspective_ICCV_2023_paper.pdf)\n\n\u003ca name=\"62\"/\u003e\n\n## 62.Group Affect Recognition(群体情感识别)\n* [Most Important Person-Guided Dual-Branch Cross-Patch Attention for Group Affect Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Xie_Most_Important_Person-Guided_Dual-Branch_Cross-Patch_Attention_for_Group_Affect_Recognition_ICCV_2023_paper.pdf)\n\n\u003ca name=\"61\"/\u003e\n\n## 61.Natural Language Processing(NLP)\n* [Improved Visual Fine-tuning with Natural Language Supervision](http://arxiv.org/abs/2304.01489)\n* [Tracking by Natural Language Specification with Long Short-term Context Decoupling](https://openaccess.thecvf.com/content/ICCV2023/papers/Ma_Tracking_by_Natural_Language_Specification_with_Long_Short-term_Context_Decoupling_ICCV_2023_paper.pdf)\n\n\u003ca name=\"60\"/\u003e\n\n## 60.Visual Localization(视觉定位)\n* [EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization](http://arxiv.org/abs/2309.07471v1)\n* [Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance](http://arxiv.org/abs/2309.07403v1)\n* [Yes, we CANN: Constrained Approximate Nearest Neighbors for Local Feature-Based Visual Localization](http://arxiv.org/abs/2306.09012)\n* [OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes](https://openaccess.thecvf.com/content/ICCV2023/papers/Xie_OFVL-MS_Once_for_Visual_Localization_across_Multiple_Indoor_Scenes_ICCV_2023_paper.pdf)\n\n\u003ca name=\"59\"/\u003e\n\n## 59.Copyright Protection(版权保护/信息安全)\n* [Towards Robust Model 
Watermark via Reducing Parametric Vulnerability](http://arxiv.org/abs/2309.04777v1)\u003cbr\u003e:star:[code](https://github.com/GuanhaoGan/robust-model-watermarking)\n* [CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields](http://arxiv.org/abs/2307.11526v1)\n\n\u003ca name=\"58\"/\u003e\n\n## 58.scene flow estimation(场景流估计)\n* [EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity](http://arxiv.org/abs/2309.01296v1)\n* [Fast Neural Scene Flow](http://arxiv.org/abs/2304.09121)\n* [Multi-Scale Bidirectional Recurrent Network with Hybrid Correlation for Point Cloud Based Scene Flow Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_Multi-Scale_Bidirectional_Recurrent_Network_with_Hybrid_Correlation_for_Point_Cloud_ICCV_2023_paper.pdf)\n* [IHNet: Iterative Hierarchical Network Guided by High-Resolution Estimated Information for Scene Flow Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_IHNet_Iterative_Hierarchical_Network_Guided_by_High-Resolution_Estimated_Information_for_ICCV_2023_paper.pdf)\n\n\u003ca name=\"57\"/\u003e\n\n## 57.Semantic Scene Completion(语义场景补全)\n* [NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space](http://arxiv.org/abs/2309.14616v1)\u003cbr\u003e:star:[code](https://jiawei-yao0812.github.io/NDC-Scene/)\u003cbr\u003e:star:[code](https://github.com/Jiawei-Yao0812/NDCScene)\n* [DDIT: Semantic Scene Completion via Deformable Deep Implicit Templates](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DDIT_Semantic_Scene_Completion_via_Deformable_Deep_Implicit_Templates_ICCV_2023_paper.pdf)\n* [CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion](http://arxiv.org/abs/2307.07938)\n* [Learning Long-Range Information with Dual-Scale Transformers for Indoor Scene 
Completion](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Learning_Long-Range_Information_with_Dual-Scale_Transformers_for_Indoor_Scene_Completion_ICCV_2023_paper.pdf)\n\n\u003ca name=\"56\"/\u003e\n\n## 56.NAS\n* [ShiftNAS: Improving One-shot NAS via Probability Shift](http://arxiv.org/abs/2307.08300)\n* [ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation](http://arxiv.org/abs/2011.11233)\n* [Extensible and Efficient Proxy for Neural Architecture Search](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Extensible_and_Efficient_Proxy_for_Neural_Architecture_Search_ICCV_2023_paper.pdf)\n* [MixPath: A Unified Approach for One-shot Neural Architecture Search](http://arxiv.org/abs/2001.05887)\n* [Unleashing the Power of Gradient Signal-to-Noise Ratio for Zero-Shot NAS](https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Unleashing_the_Power_of_Gradient_Signal-to-Noise_Ratio_for_Zero-Shot_NAS_ICCV_2023_paper.pdf)\n\n\u003ca name=\"55\"/\u003e\n\n## 55.sound(语音)\n* [Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video](http://arxiv.org/abs/2309.04814)\n* [MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition](http://arxiv.org/abs/2303.05309)\n* [Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by Sparse Audio-Visual Samples](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Be_Everywhere_-_Hear_Everything_BEE_Audio_Scene_Reconstruction_by_ICCV_2023_paper.pdf)\n* [On the Audio-visual Synchronization for Lip-to-Speech Synthesis](http://arxiv.org/abs/2303.00502)\n* [Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing](https://openaccess.thecvf.com/content/ICCV2023/papers/Rachavarapu_Boosting_Positive_Segments_for_Weakly-Supervised_Audio-Visual_Video_Parsing_ICCV_2023_paper.pdf)\n* [DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker 
Embedding](http://arxiv.org/abs/2308.07787v1)\n* [Sound Source Localization is All about Cross-Modal Alignment](http://arxiv.org/abs/2309.10724v1)\n* [Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping](http://arxiv.org/abs/2308.06112)\n* Dereverberation\n  * [AdVerb: Visually Guided Audio Dereverberation](http://arxiv.org/abs/2308.12370v1)\u003cbr\u003e:house:[project](https://gamma.umd.edu/researchdirections/speech/adverb)\n* Lip reading\n  * [Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge](http://arxiv.org/abs/2308.09311v1)\n* Audio-to-video generation\n  * [The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion](http://arxiv.org/abs/2309.04509v1)\u003cbr\u003e:star:[code](https://ku-vai.github.io/TPoS/)\n* Audio-visual navigation\n  * [Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation](http://arxiv.org/abs/2308.10306)\n* Sound localization\n  * [Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation](http://arxiv.org/abs/2303.11329)\n* Audio-visual segmentation\n  * [Multimodal Variational Auto-encoder based Audio-Visual Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Mao_Multimodal_Variational_Auto-encoder_based_Audio-Visual_Segmentation_ICCV_2023_paper.pdf)\n\n\u003ca name=\"54\"/\u003e\n\n## 54.Gaze Estimation\n* [DVGaze: Dual-View Gaze Estimation](http://arxiv.org/abs/2308.10310v1)\u003cbr\u003e:star:[code](https://github.com/yihuacheng/DVGaze)\n\n\u003ca name=\"53\"/\u003e\n\n## 53.Computed Imaging(计算成像，如光学、几何、光场成像等)\n* [Event Camera Data Pre-training](http://arxiv.org/abs/2301.01928)\n* [Deep Geometrized Cartoon Line Inbetweening](https://openaccess.thecvf.com/content/ICCV2023/papers/Siyao_Deep_Geometrized_Cartoon_Line_Inbetweening_ICCV_2023_paper.pdf)\n* 
[Aperture Diffraction for Compact Snapshot Spectral Imaging](https://openaccess.thecvf.com/content/ICCV2023/papers/Lv_Aperture_Diffraction_for_Compact_Snapshot_Spectral_Imaging_ICCV_2023_paper.pdf)\n* [Examining Autoexposure for Challenging Scenes](http://arxiv.org/abs/2309.04542v1)\n* [Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction](http://arxiv.org/abs/2308.10694v1)\u003cbr\u003e:star:[code](https://github.com/cvg/VP-Estimation-with-Prior-Gravity)\n* [Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes](http://arxiv.org/abs/2309.08588v1)\u003cbr\u003e:house:[project](https://fabiendelattre.com/robust-rotation-estimation)\n* [Exploring Positional Characteristics of Dual-Pixel Data for Camera Autofocus](https://openaccess.thecvf.com/content/ICCV2023/papers/Choi_Exploring_Positional_Characteristics_of_Dual-Pixel_Data_for_Camera_Autofocus_ICCV_2023_paper.pdf)\n* [Enhancing Non-line-of-sight Imaging via Learnable Inverse Kernel and Attention Mechanisms](https://openaccess.thecvf.com/content/ICCV2023/papers/Yu_Enhancing_Non-line-of-sight_Imaging_via_Learnable_Inverse_Kernel_and_Attention_Mechanisms_ICCV_2023_paper.pdf)\n* [On the Robustness of Normalizing Flows for Inverse Problems in Imaging](http://arxiv.org/abs/2212.04319)\n* [Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation](http://arxiv.org/abs/2304.05669)\n* Light field\n  * [NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_NeILF_Inter-Reflectable_Light_Fields_for_Geometry_and_Material_Estimation_ICCV_2023_paper.pdf)\n* Camera pose estimation\n  * [Multi-body Depth and Camera Pose Estimation from Multiple Views](https://openaccess.thecvf.com/content/ICCV2023/papers/Dal_Cin_Multi-body_Depth_and_Camera_Pose_Estimation_from_Multiple_Views_ICCV_2023_paper.pdf)\n\n\u003ca name=\"52\"/\u003e\n\n## 52.View Synthesis(视图合成)\n* [Forward Flow for Novel View Synthesis of 
Dynamic Scenes](https://openaccess.thecvf.com/content/ICCV2023/papers/Guo_Forward_Flow_for_Novel_View_Synthesis_of_Dynamic_Scenes_ICCV_2023_paper.pdf)\n* [iVS-Net: Learning Human View Synthesis from Internet Videos](https://openaccess.thecvf.com/content/ICCV2023/papers/Dong_iVS-Net_Learning_Human_View_Synthesis_from_Internet_Videos_ICCV_2023_paper.pdf)\n* [Multi-task View Synthesis with Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Zheng_Multi-task_View_Synthesis_with_Neural_Radiance_Fields_ICCV_2023_paper.pdf)\n* [IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis](http://arxiv.org/abs/2210.00647)\n* [Generative Novel View Synthesis with 3D-Aware Diffusion Models](http://arxiv.org/abs/2304.02602)\n* [SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis](http://arxiv.org/abs/2303.16196)\n* [Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis](https://openaccess.thecvf.com/content/ICCV2023/papers/Song_Total-Recon_Deformable_Scene_Reconstruction_for_Embodied_View_Synthesis_ICCV_2023_paper.pdf)\n* [Neural LiDAR Fields for Novel View Synthesis](https://openaccess.thecvf.com/content/ICCV2023/papers/Huang_Neural_LiDAR_Fields_for_Novel_View_Synthesis_ICCV_2023_paper.pdf)\n* [LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference](http://arxiv.org/abs/2307.12217v1)\n* [Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis](http://arxiv.org/abs/2308.02840v1)\u003cbr\u003e:star:[code](https://w-ted.github.io/publications/udc-nerf)\n* [Efficient View Synthesis with Neural Radiance Distribution Field](http://arxiv.org/abs/2308.11130v1)\n* [NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes](http://arxiv.org/abs/2308.12967v1)\u003cbr\u003e:star:[code](https://zubair-irshad.github.io/projects/neo360.html)\n* [PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View 
Synthesis](https://openaccess.thecvf.com/content/ICCV2023/papers/Ying_PARF_Primitive-Aware_Radiance_Fusion_for_Indoor_Scene_Novel_View_Synthesis_ICCV_2023_paper.pdf)\n* [Urban Radiance Field Representation with Deformable Neural Mesh Primitives](http://arxiv.org/abs/2307.10776v1)\u003cbr\u003e:house:[project](https://dnmp.github.io/)\n* [FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis](http://arxiv.org/abs/2306.17723)\n* [SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image](http://arxiv.org/abs/2309.06323)\n* [A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction](http://arxiv.org/abs/2301.06782)\n* [Cross-Ray Neural Radiance Fields for Novel-View Synthesis from Unconstrained Image Collections](http://arxiv.org/abs/2307.08093)\n* [Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models](http://arxiv.org/abs/2304.10700)\n* [NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_NEMTO_Neural_Environment_Matting_for_Novel_View_and_Relighting_Synthesis_ICCV_2023_paper.pdf)\n\n\u003ca name=\"51\"/\u003e\n\n## 51.Visual Place Recognition\n* [CASSPR: Cross Attention Single Scan Place Recognition](http://arxiv.org/abs/2211.12542)\n* [EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition](http://arxiv.org/abs/2308.10832v1)\u003cbr\u003e:star:[code](https://github.com/gmberton/auto_VPR)\u003cbr\u003e:star:[code](https://github.com/gmberton/EigenPlaces)\n* [CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition](http://arxiv.org/abs/2303.17778)\n* [BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images](https://openaccess.thecvf.com/content/ICCV2023/papers/Luo_BEVPlace_Learning_LiDAR-based_Place_Recognition_using_Birds_Eye_View_Images_ICCV_2023_paper.pdf)\n\n\u003ca 
name=\"50\"/\u003e\n\n## 50.Image Matching(图像匹配)\n* [Occ2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions](https://openaccess.thecvf.com/content/ICCV2023/papers/Fan_Occ2Net_Robust_Image_Matching_Based_on_3D_Occupancy_Estimation_for_ICCV_2023_paper.pdf)\u003cbr\u003e:thumbsup:[ICCV 2023 | Occ2Net: an effective and robust image matching method for occluded regions based on 3D occupancy estimation](https://mp.weixin.qq.com/s/mbk5tnYlzCLOg4_xfyKRyA)\n* [Scene-Aware Feature Matching](http://arxiv.org/abs/2308.09949v1)\n* [Improving Transformer-based Image Matching by Cascaded Capturing Spatially Informative Keypoints](http://arxiv.org/abs/2303.02885)\n* [GlueStick: Robust Image Matching by Sticking Points and Lines Together](https://openaccess.thecvf.com/content/ICCV2023/papers/Pautrat_GlueStick_Robust_Image_Matching_by_Sticking_Points_and_Lines_Together_ICCV_2023_paper.pdf)\n* [Grounded Image Text Matching with Mismatched Relation Reasoning](http://arxiv.org/abs/2308.01236)\n* [Graph Matching with Bi-level Noisy Correspondence](http://arxiv.org/abs/2212.04085)\n* Feature matching\n  * [Guiding Local Feature Matching with Surface Curvature](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Guiding_Local_Feature_Matching_with_Surface_Curvature_ICCV_2023_paper.pdf)\n\n\u003ca name=\"49\"/\u003e\n\n## 49.Image Fusion(图像融合)\n* [DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion](http://arxiv.org/abs/2303.06840)\n* [MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion](http://arxiv.org/abs/2309.11847)\n* [Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion](http://arxiv.org/abs/2308.16083v1)\u003cbr\u003e:star:[code](https://manman1995.github.io/)\n* [Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion](https://openaccess.thecvf.com/content/ICCV2023/papers/He_Degradation-Resistant_Unfolding_Network_for_Heterogeneous_Image_Fusion_ICCV_2023_paper.pdf)\n* [UniFusion: Unified Multi-View Fusion Transformer for 
Spatial-Temporal Representation in Bird's-Eye-View](https://openaccess.thecvf.com/content/ICCV2023/papers/Qin_UniFusion_Unified_Multi-View_Fusion_Transformer_for_Spatial-Temporal_Representation_in_Birds-Eye-View_ICCV_2023_paper.pdf)\n* [Multi-Modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion](http://arxiv.org/abs/2302.01392)\n\n\u003ca name=\"48\"/\u003e\n\n## 48.Image Reconstruction\n* [Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction](http://arxiv.org/abs/2308.10820v1)\u003cbr\u003e:star:[code](https://github.com/MyuLi/PADUT)\n* [RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image](http://arxiv.org/abs/2309.02020v1)\n* [Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction](http://arxiv.org/abs/2309.03900v1)\u003cbr\u003e:star:[code](https://skchen1993.github.io/CEVR_web/)\n\n\u003ca name=\"47\"/\u003e\n\n## 47.Image-to-Image Translation\n* [General Image-to-Image Translation with One-Shot Image Guidance](http://arxiv.org/abs/2307.14352v1)\u003cbr\u003e:star:[code](https://github.com/CrystalNeuro/visual-concept-translator)\n* [Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation](http://arxiv.org/abs/2308.12968v1)\u003cbr\u003e:star:[code](https://github.com/Yuxinn-J/Scenimefy)\u003cbr\u003e:star:[code](https://yuxinn-j.github.io/projects/Scenimefy.html)\n* [UGC: Unified GAN Compression for Efficient Image-to-Image Translation](http://arxiv.org/abs/2309.09310)\n\n\u003ca name=\"46\"/\u003e\n\n## 46.Edge Detection\n* [Tiny and Efficient Model for the Edge Detection Generalization](http://arxiv.org/abs/2308.06468v1)\u003cbr\u003e:star:[code](https://github.com/xavysp/TEED)\n\n\u003ca name=\"45\"/\u003e\n\n## 45.Scene Graph Generation(场景图合成)\n* [SGAligner: 3D Scene Alignment with 
Scene Graphs](http://arxiv.org/abs/2304.14880)\n* [Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation](http://arxiv.org/abs/2308.03282v1)\n* [Compositional Feature Augmentation for Unbiased Scene Graph Generation](http://arxiv.org/abs/2308.06712v1)\n* [Vision Relation Transformer for Unbiased Scene Graph Generation](http://arxiv.org/abs/2308.09472v1)\n* [TextPSG: Panoptic Scene Graph Generation from Textual Descriptions](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_TextPSG_Panoptic_Scene_Graph_Generation_from_Textual_Descriptions_ICCV_2023_paper.pdf)\n* [HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation](http://arxiv.org/abs/2303.15994)\n* [Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World](http://arxiv.org/abs/2303.13233)\n* [Visual Traffic Knowledge Graph Generation from Scene Images](https://openaccess.thecvf.com/content/ICCV2023/papers/Guo_Visual_Traffic_Knowledge_Graph_Generation_from_Scene_Images_ICCV_2023_paper.pdf)\n\n\u003ca name=\"44\"/\u003e\n\n## 44.Rendering\n* [LiveHand: Real-time and Photorealistic Neural Hand Rendering](http://arxiv.org/abs/2302.07672)\n* [NeMF: Inverse Volume Rendering with Neural Microflake Field](http://arxiv.org/abs/2304.00782)\n* [ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs](http://arxiv.org/abs/2304.14401)\n* [HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation](http://arxiv.org/abs/2308.10122v1)\n* [DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering](https://arxiv.org/abs/2307.10173)\u003cbr\u003e:house:[project](https://dna-rendering.github.io/)\n* [S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields](http://arxiv.org/abs/2308.07032v1)\u003cbr\u003e:star:[code](https://github.com/Madaoer/S3IM)\u003cbr\u003e:thumbsup:[ICCV 2023 | A Magic Loss for Boosting NeRF: S3IM, 
Stochastic Structural Similarity](https://mp.weixin.qq.com/s/w5IUykx6_-7lBE_2r_Onkg)\n* [TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering](http://arxiv.org/abs/2307.12291v1)\u003cbr\u003e:star:[code](https://pansanity666.github.io/TransHuman/)\n* [Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields](http://arxiv.org/abs/2307.11335v1)\u003cbr\u003e:star:[code](https://wbhu.github.io/projects/Tri-MipRF)\n* [Rendering Humans from Object-Occluded Monocular Videos](http://arxiv.org/abs/2308.04622v1)\u003cbr\u003e:house:[project](https://cs.stanford.edu/~xtiange/projects/occnerf/)\n* [ScatterNeRF: Seeing Through Fog with Physically-Based Inverse Neural Rendering](http://arxiv.org/abs/2305.02103)\n* [DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_DNA-Rendering_A_Diverse_Neural_Actor_Repository_for_High-Fidelity_Human-Centric_Rendering_ICCV_2023_paper.pdf)\n* [Neural Microfacet Fields for Inverse Rendering](http://arxiv.org/abs/2303.17806)\n* [3D-aware Blending with Generative NeRFs](http://arxiv.org/abs/2302.06608)\n\n\u003ca name=\"43\"/\u003e\n\n## 43.Neural Radiance Fields\n* [Instance Neural Radiance Field](http://arxiv.org/abs/2304.04395)\n* [Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Gao_Adaptive_Positional_Encoding_for_Bundle-Adjusting_Neural_Radiance_Fields_ICCV_2023_paper.pdf)\n* [FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models](http://arxiv.org/abs/2303.12786)\n* [NerfAcc: Efficient Sampling Accelerates NeRFs](http://arxiv.org/abs/2305.04966)\n* [MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Kaneko_MIMO-NeRF_Fast_Neural_Rendering_with_Multi-input_Multi-output_Neural_Radiance_Fields_ICCV_2023_paper.pdf)\n* 
[UHDNeRF: Ultra-High-Definition Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_UHDNeRF_Ultra-High-Definition_Neural_Radiance_Fields_ICCV_2023_paper.pdf)\n* [Deformable Neural Radiance Fields using RGB and Event Cameras](http://arxiv.org/abs/2309.08416)\n* [Learning Neural Implicit Surfaces with Object-Aware Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Learning_Neural_Implicit_Surfaces_with_Object-Aware_Radiance_Fields_ICCV_2023_paper.pdf)\n* [ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field](http://arxiv.org/abs/2211.13226)\n* [HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video](http://arxiv.org/abs/2304.12281)\n* [ReNeRF: Relightable Neural Radiance Fields with Nearfield Lighting](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_ReNeRF_Relightable_Neural_Radiance_Fields_with_Nearfield_Lighting_ICCV_2023_paper.pdf)\n* [E2NeRF: Event Enhanced Neural Radiance Fields from Blurry Images](https://openaccess.thecvf.com/content/ICCV2023/papers/Qi_E2NeRF_Event_Enhanced_Neural_Radiance_Fields_from_Blurry_Images_ICCV_2023_paper.pdf)\n* [Neural Fields for Structured Lighting](https://openaccess.thecvf.com/content/ICCV2023/papers/Shandilya_Neural_Fields_for_Structured_Lighting_ICCV_2023_paper.pdf)\n* [NeRF-MS: Neural Radiance Fields with Multi-Sequence](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_NeRF-MS_Neural_Radiance_Fields_with_Multi-Sequence_ICCV_2023_paper.pdf)\n* [StegaNeRF: Embedding Invisible Information within Neural Radiance Fields](http://arxiv.org/abs/2212.01602)\n* [SHERF: Generalizable Human NeRF from a Single Image](http://arxiv.org/abs/2303.12791)\n* [DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_DeformToon3D_Deformable_Neural_Radiance_Fields_for_3D_Toonification_ICCV_2023_paper.pdf)\n* [Nerfbusters: Removing Ghostly Artifacts from Casually Captured 
NeRFs](http://arxiv.org/abs/2304.10532)\n* [Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra](https://openaccess.thecvf.com/content/ICCV2023/papers/Kulhanek_Tetra-NeRF_Representing_Neural_Radiance_Fields_Using_Tetrahedra_ICCV_2023_paper.pdf)\n* [Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Barron_Zip-NeRF_Anti-Aliased_Grid-Based_Neural_Radiance_Fields_ICCV_2023_paper.pdf)\n* [NeRFrac: Neural Radiance Fields through Refractive Surface](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhan_NeRFrac_Neural_Radiance_Fields_through_Refractive_Surface_ICCV_2023_paper.pdf)\n* [MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos](http://arxiv.org/abs/2212.13056)\n* [Reference-guided Controllable Inpainting of Neural Radiance Fields](http://arxiv.org/abs/2304.09677)\n* [DeLiRa: Self-Supervised Depth, Light, and Radiance Fields](http://arxiv.org/abs/2304.02797)\n* [Neural Radiance Field with LiDAR maps](https://openaccess.thecvf.com/content/ICCV2023/papers/Chang_Neural_Radiance_Field_with_LiDAR_maps_ICCV_2023_paper.pdf)\n* [Dynamic Mesh-Aware Radiance Fields](http://arxiv.org/abs/2309.04581v1)\n* [Locally Stylized Neural Radiance Fields](http://arxiv.org/abs/2309.10684v1)\n* [Generalizable Neural Fields as Partially Observed Neural Processes](http://arxiv.org/abs/2309.06660v1)\n* [DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields](http://arxiv.org/abs/2309.04410v1)\u003cbr\u003e:house:[project](https://www.mmlab-ntu.com/project/deformtoon3d/)\u003cbr\u003e:star:[code](https://github.com/junzhezhang/DeformToon3D)\n* [Robust e-NeRF: NeRF from Sparse \u0026 Noisy Events under Non-Uniform Motion](http://arxiv.org/abs/2309.08596v1)\u003cbr\u003e:star:[code](https://wengflow.github.io/robust-e-nerf)\n* [Pose-Free Neural Radiance Fields via Implicit Pose Regularization](http://arxiv.org/abs/2308.15049v1)\n* [Canonical Factors for Hybrid Neural 
Fields](http://arxiv.org/abs/2308.15461v1)\u003cbr\u003e:star:[code](https://brentyi.github.io/tilted/)\n* [Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor](http://arxiv.org/abs/2308.14383v1)\u003cbr\u003e:star:[code](https://zju3dv.github.io/tof_slam/)\n* [Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields](http://arxiv.org/abs/2308.11974v1)\n* [Strata-NeRF: Neural Radiance Fields for Stratified Scenes](http://arxiv.org/abs/2308.10337v1)\u003cbr\u003e:star:[code](https://ankitatiisc.github.io/Strata-NeRF/)\n* [DReg-NeRF: Deep Registration for Neural Radiance Fields](http://arxiv.org/abs/2308.09386v1)\u003cbr\u003e:star:[code](https://github.com/AIBluefisher/DReg-NeRF)\n* [Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields](http://arxiv.org/abs/2307.15131v1)\u003cbr\u003e:house:[project](https://windingwind.github.io/seal-3d/)\u003cbr\u003e:star:[code](https://github.com/windingwind/seal-3d/)\n* [WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields](http://arxiv.org/abs/2308.04826v1)\n* [UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields](http://arxiv.org/abs/2303.14167)\n* [LERF: Language Embedded Radiance Fields](http://arxiv.org/abs/2303.09553)\n* [Strivec: Sparse Tri-Vector Radiance Fields](http://arxiv.org/abs/2307.13226)\n* [Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering](http://arxiv.org/abs/2304.10075)\n\n\u003ca name=\"42\"/\u003e\n\n## 42.Dataset/Benchmark\n* Datasets\n  * [Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds](https://arxiv.org/pdf/2307.11914.pdf)\u003cbr\u003e:sunflower:[dataset](https://building3d.ucalgary.ca/#)\u003cbr\u003e:thumbsup:[ICCV 2023 | Building3D, the first urban-scale building modeling dataset based on aerial point clouds](https://mp.weixin.qq.com/s/gKFByZ8ud2aNlG7C7t2-2Q)\n  * [LoTE-Animal: A Long Time-span Dataset for Endangered Animal Behavior 
Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_LoTE-Animal_A_Long_Time-span_Dataset_for_Endangered_Animal_Behavior_Understanding_ICCV_2023_paper.pdf)\n  * [Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Bolduc_Beyond_the_Pixel_a_Photometrically_Calibrated_HDR_Dataset_for_Luminance_ICCV_2023_paper.pdf)\n  * [Atmospheric Transmission and Thermal Inertia Induced Blind Road Segmentation with a Large-Scale Dataset TBRSD](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Atmospheric_Transmission_and_Thermal_Inertia_Induced_Blind_Road_Segmentation_with_ICCV_2023_paper.pdf)\n  * [H3WB: Human3.6M 3D WholeBody Dataset and Benchmark](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_H3WB_Human3.6M_3D_WholeBody_Dataset_and_Benchmark_ICCV_2023_paper.pdf)\n  * [V3Det: Vast Vocabulary Visual Detection Dataset](http://arxiv.org/abs/2304.03752)\n  * [HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_HoloAssist_an_Egocentric_Human_Interaction_Dataset_for_Interactive_AI_Assistants_ICCV_2023_paper.pdf)\n  * [Zenseact Open Dataset: A Large-Scale and Diverse Multimodal Dataset for Autonomous Driving](http://arxiv.org/abs/2305.02008)\n  * [FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods](http://arxiv.org/abs/2308.06248)\n  * [Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos](https://openaccess.thecvf.com/content/ICCV2023/papers/Lee_Lecture_Presentations_Multimodal_Dataset_Towards_Understanding_Multimodality_in_Educational_Videos_ICCV_2023_paper.pdf)\n  * [RealGraph: A Multiview Dataset for 4D Real-world Context Graph 
Generation](https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_RealGraph_A_Multiview_Dataset_for_4D_Real-world_Context_Graph_Generation_ICCV_2023_paper.pdf)\n  * [Video Background Music Generation: Dataset, Method and Evaluation](http://arxiv.org/abs/2211.11248)\n  * [Thinking Image Color Aesthetics Assessment: Models, Datasets and Benchmarks](https://openaccess.thecvf.com/content/ICCV2023/papers/He_Thinking_Image_Color_Aesthetics_Assessment_Models_Datasets_and_Benchmarks_ICCV_2023_paper.pdf)\n  * [Snow Removal in Video: A New Dataset and A Novel Method](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Snow_Removal_in_Video_A_New_Dataset_and_A_Novel_ICCV_2023_paper.pdf)\n  * [SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes](http://arxiv.org/abs/2304.05170)\n  * [EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes](http://arxiv.org/abs/2307.07961)\n  * [DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners](http://arxiv.org/abs/2309.03483)\n  * [PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking](http://arxiv.org/abs/2307.15055)\n  * [SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling](http://arxiv.org/abs/2303.17368)\n  * [MOSE: A New Dataset for Video Object Segmentation in Complex Scenes](http://arxiv.org/abs/2302.01872)\n  * [Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning](http://arxiv.org/abs/2303.12745)\n  * [3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_3DMiner_Discovering_Shapes_from_Large-Scale_Unannotated_Image_Datasets_ICCV_2023_paper.pdf)\n  * [MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and 
Beyond](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_MatrixCity_A_Large-scale_City_Dataset_for_City-scale_Neural_Rendering_and_ICCV_2023_paper.pdf)\n  * [LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark](http://arxiv.org/abs/2308.09618v1)\u003cbr\u003e:star:[code](https://lojzezust.github.io/lars-dataset)\n  * [EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding](http://arxiv.org/abs/2309.08816v1)\u003cbr\u003e:star:[code](https://github.com/facebookresearch/EgoObjects)\n  * [Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations](http://arxiv.org/abs/2309.01858v1)\u003cbr\u003e:house:[project](https://cmp.felk.cvut.cz/univ_emb/)\n  * [High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net](http://arxiv.org/abs/2308.14221)\n  * [ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes](http://arxiv.org/abs/2308.11417v1)\u003cbr\u003e:house:[project](https://youtu.be/E6P9e2r6M8I)\u003cbr\u003e:star:[code](https://cy94.github.io/scannetpp/)\n  * [Learning Optical Flow from Event Camera with Rendered Dataset](https://arxiv.org/abs/2303.11011)\n  * [Efficient neural supersampling on a novel gaming dataset](http://arxiv.org/abs/2308.01483v1)\n  * [AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception](http://arxiv.org/abs/2307.13933v1)\u003cbr\u003e:star:[code](https://github.com/ydk122024/AIDE)\n  * [360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking](http://arxiv.org/abs/2307.14630v1)\u003cbr\u003e:house:[project](https://360vot.hkustvgd.com)\u003cbr\u003e:star:[code](https://github.com/HuajianUP/360VOT) (omnidirectional visual object tracking)\n  * [Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised 
Learning](http://arxiv.org/abs/2308.13411v1)\u003cbr\u003e:house:[project](https://ophai.hms.harvard.edu/datasets/harvard-gdp1000)\n  * [FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Khan_FishNet_A_Large-scale_Dataset_and_Benchmark_for_Fish_Recognition_Detection_ICCV_2023_paper.pdf)\n* Benchmarks\n  * [Towards Real-world Burst Image Super-Resolution: Benchmark and Method](https://arxiv.org/abs/2309.04803)\u003cbr\u003e:star:[code](https://github.com/yjsunnn/FBANet)\u003cbr\u003e:thumbsup:[ICCV 2023 | FBANet: Towards Real-World Multi-Frame Super-Resolution](https://mp.weixin.qq.com/s/JN-5d_Ujak3YDcGM6ZZjPw)\n  * [SQAD: Automatic Smartphone Camera Quality Assessment and Benchmarking](https://openaccess.thecvf.com/content/ICCV2023/papers/Fang_SQAD_Automatic_Smartphone_Camera_Quality_Assessment_and_Benchmarking_ICCV_2023_paper.pdf)\n  * [ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes](http://arxiv.org/abs/2304.04321)\n  * [Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception](http://arxiv.org/abs/2306.06362)\n  * [From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal](http://arxiv.org/abs/2308.03867v1)\n  * [ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour](https://openaccess.thecvf.com/content/ICCV2023/papers/Tafasca_ChildPlay_A_New_Benchmark_for_Understanding_Childrens_Gaze_Behaviour_ICCV_2023_paper.pdf)\n  * [PlanarTrack: A Large-scale Challenging Benchmark for Planar Object Tracking](http://arxiv.org/abs/2303.07625)\n  * [OmniLabel: A Challenging Benchmark for Language-Based Object Detection](http://arxiv.org/abs/2304.11463)\n  * [OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception](http://arxiv.org/abs/2303.03991)\n  * [HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image 
Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Bakr_HRS-Bench_Holistic_Reliable_and_Scalable_Benchmark_for_Text-to-Image_Models_ICCV_2023_paper.pdf)\n  * [Beyond Object Recognition: A New Benchmark towards Object Concept Learning](http://arxiv.org/abs/2212.02710)\n  * [ClothPose: A Real-world Benchmark for Visual Analysis of Garment Pose via An Indirect Recording Solution](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_ClothPose_A_Real-world_Benchmark_for_Visual_Analysis_of_Garment_Pose_ICCV_2023_paper.pdf)\n  * [REAP: A Large-Scale Realistic Adversarial Patch Benchmark](http://arxiv.org/abs/2212.05680)\n  * [Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events](https://openaccess.thecvf.com/content/ICCV2023/papers/Ong_Chaotic_World_A_Large_and_Challenging_Benchmark_for_Human_Behavior_ICCV_2023_paper.pdf)\n  * [Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images](http://arxiv.org/abs/2303.07274)\n  * [FACET: Fairness in Computer Vision Evaluation Benchmark](http://arxiv.org/abs/2309.00035)\n  * [A Benchmark for Chinese-English Scene Text Image Super-Resolution](http://arxiv.org/abs/2308.03262)\n  * [Ego-Humans: An Ego-Centric 3D Multi-Human Benchmark](https://openaccess.thecvf.com/content/ICCV2023/papers/Khirodkar_Ego-Humans_An_Ego-Centric_3D_Multi-Human_Benchmark_ICCV_2023_paper.pdf)\n  * [Towards Real-World Burst Image Super-Resolution: Benchmark and Method](http://arxiv.org/abs/2309.04803v1)\u003cbr\u003e:star:[code](https://github.com/yjsunnn/FBANet)\n  * [COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts](http://arxiv.org/abs/2307.12730v1)\u003cbr\u003e:star:[code](https://github.com/alibaba/easyrobust/tree/main/benchmarks/coco_o)\n  * [Dancing in the Dark: A Benchmark towards General Low-light Video 
Enhancement](https://openaccess.thecvf.com/content/ICCV2023/papers/Fu_Dancing_in_the_Dark_A_Benchmark_towards_General_Low-light_Video_ICCV_2023_paper.pdf)\n  * [DiLiGenT-Pi: Photometric Stereo for Planar Surfaces with Rich Details - Benchmark Dataset and Beyond](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_DiLiGenT-Pi_Photometric_Stereo_for_Planar_Surfaces_with_Rich_Details_-_ICCV_2023_paper.pdf)\n* Methods\n  * [Prototype-based Dataset Comparison](http://arxiv.org/abs/2309.02401v1)\u003cbr\u003e:star:[code](https://github.com/Nanne/ProtoSim)\n\n\u003ca name=\"41\"/\u003e\n\n## 41.Vision Transformers\n* [Masked Spiking Transformer](http://arxiv.org/abs/2210.01208)\n* [Scale-Aware Modulation Meet Transformer](http://arxiv.org/abs/2307.08579)\n* [BiViT: Extremely Compressed Binary Vision Transformers](http://arxiv.org/abs/2211.07091)\n* [Fcaformer: Forward Cross Attention in Hybrid Vision Transformer](http://arxiv.org/abs/2211.07198)\n* [FastViT: A Fast Hybrid Vision Transformer Using Structural Reparameterization](http://arxiv.org/abs/2303.14189)\n* [Multimodal High-order Relation Transformer for Scene Boundary Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Wei_Multimodal_High-order_Relation_Transformer_for_Scene_Boundary_Detection_ICCV_2023_paper.pdf)\n* [GET: Group Event Transformer for Event-Based Vision](https://openaccess.thecvf.com/content/ICCV2023/papers/Peng_GET_Group_Event_Transformer_for_Event-Based_Vision_ICCV_2023_paper.pdf)\n* [DiffRate: Differentiable Compression Rate for Efficient Vision Transformers](http://arxiv.org/abs/2305.17997)\n* [Scratching Visual Transformer's Back with Uniform Attention](https://openaccess.thecvf.com/content/ICCV2023/papers/Hyeon-Woo_Scratching_Visual_Transformers_Back_with_Uniform_Attention_ICCV_2023_paper.pdf)\n* [Skill Transformer: A Monolithic Policy for Mobile 
Manipulation](http://arxiv.org/abs/2308.09873)\n* [A Multidimensional Analysis of Social Biases in Vision Transformers](http://arxiv.org/abs/2308.01948)\n* [Token-Label Alignment for Vision Transformers](http://arxiv.org/abs/2210.06455)\n* [Building Vision Transformers with Hierarchy Aware Feature Aggregation](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Building_Vision_Transformers_with_Hierarchy_Aware_Feature_Aggregation_ICCV_2023_paper.pdf)\n* [TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching](https://openaccess.thecvf.com/content/ICCV2023/papers/Fu_TripLe_Revisiting_Pretrained_Model_Reuse_and_Progressive_Learning_for_Efficient_ICCV_2023_paper.pdf)\n* [DarSwin: Distortion Aware Radial Swin Transformer](https://openaccess.thecvf.com/content/ICCV2023/papers/Athwale_DarSwin_Distortion_Aware_Radial_Swin_Transformer_ICCV_2023_paper.pdf)\n* [Robustifying Token Attention for Vision Transformers](http://arxiv.org/abs/2303.11126)\n* [FLatten Transformer: Vision Transformer using Focused Linear Attention](http://arxiv.org/abs/2308.00442)\n* [Detection Transformer with Stable Matching](http://arxiv.org/abs/2304.04742)\n* [LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization](https://openaccess.thecvf.com/content/ICCV2023/papers/Yu_LaPE_Layer-adaptive_Position_Embedding_for_Vision_Transformers_with_Independent_Layer_ICCV_2023_paper.pdf)\n* [M2T: Masking Transformers Twice for Faster Decoding](https://openaccess.thecvf.com/content/ICCV2023/papers/Mentzer_M2T_Masking_Transformers_Twice_for_Faster_Decoding_ICCV_2023_paper.pdf)\n* [FDViT: Improve the Hierarchical Architecture of Vision Transformer](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_FDViT_Improve_the_Hierarchical_Architecture_of_Vision_Transformer_ICCV_2023_paper.pdf)\n* [Jumping through Local Minima: Quantization in the Loss Landscape of Vision 
Transformers](http://arxiv.org/abs/2308.10814)\n* [Rethinking Vision Transformers for MobileNet Size and Speed](http://arxiv.org/abs/2212.08059)\n* [Structure Invariant Transformation for better Adversarial Transferability](http://arxiv.org/abs/2309.14700v1)\u003cbr\u003e:star:[code](https://github.com/xiaosen-wang/SIT)\n* [SG-Former: Self-guided Transformer with Evolving Token Reallocation](http://arxiv.org/abs/2308.12216v1)\u003cbr\u003e:star:[code](https://github.com/OliverRensu/SG-Former)\n* [Pre-training Vision Transformers with Very Limited Synthesized Images](http://arxiv.org/abs/2307.14710v1)\n* [SMMix: Self-Motivated Image Mixing for Vision Transformers](https://arxiv.org/abs/2212.12977)\n* [Revisiting Vision Transformer from the View of Path Ensemble](http://arxiv.org/abs/2308.06548v1)\n* [SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM](http://arxiv.org/abs/2308.09891v1)\u003cbr\u003e:star:[code](https://github.com/SongTang-x/SwinLSTM)\n* [Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers](http://arxiv.org/abs/2308.13494v1)\n* [Contrastive Feature Masking Open-Vocabulary Vision Transformer](http://arxiv.org/abs/2309.00775v1)\n* [MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer](http://arxiv.org/abs/2309.09067v1)\n* [SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_SAL-ViT_Towards_Latency_Efficient_Private_Inference_on_ViT_using_Selective_ICCV_2023_paper.pdf)\n* [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](http://arxiv.org/abs/2303.15446)\n* [MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention](http://arxiv.org/abs/2211.13955)\n\n\u003ca name=\"40\"/\u003e\n\n## 40.Anomaly 
Detection\n* [Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection](http://arxiv.org/abs/2308.10155v1)\n* [Anomaly Detection Under Distribution Shift](http://arxiv.org/abs/2303.13845)\n* [Unsupervised Surface Anomaly Detection with Diffusion Probabilistic Model](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Unsupervised_Surface_Anomaly_Detection_with_Diffusion_Probabilistic_Model_ICCV_2023_paper.pdf)\n* [Anomaly Detection using Score-based Perturbation Resilience](https://openaccess.thecvf.com/content/ICCV2023/papers/Shin_Anomaly_Detection_using_Score-based_Perturbation_Resilience_ICCV_2023_paper.pdf)\n* [Remembering Normality: Memory-guided Knowledge Distillation for Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Gu_Remembering_Normality_Memory-guided_Knowledge_Distillation_for_Unsupervised_Anomaly_Detection_ICCV_2023_paper.pdf)\n* [Template-guided Hierarchical Feature Restoration for Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Guo_Template-guided_Hierarchical_Feature_Restoration_for_Anomaly_Detection_ICCV_2023_paper.pdf)\n* [Inter-Realization Channels: Unsupervised Anomaly Detection Beyond One-Class Classification](https://openaccess.thecvf.com/content/ICCV2023/papers/McIntosh_Inter-Realization_Channels_Unsupervised_Anomaly_Detection_Beyond_One-Class_Classification_ICCV_2023_paper.pdf)\n* Image Anomaly Detection\n  * [Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection](http://arxiv.org/abs/2308.02983v1)\u003cbr\u003e:star:[code](https://github.com/xcyao00/FOD)\n* OOD\n  * [CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No](http://arxiv.org/abs/2308.12213v1)\u003cbr\u003e:star:[code](https://github.com/xmed-lab/CLIPN)\n  * [Meta OOD Learning for Continuously Adaptive OOD Detection](http://arxiv.org/abs/2309.11705v1)\n  * [Simple and Effective Out-of-Distribution Detection via Cosine-based Softmax 
Loss](https://openaccess.thecvf.com/content/ICCV2023/papers/Noh_Simple_and_Effective_Out-of-Distribution_Detection_via_Cosine-based_Softmax_Loss_ICCV_2023_paper.pdf)\n  * [Nearest Neighbor Guidance for Out-of-Distribution Detection](http://arxiv.org/abs/2309.14888v1)\u003cbr\u003e:star:[code](https://github.com/roomo7time/nnguide)\n  * [Understanding the Feature Norm for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Park_Understanding_the_Feature_Norm_for_Out-of-Distribution_Detection_ICCV_2023_paper.pdf)\n  * [SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection](http://arxiv.org/abs/2208.13930)\n  * [Unified Out-Of-Distribution Detection: A Model-Specific Perspective](http://arxiv.org/abs/2304.06813)\n  * [Revisit PCA-based Technique for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Guan_Revisit_PCA-based_Technique_for_Out-of-Distribution_Detection_ICCV_2023_paper.pdf)\n  * [Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Hierarchical_Visual_Categories_Modeling_A_Joint_Representation_Learning_and_Density_ICCV_2023_paper.pdf)\n  * [WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis](http://arxiv.org/abs/2303.07543)\n\n\u003ca name=\"39\"/\u003e\n\n## 39.Keypoint Detection\n* [Neural Interactive Keypoint Detection](http://arxiv.org/abs/2308.10174v1)\u003cbr\u003e:star:[code](https://github.com/IDEA-Research/Click-Pose)\n* [3D Implicit Transporter for Temporally Consistent Keypoint Discovery](http://arxiv.org/abs/2309.05098v1)\u003cbr\u003e:star:[code](https://github.com/zhongcl-thu/3D-Implicit-Transporter)\n\n\u003ca name=\"38\"/\u003e\n\n## 38.Vision-Language\n* [Linear Spaces of 
Meanings: Compositional Structures in Vision-Language Models](http://arxiv.org/abs/2302.14383)\n* [ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation](http://arxiv.org/abs/2308.16689)\n* [Gloss-Free Sign Language Translation: Improving from Visual-Language Pretraining](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhou_Gloss-Free_Sign_Language_Translation_Improving_from_Visual-Language_Pretraining_ICCV_2023_paper.pdf)\n* [SuS-X: Training-Free Name-Only Transfer of Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Udandarao_SuS-X_Training-Free_Name-Only_Transfer_of_Vision-Language_Models_ICCV_2023_paper.pdf)\n* [Bayesian Prompt Learning for Image-Language Model Generalization](http://arxiv.org/abs/2210.02390)\n* [eP-ALM: Efficient Perceptual Augmentation of Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Shukor_eP-ALM_Efficient_Perceptual_Augmentation_of_Language_Models_ICCV_2023_paper.pdf)\n* [Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models](http://arxiv.org/abs/2303.17169)\n* [SLAN: Self-Locator Aided Network for Vision-Language Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhai_SLAN_Self-Locator_Aided_Network_for_Vision-Language_Understanding_ICCV_2023_paper.pdf)\n* [Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm](https://openaccess.thecvf.com/content/ICCV2023/papers/Ma_Borrowing_Knowledge_From_Pre-trained_Language_Model_A_New_Data-efficient_Visual_ICCV_2023_paper.pdf)\n* [VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching](https://openaccess.thecvf.com/content/ICCV2023/papers/Bi_VL-Match_Enhancing_Vision-Language_Pretraining_with_Token-Level_and_Instance-Level_Matching_ICCV_2023_paper.pdf)\n* [A Retrospect to Multi-prompt Learning across Vision and 
Language](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_A_Retrospect_to_Multi-prompt_Learning_across_Vision_and_Language_ICCV_2023_paper.pdf)\n* [CiT: Curation in Training for Effective Vision-Language Data](http://arxiv.org/abs/2301.02241)\n* [EgoTV: Egocentric Task Verification from Natural Language Task Descriptions](http://arxiv.org/abs/2303.16975)\n* [Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models](http://arxiv.org/abs/2303.06571)\n* [Towards Unifying Medical Vision-and-Language Pre-Training via Soft Prompts](http://arxiv.org/abs/2302.08958)\n* [Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models](http://arxiv.org/abs/2303.06628)\n* [Perceptual Grouping in Contrastive Vision-Language Models](http://arxiv.org/abs/2210.09996)\n* [Black Box Few-Shot Adaptation for Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ouali_Black_Box_Few-Shot_Adaptation_for_Vision-Language_Models_ICCV_2023_paper.pdf)\n* [VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control](https://openaccess.thecvf.com/content/ICCV2023/papers/Hu_VL-PET_Vision-and-Language_Parameter-Efficient_Tuning_via_Granularity_Control_ICCV_2023_paper.pdf)\n* [I Can't Believe There's No Images! 
Learning Visual Tasks Using only Language Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Gu_I_Cant_Believe_Theres_No_Images_Learning_Visual_Tasks_Using_ICCV_2023_paper.pdf)\n* [Too Large; Data Reduction for Vision-Language Pre-Training](http://arxiv.org/abs/2305.20087)\n* [Equivariant Similarity for Vision-Language Foundation Models](http://arxiv.org/abs/2303.14465)\n* [Going Beyond Nouns With Vision \u0026 Language Models Using Synthetic Data](http://arxiv.org/abs/2303.17590)\n* [SINC: Self-Supervised In-Context Learning for Vision-Language Tasks](http://arxiv.org/abs/2307.07742)\n* [Unified Visual Relationship Detection with Vision and Language Models](http://arxiv.org/abs/2303.08998)\n* [ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models](http://arxiv.org/abs/2307.00398)\n* [Distilling Large Vision-Language Model with Out-of-Distribution Generalizability](http://arxiv.org/abs/2307.03135)\n* [Distribution-Aware Prompt Tuning for Vision-Language Models](http://arxiv.org/abs/2309.03406v1)\u003cbr\u003e:star:[code](https://github.com/mlvlab/DAPT)\n* [LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models](http://arxiv.org/abs/2309.01155v1)\u003cbr\u003e:star:[code](https://chengshiest.github.io/logo)\n* [CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation](http://arxiv.org/abs/2308.15226v1)\n* [GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training](http://arxiv.org/abs/2308.11331v1)\n* [RLIPv2: Fast Scaling of Relational Language-Image Pre-training](http://arxiv.org/abs/2308.09351v1)\u003cbr\u003e:star:[code](https://github.com/JacobYuan7/RLIPv2)\n* [Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models](http://arxiv.org/abs/2307.15049v1)\u003cbr\u003e:star:[code](https://wuw2019.github.io/RMT/)\n* [Why Is Prompt Tuning for Vision-Language Models Robust to Noisy 
Labels?](http://arxiv.org/abs/2307.11978v1)\u003cbr\u003e:star:[code](https://github.com/CEWu/PTNL)\n* [Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models](http://arxiv.org/abs/2307.14061v1)\u003cbr\u003e:thumbsup:[ICCV 2023 Oral | 南科大VIP Lab | 针对VLP模型的集合级引导攻击](https://mp.weixin.qq.com/s/bE97oBoa4nH1c5XuOz4WWA)\n* [CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation](http://arxiv.org/abs/2308.07146v1)\u003cbr\u003e:star:[code](https://github.com/KevinLight831/CTP)\n* [VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control](http://arxiv.org/abs/2308.09804v1)\u003cbr\u003e:star:[code](https://github.com/HenryHZY/VL-PET)\n* [Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models](http://arxiv.org/abs/2308.11186v1)\n* [VLSlice: Interactive Vision-and-Language Slice Discovery](http://arxiv.org/abs/2309.06703v1)\u003cbr\u003e:house:[project](https://ericslyman.com/vlslice/)\n* [What does CLIP know about a red circle? 
Visual prompt engineering for VLMs](http://arxiv.org/abs/2304.06712)\n* [BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization](http://arxiv.org/abs/2307.08504)\n* Visual Representation Learning\n  * [Hallucination Improves the Performance of Unsupervised Visual Representation Learning](http://arxiv.org/abs/2307.12168v1)\n  * [Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning](http://arxiv.org/abs/2212.06486)\n  * [ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data](http://arxiv.org/abs/2308.11194v1)\n* VLN\n  * [Learning Vision-and-Language Navigation from YouTube Videos](http://arxiv.org/abs/2307.11984v1)\u003cbr\u003e:star:[code](https://github.com/JeremyLinky/YouTube-VLN)\n  * [Learning Navigational Visual Representations with Semantic Map Supervision](http://arxiv.org/abs/2307.12335)\n  * [Grounded Entity-Landmark Adaptive Pre-Training for Vision-and-Language Navigation](http://arxiv.org/abs/2308.12587)\n  * [GridMM: Grid Memory Map for Vision-and-Language Navigation](http://arxiv.org/abs/2307.12907v1)\n  * [Scaling Data Generation in Vision-and-Language Navigation](http://arxiv.org/abs/2307.15644v1)\n  * [Bird's-Eye-View Scene Graph for Vision-Language Navigation](http://arxiv.org/abs/2308.04758v1)\u003cbr\u003e:star:[code](https://github.com/DefaultRui/BEV-Scene-Graph)\n  * [AerialVLN: Vision-and-Language Navigation for UAVs](http://arxiv.org/abs/2308.06735v1)\u003cbr\u003e:star:[code](https://github.com/AirVLN/AirVLN)\n  * [DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation](http://arxiv.org/abs/2308.07498v1)\u003cbr\u003e:star:[code](https://github.com/hanqingwangai/Dreamwalker)\n  * [VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation](http://arxiv.org/abs/2308.10172v1)\n  * [March in Chat: Interactive Prompting for Remote Embodied Referring Expression](http://arxiv.org/abs/2308.10141v1)\n  * [Grounded Entity-Landmark 
Adaptive Pre-training for Vision-and-Language Navigation](http://arxiv.org/abs/2308.12587v1)\n  * [BEVBert: Multimodal Map Pre-training for Language-guided Navigation](https://arxiv.org/pdf/2212.04385.pdf)\u003cbr\u003e:star:[code](https://github.com/MarSaKi/VLN-BEVBert)\n* Visual Grounding(视觉定位)\n  * [ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding](http://arxiv.org/abs/2303.16894)\n  * [Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding](http://arxiv.org/abs/2307.09267)\n  * [Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Confidence-aware_Pseudo-label_Learning_for_Weakly_Supervised_Visual_Grounding_ICCV_2023_paper.pdf)\n* Video-Language\n  * [HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training](http://arxiv.org/abs/2212.14546)\n  * [Learning Trajectory-Word Alignments for Video-Language Tasks](http://arxiv.org/abs/2301.01953)\n  * [HiVLP: Hierarchical Interactive Video-Language Pre-Training](https://openaccess.thecvf.com/content/ICCV2023/papers/Shao_HiVLP_Hierarchical_Interactive_Video-Language_Pre-Training_ICCV_2023_paper.pdf)\n  * [Verbs in Action: Improving Verb Understanding in Video-Language Models](http://arxiv.org/abs/2304.06708)\n  * [Exploring Temporal Concurrency for Video-Language Representation Learning](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Exploring_Temporal_Concurrency_for_Video-Language_Representation_Learning_ICCV_2023_paper.pdf)\n  * [EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone](http://arxiv.org/abs/2307.05463)\n  * [SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-Training](http://arxiv.org/abs/2211.11446)\n* Visual Reasoning\n  * [ViperGPT: Visual Inference via Python Execution for 
Reasoning](https://openaccess.thecvf.com/content/ICCV2023/papers/Suris_ViperGPT_Visual_Inference_via_Python_Execution_for_Reasoning_ICCV_2023_paper.pdf) \n* LLM\n  * [LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Song_LLM-Planner_Few-Shot_Grounded_Planning_for_Embodied_Agents_with_Large_Language_ICCV_2023_paper.pdf)\n  * [Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts](http://arxiv.org/abs/2308.11793v1)\u003cbr\u003e:star:[code](https://github.com/VITA-Group/GNT-MOVE)\n\n\u003ca name=\"37\"/\u003e\n\n## 37.Object Pose Estimation(物体姿势估计)\n* [IST-Net: Prior-Free Category-Level Pose Estimation with Implicit Space Transformation](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_IST-Net_Prior-Free_Category-Level_Pose_Estimation_with_Implicit_Space_Transformation_ICCV_2023_paper.pdf)\n* [LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_LU-NeRF_Scene_and_Pose_Estimation_by_Synchronizing_Local_Unposed_NeRFs_ICCV_2023_paper.pdf)\n* [PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment](http://arxiv.org/abs/2306.15667)\n* [Nonrigid Object Contact Estimation With Regional Unwrapping Transformer](http://arxiv.org/abs/2308.14074v1)\n* 6D\n  * [Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation](http://arxiv.org/abs/2308.05438v1)\n  * [SOCS: Semantically-Aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations](http://arxiv.org/abs/2303.10346)\n  * [Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation](http://arxiv.org/abs/2303.11516)\n  * [Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation](http://arxiv.org/abs/2308.10016v1)\n  * [Center-Based Decoupled Point-cloud Registration 
for 6D Object Pose Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Jiang_Center-Based_Decoupled_Point-cloud_Registration_for_6D_Object_Pose_Estimation_ICCV_2023_paper.pdf)\n  * [Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Query6DoF_Learning_Sparse_Queries_as_Implicit_Shape_Prior_for_Category-Level_ICCV_2023_paper.pdf)\n  * [VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations](http://arxiv.org/abs/2308.09916v1)\u003cbr\u003e:star:[code](https://github.com/JiehongLin/VI-Net)\n  * [Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Learning_Symmetry-Aware_Geometry_Correspondences_for_6D_Object_Pose_Estimation_ICCV_2023_paper.pdf)\n  * [3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation](http://arxiv.org/abs/2302.03744)\n* Object Counting\n  * [STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning](http://arxiv.org/abs/2308.10468v1)\u003cbr\u003e:star:[code](https://github.com/taohan10200/STEERER)\n  * [Interactive Class-Agnostic Object Counting](http://arxiv.org/abs/2309.05277)\n  * [A Low-Shot Object Counting Network With Iterative Prototype Adaptation](https://openaccess.thecvf.com/content/ICCV2023/papers/Dukic_A_Low-Shot_Object_Counting_Network_With_Iterative_Prototype_Adaptation_ICCV_2023_paper.pdf)\n* Animal Pose Estimation\n  * [Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape](http://arxiv.org/abs/2308.11737)\n\n\u003ca name=\"36\"/\u003e\n\n## 36.Visual Question Answering(视觉问答)\n* [Toward Unsupervised Realistic Visual Question Answering](http://arxiv.org/abs/2303.05068)\n* [Variational Causal Inference Network for Explanatory Visual Question 
Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Xue_Variational_Causal_Inference_Network_for_Explanatory_Visual_Question_Answering_ICCV_2023_paper.pdf)\n* [VQA Therapy: Exploring Answer Differences by Visually Grounding Answers](http://arxiv.org/abs/2308.11662)\n* [VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_VQA-GNN_Reasoning_with_Multimodal_Knowledge_via_Graph_Neural_Networks_for_ICCV_2023_paper.pdf)\n* [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](http://arxiv.org/abs/2303.11897)\n* [Encyclopedic VQA: Visual Questions About Detailed Properties of Fine-Grained Categories](http://arxiv.org/abs/2306.09224)\n* [PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3](https://openaccess.thecvf.com/content/ICCV2023/papers/Hu_PromptCap_Prompt-Guided_Image_Captioning_for_VQA_with_GPT-3_ICCV_2023_paper.pdf)\n* [Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Qian_Decouple_Before_Interact_Multi-Modal_Prompt_Learning_for_Continual_Visual_Question_ICCV_2023_paper.pdf)\n* Video-QA\n  * [Discovering Spatio-Temporal Rationales for Video Question Answering](http://arxiv.org/abs/2307.12058v1)\u003cbr\u003e:star:[code](https://github.com/yl3800/TranSTR)\n  * [Knowledge Proxy Intervention for Deconfounded Video Question Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Knowledge_Proxy_Intervention_for_Deconfounded_Video_Question_Answering_ICCV_2023_paper.pdf)\n  * [Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer](http://arxiv.org/abs/2308.08414v1)\n  * [Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering 
Models](http://arxiv.org/abs/2308.09363v1)\u003cbr\u003e:star:[code](https://github.com/mlvlab/OVQA)\n  * [Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Tem-Adapter_Adapting_Image-Text_Pretraining_for_Video_Question_Answer_ICCV_2023_paper.pdf)\n* Video Intent Reasoning\n  * [IntentQA: Context-aware Video Intent Reasoning](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_IntentQA_Context-aware_Video_Intent_Reasoning_ICCV_2023_paper.pdf)\n\n\u003ca name=\"35\"/\u003e\n\n## 35.Human Motion Prediction(人体运动预测)\n* [Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction](http://arxiv.org/abs/2308.08942v1)\u003cbr\u003e:star:[code](https://github.com/MediaBrain-SJTU/AuxFormer)\n* [Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders](http://arxiv.org/abs/2308.09882v1)\u003cbr\u003e:star:[code](https://github.com/jchengai/forecast-mae)\n* [Priority-Centric Human Motion Generation in Discrete Latent Space](http://arxiv.org/abs/2308.14480v1)\n* [MotionLM: Multi-Agent Motion Forecasting as Language Modeling](https://openaccess.thecvf.com/content/ICCV2023/papers/Seff_MotionLM_Multi-Agent_Motion_Forecasting_as_Language_Modeling_ICCV_2023_paper.pdf)\n* [HumanMAC: Masked Motion Completion for Human Motion Prediction](http://arxiv.org/abs/2302.03665)\n* [Joint-Relation Transformer for Multi-Person Motion Prediction](http://arxiv.org/abs/2308.04808)\n* [Bootstrap Motion Forecasting With Self-Consistent Constraints](http://arxiv.org/abs/2204.05859)\n* [PhysDiff: Physics-Guided Human Motion Diffusion Model](http://arxiv.org/abs/2212.02500)\n* [AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism](http://arxiv.org/abs/2309.00796)\n* [BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction](http://arxiv.org/abs/2211.14304)\n* [Social Diffusion: Long-term Multiple Human Motion 
Anticipation](https://openaccess.thecvf.com/content/ICCV2023/papers/Tanke_Social_Diffusion_Long-term_Multiple_Human_Motion_Anticipation_ICCV_2023_paper.pdf)\n* [SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation](http://arxiv.org/abs/2304.10417)\n\n\u003ca name=\"34\"/\u003e\n\n## 34.Action Detection(动作识别)\n* [Multimodal Distillation for Egocentric Action Recognition](http://arxiv.org/abs/2307.07483)\n* [Memory-and-Anticipation Transformer for Online Action Understanding](http://arxiv.org/abs/2308.07893v1)\u003cbr\u003e:star:[code](https://github.com/Echo0125/Memory-and-Anticipation-Transformer)\n* [Masked Motion Predictors are Strong 3D Action Representation Learners](http://arxiv.org/abs/2308.07092v1)\u003cbr\u003e:star:[code](https://github.com/maoyunyao/MAMP)\n* [Efficient Video Action Detection with Token Dropout and Context Refinement](http://arxiv.org/abs/2304.08451)\n* [Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Wasim_Video-FocalNets_Spatio-Temporal_Focal_Modulation_for_Video_Action_Recognition_ICCV_2023_paper.pdf)\n* [E2E-LOAD: End-to-End Long-form Online Action Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Cao_E2E-LOAD_End-to-End_Long-form_Online_Action_Detection_ICCV_2023_paper.pdf)\n* [Ego-Only: Egocentric Action Detection without Exocentric Transferring](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Ego-Only_Egocentric_Action_Detection_without_Exocentric_Transferring_ICCV_2023_paper.pdf)\n* [Cross-Modal Learning with 3D Deformable Attention for Action Recognition](http://arxiv.org/abs/2212.05638)\n* [DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion](http://arxiv.org/abs/2303.14863)\n* [STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition](http://arxiv.org/abs/2301.03046)\n* [MiniROAD: Minimal RNN Framework for Online Action 
Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/An_MiniROAD_Minimal_RNN_Framework_for_Online_Action_Detection_ICCV_2023_paper.pdf)\n* [Video Action Recognition with Attentive Semantic Units](http://arxiv.org/abs/2303.09756)\n* [A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition](http://arxiv.org/abs/2303.13505)\n* [What Can a Cook in Italy Teach a Mechanic in India? Action Recognition Generalisation Over Scenarios and Locations](http://arxiv.org/abs/2306.08713)\n* Skeleton-based Action Recognition\n  * [LAC -- Latent Action Composition for Skeleton-based Action Segmentation](http://arxiv.org/abs/2308.14500v1)\n  * [Generative Action Description Prompts for Skeleton-based Action Recognition](http://arxiv.org/abs/2208.05318)\n  * [Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Parallel_Attention_Interaction_Network_for_Few-Shot_Skeleton-Based_Action_Recognition_ICCV_2023_paper.pdf)\n  * [Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition](http://arxiv.org/abs/2212.04761)\n  * [Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition](http://arxiv.org/abs/2208.10741)\n  * [Modeling the Relative Visual Tempo for Self-supervised Skeleton-based Action Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_Modeling_the_Relative_Visual_Tempo_for_Self-supervised_Skeleton-based_Action_Recognition_ICCV_2023_paper.pdf)\n  * [SkeleTR: Towrads Skeleton-based Action Recognition in the Wild](http://arxiv.org/abs/2309.11445v1)\n  * [Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient](http://arxiv.org/abs/2308.05681)\n  * [FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation](http://arxiv.org/abs/2306.11046)\n* Open-set Action Recognition\n  * [SOAR: Scene-debiasing 
Open-set Action Recognition](http://arxiv.org/abs/2309.01265v1)\u003cbr\u003e:star:[code](https://github.com/yhZhai/SOAR)\n* Zero-shot Action Recognition\n  * [MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge](http://arxiv.org/abs/2303.08914)\n* Few-shot Action Recognition\n  * [Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching](http://arxiv.org/abs/2308.09346v1)\u003cbr\u003e:star:[code](https://github.com/jiazheng-xing/GgHM)\n* Temporal Action Localization\n  * [DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization](http://arxiv.org/abs/2307.16415v1)\u003cbr\u003e:star:[code](https://github.com/XiaojunTang22/ICCV2023-DDGNet)\n  * [Movement Enhancement toward Multi-Scale Video Feature Representation for Temporal Action Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Movement_Enhancement_toward_Multi-Scale_Video_Feature_Representation_for_Temporal_Action_ICCV_2023_paper.pdf)\n  * [Self-Feedback DETR for Temporal Action Detection](http://arxiv.org/abs/2308.10570v1)\n  * [Action Sensitivity Learning for Temporal Action Localization](http://arxiv.org/abs/2305.15701)\n  * [Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Revisiting_Foreground_and_Background_Separation_in_Weakly-supervised_Temporal_Action_Localization_ICCV_2023_paper.pdf)\n  * [Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization](https://openaccess.thecvf.com/content/ICCV2023/papers/Xia_Learning_from_Noisy_Pseudo_Labels_for_Semi-Supervised_Temporal_Action_Localization_ICCV_2023_paper.pdf)\n* Weakly-supervised Action Localization\n  * [Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling](http://arxiv.org/abs/2308.09946v1)\n* Few-shot Action Localization\n  * [Few-Shot Common Action Localization via Cross-Attentional Fusion of Context and Temporal 
Dynamics](https://openaccess.thecvf.com/content/ICCV2023/papers/Lee_Few-Shot_Common_Action_Localization_via_Cross-Attentional_Fusion_of_Context_and_ICCV_2023_paper.pdf)\n* Action Understanding\n  * [Memory-and-Anticipation Transformer for Online Action Understanding](http://arxiv.org/abs/2308.07893)\n\n\u003ca name=\"33\"/\u003e\n\n## 33.Video(视频)\n* [Neural Video Depth Stabilizer](http://arxiv.org/abs/2307.08695)\n* [NPC: Neural Point Characters from Video](http://arxiv.org/abs/2304.02013)\n* [Localizing Moments in Long Video Via Multimodal Guidance](http://arxiv.org/abs/2302.13372)\n* [Order-Prompted Tag Sequence Generation for Video Tagging](https://openaccess.thecvf.com/content/ICCV2023/papers/Ma_Order-Prompted_Tag_Sequence_Generation_for_Video_Tagging_ICCV_2023_paper.pdf)\n* [Moment Detection in Long Tutorial Videos](https://openaccess.thecvf.com/content/ICCV2023/papers/Croitoru_Moment_Detection_in_Long_Tutorial_Videos_ICCV_2023_paper.pdf)\n* [MMVP: Motion-Matrix-based Video Prediction](http://arxiv.org/abs/2308.16154v1)\u003cbr\u003e:star:[code](https://github.com/Kay1794/MMVP-motion-matrix-based-video-prediction)\n* [D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation](http://arxiv.org/abs/2308.04197v1)\u003cbr\u003e:star:[code](https://github.com/solicucu/D3G)\n* [LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction](http://arxiv.org/abs/2308.11116v1)\n* [TALL: Thumbnail Layout for Deepfake Video Detection](http://arxiv.org/abs/2307.07494)\n* [Spatio-temporal Prompting Network for Robust Video Feature Extraction](https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Spatio-temporal_Prompting_Network_for_Robust_Video_Feature_Extraction_ICCV_2023_paper.pdf)\n* [Neural Reconstruction of Relightable Human Model from Monocular Video](https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Neural_Reconstruction_of_Relightable_Human_Model_from_Monocular_Video_ICCV_2023_paper.pdf)\n* Video Understanding\n  * [Long-range 
Multimodal Pretraining for Movie Understanding](http://arxiv.org/abs/2308.09775v1)\n  * [RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D](http://arxiv.org/abs/2308.12035v1)\n  * [UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_UniFormerV2_Unlocking_the_Potential_of_Image_ViTs_for_Video_Understanding_ICCV_2023_paper.pdf)\n* Video Classification\n  * [ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded](https://openaccess.thecvf.com/content/ICCV2023/papers/Bulat_ReGen_A_good_Generative_Zero-Shot_Video_Classifier_Should_be_Rewarded_ICCV_2023_paper.pdf)\n  * [Gram-based Attentive Neural Ordinary Differential Equations Network for Video Nystagmography Classification](https://openaccess.thecvf.com/content/ICCV2023/papers/Qiu_Gram-based_Attentive_Neural_Ordinary_Differential_Equations_Network_for_Video_Nystagmography_ICCV_2023_paper.pdf)\n  * [Few-Shot Video Classification via Representation Fusion and Promotion Learning](https://openaccess.thecvf.com/content/ICCV2023/papers/Xia_Few-Shot_Video_Classification_via_Representation_Fusion_and_Promotion_Learning_ICCV_2023_paper.pdf)\n* Video Synthesis\n  * [StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation](http://arxiv.org/abs/2308.16909v1)\u003cbr\u003e:house:[project](https://www.mmlab-ntu.com/project/styleinv/index.html)\u003cbr\u003e:star:[code](https://github.com/johannwyh/StyleInV)\n  * [Mixed Neural Voxels for Fast Multi-view Video Synthesis](http://arxiv.org/abs/2212.00190)\n  * [WALDO: Future Video Synthesis Using Object Layer Decomposition and Parametric Flow Prediction](http://arxiv.org/abs/2211.14308)\n  * [Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://openaccess.thecvf.com/content/ICCV2023/papers/Khachatryan_Text2Video-Zero_Text-to-Image_Diffusion_Models_are_Zero-Shot_Video_Generators_ICCV_2023_paper.pdf)\n  * 
[StyleLipSync: Style-based Personalized Lip-sync Video Generation](http://arxiv.org/abs/2305.00521)\n  * [Text2Performer: Text-Driven Human Video Generation](http://arxiv.org/abs/2304.08483)\n  * [DreamPose: Fashion Video Synthesis with Stable Diffusion](https://openaccess.thecvf.com/content/ICCV2023/papers/Karras_DreamPose_Fashion_Video_Synthesis_with_Stable_Diffusion_ICCV_2023_paper.pdf)\n  * [Structure and Content-Guided Video Synthesis with Diffusion Models](http://arxiv.org/abs/2302.03011)\n* Video Stabilization\n  * [Fast Full-frame Video Stabilization with Iterative Optimization](http://arxiv.org/abs/2307.12774v1)\n  * [Minimum Latency Deep Online Video Stabilization](http://arxiv.org/abs/2212.02073)\n* Video Grounding(视频定位)\n  * [G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory](http://arxiv.org/abs/2307.14277v1)\n  * [UniVTG: Towards Unified Video-Language Temporal Grounding](http://arxiv.org/abs/2307.16715v1)\u003cbr\u003e:star:[code](https://github.com/showlab/UniVTG)\n  * [Knowing Where to Focus: Event-aware Transformer for Video Grounding](http://arxiv.org/abs/2308.06947v1)\u003cbr\u003e:star:[code](https://github.com/jinhyunj/EaTR)\n  * [Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos](http://arxiv.org/abs/2303.08345)\n* Video Segmentation\n  * [XMem++: Production-level Video Segmentation From Few Annotated Frames](http://arxiv.org/abs/2307.15958v1)\n  * [Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation](http://arxiv.org/abs/2309.13248v1)\u003cbr\u003e:star:[code](https://github.com/kfan21/EoRaS)\n  * [GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation](http://arxiv.org/abs/2309.11145v1)\u003cbr\u003e:star:[code](https://github.com/xmed-lab/GraphEcho)\n  * [MeViS: A Large-scale Benchmark for Video Segmentation with Motion 
Expressions](http://arxiv.org/abs/2308.08544v1)\u003cbr\u003e:star:[code](https://henghuiding.github.io/MeViS)\u003cbr\u003e:thumbsup:[ICCV2023｜新数据集 MeViS：基于动作描述的视频分割](https://mp.weixin.qq.com/s/hHAgfiQdA_g0DkmgWPzLeg)\n  * [MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation](http://arxiv.org/abs/2308.11185v1)\n  * [Tracking Anything with Decoupled Video Segmentation](http://arxiv.org/abs/2309.03903v1)\u003cbr\u003e:star:[code](https://hkchengrex.github.io/Tracking-Anything-with-DEVA)\n  * [The Making and Breaking of Camouflage](http://arxiv.org/abs/2309.03899v1)\n  * [Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Tube-Link_A_Flexible_Cross_Tube_Framework_for_Universal_Video_Segmentation_ICCV_2023_paper.pdf)\n* Video Correspondence\n  * [Learning Fine-Grained Features for Pixel-wise Video Correspondences](http://arxiv.org/abs/2308.03040v1)\u003cbr\u003e:star:[code](https://github.com/qianduoduolr/FGVC)\n* Video Perception\n  * [ResQ: Residual Quantization for Video Perception](http://arxiv.org/abs/2308.09511v1)\n* Video Recognition\n  * [Audio-Visual Glance Network for Efficient Video Recognition](http://arxiv.org/abs/2308.09322v1)\n  * [Efficient Decision-based Black-box Patch Attacks on Video Recognition](http://arxiv.org/abs/2303.11917)\n  * [Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition](http://arxiv.org/abs/2308.11489v1)\u003cbr\u003e:star:[code](https://github.com/wqtwjt1996/SUM-L)\n  * [Implicit Temporal Modeling with Learnable Alignment for Video Recognition](http://arxiv.org/abs/2304.10465)\n* Video Inpainting\n  * [ProPainter: Improving Propagation and Transformer for Video Inpainting](http://arxiv.org/abs/2309.03897v1)\u003cbr\u003e:star:[code](https://github.com/sczhou/ProPainter)\n* Video Representation Learning\n  * [MGMAE: Motion Guided Masking for Video Masked 
Autoencoding](http://arxiv.org/abs/2308.10794v1)\n  * [Spatio-Temporal Crop Aggregation for Video Representation Learning](http://arxiv.org/abs/2211.17042)\n* VAD\n  * [TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection](http://arxiv.org/abs/2308.11072v1)\u003cbr\u003e:star:[code](https://joefioresi718.github.io/TeD-SPAD_webpage/)\n  * [Video Anomaly Detection via Sequentially Learning Multiple Pretext Tasks](https://openaccess.thecvf.com/content/ICCV2023/papers/Shi_Video_Anomaly_Detection_via_Sequentially_Learning_Multiple_Pretext_Tasks_ICCV_2023_paper.pdf)\n  * [Feature Prediction Diffusion Model for Video Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Yan_Feature_Prediction_Diffusion_Model_for_Video_Anomaly_Detection_ICCV_2023_paper.pdf)\n* Video Localization\n  * [UnLoc: A Unified Framework for Video Localization Tasks](http://arxiv.org/abs/2308.11062v1)\u003cbr\u003e:star:[code](https://github.com/google-research/scenic)\n  * [Video OWL-ViT: Temporally-consistent open-world localization in video](http://arxiv.org/abs/2308.11093v1)\n  * [Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Flaborea_Multimodal_Motion_Conditioned_Diffusion_Model_for_Skeleton-based_Video_Anomaly_Detection_ICCV_2023_paper.pdf)\n  * [TeD-SPAD: Temporal Distinctiveness for Self-Supervised Privacy-Preservation for Video Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Fioresi_TeD-SPAD_Temporal_Distinctiveness_for_Self-Supervised_Privacy-Preservation_for_Video_Anomaly_Detection_ICCV_2023_paper.pdf)\n* Video Prediction\n  * [MMVP: Motion-Matrix-Based Video Prediction](http://arxiv.org/abs/2308.16154)\n  * [Efficient Video Prediction via Sparsely Conditioned Flow Matching](http://arxiv.org/abs/2211.14575)\n* Video Glass Segmentation\n  * [Multi-view Spectral Polarization Propagation for Video Glass 
Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Qiao_Multi-view_Spectral_Polarization_Propagation_for_Video_Glass_Segmentation_ICCV_2023_paper.pdf)\n* Video Frame Interpolation\n  * [Rethinking Video Frame Interpolation from Shutter Mode Induced Degradation](https://openaccess.thecvf.com/content/ICCV2023/papers/Ji_Rethinking_Video_Frame_Interpolation_from_Shutter_Mode_Induced_Degradation_ICCV_2023_paper.pdf)\n* Video Semantic Compression\n  * [Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression](https://openaccess.thecvf.com/content/ICCV2023/papers/Tian_Non-Semantics_Suppressed_Mask_Learning_for_Unsupervised_Video_Semantic_Compression_ICCV_2023_paper.pdf)\n* Video-to-Video Translation\n  * [Shortcut-V2V: Compression Framework for Video-to-Video Translation Based on Temporal Redundancy Reduction](https://openaccess.thecvf.com/content/ICCV2023/papers/Chung_Shortcut-V2V_Compression_Framework_for_Video-to-Video_Translation_Based_on_Temporal_Redundancy_ICCV_2023_paper.pdf)\n\n\u003ca name=\"32\"/\u003e\n\n## 32.Sign Language Recognition(手语)\n* [Human Part-wise 3D Motion Context Learning for Sign Language Recognition](http://arxiv.org/abs/2308.09305v1)\n* [CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Jiao_CoSign_Exploring_Co-occurrence_Signals_in_Skeleton-based_Continuous_Sign_Language_Recognition_ICCV_2023_paper.pdf)\n* [Improving Continuous Sign Language Recognition with Cross-Lingual Signs](http://arxiv.org/abs/2308.10809v1)\n* [C2ST: Cross-Modal Contextualized Sequence Transduction for Continuous Sign Language Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_C2ST_Cross-Modal_Contextualized_Sequence_Transduction_for_Continuous_Sign_Language_Recognition_ICCV_2023_paper.pdf)\n* Sign Language Translation\n  * [Sign Language Translation with Iterative Prototype](http://arxiv.org/abs/2308.12191v1)\n  * [Gloss-free Sign Language Translation: Improving from Visual-Language 
Pretraining](http://arxiv.org/abs/2307.14768v1)\u003cbr\u003e:star:[code](https://github.com/zhoubenjia/GFSLT-VLP)\n\n\u003ca name=\"31\"/\u003e\n\n## 31.Human-Object Interaction\n* [Full-Body Articulated Human-Object Interaction](http://arxiv.org/abs/2212.10621)\n* [Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory](http://arxiv.org/abs/2309.03696)\n* [Learning Human-Human Interactions in Images from Weak Textual Supervision](http://arxiv.org/abs/2304.14104)\n* [Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction](http://arxiv.org/abs/2307.12729v1)\n* [Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection](http://arxiv.org/abs/2307.13529v1)\n* [Agglomerative Transformer for Human-Object Interaction Detection](http://arxiv.org/abs/2308.08370v1)\n* [InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion](http://arxiv.org/abs/2308.16905v1)\u003cbr\u003e:star:[code](https://sirui-xu.github.io/InterDiff/)\n* [Exploring Predicate Visual Context in Detecting of Human-Object Interactions](http://arxiv.org/abs/2308.06202)\n* [Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning](http://arxiv.org/abs/2303.09410)\n* [Open Set Video HOI detection from Action-Centric Chain-of-Look Prompting](https://openaccess.thecvf.com/content/ICCV2023/papers/Xi_Open_Set_Video_HOI_detection_from_Action-Centric_Chain-of-Look_Prompting_ICCV_2023_paper.pdf)\n* [Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Pi_Hierarchical_Generation_of_Human-Object_Interactions_with_Diffusion_Probabilistic_Models_ICCV_2023_paper.pdf)\n* Hand-Object Interaction\n  * [EgoPCA: A New Framework for 
Egocentric Hand-Object Interaction Understanding](http://arxiv.org/abs/2309.02423v1)\u003cbr\u003e:house:[project](https://mvig-rhos.com/ego_pca)\n  * [Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips](http://arxiv.org/abs/2309.05663v1)\u003cbr\u003e:star:[code](https://judyye.github.io/diffhoi-www/)\n  * [AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose](http://arxiv.org/abs/2309.08942v1)\u003cbr\u003e:star:[code](https://github.com/GentlesJan/AffordPose)\n  * [Novel-View Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views](http://arxiv.org/abs/2308.11198)\n\n\u003ca name=\"30\"/\u003e\n\n## 30.SLAM/Augmented Reality/Virtual Reality/Robotics\n* Virtual Human Generation\n  * [MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions](http://arxiv.org/abs/2307.10008v1)\n  * [GETAvatar: Generative Textured Meshes for Animatable Human Avatars](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_GETAvatar_Generative_Textured_Meshes_for_Animatable_Human_Avatars_ICCV_2023_paper.pdf)\n  * [NSF: Neural Surface Fields for Human Modeling from Monocular Depth](http://arxiv.org/abs/2308.14847v1)\u003cbr\u003e:house:[project](https://yuxuan-xue.com/nsf)\n  * [AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control](http://arxiv.org/abs/2303.17606)\n  * [DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars](http://arxiv.org/abs/2303.09375)\n* Robotics\n  * [Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly](http://arxiv.org/abs/2309.06810v1)\u003cbr\u003e:star:[code](https://github.com/crtie/Leveraging-SE-3-Equivariance-for-Learning-3D-Geometric-Shape-Assembly)\u003cbr\u003e:star:[code](https://crtie.github.io/SE-3-part-assembly/)\n  * [PourIt!: Weakly-Supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring](http://arxiv.org/abs/2307.11299)\n* AR/VR\n  * 
[HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations](http://arxiv.org/abs/2308.11261v1)\n* SLAM\n  * [GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction](http://arxiv.org/abs/2309.02436v1)\u003cbr\u003e:star:[code](https://youmi-zym.github.io/projects/GO-SLAM/)\u003cbr\u003e:star:[code](https://github.com/youmi-zym/GO-SLAM)\n  * [Point-SLAM: Dense Neural Point Cloud-based SLAM](https://openaccess.thecvf.com/content/ICCV2023/papers/Sandstrom_Point-SLAM_Dense_Neural_Point_Cloud-based_SLAM_ICCV_2023_paper.pdf)\n  * [NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping](https://openaccess.thecvf.com/content/ICCV2023/papers/Deng_NeRF-LOAM_Neural_Implicit_Representation_for_Large-Scale_Incremental_LiDAR_Odometry_and_ICCV_2023_paper.pdf)\n  * [MV-Map: Offboard HD-Map Generation with Multi-view Consistency](https://openaccess.thecvf.com/content/ICCV2023/papers/Xie_MV-Map_Offboard_HD-Map_Generation_with_Multi-view_Consistency_ICCV_2023_paper.pdf)\n* Virtual Try-On\n  * [Virtual Try-On with Pose-Garment Keypoints Guided Inpainting](https://openaccess.thecvf","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52cv%2Ficcv-2023-papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F52cv%2Ficcv-2023-papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52cv%2Ficcv-2023-papers/lists"}