https://github.com/52cv/iccv-2023-papers

Last synced: 5 months ago
JSON representation
Host: GitHub
URL: https://github.com/52cv/iccv-2023-papers
Owner: 52CV
Created: 2023-07-20T01:42:22.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-11-14T02:38:35.000Z (over 2 years ago)
Last Synced: 2025-02-24T05:14:38.369Z (over 1 year ago)
Size: 1.6 MB
Stars: 248
Watchers: 6
Forks: 12
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # ICCV-2023-Papers

![Alt text](af0c53186833a908a200f58867b6dcf.png)

## 官网链接：https://iccv2023.thecvf.com/

### 研讨会:bell:：2023 年 10 月 2 日至 3 日


### 主会:bell:：2023 年 10 月 4 日至 6 日

## 历年综述论文分类汇总戳这里↘️[CV-Surveys](https://github.com/52CV/CV-Surveys)施工中~~~~~~~~~~

## 2024 年论文分类汇总戳这里

↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)

## 2023 年论文分类汇总戳这里

↘️[CVPR-2023-Papers](https://github.com/52CV/CVPR-2023-Papers)

↘️[WACV-2023-Papers](https://github.com/52CV/WACV-2023-Papers)

↘️[ICCV-2023-Papers](https://github.com/52CV/ICCV-2023-Papers)

## [2022 年论文分类汇总戳这里](#000)

## [2021 年论文分类汇总戳这里](#00)

## [2020 年论文分类汇总戳这里](#0)

## 目录

|:cat:|:dog:|:tiger:|:wolf:|

|------|------|------|------|

|[1.其它(others)](#1)|[2.3D(三维重建\三维视觉)](#2)|[3.Object Detection(目标检测)](#3)|[4.Object Tracking(目标跟踪)](#4)|

|[5.Biometric Recognition(生物特征识别)](#5)|[6.Face(人脸)](#6)|[7.Image Progress(低层图像处理、质量评价)](#7)|[8.Image Segmentation(图像分割)](#8)|

|[9.Image Classification(图像分类)](#9)|[10.Image Synthesis(图像合成)](#10)|[11.Image/Video Editing(图像/视频编辑)](#11)|[12.Medical Image(医学影像)](#12)|

|[13.Image Captions(图像字幕)](#13)|[14.Image/Video Composition(图像/视频压缩)](#14)|[15.Image/Video Retrieval(图像/视频检索)](#15)|[16.Super-Resolution(超分辨率)](#16)|

|[17.GAN](#17)|[18.Pose](#18)|[19.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)](#19)|[20.Reid](#20)|

|[21.Point Cloud(点云)](#21)|[22.OCR](#22)|[23.Optical Flow Estimation(光流估计)](#23)|[24.Few/Zero-Shot Learning/Domain Generalization/Adaptation(小/零样本/域泛化/域适应)](#24)|

|[25.Model Compression/KD/Pruning(模型压缩/知识蒸馏/剪枝)](#25)|[26.ML(机器学习)](#26)|[27.Self/Semi-Supervised Learning](#27)|[28.Style Transfer(风格迁移)](#28)|

|[29.Autonomous vehicles(自动驾驶)](#30)|[30.SLAM/AR/VR/Robotics(增强/虚拟现实/机器人)](#30)|[31.HOI(人物交互)](#31)|[32.Sign Language Recognition(手语)](#32)|

|[33.Video](#33)|[34.Action Detection](#34)|[35.Human Motion Prediction(人体运动预测)](#35)|[36.Vision Question Answering(视觉问答)](#36)|

|[37.Object Pose Estimation(物体姿势估计)](#37)|[38.Vision-Language(视觉语言)](#38)|[39.Keypoint Detection(关键点检测)](#39)|[40.Anomaly Detection(异常检测)](#40)|

|[41.Vision Transformers](#41)|[42.Dataset/Benchmark](#42)|[43.Neural Radiance Fields](#43)|[44.Rendering(渲染)](#44)|

|[45.Scene Graph Generation(场景图合成)](#45)|[46.Edge Detection](#46)|[47.Image-to-Image Translation](#47)|[48.Image Reconstruction](#48)|

|[49.Image Fusion(图像融合)](#49)|[50.Image Matching(图像匹配)](#50)|[51.Visual Place Recognition](#51)|[52.View Synthesis(视图合成)](#52)|

|[53.Computed Imaging(计算成像，如光学、几何、光场成像等)](#53)|[54.Gaze Estimation](#54)|[55.sound(语音)](#55)|[56.NAS](#56)|

|[57.Semantic Scene Completion(语义场景补全)](#57)|[58.scene flow estimation(场景流估计)](#58)|[59.Copyright Protection(版权保护/信息安全)](#59)|[60.Visual Localization(视觉定位)](#60)|

|[61.Natural Language Progress(NLP)](#61)|[62.Group Affect Recognition(群体情感识别)](#62)|[63.Industrial Defect Detectors](#63)|[64.Scene Understanding(场景理解)](#64)|

|[65.Deepfake Detectors](#65)|[66.Graph Neural Networks(图神经网络)](#66)|[67.Open Set Recognition(开集识别)](#67)|[68.Clustering(聚类)](#68)|

|[69.Affordance Learning(启示学习)](#69)|[70.Active Learning(主动学习)](#70)|[71.Data Augmentation(数据增强)](#71)|[72.Dense Prediction(密集预测)](#72)|

|[73.Spiking Neural Networks](#73)|

## 💥💥💥ICCV 2023 获奖论文

### 最佳论文奖——马尔奖

* [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/pdf/2302.05543.pdf)
:star:[code](https://github.com/lllyasviel/ControlNet)

* [Passive Ultra-Wideband Single-Photon Imaging](https://appleswithacapitala.github.io/static/docs/paper.pdf)
:star:[code](https://appleswithacapitala.github.io/)

### 最佳论文奖提名

* [Segment Anything](https://arxiv.org/abs/2304.02643)
:house:[project](https://segment-anything.com/)

### 最佳学生论文奖

* [Tracking Everything Everywhere All at Once](https://browse.arxiv.org/pdf/2306.05422.pdf)
:house:[project](https://github.com/qianqianwang68/omnimotion)


:thumbsup:[ICCV 2023 数据集分享（含水下图像视频、阴影去除、目标检测跟踪分割、交互、超分等）](https://mp.weixin.qq.com/s/XK943x4INOGHzD5Kvvo_Hw)


:thumbsup:[ICCV 2023 数据集分享（含动人物姿态、自动驾驶、遥感、去雪、人脸、VOS等）](https://zhuanlan.zhihu.com/p/660344176)



## 78.Sketch

* [CLIPascene: Scene Sketching with Different Types and Levels of Abstraction](http://arxiv.org/abs/2211.17256)



## 73.Spiking Neural Networks

* [RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks](http://arxiv.org/abs/2308.06787v1)

* [Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks](https://openaccess.thecvf.com/content/ICCV2023/papers/Meng_Towards_Memory-_and_Time-Efficient_Backpropagation_for_Training_Spiking_Neural_Networks_ICCV_2023_paper.pdf)

* [SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_SSF_Accelerating_Training_of_Spiking_Neural_Networks_with_Stabilized_Spiking_ICCV_2023_paper.pdf)

* [Inherent Redundancy in Spiking Neural Networks](http://arxiv.org/abs/2308.08227v1)
:star:[code](https://github.com/BICLab/ASA-SNN)

* [Membrane Potential Batch Normalization for Spiking Neural Networks](http://arxiv.org/abs/2308.08359v1)
:star:[code](https://github.com/yfguo91/MPBN)

* [Unleashing the Potential of Spiking Neural Networks with Dynamic Confidence](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Unleashing_the_Potential_of_Spiking_Neural_Networks_with_Dynamic_Confidence_ICCV_2023_paper.pdf)

* [Temporal-Coded Spiking Neural Networks with Dynamic Firing Threshold: Learning with Event-Driven Backpropagation](https://openaccess.thecvf.com/content/ICCV2023/papers/Wei_Temporal-Coded_Spiking_Neural_Networks_with_Dynamic_Firing_Threshold_Learning_with_ICCV_2023_paper.pdf)

* [Efficient Converted Spiking Neural Network for 3D and 2D Classification](https://openaccess.thecvf.com/content/ICCV2023/papers/Lan_Efficient_Converted_Spiking_Neural_Network_for_3D_and_2D_Classification_ICCV_2023_paper.pdf)



## 72.Dense Prediction(密集预测)

* [Multi-Task Learning with Knowledge Distillation for Dense Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_Multi-Task_Learning_with_Knowledge_Distillation_for_Dense_Prediction_ICCV_2023_paper.pdf)

* [Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D Camera](https://openaccess.thecvf.com/content/ICCV2023/papers/Cai_Consistent_Depth_Prediction_for_Transparent_Object_Reconstruction_from_RGB-D_Camera_ICCV_2023_paper.pdf)

* [EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Cai_EfficientViT_Lightweight_Multi-Scale_Attention_for_High-Resolution_Dense_Prediction_ICCV_2023_paper.pdf)



## 71.Data Augmentation(数据增强)

* [HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness](http://arxiv.org/abs/2307.11823v1)

* [MixBag: Bag-Level Data Augmentation for Learning from Label Proportions](http://arxiv.org/abs/2308.08822v1)

* [When to Learn What: Model-Adaptive Data Augmentation Curriculum](http://arxiv.org/abs/2309.04747)

* [Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning](http://arxiv.org/abs/2308.06038)



## 70.Active Learning(主动学习)

* [TiDAL: Learning Training Dynamics for Active Learning](http://arxiv.org/abs/2210.06788)

* [HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling](http://arxiv.org/abs/2301.10460)

* [Knowledge-Aware Federated Active Learning with Non-IID Data](http://arxiv.org/abs/2211.13579)

* [Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Skip-Plan_Procedure_Planning_in_Instructional_Videos_via_Condensed_Action_Space_ICCV_2023_paper.pdf)



## 69.Affordance Learning(启示学习)

* [MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects](https://openaccess.thecvf.com/content/ICCV2023/papers/Liang_MAAL_Multimodality-Aware_Autoencoder-Based_Affordance_Learning_for_3D_Articulated_Objects_ICCV_2023_paper.pdf)



## 68.Clustering(聚类)

* [Deep Multiview Clustering by Contrasting Cluster Assignments](http://arxiv.org/abs/2304.10769)

* [Cross-modal Scalable Hyperbolic Hierarchical Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Long_Cross-modal_Scalable_Hierarchical_Clustering_in_Hyperbolic_space_ICCV_2023_paper.pdf)

* [Cross-view Topology Based Consistent and Complementary Information for Deep Multi-view Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Dong_Cross-view_Topology_Based_Consistent_and_Complementary_Information_for_Deep_Multi-view_ICCV_2023_paper.pdf)

* [MHCN: A Hyperbolic Neural Network Model for Multi-view Hierarchical Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_MHCN_A_Hyperbolic_Neural_Network_Model_for_Multi-view_Hierarchical_Clustering_ICCV_2023_paper.pdf)

* [Stable Cluster Discrimination for Deep Clustering](https://openaccess.thecvf.com/content/ICCV2023/papers/Qian_Stable_Cluster_Discrimination_for_Deep_Clustering_ICCV_2023_paper.pdf)

* [Unsupervised Manifold Linearizing and Clustering](http://arxiv.org/abs/2301.01805)

* [Anchor Structure Regularization Induced Multi-view Subspace Clustering via Enhanced Tensor Rank Minimization](https://openaccess.thecvf.com/content/ICCV2023/papers/Ji_Anchor_Structure_Regularization_Induced_Multi-view_Subspace_Clustering_via_Enhanced_Tensor_ICCV_2023_paper.pdf)

* [Surface Normal Clustering for Implicit Representation of Manhattan Scenes](http://arxiv.org/abs/2212.01331)



## 67.Open Set Recognition(开集识别)

* [FedPD: Federated Open Set Recognition with Parameter Disentanglement](https://openaccess.thecvf.com/content/ICCV2023/papers/Yang_FedPD_Federated_Open_Set_Recognition_with_Parameter_Disentanglement_ICCV_2023_paper.pdf)



## 66.Graph Neural Networks(图神经网络)

* [VertexSerum: Poisoning Graph Neural Networks for Link Inference](http://arxiv.org/abs/2308.01469)

* [Learning Adaptive Neighborhoods for Graph Neural Networks](http://arxiv.org/abs/2307.09065)

* [Vision HGNN: An Image is More than a Graph of Nodes](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Vision_HGNN_An_Image_is_More_than_a_Graph_of_ICCV_2023_paper.pdf)



## 65.Deepfake Detectors

* [Towards Understanding the Generalization of Deepfake Detectors from a Game-Theoretical View](https://openaccess.thecvf.com/content/ICCV2023/papers/Yao_Towards_Understanding_the_Generalization_of_Deepfake_Detectors_from_a_Game-Theoretical_ICCV_2023_paper.pdf)

* [Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning](http://arxiv.org/abs/2309.05911)

* [SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes](http://arxiv.org/abs/2211.11296)

* [UCF: Uncovering Common Features for Generalizable Deepfake Detection](http://arxiv.org/abs/2304.13949)



## 64.Scene Understanding(场景理解)

* [Shape Anchor Guided Holistic Indoor Scene Understanding](http://arxiv.org/abs/2309.11133)

* [Efficient Computation Sharing for Multi-Task Visual Scene Understanding](http://arxiv.org/abs/2303.09663)

* [Ordered Atomic Activity for Fine-grained Interactive Traffic Scenario Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Agarwal_Ordered_Atomic_Activity_for_Fine-grained_Interactive_Traffic_Scenario_Understanding_ICCV_2023_paper.pdf)

* [Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting](http://arxiv.org/abs/2304.03763)

* [Human-centric Scene Understanding for 3D Large-scale Scenarios](http://arxiv.org/abs/2307.14392)



## 63.Industrial Defect Detectors

* 工业缺陷定位

  * [Removing Anomalies as Noises for Industrial Defect Localization](https://openaccess.thecvf.com/content/ICCV2023/papers/Lu_Removing_Anomalies_as_Noises_for_Industrial_Defect_Localization_ICCV_2023_paper.pdf)

* 工业异常检测

  * [PNI : Industrial Anomaly Detection using Position and Neighborhood Information](http://arxiv.org/abs/2211.12634)

  * [FastRecon: Few-shot Industrial Anomaly Detection via Fast Feature Reconstruction](https://openaccess.thecvf.com/content/ICCV2023/papers/Fang_FastRecon_Few-shot_Industrial_Anomaly_Detection_via_Fast_Feature_Reconstruction_ICCV_2023_paper.pdf)

* 裂缝检测

  * [The Devil is in the Crack Orientation: A New Perspective for Crack Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_The_Devil_is_in_the_Crack_Orientation_A_New_Perspective_ICCV_2023_paper.pdf)



## 62.Group Affect Recognition(群体情感识别)

* [Most Important Person-Guided Dual-Branch Cross-Patch Attention for Group Affect Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Xie_Most_Important_Person-Guided_Dual-Branch_Cross-Patch_Attention_for_Group_Affect_Recognition_ICCV_2023_paper.pdf)



## 61.Natural Language Progress(NLP)

* [Improved Visual Fine-tuning with Natural Language Supervision](http://arxiv.org/abs/2304.01489)

* [Tracking by Natural Language Specification with Long Short-term Context Decoupling](https://openaccess.thecvf.com/content/ICCV2023/papers/Ma_Tracking_by_Natural_Language_Specification_with_Long_Short-term_Context_Decoupling_ICCV_2023_paper.pdf)



## 60.Visual Localization(视觉定位)

* [EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization](http://arxiv.org/abs/2309.07471v1)

* [Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance](http://arxiv.org/abs/2309.07403v1)

* [Yes, we CANN: Constrained Approximate Nearest Neighbors for Local Feature-Based Visual Localization](http://arxiv.org/abs/2306.09012)

* [OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes](https://openaccess.thecvf.com/content/ICCV2023/papers/Xie_OFVL-MS_Once_for_Visual_Localization_across_Multiple_Indoor_Scenes_ICCV_2023_paper.pdf)



## 59.Copyright Protection(版权保护/信息安全)

* [Towards Robust Model Watermark via Reducing Parametric Vulnerability](http://arxiv.org/abs/2309.04777v1)
:star:[code](https://github.com/GuanhaoGan/robust-model-watermarking)

* [CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields](http://arxiv.org/abs/2307.11526v1)



## 58.scene flow estimation(场景流估计)

* [EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity](http://arxiv.org/abs/2309.01296v1)

* [Fast Neural Scene Flow](http://arxiv.org/abs/2304.09121)

* [Multi-Scale Bidirectional Recurrent Network with Hybrid Correlation for Point Cloud Based Scene Flow Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_Multi-Scale_Bidirectional_Recurrent_Network_with_Hybrid_Correlation_for_Point_Cloud_ICCV_2023_paper.pdf)

* [IHNet: Iterative Hierarchical Network Guided by High-Resolution Estimated Information for Scene Flow Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_IHNet_Iterative_Hierarchical_Network_Guided_by_High-Resolution_Estimated_Information_for_ICCV_2023_paper.pdf)



## 57.Semantic Scene Completion(语义场景补全)

* [NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space](http://arxiv.org/abs/2309.14616v1)
:star:[code](https://jiawei-yao0812.github.io/NDC-Scene/)
:star:[code](https://github.com/Jiawei-Yao0812/NDCScene)

* [DDIT: Semantic Scene Completion via Deformable Deep Implicit Templates](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DDIT_Semantic_Scene_Completion_via_Deformable_Deep_Implicit_Templates_ICCV_2023_paper.pdf)

* [CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion](http://arxiv.org/abs/2307.07938)

* [Learning Long-Range Information with Dual-Scale Transformers for Indoor Scene Completion](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Learning_Long-Range_Information_with_Dual-Scale_Transformers_for_Indoor_Scene_Completion_ICCV_2023_paper.pdf)



## 56.NAS

* [ShiftNAS: Improving One-shot NAS via Probability Shift](http://arxiv.org/abs/2307.08300)

* [ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation](http://arxiv.org/abs/2011.11233)

* [Extensible and Efficient Proxy for Neural Architecture Search](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Extensible_and_Efficient_Proxy_for_Neural_Architecture_Search_ICCV_2023_paper.pdf)

* [MixPath: A Unified Approach for One-shot Neural Architecture Search](http://arxiv.org/abs/2001.05887)

* [Unleashing the Power of Gradient Signal-to-Noise Ratio for Zero-Shot NAS](https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Unleashing_the_Power_of_Gradient_Signal-to-Noise_Ratio_for_Zero-Shot_NAS_ICCV_2023_paper.pdf)



## 55.sound(语音)

* [Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video](http://arxiv.org/abs/2309.04814)

* [MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition](http://arxiv.org/abs/2303.05309)

* [Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by Sparse Audio-Visual Samples](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Be_Everywhere_-_Hear_Everything_BEE_Audio_Scene_Reconstruction_by_ICCV_2023_paper.pdf)

* [On the Audio-visual Synchronization for Lip-to-Speech Synthesis](http://arxiv.org/abs/2303.00502)

* [Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing](https://openaccess.thecvf.com/content/ICCV2023/papers/Rachavarapu_Boosting_Positive_Segments_for_Weakly-Supervised_Audio-Visual_Video_Parsing_ICCV_2023_paper.pdf)

* [DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding](http://arxiv.org/abs/2308.07787v1)

* [Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation](http://arxiv.org/abs/2308.10306v1)

* [Sound Source Localization is All about Cross-Modal Alignment](http://arxiv.org/abs/2309.10724v1)

* [Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping](http://arxiv.org/abs/2308.06112)

* 去混响

  * [AdVerb: Visually Guided Audio Dereverberation](http://arxiv.org/abs/2308.12370v1)
:house:[project](https://gamma.umd.edu/researchdirections/speech/adverb)

* 唇语识别

  * [Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge](http://arxiv.org/abs/2308.09311v1)

* 音频-视频生成

  * [The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion](http://arxiv.org/abs/2309.04509v1)
:star:[code](https://ku-vai.github.io/TPoS/)

* 音视觉导航

  * [Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation](http://arxiv.org/abs/2308.10306)

* 声音定位

  * [Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation](http://arxiv.org/abs/2303.11329)

* 视听分割

  * [Multimodal Variational Auto-encoder based Audio-Visual Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Mao_Multimodal_Variational_Auto-encoder_based_Audio-Visual_Segmentation_ICCV_2023_paper.pdf)



## 54.Gaze Estimation

* [DVGaze: Dual-View Gaze Estimation](http://arxiv.org/abs/2308.10310v1)
:star:[code](https://github.com/yihuacheng/DVGaze)



## 53.Computed Imaging(计算成像，如光学、几何、光场成像等)

* [Event Camera Data Pre-training](http://arxiv.org/abs/2301.01928)

* [Deep Geometrized Cartoon Line Inbetweening](https://openaccess.thecvf.com/content/ICCV2023/papers/Siyao_Deep_Geometrized_Cartoon_Line_Inbetweening_ICCV_2023_paper.pdf)

* [Aperture Diffraction for Compact Snapshot Spectral Imaging](https://openaccess.thecvf.com/content/ICCV2023/papers/Lv_Aperture_Diffraction_for_Compact_Snapshot_Spectral_Imaging_ICCV_2023_paper.pdf)

* [Examining Autoexposure for Challenging Scenes](http://arxiv.org/abs/2309.04542v1)

* [Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction](http://arxiv.org/abs/2308.10694v1)
:star:[code](https://github.com/cvg/VP-Estimation-with-Prior-Gravity)

* [Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes](http://arxiv.org/abs/2309.08588v1)
:house:[project](https://fabiendelattre.com/robust-rotation-estimation)

* [Exploring Positional Characteristics of Dual-Pixel Data for Camera Autofocus](https://openaccess.thecvf.com/content/ICCV2023/papers/Choi_Exploring_Positional_Characteristics_of_Dual-Pixel_Data_for_Camera_Autofocus_ICCV_2023_paper.pdf)

* [Enhancing Non-line-of-sight Imaging via Learnable Inverse Kernel and Attention Mechanisms](https://openaccess.thecvf.com/content/ICCV2023/papers/Yu_Enhancing_Non-line-of-sight_Imaging_via_Learnable_Inverse_Kernel_and_Attention_Mechanisms_ICCV_2023_paper.pdf)

* [On the Robustness of Normalizing Flows for Inverse Problems in Imaging](http://arxiv.org/abs/2212.04319)

* [Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation](http://arxiv.org/abs/2304.05669)

* 光场

  * [NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_NeILF_Inter-Reflectable_Light_Fields_for_Geometry_and_Material_Estimation_ICCV_2023_paper.pdf)

* 相机姿势估计

  * [Multi-body Depth and Camera Pose Estimation from Multiple Views](https://openaccess.thecvf.com/content/ICCV2023/papers/Dal_Cin_Multi-body_Depth_and_Camera_Pose_Estimation_from_Multiple_Views_ICCV_2023_paper.pdf)



## 52.View Synthesis(视图合成)

* [Forward Flow for Novel View Synthesis of Dynamic Scenes](https://openaccess.thecvf.com/content/ICCV2023/papers/Guo_Forward_Flow_for_Novel_View_Synthesis_of_Dynamic_Scenes_ICCV_2023_paper.pdf)

* [iVS-Net: Learning Human View Synthesis from Internet Videos](https://openaccess.thecvf.com/content/ICCV2023/papers/Dong_iVS-Net_Learning_Human_View_Synthesis_from_Internet_Videos_ICCV_2023_paper.pdf)

* [Multi-task View Synthesis with Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Zheng_Multi-task_View_Synthesis_with_Neural_Radiance_Fields_ICCV_2023_paper.pdf)

* [IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis](http://arxiv.org/abs/2210.00647)

* [Generative Novel View Synthesis with 3D-Aware Diffusion Models](http://arxiv.org/abs/2304.02602)

* [SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis](http://arxiv.org/abs/2303.16196)

* [Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis](https://openaccess.thecvf.com/content/ICCV2023/papers/Song_Total-Recon_Deformable_Scene_Reconstruction_for_Embodied_View_Synthesis_ICCV_2023_paper.pdf)

* [Neural LiDAR Fields for Novel View Synthesis](https://openaccess.thecvf.com/content/ICCV2023/papers/Huang_Neural_LiDAR_Fields_for_Novel_View_Synthesis_ICCV_2023_paper.pdf)

* [LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference](http://arxiv.org/abs/2307.12217v1)

* [Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis](http://arxiv.org/abs/2308.02840v1)
:star:[code](https://w-ted.github.io/publications/udc-nerf)

* [Efficient View Synthesis with Neural Radiance Distribution Field](http://arxiv.org/abs/2308.11130v1)

* [NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes](http://arxiv.org/abs/2308.12967v1)
:star:[code](https://zubair-irshad.github.io/projects/neo360.html)

* [PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View Synthesis](https://openaccess.thecvf.com/content/ICCV2023/papers/Ying_PARF_Primitive-Aware_Radiance_Fusion_for_Indoor_Scene_Novel_View_Synthesis_ICCV_2023_paper.pdf)

* [Urban Radiance Field Representation with Deformable Neural Mesh Primitives](http://arxiv.org/abs/2307.10776v1)
:house:[project](https://dnmp.github.io/)

* [FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis](http://arxiv.org/abs/2306.17723)

* [SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image](http://arxiv.org/abs/2309.06323)

* [A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction](http://arxiv.org/abs/2301.06782)

* [Cross-Ray Neural Radiance Fields for Novel-View Synthesis from Unconstrained Image Collections](http://arxiv.org/abs/2307.08093)

* [Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models](http://arxiv.org/abs/2304.10700)

* [NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_NEMTO_Neural_Environment_Matting_for_Novel_View_and_Relighting_Synthesis_ICCV_2023_paper.pdf)



## 51.Visual Place Recognition

* [CASSPR: Cross Attention Single Scan Place Recognition](http://arxiv.org/abs/2211.12542)

* [EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition](http://arxiv.org/abs/2308.10832v1)
:star:[code](https://github.com/gmberton/auto_VPR)
:star:[code](https://github.com/gmberton/EigenPlaces)

* [CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition](http://arxiv.org/abs/2303.17778)

* [BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images](https://openaccess.thecvf.com/content/ICCV2023/papers/Luo_BEVPlace_Learning_LiDAR-based_Place_Recognition_using_Birds_Eye_View_Images_ICCV_2023_paper.pdf)



## 50.Image Matching(图像匹配)

* [OccNet: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions](https://openaccess.thecvf.com/content/ICCV2023/papers/Fan_Occ2Net_Robust_Image_Matching_Based_on_3D_Occupancy_Estimation_for_ICCV_2023_paper.pdf)
:thumbsup:[ICCV 2023|Occ2Net，一种基于3D 占据估计的有效且稳健的带有遮挡区域的图像匹配方法](https://mp.weixin.qq.com/s/mbk5tnYlzCLOg4_xfyKRyA)

* [Scene-Aware Feature Matching](http://arxiv.org/abs/2308.09949v1)

* [Improving Transformer-based Image Matching by Cascaded Capturing Spatially Informative Keypoints](http://arxiv.org/abs/2303.02885)

* [GlueStick: Robust Image Matching by Sticking Points and Lines Together](https://openaccess.thecvf.com/content/ICCV2023/papers/Pautrat_GlueStick_Robust_Image_Matching_by_Sticking_Points_and_Lines_Together_ICCV_2023_paper.pdf)

* [Grounded Image Text Matching with Mismatched Relation Reasoning](http://arxiv.org/abs/2308.01236)

* [Graph Matching with Bi-level Noisy Correspondence](http://arxiv.org/abs/2212.04085)

* 特征匹配

  * [Guiding Local Feature Matching with Surface Curvature](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Guiding_Local_Feature_Matching_with_Surface_Curvature_ICCV_2023_paper.pdf)



## 49.Image Fusion(图像融合)

* [DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion](http://arxiv.org/abs/2303.06840)

* [MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion](http://arxiv.org/abs/2309.11847)

* [Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion](http://arxiv.org/abs/2308.16083v1)
:star:[code](https://manman1995.github.io/)

* [Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion](https://openaccess.thecvf.com/content/ICCV2023/papers/He_Degradation-Resistant_Unfolding_Network_for_Heterogeneous_Image_Fusion_ICCV_2023_paper.pdf)

* [Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-spectral Image Fusion](http://arxiv.org/abs/2308.16083)

* [UniFusion: Unified Multi-View Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View](https://openaccess.thecvf.com/content/ICCV2023/papers/Qin_UniFusion_Unified_Multi-View_Fusion_Transformer_for_Spatial-Temporal_Representation_in_Birds-Eye-View_ICCV_2023_paper.pdf)

* [Multi-Modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion](http://arxiv.org/abs/2302.01392)



## 48.Image Reconstruction

* [Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction](http://arxiv.org/abs/2308.10820v1)
:star:[code](https://github.com/MyuLi/PADUT)

* [RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image](http://arxiv.org/abs/2309.02020v1)

* [Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction](http://arxiv.org/abs/2309.03900v1)
:star:[code](https://skchen1993.github.io/CEVR_web/)



## 47.Image-to-Image Translation

* [General Image-to-Image Translation with One-Shot Image Guidance](http://arxiv.org/abs/2307.14352v1)
:star:[code](https://github.com/CrystalNeuro/visual-concept-translator)

* [Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation](http://arxiv.org/abs/2308.12968v1)
:star:[code](https://github.com/Yuxinn-J/Scenimefy)
:star:[code](https://yuxinn-j.github.io/projects/Scenimefy.html)

* [UGC: Unified GAN Compression for Efficient Image-to-Image Translation](http://arxiv.org/abs/2309.09310)



## 46.Edge Detection

* [Tiny and Efficient Model for the Edge Detection Generalization](http://arxiv.org/abs/2308.06468v1)
:star:[code](https://github.com/xavysp/TEED)



## 45.Scene Graph Generation(场景图合成)

* [SGAligner: 3D Scene Alignment with Scene Graphs](http://arxiv.org/abs/2304.14880)

* [Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation](http://arxiv.org/abs/2308.03282v1)

* [Compositional Feature Augmentation for Unbiased Scene Graph Generation](http://arxiv.org/abs/2308.06712v1)

* [Vision Relation Transformer for Unbiased Scene Graph Generation](http://arxiv.org/abs/2308.09472v1)

* [TextPSG: Panoptic Scene Graph Generation from Textual Descriptions](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_TextPSG_Panoptic_Scene_Graph_Generation_from_Textual_Descriptions_ICCV_2023_paper.pdf)

* [HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation](http://arxiv.org/abs/2303.15994)

* [Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World](http://arxiv.org/abs/2303.13233)

* [Visual Traffic Knowledge Graph Generation from Scene Images](https://openaccess.thecvf.com/content/ICCV2023/papers/Guo_Visual_Traffic_Knowledge_Graph_Generation_from_Scene_Images_ICCV_2023_paper.pdf)



## 44.Rendering(渲染)

* [LiveHand: Real-time and Photorealistic Neural Hand Rendering](http://arxiv.org/abs/2302.07672)

* [NeMF: Inverse Volume Rendering with Neural Microflake Field](http://arxiv.org/abs/2304.00782)

* [ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs](http://arxiv.org/abs/2304.14401)

* [HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation](http://arxiv.org/abs/2308.10122v1)

* [DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering](https://arxiv.org/abs/2307.10173)
:house:[project](https://dna-rendering.github.io/)

* [S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields](http://arxiv.org/abs/2308.07032v1)
:star:[code](https://github.com/Madaoer/S3IM)
:thumbsup:[ICCV 2023 | NeRF 提点的 Magic Loss —— S3IM 随机结构相似性](https://mp.weixin.qq.com/s/w5IUykx6_-7lBE_2r_Onkg)

* [TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering](http://arxiv.org/abs/2307.12291v1)
:star:[code](https://pansanity666.github.io/TransHuman/)

* [Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields](http://arxiv.org/abs/2307.11335v1)
:star:[code](https://wbhu.github.io/projects/Tri-MipRF)

* [Rendering Humans from Object-Occluded Monocular Videos](http://arxiv.org/abs/2308.04622v1)
:house:[project](https://cs.stanford.edu/~xtiange/projects/occnerf/)

* [ScatterNeRF: Seeing Through Fog with Physically-Based Inverse Neural Rendering](http://arxiv.org/abs/2305.02103)

* [DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_DNA-Rendering_A_Diverse_Neural_Actor_Repository_for_High-Fidelity_Human-Centric_Rendering_ICCV_2023_paper.pdf)

* [Neural Microfacet Fields for Inverse Rendering](http://arxiv.org/abs/2303.17806)

* [3D-aware Blending with Generative NeRFs](http://arxiv.org/abs/2302.06608)



## 43.Neural Radiance Fields

* [Instance Neural Radiance Field](http://arxiv.org/abs/2304.04395)

* [Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Gao_Adaptive_Positional_Encoding_for_Bundle-Adjusting_Neural_Radiance_Fields_ICCV_2023_paper.pdf)

* [FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models](http://arxiv.org/abs/2303.12786)

* [NerfAcc: Efficient Sampling Accelerates NeRFs](http://arxiv.org/abs/2305.04966)

* [MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Kaneko_MIMO-NeRF_Fast_Neural_Rendering_with_Multi-input_Multi-output_Neural_Radiance_Fields_ICCV_2023_paper.pdf)

* [UHDNeRF: Ultra-High-Definition Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_UHDNeRF_Ultra-High-Definition_Neural_Radiance_Fields_ICCV_2023_paper.pdf)

* [Deformable Neural Radiance Fields using RGB and Event Cameras](http://arxiv.org/abs/2309.08416)

* [Learning Neural Implicit Surfaces with Object-Aware Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Learning_Neural_Implicit_Surfaces_with_Object-Aware_Radiance_Fields_ICCV_2023_paper.pdf)

* [ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field](http://arxiv.org/abs/2211.13226)

* [HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video](http://arxiv.org/abs/2304.12281)

* [ReNeRF: Relightable Neural Radiance Fields with Nearfield Lighting](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_ReNeRF_Relightable_Neural_Radiance_Fields_with_Nearfield_Lighting_ICCV_2023_paper.pdf)

* [E2NeRF: Event Enhanced Neural Radiance Fields from Blurry Images](https://openaccess.thecvf.com/content/ICCV2023/papers/Qi_E2NeRF_Event_Enhanced_Neural_Radiance_Fields_from_Blurry_Images_ICCV_2023_paper.pdf)

* [Neural Fields for Structured Lighting](https://openaccess.thecvf.com/content/ICCV2023/papers/Shandilya_Neural_Fields_for_Structured_Lighting_ICCV_2023_paper.pdf)

* [NeRF-MS: Neural Radiance Fields with Multi-Sequence](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_NeRF-MS_Neural_Radiance_Fields_with_Multi-Sequence_ICCV_2023_paper.pdf)

* [StegaNeRF: Embedding Invisible Information within Neural Radiance Fields](http://arxiv.org/abs/2212.01602)

* [SHERF: Generalizable Human NeRF from a Single Image](http://arxiv.org/abs/2303.12791)

* [DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_DeformToon3D_Deformable_Neural_Radiance_Fields_for_3D_Toonification_ICCV_2023_paper.pdf)

* [Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs](http://arxiv.org/abs/2304.10532)

* [Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra](https://openaccess.thecvf.com/content/ICCV2023/papers/Kulhanek_Tetra-NeRF_Representing_Neural_Radiance_Fields_Using_Tetrahedra_ICCV_2023_paper.pdf)

* [Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields](https://openaccess.thecvf.com/content/ICCV2023/papers/Barron_Zip-NeRF_Anti-Aliased_Grid-Based_Neural_Radiance_Fields_ICCV_2023_paper.pdf)

* [NeRFrac: Neural Radiance Fields through Refractive Surface](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhan_NeRFrac_Neural_Radiance_Fields_through_Refractive_Surface_ICCV_2023_paper.pdf)

* [MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos](http://arxiv.org/abs/2212.13056)

* [Reference-guided Controllable Inpainting of Neural Radiance Fields](http://arxiv.org/abs/2304.09677)

* [DeLiRa: Self-Supervised Depth, Light, and Radiance Fields](http://arxiv.org/abs/2304.02797)

* [Neural Radiance Field with LiDAR maps](https://openaccess.thecvf.com/content/ICCV2023/papers/Chang_Neural_Radiance_Field_with_LiDAR_maps_ICCV_2023_paper.pdf)

* [Dynamic Mesh-Aware Radiance Fields](http://arxiv.org/abs/2309.04581v1)

* [Locally Stylized Neural Radiance Fields](http://arxiv.org/abs/2309.10684v1)

* [Generalizable Neural Fields as Partially Observed Neural Processes](http://arxiv.org/abs/2309.06660v1)

* [DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields](http://arxiv.org/abs/2309.04410v1)
:house:[project](https://www.mmlab-ntu.com/project/deformtoon3d/)
:star:[code](https://github.com/junzhezhang/DeformToon3D)

* [Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion](http://arxiv.org/abs/2309.08596v1)
:star:[code](https://wengflow.github.io/robust-e-nerf)

* [Pose-Free Neural Radiance Fields via Implicit Pose Regularization](http://arxiv.org/abs/2308.15049v1)

* [Canonical Factors for Hybrid Neural Fields](http://arxiv.org/abs/2308.15461v1)
:star:[code](https://brentyi.github.io/tilted/)

* [Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor](http://arxiv.org/abs/2308.14383v1)
:star:[code](https://zju3dv.github.io/tof_slam/)

* [Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields](http://arxiv.org/abs/2308.11974v1)

* [Strata-NeRF : Neural Radiance Fields for Stratified Scenes](http://arxiv.org/abs/2308.10337v1)
:star:[code](https://ankitatiisc.github.io/Strata-NeRF/)

* [DReg-NeRF: Deep Registration for Neural Radiance Fields](http://arxiv.org/abs/2308.09386v1)
:star:[code](https://github.com/AIBluefisher/DReg-NeRF)

* [Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields](http://arxiv.org/abs/2307.15131v1)
:house:[project](https://windingwind.github.io/seal-3d/)
:star:[code](https://github.com/windingwind/seal-3d/)

* [WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields](http://arxiv.org/abs/2308.04826v1)

* [UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields](http://arxiv.org/abs/2303.14167)

* [LERF: Language Embedded Radiance Fields](http://arxiv.org/abs/2303.09553)

* [Strivec: Sparse Tri-Vector Radiance Fields](http://arxiv.org/abs/2307.13226)

* [Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering](http://arxiv.org/abs/2304.10075)



## 42.Dataset/Benchmark

* 数据集

  * [Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds](https://arxiv.org/pdf/2307.11914.pdf)
:sunflower:[dataset](https://building3d.ucalgary.ca/#)
:thumbsup:[ICCV2023 首个城市级别的基于航空点云的房屋建模数据集 Building3D](https://mp.weixin.qq.com/s/gKFByZ8ud2aNlG7C7t2-2Q)

  * [LoTE-Animal: A Long Time-span Dataset for Endangered Animal Behavior Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_LoTE-Animal_A_Long_Time-span_Dataset_for_Endangered_Animal_Behavior_Understanding_ICCV_2023_paper.pdf)

  * [Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Bolduc_Beyond_the_Pixel_a_Photometrically_Calibrated_HDR_Dataset_for_Luminance_ICCV_2023_paper.pdf)

  * [Atmospheric Transmission and Thermal Inertia Induced Blind Road Segmentation with a Large-Scale Dataset TBRSD](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Atmospheric_Transmission_and_Thermal_Inertia_Induced_Blind_Road_Segmentation_with_ICCV_2023_paper.pdf)

  * [H3WB: Human3.6M 3D WholeBody Dataset and Benchmark](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_H3WB_Human3.6M_3D_WholeBody_Dataset_and_Benchmark_ICCV_2023_paper.pdf)

  * [V3Det: Vast Vocabulary Visual Detection Dataset](http://arxiv.org/abs/2304.03752)

  * [HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_HoloAssist_an_Egocentric_Human_Interaction_Dataset_for_Interactive_AI_Assistants_ICCV_2023_paper.pdf)

  * [Zenseact Open Dataset: A Large-Scale and Diverse Multimodal Dataset for Autonomous Driving](http://arxiv.org/abs/2305.02008)

  * [FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods](http://arxiv.org/abs/2308.06248)

  * [Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos](https://openaccess.thecvf.com/content/ICCV2023/papers/Lee_Lecture_Presentations_Multimodal_Dataset_Towards_Understanding_Multimodality_in_Educational_Videos_ICCV_2023_paper.pdf)

  * [RealGraph: A Multiview Dataset for 4D Real-world Context Graph Generation](https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_RealGraph_A_Multiview_Dataset_for_4D_Real-world_Context_Graph_Generation_ICCV_2023_paper.pdf)

  * [Video Background Music Generation: Dataset, Method and Evaluation](http://arxiv.org/abs/2211.11248)

  * [Thinking Image Color Aesthetics Assessment: Models, Datasets and Benchmarks](https://openaccess.thecvf.com/content/ICCV2023/papers/He_Thinking_Image_Color_Aesthetics_Assessment_Models_Datasets_and_Benchmarks_ICCV_2023_paper.pdf)

  * [Snow Removal in Video: A New Dataset and A Novel Method](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Snow_Removal_in_Video_A_New_Dataset_and_A_Novel_ICCV_2023_paper.pdf)

  * [SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes](http://arxiv.org/abs/2304.05170)

  * [EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes](http://arxiv.org/abs/2307.07961)

  * [DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners](http://arxiv.org/abs/2309.03483)

  * [PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking](http://arxiv.org/abs/2307.15055)

  * [SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling](http://arxiv.org/abs/2303.17368)

  * [MOSE: A New Dataset for Video Object Segmentation in Complex Scenes](http://arxiv.org/abs/2302.01872)

  * [Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning](http://arxiv.org/abs/2303.12745)

  * [3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_3DMiner_Discovering_Shapes_from_Large-Scale_Unannotated_Image_Datasets_ICCV_2023_paper.pdf)

  * [MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_MatrixCity_A_Large-scale_City_Dataset_for_City-scale_Neural_Rendering_and_ICCV_2023_paper.pdf)

  * [LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark](http://arxiv.org/abs/2308.09618v1)
:star:[code](https://lojzezust.github.io/lars-dataset)

  * [EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding](http://arxiv.org/abs/2309.08816v1)
:star:[code](https://github.com/facebookresearch/EgoObjects)

  * [Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations](http://arxiv.org/abs/2309.01858v1)
:house:[project](https://cmp.felk.cvut.cz/univ_emb/)

  * [High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net](http://arxiv.org/abs/2308.14221)

  * [ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes](http://arxiv.org/abs/2308.11417v1)
:house:[project](https://youtu.be/E6P9e2r6M8I)
:star:[code](https://cy94.github.io/scannetpp/)

  * [Learning Optical Flow from Event Camera with Rendered Dataset](https://arxiv.org/abs/2303.11011)

  * [Efficient neural supersampling on a novel gaming dataset](http://arxiv.org/abs/2308.01483v1)

  * [AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception](http://arxiv.org/abs/2307.13933v1)
:star:[code](https://github.com/ydk122024/AIDE)

  * [360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking](http://arxiv.org/abs/2307.14630v1)
:house:[project](https://360vot.hkustvgd.com)
:star:[code](https://github.com/HuajianUP/360VOT)全向视觉目标跟踪

  * [Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning](http://arxiv.org/abs/2308.13411v1)
:house:[project](https://ophai.hms.harvard.edu/datasets/harvard-gdp1000)

  * [FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Khan_FishNet_A_Large-scale_Dataset_and_Benchmark_for_Fish_Recognition_Detection_ICCV_2023_paper.pdf)

* 基准

  * [Towards Real-world Burst Image Super-Resolution: Benchmark and Method](https://arxiv.org/abs/2309.04803)
:star:[code](https://github.com/yjsunnn/FBANet)
:thumbsup:[ICCV2023 ｜FBANet：迈向真实世界的多帧超分](https://mp.weixin.qq.com/s/JN-5d_Ujak3YDcGM6ZZjPw)

  * [SQAD: Automatic Smartphone Camera Quality Assessment and Benchmarking](https://openaccess.thecvf.com/content/ICCV2023/papers/Fang_SQAD_Automatic_Smartphone_Camera_Quality_Assessment_and_Benchmarking_ICCV_2023_paper.pdf)

  * [ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes](http://arxiv.org/abs/2304.04321)

  * [Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception](http://arxiv.org/abs/2306.06362)

  * [From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal](http://arxiv.org/abs/2308.03867v1)

  * [ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour](https://openaccess.thecvf.com/content/ICCV2023/papers/Tafasca_ChildPlay_A_New_Benchmark_for_Understanding_Childrens_Gaze_Behaviour_ICCV_2023_paper.pdf)

  * [PlanarTrack: A Large-scale Challenging Benchmark for Planar Object Tracking](http://arxiv.org/abs/2303.07625)

  * [OmniLabel: A Challenging Benchmark for Language-Based Object Detection](http://arxiv.org/abs/2304.11463)

  * [OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception](http://arxiv.org/abs/2303.03991)

  * [HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Bakr_HRS-Bench_Holistic_Reliable_and_Scalable_Benchmark_for_Text-to-Image_Models_ICCV_2023_paper.pdf)

  * [Beyond Object Recognition: A New Benchmark towards Object Concept Learning](http://arxiv.org/abs/2212.02710)

  * [ClothPose: A Real-world Benchmark for Visual Analysis of Garment Pose via An Indirect Recording Solution](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_ClothPose_A_Real-world_Benchmark_for_Visual_Analysis_of_Garment_Pose_ICCV_2023_paper.pdf)

  * [REAP: A Large-Scale Realistic Adversarial Patch Benchmark](http://arxiv.org/abs/2212.05680)

  * [Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events](https://openaccess.thecvf.com/content/ICCV2023/papers/Ong_Chaotic_World_A_Large_and_Challenging_Benchmark_for_Human_Behavior_ICCV_2023_paper.pdf)

  * [Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images](http://arxiv.org/abs/2303.07274)

  * [FACET: Fairness in Computer Vision Evaluation Benchmark](http://arxiv.org/abs/2309.00035)

  * [A Benchmark for Chinese-English Scene Text Image Super-Resolution](http://arxiv.org/abs/2308.03262)

  * [Ego-Humans: An Ego-Centric 3D Multi-Human Benchmark](https://openaccess.thecvf.com/content/ICCV2023/papers/Khirodkar_Ego-Humans_An_Ego-Centric_3D_Multi-Human_Benchmark_ICCV_2023_paper.pdf)

  * [Towards Real-World Burst Image Super-Resolution: Benchmark and Method](http://arxiv.org/abs/2309.04803v1)
:star:[code](https://github.com/yjsunnn/FBANet)

  * [COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts](http://arxiv.org/abs/2307.12730v1)
:star:[code](https://github.com/alibaba/easyrobust/tree/main/benchmarks/coco_o)

  * [Dancing in the Dark: A Benchmark towards General Low-light Video Enhancement](https://openaccess.thecvf.com/content/ICCV2023/papers/Fu_Dancing_in_the_Dark_A_Benchmark_towards_General_Low-light_Video_ICCV_2023_paper.pdf)

  * [DiLiGenT-Pi: Photometric Stereo for Planar Surfaces with Rich Details - Benchmark Dataset and Beyond](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_DiLiGenT-Pi_Photometric_Stereo_for_Planar_Surfaces_with_Rich_Details_-_ICCV_2023_paper.pdf)

* 方法

  * [Prototype-based Dataset Comparison](http://arxiv.org/abs/2309.02401v1)
:star:[code](https://github.com/Nanne/ProtoSim)



## 41.Vision Transformers

* [Masked Spiking Transformer](http://arxiv.org/abs/2210.01208)

* [Scale-Aware Modulation Meet Transformer](http://arxiv.org/abs/2307.08579)

* [BiViT: Extremely Compressed Binary Vision Transformers](http://arxiv.org/abs/2211.07091)

* [Fcaformer: Forward Cross Attention in Hybrid Vision Transformer](http://arxiv.org/abs/2211.07198)

* [FastViT: A Fast Hybrid Vision Transformer Using Structural Reparameterization](http://arxiv.org/abs/2303.14189)

* [SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM](http://arxiv.org/abs/2308.09891)

* [Multimodal High-order Relation Transformer for Scene Boundary Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Wei_Multimodal_High-order_Relation_Transformer_for_Scene_Boundary_Detection_ICCV_2023_paper.pdf)

* [GET: Group Event Transformer for Event-Based Vision](https://openaccess.thecvf.com/content/ICCV2023/papers/Peng_GET_Group_Event_Transformer_for_Event-Based_Vision_ICCV_2023_paper.pdf)

* [DiffRate : Differentiable Compression Rate for Efficient Vision Transformers](http://arxiv.org/abs/2305.17997)

* [Scratching Visual Transformer's Back with Uniform Attention](https://openaccess.thecvf.com/content/ICCV2023/papers/Hyeon-Woo_Scratching_Visual_Transformers_Back_with_Uniform_Attention_ICCV_2023_paper.pdf)

* [Skill Transformer: A Monolithic Policy for Mobile Manipulation](http://arxiv.org/abs/2308.09873)

* [A Multidimensional Analysis of Social Biases in Vision Transformers](http://arxiv.org/abs/2308.01948)

* [Token-Label Alignment for Vision Transformers](http://arxiv.org/abs/2210.06455)

* [Building Vision Transformers with Hierarchy Aware Feature Aggregation](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Building_Vision_Transformers_with_Hierarchy_Aware_Feature_Aggregation_ICCV_2023_paper.pdf)

* [TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching](https://openaccess.thecvf.com/content/ICCV2023/papers/Fu_TripLe_Revisiting_Pretrained_Model_Reuse_and_Progressive_Learning_for_Efficient_ICCV_2023_paper.pdf)

* [DarSwin: Distortion Aware Radial Swin Transformer](https://openaccess.thecvf.com/content/ICCV2023/papers/Athwale_DarSwin_Distortion_Aware_Radial_Swin_Transformer_ICCV_2023_paper.pdf)

* [Robustifying Token Attention for Vision Transformers](http://arxiv.org/abs/2303.11126)

* [FLatten Transformer: Vision Transformer using Focused Linear Attention](http://arxiv.org/abs/2308.00442)

* [Detection Transformer with Stable Matching](http://arxiv.org/abs/2304.04742)

* [LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization](https://openaccess.thecvf.com/content/ICCV2023/papers/Yu_LaPE_Layer-adaptive_Position_Embedding_for_Vision_Transformers_with_Independent_Layer_ICCV_2023_paper.pdf)

* [M2T: Masking Transformers Twice for Faster Decoding](https://openaccess.thecvf.com/content/ICCV2023/papers/Mentzer_M2T_Masking_Transformers_Twice_for_Faster_Decoding_ICCV_2023_paper.pdf)

* [FDViT: Improve the Hierarchical Architecture of Vision Transformer](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_FDViT_Improve_the_Hierarchical_Architecture_of_Vision_Transformer_ICCV_2023_paper.pdf)

* [Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers](http://arxiv.org/abs/2308.10814)

* [Rethinking Vision Transformers for MobileNet Size and Speed](http://arxiv.org/abs/2212.08059)

* [Structure Invariant Transformation for better Adversarial Transferability](http://arxiv.org/abs/2309.14700v1)
:star:[code](https://github.com/xiaosen-wang/SIT)

* [SG-Former: Self-guided Transformer with Evolving Token Reallocation](http://arxiv.org/abs/2308.12216v1)
:star:[code](https://github.com/OliverRensu/SG-Former)

* [Pre-training Vision Transformers with Very Limited Synthesized Images](http://arxiv.org/abs/2307.14710v1)

* [SMMix: Self-Motivated Image Mixing for Vision Transformers](https://arxiv.org/abs/2212.12977)

* [Revisiting Vision Transformer from the View of Path Ensemble](http://arxiv.org/abs/2308.06548v1)

* [SwinLSTM:Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM](http://arxiv.org/abs/2308.09891v1)
:star:[code](https://github.com/SongTang-x/SwinLSTM)

* [Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers](http://arxiv.org/abs/2308.13494v1)

* [Contrastive Feature Masking Open-Vocabulary Vision Transformer](http://arxiv.org/abs/2309.00775v1)

* [MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer](http://arxiv.org/abs/2309.09067v1)

* [SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_SAL-ViT_Towards_Latency_Efficient_Private_Inference_on_ViT_using_Selective_ICCV_2023_paper.pdf)

* [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](http://arxiv.org/abs/2303.15446)

* [MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention](http://arxiv.org/abs/2211.13955)



## 40.Anomaly Detection(异常检测)

* [Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection](http://arxiv.org/abs/2308.10155v1)

* [Anomaly Detection Under Distribution Shift](http://arxiv.org/abs/2303.13845)

* [Unsupervised Surface Anomaly Detection with Diffusion Probabilistic Model](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Unsupervised_Surface_Anomaly_Detection_with_Diffusion_Probabilistic_Model_ICCV_2023_paper.pdf)

* [Anomaly Detection using Score-based Perturbation Resilience](https://openaccess.thecvf.com/content/ICCV2023/papers/Shin_Anomaly_Detection_using_Score-based_Perturbation_Resilience_ICCV_2023_paper.pdf)

* [Remembering Normality: Memory-guided Knowledge Distillation for Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Gu_Remembering_Normality_Memory-guided_Knowledge_Distillation_for_Unsupervised_Anomaly_Detection_ICCV_2023_paper.pdf)

* [Template-guided Hierarchical Feature Restoration for Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Guo_Template-guided_Hierarchical_Feature_Restoration_for_Anomaly_Detection_ICCV_2023_paper.pdf)

* [Inter-Realization Channels: Unsupervised Anomaly Detection Beyond One-Class Classification](https://openaccess.thecvf.com/content/ICCV2023/papers/McIntosh_Inter-Realization_Channels_Unsupervised_Anomaly_Detection_Beyond_One-Class_Classification_ICCV_2023_paper.pdf)

* 图像异常检测

  * [Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection](http://arxiv.org/abs/2308.02983v1)
:star:[code](https://github.com/xcyao00/FOD)

* OOD

  * [CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No](http://arxiv.org/abs/2308.12213v1)
:star:[code](https://github.com/xmed-lab/CLIPN)

  * [Meta OOD Learning for Continuously Adaptive OOD Detection](http://arxiv.org/abs/2309.11705v1)

  * [Simple and Effective Out-of-Distribution Detection via Cosine-based Softmax Loss](https://openaccess.thecvf.com/content/ICCV2023/papers/Noh_Simple_and_Effective_Out-of-Distribution_Detection_via_Cosine-based_Softmax_Loss_ICCV_2023_paper.pdf)

  * [Nearest Neighbor Guidance for Out-of-Distribution Detection](http://arxiv.org/abs/2309.14888v1)
:star:[code](https://github.com/roomo7time/nnguide)

  * [Understanding the Feature Norm for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Park_Understanding_the_Feature_Norm_for_Out-of-Distribution_Detection_ICCV_2023_paper.pdf)

  * [Meta OOD Learning For Continuously Adaptive OOD Detection](http://arxiv.org/abs/2309.11705)

  * [SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection](http://arxiv.org/abs/2208.13930)

  * [Unified Out-Of-Distribution Detection: A Model-Specific Perspective](http://arxiv.org/abs/2304.06813)

  * [Revisit PCA-based Technique for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Guan_Revisit_PCA-based_Technique_for_Out-of-Distribution_Detection_ICCV_2023_paper.pdf)

  * [Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Hierarchical_Visual_Categories_Modeling_A_Joint_Representation_Learning_and_Density_ICCV_2023_paper.pdf)

  * [WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis](http://arxiv.org/abs/2303.07543)



## 39.Keypoint Detection(关键点检测)

* [Neural Interactive Keypoint Detection](http://arxiv.org/abs/2308.10174v1)
:star:[code](https://github.com/IDEA-Research/Click-Pose)

* [3D Implicit Transporter for Temporally Consistent Keypoint Discovery](http://arxiv.org/abs/2309.05098v1)
:star:[code](https://github.com/zhongcl-thu/3D-Implicit-Transporter)



## 38.Vision-Language(视觉语言)

* [Linear Spaces of Meanings: Compositional Structures in Vision-Language Models](http://arxiv.org/abs/2302.14383)

* [ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation](http://arxiv.org/abs/2308.16689)

* [Gloss-Free Sign Language Translation: Improving from Visual-Language Pretraining](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhou_Gloss-Free_Sign_Language_Translation_Improving_from_Visual-Language_Pretraining_ICCV_2023_paper.pdf)

* [SuS-X: Training-Free Name-Only Transfer of Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Udandarao_SuS-X_Training-Free_Name-Only_Transfer_of_Vision-Language_Models_ICCV_2023_paper.pdf)

* [Bayesian Prompt Learning for Image-Language Model Generalization](http://arxiv.org/abs/2210.02390)

* [eP-ALM: Efficient Perceptual Augmentation of Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Shukor_eP-ALM_Efficient_Perceptual_Augmentation_of_Language_Models_ICCV_2023_paper.pdf)

* [Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models](http://arxiv.org/abs/2303.17169)

* [SLAN: Self-Locator Aided Network for Vision-Language Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhai_SLAN_Self-Locator_Aided_Network_for_Vision-Language_Understanding_ICCV_2023_paper.pdf)

* [Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm](https://openaccess.thecvf.com/content/ICCV2023/papers/Ma_Borrowing_Knowledge_From_Pre-trained_Language_Model_A_New_Data-efficient_Visual_ICCV_2023_paper.pdf)

* [VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching](https://openaccess.thecvf.com/content/ICCV2023/papers/Bi_VL-Match_Enhancing_Vision-Language_Pretraining_with_Token-Level_and_Instance-Level_Matching_ICCV_2023_paper.pdf)

* [A Retrospect to Multi-prompt Learning across Vision and Language](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_A_Retrospect_to_Multi-prompt_Learning_across_Vision_and_Language_ICCV_2023_paper.pdf)

* [CiT: Curation in Training for Effective Vision-Language Data](http://arxiv.org/abs/2301.02241)

* [EgoTV: Egocentric Task Verification from Natural Language Task Descriptions](http://arxiv.org/abs/2303.16975)

* [Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models](http://arxiv.org/abs/2303.06571)

* [Towards Unifying Medical Vision-and-Language Pre-Training via Soft Prompts](http://arxiv.org/abs/2302.08958)

* [Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models](http://arxiv.org/abs/2303.06628)

* [Perceptual Grouping in Contrastive Vision-Language Models](http://arxiv.org/abs/2210.09996)

* [Black Box Few-Shot Adaptation for Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ouali_Black_Box_Few-Shot_Adaptation_for_Vision-Language_Models_ICCV_2023_paper.pdf)

* [CTP:Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation](http://arxiv.org/abs/2308.07146)

* [VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control](https://openaccess.thecvf.com/content/ICCV2023/papers/Hu_VL-PET_Vision-and-Language_Parameter-Efficient_Tuning_via_Granularity_Control_ICCV_2023_paper.pdf)

* [GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training](http://arxiv.org/abs/2308.11331)

* [I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Gu_I_Cant_Believe_Theres_No_Images_Learning_Visual_Tasks_Using_ICCV_2023_paper.pdf)

* [Too Large; Data Reduction for Vision-Language Pre-Training](http://arxiv.org/abs/2305.20087)

* [Equivariant Similarity for Vision-Language Foundation Models](http://arxiv.org/abs/2303.14465)

* [Going Beyond Nouns With Vision & Language Models Using Synthetic Data](http://arxiv.org/abs/2303.17590)

* [SINC: Self-Supervised In-Context Learning for Vision-Language Tasks](http://arxiv.org/abs/2307.07742)

* [Unified Visual Relationship Detection with Vision and Language Models](http://arxiv.org/abs/2303.08998)

* [ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models](http://arxiv.org/abs/2307.00398)

* [Distilling Large Vision-Language Model with Out-of-Distribution Generalizability](http://arxiv.org/abs/2307.03135)

* [Distribution-Aware Prompt Tuning for Vision-Language Models](http://arxiv.org/abs/2309.03406v1)
:star:[code](https://github.com/mlvlab/DAPT)

* [LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models](http://arxiv.org/abs/2309.01155v1)
:star:[code](https://chengshiest.github.io/logo)

* [CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation](http://arxiv.org/abs/2308.15226v1)

* [GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training](http://arxiv.org/abs/2308.11331v1)

* [RLIPv2: Fast Scaling of Relational Language-Image Pre-training](http://arxiv.org/abs/2308.09351v1)
:star:[code](https://github.com/JacobYuan7/RLIPv2)

* [Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models](http://arxiv.org/abs/2307.15049v1)
:star:[code](https://wuw2019.github.io/RMT/)

* [Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?](http://arxiv.org/abs/2307.11978v1)
:star:[code](https://github.com/CEWu/PTNL)

* [Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models](http://arxiv.org/abs/2307.14061v1)
:thumbsup:[ICCV 2023 Oral | 南科大VIP Lab | 针对VLP模型的集合级引导攻击](https://mp.weixin.qq.com/s/bE97oBoa4nH1c5XuOz4WWA)

* [CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation](http://arxiv.org/abs/2308.07146v1)
:star:[code](https://github.com/KevinLight831/CTP)

* [VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control](http://arxiv.org/abs/2308.09804v1)
:star:[code](https://github.com/HenryHZY/VL-PET)

* [Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models](http://arxiv.org/abs/2308.11186v1)

* [VLSlice: Interactive Vision-and-Language Slice Discovery](http://arxiv.org/abs/2309.06703v1)
:house:[project](https://ericslyman.com/vlslice/)

* [What does CLIP know about a red circle? Visual prompt engineering for VLMs](http://arxiv.org/abs/2304.06712)

* [BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization](http://arxiv.org/abs/2307.08504)

* 视觉表示学习

  * [Hallucination Improves the Performance of Unsupervised Visual Representation Learning](http://arxiv.org/abs/2307.12168v1)

  * [Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning](http://arxiv.org/abs/2212.06486)

  * [ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data](http://arxiv.org/abs/2308.11194v1)

* VLN

  * [Learning Vision-and-Language Navigation from YouTube Videos](http://arxiv.org/abs/2307.11984v1)
:star:[code](https://github.com/JeremyLinky/YouTube-VLN)

  * [Learning Navigational Visual Representations with Semantic Map Supervision](http://arxiv.org/abs/2307.12335)

  * [Grounded Entity-Landmark Adaptive Pre-Training for Vision-and-Language Navigation](http://arxiv.org/abs/2308.12587)

  * [GridMM: Grid Memory Map for Vision-and-Language Navigation](http://arxiv.org/abs/2307.12907v1)

  * [Scaling Data Generation in Vision-and-Language Navigation](http://arxiv.org/abs/2307.15644v1)

  * [Bird's-Eye-View Scene Graph for Vision-Language Navigation](http://arxiv.org/abs/2308.04758v1)
:star:[code](https://github.com/DefaultRui/BEV-Scene-Graph)

  * [AerialVLN: Vision-and-Language Navigation for UAVs](http://arxiv.org/abs/2308.06735v1)
:star:[code](https://github.com/AirVLN/AirVLN)

  * [DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation](http://arxiv.org/abs/2308.07498v1)
:star:[code](https://github.com/hanqingwangai/Dreamwalker)

  * [VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation](http://arxiv.org/abs/2308.10172v1)

  * [March in Chat: Interactive Prompting for Remote Embodied Referring Expression](http://arxiv.org/abs/2308.10141v1)

  * [Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation](http://arxiv.org/abs/2308.12587v1)

  * [BEVBert: Multimodal Map Pre-training for Language-guided Navigation](https://arxiv.org/pdf/2212.04385.pdf)
:star:[code](https://github.com/MarSaKi/VLN-BEVBert)

* Visual Grounding(视觉定位)

  * [ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding](http://arxiv.org/abs/2303.16894)

  * [Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding](http://arxiv.org/abs/2307.09267)

  * [Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Confidence-aware_Pseudo-label_Learning_for_Weakly_Supervised_Visual_Grounding_ICCV_2023_paper.pdf)

* Video-Language

  * [HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training](http://arxiv.org/abs/2212.14546) 

  * [Learning Trajectory-Word Alignments for Video-Language Tasks](http://arxiv.org/abs/2301.01953)

  * [HiVLP: Hierarchical Interactive Video-Language Pre-Training](https://openaccess.thecvf.com/content/ICCV2023/papers/Shao_HiVLP_Hierarchical_Interactive_Video-Language_Pre-Training_ICCV_2023_paper.pdf)

  * [Verbs in Action: Improving Verb Understanding in Video-Language Models](http://arxiv.org/abs/2304.06708)

  * [Exploring Temporal Concurrency for Video-Language Representation Learning](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Exploring_Temporal_Concurrency_for_Video-Language_Representation_Learning_ICCV_2023_paper.pdf)

  * [EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone](http://arxiv.org/abs/2307.05463)

  * [SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-Training](http://arxiv.org/abs/2211.11446)

* 视觉推理

  * [ViperGPT: Visual Inference via Python Execution for Reasoning](https://openaccess.thecvf.com/content/ICCV2023/papers/Suris_ViperGPT_Visual_Inference_via_Python_Execution_for_Reasoning_ICCV_2023_paper.pdf) 

* LLM

  * [LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Song_LLM-Planner_Few-Shot_Grounded_Planning_for_Embodied_Agents_with_Large_Language_ICCV_2023_paper.pdf)

  * [Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts](http://arxiv.org/abs/2308.11793v1)
:star:[code](https://github.com/VITA-Group/GNT-MOVE)



## 37.Object Pose Estimation(物体姿势估计)

* [IST-Net: Prior-Free Category-Level Pose Estimation with Implicit Space Transformation](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_IST-Net_Prior-Free_Category-Level_Pose_Estimation_with_Implicit_Space_Transformation_ICCV_2023_paper.pdf)

* [LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_LU-NeRF_Scene_and_Pose_Estimation_by_Synchronizing_Local_Unposed_NeRFs_ICCV_2023_paper.pdf)

* [PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment](http://arxiv.org/abs/2306.15667)

* [Nonrigid Object Contact Estimation With Regional Unwrapping Transformer](http://arxiv.org/abs/2308.14074v1)

* 6D

  * [Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation](http://arxiv.org/abs/2308.05438v1)

  * [SOCS: Semantically-Aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations](http://arxiv.org/abs/2303.10346)

  * [Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation](http://arxiv.org/abs/2303.11516)

  * [Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation](http://arxiv.org/abs/2308.10016v1)

  * [Center-Based Decoupled Point-cloud Registration for 6D Object Pose Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Jiang_Center-Based_Decoupled_Point-cloud_Registration_for_6D_Object_Pose_Estimation_ICCV_2023_paper.pdf)

  * [Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Query6DoF_Learning_Sparse_Queries_as_Implicit_Shape_Prior_for_Category-Level_ICCV_2023_paper.pdf)

  * [VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations](http://arxiv.org/abs/2308.09916v1)
:star:[code](https://github.com/JiehongLin/VI-Net)

  * [Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Learning_Symmetry-Aware_Geometry_Correspondences_for_6D_Object_Pose_Estimation_ICCV_2023_paper.pdf)

  * [3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation](http://arxiv.org/abs/2302.03744)

* 物体计数

  * [STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning](http://arxiv.org/abs/2308.10468v1)
:star:[code](https://github.com/taohan10200/STEERER)

  * [Interactive Class-Agnostic Object Counting](http://arxiv.org/abs/2309.05277)

  * [A Low-Shot Object Counting Network With Iterative Prototype Adaptation](https://openaccess.thecvf.com/content/ICCV2023/papers/Dukic_A_Low-Shot_Object_Counting_Network_With_Iterative_Prototype_Adaptation_ICCV_2023_paper.pdf)

* 动物姿势估计

  * [Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape](http://arxiv.org/abs/2308.11737)



## 36.Vision Question Answering(视觉问答)

* [Toward Unsupervised Realistic Visual Question Answering](http://arxiv.org/abs/2303.05068)

* [Variational Causal Inference Network for Explanatory Visual Question Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Xue_Variational_Causal_Inference_Network_for_Explanatory_Visual_Question_Answering_ICCV_2023_paper.pdf)

* [VQA Therapy: Exploring Answer Differences by Visually Grounding Answers](http://arxiv.org/abs/2308.11662)

* [VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_VQA-GNN_Reasoning_with_Multimodal_Knowledge_via_Graph_Neural_Networks_for_ICCV_2023_paper.pdf)

* [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](http://arxiv.org/abs/2303.11897)

* [Encyclopedic VQA: Visual Questions About Detailed Properties of Fine-Grained Categories](http://arxiv.org/abs/2306.09224)

* [PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3](https://openaccess.thecvf.com/content/ICCV2023/papers/Hu_PromptCap_Prompt-Guided_Image_Captioning_for_VQA_with_GPT-3_ICCV_2023_paper.pdf)

* [Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Qian_Decouple_Before_Interact_Multi-Modal_Prompt_Learning_for_Continual_Visual_Question_ICCV_2023_paper.pdf)

* Video-QA

  * [Discovering Spatio-Temporal Rationales for Video Question Answering](http://arxiv.org/abs/2307.12058v1)
:star:[code](https://github.com/yl3800/TranSTR)

  * [Knowledge Proxy Intervention for Deconfounded Video Question Answering](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Knowledge_Proxy_Intervention_for_Deconfounded_Video_Question_Answering_ICCV_2023_paper.pdf)

  * [Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer](http://arxiv.org/abs/2308.08414v1)

  * [Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models](http://arxiv.org/abs/2308.09363v1)
:star:[code](https://github.com/mlvlab/OVQA)

  * [Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Tem-Adapter_Adapting_Image-Text_Pretraining_for_Video_Question_Answer_ICCV_2023_paper.pdf)

* 视频意图推理

  * [IntentQA: Context-aware Video Intent Reasoning](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_IntentQA_Context-aware_Video_Intent_Reasoning_ICCV_2023_paper.pdf)



## 35.Human Motion Prediction(人体运动预测)

* [Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction](http://arxiv.org/abs/2308.08942v1)
:star:[code](https://github.com/MediaBrain-SJTU/AuxFormer)

* [Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders](http://arxiv.org/abs/2308.09882v1)
:star:[code](https://github.com/jchengai/forecast-mae)

* [Priority-Centric Human Motion Generation in Discrete Latent Space](http://arxiv.org/abs/2308.14480v1)

* [MotionLM: Multi-Agent Motion Forecasting as Language Modeling](https://openaccess.thecvf.com/content/ICCV2023/papers/Seff_MotionLM_Multi-Agent_Motion_Forecasting_as_Language_Modeling_ICCV_2023_paper.pdf)

* [HumanMAC: Masked Motion Completion for Human Motion Prediction](http://arxiv.org/abs/2302.03665)

* [Joint-Relation Transformer for Multi-Person Motion Prediction](http://arxiv.org/abs/2308.04808)

* [Bootstrap Motion Forecasting With Self-Consistent Constraints](http://arxiv.org/abs/2204.05859)

* [PhysDiff: Physics-Guided Human Motion Diffusion Model](http://arxiv.org/abs/2212.02500)

* [AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism](http://arxiv.org/abs/2309.00796)

* [BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction](http://arxiv.org/abs/2211.14304)

* [Social Diffusion: Long-term Multiple Human Motion Anticipation](https://openaccess.thecvf.com/content/ICCV2023/papers/Tanke_Social_Diffusion_Long-term_Multiple_Human_Motion_Anticipation_ICCV_2023_paper.pdf)

* [SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation](http://arxiv.org/abs/2304.10417)



## 34.Action Detection(动作识别)

* [Multimodal Distillation for Egocentric Action Recognition](http://arxiv.org/abs/2307.07483)

* [Memory-and-Anticipation Transformer for Online Action Understanding](http://arxiv.org/abs/2308.07893v1)
:star:[code](https://github.com/Echo0125/Memory-and-Anticipation-Transformer)

* [Masked Motion Predictors are Strong 3D Action Representation Learners](http://arxiv.org/abs/2308.07092v1)
:star:[code](https://github.com/maoyunyao/MAMP)

* [Efficient Video Action Detection with Token Dropout and Context Refinement](http://arxiv.org/abs/2304.08451)

* [Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Wasim_Video-FocalNets_Spatio-Temporal_Focal_Modulation_for_Video_Action_Recognition_ICCV_2023_paper.pdf)

* [E2E-LOAD: End-to-End Long-form Online Action Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Cao_E2E-LOAD_End-to-End_Long-form_Online_Action_Detection_ICCV_2023_paper.pdf)

* [Ego-Only: Egocentric Action Detection without Exocentric Transferring](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Ego-Only_Egocentric_Action_Detection_without_Exocentric_Transferring_ICCV_2023_paper.pdf)

* [Cross-Modal Learning with 3D Deformable Attention for Action Recognition](http://arxiv.org/abs/2212.05638)

* [DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion](http://arxiv.org/abs/2303.14863)

* [STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition](http://arxiv.org/abs/2301.03046)

* [MiniROAD: Minimal RNN Framework for Online Action Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/An_MiniROAD_Minimal_RNN_Framework_for_Online_Action_Detection_ICCV_2023_paper.pdf)

* [Video Action Recognition with Attentive Semantic Units](http://arxiv.org/abs/2303.09756)

* [A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition](http://arxiv.org/abs/2303.13505)

* [What Can a Cook in Italy Teach a Mechanic in India? Action Recognition Generalisation Over Scenarios and Locations](http://arxiv.org/abs/2306.08713)

* 基于骨架的动作识别

  * [LAC -- Latent Action Composition for Skeleton-based Action Segmentation](http://arxiv.org/abs/2308.14500v1)

  * [Generative Action Description Prompts for Skeleton-based Action Recognition](http://arxiv.org/abs/2208.05318)

  * [Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Parallel_Attention_Interaction_Network_for_Few-Shot_Skeleton-Based_Action_Recognition_ICCV_2023_paper.pdf)

  * [Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition](http://arxiv.org/abs/2212.04761)

  * [Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition](http://arxiv.org/abs/2208.10741)

  * [Modeling the Relative Visual Tempo for Self-supervised Skeleton-based Action Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_Modeling_the_Relative_Visual_Tempo_for_Self-supervised_Skeleton-based_Action_Recognition_ICCV_2023_paper.pdf)

  * [SkeleTR: Towrads Skeleton-based Action Recognition in the Wild](http://arxiv.org/abs/2309.11445v1)

  * [Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient](http://arxiv.org/abs/2308.05681)

  * [FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation](http://arxiv.org/abs/2306.11046)

* 开集动作识别

  * [SOAR: Scene-debiasing Open-set Action Recognition](http://arxiv.org/abs/2309.01265v1)
:star:[code](https://github.com/yhZhai/SOAR)

* 零样本动作识别

  * [MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge](http://arxiv.org/abs/2303.08914)

* 小样本动作识别

  * [Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching](http://arxiv.org/abs/2308.09346v1)
:star:[code](https://github.com/jiazheng-xing/GgHM)

* 时序动作定位

  * [DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization](http://arxiv.org/abs/2307.16415v1)
:star:[code](https://github.com/XiaojunTang22/ICCV2023-DDGNet)

  * [Movement Enhancement toward Multi-Scale Video Feature Representation for Temporal Action Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Movement_Enhancement_toward_Multi-Scale_Video_Feature_Representation_for_Temporal_Action_ICCV_2023_paper.pdf)

  * [Self-Feedback DETR for Temporal Action Detection](http://arxiv.org/abs/2308.10570v1)

  * [Action Sensitivity Learning for Temporal Action Localization](http://arxiv.org/abs/2305.15701)

  * [Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Revisiting_Foreground_and_Background_Separation_in_Weakly-supervised_Temporal_Action_Localization_ICCV_2023_paper.pdf)

  * [Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization](https://openaccess.thecvf.com/content/ICCV2023/papers/Xia_Learning_from_Noisy_Pseudo_Labels_for_Semi-Supervised_Temporal_Action_Localization_ICCV_2023_paper.pdf)

* 弱监督动作定位

  * [Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling](http://arxiv.org/abs/2308.09946v1)

* 小样本动作定位

  * [Few-Shot Common Action Localization via Cross-Attentional Fusion of Context and Temporal Dynamics](https://openaccess.thecvf.com/content/ICCV2023/papers/Lee_Few-Shot_Common_Action_Localization_via_Cross-Attentional_Fusion_of_Context_and_ICCV_2023_paper.pdf)

* 动作理解

  * [Memory-and-Anticipation Transformer for Online Action Understanding](http://arxiv.org/abs/2308.07893)



## 33.Video(视频)

* [Neural Video Depth Stabilizer](http://arxiv.org/abs/2307.08695)

* [NPC: Neural Point Characters from Video](http://arxiv.org/abs/2304.02013)

* [Localizing Moments in Long Video Via Multimodal Guidance](http://arxiv.org/abs/2302.13372)

* [Order-Prompted Tag Sequence Generation for Video Tagging](https://openaccess.thecvf.com/content/ICCV2023/papers/Ma_Order-Prompted_Tag_Sequence_Generation_for_Video_Tagging_ICCV_2023_paper.pdf)

* [Moment Detection in Long Tutorial Videos](https://openaccess.thecvf.com/content/ICCV2023/papers/Croitoru_Moment_Detection_in_Long_Tutorial_Videos_ICCV_2023_paper.pdf)

* [MMVP: Motion-Matrix-based Video Prediction](http://arxiv.org/abs/2308.16154v1)
:star:[code](https://github.com/Kay1794/MMVP-motion-matrix-based-video-prediction)

* [D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation](http://arxiv.org/abs/2308.04197v1)
:star:[code](https://github.com/solicucu/D3G)

* [LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction](http://arxiv.org/abs/2308.11116v1)

* [TALL: Thumbnail Layout for Deepfake Video Detection](http://arxiv.org/abs/2307.07494)

* [Spatio-temporal Prompting Network for Robust Video Feature Extraction](https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Spatio-temporal_Prompting_Network_for_Robust_Video_Feature_Extraction_ICCV_2023_paper.pdf)

* [Neural Reconstruction of Relightable Human Model from Monocular Video](https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Neural_Reconstruction_of_Relightable_Human_Model_from_Monocular_Video_ICCV_2023_paper.pdf)

* 视频理解

  * [Long-range Multimodal Pretraining for Movie Understanding](http://arxiv.org/abs/2308.09775v1)

  * [RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D](http://arxiv.org/abs/2308.12035v1)

  * [UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_UniFormerV2_Unlocking_the_Potential_of_Image_ViTs_for_Video_Understanding_ICCV_2023_paper.pdf)

* 视频分类

  * [ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded](https://openaccess.thecvf.com/content/ICCV2023/papers/Bulat_ReGen_A_good_Generative_Zero-Shot_Video_Classifier_Should_be_Rewarded_ICCV_2023_paper.pdf)

  * [Gram-based Attentive Neural Ordinary Differential Equations Network for Video Nystagmography Classification](https://openaccess.thecvf.com/content/ICCV2023/papers/Qiu_Gram-based_Attentive_Neural_Ordinary_Differential_Equations_Network_for_Video_Nystagmography_ICCV_2023_paper.pdf)

  * [Few-Shot Video Classification via Representation Fusion and Promotion Learning](https://openaccess.thecvf.com/content/ICCV2023/papers/Xia_Few-Shot_Video_Classification_via_Representation_Fusion_and_Promotion_Learning_ICCV_2023_paper.pdf)

* 视频合成

  * [StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation](http://arxiv.org/abs/2308.16909v1)
:house:[project](https://www.mmlab-ntu.com/project/styleinv/index.html)
:star:[code](https://github.com/johannwyh/StyleInV)

  * [Mixed Neural Voxels for Fast Multi-view Video Synthesis](http://arxiv.org/abs/2212.00190)

  * [WALDO: Future Video Synthesis Using Object Layer Decomposition and Parametric Flow Prediction](http://arxiv.org/abs/2211.14308)

  * [Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://openaccess.thecvf.com/content/ICCV2023/papers/Khachatryan_Text2Video-Zero_Text-to-Image_Diffusion_Models_are_Zero-Shot_Video_Generators_ICCV_2023_paper.pdf)

  * [StyleLipSync: Style-based Personalized Lip-sync Video Generation](http://arxiv.org/abs/2305.00521)

  * [Text2Performer: Text-Driven Human Video Generation](http://arxiv.org/abs/2304.08483)

  * [DreamPose: Fashion Video Synthesis with Stable Diffusion](https://openaccess.thecvf.com/content/ICCV2023/papers/Karras_DreamPose_Fashion_Video_Synthesis_with_Stable_Diffusion_ICCV_2023_paper.pdf)

  * [Structure and Content-Guided Video Synthesis with Diffusion Models](http://arxiv.org/abs/2302.03011)

* 视频稳定

  * [Fast Full-frame Video Stabilization with Iterative Optimization](http://arxiv.org/abs/2307.12774v1)

  * [Minimum Latency Deep Online Video Stabilization](http://arxiv.org/abs/2212.02073)

* Video Grounding(视频定位)

  * [G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory](http://arxiv.org/abs/2307.14277v1)

  * [UniVTG: Towards Unified Video-Language Temporal Grounding](http://arxiv.org/abs/2307.16715v1)
:star:[code](https://github.com/showlab/UniVTG)

  * [Knowing Where to Focus: Event-aware Transformer for Video Grounding](http://arxiv.org/abs/2308.06947v1)
:star:[code](https://github.com/jinhyunj/EaTR)

  * [Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos](http://arxiv.org/abs/2303.08345)

* 视频分割

  * [XMem++: Production-level Video Segmentation From Few Annotated Frames](http://arxiv.org/abs/2307.15958v1)

  * [Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation](http://arxiv.org/abs/2309.13248v1)
:star:[code](https://github.com/kfan21/EoRaS)

  * [GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation](http://arxiv.org/abs/2309.11145v1)
:star:[code](https://github.com/xmed-lab/GraphEcho)

  * [MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions](http://arxiv.org/abs/2308.08544v1)
:star:[code](https://henghuiding.github.io/MeViS)
:star:[code](https://henghuiding.github.io/MeViS/)
:thumbsup:[ICCV2023｜新数据集 MeViS：基于动作描述的视频分割](https://mp.weixin.qq.com/s/hHAgfiQdA_g0DkmgWPzLeg)

  * [MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation](http://arxiv.org/abs/2308.11185v1)

  * [Tracking Anything with Decoupled Video Segmentation](http://arxiv.org/abs/2309.03903v1)
:star:[code](https://hkchengrex.github.io/Tracking-Anything-with-DEVA)

  * [The Making and Breaking of Camouflage](http://arxiv.org/abs/2309.03899v1)

  * [Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Tube-Link_A_Flexible_Cross_Tube_Framework_for_Universal_Video_Segmentation_ICCV_2023_paper.pdf)

* 视频对应

  * [Learning Fine-Grained Features for Pixel-wise Video Correspondences](http://arxiv.org/abs/2308.03040v1)
:star:[code](https://github.com/qianduoduolr/FGVC)

* 视频感知

  * [ResQ: Residual Quantization for Video Perception](http://arxiv.org/abs/2308.09511v1)

* 视频识别

  * [Audio-Visual Glance Network for Efficient Video Recognition](http://arxiv.org/abs/2308.09322v1)

  * [Efficient Decision-based Black-box Patch Attacks on Video Recognition](http://arxiv.org/abs/2303.11917)

  * [Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition](http://arxiv.org/abs/2308.11489v1)
:star:[code](https://github.com/wqtwjt1996/SUM-L)

  * [Implicit Temporal Modeling with Learnable Alignment for Video Recognition](http://arxiv.org/abs/2304.10465)

* 视频修补

  * [ProPainter: Improving Propagation and Transformer for Video Inpainting](http://arxiv.org/abs/2309.03897v1)
:star:[code](https://github.com/sczhou/ProPainter)

* 视频表示学习

  * [MGMAE: Motion Guided Masking for Video Masked Autoencoding](http://arxiv.org/abs/2308.10794v1)

  * [Spatio-Temporal Crop Aggregation for Video Representation Learning](http://arxiv.org/abs/2211.17042)

* VAD

  * [TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection](http://arxiv.org/abs/2308.11072v1)
:star:[code](https://joefioresi718.github.io/TeD-SPAD_webpage/)

  * [Video Anomaly Detection via Sequentially Learning Multiple Pretext Tasks](https://openaccess.thecvf.com/content/ICCV2023/papers/Shi_Video_Anomaly_Detection_via_Sequentially_Learning_Multiple_Pretext_Tasks_ICCV_2023_paper.pdf)

  * [Feature Prediction Diffusion Model for Video Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Yan_Feature_Prediction_Diffusion_Model_for_Video_Anomaly_Detection_ICCV_2023_paper.pdf)

* Video Localization

  * [UnLoc: A Unified Framework for Video Localization Tasks](http://arxiv.org/abs/2308.11062v1)
:star:[code](https://github.com/google-research/scenic)

  * [Video OWL-ViT: Temporally-consistent open-world localization in video](http://arxiv.org/abs/2308.11093v1)

  * [Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Flaborea_Multimodal_Motion_Conditioned_Diffusion_Model_for_Skeleton-based_Video_Anomaly_Detection_ICCV_2023_paper.pdf)

  * [TeD-SPAD: Temporal Distinctiveness for Self-Supervised Privacy-Preservation for Video Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2023/papers/Fioresi_TeD-SPAD_Temporal_Distinctiveness_for_Self-Supervised_Privacy-Preservation_for_Video_Anomaly_Detection_ICCV_2023_paper.pdf)

* 视频预测

  * [MMVP: Motion-Matrix-Based Video Prediction](http://arxiv.org/abs/2308.16154)

  * [Efficient Video Prediction via Sparsely Conditioned Flow Matching](http://arxiv.org/abs/2211.14575)

* 视频玻璃分割

  * [Multi-view Spectral Polarization Propagation for Video Glass Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Qiao_Multi-view_Spectral_Polarization_Propagation_for_Video_Glass_Segmentation_ICCV_2023_paper.pdf)

* 视频帧插值

  * [Rethinking Video Frame Interpolation from Shutter Mode Induced Degradation](https://openaccess.thecvf.com/content/ICCV2023/papers/Ji_Rethinking_Video_Frame_Interpolation_from_Shutter_Mode_Induced_Degradation_ICCV_2023_paper.pdf)

* 视频语义压缩

  * [Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression](https://openaccess.thecvf.com/content/ICCV2023/papers/Tian_Non-Semantics_Suppressed_Mask_Learning_for_Unsupervised_Video_Semantic_Compression_ICCV_2023_paper.pdf)

* 视频-视频翻译 

  * [Shortcut-V2V: Compression Framework for Video-to-Video Translation Based on Temporal Redundancy Reduction](https://openaccess.thecvf.com/content/ICCV2023/papers/Chung_Shortcut-V2V_Compression_Framework_for_Video-to-Video_Translation_Based_on_Temporal_Redundancy_ICCV_2023_paper.pdf)



## 32.Sign Language Recognition(手语)

* [Human Part-wise 3D Motion Context Learning for Sign Language Recognition](http://arxiv.org/abs/2308.09305v1)

* [CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Jiao_CoSign_Exploring_Co-occurrence_Signals_in_Skeleton-based_Continuous_Sign_Language_Recognition_ICCV_2023_paper.pdf)

* [Improving Continuous Sign Language Recognition with Cross-Lingual Signs](http://arxiv.org/abs/2308.10809v1)

* [C2ST: Cross-Modal Contextualized Sequence Transduction for Continuous Sign Language Recognition](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_C2ST_Cross-Modal_Contextualized_Sequence_Transduction_for_Continuous_Sign_Language_Recognition_ICCV_2023_paper.pdf)

* 手语翻译

  * [Sign Language Translation with Iterative Prototype](http://arxiv.org/abs/2308.12191v1)

  * [Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining](http://arxiv.org/abs/2307.14768v1)
:star:[code](https://github.com/zhoubenjia/GFSLT-VLP)



## 31.Human-Object Interaction(人物交互)

* [Full-Body Articulated Human-Object Interaction](http://arxiv.org/abs/2212.10621)

* [Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory](http://arxiv.org/abs/2309.03696)

* [Learning Human-Human Interactions in Images from Weak Textual Supervision](http://arxiv.org/abs/2304.14104)

* [Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction](http://arxiv.org/abs/2307.12729v1)

* [Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection](http://arxiv.org/abs/2307.13529v1)

* [Agglomerative Transformer for Human-Object Interaction Detection](http://arxiv.org/abs/2308.08370v1)

* [InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion](http://arxiv.org/abs/2308.16905v1)
:star:[code](https://sirui-xu.github.io/InterDiff/)

* [Exploring Predicate Visual Context in Detecting of Human-Object Interactions](http://arxiv.org/abs/2308.06202)

* [Persistent-Transient Duality: A Multi-Mechanism Approach for Modeling Human-Object Interaction](http://arxiv.org/abs/2307.12729)

* [Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning](http://arxiv.org/abs/2303.09410)

* [Open Set Video HOI detection from Action-Centric Chain-of-Look Prompting](https://openaccess.thecvf.com/content/ICCV2023/papers/Xi_Open_Set_Video_HOI_detection_from_Action-Centric_Chain-of-Look_Prompting_ICCV_2023_paper.pdf)

* [Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Pi_Hierarchical_Generation_of_Human-Object_Interactions_with_Diffusion_Probabilistic_Models_ICCV_2023_paper.pdf)

* 手物交互

  * [EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding](http://arxiv.org/abs/2309.02423v1)
:house:[project](https://mvig-rhos.com/ego_pca)

  * [Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips](http://arxiv.org/abs/2309.05663v1)
:star:[code](https://judyye.github.io/diffhoi-www/)

  * [AffordPose: A Large-scale Dataset of Hand-Object Interactions with Affordance-driven Hand Pose](http://arxiv.org/abs/2309.08942v1)
:star:[code](https://github.com/GentlesJan/AffordPose)

  * [Novel-View Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views](http://arxiv.org/abs/2308.11198)



## 30.SLAM/Augmented Reality/Virtual Reality/Robotics(增强/虚拟现实/机器人)

* 虚拟人物生成

  * [MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions](http://arxiv.org/abs/2307.10008v1)

  * [GETAvatar: Generative Textured Meshes for Animatable Human Avatars](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_GETAvatar_Generative_Textured_Meshes_for_Animatable_Human_Avatars_ICCV_2023_paper.pdf)

  * [NSF: Neural Surface Fields for Human Modeling from Monocular Depth](http://arxiv.org/abs/2308.14847v1)
:house:[project](https://yuxuan-xue.com/nsf)

  * [AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control](http://arxiv.org/abs/2303.17606)

  * [DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars](http://arxiv.org/abs/2303.09375)

* 机器人

  * [Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly](http://arxiv.org/abs/2309.06810v1)
:star:[code](https://github.com/crtie/Leveraging-SE-3-Equivariance-for-Learning-3D-Geometric-Shape-Assembly)
:star:[code](https://crtie.github.io/SE-3-part-assembly/)

  * [PourIt!: Weakly-Supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring](http://arxiv.org/abs/2307.11299)

* AR/VR

  * [HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations](http://arxiv.org/abs/2308.11261v1)

* SLAM

  * [GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction](http://arxiv.org/abs/2309.02436v1)
:star:[code](https://youmi-zym.github.io/projects/GO-SLAM/)
:star:[code](https://github.com/youmi-zym/GO-SLAM)

  * [Point-SLAM: Dense Neural Point Cloud-based SLAM](https://openaccess.thecvf.com/content/ICCV2023/papers/Sandstrom_Point-SLAM_Dense_Neural_Point_Cloud-based_SLAM_ICCV_2023_paper.pdf)

  * [NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping](https://openaccess.thecvf.com/content/ICCV2023/papers/Deng_NeRF-LOAM_Neural_Implicit_Representation_for_Large-Scale_Incremental_LiDAR_Odometry_and_ICCV_2023_paper.pdf)

  * [MV-Map: Offboard HD-Map Generation with Multi-view Consistency](https://openaccess.thecvf.com/content/ICCV2023/papers/Xie_MV-Map_Offboard_HD-Map_Generation_with_Multi-view_Consistency_ICCV_2023_paper.pdf)

* 虚拟试穿

  * [Virtual Try-On with Pose-Garment Keypoints Guided Inpainting](https://openaccess.thecvf
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/52cv/iccv-2023-papers

Awesome Lists containing this project

README