https://github.com/52cv/wacv-2024-papers

Host: GitHub
URL: https://github.com/52cv/wacv-2024-papers
Owner: 52CV
Created: 2023-08-24T08:20:51.000Z
Default Branch: main
Last Pushed: 2024-01-16T08:14:54.000Z
Last Synced: 2025-02-24T05:14:35.497Z
Size: 431 KB
Stars: 101
Watchers: 3
Forks: 7
Open Issues: 0
Metadata Files:
- Readme: README.md

# WACV-2024-Papers

![Alt text](96748913c73db498eb8249e43c245b8.jpg)

## Conference dates: January 3-7, 2024

## Conference website: https://wacv2024.thecvf.com/

## ❣❣❣ WACV 2024 paper categorization is complete

## 📢📢📢 Award-winning papers

#### 🏆 Best Paper Award (Algorithms)

[Conditional Velocity Score Estimation for Image Restoration](https://openaccess.thecvf.com/content/WACV2024/papers/Shi_Conditional_Velocity_Score_Estimation_for_Image_Restoration_WACV_2024_paper.pdf)

#### 🏆 Best Paper Award (Applications)

[WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification](https://openaccess.thecvf.com/content/WACV2024/papers/Cermak_WildlifeDatasets_An_Open-Source_Toolkit_for_Animal_Re-Identification_WACV_2024_paper.pdf)

#### 🏆 Best Student Paper

[Wino Vidi Vici: Conquering Numerical Instability of 8-Bit Winograd Convolution for Accurate Inference Acceleration on Edge](https://openaccess.thecvf.com/content/WACV2024/papers/Mori_Wino_Vidi_Vici_Conquering_Numerical_Instability_of_8-Bit_Winograd_Convolution_WACV_2024_paper.pdf)

#### 🏆 Best Paper Honorable Mention

[ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields](https://openaccess.thecvf.com/content/WACV2024/papers/Abou-Chakra_ParticleNeRF_A_Particle-Based_Encoding_for_Online_Neural_Radiance_Fields_WACV_2024_paper.pdf)

## For 2024 survey papers, click here ↘️[2024-CV-Surveys](https://github.com/52CV/CV-Surveys)

## Categorized paper collections for 2024

↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)

## Categorized paper collections for 2023

↘️[CVPR-2023-Papers](https://github.com/52CV/CVPR-2023-Papers)

↘️[WACV-2023-Papers](https://github.com/52CV/WACV-2023-Papers)

↘️[ICCV-2023-Papers](https://github.com/52CV/ICCV-2023-Papers)

↘️[2023-CV-Surveys](https://github.com/52CV/CV-Surveys/blob/main/2023-CV-Surveys.md)

## [Categorized paper collections for 2022](#000)

## [Categorized paper collections for 2021](#00)

## [Categorized paper collections for 2020](#0)

## Table of Contents

|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|[1.Other](#1)|[2.SR (Super-Resolution)](#2)|[3.Image/Video Retrieval](#3)|[4.Image/Video Caption](#4)|
|[5.Image/Video Compression](#5)|[6.Medical Image (Medical Image Processing)](#6)|[7.3D (3D Reconstruction/3D Vision)](#7)|[8.Face (Face Technology)](#8)|
|[9.Image Segmentation](#9)|[10.Object Detection](#10)|[11.Object Tracking](#11)|[12.UAV/RS/Satellite Image (UAV/Remote Sensing/Satellite Imagery)](#12)|
|[13.ReID (Person Re-Identification/Gait Recognition/Pedestrian Detection)](#13)|[14.OCR (Text Detection and Recognition)](#14)|[15.Video](#15)|[16.Action Detection](#16)|
|[17.HPE (Human Pose Estimation)](#17)|[18.Animal](#18)|[19.Object Pose Estimation](#19)|[20.GAN/Generation](#20)|
|[21.SLAM/AR/VR/Robotics](#21)|[22.VQA (Visual Question Answering)](#22)|[23.VL (Vision-Language)](#23)|[24.LLM (Large Language Models)](#24)|
|[25.Multimodal](#25)|[26.Human Motion Prediction](#26)|[27.HOI (Human-Object Interaction)](#27)|[28.Point-Cloud](#28)|
|[29.SGG (Scene Graph Generation)](#29)|[30.GNN/GCN](#30)|[31.Automated Driving](#31)|[32.Scene Flow Estimation](#32)|
|[33.Optical Flow Estimation](#33)|[34.NAS](#34)|[35.MC/KD/Pruning (Model Compression/Knowledge Distillation/Pruning)](#35)|[36.NLP](#36)|
|[37.ML (Machine Learning)](#37)|[38.Visual Representation Learning](#38)|[39.Few/Zero-Shot Learning/DG/DA (Domain Generalization/Domain Adaptation)](#39)|[40.Self/Semi-supervised learning](#40)|
|[41.Image Processing (Low-Level Image Processing, Quality Assessment)](#41)|[42.Image Classification](#42)|[43.Image Fusion](#43)|[44.Visual Industrial Inspection](#44)|
|[45.Visual Tampering Detection](#45)|[46.Dense Prediction](#46)|[47.Edge Detection](#47)|[48.Image/Video Editing](#48)|
|[49.Vision Transformers](#49)|[50.Dataset](#50)|[51.Sound (Speech)](#51)|[52.Gaze Estimation](#52)|
|[53.Crack Segmentation](#53)|[54.Style Transfer](#54)|[55.Biometrics](#55)|[56.Event Cameras](#56)|
|[57.Neural Radiance Fields (NeRF)](#57)|[58.Novel View Synthesis](#58)|[59.Rendering](#59)|[60.Graphic Layout](#60)|
|[61.Computational Imaging (e.g., optical, geometric, and light-field imaging)](#61)| | | |



## 61.Computational Imaging (e.g., optical, geometric, and light-field imaging)

* [Motion Matters: Neural Motion Transfer for Better Camera Physiological Measurement](http://arxiv.org/abs/2303.12059)

* [On the Quantification of Image Reconstruction Uncertainty without Training Data](http://arxiv.org/abs/2311.09639v1)

* [Deep Optics for Optomechanical Control Policy Design](https://openaccess.thecvf.com/content/WACV2024/papers/Fletcher_Deep_Optics_for_Optomechanical_Control_Policy_Design_WACV_2024_paper.pdf)

* [From Chaos to Calibration: A Geometric Mutual Information Approach To Target-Free Camera LiDAR Extrinsic Calibration](https://openaccess.thecvf.com/content/WACV2024/papers/Borer_From_Chaos_to_Calibration_A_Geometric_Mutual_Information_Approach_To_WACV_2024_paper.pdf)

* [Joint 3D Shape and Motion Estimation From Rolling Shutter Light-Field Images](http://arxiv.org/abs/2311.01292)

* [CGAPoseNet+GCAN: A Geometric Clifford Algebra Network for Geometry-Aware Camera Pose Regression](https://openaccess.thecvf.com/content/WACV2024/papers/Pepe_CGAPoseNetGCAN_A_Geometric_Clifford_Algebra_Network_for_Geometry-Aware_Camera_Pose_WACV_2024_paper.pdf)

* Camera calibration

  * [MSCC: Multi-Scale Transformers for Camera Calibration](https://openaccess.thecvf.com/content/WACV2024/papers/Song_MSCC_Multi-Scale_Transformers_for_Camera_Calibration_WACV_2024_paper.pdf)



## 60.Graphic Layout

* [Unsupervised Graphic Layout Grouping with Transformers](https://openaccess.thecvf.com/content/WACV2024/papers/Zhu_Unsupervised_Graphic_Layout_Grouping_With_Transformers_WACV_2024_paper.pdf)



## 59.Rendering

* [LensNeRF: Rethinking Volume Rendering Based on Thin-Lens Camera Model](https://openaccess.thecvf.com/content/WACV2024/papers/Kim_LensNeRF_Rethinking_Volume_Rendering_Based_on_Thin-Lens_Camera_Model_WACV_2024_paper.pdf)

* [Specular Object Reconstruction Behind Frosted Glass by Differentiable Rendering](https://openaccess.thecvf.com/content/WACV2024/papers/Iwaguchi_Specular_Object_Reconstruction_Behind_Frosted_Glass_by_Differentiable_Rendering_WACV_2024_paper.pdf)



## 58.Novel View Synthesis

* [Ray Deformation Networks for Novel View Synthesis of Refractive Objects](https://openaccess.thecvf.com/content/WACV2024/papers/Deng_Ray_Deformation_Networks_for_Novel_View_Synthesis_of_Refractive_Objects_WACV_2024_paper.pdf)

* [Stereo Conversion With Disparity-Aware Warping, Compositing and Inpainting](https://openaccess.thecvf.com/content/WACV2024/papers/Mehl_Stereo_Conversion_With_Disparity-Aware_Warping_Compositing_and_Inpainting_WACV_2024_paper.pdf)



## 57.Neural Radiance Fields(NeRF)

* [EvDNeRF: Reconstructing Event Data With Dynamic Neural Radiance Fields](http://arxiv.org/abs/2310.02437)

* [Hyb-NeRF: A Multiresolution Hybrid Encoding for Neural Radiance Fields](https://arxiv.org/abs/2311.12490)

* [Fast Sun-aligned Outdoor Scene Relighting based on TensoRF](http://arxiv.org/abs/2311.03965v1)

* [ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields](https://openaccess.thecvf.com/content/WACV2024/papers/Abou-Chakra_ParticleNeRF_A_Particle-Based_Encoding_for_Online_Neural_Radiance_Fields_WACV_2024_paper.pdf)

* [MoRF: Mobile Realistic Fullbody Avatars From a Monocular Video](https://arxiv.org/abs/2303.10275)

* [ZIGNeRF: Zero-Shot 3D Scene Representation With Invertible Generative Neural Radiance Fields](http://arxiv.org/abs/2306.02741)

* [Point-DynRF: Point-Based Dynamic Radiance Fields From a Monocular Video](https://openaccess.thecvf.com/content/WACV2024/papers/Park_Point-DynRF_Point-Based_Dynamic_Radiance_Fields_From_a_Monocular_Video_WACV_2024_paper.pdf)

* [A Generic and Flexible Regularization Framework for NeRFs](https://openaccess.thecvf.com/content/WACV2024/papers/Ehret_A_Generic_and_Flexible_Regularization_Framework_for_NeRFs_WACV_2024_paper.pdf)



## 56.Event Cameras

* [Masked Event Modeling: Self-Supervised Pretraining for Event Cameras](http://arxiv.org/abs/2212.10368)



## 55.Biometrics

* [Deep Visual-Genetic Biometrics for Taxonomic Classification of Rare Species](http://arxiv.org/abs/2305.06695)

* [Fingervein Verification using Convolutional Multi-Head Attention Network](http://arxiv.org/abs/2310.16808v1)

* [FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude](http://arxiv.org/abs/2306.17206)

* [Vikriti-ID: A Novel Approach for Real Looking Fingerprint Data-Set Generation](https://openaccess.thecvf.com/content/WACV2024/papers/Shukla_Vikriti-ID_A_Novel_Approach_for_Real_Looking_Fingerprint_Data-Set_Generation_WACV_2024_paper.pdf)

* Fingerprint generation

  * [FPGAN-Control: A Controllable Fingerprint Generator for Training With Synthetic Data](https://openaccess.thecvf.com/content/WACV2024/papers/Shoshan_FPGAN-Control_A_Controllable_Fingerprint_Generator_for_Training_With_Synthetic_Data_WACV_2024_paper.pdf)



## 54.Style Transfer

* [Optical Flow Domain Adaptation via Target Style Transfer](https://openaccess.thecvf.com/content/WACV2024/papers/Yoon_Optical_Flow_Domain_Adaptation_via_Target_Style_Transfer_WACV_2024_paper.pdf)

* [Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion](http://arxiv.org/abs/2312.01671v1)
:star:[code](https://hywang66.github.io/mmist/)

* [FastCLIPstyler: Optimisation-Free Text-Based Image Style Transfer Using Style Representations](http://arxiv.org/abs/2210.03461)

* [SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer From a Spectral Perspective](http://arxiv.org/abs/2303.09270)

* [Neural Style Protection: Counteracting Unauthorized Neural Style Transfer](https://openaccess.thecvf.com/content/WACV2024/papers/Li_Neural_Style_Protection_Counteracting_Unauthorized_Neural_Style_Transfer_WACV_2024_paper.pdf)

* [LipAT: Beyond Style Transfer for Controllable Neural Simulation of Lipstick Using Cosmetic Attributes](https://openaccess.thecvf.com/content/WACV2024/papers/Silva_LipAT_Beyond_Style_Transfer_for_Controllable_Neural_Simulation_of_Lipstick_WACV_2024_paper.pdf)



## 53.Crack Segmentation

* [Designing a Hybrid Neural System To Learn Real-World Crack Segmentation From Fractal-Based Simulation](http://arxiv.org/abs/2309.09637)



## 52.Gaze Estimation

* [Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-Based Gaze Estimation](http://arxiv.org/abs/2305.12704)

* Gaze following

  * [Multi-Modal Gaze Following in Conversational Scenarios](http://arxiv.org/abs/2311.05669)



## 51.Sound (Speech)

* Lip synchronization

  * [Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization](http://arxiv.org/abs/2308.09716)

* Sound source localization

  * [Can CLIP Help Sound Source Localization?](http://arxiv.org/abs/2311.04066v1)

* Audio separation

  * [LAVSS: Location-Guided Audio-Visual Spatial Audio Separation](https://arxiv.org/abs/2310.20446)

  * [Visually Guided Audio Source Separation With Meta Consistency Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Islam_Visually_Guided_Audio_Source_Separation_With_Meta_Consistency_Learning_WACV_2024_paper.pdf)

* 3D sound source detection

  * [Sound3DVDet: 3D Sound Source Detection Using Multiview Microphone Array and RGB Images](https://openaccess.thecvf.com/content/WACV2024/papers/He_Sound3DVDet_3D_Sound_Source_Detection_Using_Multiview_Microphone_Array_and_WACV_2024_paper.pdf)

* Audio-visual segmentation

  * [Annotation-Free Audio-Visual Segmentation](http://arxiv.org/abs/2305.11019)

* Speech video synthesis

  * [DR2: Disentangled Recurrent Representation Learning for Data-Efficient Speech Video Synthesis](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_DR2_Disentangled_Recurrent_Representation_Learning_for_Data-Efficient_Speech_Video_Synthesis_WACV_2024_paper.pdf)

* Interactive drum sounds from body rhythm

  * [Let the Beat Follow You - Creating Interactive Drum Sounds From Body Rhythm](https://openaccess.thecvf.com/content/WACV2024/papers/Liu_Let_the_Beat_Follow_You_-_Creating_Interactive_Drum_Sounds_WACV_2024_paper.pdf)



## 50.Dataset

* [HaGRID -- HAnd Gesture Recognition Image Dataset](https://openaccess.thecvf.com/content/WACV2024/papers/Kapitanov_HaGRID_--_HAnd_Gesture_Recognition_Image_Dataset_WACV_2024_paper.pdf)

* [Beyond RGB: A Real World Dataset for Multispectral Imaging in Mobile Devices](https://openaccess.thecvf.com/content/WACV2024/papers/Glatt_Beyond_RGB_A_Real_World_Dataset_for_Multispectral_Imaging_in_WACV_2024_paper.pdf)

* [IKEA Ego 3D Dataset: Understanding Furniture Assembly Actions From Ego-View 3D Point Clouds](https://openaccess.thecvf.com/content/WACV2024/papers/Ben-Shabat_IKEA_Ego_3D_Dataset_Understanding_Furniture_Assembly_Actions_From_Ego-View_WACV_2024_paper.pdf)

* [PsyMo: A Dataset for Estimating Self-Reported Psychological Traits From Gait](http://arxiv.org/abs/2308.10631)

* [The Growing Strawberries Dataset: Tracking Multiple Objects With Biological Development Over an Extended Period](https://openaccess.thecvf.com/content/WACV2024/papers/Wen_The_Growing_Strawberries_Dataset_Tracking_Multiple_Objects_With_Biological_Development_WACV_2024_paper.pdf)

* [UOW-Vessel: A Benchmark Dataset of High-Resolution Optical Satellite Images for Vessel Detection and Segmentation](https://openaccess.thecvf.com/content/WACV2024/papers/Bui_UOW-Vessel_A_Benchmark_Dataset_of_High-Resolution_Optical_Satellite_Images_for_WACV_2024_paper.pdf)

* [NITEC: Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction](http://arxiv.org/abs/2311.04505v1)
:star:[code](https://github.com/thohemp/nitec)

* [FishTrack23: An Ensemble Underwater Dataset for Multi-Object Tracking](https://openaccess.thecvf.com/content/WACV2024/papers/Dawkins_FishTrack23_An_Ensemble_Underwater_Dataset_for_Multi-Object_Tracking_WACV_2024_paper.pdf)

* [Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning](http://arxiv.org/abs/2309.06597)

* [NOMAD: A Natural, Occluded, Multi-Scale Aerial Dataset, for Emergency Response Scenarios](http://arxiv.org/abs/2309.09518)

* [Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering](https://openaccess.thecvf.com/content/WACV2024/papers/Liu_Tackling_Data_Bias_in_MUSIC-AVQA_Crafting_a_Balanced_Dataset_for_WACV_2024_paper.pdf)

* [SphereCraft: A Dataset for Spherical Keypoint Detection, Matching and Camera Pose Estimation](https://openaccess.thecvf.com/content/WACV2024/papers/Gava_SphereCraft_A_Dataset_for_Spherical_Keypoint_Detection_Matching_and_Camera_WACV_2024_paper.pdf)

* [Ego2HandsPose: A Dataset for Egocentric Two-Hand 3D Global Pose Estimation](http://arxiv.org/abs/2206.04927)

* [MarsLS-Net: Martian Landslides Segmentation Network and Benchmark Dataset](https://openaccess.thecvf.com/content/WACV2024/papers/Paheding_MarsLS-Net_Martian_Landslides_Segmentation_Network_and_Benchmark_Dataset_WACV_2024_paper.pdf)

* [Beyond Document Page Classification: Design, Datasets, and Challenges](http://arxiv.org/abs/2308.12896)

* [MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis](http://arxiv.org/abs/2311.02778)

* [SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection](https://arxiv.org/abs/2309.01907)
:sunflower:[dataset](https://github.com/JTRNEO/SyntheWorld)

* [IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting](http://arxiv.org/abs/2310.17323v1)
:star:[code](https://github.com/TimSchoonbeek/IndustReal)

* [SICKLE: A Multi-Sensor Satellite Imagery Dataset Annotated with Multiple Key Cropping Parameters](https://arxiv.org/abs/2312.00069)

* [SeaTurtleID2022: A Long-Span Dataset for Reliable Sea Turtle Re-Identification](https://openaccess.thecvf.com/content/WACV2024/papers/Adam_SeaTurtleID2022_A_Long-Span_Dataset_for_Reliable_Sea_Turtle_Re-Identification_WACV_2024_paper.pdf)

* [Amodal Intra-Class Instance Segmentation: Synthetic Datasets and Benchmark](http://arxiv.org/abs/2303.06596)

* [Towards Accurate Disease Segmentation in Plant Images: A Comprehensive Dataset Creation and Network Evaluation](https://openaccess.thecvf.com/content/WACV2024/papers/Prashanth_Towards_Accurate_Disease_Segmentation_in_Plant_Images_A_Comprehensive_Dataset_WACV_2024_paper.pdf)

* [AssemblyNet: A Point Cloud Dataset and Benchmark for Predicting Part Directions in an Exploded Layout](https://openaccess.thecvf.com/content/WACV2024/papers/Gaarsdal_AssemblyNet_A_Point_Cloud_Dataset_and_Benchmark_for_Predicting_Part_WACV_2024_paper.pdf)

* [MAdVerse: A Hierarchical Dataset of Multi-Lingual Ads From Diverse Sources and Categories](https://openaccess.thecvf.com/content/WACV2024/papers/Sagar_MAdVerse_A_Hierarchical_Dataset_of_Multi-Lingual_Ads_From_Diverse_Sources_WACV_2024_paper.pdf)

* [InfraParis: A Multi-Modal and Multi-Task Autonomous Driving Dataset](http://arxiv.org/abs/2309.15751)

* [ZRG: A Dataset for Multimodal 3D Residential Rooftop Understanding](http://arxiv.org/abs/2304.13219)

* Benchmarks

  * [ISAR: A Benchmark for Single- and Few-Shot Object Instance Segmentation and Re-Identification](http://arxiv.org/abs/2311.02734v1)

  * [ConeQuest: A Benchmark for Cone Segmentation on Mars](http://arxiv.org/abs/2311.08657v1)
:star:[code](https://github.com/kerner-lab/ConeQuest)

  * [dacl10k: Benchmark for Semantic Bridge Damage Segmentation](https://openaccess.thecvf.com/content/WACV2024/papers/Flotzinger_dacl10k_Benchmark_for_Semantic_Bridge_Damage_Segmentation_WACV_2024_paper.pdf)

  * [IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather](https://arxiv.org/abs/2311.14459)

  * [A Multimodal Benchmark and Improved Architecture for Zero Shot Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Doshi_A_Multimodal_Benchmark_and_Improved_Architecture_for_Zero_Shot_Learning_WACV_2024_paper.pdf)

  * [RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-Centric Learning](http://arxiv.org/abs/2308.14899)



## 49.Vision Transformers

* [Grafting Vision Transformers](http://arxiv.org/abs/2210.15943)

* [Efficient MAE Towards Large-Scale Vision Transformers](https://openaccess.thecvf.com/content/WACV2024/papers/Han_Efficient_MAE_Towards_Large-Scale_Vision_Transformers_WACV_2024_paper.pdf)

* [SimA: Simple Softmax-Free Attention for Vision Transformers](http://arxiv.org/abs/2206.08898)

* [Open-NeRF: Towards Open Vocabulary NeRF Decomposition](http://arxiv.org/abs/2310.16383v1)

* [Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders](https://arxiv.org/abs/2310.20704)

* [Triplet Attention Transformer for Spatiotemporal Predictive Learning](http://arxiv.org/abs/2310.18698)

* [Query-Guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch](http://arxiv.org/abs/2303.08784)

* [GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation](http://arxiv.org/abs/2311.03035v1)
:star:[code](https://github.com/Ackesnal/GTP-ViT)

* [Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective](http://arxiv.org/abs/2208.09602)

* [SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers](http://arxiv.org/abs/2311.03747v1)
:star:[code](https://github.com/xyongLu/SBCFormer)

* [Robust Eye Blink Detection Using Dual Embedding Video Vision Transformer](https://openaccess.thecvf.com/content/WACV2024/papers/Hong_Robust_Eye_Blink_Detection_Using_Dual_Embedding_Video_Vision_Transformer_WACV_2024_paper.pdf)

* [Semantic Labels-Aware Transformer Model for Searching Over a Large Collection of Lecture-Slides](https://openaccess.thecvf.com/content/WACV2024/papers/Jobin_Semantic_Labels-Aware_Transformer_Model_for_Searching_Over_a_Large_Collection_WACV_2024_paper.pdf)



## 48.Image/Video Editing

* [Unified Concept Editing in Diffusion Models](http://arxiv.org/abs/2308.14761)

* [Iterative Multi-Granular Image Editing Using Diffusion Models](http://arxiv.org/abs/2309.00613)

* [Discovering and Mitigating Biases in CLIP-Based Image Editing](https://openaccess.thecvf.com/content/WACV2024/papers/Tanjim_Discovering_and_Mitigating_Biases_in_CLIP-Based_Image_Editing_WACV_2024_paper.pdf)

* [Revisiting Latent Space of GAN Inversion for Robust Real Image Editing](https://openaccess.thecvf.com/content/WACV2024/papers/Katsumata_Revisiting_Latent_Space_of_GAN_Inversion_for_Robust_Real_Image_WACV_2024_paper.pdf)

* [ProxEdit: Improving Tuning-Free Real Image Editing With Proximal Guidance](https://openaccess.thecvf.com/content/WACV2024/papers/Han_ProxEdit_Improving_Tuning-Free_Real_Image_Editing_With_Proximal_Guidance_WACV_2024_paper.pdf)

* Image stitching

  * [Learning Residual Elastic Warps for Image Stitching Under Dirichlet Boundary Condition](http://arxiv.org/abs/2309.01406)

  * [Implicit Neural Image Stitching With Enhanced and Blended Feature Reconstruction](http://arxiv.org/abs/2309.01409)

* Video editing

  * [Real Time GAZED: Online Shot Selection and Editing of Virtual Cameras From Wide-Angle Monocular Video Recordings](http://arxiv.org/abs/2311.15581)

* Text-to-image editing

  * [Text-to-Image Editing by Image Information Removal](http://arxiv.org/abs/2305.17489)

* 3D scene editing

  * [NeRFEditor: Differentiable Style Decomposition for 3D Scene Editing](https://openaccess.thecvf.com/content/WACV2024/papers/Sun_NeRFEditor_Differentiable_Style_Decomposition_for_3D_Scene_Editing_WACV_2024_paper.pdf)



## 47.Edge Detection

* [Self-Supervised Edge Detection Reconstruction for Topology-Informed 3D Axon Segmentation and Centerline Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Xu_Self-Supervised_Edge_Detection_Reconstruction_for_Topology-Informed_3D_Axon_Segmentation_and_WACV_2024_paper.pdf)



## 46.Dense Prediction

* [PolyMaX: General Dense Prediction with Mask Transformer](http://arxiv.org/abs/2311.05770v1)

* [Convolutional Masked Image Modeling for Dense Prediction Tasks on Pathology Images](https://openaccess.thecvf.com/content/WACV2024/papers/Yang_Convolutional_Masked_Image_Modeling_for_Dense_Prediction_Tasks_on_Pathology_WACV_2024_paper.pdf)



## 45.Visual Tampering Detection

* Parcel tampering detection

  * [TAMPAR: Visual Tampering Detection for Parcel Logistics in Postal Supply Chains](http://arxiv.org/abs/2311.03124v1)
:star:[code](https://a-nau.github.io/tampar)

* Video forgery detection

  * [VideoFACT: Detecting Video Forgeries Using Attention, Scene Context, and Forensic Traces](https://arxiv.org/abs/2211.15775)

* Deepfakes 

  * [D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles](http://arxiv.org/abs/2202.05687)

  * [Weakly-supervised deepfake localization in diffusion-generated images](http://arxiv.org/abs/2311.04584v1)

  * [How Do Deepfakes Move? Motion Magnification for Deepfake Source Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Demir_How_Do_Deepfakes_Move_Motion_Magnification_for_Deepfake_Source_Detection_WACV_2024_paper.pdf)

  * [Improving Fairness in Deepfake Detection](http://arxiv.org/abs/2306.16635)



## 44.Visual Industrial Inspection

* [ReConPatch: Contrastive Patch Representation Learning for Industrial Anomaly Detection](http://arxiv.org/abs/2305.16713)

* [High-Fidelity Zero-Shot Texture Anomaly Localization Using Feature Correspondence Analysis](http://arxiv.org/abs/2304.06433)

* Image anomaly detection

  * [Attention Modules Improve Image-Level Anomaly Detection for Industrial Inspection: A DifferNet Case Study](https://openaccess.thecvf.com/content/WACV2024/papers/Vieira_e_Silva_Attention_Modules_Improve_Image-Level_Anomaly_Detection_for_Industrial_Inspection_A_WACV_2024_paper.pdf)

  * [Contextual Affinity Distillation for Image Anomaly Detection](http://arxiv.org/abs/2307.03101)

* Surface anomaly detection

  * [Cheating Depth: Enhancing 3D Surface Anomaly Detection via Depth Simulation](http://arxiv.org/abs/2311.01117v1)

* Image anomaly localization

  * [Learning Transferable Representations for Image Anomaly Localization Using Dense Pretraining](https://openaccess.thecvf.com/content/WACV2024/papers/He_Learning_Transferable_Representations_for_Image_Anomaly_Localization_Using_Dense_Pretraining_WACV_2024_paper.pdf)

* Visual anomaly detection

  * [EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies](https://openaccess.thecvf.com/content/WACV2024/papers/Batzner_EfficientAD_Accurate_Visual_Anomaly_Detection_at_Millisecond-Level_Latencies_WACV_2024_paper.pdf)

* Zero-shot anomaly detection

  * [PromptAD: Zero-Shot Anomaly Detection Using Text Prompts](https://openaccess.thecvf.com/content/WACV2024/papers/Li_PromptAD_Zero-Shot_Anomaly_Detection_Using_Text_Prompts_WACV_2024_paper.pdf)

* Trajectory anomaly detection

  * [Holistic Representation Learning for Multitask Trajectory Anomaly Detection](http://arxiv.org/abs/2311.01851)

* Human behavior understanding

  * [ENIGMA-51: Towards a Fine-Grained Understanding of Human Behavior in Industrial Scenarios](https://openaccess.thecvf.com/content/WACV2024/papers/Ragusa_ENIGMA-51_Towards_a_Fine-Grained_Understanding_of_Human_Behavior_in_Industrial_WACV_2024_paper.pdf)

* OOD

  * [HyperMix: Out-of-Distribution Detection and Classification in Few-Shot Settings](https://openaccess.thecvf.com/content/WACV2024/papers/Mehta_HyperMix_Out-of-Distribution_Detection_and_Classification_in_Few-Shot_Settings_WACV_2024_paper.pdf)

  * [Out-of-Distribution Detection With Logical Reasoning](https://openaccess.thecvf.com/content/WACV2024/papers/Kirchheim_Out-of-Distribution_Detection_With_Logical_Reasoning_WACV_2024_paper.pdf)

  * [ATS: Adaptive Temperature Scaling for Enhancing Out-of-Distribution Detection Methods](https://openaccess.thecvf.com/content/WACV2024/papers/Krumpl_ATS_Adaptive_Temperature_Scaling_for_Enhancing_Out-of-Distribution_Detection_Methods_WACV_2024_paper.pdf)



## 43.Image Fusion

* [Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion](http://arxiv.org/abs/2311.01886v1)
:star:[code](https://github.com/ixilai/MFIF-MMIF)



## 42.Image Classification

* [Semantic Generative Augmentations for Few-Shot Counting](https://arxiv.org/abs/2311.16122)

* [Learning Quality Labels for Robust Image Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Wang_Learning_Quality_Labels_for_Robust_Image_Classification_WACV_2024_paper.pdf)

* [Visual Narratives: Large-Scale Hierarchical Classification of Art-Historical Images](https://openaccess.thecvf.com/content/WACV2024/papers/Springstein_Visual_Narratives_Large-Scale_Hierarchical_Classification_of_Art-Historical_Images_WACV_2024_paper.pdf)

* [Benchmark Generation Framework With Customizable Distortions for Image Classifier Robustness](http://arxiv.org/abs/2310.18626)

* [Deep Subdomain Alignment for Cross-Domain Image Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Zhao_Deep_Subdomain_Alignment_for_Cross-Domain_Image_Classification_WACV_2024_paper.pdf)

* [Online Class-Incremental Learning for Real-World Food Image Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Raghavan_Online_Class-Incremental_Learning_for_Real-World_Food_Image_Classification_WACV_2024_paper.pdf)

* [An Empirical Investigation Into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification](http://arxiv.org/abs/2311.14859)

* [Letting 3D Guide the Way: 3D Guided 2D Few-Shot Image Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Chen_Letting_3D_Guide_the_Way_3D_Guided_2D_Few-Shot_Image_WACV_2024_paper.pdf)

* Long-tailed visual recognition

  * [Semantic Transfer From Head to Tail: Enlarging Tail Margin for Long-Tailed Visual Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Semantic_Transfer_From_Head_to_Tail_Enlarging_Tail_Margin_for_WACV_2024_paper.pdf)

* Multi-label image classification

  * [Discriminator-Free Unsupervised Domain Adaptation for Multi-Label Image Classification](http://arxiv.org/abs/2301.10611)

  * [Active Batch Sampling for Multi-Label Classification With Binary User Feedback](https://openaccess.thecvf.com/content/WACV2024/papers/Goswami_Active_Batch_Sampling_for_Multi-Label_Classification_With_Binary_User_Feedback_WACV_2024_paper.pdf)

* Few-shot classification

  * [Domain Aligned CLIP for Few-shot Classification](http://arxiv.org/abs/2311.09191v1)

  * [HELA-VFA: A Hellinger Distance-Attention-Based Feature Aggregation Network for Few-Shot Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_HELA-VFA_A_Hellinger_Distance-Attention-Based_Feature_Aggregation_Network_for_Few-Shot_Classification_WACV_2024_paper.pdf)

* Multi-view classification

  * [Multi-View Classification Using Hybrid Fusion and Mutual Distillation](https://openaccess.thecvf.com/content/WACV2024/papers/Black_Multi-View_Classification_Using_Hybrid_Fusion_and_Mutual_Distillation_WACV_2024_paper.pdf)

* Seagrass classification

  * [Image Labels Are All You Need for Coarse Seagrass Segmentation](http://arxiv.org/abs/2303.00973)

* Fine-grained recognition

  * [Elusive Images: Beyond Coarse Analysis for Fine-Grained Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Anderson_Elusive_Images_Beyond_Coarse_Analysis_for_Fine-Grained_Recognition_WACV_2024_paper.pdf)

* Bird species classification

  * [BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping](http://arxiv.org/abs/2310.19168)



## 41.Image Processing (Low-Level Image Processing, Quality Assessment)

* Image restoration

  * [Conditional Velocity Score Estimation for Image Restoration](https://openaccess.thecvf.com/content/WACV2024/papers/Shi_Conditional_Velocity_Score_Estimation_for_Image_Restoration_WACV_2024_paper.pdf)

  * [UGPNet: Universal Generative Prior for Image Restoration](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_UGPNet_Universal_Generative_Prior_for_Image_Restoration_WACV_2024_paper.pdf)

  * [PAIR: Perception Aided Image Restoration for Natural Driving Conditions](https://openaccess.thecvf.com/content/WACV2024/papers/Shyam_PAIR_Perception_Aided_Image_Restoration_for_Natural_Driving_Conditions_WACV_2024_paper.pdf)

  * [LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration](http://arxiv.org/abs/2308.14596)

  * [Efficient Layout-Guided Image Inpainting for Mobile Use](https://openaccess.thecvf.com/content/WACV2024/papers/Li_Efficient_Layout-Guided_Image_Inpainting_for_Mobile_Use_WACV_2024_paper.pdf)

* Image inpainting

  * [GraphFill: Deep Image Inpainting Using Graphs](https://openaccess.thecvf.com/content/WACV2024/papers/Verma_GraphFill_Deep_Image_Inpainting_Using_Graphs_WACV_2024_paper.pdf)

  * [LatentPaint: Image Inpainting in Latent Space With Diffusion Models](https://openaccess.thecvf.com/content/WACV2024/papers/Corneanu_LatentPaint_Image_Inpainting_in_Latent_Space_With_Diffusion_Models_WACV_2024_paper.pdf)

* Image correction

  * [4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters](http://arxiv.org/abs/2311.08759v1)
:star:[code](https://github.com/Zhou-Yijie/MSLTNet)

* Image enhancement

  * Underwater image enhancement

    * [PhISH-Net: Physics Inspired System for High Resolution Underwater Image Enhancement](https://openaccess.thecvf.com/content/WACV2024/papers/Chandrasekar_PhISH-Net_Physics_Inspired_System_for_High_Resolution_Underwater_Image_Enhancement_WACV_2024_paper.pdf)

    * [Spectroformer: Multi-Domain Query Cascaded Transformer Network for Underwater Image Enhancement](https://openaccess.thecvf.com/content/WACV2024/papers/Khan_Spectroformer_Multi-Domain_Query_Cascaded_Transformer_Network_for_Underwater_Image_Enhancement_WACV_2024_paper.pdf)

* Image denoising

  * [Self-Supervised Denoising Transformer With Gaussian Process](https://openaccess.thecvf.com/content/WACV2024/papers/Yasarla_Self-Supervised_Denoising_Transformer_With_Gaussian_Process_WACV_2024_paper.pdf)

  * [Spiking Denoising Diffusion Probabilistic Models](http://arxiv.org/abs/2306.17046)

  * [Image Denoising and the Generative Accumulation of Photons](http://arxiv.org/abs/2307.06607)

  * [Fixed Pattern Noise Removal for Multi-View Single-Sensor Infrared Camera](https://openaccess.thecvf.com/content/WACV2024/papers/Barral_Fixed_Pattern_Noise_Removal_for_Multi-View_Single-Sensor_Infrared_Camera_WACV_2024_paper.pdf)

  * [LIVENet: A Novel Network for Real-World Low-Light Image Denoising and Enhancement](https://openaccess.thecvf.com/content/WACV2024/papers/Makwana_LIVENet_A_Novel_Network_for_Real-World_Low-Light_Image_Denoising_and_WACV_2024_paper.pdf)

* Image Dehazing

  * [C2AIR: Consolidated Compact Aerial Image Haze Removal](https://openaccess.thecvf.com/content/WACV2024/papers/Kulkarni_C2AIR_Consolidated_Compact_Aerial_Image_Haze_Removal_WACV_2024_paper.pdf)

* Image Glint Removal

  * [Revolutionize the Oceanic Drone RGB Imagery With Pioneering Sun Glint Detection and Removal Techniques](https://openaccess.thecvf.com/content/WACV2024/papers/Qin_Revolutionize_the_Oceanic_Drone_RGB_Imagery_With_Pioneering_Sun_Glint_WACV_2024_paper.pdf)

* Image Reflection Removal

  * [Fully-Automatic Reflection Removal for 360-Degree Images](https://openaccess.thecvf.com/content/WACV2024/papers/Park_Fully-Automatic_Reflection_Removal_for_360-Degree_Images_WACV_2024_paper.pdf)

* Image Deblurring

  * [Sharp-NeRF: Grid-Based Fast Deblurring Neural Radiance Fields Using Sharpness Prior](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_Sharp-NeRF_Grid-Based_Fast_Deblurring_Neural_Radiance_Fields_Using_Sharpness_Prior_WACV_2024_paper.pdf)

  * [Deep Plug-and-Play Nighttime Non-Blind Deblurring With Saturated Pixel Handling Schemes](https://openaccess.thecvf.com/content/WACV2024/papers/Shu_Deep_Plug-and-Play_Nighttime_Non-Blind_Deblurring_With_Saturated_Pixel_Handling_Schemes_WACV_2024_paper.pdf)

  * [Deblur-NSFF: Neural Scene Flow Fields for Blurry Dynamic Scenes](https://openaccess.thecvf.com/content/WACV2024/papers/Luthra_Deblur-NSFF_Neural_Scene_Flow_Fields_for_Blurry_Dynamic_Scenes_WACV_2024_paper.pdf)

  * [Single-Image Deblurring, Trajectory and Shape Recovery of Fast Moving Objects With Denoising Diffusion Probabilistic Models](https://openaccess.thecvf.com/content/WACV2024/papers/Spetlik_Single-Image_Deblurring_Trajectory_and_Shape_Recovery_of_Fast_Moving_Objects_WACV_2024_paper.pdf)

* Image Shadow Removal

  * [Latent Feature-Guided Diffusion Models for Shadow Removal](http://arxiv.org/abs/2312.02156)

* Image Quality Assessment

  * [ARNIQA: Learning Distortion Manifold for Image Quality Assessment](http://arxiv.org/abs/2310.14918)

  * [Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment](https://arxiv.org/abs/2312.04838)

  * [Opinion Unaware Image Quality Assessment via Adversarial Convolutional Variational Autoencoder](https://openaccess.thecvf.com/content/WACV2024/papers/Shukla_Opinion_Unaware_Image_Quality_Assessment_via_Adversarial_Convolutional_Variational_Autoencoder_WACV_2024_paper.pdf)

* Image Color Editing

  * [Content-Aware Image Color Editing With Auxiliary Color Restoration Tasks](https://openaccess.thecvf.com/content/WACV2024/papers/Ren_Content-Aware_Image_Color_Editing_With_Auxiliary_Color_Restoration_Tasks_WACV_2024_paper.pdf)

  * [Real-Time User-Guided Adaptive Colorization With Vision Transformer](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_Real-Time_User-Guided_Adaptive_Colorization_With_Vision_Transformer_WACV_2024_paper.pdf)

  * Recolorization

    * [Latent-Guided Exemplar-Based Image Re-Colorization](https://openaccess.thecvf.com/content/WACV2024/papers/Yang_Latent-Guided_Exemplar-Based_Image_Re-Colorization_WACV_2024_paper.pdf)



## 40.Self/Semi-supervised learning

* Unsupervised Learning

  * [United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure Learning from Videos](http://arxiv.org/abs/2311.03550v1)

  * [FELGA: Unsupervised Fragment Embedding for Fine-Grained Cross-Modal Association](https://openaccess.thecvf.com/content/WACV2024/papers/Zhuo_FELGA_Unsupervised_Fragment_Embedding_for_Fine-Grained_Cross-Modal_Association_WACV_2024_paper.pdf)

* Semi-Supervised Learning

  * [SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning](https://arxiv.org/abs/2310.15787)
:star:[code](https://github.com/beandkay/SequenceMatch)

  * [Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector](https://arxiv.org/abs/2310.15764)
:star:[code](https://github.com/beandkay/EPASS)

  * [Universal Semi-Supervised Model Adaptation via Collaborative Consistency Training](http://arxiv.org/abs/2307.03449)

  * [Improving Open-Set Semi-Supervised Learning With Self-Supervision](http://arxiv.org/abs/2301.10127)

  * [Appearance-Based Curriculum for Semi-Supervised Learning With Multi-Angle Unlabeled Data](https://openaccess.thecvf.com/content/WACV2024/papers/Tanaka_Appearance-Based_Curriculum_for_Semi-Supervised_Learning_With_Multi-Angle_Unlabeled_Data_WACV_2024_paper.pdf)

* Self-Supervised Learning

  * [Self-Supervised Learning of Semantic Correspondence Using Web Videos](https://openaccess.thecvf.com/content/WACV2024/papers/Kwon_Self-Supervised_Learning_of_Semantic_Correspondence_Using_Web_Videos_WACV_2024_paper.pdf)

  * [CycleCL: Self-supervised Learning for Periodic Videos](http://arxiv.org/abs/2311.03402v1)

  * [Self-Supervised Representation Learning With Cross-Context Learning Between Global and Hypercolumn Features](http://arxiv.org/abs/2308.13392)

  * [Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction](http://arxiv.org/abs/2311.04834v1)
:star:[code](https://github.com/deeplab-ai/SelfSupervisedVRD)

  * [Self-Supervised Learning for Place Representation Generalization Across Appearance Changes](https://openaccess.thecvf.com/content/WACV2024/papers/Musallam_Self-Supervised_Learning_for_Place_Representation_Generalization_Across_Appearance_Changes_WACV_2024_paper.pdf)

  * [Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where](http://arxiv.org/abs/2309.12757)

  * [MGM-AE: Self-Supervised Learning on 3D Shape Using Mesh Graph Masked Autoencoders](https://openaccess.thecvf.com/content/WACV2024/papers/Yang_MGM-AE_Self-Supervised_Learning_on_3D_Shape_Using_Mesh_Graph_Masked_WACV_2024_paper.pdf)



## 39.Few/Zero-Shot Learning/Domain Generalization/Adaptation

* Zero-Shot Learning

  * [GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning](http://arxiv.org/abs/2311.05729v1)

  * [Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning](http://arxiv.org/abs/2312.01167v1)

  * [CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning](http://arxiv.org/abs/2305.16681)

* Few-Shot Learning

  * [Adaptive Manifold for Imbalanced Transductive Few-Shot Learning](http://arxiv.org/abs/2304.14281)

  * [Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin](https://arxiv.org/abs/2309.10013)

* Domain Generalization (DG)

  * [Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization](http://arxiv.org/abs/2311.02599v1)

  * [On the Fly Neural Style Smoothing for Risk-Averse Domain Generalization](http://arxiv.org/abs/2307.08551)

  * [Domain Generalization With Correlated Style Uncertainty](http://arxiv.org/abs/2212.09950)

  * [Randomized Adversarial Style Perturbations for Domain Generalization](http://arxiv.org/abs/2304.01959)

  * [Domain Generalisation via Risk Distribution Matching](http://arxiv.org/abs/2310.18598)

  * [Domain Generalization by Rejecting Extreme Augmentations](https://openaccess.thecvf.com/content/WACV2024/papers/Aminbeidokhti_Domain_Generalization_by_Rejecting_Extreme_Augmentations_WACV_2024_paper.pdf)

  * [Single Domain Generalization via Normalised Cross-Correlation Based Convolutions](http://arxiv.org/abs/2307.05901)

  * [STYLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-Based Domain Generalization](http://arxiv.org/abs/2302.09251)

* Domain Adaptation (DA)

  * [Gradual Source Domain Expansion for Unsupervised Domain Adaptation](http://arxiv.org/abs/2311.09599v1)

  * [Continual Test-Time Domain Adaptation via Dynamic Sample Selection](http://arxiv.org/abs/2301.10611)

  * [Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervised Domain Adaptation](https://arxiv.org/abs/2311.12623)
:star:[code](https://github.com/cfredinh/coda)

  * [GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap](https://arxiv.org/abs/2311.12467)
:star:[code](https://github.com/KHU-VLL/GLAD)

  * [Aligning Non-Causal Factors for Transformer-Based Source-Free Domain Adaptation](https://arxiv.org/abs/2311.16294)
:house:[project](https://val.cds.iisc.ac.in/C-SFTrans/)

  * [Robust Unsupervised Domain Adaptation Through Negative-View Regularization](https://openaccess.thecvf.com/content/WACV2024/papers/Jang_Robust_Unsupervised_Domain_Adaptation_Through_Negative-View_Regularization_WACV_2024_paper.pdf)

  * [ReCLIP: Refine Contrastive Language Image Pre-Training With Source Free Domain Adaptation](http://arxiv.org/abs/2308.03793)

  * [Stochastic Binary Network for Universal Domain Adaptation](https://openaccess.thecvf.com/content/WACV2024/papers/Jain_Stochastic_Binary_Network_for_Universal_Domain_Adaptation_WACV_2024_paper.pdf)

  * [D3GU: Multi-Target Active Domain Adaptation via Enhancing Domain Alignment](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_D3GU_Multi-Target_Active_Domain_Adaptation_via_Enhancing_Domain_Alignment_WACV_2024_paper.pdf)

  * [Feed-Forward Latent Domain Adaptation](https://openaccess.thecvf.com/content/WACV2024/papers/Bohdal_Feed-Forward_Latent_Domain_Adaptation_WACV_2024_paper.pdf)



## 38.Visual Representation Learning

* [Group-Wise Contrastive Bottleneck for Weakly-Supervised Visual Representation Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Yap_Group-Wise_Contrastive_Bottleneck_for_Weakly-Supervised_Visual_Representation_Learning_WACV_2024_paper.pdf)



## 37.Machine Learning

* Meta-Learning

  * [SigmML: Metric Meta-Learning for Writer Independent Offline Signature Verification in the Space of SPD Matrices](https://openaccess.thecvf.com/content/WACV2024/papers/Giazitzis_SigmML_Metric_Meta-Learning_for_Writer_Independent_Offline_Signature_Verification_in_WACV_2024_paper.pdf)

* Continual Learning/Incremental Learning

  * [MoP-CLIP: A Mixture of Prompt-Tuned CLIP Models for Domain Incremental Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Nicolas_MoP-CLIP_A_Mixture_of_Prompt-Tuned_CLIP_Models_for_Domain_Incremental_WACV_2024_paper.pdf)

  * [Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning](http://arxiv.org/abs/2312.01188v1)

  * Class-Incremental Learning

    * [Expanding Hyperspherical Space for Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Deng_Expanding_Hyperspherical_Space_for_Few-Shot_Class-Incremental_Learning_WACV_2024_paper.pdf)

    * [Overcoming Catastrophic Forgetting for Multi-Label Class-Incremental Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Song_Overcoming_Catastrophic_Forgetting_for_Multi-Label_Class-Incremental_Learning_WACV_2024_paper.pdf)

    * [An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning](http://arxiv.org/abs/2308.11677)

    * [Wakening Past Concepts without Past Data: Class-Incremental Learning from Online Placebos](http://arxiv.org/abs/2310.16115v1)
:star:[code](https://github.com/yaoyao-liu/online-placebos)

    * [Robust Feature Learning and Global Variance-Driven Classifier Alignment for Long-Tail Class Incremental Learning](http://arxiv.org/abs/2311.01227v1)

    * [TCP: Triplet Contrastive-Relationship Preserving for Class-Incremental Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Li_TCP_Triplet_Contrastive-Relationship_Preserving_for_Class-Incremental_Learning_WACV_2024_paper.pdf)

    * [MICS: Midpoint Interpolation To Learn Compact and Separated Representations for Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Kim_MICS_Midpoint_Interpolation_To_Learn_Compact_and_Separated_Representations_for_WACV_2024_paper.pdf)

  * Continual Learning (CL)

    * [Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning](https://arxiv.org/abs/2309.06086)

    * [Kaizen: Practical Self-Supervised Continual Learning With Continual Fine-Tuning](https://openaccess.thecvf.com/content/WACV2024/papers/Tang_Kaizen_Practical_Self-Supervised_Continual_Learning_With_Continual_Fine-Tuning_WACV_2024_paper.pdf)

    * [Evolve: Enhancing Unsupervised Continual Learning With Multiple Experts](https://openaccess.thecvf.com/content/WACV2024/papers/Yu_Evolve_Enhancing_Unsupervised_Continual_Learning_With_Multiple_Experts_WACV_2024_paper.pdf)

    * [Steering Prototypes With Prompt-Tuning for Rehearsal-Free Continual Learning](http://arxiv.org/abs/2303.09447)

* Metric Learning

  * [ProcSim: Proxy-based Confidence for Robust Similarity Learning](http://arxiv.org/abs/2311.00668v1)

  * [Deep Metric Learning With Chance Constraints](https://openaccess.thecvf.com/content/WACV2024/papers/Gurbuz_Deep_Metric_Learning_With_Chance_Constraints_WACV_2024_paper.pdf)

  * [Understanding Hyperbolic Metric Learning Through Hard Negative Sampling](https://openaccess.thecvf.com/content/WACV2024/papers/Yue_Understanding_Hyperbolic_Metric_Learning_Through_Hard_Negative_Sampling_WACV_2024_paper.pdf)

* Adversarial Learning

  * [Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection](http://arxiv.org/abs/2311.04588v1)

  * Adversarial Attacks

    * [Hard-Label Based Small Query Black-Box Adversarial Attack](https://openaccess.thecvf.com/content/WACV2024/papers/Park_Hard-Label_Based_Small_Query_Black-Box_Adversarial_Attack_WACV_2024_paper.pdf) 

  * Backdoor Attacks

    * [A Closer Look at Robustness of Vision Transformers to Backdoor Attacks](https://openaccess.thecvf.com/content/WACV2024/papers/Subramanya_A_Closer_Look_at_Robustness_of_Vision_Transformers_to_Backdoor_WACV_2024_paper.pdf)

* Active Learning

  * [Training Ensembles With Inliers and Outliers for Semi-Supervised Active Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Stojnic_Training_Ensembles_With_Inliers_and_Outliers_for_Semi-Supervised_Active_Learning_WACV_2024_paper.pdf)

  * [Active Learning With Task Consistency and Diversity in Multi-Task Networks](https://openaccess.thecvf.com/content/WACV2024/papers/Hekimoglu_Active_Learning_With_Task_Consistency_and_Diversity_in_Multi-Task_Networks_WACV_2024_paper.pdf)

  * [Critical Gap Between Generalization Error and Empirical Error in Active Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Kanebako_Critical_Gap_Between_Generalization_Error_and_Empirical_Error_in_Active_WACV_2024_paper.pdf)

* Federated Learning

  * [Gradient Coreset for Federated Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Sivasubramanian_Gradient_Coreset_for_Federated_Learning_WACV_2024_paper.pdf)

  * [Late to the Party? On-Demand Unlabeled Personalized Federated Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Amosy_Late_to_the_Party_On-Demand_Unlabeled_Personalized_Federated_Learning_WACV_2024_paper.pdf)

  * [MetaVers: Meta-Learned Versatile Representations for Personalized Federated Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Lim_MetaVers_Meta-Learned_Versatile_Representations_for_Personalized_Federated_Learning_WACV_2024_paper.pdf)

  * [Maximum Knowledge Orthogonality Reconstruction With Gradients in Federated Learning](http://arxiv.org/abs/2310.19222)

  * [Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Yashwanth_Minimizing_Layerwise_Activation_Norm_Improves_Generalization_in_Federated_Learning_WACV_2024_paper.pdf)

  * [TransFed: A Way To Epitomize Focal Modulation Using Transformer-Based Federated Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Ashraf_TransFed_A_Way_To_Epitomize_Focal_Modulation_Using_Transformer-Based_Federated_WACV_2024_paper.pdf)

  * [Mixing Gradients in Neural Networks as a Strategy To Enhance Privacy in Federated Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Eloul_Mixing_Gradients_in_Neural_Networks_as_a_Strategy_To_Enhance_WACV_2024_paper.pdf)

* Contrastive Learning

  * [Activity-Based Early Autism Diagnosis Using a Multi-Dataset Supervised Contrastive Learning Approach](https://openaccess.thecvf.com/content/WACV2024/papers/Rani_Activity-Based_Early_Autism_Diagnosis_Using_a_Multi-Dataset_Supervised_Contrastive_Learning_WACV_2024_paper.pdf)

  * [Distortion-Disentangled Contrastive Learning](http://arxiv.org/abs/2303.05066)

  * [OOD Aware Supervised Contrastive Learning](http://arxiv.org/abs/2310.01942)

* Reinforcement Learning

  * [CryoRL: Reinforcement Learning Enables Efficient Cryo-EM Data Collection](http://arxiv.org/abs/2204.07543)

* Transfer Learning

  * [DR10K: Transfer Learning Using Weak Labels for Grading Diabetic Retinopathy on DR10K Dataset](https://openaccess.thecvf.com/content/WACV2024/papers/ElHabebe_DR10K_Transfer_Learning_Using_Weak_Labels_for_Grading_Diabetic_Retinopathy_WACV_2024_paper.pdf)

* Multi-Task Learning

  * [BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements](http://arxiv.org/abs/2303.11573)



## 36.NLP

* [Few-Shot Event Classification in Images Using Knowledge Graphs for Prompting](https://openaccess.thecvf.com/content/WACV2024/papers/Tahmasebzadeh_Few-Shot_Event_Classification_in_Images_Using_Knowledge_Graphs_for_Prompting_WACV_2024_paper.pdf)



## 35.Model Compression/Knowledge Distillation/Pruning

* [Wino Vidi Vici: Conquering Numerical Instability of 8-Bit Winograd Convolution for Accurate Inference Acceleration on Edge](https://openaccess.thecvf.com/content/WACV2024/papers/Mori_Wino_Vidi_Vici_Conquering_Numerical_Instability_of_8-Bit_Winograd_Convolution_WACV_2024_paper.pdf)

* Quantization

  * [Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks](http://arxiv.org/abs/2311.05109v1)

  * [Improved Techniques for Quantizing Deep Networks With Adaptive Bit-Widths](http://arxiv.org/abs/2103.01435)

  * [Evidential Uncertainty Quantification: A Variance-Based Perspective](https://arxiv.org/abs/2311.11367)

  * [Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks](http://arxiv.org/abs/2206.07741)

* Pruning

  * [Token Fusion: Bridging the Gap between Token Pruning and Token Merging](http://arxiv.org/abs/2312.01026v1)

  * [Torque Based Structured Pruning for Deep Neural Network](https://openaccess.thecvf.com/content/WACV2024/papers/Gupta_Torque_Based_Structured_Pruning_for_Deep_Neural_Network_WACV_2024_paper.pdf)

  * [Pruning From Scratch via Shared Pruning Module and Nuclear Norm-Based Regularization](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_Pruning_From_Scratch_via_Shared_Pruning_Module_and_Nuclear_Norm-Based_WACV_2024_paper.pdf)

  * [Towards Better Structured Pruning Saliency by Reorganizing Convolution](https://openaccess.thecvf.com/content/WACV2024/papers/Sun_Towards_Better_Structured_Pruning_Saliency_by_Reorganizing_Convolution_WACV_2024_paper.pdf)

  * [PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks](http://arxiv.org/abs/2307.10981)

* Knowledge Distillation (KD)

  * [Frequency Attention for Knowledge Distillation](https://openaccess.thecvf.com/content/WACV2024/papers/Pham_Frequency_Attention_for_Knowledge_Distillation_WACV_2024_paper.pdf)

  * [Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-Free Continual Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Szatkowski_Adapt_Your_Teacher_Improving_Knowledge_Distillation_for_Exemplar-Free_Continual_Learning_WACV_2024_paper.pdf)

  * [Towards Domain-Aware Knowledge Distillation for Continual Model Generalization](https://openaccess.thecvf.com/content/WACV2024/papers/Reddy_Towards_Domain-Aware_Knowledge_Distillation_for_Continual_Model_Generalization_WACV_2024_paper.pdf)

  * [Reverse Knowledge Distillation: Training a Large Model Using a Small One for Retinal Image Matching on Limited Data](http://arxiv.org/abs/2307.10698)



## 34.NAS

* [FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer](http://arxiv.org/abs/2311.03912v1)
:star:[code](https://github.com/shadowpa0327/FLORA)

* [Hardware Aware Evolutionary Neural Architecture Search Using Representation Similarity Metric](http://arxiv.org/abs/2311.03923)



## 33.Optical Flow Estimation

* [Detection Defenses: An Empty Promise against Adversarial Patch Attacks on Optical Flow](http://arxiv.org/abs/2310.17403v1)
:star:[code](https://github.com/cv-stuttgart/DetectionDefenses)

* [CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine Context-Guided Motion Reasoning](http://arxiv.org/abs/2311.02661v1)
:star:[code](https://github.com/cv-stuttgart)



## 32.Scene Flow Estimation

* [OptFlow: Fast Optimization-Based Scene Flow Estimation Without Supervision](https://openaccess.thecvf.com/content/WACV2024/papers/Ahuja_OptFlow_Fast_Optimization-Based_Scene_Flow_Estimation_Without_Supervision_WACV_2024_paper.pdf)



## 31.Automated Driving

* Lane Detection

  * [CLRerNet: Improving Confidence of Lane Detection With LaneIoU](http://arxiv.org/abs/2305.08366)

* Autonomous Driving

  * [Re-Evaluating LiDAR Scene Flow for Autonomous Driving](https://arxiv.org/abs/2304.02150)

  * [NVAutoNet: Fast and Accurate 360deg 3D Visual Perception for Self Driving](https://openaccess.thecvf.com/content/WACV2024/papers/Pham_NVAutoNet_Fast_and_Accurate_360deg_3D_Visual_Perception_for_Self_WACV_2024_paper.pdf)

  * [Driving Through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving](http://arxiv.org/abs/2310.16639)

  * [StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction](https://arxiv.org/abs/2308.12570)
:star:[code](https://github.com/yuantianyuan01/StreamMapNet)

* Driver Impairment Assessment

  * [Estimating Blood Alcohol Level Through Facial Features for Driver Impairment Assessment](https://openaccess.thecvf.com/content/WACV2024/papers/Keshtkaran_Estimating_Blood_Alcohol_Level_Through_Facial_Features_for_Driver_Impairment_WACV_2024_paper.pdf)

* Traffic Sign Detection

  * [Natural Light Can Also Be Dangerous: Traffic Sign Misinterpretation Under Adversarial Natural Light Attacks](https://openaccess.thecvf.com/content/WACV2024/papers/Hsiao_Natural_Light_Can_Also_Be_Dangerous_Traffic_Sign_Misinterpretation_Under_WACV_2024_paper.pdf)

* Obstacle Detection

  * [Have We Ever Encountered This Before? Retrieving Out-of-Distribution Road Obstacles From Driving Scenes](http://arxiv.org/abs/2309.04302)

* Driver Action and Intention Recognition

  * [Evaluation of Video Masked Autoencoders' Performance and Uncertainty Estimations for Driver Action and Intention Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Vellenga_Evaluation_of_Video_Masked_Autoencoders_Performance_and_Uncertainty_Estimations_for_WACV_2024_paper.pdf)



## 30.GNN/GCN

* GNN 

  * [Automated Camera Calibration via Homography Estimation With GNNs](http://arxiv.org/abs/2311.02598)

  * [RIMeshGNN: A Rotation-Invariant Graph Neural Network for Mesh Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Shakibajahromi_RIMeshGNN_A_Rotation-Invariant_Graph_Neural_Network_for_Mesh_Classification_WACV_2024_paper.pdf)

* Graph Networks

  * [Improving Graph Networks Through Selection-Based Convolution](https://openaccess.thecvf.com/content/WACV2024/papers/Hart_Improving_Graph_Networks_Through_Selection-Based_Convolution_WACV_2024_paper.pdf)



## 29.Scene Graph Generation

* [Self-Supervised Relation Alignment for Scene Graph Generation](http://arxiv.org/abs/2302.01403)

* [Refine and Redistribute: Multi-Domain Fusion and Dynamic Label Assignment for Unbiased Scene Graph Generation](https://openaccess.thecvf.com/content/WACV2024/papers/Zang_Refine_and_Redistribute_Multi-Domain_Fusion_and_Dynamic_Label_Assignment_for_WACV_2024_paper.pdf)



## 28.Point-Cloud

* [MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds](http://arxiv.org/abs/2212.07207)

* [Cross-Domain Few-Shot Incremental Learning for Point-Cloud Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Tan_Cross-Domain_Few-Shot_Incremental_Learning_for_Point-Cloud_Recognition_WACV_2024_paper.pdf)

* [Sparse Convolutional Networks for Surface Reconstruction From Noisy Point Clouds](https://openaccess.thecvf.com/content/WACV2024/papers/Wang_Sparse_Convolutional_Networks_for_Surface_Reconstruction_From_Noisy_Point_Clouds_WACV_2024_paper.pdf)

* [LidarCLIP or: How I Learned To Talk to Point Clouds](http://arxiv.org/abs/2212.06858)

* [FinderNet: A Data Augmentation Free Canonicalization Aided Loop Detection and Closure Technique for Point Clouds in 6-DOF Separation](https://openaccess.thecvf.com/content/WACV2024/papers/Harithas_FinderNet_A_Data_Augmentation_Free_Canonicalization_Aided_Loop_Detection_and_WACV_2024_paper.pdf)

* [Indoor Visual Localization Using Point and Line Correspondences in Dense Colored Point Cloud](https://openaccess.thecvf.com/content/WACV2024/papers/Matsumoto_Indoor_Visual_Localization_Using_Point_and_Line_Correspondences_in_Dense_WACV_2024_paper.pdf)

* [SSP: Semi-Signed Prioritized Neural Fitting for Surface Reconstruction From Unoriented Point Clouds](https://openaccess.thecvf.com/content/WACV2024/papers/Zhu_SSP_Semi-Signed_Prioritized_Neural_Fitting_for_Surface_Reconstruction_From_Unoriented_WACV_2024_paper.pdf)

* 3D Point Clouds

  * [Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation](http://arxiv.org/abs/2308.14126)

* Point Cloud Registration

  * [MagneticPillars: Efficient Point Cloud Registration Through Hierarchized Birds-Eye-View Cell Correspondence Refinement](https://openaccess.thecvf.com/content/WACV2024/papers/Fischer_MagneticPillars_Efficient_Point_Cloud_Registration_Through_Hierarchized_Birds-Eye-View_Cell_Correspondence_WACV_2024_paper.pdf)

  * [HDMNet: A Hierarchical Matching Network With Double Attention for Large-Scale Outdoor LiDAR Point Cloud Registration](http://arxiv.org/abs/2310.18874)

* Point Cloud Completion

  * [WalkFormer: Point Cloud Completion via Guided Walks](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_WalkFormer_Point_Cloud_Completion_via_Guided_Walks_WACV_2024_paper.pdf)

* Point Cloud Segmentation

  * [When 3D Bounding-Box Meets SAM: Point Cloud Instance Segmentation With Weak-and-Noisy Supervision](http://arxiv.org/abs/2309.00828)

  * [PointCT: Point Central Transformer Network for Weakly-Supervised Point Cloud Semantic Segmentation](https://openaccess.thecvf.com/content/WACV2024/papers/Tran_PointCT_Point_Central_Transformer_Network_for_Weakly-Supervised_Point_Cloud_Semantic_WACV_2024_paper.pdf)

* Point Cloud Classification

  * [SimpliMix: A Simplified Manifold Mixup for Few-Shot Point Cloud Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Yang_SimpliMix_A_Simplified_Manifold_Mixup_for_Few-Shot_Point_Cloud_Classification_WACV_2024_paper.pdf)



## 27.Human-Object Interactions

* [Exploiting CLIP for Zero-Shot HOI Detection Requires Knowledge Distillation at Multiple Levels](http://arxiv.org/abs/2309.05069)

* [Task-Oriented Human-Object Interactions Generation With Implicit Neural Representations](http://arxiv.org/abs/2303.13129)

* [Beyond Active Learning: Leveraging the Full Potential of Human Interaction via Auto-Labeling, Human Correction, and Human Verification](http://arxiv.org/abs/2306.01277)

* [Bipartite Graph Diffusion Model for Human Interaction Generation](http://arxiv.org/abs/2301.10134)



## 26.Human Motion Prediction

* [Incorporating Physics Principles for Precise Human Motion Prediction](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Incorporating_Physics_Principles_for_Precise_Human_Motion_Prediction_WACV_2024_paper.pdf)

* [Context-Based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting](https://openaccess.thecvf.com/content/WACV2024/papers/Medina_Context-Based_Interpretable_Spatio-Temporal_Graph_Convolutional_Network_for_Human_Motion_Forecasting_WACV_2024_paper.pdf)

* Human Motion Synthesis

  * [MotionGPT: Human Motion Synthesis With Improved Diversity and Realism via GPT-3 Prompting](https://openaccess.thecvf.com/content/WACV2024/papers/Ribeiro-Gomes_MotionGPT_Human_Motion_Synthesis_With_Improved_Diversity_and_Realism_via_WACV_2024_paper.pdf)



## 25.Multimodal

* [Dynamic Multimodal Information Bottleneck for Multimodality Classification](http://arxiv.org/abs/2311.01066v1)
:star:[code](https://github.com/BII-wushuang/DMIB)

* [CoD: Coherent Detection of Entities From Images With Multiple Modalities](https://openaccess.thecvf.com/content/WACV2024/papers/Verma_CoD_Coherent_Detection_of_Entities_From_Images_With_Multiple_Modalities_WACV_2024_paper.pdf)

* [Multimodal Deep Learning for Remote Stress Estimation Using CCT-LSTM](https://openaccess.thecvf.com/content/WACV2024/papers/Ziaratnia_Multimodal_Deep_Learning_for_Remote_Stress_Estimation_Using_CCT-LSTM_WACV_2024_paper.pdf)

* [Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining](https://arxiv.org/abs/2311.03964)
:star:[code](https://github.com/ugorsahin/Generative-Negative-Mining)

* [OmniVec: Learning robust representations with cross modal sharing](http://arxiv.org/abs/2311.05709v1)

* [Complementary-Contradictory Feature Regularization Against Multimodal Overfitting](https://openaccess.thecvf.com/content/WACV2024/papers/Tejero-de-Pablos_Complementary-Contradictory_Feature_Regularization_Against_Multimodal_Overfitting_WACV_2024_paper.pdf)
:star:[code](https://github.com/CyberAgentAILab/CM-VQVAE)

* [Learning Intra-Class Multimodal Distributions With Orthonormal Matrices](https://openaccess.thecvf.com/content/WACV2024/papers/Goto_Learning_Intra-Class_Multimodal_Distributions_With_Orthonormal_Matrices_WACV_2024_paper.pdf)

* [EASUM: Enhancing Affective State Understanding Through Joint Sentiment and Emotion Modeling for Multimodal Tasks](https://openaccess.thecvf.com/content/WACV2024/papers/Hwang_EASUM_Enhancing_Affective_State_Understanding_Through_Joint_Sentiment_and_Emotion_WACV_2024_paper.pdf)

* CLIP

  * [C-CLIP: Contrastive Image-Text Encoders To Close the Descriptive-Commentative Gap](https://openaccess.thecvf.com/content/WACV2024/papers/Theisen_C-CLIP_Contrastive_Image-Text_Encoders_To_Close_the_Descriptive-Commentative_Gap_WACV_2024_paper.pdf)

  * [DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification](http://arxiv.org/abs/2305.15957)

  * [ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition](http://arxiv.org/abs/2307.00586)



## 24.Large Language Models

* [Zero-Shot Building Attribute Extraction From Large-Scale Vision and Language Models](https://openaccess.thecvf.com/content/WACV2024/papers/Pan_Zero-Shot_Building_Attribute_Extraction_From_Large-Scale_Vision_and_Language_Models_WACV_2024_paper.pdf)



## 23.Vision-Language

* [Multitask Vision-Language Prompt Tuning](http://arxiv.org/abs/2211.11720)

* [Improving Fairness Using Vision-Language Driven Image Augmentation](https://openaccess.thecvf.com/content/WACV2024/papers/DInca_Improving_Fairness_Using_Vision-Language_Driven_Image_Augmentation_WACV_2024_paper.pdf)

* [Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models](https://openaccess.thecvf.com/content/WACV2024/papers/Lai_Empowering_Unsupervised_Domain_Adaptation_With_Large-Scale_Pre-Trained_Vision-Language_Models_WACV_2024_paper.pdf)

* [Can Vision-Language Models Be a Good Guesser? Exploring VLMs for Times and Location Reasoning](http://arxiv.org/abs/2307.06166)

* [Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding](https://arxiv.org/abs/2309.00215)

* [Improving Vision-and-Language Reasoning via Spatial Relations Modeling](http://arxiv.org/abs/2311.05298)

* [MIVC: Multiple Instance Visual Component for Visual-Language Models](https://openaccess.thecvf.com/content/WACV2024/papers/Wu_MIVC_Multiple_Instance_Visual_Component_for_Visual-Language_Models_WACV_2024_paper.pdf)



## 22.Visual Question Answering

* [RankDVQA: Deep VQA Based on Ranking-Inspired Hybrid Training](http://arxiv.org/abs/2202.08595)

* [POP-VQA - Privacy Preserving, On-Device, Personalized Visual Question Answering](https://openaccess.thecvf.com/content/WACV2024/papers/Sahu_POP-VQA_-_Privacy_Preserving_On-Device_Personalized_Visual_Question_Answering_WACV_2024_paper.pdf)

* [Benchmarking Out-of-Distribution Detection in Visual Question Answering](https://openaccess.thecvf.com/content/WACV2024/papers/Shi_Benchmarking_Out-of-Distribution_Detection_in_Visual_Question_Answering_WACV_2024_paper.pdf)

* [Can You Even Tell Left From Right? Presenting a New Challenge for VQA](https://openaccess.thecvf.com/content/WACV2024/papers/Venkataraman_Can_You_Even_Tell_Left_From_Right_Presenting_a_New_WACV_2024_paper.pdf)

* Visual Dialog

  * [VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs](http://arxiv.org/abs/2310.16590v1)

* AVQA

  * [CAD - Contextual Multi-Modal Alignment for Dynamic AVQA](https://openaccess.thecvf.com/content/WACV2024/papers/Nadeem_CAD_-_Contextual_Multi-Modal_Alignment_for_Dynamic_AVQA_WACV_2024_paper.pdf)

  * [Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering](https://openaccess.thecvf.com/content/WACV2024/papers/Liu_Tackling_Data_Bias_in_MUSIC-AVQA_Crafting_a_Balanced_Dataset_for_WACV_2024_paper.pdf)

* ArtVQA

  * [ArtQuest: Countering Hidden Language Biases in ArtVQA](https://openaccess.thecvf.com/content/WACV2024/papers/Bleidt_ArtQuest_Countering_Hidden_Language_Biases_in_ArtVQA_WACV_2024_paper.pdf)



## 21.SLAM/Augmented Reality/Virtual Reality/Robotics

* Virtual Try-On

  * [A Generative Multi-Resolution Pyramid and Normal-Conditioning 3D Cloth Draping](http://arxiv.org/abs/2311.02700v1)

  * [Controlling Virtual Try-On Pipeline Through Rendering Policies](https://openaccess.thecvf.com/content/WACV2024/papers/Li_Controlling_Virtual_Try-On_Pipeline_Through_Rendering_Policies_WACV_2024_paper.pdf)

  * [GC-VTON: Predicting Globally Consistent and Occlusion Aware Local Flows with Neighborhood Integrity Preservation for Virtual Try-on](http://arxiv.org/abs/2311.04932v1)

* Virtual Avatars

  * [CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer](http://arxiv.org/abs/2311.06443v1)

* Robotics

  * [Shape From Shading for Robotic Manipulation](http://arxiv.org/abs/2304.11824)

  * [Optimizing Long-Term Robot Tracking With Multi-Platform Sensor Fusion](https://openaccess.thecvf.com/content/WACV2024/papers/Albanese_Optimizing_Long-Term_Robot_Tracking_With_Multi-Platform_Sensor_Fusion_WACV_2024_paper.pdf)

  * Robot Localization

    * [Cross-Attention Between Satellite and Ground Views for Enhanced Fine-Grained Robot Geo-Localization](https://openaccess.thecvf.com/content/WACV2024/papers/Yuan_Cross-Attention_Between_Satellite_and_Ground_Views_for_Enhanced_Fine-Grained_Robot_WACV_2024_paper.pdf)

* Navigation

  * [MOPA: Modular Object Navigation With PointGoal Agents](http://arxiv.org/abs/2304.03696)

* Visual Localization

  * [FocusTune: Tuning Visual Localization Through Focus-Guided Sampling](http://arxiv.org/abs/2311.02872)

* Trajectory Prediction

  * [Second-Order Graph ODEs for Multi-Agent Trajectory Forecasting](https://openaccess.thecvf.com/content/WACV2024/papers/Wen_Second-Order_Graph_ODEs_for_Multi-Agent_Trajectory_Forecasting_WACV_2024_paper.pdf)



## 20.GAN/Generation

* [FacadeNet: Conditional Facade Synthesis via Selective Editing](http://arxiv.org/abs/2311.01240)

* [Synthesizing Anyone, Anywhere, in Any Pose](https://openaccess.thecvf.com/content/WACV2024/papers/Hukkelas_Synthesizing_Anyone_Anywhere_in_Any_Pose_WACV_2024_paper.pdf)

* GAN

  * [Consistent Multimodal Generation via a Unified GAN Framework](http://arxiv.org/abs/2307.01425)

  * [StyleGenes: Discrete and Efficient Latent Distributions for GANs](http://arxiv.org/abs/2305.00599)

  * [Improving the Fairness of the Min-Max Game in GANs Training](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Improving_the_Fairness_of_the_Min-Max_Game_in_GANs_Training_WACV_2024_paper.pdf)

  * [StyleGAN-Fusion: Diffusion Guided Domain Adaptation of Image Generators](https://openaccess.thecvf.com/content/WACV2024/papers/Song_StyleGAN-Fusion_Diffusion_Guided_Domain_Adaptation_of_Image_Generators_WACV_2024_paper.pdf)

  * [PlantPlotGAN: A Physics-Informed Generative Adversarial Network for Plant Disease Prediction](https://arxiv.org/abs/2310.18268)

  * [P2D: Plug and Play Discriminator for Accelerating GAN Frameworks](https://openaccess.thecvf.com/content/WACV2024/papers/Chong_P2D_Plug_and_Play_Discriminator_for_Accelerating_GAN_Frameworks_WACV_2024_paper.pdf)

  * [Soft Curriculum for Learning Conditional GANs With Noisy-Labeled and Uncurated Unlabeled Data](http://arxiv.org/abs/2307.08319)

  * [What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion](http://arxiv.org/abs/2301.12141)

  * [Improving the Leaking of Augmentations in Data-Efficient GANs via Adaptive Negative Data Augmentation](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Improving_the_Leaking_of_Augmentations_in_Data-Efficient_GANs_via_Adaptive_WACV_2024_paper.pdf)

  * [PETIT-GAN: Physically Enhanced Thermal Image-Translating Generative Adversarial Network](https://openaccess.thecvf.com/content/WACV2024/papers/Berman_PETIT-GAN_Physically_Enhanced_Thermal_Image-Translating_Generative_Adversarial_Network_WACV_2024_paper.pdf)

* Image Generation

  * [Improving the Effectiveness of Deep Generative Data](http://arxiv.org/abs/2311.03959v1)

  * [Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models](http://arxiv.org/abs/2310.19410)

  * [Nested Diffusion Processes for Anytime Image Generation](http://arxiv.org/abs/2305.19066)

* Image Synthesis

  * [Painterly Image Harmonization via Adversarial Residual Learning](http://arxiv.org/abs/2311.08646v1)

  * [Controllable Image Synthesis of Industrial Data Using Stable Diffusion](https://openaccess.thecvf.com/content/WACV2024/papers/Valvano_Controllable_Image_Synthesis_of_Industrial_Data_Using_Stable_Diffusion_WACV_2024_paper.pdf)

  * [Label Augmentation As Inter-Class Data Augmentation for Conditional Image Synthesis With Imbalanced Data](https://openaccess.thecvf.com/content/WACV2024/papers/Katsumata_Label_Augmentation_As_Inter-Class_Data_Augmentation_for_Conditional_Image_Synthesis_WACV_2024_paper.pdf)

* Text-to-Image

  * [CLIPAG: Towards Generator-Free Text-to-Image Generation](http://arxiv.org/abs/2306.16805)

  * [Customizing 360-Degree Panoramas Through Text-to-Image Diffusion Models](http://arxiv.org/abs/2310.18840)

  * [Text-to-Image Models for Counterfactual Explanations: A Black-Box Approach](http://arxiv.org/abs/2309.07944)

  * [TIAM - A Metric for Evaluating Alignment in Text-to-Image Generation](https://openaccess.thecvf.com/content/WACV2024/papers/Grimal_TIAM_-_A_Metric_for_Evaluating_Alignment_in_Text-to-Image_Generation_WACV_2024_paper.pdf)

  * [Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Image Generation](https://openaccess.thecvf.com/content/WACV2024/papers/Park_Localization_and_Manipulation_of_Immoral_Visual_Cues_for_Safe_Text-to-Image_WACV_2024_paper.pdf)

  * [Unsupervised Co-Generation of Foreground-Background Segmentation From Text-to-Image Synthesis](https://openaccess.thecvf.com/content/WACV2024/papers/Ahmed_Unsupervised_Co-Generation_of_Foreground-Background_Segmentation_From_Text-to-Image_Synthesis_WACV_2024_paper.pdf)

* Image-Text

  * [SciOL and MuLMS-Img: Introducing a Large-Scale Multimodal Scientific Dataset and Models for Image-Text Tasks in the Scientific Domain](https://openaccess.thecvf.com/content/WACV2024/papers/Tarsi_SciOL_and_MuLMS-Img_Introducing_a_Large-Scale_Multimodal_Scientific_Dataset_and_WACV_2024_paper.pdf)

* Video Synthesis

  * [RADIO: Reference-Agnostic Dubbing Video Synthesis](http://arxiv.org/abs/2309.01950)

  * [One Style Is All You Need To Generate a Video](http://arxiv.org/abs/2310.17835)

* Diffusion Models

  * [Fast Diffusion EM: A Diffusion Model for Blind Inverse Problems With Application to Deconvolution](https://openaccess.thecvf.com/content/WACV2024/papers/Laroche_Fast_Diffusion_EM_A_Diffusion_Model_for_Blind_Inverse_Problems_WACV_2024_paper.pdf)

  * [Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning](http://arxiv.org/abs/2311.01018v1)

  * [Preserving Image Properties Through Initializations in Diffusion Models](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Preserving_Image_Properties_Through_Initializations_in_Diffusion_Models_WACV_2024_paper.pdf)

  * [Exploiting the Signal-Leak Bias in Diffusion Models](http://arxiv.org/abs/2309.15842)

  * [Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation](http://arxiv.org/abs/2304.11829)

  * [Common Diffusion Noise Schedules and Sample Steps Are Flawed](http://arxiv.org/abs/2305.08891)

  * [Training-Free Content Injection Using H-Space in Diffusion Models](https://openaccess.thecvf.com/content/WACV2024/papers/Jeong_Training-Free_Content_Injection_Using_H-Space_in_Diffusion_Models_WACV_2024_paper.pdf)

  * [PoseDiff: Pose-Conditioned Multimodal Diffusion Model for Unbounded Scene Synthesis From Sparse Inputs](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_PoseDiff_Pose-Conditioned_Multimodal_Diffusion_Model_for_Unbounded_Scene_Synthesis_From_WACV_2024_paper.pdf)

  * [Diffusion Models Meet Image Counter-Forensics](https://openaccess.thecvf.com/content/WACV2024/papers/Tailanian_Diffusion_Models_Meet_Image_Counter-Forensics_WACV_2024_paper.pdf)

  * [PathLDM: Text Conditioned Latent Diffusion Model for Histopathology](http://arxiv.org/abs/2309.00748)

  * [Synthesizing Coherent Story With Auto-Regressive Latent Diffusion Models](http://arxiv.org/abs/2211.10950)

  * [Towards More Realistic Membership Inference Attacks on Large Diffusion Models](https://openaccess.thecvf.com/content/WACV2024/papers/Dubinski_Towards_More_Realistic_Membership_Inference_Attacks_on_Large_Diffusion_Models_WACV_2024_paper.pdf)

  * [Dual Domain Diffusion Guidance for 3D CBCT Metal Artifact Reduction](https://openaccess.thecvf.com/content/WACV2024/papers/Choi_Dual_Domain_Diffusion_Guidance_for_3D_CBCT_Metal_Artifact_Reduction_WACV_2024_paper.pdf)

* Image Translation

  * [SemST: Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment](http://arxiv.org/abs/2310.04995)

* Image-to-Image Translation

  * [GRIT: GAN Residuals for Paired Image-to-Image Translation](https://openaccess.thecvf.com/content/WACV2024/papers/Suri_GRIT_GAN_Residuals_for_Paired_Image-to-Image_Translation_WACV_2024_paper.pdf)

* Text-to-3D

  * [HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation](https://openaccess.thecvf.com/content/WACV2024/papers/Wu_HD-Fusion_Detailed_Text-to-3D_Generation_Leveraging_Multiple_Noise_Estimation_WACV_2024_paper.pdf)

* Text-to-Video

  * [Human Motion Aware Text-to-Video Generation With Explicit Camera Control](https://openaccess.thecvf.com/content/WACV2024/papers/Kim_Human_Motion_Aware_Text-to-Video_Generation_With_Explicit_Camera_Control_WACV_2024_paper.pdf)

* Synthetic Image Detection

  * [Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis](http://arxiv.org/abs/2303.10762)



## 19.Object Pose Estimation

* 6D

  * [Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers](http://arxiv.org/abs/2310.16618v1)

  * [Effects of Markers in Training Datasets on the Accuracy of 6D Pose Estimation](https://openaccess.thecvf.com/content/WACV2024/papers/Rosskamp_Effects_of_Markers_in_Training_Datasets_on_the_Accuracy_of_WACV_2024_paper.pdf)

  * [Learning Better Keypoints for Multi-Object 6DoF Pose Estimation](http://arxiv.org/abs/2308.07827)

* Object Counting

  * [Training-Free Object Counting With Prompts](http://arxiv.org/abs/2307.00038)

* Object Re-Identification

  * [Object Re-Identification From Point Clouds](https://openaccess.thecvf.com/content/WACV2024/papers/Therien_Object_Re-Identification_From_Point_Clouds_WACV_2024_paper.pdf)



## 18.Animal

* Canine Pose Analysis

  * [RGBT-Dog: A Parametric Model and Pose Prior for Canine Body Analysis Data Creation](https://openaccess.thecvf.com/content/WACV2024/papers/Deane_RGBT-Dog_A_Parametric_Model_and_Pose_Prior_for_Canine_Body_WACV_2024_paper.pdf)

* Animal Re-Identification

  * [WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification](https://openaccess.thecvf.com/content/WACV2024/papers/Cermak_WildlifeDatasets_An_Open-Source_Toolkit_for_Animal_Re-Identification_WACV_2024_paper.pdf)



## 17.Human Pose Estimation

* [Re-VoxelDet: Rethinking Neck and Head Architectures for High-Performance Voxel-Based 3D Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_Re-VoxelDet_Rethinking_Neck_and_Head_Architectures_for_High-Performance_Voxel-Based_3D_WACV_2024_paper.pdf)

* [DiffBody: Diffusion-Based Pose and Shape Editing of Human Images](https://openaccess.thecvf.com/content/WACV2024/papers/Okuyama_DiffBody_Diffusion-Based_Pose_and_Shape_Editing_of_Human_Images_WACV_2024_paper.pdf)

* [Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation](http://arxiv.org/abs/2310.00099)

* [Rethinking Visibility in Human Pose Estimation: Occluded Pose Reasoning via Transformers](https://openaccess.thecvf.com/content/WACV2024/papers/Sun_Rethinking_Visibility_in_Human_Pose_Estimation_Occluded_Pose_Reasoning_via_WACV_2024_paper.pdf)

* [Active Transfer Learning for Efficient Video-Specific Human Pose Estimation](http://arxiv.org/abs/2311.05041v1)
:star:[code](https://github.com/ImIntheMiddle/VATL4Pose-WACV2024)

* [LInKs "Lifting Independent Keypoints" - Partial Pose Lifting for Occlusion Handling With Improved Accuracy in 2D-3D Human Pose Estimation](https://openaccess.thecvf.com/content/WACV2024/papers/Hardy_LInKs_Lifting_Independent_Keypoints_-_Partial_Pose_Lifting_for_Occlusion_WACV_2024_paper.pdf)

* 3D HPE

  * [3D Human Pose Estimation With Two-Step Mixed-Training Strategy](https://openaccess.thecvf.com/content/WACV2024/papers/Wang_3D_Human_Pose_Estimation_With_Two-Step_Mixed-Training_Strategy_WACV_2024_paper.pdf)

  * [Unsupervised 3D Pose Estimation With Non-Rigid Structure-From-Motion Modeling](http://arxiv.org/abs/2308.10705)

  * [Back to Optimization: Diffusion-Based Zero-Shot 3D Human Pose Estimation](http://arxiv.org/abs/2307.03833)

  * [MotionAGFormer: Enhancing 3D Human Pose Estimation With a Transformer-GCNFormer Network](http://arxiv.org/abs/2310.16288)

  * [UNSPAT: Uncertainty-Guided SpatioTemporal Transformer for 3D Human Pose and Shape Estimation on Videos](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_UNSPAT_Uncertainty-Guided_SpatioTemporal_Transformer_for_3D_Human_Pose_and_Shape_WACV_2024_paper.pdf)

  * [A Geometry Loss Combination for 3D Human Pose Estimation](https://openaccess.thecvf.com/content/WACV2024/papers/Matsune_A_Geometry_Loss_Combination_for_3D_Human_Pose_Estimation_WACV_2024_paper.pdf)

  * [Robust Category-Level 3D Pose Estimation From Diffusion-Enhanced Synthetic Data](https://openaccess.thecvf.com/content/WACV2024/papers/Yang_Robust_Category-Level_3D_Pose_Estimation_From_Diffusion-Enhanced_Synthetic_Data_WACV_2024_paper.pdf)

* Multi-Body Mesh Detection

  * [Physical-Space Multi-Body Mesh Detection Achieved by Local Alignment and Global Dense Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Dong_Physical-Space_Multi-Body_Mesh_Detection_Achieved_by_Local_Alignment_and_Global_WACV_2024_paper.pdf)

* Human Localization and Posture Classification

  * [Learning-Based Spotlight Position Optimization for Non-Line-of-Sight Human Localization and Posture Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Chandran_Learning-Based_Spotlight_Position_Optimization_for_Non-Line-of-Sight_Human_Localization_and_Posture_WACV_2024_paper.pdf)

* 3D Human Mesh Recovery

  * [Progressive Hypothesis Transformer for 3D Human Mesh Recovery](https://openaccess.thecvf.com/content/WACV2024/papers/Liao_Progressive_Hypothesis_Transformer_for_3D_Human_Mesh_Recovery_WACV_2024_paper.pdf)

* Human Pose and Mesh Reconstruction

  * [MPT: Mesh Pre-Training With Transformers for Human Pose and Mesh Reconstruction](http://arxiv.org/abs/2211.13357)

* Clothed Human Reconstruction

  * [PIDiffu: Pixel-Aligned Diffusion Model for High-Fidelity Clothed Human Reconstruction](https://openaccess.thecvf.com/content/WACV2024/papers/Lee_PIDiffu_Pixel-Aligned_Diffusion_Model_for_High-Fidelity_Clothed_Human_Reconstruction_WACV_2024_paper.pdf)

* Hands

  * Hand Reconstruction

    * [Intrinsic Hand Avatar: Illumination-Aware Hand Appearance and Shape Reconstruction From Monocular RGB Video](https://openaccess.thecvf.com/content/WACV2024/papers/Kalshetti_Intrinsic_Hand_Avatar_Illumination-Aware_Hand_Appearance_and_Shape_Reconstruction_From_WACV_2024_paper.pdf)

  * Sign Language Translation

    * [Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models](https://arxiv.org/abs/2311.12128)
:star:[code](https://github.com/pooyafayyaz/Fingerspelling-PoseNet)

  * Sign Language Production

    * [Sign Language Production With Latent Motion Transformer](https://openaccess.thecvf.com/content/WACV2024/papers/Xie_Sign_Language_Production_With_Latent_Motion_Transformer_WACV_2024_paper.pdf)

  * Hand Pose Estimation

    * [HMP: Hand Motion Priors for Pose and Shape Estimation From Video](https://openaccess.thecvf.com/content/WACV2024/papers/Duran_HMP_Hand_Motion_Priors_for_Pose_and_Shape_Estimation_From_WACV_2024_paper.pdf)

    * [Handformer2T: A Lightweight Regression-Based Model for Interacting Hands Pose Estimation From a Single RGB Image](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Handformer2T_A_Lightweight_Regression-Based_Model_for_Interacting_Hands_Pose_Estimation_WACV_2024_paper.pdf)

  * Gesture Detection

    * [Co-Speech Gesture Detection Through Multi-Phase Sequence Labeling](http://arxiv.org/abs/2308.10680)

  * Scribal Hand Identification

    * [The Paleographer's Eye ex machina: Using Computer Vision To Assist Humanists in Scribal Hand Identification](https://openaccess.thecvf.com/content/WACV2024/papers/Grieggs_The_Paleographers_Eye_ex_machina_Using_Computer_Vision_To_Assist_WACV_2024_paper.pdf)

  * Interactive Segmentation

    * [Interactive Segmentation for Diverse Gesture Types Without Context](http://arxiv.org/abs/2307.10518)

    * [Continuous Adaptation for Interactive Segmentation Using Teacher-Student Architecture](https://openaccess.thecvf.com/content/WACV2024/papers/Atanyan_Continuous_Adaptation_for_Interactive_Segmentation_Using_Teacher-Student_Architecture_WACV_2024_paper.pdf)

* Human Silhouette Extraction

  * [POISE: Pose Guided Human Silhouette Extraction Under Occlusions](http://arxiv.org/abs/2311.05077)

* Motion Capture

  * [A Sequential Learning-Based Approach for Monocular Human Performance Capture](https://openaccess.thecvf.com/content/WACV2024/papers/Chen_A_Sequential_Learning-Based_Approach_for_Monocular_Human_Performance_Capture_WACV_2024_paper.pdf)

* Human Animation

  * [AvatarOne: Monocular 3D Human Animation](https://openaccess.thecvf.com/content/WACV2024/papers/Karthikeyan_AvatarOne_Monocular_3D_Human_Animation_WACV_2024_paper.pdf)

  * [StyleAvatar: Stylizing Animatable Head Avatars](https://openaccess.thecvf.com/content/WACV2024/papers/Perez_StyleAvatar_Stylizing_Animatable_Head_Avatars_WACV_2024_paper.pdf)



## 16.Action Detection

* [Context in Human Action Through Motion Complementarity](https://openaccess.thecvf.com/content/WACV2024/papers/Dessalene_Context_in_Human_Action_Through_Motion_Complementarity_WACV_2024_paper.pdf)

* Few-Shot Action Detection

  * [Semantic-aware Video Representation for Few-shot Action Recognition](http://arxiv.org/abs/2311.06218v1)

* Fine-Grained Action Recognition

  * [PGVT: Pose-Guided Video Transformer for Fine-Grained Action Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_PGVT_Pose-Guided_Video_Transformer_for_Fine-Grained_Action_Recognition_WACV_2024_paper.pdf)

* Temporal Action Segmentation

  * [OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation](https://arxiv.org/abs/2309.06276)

* Temporal Action Detection

  * [A*: Atrous Spatial Temporal Action Recognition for Real Time Applications](https://openaccess.thecvf.com/content/WACV2024/papers/Kim_A_Atrous_Spatial_Temporal_Action_Recognition_for_Real_Time_Applications_WACV_2024_paper.pdf)

  * [ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection](http://arxiv.org/abs/2311.00729)

* Action Detection

  * [Embodied Human Activity Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Hu_Embodied_Human_Activity_Recognition_WACV_2024_paper.pdf)

  * [JOADAA: Joint Online Action Detection and Action Anticipation](http://arxiv.org/abs/2309.06130)

  * [A Hybrid Graph Network for Complex Activity Detection in Video](http://arxiv.org/abs/2310.17493v1)

  * [Differentially Private Video Activity Recognition](http://arxiv.org/abs/2306.15742)

  * [Embedding Task Structure for Action Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Peven_Embedding_Task_Structure_for_Action_Detection_WACV_2024_paper.pdf)

  * [Egocentric Action Recognition by Capturing Hand-Object Contact and Object State](https://openaccess.thecvf.com/content/WACV2024/papers/Shiota_Egocentric_Action_Recognition_by_Capturing_Hand-Object_Contact_and_Object_State_WACV_2024_paper.pdf)

  * [Exploring the Impact of Rendering Method and Motion Quality on Model Performance When Using Multi-View Synthetic Data for Action Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Panev_Exploring_the_Impact_of_Rendering_Method_and_Motion_Quality_on_WACV_2024_paper.pdf)

  * [Learnable Cube-Based Video Encryption for Privacy-Preserving Action Recognition](https://openaccess.thecvf.com/content/WACV2024/papers/Ishikawa_Learnable_Cube-Based_Video_Encryption_for_Privacy-Preserving_Action_Recognition_WACV_2024_paper.pdf)

* Action Prediction

  * [Object-centric Video Representation for Long-term Action Anticipation](http://arxiv.org/abs/2311.00180v1)
:star:[code](https://github.com/brown-palm/ObjectPrompt)

  * [Interaction Region Visual Transformer for Egocentric Action Anticipation](https://openaccess.thecvf.com/content/WACV2024/papers/Roy_Interaction_Region_Visual_Transformer_for_Egocentric_Action_Anticipation_WACV_2024_paper.pdf)

* Action Segmentation

  * [Permutation-Aware Activity Segmentation via Unsupervised Frame-To-Segment Alignment](https://openaccess.thecvf.com/content/WACV2024/papers/Tran_Permutation-Aware_Activity_Segmentation_via_Unsupervised_Frame-To-Segment_Alignment_WACV_2024_paper.pdf)

  * [Mining and Unifying Heterogeneous Contrastive Relations for Weakly-Supervised Actor-Action Segmentation](https://openaccess.thecvf.com/content/WACV2024/papers/Duan_Mining_and_Unifying_Heterogeneous_Contrastive_Relations_for_Weakly-Supervised_Actor-Action_Segmentation_WACV_2024_paper.pdf)

  * Temporal Action Segmentation

    * [Random Walks for Temporal Action Segmentation With Timestamp Supervision](https://openaccess.thecvf.com/content/WACV2024/papers/Hirsch_Random_Walks_for_Temporal_Action_Segmentation_With_Timestamp_Supervision_WACV_2024_paper.pdf)

* Action Classification

  * [Spatio-Temporal Filter Analysis Improves 3D-CNN for Action Classification](https://openaccess.thecvf.com/content/WACV2024/papers/Kobayashi_Spatio-Temporal_Filter_Analysis_Improves_3D-CNN_for_Action_Classification_WACV_2024_paper.pdf)

* Action Synthesis

  * [Few-Shot Generative Model for Skeleton-Based Human Action Synthesis Using Cross-Domain Adversarial Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Fukushi_Few-Shot_Generative_Model_for_Skeleton-Based_Human_Action_Synthesis_Using_Cross-Domain_WACV_2024_paper.pdf)

* Action Quality Assessment

  * [PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment](http://arxiv.org/abs/2311.07603v1)
:star:[code](https://github.com/Plrbear/PECoP)

* Repetitive Action Counting

  * [Repetitive Action Counting with Motion Feature Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Li_Repetitive_Action_Counting_With_Motion_Feature_Learning_WACV_2024_paper.pdf)



## 15.Video

* [Detecting Content Segments From Online Sports Streaming Events: Challenges and Solutions](https://openaccess.thecvf.com/content/WACV2024/papers/Liu_Detecting_Content_Segments_From_Online_Sports_Streaming_Events_Challenges_and_WACV_2024_paper.pdf)

* Video Understanding

  * [M33D: Learning 3D Priors Using Multi-Modal Masked Autoencoders for 2D Image and Video Understanding](https://openaccess.thecvf.com/content/WACV2024/papers/Jamal_M33D_Learning_3D_Priors_Using_Multi-Modal_Masked_Autoencoders_for_2D_WACV_2024_paper.pdf)

  * [PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scene Data](https://openaccess.thecvf.com/content/WACV2024/papers/Herzig_PromptonomyViT_Multi-Task_Prompt_Learning_Improves_Video_Transformers_Using_Synthetic_Scene_WACV_2024_paper.pdf)
:house:[project](https://ofir1080.github.io/PromptonomyViT)

* Video Segmentation

  * [Correlation-aware active learning for surgery video segmentation](http://arxiv.org/abs/2311.08811v1)

* Video Recognition

  * [Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition](http://arxiv.org/abs/2311.05927v1)

* Video Stabilization

  * [Leveraging Synthetic Data To Learn Video Stabilization Under Adverse Conditions](http://arxiv.org/abs/2208.12763)

* Video Reconstruction

  * [Unsupervised Event-Based Video Reconstruction](https://openaccess.thecvf.com/content/WACV2024/papers/Fox_Unsupervised_Event-Based_Video_Reconstruction_WACV_2024_paper.pdf)

* Video Surveillance

  * [Lightweight Delivery Detection on Doorbell Cameras](http://arxiv.org/abs/2305.07812)

* Video Analysis

  * [Weakly-Supervised Representation Learning for Video Alignment and Analysis](http://arxiv.org/abs/2302.04064)

* Video Harmonization

  * [TSA2: Temporal Segment Adaptation and Aggregation for Video Harmonization](https://openaccess.thecvf.com/content/WACV2024/papers/Xiao_TSA2_Temporal_Segment_Adaptation_and_Aggregation_for_Video_Harmonization_WACV_2024_paper.pdf)

* Videotape Restoration

  * [Reference-Based Restoration of Digitized Analog Videotapes](http://arxiv.org/abs/2310.14926)

  * [Restoring Degraded Old Films With Recursive Recurrent Transformer Networks](https://openaccess.thecvf.com/content/WACV2024/papers/Lin_Restoring_Degraded_Old_Films_With_Recursive_Recurrent_Transformer_Networks_WACV_2024_paper.pdf)

* Video Moment Retrieval

  * [Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models](https://arxiv.org/abs/2309.00661)

  * [Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval](https://openaccess.thecvf.com/content/WACV2024/papers/Huang_Semantic_Fusion_Augmentation_and_Semantic_Boundary_Detection_A_Novel_Approach_WACV_2024_paper.pdf)

* Video Object Localization

  * [Sketch-Based Video Object Localization](http://arxiv.org/abs/2304.00450)

* Movie Genre Classification

  * [Movie Genre Classification by Language Augmentation and Shot Sampling](http://arxiv.org/abs/2203.13281)

* Video Quality Enhancement

  * [Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement](http://arxiv.org/abs/2202.00011)

* VAD

  * [A Coarse-to-Fine Pseudo-Labeling (C2FPL) Framework for Unsupervised Video Anomaly Detection](http://arxiv.org/abs/2310.17650v1)

  * [Real-Time Weakly Supervised Video Anomaly Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Karim_Real-Time_Weakly_Supervised_Video_Anomaly_Detection_WACV_2024_paper.pdf)

  * [OE-CTST: Outlier-Embedded Cross Temporal Scale Transformer for Weakly-Supervised Video Anomaly Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Majhi_OE-CTST_Outlier-Embedded_Cross_Temporal_Scale_Transformer_for_Weakly-Supervised_Video_Anomaly_WACV_2024_paper.pdf)



## 14.OCR (Text Detection and Recognition)

* [DTrOCR: Decoder-only Transformer for Optical Character Recognition](https://arxiv.org/abs/2308.15996)

* [On Manipulating Scene Text in the Wild with Diffusion Models](http://arxiv.org/abs/2311.00734v1)

* [DECDM: Document Enhancement using Cycle-Consistent Diffusion Models](http://arxiv.org/abs/2311.09625v1)

* Text Detection

  * [Sequential Transformer for End-to-End Video Text Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Sequential_Transformer_for_End-to-End_Video_Text_Detection_WACV_2024_paper.pdf)

  * [Textron: Weakly Supervised Multilingual Text Detection Through Data Programming](https://openaccess.thecvf.com/content/WACV2024/papers/Kudale_Textron_Weakly_Supervised_Multilingual_Text_Detection_Through_Data_Programming_WACV_2024_paper.pdf)

  * [Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition](http://arxiv.org/abs/2303.04291)

* Text Spotting

  * [Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance](https://arxiv.org/abs/2310.00917)

  * [Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis](https://arxiv.org/abs/2310.17674)

* Scene-Text Spotting

  * [STEP - Towards Structured Scene-Text Spotting](https://openaccess.thecvf.com/content/WACV2024/papers/Garcia-Bordils_STEP_-_Towards_Structured_Scene-Text_Spotting_WACV_2024_paper.pdf)

* Document Dewarping

  * [DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction](https://openaccess.thecvf.com/content/WACV2024/papers/Yu_DocReal_Robust_Document_Dewarping_of_Real-Life_Images_via_Attention-Enhanced_Control_WACV_2024_paper.pdf)

* Scene Text Understanding

  * [Textual Alchemy: CoFormer for Scene Text Understanding](https://openaccess.thecvf.com/content/WACV2024/papers/Deshmukh_Textual_Alchemy_CoFormer_for_Scene_Text_Understanding_WACV_2024_paper.pdf)

* Document Layout Segmentation

  * [A One-Shot Learning Approach To Document Layout Segmentation of Ancient Arabic Manuscripts](https://openaccess.thecvf.com/content/WACV2024/papers/De_Nardin_A_One-Shot_Learning_Approach_To_Document_Layout_Segmentation_of_Ancient_WACV_2024_paper.pdf)

* Font Generation

  * [Towards Diverse and Consistent Typography Generation](http://arxiv.org/abs/2309.02099)

* Information Extraction

  * [Graph Neural Networks for End-to-End Information Extraction From Handwritten Documents](https://openaccess.thecvf.com/content/WACV2024/papers/Khanfir_Graph_Neural_Networks_for_End-to-End_Information_Extraction_From_Handwritten_Documents_WACV_2024_paper.pdf)



## 13.Reid (Person Re-Identification/Gait Recognition/Pedestrian Detection)

* Reid

  * [Privacy-Enhancing Person Re-Identification Framework - A Dual-Stage Approach](https://openaccess.thecvf.com/content/WACV2024/papers/Kansal_Privacy-Enhancing_Person_Re-Identification_Framework_-_A_Dual-Stage_Approach_WACV_2024_paper.pdf)

  * [HashReID: Dynamic Network with Binary Codes for Efficient Person Re-identification](https://arxiv.org/abs/2308.11900)

  * [Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID](https://arxiv.org/abs/2310.15913)

  * [Source-Guided Similarity Preservation for Online Person Re-Identification](https://openaccess.thecvf.com/content/WACV2024/papers/Rami_Source-Guided_Similarity_Preservation_for_Online_Person_Re-Identification_WACV_2024_paper.pdf)

  * [Contrastive Viewpoint-Aware Shape Learning for Long-Term Person Re-Identification](https://openaccess.thecvf.com/content/WACV2024/papers/Nguyen_Contrastive_Viewpoint-Aware_Shape_Learning_for_Long-Term_Person_Re-Identification_WACV_2024_paper.pdf)

  * Visible-Infrared Reid

    * [Enhancing Diverse Intra-Identity Representation for Visible-Infrared Person Re-Identification](https://openaccess.thecvf.com/content/WACV2024/papers/Kim_Enhancing_Diverse_Intra-Identity_Representation_for_Visible-Infrared_Person_Re-Identification_WACV_2024_paper.pdf)

* Person Identification

  * [ShARc: Shape and Appearance Recognition for Person Identification In-the-wild](https://arxiv.org/abs/2310.15946)

* Person Search

  * [DDAM-PS: Diligent Domain Adaptive Mixer for Person Search](https://arxiv.org/abs/2310.20706)
:star:[code](https://github.com/mustansarfiaz/DDAM-PS)

* Pedestrian Detection

  * [HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information](https://openaccess.thecvf.com/content/WACV2024/papers/Medeiros_HalluciDet_Hallucinating_RGB_Modality_for_Person_Detection_Through_Privileged_Information_WACV_2024_paper.pdf)

  * [Beyond Fusion: Modality Hallucination-Based Multispectral Fusion for Pedestrian Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Xie_Beyond_Fusion_Modality_Hallucination-Based_Multispectral_Fusion_for_Pedestrian_Detection_WACV_2024_paper.pdf)

  * [Booster-SHOT: Boosting Stacked Homography Transformations for Multiview Pedestrian Detection With Attention](https://openaccess.thecvf.com/content/WACV2024/papers/Hwang_Booster-SHOT_Boosting_Stacked_Homography_Transformations_for_Multiview_Pedestrian_Detection_With_WACV_2024_paper.pdf)

  * [Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling](https://openaccess.thecvf.com/content/WACV2024/papers/Aung_Enhancing_Multi-View_Pedestrian_Detection_Through_Generalized_3D_Feature_Pulling_WACV_2024_paper.pdf)

  * [Favoring One Among Equals - Not a Good Idea: Many-to-One Matching for Robust Transformer Based Pedestrian Detection](https://openaccess.thecvf.com/content/WACV2024/papers/Shastry_Favoring_One_Among_Equals_-_Not_a_Good_Idea_Many-to-One_WACV_2024_paper.pdf)

* Crowd Counting

  * Weakly-Supervised Crowd Counting

    * [Glance To Count: Learning To Rank With Anchors for Weakly-Supervised Crowd Counting](http://arxiv.org/abs/2205.14659)

  * Infrared-Based Crowd Counting

    * [Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting](https://arxiv.org/abs/2311.11974)
:star:[code](https://github.com/tortueTortue/IRPeopleCounting)

* Gait Recognition

  * [You Can Run but not Hide: Improving Gait Recognition with Intrinsic Occlusion Type Awareness](http://arxiv.org/abs/2312.02290v1)

  * [Watch Where You Head: A View-Biased Domai