https://github.com/52cv/wacv-2025-papers

Host: GitHub
URL: https://github.com/52cv/wacv-2025-papers
Owner: 52CV
Created: 2024-09-05T09:08:55.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-06-30T02:27:34.000Z (about 1 year ago)
Last Synced: 2025-10-11T10:39:25.540Z (9 months ago)
Size: 590 KB
Stars: 26
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# WACV-2025-Papers
![Alt text](71b01f14605d807ad78d267e5528243.jpg)
## Conference dates: February 28 to March 4, 2025
## Conference website: https://wacv2025.thecvf.com/

## For 2025 survey papers, see ↘️[2025-CV-Surveys](https://github.com/52CV/CV-Surveys)

## 2025 paper collections by category
↘️[WACV-2025-Papers](https://github.com/52CV/WACV-2025-Papers)
↘️[CVPR-2025-Papers](https://github.com/52CV/CVPR-2025-Papers)
↘️[ICCV-2025-Papers](https://github.com/52CV/ICCV-2025-Papers)

## 2024 paper collections by category
↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)
↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)
↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)

## [2023 paper collections by category](#0000)
## [2022 paper collections by category](#000)
## [2021 paper collections by category](#00)
## [2020 paper collections by category](#0)

# ❣❣❣ WACV 2025 paper categorization is complete
# :loudspeaker::loudspeaker::loudspeaker:Award-winning papers
### :trophy:Best Paper (Algorithms)
* [RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis](https://arxiv.org/abs/2408.03356)
:star:[code](https://github.com/hugobl1/ray_gauss)
:house:[project](https://raygauss.github.io/)
### :trophy:Best Paper (Applications)
* [Optimizing Vision-Language Model for Road Crossing Intention Estimation](https://openaccess.thecvf.com/content/WACV2025/html/Uziel_Optimizing_Vision-Language_Model_for_Road_Crossing_Intention_Estimation_WACV_2025_paper.html)
### :trophy:Best Student Paper
* [GeoDiffuser: Geometry-Based Image Editing with Diffusion Models](https://arxiv.org/abs/2404.14403)
:star:[code](https://github.com/RahulSajnani/GeoDiffuser)
:house:[project](https://ivl.cs.brown.edu/research/geodiffuser.html)
### :trophy:Best Student Paper Honorable Mention
* [Cross-Domain and Cross-Dimension Learning for Image-to-Graph Transformers](https://arxiv.org/abs/2403.06601)
:star:[code](https://github.com/AlexanderHBerger/cross-dim_i2g)
### :trophy:Test of Time Award (tie)
* [Deeply-Learned Feature for Age Estimation](https://ieeexplore.ieee.org/document/7045931)
* [Bayesian Multi-object Tracking Using Motion Context from Multiple Objects](https://ieeexplore.ieee.org/document/7045866)

## Table of Contents

|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|[1.Others](#1)|[2.Face](#2)|[3.Image Segmentation](#3)|[4.Image/Video Processing](#4)|
|[5.Image Classification](#5)|[6.Image/Video Compression](#6)|[7.Image Captioning](#7)|[8.Image/Video Retrieval](#8)|
|[9.SR (Super-Resolution)](#9)|[10.OD (Object Detection)](#10)|[11.OT (Object Tracking)](#11)|[12.UAV/RS/Satellite Image (UAV/Remote Sensing/Satellite Imagery)](#12)|
|[13.Biometrics](#13)|[14.Autonomous Driving](#14)|[15.Medical Image Processing](#15)|[16.HPE (Human Pose Estimation)](#16)|
|[17.Action Detection](#17)|[18.Person Re-id (Person Re-identification)](#18)|[19.Video](#19)|[20.Point Cloud](#20)|
|[21.3D (3D Reconstruction/3D Vision)](#21)|[22.OCR](#22)|[23.VQA (Visual Question Answering)](#23)|[24.GAN/Image Synthesis](#24)|
|[25.Style Transfer](#25)|[26.Motion Generation (Human Motion Generation)](#26)|[27.Machine Learning](#27)|[28.GNN/GCN](#28)|
|[29.Deep Learning](#29)|[30.Few/Zero-Shot Learning/DG/DA (Few-/Zero-Shot Learning, Domain Generalization, Domain Adaptation)](#30)|[31.NAS (Neural Architecture Search)](#31)|[32.MC/KD/Pruning (Model Compression/Knowledge Distillation/Pruning)](#32)|
|[33.Semi/SSL (Semi-/Self-Supervised Learning)](#33)|[34.VL (Vision-Language)](#34)|[35.Dataset/Benchmark](#35)|[36.Object Pose Estimation](#36)|
|[37.Scene](#37)|[38.HOI Detection (Human-Object Interaction Detection)](#38)|[39.Robots](#39)|[40.Deepfake](#40)|
|[41.Anomaly Detection](#41)|[42.Industrial Anomaly Detection (Industrial Defect Detection)](#42)|[43.Neural Radiance Fields](#43)|[44.Dense Prediction](#44)|
|[45.Transformer](#45)|[46.Sound](#46)|[47.Sketch](#47)|[48.Protecting copyright](#48)|
|[49.Computational Imaging](#49)|

## 49.Computational Imaging
* [FaVoR: Features via Voxel Rendering for Camera Relocalization](https://openaccess.thecvf.com/content/WACV2025/html/Polizzi_FaVoR_Features_via_Voxel_Rendering_for_Camera_Relocalization_WACV_2025_paper.html)
* [Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter](https://openaccess.thecvf.com/content/WACV2025/html/McGriff_Dense_Scene_Reconstruction_from_Light-Field_Images_Affected_by_Rolling_Shutter_WACV_2025_paper.html) light-field dense scene reconstruction
* [PrivateEye: In-Sensor Privacy Preservation Through Optical Feature Separation](https://openaccess.thecvf.com/content/WACV2025/html/Boloor_PrivateEye_In-Sensor_Privacy_Preservation_Through_Optical_Feature_Separation_WACV_2025_paper.html) optics
* [Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series](https://openaccess.thecvf.com/content/WACV2025/html/Niu_Solar_Multimodal_Transformer_Intraday_Solar_Irradiance_Predictor_using_Public_Cameras_WACV_2025_paper.html)
* [TaCOS: Task-Specific Camera Optimization with Simulation](https://openaccess.thecvf.com/content/WACV2025/html/Yan_TaCOS_Task-Specific_Camera_Optimization_with_Simulation_WACV_2025_paper.html)

## 48.Protecting copyright
* [Towards Secure and Usable 3D Assets: A Novel Framework for Automatic Visible Watermarking](https://openaccess.thecvf.com/content/WACV2025/html/Singh_Towards_Secure_and_Usable_3D_Assets_A_Novel_Framework_for_WACV_2025_paper.html)
* Image copy detection
* [Relational Self-Supervised Distillation with Compact Descriptors for Image Copy Detection](https://openaccess.thecvf.com/content/WACV2025/html/Kim_Relational_Self-Supervised_Distillation_with_Compact_Descriptors_for_Image_Copy_Detection_WACV_2025_paper.html)

## 47.Sketch
* [3D Edge Sketch from Multiview Images](https://openaccess.thecvf.com/content/WACV2025/html/Zheng_3D_Edge_Sketch_from_Multiview_Images_WACV_2025_paper.html)
* [PICASSO: A Feed-Forward Framework for Parametric Inference of CAD Sketches via Rendering Self-Supervision](https://openaccess.thecvf.com/content/WACV2025/html/Karadeniz_PICASSO_A_Feed-Forward_Framework_for_Parametric_Inference_of_CAD_Sketches_WACV_2025_paper.html)
* [ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html)

## 46.Sound
* [SoundSil-DS: Deep Denoising and Segmentation of Sound-Field Images with Silhouettes](https://openaccess.thecvf.com/content/WACV2025/html/Tanigawa_SoundSil-DS_Deep_Denoising_and_Segmentation_of_Sound-Field_Images_with_Silhouettes_WACV_2025_paper.html) deep denoising and segmentation of sound-field images using silhouettes
* [NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context](https://openaccess.thecvf.com/content/WACV2025/html/Park_NarrAD_Automatic_Generation_of_Audio_Descriptions_for_Movies_with_Rich_WACV_2025_paper.html)
* [NowYouSee Me: Context-Aware Automatic Audio Description](http://arxiv.org/abs/2412.10002v1)
* [EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos](http://arxiv.org/abs/2407.20592)
:house:[project](https://aashishrai3799.github.io/EgoSonics/)
* [SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera](http://arxiv.org/abs/2412.16861v1)
* [Temporally Streaming Audio-Visual Synchronization for Real-World Videos](https://openaccess.thecvf.com/content/WACV2025/html/Voas_Temporally_Streaming_Audio-Visual_Synchronization_for_Real-World_Videos_WACV_2025_paper.html)
* [Multimodal Interpretable Depression Analysis using Visual Physiological Audio and Textual Data](https://openaccess.thecvf.com/content/WACV2025/html/Kumar_Multimodal_Interpretable_Depression_Analysis_using_Visual_Physiological_Audio_and_Textual_WACV_2025_paper.html)
* [Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence](https://openaccess.thecvf.com/content/WACV2025/html/Islam_Unsupervised_Video_Highlight_Detection_by_Learning_from_Audio_and_Visual_WACV_2025_paper.html)
* [VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference](https://openaccess.thecvf.com/content/WACV2025/html/Yoo_VioPose_Violin_Performance_4D_Pose_Estimation_by_Hierarchical_Audiovisual_Inference_WACV_2025_paper.html)
* [VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos](https://openaccess.thecvf.com/content/WACV2025/html/Lin_VMAs_Video-to-Music_Generation_via_Semantic_Alignment_in_Web_Music_Videos_WACV_2025_paper.html)

## 45.Transformer
* [LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones](https://arxiv.org/abs/2409.03460)
* [Bandit Based Attention Mechanism in Vision Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Chowdhury_Bandit_Based_Attention_Mechanism_in_Vision_Transformers_WACV_2025_paper.html)
* [AMP-ViT: Optimizing Vision Transformer Efficiency with Adaptive Mixed-Precision Post-Training Quantization](https://openaccess.thecvf.com/content/WACV2025/html/Tai_AMP-ViT_Optimizing_Vision_Transformer_Efficiency_with_Adaptive_Mixed-Precision_Post-Training_Quantization_WACV_2025_paper.html)
* [Channel Propagation Networks for Refreshable Vision Transformer](https://openaccess.thecvf.com/content/WACV2025/html/Go_Channel_Propagation_Networks_for_Refreshable_Vision_Transformer_WACV_2025_paper.html)
* [Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Nauen_Which_Transformer_to_Favor_A_Comparative_Analysis_of_Efficiency_in_WACV_2025_paper.html)
* [SpectFormer: Frequency and Attention is What You Need in a Vision Transformer](https://openaccess.thecvf.com/content/WACV2025/html/Patro_SpectFormer_Frequency_and_Attention_is_What_You_Need_in_a_WACV_2025_paper.html)
* [Image Adaptation for Colour Vision Deficient Viewers using Vision Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Gillooly_Image_Adaptation_for_Colour_Vision_Deficient_Viewers_using_Vision_Transformers_WACV_2025_paper.html)
* [QuantAttack: Exploiting Quantization Techniques to Attack Vision Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Baras_QuantAttack_Exploiting_Quantization_Techniques_to_Attack_Vision_Transformers_WACV_2025_paper.html)
* [TORE: Token Recycling in Vision Transformers for Efficient Active Visual Exploration](https://openaccess.thecvf.com/content/WACV2025/html/Olszewski_TORE_Token_Recycling_in_Vision_Transformers_for_Efficient_Active_Visual_WACV_2025_paper.html)
* [Adversarial Attention Deficit: Fooling Deformable Vision Transformers with Collaborative Adversarial Patches](https://openaccess.thecvf.com/content/WACV2025/html/Alam_Adversarial_Attention_Deficit_Fooling_Deformable_Vision_Transformers_with_Collaborative_Adversarial_WACV_2025_paper.html)
* [Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Pardyl_Beyond_Grids_Exploring_Elastic_Input_Sampling_for_Vision_Transformers_WACV_2025_paper.html)
* [Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Grigore_Weight_Copy_and_Low-Rank_Adaptation_for_Few-Shot_Distillation_of_Vision_WACV_2025_paper.html)

## 44.Dense Prediction
* [Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization](http://arxiv.org/abs/2412.03179v1)
:star:[code](https://github.com/Klodivio355/MT-CP)
* [Cross-Task Affinity Learning for Multitask Dense Scene Predictions](https://openaccess.thecvf.com/content/WACV2025/html/Sinodinos_Cross-Task_Affinity_Learning_for_Multitask_Dense_Scene_Predictions_WACV_2025_paper.html) dense scene prediction

## 43.Neural Radiance Fields
* [Radiance Field-Based Pose Estimation via Decoupled Optimization Under Challenging Initial Conditions](https://openaccess.thecvf.com/content/WACV2025/html/Lu_Radiance_Field-Based_Pose_Estimation_via_Decoupled_Optimization_Under_Challenging_Initial_WACV_2025_paper.html)
* [MFNeRF: Memory Efficient NeRF with Mixed-Feature Hash Table](https://openaccess.thecvf.com/content/WACV2025/html/Lee_MFNeRF_Memory_Efficient_NeRF_with_Mixed-Feature_Hash_Table_WACV_2025_paper.html)
* [GANESH: Generalizable NeRF for Lensless Imaging](http://arxiv.org/abs/2411.04810v1)
:star:[code](https://rakesh-123-cryp.github.io/Rakesh.github.io/)
* [TRNeRF: Restoring Blurry Rolling Shutter and Noisy Thermal Images with Neural Radiance Fields](https://openaccess.thecvf.com/content/WACV2025/html/Carmichael_TRNeRF_Restoring_Blurry_Rolling_Shutter_and_Noisy_Thermal_Images_with_WACV_2025_paper.html)
* [BASED: Bundle-Adjusting Surgical Endoscopic Dynamic Video Reconstruction using Neural Radiance Fields](https://openaccess.thecvf.com/content/WACV2025/html/Saha_BASED_Bundle-Adjusting_Surgical_Endoscopic_Dynamic_Video_Reconstruction_using_Neural_Radiance_WACV_2025_paper.html)
* [ARF-Plus: Controlling Perceptual Factors in Artistic Radiance Fields for 3D Scene Stylization](https://openaccess.thecvf.com/content/WACV2025/html/Li_ARF-Plus_Controlling_Perceptual_Factors_in_Artistic_Radiance_Fields_for_3D_WACV_2025_paper.html)
* [Self-Aligning Depth-Regularized Radiance Fields for Asynchronous RGB-D Sequences](https://openaccess.thecvf.com/content/WACV2025/html/Huang_Self-Aligning_Depth-Regularized_Radiance_Fields_for_Asynchronous_RGB-D_Sequences_WACV_2025_paper.html)
* Novel view synthesis
* [RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation](https://openaccess.thecvf.com/content/WACV2025/html/Monteagudo_RendBEV_Semantic_Novel_View_Synthesis_for_Self-Supervised_Birds_Eye_View_WACV_2025_paper.html)
* [VaLID: Variable-Length Input Diffusion for Novel View Synthesis](https://openaccess.thecvf.com/content/WACV2025/html/Li_VaLID_Variable-Length_Input_Diffusion_for_Novel_View_Synthesis_WACV_2025_paper.html)
* [RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis](https://openaccess.thecvf.com/content/WACV2025/html/Blanc_RayGauss_Volumetric_Gaussian-Based_Ray_Casting_for_Photorealistic_Novel_View_Synthesis_WACV_2025_paper.html)
* [GauFRe: Gaussian Deformation Fields for Real-Time Dynamic Novel View Synthesis](https://openaccess.thecvf.com/content/WACV2025/html/Liang_GauFRe_Gaussian_Deformation_Fields_for_Real-Time_Dynamic_Novel_View_Synthesis_WACV_2025_paper.html)
* [MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image Aided Generalizable Neural Radiance Field](https://openaccess.thecvf.com/content/WACV2025/html/Yan_MSI-NeRF_Linking_Omni-Depth_with_View_Synthesis_through_Multi-Sphere_Image_Aided_WACV_2025_paper.html)
* [FluoNeRF: Fluorescent Novel-View Synthesis under Novel Light Source Colors](https://openaccess.thecvf.com/content/WACV2025/html/Shi_FluoNeRF_Fluorescent_Novel-View_Synthesis_under_Novel_Light_Source_Colors_WACV_2025_paper.html)
* [SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior](https://openaccess.thecvf.com/content/WACV2025/html/Yu_SGD_Street_View_Synthesis_with_Gaussian_Splatting_and_Diffusion_Prior_WACV_2025_paper.html)
* Rendering
* [Global-Guided Focal Neural Radiance Field for Large-Scale Scene Rendering](https://openaccess.thecvf.com/content/WACV2025/html/Shao_Global-Guided_Focal_Neural_Radiance_Field_for_Large-Scale_Scene_Rendering_WACV_2025_paper.html)
* [OccFlowNet: Occupancy Estimation via Differentiable Rendering and Occupancy Flow](https://openaccess.thecvf.com/content/WACV2025/html/Boeder_OccFlowNet_Occupancy_Estimation_via_Differentiable_Rendering_and_Occupancy_Flow_WACV_2025_paper.html)
* [PRoGS: Progressive Rendering of Gaussian Splats](https://openaccess.thecvf.com/content/WACV2025/html/Zoomers_PRoGS_Progressive_Rendering_of_Gaussian_Splats_WACV_2025_paper.html) rendering
* [NeuManifold: Neural Watertight Manifold Reconstruction with Efficient and High-Quality Rendering Support](https://openaccess.thecvf.com/content/WACV2025/html/Wei_NeuManifold_Neural_Watertight_Manifold_Reconstruction_with_Efficient_and_High-Quality_Rendering_WACV_2025_paper.html)

## 42.Industrial Anomaly Detection (Industrial Defect Detection)
* [SPACE: SPAtial-aware Consistency rEgularization for anomaly detection in Industrial applications](http://arxiv.org/abs/2411.05822v1)
* [Adaptive Deviation Learning for Visual Anomaly Detection with Data Contamination](http://arxiv.org/abs/2411.09558v1)
* [Anomaly Detection for People with Visual Impairments Using an Egocentric 360-Degree Camera](http://arxiv.org/abs/2411.10945v1)
* [ROADS: Robust Prompt-driven Multi-Class Anomaly Detection under Domain Shift](http://arxiv.org/abs/2411.16049v1)
* [FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data](http://arxiv.org/abs/2411.16110v1)
:star:[code](https://github.com/HY-Vision-Lab/FUNAD)
* [Single-Layer Distillation with Fourier Convolutions for Texture Anomaly Detection](https://openaccess.thecvf.com/content/WACV2025/html/Thomine_Single-Layer_Distillation_with_Fourier_Convolutions_for_Texture_Anomaly_Detection_WACV_2025_paper.html)
* [Looking at Model Debiasing through the Lens of Anomaly Detection](https://openaccess.thecvf.com/content/WACV2025/html/Pastore_Looking_at_Model_Debiasing_through_the_Lens_of_Anomaly_Detection_WACV_2025_paper.html)
* [AnomalyDINO: Boosting Patch-Based Few-Shot Anomaly Detection with DINOv2](https://openaccess.thecvf.com/content/WACV2025/html/Damm_AnomalyDINO_Boosting_Patch-Based_Few-Shot_Anomaly_Detection_with_DINOv2_WACV_2025_paper.html)
* [Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation](https://openaccess.thecvf.com/content/WACV2025/html/Hermary_Removing_Geometric_Bias_in_One-Class_Anomaly_Detection_with_Adaptive_Feature_WACV_2025_paper.html)
* Image anomaly detection
* [Heterogeneous Datasets for Unsupervised Image Anomaly Detection](https://openaccess.thecvf.com/content/WACV2025/html/Lagos_Heterogeneous_Datasets_for_Unsupervised_Image_Anomaly_Detection_WACV_2025_paper.html)
* Anomaly localization
* [Towards Zero-shot 3D Anomaly Localization](http://arxiv.org/abs/2412.04304v1)
* Anomaly segmentation
* [Towards Accurate Unified Anomaly Segmentation](https://openaccess.thecvf.com/content/WACV2025/html/Ma_Towards_Accurate_Unified_Anomaly_Segmentation_WACV_2025_paper.html)

## 41.Anomaly Detection
* Novelty detection
* [Robust Novelty Detection through Style-Conscious Feature Ranking](https://openaccess.thecvf.com/content/WACV2025/html/Smeu_Robust_Novelty_Detection_through_Style-Conscious_Feature_Ranking_WACV_2025_paper.html)
* OOD
* [Exploiting Inter-Sample Information for Long-Tailed Out-of-Distribution Detection](https://openaccess.thecvf.com/content/WACV2025/html/Udayangani_Exploiting_Inter-Sample_Information_for_Long-Tailed_Out-of-Distribution_Detection_WACV_2025_paper.html)
* [Identity Curvature Laplace Approximation for Improved Out-of-Distribution Detection](https://openaccess.thecvf.com/content/WACV2025/html/Zhdanov_Identity_Curvature_Laplace_Approximation_for_Improved_Out-of-Distribution_Detection_WACV_2025_paper.html)
* [CRAFT: Class Ranking Aware Fine-Tuning for Enhanced Out-of-Distribution Detection](https://openaccess.thecvf.com/content/WACV2025/html/Karunanayake_CRAFT_Class_Ranking_Aware_Fine-Tuning_for_Enhanced_Out-of-Distribution_Detection_WACV_2025_paper.html)
* [Finding Dino: A Plug-and-Play Framework for Zero-Shot Detection of Out-of-Distribution Objects using Prototypes](https://openaccess.thecvf.com/content/WACV2025/html/Sinhamahapatra_Finding_Dino_A_Plug-and-Play_Framework_for_Zero-Shot_Detection_of_Out-of-Distribution_WACV_2025_paper.html)
* [CLIPScope: Enhancing Zero-Shot OOD Detection with Bayesian Scoring](https://openaccess.thecvf.com/content/WACV2025/html/Fu_CLIPScope_Enhancing_Zero-Shot_OOD_Detection_with_Bayesian_Scoring_WACV_2025_paper.html)

## 40.Deepfake
* [DeCLIP: Decoding CLIP representations for deepfake localization](https://arxiv.org/abs/2409.08849)
:star:[code](https://github.com/bit-ml/DeCLIP)
* [Texture Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection](https://openaccess.thecvf.com/content/WACV2025/html/Li_Texture_Shape_and_Order_Matter_A_New_Transformer_Design_for_WACV_2025_paper.html)
* AI-generated image detection
* [Reducing the Content Bias for AI-Generated Image Detection](https://openaccess.thecvf.com/content/WACV2025/html/Gye_Reducing_the_Content_Bias_for_AI-Generated_Image_Detection_WACV_2025_paper.html)
* [InvisMark: Invisible and Robust Watermarking for AI-Generated Image Provenance](https://openaccess.thecvf.com/content/WACV2025/html/Xu_InvisMark_Invisible_and_Robust_Watermarking_for_AI-Generated_Image_Provenance_WACV_2025_paper.html)
* Misinformation detection
* [Similarity over Factuality: Are we Making Progress on Multimodal Out-of-Context Misinformation Detection?](https://openaccess.thecvf.com/content/WACV2025/html/Papadopoulos_Similarity_over_Factuality_Are_we_Making_Progress_on_Multimodal_Out-of-Context_WACV_2025_paper.html)
* [Can Out-of-Domain Data Help to Learn Domain-Specific Prompts for Multimodal Misinformation Detection?](https://openaccess.thecvf.com/content/WACV2025/html/Bhattacharya_Can_Out-of-Domain_Data_Help_to_Learn_Domain-Specific_Prompts_for_Multimodal_WACV_2025_paper.html) misinformation detection

## 39.Robots
* [Transferring Foundation Models for Generalizable Robotic Manipulation](https://openaccess.thecvf.com/content/WACV2025/html/Yang_Transferring_Foundation_Models_for_Generalizable_Robotic_Manipulation_WACV_2025_paper.html) robotic manipulation
* Avatar
* [Gaussian Déjà-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities](https://arxiv.org/abs/2409.16147)
* [DivAvatar: Diverse 3D Avatar Generation with a Single Prompt](https://openaccess.thecvf.com/content/WACV2025/html/Tao_DivAvatar_Diverse_3D_Avatar_Generation_with_a_Single_Prompt_WACV_2025_paper.html)
* Try-On
* [Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images](https://openaccess.thecvf.com/content/WACV2025/html/Cui_Street_TryOn_Learning_In-the-Wild_Virtual_Try-On_from_Unpaired_Person_Images_WACV_2025_paper.html)
* SLAM
* [Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction](http://arxiv.org/abs/2412.00242v1)
:star:[code](https://shaoxiang777.github.io/project/uni-slam/)
* Indoor localization
* [Multi-Surrogate-Teacher Assistance for Representation Alignment in Fingerprint-based Indoor Localization](http://arxiv.org/abs/2412.12189v1)
* Visual place recognition
* [Breaking the Frame: Visual Place Recognition by Overlap Prediction](https://openaccess.thecvf.com/content/WACV2025/html/Wei_Breaking_the_Frame_Visual_Place_Recognition_by_Overlap_Prediction_WACV_2025_paper.html)

## 38.HOI Detection (Human-Object Interaction Detection)
* [Unleashing Potentials of Vision-Language Models for Zero-Shot HOI Detection](https://openaccess.thecvf.com/content/WACV2025/html/Yamada_Unleashing_Potentials_of_Vision-Language_Models_for_Zero-Shot_HOI_Detection_WACV_2025_paper.html)
* Hand-object interaction
* [A Versatile and Differentiable Hand-Object Interaction Representation](https://openaccess.thecvf.com/content/WACV2025/html/Morales_A_Versatile_and_Differentiable_Hand-Object_Interaction_Representation_WACV_2025_paper.html)

## 37.Scene
* [LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations](http://arxiv.org/abs/2412.06322v1)
:star:[code](https://github.com/Endlinc/LLaVA-SpaceSGG)
* [DDS: Decoupled Dynamic Scene-Graph Generation Network](https://openaccess.thecvf.com/content/WACV2025/html/Iftekhar_DDS_Decoupled_Dynamic_Scene-Graph_Generation_Network_WACV_2025_paper.html)
* [Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge](https://openaccess.thecvf.com/content/WACV2025/html/Jiang_Enhancing_Scene_Graph_Generation_with_Hierarchical_Relationships_and_Commonsense_Knowledge_WACV_2025_paper.html)
* [Effective Scene Graph Generation by Statistical Relation Distillation](https://openaccess.thecvf.com/content/WACV2025/html/Nguyen_Effective_Scene_Graph_Generation_by_Statistical_Relation_Distillation_WACV_2025_paper.html)

## 36.Object Pose Estimation
* [Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching](http://arxiv.org/abs/2411.15860v1)
:star:[code](https://github.com/scy639/Gen2SM)
* Re-identification
* [DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification](https://openaccess.thecvf.com/content/WACV2025/html/Lin_DMPT_Decoupled_Modality-Aware_Prompt_Tuning_for_Multi-Modal_Object_Re-Identification_WACV_2025_paper.html)

## 35.Dataset/Benchmark
* [SynDRA: Synthetic Dataset for Railway Applications](https://openaccess.thecvf.com/content/WACV2025/html/DAmico_SynDRA_Synthetic_Dataset_for_Railway_Applications_WACV_2025_paper.html)
* [High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer](http://arxiv.org/abs/2410.22922v1)
* [Needles & Haystacks: Dataset and Benchmark for Domain-Agnostic Image-Based Rigid Slice-to-Volume Registration](https://openaccess.thecvf.com/content/WACV2025/html/Frolov_Needles__Haystacks_Dataset_and_Benchmark_for_Domain-Agnostic_Image-Based_Rigid_WACV_2025_paper.html)
* [The FineView Dataset: A 3D Scanned Multi-View Object Dataset of Fine-Grained Category Instances](https://openaccess.thecvf.com/content/WACV2025/html/Onda_The_FineView_DatasetA_3D_Scanned_Multi-View_Object_Dataset_of_Fine-Grained_WACV_2025_paper.html)
* [PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests](https://openaccess.thecvf.com/content/WACV2025/html/Gaydon_PureForest_A_Large-Scale_Aerial_Lidar_and_Aerial_Imagery_Dataset_for_WACV_2025_paper.html)
* [IRIS-VIS: A New Dataset for Visibility Estimation in an Industrial Environment](https://openaccess.thecvf.com/content/WACV2025/html/Armangeon_IRIS-VIS_A_New_Dataset_for_Visibility_Estimation_in_an_Industrial_WACV_2025_paper.html)
* [GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction](https://openaccess.thecvf.com/content/WACV2025/html/Barua_GTA-HDR_A_Large-Scale_Synthetic_Dataset_for_HDR_Image_Reconstruction_WACV_2025_paper.html)
* [CT to PET Translation: A Large-scale Dataset and Domain-Knowledge-Guided Diffusion Approach](http://arxiv.org/abs/2410.21932v1)
:star:[code](https://github.com/thanhhff/CPDM)
* [PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation](http://arxiv.org/abs/2410.22623v1)
* [CLIPping Imbalances: A Novel Evaluation Baseline and PEARL Dataset for Pedestrian Attribute Recognition](https://openaccess.thecvf.com/content/WACV2025/html/Vijay_CLIPping_Imbalances_A_Novel_Evaluation_Baseline_and_PEARL_Dataset_for_WACV_2025_paper.html)
* [SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection](http://arxiv.org/abs/2411.05633v1)
* [SEED4D: A Synthetic Ego-Exo Dynamic 4D Data Generator, Driving Dataset and Benchmark](http://arxiv.org/abs/2412.00730v1)
:star:[code](https://seed4d.github.io/)
:star:[code](https://github.com/continental/seed4d)
* [SEED4D: A Synthetic Ego-Exo Dynamic 4D Data Generator Driving Dataset and Benchmark](https://openaccess.thecvf.com/content/WACV2025/html/Kastingschafer_SEED4D_A_Synthetic_Ego-Exo_Dynamic_4D_Data_Generator_Driving_Dataset_WACV_2025_paper.html)
* [DrIFT: Autonomous Drone Dataset with Integrated Real and Synthetic Data, Flexible Views, and Transformed Domains](http://arxiv.org/abs/2412.04789v1)
:star:[code](https://github.com/CARG-uOttawa/DrIFT.git)
* [DrIFT: Autonomous Drone Dataset with Integrated Real and Synthetic Data Flexible Views and Transformed Domains](https://openaccess.thecvf.com/content/WACV2025/html/Dadboud_DrIFT_Autonomous_Drone_Dataset_with_Integrated_Real_and_Synthetic_Data_WACV_2025_paper.html)
* [TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations](http://arxiv.org/abs/2501.07360v1)
:star:[code](https://github.com/timbervision/timbervision)
* [A Pipeline and NIR-Enhanced Dataset for Parking Lot Segmentation](http://arxiv.org/abs/2412.13179v1)
* [3D Understanding of Deformable Linear Objects: Datasets and Transferability Benchmark](https://openaccess.thecvf.com/content/WACV2025/html/Zagar_3D_Understanding_of_Deformable_Linear_Objects_Datasets_and_Transferability_Benchmark_WACV_2025_paper.html)
* [CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry](https://openaccess.thecvf.com/content/WACV2025/html/Tschirschwitz_CISOL_An_Open_and_Extensible_Dataset_for_Table_Structure_Recognition_WACV_2025_paper.html)
* [SANPO: A Scene Understanding Accessibility and Human Navigation Dataset](https://openaccess.thecvf.com/content/WACV2025/html/Waghmare_SANPO_A_Scene_Understanding_Accessibility_and_Human_Navigation_Dataset_WACV_2025_paper.html)
* [Sign Language Recognition: A Large-Scale Multi-View Dataset and Comprehensive Evaluation](https://openaccess.thecvf.com/content/WACV2025/html/Dinh_Sign_Language_Recognition_A_Large-Scale_Multi-View_Dataset_and_Comprehensive_Evaluation_WACV_2025_paper.html)
* [CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis](https://openaccess.thecvf.com/content/WACV2025/html/Desai_CycleCrash_A_Dataset_of_Bicycle_Collision_Videos_for_Collision_Prediction_WACV_2025_paper.html)
* [A Semantically Impactful Image Manipulation Dataset: Characterizing Image Manipulations using Semantic Significance](https://openaccess.thecvf.com/content/WACV2025/html/Chen_A_Semantically_Impactful_Image_Manipulation_Dataset_Characterizing_Image_Manipulations_using_WACV_2025_paper.html)
* Benchmark
* [GazeSearch: Radiology Findings Search Benchmark](http://arxiv.org/abs/2411.05780v1)
* [ARTeFACT: Benchmarking Segmentation Models on Diverse Analogue Media Damage](http://arxiv.org/abs/2412.04580v1)
:star:[code](https://daniela997.github.io/ARTeFACT/)
* [CardioSyntax: End-to-End SYNTAX Score Prediction - Dataset Benchmark and Method](https://openaccess.thecvf.com/content/WACV2025/html/Ponomarchuk_CardioSyntax_End-to-End_SYNTAX_Score_Prediction_-_Dataset_Benchmark_and_Method_WACV_2025_paper.html)
* [Oriented Cell Dataset: A Dataset and Benchmark for Oriented Cell Detection and Applications](https://openaccess.thecvf.com/content/WACV2025/html/Kirsten_Oriented_Cell_Dataset_A_Dataset_and_Benchmark_for_Oriented_Cell_WACV_2025_paper.html)
* [ANTHROPOS-V: Benchmarking the Novel Task of Crowd Volume Estimation](https://openaccess.thecvf.com/content/WACV2025/html/Collorone_ANTHROPOS-V_Benchmarking_the_Novel_Task_of_Crowd_Volume_Estimation_WACV_2025_paper.html)
* [SALVE: A 3D Reconstruction Benchmark of Wounds from Consumer-Grade Videos](https://openaccess.thecvf.com/content/WACV2025/html/Chierchia_SALVE_A_3D_Reconstruction_Benchmark_of_Wounds_from_Consumer-Grade_Videos_WACV_2025_paper.html)
* [Mind the Prompt: A Novel Benchmark for Prompt-Based Class-Agnostic Counting](https://openaccess.thecvf.com/content/WACV2025/html/Ciampi_Mind_the_Prompt_A_Novel_Benchmark_for_Prompt-Based_Class-Agnostic_Counting_WACV_2025_paper.html)
* [VG-SSL: Benchmarking Self-Supervised Representation Learning Approaches for Visual Geo-Localization](https://openaccess.thecvf.com/content/WACV2025/html/Xiao_VG-SSL_Benchmarking_Self-Supervised_Representation_Learning_Approaches_for_Visual_Geo-Localization_WACV_2025_paper.html)
* [Multi-Label Continual Learning for the Medical Domain: A Novel Benchmark](https://openaccess.thecvf.com/content/WACV2025/html/Ceccon_Multi-Label_Continual_Learning_for_the_Medical_Domain_A_Novel_Benchmark_WACV_2025_paper.html)
* [OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics](https://openaccess.thecvf.com/content/WACV2025/html/Gozlan_OpenCapBench_A_Benchmark_to_Bridge_Pose_Estimation_and_Biomechanics_WACV_2025_paper.html)
* [UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark](https://openaccess.thecvf.com/content/WACV2025/html/Abdullah_UAL-Bench_The_First_Comprehensive_Unusual_Activity_Localization_Benchmark_WACV_2025_paper.html)
* [MIP-GAF: A MLLM-Annotated Benchmark for Most Important Person Localization and Group Context Understanding](https://openaccess.thecvf.com/content/WACV2025/html/Madan_MIP-GAF_A_MLLM-Annotated_Benchmark_for_Most_Important_Person_Localization_and_WACV_2025_paper.html)

## 34.Vision-Language
* [Active Learning for Vision-Language Models](http://arxiv.org/abs/2410.22187v1)
* [Active Learning for Vision Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Safaei_Active_Learning_for_Vision_Language_Models_WACV_2025_paper.html)
* [Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Chang_Generalist_YOLO_Towards_Real-Time_End-to-End_Multi-Task_Visual_Language_Models_WACV_2025_paper.html)
* [LLM-Generated Rewrite and Context Modulation for Enhanced Vision Language Models in Digital Pathology](https://openaccess.thecvf.com/content/WACV2025/html/Bahadir_LLM-Generated_Rewrite_and_Context_Modulation_for_Enhanced_Vision_Language_Models_WACV_2025_paper.html)
* [@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology](https://arxiv.org/abs/2409.14215)
:star:[code](https://github.com/jystin/ATBench)
:house:[project](https://junweizheng93.github.io/publications/ATBench/ATBench.html)
* [Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models](http://arxiv.org/abs/2411.16018v1)
* [Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis](http://arxiv.org/abs/2412.02946v1)
* [Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling](https://arxiv.org/abs/2412.07077)
* [Enhancing Vision-Language Few-Shot Adaptation with Negative Learning](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_Enhancing_Vision-Language_Few-Shot_Adaptation_with_Negative_Learning_WACV_2025_paper.html)
* [DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Ali_DPA_Dual_Prototypes_Alignment_for_Unsupervised_Adaptation_of_Vision-Language_Models_WACV_2025_paper.html)
* [Optimizing Vision-Language Model for Road Crossing Intention Estimation](https://openaccess.thecvf.com/content/WACV2025/html/Uziel_Optimizing_Vision-Language_Model_for_Road_Crossing_Intention_Estimation_WACV_2025_paper.html)
* [OpenCity3D: What do Vision-Language Models Know About Urban Environments?](https://openaccess.thecvf.com/content/WACV2025/html/Bieri_OpenCity3D_What_do_Vision-Language_Models_Know_About_Urban_Environments_WACV_2025_paper.html)
* [Automated Evaluation of Large Vision-Language Models on Self-Driving Corner Cases](https://openaccess.thecvf.com/content/WACV2025/html/Chen_Automated_Evaluation_of_Large_Vision-Language_Models_on_Self-Driving_Corner_Cases_WACV_2025_paper.html)
* Video-Language
* [ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos](http://arxiv.org/abs/2411.15628v1)
* VLN
* [To Ask or Not to Ask? Detecting Absence of Information in Vision and Language Navigation](http://arxiv.org/abs/2411.05831v1)
* [Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks](http://arxiv.org/abs/2412.02795v1)
* [GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation](https://openaccess.thecvf.com/content/WACV2025/html/Liu_GroundingMate_Aiding_Object_Grounding_for_Goal-Oriented_Vision-and-Language_Navigation_WACV_2025_paper.html)
* [ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion](https://openaccess.thecvf.com/content/WACV2025/html/Shen_ELBA_Learning_by_Asking_for_Embodied_Visual_Navigation_and_Task_WACV_2025_paper.html)
* LLM
* [Learning Multiple Object States from Actions via Large Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Tateno_Learning_Multiple_Object_States_from_Actions_via_Large_Language_Models_WACV_2025_paper.html)
* [LLM-RSPF: Large Language Model-Based Robotic System Planning Framework for Domain Specific Use-Cases](https://openaccess.thecvf.com/content/WACV2025/html/Singh_LLM-RSPF_Large_Language_Model-Based_Robotic_System_Planning_Framework_for_Domain_WACV_2025_paper.html)
* MLLM
* [Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?](https://openaccess.thecvf.com/content/WACV2025/html/Chen_Can_Multimodal_Large_Language_Models_Truly_Perform_Multimodal_In-Context_Learning_WACV_2025_paper.html)
* [User-in-the-Loop Evaluation of Multimodal LLMs for Activity Assistance](https://openaccess.thecvf.com/content/WACV2025/html/Verghese_User-in-the-Loop_Evaluation_of_Multimodal_LLMs_for_Activity_Assistance_WACV_2025_paper.html)
* [Multi-Modal Large Language Model with RAG Strategies in Soccer Commentary Generation](https://openaccess.thecvf.com/content/WACV2025/html/Li_Multi-Modal_Large_Language_Model_with_RAG_Strategies_in_Soccer_Commentary_WACV_2025_paper.html)
* [MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning](https://openaccess.thecvf.com/content/WACV2025/html/Wang_MLLM-Tool_A_Multimodal_Large_Language_Model_for_Tool_Agent_Learning_WACV_2025_paper.html)
* [Multi-Modal Large Language Models are Effective Vision Learners](https://openaccess.thecvf.com/content/WACV2025/html/Sun_Multi-Modal_Large_Language_Models_are_Effective_Vision_Learners_WACV_2025_paper.html)
* Visual Grounding
* [Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding](http://arxiv.org/abs/2411.03405v1)
* [Data-Efficient 3D Visual Grounding via Order-Aware Referring](https://openaccess.thecvf.com/content/WACV2025/html/Wu_Data-Efficient_3D_Visual_Grounding_via_Order-Aware_Referring_WACV_2025_paper.html)
* [Learning Visual Grounding from Generative Vision and Language Model](https://openaccess.thecvf.com/content/WACV2025/html/Wang_Learning_Visual_Grounding_from_Generative_Vision_and_Language_Model_WACV_2025_paper.html)
* Agriculture + Vision-Language
* [Leveraging Vision Language Models for Specialized Agricultural Tasks](https://openaccess.thecvf.com/content/WACV2025/html/Arshad_Leveraging_Vision_Language_Models_for_Specialized_Agricultural_Tasks_WACV_2025_paper.html)
* [AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning](https://openaccess.thecvf.com/content/WACV2025/html/Awais_AgroGPT_Efficient_Agricultural_Vision-Language_Model_with_Expert_Tuning_WACV_2025_paper.html)
* [Flowering Time Prediction of Wheat from DIA-MS Data](https://openaccess.thecvf.com/content/WACV2025/html/Yang_Flowering_Time_Prediction_of_Wheat_from_DIA-MS_Data_WACV_2025_paper.html) (wheat flowering time prediction)

## 33.Semi/self-supervised learning
* Self-supervised
* [HEX: Hierarchical Emergence Exploitation in Self-Supervised Algorithms](http://arxiv.org/abs/2410.23200v1)
* [Self-Supervised Learning with Probabilistic Density Labeling for Rainfall Probability Estimation](http://arxiv.org/abs/2412.05825v1)
:star:[code](https://github.com/joonha425/SSLPDL)
* [Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images](https://openaccess.thecvf.com/content/WACV2025/html/Chen_Local_Masked_Reconstruction_for_Efficient_Self-Supervised_Learning_on_High-Resolution_Images_WACV_2025_paper.html)
* Semi-supervised
* [PivotAlign: Improve Semi-Supervised Learning by Learning Intra-Class Heterogeneity and Aligning with Pivots](https://openaccess.thecvf.com/content/WACV2025/html/Yi_PivotAlign_Improve_Semi-Supervised_Learning_by_Learning_Intra-Class_Heterogeneity_and_Aligning_WACV_2025_paper.html)
* [Defending Against Repetitive Backdoor Attacks on Semi-Supervised Learning through Lens of Rate-Distortion-Perception Trade-Off](https://openaccess.thecvf.com/content/WACV2025/html/Lee_Defending_Against_Repetitive_Backdoor_Attacks_on_Semi-Supervised_Learning_through_Lens_WACV_2025_paper.html)
* Novel category discovery
* [Towards On-the-Fly Novel Category Discovery in Dynamic Long-Tailed Distributions](https://openaccess.thecvf.com/content/WACV2025/html/Jung_Towards_On-the-Fly_Novel_Category_Discovery_in_Dynamic_Long-Tailed_Distributions_WACV_2025_paper.html)

## 32.MC/KD/Pruning (Model Compression/Knowledge Distillation/Pruning)
* [A Multi-Task Supervised Compression Model for Split Computing](https://openaccess.thecvf.com/content/WACV2025/html/Matsubara_A_Multi-Task_Supervised_Compression_Model_for_Split_Computing_WACV_2025_paper.html)
* KD
* [On Explaining Knowledge Distillation: Measuring and Visualising the Knowledge Transfer Process](http://arxiv.org/abs/2412.13943v1)
* [KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder](https://openaccess.thecvf.com/content/WACV2025/html/Bora_KDC-MAE_Knowledge_Distilled_Contrastive_Mask_Auto-Encoder_WACV_2025_paper.html)
* [Dropout Connects Transformers and CNNs: Transfer General Knowledge for Knowledge Distillation](https://openaccess.thecvf.com/content/WACV2025/html/Lee_Dropout_Connects_Transformers_and_CNNs_Transfer_General_Knowledge_for_Knowledge_WACV_2025_paper.html)
* [InDistill: Information Flow-Preserving Knowledge Distillation for Model Compression](https://openaccess.thecvf.com/content/WACV2025/html/Sarridis_InDistill_Information_Flow-Preserving_Knowledge_Distillation_for_Model_Compression_WACV_2025_paper.html)
* [Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation](https://openaccess.thecvf.com/content/WACV2025/html/Bhattacharyya_Information_Extraction_from_Heterogeneous_Documents_without_Ground_Truth_Labels_using_WACV_2025_paper.html)
* [EchoDFKD: Data-Free Knowledge Distillation for Cardiac Ultrasound Segmentation using Synthetic Data](https://openaccess.thecvf.com/content/WACV2025/html/Petit_EchoDFKD_Data-Free_Knowledge_Distillation_for_Cardiac_Ultrasound_Segmentation_using_Synthetic_WACV_2025_paper.html)
* [Comparative Knowledge Distillation](https://openaccess.thecvf.com/content/WACV2025/html/Xu_Comparative_Knowledge_Distillation_WACV_2025_paper.html)
* [SMDAF: A Scalable Sidewalk Material Data Acquisition Framework with Bidirectional Cross-Modal Knowledge Distillation](https://openaccess.thecvf.com/content/WACV2025/html/Liu_SMDAF_A_Scalable_Sidewalk_Material_Data_Acquisition_Framework_with_Bidirectional_WACV_2025_paper.html)
* [ChromaDistill: Colorizing Monochrome Radiance Fields with Knowledge Distillation](https://openaccess.thecvf.com/content/WACV2025/html/Dhiman_ChromaDistill__Colorizing_Monochrome_Radiance_Fields_with_Knowledge_Distillation_WACV_2025_paper.html)
* Pruning
* [VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation](https://openaccess.thecvf.com/content/WACV2025/html/Chen_VLTP_Vision-Language_Guided_Token_Pruning_for_Task-Oriented_Segmentation_WACV_2025_paper.html)
* [Shapley Consensus Deep Learning for Ensemble Pruning](https://openaccess.thecvf.com/content/WACV2025/html/Djenouri_Shapley_Consensus_Deep_Learning_for_Ensemble_Pruning_WACV_2025_paper.html)
* [Patch Ranking: Token Pruning as Ranking Prediction for Efficient CLIP](https://openaccess.thecvf.com/content/WACV2025/html/Wu_Patch_Ranking_Token_Pruning_as_Ranking_Prediction_for_Efficient_CLIP_WACV_2025_paper.html)
* [Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge](https://openaccess.thecvf.com/content/WACV2025/html/Eliopoulos_Pruning_One_More_Token_is_Enough_Leveraging_Latency-Workload_Non-Linearities_for_WACV_2025_paper.html)
* [Information Theoretic Pruning of Coupled Channels in Deep Neural Networks](https://openaccess.thecvf.com/content/WACV2025/html/Rostami_Information_Theoretic_Pruning_of_Coupled_Channels_in_Deep_Neural_Networks_WACV_2025_paper.html)
* Quantization
* [PTQ4VM: Post-Training Quantization for Visual Mamba](http://arxiv.org/abs/2412.20386v1)
:star:[code](https://github.com/YoungHyun197/ptq4vm)
* [Dequantization and Color Transfer with Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Vavilala_Dequantization_and_Color_Transfer_with_Diffusion_Models_WACV_2025_paper.html)
* [Data Generation for Hardware-Friendly Post-Training Quantization](https://openaccess.thecvf.com/content/WACV2025/html/Dikstein_Data_Generation_for_Hardware-Friendly_Post-Training_Quantization_WACV_2025_paper.html)
* [Difficulty Diversity and Plausibility: Dynamic Data-Free Quantization](https://openaccess.thecvf.com/content/WACV2025/html/Hong_Difficulty_Diversity_and_Plausibility_Dynamic_Data-Free_Quantization_WACV_2025_paper.html)
* [Q-TempFusion: Quantization-Aware Temporal Multi-Sensor Fusion on Bird's-Eye View Representation](https://openaccess.thecvf.com/content/WACV2025/html/Yu_Q-TempFusion_Quantization-Aware_Temporal_Multi-Sensor_Fusion_on_Birds-Eye_View_Representation_WACV_2025_paper.html)

## 31.Neural Architecture Search
* [MONAS-ESNN: Multi-Objective Neural Architecture Search for Efficient Spiking Neural Networks](https://openaccess.thecvf.com/content/WACV2025/html/Saghand_MONAS-ESNN_Multi-Objective_Neural_Architecture_Search_for_Efficient_Spiking_Neural_Networks_WACV_2025_paper.html)
* [Delta-NAS: Difference of Architecture Encoding for Predictor-Based Evolutionary Neural Architecture Search](https://openaccess.thecvf.com/content/WACV2025/html/Sridhar_Delta-NAS_Difference_of_Architecture_Encoding_for_Predictor-Based_Evolutionary_Neural_Architecture_WACV_2025_paper.html)

## 30.Few/Zero-Shot Learning/DG/DA (Domain Generalization/Domain Adaptation)
* Domain generalization
* [ERM++: An Improved Baseline for Domain Generalization](https://openaccess.thecvf.com/content/WACV2025/html/Teterwak_ERM_An_Improved_Baseline_for_Domain_Generalization_WACV_2025_paper.html)
* [Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization](https://openaccess.thecvf.com/content/WACV2025/html/Efthymiadis_Crafting_Distribution_Shifts_for_Validation_and_Training_in_Single_Source_WACV_2025_paper.html)
* [FRAUD-Net: Fraud News Detection using Sample Uncertainty & Domain Aware Generalized Network](https://openaccess.thecvf.com/content/WACV2025/html/Patel_FRAUD-Net_Fraud_News_Detection_using_Sample_Uncertainty__Domain_Aware_WACV_2025_paper.html)
* [Domain-Generalized Object Anti-Spoofing: Bridging Gaps and Patch Selection for Robust Detection Across Domains](https://openaccess.thecvf.com/content/WACV2025/html/Lee_Domain-Generalized_Object_Anti-Spoofing_Bridging_Gaps_and_Patch_Selection_for_Robust_WACV_2025_paper.html)
* [Fair Domain Generalization with Heterogeneous Sensitive Attributes Across Domains](https://openaccess.thecvf.com/content/WACV2025/html/Palakkadavath_Fair_Domain_Generalization_with_Heterogeneous_Sensitive_Attributes_Across_Domains_WACV_2025_paper.html)
* [Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization](https://openaccess.thecvf.com/content/WACV2025/html/Galappaththige_Domain-Guided_Weight_Modulation_for_Semi-Supervised_Domain_Generalization_WACV_2025_paper.html)
* [FDS: Feedback-Guided Domain Synthesis with Multi-Source Conditional Diffusion Models for Domain Generalization](https://openaccess.thecvf.com/content/WACV2025/html/Noori_FDS_Feedback-Guided_Domain_Synthesis_with_Multi-Source_Conditional_Diffusion_Models_for_WACV_2025_paper.html)
* [Domain Generalization using Large Pretrained Models with Mixture-of-Adapters](https://openaccess.thecvf.com/content/WACV2025/html/Lee_Domain_Generalization_using_Large_Pretrained_Models_with_Mixture-of-Adapters_WACV_2025_paper.html)
* [ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization](https://openaccess.thecvf.com/content/WACV2025/html/Matsun_ConDiSR_Contrastive_Disentanglement_and_Style_Regularization_for_Single_Domain_Generalizatio_WACV_2025_paper.html)
* Domain adaptation
* [Label Calibration in Source Free Domain Adaptation](http://arxiv.org/abs/2501.07072v1)
* [AH-OCDA: Amplitude-based Curriculum Learning and Hopfield Segmentation Model for Open Compound Domain Adaptation](http://arxiv.org/abs/2412.02280v1)
* [Feature Fusion Transferability Aware Transformer for Unsupervised Domain Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Yu_Feature_Fusion_Transferability_Aware_Transformer_for_Unsupervised_Domain_Adaptation_WACV_2025_paper.html)
* [Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Zheng_Instance-Warp_Saliency_Guided_Image_Warping_for_Unsupervised_Domain_Adaptation_WACV_2025_paper.html)
* [Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model](https://openaccess.thecvf.com/content/WACV2025/html/Schlachter_Memory-Efficient_Pseudo-Labeling_for_Online_Source-Free_Universal_Domain_Adaptation_using_a_WACV_2025_paper.html)
* [Combining Inherent Knowledge of Vision-Language Models with Unsupervised Domain Adaptation through Strong-Weak Guidance](https://openaccess.thecvf.com/content/WACV2025/html/Westfechtel_Combining_Inherent_Knowledge_of_Vision-Language_Models_with_Unsupervised_Domain_Adaptation_WACV_2025_paper.html)
* [Transferable-Guided Attention is All You Need for Video Domain Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Sacilotti_Transferable-Guided_Attention_is_All_You_Need_for_Video_Domain_Adaptation_WACV_2025_paper.html)
* [When Cars Meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather](https://openaccess.thecvf.com/content/WACV2025/html/Rizzoli_When_Cars_Meet_Drones_Hyperbolic_Federated_Learning_for_Source-Free_Domain_WACV_2025_paper.html)
* [Ad^2mix: Adversarial and Adaptive Mixup for Unsupervised Domain Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Zhu_Ad2mix_Adversarial_and_Adaptive_Mixup_for_Unsupervised_Domain_Adaptation_WACV_2025_paper.html)
* Zero-shot
* [Unified Framework for Open-World Compositional Zero-Shot Learning](https://openaccess.thecvf.com/content/WACV2025/html/Jayasekara_Unified_Framework_for_Open-World_Compositional_Zero-Shot_Learning_WACV_2025_paper.html)
* [Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Sui_Just_Shift_It_Test-Time_Prototype_Shifting_for_Zero-Shot_Generalization_with_WACV_2025_paper.html)
* [Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Imam_Test-Time_Low_Rank_Adaptation_via_Confidence_Maximization_for_Zero-Shot_Generalization_WACV_2025_paper.html)
* [SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting](http://arxiv.org/abs/2412.08536v1)
* [HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts](https://openaccess.thecvf.com/content/WACV2025/html/Dat_HOPE_A_Memory-Based_and_Composition-Aware_Framework_for_Zero-Shot_Learning_with_WACV_2025_paper.html)
* [Learning to Identify Seen Unseen and Unknown in the Open World: A Practical Setting for Zero-Shot Learning](https://openaccess.thecvf.com/content/WACV2025/html/Parameswaran_Learning_to_Identify_Seen_Unseen_and_Unknown_in_the_Open_WACV_2025_paper.html)
* [PC-GZSL: Prior Correction for Generalized Zero Shot Learning](https://openaccess.thecvf.com/content/WACV2025/html/Bhat_PC-GZSL_Prior_Correction_for_Generalized_Zero_Shot_Learning_WACV_2025_paper.html)

## 29.Deep Learning
* [Prior2Posterior: Model Prior Correction for Long-Tailed Learning](http://arxiv.org/abs/2412.16540v1)
* [SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data](https://openaccess.thecvf.com/content/WACV2025/html/Choudhary_SADDLe_Sharpness-Aware_Decentralized_Deep_Learning_with_Heterogeneous_Data_WACV_2025_paper.html)
* DNN
* [Guardian of the Ensembles: Introducing Pairwise Adversarially Robust Loss for Resisting Adversarial Attacks in DNN Ensembles](https://openaccess.thecvf.com/content/WACV2025/html/Shukla_Guardian_of_the_Ensembles_Introducing_Pairwise_Adversarially_Robust_Loss_for_WACV_2025_paper.html)
* [DARDA: Domain-Aware Real-Time Dynamic Neural Network Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Rifat_DARDA_Domain-Aware_Real-Time_Dynamic_Neural_Network_Adaptation_WACV_2025_paper.html) (dynamic neural networks)

## 28.GNN/GCN
* [WiGNet: Windowed Vision Graph Neural Network](https://openaccess.thecvf.com/content/WACV2025/html/Spadaro_WiGNet_Windowed_Vision_Graph_Neural_Network_WACV_2025_paper.html)
* [SIGNN - Star Identification using Graph Neural Networks](https://openaccess.thecvf.com/content/WACV2025/html/Hepburn-Dickins_SIGNN_-_Star_Identification_using_Graph_Neural_Networks_WACV_2025_paper.html)

## 27.Machine Learning
* Metric learning
* [Uncertainty-Guided Metric Learning without Labels](https://openaccess.thecvf.com/content/WACV2025/html/Devalraju_Uncertainty-Guided_Metric_Learning_without_Labels_WACV_2025_paper.html)
* Transfer learning
* [Learning Unified Distance Metric Across Diverse Data Distributions with Parameter-Efficient Transfer Learning](https://openaccess.thecvf.com/content/WACV2025/html/Kim_Learning_Unified_Distance_Metric_Across_Diverse_Data_Distributions_with_Parameter-Efficient_WACV_2025_paper.html)
* Machine unlearning
* [Revisiting Machine Unlearning with Dimensional Alignment](https://openaccess.thecvf.com/content/WACV2025/html/Seo_Revisiting_Machine_Unlearning_with_Dimensional_Alignment_WACV_2025_paper.html)
* Incremental learning
* [Self-Supervised Incremental Learning of Object Representations from Arbitrary Image Sets](https://openaccess.thecvf.com/content/WACV2025/html/Leotescu_Self-Supervised_Incremental_Learning_of_Object_Representations_from_Arbitrary_Image_Sets_WACV_2025_paper.html)
* Class-incremental learning
* [Covariance-based Space Regularization for Few-shot Class Incremental Learning](http://arxiv.org/abs/2411.01172v1)
* [A Reality Check on Pre-training for Exemplar-free Class-Incremental Learning](https://openaccess.thecvf.com/content/WACV2025/html/Feillet_A_Reality_Check_on_Pre-training_for_Exemplar-free_Class-Incremental_Learning_WACV_2025_paper.html)
* [Dynamic Adapter Tuning for Long-Tailed Class-Incremental Learning](https://openaccess.thecvf.com/content/WACV2025/html/Gu_Dynamic_Adapter_Tuning_for_Long-Tailed_Class-Incremental_Learning_WACV_2025_paper.html)
* [ReFu: Recursive Fusion for Exemplar-Free 3D Class-Incremental Learning](https://openaccess.thecvf.com/content/WACV2025/html/Yang_ReFu_Recursive_Fusion_for_Exemplar-Free_3D_Class-Incremental_Learning_WACV_2025_paper.html)
* [Strategic Base Representation Learning via Feature Augmentations for Few-Shot Class Incremental Learning](http://arxiv.org/abs/2501.09361v1)
* [Are Exemplar-Based Class Incremental Learning Models Victim of Black-Box Poison Attacks?](https://openaccess.thecvf.com/content/WACV2025/html/Perla_Are_Exemplar-Based_Class_Incremental_Learning_Models_Victim_of_Black-Box_Poison_WACV_2025_paper.html)
* [TACLE: Task and Class-Aware Exemplar-Free Semi-Supervised Class Incremental Learning](https://openaccess.thecvf.com/content/WACV2025/html/Kalla_TACLE_Task_and_Class-Aware_Exemplar-Free_Semi-Supervised_Class_Incremental_Learning_WACV_2025_paper.html)
* Active learning
* [CRAAC: Consistency Regularised Active Learning with Automatic Corrections for Real-Life Road Image Annotations](https://openaccess.thecvf.com/content/WACV2025/html/Lam_CRAAC_Consistency_Regularised_Active_Learning_with_Automatic_Corrections_for_Real-Life_WACV_2025_paper.html)
* Federated learning
* [Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Mendieta_Navigating_Heterogeneity_and_Privacy_in_One-Shot_Federated_Learning_with_Diffusion_WACV_2025_paper.html)
* [Predicting Event Memorability using Personalized Federated Learning](https://openaccess.thecvf.com/content/WACV2025/html/Banerjee_Predicting_Event_Memorability_using_Personalized_Federated_Learning_WACV_2025_paper.html)
* [Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation](https://openaccess.thecvf.com/content/WACV2025/html/Xu_Achieving_Byzantine-Resilient_Federated_Learning_via_Layer-Adaptive_Sparsified_Model_Aggregation_WACV_2025_paper.html)
* [Identify Backdoored Model in Federated Learning via Individual Unlearning](https://openaccess.thecvf.com/content/WACV2025/html/Xu_Identify_Backdoored_Model_in_Federated_Learning_via_Individual_Unlearning_WACV_2025_paper.html)
* [MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_MLLM-LLaVA-FL_Multimodal_Large_Language_Model_Assisted_Federated_Learning_WACV_2025_paper.html)
* Contrastive learning
* [MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning](https://arxiv.org/abs/2409.02714)
* [Tuned Contrastive Learning](https://openaccess.thecvf.com/content/WACV2025/html/Animesh_Tuned_Contrastive_Learning_WACV_2025_paper.html)
* [Contrastive Learning of Image Representations Guided by Spatial Relations](https://openaccess.thecvf.com/content/WACV2025/html/Servant_Contrastive_Learning_of_Image_Representations_Guided_by_Spatial_Relations_WACV_2025_paper.html)
* [CATALOG: A Camera Trap Language-Guided Contrastive Learning Model](https://openaccess.thecvf.com/content/WACV2025/html/Santamaria_CATALOG_A_Camera_Trap_Language-Guided_Contrastive_Learning_Model_WACV_2025_paper.html)
* [PLReMix: Combating Noisy Labels with Pseudo-Label Relaxed Contrastive Representation Learning](https://openaccess.thecvf.com/content/WACV2025/html/Liu_PLReMix_Combating_Noisy_Labels_with_Pseudo-Label_Relaxed_Contrastive_Representation_Learning_WACV_2025_paper.html)
* Continual learning
* [Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation](http://arxiv.org/abs/2411.05663v1)
:star:[code](https://github.com/Christina200/Online-LoRA-official.git)
* [Memory-efficient Continual Learning with Neural Collapse Contrastive](http://arxiv.org/abs/2412.02865v1)
* [Exploring the Stability Gap in Continual Learning: The Role of the Classification Head](http://arxiv.org/abs/2411.04723v1)
* [Semantic Prompting with Image Token for Continual Learning](https://openaccess.thecvf.com/content/WACV2025/html/Han_Semantic_Prompting_with_Image_Token_for_Continual_Learning_WACV_2025_paper.html)
* [Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations](https://openaccess.thecvf.com/content/WACV2025/html/Capitani_Towards_Unbiased_Continual_Learning_Avoiding_Forgetting_in_the_Presence_of_WACV_2025_paper.html)
* [EvoCL: Continual Learning over Evolving Domains](https://openaccess.thecvf.com/content/WACV2025/html/Kumaravelu_EvoCL_Continual_Learning_over_Evolving_Domains_WACV_2025_paper.html)
* [AdaPrefix++: Integrating Adapters Prefixes and Hypernetwork for Continual Learning](https://openaccess.thecvf.com/content/WACV2025/html/Adhikari_AdaPrefix_Integrating_Adapters_Prefixes_and_Hypernetwork_for_Continual_Learning_WACV_2025_paper.html)
* Multi-task learning
* [Diffusion-based Visual Anagram as Multi-task Learning](http://arxiv.org/abs/2412.02693v1)
:star:[code](https://github.com/Pixtella/Anagram-MTL)
* Adversarial
* [FAIR-TAT: Improving Model Fairness using Targeted Adversarial Training](https://openaccess.thecvf.com/content/WACV2025/html/Medi_FAIR-TAT_Improving_Model_Fairness_using_Targeted_Adversarial_Training_WACV_2025_paper.html)
* [PoolAtnRes: Towards Generalisable Differential Morphing Attack Detection](https://openaccess.thecvf.com/content/WACV2025/html/Ramachandra_PoolAtnRes_Towards_Generalisable_Differential_Morphing_Attack_Detection_WACV_2025_paper.html) (morphing attack detection)
* [Knockoff Branch: Model Stealing Attack via Adding Neurons in the Pre-Trained Model](https://openaccess.thecvf.com/content/WACV2025/html/Hung_Knockoff_Branch_Model_Stealing_Attack_via_Adding_Neurons_in_the_WACV_2025_paper.html)
* [Low-Frequency Black-Box Backdoor Attack via Evolutionary Algorithm](https://openaccess.thecvf.com/content/WACV2025/html/Qiao_Low-Frequency_Black-Box_Backdoor_Attack_via_Evolutionary_Algorithm_WACV_2025_paper.html)
* [Can Adversarial Examples Be Parsed to Reveal Victim Model Information?](https://openaccess.thecvf.com/content/WACV2025/html/Yao_Can_Adversarial_Examples_Be_Parsed_to_Reveal_Victim_Model_Information_WACV_2025_paper.html)
* [When Visual State Space Model Meets Backdoor Attacks](https://openaccess.thecvf.com/content/WACV2025/html/Nagaonkar_When_Visual_State_Space_Model_Meets_Backdoor_Attacks_WACV_2025_paper.html) (backdoor attack)
* [Pre-Trained Multiple Latent Variable Generative Models are Good Defenders Against Adversarial Attacks](https://openaccess.thecvf.com/content/WACV2025/html/Serez_Pre-Trained_Multiple_Latent_Variable_Generative_Models_are_Good_Defenders_Against_WACV_2025_paper.html)

## 26.Motion Generation (Human Motion Generation)
* [SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering](https://arxiv.org/abs/2412.08343)
:star:[code](https://github.com/Kakanat/SyncViolinist)
* [GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts](https://openaccess.thecvf.com/content/WACV2025/html/Milacski_GHOST_Grounded_Human_Motion_Generation_with_Open_Vocabulary_Scene-and-Text_Contexts_WACV_2025_paper.html)
* [Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Mandelli_Generation_of_Complex_3D_Human_Motion_by_Temporal_and_Spatial_WACV_2025_paper.html)
* [MoRAG - Multi-Fusion Retrieval Augmented Generation for Human Motion](https://openaccess.thecvf.com/content/WACV2025/html/Kalakonda_MoRAG_-_Multi-Fusion_Retrieval_Augmented_Generation_for_Human_Motion_WACV_2025_paper.html)
* [UniTMGE: Uniform Text-Motion Generation and Editing Model via Diffusion](https://openaccess.thecvf.com/content/WACV2025/html/Wang_UniTMGE_Uniform_Text-Motion_Generation_and_Editing_Model_via_Diffusion_WACV_2025_paper.html)
* Skeleton-based motion prediction
* [Geometry-Aware Deep Learning for 3D Skeleton-Based Motion Prediction](https://openaccess.thecvf.com/content/WACV2025/html/Zaier_Geometry-Aware_Deep_Learning_for_3D_Skeleton-Based_Motion_Prediction_WACV_2025_paper.html)

## 25.Style Transfer
* [Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer](https://openaccess.thecvf.com/content/WACV2025/html/Stump_Meta-Learning_for_Color-to-Infrared_Cross-Modal_Style_Transfer_WACV_2025_paper.html)
* [D-LUT: Photorealistic Style Transfer via Diffusion Process](https://openaccess.thecvf.com/content/WACV2025/html/Li_D-LUT_Photorealistic_Style_Transfer_via_Diffusion_Process_WACV_2025_paper.html)
* [Mamba-ST: State Space Model for Efficient Style Transfer](https://openaccess.thecvf.com/content/WACV2025/html/Botti_Mamba-ST_State_Space_Model_for_Efficient_Style_Transfer_WACV_2025_paper.html)

## 24.GAN/Image Synthesis (Image Generation)
* [Unsupervised Single-Image Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training](https://openaccess.thecvf.com/content/WACV2025/html/Sato_Unsupervised_Single-Image_Intrinsic_Image_Decomposition_with_LiDAR_Intensity_Enhanced_Training_WACV_2025_paper.html) (image decomposition)
* [ZeroComp: Zero-Shot Object Compositing from Image Intrinsics via Diffusion](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_ZeroComp_Zero-Shot_Object_Compositing_from_Image_Intrinsics_via_Diffusion_WACV_2025_paper.html)
* [MixDiff: Mixing Natural and Synthetic Images for Robust Self-Supervised Representations](https://openaccess.thecvf.com/content/WACV2025/html/Bafghi_MixDiff_Mixing_Natural_and_Synthetic_Images_for_Robust_Self-Supervised_Representations_WACV_2025_paper.html) (synthetic images)
* [3D Synthesis for Architectural Design](https://openaccess.thecvf.com/content/WACV2025/html/Tsai_3D_Synthesis_for_Architectural_Design_WACV_2025_paper.html)
* [ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_ARTIST_Improving_the_Generation_of_Text-Rich_Images_with_Disentangled_Diffusion_WACV_2025_paper.html)
* [360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama Translation](https://openaccess.thecvf.com/content/WACV2025/html/Wang_360PanT_Training-Free_Text-Driven_360-Degree_Panorama-to-Panorama_Translation_WACV_2025_paper.html) (text-driven 360-degree panorama-to-panorama translation)
* [Clarity Amidst Blur: A Deterministic Method for Synthetic Generation of Water Droplets on Camera Lenses](https://openaccess.thecvf.com/content/WACV2025/html/Eberhardt_Clarity_Amidst_Blur_A_Deterministic_Method_for_Synthetic_Generation_of_WACV_2025_paper.html) (water droplet synthesis)
* [SpotDiffusion: A Fast Approach for Seamless Panorama Generation Over Time](https://openaccess.thecvf.com/content/WACV2025/html/Frolov_SpotDiffusion_A_Fast_Approach_for_Seamless_Panorama_Generation_Over_Time_WACV_2025_paper.html) (panorama generation)
* [Attribute Diffusion: Diffusion Driven Diverse Attribute Editing](https://openaccess.thecvf.com/content/WACV2025/html/Parihar_Attribute_Diffusion_Diffusion_Driven_Diverse_Attribute_Editing_WACV_2025_paper.html) (diverse attribute editing)
* [DiffQRCoder: Diffusion-Based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement](https://openaccess.thecvf.com/content/WACV2025/html/Liao_DiffQRCoder_Diffusion-Based_Aesthetic_QR_Code_Generation_with_Scanning_Robustness_Guided_WACV_2025_paper.html) (aesthetic QR code generation)
* GAN
* [WINE : Wavelet-Guided GAN Inversion and Editing for High-Fidelity Refinement](https://openaccess.thecvf.com/content/WACV2025/html/Kim_WINE__Wavelet-Guided_GAN_Inversion_and_Editing_for_High-Fidelity_Refinement_WACV_2025_paper.html)
* Image synthesis
* [GeoPos: A Minimal Positional Encoding for Enhanced Fine-Grained Details in Image Synthesis using Convolutional Neural Networks](https://openaccess.thecvf.com/content/WACV2025/html/Hosseini_GeoPos_A_Minimal_Positional_Encoding_for_Enhanced_Fine-Grained_Details_in_WACV_2025_paper.html)
* [Rethinking Cluster-Conditioned Diffusion Models for Label-Free Image Synthesis](https://openaccess.thecvf.com/content/WACV2025/html/Adaloglou_Rethinking_Cluster-Conditioned_Diffusion_Models_for_Label-Free_Image_Synthesis_WACV_2025_paper.html)
* Texture generation
* [Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds](https://arxiv.org/abs/2412.07766)
* [Make-A-Texture: Fast Shape-Aware 3D Texture Generation in 3 Seconds](https://openaccess.thecvf.com/content/WACV2025/html/Gorelik_Make-A-Texture_Fast_Shape-Aware_3D_Texture_Generation_in_3_Seconds_WACV_2025_paper.html)
* Image generation
* [RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation](http://arxiv.org/abs/2411.13150v1)
:star:[code](https://github.com/SonyResearch/RAW-Diffusion)
* [MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning](https://arxiv.org/abs/2408.11001)
:star:[code](https://github.com/haoningwu3639/MegaFusion)
:house:[project](https://haoningwu3639.github.io/MegaFusion/)
* [Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis](https://openaccess.thecvf.com/content/WACV2025/html/Lee_Beta_Sampling_is_All_You_Need_Efficient_Image_Generation_Strategy_WACV_2025_paper.html)
* [Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation](https://openaccess.thecvf.com/content/WACV2025/html/Lakhanpal_Refining_Text-to-Image_Generation_Towards_Accurate_Training-Free_Glyph-Enhanced_Image_Generation_WACV_2025_paper.html)
* [Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects](https://openaccess.thecvf.com/content/WACV2025/html/Jo_Skip-and-Play_Depth-Driven_Pose-Preserved_Image_Generation_for_Any_Objects_WACV_2025_paper.html)
* [FineControlNet: Fine-Level Text Control for Image Generation with Spatially Aligned Text Control Injection](https://openaccess.thecvf.com/content/WACV2025/html/Choi_FineControlNet_Fine-Level_Text_Control_for_Image_Generation_with_Spatially_Aligned_WACV_2025_paper.html)
* Recipe generation
* [Retrieval Augmented Recipe Generation](http://arxiv.org/abs/2411.08715v1)
* Image editing
* [Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing](http://arxiv.org/abs/2411.19652v1)
:star:[code](https://github.com/Mowenyii/Uniform-Attention-Maps)
* [Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance](http://arxiv.org/abs/2412.15798v1)
* [Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing](https://openaccess.thecvf.com/content/WACV2025/html/Huang_Dual-Schedule_Inversion_Training-_and_Tuning-Free_Inversion_for_Real_Image_Editing_WACV_2025_paper.html)
* [LIME: Localized Image Editing via Attention Regularization in Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Simsar_LIME_Localized_Image_Editing_via_Attention_Regularization_in_Diffusion_Models_WACV_2025_paper.html)
* [ReEdit: Multimodal Exemplar-Based Image Editing](https://openaccess.thecvf.com/content/WACV2025/html/Srivastava_ReEdit_Multimodal_Exemplar-Based_Image_Editing_WACV_2025_paper.html)
* [GeoDiffuser: Geometry-Based Image Editing with Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Sajnani_GeoDiffuser_Geometry-Based_Image_Editing_with_Diffusion_Models_WACV_2025_paper.html)
* [DragText: Rethinking Text Embedding in Point-Based Image Editing](https://openaccess.thecvf.com/content/WACV2025/html/Choi_DragText_Rethinking_Text_Embedding_in_Point-Based_Image_Editing_WACV_2025_paper.html)
* [Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits](https://openaccess.thecvf.com/content/WACV2025/html/Kang_Incorporating_Task_Progress_Knowledge_for_Subgoal_Generation_in_Robotic_Manipulation_WACV_2025_paper.html)
* Text-to-image
* [DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models](http://arxiv.org/abs/2411.19390v1)
* [Disentangling Subject-Irrelevant Elements in Personalized Text-to-Image Diffusion via Filtered Self-Distillation](https://openaccess.thecvf.com/content/WACV2025/html/Choi_Disentangling_Subject-Irrelevant_Elements_in_Personalized_Text-to-Image_Diffusion_via_Filtered_Self-Distillation_WACV_2025_paper.html)
* [Counting Guidance for High Fidelity Text-to-Image Synthesis](https://openaccess.thecvf.com/content/WACV2025/html/Kang_Counting_Guidance_for_High_Fidelity_Text-to-Image_Synthesis_WACV_2025_paper.html)
* [Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Jena_Elucidating_Optimal_Reward-Diversity_Tradeoffs_in_Text-to-Image_Diffusion_Models_WACV_2025_paper.html)
* [Detecting Origin Attribution for Text-to-Image Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Xu_Detecting_Origin_Attribution_for_Text-to-Image_Diffusion_Models_WACV_2025_paper.html)
* [AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models](https://openaccess.thecvf.com/content/WACV2025/html/Agarwal_AlignIT_Enhancing_Prompt_Alignment_in_Customization_of_Text-to-Image_Models_WACV_2025_paper.html)
* [Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Xu_Good_Seed_Makes_a_Good_Crop_Discovering_Secret_Seeds_in_WACV_2025_paper.html)
* [Improving Faithfulness of Text-to-Image Diffusion Models through Inference Intervention](https://openaccess.thecvf.com/content/WACV2025/html/Guo_Improving_Faithfulness_of_Text-to-Image_Diffusion_Models_through_Inference_Intervention_WACV_2025_paper.html)
* [Structured Human Assessment of Text-to-Image Generative Models](https://openaccess.thecvf.com/content/WACV2025/html/Corneanu_Structured_Human_Assessment_of_Text-to-Image_Generative_Models_WACV_2025_paper.html)
* [Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Buchheim_Controlling_Human_Shape_and_Pose_in_Text-to-Image_Diffusion_Models_via_WACV_2025_paper.html)
* [An Image is Worth Multiple Words: Multi-Attribute Inversion for Constrained Text-to-Image Synthesis](https://openaccess.thecvf.com/content/WACV2025/html/Agarwal_An_Image_is_Worth_Multiple_Words_Multi-Attribute_Inversion_for_Constrained_WACV_2025_paper.html)
* Layout-to-image generation
* [STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation](https://openaccess.thecvf.com/content/WACV2025/html/Wang_STAY_Diffusion_Styled_Layout_Diffusion_Model_for_Diverse_Layout-to-Image_Generation_WACV_2025_paper.html)
* 3D generation
* [My3DGen: A Scalable Personalized 3D Generative Model](https://openaccess.thecvf.com/content/WACV2025/html/Qi_My3DGen_A_Scalable_Personalized_3D_Generative_Model_WACV_2025_paper.html)
* Text-to-3D
* [GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space](http://arxiv.org/abs/2412.16717v1)
:house:[project](https://ganfusion.github.io/)
* [Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation](https://openaccess.thecvf.com/content/WACV2025/html/Nath_Deep_Geometric_Moments_Promote_Shape_Consistency_in_Text-to-3D_Generation_WACV_2025_paper.html)
* [HexaGen3D: StableDiffusion is One Step Away from Fast and Diverse Text-to-3D Generation](https://openaccess.thecvf.com/content/WACV2025/html/Mercier_HexaGen3D_StableDiffusion_is_One_Step_Away_from_Fast_and_Diverse_WACV_2025_paper.html)
* Image-to-image translation
* [Uncertainty-Aware Regularization for Image-to-Image Translation](http://arxiv.org/abs/2412.01705v1)
* Video editing
* [Ada-VE: Training-Free Consistent Video Editing using Adaptive Motion Prior](https://openaccess.thecvf.com/content/WACV2025/html/Mahmud_Ada-VE_Training-Free_Consistent_Video_Editing_using_Adaptive_Motion_Prior_WACV_2025_paper.html)
* [MagicStick: Controllable Video Editing via Control Handle Transformations](https://openaccess.thecvf.com/content/WACV2025/html/Ma_MagicStick_Controllable_Video_Editing_via_Control_Handle_Transformations_WACV_2025_paper.html)
* Text-to-video editing
* [FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_FastVideoEdit_Leveraging_Consistency_Models_for_Efficient_Text-to-Video_Editing_WACV_2025_paper.html)
* Video generation
* [Fine-Grained Controllable Video Generation via Object Appearance and Context](https://openaccess.thecvf.com/content/WACV2025/html/Huang_Fine-Grained_Controllable_Video_Generation_via_Object_Appearance_and_Context_WACV_2025_paper.html)
* [Generating Long-Take Videos via Effective Keyframes and Guidance](https://openaccess.thecvf.com/content/WACV2025/html/Huang_Generating_Long-Take_Videos_via_Effective_Keyframes_and_Guidance_WACV_2025_paper.html) long-take video generation
* [Dance Any Beat: Blending Beats with Visuals in Dance Video Generation](https://openaccess.thecvf.com/content/WACV2025/html/Wang_Dance_Any_Beat_Blending_Beats_with_Visuals_in_Dance_Video_WACV_2025_paper.html)
* [Corgi: Cached Memory Guided Video Generation](https://openaccess.thecvf.com/content/WACV2025/html/Wu_Corgi_Cached_Memory_Guided_Video_Generation_WACV_2025_paper.html)
* [TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Li_TrackDiffusion_Tracklet-Conditioned_Video_Generation_via_Diffusion_Models_WACV_2025_paper.html)
* Video synthesis
* [Contrastive Sequential-Diffusion Learning: Non-Linear and Multi-Scene Instructional Video Synthesis](https://openaccess.thecvf.com/content/WACV2025/html/Ramos_Contrastive_Sequential-Diffusion_Learning_Non-Linear_and_Multi-Scene_Instructional_Video_Synthesis_WACV_2025_paper.html)
* Tire footprint generation
* [CTIP: Towards Accurate Tabular-to-Image Generation for Tire Footprint Generation](https://openaccess.thecvf.com/content/WACV2025/html/Roh_CTIP_Towards_Accurate_Tabular-to-Image_Generation_for_Tire_Footprint_Generation_WACV_2025_paper.html)
* Diffusion models
* [Enhancing Image Layout Control with Loss-Guided Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Patel_Enhancing_Image_Layout_Control_with_Loss-Guided_Diffusion_Models_WACV_2025_paper.html)
* [GeoGuide: Geometric Guidance of Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Poleski_GeoGuide_Geometric_Guidance_of_Diffusion_Models_WACV_2025_paper.html)
* [Inverse Problems with Diffusion Models: A MAP Estimation Perspective](https://openaccess.thecvf.com/content/WACV2025/html/Gutha_Inverse_Problems_with_Diffusion_Models_A_MAP_Estimation_Perspective_WACV_2025_paper.html)
* [Diffusion Model Guided Sampling with Pixel-Wise Aleatoric Uncertainty Estimation](http://arxiv.org/abs/2412.00205v1)
* [SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models](http://arxiv.org/abs/2412.02332v1)
:star:[code](https://github.com/SanoScience/SimuScope)
* [Negative-Prompt Inversion: Fast Image Inversion for Editing with Text-Guided Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Miyake_Negative-Prompt_Inversion_Fast_Image_Inversion_for_Editing_with_Text-Guided_Diffusion_WACV_2025_paper.html)
* [SODA: Spectral Orthogonal Decomposition Adaptation for Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_SODA_Spectral_Orthogonal_Decomposition_Adaptation_for_Diffusion_Models_WACV_2025_paper.html)
* [Improving Conditional Diffusion Models through Re-Noising from Unconditional Diffusion Priors](https://openaccess.thecvf.com/content/WACV2025/html/Mei_Improving_Conditional_Diffusion_Models_through_Re-Noising_from_Unconditional_Diffusion_Priors_WACV_2025_paper.html)
* [MemControl: Mitigating Memorization in Diffusion Models via Automated Parameter Selection](https://openaccess.thecvf.com/content/WACV2025/html/Dutt_MemControl_Mitigating_Memorization_in_Diffusion_Models_via_Automated_Parameter_Selection_WACV_2025_paper.html)
* [Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Jun_Disentangling_Disentangled_Representations_Towards_Improved_Latent_Units_via_Diffusion_Models_WACV_2025_paper.html)
* [Elucidating the Solution Space of Extended Reverse-Time SDE for Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Cui_Elucidating_the_Solution_Space_of_Extended_Reverse-Time_SDE_for_Diffusion_WACV_2025_paper.html)
* [DiffuseKronA: A Parameter Efficient Fine-Tuning Method for Personalized Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Marjit_DiffuseKronA_A_Parameter_Efficient_Fine-Tuning_Method_for_Personalized_Diffusion_Models_WACV_2025_paper.html)
* [CusConcept: Customized Visual Concept Decomposition with Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Xu_CusConcept_Customized_Visual_Concept_Decomposition_with_Diffusion_Models_WACV_2025_paper.html)
* [CharDiff: Improving Sampling Convergence via Characteristic Function Consistency in Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Sinha_CharDiff_Improving_Sampling_Convergence_via_Characteristic_Function_Consistency_in_Diffusion_WACV_2025_paper.html)
* [Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think](https://openaccess.thecvf.com/content/WACV2025/html/Garcia_Fine-Tuning_Image-Conditional_Diffusion_Models_is_Easier_than_You_Think_WACV_2025_paper.html)

## 23.Visual Question Answering
* Video question answering
* [DAM: Dynamic Adapter Merging for Continual Video QA Learning](https://openaccess.thecvf.com/content/WACV2025/html/Cheng_DAM_Dynamic_Adapter_Merging_for_Continual_Video_QA_Learning_WACV_2025_paper.html)
* [Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering](https://arxiv.org/abs/2412.09230)
* [Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries](http://arxiv.org/abs/2412.19304v1)
* Visual question answering
* [CL-Cross VQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_CL-Cross_VQA_A_Continual_Learning_Benchmark_for_Cross-Domain_Visual_Question_WACV_2025_paper.html)
* [AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements](https://openaccess.thecvf.com/content/WACV2025/html/Choudhary_AdQuestA_Knowledge-Guided_Visual_Question_Answer_Framework_for_Advertisements_WACV_2025_paper.html)
* [One VLM to Keep it Learning: Generation and Balancing for Data-Free Continual Visual Question Answering](https://openaccess.thecvf.com/content/WACV2025/html/Das_One_VLM_to_Keep_it_Learning_Generation_and_Balancing_for_WACV_2025_paper.html)
* [Unsupervised Domain Adaptive Visual Question Answering in the Era of Multi-Modal Large Language Models](https://openaccess.thecvf.com/content/WACV2025/html/Weng_Unsupervised_Domain_Adaptive_Visual_Question_Answering_in_the_Era_of_WACV_2025_paper.html)
* [Visual Robustness Benchmark for Visual Question Answering (VQA)](https://openaccess.thecvf.com/content/WACV2025/html/Ishmam_Visual_Robustness_Benchmark_for_Visual_Question_Answering_VQA_WACV_2025_paper.html)
* Chart question answering
* [Advancing Chart Question Answering with Robust Chart Component Recognition](https://openaccess.thecvf.com/content/WACV2025/html/Zheng_Advancing_Chart_Question_Answering_with_Robust_Chart_Component_Recognition_WACV_2025_paper.html)
* Table question answering
* [TRH2TQA: Table Recognition with Hierarchical Relationships to Table Question-Answering on Business Table Images](https://openaccess.thecvf.com/content/WACV2025/html/Jirachanchaisiri_TRH2TQA_Table_Recognition_with_Hierarchical_Relationships_to_Table_Question-Answering_on_WACV_2025_paper.html)

## 22.OCR
* Handwritten document recognition
* [DocTTT: Test-Time Training for Handwritten Document Recognition using Meta-Auxiliary Learning](https://openaccess.thecvf.com/content/WACV2025/html/Gu_DocTTT_Test-Time_Training_for_Handwritten_Document_Recognition_using_Meta-Auxiliary_Learning_WACV_2025_paper.html)
* Scene text recognition
* [Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition](https://openaccess.thecvf.com/content/WACV2025/html/Le_Stratified_Domain_Adaptation_A_Progressive_Self-Training_Approach_for_Scene_Text_WACV_2025_paper.html)
* Scene text editing
* [FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework](https://openaccess.thecvf.com/content/WACV2025/html/Das_FASTER_A_Font-Agnostic_Scene_Text_Editing_and_Rendering_Framework_WACV_2025_paper.html)
* Text change detection
* [Text Change Detection in Multilingual Documents Using Image Comparison](http://arxiv.org/abs/2412.04137v1)
* Text polygon detection
* [TPD-STR: Text Polygon Detection with Split Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Kim_TPD-STR_Text_Polygon_Detection_with_Split_Transformers_WACV_2025_paper.html)
* Table structure recognition
* [Treading Towards Privacy-Preserving Table Structure Recognition](https://openaccess.thecvf.com/content/WACV2025/html/Raja_Treading_Towards_Privacy-Preserving_Table_Structure_Recognition_WACV_2025_paper.html)

## 21.3D (3D Reconstruction/3D Vision)
* [NPL-MVPS: Neural Point-Light Multi-View Photometric Stereo](https://openaccess.thecvf.com/content/WACV2025/html/Logothetis_NPL-MVPS_Neural_Point-Light_Multi-View_Photometric_Stereo_WACV_2025_paper.html) multi-view photometric stereo
* [HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors](https://openaccess.thecvf.com/content/WACV2025/html/Ganj_HybridDepth_Robust_Metric_Depth_Fusion_by_Leveraging_Depth_from_Focus_WACV_2025_paper.html)
* [LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo](https://arxiv.org/abs/2409.02716)
* [Instructive3D: Editing Large Reconstruction Models with Text Instructions](http://arxiv.org/abs/2501.04374v1)
* [Scene-LLM: Extending Language Model for 3D Visual Reasoning](https://openaccess.thecvf.com/content/WACV2025/html/Fu_Scene-LLM_Extending_Language_Model_for_3D_Visual_Reasoning_WACV_2025_paper.html)
* [Towards a Training Free Approach for 3D Scene Editing](https://openaccess.thecvf.com/content/WACV2025/html/Madhavaram_Towards_a_Training_Free_Approach_for_3D_Scene_Editing_WACV_2025_paper.html)
* [CRAFT: Designing Creative and Functional 3D Objects](https://openaccess.thecvf.com/content/WACV2025/html/Guo_CRAFT_Designing_Creative_and_Functional_3D_Objects_WACV_2025_paper.html)
* [NeRFs are Mirror Detectors: using Structural Similarity for Multi-View Mirror Scene Reconstruction with 3D Surface Primitives](https://openaccess.thecvf.com/content/WACV2025/html/Van_Holland_NeRFs_are_Mirror_Detectors_using_Structural_Similarity_for_Multi-View_Mirror_WACV_2025_paper.html)
* [VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field](https://openaccess.thecvf.com/content/WACV2025/html/Thomas_VortSDF_3D_Modeling_with_Centroidal_Voronoi_Tesselation_on_Signed_Distance_WACV_2025_paper.html)
* [EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration](https://openaccess.thecvf.com/content/WACV2025/html/Bin_Aziz_EfficientMorph_Parameter-Efficient_Transformer-Based_Architecture_for_3D_Image_Registration_WACV_2025_paper.html)
* 3DGS
* [Planar Gaussian Splatting](http://arxiv.org/abs/2412.01931v1)
* [UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction](https://openaccess.thecvf.com/content/WACV2025/html/Wang_UW-GS_Distractor-Aware_3D_Gaussian_Splatting_for_Enhanced_Underwater_Scene_Reconstruction_WACV_2025_paper.html)
* [EdgeGaussians - 3D Edge Mapping via Gaussian Splatting](https://openaccess.thecvf.com/content/WACV2025/html/Chelani_EdgeGaussians_-_3D_Edge_Mapping_via_Gaussian_Splatting_WACV_2025_paper.html)
* [Localized Gaussian Splatting Editing with Contextual Awareness](https://openaccess.thecvf.com/content/WACV2025/html/Xiao_Localized_Gaussian_Splatting_Editing_with_Contextual_Awareness_WACV_2025_paper.html)
* [DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing](https://openaccess.thecvf.com/content/WACV2025/html/Turkulainen_DN-Splatter_Depth_and_Normal_Priors_for_Gaussian_Splatting_and_Meshing_WACV_2025_paper.html)
* [ELMGS: Enhancing Memory and Computation Scalability through Compression for 3D Gaussian Splatting](https://openaccess.thecvf.com/content/WACV2025/html/Ali_ELMGS_Enhancing_Memory_and_Computation_Scalability_through_Compression_for_3D_WACV_2025_paper.html)
* [OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting](https://openaccess.thecvf.com/content/WACV2025/html/Li_OmniGS_Fast_Radiance_Field_Reconstruction_using_Omnidirectional_Gaussian_Splatting_WACV_2025_paper.html)
* 3D reconstruction
* [Assessing the Quality of 3D Reconstruction in the Absence of Ground Truth: Application to a Multimodal Archaeological Dataset](https://openaccess.thecvf.com/content/WACV2025/html/Coupry_Assessing_the_Quality_of_3D_Reconstruction_in_the_Absence_of_WACV_2025_paper.html)
* [Multi-HexPlanes: A Lightweight Map Representation for Rendering and 3D Reconstruction](https://openaccess.thecvf.com/content/WACV2025/html/Zheng_Multi-HexPlanes_A_Lightweight_Map_Representation_for_Rendering_and_3D_Reconstruction_WACV_2025_paper.html)
* [Semantic Segmentation Method for Automated Indoor 3D Reconstruction Based on Architectural-Knowledge-Aware Features](https://openaccess.thecvf.com/content/WACV2025/html/Chen_Semantic_Segmentation_Method_for_Automated_Indoor_3D_Reconstruction_Based_on_WACV_2025_paper.html)
* [DreaMo: Articulated 3D Reconstruction from a Single Casual Video](https://openaccess.thecvf.com/content/WACV2025/html/Tu_DreaMo_Articulated_3D_Reconstruction_from_a_Single_Casual_Video_WACV_2025_paper.html)
* [Sparse-View 3D Reconstruction of Clothed Humans via Normal Maps](https://openaccess.thecvf.com/content/WACV2025/html/Wu_Sparse-View_3D_Reconstruction_of_Clothed_Humans_via_Normal_Maps_WACV_2025_paper.html)
* [Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation](https://openaccess.thecvf.com/content/WACV2025/html/Burde_Comparative_Evaluation_of_3D_Reconstruction_Methods_for_Object_Pose_Estimation_WACV_2025_paper.html)
* [ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction](https://openaccess.thecvf.com/content/WACV2025/html/Weder_ALSTER_A_Local_Spatio-Temporal_Expert_for_Online_3D_Semantic_Reconstruction_WACV_2025_paper.html)
* Surface reconstruction
* [Spatially-Adaptive Hash Encodings for Neural Surface Reconstruction](https://openaccess.thecvf.com/content/WACV2025/html/Walker_Spatially-Adaptive_Hash_Encodings_for_Neural_Surface_Reconstruction_WACV_2025_paper.html)
* [Transientangelo: Few-Viewpoint Surface Reconstruction using Single-Photon Lidar](https://openaccess.thecvf.com/content/WACV2025/html/Luo_Transientangelo_Few-Viewpoint_Surface_Reconstruction_using_Single-Photon_Lidar_WACV_2025_paper.html)
* [PVT: An Implicit Surface Reconstruction Framework via Point Voxel Geometric-Aware Transformer](https://openaccess.thecvf.com/content/WACV2025/html/Fan_PVT_An_Implicit_Surface_Reconstruction_Framework_via_Point_Voxel_Geometric-Aware_WACV_2025_paper.html)
* Depth estimation
* [GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling](https://arxiv.org/abs/2409.02720)
* [MDCN-PS: Monocular-Depth-Guided Coarse Normal Attention for Robust Photometric Stereo](https://openaccess.thecvf.com/content/WACV2025/html/Yamaguchi_MDCN-PS_Monocular-Depth-Guided_Coarse_Normal_Attention_for_Robust_Photometric_Stereo_WACV_2025_paper.html)
* [Revisiting Disparity from Dual-Pixel Images: Physics-Informed Lightweight Depth Estimation](http://arxiv.org/abs/2411.04714v1)
* [MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications](http://arxiv.org/abs/2411.19717v1)
:house:[project](https://mono-pp.github.io/)
* [OmniDiffusion: Reformulating 360 Monocular Depth Estimation using Semantic and Surface Normal Conditioned Diffusion](https://openaccess.thecvf.com/content/WACV2025/html/Mohadikar_OmniDiffusion_Reformulating_360_Monocular_Depth_Estimation_using_Semantic_and_Surface_WACV_2025_paper.html)
* [Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks](https://openaccess.thecvf.com/content/WACV2025/html/Quercia_Enhancing_Monocular_Depth_Estimation_with_Multi-Source_Auxiliary_Tasks_WACV_2025_paper.html)
* [CabNIR: A Benchmark for In-Vehicle Infrared Monocular Depth Estimation](https://openaccess.thecvf.com/content/WACV2025/html/Cavalcanti_CabNIR_A_Benchmark_for_In-Vehicle_Infrared_Monocular_Depth_Estimation_WACV_2025_paper.html)
* Room layout estimation
* [uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images](https://openaccess.thecvf.com/content/WACV2025/html/Lee_uLayout_Unified_Room_Layout_Estimation_for_Perspective_and_Panoramic_Images_WACV_2025_paper.html)
* 3D scene understanding
* [Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding](https://openaccess.thecvf.com/content/WACV2025/html/Kong_Calib3D_Calibrating_Model_Preferences_for_Reliable_3D_Scene_Understanding_WACV_2025_paper.html)
* 3D semantic scene completion
* [DepthSSC: Monocular 3D Semantic Scene Completion via Depth-Spatial Alignment and Voxel Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Yao_DepthSSC_Monocular_3D_Semantic_Scene_Completion_via_Depth-Spatial_Alignment_and_WACV_2025_paper.html)
* 3D shape completion
* [A Recipe for Geometry-Aware 3D Mesh Transformers](https://openaccess.thecvf.com/content/WACV2025/html/Farazi_A_Recipe_for_Geometry-Aware_3D_Mesh_Transformers_WACV_2025_paper.html)
* [3D Shape Completion using Multi-Resolution Spectral Encoding](https://openaccess.thecvf.com/content/WACV2025/html/Deka_3D_Shape_Completion_using_Multi-Resolution_Spectral_Encoding_WACV_2025_paper.html)
* [ShapeMorph: 3D Shape Completion via Blockwise Discrete Diffusion](https://openaccess.thecvf.com/content/WACV2025/html/Li_ShapeMorph_3D_Shape_Completion_via_Blockwise_Discrete_Diffusion_WACV_2025_paper.html)

## 20.Point Cloud
* [BioNet and NeFF: Crop Biomass Prediction from Point Clouds to Drone Imagery](https://openaccess.thecvf.com/content/WACV2025/html/Li_BioNet_and_NeFF_Crop_Biomass_Prediction_from_Point_Clouds_to_WACV_2025_paper.html)
* [Point Cloud Color Upsampling with Attention-Based Coarse Colorization and Refinement](https://openaccess.thecvf.com/content/WACV2025/html/Matsuzaki_Point_Cloud_Color_Upsampling_with_Attention-Based_Coarse_Colorization_and_Refinement_WACV_2025_paper.html)
* [On-the-Fly Object-aware Representative Point Selection in Point Cloud](https://openaccess.thecvf.com/content/WACV2025/html/Zhang_On-the-Fly_Object-aware_Representative_Point_Selection_in_Point_Cloud_WACV_2025_paper.html)
* [PocoLoco: A Point Cloud Diffusion Model of Human Shape in Loose Clothing](http://arxiv.org/abs/2411.04249v1)
:star:[code](https://github.com/sidsunny/pocoloco)
* [Test-Time Adaptation in Point Clouds: Leveraging Sampling Variation with Weight Averaging](https://openaccess.thecvf.com/content/WACV2025/html/Bahri_Test-Time_Adaptation_in_Point_Clouds_Leveraging_Sampling_Variation_with_Weight_WACV_2025_paper.html)
* [Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud](https://openaccess.thecvf.com/content/WACV2025/html/Saito_Point-JEPA_A_Joint_Embedding_Predictive_Architecture_for_Self-Supervised_Learning_on_WACV_2025_paper.html)
* 3D point clouds
* [Test-Time Adaptation of 3D Point Clouds via Denoising Diffusion Models](http://arxiv.org/abs/2411.14495v1)
:star:[code](https://github.com/hamidreza-dastmalchi/3DD-TTA)
* [Learning under Noisy Labels Spurious Points and Diverse Structures: TS40K a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission Systems](https://openaccess.thecvf.com/content/WACV2025/html/Lavado_Learning_under_Noisy_Labels_Spurious_Points_and_Diverse_Structures_TS40K_WACV_2025_paper.html)
* [Learning Semantic Part-Based Graph Structure for 3D Point Cloud Domain Generalization](https://openaccess.thecvf.com/content/WACV2025/html/Sai_Learning_Semantic_Part-Based_Graph_Structure_for_3D_Point_Cloud_Domain_WACV_2025_paper.html)
* [Adversarial Learning Based Knowledge Distillation on 3D Point Clouds](https://openaccess.thecvf.com/content/WACV2025/html/J_Adversarial_Learning_Based_Knowledge_Distillation_on_3D_Point_Clouds_WACV_2025_paper.html)
* [RGB2Point: 3D Point Cloud Generation from Single RGB Images](https://openaccess.thecvf.com/content/WACV2025/html/Lee_RGB2Point_3D_Point_Cloud_Generation_from_Single_RGB_Images_WACV_2025_paper.html)
* [Continual Learning in 3D Point Clouds: Employing Spectral Techniques for Exemplar Selection](https://openaccess.thecvf.com/content/WACV2025/html/Resani_Continual_Learning_in_3D_Point_Clouds_Employing_Spectral_Techniques_for_WACV_2025_paper.html)
* Point cloud classification
* [Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification](http://arxiv.org/abs/2412.03056v1)
* Point cloud segmentation
* [ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset](http://arxiv.org/abs/2411.04865v1)
* Point cloud registration
* [XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration](https://arxiv.org/abs/2411.18377)

## 19.Video
* [NeuroViG - Integrating Event Cameras for Resource-Efficient Video Grounding](https://openaccess.thecvf.com/content/WACV2025/html/Weerakoon_NeuroViG_-_Integrating_Event_Cameras_for_Resource-Efficient_Video_Grounding_WACV_2025_paper.html)
* [MVFNet: Multipurpose Video Forensics Network using Multiple Forms of Forensic Evidence](https://openaccess.thecvf.com/content/WACV2025/html/Nguyen_MVFNet_Multipurpose_Video_Forensics_Network_using_Multiple_Forms_of_Forensic_WACV_2025_paper.html)
* [GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-Grained Video-Language Learning](https://openaccess.thecvf.com/content/WACV2025/html/Wang_GEXIA_Granularity_Expansion_and_Iterative_Approximation_for_Scalable_Multi-Grained_Video-Language_WACV_2025_paper.html) video-language learning
* Video surveillance
* [DashCop: Automated E-Ticket Generation for Two-Wheeler Traffic Violations using Dashcam Videos](https://openaccess.thecvf.com/content/WACV2025/html/Rawat_DashCop_Automated_E-Ticket_Generation_for_Two-Wheeler_Traffic_Violations_using_Dashcam_WACV_2025_paper.html) automated e-ticket generation for two-wheeler traffic violations from dashcam videos
* Video understanding
* [Ego-VPA: Egocentric Video Understanding with Parameter-Efficient Adaptation](https://openaccess.thecvf.com/content/WACV2025/html/Wu_Ego-VPA_Egocentric_Video_Understanding_with_Parameter-Efficient_Adaptation_WACV_2025_paper.html)
* [Paladin: Understanding Video Intentions in Political Advertisement Videos](https://openaccess.thecvf.com/content/WACV2025/html/Liu_Paladin_Understanding_Video_Intentions_in_Political_Advertisement_Videos_WACV_2025_paper.html)
* [Frame by Familiar Frame: Understanding Replication in Video Diffusion Models](https://openaccess.thecvf.com/content/WACV2025/html/Rahman_Frame_by_Familiar_Frame_Understanding_Replication_in_Video_Diffusion_Models_WACV_2025_paper.html)
* Video temporal grounding
* [FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding](http://arxiv.org/abs/2412.13441v1)
:star:[code](https://github.com/Zhuo-Cao/FlashVTG)
* Video anomaly detection
* [Graph-Jigsaw Conditioned Diffusion Model for Skeleton-Based Video Anomaly Detection](https://openaccess.thecvf.com/content/WACV2025/html/Karami_Graph-Jigsaw_Conditioned_Diffusion_Model_for_Skeleton-Based_Video_Anomaly_Detection_WACV_2025_paper.html)
* [Guess Future Anomalies from Normalcy: Forecasting Abnormal Behavior in Real-World Videos](https://openaccess.thecvf.com/content/WACV2025/html/Majhi_Guess_Future_Anomalies_from_Normalcy_Forecasting_Abnormal_Behavior_in_Real-World_WACV_2025_paper.html)
* [Distilli
