- Host: GitHub
- URL: https://github.com/52cv/iccv-2025-papers
- Owner: 52CV
- Created: 2025-06-30T03:30:36.000Z
- Default Branch: main
- Last Pushed: 2025-11-07T03:14:29.000Z
- Last Synced: 2025-11-07T05:28:11.531Z
- Size: 196 KB
- Stars: 24
- Watchers: 0
- Forks: 0
- Open Issues: 0

# ICCV-2025-Papers

![image](https://github.com/user-attachments/assets/0b93ce8a-4383-46ba-9672-6c746728c9f9)

## Conference dates: October 19 to 23, 2025

## Conference website: https://iccv.thecvf.com/

## For 2025 survey papers, click here ↘️[2025-CV-Surveys](https://github.com/52CV/CV-Surveys)

## Click here for the categorized 2025 paper lists

↘️[WACV-2025-Papers](https://github.com/52CV/WACV-2025-Papers)

↘️[CVPR-2025-Papers](https://github.com/52CV/CVPR-2025-Papers)

↘️[ICCV-2025-Papers](https://github.com/52CV/ICCV-2025-Papers)

## Click here for the categorized 2024 paper lists

↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)

↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)

↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)

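## [Click here for the categorized 2023 paper lists](#0000)

## [Click here for the categorized 2022 paper lists](#000)

## [Click here for the categorized 2021 paper lists](#00)

## [Click here for the categorized 2020 paper lists](#0)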

## All papers have been categorized

### 🏆 Best Paper

* [Generating Physically Stable and Buildable Brick Structures from Text](http://arxiv.org/abs/2505.05469)
:house:[project](https://avalovelace1.github.io/BrickGPT)

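* [ICCV 2025 Best Paper announced! Carnegie Mellon University proposes BrickGPT: turning text into physical brick builds that are guaranteed to stand stably!](https://zhuanlan.zhihu.com/p/81635387724)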

## Contents

|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|[1.Other](#1)|[2.Image/Video Processing](#2)|[3.Super-Resolution](#3)|[4.Image Captioning](#4)|
|[5.Image Generation](#5)|[6.Image Segmentation](#6)|[7.Image Classification](#7)|[8.Image/Video Retrieval](#8)|
|[9.Image/Video Compression](#9)|[10.Medical Image Processing](#10)|[11.Face](#11)|[12.Avatar](#12)|
|[13.Object Detection](#13)|[14.Object Tracking](#14)|[15.Pose](#15)|[16.Human Motion](#16)|
|[17.Action Recognition](#17)|[18.Re-ID (Person Re-Identification)](#18)|[19.Video](#19)|[20.OCR](#20)|
|[21.UAV/RS/Satellite Image (UAV/Remote Sensing/Satellite Imagery)](#21)|[22.3D](#22)|[23.Point Cloud](#23)|[24.Autonomous Driving](#24)|
|[25.HOI (Human-Object Interaction)](#25)|[26.Robot](#26)|[27.Visual Question Answering](#27)|[28.Optical Flow Estimation](#28)|
|[29.Deepfake Detection/AI-Generated Image Detection](#29)|[30.Image Fusion](#30)|[31.Image Matching](#31)|[32.Image Registration](#32)|
|[33.Keypoint Detection](#33)|[34.Object Pose Estimation](#34)|[35.Style Transfer](#35)|[36.Scene Graph Generation](#36)|
|[37.MC/KD/Pruning (Model Compression/Knowledge Distillation/Pruning)](#37)|[38.F/ZSL/DG/A (Few/Zero-Shot Learning, Domain Generalization/Adaptation)](#38)|[39.Machine Learning](#39)|[40.Deep Learning](#40)|
|[41.NAS (Neural Architecture Search)](#41)|[42.Vision Transformer](#42)|[43.Vision Language](#43)|[44.Neural Radiance Fields](#44)|
|[45.Dataset](#45)|[46.Sound](#46)|[47.Animation](#47)|[48.Industrial Anomaly Detection](#48)|
|[49.Biometric Recognition](#49)|[50.Protecting Copyright](#50)|[51.Visual Relationship Detection (VRD)](#51)|[52.Gaze](#52)|
|[53.Dense Prediction](#53)|[54.Computational Imaging](#54)| | |



## 54.Computational Imaging

* [IM360: Large-scale Indoor Mapping with 360 Cameras](http://arxiv.org/abs/2502.12545)

* [Multispectral Demosaicing via Dual Cameras](http://arxiv.org/abs/2503.22026)

* [Processing and acquisition traces in visual encoders: What does CLIP know about your camera?](https://openaccess.thecvf.com/content/ICCV2025/papers/Ramos_Processing_and_acquisition_traces_in_visual_encoders_What_does_CLIP_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ryan-caesar-ramos/visual-encoder-traces)

* [Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras](http://arxiv.org/pdf/2506.22069v1)

* [Estimating 2D Camera Motion with Hybrid Motion Basis](https://arxiv.org/pdf/2507.22480v1)
:house:[project](https://lhaippp.github.io/CamFlow/)
:star:[code](https://github.com/lhaippp/camflow)

* [Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image](http://arxiv.org/abs/2503.17358)

* [AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion](http://arxiv.org/abs/2503.21581)

* [TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models](http://arxiv.org/abs/2503.05638)

* [Super Resolved Imaging with Adaptive Optics](https://arxiv.org/pdf/2508.04648v1)
:house:[project](https://www.cs.toronto.edu/~robin/aosr/)

* [HccePose(BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation](http://arxiv.org/abs/2510.10177)

* [RePoseD: Efficient Relative Pose Estimation With Known Depth Information](https://openaccess.thecvf.com/content/ICCV2025/papers/Ding_RePoseD_Efficient_Relative_Pose_Estimation_With_Known_Depth_Information_ICCV_2025_paper.pdf)
:star:[code](https://github.com/kocurvik/mdrp)

* [Scaling 3D Compositional Models for Robust Classification and Pose Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_Scaling_3D_Compositional_Models_for_Robust_Classification_and_Pose_Estimation_ICCV_2025_paper.pdf)

* [DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_DRaM-LHM_A_Quaternion_Framework_for_Iterative_Camera_Pose_Estimation_ICCV_2025_paper.pdf)

* [Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_Epipolar_Consistent_Attention_Aggregation_Network_for_Unsupervised_Light_Field_Disparity_ICCV_2025_paper.pdf)

* [TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras](http://arxiv.org/abs/2508.00913)
:house:[project](https://mhdmohammadi.github.io/TESPEC_webpage)

* [Simultaneous Motion And Noise Estimation with Event Cameras](http://arxiv.org/abs/2504.04029)
:star:[code](https://github.com/tub-rip/ESMD)

* [EventUPS: Uncalibrated Photometric Stereo Using an Event Camera](https://openaccess.thecvf.com/content/ICCV2025/papers/Liang_EventUPS_Uncalibrated_Photometric_Stereo_Using_an_Event_Camera_ICCV_2025_paper.pdf)

* [GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography](http://arxiv.org/abs/2504.07083)
:house:[project](https://kszpxxzmc.github.io/GenDoP)

* [Inverse Image-Based Rendering for Light Field Generation from Single Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Jung_Inverse_Image-Based_Rendering_for_Light_Field_Generation_from_Single_Images_ICCV_2025_paper.pdf)

* [Princeton365: A Diverse Dataset with Accurate Camera Pose](http://arxiv.org/abs/2506.09035)

* [CF3: Compact and Fast 3D Feature Fields](http://arxiv.org/abs/2508.05254)

* [CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy](http://arxiv.org/abs/2504.07959)



## 53.Dense Prediction

* [Frequency-Dynamic Attention Modulation for Dense Prediction](https://arxiv.org/pdf/2507.12006v1)
:star:[code](https://github.com/Linwei-Chen/FDAM)

* [FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment](https://arxiv.org/pdf/2506.22509v1)
:star:[code](https://github.com/xuhang07/FreeDNA)

* [ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction](http://arxiv.org/abs/2506.08678)

* [Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction](http://arxiv.org/abs/2412.06244)
:star:[code](https://github.com/HVision-NKU/DenseVLM)

* [Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction](http://arxiv.org/abs/2508.20376)



## 52.Gaze

* [Multi-view Gaze Target Estimation](https://arxiv.org/pdf/2508.05857v1)
:house:[project](https://www3.cs.stonybrook.edu/~cvl/multiview_gte.html)

* [Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction](https://arxiv.org/pdf/2507.23021v1)
:house:[project](https://aimagelab.github.io/ScanDiff)
:star:[code](https://github.com/aimagelab/scandiff) (visual attention prediction)

* [Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths](https://openaccess.thecvf.com/content/ICCV2025/papers/Mondal_Gaze-Language_Alignment_for_Zero-Shot_Prediction_of_Visual_Search_Targets_from_ICCV_2025_paper.pdf)

* [What we need is explicit controllability: Training 3D gaze estimator using only facial images](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_What_we_need_is_explicit_controllability_Training_3D_gaze_estimator_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ATinyBites/ControllableGaze)



## 51.Visual Relationship Detection (VRD)

* [ART: Adaptive Relation Tuning for Generalized Relation Prediction](https://arxiv.org/pdf/2507.23543v1)



## 50.Protecting Copyright

* [TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity](https://arxiv.org/pdf/2506.23484v1)

* [Your Text Encoder Can Be An Object-Level Watermarking Controller](http://arxiv.org/abs/2503.11945)

* [SpecGuard: Spectral Projection-based Advanced Invisible Watermarking](http://arxiv.org/abs/2510.07302)
:star:[code](https://github.com/inzamamulDU/SpecGuard_ICCV_2025)

* [Learning Robust Image Watermarking with Lossless Cover Recovery](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Learning_Robust_Image_Watermarking_with_Lossless_Cover_Recovery_ICCV_2025_paper.pdf)
:star:[code](https://github.com/chenoly/CRMark)

* [SynTag: Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking](https://openaccess.thecvf.com/content/ICCV2025/papers/Fang_SynTag_Enhancing_the_Geometric_Robustness_of_Inversion-based_Generative_Image_Watermarking_ICCV_2025_paper.pdf)

* [PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_PlugMark_A_Plug-in_Zero-Watermarking_Framework_for_Diffusion_Models_ICCV_2025_paper.pdf)

* [ROAR: Reducing Inversion Error in Generative Image Watermarking](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_ROAR_Reducing_Inversion_Error_in_Generative_Image_Watermarking_ICCV_2025_paper.pdf)

* [SEAL: Semantic Aware Image Watermarking](http://arxiv.org/abs/2503.12172)

* [Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity](http://arxiv.org/abs/2509.07647)
:star:[code](https://github.com/thomas11809/SFWMark)

* [Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design](http://arxiv.org/abs/2508.10065)

* [TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Bui_TrustMark_Robust_Watermarking_and_Watermark_Removal_for_Arbitrary_Resolution_Images_ICCV_2025_paper.pdf)

* [Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Zou_Attention_to_Neural_Plagiarism_Diffusion_Models_Can_Plagiarize_Your_Copyrighted_ICCV_2025_paper.pdf)
:star:[code](https://github.com/zzzucf/Neural-Plagiarism)

* [From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection](https://openaccess.thecvf.com/content/ICCV2025/papers/Jia_From_Imitation_to_Innovation_The_Emergence_of_AIs_Unique_Artistic_ICCV_2025_paper.pdf)



## 49.Biometric Recognition

* [DisenQ: Disentangling Q-Former for Activity-Biometrics](https://arxiv.org/pdf/2507.07262v1)

* [A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition](https://arxiv.org/pdf/2508.00053v1)

* Fingerprints

  * [Training-Free Personalization via Retrieval and Reasoning on Fingerprints](http://arxiv.org/abs/2503.18623)

  * [DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_DiffIP_Representation_Fingerprints_for_Robust_IP_Protection_of_Diffusion_Models_ICCV_2025_paper.pdf)

  * [Riemannian-Geometric Fingerprints of Generative Models](http://arxiv.org/abs/2506.22802)



## 48.Industrial Anomaly Detection

* [RareCLIP: Rarity-aware Online Zero-shot Industrial Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/He_RareCLIP_Rarity-aware_Online_Zero-shot_Industrial_Anomaly_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/hjf02/RareCLIP)

* [ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Ma_ReMP-AD_Retrieval-enhanced_Multi-modal_Prompt_Fusion_for_Few-Shot_Industrial_Visual_Anomaly_ICCV_2025_paper.pdf)
:star:[code](https://github.com/cshcma/ReMP-AD.git)

* [G2SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Tao_G2SF_Geometry-Guided_Score_Fusion_for_Multimodal_Industrial_Anomaly_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ctaoaa/G2SF)

* [Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction Methodology and Application](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Anomaly_Detection_of_Integrated_Circuits_Package_Substrates_Using_the_Large_ICCV_2025_paper.pdf)
:star:[code](https://github.com/Bingyang0410/CPS2D-AD)

* [SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning](http://arxiv.org/abs/2410.14987)
:star:[code](https://github.com/HUST-SLOW/SeaS)

* [Kaputt: A Large-Scale Dataset for Visual Defect Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Hofer_Kaputt_A_Large-Scale_Dataset_for_Visual_Defect_Detection_ICCV_2025_paper.pdf)

* [Training-Free Industrial Defect Generation with Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_Training-Free_Industrial_Defect_Generation_with_Diffusion_Models_ICCV_2025_paper.pdf)

* [DADet: Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_DADet_Safeguarding_Image_Conditional_Diffusion_Models_against_Adversarial_and_Backdoor_ICCV_2025_paper.pdf)

* [Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation](http://arxiv.org/abs/2505.24431)



## 47.Animation

* [LayerAnimate: Layer-level Control for Animation](http://arxiv.org/abs/2501.08295)

* [Occlusion-robust Stylization for Drawing-based 3D Animation](https://arxiv.org/pdf/2508.00398v1)

* [Multi-Object Sketch Animation by Scene Decomposition and Motion Planning](http://arxiv.org/abs/2503.19351)

* [Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance](http://arxiv.org/abs/2502.06145)

* [LongAnimation: Long Animation Generation with Dynamic Global-Local Memory](http://arxiv.org/abs/2507.01945)

* [V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video](http://arxiv.org/abs/2503.09631)

* [OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_OmniHuman-1_Rethinking_the_Scaling-Up_of_One-Stage_Conditioned_Human_Animation_Models_ICCV_2025_paper.pdf)

* [Multi-identity Human Image Animation with Structural Video Diffusion](http://arxiv.org/abs/2504.04126)
:star:[code](https://github.com/zhenzhiwang/Multi-HumanVid)

* [Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Perception-as-Control_Fine-grained_Controllable_Image_Animation_with_3D-aware_Motion_Representation_ICCV_2025_paper.pdf)

* [DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance](https://openaccess.thecvf.com/content/ICCV2025/papers/Luo_DreamActor-M1_Holistic_Expressive_and_Robust_Human_Image_Animation_with_Hybrid_ICCV_2025_paper.pdf)

* [Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_Ponimator_Unfolding_Interactive_Pose_for_Versatile_Human-human_Interaction_Animation_ICCV_2025_paper.pdf)
:house:[project](https://stevenlsw.github.io/ponimator)



## 46.Sound

* [Music Grounding by Short Video](http://arxiv.org/abs/2408.16990)

* [VGGSounder: Audio-Visual Evaluations for Foundation Models](http://arxiv.org/abs/2508.08237)

* [AV-Flow: Transforming Text to Audio-Visual Human-like Interactions](https://openaccess.thecvf.com/content/ICCV2025/papers/Chatziagapi_AV-Flow_Transforming_Text_to_Audio-Visual_Human-like_Interactions_ICCV_2025_paper.pdf)

* [MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing](https://arxiv.org/pdf/2507.01384v1)
:star:[code](https://github.com/WangLY136/MUG)

* [What's Making That Sound Right Now? Video-centric Audio-Visual Localization](https://arxiv.org/pdf/2507.04667v1)
:house:[project](https://hahyeon610.github.io/Video-centric_Audio_Visual_Localization/)

* [Implicit Counterfactual Learning for Audio-Visual Segmentation](https://arxiv.org/pdf/2507.20740v1)

* [Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation](https://arxiv.org/pdf/2507.22886v1)
:house:[project](https://henghuiding.com/OmniAVS/)

* [Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations](https://openaccess.thecvf.com/content/ICCV2025/papers/Yeo_Zero-AVSR_Zero-Shot_Audio-Visual_Speech_Recognition_with_LLMs_by_Learning_Language-Agnostic_ICCV_2025_paper.pdf)

* [Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_Not_Only_Vision_Evolve_Visual_Speech_Recognition_via_Peripheral_Information_ICCV_2025_paper.pdf)

* [CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_CogCM_Cognition-Inspired_Contextual_Modeling_for_Audio-Visual_Speech_Enhancement_ICCV_2025_paper.pdf)

* [How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_How_Do_Optical_Flow_and_Textual_Prompts_Collaborate_to_Assist_ICCV_2025_paper.pdf)

* [TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models](http://arxiv.org/abs/2506.11436)

* [AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation](https://openaccess.thecvf.com/content/ICCV2025/papers/Haji-Ali_AV-Link_Temporally-Aligned_Diffusion_Features_for_Cross-Modal_Audio-Video_Generation_ICCV_2025_paper.pdf)

* [AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs](http://arxiv.org/abs/2503.23219)

* [p-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?](https://openaccess.thecvf.com/content/ICCV2025/papers/Liang_p-AVAS_Can_Physics-Integrated_Audio-Visual_Modeling_Boost_Neural_Acoustic_Synthesis_ICCV_2025_paper.pdf)

* [TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis](http://arxiv.org/abs/2504.05684)

* [VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_VAFlow_Video-to-Audio_Generation_with_Cross-Modality_Flow_Matching_ICCV_2025_paper.pdf)

* [Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_Shot-by-Shot_Film-Grammar-Aware_Training-Free_Audio_Description_Generation_ICCV_2025_paper.pdf)

* [AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs](http://arxiv.org/abs/2501.02135)

* Synthetic Speech Detection

  * [Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization](https://openaccess.thecvf.com/content/ICCV2025/papers/Anshul_Intra-modal_and_Cross-modal_Synchronization_for_Audio-visual_Deepfake_Detection_and_Temporal_ICCV_2025_paper.pdf)



## 45.Dataset

* [Context-Aware Academic Emotion Dataset and Benchmark](https://arxiv.org/pdf/2507.00586v1)
:house:[project](https://zgsfer.github.io/CAER)

* [ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones](https://openaccess.thecvf.com/content/ICCV2025/papers/Ghosh_ROADWork_A_Dataset_and_Benchmark_for_Learning_to_Recognize_Observe_ICCV_2025_paper.pdf)

* [4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_4D-Bench_Benchmarking_Multi-modal_Large_Language_Models_for_4D_Object_Understanding_ICCV_2025_paper.pdf)

* [Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation](http://arxiv.org/abs/2509.07596)

* Benchmarks

  * [IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark](https://arxiv.org/pdf/2507.14449v1)

  * [Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding](https://arxiv.org/pdf/2507.15028v1)
:house:[project](https://zhangyuanhan-ai.github.io/video-tt/)

  * [One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models](http://arxiv.org/abs/2507.07709)

  * [Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering](http://arxiv.org/abs/2503.11117)

  * [JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Jin_JailbreakDiffBench_A_Comprehensive_Benchmark_for_Jailbreaking_Diffusion_Models_ICCV_2025_paper.pdf)

  * [MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI](http://arxiv.org/abs/2506.23563)

  * [GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models](http://arxiv.org/abs/2408.11817)

  * [INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_INS-MMBench_A_Comprehensive_Benchmark_for_Evaluating_LVLMs_Performance_in_Insurance_ICCV_2025_paper.pdf)
:star:[code](https://github.com/FDU-INS/INS-MMBench)

  * [MIEB: Massive Image Embedding Benchmark](http://arxiv.org/abs/2504.10471)
:star:[code](https://github.com/embeddings-benchmark/mteb)

  * [LVBench: An Extreme Long Video Understanding Benchmark](http://arxiv.org/abs/2406.08035)

  * [ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges](http://arxiv.org/abs/2503.06553)

  * [From Abyssal Darkness to Blinding Glare: A Benchmark on Extreme Exposure Correction in Real World](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_From_Abyssal_Darkness_to_Blinding_Glare_A_Benchmark_on_Extreme_ICCV_2025_paper.pdf)
:star:[code](https://github.com/juvenoia/REED)

  * [Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search](http://arxiv.org/abs/2411.17776)

  * [MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_MultiVerse_A_Multi-Turn_Conversation_Benchmark_for_Evaluating_Large_Vision_and_ICCV_2025_paper.pdf)

  * [Extrapolated Urban View Synthesis Benchmark](http://arxiv.org/abs/2412.05256)

  * [WorldScore: A Unified Evaluation Benchmark for World Generation](http://arxiv.org/abs/2504.00983)
:house:[project](https://haoyi-duan.github.io/WorldScore)

  * [ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing](https://openaccess.thecvf.com/content/ICCV2025/papers/Pan_ICE-Bench_A_Unified_and_Comprehensive_Benchmark_for_Image_Creating_and_ICCV_2025_paper.pdf)

  * [MVGBench: a Comprehensive Benchmark for Multi-view Generation Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_MVGBench_a_Comprehensive_Benchmark_for_Multi-view_Generation_Models_ICCV_2025_paper.pdf)

* Datasets

  * [Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning](https://arxiv.org/pdf/2507.04790v1)

  * [ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users](https://arxiv.org/pdf/2507.10223v1)
:star:[code](https://github.com/pittisl/ProGait)
:house:[project](https://huggingface.co/datasets/ericyxy98/ProGait)

  * [DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes](https://openaccess.thecvf.com/content/ICCV2025/papers/Di_DiffTell_A_High-Quality_Dataset_for_Describing_Image_Manipulation_Changes_ICCV_2025_paper.pdf)

  * [CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling](https://arxiv.org/pdf/2507.12591v1)

  * [Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions](https://arxiv.org/pdf/2508.04681v1)
:house:[project](https://liangxuy.github.io/InterVLA/)
:star:[code](https://github.com/liangxuy/intervla)

  * [HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis](https://arxiv.org/pdf/2508.09137v1)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/HumanOLAT/)

  * [Dataset Ownership Verification for Pre-trained Masked Models](http://arxiv.org/abs/2507.12022)
:star:[code](https://github.com/xieyc99/DOV4MM)

  * [Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset](http://arxiv.org/abs/2507.05728)
:star:[code](https://github.com/rfww/uevs)

  * [BlueNeg: A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_BlueNeg_A_35mm_Negative_Film_Dataset_for_Restoring_Channel-Heterogeneous_Deterioration_ICCV_2025_paper.pdf)

  * [CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task](https://openaccess.thecvf.com/content/ICCV2025/papers/Amato_CMB-ML_A_Cosmic_Microwave_Background_Dataset_for_the_Oldest_Possible_ICCV_2025_paper.pdf)
:star:[code](https://github.com/CMB-ML/cmb-ml)

  * [UAVScenes: A Multi-Modal Dataset for UAVs](http://arxiv.org/abs/2507.22412)
:star:[code](https://github.com/sijieaaa/UAVScenes)

  * [UDC-VIT: A Real-World Video Dataset for Under-Display Cameras](https://openaccess.thecvf.com/content/ICCV2025/papers/Ahn_UDC-VIT_A_Real-World_Video_Dataset_for_Under-Display_Cameras_ICCV_2025_paper.pdf)

  * [Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Towards_Comprehensive_Lecture_Slides_Understanding_Large-scale_Dataset_and_Effective_Method_ICCV_2025_paper.pdf)

  * [R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception](https://openaccess.thecvf.com/content/ICCV2025/papers/Mirlach_R-LiViT_A_LiDAR-Visual-Thermal_Dataset_Enabling_Vulnerable_Road_User_Focused_Roadside_ICCV_2025_paper.pdf)

  * [MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition](https://openaccess.thecvf.com/content/ICCV2025/papers/Golyadkin_MEH_A_Multi-Style_Dataset_and_Toolkit_for_Advancing_Egyptian_Hieroglyph_ICCV_2025_paper.pdf)

  * [3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views](https://openaccess.thecvf.com/content/ICCV2025/papers/Du_3DRealCar_An_In-the-wild_RGB-D_Car_Dataset_with_360-degree_Views_ICCV_2025_paper.pdf)

  * [PBFG: A New Physically-Based Dataset and Removal of Lens Flares and Glares](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_PBFG_A_New_Physically-Based_Dataset_and_Removal_of_Lens_Flares_ICCV_2025_paper.pdf)

  * [Feature Coding in the Era of Large Models: Dataset, Test Conditions and Benchmark](http://arxiv.org/abs/2412.04307)
:star:[code](https://github.com/chansongoal/LaMoFC)

  * [Modeling Saliency Dataset Bias](https://openaccess.thecvf.com/content/ICCV2025/papers/Kummerer_Modeling_Saliency_Dataset_Bias_ICCV_2025_paper.pdf)

  * [TrackVerse: A Large-Scale Object-Centric Video Dataset for Image-Level Representation Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wei_TrackVerse_A_Large-Scale_Object-Centric_Video_Dataset_for_Image-Level_Representation_Learning_ICCV_2025_paper.pdf)

  * [OpenSubstance: A High-quality Measured Dataset of Multi-View and -Lighting Images and Shapes](https://openaccess.thecvf.com/content/ICCV2025/papers/Pei_OpenSubstance_A_High-quality_Measured_Dataset_of_Multi-View_and_-Lighting_Images_ICCV_2025_paper.pdf)
:house:[project](https://opensubstance.github.io/)

  * [MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_MMAT-1M_A_Large_Reasoning_Dataset_for_Multimodal_Agent_Tuning_ICCV_2025_paper.pdf)
:star:[code](https://github.com/VIS-MPU-Agent/MMAT-1M)

  * [ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Guo_ImageGem_In-the-wild_Generative_Image_Interaction_Dataset_for_Generative_Model_Personalization_ICCV_2025_paper.pdf)

  * [LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation](http://arxiv.org/abs/2504.11521)
:house:[project](https://langtraj.github.io/)

  * [LightCity: An Urban Dataset for Outdoor Inverse Rendering and Reconstruction under Multi-illumination Conditions](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_LightCity_An_Urban_Dataset_for_Outdoor_Inverse_Rendering_and_Reconstruction_ICCV_2025_paper.pdf)

  * [CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering](http://arxiv.org/abs/2501.06927)

  * [A Real-world Display Inverse Rendering Dataset](http://arxiv.org/abs/2508.14411)
:house:[project](https://michaelcsj.github.io/DIR)

* Dataset Distillation

  * [CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation](https://arxiv.org/pdf/2506.22637v1)
:star:[code](https://github.com/hatchetProject/CaO2)

  * [Dataset Distillation via Vision-Language Category Prototype](https://arxiv.org/pdf/2506.23580v1)
:star:[code](https://github.com/zou-yawen/Dataset-Distillation-via-Vision-Language-Category-Prototype/)

  * [Dataset Distillation as Data Compression: A Rate-Utility Perspective](https://arxiv.org/pdf/2507.17221v1)

  * [Heavy Labels Out: Dataset Distillation with Label Space Lightening](http://arxiv.org/abs/2408.08201)
:star:[code](https://github.com/Lexie-YU/HeLlO)

  * [Dataset Distillation via the Wasserstein Metric](http://arxiv.org/abs/2311.18531)
:star:[code](https://github.com/Liu-Hy/WMDD) :house:[project](https://liu-hy.github.io/WMDD)

  * [Diversity-Enhanced Distribution Alignment for Dataset Distillation](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Diversity-Enhanced_Distribution_Alignment_for_Dataset_Distillation_ICCV_2025_paper.pdf)

  * [Improving Noise Efficiency in Privacy-preserving Dataset Distillation](http://arxiv.org/abs/2508.01749)



## 44.Neural Radiance Fields

* [UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields](http://arxiv.org/pdf/2506.21884v1)
:house:[project](https://www.factral.co/UnMix-NeRF)

* [LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling](https://arxiv.org/pdf/2507.02363v1)
:house:[project](https://wujh2001.github.io/LocalDyGS/)

* [DiSCO-3D: Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF](https://arxiv.org/pdf/2507.14596v1)

* [A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields](https://arxiv.org/pdf/2507.04408v1)

* [NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement](https://arxiv.org/pdf/2507.12714v1)
:house:[project](https://neuraleaf-yang.github.io/)

* [MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction](http://arxiv.org/abs/2508.04297)
:star:[code](https://github.com/EuclidLou/MuGS)

* [UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction](http://arxiv.org/abs/2510.01669)

* Rendering

  * [BokehDiff: Neural Lens Blur with One-Step Diffusion](https://arxiv.org/pdf/2507.18060v1)

  * [OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering](http://arxiv.org/abs/2503.16177)
:house:[project](https://occlugaussian.github.io)

  * [ReCamMaster: Camera-Controlled Generative Rendering from A Single Video](http://arxiv.org/abs/2503.11647)

  * [Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Tourani_Leveraging_2D_Priors_and_SDF_Guidance_for_Urban_Scene_Rendering_ICCV_2025_paper.pdf)

  * [Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures](http://arxiv.org/abs/2503.16067)

  * [UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Deng_UNIS_A_Unified_Framework_for_Achieving_Unbiased_Neural_Implicit_Surfaces_ICCV_2025_paper.pdf)

  * [Stochastic Gradient Estimation for Higher-Order Differentiable Rendering](http://arxiv.org/abs/2412.03489)

  * [Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity](http://arxiv.org/abs/2507.15775)

  * [FonTS: Text Rendering With Typography and Style Controls](http://arxiv.org/abs/2412.00136)

  * [Differentiable Room Acoustic Rendering with Multi-View Vision Priors](http://arxiv.org/abs/2504.21847)

* Inverse Rendering

  * [Neural Multi-View Self-Calibrated Photometric Stereo without Photometric Stereo Cues](https://arxiv.org/pdf/2507.23162v1)

  * [Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering](http://arxiv.org/abs/2508.14461)

  * [Neural Inverse Rendering for High-Accuracy 3D Measurement of Moving Objects with Fewer Phase-Shifting Patterns](https://openaccess.thecvf.com/content/ICCV2025/papers/Urakawa_Neural_Inverse_Rendering_for_High-Accuracy_3D_Measurement_of_Moving_Objects_ICCV_2025_paper.pdf)

  * [InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling](https://arxiv.org/pdf/2507.17613v1)

  * [DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Zheng_DNF-Intrinsic_Deterministic_Noise-Free_Diffusion_for_Indoor_Inverse_Rendering_ICCV_2025_paper.pdf)
:star:[code](https://github.com/OnlyZZZZ/DNF-Intrinsic)

* NVS

  * [FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation](http://arxiv.org/abs/2508.06392)

  * [E-NeMF: Event-based Neural Motion Field for Novel Space-time View Synthesis of Dynamic Scenes](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_E-NeMF_Event-based_Neural_Motion_Field_for_Novel_Space-time_View_Synthesis_ICCV_2025_paper.pdf)

  * [Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis](http://arxiv.org/abs/2411.00144)
:house:[project](https://sailor-z.github.io/projects)

  * [RayZer: A Self-supervised Large View Synthesis Model](http://arxiv.org/abs/2505.00702)

  * [BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis](http://arxiv.org/abs/2411.08508)

  * [WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image](http://arxiv.org/abs/2506.23518)

  * [UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images](http://arxiv.org/abs/2410.13195)
:star:[code](https://github.com/jwubz123/UNIG)

  * [Scaling Transformer-Based Novel View Synthesis with Models Token Disentanglement and Synthetic Data](https://openaccess.thecvf.com/content/ICCV2025/papers/Nair_Scaling_Transformer-Based_Novel_View_Synthesis_with_Models_Token_Disentanglement_and_ICCV_2025_paper.pdf)

  * [SEHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing](http://arxiv.org/abs/2509.20400)

  * [RayGaussX: Accelerating Gaussian-Based Ray Marching for Real-Time and High-Quality Novel View Synthesis](http://arxiv.org/abs/2509.07782)



## 43.Vision Language(视觉语言)

* [Improving Large Vision and Language Models by Learning from a Panel of Peers](http://arxiv.org/abs/2509.01610)

* [DASH: Detection and Assessment of Systematic Hallucinations of VLMs](http://arxiv.org/abs/2503.23573)

* [Vision-Language Models Can't See the Obvious](https://openaccess.thecvf.com/content/ICCV2025/papers/Huynh_Vision-Language_Models_Cant_See_the_Obvious_ICCV_2025_paper.pdf)

* [Web Artifact Attacks Disrupt Vision Language Models](http://arxiv.org/abs/2503.13652)
:star:[code](https://github.com/mqraitem/Web-Artifact-Attacks)

* [ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models](https://arxiv.org/pdf/2507.00898v1)
:star:[code](https://github.com/zifuwan/ONLY) :house:[project](https://zifuwan.github.io/ONLY/)

* [VLM4D: Towards Spatiotemporal Awareness in Vision Language Models](http://arxiv.org/abs/2508.02095)

* [WalkVLM: Aid Visually Impaired People Walking by Vision Language Model](http://arxiv.org/abs/2412.20903)

* [ViLU: Learning Vision-Language Uncertainties for Failure Prediction](https://arxiv.org/pdf/2507.07620v1)
:star:[code](https://github.com/ykrmm/ViLU)

* [PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection](https://arxiv.org/pdf/2507.08979v1)
:star:[code](https://github.com/MahdiyarMM/PRISM)

* [One Last Attention for Your Vision-Language Model](https://arxiv.org/pdf/2507.15480v1)
:star:[code](https://github.com/khufia/RAda/tree/main)

* [Hierarchical Cross-modal Prompt Learning for Vision-Language Models](https://arxiv.org/pdf/2507.14976v1)
:star:[code](https://github.com/zzeoZheng/HiCroPL)

* [METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models](https://arxiv.org/pdf/2507.20842v1)
:star:[code](https://github.com/YuchenLiu98/METEOR)

* [ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking](https://arxiv.org/pdf/2507.19875v1)
:star:[code](https://github.com/XiaokunFeng/ATCTrack)

* [AgroBench: Vision-Language Model Benchmark in Agriculture](https://arxiv.org/pdf/2507.20519v1)
:house:[project](https://dahlian00.github.io/AgroBenchPage/)

* [MM-IFEngine: Towards Multimodal Instruction Following](https://openaccess.thecvf.com/content/ICCV2025/papers/Ding_MM-IFEngine_Towards_Multimodal_Instruction_Following_ICCV_2025_paper.pdf)

* [Robustifying Zero-Shot Vision Language Models by Subspaces Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Dong_Robustifying_Zero-Shot_Vision_Language_Models_by_Subspaces_Alignment_ICCV_2025_paper.pdf)

* [FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_FDPT_Federated_Discrete_Prompt_Tuning_for_Black-Box_Visual-Language_Models_ICCV_2025_paper.pdf)

* [Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring](http://arxiv.org/abs/2403.09333)
:star:[code](https://github.com/jefferyZhan/Griffon)

* [CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting](https://openaccess.thecvf.com/content/ICCV2025/papers/Jiao_CLIP-GS_Unifying_Vision-Language_Representation_with_3D_Gaussian_Splatting_ICCV_2025_paper.pdf)

* [Growing a Twig to Accelerate Large Vision-Language Models](http://arxiv.org/abs/2503.14075)

* [Test-Time Retrieval-Augmented Adaptation for Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Fan_Test-Time_Retrieval-Augmented_Adaptation_for_Vision-Language_Models_ICCV_2025_paper.pdf)
:star:[code](https://github.com/xinqi-fan/TT-RAA)

* [Understanding Museum Exhibits using Vision-Language Reasoning](http://arxiv.org/abs/2412.01370)

* [One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models](http://arxiv.org/abs/2406.05491)

* [When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack](http://arxiv.org/abs/2503.06903)

* [Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus](https://openaccess.thecvf.com/content/ICCV2025/papers/Jang_Target_Bias_Is_All_You_Need_Zero-Shot_Debiasing_of_Vision-Language_ICCV_2025_paper.pdf)

* [TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models](http://arxiv.org/abs/2412.18675)

* [Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration](http://arxiv.org/abs/2412.13180)

* [Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology](http://arxiv.org/abs/2503.14911)
:star:[code](https://github.com/SiyuanYan1/Derm1M)

* [ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Qu_ReCoT_Reflective_Self-Correction_Training_for_Mitigating_Confirmation_Bias_in_Large_ICCV_2025_paper.pdf)

* [AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting](http://arxiv.org/abs/2502.04981)

* [D-Attn: Decomposed Attention for Large Vision-and-Language Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Kuo_D-Attn_Decomposed_Attention_for_Large_Vision-and-Language_Model_ICCV_2025_paper.pdf)
:star:[code](https://github.com/bytedance/DecomposedAttention)

* [Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate](https://openaccess.thecvf.com/content/ICCV2025/papers/Huang_Deciphering_Cross-Modal_Alignment_in_Large_Vision-Language_Models_via_Modality_Integration_ICCV_2025_paper.pdf)
:star:[code](https://github.com/shikiw/Modality-Integration-Rate)

* [Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Fuzzy_Contrastive_Decoding_to_Alleviate_Object_Hallucination_in_Large_Vision-Language_ICCV_2025_paper.pdf)

* [IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves](http://arxiv.org/abs/2411.00827)

* [2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_2.5_Years_in_Class_A_Multimodal_Textbook_for_Vision-Language_Pretraining_ICCV_2025_paper.pdf)

* [Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features](http://arxiv.org/abs/2412.00142)

* [FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models](http://arxiv.org/abs/2504.20860)
:star:[code](https://github.com/mainaksingha01/FedMVP)

* [Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models](http://arxiv.org/abs/2412.08619)

* [VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models](http://arxiv.org/abs/2503.07478)
:star:[code](https://github.com/JCruan519/VLRMBench)

* [ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity](https://openaccess.thecvf.com/content/ICCV2025/papers/He_ZipVL_Accelerating_Vision-Language_Models_through_Dynamic_Token_Sparsity_ICCV_2025_paper.pdf)

* [Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_Skip-Vision_Efficient_and_Scalable_Acceleration_of_Vision-Language_Models_via_Adaptive_ICCV_2025_paper.pdf)

* [SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders](http://arxiv.org/abs/2503.14530)

* [The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models](http://arxiv.org/abs/2407.15731)
:star:[code](https://github.com/mit-ll/IIMM)

* [MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling](http://arxiv.org/abs/2503.13440)
:star:[code](https://github.com/hustvl/MaTVLM)

* [Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks](http://arxiv.org/abs/2504.01308)
:star:[code](https://github.com/JarvisUSTC/DiffPure-RobustVLM)

* [Dynamic Multimodal Prototype Learning in Vision-Language Models](http://arxiv.org/abs/2507.03657)

* [GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks](https://openaccess.thecvf.com/content/ICCV2025/papers/Danish_GEOBench-VLM_Benchmarking_Vision-Language_Models_for_Geospatial_Tasks_ICCV_2025_paper.pdf)

* [Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models](http://arxiv.org/abs/2405.14715)

* [V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding](http://arxiv.org/abs/2412.09616)

* [DexVLG: Dexterous Vision-Language-Grasp Model at Scale](http://arxiv.org/abs/2507.02747)

* [Vision-Language Neural Graph Featurization for Extracting Retinal Lesions](https://openaccess.thecvf.com/content/ICCV2025/papers/Hassan_Vision-Language_Neural_Graph_Featurization_for_Extracting_Retinal_Lesions_ICCV_2025_paper.pdf)

* [MotionCtrl: A Real-time Controllable Vision-Language-Motion Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Cao_MotionCtrl_A_Real-time_Controllable_Vision-Language-Motion_Model_ICCV_2025_paper.pdf)

* [Breaking the Encoder Barrier for Seamless Video-Language Understanding](http://arxiv.org/abs/2503.18422)

* [OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining](http://arxiv.org/abs/2411.15421)

* [How Can Objects Help Video-Language Understanding?](http://arxiv.org/abs/2504.07454)
:star:[code](https://github.com/brown-palm/ObjectMLLM)

* [Factorized Learning for Temporally Grounded Video-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_Factorized_Learning_for_Temporally_Grounded_Video-Language_Models_ICCV_2025_paper.pdf)
:star:[code](https://github.com/nusnlp/d2vlm)

* [Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models](http://arxiv.org/abs/2508.01225)

* [AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?](http://arxiv.org/abs/2412.03002)

* [HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wei_HQ-CLIP_Leveraging_Large_Vision-Language_Models_to_Create_High-Quality_Image-Text_Datasets_ICCV_2025_paper.pdf)

* [Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation](http://arxiv.org/abs/2504.17207)

* [The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer](http://arxiv.org/abs/2504.10462)
:star:[code](https://github.com/bytedance/SAIL)

* [EVEv2: Improved Baselines for Encoder-Free Vision-Language Models](http://arxiv.org/abs/2502.06788)
:star:[code](https://github.com/baaivision/EVE)

* [TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention](https://openaccess.thecvf.com/content/ICCV2025/papers/Duan_TruthPrInt_Mitigating_Large_Vision-Language_Models_Object_Hallucination_Via_Latent_Truthful-Guided_ICCV_2025_paper.pdf)

* [Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue](https://openaccess.thecvf.com/content/ICCV2025/papers/Sun_Structured_Policy_Optimization_Enhance_Large_Vision-Language_Model_via_Self-referenced_Dialogue_ICCV_2025_paper.pdf)

* [Causality-guided Prompt Learning for Vision-language Models via Visual Granulation](http://arxiv.org/abs/2509.03803)
:star:[code](https://github.com/GaoMY-521/CaPL_Code)

* [CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model](http://arxiv.org/abs/2503.06472)

* [Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?](http://arxiv.org/abs/2503.12496)

* [Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images](http://arxiv.org/abs/2508.15256)

* [Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models](http://arxiv.org/abs/2507.09209)

* [Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Kang_Dynamic_Multi-Layer_Null_Space_Projection_for_Vision-Language_Continual_Learning_ICCV_2025_paper.pdf)

* [Learning Beyond Still Frames: Scaling Vision-Language Models with Video](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Learning_Beyond_Still_Frames_Scaling_Vision-Language_Models_with_Video_ICCV_2025_paper.pdf)

* [GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_GLEAM_Enhanced_Transferable_Adversarial_Attacks_for_Vision-Language_Pre-training_Models_via_ICCV_2025_paper.pdf)
:star:[code](https://github.com/LuckAlex/GLEAM)

* [INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling](http://arxiv.org/abs/2507.05056)
:star:[code](https://github.com/xxxxx313/INTER)

* [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](http://arxiv.org/abs/2503.11576)

* VLN

  * [Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities](https://arxiv.org/pdf/2507.13019v1)
:house:[project](https://crystalsixone.github.io/vln_pe.github.io/)

  * [monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lu_monoVLN_Bridging_the_Observation_Gap_between_Monocular_and_Panoramic_Vision_ICCV_2025_paper.pdf)

  * [NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_NavQ_Learning_a_Q-Model_for_Foresighted_Vision-and-Language_Navigation_ICCV_2025_paper.pdf)

  * [COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation](http://arxiv.org/abs/2503.24065)
:star:[code](https://github.com/siqiZ805/VLN-COSMO.git)

  * [NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments](https://arxiv.org/pdf/2506.23468v1)
:star:[code](https://github.com/Feliciaxyao/NavMorph)

  * [3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_3D_Gaussian_Map_with_Open-Set_Semantic_Grouping_for_Vision-Language_Navigation_ICCV_2025_paper.pdf)

* LLM

  * [LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching](https://arxiv.org/pdf/2506.23502v1)

  * [Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching](http://arxiv.org/abs/2503.14953)

  * [Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs](https://arxiv.org/pdf/2507.07990v1)
:house:[project](https://www.jshyun.me/projects/sttm)

  * [Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context](https://openaccess.thecvf.com/content/ICCV2025/papers/Zheng_Why_LVLMs_Are_More_Prone_to_Hallucinations_in_Longer_Responses_ICCV_2025_paper.pdf)

  * [Zeroth-Order Fine-Tuning of LLMs in Random Subspaces](http://arxiv.org/abs/2410.08989)
:star:[code](https://github.com/zimingyy/SubZero)

  * [Advancing Visual Large Language Model for Multi-granular Versatile Perception](https://arxiv.org/pdf/2507.16213v1)
:star:[code](https://github.com/xiangwentao666/MVP-LM)

  * [DisTime: Distribution-based Time Representation for Video Large Language Models](http://arxiv.org/abs/2505.24329)
:star:[code](https://github.com/josephzpng/DisTime)

  * [Aligning Effective Tokens with Video Anomaly in Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Aligning_Effective_Tokens_with_Video_Anomaly_in_Large_Language_Models_ICCV_2025_paper.pdf)

  * [MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh](http://arxiv.org/abs/2508.01242)

  * [FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance](http://arxiv.org/abs/2501.02430)
:star:[code](https://github.com/anakin-skywalker-Joseph/Folder)

  * [B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens](https://openaccess.thecvf.com/content/ICCV2025/papers/Lu_B-VLLM_A_Vision_Large_Language_Model_with_Balanced_Spatio-Temporal_Tokens_ICCV_2025_paper.pdf)

  * [Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning](http://arxiv.org/abs/2410.00255)

  * [GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices](http://arxiv.org/abs/2503.06019)

  * [CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_CATP-LLM_Empowering_Large_Language_Models_for_Cost-Aware_Tool_Planning_ICCV_2025_paper.pdf)
:star:[code](https://github.com/duowuyms/OpenCATP-LLM)

  * [Multimodal LLM Guided Exploration and Active Mapping using Fisher Information](http://arxiv.org/abs/2410.17422)

  * [Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Sun_Multimodal_Large_Language_Model-Guided_ISP_Hyperparameter_Optimization_with_Dynamic_Preference_ICCV_2025_paper.pdf)

  * [Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning](http://arxiv.org/abs/2503.12972)
:star:[code](https://github.com/Wings-Of-Disaster/VaLiK)

* MLLM

  * [Token Activation Map to Visually Explain Multimodal LLMs](http://arxiv.org/abs/2506.23270)
:star:[code](https://github.com/xmed-lab/TAM)

  * [DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs](https://arxiv.org/pdf/2507.10302v1)
:star:[code](https://github.com/ZJHTerry18/DisCo)

  * [UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding](https://arxiv.org/pdf/2506.23219v1)
:star:[code](https://github.com/tsinghua-fib-lab/UrbanLLaVA)

  * [Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description](http://arxiv.org/abs/2405.18937)

  * [Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs](http://arxiv.org/abs/2501.04670)

  * [Analyzing Finetuning Representation Shift for Multimodal LLMs Steering](http://arxiv.org/abs/2501.03012)

  * [Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images](http://arxiv.org/abs/2504.08727)

  * [Controlling Multimodal LLMs via Reward-guided Decoding](https://openaccess.thecvf.com/content/ICCV2025/papers/Manas_Controlling_Multimodal_LLMs_via_Reward-guided_Decoding_ICCV_2025_paper.pdf)

  * [TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning](http://arxiv.org/abs/2410.10491)

  * [FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging](https://arxiv.org/pdf/2508.04625v1)

  * [Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation](https://arxiv.org/pdf/2507.02859v1)

  * [BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models](https://arxiv.org/pdf/2508.06895v1)

  * [Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning](https://arxiv.org/pdf/2507.07424v1)
:house:[project](https://mm-vl.github.io/corvid)

  * [Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs](http://arxiv.org/abs/2503.20309)

  * [CompCap: Improving Multimodal Large Language Models with Composite Captions](http://arxiv.org/abs/2412.05243)

  * [AVAM: A Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_AVAM_a_Universal_Training-free_Adaptive_Visual_Anchoring_Embedded_into_Multimodal_ICCV_2025_paper.pdf)

  * [How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_How_Do_Multimodal_Large_Language_Models_Handle_Complex_Multimodal_Reasoning_ICCV_2025_paper.pdf)

  * [LLaVA-KD: A Framework of Distilling Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Cai_LLaVA-KD_A_Framework_of_Distilling_Multimodal_Large_Language_Models_ICCV_2025_paper.pdf)

  * [LIRA: Reasoning Reconstruction via Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhou_LIRA_Reasoning_Reconstruction_via_Multimodal_Large_Language_Models_ICCV_2025_paper.pdf)
:star:[code](https://github.com/zhen6618/LIRA)

  * [MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Pipoli_MissRAG_Addressing_the_Missing_Modality_Challenge_in_Multimodal_Large_Language_ICCV_2025_paper.pdf)
:star:[code](https://github.com/aimagelab/MissRAG)

  * [Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models](http://arxiv.org/abs/2411.12790)
:star:[code](https://github.com/zeng-zhen/FGVEdit)

  * [Benchmarking Multimodal Large Language Models Against Image Corruptions](https://openaccess.thecvf.com/content/ICCV2025/papers/Qiu_Benchmarking_Multimodal_Large_Language_Models_Against_Image_Corruptions_ICCV_2025_paper.pdf)

  * [SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_SHIFT_Smoothing_Hallucinations_by_Information_Flow_Tuning_for_Multimodal_Large_ICCV_2025_paper.pdf)

  * [Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency](http://arxiv.org/abs/2501.04931)

  * [VisNumBench: Evaluating Number Sense of Multimodal Large Language Models](http://arxiv.org/abs/2503.14939)

  * [ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers](http://arxiv.org/abs/2504.00502)
:star:[code](https://github.com/icip-cas/ShortV)

  * [Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models](http://arxiv.org/abs/2412.05934)
:star:[code](https://github.com/MaTengSYSU/HIMRD-jailbreak)

  * [Learning to Inference Adaptively for Multimodal Large Language Models](http://arxiv.org/abs/2503.10905)

  * [FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers](http://arxiv.org/abs/2501.16297)

  * [R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_R1-VL_Learning_to_Reason_with_Multimodal_Large_Language_Models_via_ICCV_2025_paper.pdf)

  * [Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles](https://openaccess.thecvf.com/content/ICCV2025/papers/Slyman_Calibrating_MLLM-as-a-judge_via_Multimodal_Bayesian_Prompt_Ensembles_ICCV_2025_paper.pdf)

  * [Boosting MLLM Reasoning with Text-Debiased Hint-GRPO](http://arxiv.org/abs/2503.23905)
:star:[code](https://github.com/hqhQAQ/Hint-GRPO)

  * [Information Density Principle for MLLM Benchmarks](http://arxiv.org/abs/2503.10079)

  * [Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Auto-Controlled_Image_Perception_in_MLLMs_via_Visual_Perception_Tokens_ICCV_2025_paper.pdf)

  * [VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_VSP_Diagnosing_the_Dual_Challenges_of_Perception_and_Reasoning_in_ICCV_2025_paper.pdf)

  * [MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Daxberger_MM-Spatial_Exploring_3D_Spatial_Understanding_in_Multimodal_LLMs_ICCV_2025_paper.pdf)

  * [Spatial Preference Rewarding for MLLMs Spatial Understanding](https://openaccess.thecvf.com/content/ICCV2025/papers/Qiu_Spatial_Preference_Rewarding_for_MLLMs_Spatial_Understanding_ICCV_2025_paper.pdf)
:star:[code](https://github.com/hanqiu-hq/SPR)

  * [SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs](http://arxiv.org/abs/2506.05344)
:star:[code](https://github.com/CR400AF-A/SparseMM)

  * [OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM](http://arxiv.org/abs/2504.04801)
:house:[project](https://order-chain.github.io/)

  * [STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_STI-Bench_Are_MLLMs_Ready_for_Precise_Spatial-Temporal_World_Understanding_ICCV_2025_paper.pdf)

  * [ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_ChartPoint_Guiding_MLLMs_with_Grounding_Reflection_for_Chart_Reasoning_ICCV_2025_paper.pdf)

  * [Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning](http://arxiv.org/abs/2507.17539)
:star:[code](https://github.com/MeteorElf/FundusExpert)

  * [p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_p-MoD_Building_Mixture-of-Depths_MLLMs_via_Progressive_Ratio_Decay_ICCV_2025_paper.pdf)

  * [LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Lou_LLaVA-SP_Enhancing_Visual_Representation_with_Visual_Spatial_Tokens_for_MLLMs_ICCV_2025_paper.pdf)
:star:[code](https://github.com/CnFaker/LLaVA-SP)

  * [Enhancing Numerical Prediction of MLLMs with Soft Labeling](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Enhancing_Numerical_Prediction_of_MLLMs_with_Soft_Labeling_ICCV_2025_paper.pdf)

  * [Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Fang_Creation-MMBench_Assessing_Context-Aware_Creative_Intelligence_in_MLLMs_ICCV_2025_paper.pdf)

* Visual Grounding

  * [PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination](http://arxiv.org/abs/2509.04833)

  * [Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation](http://arxiv.org/abs/2507.04047)

  * [MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_MC-Bench_A_Benchmark_for_Multi-Context_Visual_Grounding_in_the_Era_ICCV_2025_paper.pdf)
:house:[project](https://xuyunqiu.github.io/MC-Bench)

  * [AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations](http://arxiv.org/abs/2504.07836)
:star:[code](https://github.com/Ideal-ljl/AerialVG)

  * [NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning](http://arxiv.org/abs/2502.00372)
:star:[code](https://github.com/ControlNet/NAVER)

  * [VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_VGMamba_Attribute-to-Location_Clue_Reasoning_for_Quantity-Agnostic_3D_Visual_Grounding_ICCV_2025_paper.pdf)

  * [Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding](https://openaccess.thecvf.com/content/ICCV2025/papers/Ouyang_Region-aware_Anchoring_Mechanism_for_Efficient_Referring_Visual_Grounding_ICCV_2025_paper.pdf)

* REC

  * [Referring Expression Comprehension for Small Objects](http://arxiv.org/abs/2510.03701)

  * [Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Leveraging_Debiased_Cross-modal_Attention_Maps_and_Code-based_Reasoning_for_Zero-shot_ICCV_2025_paper.pdf)



## 42.Vision Transformer

* [Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features](http://arxiv.org/pdf/2506.21046v1)
:star:[code](https://github.com/spencerwooo/dSVA)

* [Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy](https://arxiv.org/pdf/2507.13260v1)

* [EA-ViT: Efficient Adaptation for Elastic Vision Transformer](https://arxiv.org/pdf/2507.19360v1)
:star:[code](https://github.com/zcxcf/EA-ViT)

* [MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective](https://arxiv.org/pdf/2507.19131v1)

* [OminiControl: Minimal and Universal Control for Diffusion Transformer](http://arxiv.org/abs/2411.15098)

* [Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting](http://arxiv.org/abs/2412.03812)

* [SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers](http://arxiv.org/abs/2501.01529)

* [OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chu_OmniCache_A_Trajectory-Oriented_Global_Perspective_on_Training-Free_Cache_Reuse_for_ICCV_2025_paper.pdf)

* [Sparse Fine-Tuning of Transformers for Generative Tasks](http://arxiv.org/abs/2507.10855)

* [MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer](https://openaccess.thecvf.com/content/ICCV2025/papers/Huang_MaTe_Images_Are_All_You_Need_for_Material_Transfer_via_ICCV_2025_paper.pdf)

* [Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Hybrid_Layout_Control_for_Diffusion_Transformer_Fewer_Annotations_Superior_Aesthetics_ICCV_2025_paper.pdf)

* [UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer](http://arxiv.org/abs/2503.09277)

* [EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer](http://arxiv.org/abs/2503.07027)

* [Accelerating Diffusion Transformer via Gradient-Optimized Cache](http://arxiv.org/abs/2503.05156)
:star:[code](https://github.com/qiujx0520/GOC_ICCV2025.git)

* [LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity](http://arxiv.org/abs/2404.03214)

* [An Efficient Hybrid Vision Transformer for TinyML Applications](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_An_Efficient_Hybrid_Vision_Transformer_for_TinyML_Applications_ICCV_2025_paper.pdf)
:star:[code](https://github.com/yuffeenn/TinyNeXt)

* [MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge](https://openaccess.thecvf.com/content/ICCV2025/papers/Ahmed_MixA_A_Mixed_Attention_approach_with_Stable_Lightweight_Linear_Attention_ICCV_2025_paper.pdf)



## 41.Neural Architecture Search

* [Neural Architecture Search Driven by Locally Guided Diffusion for Personalized Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Liao_Neural_Architecture_Search_Driven_by_Locally_Guided_Diffusion_for_Personalized_ICCV_2025_paper.pdf)

* [Loss Functions for Predictor-based Neural Architecture Search](http://arxiv.org/abs/2506.05869)

* [TRNAS: A Training-Free Robust Neural Architecture Search](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_TRNAS_A_Training-Free_Robust_Neural_Architecture_Search_ICCV_2025_paper.pdf)



## 40.Deep Learning

* Capsule Networks

  * [EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks](http://arxiv.org/abs/2506.09895)
:star:[code](https://github.com/AberdeenML/EquiCaps)

* Residual Neural Networks

  * [ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers](http://arxiv.org/pdf/2506.21537v1)



## 39.Machine Learning

* Machine Unlearning

  * [MUNBa: Machine Unlearning via Nash Bargaining](http://arxiv.org/abs/2411.15537)

  * [Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels](http://arxiv.org/abs/2503.13917)

  * [Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning](http://arxiv.org/abs/2503.06339)

  * [Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy](http://arxiv.org/abs/2507.20573)

* Active Learning

  * [To Label or Not to Label: PALM -- A Predictive Model for Evaluating Sample Efficiency in Active Learning Models](https://arxiv.org/pdf/2507.15381v1)
:star:[code](https://github.com/juliamachnio/PALM)

  * [Consensus-Driven Active Model Selection](https://arxiv.org/pdf/2507.23771v1)
:star:[code](https://github.com/justinkay/coda)

* Contrastive Learning

  * [Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision](http://arxiv.org/pdf/2506.20850v1)

  * [Selective Contrastive Learning for Weakly Supervised Affordance Grounding](https://arxiv.org/pdf/2508.07877v1)

  * [Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Fix-CLIP_Dual-Branch_Hierarchical_Contrastive_Learning_via_Synthetic_Captions_for_Better_ICCV_2025_paper.pdf)
:star:[code](https://github.com/bcwang-sjtu/Fix-CLIP)

  * [Robust Dataset Condensation using Supervised Contrastive Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Robust_Dataset_Condensation_using_Supervised_Contrastive_Learning_ICCV_2025_paper.pdf)
:star:[code](https://github.com/DISL-Lab/RDC-ICCV2025)

  * [Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning](http://arxiv.org/abs/2507.12998)
:star:[code](https://github.com/MediaBrain-SJTU/DISSect)

  * [Backdooring Self-Supervised Contrastive Learning by Noisy Alignment](http://arxiv.org/abs/2508.14015)
:star:[code](https://github.com/jsrdcht/Noisy-Alignment)

  * [Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection](http://arxiv.org/abs/2412.04769)

  * [AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction](http://arxiv.org/abs/2507.01801)

* Reinforcement Learning

  * [RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment](http://arxiv.org/pdf/2506.21037v1)

  * [Reinforcement Learning-Guided Data Selection via Redundancy Assessment](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_Reinforcement_Learning-Guided_Data_Selection_via_Redundancy_Assessment_ICCV_2025_paper.pdf)

  * [RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction](https://arxiv.org/pdf/2507.04839v1)
:star:[code](https://github.com/fraunhoferhhi/RIPE)

  * [DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding](https://arxiv.org/pdf/2508.08589v1)
:star:[code](https://github.com/wenwenyu/DocThinker)

  * [DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning](http://arxiv.org/abs/2503.15265)

  * [ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning](http://arxiv.org/abs/2503.06101)

  * [Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning](http://arxiv.org/abs/2503.08751)

  * [One Encoder to Rule them All: Representation Learning for Model-free Visual Reinforcement Learning using Fourier Neural Operators](https://openaccess.thecvf.com/content/ICCV2025/papers/Dutta_One_Encoder_to_Rule_them_All_Representation_Learning_for_Model-free_ICCV_2025_paper.pdf)

  * [Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_Diffusion_Guided_Adaptive_Augmentation_for_Generalization_in_Visual_Reinforcement_Learning_ICCV_2025_paper.pdf)

  * [GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning](http://arxiv.org/abs/2508.11049)
:house:[project](https://colinyu1.github.io/genflowrl/)

* Continual Learning

  * [CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization](http://arxiv.org/pdf/2506.21117v1)
:house:[project](https://cl-splats.github.io)

  * [PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning](https://arxiv.org/pdf/2507.12305v1)
:star:[code](https://github.com/anwarmaxsum/PROL)

  * [Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning](https://arxiv.org/pdf/2507.09118v1)
:star:[code](https://github.com/linlany/MindtheGap)

  * [RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning](https://arxiv.org/pdf/2507.22553v1)

  * [Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models](https://arxiv.org/pdf/2508.00260v1)

  * [Divide-and-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-supervised Continual Learning](https://arxiv.org/pdf/2508.05316v1)
:star:[code](https://github.com/NJUyued/USP4SSCL)

  * [Any-SSR: How Recursive Least Squares Works in Continual Learning of Large Language Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Tong_Any-SSR_How_Recursive_Least_Squares_Works_in_Continual_Learning_of_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ZHUANGHP/Any-SSR)

  * [Joint Diffusion Models in Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Skiers_Joint_Diffusion_Models_in_Continual_Learning_ICCV_2025_paper.pdf)

  * [PLAN: Proactive Low-Rank Allocation for Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_PLAN_Proactive_Low-Rank_Allocation_for_Continual_Learning_ICCV_2025_paper.pdf)

  * [CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Apolinario_CODE-CL_Conceptor-Based_Gradient_Projection_for_Deep_Continual_Learning_ICCV_2025_paper.pdf)

  * [FedAGC: Federated Continual Learning with Asymmetric Gradient Correction](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_FedAGC_Federated_Continual_Learning_with_Asymmetric_Gradient_Correction_ICCV_2025_paper.pdf)

* Adversarial Learning

  * [TITAN: Query-Token based Domain Adaptive Adversarial Learning](http://arxiv.org/abs/2506.21484)
:star:[code](https://github.com/Tajamul21/TITAN)

  * [ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models](https://arxiv.org/pdf/2507.21985v1)

  * [Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_Pretend_Benign_A_Stealthy_Adversarial_Attack_by_Exploiting_Vulnerabilities_in_ICCV_2025_paper.pdf)

  * [KOEnsAttack: Towards Efficient Data-Free Black-Box Adversarial Attacks via Knowledge-Orthogonalized Substitute Ensembles](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_KOEnsAttack_Towards_Efficient_Data-Free_Black-Box_Adversarial_Attacks_via_Knowledge-Orthogonalized_Substitute_ICCV_2025_paper.pdf)

  * [SMP-Attack: Boosting the Transferability of Feature Importance-based Adversarial Attack with Semantics-aware Multi-granularity Patchout](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_SMP-Attack_Boosting_the_Transferability_of_Feature_Importance-based_Adversarial_Attack_with_ICCV_2025_paper.pdf)
:star:[code](https://github.com/AdvML-Group/SMP-Attack)

  * [DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion](https://arxiv.org/pdf/2507.22813v1)
:star:[code](https://github.com/AdaptiveMotorControlLab/DISTIL)

  * [Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights](https://arxiv.org/pdf/2508.00649v1)
:star:[code](https://github.com/Gandolfczjh/APDE)

  * [Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance](http://arxiv.org/abs/2508.15650)
:star:[code](https://github.com/AIASLab/CFG-ICCV2025)

  * [Boosting Adversarial Transferability via Residual Perturbation Attack](https://arxiv.org/pdf/2508.05689v1)
:star:[code](https://github.com/ZezeTao/ResPA)

  * [Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness](https://openaccess.thecvf.com/content/ICCV2025/papers/Dong_Confound_from_All_Sides_Distill_with_Resilience_Multi-Objective_Adversarial_Paths_ICCV_2025_paper.pdf)

  * [Adversarial Training for Probabilistic Robustness](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Adversarial_Training_for_Probabilistic_Robustness_ICCV_2025_paper.pdf)

  * [Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination](https://openaccess.thecvf.com/content/ICCV2025/papers/Pan_Mitigating_Catastrophic_Overfitting_in_Fast_Adversarial_Training_via_Label_Information_ICCV_2025_paper.pdf)
:star:[code](https://github.com/fzjcdt/LIET)

  * [Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment](http://arxiv.org/abs/2408.06079)
:star:[code](https://github.com/KejiaZhang-Robust/DHAT)

  * [Adversarial Exploitation of Data Diversity Improves Visual Localization](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Adversarial_Exploitation_of_Data_Diversity_Improves_Visual_Localization_ICCV_2025_paper.pdf)

  * [FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift](http://arxiv.org/abs/2507.04781)

  * [Adversarial Robust Memory-Based Continual Learner](http://arxiv.org/abs/2311.17608)

  * [ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers](https://openaccess.thecvf.com/content/ICCV2025/papers/Cao_ViT-EnsembleAttack_Augmenting_Ensemble_Models_for_Stronger_Adversarial_Transferability_in_Vision_ICCV_2025_paper.pdf)
:star:[code](https://github.com/Trustworthy-AI-Group/TransferAttack)

  * [CIARD: Cyclic Iterative Adversarial Robustness Distillation](http://arxiv.org/abs/2509.12633)
:star:[code](https://github.com/CIARD2025/CIARD)

  * [Failure Cases Are Better Learned But Boundary Says Sorry: Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training](http://arxiv.org/abs/2508.02186)
:star:[code](https://github.com/FlaAI/RPAT)

  * [Backdoor Mitigation by Distance-Driven Detoxification](http://arxiv.org/abs/2411.09585)

  * [Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack](http://arxiv.org/abs/2411.16167)

  * [Prototype Guided Backdoor Defense via Activation Space Manipulation](https://openaccess.thecvf.com/content/ICCV2025/papers/Amula_Prototype_Guided_Backdoor_Defense_via_Activation_Space_Manipulation_ICCV_2025_paper.pdf)

  * [Leveraging Spatial Invariance to Boost Adversarial Transferability](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhou_Leveraging_Spatial_Invariance_to_Boost_Adversarial_Transferability_ICCV_2025_paper.pdf)
:star:[code](https://github.com/TheMoss7/SID)

  * [SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_SPD_Shallow_Backdoor_Protecting_Deep_Backdoor_Against_Backdoor_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/YuanShunJie1/SPD)

  * [Backdoor Defense via Enhanced Splitting and Trap Isolation](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Backdoor_Defense_via_Enhanced_Splitting_and_Trap_Isolation_ICCV_2025_paper.pdf)

  * [Backdoor Attacks on Neural Networks via One-Bit Flip](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Backdoor_Attacks_on_Neural_Networks_via_One-Bit_Flip_ICCV_2025_paper.pdf)

  * [Seal Your Backdoor with Variational Defense](https://openaccess.thecvf.com/content/ICCV2025/papers/Sabolic_Seal_Your_Backdoor_with_Variational_Defense_ICCV_2025_paper.pdf)

  * [Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling](https://openaccess.thecvf.com/content/ICCV2025/papers/Niu_Enhancing_Adversarial_Transferability_by_Balancing_Exploration_and_Exploitation_with_Gradient-Guided_ICCV_2025_paper.pdf)
:star:[code](https://github.com/anuin-cat/GGS)

  * [Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient Competition and Spatial Distance Stretching](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Enhancing_Transferability_of_Targeted_Adversarial_Examples_via_Inverse_Target_Gradient_ICCV_2025_paper.pdf)

  * [Boosting Adversarial Transferability via Negative Hessian Trace Regularization](https://openaccess.thecvf.com/content/ICCV2025/papers/Long_Boosting_Adversarial_Transferability_via_Negative_Hessian_Trace_Regularization_ICCV_2025_paper.pdf)

  * [Unified Adversarial Augmentation for Improving Palmprint Recognition](https://openaccess.thecvf.com/content/ICCV2025/papers/Jin_Unified_Adversarial_Augmentation_for_Improving_Palmprint_Recognition_ICCV_2025_paper.pdf)

  * [DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models](http://arxiv.org/abs/2510.00778)

  * [Generative Adversarial Diffusion](https://openaccess.thecvf.com/content/ICCV2025/papers/Jun_Generative_Adversarial_Diffusion_ICCV_2025_paper.pdf)

  * [ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches](http://arxiv.org/abs/2311.12084)

  * [Scaling and Taming Adversarial Training with Synthetic Data](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Scaling_and_Taming_Adversarial_Training_with_Synthetic_Data_ICCV_2025_paper.pdf)

* Multimodal Learning

  * [G²D: Boosting Multimodal Learning with Gradient-Guided Distillation](http://arxiv.org/pdf/2506.21514v1)
:star:[code](https://github.com/rAIson-Lab/G2D)

  * [Improving Multimodal Learning via Imbalanced Learning](https://arxiv.org/pdf/2507.10203v1)
:star:[code](https://github.com/shicaiwei123/ICCV2025-ARL)

  * [SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality](https://arxiv.org/pdf/2507.19264v1)

  * [Unbiased Missing-modality Multimodal Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Dai_Unbiased_Missing-modality_Multimodal_Learning_ICCV_2025_paper.pdf)
:house:[project](https://crystal-punk.github.io/)

  * [Boosting Multimodal Learning via Disentangled Gradient Learning](https://arxiv.org/pdf/2507.10213v1)
:star:[code](https://github.com/shicaiwei123/ICCV2025-GDL)

  * [OpenVision: A Fully-Open Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning](http://arxiv.org/abs/2505.04601)

* Multi-Task Learning

  * [Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning](https://arxiv.org/pdf/2507.07485v1)

  * [Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective](http://arxiv.org/abs/2211.13723)

  * [Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning](https://arxiv.org/pdf/2507.21049v1)
:house:[project](https://jacky1128.github.io/RepMTL/)

  * [TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction](https://arxiv.org/pdf/2508.04682v1)

  * [ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology](http://arxiv.org/abs/2503.17564)

  * [Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning](http://arxiv.org/abs/2509.07879)
:star:[code](https://github.com/DanieldeAlcala/Membership-Inference-Test.git)

* Class-Incremental Learning

  * [Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning](https://arxiv.org/pdf/2507.09183v1)
:star:[code](https://github.com/Jywsuperman/LGSP)

  * [Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning](https://arxiv.org/pdf/2508.08165v1)
:star:[code](https://github.com/LAMDA-CL/ICCV2025-TUNA)

  * [Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning](http://arxiv.org/abs/2503.07979)

  * [Lark: Low-Rank Updates After Knowledge Localization for Few-shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Shi_Lark_Low-Rank_Updates_After_Knowledge_Localization_for_Few-shot_Class-Incremental_Learning_ICCV_2025_paper.pdf)

  * [A Tiny Change, A Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Lai_A_Tiny_Change_A_Giant_Leap_Long-Tailed_Class-Incremental_Learning_via_ICCV_2025_paper.pdf)
:star:[code](https://github.com/laixinyi023/Geometric-Prototype-Alignment)

  * [Task-Aware Prompt Gradient Projection for Parameter-Efficient Tuning Federated Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Ke_Task-Aware_Prompt_Gradient_Projection_for_Parameter-Efficient_Tuning_Federated_Class-Incremental_Learning_ICCV_2025_paper.pdf)

  * [External Knowledge Injection for CLIP-Based Class-Incremental Learning](http://arxiv.org/abs/2503.08510)
:star:[code](https://github.com/LAMDA-CL/ICCV25-ENGINE)

  * [ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning](http://arxiv.org/abs/2508.10896)

  * [Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_Flexi-FSCIL_Adaptive_Knowledge_Retention_for_Breaking_the_Stability-Plasticity_Dilemma_in_ICCV_2025_paper.pdf)

  * [Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification](http://arxiv.org/abs/2509.14958)

  * [Feature Decomposition-Recomposition in Large Vision-Language Model for Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xue_Feature_Decomposition-Recomposition_in_Large_Vision-Language_Model_for_Few-Shot_Class-Incremental_Learning_ICCV_2025_paper.pdf)

* Incremental Learning

  * [Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning](https://arxiv.org/pdf/2507.21588v1)
:star:[code](https://github.com/ENJOY-Yin-jiong/PHP)

* Federated Learning

  * [Federated Representation Angle Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Yi_Federated_Representation_Angle_Learning_ICCV_2025_paper.pdf)

  * [Client2Vec: Improving Federated Learning by Distribution Shifts Aware Client Indexing](http://arxiv.org/abs/2405.16233)
:star:[code](https://github.com/LINs-lab/client2vec)

  * [Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning](http://arxiv.org/abs/2411.14937)

  * [Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Gotz_Sibai_A_Few-Shot_Meta-Classifier_for_Poisoning_Detection_in_Federated_Learning_ICCV_2025_paper.pdf)

  * [You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data](http://arxiv.org/abs/2503.06916)

  * [Personalized Federated Learning under Local Supervision](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_Personalized_Federated_Learning_under_Local_Supervision_ICCV_2025_paper.pdf)
:star:[code](https://github.com/jqLi1626/FedSimSup)

  * [FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization](http://arxiv.org/abs/2506.23516)

  * [FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Hoefler_FedXDS_Leveraging_Model_Attribution_Methods_to_counteract_Data_Heterogeneity_in_ICCV_2025_paper.pdf)
:star:[code](https://github.com/MaxH1996/FedXDS)

  * [FLSeg: Enhancing Privacy and Robustness in Federated Learning under Heterogeneous Data via Model Segmentation](https://openaccess.thecvf.com/content/ICCV2025/papers/Su_FLSeg_Enhancing_Privacy_and_Robustness_in_Federated_Learning_under_Heterogeneous_ICCV_2025_paper.pdf)

  * [Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning](http://arxiv.org/abs/2507.00423)

  * [Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation](http://arxiv.org/abs/2410.06848)
:star:[code](https://github.com/zhentian777/FUCRT)

  * [Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning](http://arxiv.org/abs/2507.21494)
:star:[code](https://github.com/baowenxuan/Latte)

  * Federated Unlearning

    * [Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_Stealthy_Backdoor_Attack_in_Federated_Learning_via_Adaptive_Layer-wise_Gradient_ICCV_2025_paper.pdf)
:star:[code](https://github.com/yqqhyqq/LGA)

* Meta-Learning

  * [FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields](https://arxiv.org/pdf/2508.06301v1)

  * [Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts](http://arxiv.org/abs/2410.12777)

* Out-of-Distribution Detection

  * [Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention](https://arxiv.org/pdf/2507.01417v1)

  * [NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection](https://arxiv.org/pdf/2507.09795v1)
:star:[code](https://github.com/ah-ansari/NegRefine)

  * [FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Isaac-Medina_FEVER-OOD_Free_Energy_Vulnerability_Elimination_for_Robust_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)

  * [Beyond Pixel Uncertainty: Bounding the OoD Objects in Road Scenes](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_Beyond_Pixel_Uncertainty_Bounding_the_OoD_Objects_in_Road_Scenes_ICCV_2025_paper.pdf)
:star:[code](https://github.com/huachao0124/DetSeg-official)

  * [ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_ODP-Bench_Benchmarking_Out-of-Distribution_Performance_Prediction_ICCV_2025_paper.pdf)

  * [A Unified Interpretation of Training-Time Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Cheng_A_Unified_Interpretation_of_Training-Time_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)

  * [Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection](http://arxiv.org/abs/2507.10225)
:star:[code](https://github.com/Jarvisgivemeasuit/SynOOD)

  * [Activation Subspaces for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zongur_Activation_Subspaces_for_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)

  * [Diagnosing Pretrained Models for Out-of-distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Xiong_Diagnosing_Pretrained_Models_for_Out-of-distribution_Detection_ICCV_2025_paper.pdf)

  * [Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection](http://arxiv.org/abs/2510.10584)

  * [DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Caetano_DisCoPatch_Taming_Adversarially-driven_Batch_Statistics_for_Improved_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)

  * [Secure On-Device Video OOD Detection Without Backpropagation](http://arxiv.org/abs/2503.06166)
:star:[code](https://github.com/Dystopians/SecDOOD)

  * [FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection](http://arxiv.org/abs/2507.04511)
:star:[code](https://github.com/0xFAFA/FA)

  * [Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Adaptive_Prompt_Learning_via_Gaussian_Outlier_Synthesis_for_Out-of-distribution_Detection_ICCV_2025_paper.pdf)

  * [Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Miao_Auxiliary_Prompt_Tuning_of_Vision-Language_Models_for_Few-Shot_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)

* Anomaly Detection

  * [Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts](https://arxiv.org/pdf/2507.16946v1)
:house:[project](https://doi.org/10.5281/zenodo.16283852)

  * [DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_DecAD_Decoupling_Anomalies_in_Latent_Space_for_Multi-Class_Unsupervised_Anomaly_ICCV_2025_paper.pdf)

  * [Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning](http://arxiv.org/abs/2508.02293)

  * [Wave-MambaAD: Wavelet-driven State Space Model for Multi-class Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Wave-MambaAD_Wavelet-driven_State_Space_Model_for_Multi-class_Unsupervised_Anomaly_Detection_ICCV_2025_paper.pdf)

  * [Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Debiasing_Trace_Guidance_Top-down_Trace_Distillation_and_Bottom-up_Velocity_Alignment_ICCV_2025_paper.pdf)

  * [MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning](http://arxiv.org/abs/2504.06740)

  * [Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Triad_Empowering_LMM-based_Anomaly_Detection_with_Expert-guided_Region-of-Interest_Tokenizer_and_ICCV_2025_paper.pdf)

  * [SALAD -- Semantics-Aware Logical Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Fucka_SALAD_--_Semantics-Aware_Logical_Anomaly_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/MaticFuc/SALAD)

  * [SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark](http://arxiv.org/abs/2506.21549)

  * [Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection](http://arxiv.org/abs/2410.10289)

* Representation Learning

  * [Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework](https://openaccess.thecvf.com/content/ICCV2025/papers/Sharma_Multi-Modal_Multi-Task_Unified_Embedding_Model_M3T-UEM_A_Task-Adaptive_Representation_Learning_ICCV_2025_paper.pdf)

  * [LayerLock: Non-collapsing Representation Learning with Progressive Freezing](http://arxiv.org/abs/2509.10156)

  * [CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor](http://arxiv.org/abs/2506.04001)

  * [Pretrained Reversible Generation as Unsupervised Visual Representation Learning](http://arxiv.org/abs/2412.01787)
:house:[project](https://opendilab.github.io/PRG)

  * [Region-based Cluster Discrimination for Visual Representation Learning](https://arxiv.org/pdf/2507.20025v1)
:star:[code](https://github.com/deepglint/MVT)

  * [Gradient Extrapolation for Debiased Representation Learning](http://arxiv.org/abs/2503.13236)
:house:[project](https://gerne-debias.github.io/)

  * [Scaling Language-Free Visual Representation Learning](http://arxiv.org/abs/2504.01017)
:star:[code](https://github.com/facebookresearch/webssl)

  * [Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Q-Norm_Robust_Representation_Learning_via_Quality-Adaptive_Normalization_ICCV_2025_paper.pdf)
:star:[code](https://github.com/IIP-Lab-XDU/Q-Norm)

  * [Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Scaling_Omni-modal_Pretraining_with_Multimodal_Context_Advancing_Universal_Representation_Learning_ICCV_2025_paper.pdf)

* Prompt Learning

  * [Advancing Textual Prompt Learning with Anchored Attributes](http://arxiv.org/abs/2412.09442)
:star:[code](https://github.com/zhengli97/ATPrompt)



## 38.Few/Zero-Shot Learning/DG/Adaptation(Few-shot/Zero-shot/Domain Generalization/Adaptation)

* Zero-shot

  * [Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model](https://arxiv.org/pdf/2506.23822v1)
:star:[code](https://github.com/shiming-chen/LaZSL)

  * [OBSER: Object-Based Sub-Environment Recognition for Zero-Shot Environmental Inference](https://arxiv.org/pdf/2507.02929v1)

  * [Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Language-Driven_Multi-Label_Zero-Shot_Learning_with_Semantic_Granularity_ICCV_2025_paper.pdf)

  * [A Conditional Probability Framework for Compositional Zero-shot Learning](http://arxiv.org/abs/2507.17377)

  * [SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning](http://arxiv.org/abs/2503.10252)
:star:[code](https://github.com/uqzhichen/SVIP)

  * [Learning Visual Proxy for Compositional Zero-Shot Learning](http://arxiv.org/abs/2501.13859)

  * [Verbalized Representation Learning for Interpretable Few-Shot Generalization](http://arxiv.org/abs/2411.18651)

  * [Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Hierarchical_Variational_Test-Time_Prompt_Generation_for_Zero-Shot_Generalization_ICCV_2025_paper.pdf)

* Few-shot