https://github.com/52cv/iccv-2025-papers
- Host: GitHub
- URL: https://github.com/52cv/iccv-2025-papers
- Owner: 52CV
- Created: 2025-06-30T03:30:36.000Z
- Default Branch: main
- Last Pushed: 2025-11-07T03:14:29.000Z
- Last Synced: 2025-11-07T05:28:11.531Z
- Size: 196 KB
- Stars: 24
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ICCV-2025-Papers

## Conference dates: October 19 to 23, 2025
## Conference website: https://iccv.thecvf.com/
## 2025 survey papers are collected here ↘️[2025-CV-Surveys](https://github.com/52CV/CV-Surveys)
## Click here for the categorized 2025 paper collections
↘️[WACV-2025-Papers](https://github.com/52CV/WACV-2025-Papers)
↘️[CVPR-2025-Papers](https://github.com/52CV/CVPR-2025-Papers)
↘️[ICCV-2025-Papers](https://github.com/52CV/ICCV-2025-Papers)
## Click here for the categorized 2024 paper collections
↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)
↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)
↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)
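## [Click here for the categorized 2023 paper collections](#0000)
## [Click here for the categorized 2022 paper collections](#000)
## [Click here for the categorized 2021 paper collections](#00)
## [Click here for the categorized 2020 paper collections](#0)
## All papers have been categorized
### 🏆 Best Paper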
* [Generating Physically Stable and Buildable Brick Structures from Text](http://arxiv.org/abs/2505.05469)
:house:[project](https://avalovelace1.github.io/BrickGPT/)
* [ICCV 2025 Best Paper announced! Carnegie Mellon University proposes BrickGPT: generating physical brick structures from text that are guaranteed to build stably!](https://zhuanlan.zhihu.com/p/81635387724)
## Contents
|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|[1.Other](#1)|[2.Image/Video Processing](#2)|[3.Super-Resolution](#3)|[4.Image Captioning](#4)|
|[5.Image Generation](#5)|[6.Image Segmentation](#6)|[7.Image Classification](#7)|[8.Image/Video Retrieval](#8)|
|[9.Image/Video Compression](#9)|[10.Medical Image Processing](#10)|[11.Face](#11)|[12.Avatar](#12)|
|[13.Object Detection](#13)|[14.Object Tracking](#14)|[15.Pose](#15)|[16.Human Motion](#16)|
|[17.Action Recognition](#17)|[18.Re-ID (Person Re-Identification)](#18)|[19.Video](#19)|[20.OCR](#20)|
|[21.UAV/Remote Sensing/Satellite Image](#21)|[22.3D](#22)|[23.Point Cloud](#23)|[24.Autonomous Driving](#24)|
|[25.HOI (Human-Object Interaction)](#25)|[26.Robot](#26)|[27.Visual Question Answering](#27)|[28.Optical Flow Estimation](#28)|
|[29.Deepfake Detection/AI-Generated Image Detection](#29)|[30.Image Fusion](#30)|[31.Image Matching](#31)|[32.Image Registration](#32)|
|[33.Keypoint Detection](#33)|[34.Object Pose Estimation](#34)|[35.Style Transfer](#35)|[36.Scene Graph Generation](#36)|
|[37.MC/KD/Pruning (Model Compression/Knowledge Distillation/Pruning)](#37)|[38.F/ZSL/DG/A (Few-/Zero-Shot Learning, Domain Generalization/Adaptation)](#38)|[39.Machine Learning](#39)|[40.Deep Learning](#40)|
|[41.NAS (Neural Architecture Search)](#41)|[42.Vision Transformer](#42)|[43.Vision Language](#43)|[44.Neural Radiance Fields](#44)|
|[45.Dataset](#45)|[46.Sound](#46)|[47.Animation](#47)|[48.Industrial Anomaly Detection](#48)|
|[49.Biometric Recognition](#49)|[50.Copyright Protection](#50)|[51.Visual Relationship Detection (VRD)](#51)|[52.Gaze](#52)|
|[53.Dense Prediction](#53)|[54.Computational Imaging](#54)|||
## 54.Computational Imaging
* [IM360 Large-scale Indoor Mapping with 360 Cameras](http://arxiv.org/abs/2502.12545)
* [Multispectral Demosaicing via Dual Cameras](http://arxiv.org/abs/2503.22026)
* [Processing and acquisition traces in visual encoders What does CLIP know about your camera](https://openaccess.thecvf.com/content/ICCV2025/papers/Ramos_Processing_and_acquisition_traces_in_visual_encoders_What_does_CLIP_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ryan-caesar-ramos/visual-encoder-traces)
* [Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras](http://arxiv.org/pdf/2506.22069v1)
* [Estimating 2D Camera Motion with Hybrid Motion Basis](https://arxiv.org/pdf/2507.22480v1)
:house:[project](https://lhaippp.github.io/CamFlow/)
:star:[code](https://github.com/lhaippp/camflow)
* [Image as an IMU Estimating Camera Motion from a Single Motion-Blurred Image](http://arxiv.org/abs/2503.17358)
* [AlignDiff Learning Physically-Grounded Camera Alignment via Diffusion](http://arxiv.org/abs/2503.21581)
* [TrajectoryCrafter Redirecting Camera Trajectory for Monocular Videos via Diffusion Models](http://arxiv.org/abs/2503.05638)
* [Super Resolved Imaging with Adaptive Optics](https://arxiv.org/pdf/2508.04648v1)
:house:[project](https://www.cs.toronto.edu/~robin/aosr/)
* [HccePose(BF) Predicting Front Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation](http://arxiv.org/abs/2510.10177)
* [RePoseD Efficient Relative Pose Estimation With Known Depth Information](https://openaccess.thecvf.com/content/ICCV2025/papers/Ding_RePoseD_Efficient_Relative_Pose_Estimation_With_Known_Depth_Information_ICCV_2025_paper.pdf)
:star:[code](https://github.com/kocurvik/mdrp)
* [Scaling 3D Compositional Models for Robust Classification and Pose Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_Scaling_3D_Compositional_Models_for_Robust_Classification_and_Pose_Estimation_ICCV_2025_paper.pdf)
* [DRaM-LHM A Quaternion Framework for Iterative Camera Pose Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_DRaM-LHM_A_Quaternion_Framework_for_Iterative_Camera_Pose_Estimation_ICCV_2025_paper.pdf)
* [Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_Epipolar_Consistent_Attention_Aggregation_Network_for_Unsupervised_Light_Field_Disparity_ICCV_2025_paper.pdf)
* [TESPEC Temporally-Enhanced Self-Supervised Pretraining for Event Cameras](http://arxiv.org/abs/2508.00913)
:house:[project](https://mhdmohammadi.github.io/TESPEC_webpage)
* [Simultaneous Motion And Noise Estimation with Event Cameras](http://arxiv.org/abs/2504.04029)
:star:[code](https://github.com/tub-rip/ESMD)
* [EventUPS Uncalibrated Photometric Stereo Using an Event Camera](https://openaccess.thecvf.com/content/ICCV2025/papers/Liang_EventUPS_Uncalibrated_Photometric_Stereo_Using_an_Event_Camera_ICCV_2025_paper.pdf)
* [GenDoP Auto-regressive Camera Trajectory Generation as a Director of Photography](http://arxiv.org/abs/2504.07083)
:house:[project](https://kszpxxzmc.github.io/GenDoP)
* [Inverse Image-Based Rendering for Light Field Generation from Single Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Jung_Inverse_Image-Based_Rendering_for_Light_Field_Generation_from_Single_Images_ICCV_2025_paper.pdf)
* [Princeton365 A Diverse Dataset with Accurate Camera Pose](http://arxiv.org/abs/2506.09035)
* [CF3 Compact and Fast 3D Feature Fields](http://arxiv.org/abs/2508.05254)
* [CCMNet Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy](http://arxiv.org/abs/2504.07959)
## 53.Dense Prediction
* [Frequency-Dynamic Attention Modulation for Dense Prediction](https://arxiv.org/pdf/2507.12006v1)
:star:[code](https://github.com/Linwei-Chen/FDAM)
* [FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment](https://arxiv.org/pdf/2506.22509v1)
:star:[code](https://github.com/xuhang07/FreeDNA)
* [ATAS Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction](http://arxiv.org/abs/2506.08678)
* [Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction](http://arxiv.org/abs/2412.06244)
:star:[code](https://github.com/HVision-NKU/DenseVLM)
* [Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction](http://arxiv.org/abs/2508.20376)
## 52.Gaze
* [Multi-view Gaze Target Estimation](https://arxiv.org/pdf/2508.05857v1)
:house:[project](https://www3.cs.stonybrook.edu/~cvl/multiview_gte.html)
* [Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction](https://arxiv.org/pdf/2507.23021v1)
:house:[project](https://aimagelab.github.io/ScanDiff)
:star:[code](https://github.com/aimagelab/scandiff) (visual attention prediction)
* [Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths](https://openaccess.thecvf.com/content/ICCV2025/papers/Mondal_Gaze-Language_Alignment_for_Zero-Shot_Prediction_of_Visual_Search_Targets_from_ICCV_2025_paper.pdf)
* [What we need is explicit controllability Training 3D gaze estimator using only facial images](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_What_we_need_is_explicit_controllability_Training_3D_gaze_estimator_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ATinyBites/ControllableGaze)
## 51.Visual Relationship Detection (VRD)
* [ART: Adaptive Relation Tuning for Generalized Relation Prediction](https://arxiv.org/pdf/2507.23543v1)
## 50.Copyright Protection
* [TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity](https://arxiv.org/pdf/2506.23484v1)
* [Your Text Encoder Can Be An Object-Level Watermarking Controller](http://arxiv.org/abs/2503.11945)
* [SpecGuard Spectral Projection-based Advanced Invisible Watermarking](http://arxiv.org/abs/2510.07302)
:star:[code](https://github.com/inzamamulDU/SpecGuard_ICCV_2025)
* [Learning Robust Image Watermarking with Lossless Cover Recovery](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Learning_Robust_Image_Watermarking_with_Lossless_Cover_Recovery_ICCV_2025_paper.pdf)
:star:[code](https://github.com/chenoly/CRMark)
* [SynTag Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking](https://openaccess.thecvf.com/content/ICCV2025/papers/Fang_SynTag_Enhancing_the_Geometric_Robustness_of_Inversion-based_Generative_Image_Watermarking_ICCV_2025_paper.pdf)
* [PlugMark A Plug-in Zero-Watermarking Framework for Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_PlugMark_A_Plug-in_Zero-Watermarking_Framework_for_Diffusion_Models_ICCV_2025_paper.pdf)
* [ROAR Reducing Inversion Error in Generative Image Watermarking](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_ROAR_Reducing_Inversion_Error_in_Generative_Image_Watermarking_ICCV_2025_paper.pdf)
* [SEAL Semantic Aware Image Watermarking](http://arxiv.org/abs/2503.12172)
* [Semantic Watermarking Reinvented Enhancing Robustness and Generation Quality with Fourier Integrity](http://arxiv.org/abs/2509.07647)
:star:[code](https://github.com/thomas11809/SFWMark)
* [Invisible Watermarks Visible Gains Steering Machine Unlearning with Bi-Level Watermarking Design](http://arxiv.org/abs/2508.10065)
* [TrustMark Robust Watermarking and Watermark Removal for Arbitrary Resolution Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Bui_TrustMark_Robust_Watermarking_and_Watermark_Removal_for_Arbitrary_Resolution_Images_ICCV_2025_paper.pdf)
* [Attention to Neural Plagiarism Diffusion Models Can Plagiarize Your Copyrighted Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Zou_Attention_to_Neural_Plagiarism_Diffusion_Models_Can_Plagiarize_Your_Copyrighted_ICCV_2025_paper.pdf)
:star:[code](https://github.com/zzzucf/Neural-Plagiarism)
* [From Imitation to Innovation The Emergence of AIs Unique Artistic Styles and the Challenge of Copyright Protection](https://openaccess.thecvf.com/content/ICCV2025/papers/Jia_From_Imitation_to_Innovation_The_Emergence_of_AIs_Unique_Artistic_ICCV_2025_paper.pdf)
## 49.Biometric Recognition
* [DisenQ: Disentangling Q-Former for Activity-Biometrics](https://arxiv.org/pdf/2507.07262v1)
* [A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition](https://arxiv.org/pdf/2508.00053v1)
* Fingerprints
* [Training-Free Personalization via Retrieval and Reasoning on Fingerprints](http://arxiv.org/abs/2503.18623)
* [DiffIP Representation Fingerprints for Robust IP Protection of Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_DiffIP_Representation_Fingerprints_for_Robust_IP_Protection_of_Diffusion_Models_ICCV_2025_paper.pdf)
* [Riemannian-Geometric Fingerprints of Generative Models](http://arxiv.org/abs/2506.22802)
## 48.Industrial Anomaly Detection
* [RareCLIP Rarity-aware Online Zero-shot Industrial Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/He_RareCLIP_Rarity-aware_Online_Zero-shot_Industrial_Anomaly_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/hjf02/RareCLIP)
* [ReMP-AD Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Ma_ReMP-AD_Retrieval-enhanced_Multi-modal_Prompt_Fusion_for_Few-Shot_Industrial_Visual_Anomaly_ICCV_2025_paper.pdf)
:star:[code](https://github.com/cshcma/ReMP-AD.git)
* [G2SF Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Tao_G2SF_Geometry-Guided_Score_Fusion_for_Multimodal_Industrial_Anomaly_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ctaoaa/G2SF)
* [Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC Dataset Construction Methodology and Application](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Anomaly_Detection_of_Integrated_Circuits_Package_Substrates_Using_the_Large_ICCV_2025_paper.pdf)
:star:[code](https://github.com/Bingyang0410/CPS2D-AD)
* [SeaS Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning](http://arxiv.org/abs/2410.14987)
:star:[code](https://github.com/HUST-SLOW/SeaS)
* [Kaputt A Large-Scale Dataset for Visual Defect Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Hofer_Kaputt_A_Large-Scale_Dataset_for_Visual_Defect_Detection_ICCV_2025_paper.pdf)
* [Training-Free Industrial Defect Generation with Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_Training-Free_Industrial_Defect_Generation_with_Diffusion_Models_ICCV_2025_paper.pdf)
* [DADet Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_DADet_Safeguarding_Image_Conditional_Diffusion_Models_against_Adversarial_and_Backdoor_ICCV_2025_paper.pdf)
* [Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation](http://arxiv.org/abs/2505.24431)
## 47.Animation
* [LayerAnimate: Layer-level Control for Animation](http://arxiv.org/abs/2501.08295)
* [Occlusion-robust Stylization for Drawing-based 3D Animation](https://arxiv.org/pdf/2508.00398v1)
* [Multi-Object Sketch Animation by Scene Decomposition and Motion Planning](http://arxiv.org/abs/2503.19351)
* [Animate Anyone 2 High-Fidelity Character Image Animation with Environment Affordance](http://arxiv.org/abs/2502.06145)
* [LongAnimation Long Animation Generation with Dynamic Global-Local Memory](http://arxiv.org/abs/2507.01945)
* [V2M4 4D Mesh Animation Reconstruction from a Single Monocular Video](http://arxiv.org/abs/2503.09631)
* [OmniHuman-1 Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_OmniHuman-1_Rethinking_the_Scaling-Up_of_One-Stage_Conditioned_Human_Animation_Models_ICCV_2025_paper.pdf)
* [Multi-identity Human Image Animation with Structural Video Diffusion](http://arxiv.org/abs/2504.04126)
:star:[code](https://github.com/zhenzhiwang/Multi-HumanVid)
* [Perception-as-Control Fine-grained Controllable Image Animation with 3D-aware Motion Representation](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Perception-as-Control_Fine-grained_Controllable_Image_Animation_with_3D-aware_Motion_Representation_ICCV_2025_paper.pdf)
* [DreamActor-M1 Holistic Expressive and Robust Human Image Animation with Hybrid Guidance](https://openaccess.thecvf.com/content/ICCV2025/papers/Luo_DreamActor-M1_Holistic_Expressive_and_Robust_Human_Image_Animation_with_Hybrid_ICCV_2025_paper.pdf)
* [Ponimator Unfolding Interactive Pose for Versatile Human-human Interaction Animation](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_Ponimator_Unfolding_Interactive_Pose_for_Versatile_Human-human_Interaction_Animation_ICCV_2025_paper.pdf)
:house:[project](https://stevenlsw.github.io/ponimator)
## 46.Sound
* [Music Grounding by Short Video](http://arxiv.org/abs/2408.16990)
* [VGGSounder Audio-Visual Evaluations for Foundation Models](http://arxiv.org/abs/2508.08237)
* [AV-Flow Transforming Text to Audio-Visual Human-like Interactions](https://openaccess.thecvf.com/content/ICCV2025/papers/Chatziagapi_AV-Flow_Transforming_Text_to_Audio-Visual_Human-like_Interactions_ICCV_2025_paper.pdf)
* [MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing](https://arxiv.org/pdf/2507.01384v1)
:star:[code](https://github.com/WangLY136/MUG)
* [What's Making That Sound Right Now? Video-centric Audio-Visual Localization](https://arxiv.org/pdf/2507.04667v1)
:house:[project](https://hahyeon610.github.io/Video-centric_Audio_Visual_Localization/)
* [Implicit Counterfactual Learning for Audio-Visual Segmentation](https://arxiv.org/pdf/2507.20740v1)
* [Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation](https://arxiv.org/pdf/2507.22886v1)
:house:[project](https://henghuiding.com/OmniAVS/)
* [Zero-AVSR Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations](https://openaccess.thecvf.com/content/ICCV2025/papers/Yeo_Zero-AVSR_Zero-Shot_Audio-Visual_Speech_Recognition_with_LLMs_by_Learning_Language-Agnostic_ICCV_2025_paper.pdf)
* [Not Only Vision Evolve Visual Speech Recognition via Peripheral Information](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_Not_Only_Vision_Evolve_Visual_Speech_Recognition_via_Peripheral_Information_ICCV_2025_paper.pdf)
* [CogCM Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_CogCM_Cognition-Inspired_Contextual_Modeling_for_Audio-Visual_Speech_Enhancement_ICCV_2025_paper.pdf)
* [How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_How_Do_Optical_Flow_and_Textual_Prompts_Collaborate_to_Assist_ICCV_2025_paper.pdf)
* [TAViS Text-bridged Audio-Visual Segmentation with Foundation Models](http://arxiv.org/abs/2506.11436)
* [AV-Link Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation](https://openaccess.thecvf.com/content/ICCV2025/papers/Haji-Ali_AV-Link_Temporally-Aligned_Diffusion_Features_for_Cross-Modal_Audio-Video_Generation_ICCV_2025_paper.pdf)
* [AURELIA Test-time Reasoning Distillation in Audio-Visual LLMs](http://arxiv.org/abs/2503.23219)
* [p-AVAS Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis](https://openaccess.thecvf.com/content/ICCV2025/papers/Liang_p-AVAS_Can_Physics-Integrated_Audio-Visual_Modeling_Boost_Neural_Acoustic_Synthesis_ICCV_2025_paper.pdf)
* [TARO Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis](http://arxiv.org/abs/2504.05684)
* [VAFlow Video-to-Audio Generation with Cross-Modality Flow Matching](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_VAFlow_Video-to-Audio_Generation_with_Cross-Modality_Flow_Matching_ICCV_2025_paper.pdf)
* [Shot-by-Shot Film-Grammar-Aware Training-Free Audio Description Generation](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_Shot-by-Shot_Film-Grammar-Aware_Training-Free_Audio_Description_Generation_ICCV_2025_paper.pdf)
* [AVTrustBench Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs](http://arxiv.org/abs/2501.02135)
* Synthetic Speech Detection
* [Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization](https://openaccess.thecvf.com/content/ICCV2025/papers/Anshul_Intra-modal_and_Cross-modal_Synchronization_for_Audio-visual_Deepfake_Detection_and_Temporal_ICCV_2025_paper.pdf)
## 45.Dataset
* [Context-Aware Academic Emotion Dataset and Benchmark](https://arxiv.org/pdf/2507.00586v1)
:house:[project](https://zgsfer.github.io/CAER)
* [ROADWork A Dataset and Benchmark for Learning to Recognize Observe Analyze and Drive Through Work Zones](https://openaccess.thecvf.com/content/ICCV2025/papers/Ghosh_ROADWork_A_Dataset_and_Benchmark_for_Learning_to_Recognize_Observe_ICCV_2025_paper.pdf)
* [4D-Bench Benchmarking Multi-modal Large Language Models for 4D Object Understanding](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_4D-Bench_Benchmarking_Multi-modal_Large_Language_Models_for_4D_Object_Understanding_ICCV_2025_paper.pdf)
* [Bias in Gender Bias Benchmarks How Spurious Features Distort Evaluation](http://arxiv.org/abs/2509.07596)
* Benchmarks
* [IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark](https://arxiv.org/pdf/2507.14449v1)
* [Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding](https://arxiv.org/pdf/2507.15028v1)
:house:[project](https://zhangyuanhan-ai.github.io/video-tt/)
* [One Object Multiple Lies A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models](http://arxiv.org/abs/2507.07709)
* [Beyond the Destination A Novel Benchmark for Exploration-Aware Embodied Question Answering](http://arxiv.org/abs/2503.11117)
* [JailbreakDiffBench A Comprehensive Benchmark for Jailbreaking Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Jin_JailbreakDiffBench_A_Comprehensive_Benchmark_for_Jailbreaking_Diffusion_Models_ICCV_2025_paper.pdf)
* [MMReason An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI](http://arxiv.org/abs/2506.23563)
* [GRAB A Challenging GRaph Analysis Benchmark for Large Multimodal Models](http://arxiv.org/abs/2408.11817)
* [INS-MMBench A Comprehensive Benchmark for Evaluating LVLMs Performance in Insurance](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_INS-MMBench_A_Comprehensive_Benchmark_for_Evaluating_LVLMs_Performance_in_Insurance_ICCV_2025_paper.pdf)
:star:[code](https://github.com/FDU-INS/INS-MMBench)
* [MIEB Massive Image Embedding Benchmark](http://arxiv.org/abs/2504.10471)
:star:[code](https://github.com/embeddings-benchmark/mteb)
* [LVBench An Extreme Long Video Understanding Benchmark](http://arxiv.org/abs/2406.08035)
* [ProJudge A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges](http://arxiv.org/abs/2503.06553)
* [From Abyssal Darkness to Blinding Glare A Benchmark on Extreme Exposure Correction in Real World](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_From_Abyssal_Darkness_to_Blinding_Glare_A_Benchmark_on_Extreme_ICCV_2025_paper.pdf)
:star:[code](https://github.com/juvenoia/REED)
* [Beyond Walking A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search](http://arxiv.org/abs/2411.17776)
* [MultiVerse A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_MultiVerse_A_Multi-Turn_Conversation_Benchmark_for_Evaluating_Large_Vision_and_ICCV_2025_paper.pdf)
* [Extrapolated Urban View Synthesis Benchmark](http://arxiv.org/abs/2412.05256)
* [WorldScore A Unified Evaluation Benchmark for World Generation](http://arxiv.org/abs/2504.00983)
:house:[project](https://haoyi-duan.github.io/WorldScore)
* [ICE-Bench A Unified and Comprehensive Benchmark for Image Creating and Editing](https://openaccess.thecvf.com/content/ICCV2025/papers/Pan_ICE-Bench_A_Unified_and_Comprehensive_Benchmark_for_Image_Creating_and_ICCV_2025_paper.pdf)
* [MVGBench a Comprehensive Benchmark for Multi-view Generation Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_MVGBench_a_Comprehensive_Benchmark_for_Multi-view_Generation_Models_ICCV_2025_paper.pdf)
* Datasets
* [Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning](https://arxiv.org/pdf/2507.04790v1)
* [ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users](https://arxiv.org/pdf/2507.10223v1)
:star:[code](https://github.com/pittisl/ProGait)
:house:[project](https://huggingface.co/datasets/ericyxy98/ProGait)
* [DiffTell A High-Quality Dataset for Describing Image Manipulation Changes](https://openaccess.thecvf.com/content/ICCV2025/papers/Di_DiffTell_A_High-Quality_Dataset_for_Describing_Image_Manipulation_Changes_ICCV_2025_paper.pdf)
* [CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling](https://arxiv.org/pdf/2507.12591v1)
* [Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions](https://arxiv.org/pdf/2508.04681v1)
:house:[project](https://liangxuy.github.io/InterVLA/)
:star:[code](https://github.com/liangxuy/intervla)
* [HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis](https://arxiv.org/pdf/2508.09137v1)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/HumanOLAT/)
* [Dataset Ownership Verification for Pre-trained Masked Models](http://arxiv.org/abs/2507.12022)
:star:[code](https://github.com/xieyc99/DOV4MM)
* [Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset](http://arxiv.org/abs/2507.05728)
:star:[code](https://github.com/rfww/uevs)
* [BlueNeg A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_BlueNeg_A_35mm_Negative_Film_Dataset_for_Restoring_Channel-Heterogeneous_Deterioration_ICCV_2025_paper.pdf)
* [CMB-ML A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task](https://openaccess.thecvf.com/content/ICCV2025/papers/Amato_CMB-ML_A_Cosmic_Microwave_Background_Dataset_for_the_Oldest_Possible_ICCV_2025_paper.pdf)
:star:[code](https://github.com/CMB-ML/cmb-ml)
* [UAVScenes A Multi-Modal Dataset for UAVs](http://arxiv.org/abs/2507.22412)
:star:[code](https://github.com/sijieaaa/UAVScenes)
* [UDC-VIT A Real-World Video Dataset for Under-Display Cameras](https://openaccess.thecvf.com/content/ICCV2025/papers/Ahn_UDC-VIT_A_Real-World_Video_Dataset_for_Under-Display_Cameras_ICCV_2025_paper.pdf)
* [Towards Comprehensive Lecture Slides Understanding Large-scale Dataset and Effective Method](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Towards_Comprehensive_Lecture_Slides_Understanding_Large-scale_Dataset_and_Effective_Method_ICCV_2025_paper.pdf)
* [R-LiViT A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception](https://openaccess.thecvf.com/content/ICCV2025/papers/Mirlach_R-LiViT_A_LiDAR-Visual-Thermal_Dataset_Enabling_Vulnerable_Road_User_Focused_Roadside_ICCV_2025_paper.pdf)
* [MEH A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition](https://openaccess.thecvf.com/content/ICCV2025/papers/Golyadkin_MEH_A_Multi-Style_Dataset_and_Toolkit_for_Advancing_Egyptian_Hieroglyph_ICCV_2025_paper.pdf)
* [3DRealCar An In-the-wild RGB-D Car Dataset with 360-degree Views](https://openaccess.thecvf.com/content/ICCV2025/papers/Du_3DRealCar_An_In-the-wild_RGB-D_Car_Dataset_with_360-degree_Views_ICCV_2025_paper.pdf)
* [PBFG A New Physically-Based Dataset and Removal of Lens Flares and Glares](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_PBFG_A_New_Physically-Based_Dataset_and_Removal_of_Lens_Flares_ICCV_2025_paper.pdf)
* [Feature Coding in the Era of Large Models Dataset Test Conditions and Benchmark](http://arxiv.org/abs/2412.04307)
:star:[code](https://github.com/chansongoal/LaMoFC)
* [Modeling Saliency Dataset Bias](https://openaccess.thecvf.com/content/ICCV2025/papers/Kummerer_Modeling_Saliency_Dataset_Bias_ICCV_2025_paper.pdf)
* [TrackVerse A Large-Scale Object-Centric Video Dataset for Image-Level Representation Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wei_TrackVerse_A_Large-Scale_Object-Centric_Video_Dataset_for_Image-Level_Representation_Learning_ICCV_2025_paper.pdf)
* [OpenSubstance A High-quality Measured Dataset of Multi-View and -Lighting Images and Shapes](https://openaccess.thecvf.com/content/ICCV2025/papers/Pei_OpenSubstance_A_High-quality_Measured_Dataset_of_Multi-View_and_-Lighting_Images_ICCV_2025_paper.pdf)
:house:[project](https://opensubstance.github.io/)
* [MMAT-1M A Large Reasoning Dataset for Multimodal Agent Tuning](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_MMAT-1M_A_Large_Reasoning_Dataset_for_Multimodal_Agent_Tuning_ICCV_2025_paper.pdf)
:star:[code](https://github.com/VIS-MPU-Agent/MMAT-1M)
* [ImageGem In-the-wild Generative Image Interaction Dataset for Generative Model Personalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Guo_ImageGem_In-the-wild_Generative_Image_Interaction_Dataset_for_Generative_Model_Personalization_ICCV_2025_paper.pdf)
* [LANGTRAJ Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation](http://arxiv.org/abs/2504.11521)
:house:[project](https://langtraj.github.io/)
* [LightCity An Urban Dataset for Outdoor Inverse Rendering and Reconstruction under Multi-illumination Conditions](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_LightCity_An_Urban_Dataset_for_Outdoor_Inverse_Rendering_and_Reconstruction_ICCV_2025_paper.pdf)
* [CULTURE3D A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering](http://arxiv.org/abs/2501.06927)
* [A Real-world Display Inverse Rendering Dataset](http://arxiv.org/abs/2508.14411)
:house:[project](https://michaelcsj.github.io/DIR)
* Dataset Distillation
* [CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation](https://arxiv.org/pdf/2506.22637v1)
:star:[code](https://github.com/hatchetProject/CaO2)
* [Dataset Distillation via Vision-Language Category Prototype](https://arxiv.org/pdf/2506.23580v1)
:star:[code](https://github.com/zou-yawen/Dataset-Distillation-via-Vision-Language-Category-Prototype/)
* [Dataset Distillation as Data Compression: A Rate-Utility Perspective](https://arxiv.org/pdf/2507.17221v1)
* [Heavy Labels Out Dataset Distillation with Label Space Lightening](http://arxiv.org/abs/2408.08201)
:star:[code](https://github.com/Lexie-YU/HeLlO)
* [Dataset Distillation via the Wasserstein Metric](http://arxiv.org/abs/2311.18531)
:star:[code](https://github.com/Liu-Hy/WMDD) :house:[project](https://liu-hy.github.io/WMDD)
* [Diversity-Enhanced Distribution Alignment for Dataset Distillation](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Diversity-Enhanced_Distribution_Alignment_for_Dataset_Distillation_ICCV_2025_paper.pdf)
* [Improving Noise Efficiency in Privacy-preserving Dataset Distillation](http://arxiv.org/abs/2508.01749)
## 44.Neural Radiance Fields
* [UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields](http://arxiv.org/pdf/2506.21884v1)
:house:[project](https://www.factral.co/UnMix-NeRF)
* [LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling](https://arxiv.org/pdf/2507.02363v1)
:house:[project](https://wujh2001.github.io/LocalDyGS/)
* [DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF](https://arxiv.org/pdf/2507.14596v1)
* [A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields](https://arxiv.org/pdf/2507.04408v1)
* [NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement](https://arxiv.org/pdf/2507.12714v1)
:house:[project](https://neuraleaf-yang.github.io/)
* [MuGS Multi-Baseline Generalizable Gaussian Splatting Reconstruction](http://arxiv.org/abs/2508.04297)
:star:[code](https://github.com/EuclidLou/MuGS)
* [UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction](http://arxiv.org/abs/2510.01669)
* Rendering
* [BokehDiff: Neural Lens Blur with One-Step Diffusion](https://arxiv.org/pdf/2507.18060v1)
* [OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering](http://arxiv.org/abs/2503.16177)
:house:[project](https://occlugaussian.github.io)
* [ReCamMaster: Camera-Controlled Generative Rendering from A Single Video](http://arxiv.org/abs/2503.11647)
* [Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Tourani_Leveraging_2D_Priors_and_SDF_Guidance_for_Urban_Scene_Rendering_ICCV_2025_paper.pdf)
* [Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures](http://arxiv.org/abs/2503.16067)
* [UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Deng_UNIS_A_Unified_Framework_for_Achieving_Unbiased_Neural_Implicit_Surfaces_ICCV_2025_paper.pdf)
* [Stochastic Gradient Estimation for Higher-Order Differentiable Rendering](http://arxiv.org/abs/2412.03489)
* [Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity](http://arxiv.org/abs/2507.15775)
* [FonTS: Text Rendering With Typography and Style Controls](http://arxiv.org/abs/2412.00136)
* [Differentiable Room Acoustic Rendering with Multi-View Vision Priors](http://arxiv.org/abs/2504.21847)
* Inverse Rendering
* [Neural Multi-View Self-Calibrated Photometric Stereo without Photometric Stereo Cues](https://arxiv.org/pdf/2507.23162v1)
* [Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering](http://arxiv.org/abs/2508.14461)
* [Neural Inverse Rendering for High-Accuracy 3D Measurement of Moving Objects with Fewer Phase-Shifting Patterns](https://openaccess.thecvf.com/content/ICCV2025/papers/Urakawa_Neural_Inverse_Rendering_for_High-Accuracy_3D_Measurement_of_Moving_Objects_ICCV_2025_paper.pdf)
* [InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling](https://arxiv.org/pdf/2507.17613v1)
* [DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Zheng_DNF-Intrinsic_Deterministic_Noise-Free_Diffusion_for_Indoor_Inverse_Rendering_ICCV_2025_paper.pdf)
:star:[code](https://github.com/OnlyZZZZ/DNF-Intrinsic)
* NVS
* [FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation](http://arxiv.org/abs/2508.06392)
* [E-NeMF: Event-based Neural Motion Field for Novel Space-time View Synthesis of Dynamic Scenes](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_E-NeMF_Event-based_Neural_Motion_Field_for_Novel_Space-time_View_Synthesis_ICCV_2025_paper.pdf)
* [Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis](http://arxiv.org/abs/2411.00144)
:house:[project](https://sailor-z.github.io/projects)
* [RayZer: A Self-supervised Large View Synthesis Model](http://arxiv.org/abs/2505.00702)
* [BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis](http://arxiv.org/abs/2411.08508)
* [WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image](http://arxiv.org/abs/2506.23518)
* [UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images](http://arxiv.org/abs/2410.13195)
:star:[code](https://github.com/jwubz123/UNIG)
* [Scaling Transformer-Based Novel View Synthesis with Models Token Disentanglement and Synthetic Data](https://openaccess.thecvf.com/content/ICCV2025/papers/Nair_Scaling_Transformer-Based_Novel_View_Synthesis_with_Models_Token_Disentanglement_and_ICCV_2025_paper.pdf)
* [SEHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing](http://arxiv.org/abs/2509.20400)
* [RayGaussX: Accelerating Gaussian-Based Ray Marching for Real-Time and High-Quality Novel View Synthesis](http://arxiv.org/abs/2509.07782)
## 43.Vision Language(视觉语言)
* [Improving Large Vision and Language Models by Learning from a Panel of Peers](http://arxiv.org/abs/2509.01610)
* [DASH: Detection and Assessment of Systematic Hallucinations of VLMs](http://arxiv.org/abs/2503.23573)
* [Vision-Language Models Can't See the Obvious](https://openaccess.thecvf.com/content/ICCV2025/papers/Huynh_Vision-Language_Models_Cant_See_the_Obvious_ICCV_2025_paper.pdf)
* [Web Artifact Attacks Disrupt Vision Language Models](http://arxiv.org/abs/2503.13652)
:star:[code](https://github.com/mqraitem/Web-Artifact-Attacks)
* [ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models](https://arxiv.org/pdf/2507.00898v1)
:star:[code](https://github.com/zifuwan/ONLY)
:house:[project](https://zifuwan.github.io/ONLY/)
* [VLM4D: Towards Spatiotemporal Awareness in Vision Language Models](http://arxiv.org/abs/2508.02095)
* [WalkVLM: Aid Visually Impaired People Walking by Vision Language Model](http://arxiv.org/abs/2412.20903)
* [ViLU: Learning Vision-Language Uncertainties for Failure Prediction](https://arxiv.org/pdf/2507.07620v1)
:star:[code](https://github.com/ykrmm/ViLU)
* [PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection](https://arxiv.org/pdf/2507.08979v1)
:star:[code](https://github.com/MahdiyarMM/PRISM)
* [One Last Attention for Your Vision-Language Model](https://arxiv.org/pdf/2507.15480v1)
:star:[code](https://github.com/khufia/RAda/tree/main)
* [Hierarchical Cross-modal Prompt Learning for Vision-Language Models](https://arxiv.org/pdf/2507.14976v1)
:star:[code](https://github.com/zzeoZheng/HiCroPL)
* [METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models](https://arxiv.org/pdf/2507.20842v1)
:star:[code](https://github.com/YuchenLiu98/METEOR)
* [ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking](https://arxiv.org/pdf/2507.19875v1)
:star:[code](https://github.com/XiaokunFeng/ATCTrack)
* [AgroBench: Vision-Language Model Benchmark in Agriculture](https://arxiv.org/pdf/2507.20519v1)
:house:[project](https://dahlian00.github.io/AgroBenchPage/)
* [MM-IFEngine: Towards Multimodal Instruction Following](https://openaccess.thecvf.com/content/ICCV2025/papers/Ding_MM-IFEngine_Towards_Multimodal_Instruction_Following_ICCV_2025_paper.pdf)
* [Robustifying Zero-Shot Vision Language Models by Subspaces Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Dong_Robustifying_Zero-Shot_Vision_Language_Models_by_Subspaces_Alignment_ICCV_2025_paper.pdf)
* [FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_FDPT_Federated_Discrete_Prompt_Tuning_for_Black-Box_Visual-Language_Models_ICCV_2025_paper.pdf)
* [Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring](http://arxiv.org/abs/2403.09333)
:star:[code](https://github.com/jefferyZhan/Griffon)
* [CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting](https://openaccess.thecvf.com/content/ICCV2025/papers/Jiao_CLIP-GS_Unifying_Vision-Language_Representation_with_3D_Gaussian_Splatting_ICCV_2025_paper.pdf)
* [Growing a Twig to Accelerate Large Vision-Language Models](http://arxiv.org/abs/2503.14075)
* [Test-Time Retrieval-Augmented Adaptation for Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Fan_Test-Time_Retrieval-Augmented_Adaptation_for_Vision-Language_Models_ICCV_2025_paper.pdf)
:star:[code](https://github.com/xinqi-fan/TT-RAA)
* [Understanding Museum Exhibits using Vision-Language Reasoning](http://arxiv.org/abs/2412.01370)
* [One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models](http://arxiv.org/abs/2406.05491)
* [When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack](http://arxiv.org/abs/2503.06903)
* [Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus](https://openaccess.thecvf.com/content/ICCV2025/papers/Jang_Target_Bias_Is_All_You_Need_Zero-Shot_Debiasing_of_Vision-Language_ICCV_2025_paper.pdf)
* [TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models](http://arxiv.org/abs/2412.18675)
* [Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration](http://arxiv.org/abs/2412.13180)
* [Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology](http://arxiv.org/abs/2503.14911)
:star:[code](https://github.com/SiyuanYan1/Derm1M)
* [ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Qu_ReCoT_Reflective_Self-Correction_Training_for_Mitigating_Confirmation_Bias_in_Large_ICCV_2025_paper.pdf)
* [AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting](http://arxiv.org/abs/2502.04981)
* [D-Attn: Decomposed Attention for Large Vision-and-Language Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Kuo_D-Attn_Decomposed_Attention_for_Large_Vision-and-Language_Model_ICCV_2025_paper.pdf)
:star:[code](https://github.com/bytedance/DecomposedAttention)
* [Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate](https://openaccess.thecvf.com/content/ICCV2025/papers/Huang_Deciphering_Cross-Modal_Alignment_in_Large_Vision-Language_Models_via_Modality_Integration_ICCV_2025_paper.pdf)
:star:[code](https://github.com/shikiw/Modality-Integration-Rate)
* [Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Fuzzy_Contrastive_Decoding_to_Alleviate_Object_Hallucination_in_Large_Vision-Language_ICCV_2025_paper.pdf)
* [IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves](http://arxiv.org/abs/2411.00827)
* [2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_2.5_Years_in_Class_A_Multimodal_Textbook_for_Vision-Language_Pretraining_ICCV_2025_paper.pdf)
* [Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features](http://arxiv.org/abs/2412.00142)
* [FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models](http://arxiv.org/abs/2504.20860)
:star:[code](https://github.com/mainaksingha01/FedMVP)
* [Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models](http://arxiv.org/abs/2412.08619)
* [VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models](http://arxiv.org/abs/2503.07478)
:star:[code](https://github.com/JCruan519/VLRMBench)
* [ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity](https://openaccess.thecvf.com/content/ICCV2025/papers/He_ZipVL_Accelerating_Vision-Language_Models_through_Dynamic_Token_Sparsity_ICCV_2025_paper.pdf)
* [Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_Skip-Vision_Efficient_and_Scalable_Acceleration_of_Vision-Language_Models_via_Adaptive_ICCV_2025_paper.pdf)
* [SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders](http://arxiv.org/abs/2503.14530)
* [The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models](http://arxiv.org/abs/2407.15731)
:star:[code](https://github.com/mit-ll/IIMM)
* [MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling](http://arxiv.org/abs/2503.13440)
:star:[code](https://github.com/hustvl/MaTVLM)
* [Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks](http://arxiv.org/abs/2504.01308)
:star:[code](https://github.com/JarvisUSTC/DiffPure-RobustVLM)
* [Dynamic Multimodal Prototype Learning in Vision-Language Models](http://arxiv.org/abs/2507.03657)
* [GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks](https://openaccess.thecvf.com/content/ICCV2025/papers/Danish_GEOBench-VLM_Benchmarking_Vision-Language_Models_for_Geospatial_Tasks_ICCV_2025_paper.pdf)
* [Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models](http://arxiv.org/abs/2405.14715)
* [V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding](http://arxiv.org/abs/2412.09616)
* [DexVLG: Dexterous Vision-Language-Grasp Model at Scale](http://arxiv.org/abs/2507.02747)
* [Vision-Language Neural Graph Featurization for Extracting Retinal Lesions](https://openaccess.thecvf.com/content/ICCV2025/papers/Hassan_Vision-Language_Neural_Graph_Featurization_for_Extracting_Retinal_Lesions_ICCV_2025_paper.pdf)
* [MotionCtrl: A Real-time Controllable Vision-Language-Motion Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Cao_MotionCtrl_A_Real-time_Controllable_Vision-Language-Motion_Model_ICCV_2025_paper.pdf)
* [Breaking the Encoder Barrier for Seamless Video-Language Understanding](http://arxiv.org/abs/2503.18422)
* [OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining](http://arxiv.org/abs/2411.15421)
* [How Can Objects Help Video-Language Understanding?](http://arxiv.org/abs/2504.07454)
:star:[code](https://github.com/brown-palm/ObjectMLLM)
* [Factorized Learning for Temporally Grounded Video-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_Factorized_Learning_for_Temporally_Grounded_Video-Language_Models_ICCV_2025_paper.pdf)
:star:[code](https://github.com/nusnlp/d2vlm)
* [Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models](http://arxiv.org/abs/2508.01225)
* [AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?](http://arxiv.org/abs/2412.03002)
* [HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wei_HQ-CLIP_Leveraging_Large_Vision-Language_Models_to_Create_High-Quality_Image-Text_Datasets_ICCV_2025_paper.pdf)
* [Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation](http://arxiv.org/abs/2504.17207)
* [The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer](http://arxiv.org/abs/2504.10462)
:star:[code](https://github.com/bytedance/SAIL)
* [EVEv2: Improved Baselines for Encoder-Free Vision-Language Models](http://arxiv.org/abs/2502.06788)
:star:[code](https://github.com/baaivision/EVE)
* [TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention](https://openaccess.thecvf.com/content/ICCV2025/papers/Duan_TruthPrInt_Mitigating_Large_Vision-Language_Models_Object_Hallucination_Via_Latent_Truthful-Guided_ICCV_2025_paper.pdf)
* [Structured Policy Optimization Enhance Large Vision-Language Model via Self-referenced Dialogue](https://openaccess.thecvf.com/content/ICCV2025/papers/Sun_Structured_Policy_Optimization_Enhance_Large_Vision-Language_Model_via_Self-referenced_Dialogue_ICCV_2025_paper.pdf)
* [Causality-guided Prompt Learning for Vision-language Models via Visual Granulation](http://arxiv.org/abs/2509.03803)
:star:[code](https://github.com/GaoMY-521/CaPL_Code)
* [CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model](http://arxiv.org/abs/2503.06472)
* [Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?](http://arxiv.org/abs/2503.12496)
* [Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images](http://arxiv.org/abs/2508.15256)
* [Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models](http://arxiv.org/abs/2507.09209)
* [Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Kang_Dynamic_Multi-Layer_Null_Space_Projection_for_Vision-Language_Continual_Learning_ICCV_2025_paper.pdf)
* [Learning Beyond Still Frames: Scaling Vision-Language Models with Video](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Learning_Beyond_Still_Frames_Scaling_Vision-Language_Models_with_Video_ICCV_2025_paper.pdf)
* [GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_GLEAM_Enhanced_Transferable_Adversarial_Attacks_for_Vision-Language_Pre-training_Models_via_ICCV_2025_paper.pdf)
:star:[code](https://github.com/LuckAlex/GLEAM)
* [INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling](http://arxiv.org/abs/2507.05056)
:star:[code](https://github.com/xxxxx313/INTER)
* [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](http://arxiv.org/abs/2503.11576)
* VLN
* [Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities](https://arxiv.org/pdf/2507.13019v1)
:house:[project](https://crystalsixone.github.io/vln_pe.github.io/)
* [monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lu_monoVLN_Bridging_the_Observation_Gap_between_Monocular_and_Panoramic_Vision_ICCV_2025_paper.pdf)
* [NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_NavQ_Learning_a_Q-Model_for_Foresighted_Vision-and-Language_Navigation_ICCV_2025_paper.pdf)
* [COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation](http://arxiv.org/abs/2503.24065)
:star:[code](https://github.com/siqiZ805/VLN-COSMO.git)
* [NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments](https://arxiv.org/pdf/2506.23468v1)
:star:[code](https://github.com/Feliciaxyao/NavMorph)
* [3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_3D_Gaussian_Map_with_Open-Set_Semantic_Grouping_for_Vision-Language_Navigation_ICCV_2025_paper.pdf)
* LLM
* [LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching](https://arxiv.org/pdf/2506.23502v1)
* [Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching](http://arxiv.org/abs/2503.14953)
* [Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs](https://arxiv.org/pdf/2507.07990v1)
:house:[project](https://www.jshyun.me/projects/sttm)
* [Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context](https://openaccess.thecvf.com/content/ICCV2025/papers/Zheng_Why_LVLMs_Are_More_Prone_to_Hallucinations_in_Longer_Responses_ICCV_2025_paper.pdf)
* [Zeroth-Order Fine-Tuning of LLMs in Random Subspaces](http://arxiv.org/abs/2410.08989)
:star:[code](https://github.com/zimingyy/SubZero)
* [Advancing Visual Large Language Model for Multi-granular Versatile Perception](https://arxiv.org/pdf/2507.16213v1)
:star:[code](https://github.com/xiangwentao666/MVP-LM)
* [DisTime: Distribution-based Time Representation for Video Large Language Models](http://arxiv.org/abs/2505.24329)
:star:[code](https://github.com/josephzpng/DisTime)
* [Aligning Effective Tokens with Video Anomaly in Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Aligning_Effective_Tokens_with_Video_Anomaly_in_Large_Language_Models_ICCV_2025_paper.pdf)
* [MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh](http://arxiv.org/abs/2508.01242)
* [FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance](http://arxiv.org/abs/2501.02430)
:star:[code](https://github.com/anakin-skywalker-Joseph/Folder)
* [B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens](https://openaccess.thecvf.com/content/ICCV2025/papers/Lu_B-VLLM_A_Vision_Large_Language_Model_with_Balanced_Spatio-Temporal_Tokens_ICCV_2025_paper.pdf)
* [Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning](http://arxiv.org/abs/2410.00255)
* [GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices](http://arxiv.org/abs/2503.06019)
* [CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_CATP-LLM_Empowering_Large_Language_Models_for_Cost-Aware_Tool_Planning_ICCV_2025_paper.pdf)
:star:[code](https://github.com/duowuyms/OpenCATP-LLM)
* [Multimodal LLM Guided Exploration and Active Mapping using Fisher Information](http://arxiv.org/abs/2410.17422)
* [Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Sun_Multimodal_Large_Language_Model-Guided_ISP_Hyperparameter_Optimization_with_Dynamic_Preference_ICCV_2025_paper.pdf)
* [Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning](http://arxiv.org/abs/2503.12972)
:star:[code](https://github.com/Wings-Of-Disaster/VaLiK)
* MLLM
* [Token Activation Map to Visually Explain Multimodal LLMs](http://arxiv.org/abs/2506.23270)
:star:[code](https://github.com/xmed-lab/TAM)
* [DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs](https://arxiv.org/pdf/2507.10302v1)
:star:[code](https://github.com/ZJHTerry18/DisCo)
* [UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding](https://arxiv.org/pdf/2506.23219v1)
:star:[code](https://github.com/tsinghua-fib-lab/UrbanLLaVA)
* [Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description](http://arxiv.org/abs/2405.18937)
* [Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs](http://arxiv.org/abs/2501.04670)
* [Analyzing Finetuning Representation Shift for Multimodal LLMs Steering](http://arxiv.org/abs/2501.03012)
* [Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images](http://arxiv.org/abs/2504.08727)
* [Controlling Multimodal LLMs via Reward-guided Decoding](https://openaccess.thecvf.com/content/ICCV2025/papers/Manas_Controlling_Multimodal_LLMs_via_Reward-guided_Decoding_ICCV_2025_paper.pdf)
* [TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning](http://arxiv.org/abs/2410.10491)
* [FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging](https://arxiv.org/pdf/2508.04625v1)
* [Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation](https://arxiv.org/pdf/2507.02859v1)
* [BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models](https://arxiv.org/pdf/2508.06895v1)
* [Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning](https://arxiv.org/pdf/2507.07424v1)
:house:[project](https://mm-vl.github.io/corvid)
* [Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs](http://arxiv.org/abs/2503.20309)
* [CompCap: Improving Multimodal Large Language Models with Composite Captions](http://arxiv.org/abs/2412.05243)
* [AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_AVAM_a_Universal_Training-free_Adaptive_Visual_Anchoring_Embedded_into_Multimodal_ICCV_2025_paper.pdf)
* [How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_How_Do_Multimodal_Large_Language_Models_Handle_Complex_Multimodal_Reasoning_ICCV_2025_paper.pdf)
* [LLaVA-KD: A Framework of Distilling Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Cai_LLaVA-KD_A_Framework_of_Distilling_Multimodal_Large_Language_Models_ICCV_2025_paper.pdf)
* [LIRA: Reasoning Reconstruction via Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhou_LIRA_Reasoning_Reconstruction_via_Multimodal_Large_Language_Models_ICCV_2025_paper.pdf)
:star:[code](https://github.com/zhen6618/LIRA)
* [MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Pipoli_MissRAG_Addressing_the_Missing_Modality_Challenge_in_Multimodal_Large_Language_ICCV_2025_paper.pdf)
:star:[code](https://github.com/aimagelab/MissRAG)
* [Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models](http://arxiv.org/abs/2411.12790)
:star:[code](https://github.com/zeng-zhen/FGVEdit)
* [Benchmarking Multimodal Large Language Models Against Image Corruptions](https://openaccess.thecvf.com/content/ICCV2025/papers/Qiu_Benchmarking_Multimodal_Large_Language_Models_Against_Image_Corruptions_ICCV_2025_paper.pdf)
* [SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_SHIFT_Smoothing_Hallucinations_by_Information_Flow_Tuning_for_Multimodal_Large_ICCV_2025_paper.pdf)
* [Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency](http://arxiv.org/abs/2501.04931)
* [VisNumBench: Evaluating Number Sense of Multimodal Large Language Models](http://arxiv.org/abs/2503.14939)
* [ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers](http://arxiv.org/abs/2504.00502)
:star:[code](https://github.com/icip-cas/ShortV)
* [Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models](http://arxiv.org/abs/2412.05934)
:star:[code](https://github.com/MaTengSYSU/HIMRD-jailbreak)
* [Learning to Inference Adaptively for Multimodal Large Language Models](http://arxiv.org/abs/2503.10905)
* [FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers](http://arxiv.org/abs/2501.16297)
* [R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_R1-VL_Learning_to_Reason_with_Multimodal_Large_Language_Models_via_ICCV_2025_paper.pdf)
* [Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles](https://openaccess.thecvf.com/content/ICCV2025/papers/Slyman_Calibrating_MLLM-as-a-judge_via_Multimodal_Bayesian_Prompt_Ensembles_ICCV_2025_paper.pdf)
* [Boosting MLLM Reasoning with Text-Debiased Hint-GRPO](http://arxiv.org/abs/2503.23905)
:star:[code](https://github.com/hqhQAQ/Hint-GRPO)
* [Information Density Principle for MLLM Benchmarks](http://arxiv.org/abs/2503.10079)
* [Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Auto-Controlled_Image_Perception_in_MLLMs_via_Visual_Perception_Tokens_ICCV_2025_paper.pdf)
* [VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_VSP_Diagnosing_the_Dual_Challenges_of_Perception_and_Reasoning_in_ICCV_2025_paper.pdf)
* [MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Daxberger_MM-Spatial_Exploring_3D_Spatial_Understanding_in_Multimodal_LLMs_ICCV_2025_paper.pdf)
* [Spatial Preference Rewarding for MLLMs Spatial Understanding](https://openaccess.thecvf.com/content/ICCV2025/papers/Qiu_Spatial_Preference_Rewarding_for_MLLMs_Spatial_Understanding_ICCV_2025_paper.pdf)
:star:[code](https://github.com/hanqiu-hq/SPR)
* [SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs](http://arxiv.org/abs/2506.05344)
:star:[code](https://github.com/CR400AF-A/SparseMM)
* [OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM](http://arxiv.org/abs/2504.04801)
:house:[project](https://order-chain.github.io/)
* [STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_STI-Bench_Are_MLLMs_Ready_for_Precise_Spatial-Temporal_World_Understanding_ICCV_2025_paper.pdf)
* [ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_ChartPoint_Guiding_MLLMs_with_Grounding_Reflection_for_Chart_Reasoning_ICCV_2025_paper.pdf)
* [Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning](http://arxiv.org/abs/2507.17539)
:star:[code](https://github.com/MeteorElf/FundusExpert)
* [p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_p-MoD_Building_Mixture-of-Depths_MLLMs_via_Progressive_Ratio_Decay_ICCV_2025_paper.pdf)
* [LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Lou_LLaVA-SP_Enhancing_Visual_Representation_with_Visual_Spatial_Tokens_for_MLLMs_ICCV_2025_paper.pdf)
:star:[code](https://github.com/CnFaker/LLaVA-SP)
* [Enhancing Numerical Prediction of MLLMs with Soft Labeling](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Enhancing_Numerical_Prediction_of_MLLMs_with_Soft_Labeling_ICCV_2025_paper.pdf)
* [Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Fang_Creation-MMBench_Assessing_Context-Aware_Creative_Intelligence_in_MLLMs_ICCV_2025_paper.pdf)
* Visual Grounding
* [PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination](http://arxiv.org/abs/2509.04833)
* [Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation](http://arxiv.org/abs/2507.04047)
* [MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_MC-Bench_A_Benchmark_for_Multi-Context_Visual_Grounding_in_the_Era_ICCV_2025_paper.pdf)
:house:[project](https://xuyunqiu.github.io/MC-Bench)
* [AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations](http://arxiv.org/abs/2504.07836)
:star:[code](https://github.com/Ideal-ljl/AerialVG)
* [NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning](http://arxiv.org/abs/2502.00372)
:star:[code](https://github.com/ControlNet/NAVER)
* [VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_VGMamba_Attribute-to-Location_Clue_Reasoning_for_Quantity-Agnostic_3D_Visual_Grounding_ICCV_2025_paper.pdf)
* [Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding](https://openaccess.thecvf.com/content/ICCV2025/papers/Ouyang_Region-aware_Anchoring_Mechanism_for_Efficient_Referring_Visual_Grounding_ICCV_2025_paper.pdf)
* REC
* [Referring Expression Comprehension for Small Objects](http://arxiv.org/abs/2510.03701)
* [Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Leveraging_Debiased_Cross-modal_Attention_Maps_and_Code-based_Reasoning_for_Zero-shot_ICCV_2025_paper.pdf)
## 42.Vision Transformer
* [Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features](http://arxiv.org/pdf/2506.21046v1)
:star:[code](https://github.com/spencerwooo/dSVA)
* [Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy](https://arxiv.org/pdf/2507.13260v1)
* [EA-ViT: Efficient Adaptation for Elastic Vision Transformer](https://arxiv.org/pdf/2507.19360v1)
:star:[code](https://github.com/zcxcf/EA-ViT)
* [MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective](https://arxiv.org/pdf/2507.19131v1)
* [OminiControl: Minimal and Universal Control for Diffusion Transformer](http://arxiv.org/abs/2411.15098)
* [Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting](http://arxiv.org/abs/2412.03812)
* [SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers](http://arxiv.org/abs/2501.01529)
* [OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chu_OmniCache_A_Trajectory-Oriented_Global_Perspective_on_Training-Free_Cache_Reuse_for_ICCV_2025_paper.pdf)
* [Sparse Fine-Tuning of Transformers for Generative Tasks](http://arxiv.org/abs/2507.10855)
* [MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer](https://openaccess.thecvf.com/content/ICCV2025/papers/Huang_MaTe_Images_Are_All_You_Need_for_Material_Transfer_via_ICCV_2025_paper.pdf)
* [Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Hybrid_Layout_Control_for_Diffusion_Transformer_Fewer_Annotations_Superior_Aesthetics_ICCV_2025_paper.pdf)
* [UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer](http://arxiv.org/abs/2503.09277)
* [EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer](http://arxiv.org/abs/2503.07027)
* [Accelerating Diffusion Transformer via Gradient-Optimized Cache](http://arxiv.org/abs/2503.05156)
:star:[code](https://github.com/qiujx0520/GOC_ICCV2025.git)
* [LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity](http://arxiv.org/abs/2404.03214)
* [An Efficient Hybrid Vision Transformer for TinyML Applications](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_An_Efficient_Hybrid_Vision_Transformer_for_TinyML_Applications_ICCV_2025_paper.pdf)
:star:[code](https://github.com/yuffeenn/TinyNeXt)
* [MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge](https://openaccess.thecvf.com/content/ICCV2025/papers/Ahmed_MixA_A_Mixed_Attention_approach_with_Stable_Lightweight_Linear_Attention_ICCV_2025_paper.pdf)
## 41.Neural Architecture Search(神经架构搜索)
* [Neural Architecture Search Driven by Locally Guided Diffusion for Personalized Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Liao_Neural_Architecture_Search_Driven_by_Locally_Guided_Diffusion_for_Personalized_ICCV_2025_paper.pdf)
* [Loss Functions for Predictor-based Neural Architecture Search](http://arxiv.org/abs/2506.05869)
* [TRNAS: A Training-Free Robust Neural Architecture Search](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_TRNAS_A_Training-Free_Robust_Neural_Architecture_Search_ICCV_2025_paper.pdf)
## 40.Deep learning(深度学习)
* Capsule Networks(胶囊网络)
* [EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks](http://arxiv.org/abs/2506.09895)
:star:[code](https://github.com/AberdeenML/EquiCaps)
* RNN
* [ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers](http://arxiv.org/pdf/2506.21537v1)
## 39.Machine learning(机器学习)
* Machine Unlearning(机器遗忘)
* [MUNBa: Machine Unlearning via Nash Bargaining](http://arxiv.org/abs/2411.15537)
* [Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels](http://arxiv.org/abs/2503.13917)
* [Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning](http://arxiv.org/abs/2503.06339)
* [Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy](http://arxiv.org/abs/2507.20573)
* Active Learning(主动学习)
* [To Label or Not to Label: PALM -- A Predictive Model for Evaluating Sample Efficiency in Active Learning Models](https://arxiv.org/pdf/2507.15381v1)
:star:[code](https://github.com/juliamachnio/PALM)
* [Consensus-Driven Active Model Selection](https://arxiv.org/pdf/2507.23771v1)
:star:[code](https://github.com/justinkay/coda)
* Contrastive Learning(对比学习)
* [Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision](http://arxiv.org/pdf/2506.20850v1)
* [Selective Contrastive Learning for Weakly Supervised Affordance Grounding](https://arxiv.org/pdf/2508.07877v1)
* [Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Fix-CLIP_Dual-Branch_Hierarchical_Contrastive_Learning_via_Synthetic_Captions_for_Better_ICCV_2025_paper.pdf)
:star:[code](https://github.com/bcwang-sjtu/Fix-CLIP)
* [Robust Dataset Condensation using Supervised Contrastive Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Robust_Dataset_Condensation_using_Supervised_Contrastive_Learning_ICCV_2025_paper.pdf)
:star:[code](https://github.com/DISL-Lab/RDC-ICCV2025)
* [Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning](http://arxiv.org/abs/2507.12998)
:star:[code](https://github.com/MediaBrain-SJTU/DISSect)
* [Backdooring Self-Supervised Contrastive Learning by Noisy Alignment](http://arxiv.org/abs/2508.14015)
:star:[code](https://github.com/jsrdcht/Noisy-Alignment)
* [Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection](http://arxiv.org/abs/2412.04769)
* [AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction](http://arxiv.org/abs/2507.01801)
* Reinforcement Learning(强化学习)
* [RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment](http://arxiv.org/pdf/2506.21037v1)
* [Reinforcement Learning-Guided Data Selection via Redundancy Assessment](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_Reinforcement_Learning-Guided_Data_Selection_via_Redundancy_Assessment_ICCV_2025_paper.pdf)
* [RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction](https://arxiv.org/pdf/2507.04839v1)
:star:[code](https://github.com/fraunhoferhhi/RIPE)
* [DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding](https://arxiv.org/pdf/2508.08589v1)
:star:[code](https://github.com/wenwenyu/DocThinker)
* [DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning](http://arxiv.org/abs/2503.15265)
* [ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning](http://arxiv.org/abs/2503.06101)
* [Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning](http://arxiv.org/abs/2503.08751)
* [One Encoder to Rule them All: Representation Learning for Model-free Visual Reinforcement Learning using Fourier Neural Operators](https://openaccess.thecvf.com/content/ICCV2025/papers/Dutta_One_Encoder_to_Rule_them_All_Representation_Learning_for_Model-free_ICCV_2025_paper.pdf)
* [Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_Diffusion_Guided_Adaptive_Augmentation_for_Generalization_in_Visual_Reinforcement_Learning_ICCV_2025_paper.pdf)
* [GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning](http://arxiv.org/abs/2508.11049)
:house:[project](https://colinyu1.github.io/genflowrl/)
* Continual Learning(持续学习)
* [CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization](http://arxiv.org/pdf/2506.21117v1)
:house:[project](https://cl-splats.github.io)
* [PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning](https://arxiv.org/pdf/2507.12305v1)
:star:[code](https://github.com/anwarmaxsum/PROL)
* [Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning](https://arxiv.org/pdf/2507.09118v1)
:star:[code](https://github.com/linlany/MindtheGap)
* [RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning](https://arxiv.org/pdf/2507.22553v1)
* [Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models](https://arxiv.org/pdf/2508.00260v1)
* [Divide-and-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-supervised Continual Learning](https://arxiv.org/pdf/2508.05316v1)
:star:[code](https://github.com/NJUyued/USP4SSCL)
* [Any-SSR: How Recursive Least Squares Works in Continual Learning of Large Language Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Tong_Any-SSR_How_Recursive_Least_Squares_Works_in_Continual_Learning_of_ICCV_2025_paper.pdf)
:star:[code](https://github.com/ZHUANGHP/Any-SSR)
* [Joint Diffusion Models in Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Skiers_Joint_Diffusion_Models_in_Continual_Learning_ICCV_2025_paper.pdf)
* [PLAN: Proactive Low-Rank Allocation for Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_PLAN_Proactive_Low-Rank_Allocation_for_Continual_Learning_ICCV_2025_paper.pdf)
* [CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Apolinario_CODE-CL_Conceptor-Based_Gradient_Projection_for_Deep_Continual_Learning_ICCV_2025_paper.pdf)
* [FedAGC: Federated Continual Learning with Asymmetric Gradient Correction](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_FedAGC_Federated_Continual_Learning_with_Asymmetric_Gradient_Correction_ICCV_2025_paper.pdf)
* Adversarial Learning(对抗学习)
* [TITAN: Query-Token based Domain Adaptive Adversarial Learning](http://arxiv.org/abs/2506.21484)
:star:[code](https://github.com/Tajamul21/TITAN)
* [ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models](https://arxiv.org/pdf/2507.21985v1)
* [Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_Pretend_Benign_A_Stealthy_Adversarial_Attack_by_Exploiting_Vulnerabilities_in_ICCV_2025_paper.pdf)
* [KOEnsAttack: Towards Efficient Data-Free Black-Box Adversarial Attacks via Knowledge-Orthogonalized Substitute Ensembles](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_KOEnsAttack_Towards_Efficient_Data-Free_Black-Box_Adversarial_Attacks_via_Knowledge-Orthogonalized_Substitute_ICCV_2025_paper.pdf)
* [SMP-Attack: Boosting the Transferability of Feature Importance-based Adversarial Attack with Semantics-aware Multi-granularity Patchout](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_SMP-Attack_Boosting_the_Transferability_of_Feature_Importance-based_Adversarial_Attack_with_ICCV_2025_paper.pdf)
:star:[code](https://github.com/AdvML-Group/SMP-Attack)
* [DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion](https://arxiv.org/pdf/2507.22813v1)
:star:[code](https://github.com/AdaptiveMotorControlLab/DISTIL)
* [Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights](https://arxiv.org/pdf/2508.00649v1)
:star:[code](https://github.com/Gandolfczjh/APDE)
* [Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance](http://arxiv.org/abs/2508.15650)
:star:[code](https://github.com/AIASLab/CFG-ICCV2025)
* [Boosting Adversarial Transferability via Residual Perturbation Attack](https://arxiv.org/pdf/2508.05689v1)
:star:[code](https://github.com/ZezeTao/ResPA)
* [Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness](https://openaccess.thecvf.com/content/ICCV2025/papers/Dong_Confound_from_All_Sides_Distill_with_Resilience_Multi-Objective_Adversarial_Paths_ICCV_2025_paper.pdf)
* [Adversarial Training for Probabilistic Robustness](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Adversarial_Training_for_Probabilistic_Robustness_ICCV_2025_paper.pdf)
* [Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination](https://openaccess.thecvf.com/content/ICCV2025/papers/Pan_Mitigating_Catastrophic_Overfitting_in_Fast_Adversarial_Training_via_Label_Information_ICCV_2025_paper.pdf)
:star:[code](https://github.com/fzjcdt/LIET)
* [Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment](http://arxiv.org/abs/2408.06079)
:star:[code](https://github.com/KejiaZhang-Robust/DHAT)
* [Adversarial Exploitation of Data Diversity Improves Visual Localization](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Adversarial_Exploitation_of_Data_Diversity_Improves_Visual_Localization_ICCV_2025_paper.pdf)
* [FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift](http://arxiv.org/abs/2507.04781)
* [Adversarial Robust Memory-Based Continual Learner](http://arxiv.org/abs/2311.17608)
* [ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers](https://openaccess.thecvf.com/content/ICCV2025/papers/Cao_ViT-EnsembleAttack_Augmenting_Ensemble_Models_for_Stronger_Adversarial_Transferability_in_Vision_ICCV_2025_paper.pdf)
:star:[code](https://github.com/Trustworthy-AI-Group/TransferAttack)
* [CIARD: Cyclic Iterative Adversarial Robustness Distillation](http://arxiv.org/abs/2509.12633)
:star:[code](https://github.com/CIARD2025/CIARD)
* [Failure Cases Are Better Learned But Boundary Says Sorry: Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training](http://arxiv.org/abs/2508.02186)
:star:[code](https://github.com/FlaAI/RPAT)
* [Backdoor Mitigation by Distance-Driven Detoxification](http://arxiv.org/abs/2411.09585)
* [Mind the Cost of Scaffold Benign Clients May Even Become Accomplices of Backdoor Attack](http://arxiv.org/abs/2411.16167)
* [Prototype Guided Backdoor Defense via Activation Space Manipulation](https://openaccess.thecvf.com/content/ICCV2025/papers/Amula_Prototype_Guided_Backdoor_Defense_via_Activation_Space_Manipulation_ICCV_2025_paper.pdf)
* [Leveraging Spatial Invariance to Boost Adversarial Transferability](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhou_Leveraging_Spatial_Invariance_to_Boost_Adversarial_Transferability_ICCV_2025_paper.pdf)
:star:[code](https://github.com/TheMoss7/SID)
* [SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_SPD_Shallow_Backdoor_Protecting_Deep_Backdoor_Against_Backdoor_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/YuanShunJie1/SPD)
* [Backdoor Defense via Enhanced Splitting and Trap Isolation](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Backdoor_Defense_via_Enhanced_Splitting_and_Trap_Isolation_ICCV_2025_paper.pdf)
* [Backdoor Attacks on Neural Networks via One-Bit Flip](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Backdoor_Attacks_on_Neural_Networks_via_One-Bit_Flip_ICCV_2025_paper.pdf)
* [Seal Your Backdoor with Variational Defense](https://openaccess.thecvf.com/content/ICCV2025/papers/Sabolic_Seal_Your_Backdoor_with_Variational_Defense_ICCV_2025_paper.pdf)
* [Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling](https://openaccess.thecvf.com/content/ICCV2025/papers/Niu_Enhancing_Adversarial_Transferability_by_Balancing_Exploration_and_Exploitation_with_Gradient-Guided_ICCV_2025_paper.pdf)
:star:[code](https://github.com/anuin-cat/GGS)
* [Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient Competition and Spatial Distance Stretching](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Enhancing_Transferability_of_Targeted_Adversarial_Examples_via_Inverse_Target_Gradient_ICCV_2025_paper.pdf)
* [Boosting Adversarial Transferability via Negative Hessian Trace Regularization](https://openaccess.thecvf.com/content/ICCV2025/papers/Long_Boosting_Adversarial_Transferability_via_Negative_Hessian_Trace_Regularization_ICCV_2025_paper.pdf)
* [Unified Adversarial Augmentation for Improving Palmprint Recognition](https://openaccess.thecvf.com/content/ICCV2025/papers/Jin_Unified_Adversarial_Augmentation_for_Improving_Palmprint_Recognition_ICCV_2025_paper.pdf)
* [DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models](http://arxiv.org/abs/2510.00778)
* [Generative Adversarial Diffusion](https://openaccess.thecvf.com/content/ICCV2025/papers/Jun_Generative_Adversarial_Diffusion_ICCV_2025_paper.pdf)
* [ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches](http://arxiv.org/abs/2311.12084)
* [Scaling and Taming Adversarial Training with Synthetic Data](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Scaling_and_Taming_Adversarial_Training_with_Synthetic_Data_ICCV_2025_paper.pdf)
* Multimodal Learning(多模态学习)
* [G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation](http://arxiv.org/pdf/2506.21514v1)
:star:[code](https://github.com/rAIson-Lab/G2D)
* [Improving Multimodal Learning via Imbalanced Learning](https://arxiv.org/pdf/2507.10203v1)
:star:[code](https://github.com/shicaiwei123/ICCV2025-ARL)
* [SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality](https://arxiv.org/pdf/2507.19264v1)
* [Unbiased Missing-modality Multimodal Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Dai_Unbiased_Missing-modality_Multimodal_Learning_ICCV_2025_paper.pdf)
:house:[project](https://crystal-punk.github.io/)
* [Boosting Multimodal Learning via Disentangled Gradient Learning](https://arxiv.org/pdf/2507.10213v1)
:star:[code](https://github.com/shicaiwei123/ICCV2025-GDL)
* [OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning](http://arxiv.org/abs/2505.04601)
* Multi-Task Learning(多任务学习)
* [Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning](https://arxiv.org/pdf/2507.07485v1)
* [Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective](http://arxiv.org/abs/2211.13723)
* [Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning](https://arxiv.org/pdf/2507.21049v1)
:house:[project](https://jacky1128.github.io/RepMTL/)
* [TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction](https://arxiv.org/pdf/2508.04682v1)
* [ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology](http://arxiv.org/abs/2503.17564)
* [Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning](http://arxiv.org/abs/2509.07879)
:star:[code](https://github.com/DanieldeAlcala/Membership-Inference-Test.git)
* Class-Incremental Learning(类增量学习)
* [Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning](https://arxiv.org/pdf/2507.09183v1)
:star:[code](https://github.com/Jywsuperman/LGSP)
* [Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning](https://arxiv.org/pdf/2508.08165v1)
:star:[code](https://github.com/LAMDA-CL/ICCV2025-TUNA)
* [Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning](http://arxiv.org/abs/2503.07979)
* [Lark: Low-Rank Updates After Knowledge Localization for Few-shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Shi_Lark_Low-Rank_Updates_After_Knowledge_Localization_for_Few-shot_Class-Incremental_Learning_ICCV_2025_paper.pdf)
* [A Tiny Change, A Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Lai_A_Tiny_Change_A_Giant_Leap_Long-Tailed_Class-Incremental_Learning_via_ICCV_2025_paper.pdf)
:star:[code](https://github.com/laixinyi023/Geometric-Prototype-Alignment)
* [Task-Aware Prompt Gradient Projection for Parameter-Efficient Tuning Federated Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Ke_Task-Aware_Prompt_Gradient_Projection_for_Parameter-Efficient_Tuning_Federated_Class-Incremental_Learning_ICCV_2025_paper.pdf)
* [External Knowledge Injection for CLIP-Based Class-Incremental Learning](http://arxiv.org/abs/2503.08510)
:star:[code](https://github.com/LAMDA-CL/ICCV25-ENGINE)
* [ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning](http://arxiv.org/abs/2508.10896)
* [Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_Flexi-FSCIL_Adaptive_Knowledge_Retention_for_Breaking_the_Stability-Plasticity_Dilemma_in_ICCV_2025_paper.pdf)
* [Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification](http://arxiv.org/abs/2509.14958)
* [Feature Decomposition-Recomposition in Large Vision-Language Model for Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xue_Feature_Decomposition-Recomposition_in_Large_Vision-Language_Model_for_Few-Shot_Class-Incremental_Learning_ICCV_2025_paper.pdf)
* Incremental Learning(增量学习)
* [Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning](https://arxiv.org/pdf/2507.21588v1)
:star:[code](https://github.com/ENJOY-Yin-jiong/PHP)
* Federated Learning(联邦学习)
* [Federated Representation Angle Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Yi_Federated_Representation_Angle_Learning_ICCV_2025_paper.pdf)
* [Client2Vec: Improving Federated Learning by Distribution Shifts Aware Client Indexing](http://arxiv.org/abs/2405.16233)
:star:[code](https://github.com/LINs-lab/client2vec)
* [Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning](http://arxiv.org/abs/2411.14937)
* [Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Gotz_Sibai_A_Few-Shot_Meta-Classifier_for_Poisoning_Detection_in_Federated_Learning_ICCV_2025_paper.pdf)
* [You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data](http://arxiv.org/abs/2503.06916)
* [Personalized Federated Learning under Local Supervision](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_Personalized_Federated_Learning_under_Local_Supervision_ICCV_2025_paper.pdf)
:star:[code](https://github.com/jqLi1626/FedSimSup)
* [FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization](http://arxiv.org/abs/2506.23516)
* [FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Hoefler_FedXDS_Leveraging_Model_Attribution_Methods_to_counteract_Data_Heterogeneity_in_ICCV_2025_paper.pdf)
:star:[code](https://github.com/MaxH1996/FedXDS)
* [FLSeg: Enhancing Privacy and Robustness in Federated Learning under Heterogeneous Data via Model Segmentation](https://openaccess.thecvf.com/content/ICCV2025/papers/Su_FLSeg_Enhancing_Privacy_and_Robustness_in_Federated_Learning_under_Heterogeneous_ICCV_2025_paper.pdf)
* [Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning](http://arxiv.org/abs/2507.00423)
* [Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation](http://arxiv.org/abs/2410.06848)
:star:[code](https://github.com/zhentian777/FUCRT)
* [Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning](http://arxiv.org/abs/2507.21494)
:star:[code](https://github.com/baowenxuan/Latte)
* Federated Unlearning(联邦遗忘学习)
* [Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_Stealthy_Backdoor_Attack_in_Federated_Learning_via_Adaptive_Layer-wise_Gradient_ICCV_2025_paper.pdf)
:star:[code](https://github.com/yqqhyqq/LGA)
* Meta-Learning(元学习)
* [FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields](https://arxiv.org/pdf/2508.06301v1)
* [Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts](http://arxiv.org/abs/2410.12777)
* Out-of-Distribution Detection(分布外检测)
* [Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention](https://arxiv.org/pdf/2507.01417v1)
* [NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection](https://arxiv.org/pdf/2507.09795v1)
:star:[code](https://github.com/ah-ansari/NegRefine)
* [FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Isaac-Medina_FEVER-OOD_Free_Energy_Vulnerability_Elimination_for_Robust_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)
* [Beyond Pixel Uncertainty: Bounding the OoD Objects in Road Scenes](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_Beyond_Pixel_Uncertainty_Bounding_the_OoD_Objects_in_Road_Scenes_ICCV_2025_paper.pdf)
:star:[code](https://github.com/huachao0124/DetSeg-official)
* [ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_ODP-Bench_Benchmarking_Out-of-Distribution_Performance_Prediction_ICCV_2025_paper.pdf)
* [A Unified Interpretation of Training-Time Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Cheng_A_Unified_Interpretation_of_Training-Time_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)
* [Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection](http://arxiv.org/abs/2507.10225)
:star:[code](https://github.com/Jarvisgivemeasuit/SynOOD)
* [Activation Subspaces for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zongur_Activation_Subspaces_for_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)
* [Diagnosing Pretrained Models for Out-of-distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Xiong_Diagnosing_Pretrained_Models_for_Out-of-distribution_Detection_ICCV_2025_paper.pdf)
* [Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection](http://arxiv.org/abs/2510.10584)
* [DisCoPatch: Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Caetano_DisCoPatch_Taming_Adversarially-driven_Batch_Statistics_for_Improved_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)
* [Secure On-Device Video OOD Detection Without Backpropagation](http://arxiv.org/abs/2503.06166)
:star:[code](https://github.com/Dystopians/SecDOOD)
* [FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection](http://arxiv.org/abs/2507.04511)
:star:[code](https://github.com/0xFAFA/FA)
* [Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Adaptive_Prompt_Learning_via_Gaussian_Outlier_Synthesis_for_Out-of-distribution_Detection_ICCV_2025_paper.pdf)
* [Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Miao_Auxiliary_Prompt_Tuning_of_Vision-Language_Models_for_Few-Shot_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)
* Anomaly Detection(异常检测)
* [Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts](https://arxiv.org/pdf/2507.16946v1)
:house:[project](https://doi.org/10.5281/zenodo.16283852)
* [DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_DecAD_Decoupling_Anomalies_in_Latent_Space_for_Multi-Class_Unsupervised_Anomaly_ICCV_2025_paper.pdf)
* [Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning](http://arxiv.org/abs/2508.02293)
* [Wave-MambaAD: Wavelet-driven State Space Model for Multi-class Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Wave-MambaAD_Wavelet-driven_State_Space_Model_for_Multi-class_Unsupervised_Anomaly_Detection_ICCV_2025_paper.pdf)
* [Debiasing Trace Guidance: Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Debiasing_Trace_Guidance_Top-down_Trace_Distillation_and_Bottom-up_Velocity_Alignment_ICCV_2025_paper.pdf)
* [MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning](http://arxiv.org/abs/2504.06740)
* [Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Triad_Empowering_LMM-based_Anomaly_Detection_with_Expert-guided_Region-of-Interest_Tokenizer_and_ICCV_2025_paper.pdf)
* [SALAD -- Semantics-Aware Logical Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Fucka_SALAD_--_Semantics-Aware_Logical_Anomaly_Detection_ICCV_2025_paper.pdf)
:star:[code](https://github.com/MaticFuc/SALAD)
* [SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark](http://arxiv.org/abs/2506.21549)
* [Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection](http://arxiv.org/abs/2410.10289)
* Representation Learning(表征学习)
* [Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework](https://openaccess.thecvf.com/content/ICCV2025/papers/Sharma_Multi-Modal_Multi-Task_Unified_Embedding_Model_M3T-UEM_A_Task-Adaptive_Representation_Learning_ICCV_2025_paper.pdf)
* [LayerLock: Non-collapsing Representation Learning with Progressive Freezing](http://arxiv.org/abs/2509.10156)
* [CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor](http://arxiv.org/abs/2506.04001)
* [Pretrained Reversible Generation as Unsupervised Visual Representation Learning](http://arxiv.org/abs/2412.01787)
:house:[project](https://opendilab.github.io/PRG)
* [Region-based Cluster Discrimination for Visual Representation Learning](https://arxiv.org/pdf/2507.20025v1)
:star:[code](https://github.com/deepglint/MVT)
* [Gradient Extrapolation for Debiased Representation Learning](http://arxiv.org/abs/2503.13236)
:house:[project](https://gerne-debias.github.io/)
* [Scaling Language-Free Visual Representation Learning](http://arxiv.org/abs/2504.01017)
:star:[code](https://github.com/facebookresearch/webssl)
* [Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Q-Norm_Robust_Representation_Learning_via_Quality-Adaptive_Normalization_ICCV_2025_paper.pdf)
:star:[code](https://github.com/IIP-Lab-XDU/Q-Norm)
* [Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Scaling_Omni-modal_Pretraining_with_Multimodal_Context_Advancing_Universal_Representation_Learning_ICCV_2025_paper.pdf)
* Prompt Learning(提示学习)
* [Advancing Textual Prompt Learning with Anchored Attributes](http://arxiv.org/abs/2412.09442)
:star:[code](https://github.com/zhengli97/ATPrompt)
## 38.Few/Zero-Shot Learning/DG/Adaptation(小/零样本/域泛化/适应)
* Zero-Shot Learning(零样本)
* [Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model](https://arxiv.org/pdf/2506.23822v1)
:star:[code](https://github.com/shiming-chen/LaZSL)
* [OBSER: Object-Based Sub-Environment Recognition for Zero-Shot Environmental Inference](https://arxiv.org/pdf/2507.02929v1)
* [Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Language-Driven_Multi-Label_Zero-Shot_Learning_with_Semantic_Granularity_ICCV_2025_paper.pdf)
* [A Conditional Probability Framework for Compositional Zero-shot Learning](http://arxiv.org/abs/2507.17377)
* [SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning](http://arxiv.org/abs/2503.10252)
:star:[code](https://github.com/uqzhichen/SVIP)
* [Learning Visual Proxy for Compositional Zero-Shot Learning](http://arxiv.org/abs/2501.13859)
* [Verbalized Representation Learning for Interpretable Few-Shot Generalization](http://arxiv.org/abs/2411.18651)
* [Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Hierarchical_Variational_Test-Time_Prompt_Generation_for_Zero-Shot_Generalization_ICCV_2025_paper.pdf)
* Few-Shot Learning(小样本)