{"id":29627624,"url":"https://github.com/52cv/iccv-2025-papers","last_synced_at":"2026-02-15T05:32:24.230Z","repository":{"id":302006328,"uuid":"1010921391","full_name":"52CV/ICCV-2025-Papers","owner":"52CV","description":null,"archived":false,"fork":false,"pushed_at":"2025-11-07T03:14:29.000Z","size":201,"stargazers_count":24,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-07T05:28:11.531Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/52CV.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-30T03:30:36.000Z","updated_at":"2025-11-07T03:14:32.000Z","dependencies_parsed_at":"2025-06-30T04:36:02.058Z","dependency_job_id":"64df4b60-9bf1-4748-9077-138eb8b1f2e6","html_url":"https://github.com/52CV/ICCV-2025-Papers","commit_stats":null,"previous_names":["52cv/iccv-2025-papers"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/52CV/ICCV-2025-Papers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2025-Papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2025-Papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2025-Papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2025-Papers/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/52CV","download_url":"https://codeload.github.com/52CV/ICCV-2025-Papers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FICCV-2025-Papers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29470615,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-15T05:26:30.465Z","status":"ssl_error","status_checked_at":"2026-02-15T05:26:21.858Z","response_time":118,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-21T08:05:23.545Z","updated_at":"2026-02-15T05:32:24.211Z","avatar_url":"https://github.com/52CV.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# ICCV-2025-Papers\n![image](https://github.com/user-attachments/assets/0b93ce8a-4383-46ba-9672-6c746728c9f9)\n\n## Conference dates: October 19 to 23, 2025\n## Conference website: https://iccv.thecvf.com/\n\n\n\n## For 2025 survey papers, click here ↘️[2025-CV-Surveys](https://github.com/52CV/CV-Surveys)\n\n## 2025 papers by category, click here\n↘️[WACV-2025-Papers](https://github.com/52CV/WACV-2025-Papers)\n↘️[CVPR-2025-Papers](https://github.com/52CV/CVPR-2025-Papers)\n↘️[ICCV-2025-Papers](https://github.com/52CV/ICCV-2025-Papers)\n\n## 2024 
papers by category, click here\n↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)\n↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)\n↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)\n\n\n## [2023 papers by category, click here](#0000)\n## [2022 papers by category, click here](#000)\n## [2021 papers by category, click here](#00)\n## [2020 papers by category, click here](#0)\n\n\n## All papers have been categorized\n\n### 🏆Best Paper\n* [Generating Physically Stable and Buildable Brick Structures from Text](http://arxiv.org/abs/2505.05469)\u003cbr\u003e:house:[project](https://avalovelace1.github.io/BrickGPT)\n* [ICCV 2025 Best Paper announced! Carnegie Mellon University proposes BrickGPT: generating physical brick structures from text that are guaranteed to be stable!](https://zhuanlan.zhihu.com/p/81635387724)\n\n## Table of Contents\n\n|:cat:|:dog:|:tiger:|:wolf:|\n|------|------|------|------|\n|[1.Other](#1)|[2.Image/Video Processing(图像/视频处理)](#2)|[3.Super-Resolution(超分辨率)](#3)|[4.Image Captioning(图像字幕)](#4)|\n|[5.Image Generation(图像生成)](#5)|[6.Image Segmentation(图像分割)](#6)|[7.Image Classification(图像分类)](#7)|[8.Image/Video Retrieval(图像/视频检索)](#8)|\n|[9.Image/Video Compression(图像/视频压缩)](#9)|[10.Medical Image Processing(医学图像处理)](#10)|[11.Face](#11)|[12.Avatar](#12)|\n|[13.Object Detection(目标检测) ](#13)|[14.Object Track(目标跟踪)](#14)|[15.pose](#15)|[16.Human Motion](#16)|\n|[17.Action Recognition(动作识别)](#17)|[18.Re-Id(行人重识别)](#18)|[19.Video](#19)|[20.OCR](#20)|\n|[21.UAV/RS/Satellite Image(无人机/遥感/卫星图像)](#21)|[22.3D](#22)|[23.Point Cloud(点云)](#23)|[24.Autonomous Driving(自动驾驶)](#24)|\n|[25.HOI(人机交互)](#25)|[26.Robot](#26)|[27.Visual Question Answering(视觉问答)](#27)|[28.Optical Flow Estimation(光流估计)](#28)|\n|[29.Deepfake Detection/AI-Generated Image Detection](#29)|[30.Image Fusion(图像融合)](#30)|[31.Image Matching(图像匹配)](#31)|[32.Image Registration(图像配准)](#32)|\n|[33.Keypoint Detection(关键点检测)](#33)|[34.Object Pose Estimation(物体姿态估计)](#34)|[35.Style Transfer(风格迁移)](#35)|[36.Scene Graph Generation(场景图生成)](#36)|\n|[37.MC/KD/Pruning(模型压缩/知识蒸馏/剪枝)](#37)|[38.F/ZSL/DG/A(小/零样本/域泛化/适应)](#38)|[39.Machine learning(机器学习)](#39)|[40.Deep 
learning(深度学习)](#40)|\n|[41.NAS(神经架构搜索)](#41)|[42.Vision Transformer](#42)|[43.Vision Language(视觉语言)](#43)|[44.Neural Radiance Fields](#44)|\n|[45.Dataset](#45)|[46.Sound](#46)|[47.Animation(动画)](#47)|[48.Industrial Anomaly Detection(工业异常检测)](#48)|\n|[49.biometric recognition(生物特征识别)](#49)|[50.Protecting copyright(保护版权)](#50)|[51.Visual Relationship Detection,VRD(视觉关系检测)](#51)|[52.Gaze](#52)|\n|[53.Dense Prediction](#53)|[54.Computational Imaging](#54)|\n\n\n\n\n\n\u003ca name=\"54\"/\u003e\n\n## 54.Computational Imaging\n* [IM360 Large-scale Indoor Mapping with 360 Cameras](http://arxiv.org/abs/2502.12545)\n* [Multispectral Demosaicing via Dual Cameras](http://arxiv.org/abs/2503.22026)\n* [Processing and acquisition traces in visual encoders What does CLIP know about your camera](https://openaccess.thecvf.com/content/ICCV2025/papers/Ramos_Processing_and_acquisition_traces_in_visual_encoders_What_does_CLIP_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/ryan-caesar-ramos/visual-encoder-traces)\n* [Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras](http://arxiv.org/pdf/2506.22069v1)\n* [Estimating 2D Camera Motion with Hybrid Motion Basis](https://arxiv.org/pdf/2507.22480v1)\u003cbr\u003e:star:[code](https://lhaippp.github.io/CamFlow/)\u003cbr\u003e:star:[code](https://github.com/lhaippp/camflow)\n* [Image as an IMU Estimating Camera Motion from a Single Motion-Blurred Image](http://arxiv.org/abs/2503.17358)\n* [AlignDiff Learning Physically-Grounded Camera Alignment via Diffusion](http://arxiv.org/abs/2503.21581)\n* [TrajectoryCrafter Redirecting Camera Trajectory for Monocular Videos via Diffusion Models](http://arxiv.org/abs/2503.05638)\n* [Super Resolved Imaging with Adaptive Optics](https://arxiv.org/pdf/2508.04648v1)\u003cbr\u003e:house:[project](https://www.cs.toronto.edu/~robin/aosr/)\n* [HccePose(BF) Predicting Front \u0026 Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation](http://arxiv.org/abs/2510.10177)\n* [RePoseD Efficient 
Relative Pose Estimation With Known Depth Information](https://openaccess.thecvf.com/content/ICCV2025/papers/Ding_RePoseD_Efficient_Relative_Pose_Estimation_With_Known_Depth_Information_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/kocurvik/mdrp)\n* [Scaling 3D Compositional Models for Robust Classification and Pose Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_Scaling_3D_Compositional_Models_for_Robust_Classification_and_Pose_Estimation_ICCV_2025_paper.pdf)\n* [DRaM-LHM A Quaternion Framework for Iterative Camera Pose Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_DRaM-LHM_A_Quaternion_Framework_for_Iterative_Camera_Pose_Estimation_ICCV_2025_paper.pdf)\n* [Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_Epipolar_Consistent_Attention_Aggregation_Network_for_Unsupervised_Light_Field_Disparity_ICCV_2025_paper.pdf)\n* [TESPEC Temporally-Enhanced Self-Supervised Pretraining for Event Cameras](http://arxiv.org/abs/2508.00913)\u003cbr\u003e:house:[project](https://mhdmohammadi.github.io/TESPEC_webpage)\n* [Simultaneous Motion And Noise Estimation with Event Cameras](http://arxiv.org/abs/2504.04029)\u003cbr\u003e:star:[code](https://github.com/tub-rip/ESMD) :house:[project](https://github.com/tub-rip/ESMD)\n* [EventUPS Uncalibrated Photometric Stereo Using an Event Camera](https://openaccess.thecvf.com/content/ICCV2025/papers/Liang_EventUPS_Uncalibrated_Photometric_Stereo_Using_an_Event_Camera_ICCV_2025_paper.pdf)\n* [GenDoP Auto-regressive Camera Trajectory Generation as a Director of Photography](http://arxiv.org/abs/2504.07083)\u003cbr\u003e:house:[project](https://kszpxxzmc.github.io/GenDoP)\n* [Inverse Image-Based Rendering for Light Field Generation from Single 
Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Jung_Inverse_Image-Based_Rendering_for_Light_Field_Generation_from_Single_Images_ICCV_2025_paper.pdf)\n* [Princeton365 A Diverse Dataset with Accurate Camera Pose](http://arxiv.org/abs/2506.09035)\n* [CF3 Compact and Fast 3D Feature Fields](http://arxiv.org/abs/2508.05254)\n* [CCMNet Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy](http://arxiv.org/abs/2504.07959)\n\n\u003ca name=\"53\"/\u003e\n\n## 53.Dense Prediction\n* [Frequency-Dynamic Attention Modulation for Dense Prediction](https://arxiv.org/pdf/2507.12006v1)\u003cbr\u003e:star:[code](https://github.com/Linwei-Chen/FDAM)\n* [FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment](https://arxiv.org/pdf/2506.22509v1)\u003cbr\u003e:star:[code](https://github.com/xuhang07/FreeDNA)\n* [ATAS Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction](http://arxiv.org/abs/2506.08678)\n* [Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction](http://arxiv.org/abs/2412.06244)\u003cbr\u003e:star:[code](https://github.com/HVision-NKU/DenseVLM)\n* [Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction](http://arxiv.org/abs/2508.20376)\n\n\u003ca name=\"52\"/\u003e\n\n## 52.Gaze\n* [Multi-view Gaze Target Estimation](https://arxiv.org/pdf/2508.05857v1)\u003cbr\u003e:house:[project](https://www3.cs.stonybrook.edu/~cvl/multiview_gte.html)\n* [Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction](https://arxiv.org/pdf/2507.23021v1)\u003cbr\u003e:star:[code](https://aimagelab.github.io/ScanDiff)\u003cbr\u003e:star:[code](https://github.com/aimagelab/scandiff) Visual attention prediction\n* [Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze 
Scanpaths](https://openaccess.thecvf.com/content/ICCV2025/papers/Mondal_Gaze-Language_Alignment_for_Zero-Shot_Prediction_of_Visual_Search_Targets_from_ICCV_2025_paper.pdf)\n* [What we need is explicit controllability Training 3D gaze estimator using only facial images](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_What_we_need_is_explicit_controllability_Training_3D_gaze_estimator_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/ATinyBites/ControllableGaze)\n\n\u003ca name=\"51\"/\u003e\n\n## 51.Visual Relationship Detection,VRD(视觉关系检测)\n* [ART: Adaptive Relation Tuning for Generalized Relation Prediction](https://arxiv.org/pdf/2507.23543v1)\n\n\u003ca name=\"50\"/\u003e\n\n## 50.Protecting copyright(保护版权)\n* [TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity](https://arxiv.org/pdf/2506.23484v1)\n* [Your Text Encoder Can Be An Object-Level Watermarking Controller](http://arxiv.org/abs/2503.11945)\n* [SpecGuard Spectral Projection-based Advanced Invisible Watermarking](http://arxiv.org/abs/2510.07302)\u003cbr\u003e:star:[code](https://github.com/inzamamulDU/SpecGuard_ICCV_2025)\n* [Learning Robust Image Watermarking with Lossless Cover Recovery](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Learning_Robust_Image_Watermarking_with_Lossless_Cover_Recovery_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/chenoly/CRMark)\n* [SynTag Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking](https://openaccess.thecvf.com/content/ICCV2025/papers/Fang_SynTag_Enhancing_the_Geometric_Robustness_of_Inversion-based_Generative_Image_Watermarking_ICCV_2025_paper.pdf)\n* [PlugMark A Plug-in Zero-Watermarking Framework for Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_PlugMark_A_Plug-in_Zero-Watermarking_Framework_for_Diffusion_Models_ICCV_2025_paper.pdf)\n* [ROAR Reducing Inversion Error in Generative Image 
Watermarking](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_ROAR_Reducing_Inversion_Error_in_Generative_Image_Watermarking_ICCV_2025_paper.pdf)\n* [SEAL Semantic Aware Image Watermarking](http://arxiv.org/abs/2503.12172)\n* [Semantic Watermarking Reinvented Enhancing Robustness and Generation Quality with Fourier Integrity](http://arxiv.org/abs/2509.07647)\u003cbr\u003e:star:[code](https://github.com/thomas11809/SFWMark)\n* [Invisible Watermarks Visible Gains Steering Machine Unlearning with Bi-Level Watermarking Design](http://arxiv.org/abs/2508.10065)\n* [TrustMark Robust Watermarking and Watermark Removal for Arbitrary Resolution Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Bui_TrustMark_Robust_Watermarking_and_Watermark_Removal_for_Arbitrary_Resolution_Images_ICCV_2025_paper.pdf)\n* [Attention to Neural Plagiarism Diffusion Models Can Plagiarize Your Copyrighted Images](https://openaccess.thecvf.com/content/ICCV2025/papers/Zou_Attention_to_Neural_Plagiarism_Diffusion_Models_Can_Plagiarize_Your_Copyrighted_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/zzzucf/Neural-Plagiarism)\n* [From Imitation to Innovation The Emergence of AIs Unique Artistic Styles and the Challenge of Copyright Protection](https://openaccess.thecvf.com/content/ICCV2025/papers/Jia_From_Imitation_to_Innovation_The_Emergence_of_AIs_Unique_Artistic_ICCV_2025_paper.pdf)\n\n\n\n\u003ca name=\"49\"/\u003e\n\n\n## 49.biometric recognition(生物特征识别)\n* [DisenQ: Disentangling Q-Former for Activity-Biometrics](https://arxiv.org/pdf/2507.07262v1)\n* [A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition](https://arxiv.org/pdf/2508.00053v1)\n* Fingerprints\n  * [Training-Free Personalization via Retrieval and Reasoning on Fingerprints](http://arxiv.org/abs/2503.18623)\n  * [DiffIP Representation Fingerprints for Robust IP Protection of Diffusion 
Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_DiffIP_Representation_Fingerprints_for_Robust_IP_Protection_of_Diffusion_Models_ICCV_2025_paper.pdf)\n  * [Riemannian-Geometric Fingerprints of Generative Models](http://arxiv.org/abs/2506.22802)\n\n\u003ca name=\"48\"/\u003e\n\n## 48.Industrial Anomaly Detection(工业异常检测)\n* [RareCLIP Rarity-aware Online Zero-shot Industrial Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/He_RareCLIP_Rarity-aware_Online_Zero-shot_Industrial_Anomaly_Detection_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/hjf02/RareCLIP)\n* [ReMP-AD Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Ma_ReMP-AD_Retrieval-enhanced_Multi-modal_Prompt_Fusion_for_Few-Shot_Industrial_Visual_Anomaly_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/cshcma/ReMP-AD.git)\n* [G2SF Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Tao_G2SF_Geometry-Guided_Score_Fusion_for_Multimodal_Industrial_Anomaly_Detection_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/ctaoaa/G2SF)\n* [Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC Dataset Construction Methodology and Application](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Anomaly_Detection_of_Integrated_Circuits_Package_Substrates_Using_the_Large_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/Bingyang0410/CPS2D-AD)\n* [SeaS Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning](http://arxiv.org/abs/2410.14987)\u003cbr\u003e:star:[code](https://github.com/HUST-SLOW/SeaS)\n* [Kaputt A Large-Scale Dataset for Visual Defect 
Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Hofer_Kaputt_A_Large-Scale_Dataset_for_Visual_Defect_Detection_ICCV_2025_paper.pdf)\n* [Training-Free Industrial Defect Generation with Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_Training-Free_Industrial_Defect_Generation_with_Diffusion_Models_ICCV_2025_paper.pdf)\n* [DADet Safeguarding Image Conditional Diffusion Models against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_DADet_Safeguarding_Image_Conditional_Diffusion_Models_against_Adversarial_and_Backdoor_ICCV_2025_paper.pdf)\n* [Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation](http://arxiv.org/abs/2505.24431)\n\n\u003ca name=\"47\"/\u003e\n\n## 47.Animation(动画)\n* [LayerAnimate: Layer-level Control for Animation](http://arxiv.org/abs/2501.08295)\n* [Occlusion-robust Stylization for Drawing-based 3D Animation](https://arxiv.org/pdf/2508.00398v1)\n* [Multi-Object Sketch Animation by Scene Decomposition and Motion Planning](http://arxiv.org/abs/2503.19351)\n* [Animate Anyone 2 High-Fidelity Character Image Animation with Environment Affordance](http://arxiv.org/abs/2502.06145)\n* [LongAnimation Long Animation Generation with Dynamic Global-Local Memory](http://arxiv.org/abs/2507.01945)\n* [V2M4 4D Mesh Animation Reconstruction from a Single Monocular Video](http://arxiv.org/abs/2503.09631)\n* [OmniHuman-1 Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_OmniHuman-1_Rethinking_the_Scaling-Up_of_One-Stage_Conditioned_Human_Animation_Models_ICCV_2025_paper.pdf)\n* [Multi-identity Human Image Animation with Structural Video Diffusion](http://arxiv.org/abs/2504.04126)\u003cbr\u003e:star:[code](https://github.com/zhenzhiwang/Multi-HumanVid)\n* [Perception-as-Control Fine-grained Controllable Image Animation with 
3D-aware Motion Representation](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Perception-as-Control_Fine-grained_Controllable_Image_Animation_with_3D-aware_Motion_Representation_ICCV_2025_paper.pdf)\n* [DreamActor-M1 Holistic Expressive and Robust Human Image Animation with Hybrid Guidance](https://openaccess.thecvf.com/content/ICCV2025/papers/Luo_DreamActor-M1_Holistic_Expressive_and_Robust_Human_Image_Animation_with_Hybrid_ICCV_2025_paper.pdf)\n* [Ponimator Unfolding Interactive Pose for Versatile Human-human Interaction Animation](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_Ponimator_Unfolding_Interactive_Pose_for_Versatile_Human-human_Interaction_Animation_ICCV_2025_paper.pdf)\u003cbr\u003e:house:[project](https://stevenlsw.github.io/ponimator)\n\n\u003ca name=\"46\"/\u003e\n\n## 46.Sound\n* [Music Grounding by Short Video](http://arxiv.org/abs/2408.16990)\n* [VGGSounder Audio-Visual Evaluations for Foundation Models](http://arxiv.org/abs/2508.08237)\n* [AV-Flow Transforming Text to Audio-Visual Human-like Interactions](https://openaccess.thecvf.com/content/ICCV2025/papers/Chatziagapi_AV-Flow_Transforming_Text_to_Audio-Visual_Human-like_Interactions_ICCV_2025_paper.pdf)\n* [MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing](https://arxiv.org/pdf/2507.01384v1)\u003cbr\u003e:star:[code](https://github.com/WangLY136/MUG)\n* [What's Making That Sound Right Now? 
Video-centric Audio-Visual Localization](https://arxiv.org/pdf/2507.04667v1)\u003cbr\u003e:star:[code](https://hahyeon610.github.io/Video-centric_Audio_Visual_Localization/)\n* [Implicit Counterfactual Learning for Audio-Visual Segmentation](https://arxiv.org/pdf/2507.20740v1)\n* [Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation](https://arxiv.org/pdf/2507.22886v1)\u003cbr\u003e:house:[project](https://henghuiding.com/OmniAVS/)\n* [Zero-AVSR Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations](https://openaccess.thecvf.com/content/ICCV2025/papers/Yeo_Zero-AVSR_Zero-Shot_Audio-Visual_Speech_Recognition_with_LLMs_by_Learning_Language-Agnostic_ICCV_2025_paper.pdf)\n* [Not Only Vision Evolve Visual Speech Recognition via Peripheral Information](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_Not_Only_Vision_Evolve_Visual_Speech_Recognition_via_Peripheral_Information_ICCV_2025_paper.pdf)\n* [CogCM Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_CogCM_Cognition-Inspired_Contextual_Modeling_for_Audio-Visual_Speech_Enhancement_ICCV_2025_paper.pdf)\n* [How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_How_Do_Optical_Flow_and_Textual_Prompts_Collaborate_to_Assist_ICCV_2025_paper.pdf)\n* [TAViS Text-bridged Audio-Visual Segmentation with Foundation Models](http://arxiv.org/abs/2506.11436)\n* [AV-Link Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation](https://openaccess.thecvf.com/content/ICCV2025/papers/Haji-Ali_AV-Link_Temporally-Aligned_Diffusion_Features_for_Cross-Modal_Audio-Video_Generation_ICCV_2025_paper.pdf)\n* [AURELIA Test-time Reasoning Distillation in Audio-Visual LLMs](http://arxiv.org/abs/2503.23219)\n* [p-AVAS Can Physics-Integrated Audio-Visual 
Modeling Boost Neural Acoustic Synthesis](https://openaccess.thecvf.com/content/ICCV2025/papers/Liang_p-AVAS_Can_Physics-Integrated_Audio-Visual_Modeling_Boost_Neural_Acoustic_Synthesis_ICCV_2025_paper.pdf)\n* [TARO Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis](http://arxiv.org/abs/2504.05684)\n* [VAFlow Video-to-Audio Generation with Cross-Modality Flow Matching](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_VAFlow_Video-to-Audio_Generation_with_Cross-Modality_Flow_Matching_ICCV_2025_paper.pdf)\n* [Shot-by-Shot Film-Grammar-Aware Training-Free Audio Description Generation](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_Shot-by-Shot_Film-Grammar-Aware_Training-Free_Audio_Description_Generation_ICCV_2025_paper.pdf)\n* [AVTrustBench Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs](http://arxiv.org/abs/2501.02135)\n* Synthetic speech detection\n  * [Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization](https://openaccess.thecvf.com/content/ICCV2025/papers/Anshul_Intra-modal_and_Cross-modal_Synchronization_for_Audio-visual_Deepfake_Detection_and_Temporal_ICCV_2025_paper.pdf)\n\n\u003ca name=\"45\"/\u003e\n\n## 45.Dataset\n* [Context-Aware Academic Emotion Dataset and Benchmark](https://arxiv.org/pdf/2507.00586v1)\u003cbr\u003e:star:[code](https://zgsfer.github.io/CAER)\n* [ROADWork A Dataset and Benchmark for Learning to Recognize Observe Analyze and Drive Through Work Zones](https://openaccess.thecvf.com/content/ICCV2025/papers/Ghosh_ROADWork_A_Dataset_and_Benchmark_for_Learning_to_Recognize_Observe_ICCV_2025_paper.pdf)\n* [4D-Bench Benchmarking Multi-modal Large Language Models for 4D Object Understanding](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_4D-Bench_Benchmarking_Multi-modal_Large_Language_Models_for_4D_Object_Understanding_ICCV_2025_paper.pdf)\n* [Bias in Gender Bias Benchmarks How Spurious 
Features Distort Evaluation](http://arxiv.org/abs/2509.07596)\n* Benchmarks\n  * [IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark](https://arxiv.org/pdf/2507.14449v1)\n  * [Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding](https://arxiv.org/pdf/2507.15028v1)\u003cbr\u003e:star:[code](https://zhangyuanhan-ai.github.io/video-tt/)\n  * [One Object Multiple Lies A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models](http://arxiv.org/abs/2507.07709)\n  * [Beyond the Destination A Novel Benchmark for Exploration-Aware Embodied Question Answering](http://arxiv.org/abs/2503.11117)\n  * [JailbreakDiffBench A Comprehensive Benchmark for Jailbreaking Diffusion Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Jin_JailbreakDiffBench_A_Comprehensive_Benchmark_for_Jailbreaking_Diffusion_Models_ICCV_2025_paper.pdf)\n  * [MMReason An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI](http://arxiv.org/abs/2506.23563)\n  * [GRAB A Challenging GRaph Analysis Benchmark for Large Multimodal Models](http://arxiv.org/abs/2408.11817)\n  * [INS-MMBench A Comprehensive Benchmark for Evaluating LVLMs Performance in Insurance](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_INS-MMBench_A_Comprehensive_Benchmark_for_Evaluating_LVLMs_Performance_in_Insurance_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/FDU-INS/INS-MMBench)\n  * [MIEB Massive Image Embedding Benchmark](http://arxiv.org/abs/2504.10471)\u003cbr\u003e:star:[code](https://github.com/embeddings-benchmark/mteb)\n  * [LVBench An Extreme Long Video Understanding Benchmark](http://arxiv.org/abs/2406.08035)\n  * [ProJudge A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges](http://arxiv.org/abs/2503.06553)\n  * [From Abyssal Darkness to Blinding Glare A Benchmark on Extreme Exposure Correction in Real 
World](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_From_Abyssal_Darkness_to_Blinding_Glare_A_Benchmark_on_Extreme_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/juvenoia/REED)\n  * [Beyond Walking A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search](http://arxiv.org/abs/2411.17776)\n  * [MultiVerse A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_MultiVerse_A_Multi-Turn_Conversation_Benchmark_for_Evaluating_Large_Vision_and_ICCV_2025_paper.pdf)\n  * [Extrapolated Urban View Synthesis Benchmark](http://arxiv.org/abs/2412.05256)\n  * [WorldScore A Unified Evaluation Benchmark for World Generation](http://arxiv.org/abs/2504.00983)\u003cbr\u003e:house:[project](https://haoyi-duan.github.io/WorldScore)\n  * [ICE-Bench A Unified and Comprehensive Benchmark for Image Creating and Editing](https://openaccess.thecvf.com/content/ICCV2025/papers/Pan_ICE-Bench_A_Unified_and_Comprehensive_Benchmark_for_Image_Creating_and_ICCV_2025_paper.pdf)\n  * [MVGBench a Comprehensive Benchmark for Multi-view Generation Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_MVGBench_a_Comprehensive_Benchmark_for_Multi-view_Generation_Models_ICCV_2025_paper.pdf)\n* Datasets\n  * [Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning](https://arxiv.org/pdf/2507.04790v1)\n  * [ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users](https://arxiv.org/pdf/2507.10223v1)\u003cbr\u003e:star:[code](https://github.com/pittisl/ProGait)\u003cbr\u003e:house:[project](https://huggingface.co/datasets/ericyxy98/ProGait)\n  * [DiffTell A High-Quality Dataset for Describing Image Manipulation Changes](https://openaccess.thecvf.com/content/ICCV2025/papers/Di_DiffTell_A_High-Quality_Dataset_for_Describing_Image_Manipulation_Changes_ICCV_2025_paper.pdf)\n  * [CT-ScanGaze: A 
Dataset and Baselines for 3D Volumetric Scanpath Modeling](https://arxiv.org/pdf/2507.12591v1)\n  * [Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions](https://arxiv.org/pdf/2508.04681v1)\u003cbr\u003e:star:[code](https://liangxuy.github.io/InterVLA/)\u003cbr\u003e:star:[code](https://github.com/liangxuy/intervla)\n  * [HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis](https://arxiv.org/pdf/2508.09137v1)\u003cbr\u003e:house:[project](https://vcai.mpi-inf.mpg.de/projects/HumanOLAT/)\n  * [Dataset Ownership Verification for Pre-trained Masked Models](http://arxiv.org/abs/2507.12022)\u003cbr\u003e:star:[code](https://github.com/xieyc99/DOV4MM)\n  * [Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset](http://arxiv.org/abs/2507.05728)\u003cbr\u003e:star:[code](https://github.com/rfww/uevs)\n  * [BlueNeg A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_BlueNeg_A_35mm_Negative_Film_Dataset_for_Restoring_Channel-Heterogeneous_Deterioration_ICCV_2025_paper.pdf)\n  * [CMB-ML A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task](https://openaccess.thecvf.com/content/ICCV2025/papers/Amato_CMB-ML_A_Cosmic_Microwave_Background_Dataset_for_the_Oldest_Possible_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/CMB-ML/cmb-ml)\n  * [UAVScenes A Multi-Modal Dataset for UAVs](http://arxiv.org/abs/2507.22412)\u003cbr\u003e:star:[code](https://github.com/sijieaaa/UAVScenes)\n  * [UDC-VIT A Real-World Video Dataset for Under-Display Cameras](https://openaccess.thecvf.com/content/ICCV2025/papers/Ahn_UDC-VIT_A_Real-World_Video_Dataset_for_Under-Display_Cameras_ICCV_2025_paper.pdf)\n  * [Towards Comprehensive Lecture Slides Understanding Large-scale Dataset and Effective 
Method](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Towards_Comprehensive_Lecture_Slides_Understanding_Large-scale_Dataset_and_Effective_Method_ICCV_2025_paper.pdf)\n  * [R-LiViT A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception](https://openaccess.thecvf.com/content/ICCV2025/papers/Mirlach_R-LiViT_A_LiDAR-Visual-Thermal_Dataset_Enabling_Vulnerable_Road_User_Focused_Roadside_ICCV_2025_paper.pdf)\n  * [MEH A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition](https://openaccess.thecvf.com/content/ICCV2025/papers/Golyadkin_MEH_A_Multi-Style_Dataset_and_Toolkit_for_Advancing_Egyptian_Hieroglyph_ICCV_2025_paper.pdf)\n  * [3DRealCar An In-the-wild RGB-D Car Dataset with 360-degree Views](https://openaccess.thecvf.com/content/ICCV2025/papers/Du_3DRealCar_An_In-the-wild_RGB-D_Car_Dataset_with_360-degree_Views_ICCV_2025_paper.pdf)\n  * [PBFG A New Physically-Based Dataset and Removal of Lens Flares and Glares](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_PBFG_A_New_Physically-Based_Dataset_and_Removal_of_Lens_Flares_ICCV_2025_paper.pdf)\n  * [Feature Coding in the Era of Large Models Dataset Test Conditions and Benchmark](http://arxiv.org/abs/2412.04307)\u003cbr\u003e:star:[code](https://github.com/chansongoal/LaMoFC)\n  * [Modeling Saliency Dataset Bias](https://openaccess.thecvf.com/content/ICCV2025/papers/Kummerer_Modeling_Saliency_Dataset_Bias_ICCV_2025_paper.pdf)\n  * [TrackVerse A Large-Scale Object-Centric Video Dataset for Image-Level Representation Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wei_TrackVerse_A_Large-Scale_Object-Centric_Video_Dataset_for_Image-Level_Representation_Learning_ICCV_2025_paper.pdf)\n  * [OpenSubstance A High-quality Measured Dataset of Multi-View and -Lighting Images and 
Shapes](https://openaccess.thecvf.com/content/ICCV2025/papers/Pei_OpenSubstance_A_High-quality_Measured_Dataset_of_Multi-View_and_-Lighting_Images_ICCV_2025_paper.pdf)\u003cbr\u003e:house:[project](https://opensubstance.github.io/)\n  * [MMAT-1M A Large Reasoning Dataset for Multimodal Agent Tuning](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_MMAT-1M_A_Large_Reasoning_Dataset_for_Multimodal_Agent_Tuning_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/VIS-MPU-Agent/MMAT-1M)\n  * [ImageGem In-the-wild Generative Image Interaction Dataset for Generative Model Personalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Guo_ImageGem_In-the-wild_Generative_Image_Interaction_Dataset_for_Generative_Model_Personalization_ICCV_2025_paper.pdf)\n  * [LANGTRAJ Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation](http://arxiv.org/abs/2504.11521)\u003cbr\u003e:house:[project](https://langtraj.github.io/)\n  * [LightCity An Urban Dataset for Outdoor Inverse Rendering and Reconstruction under Multi-illumination Conditions](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_LightCity_An_Urban_Dataset_for_Outdoor_Inverse_Rendering_and_Reconstruction_ICCV_2025_paper.pdf)\n  * [CULTURE3D A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering](http://arxiv.org/abs/2501.06927)\n  * [A Real-world Display Inverse Rendering Dataset](http://arxiv.org/abs/2508.14411)\u003cbr\u003e:house:[project](https://michaelcsj.github.io/DIR)\n* Dataset Distillation\n  * [CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation](https://arxiv.org/pdf/2506.22637v1)\u003cbr\u003e:star:[code](https://github.com/hatchetProject/CaO2)\n  * [Dataset Distillation via Vision-Language Category Prototype](https://arxiv.org/pdf/2506.23580v1)\u003cbr\u003e:star:[code](https://github.com/zou-yawen/Dataset-Distillation-via-Vision-Language-Category-Prototype/)\n  * [Dataset Distillation as Data 
Compression: A Rate-Utility Perspective](https://arxiv.org/pdf/2507.17221v1)\n  * [Heavy Labels Out Dataset Distillation with Label Space Lightening](http://arxiv.org/abs/2408.08201)\u003cbr\u003e:star:[code](https://github.com/Lexie-YU/HeLlO)\n  * [Dataset Distillation via the Wasserstein Metric](http://arxiv.org/abs/2311.18531)\u003cbr\u003e:star:[code](https://github.com/Liu-Hy/WMDD) :house:[project](https://liu-hy.github.io/WMDD)\n  * [Diversity-Enhanced Distribution Alignment for Dataset Distillation](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Diversity-Enhanced_Distribution_Alignment_for_Dataset_Distillation_ICCV_2025_paper.pdf)\n  * [Improving Noise Efficiency in Privacy-preserving Dataset Distillation](http://arxiv.org/abs/2508.01749)\n\n\u003ca name=\"44\"/\u003e\n\n## 44.Neural Radiance Fields\n* [UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields](http://arxiv.org/pdf/2506.21884v1)\u003cbr\u003e:house:[project](https://www.factral.co/UnMix-NeRF)\n* [LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling](https://arxiv.org/pdf/2507.02363v1)\u003cbr\u003e:star:[code](https://wujh2001.github.io/LocalDyGS/)\n* [DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF](https://arxiv.org/pdf/2507.14596v1)\n* [A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields](https://arxiv.org/pdf/2507.04408v1)\n* [NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement](https://arxiv.org/pdf/2507.12714v1)\u003cbr\u003e:star:[code](https://neuraleaf-yang.github.io/)\n* [MuGS Multi-Baseline Generalizable Gaussian Splatting Reconstruction](http://arxiv.org/abs/2508.04297)\u003cbr\u003e:star:[code](https://github.com/EuclidLou/MuGS)\n* [UniVerse Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction](http://arxiv.org/abs/2510.01669)\n* Rendering\n  * [BokehDiff: Neural Lens Blur with 
One-Step Diffusion](https://arxiv.org/pdf/2507.18060v1)\n  * [OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering](http://arxiv.org/abs/2503.16177)\u003cbr\u003e:house:[project](https://occlugaussian.github.io)\n  * [ReCamMaster Camera-Controlled Generative Rendering from A Single Video](http://arxiv.org/abs/2503.11647)\n  * [Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Tourani_Leveraging_2D_Priors_and_SDF_Guidance_for_Urban_Scene_Rendering_ICCV_2025_paper.pdf)\n  * [Bokehlicious Photorealistic Bokeh Rendering with Controllable Apertures](http://arxiv.org/abs/2503.16067)\n  * [UNIS A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Deng_UNIS_A_Unified_Framework_for_Achieving_Unbiased_Neural_Implicit_Surfaces_ICCV_2025_paper.pdf)\n  * [Stochastic Gradient Estimation for Higher-Order Differentiable Rendering](http://arxiv.org/abs/2412.03489)\n  * [Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity](http://arxiv.org/abs/2507.15775)\n  * [FonTS Text Rendering With Typography and Style Controls](http://arxiv.org/abs/2412.00136)\n  * [Differentiable Room Acoustic Rendering with Multi-View Vision Priors](http://arxiv.org/abs/2504.21847)\n* Inverse Rendering\n  * [Neural Multi-View Self-Calibrated Photometric Stereo without Photometric Stereo Cues](https://arxiv.org/pdf/2507.23162v1)\n  * [Ouroboros Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering](http://arxiv.org/abs/2508.14461)\n  * [Neural Inverse Rendering for High-Accuracy 3D Measurement of Moving Objects with Fewer Phase-Shifting Patterns](https://openaccess.thecvf.com/content/ICCV2025/papers/Urakawa_Neural_Inverse_Rendering_for_High-Accuracy_3D_Measurement_of_Moving_Objects_ICCV_2025_paper.pdf)\n  * [InvRGB+L: Inverse Rendering of Complex Scenes with Unified 
Color and LiDAR Reflectance Modeling](https://arxiv.org/pdf/2507.17613v1)\n  * [DNF-Intrinsic Deterministic Noise-Free Diffusion for Indoor Inverse Rendering](https://openaccess.thecvf.com/content/ICCV2025/papers/Zheng_DNF-Intrinsic_Deterministic_Noise-Free_Diffusion_for_Indoor_Inverse_Rendering_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/OnlyZZZZ/DNF-Intrinsic)\n* NVS\n  * [FVGen Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation](http://arxiv.org/abs/2508.06392)\n  * [E-NeMF Event-based Neural Motion Field for Novel Space-time View Synthesis of Dynamic Scenes](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_E-NeMF_Event-based_Neural_Motion_Field_for_Novel_Space-time_View_Synthesis_ICCV_2025_paper.pdf)\n  * [Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis](http://arxiv.org/abs/2411.00144)\u003cbr\u003e:house:[project](https://sailor-z.github.io/projects)\n  * [RayZer A Self-supervised Large View Synthesis Model](http://arxiv.org/abs/2505.00702)\n  * [BillBoard Splatting (BBSplat) Learnable Textured Primitives for Novel View Synthesis](http://arxiv.org/abs/2411.08508)\n  * [WAVE Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image](http://arxiv.org/abs/2506.23518)\n  * [UniGS Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images](http://arxiv.org/abs/2410.13195)\u003cbr\u003e:star:[code](https://github.com/jwubz123/UNIG)\n  * [Scaling Transformer-Based Novel View Synthesis with Models Token Disentanglement and Synthetic Data](https://openaccess.thecvf.com/content/ICCV2025/papers/Nair_Scaling_Transformer-Based_Novel_View_Synthesis_with_Models_Token_Disentanglement_and_ICCV_2025_paper.pdf)\n  * [SEHDR Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing](http://arxiv.org/abs/2509.20400)\n  * [RayGaussX Accelerating Gaussian-Based Ray Marching for Real-Time and High-Quality Novel View 
Synthesis](http://arxiv.org/abs/2509.07782)\n\n\u003ca name=\"43\"/\u003e\n\n## 43.Vision Language\n* [Improving Large Vision and Language Models by Learning from a Panel of Peers](http://arxiv.org/abs/2509.01610)\n* [DASH Detection and Assessment of Systematic Hallucinations of VLMs](http://arxiv.org/abs/2503.23573)\n* [Vision-Language Models Can't See the Obvious](https://openaccess.thecvf.com/content/ICCV2025/papers/Huynh_Vision-Language_Models_Cant_See_the_Obvious_ICCV_2025_paper.pdf)\n* [Web Artifact Attacks Disrupt Vision Language Models](http://arxiv.org/abs/2503.13652)\u003cbr\u003e:star:[code](https://github.com/mqraitem/Web-Artifact-Attacks)\n* [ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models](https://arxiv.org/pdf/2507.00898v1)\u003cbr\u003e:star:[code](https://github.com/zifuwan/ONLY)\u003cbr\u003e:star:[code](https://zifuwan.github.io/ONLY/)\n* [VLM4D Towards Spatiotemporal Awareness in Vision Language Models](http://arxiv.org/abs/2508.02095)\n* [WalkVLM Aid Visually Impaired People Walking by Vision Language Model](http://arxiv.org/abs/2412.20903)\n* [ViLU: Learning Vision-Language Uncertainties for Failure Prediction](https://arxiv.org/pdf/2507.07620v1)\u003cbr\u003e:star:[code](https://github.com/ykrmm/ViLU)\n* [PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection](https://arxiv.org/pdf/2507.08979v1)\u003cbr\u003e:star:[code](https://github.com/MahdiyarMM/PRISM)\n* [One Last Attention for Your Vision-Language Model](https://arxiv.org/pdf/2507.15480v1)\u003cbr\u003e:star:[code](https://github.com/khufia/RAda/tree/main)\n* [Hierarchical Cross-modal Prompt Learning for Vision-Language Models](https://arxiv.org/pdf/2507.14976v1)\u003cbr\u003e:star:[code](https://github.com/zzeoZheng/HiCroPL)\n* [METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language 
Models](https://arxiv.org/pdf/2507.20842v1)\u003cbr\u003e:star:[code](https://github.com/YuchenLiu98/METEOR)\n* [ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking](https://arxiv.org/pdf/2507.19875v1)\u003cbr\u003e:star:[code](https://github.com/XiaokunFeng/ATCTrack)\n* [AgroBench: Vision-Language Model Benchmark in Agriculture](https://arxiv.org/pdf/2507.20519v1)\u003cbr\u003e:star:[code](https://dahlian00.github.io/AgroBenchPage/)\n* [MM-IFEngine Towards Multimodal Instruction Following](https://openaccess.thecvf.com/content/ICCV2025/papers/Ding_MM-IFEngine_Towards_Multimodal_Instruction_Following_ICCV_2025_paper.pdf)\n* [Robustifying Zero-Shot Vision Language Models by Subspaces Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Dong_Robustifying_Zero-Shot_Vision_Language_Models_by_Subspaces_Alignment_ICCV_2025_paper.pdf)\n* [FDPT Federated Discrete Prompt Tuning for Black-Box Visual-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_FDPT_Federated_Discrete_Prompt_Tuning_for_Black-Box_Visual-Language_Models_ICCV_2025_paper.pdf)\n* [Griffon v2 Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring](http://arxiv.org/abs/2403.09333)\u003cbr\u003e:star:[code](https://github.com/jefferyZhan/Griffon)\n* [CLIP-GS Unifying Vision-Language Representation with 3D Gaussian Splatting](https://openaccess.thecvf.com/content/ICCV2025/papers/Jiao_CLIP-GS_Unifying_Vision-Language_Representation_with_3D_Gaussian_Splatting_ICCV_2025_paper.pdf)\n* [Growing a Twig to Accelerate Large Vision-Language Models](http://arxiv.org/abs/2503.14075)\n* [Test-Time Retrieval-Augmented Adaptation for Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Fan_Test-Time_Retrieval-Augmented_Adaptation_for_Vision-Language_Models_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/xinqi-fan/TT-RAA)\n* [Understanding Museum Exhibits using 
Vision-Language Reasoning](http://arxiv.org/abs/2412.01370)\n* [One Perturbation is Enough On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models](http://arxiv.org/abs/2406.05491)\n* [When Lighting Deceives Exposing Vision-Language Models Illumination Vulnerability Through Illumination Transformation Attack](http://arxiv.org/abs/2503.06903)\n* [Target Bias Is All You Need Zero-Shot Debiasing of Vision-Language Models with Bias Corpus](https://openaccess.thecvf.com/content/ICCV2025/papers/Jang_Target_Bias_Is_All_You_Need_Zero-Shot_Debiasing_of_Vision-Language_ICCV_2025_paper.pdf)\n* [TAB Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models](http://arxiv.org/abs/2412.18675)\n* [Feather the Throttle Revisiting Visual Token Pruning for Vision-Language Model Acceleration](http://arxiv.org/abs/2412.13180)\n* [Derm1M A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology](http://arxiv.org/abs/2503.14911)\u003cbr\u003e:star:[code](https://github.com/SiyuanYan1/Derm1M)\n* [ReCoT Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Qu_ReCoT_Reflective_Self-Correction_Training_for_Mitigating_Confirmation_Bias_in_Large_ICCV_2025_paper.pdf)\n* [AutoOcc Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting](http://arxiv.org/abs/2502.04981)\n* [D-Attn Decomposed Attention for Large Vision-and-Language Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Kuo_D-Attn_Decomposed_Attention_for_Large_Vision-and-Language_Model_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/bytedance/DecomposedAttention)\n* [Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration 
Rate](https://openaccess.thecvf.com/content/ICCV2025/papers/Huang_Deciphering_Cross-Modal_Alignment_in_Large_Vision-Language_Models_via_Modality_Integration_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/shikiw/Modality-Integration-Rate)\n* [Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Fuzzy_Contrastive_Decoding_to_Alleviate_Object_Hallucination_in_Large_Vision-Language_ICCV_2025_paper.pdf)\n* [IDEATOR Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves](http://arxiv.org/abs/2411.00827)\n* [2.5 Years in Class A Multimodal Textbook for Vision-Language Pretraining](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_2.5_Years_in_Class_A_Multimodal_Textbook_for_Vision-Language_Pretraining_ICCV_2025_paper.pdf)\n* [Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features](http://arxiv.org/abs/2412.00142)\n* [FedMVP Federated Multimodal Visual Prompt Tuning for Vision-Language Models](http://arxiv.org/abs/2504.20860)\u003cbr\u003e:star:[code](https://github.com/mainaksingha01/FedMVP)\n* [Physics Context Builders A Modular Framework for Physical Reasoning in Vision-Language Models](http://arxiv.org/abs/2412.08619)\n* [VLRMBench A Comprehensive and Challenging Benchmark for Vision-Language Reward Models](http://arxiv.org/abs/2503.07478)\u003cbr\u003e:star:[code](https://github.com/JCruan519/VLRMBench)\n* [ZipVL Accelerating Vision-Language Models through Dynamic Token Sparsity](https://openaccess.thecvf.com/content/ICCV2025/papers/He_ZipVL_Accelerating_Vision-Language_Models_through_Dynamic_Token_Sparsity_ICCV_2025_paper.pdf)\n* [Skip-Vision Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token 
Skipping](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_Skip-Vision_Efficient_and_Scalable_Acceleration_of_Vision-Language_Models_via_Adaptive_ICCV_2025_paper.pdf)\n* [SAUCE Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders](http://arxiv.org/abs/2503.14530)\n* [The Inter-Intra Modal Measure A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models](http://arxiv.org/abs/2407.15731)\u003cbr\u003e:star:[code](https://github.com/mit-ll/IIMM)\n* [MaTVLM Hybrid Mamba-Transformer for Efficient Vision-Language Modeling](http://arxiv.org/abs/2503.13440)\u003cbr\u003e:star:[code](https://github.com/hustvl/MaTVLM)\n* [Safeguarding Vision-Language Models Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks](http://arxiv.org/abs/2504.01308)\u003cbr\u003e:star:[code](https://github.com/JarvisUSTC/DiffPure-RobustVLM)\n* [Dynamic Multimodal Prototype Learning in Vision-Language Models](http://arxiv.org/abs/2507.03657)\n* [GEOBench-VLM Benchmarking Vision-Language Models for Geospatial Tasks](https://openaccess.thecvf.com/content/ICCV2025/papers/Danish_GEOBench-VLM_Benchmarking_Vision-Language_Models_for_Geospatial_Tasks_ICCV_2025_paper.pdf)\n* [Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models](http://arxiv.org/abs/2405.14715)\n* [V2PE Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding](http://arxiv.org/abs/2412.09616)\n* [DexVLG Dexterous Vision-Language-Grasp Model at Scale](http://arxiv.org/abs/2507.02747)\n* [Vision-Language Neural Graph Featurization for Extracting Retinal Lesions](https://openaccess.thecvf.com/content/ICCV2025/papers/Hassan_Vision-Language_Neural_Graph_Featurization_for_Extracting_Retinal_Lesions_ICCV_2025_paper.pdf)\n* [MotionCtrl A Real-time Controllable Vision-Language-Motion 
Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Cao_MotionCtrl_A_Real-time_Controllable_Vision-Language-Motion_Model_ICCV_2025_paper.pdf)\n* [Breaking the Encoder Barrier for Seamless Video-Language Understanding](http://arxiv.org/abs/2503.18422)\n* [OphCLIP Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining](http://arxiv.org/abs/2411.15421)\n* [How Can Objects Help Video-Language Understanding](http://arxiv.org/abs/2504.07454)\u003cbr\u003e:star:[code](https://github.com/brown-palm/ObjectMLLM)\n* [Factorized Learning for Temporally Grounded Video-Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_Factorized_Learning_for_Temporally_Grounded_Video-Language_Models_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/nusnlp/d2vlm)\n* [Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models](http://arxiv.org/abs/2508.01225)\n* [AdvDreamer Unveils Are Vision-Language Models Truly Ready for Real-World 3D Variations](http://arxiv.org/abs/2412.03002)\n* [HQ-CLIP Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wei_HQ-CLIP_Leveraging_Large_Vision-Language_Models_to_Create_High-Quality_Image-Text_Datasets_ICCV_2025_paper.pdf)\n* [Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation](http://arxiv.org/abs/2504.17207)\n* [The Scalability of Simplicity Empirical Analysis of Vision-Language Learning with a Single Transformer](http://arxiv.org/abs/2504.10462)\u003cbr\u003e:star:[code](https://github.com/bytedance/SAIL)\n* [EVEv2 Improved Baselines for Encoder-Free Vision-Language Models](http://arxiv.org/abs/2502.06788)\u003cbr\u003e:star:[code](https://github.com/baaivision/EVE)\n* [TruthPrInt Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided 
Pre-Intervention](https://openaccess.thecvf.com/content/ICCV2025/papers/Duan_TruthPrInt_Mitigating_Large_Vision-Language_Models_Object_Hallucination_Via_Latent_Truthful-Guided_ICCV_2025_paper.pdf)\n* [Structured Policy Optimization Enhance Large Vision-Language Model via Self-referenced Dialogue](https://openaccess.thecvf.com/content/ICCV2025/papers/Sun_Structured_Policy_Optimization_Enhance_Large_Vision-Language_Model_via_Self-referenced_Dialogue_ICCV_2025_paper.pdf)\n* [Causality-guided Prompt Learning for Vision-language Models via Visual Granulation](http://arxiv.org/abs/2509.03803)\u003cbr\u003e:star:[code](https://github.com/GaoMY-521/CaPL_Code)\n* [CalliReader Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model](http://arxiv.org/abs/2503.06472)\n* [Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma](http://arxiv.org/abs/2503.12496)\n* [Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images](http://arxiv.org/abs/2508.15256)\n* [Uncertainty-Driven Expert Control Enhancing the Reliability of Medical Vision-Language Models](http://arxiv.org/abs/2507.09209)\n* [Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Kang_Dynamic_Multi-Layer_Null_Space_Projection_for_Vision-Language_Continual_Learning_ICCV_2025_paper.pdf)\n* [Learning Beyond Still Frames Scaling Vision-Language Models with Video](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Learning_Beyond_Still_Frames_Scaling_Vision-Language_Models_with_Video_ICCV_2025_paper.pdf)\n* [GLEAM Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local 
Transformations](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_GLEAM_Enhanced_Transferable_Adversarial_Attacks_for_Vision-Language_Pre-training_Models_via_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/LuckAlex/GLEAM)\n* [INTER Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling](http://arxiv.org/abs/2507.05056)\u003cbr\u003e:star:[code](https://github.com/xxxxx313/INTER)\n* [SmolDocling An ultra-compact vision-language model for end-to-end multi-modal document conversion](http://arxiv.org/abs/2503.11576)\n* VLN\n  * [Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities](https://arxiv.org/pdf/2507.13019v1)\u003cbr\u003e:star:[code](https://crystalsixone.github.io/vln_pe.github.io/)\n  * [monoVLN Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Lu_monoVLN_Bridging_the_Observation_Gap_between_Monocular_and_Panoramic_Vision_ICCV_2025_paper.pdf)\n  * [NavQ Learning a Q-Model for Foresighted Vision-and-Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_NavQ_Learning_a_Q-Model_for_Foresighted_Vision-and-Language_Navigation_ICCV_2025_paper.pdf)\n  * [COSMO Combination of Selective Memorization for Low-cost Vision-and-Language Navigation](http://arxiv.org/abs/2503.24065)\u003cbr\u003e:star:[code](https://github.com/siqiZ805/VLN-COSMO.git)\n  * [NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments](https://arxiv.org/pdf/2506.23468v1)\u003cbr\u003e:star:[code](https://github.com/Feliciaxyao/NavMorph)\n  * [3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation](https://openaccess.thecvf.com/content/ICCV2025/papers/Gao_3D_Gaussian_Map_with_Open-Set_Semantic_Grouping_for_Vision-Language_Navigation_ICCV_2025_paper.pdf)\n* LLM\n  * [LLM-enhanced Action-aware 
Multi-modal Prompt Tuning for Image-Text Matching](https://arxiv.org/pdf/2506.23502v1)\n  * [Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching](http://arxiv.org/abs/2503.14953)\n  * [Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs](https://arxiv.org/pdf/2507.07990v1)\u003cbr\u003e:house:[project](https://www.jshyun.me/projects/sttm)\n  * [Why LVLMs Are More Prone to Hallucinations in Longer Responses The Role of Context](https://openaccess.thecvf.com/content/ICCV2025/papers/Zheng_Why_LVLMs_Are_More_Prone_to_Hallucinations_in_Longer_Responses_ICCV_2025_paper.pdf)\n  * [Zeroth-Order Fine-Tuning of LLMs in Random Subspaces](http://arxiv.org/abs/2410.08989)\u003cbr\u003e:star:[code](https://github.com/zimingyy/SubZero)\n  * [Advancing Visual Large Language Model for Multi-granular Versatile Perception](https://arxiv.org/pdf/2507.16213v1)\u003cbr\u003e:star:[code](https://github.com/xiangwentao666/MVP-LM)\n  * [DisTime Distribution-based Time Representation for Video Large Language Models](http://arxiv.org/abs/2505.24329)\u003cbr\u003e:star:[code](https://github.com/josephzpng/DisTime)\n  * [Aligning Effective Tokens with Video Anomaly in Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Aligning_Effective_Tokens_with_Video_Anomaly_in_Large_Language_Models_ICCV_2025_paper.pdf)\n  * [MeshLLM Empowering Large Language Models to Progressively Understand and Generate 3D Mesh](http://arxiv.org/abs/2508.01242)\n  * [FOLDER Accelerating Multi-Modal Large Language Models with Enhanced Performance](http://arxiv.org/abs/2501.02430)\u003cbr\u003e:star:[code](https://github.com/anakin-skywalker-Joseph/Folder)\n  * [B-VLLM A Vision Large Language Model with Balanced Spatio-Temporal 
Tokens](https://openaccess.thecvf.com/content/ICCV2025/papers/Lu_B-VLLM_A_Vision_Large_Language_Model_with_Balanced_Spatio-Temporal_Tokens_ICCV_2025_paper.pdf)\n  * [Robin3D Improving 3D Large Language Model via Robust Instruction Tuning](http://arxiv.org/abs/2410.00255)\n  * [GenieBlue Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices](http://arxiv.org/abs/2503.06019)\n  * [CATP-LLM Empowering Large Language Models for Cost-Aware Tool Planning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_CATP-LLM_Empowering_Large_Language_Models_for_Cost-Aware_Tool_Planning_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/duowuyms/OpenCATP-LLM)\n  * [Multimodal LLM Guided Exploration and Active Mapping using Fisher Information](http://arxiv.org/abs/2410.17422)\n  * [Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Sun_Multimodal_Large_Language_Model-Guided_ISP_Hyperparameter_Optimization_with_Dynamic_Preference_ICCV_2025_paper.pdf)\n  * [Aligning Vision to Language Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning](http://arxiv.org/abs/2503.12972)\u003cbr\u003e:star:[code](https://github.com/Wings-Of-Disaster/VaLiK)\n* MLLM\n  * [Token Activation Map to Visually Explain Multimodal LLMs](http://arxiv.org/abs/2506.23270)\u003cbr\u003e:star:[code](https://github.com/xmed-lab/TAM)\n  * [DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs](https://arxiv.org/pdf/2507.10302v1)\u003cbr\u003e:star:[code](https://github.com/ZJHTerry18/DisCo)\n  * [UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding](https://arxiv.org/pdf/2506.23219v1)\u003cbr\u003e:star:[code](https://github.com/tsinghua-fib-lab/UrbanLLaVA)\n  * [Kestrel 3D Multimodal LLM for Part-Aware Grounded 
Description](http://arxiv.org/abs/2405.18937)\n  * [Are They the Same Exploring Visual Correspondence Shortcomings of Multimodal LLMs](http://arxiv.org/abs/2501.04670)\n  * [Analyzing Finetuning Representation Shift for Multimodal LLMs Steering](http://arxiv.org/abs/2501.03012)\n  * [Visual Chronicles Using Multimodal LLMs to Analyze Massive Collections of Images](http://arxiv.org/abs/2504.08727)\n  * [Controlling Multimodal LLMs via Reward-guided Decoding](https://openaccess.thecvf.com/content/ICCV2025/papers/Manas_Controlling_Multimodal_LLMs_via_Reward-guided_Decoding_ICCV_2025_paper.pdf)\n  * [TWIST \u0026 SCOUT Grounding Multimodal LLM-Experts by Forget-Free Tuning](http://arxiv.org/abs/2410.10491)\n  * [FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging](https://arxiv.org/pdf/2508.04625v1)\n  * [Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation](https://arxiv.org/pdf/2507.02859v1)\n  * [BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models](https://arxiv.org/pdf/2508.06895v1)\n  * [Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning](https://arxiv.org/pdf/2507.07424v1)\u003cbr\u003e:star:[code](https://mm-vl.github.io/corvid)\n  * [Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs](http://arxiv.org/abs/2503.20309)\n  * [CompCap Improving Multimodal Large Language Models with Composite Captions](http://arxiv.org/abs/2412.05243)\n  * [AVAM a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_AVAM_a_Universal_Training-free_Adaptive_Visual_Anchoring_Embedded_into_Multimodal_ICCV_2025_paper.pdf)\n  * [How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning Placing Them in An Extensible Escape 
Game](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_How_Do_Multimodal_Large_Language_Models_Handle_Complex_Multimodal_Reasoning_ICCV_2025_paper.pdf)\n  * [LLaVA-KD A Framework of Distilling Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Cai_LLaVA-KD_A_Framework_of_Distilling_Multimodal_Large_Language_Models_ICCV_2025_paper.pdf)\n  * [LIRA Reasoning Reconstruction via Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhou_LIRA_Reasoning_Reconstruction_via_Multimodal_Large_Language_Models_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/zhen6618/LIRA)\n  * [MissRAG Addressing the Missing Modality Challenge in Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Pipoli_MissRAG_Addressing_the_Missing_Modality_Challenge_in_Multimodal_Large_Language_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/aimagelab/MissRAG)\n  * [Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models](http://arxiv.org/abs/2411.12790)\u003cbr\u003e:star:[code](https://github.com/zeng-zhen/FGVEdit)\n  * [Benchmarking Multimodal Large Language Models Against Image Corruptions](https://openaccess.thecvf.com/content/ICCV2025/papers/Qiu_Benchmarking_Multimodal_Large_Language_Models_Against_Image_Corruptions_ICCV_2025_paper.pdf)\n  * [SHIFT Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_SHIFT_Smoothing_Hallucinations_by_Information_Flow_Tuning_for_Multimodal_Large_ICCV_2025_paper.pdf)\n  * [Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency](http://arxiv.org/abs/2501.04931)\n  * [VisNumBench Evaluating Number Sense of Multimodal Large Language Models](http://arxiv.org/abs/2503.14939)\n  * [ShortV Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective 
Layers](http://arxiv.org/abs/2504.00502)\u003cbr\u003e:star:[code](https://github.com/icip-cas/ShortV)\n  * [Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models](http://arxiv.org/abs/2412.05934)\u003cbr\u003e:star:[code](https://github.com/MaTengSYSU/HIMRD-jailbreak)\n  * [Learning to Inference Adaptively for Multimodal Large Language Models](http://arxiv.org/abs/2503.10905)\n  * [FALCON Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers](http://arxiv.org/abs/2501.16297)\n  * [R1-VL Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_R1-VL_Learning_to_Reason_with_Multimodal_Large_Language_Models_via_ICCV_2025_paper.pdf)\n  * [Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles](https://openaccess.thecvf.com/content/ICCV2025/papers/Slyman_Calibrating_MLLM-as-a-judge_via_Multimodal_Bayesian_Prompt_Ensembles_ICCV_2025_paper.pdf)\n  * [Boosting MLLM Reasoning with Text-Debiased Hint-GRPO](http://arxiv.org/abs/2503.23905)\u003cbr\u003e:star:[code](https://github.com/hqhQAQ/Hint-GRPO)\n  * [Information Density Principle for MLLM Benchmarks](http://arxiv.org/abs/2503.10079)\n  * [Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Auto-Controlled_Image_Perception_in_MLLMs_via_Visual_Perception_Tokens_ICCV_2025_paper.pdf)\n  * [VSP Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_VSP_Diagnosing_the_Dual_Challenges_of_Perception_and_Reasoning_in_ICCV_2025_paper.pdf)\n  * [MM-Spatial Exploring 3D Spatial Understanding in Multimodal 
LLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Daxberger_MM-Spatial_Exploring_3D_Spatial_Understanding_in_Multimodal_LLMs_ICCV_2025_paper.pdf)\n  * [Spatial Preference Rewarding for MLLMs Spatial Understanding](https://openaccess.thecvf.com/content/ICCV2025/papers/Qiu_Spatial_Preference_Rewarding_for_MLLMs_Spatial_Understanding_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/hanqiu-hq/SPR)\n  * [SparseMM Head Sparsity Emerges from Visual Concept Responses in MLLMs](http://arxiv.org/abs/2506.05344)\u003cbr\u003e:star:[code](https://github.com/CR400AF-A/SparseMM)\n  * [OrderChain Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM](http://arxiv.org/abs/2504.04801)\u003cbr\u003e:house:[project](https://order-chain.github.io/)\n  * [STI-Bench Are MLLMs Ready for Precise Spatial-Temporal World Understanding](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_STI-Bench_Are_MLLMs_Ready_for_Precise_Spatial-Temporal_World_Understanding_ICCV_2025_paper.pdf)\n  * [ChartPoint Guiding MLLMs with Grounding Reflection for Chart Reasoning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_ChartPoint_Guiding_MLLMs_with_Grounding_Reflection_for_Chart_Reasoning_ICCV_2025_paper.pdf)\n  * [Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning](http://arxiv.org/abs/2507.17539)\u003cbr\u003e:star:[code](https://github.com/MeteorElf/FundusExpert)\n  * [p-MoD Building Mixture-of-Depths MLLMs via Progressive Ratio Decay](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_p-MoD_Building_Mixture-of-Depths_MLLMs_via_Progressive_Ratio_Decay_ICCV_2025_paper.pdf)\n  * [LLaVA-SP Enhancing Visual Representation with Visual Spatial Tokens for 
MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Lou_LLaVA-SP_Enhancing_Visual_Representation_with_Visual_Spatial_Tokens_for_MLLMs_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/CnFaker/LLaVA-SP)\n  * [Enhancing Numerical Prediction of MLLMs with Soft Labeling](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Enhancing_Numerical_Prediction_of_MLLMs_with_Soft_Labeling_ICCV_2025_paper.pdf)\n  * [Creation-MMBench Assessing Context-Aware Creative Intelligence in MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Fang_Creation-MMBench_Assessing_Context-Aware_Creative_Intelligence_in_MLLMs_ICCV_2025_paper.pdf)\n* Visual Grounding\n  * [PropVG End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination](http://arxiv.org/abs/2509.04833)\n  * [Move to Understand a 3D Scene Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation](http://arxiv.org/abs/2507.04047)\n  * [MC-Bench A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs](https://openaccess.thecvf.com/content/ICCV2025/papers/Xu_MC-Bench_A_Benchmark_for_Multi-Context_Visual_Grounding_in_the_Era_ICCV_2025_paper.pdf)\u003cbr\u003e:house:[project](https://xuyunqiu.github.io/MC-Bench)\n  * [AerialVG A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations](http://arxiv.org/abs/2504.07836)\u003cbr\u003e:star:[code](https://github.com/Ideal-ljl/AerialVG)\n  * [NAVER A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning](http://arxiv.org/abs/2502.00372)\u003cbr\u003e:star:[code](https://github.com/ControlNet/NAVER)\n  * [VGMamba Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_VGMamba_Attribute-to-Location_Clue_Reasoning_for_Quantity-Agnostic_3D_Visual_Grounding_ICCV_2025_paper.pdf)\n  * [Region-aware Anchoring Mechanism for Efficient Referring Visual 
Grounding](https://openaccess.thecvf.com/content/ICCV2025/papers/Ouyang_Region-aware_Anchoring_Mechanism_for_Efficient_Referring_Visual_Grounding_ICCV_2025_paper.pdf)\n* REC\n  * [Referring Expression Comprehension for Small Objects](http://arxiv.org/abs/2510.03701)\n  * [Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension](https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Leveraging_Debiased_Cross-modal_Attention_Maps_and_Code-based_Reasoning_for_Zero-shot_ICCV_2025_paper.pdf)\n\n\u003ca name=\"42\"/\u003e\n\n## 42.Vision Transformer\n* [Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features](http://arxiv.org/pdf/2506.21046v1)\u003cbr\u003e:star:[code](https://github.com/spencerwooo/dSVA)\n* [Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy](https://arxiv.org/pdf/2507.13260v1)\n* [EA-ViT: Efficient Adaptation for Elastic Vision Transformer](https://arxiv.org/pdf/2507.19360v1)\u003cbr\u003e:star:[code](https://github.com/zcxcf/EA-ViT)\n* [MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective](https://arxiv.org/pdf/2507.19131v1)\n* [OminiControl Minimal and Universal Control for Diffusion Transformer](http://arxiv.org/abs/2411.15098)\n* [Pinco Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting](http://arxiv.org/abs/2412.03812)\n* [SAFER Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers](http://arxiv.org/abs/2501.01529)\n* [OmniCache A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models](https://openaccess.thecvf.com/content/ICCV2025/papers/Chu_OmniCache_A_Trajectory-Oriented_Global_Perspective_on_Training-Free_Cache_Reuse_for_ICCV_2025_paper.pdf)\n* [Sparse Fine-Tuning of Transformers for 
Generative Tasks](http://arxiv.org/abs/2507.10855)\n* [MaTe Images Are All You Need for Material Transfer via Diffusion Transformer](https://openaccess.thecvf.com/content/ICCV2025/papers/Huang_MaTe_Images_Are_All_You_Need_for_Material_Transfer_via_ICCV_2025_paper.pdf)\n* [Hybrid Layout Control for Diffusion Transformer Fewer Annotations Superior Aesthetics](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Hybrid_Layout_Control_for_Diffusion_Transformer_Fewer_Annotations_Superior_Aesthetics_ICCV_2025_paper.pdf)\n* [UniCombine Unified Multi-Conditional Combination with Diffusion Transformer](http://arxiv.org/abs/2503.09277)\n* [EasyControl Adding Efficient and Flexible Control for Diffusion Transformer](http://arxiv.org/abs/2503.07027)\n* [Accelerating Diffusion Transformer via Gradient-Optimized Cache](http://arxiv.org/abs/2503.05156)\u003cbr\u003e:star:[code](https://github.com/qiujx0520/GOC_ICCV2025.git)\n* [LeGrad An Explainability Method for Vision Transformers via Feature Formation Sensitivity](http://arxiv.org/abs/2404.03214)\n* [An Efficient Hybrid Vision Transformer for TinyML Applications](https://openaccess.thecvf.com/content/ICCV2025/papers/Zeng_An_Efficient_Hybrid_Vision_Transformer_for_TinyML_Applications_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/yuffeenn/TinyNeXt)\n* [MixA A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge](https://openaccess.thecvf.com/content/ICCV2025/papers/Ahmed_MixA_A_Mixed_Attention_approach_with_Stable_Lightweight_Linear_Attention_ICCV_2025_paper.pdf)\n\n\n\u003ca name=\"41\"/\u003e\n\n## 41.Neural Architecture Search\n* [Neural Architecture Search Driven by Locally Guided Diffusion for Personalized Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Liao_Neural_Architecture_Search_Driven_by_Locally_Guided_Diffusion_for_Personalized_ICCV_2025_paper.pdf)\n* [Loss Functions for Predictor-based 
Neural Architecture Search](http://arxiv.org/abs/2506.05869)\n* [TRNAS A Training-Free Robust Neural Architecture Search](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_TRNAS_A_Training-Free_Robust_Neural_Architecture_Search_ICCV_2025_paper.pdf)\n\n\u003ca name=\"40\"/\u003e\n\n## 40.Deep learning\n* Capsule Networks\n  * [EquiCaps Predictor-Free Pose-Aware Pre-Trained Capsule Networks](http://arxiv.org/abs/2506.09895)\u003cbr\u003e:star:[code](https://github.com/AberdeenML/EquiCaps)\n* RNN\n  * [ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers](http://arxiv.org/pdf/2506.21537v1)\n\n\u003ca name=\"39\"/\u003e\n\n## 39.Machine learning\n* Machine Unlearning\n  * [MUNBa Machine Unlearning via Nash Bargaining](http://arxiv.org/abs/2411.15537)\n  * [Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels](http://arxiv.org/abs/2503.13917)\n  * [Learning to Unlearn while Retaining Combating Gradient Conflicts in Machine Unlearning](http://arxiv.org/abs/2503.06339)\n  * [Reminiscence Attack on Residuals Exploiting Approximate Machine Unlearning for Privacy](http://arxiv.org/abs/2507.20573)\n* Active Learning\n  * [To Label or Not to Label: PALM -- A Predictive Model for Evaluating Sample Efficiency in Active Learning Models](https://arxiv.org/pdf/2507.15381v1)\u003cbr\u003e:star:[code](https://github.com/juliamachnio/PALM)\n  * [Consensus-Driven Active Model Selection](https://arxiv.org/pdf/2507.23771v1)\u003cbr\u003e:star:[code](https://github.com/justinkay/coda)\n* Contrastive Learning\n  * [Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision](http://arxiv.org/pdf/2506.20850v1)\n  * [Selective Contrastive Learning for Weakly Supervised Affordance Grounding](https://arxiv.org/pdf/2508.07877v1)\n  * [Fix-CLIP Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long 
Text](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Fix-CLIP_Dual-Branch_Hierarchical_Contrastive_Learning_via_Synthetic_Captions_for_Better_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/bcwang-sjtu/Fix-CLIP)\n  * [Robust Dataset Condensation using Supervised Contrastive Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Robust_Dataset_Condensation_using_Supervised_Contrastive_Learning_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/DISL-Lab/RDC-ICCV2025)\n  * [Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning](http://arxiv.org/abs/2507.12998)\u003cbr\u003e:star:[code](https://github.com/MediaBrain-SJTU/DISSect)\n  * [Backdooring Self-Supervised Contrastive Learning by Noisy Alignment](http://arxiv.org/abs/2508.14015)\u003cbr\u003e:star:[code](https://github.com/jsrdcht/Noisy-Alignment)\n  * [Salvaging the Overlooked Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection](http://arxiv.org/abs/2412.04769)\n  * [AMD Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction](http://arxiv.org/abs/2507.01801)\n* Reinforcement Learning\n  * [RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment](http://arxiv.org/pdf/2506.21037v1)\n  * [Reinforcement Learning-Guided Data Selection via Redundancy Assessment](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_Reinforcement_Learning-Guided_Data_Selection_via_Redundancy_Assessment_ICCV_2025_paper.pdf)\n  * [RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction](https://arxiv.org/pdf/2507.04839v1)\u003cbr\u003e:star:[code](https://github.com/fraunhoferhhi/RIPE)\n  * [DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding](https://arxiv.org/pdf/2508.08589v1)\u003cbr\u003e:star:[code](https://github.com/wenwenyu/DocThinker)\n  * [DeepMesh 
Auto-Regressive Artist-mesh Creation with Reinforcement Learning](http://arxiv.org/abs/2503.15265)\n  * [ULTHO Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning](http://arxiv.org/abs/2503.06101)\n  * [Disentangled World Models Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning](http://arxiv.org/abs/2503.08751)\n  * [One Encoder to Rule them All Representation Learning for Model-free Visual Reinforcement Learning using Fourier Neural Operators](https://openaccess.thecvf.com/content/ICCV2025/papers/Dutta_One_Encoder_to_Rule_them_All_Representation_Learning_for_Model-free_ICCV_2025_paper.pdf)\n  * [Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Lee_Diffusion_Guided_Adaptive_Augmentation_for_Generalization_in_Visual_Reinforcement_Learning_ICCV_2025_paper.pdf)\n  * [GenFlowRL Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning](http://arxiv.org/abs/2508.11049)\u003cbr\u003e:house:[project](https://colinyu1.github.io/genflowrl/)\n* Continual Learning\n  * [CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization](http://arxiv.org/pdf/2506.21117v1)\u003cbr\u003e:star:[code](https://cl-splats.github.io)\n  * [PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning](https://arxiv.org/pdf/2507.12305v1)\u003cbr\u003e:star:[code](https://github.com/anwarmaxsum/PROL)\n  * [Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning](https://arxiv.org/pdf/2507.09118v1)\u003cbr\u003e:star:[code](https://github.com/linlany/MindtheGap)\n  * [RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning](https://arxiv.org/pdf/2507.22553v1)\n  * [Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language 
Models](https://arxiv.org/pdf/2508.00260v1)\n  * [Divide-and-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-supervised Continual Learning](https://arxiv.org/pdf/2508.05316v1)\u003cbr\u003e:star:[code](https://github.com/NJUyued/USP4SSCL)\n  * [Any-SSR How Recursive Least Squares Works in Continual Learning of Large Language Model](https://openaccess.thecvf.com/content/ICCV2025/papers/Tong_Any-SSR_How_Recursive_Least_Squares_Works_in_Continual_Learning_of_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/ZHUANGHP/Any-SSR)\n  * [Joint Diffusion Models in Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Skiers_Joint_Diffusion_Models_in_Continual_Learning_ICCV_2025_paper.pdf)\n  * [PLAN Proactive Low-Rank Allocation for Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_PLAN_Proactive_Low-Rank_Allocation_for_Continual_Learning_ICCV_2025_paper.pdf)\n  * [CODE-CL Conceptor-Based Gradient Projection for Deep Continual Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Apolinario_CODE-CL_Conceptor-Based_Gradient_Projection_for_Deep_Continual_Learning_ICCV_2025_paper.pdf)\n  * [FedAGC Federated Continual Learning with Asymmetric Gradient Correction](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_FedAGC_Federated_Continual_Learning_with_Asymmetric_Gradient_Correction_ICCV_2025_paper.pdf)\n* Adversarial Learning\n  * [TITAN Query-Token based Domain Adaptive Adversarial Learning](http://arxiv.org/abs/2506.21484)\u003cbr\u003e:star:[code](https://github.com/Tajamul21/TITAN)\n  * [ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models](https://arxiv.org/pdf/2507.21985v1)\n  * [Pretend Benign A Stealthy Adversarial Attack by Exploiting 
Vulnerabilities in Cooperative Perception](https://openaccess.thecvf.com/content/ICCV2025/papers/Lin_Pretend_Benign_A_Stealthy_Adversarial_Attack_by_Exploiting_Vulnerabilities_in_ICCV_2025_paper.pdf)\n  * [KOEnsAttack Towards Efficient Data-Free Black-Box Adversarial Attacks via Knowledge-Orthogonalized Substitute Ensembles](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_KOEnsAttack_Towards_Efficient_Data-Free_Black-Box_Adversarial_Attacks_via_Knowledge-Orthogonalized_Substitute_ICCV_2025_paper.pdf)\n  * [SMP-Attack Boosting the Transferability of Feature Importance-based Adversarial Attack with Semantics-aware Multi-granularity Patchout](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_SMP-Attack_Boosting_the_Transferability_of_Feature_Importance-based_Adversarial_Attack_with_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/AdvML-Group/SMP-Attack)\n  * [DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion](https://arxiv.org/pdf/2507.22813v1)\u003cbr\u003e:star:[code](https://github.com/AdaptiveMotorControlLab/DISTIL)\n  * [Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights](https://arxiv.org/pdf/2508.00649v1)\u003cbr\u003e:star:[code](https://github.com/Gandolfczjh/APDE)\n  * [Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance](http://arxiv.org/abs/2508.15650)\u003cbr\u003e:star:[code](https://github.com/AIASLab/CFG-ICCV2025)\n  * [Boosting Adversarial Transferability via Residual Perturbation Attack](https://arxiv.org/pdf/2508.05689v1)\u003cbr\u003e:star:[code](https://github.com/ZezeTao/ResPA)\n  * [Confound from All Sides Distill with Resilience Multi-Objective Adversarial Paths to Zero-Shot Robustness](https://openaccess.thecvf.com/content/ICCV2025/papers/Dong_Confound_from_All_Sides_Distill_with_Resilience_Multi-Objective_Adversarial_Paths_ICCV_2025_paper.pdf)\n  * [Adversarial Training for Probabilistic 
Robustness](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Adversarial_Training_for_Probabilistic_Robustness_ICCV_2025_paper.pdf)\n  * [Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination](https://openaccess.thecvf.com/content/ICCV2025/papers/Pan_Mitigating_Catastrophic_Overfitting_in_Fast_Adversarial_Training_via_Label_Information_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/fzjcdt/LIET)\n  * [Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment](http://arxiv.org/abs/2408.06079)\u003cbr\u003e:star:[code](https://github.com/KejiaZhang-Robust/DHAT)\n  * [Adversarial Exploitation of Data Diversity Improves Visual Localization](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Adversarial_Exploitation_of_Data_Diversity_Improves_Visual_Localization_ICCV_2025_paper.pdf)\n  * [FedPall Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift](http://arxiv.org/abs/2507.04781)\n  * [Adversarial Robust Memory-Based Continual Learner](http://arxiv.org/abs/2311.17608)\n  * [ViT-EnsembleAttack Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers](https://openaccess.thecvf.com/content/ICCV2025/papers/Cao_ViT-EnsembleAttack_Augmenting_Ensemble_Models_for_Stronger_Adversarial_Transferability_in_Vision_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/Trustworthy-AI-Group/TransferAttack)\n  * [CIARD Cyclic Iterative Adversarial Robustness Distillation](http://arxiv.org/abs/2509.12633)\u003cbr\u003e:star:[code](https://github.com/CIARD2025/CIARD)\n  * [Failure Cases Are Better Learned But Boundary Says Sorry Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training](http://arxiv.org/abs/2508.02186)\u003cbr\u003e:star:[code](https://github.com/FlaAI/RPAT)\n  * [Backdoor Mitigation by Distance-Driven 
Detoxification](http://arxiv.org/abs/2411.09585)\n  * [Mind the Cost of Scaffold Benign Clients May Even Become Accomplices of Backdoor Attack](http://arxiv.org/abs/2411.16167)\n  * [Prototype Guided Backdoor Defense via Activation Space Manipulation](https://openaccess.thecvf.com/content/ICCV2025/papers/Amula_Prototype_Guided_Backdoor_Defense_via_Activation_Space_Manipulation_ICCV_2025_paper.pdf)\n  * [Leveraging Spatial Invariance to Boost Adversarial Transferability](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhou_Leveraging_Spatial_Invariance_to_Boost_Adversarial_Transferability_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/TheMoss7/SID)\n  * [SPD Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Yuan_SPD_Shallow_Backdoor_Protecting_Deep_Backdoor_Against_Backdoor_Detection_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/YuanShunJie1/SPD)\n  * [Backdoor Defense via Enhanced Splitting and Trap Isolation](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_Backdoor_Defense_via_Enhanced_Splitting_and_Trap_Isolation_ICCV_2025_paper.pdf)\n  * [Backdoor Attacks on Neural Networks via One-Bit Flip](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Backdoor_Attacks_on_Neural_Networks_via_One-Bit_Flip_ICCV_2025_paper.pdf)\n  * [Seal Your Backdoor with Variational Defense](https://openaccess.thecvf.com/content/ICCV2025/papers/Sabolic_Seal_Your_Backdoor_with_Variational_Defense_ICCV_2025_paper.pdf)\n  * [Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling](https://openaccess.thecvf.com/content/ICCV2025/papers/Niu_Enhancing_Adversarial_Transferability_by_Balancing_Exploration_and_Exploitation_with_Gradient-Guided_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/anuin-cat/GGS)\n  * [Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient 
Competition and Spatial Distance Stretching](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Enhancing_Transferability_of_Targeted_Adversarial_Examples_via_Inverse_Target_Gradient_ICCV_2025_paper.pdf)\n  * [Boosting Adversarial Transferability via Negative Hessian Trace Regularization](https://openaccess.thecvf.com/content/ICCV2025/papers/Long_Boosting_Adversarial_Transferability_via_Negative_Hessian_Trace_Regularization_ICCV_2025_paper.pdf)\n  * [Unified Adversarial Augmentation for Improving Palmprint Recognition](https://openaccess.thecvf.com/content/ICCV2025/papers/Jin_Unified_Adversarial_Augmentation_for_Improving_Palmprint_Recognition_ICCV_2025_paper.pdf)\n  * [DIA The Adversarial Exposure of Deterministic Inversion in Diffusion Models](http://arxiv.org/abs/2510.00778)\n  * [Generative Adversarial Diffusion](https://openaccess.thecvf.com/content/ICCV2025/papers/Jun_Generative_Adversarial_Diffusion_ICCV_2025_paper.pdf)\n  * [ODDR Outlier Detection \u0026 Dimension Reduction Based Defense Against Adversarial Patches](http://arxiv.org/abs/2311.12084)\n  * [Scaling and Taming Adversarial Training with Synthetic Data](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Scaling_and_Taming_Adversarial_Training_with_Synthetic_Data_ICCV_2025_paper.pdf)\n* Multimodal Learning\n  * [G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation](http://arxiv.org/pdf/2506.21514v1)\u003cbr\u003e:star:[code](https://github.com/rAIson-Lab/G2D)\n  * [Improving Multimodal Learning via Imbalanced Learning](https://arxiv.org/pdf/2507.10203v1)\u003cbr\u003e:star:[code](https://github.com/shicaiwei123/ICCV2025-ARL)\n  * [SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality](https://arxiv.org/pdf/2507.19264v1)\n  * [Unbiased Missing-modality Multimodal Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Dai_Unbiased_Missing-modality_Multimodal_Learning_ICCV_2025_paper.pdf)\u003cbr\u003e:house:[project](https://crystal-punk.github.io/)\n  * 
[Boosting Multimodal Learning via Disentangled Gradient Learning](https://arxiv.org/pdf/2507.10213v1)\u003cbr\u003e:star:[code](https://github.com/shicaiwei123/ICCV2025-GDL)\n  * [OpenVision A Fully-Open Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning](http://arxiv.org/abs/2505.04601)\n* Multi-Task Learning\n  * [Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning](https://arxiv.org/pdf/2507.07485v1)\n  * [Beyond Losses Reweighting Empowering Multi-Task Learning via the Generalization Perspective](http://arxiv.org/abs/2211.13723)\n  * [Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning](https://arxiv.org/pdf/2507.21049v1)\u003cbr\u003e:star:[code](https://jacky1128.github.io/RepMTL/)\n  * [TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction](https://arxiv.org/pdf/2508.04682v1)\n  * [ModalTune Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology](http://arxiv.org/abs/2503.17564)\n  * [Active Membership Inference Test (aMINT) Enhancing Model Auditability with Multi-Task Learning](http://arxiv.org/abs/2509.07879)\u003cbr\u003e:star:[code](https://github.com/DanieldeAlcala/Membership-Inference-Test.git)\n* Class-Incremental Learning\n  * [Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning](https://arxiv.org/pdf/2507.09183v1)\u003cbr\u003e:star:[code](https://github.com/Jywsuperman/LGSP)\n  * [Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning](https://arxiv.org/pdf/2508.08165v1)\u003cbr\u003e:star:[code](https://github.com/LAMDA-CL/ICCV2025-TUNA)\n  * [Achieving More with Less Additive Prompt Tuning for Rehearsal-Free Class-Incremental 
Learning](http://arxiv.org/abs/2503.07979)\n  * [Lark Low-Rank Updates After Knowledge Localization for Few-shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Shi_Lark_Low-Rank_Updates_After_Knowledge_Localization_for_Few-shot_Class-Incremental_Learning_ICCV_2025_paper.pdf)\n  * [A Tiny Change A Giant Leap Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Lai_A_Tiny_Change_A_Giant_Leap_Long-Tailed_Class-Incremental_Learning_via_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/laixinyi023/Geometric-Prototype-Alignment)\n  * [Task-Aware Prompt Gradient Projection for Parameter-Efficient Tuning Federated Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Ke_Task-Aware_Prompt_Gradient_Projection_for_Parameter-Efficient_Tuning_Federated_Class-Incremental_Learning_ICCV_2025_paper.pdf)\n  * [External Knowledge Injection for CLIP-Based Class-Incremental Learning](http://arxiv.org/abs/2503.08510)\u003cbr\u003e:star:[code](https://github.com/LAMDA-CL/ICCV25-ENGINE)\n  * [ESSENTIAL Episodic and Semantic Memory Integration for Video Class-Incremental Learning](http://arxiv.org/abs/2508.10896)\n  * [Flexi-FSCIL Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xie_Flexi-FSCIL_Adaptive_Knowledge_Retention_for_Breaking_the_Stability-Plasticity_Dilemma_in_ICCV_2025_paper.pdf)\n  * [Seeing 3D Through 2D Lenses 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification](http://arxiv.org/abs/2509.14958)\n  * [Feature Decomposition-Recomposition in Large Vision-Language Model for Few-Shot Class-Incremental 
Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Xue_Feature_Decomposition-Recomposition_in_Large_Vision-Language_Model_for_Few-Shot_Class-Incremental_Learning_ICCV_2025_paper.pdf)\n* Incremental Learning\n  * [Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning](https://arxiv.org/pdf/2507.21588v1)\u003cbr\u003e:star:[code](https://github.com/ENJOY-Yin-jiong/PHP)\n* Federated Learning\n  * [Federated Representation Angle Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Yi_Federated_Representation_Angle_Learning_ICCV_2025_paper.pdf)\n  * [Client2Vec Improving Federated Learning by Distribution Shifts Aware Client Indexing](http://arxiv.org/abs/2405.16233)\u003cbr\u003e:star:[code](https://github.com/LINs-lab/client2vec)\n  * [Geminio Language-Guided Gradient Inversion Attacks in Federated Learning](http://arxiv.org/abs/2411.14937)\n  * [Sibai A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Gotz_Sibai_A_Few-Shot_Meta-Classifier_for_Poisoning_Detection_in_Federated_Learning_ICCV_2025_paper.pdf)\n  * [You Are Your Own Best Teacher Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data](http://arxiv.org/abs/2503.06916)\n  * [Personalized Federated Learning under Local Supervision](https://openaccess.thecvf.com/content/ICCV2025/papers/Liu_Personalized_Federated_Learning_under_Local_Supervision_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/jqLi1626/FedSimSup)\n  * [FedWSQ Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization](http://arxiv.org/abs/2506.23516)\n  * [FedXDS Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated 
Learning](https://openaccess.thecvf.com/content/ICCV2025/papers/Hoefler_FedXDS_Leveraging_Model_Attribution_Methods_to_counteract_Data_Heterogeneity_in_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/MaxH1996/FedXDS)\n  * [FLSeg Enhancing Privacy and Robustness in Federated Learning under Heterogeneous Data via Model Segmentation](https://openaccess.thecvf.com/content/ICCV2025/papers/Su_FLSeg_Enhancing_Privacy_and_Robustness_in_Federated_Learning_under_Heterogeneous_ICCV_2025_paper.pdf)\n  * [Find a Scapegoat Poisoning Membership Inference Attack and Defense to Federated Learning](http://arxiv.org/abs/2507.00423)\n  * [Forgetting Through Transforming Enabling Federated Unlearning via Class-Aware Representation Transformation](http://arxiv.org/abs/2410.06848)\u003cbr\u003e:star:[code](https://github.com/zhentian777/FUCRT)\n  * [Latte Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning](http://arxiv.org/abs/2507.21494)\u003cbr\u003e:star:[code](https://github.com/baowenxuan/Latte)\n  * Federated Unlearning\n    * [Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment](https://openaccess.thecvf.com/content/ICCV2025/papers/Yang_Stealthy_Backdoor_Attack_in_Federated_Learning_via_Adaptive_Layer-wise_Gradient_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/yqqhyqq/LGA)\n* Meta-Learning\n  * [FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields](https://arxiv.org/pdf/2508.06301v1)\n  * [Meta-Unlearning on Diffusion Models Preventing Relearning Unlearned Concepts](http://arxiv.org/abs/2410.12777)\n* Out-of-Distribution Detection\n  * [Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention](https://arxiv.org/pdf/2507.01417v1)\n  * [NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection](https://arxiv.org/pdf/2507.09795v1)\u003cbr\u003e:star:[code](https://github.com/ah-ansari/NegRefine)\n  * [FEVER-OOD Free Energy Vulnerability 
Elimination for Robust Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Isaac-Medina_FEVER-OOD_Free_Energy_Vulnerability_Elimination_for_Robust_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)\n  * [Beyond Pixel Uncertainty Bounding the OoD Objects in Road Scenes](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhu_Beyond_Pixel_Uncertainty_Bounding_the_OoD_Objects_in_Road_Scenes_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/huachao0124/DetSeg-official)\n  * [ODP-Bench Benchmarking Out-of-Distribution Performance Prediction](https://openaccess.thecvf.com/content/ICCV2025/papers/Yu_ODP-Bench_Benchmarking_Out-of-Distribution_Performance_Prediction_ICCV_2025_paper.pdf)\n  * [A Unified Interpretation of Training-Time Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Cheng_A_Unified_Interpretation_of_Training-Time_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)\n  * [Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection](http://arxiv.org/abs/2507.10225)\u003cbr\u003e:star:[code](https://github.com/Jarvisgivemeasuit/SynOOD)\n  * [Activation Subspaces for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zongur_Activation_Subspaces_for_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)\n  * [Diagnosing Pretrained Models for Out-of-distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Xiong_Diagnosing_Pretrained_Models_for_Out-of-distribution_Detection_ICCV_2025_paper.pdf)\n  * [Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection](http://arxiv.org/abs/2510.10584)\n  * [DisCoPatch Taming Adversarially-driven Batch Statistics for Improved Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Caetano_DisCoPatch_Taming_Adversarially-driven_Batch_Statistics_for_Improved_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)\n  * [Secure 
On-Device Video OOD Detection Without Backpropagation](http://arxiv.org/abs/2503.06166)\u003cbr\u003e:star:[code](https://github.com/Dystopians/SecDOOD)\n  * [FA Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection](http://arxiv.org/abs/2507.04511)\u003cbr\u003e:star:[code](https://github.com/0xFAFA/FA)\n  * [Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Adaptive_Prompt_Learning_via_Gaussian_Outlier_Synthesis_for_Out-of-distribution_Detection_ICCV_2025_paper.pdf)\n  * [Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Miao_Auxiliary_Prompt_Tuning_of_Vision-Language_Models_for_Few-Shot_Out-of-Distribution_Detection_ICCV_2025_paper.pdf)\n* Anomaly Detection\n  * [Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts](https://arxiv.org/pdf/2507.16946v1)\u003cbr\u003e:house:[project](https://doi.org/10.5281/zenodo.16283852)\n  * [DecAD Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_DecAD_Decoupling_Anomalies_in_Latent_Space_for_Multi-Class_Unsupervised_Anomaly_ICCV_2025_paper.pdf)\n  * [Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning](http://arxiv.org/abs/2508.02293)\n  * [Wave-MambaAD Wavelet-driven State Space Model for Multi-class Unsupervised Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Wave-MambaAD_Wavelet-driven_State_Space_Model_for_Multi-class_Unsupervised_Anomaly_Detection_ICCV_2025_paper.pdf)\n  * [Debiasing Trace Guidance Top-down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly 
Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Debiasing_Trace_Guidance_Top-down_Trace_Distillation_and_Bottom-up_Velocity_Alignment_ICCV_2025_paper.pdf)\n  * [MultiADS Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning](http://arxiv.org/abs/2504.06740)\n  * [Triad Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process](https://openaccess.thecvf.com/content/ICCV2025/papers/Li_Triad_Empowering_LMM-based_Anomaly_Detection_with_Expert-guided_Region-of-Interest_Tokenizer_and_ICCV_2025_paper.pdf)\n  * [SALAD -- Semantics-Aware Logical Anomaly Detection](https://openaccess.thecvf.com/content/ICCV2025/papers/Fucka_SALAD_--_Semantics-Aware_Logical_Anomaly_Detection_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/MaticFuc/SALAD)\n  * [SiM3D Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark](http://arxiv.org/abs/2506.21549)\n  * [Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection](http://arxiv.org/abs/2410.10289)\n* Representation Learning\n  * [Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM) A Task-Adaptive Representation Learning Framework](https://openaccess.thecvf.com/content/ICCV2025/papers/Sharma_Multi-Modal_Multi-Task_Unified_Embedding_Model_M3T-UEM_A_Task-Adaptive_Representation_Learning_ICCV_2025_paper.pdf)\n  * [LayerLock Non-collapsing Representation Learning with Progressive Freezing](http://arxiv.org/abs/2509.10156)\n  * [CARL Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor](http://arxiv.org/abs/2506.04001)\n  * [Pretrained Reversible Generation as Unsupervised Visual Representation Learning](http://arxiv.org/abs/2412.01787)\u003cbr\u003e:house:[project](https://opendilab.github.io/PRG)\n  * [Region-based Cluster Discrimination for Visual Representation 
Learning](https://arxiv.org/pdf/2507.20025v1)\u003cbr\u003e:star:[code](https://github.com/deepglint/MVT)\n  * [Gradient Extrapolation for Debiased Representation Learning](http://arxiv.org/abs/2503.13236)\u003cbr\u003e:house:[project](https://gerne-debias.github.io/)\n  * [Scaling Language-Free Visual Representation Learning](http://arxiv.org/abs/2504.01017)\u003cbr\u003e:star:[code](https://github.com/facebookresearch/webssl)\n  * [Q-Norm Robust Representation Learning via Quality-Adaptive Normalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Q-Norm_Robust_Representation_Learning_via_Quality-Adaptive_Normalization_ICCV_2025_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/IIP-Lab-XDU/Q-Norm)\n  * [Scaling Omni-modal Pretraining with Multimodal Context Advancing Universal Representation Learning Across Modalities](https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Scaling_Omni-modal_Pretraining_with_Multimodal_Context_Advancing_Universal_Representation_Learning_ICCV_2025_paper.pdf)\n* Prompt Learning\n  * [Advancing Textual Prompt Learning with Anchored Attributes](http://arxiv.org/abs/2412.09442)\u003cbr\u003e:star:[code](https://github.com/zhengli97/ATPrompt)\n\n\u003ca name=\"38\"/\u003e\n\n## 38.Few/Zero-Shot Learning/DG/Adaptation(Few-/Zero-Shot, Domain Generalization, Adaptation)\n* Zero-Shot\n  * [Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model](https://arxiv.org/pdf/2506.23822v1)\u003cbr\u003e:star:[code](https://github.com/shiming-chen/LaZSL)\n  * [OBSER: Object-Based Sub-Environment Recognition for Zero-Shot Environmental Inference](https://arxiv.org/pdf/2507.02929v1)\n  * [Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity](https://openaccess.thecvf.com/content/ICCV2025/papers/Wang_Language-Driven_Multi-Label_Zero-Shot_Learning_with_Semantic_Granularity_ICCV_2025_paper.pdf)\n  * [A Conditional Probability Framework for Compositional Zero-shot Learning](http://arxiv.org/abs/2507.17377)\n  * [SVIP Semantically 
Contextualized Visual Patches for Zero-Shot Learning](http://arxiv.org/abs/2503.10252)\u003cbr\u003e:star:[code](https://github.com/uqzhichen/SVIP)\n  * [Learning Visual Proxy for Compositional Zero-Shot Learning](http://arxiv.org/abs/2501.13859)\n  * [Verbalized Representation Learning for Interpretable Few-Shot Generalization](http://arxiv.org/abs/2411.18651)\n  * [Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization](https://openaccess.thecvf.com/content/ICCV2025/papers/Wu_Hierarchical_Variational_Test-Time_Prompt_Generation_for_Zero-Shot_Generalization_ICCV_2025_paper.pdf)\n* Few-Shot","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52cv%2Ficcv-2025-papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F52cv%2Ficcv-2025-papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52cv%2Ficcv-2025-papers/lists"}