Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/52CV/CVPR-2024-Papers
Last synced: 3 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/52CV/CVPR-2024-Papers
- Owner: 52CV
- Created: 2023-11-29T06:53:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-12T03:27:45.000Z (8 months ago)
- Last Synced: 2024-04-12T13:24:29.590Z (8 months ago)
- Size: 113 KB
- Stars: 149
- Watchers: 5
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Text2X-Resources - Link
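The header above comes from ecosyste.ms, which serves each indexed project's data as JSON over its open API. As a minimal sketch of fetching that record and reading back the fields mirrored in the metadata block, assuming a lookup-style endpoint (the URL and field names below are assumptions, not a documented contract; substitute the "JSON representation" link shown on the project page):

```python
import json
import urllib.parse
import urllib.request

# Hypothetical lookup endpoint; replace with the "JSON representation" URL from the page.
BASE = "https://awesome.ecosyste.ms/api/v1/projects/lookup"
query = urllib.parse.urlencode({"url": "https://github.com/52CV/CVPR-2024-Papers"})

with urllib.request.urlopen(f"{BASE}?{query}", timeout=30) as resp:
    project = json.load(resp)

# Field names are guesses based on the metadata shown above, not a documented schema.
for key in ("url", "stargazers_count", "forks_count", "last_synced_at"):
    print(f"{key}: {project.get(key)}")
```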
README
# CVPR-2024-Papers
![homepage_image](https://github.com/52CV/CVPR-2024-Papers/assets/62801906/41a45750-bca8-4cb8-89dc-a04b0bbe7b2c)
## Official website: https://cvpr.thecvf.com/
### Workshops :bell:: June 17-18
### Main conference :bell:: June 19-21
## Categorized survey papers from previous years are collected here ↘️[CV-Surveys](https://github.com/52CV/CV-Surveys) (under construction)
## 2024 papers by conference
↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)
↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)
↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)
## 2023 papers by conference
↘️[CVPR-2023-Papers](https://github.com/52CV/CVPR-2023-Papers)
↘️[WACV-2023-Papers](https://github.com/52CV/WACV-2023-Papers)
↘️[ICCV-2023-Papers](https://github.com/52CV/ICCV-2023-Papers)
## [2022 papers by category](#000)
## [2021 papers by category](#00)
## [2020 papers by category](#0)
## 💥💥💥 All accepted papers have been added and fully categorized!
### 🏆Best Papers
* [Generative Image Dynamics](https://arxiv.org/abs/2309.07906)
:house:[project](https://generative-dynamics.github.io/)
* [Rich Human Feedback for Text-to-Image Generation](http://arxiv.org/abs/2312.10240)
### 🏅Best Paper Runners-Up
* [EventPS: Real-Time Photometric Stereo Using an Event Camera](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_EventPS_Real-Time_Photometric_Stereo_Using_an_Event_Camera_CVPR_2024_paper.pdf)
* [pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction](http://arxiv.org/abs/2312.12337)
### 🥇Best Student Papers
* [Mip-Splatting: Alias-free 3D Gaussian Splatting](https://arxiv.org/abs/2311.16493)
:star:[code](https://github.com/autonomousvision/mip-splatting)
:house:[project](https://niujinshuchong.github.io/mip-splatting/)
* [BioCLIP: A Vision Foundation Model for the Tree of Life](https://arxiv.org/abs/2311.18803)
:star:[code](https://github.com/Imageomics/bioclip)
### 🥈Best Student Paper Runners-Up
* [SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency](https://openaccess.thecvf.com/content/CVPR2024/papers/Roetzer_SpiderMatch_3D_Shape_Matching_with_Global_Optimality_and_Geometric_Consistency_CVPR_2024_paper.pdf)
* [Image Processing GNN: Breaking Rigidity in Super-Resolution](https://openaccess.thecvf.com/content/CVPR2024/papers/Tian_Image_Processing_GNN_Breaking_Rigidity_in_Super-Resolution_CVPR_2024_paper.pdf)
* [Objects as Volumes: A Stochastic Geometry View of Opaque Solids](http://arxiv.org/abs/2312.15406)
* [Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods](https://arxiv.org/abs/2212.06872)
## Table of Contents
|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|[1.其它(other)](#1)|[2.Image Segmentation(图像分割)](#2)|[3.Image Classification(图像分类)](#3)|[4.Image/Video Super-Resolution(图像超分辨率)](#4)|
|[5.Image/Video Compression(图像/视频压缩)](#5)|[6.Image/Video Captioning(图像/视频字幕)](#6)|[7.Image Processing(图像处理)](#7)|[8.Image Synthesis(图像生成)](#8)|
|[9.Face(人脸)](#9)|[10.Medical Image Processing(医学影像处理)](#10)|[11.3D](#11)|[12.Video](#12)|
|[13.HPE(人体姿态估计)](#13)|[14.HAR(人体动作识别检测)](#14)|[15.Object Detection(目标检测)](#15)|[16.Point Cloud(点云)](#16)|
|[17.Automated Driving(自动驾驶)](#17)|[18.SLAM/AR/VR/Robotics(增强/虚拟现实/机器人)](#18)|[19.Object Pose Estimation(物体姿态估计)](#19)|[20.Optical Flow Estimation(光流估计)](#20)|
|[21.Few/Zero-Shot Learning/DG/A(小/零样本/域泛化/域适应)](#21)|[22.Deepfake Detection](#22)|[23.Sound(语音处理)](#23)|[24.ML(机器学习)](#24)|
|[25.Object Tracking(目标跟踪)](#25)|[26.Information Security(信息安全)](#26)|[27.Vision-Language(视觉语言)](#27)|[28.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)](#28)|
|[29.MC/KD/Pruning(模型压缩/知识蒸馏/剪枝)](#29)|[30.Person Re-Id(人员重识别)](#30)|[31.Edge Detection(边缘检测)](#31)|[32.NLP(自然语言处理)](#32)|
|[33.NeRF](#33)|[34.Human–Computer Interaction(人机交互)](#34)|[35.Scene Understanding(场景理解)](#35)|[36.4D Reconstruction(4D 重建)](#36)|
|[37.OCR](#37)|[38.VQA(视觉问答)](#38)|[39.Motion Generation(动作生成)](#39)|[40.Scene Graph Generation(场景图生成)](#40)|
|[41.Graph Generative Network(GNN/GCN)](#41)|[42.Image Retrieval(图像检索)](#42)|[43.Image Matching(图像匹配)](#43)|[44.Image Fusion(图像融合)](#44)|
|[45.NAS(神经架构搜索)](#45)|[46.Industrial Anomaly Detection(工业缺陷检测)](#46)|[47.Dense Predictions(密集预测)](#47)|[48.Semi/self-supervised learning(半/自监督)](#48)|
|[49.Dataset(数据集)](#49)|[50.OOD Detection](#50)|[51.Style Transfer(风格迁移)](#51)|[52.Biomedical](#52)|
|[53.Light-Field(光场)](#53)|[54.ViT](#54)|[55.REC(指代表达理解)](#55)|[56.Visual emotion recognition(视觉情绪识别)](#56)|
|[57.Visual Relationship Detection(视觉关系检测)](#57)|[58.Fisheye Images(鱼眼图像)](#58)|[59.Clustering(聚类)](#59)|[60.Sketch(草图)](#60)|
|[61.Gaze](#61)|[62.All-in-One(全家桶)](#62)|
## 62.All-in-One(全家桶)
* [UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition](https://arxiv.org/abs/2311.15599)
:star:[code](https://github.com/AILab-CVC/UniRepLKNet)用于音频、视频、点云、时间序列和图像识别的通用感知大内核卷积网络
* [GPT4Point: A Unified Framework for Point-Language Understanding and Generation](https://arxiv.org/abs/2312.02980)点语言理解和生成的统一框架
* [AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond](https://arxiv.org/abs/2311.16468)
## 61.Gaze
* [Sharingan: A Transformer Architecture for Multi-Person Gaze Following](https://arxiv.org/abs/2310.00816)目光跟随
* [From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Bao_From_Feature_to_Gaze_A_Generalizable_Replacement_of_Linear_Layer_CVPR_2024_paper.pdf)
## 60.Sketch(草图)
* [What Sketch Explainability Really Means for Downstream Tasks](http://arxiv.org/abs/2403.09480v1)
* [SketchINR: A First Look into Sketches as Implicit Neural Representations](https://arxiv.org/abs/2403.09344)
* [Open Vocabulary Semantic Scene Sketch Understanding](https://arxiv.org/abs/2312.12463)草图理解
* [CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention](https://arxiv.org/abs/2402.17678)
## 59.Clustering(聚类)
* [MoDE: CLIP Data Experts via Clustering](http://arxiv.org/abs/2404.16030)聚类
* [Fine-Grained Bipartite Concept Factorization for Clustering](https://openaccess.thecvf.com/content/CVPR2024/papers/Peng_Fine-Grained_Bipartite_Concept_Factorization_for_Clustering_CVPR_2024_paper.pdf)
* Multi-view clustering
* [Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios](https://arxiv.org/abs/2303.17245)
* [Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Learn_from_View_Correlation_An_Anchor_Enhancement_Strategy_for_Multi-view_CVPR_2024_paper.pdf)
* [Differentiable Information Bottleneck for Deterministic Multi-view Clustering](https://arxiv.org/abs/2403.15681)
## 58.Fisheye Images(鱼眼图像)
* [Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption](https://openaccess.thecvf.com/content/CVPR2024/papers/Wakai_Deep_Single_Image_Camera_Calibration_by_Heatmap_Regression_to_Recover_CVPR_2024_paper.pdf)
## 57.Visual Relationship Detection(视觉关系检测)
* [Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection](http://arxiv.org/abs/2403.17709v1)
:star:[code](https://github.com/mlvlab/SpeaQ)
## 56.Visual Emotion Recognition(视觉情绪识别)
* [EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning](https://export.arxiv.org/abs/2404.16670)
:star:[code](https://github.com/aimmemotion/EmoVIT)视觉情感理解
* Multimodal intent recognition
* [Contextual Augmented Global Contrast for Multimodal Intent Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_Contextual_Augmented_Global_Contrast_for_Multimodal_Intent_Recognition_CVPR_2024_paper.pdf)
## 55.Referring Expression Comprehension(指代表达理解)
* [ScanFormer: Referring Expression Comprehension by Iteratively Scanning](https://openaccess.thecvf.com/content/CVPR2024/papers/Su_ScanFormer_Referring_Expression_Comprehension_by_Iteratively_Scanning_CVPR_2024_paper.pdf)
* [Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions](https://arxiv.org/abs/2311.17048)
:star:[code](https://github.com/Show-han/Zeroshot_REC)零样本指代表达理解
* [Revisiting Counterfactual Problems in Referring Expression Comprehension](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Revisiting_Counterfactual_Problems_in_Referring_Expression_Comprehension_CVPR_2024_paper.pdf)
## 54.Vision Transformers
* [Dexterous Grasp Transformer](http://arxiv.org/abs/2404.18135)
* [Mean-Shift Feature Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Kobayashi_Mean-Shift_Feature_Transformer_CVPR_2024_paper.pdf)
* [MLP Can Be A Good Transformer Learner](https://arxiv.org/abs/2404.05657)
:star:[code](https://github.com/sihaoevery/lambda_vit)
* [Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers](http://arxiv.org/abs/2303.09383)
* [Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers](http://arxiv.org/abs/2404.07292)
* [Dual-scale Transformer for Large-scale Single-Pixel Imaging](http://arxiv.org/abs/2404.05001)
* [DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets](http://arxiv.org/abs/2404.02900)
* [Towards Understanding and Improving Adversarial Robustness of Vision Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Jain_Towards_Understanding_and_Improving_Adversarial_Robustness_of_Vision_Transformers_CVPR_2024_paper.pdf)
* [RMT: Retentive Networks Meet Vision Transformers](https://arxiv.org/abs/2309.11523)
:star:[code](https://github.com/qhfan/RMT)
* [You Only Need Less Attention at Each Stage in Vision Transformers](https://arxiv.org/abs/2406.00427)
* [MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers](https://arxiv.org/abs/2311.15475)
:house:[project](https://nihalsid.github.io/mesh-gpt/)
* [Instance-Aware Group Quantization for Vision Transformers](https://arxiv.org/abs/2404.00928)
:house:[project](https://cvlab.yonsei.ac.kr/projects/IGQ-ViT/)
* [Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers](http://arxiv.org/abs/2403.10030v1)
:star:[code](https://github.com/mlvlab/MCTF)
* [RepViT: Revisiting Mobile CNN From ViT Perspective](https://arxiv.org/abs/2307.09283)
:star:[code](https://github.com/THU-MIG/RepViT)
* [Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer](http://arxiv.org/abs/2403.14552v1)
* [Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers](https://arxiv.org/abs/2403.10574)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods](https://arxiv.org/abs/2212.06872)
* [On the Faithfulness of Vision Transformer Explanations](http://arxiv.org/abs/2404.01415v1)
* [Learning Correlation Structures for Vision Transformers](http://arxiv.org/abs/2404.03924v1)
* [Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach](https://arxiv.org/abs/2403.19067)
:star:[code](https://github.com/zstarN70/RLRR.git)
* [Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression](http://arxiv.org/abs/2403.15835v1)
* [Point Transformer V3: Simpler Faster Stronger](https://arxiv.org/abs/2312.10035)
:star:[code](https://github.com/Pointcept/PointTransformerV3)
* [A General and Efficient Training for Transformer via Token Expansion](http://arxiv.org/abs/2404.00672v1)
:star:[code](https://github.com/Osilly/TokenExpansion)
* [HEAL-SWIN: A Vision Transformer On The Sphere](https://arxiv.org/abs/2307.07313)
:star:[code](https://github.com/JanEGerken/HEAL-SWIN)
* [SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design](https://arxiv.org/abs/2401.16456)
* [TransNeXt: Robust Foveal Visual Perception for Vision Transformers](https://arxiv.org/abs/2311.17132)
:star:[code](https://github.com/DaiShiResearch/TransNeXt)
* [Making Vision Transformers Truly Shift-Equivariant](https://arxiv.org/abs/2305.16316)
* [Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities](https://arxiv.org/abs/2401.14405)
:star:[code](https://github.com/AILab-CVC/M2PT)
* [Random Entangled Tokens for Adversarially Robust Vision Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Gong_Random_Entangled_Tokens_for_Adversarially_Robust_Vision_Transformer_CVPR_2024_paper.pdf)
## 53.Light-Field(光场)
* [Time-Efficient Light-Field Acquisition Using Coded Aperture and Events](https://arxiv.org/abs/2403.07244)
:house:[project](https://www.fujii.nuee.nagoya-u.ac.jp/Research/EventLF/)
* [Continuous Pose for Monocular Cameras in Neural Implicit Representation](https://arxiv.org/abs/2311.17119)
:star:[code](https://github.com/qimaqi/Continuous-Pose-in-NeRF)
* [PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Tu_PanoPose_Self-supervised_Relative_Pose_Estimation_for_Panoramic_Images_CVPR_2024_paper.pdf)
:house:[project](http://www.3dv.ac.cn/en/publication/cvpr-b/)
* [Unbiased Estimator for Distorted Conics in Camera Calibration](http://arxiv.org/abs/2403.04583)
* Camera pose
* [Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences](https://arxiv.org/abs/2404.06337)
* [Map-Relative Pose Regression for Visual Re-Localization](http://arxiv.org/abs/2404.09884v1)
:star:[code](https://nianticlabs.github.io/marepo)
* [The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement](http://arxiv.org/abs/2404.10438v1)
:star:[code](https://github.com/ga1i13o/mcloc_poseref)
* Snapshot compressive imaging
* [DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model](https://arxiv.org/abs/2311.11417)
## 52.Biomedical
* [ManiFPT: Defining and Analyzing Fingerprints of Generative Models](https://arxiv.org/abs/2402.10401)
* [Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention Alignment and Prompt Tuning](https://openaccess.thecvf.com/content/CVPR2024/papers/Tiong_Flexible_Biometrics_Recognition_Bridging_the_Multimodality_Gap_through_Attention_Alignment_CVPR_2024_paper.pdf)生物识别
* Person identification
* [Activity-Biometrics: Person Identification from Daily Activities](http://arxiv.org/abs/2403.17360v1)
:star:[code](https://github.com/sacrcv/Activity-Biometrics/)
## 51.Style Transfer(风格迁移)
* [Z*: Zero-shot Style Transfer via Attention Reweighting](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_Z_Zero-shot_Style_Transfer_via_Attention_Reweighting_CVPR_2024_paper.pdf)
* [MoST: Motion Style Transformer Between Diverse Action Contents](http://arxiv.org/abs/2403.06225)
:star:[code](https://github.com/Boeun-Kim/MoST)
* [ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation](https://arxiv.org/abs/2312.02109)
:star:[code](https://github.com/cardinalblue/ArtAdapter)
:house:[project](https://cardinalblue.github.io/artadapter.github.io/)
* [Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_Arbitrary_Motion_Style_Transfer_with_Multi-condition_Motion_Latent_Diffusion_Model_CVPR_2024_paper.pdf)
* [Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer](https://arxiv.org/abs/2312.09008v2)
:house:[project](https://jiwoogit.github.io/StyleID_site)
* [Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network](https://arxiv.org/abs/2405.19775)
:thumbsup:[平衡效率与质量,南航提出新风格迁移算法Puff-Net](https://mp.weixin.qq.com/s/B-RkdeQNvIXmAYJMUkHkYQ)
* Zero-shot text-driven motion transfer
* [Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer](http://arxiv.org/abs/2311.17009)
:house:[project](https://diffusion-motion-transfer.github.io/)
## 50.OOD Detection
* [Test-Time Linear Out-of-Distribution Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Fan_Test-Time_Linear_Out-of-Distribution_Detection_CVPR_2024_paper.pdf)
* [Segment Every Out-of-Distribution Object](https://arxiv.org/abs/2311.16516)
* [Label-Efficient Group Robustness via Out-of-Distribution Concept Curation](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Label-Efficient_Group_Robustness_via_Out-of-Distribution_Concept_Curation_CVPR_2024_paper.pdf)
* [Enhancing the Power of OOD Detection via Sample-Aware Model Selection](https://openaccess.thecvf.com/content/CVPR2024/papers/Xue_Enhancing_the_Power_of_OOD_Detection_via_Sample-Aware_Model_Selection_CVPR_2024_paper.pdf)OOD
* [Discriminability-Driven Channel Selection for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Yuan_Discriminability-Driven_Channel_Selection_for_Out-of-Distribution_Detection_CVPR_2024_paper.pdf)
* [CORES: Convolutional Response-based Score for Out-of-distribution Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_CORES_Convolutional_Response-based_Score_for_Out-of-distribution_Detection_CVPR_2024_paper.pdf)
* [Learning Transferable Negative Prompts for Out-of-Distribution Detection](https://arxiv.org/abs/2404.03248)
:star:[code](https://github.com/mala-lab/negprompt)
* [A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?](http://arxiv.org/abs/2404.01775v1)
:star:[code](https://github.com/glhr/ood-labelnoise)
* [Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments](https://arxiv.org/abs/2403.01773)
* Anomaly detection
* [Hyperbolic Anomaly Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Hyperbolic_Anomaly_Detection_CVPR_2024_paper.pdf)
* [Universal Novelty Detection through Adaptive Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Mirzaei_Universal_Novelty_Detection_Through_Adaptive_Contrastive_Learning_CVPR_2024_paper.pdf)
* [Looking 3D: Anomaly Detection with 2D-3D Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Bhunia_Looking_3D_Anomaly_Detection_with_2D-3D_Alignment_CVPR_2024_paper.pdf)
## 49.Dataset(数据集)
* Datasets
* [Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Multiagent_Multitraversal_Multimodal_Self-Driving_Open_MARS_Dataset_CVPR_2024_paper.pdf)
* [4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_4D-DRESS_A_4D_Dataset_of_Real-World_Human_Clothing_With_Semantic_CVPR_2024_paper.pdf)
* [DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_DiLiGenRT_A_Photometric_Stereo_Dataset_with_Quantified_Roughness_and_Translucency_CVPR_2024_paper.pdf)
* [MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation](http://arxiv.org/abs/2404.02790)
* [LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs](http://arxiv.org/abs/2312.04372)
* [360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries](http://arxiv.org/abs/2311.17389)
* [Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline](http://arxiv.org/abs/2312.02528)
* [MSU-4S - The Michigan State University Four Seasons Dataset](https://openaccess.thecvf.com/content/CVPR2024/papers/Kent_MSU-4S_-_The_Michigan_State_University_Four_Seasons_Dataset_CVPR_2024_paper.pdf)
* [DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields](https://openaccess.thecvf.com/content/CVPR2024/papers/Lu_DiVa-360_The_Dynamic_Visual_Dataset_for_Immersive_Neural_Fields_CVPR_2024_paper.pdf)
* [Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline](http://arxiv.org/abs/2309.14611)
* [LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_LiDAR-Net_A_Real-scanned_3D_Point_Cloud_Dataset_for_Indoor_Scenes_CVPR_2024_paper.pdf)
* [Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_Advancing_Saliency_Ranking_with_Human_Fixations_Dataset_Models_and_Benchmarks_CVPR_2024_paper.pdf)
* [MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying](https://openaccess.thecvf.com/content/CVPR2024/papers/Burgert_MAGICK_A_Large-scale_Captioned_Dataset_from_Matting_Generated_Images_using_CVPR_2024_paper.pdf)
* [HardMo: A Large-Scale Hardcase Dataset for Motion Capture](https://openaccess.thecvf.com/content/CVPR2024/papers/Liao_HardMo_A_Large-Scale_Hardcase_Dataset_for_Motion_Capture_CVPR_2024_paper.pdf)
* [The STVchrono Dataset: Towards Continuous Change Recognition in Time](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_The_STVchrono_Dataset_Towards_Continuous_Change_Recognition_in_Time_CVPR_2024_paper.pdf)
* [Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Nguyen_Insect-Foundation_A_Foundation_Model_and_Large-scale_1M_Dataset_for_Visual_CVPR_2024_paper.pdf)
* [LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising](https://arxiv.org/abs/2405.19718)
* [On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm](https://arxiv.org/abs/2312.03526)
* [Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods](https://openaccess.thecvf.com/content/CVPR2024/papers/Qu_Towards_Modern_Image_Manipulation_Localization_A_Large-Scale_Dataset_and_Novel_CVPR_2024_paper.pdf)
* [Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation](https://arxiv.org/abs/2306.11290)
* [FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_FineSports_A_Multi-person_Hierarchical_Sports_Video_Dataset_for_Fine-grained_Action_CVPR_2024_paper.pdf)细粒度动作理解
* [MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos](https://arxiv.org/abs/2306.04216)
:house:[project](https://mmsum-dataset.github.io/)
* [Traffic Scene Parsing through the TSP6K Dataset](http://arxiv.org/abs/2303.02835)
* [Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset](https://arxiv.org/abs/2311.17396)
* [RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos](https://arxiv.org/abs/2401.12592)
:house:[project](https://wildrgbd.github.io/)
:sunflower:[dataset](https://github.com/wildrgbd/wildrgbd)RGB-D object数据集
* [eTraM: Event-based Traffic Monitoring Dataset](https://arxiv.org/abs/2403.19976)
:star:[code](https://github.com/eventbasedvision/eTraM)
:house:[project](https://eventbasedvision.github.io/eTraM/)流量监控数据集
* [Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network](http://arxiv.org/abs/2405.00244)
:sunflower:[dataset](https://github.com/yungsyu99/Real-HDRV)
* [JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups](http://arxiv.org/abs/2404.04458v1)
:house:[project](https://jrdb.erc.monash.edu/dataset/social)
* [TULIP: Multi-camera 3D Precision Assessment of Parkinson's Disease](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_TULIP_Multi-camera_3D_Precision_Assessment_of_Parkinsons_Disease_CVPR_2024_paper.pdf)
* [JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments](http://arxiv.org/abs/2404.01686v1)
* [OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion](http://arxiv.org/abs/2403.19417v1)
:house:[project](https://oakink.net/v2)
* [SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos](http://arxiv.org/abs/2404.04565v1)
* [RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method](http://arxiv.org/abs/2403.19501v1)
:house:[project](http://www.lidarhumanmotion.net/reli11d/)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [MatSynth: A Modern PBR Materials Dataset](https://arxiv.org/abs/2401.06056)
:house:[project](https://gvecchio.com/matsynth/)
* [RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception](http://arxiv.org/abs/2403.10145v1)
:star:[code](https://github.com/AIR-THU/DAIR-RCooper)
* [Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection](http://arxiv.org/abs/2403.12580v1)
* [EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World](http://arxiv.org/abs/2403.16182v1)
:star:[code](https://github.com/OpenGVLab/EgoExoLearn)
* [MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception](https://arxiv.org/abs/2403.11496)
:sunflower:[dataset](https://mcdviral.github.io/)
* [HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios](https://arxiv.org/abs/2212.10428)
* [HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative](https://arxiv.org/abs/2403.02640)
:sunflower:[dataset](https://holovic.net/)
* [DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision](https://arxiv.org/abs/2312.16256)
:sunflower:[dataset](https://github.com/DL3DV-10K/Dataset)
* [EFHQ: Multi-purpose ExtremePose-Face-HQ dataset](https://arxiv.org/abs/2312.17205)
:star:[code](https://www.vinai.io/)
:house:[project](https://bomcon123456.github.io/efhq/)数据集
* [LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images](https://arxiv.org/abs/2403.13171)
* [MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors](http://arxiv.org/abs/2403.17610v1)
:star:[code](https://haolyuan.github.io/MMVP-Dataset/)
* [FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions](https://arxiv.org/abs/2309.05073)
:house:[project](https://wangjiongw.github.io/freeman/)
* [TUMTraf V2X Cooperative Perception Dataset](https://arxiv.org/pdf/2403.01316.pdf)
:house:[project](https://tum-traffic-dataset.github.io/tumtraf-v2x/)
* [MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures](https://arxiv.org/abs/2312.02963)
:sunflower:[dataset](https://x-zhangyang.github.io/MVHumanNet/)
* Benchmarks
* [When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_When_Visual_Grounding_Meets_Gigapixel-level_Large-scale_Scenes_Benchmark_and_Approach_CVPR_2024_paper.pdf)
* [THRONE: A Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models](http://arxiv.org/abs/2405.05256)
* [M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Pu_M3-UDA_A_New_Benchmark_for_Unsupervised_Domain_Adaptive_Fetal_Cardiac_CVPR_2024_paper.pdf)
:star:[code](https://github.com/LiwenWang919/M3-UDA)
* [DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos](https://arxiv.org/abs/2312.09523)现实视频中远程点跟踪的基准
* [SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge](https://arxiv.org/abs/2405.09713)
* [MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Cao_MAPLM_A_Real-World_Large-Scale_Vision-Language_Benchmark_for_Map_and_Traffic_CVPR_2024_paper.pdf)
* [RoDLA: Benchmarking the Robustness of Document Layout Analysis Models](http://arxiv.org/abs/2403.14442v1)
:star:[code](https://yufanchen96.github.io/projects/RoDLA)
* [GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation](https://openaccess.thecvf.com/content/CVPR2024/papers/Khanna_GOAT-Bench_A_Benchmark_for_Multi-Modal_Lifelong_Navigation_CVPR_2024_paper.pdf)
* [MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI](http://arxiv.org/abs/2311.16502)
* [Advancing Saliency Ranking with Human Fixations: Dataset Models and Benchmarks](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_Advancing_Saliency_Ranking_with_Human_Fixations_Dataset_Models_and_Benchmarks_CVPR_2024_paper.pdf)
* [ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks](https://openaccess.thecvf.com/content/CVPR2024/papers/Rosasco_ConCon-Chi_Concept-Context_Chimera_Benchmark_for_Personalized_Vision-Language_Tasks_CVPR_2024_paper.pdf)
* [Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark](http://arxiv.org/abs/2403.18821v1)
:star:[code](https://facebookresearch.github.io/real-acoustic-fields/)
* [UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement](http://arxiv.org/abs/2404.14542)
* [PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_PKU-DyMVHumans_A_Multi-View_Video_Benchmark_for_High-Fidelity_Dynamic_Human_Modeling_CVPR_2024_paper.pdf)
:house:[project](https://pku-dymvhumans.github.io/)
* [MVBench: A Comprehensive Multi-modal Video Understanding Benchmark](https://arxiv.org/abs/2311.17005)
:star:[code](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2)
* [Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly](http://arxiv.org/abs/2405.00181)
* [VBench: Comprehensive Benchmark Suite for Video Generative Models](https://arxiv.org/abs/2311.17982)
:house:[project](https://vchitect.github.io/VBench-project/)
* [MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark](http://arxiv.org/abs/2403.20225v1)
* [CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs](https://arxiv.org/abs/2311.16703)
:house:[project](https://enigma-li.github.io/CADTalk/)
* [How to Train Neural Field Representations: A Comprehensive Study and Benchmark](https://arxiv.org/abs/2312.10531)
* [OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM](https://arxiv.org/abs/2402.09181)
## 48.Semi/self-supervised learning(半/自监督)
* Weakly supervised learning
* Partial-label learning
* [CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning](https://arxiv.org/abs/2303.10365)
* Semi-supervised learning
* [Targeted Representation Alignment for Open-World Semi-Supervised Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_Targeted_Representation_Alignment_for_Open-World_Semi-Supervised_Learning_CVPR_2024_paper.pdf)
* [SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_SeNM-VAE_Semi-Supervised_Noise_Modeling_with_Hierarchical_Variational_Autoencoder_CVPR_2024_paper.pdf)
* [CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning](http://arxiv.org/abs/2403.10391v1)
* [BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning](http://arxiv.org/abs/2404.01179v1)
* Positive-unlabeled learning
* [Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation](https://openaccess.thecvf.com/content/CVPR2024/papers/Long_Positive-Unlabeled_Learning_by_Latent_Group-Aware_Meta_Disambiguation_CVPR_2024_paper.pdf)positive-unlabeled learning, an important branch of semi-supervised learning
* Self-supervised learning
* [Self-supervised Representation Learning from Arbitrary Scenarios](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Self-Supervised_Representation_Learning_from_Arbitrary_Scenarios_CVPR_2024_paper.pdf)
* [Self-supervised Debiasing Using Low Rank Regularization](http://arxiv.org/abs/2210.05248)
* [Self-Supervised Dual Contouring](http://arxiv.org/abs/2405.18131)
* [Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces](https://arxiv.org/abs/2404.17620)
* [SD2Event: Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_SD2EventSelf-supervised_Learning_of_Dynamic_Detectors_and_Contextual_Descriptors_for_Event_CVPR_2024_paper.pdf)
* [An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_An_Asymmetric_Augmented_Self-Supervised_Learning_Method_for_Unsupervised_Fine-Grained_Image_CVPR_2024_paper.pdf)
* [CNC-Net: Self-Supervised Learning for CNC Machining Operations](https://arxiv.org/abs/2312.09925)
* Unsupervised learning
* [Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos](https://openaccess.thecvf.com/content/CVPR2024/papers/Sommer_Unsupervised_Learning_of_Category-Level_3D_Pose_from_Object-Centric_Videos_CVPR_2024_paper.pdf)
## 47.Dense Predictions(密集预测)
* [Efficient Multitask Dense Predictor via Binarization](https://arxiv.org/abs/2405.14136)密集预测
* [Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Huang_Going_Beyond_Multi-Task_Dense_Prediction_with_Synergy_Embedding_Models_CVPR_2024_paper.pdf)
* [Exploiting Diffusion Prior for Generalizable Dense Prediction](https://arxiv.org/abs/2311.18832)
:house:[project](https://shinying.github.io/dmp)
* [ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions](http://arxiv.org/abs/2403.07392v1)
:star:[code](https://github.com/Traffic-X/ViT-CoMer)
:thumbsup:[百度提出视觉新骨干ViT-CoMer,刷新密集预测任务SOTA](https://mp.weixin.qq.com/s/Q2xI_rU5_7Mv6jiYeu6NkA)
* [Multi-Task Dense Prediction via Mixture of Low-Rank Experts](http://arxiv.org/abs/2403.17749v1)
:star:[code](https://github.com/YuqiYang213/MLoRE)
## 46.Industrial Anomaly Detection(工业缺陷检测)
* [Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection](https://arxiv.org/abs/2310.12790)
:star:[code](https://github.com/mala-lab/AHL)
* Anomaly detection
* [Supervised Anomaly Detection for Complex Industrial Images](http://arxiv.org/abs/2405.04953)
* [Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Prompt-Enhanced_Multiple_Instance_Learning_for_Weakly_Supervised_Video_Anomaly_Detection_CVPR_2024_paper.pdf)weakly supervised anomaly detection
* [Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping](https://arxiv.org/abs/2312.04521)
* [Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation](https://arxiv.org/abs/2403.06247)
* [Long-Tailed Anomaly Detection with Learnable Class Names](http://arxiv.org/abs/2403.20236v1)
:house:[project](https://zenodo.org/records/10854201)
* [RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection](http://arxiv.org/abs/2403.05897v1)
:star:[code](https://github.com/cnulab/RealNet)
* [Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts](http://arxiv.org/abs/2403.06495v1)
:star:[code](https://github.com/mala-lab/InCTRL)
* [PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection](http://arxiv.org/abs/2404.05231v1)
:star:[code](https://github.com/FuNz-0/PromptAD)
* Film removal
* [Learning to Remove Wrinkled Transparent Film with Polarized Prior](http://arxiv.org/abs/2403.04368v1)
:star:[code](https://github.com/jqtangust/FilmRemoval)
* Benchmarks/Datasets
* [Real-IAD: A Real-World Multi-view Dataset for Benchmarking Versatile Industrial Anomaly Detection](https://arxiv.org/abs/2403.12580)
:star:[code](https://github.com/TencentYoutuResearch/AnomalyDetection_Real-IAD)
* [Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network](https://arxiv.org/abs/2311.14897)
:star:[code](https://github.com/Chopper-233/Anomaly-ShapeNet)
## 45.Neural Architecture Search(神经架构搜索)
* [Towards Accurate and Robust Architectures via Neural Architecture Search](https://arxiv.org/abs/2405.05502)
* [Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach](http://arxiv.org/abs/2403.11380v1)
* [Building Optimal Neural Architectures using Interpretable Knowledge](http://arxiv.org/abs/2403.13293v1)
:star:[code](https://github.com/Ascend-Research/AutoBuild)
* [AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search](http://arxiv.org/abs/2403.19232v1)
* [SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model](https://arxiv.org/abs/2406.00195)
* [Insights from the Use of Previously Unseen Neural Architecture Search Datasets](https://arxiv.org/abs/2404.02189)
* [FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer](https://arxiv.org/abs/2403.12821)
:star:[code](http://github.com/y0ngjaenius/CVPR2024_FLOWERFormer)
## 44.Image Fusion(图像融合)
* [Equivariant Multi-Modality Image Fusion](https://arxiv.org/abs/2305.11443)图像融合
* [Task-Customized Mixture of Adapters for General Image Fusion](http://arxiv.org/abs/2403.12494v1)
:star:[code](https://github.com/YangSun22/TC-MoA)
* [Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion](http://arxiv.org/abs/2403.16387v1)
:star:[code](https://github.com/XunpengYi/Text-IF)
* [Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Tan_Revisiting_Spatial-Frequency_Information_Integration_from_a_Hierarchical_Perspective_for_Panchromatic_CVPR_2024_paper.pdf)
* [Neural Spline Fields for Burst Image Fusion and Layer Separation](https://arxiv.org/abs/2312.14235)
:house:[project](https://light.princeton.edu/publication/nsf)
* Infrared and visible image fusion
* [Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_Probing_Synergistic_High-Order_Interaction_in_Infrared_and_Visible_Image_Fusion_CVPR_2024_paper.pdf)
## 43.Image Matching(图像匹配)
* [XFeat: Accelerated Features for Lightweight Image Matching](https://arxiv.org/abs/2404.19174)
:house:[project](http://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24)图像匹配
* Image-text matching
* [Composing Object Relations and Attributes for Image-Text Matching](https://openaccess.thecvf.com/content/CVPR2024/papers/Pham_Composing_Object_Relations_and_Attributes_for_Image-Text_Matching_CVPR_2024_paper.pdf)
## 42.Image Retrieval(图像检索)
* [Language-only Training of Zero-shot Composed Image Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Gu_Language-only_Training_of_Zero-shot_Composed_Image_Retrieval_CVPR_2024_paper.pdf)
:star:[code](https://github.com/navervision/lincir)
* [Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods](https://openaccess.thecvf.com/content/CVPR2024/papers/Dai_Evaluating_Transferability_in_Retrieval_Tasks_An_Approach_Using_MMD_and_CVPR_2024_paper.pdf)
* [Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval](http://arxiv.org/abs/2403.16005v1)
* [On Train-Test Class Overlap and Detection for Image Retrieval](http://arxiv.org/abs/2404.01524v1)
:star:[code](https://github.com/dealicious-inc/RGLDv2-clean)
* [D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Xie_D3still_Decoupled_Differential_Distillation_for_Asymmetric_Image_Retrieval_CVPR_2024_paper.pdf)
* [Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection](http://arxiv.org/abs/2404.09263)
* Cross-domain retrieval
* [ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval](https://arxiv.org/abs/2312.12478)
:star:[code](https://github.com/fangkaipeng/ProS)
* Video retrieval
* [Composed Video Retrieval via Enriched Context and Discriminative Embeddings](http://arxiv.org/abs/2403.16997v1)
:star:[code](https://github.com/OmkarThawakar/composed-video-retrieval)
* Cross-modal retrieval
* [Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval](http://arxiv.org/abs/2403.05105v1)
:star:[code](https://github.com/hhc1997/L2RM)
* [Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Fine-grained_Prototypical_Voting_with_Heterogeneous_Mixup_for_Semi-supervised_2D-3D_Cross-modal_CVPR_2024_paper.pdf)
* Text-to-video retrieval
* [Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval](http://arxiv.org/abs/2403.17998v1)
:star:[code](https://github.com/Jiamian-Wang/T-MASS-text-video-retrieval)
* [Holistic Features are almost Sufficient for Text-to-Video Retrieval](https://www.researchgate.net/publication/379270657_Holistic_Features_are_almost_Sufficient_for_Text-to-Video_Retrieval)
* Image-text retrieval
* [How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_How_to_Make_Cross_Encoder_a_Good_Teacher_for_Efficient_CVPR_2024_paper.pdf)
* Video-text retrieval
* [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Jin_MV-Adapter_Multimodal_Video_Transfer_Learning_for_Video_Text_Retrieval_CVPR_2024_paper.pdf)
* Composed image retrieval
* [Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval](http://arxiv.org/abs/2404.15516)
* Fine-grained image retrieval
* [You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval](https://arxiv.org/abs/2403.07222)
:house:[project](https://subhadeepkoley.github.io/Sketch2Word)
* [Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Characteristics_Matching_Based_Hash_Codes_Generation_for_Efficient_Fine-grained_Image_CVPR_2024_paper.pdf)
* Sketch-based retrieval
* [How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?](http://arxiv.org/abs/2403.07203v1)
:star:[code](https://subhadeepkoley.github.io/AbstractAway)
* [Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers](http://arxiv.org/abs/2403.07214v1)
:house:[project](https://subhadeepkoley.github.io/DiffusionZSSBIR)
## 41.Graph Generative Network(GNN/GCN)
* GNN
* [Domain Separation Graph Neural Networks for Saliency Object Ranking](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Domain_Separation_Graph_Neural_Networks_for_Saliency_Object_Ranking_CVPR_2024_paper.pdf)
* [GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs](https://arxiv.org/abs/2405.06849)
* [FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_FC-GNN_Recovering_Reliable_and_Accurate_Correspondences_from_Interferences_CVPR_2024_paper.pdf)
* [DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching](https://arxiv.org/abs/2306.12547)
* [GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds](https://arxiv.org/abs/2312.00068)图生成网络
* GCN
* [Learning for Transductive Threshold Calibration in Open-World Recognition](https://arxiv.org/abs/2305.12039)
## 40.Scene Graph Generation(场景图生成)
* [Leveraging Predicate and Triplet Learning for Scene Graph Generation](https://arxiv.org/abs/2406.02038)
* [OED: Towards One-stage End-to-End Dynamic Scene Graph Generation](https://arxiv.org/abs/2405.16925)
* [CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_CLIP-Driven_Open-Vocabulary_3D_Scene_Graph_Generation_via_Cross-Modality_Contrastive_Learning_CVPR_2024_paper.pdf)
* [Multi-Level Neural Scene Graphs for Dynamic Urban Environments](http://arxiv.org/abs/2404.00168v1)
:star:[code](https://tobiasfshr.github.io/pub/ml-nsg/)
* [HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation](http://arxiv.org/abs/2403.12033v1)
:star:[code](https://zhangce01.github.io/HiKER-SGG)
:star:[code](https://github.com/zhangce01/HiKER-SGG)
* [DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation](http://arxiv.org/abs/2403.14886v1)
:star:[code](https://github.com/zeeshanhayder/DSGG)
:house:[project](https://zeeshanhayder.github.io/DSGG/)
* [From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models](http://arxiv.org/abs/2404.00906v1)
* [EGTR: Extracting Graph from Transformer for Scene Graph Generation](http://arxiv.org/abs/2404.02072v1)
:star:[code](https://github.com/naver-ai/egtr)
* [LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation](http://arxiv.org/abs/2310.10404)
## 39.Motion Generation(动作生成)
* [Programmable Motion Generation for Open-Set Motion Control Tasks](https://arxiv.org/abs/2405.19283)
* [Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance](http://arxiv.org/abs/2403.18036v1)
* [AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents](https://arxiv.org/abs/2403.12835)
* [Towards Variable and Coordinated Holistic Co-Speech Motion Generation](http://arxiv.org/abs/2404.00368v1)
:star:[code](https://feifeifeiliu.github.io/probtalk/)
* [Generating Human Motion in 3D Scenes from Text Descriptions](http://arxiv.org/abs/2405.07784)根据文本描述生成 3D 场景中的人体运动
* [NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis](https://arxiv.org/abs/2307.07511)
:house:[project](https://nileshkulkarni.github.io/nifty)人体运动合成
* [OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers](https://arxiv.org/abs/2312.08985)
:house:[project](https://tr3e.github.io/omg-page)
* [WANDR: Intention-guided Human Motion Generation](http://arxiv.org/abs/2404.15383)
:tv:[video](https://www.youtube.com/watch?v=9szizM-XUCg)
* [MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion](http://arxiv.org/abs/2310.14729)
:house:[project](https://guytevet.github.io/mas-page/)
* [Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action](http://arxiv.org/abs/2312.17172)
* [Multimodal Sense-Informed Forecasting of 3D Human Motions](https://arxiv.org/abs/2405.02911)
* Motion retrieval
* [Tri-Modal Motion Retrieval by Learning a Joint Embedding Space](http://arxiv.org/abs/2403.00691)
* Animal motion
* [OmniMotionGPT: Animal Motion Generation with Limited Data](https://arxiv.org/abs/2311.18303)
:star:[code](https://zshyang.github.io/omgpt-website/)
:house:[project](https://zshyang.github.io/omgpt-website/)
* Human motion prediction
* [MoML: Online Meta Adaptation for 3D Human Motion Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_MoML_Online_Meta_Adaptation_for_3D_Human_Motion_Prediction_CVPR_2024_paper.pdf)
* [MoST: Multi-Modality Scene Tokenization for Motion Prediction](http://arxiv.org/abs/2404.19531)
* [Rethinking Human Motion Prediction with Symplectic Integral](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Rethinking_Human_Motion_Prediction_with_Symplectic_Integral_CVPR_2024_paper.pdf)
* [Human Motion Prediction Under Unexpected Perturbation](https://openaccess.thecvf.com/content/CVPR2024/papers/Yue_Human_Motion_Prediction_Under_Unexpected_Perturbation_CVPR_2024_paper.pdf)
* [Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy](https://openaccess.thecvf.com/content/CVPR2024/papers/Kang_Continual_Learning_for_Motion_Prediction_Model_via_Meta-Representation_Learning_and_CVPR_2024_paper.pdf)
* Human motion estimation
* [MultiPhys: Multi-Person Physics-aware 3D Motion Estimation](https://arxiv.org/abs/2404.11987)
:house:[project](http://www.iri.upc.edu/people/nugrinovic/multiphys/)
* [A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals](https://arxiv.org/abs/2404.04890)人体运动估计
* Human motion reconstruction
* [RoHM: Robust Human Motion Reconstruction via Diffusion](http://arxiv.org/abs/2401.08570)
## 38.Vision Question Answering(视觉问答)
* [GRAM: Global Reasoning for Multi-Page VQA](https://arxiv.org/abs/2401.03411)
* [SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities](http://arxiv.org/abs/2401.12168)
:house:[project](https://spatial-vlm.github.io/)
* [Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering](http://arxiv.org/abs/2404.10193v1)
* [How to Configure Good In-Context Sequence for Visual Question Answering](https://arxiv.org/abs/2312.01571)
:star:[code](https://github.com/GaryJiajia/OFv2_ICL_VQA)
* [Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models](https://arxiv.org/abs/2312.06685)
* [Question Aware Vision Transformer for Multimodal Reasoning](http://arxiv.org/abs/2402.05472)
* [OpenEQA: Embodied Question Answering in the Era of Foundation Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Majumdar_OpenEQA_Embodied_Question_Answering_in_the_Era_of_Foundation_Models_CVPR_2024_paper.pdf)
* Video-QA
* [Grounded Question-Answering in Long Egocentric Videos](https://arxiv.org/abs/2312.06505)
:star:[code](https://github.com/Becomebright/GroundVQA)
* [Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels](http://arxiv.org/abs/2403.14430v1)
* [Language-aware Visual Semantic Distillation for Video Question Answering](https://openaccess.thecvf.com/content/CVPR2024/papers/Zou_Language-aware_Visual_Semantic_Distillation_for_Video_Question_Answering_CVPR_2024_paper.pdf)
* [MoReVQA: Exploring Modular Reasoning Models for Video Question Answering](https://arxiv.org/abs/2404.06511)
* [Can I Trust Your Answer? Visually Grounded Video Question Answering](https://arxiv.org/abs/2309.01327)
:star:[code](https://github.com/doc-doc/NExT-GQA)
* [Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering](https://openaccess.thecvf.com/content/CVPR2024/papers/Liao_Align_and_Aggregate_Compositional_Reasoning_with_Video_Alignment_and_Answer_CVPR_2024_paper.pdf)
* Diagram question answering
* [CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_CoG-DQA_Chain-of-Guiding_Learning_with_Large_Language_Models_for_Diagram_Question_CVPR_2024_paper.pdf)
* [Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA](http://arxiv.org/abs/2403.16385v1)
* Visual text question answering
* [VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning](https://arxiv.org/abs/2303.02635)
## 37.OCR
* Scene text recognition
* [OTE: Exploring Accurate Scene Text Recognition Using One Token](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_OTE_Exploring_Accurate_Scene_Text_Recognition_Using_One_Token_CVPR_2024_paper.pdf)
* [An Empirical Study of Scaling Law for Scene Text Recognition](https://arxiv.org/abs/2401.00028)
:star:[code](https://github.com/large-ocr-model/large-ocr-model.github.io)场景文本识别
* [Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer](https://arxiv.org/abs/2311.13120)
:star:[code](https://github.com/bytedance/E2STR)
* [Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_Kernel_Adaptive_Convolution_for_Scene_Text_Detection_via_Distance_Map_CVPR_2024_paper.pdf)
* [Choose What You Need: Disentangled Representation Learning for Scene Text Recognition Removal and Editing](https://arxiv.org/abs/2405.04377)场景文本识别、删除和编辑
* [ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting](https://arxiv.org/abs/2403.00303)
:star:[code](https://github.com/PriNing/ODM)
* Scene text image synthesis
* [Layout-Agnostic Scene Text Image Synthesis with Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhangli_Layout-Agnostic_Scene_Text_Image_Synthesis_with_Diffusion_Models_CVPR_2024_paper.pdf)
* Scene text understanding
* [LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Liang_LayoutFormer_Hierarchical_Text_Detection_Towards_Scene_Text_Understanding_CVPR_2024_paper.pdf)
* Chemical structure recognition
* [Atom-Level Optical Chemical Structure Recognition with Limited Supervision](https://arxiv.org/abs/2404.01743)
:star:[code](https://github.com/molden/atomlenz)
* Document chromaticity detection
* [CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_CMA_A_Chromaticity_Map_Adapter_for_Robust_Detection_of_Screen-Recapture_CVPR_2024_paper.pdf)
:star:[code](https://github.com/chenlewis/Chromaticity-Map-Adapter-for-DPAD)
* Text detection
* [OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition](http://arxiv.org/abs/2403.19128v1)
:star:[code](https://github.com/AlibabaResearch/AdvancedLiterateMachinery)
* [Bridging the Gap Between End-to-End and Two-Step Text Spotting](http://arxiv.org/abs/2404.04624v1)
:star:[code](https://github.com/mxin262/Bridging-Text-Spotting)
* [Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis](https://arxiv.org/abs/2405.07481)
* Document understanding
* [LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding](http://arxiv.org/abs/2404.05225v1)
:star:[code](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LayoutLLM)
* [HRVDA: High-Resolution Visual Document Assistant](http://arxiv.org/abs/2404.06918v1)
* Font generation
* [Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Fu_Generate_Like_Experts_Multi-Stage_Font_Generation_by_Incorporating_Font_Transfer_CVPR_2024_paper.pdf)
## 36.4D Reconstruction(4D 重建)
* [Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle](https://arxiv.org/abs/2312.03431)
:house:[project](https://nju-3dv.github.io/projects/Gaussian-Flow)
* [Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking](https://arxiv.org/abs/2401.06614)
:house:[project](https://vveicao.github.io/projects/Motion2VecSets/)
* [4D Gaussian Splatting for Real-Time Dynamic Scene Rendering](https://arxiv.org/abs/2310.08528)
:star:[code](https://github.com/hustvl/4DGaussians)
:house:[project](https://guanjunwu.github.io/4dgs/)
* Text- and image-guided 4D scene generation
* [A Unified Approach for Text- and Image-guided 4D Scene Generation](https://arxiv.org/abs/2311.16854)
:house:[project](https://research.nvidia.com/labs/nxp/dream-in-4d/)
* 4D view synthesis
* [4K4D: Real-Time 4D View Synthesis at 4K Resolution](https://arxiv.org/abs/2310.11448)
:star:[code](https://github.com/zju3dv/4K4D)
:house:[project](https://zju3dv.github.io/4k4d/)
* Language-to-4D modeling
* [L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_L4D-Track_Language-to-4D_Modeling_Towards_6-DoF_Tracking_and_Shape_Reconstruction_in_CVPR_2024_paper.pdf)
:star:[code](https://github.com/S-JingTao/L4D_Track)
## 35.Scene Understanding(场景理解)
* [Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Omni-Q_Omni-Directional_Scene_Understanding_for_Unsupervised_Visual_Grounding_CVPR_2024_paper.pdf)
* [PanoContext-Former: Panoramic Total Scene Understanding with a Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Dong_PanoContext-Former_Panoramic_Total_Scene_Understanding_with_a_Transformer_CVPR_2024_paper.pdf)
* [DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving](https://arxiv.org/abs/2405.04390)
* [OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies](https://arxiv.org/abs/2405.05259)
:star:[code](https://github.com/ldkong1205/OpenESS)
* [A Category Agnostic Model for Visual Rearrangment](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_A_Category_Agnostic_Model_for_Visual_Rearrangment_CVPR_2024_paper.pdf)
:thumbsup:[VILP](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)
* [360+x: A Panoptic Multi-modal Scene Understanding Dataset](http://arxiv.org/abs/2404.00989v1)
:star:[code](https://x360dataset.github.io)
* Open-Vocabulary Scene Understanding(开放词汇场景理解)
* [Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding](https://arxiv.org/abs/2311.18482)
* 3D Scene Understanding(3D场景理解)
* [HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting](https://arxiv.org/abs/2403.12722)
:house:[project](https://xdimlab.github.io/hugs_website)
* [SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field](https://arxiv.org/abs/2403.14366)
* [GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding](http://arxiv.org/abs/2403.03608v1)
* [GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_GP-NeRF_Generalized_Perception_NeRF_for_Context-Aware_3D_Scene_Understanding_CVPR_2024_paper.pdf)
* [RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding](https://arxiv.org/abs/2304.00962)
:house:[project](https://jihanyang.github.io/projects/RegionPLC)
* [GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding](http://arxiv.org/abs/2403.09639v1)
:star:[code](https://github.com/dvlab-research/GroupContrast)
* [SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Delitzas_SceneFun3D_Fine-Grained_Functionality_and_Affordance_Understanding_in_3D_Scenes_CVPR_2024_paper.pdf)

## 34.Human–Computer Interaction(人机交互)
* [Exploring Pose-Aware Human-Object Interaction via Hybrid Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)
* [Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)
* [Scaling Up Dynamic Human-Scene Interaction Modeling](https://arxiv.org/abs/2403.08629)
:star:[code](https://huggingface.co/spaces/jnnan/trumans/tree/main)
:house:[project](https://jnnan.github.io/trumans/)
* [ReGenNet: Towards Human Action-Reaction Synthesis](http://arxiv.org/abs/2403.11882v1)
:star:[code](https://liangxuy.github.io/ReGenNet/)
* [DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback](https://arxiv.org/pdf/2311.10081.pdf)
:star:[code](https://huggingface.co/datasets/YangyiYY/LVLM_NLF)
* [HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_HOI-M3_Capture_Multiple_Humans_and_Objects_Interaction_within_Contextual_Environment_CVPR_2024_paper.pdf)
* [GenZI: Zero-Shot 3D Human-Scene Interaction Generation](http://arxiv.org/abs/2311.17737)
* [Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection](http://arxiv.org/abs/2404.06194)
* Human Motion Tracking(人体运动跟踪)
* [HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations](http://arxiv.org/abs/2403.03561v1)
:star:[code](https://pico-ai-team.github.io/hmd-poser)
:house:[project](https://pico-ai-team.github.io/hmd-poser)
* Novel Motion Synthesis(新运动合成)
* [PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics](https://arxiv.org/abs/2311.12198)
:star:[code](https://github.com/XPandora/PhysGaussian)
:house:[project](https://xpandora.github.io/PhysGaussian/)
* Hand Interaction(手部交互)
* [InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion](http://arxiv.org/abs/2403.17422v1)
:star:[code](https://jyunlee.github.io/projects/interhandgen/)
* [HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data](http://arxiv.org/abs/2403.12011)
* [Physics-Aware Hand-Object Interaction Denoising](http://arxiv.org/abs/2405.11481)
* [HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video](https://arxiv.org/abs/2311.18448)
:star:[code](https://github.com/zc-alexfan/hold)
:house:[project](https://zc-alexfan.github.io/hold)
* [GEARS: Local Geometry-aware Hand-object Interaction Synthesis](https://arxiv.org/abs/2404.01758)
* [TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding](https://arxiv.org/abs/2401.08399)
:house:[project](https://taco2024.github.io/)
* [Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction](http://arxiv.org/abs/2404.00562v1)
:star:[code](https://github.com/JunukCha/Text2HOI)
* [G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis](http://arxiv.org/abs/2404.12383v1)
:star:[code](https://judyye.github.io/ghop-www)
* [MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision](http://arxiv.org/abs/2310.11696)
* [HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild](https://openaccess.thecvf.com/content/CVPR2024/papers/Narasimhaswamy_HOIST-Former_Hand-held_Objects_Identification_Segmentation_and_Tracking_in_the_Wild_CVPR_2024_paper.pdf)
* Human-Object Interaction(人物交互)
* [Discovering Syntactic Interaction Clues for Human-Object Interaction Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)
* [Open-World Human-Object Interaction Detection via Multi-modal Prompts](https://arxiv.org/abs/2406.07221)
* [LEMON: Learning 3D Human-Object Interaction Relation from 2D Images](https://arxiv.org/pdf/2312.08963.pdf)
:star:[code](https://github.com/yyvhang/lemon_3d)
:house:[project](https://yyvhang.github.io/LEMON/)
* [Disentangled Pre-training for Human-Object Interaction Detection](http://arxiv.org/abs/2404.01725v1)
:star:[code](https://github.com/xingaoli/DP-HOI)
* [GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation](https://arxiv.org/abs/2401.00929)
:house:[project](https://genh2r.github.io/)
* [Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition](http://arxiv.org/abs/2405.09931)
:house:[project](https://yuchen2199.github.io/Interactive-Gaze/)
* [Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation](https://arxiv.org/abs/2312.07063)
:house:[project](https://virtualhumans.mpi-inf.mpg.de/procigen-hdm)
* 3D Human-Object Interaction(3D 人物交互)
* [I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions](https://arxiv.org/abs/2312.08869)
:house:[project](https://afterjourney00.github.io/IM-HOI.github.io/)
* [CG-HOI: Contact-Guided 3D Human-Object Interaction Generation](https://arxiv.org/abs/2311.16097)
:house:[project](https://cg-hoi.christian-diller.de/)
* Human-Human Interaction(人-人交互)
* [Inter-X: Towards Versatile Human-Human Interaction Analysis](https://arxiv.org/abs/2312.16051)
:star:[code](https://github.com/liangxuy/Inter-X)
:house:[project](https://liangxuy.github.io/inter-x/)
:thumbsup:[三维数字人重建、编辑与驱动](https://valser.org/webinar/slide/slides/20240403/Valse20240403%E6%99%8F%E8%BD%B6%E8%B6%85.pdf)

## 33.NeRF
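As shared background for the entries below, here is a minimal NumPy sketch of the volume-rendering quadrature that NeRF-style methods build on (an illustrative textbook form, not the implementation of any paper listed here; the array shapes, sample count, and function name are assumptions):

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Composite per-sample densities and colors along one ray (classic NeRF quadrature).

    densities: (N,) non-negative sigma values at N samples along the ray
    colors:    (N, 3) RGB values predicted at each sample
    deltas:    (N,) distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-densities * deltas)                        # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas))[:-1])    # accumulated transmittance
    weights = alphas * trans                                          # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)                     # expected color along the ray
    return rgb, weights

# toy usage: 64 random samples on a single ray
rgb, w = render_ray(np.random.rand(64), np.random.rand(64, 3), np.full(64, 0.05))
```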
* [GARField: Group Anything with Radiance Fields](http://arxiv.org/abs/2401.09419)
* [IReNe: Instant Recoloring of Neural Radiance Fields](https://openaccess.thecvf.com/content/CVPR2024/papers/Mazzucchelli_IReNe_Instant_Recoloring_of_Neural_Radiance_Fields_CVPR_2024_paper.pdf)
* [PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF](https://openaccess.thecvf.com/content/CVPR2024/papers/Feng_PIE-NeRF_Physics-based_Interactive_Elastodynamics_with_NeRF_CVPR_2024_paper.pdf)
* [LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes](http://arxiv.org/abs/2405.00900)
* [SIGNeRF: Scene Integrated Generation for Neural Radiance Fields](http://arxiv.org/abs/2401.01647)
* [NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_NC-SDF_Enhancing_Indoor_Scene_Reconstruction_Using_Neural_SDFs_with_View-Dependent_CVPR_2024_paper.pdf)
* [SpecNeRF: Gaussian Directional Encoding for Specular Reflections](http://arxiv.org/abs/2312.13102)
* [PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_PaReNeRF_Toward_Fast_Large-scale_Dynamic_NeRF_with_Patch-based_Reference_CVPR_2024_paper.pdf)
* [Global and Hierarchical Geometry Consistency Priors for Few-shot NeRFs in Indoor Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_Global_and_Hierarchical_Geometry_Consistency_Priors_for_Few-shot_NeRFs_in_CVPR_2024_paper.pdf)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs](http://arxiv.org/abs/2402.08622)
* [Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling](https://arxiv.org/abs/2405.14847)
:star:[code](https://github.com/lwwu2/nde)
* [Accelerating Neural Field Training via Soft Mining](http://arxiv.org/abs/2312.00075)
* [Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling](https://arxiv.org/abs/2406.03723)
:house:[project](https://merl.com/research/highlights/gear-nerf)
* [How Far Can We Compress Instant-NGP-Based NeRF?](https://arxiv.org/abs/2406.04101)
:star:[code](https://github.com/yihangchen-ee/cnc/)
:house:[project](https://yihangchen-ee.github.io/project_cnc/)
* [BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction](http://arxiv.org/abs/2404.13024)
:star:[code](https://theialab.github.io/banf/)
:house:[project](https://theialab.github.io/banf/)
* [Tactile-Augmented Radiance Fields](https://arxiv.org/abs/2405.04534)
:star:[code](https://github.com/Dou-Yiming/TaRF/)
:house:[project](https://dou-yiming.github.io/TaRF)
* [NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild](https://arxiv.org/abs/2405.18715)
:house:[project](https://nerf-on-the-go.github.io/)
* [L0-Sampler: An L0 Model Guided Volume Sampling for NeRF](https://arxiv.org/abs/2311.07044)
:house:[project](https://ustc3dv.github.io/L0-Sampler/)
* [HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses](https://arxiv.org/abs/2312.02232)
* [Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields](https://arxiv.org/abs/2311.11845)
:star:[code](https://github.com/tatakai1/EVENeRF)
* [NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation](https://arxiv.org/abs/2404.02185)
* [MuRF: Multi-Baseline Radiance Fields](https://arxiv.org/abs/2312.04565)
:house:[project](https://haofeixu.github.io/murf/)
* [InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields](https://arxiv.org/abs/2305.15094)
:house:[project](https://ivrl.github.io/InNeRF360)
* [NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors](https://arxiv.org/abs/2403.03122)
:star:[code](https://github.com/hynann/NRDF)
:house:[project](https://virtualhumans.mpi-inf.mpg.de/nrdf/)
* [Neural Fields as Distributions: Signal Processing Beyond Euclidean Space](https://openaccess.thecvf.com/content/CVPR2024/papers/Rebain_Neural_Fields_as_Distributions_Signal_Processing_Beyond_Euclidean_Space_CVPR_2024_paper.pdf)
:house:[project](https://ubc-vision.github.io/nfd/)
* [CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs](http://arxiv.org/abs/2403.16885v1)
:star:[code](https://zhongyingji.github.io/CVT-xRF)
* [DaReNeRF: Direction-aware Representation for Dynamic Scenes](http://arxiv.org/abs/2403.02265v1)
* [Geometry Transfer for Stylizing Radiance Fields](https://arxiv.org/abs/2402.00863)
:house:[project](https://hyblue.github.io/geo-srf/)
* [S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes](http://arxiv.org/abs/2403.06205v1)
:star:[code](https://xingyi-li.github.io/s-dyrf/)
* [SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream](http://arxiv.org/abs/2403.11222v1)
:star:[code](https://github.com/BIT-Vision/SpikeNeRF)
* [Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes](http://arxiv.org/abs/2403.16141v1)
:star:[code](https://otonari726.github.io/entitynerf/)
* [Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates](https://arxiv.org/abs/2309.11281)
:star:[code](https://github.com/kcshum/pose-conditioned-NeRF-object-fusion)
* [LAENeRF: Local Appearance Editing for Neural Radiance Fields](https://arxiv.org/abs/2312.09913)
:star:[code](https://github.com/r4dl/LAENeRF)
:house:[project](https://r4dl.github.io/LAENeRF/)
* [Single View Refractive Index Tomography with Neural Fields](http://arxiv.org/abs/2309.04437)
* [ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models](https://arxiv.org/abs/2406.06133)
* [TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video](https://arxiv.org/abs/2312.06713)
* [NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation](http://arxiv.org/abs/2403.17537v1)
:star:[code](https://cnhaox.github.io/NeRF-HuGS/)
* [Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency](http://arxiv.org/abs/2403.17638v1)
:star:[code](https://github.com/HKCLynn/ReVoRF)
* [Grounding and Enhancing Grid-based Models for Neural Fields](http://arxiv.org/abs/2403.20002v1)
:house:[project](https://sites.google.com/view/cvpr24-2034-submission/home)
* [Mitigating Motion Blur in Neural Radiance Fields with Events and Frames](http://arxiv.org/abs/2403.19780v1)
* [OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos](http://arxiv.org/abs/2404.00676v1)
* [Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects](http://arxiv.org/abs/2404.01440v1)
:star:[code](https://github.com/NVlabs/DigitalTwinArt)
* [Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields](https://openaccess.thecvf.com/content/CVPR2024/papers/Goli_Bayes_Rays_Uncertainty_Quantification_for_Neural_Radiance_Fields_CVPR_2024_paper.pdf)
* [Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields](http://arxiv.org/abs/2404.02155v1)
:house:[project](https://pals.ttic.edu/p/alpha-invariance)
* [Dynamic LiDAR Re-simulation using Compositional Neural Fields](https://arxiv.org/abs/2312.05247)
:house:[project](https://shengyuh.github.io/dynfl)
* [SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields](https://arxiv.org/abs/2311.15803)
:house:[project](https://qherau.github.io/SOAC/)
* [ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization](https://arxiv.org/abs/2401.08937)
* [NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_NeRFDeformer_NeRF_Transformation_from_a_Single_View_via_3D_Scene_CVPR_2024_paper.pdf)
* Novel View Synthesis(新视图合成)
* [ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image](http://arxiv.org/abs/2310.17994)
* [Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/Hong_Unifying_Correspondence_Pose_and_NeRF_for_Generalized_Pose-Free_Novel_View_CVPR_2024_paper.pdf)
* [NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/You_NeLF-Pro_Neural_Light_Field_Probes_for_Multi-Scale_Novel_View_Synthesis_CVPR_2024_paper.pdf)
* [3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis](http://arxiv.org/abs/2404.06270)
* [G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images](http://arxiv.org/abs/2404.07474v1)
* [MultiDiff: Consistent Novel View Synthesis from a Single Image](https://openaccess.thecvf.com/content/CVPR2024/papers/Muller_MultiDiff_Consistent_Novel_View_Synthesis_from_a_Single_Image_CVPR_2024_paper.pdf)
* [Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis](https://arxiv.org/abs/2401.02436)
* [DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis](https://arxiv.org/abs/2312.13016)
* [Generalizable Novel-View Synthesis using a Stereo Camera](http://arxiv.org/abs/2404.13541)
:house:[project](https://jinwonjoon.github.io/stereonerf/)
* [DART: Implicit Doppler Tomography for Radar Novel View Synthesis](http://arxiv.org/abs/2403.03896v1)
:house:[project](https://wiselabcmu.github.io/dart/)
* [XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold](http://arxiv.org/abs/2403.19517v1)
* [Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis](https://arxiv.org/abs/2312.16812)
:star:[code](https://github.com/oppo-us-research/SpacetimeGaussians)
:house:[project](https://oppo-us-research.github.io/SpacetimeGaussians-website/)
* [NViST: In the Wild New View Synthesis from a Single Image with Transformers](https://arxiv.org/abs/2312.08568)
:star:[code](https://github.com/wbjang/nvist_official)
:house:[project](https://wbjang.github.io/nvist_webpage/)
* [ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models](https://arxiv.org/abs/2312.01305)
:house:[project](https://jgkwak95.github.io/ViVid-1-to-3/)
* [SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes](https://arxiv.org/abs/2312.14937)
:star:[code](https://github.com/yihua7/SC-GS)
:house:[project](https://yihua7.github.io/SC-GS-web/)
* [Neural Visibility Field for Uncertainty-Driven Active Mapping](https://arxiv.org/abs/2406.06948)
:house:[project](https://sites.google.com/view/nvf-cvpr24/)
* [EscherNet: A Generative Model for Scalable View Synthesis](https://arxiv.org/abs/2402.03908)
:star:[code](https://github.com/kxhit/EscherNet)
:house:[project](https://kxhit.github.io/EscherNet)
* [GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis](https://arxiv.org/pdf/2312.02155.pdf)
:star:[code](https://github.com/ShunyuanZheng/GPS-Gaussian)
:house:[project](https://shunyuanzheng.github.io/GPS-Gaussian)
* [DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization](http://arxiv.org/abs/2403.06912v1)
:star:[code](https://github.com/Fictionarry/DNGaussian)
:house:[project](https://fictionarry.github.io/DNGaussian/)
* [LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis](https://arxiv.org/abs/2404.02742)
:star:[code](https://github.com/ispc-lab/LiDAR4D)
:house:[project](https://dyfcalid.github.io/LiDAR4D)
* [Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?](http://arxiv.org/abs/2403.06092v1)
* [Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models](https://github.com/Q-Future/Q-Instruct/tree/main/fig/Q_Instruct_v0_1_preview.pdf)
:star:[code](https://huggingface.co/datasets/teowu/Q-Instruct)
:house:[project](https://q-future.github.io/Q-Instruct/)
* [CoPoNeRF: Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs](https://arxiv.org/abs/2312.07246)
:star:[code](https://github.com/KU-CVLAB/CoPoNeRF)
:house:[project](https://ku-cvlab.github.io/CoPoNeRF/)
* [EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion](https://arxiv.org/abs/2312.06725)
:star:[code](https://github.com/huanngzh/EpiDiff)
:house:[project](https://huanngzh.github.io/EpiDiff/)
* [Free3D: Consistent Novel View Synthesis without 3D Representation](https://arxiv.org/abs/2312.04551)
:star:[code](https://github.com/lyndonzheng/Free3D)
:house:[project](https://chuanxiaz.com/free3d/)
* [Novel View Synthesis with View-Dependent Effects from a Single Image](https://arxiv.org/abs/2312.08071)
:house:[project](https://kaist-viclab.github.io/monovde-site)
* Rendering(渲染)
* [NeRF Director: Revisiting View Selection in Neural Volume Rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_NeRF_Director_Revisiting_View_Selection_in_Neural_Volume_Rendering_CVPR_2024_paper.pdf)
* [Multiplane Prior Guided Few-Shot Aerial Scene Rendering](https://arxiv.org/abs/2406.04961)
* [Differentiable Point-based Inverse Rendering](https://arxiv.org/abs/2312.02480)
* [Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance](https://arxiv.org/abs/2312.04529)
* [Perceptual Assessment and Optimization of HDR Image Rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Cao_Perceptual_Assessment_and_Optimization_of_HDR_Image_Rendering_CVPR_2024_paper.pdf)
* [Global Latent Neural Rendering](https://arxiv.org/abs/2312.08338)
* [Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields](https://arxiv.org/abs/2404.17528)
:star:[code](https://github.com/TQTQliu/GeFu)
:house:[project](https://gefucvpr24.github.io/)
* [GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering](https://arxiv.org/abs/2402.10128)
:house:[project](https://abdullahamdi.com/ges)
* [Real-time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination](https://openaccess.thecvf.com/content/CVPR2024/papers/Zeng_Real-time_Acquisition_and_Reconstruction_of_Dynamic_Volumes_with_Neural_Structured_CVPR_2024_paper.pdf)
:house:[project](https://svbrdf.github.io/publications/realtimedynamic/project.html)
:tv:[video](https://www.youtube.com/watch?v=XoTYTGSueh4)
:thumbsup:[借助神经结构光,浙大实现动态三维现象的实时采集重建](https://mp.weixin.qq.com/s/cUnFIaL4xLaHBOWpNcI7Yg)
* [Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields](http://arxiv.org/abs/2403.16224v1)
:house:[project](https://whyy.site/paper/nep)
* [Dr.Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering](https://arxiv.org/abs/2308.08843)
:house:[project](https://shengcn.github.io/DrBokeh/)
* [HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting](https://arxiv.org/abs/2312.03461)
:thumbsup:[HiFi4G: 通过紧凑高斯进行高保真人体性能渲染](https://cloud.tencent.com/developer/article/2383180)
* [ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering](https://arxiv.org/abs/2312.05941)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/ash/)
* [SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild](https://arxiv.org/abs/2401.10171)
:house:[project](https://shinobi.aengelhardt.com/)
* [LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Choi_LTM_Lightweight_Textured_Mesh_Extraction_and_Refinement_of_Large_Unbounded_CVPR_2024_paper.pdf)
* [HashPoint: Accelerated Point Searching and Sampling for Neural Rendering](https://export.arxiv.org/abs/2404.14044)
:house:[project](https://jiahao-ma.github.io/hashpoint/)
* [HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces](https://arxiv.org/abs/2312.03160)
:house:[project](https://haithemturki.com/hybrid-nerf/)
* [DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling](https://arxiv.org/abs/2402.08876)
:star:[code](https://github.com/LIA-DiTella/DiffUDF)
:house:[project](https://lia-ditella.github.io/DUDF/)
* [Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras](https://arxiv.org/abs/2312.07423)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/holochar/)
* [ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis](https://arxiv.org/abs/2311.17123)
:house:[project](https://gaoxiangjun.github.io/contex_human/)
* Multi-view Inverse Rendering(多视图逆渲染)
* [VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources](https://openaccess.thecvf.com/content/CVPR2024/papers/Fei_VMINer_Versatile_Multi-view_Inverse_Rendering_with_Near-_and_Far-field_Light_CVPR_2024_paper.pdf)
* Object Reconstruction(目标重建)
* [Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction](https://arxiv.org/abs/2312.01196)
:house:[project](https://geometric-rl.mpi-inf.mpg.de/npg)
* [SAOR: Single-View Articulated Object Reconstruction](https://arxiv.org/abs/2303.13514)
:house:[project](https://mehmetaygun.github.io/saor)

## 32.NLP(自然语言处理)
* [Describing Differences in Image Sets with Natural Language](http://arxiv.org/abs/2312.02974)
* Entity Recognition(实体识别)
* [A Generative Approach for Wikipedia-Scale Visual Entity Recognition](http://arxiv.org/abs/2403.02041v1)
* Prompt Learning(提示学习)
* [BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP](https://arxiv.org/abs/2311.16194)
* [Active Prompt Learning in Vision Language Models](https://arxiv.org/abs/2311.11178)
:star:[code](https://github.com/kaist-dmlab/pcb)
* [Domain Prompt Learning with Quaternion Networks](https://arxiv.org/abs/2312.08878)
* [On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?](http://arxiv.org/abs/2405.02266)
* [ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection](http://arxiv.org/abs/2311.15243)
* Foundation Models(基础模型)
* [Asymmetric Masked Distillation for Pre-Training Small Foundation Models](https://arxiv.org/abs/2311.03149)
:star:[code](https://github.com/MCG-NJU/AMD)
* [Bootstrapping SparseFormers from Vision Foundation Models](https://arxiv.org/abs/2312.01987)
:star:[code](https://github.com/showlab/sparseformer)
## 31.Edge Detection(边缘检测)
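Both entries below are learning-based detectors; purely for orientation, here is a minimal NumPy sketch of the classical Sobel gradient-magnitude operator that edge detection is usually introduced with (a conceptual baseline only, not from either paper; the synthetic image and function name are assumptions):

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map of a 2D grayscale image via 3x3 Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient kernel
    ky = kx.T                                                         # vertical gradient kernel
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)  # per-pixel gradient magnitude

# toy usage: a synthetic 32x32 image with a vertical step edge
img = np.zeros((32, 32))
img[:, 16:] = 1.0
edges = sobel_edges(img)
```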
* [MuGE: Multiple Granularity Edge Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_MuGE_Multiple_Granularity_Edge_Detection_CVPR_2024_paper.pdf)
* [RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses](http://arxiv.org/abs/2403.01795v1)
:star:[code](https://ranked-cvpr24.github.io)

## 30.Person Re-Identification(人员重识别)
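For orientation, a minimal NumPy sketch of the retrieval step most re-identification pipelines below share: rank gallery images by cosine similarity to a query embedding (a generic illustration, not any listed paper's method; the feature dimensions and function name are assumptions):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted from most to least similar to the query.

    query_feat:    (D,) embedding of the query person image
    gallery_feats: (N, D) embeddings of N gallery images
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                            # cosine similarity to each gallery image
    return np.argsort(-sims), sims          # best matches first

# toy usage: 2048-d features, gallery of 100 images
order, sims = rank_gallery(np.random.randn(2048), np.random.randn(100, 2048))
```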
* [Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Fusing_Personal_and_Environmental_Cues_for_Identification_and_Segmentation_of_CVPR_2024_paper.pdf)
* [Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception](https://arxiv.org/abs/2311.13793)
* Pedestrian Detection(行人检测)
* [DAP: A Dynamic Adversarial Patch for Evading Person Detectors](https://arxiv.org/abs/2305.11618)
* [Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection](http://arxiv.org/abs/2403.01300v1)
:star:[code](https://github.com/ssbin0914/Causal-Mode-Multiplexer)
* [WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion](http://arxiv.org/abs/2403.19022)
* Text-based Person Retrieval(基于文本的行人检索)
* [UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity](https://arxiv.org/abs/2312.03441)
:star:[code](https://github.com/Zplusdragon/UFineBench)
* Crowd Counting(人群计数)
* [Single Domain Generalization for Crowd Counting](http://arxiv.org/abs/2403.09124v1)
:star:[code](https://github.com/Shimmer93/MPCount)
* [CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models](https://arxiv.org/abs/2303.12790)
:house:[project](https://dylran.github.io/crowddiff.github.io)
* [Regressor-Segmenter Mutual Prompt Learning for Crowd Counting](https://arxiv.org/abs/2312.01711)
* Pedestrian Attribute Detection(行人属性检测)
* [Learning Group Activity Features Through Person Attribute Prediction](https://arxiv.org/abs/2403.02753)
:star:[code](https://github.com/chihina/GAFL-CVPR2024)
:house:[project](https://www.toyota-ti.ac.jp/Lab/Denshi/iim/ukita/selection/CVPR2024-GAFL.html)
* Re-identification(重识别)
* [SEAS: ShapE-Aligned Supervision for Person Re-Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_SEAS_ShapE-Aligned_Supervision_for_Person_Re-Identification_CVPR_2024_paper.pdf)
* [Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Cui_Learning_Continual_Compatible_Representation_for_Re-indexing_Free_Lifelong_Person_Re-identification_CVPR_2024_paper.pdf)
:star:[code](https://github.com/PKU-ICST-MIPL/C2R_CVPR2024)
* [View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network](http://arxiv.org/abs/2403.14513v1)
:star:[code](https://github.com/LinlyAC/VDT-AGPReID)
* [CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification](https://arxiv.org/abs/2311.10605)
* [Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability](https://openaccess.thecvf.com/content/CVPR2024/papers/Huang_Attribute-Guided_Pedestrian_Retrieval_Bridging_Person_Re-ID_with_Internal_Attribute_Variability_CVPR_2024_paper.pdf)
* [All in One Framework for Multimodal Re-identification in the Wild](https://arxiv.org/abs/2405.04741)
* [A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_A_Pedestrian_is_Worth_One_Prompt_Towards_Language_Guidance_Person_CVPR_2024_paper.pdf)
* [Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification](https://zhoujiahuan1991.github.io/pub/CVPR2024_DKP.pdf)
* [Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions](https://arxiv.org/abs/2306.07520)
:star:[code](https://github.com/hwz-zju/Instruct-ReID)
* LiDAR-based Re-Id(基于雷达的Re-Id)
* [LiDAR-based Person Re-identification](https://arxiv.org/abs/2312.03033)
* Visible-Infrared Person Re-Identification(可见光-红外人员重识别)
* [Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification](http://arxiv.org/abs/2403.11708v1)
:star:[code](https://github.com/1KK077/IDKL)
* [Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Shallow-Deep_Collaborative_Learning_for_Unsupervised_Visible-Infrared_Person_Re-Identification_CVPR_2024_paper.pdf)
* Text-to-Image Re-identification(文本-图像重识别)
* [Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID](https://arxiv.org/abs/2405.04940)
* [Noisy-Correspondence Learning for Text-to-Image Person Re-identification](https://arxiv.org/abs/2308.09911)
:star:[code](https://github.com/QinYang79/RDE)
* Gait Recognition(步态识别)
* [Learning Visual Prompt for Gait Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_Learning_Visual_Prompt_for_Gait_Recognition_CVPR_2024_paper.pdf)
* [BigGait: Learning Gait Representation You Want by Large Vision Models](https://arxiv.org/abs/2402.19122)
:star:[code](https://github.com/ShiqiYu/OpenGait)

## 29.Model Compression/Knowledge Distillation/Pruning(模型压缩/知识蒸馏/剪枝)
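As a common reference point for the KD entries below, a minimal NumPy sketch of the temperature-scaled soft-label distillation loss in its textbook form (not any specific paper's method; the temperature value and function names are assumptions):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return (T ** 2) * kl.mean()

# toy usage on a batch of 8 samples with 10 classes
loss = kd_loss(np.random.randn(8, 10), np.random.randn(8, 10))
```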
* Model Compression(模型压缩)
* [Dense Vision Transformer Compression with Few Samples](http://arxiv.org/abs/2403.18708v1)
* Knowledge Distillation(知识蒸馏)
* [Small Scale Data-Free Knowledge Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Small_Scale_Data-Free_Knowledge_Distillation_CVPR_2024_paper.pdf)
* [KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_KD-DETR_Knowledge_Distillation_for_Detection_Transformer_with_Consistent_Distillation_Points_CVPR_2024_paper.pdf)
* [Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation](http://arxiv.org/abs/2404.07933)
* [Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities](http://arxiv.org/abs/2404.16456)
* [C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Huo_C2KD_Bridging_the_Modality_Gap_for_Cross-Modal_Knowledge_Distillation_CVPR_2024_paper.pdf)
* [CrossKD: Cross-Head Knowledge Distillation for Object Detection](http://arxiv.org/abs/2306.11369)
* [CLIP-KD: An Empirical Study of CLIP Model Distillation](https://arxiv.org/abs/2307.12732)
:star:[code](https://github.com/winycg/CLIP-KD)
* [Aligning Logits Generatively for Principled Black-Box Knowledge Distillation](https://arxiv.org/abs/2205.10490)
* [FreeKD: Knowledge Distillation via Semantic Frequency Prompt](https://arxiv.org/abs/2311.12079)
* [Logit Standardization in Knowledge Distillation](http://arxiv.org/abs/2403.01427v1)
* [$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections](http://arxiv.org/abs/2403.06213v1)
:star:[code](https://github.com/roymiles/vkd)
* [Scale Decoupled Distillation](http://arxiv.org/abs/2403.13512v1)
:star:[code](https://github.com/shicaiwei123/SDD-CVPR2024)
* [NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation](https://arxiv.org/abs/2310.00258v2)
:star:[code](https://github.com/tmtuan1307/nayer)
* [De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts](http://arxiv.org/abs/2403.19539v1)
* [PromptKD: Unsupervised Prompt Distillation for Vision-Language Models](https://arxiv.org/abs/2403.02781)
:star:[code](https://github.com/zhengli97/PromptKD)
:house:[project](https://zhengli97.github.io/PromptKD/)
:thumbsup:[中文解读](https://zhengli97.github.io/PromptKD/chinese_interpertation.html)
* Pruning(剪枝)
* [Device-Wise Federated Network Pruning](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_Device-Wise_Federated_Network_Pruning_CVPR_2024_paper.pdf)
* [FedMef: Towards Memory-efficient Federated Dynamic Pruning](http://arxiv.org/abs/2403.14737)
* [OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning](http://arxiv.org/abs/2403.13351)
* [BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_BilevelPruning_Unified_Dynamic_and_Static_Channel_Pruning_for_Convolutional_Neural_CVPR_2024_paper.pdf)
* [Resource-Efficient Transformer Pruning for Finetuning of Large Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Ilhan_Resource-Efficient_Transformer_Pruning_for_Finetuning_of_Large_Models_CVPR_2024_paper.pdf)
* [Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch](https://arxiv.org/abs/2403.14729)
* [Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning](https://arxiv.org/abs/2406.01820)
:house:[project](https://iurada.github.io/PX)
* [Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Zero-TPrune_Zero-Shot_Token_Pruning_through_Leveraging_of_the_Attention_Graph_CVPR_2024_paper.pdf)
* [MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric](http://arxiv.org/abs/2403.07839v1)
* [Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment](http://arxiv.org/abs/2403.19490v1)
* [MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning](http://arxiv.org/abs/2404.05621v1)
:star:[code](https://github.com/FarinaMatteo/multiflow)
* Quantization(量化)
* [PTQ4SAM: Post-Training Quantization for Segment Anything](https://arxiv.org/abs/2405.03144)
* [Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector](https://openaccess.thecvf.com/content/CVPR2024/papers/Ding_Reg-PTQ_Regression-specialized_Post-training_Quantization_for_Fully_Quantized_Object_Detector_CVPR_2024_paper.pdf)
* [Data-Free Quantization via Pseudo-label Filtering](https://openaccess.thecvf.com/content/CVPR2024/papers/Fan_Data-Free_Quantization_via_Pseudo-label_Filtering_CVPR_2024_paper.pdf)
* [JointSQ: Joint Sparsification-Quantization for Distributed Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Xie_JointSQ_Joint_Sparsification-Quantization_for_Distributed_Learning_CVPR_2024_paper.pdf)
* [Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning](http://arxiv.org/abs/2401.01543)
* [Epistemic Uncertainty Quantification For Pre-Trained Neural Networks](http://arxiv.org/abs/2404.10124)
* [Enhancing Post-training Quantization Calibration through Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Shang_Enhancing_Post-training_Quantization_Calibration_through_Contrastive_Learning_CVPR_2024_paper.pdf)
* [Towards Accurate Post-training Quantization for Diffusion Models](http://arxiv.org/abs/2305.18723)
* [Are Conventional SNNs Really Efficient? A Perspective from Network Quantization](https://openaccess.thecvf.com/content/CVPR2024/papers/Shen_Are_Conventional_SNNs_Really_Efficient_A_Perspective_from_Network_Quantization_CVPR_2024_paper.pdf)

## 28.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)
* [Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization](http://arxiv.org/abs/2403.14198v1)
:star:[code](https://github.com/liguopeng0923/UCVGL)
* [Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery](http://arxiv.org/abs/2403.05419v1)
:star:[code](https://github.com/techmn/satmae_pp)
* [Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery](http://arxiv.org/abs/2403.11812v1)
:house:[project](https://zyqz97.github.io/Aerial_Lifting/)
* [S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_S2MAE_A_Spatial-Spectral_Pretraining_Foundation_Model_for_Spectral_Remote_Sensing_CVPR_2024_paper.pdf)
* [Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans](https://arxiv.org/abs/2304.09704)
:house:[project](https://imagine.enpc.fr/~loiseaur/learnable-earth-parser)
* [WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Kumar_WildlifeMapper_Aerial_Image_Analysis_for_Multi-Species_Detection_and_Identification_CVPR_2024_paper.pdf)
:star:[code](https://github.com/UCSB-VRL/WildlifeMapper)
* [Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels](https://arxiv.org/abs/2403.02746)
:star:[code](https://github.com/LiZhuoHong/Paraformer)
* Remote Sensing(遥感)
* [GeoChat: Grounded Large Vision-Language Model for Remote Sensing](http://arxiv.org/abs/2311.15826)
* [SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery](https://arxiv.org/abs/2312.10115)
* [3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions](http://arxiv.org/abs/2404.04823v1)
:star:[code](https://github.com/opendatalab/MLS-BRN.git)
* [Poly Kernel Inception Network for Remote Sensing Detection](https://arxiv.org/abs/2403.06258)
* [Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening](http://arxiv.org/abs/2404.07543v1)
:star:[code](https://github.com/duanyll/CANConv)
* Aerial Image Segmentation(航空图像分割)
* [SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation](http://arxiv.org/abs/2403.16605v1)
* [Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation](https://arxiv.org/abs/2312.12470)
:star:[code](https://github.com/Lsan2401/RMSIN)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* Reference-based Super-Resolution(基于参考图像的超分辨率)
* [Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model](http://arxiv.org/abs/2403.17460v1)
:star:[code](https://github.com/dongrunmin/RefDiff)
* UAV-based Object Detection(基于UAV的目标检测)
* [Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Weakly_Misalignment-free_Adaptive_Feature_Alignment_for_UAVs-based_Multimodal_Object_Detection_CVPR_2024_paper.pdf)
* Cross-view Localization(交叉视角定位)
* [View From Above: Orthogonal-View aware Cross-view Localization](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_View_From_Above_Orthogonal-View_aware_Cross-view_Localization_CVPR_2024_paper.pdf)

## 27.Vision-Language(视觉语言)
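As shared context for the vision-language entries below, a minimal NumPy sketch of CLIP-style zero-shot classification by image-text cosine similarity (a generic pattern only, not any listed paper's model; the encoders, prompt template, embedding size, and temperature are assumptions):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_feat, text_feats, temperature=0.01):
    """Score one image embedding against K class-prompt embeddings.

    image_feat: (D,) embedding from an image encoder (placeholder)
    text_feats: (K, D) embeddings of prompts like "a photo of a {class}" (placeholder)
    """
    img = l2_normalize(image_feat)
    txt = l2_normalize(text_feats)
    logits = txt @ img / temperature          # cosine similarities scaled by temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs                              # distribution over the K candidate classes

# toy usage: 512-d embeddings, 5 candidate classes
p = zero_shot_classify(np.random.randn(512), np.random.randn(5, 512))
```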
* [A Vision Check-up for Language Models](http://arxiv.org/abs/2401.01862)
* [The Neglected Tails in Vision-Language Models](http://arxiv.org/abs/2401.12425)
* [Beyond Average: Individualized Visual Scanpath Prediction](http://arxiv.org/abs/2404.12235v1)
* [ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models](http://arxiv.org/abs/2311.16494)
* [Language Models as Black-Box Optimizers for Vision-Language Models](http://arxiv.org/abs/2309.05950)
* [Distilling Vision-Language Models on Millions of Videos](http://arxiv.org/abs/2401.06129)
* [SonicVisionLM: Playing Sound with Vision Language Models](http://arxiv.org/abs/2401.04394)
* [Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model](http://arxiv.org/abs/2312.12423)
* [Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models](http://arxiv.org/abs/2312.03052)
* [JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_JoAPR_Cleaning_the_Lens_of_Prompt_Learning_for_Vision-Language_Models_CVPR_2024_paper.pdf)
* [MMA: Multi-Modal Adapter for Vision-Language Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_MMA_Multi-Modal_Adapter_for_Vision-Language_Models_CVPR_2024_paper.pdf)
* [Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Fu_Linguistic-Aware_Patch_Slimming_Framework_for_Fine-grained_Cross-Modal_Alignment_CVPR_2024_paper.pdf)
* [Building Vision-Language Models on Solid Foundations with Masked Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Sameni_Building_Vision-Language_Models_on_Solid_Foundations_with_Masked_Distillation_CVPR_2024_paper.pdf)
* [TCP:Textual-based Class-aware Prompt tuning for Visual-Language Model](https://arxiv.org/abs/2311.18231)
:star:[code](https://github.com/htyao89/Textual-based_Class-aware_prompt_tuning/)
* [On Scaling Up a Multilingual Vision and Language Model](http://arxiv.org/abs/2305.18565)
* [CogAgent: A Visual Language Model for GUI Agents](https://arxiv.org/abs/2312.08914)
:star:[code](https://github.com/THUDM/CogVLM)
* [Towards Better Vision-Inspired Vision-Language Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Cao_Towards_Better_Vision-Inspired_Vision-Language_Models_CVPR_2024_paper.pdf)
* [SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_SaCo_Loss_Sample-wise_Affinity_Consistency_for_Vision-Language_Pre-training_CVPR_2024_paper.pdf)
* [MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer](https://arxiv.org/abs/2403.02991)
* [Sequential Modeling Enables Scalable Learning for Large Vision Models](https://arxiv.org/abs/2312.00785)
:house:[project](https://yutongbai.com/lvm.html)
* [Seeing the Unseen: Visual Common Sense for Semantic Placement](https://arxiv.org/abs/2401.07770)
* [Efficient Vision-Language Pre-training by Cluster Masking](https://arxiv.org/abs/2405.08815)
:star:[code](https://github.com/Zi-hao-Wei/Efficient-Vision-Language-Pre-training-by-Cluster-Masking)
:house:[project](https://zxp46.github.io/cluster-masking/)
* [VILA: On Pre-training for Visual Language Models](https://arxiv.org/abs/2312.07533)
* [EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models](https://arxiv.org/pdf/2311.15596.pdf)
:star:[code](https://github.com/AdaCheng/EgoThink)
:house:[project](https://adacheng.github.io/EgoThink/)
* [SPIN: Simultaneous Perception Interaction and Navigation](http://arxiv.org/abs/2405.07991)
* [MAFA: Managing False Negatives for Vision-Language Pre-training](https://openaccess.thecvf.com/content/CVPR2024/papers/Byun_MAFA_Managing_False_Negatives_for_Vision-Language_Pre-training_CVPR_2024_paper.pdf)
* [Visual In-Context Prompting](https://arxiv.org/abs/2311.13601)
:star:[code](https://github.com/UX-Decoder/DINOv)
* [Semantics-aware Motion Retargeting with Vision-Language Models](https://arxiv.org/abs/2312.01964)
* [DePT: Decoupled Prompt Tuning](https://arxiv.org/abs/2309.07439)
:star:[code](https://github.com/Koorye/DePT)
* [Osprey: Pixel Understanding with Visual Instruction Tuning](https://arxiv.org/abs/2312.10032)
:star:[code](https://github.com/CircleRadon/Osprey)
* [FairCLIP: Harnessing Fairness in Vision-Language Learning](http://arxiv.org/abs/2403.19949v1)
:house:[project](https://ophai.hms.harvard.edu/datasets/fairvlmed10k)
* [Efficient Test-Time Adaptation of Vision-Language Models](http://arxiv.org/abs/2403.18293v1)
:star:[code](https://kdiaaa.github.io/tda/)
* [BioCLIP: A Vision Foundation Model for the Tree of Life](https://arxiv.org/abs/2311.18803)
:star:[code](https://github.com/Imageomics/bioclip)
* [InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks](https://arxiv.org/abs/2312.14238)
:star:[code](https://github.com/OpenGVLab/InternVL)
* [Anchor-based Robust Finetuning of Vision-Language Models](https://arxiv.org/abs/2404.06244)
* [Multi-Modal Hallucination Control by Visual Information Grounding](http://arxiv.org/abs/2403.14003v1)
* [Do Vision and Language Encoders Represent the World Similarly?](https://arxiv.org/abs/2401.05224)
* [Dual-View Visual Contextualization for Web Navigation](https://arxiv.org/abs/2402.04476)
* [Any-Shift Prompting for Generalization over Distributions](https://arxiv.org/abs/2402.10099)
* [Non-autoregressive Sequence-to-Sequence Vision-Language Models](http://arxiv.org/abs/2403.02249v1)
* [One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models](http://arxiv.org/abs/2403.01849v1)
:star:[code](https://github.com/TreeLLi/APT)
* [SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models](http://arxiv.org/abs/2403.13263v1)
:star:[code](https://github.com/ivattyue/SC-Tune)
* [RegionGPT: Towards Region Understanding Vision Language Model](http://arxiv.org/abs/2403.02330v1)
* [Enhancing Vision-Language Pre-training with Rich Supervisions](http://arxiv.org/abs/2403.03346v1)
* [Grounding Everything: Emerging Localization Properties in Vision-Language Transformers](https://arxiv.org/abs/2312.00878)
:star:[code](https://github.com/WalBouss/GEM)
* [Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples](https://arxiv.org/abs/2312.00825)
* [Beyond Text: Frozen Large Language Models in Visual Signal Comprehension](http://arxiv.org/abs/2403.07874v1)
:star:[code](https://github.com/zh460045050/V2L-Tokenizer)
* [Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding](http://arxiv.org/abs/2306.08832)
:star:[code](https://github.com/lezhang7/Enhance-FineGrained)
* [FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models](http://arxiv.org/abs/2405.10286)
* [Improved Baselines with Visual Instruction Tuning](https://arxiv.org/abs/2310.03744)
:house:[project](https://llava-vl.github.io/)
* [Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Ishmam_Semantic_Shield_Defending_Vision-Language_Models_Against_Backdooring_and_Poisoning_via_CVPR_2024_paper.pdf)
* [Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models](http://arxiv.org/abs/2403.17589v1)
:star:[code](https://github.com/YBZh/DMN)
* [A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models](https://arxiv.org/abs/2312.12730)
:star:[code](https://github.com/jusiro/CLAP)
* [Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding](https://arxiv.org/abs/2312.00081)
:star:[code](https://github.com/wjpoom/SPEC)
* [SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining](http://arxiv.org/abs/2404.01156v1)
* [Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning](http://arxiv.org/abs/2404.00909v1)
* [Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping](http://arxiv.org/abs/2404.00974v1)
:star:[code](https://github.com/kwonjunn01/Hi-Mapper)
* [Iterated Learning Improves Compositionality in Large Vision-Language Models](http://arxiv.org/abs/2404.02145v1)
* [ViTamin: Designing Scalable Vision Models in the Vision-Language Era](http://arxiv.org/abs/2404.02132v1)
:star:[code](https://github.com/Beckschen/ViTamin)
* [Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners](http://arxiv.org/abs/2404.02117v1)
:star:[code](https://github.com/KHU-AGI/PriViLege)
* [Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models](https://arxiv.org/abs/2404.02233)
:house:[project](https://yorkucvil.github.io/VCC)
* [Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning](https://arxiv.org/abs/2404.03658)
:house:[project](https://ruili3.github.io/kyn)
* [HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models](https://arxiv.org/abs/2310.14566)
:star:[code](https://github.com/tianyi-lab/HallusionBench)
* [Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models](https://arxiv.org/abs/2402.19014)
* [Learning Vision from Models Rivals Learning Vision from Data](https://arxiv.org/abs/2312.17742)
:star:[code](https://github.com/google-research/syn-rep-learn)
* [Probing the 3D Awareness of Visual Foundation Models](http://arxiv.org/abs/2404.08636v1)
:star:[code](https://github.com/mbanani/probe3d)
* [LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning](http://arxiv.org/abs/2311.18651)
:house:[project](https://ll3da.github.io/)
* Visual Understanding(视觉理解)
* [Towards More Unified In-context Visual Understanding](https://arxiv.org/abs/2312.02520)
* LLM
* [PixelLM: Pixel Reasoning with Large Multimodal Model](https://arxiv.org/abs/2312.02228)
:house:[project](https://pixellm.github.io/)
* [OneLLM: One Framework to Align All Modalities with Language](http://arxiv.org/abs/2312.03700)
* [Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld](http://arxiv.org/abs/2311.16714)
* [Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhong_Lets_Think_Outside_the_Box_Exploring_Leap-of-Thought_in_Large_Language_CVPR_2024_paper.pdf)
* [Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs](http://arxiv.org/abs/2404.07449)
* [Hallucination Augmented Contrastive Learning for Multimodal Large Language Model](http://arxiv.org/abs/2312.06968)
* [See Say and Segment: Teaching LMMs to Overcome False Premises](http://arxiv.org/abs/2312.08366)
* [ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://openaccess.thecvf.com/content/CVPR2024/papers/Cai_ViP-LLaVA_Making_Large_Multimodal_Models_Understand_Arbitrary_Visual_Prompts_CVPR_2024_paper.pdf)
* [Driving Everywhere with Large Language Model Policy Adaptation](https://arxiv.org/abs/2402.05932)
:house:[project](https://boyiliee.github.io/llada)
* [Exploring the Transferability of Visual Prompting for Multimodal Large Language Models](http://arxiv.org/abs/2404.11207v1)
* [GROUNDHOG: Grounding Large Language Models to Holistic Segmentation](https://arxiv.org/abs/2402.16846)
:house:[project](https://groundhog-mllm.github.io/)
* [Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement](http://arxiv.org/abs/2404.04627v1)
:house:[project](https://zaidkhan.me/ViReP)
* [V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs](https://arxiv.org/abs/2312.14135)
:star:[code](https://github.com/penghao-wu/vstar)
* [Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding](https://arxiv.org/abs/2311.16922)
* [Pixel Aligned Language Models](https://arxiv.org/abs/2312.09237)
:house:[project](https://jerryxu.net/PixelLLM/)
* [SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection](http://arxiv.org/abs/2403.03170v1)
* [OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation](https://arxiv.org/abs/2311.17911)
:star:[code](https://github.com/shikiw/OPERA)
* [Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_Low-Rank_Approximation_for_Sparse_Attention_in_Multi-Modal_LLMs_CVPR_2024_paper.pdf)
* [LISA: Reasoning Segmentation via Large Language Model](https://arxiv.org/abs/2308.00692)
:star:[code](https://github.com/dvlab-research/LISA)
* [Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Liang_Querying_as_Prompt_Parameter-Efficient_Learning_for_Multimodal_Language_Model_CVPR_2024_paper.pdf)
* [Compositional Chain-of-Thought Prompting for Large Multimodal Models](https://arxiv.org/abs/2311.17076)
:star:[code](https://github.com/chancharikmitra/CCoT)
* [Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs](https://arxiv.org/abs/2401.06209)
:house:[project](https://tsb0601.github.io/mmvp_blog/)
* [Honeybee: Locality-enhanced Projector for Multimodal LLM](https://arxiv.org/abs/2312.06742)
:star:[code](https://github.com/kakaobrain/honeybee)
* [HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data](https://arxiv.org/abs/2311.13614)
:star:[code](https://github.com/Yuqifan1117/HalluciDoctor)
* [SEED-Bench: Benchmarking Multimodal Large Language Models](https://arxiv.org/abs/2404.16790)
:star:[code](https://github.com/AILab-CVC/SEED-Bench)
* [PerceptionGPT: Effectively Fusing Visual Perception into LLM](https://arxiv.org/abs/2311.06612)
* [UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All](http://arxiv.org/abs/2403.12532v1)
* [ModaVerse: Efficiently Transforming Modalities with LLMs](https://arxiv.org/abs/2401.06395)
* [VCoder: Versatile Vision Encoders for Multimodal Large Language Models](https://arxiv.org/abs/2312.14233)
:star:[code](https://github.com/SHI-Labs/VCoder)
:house:[project](https://praeclarumjj3.github.io/vcoder/)
* [mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration](https://arxiv.org/abs/2311.04257)
* [MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World](https://arxiv.org/abs/2401.08577)
:house:[project](https://vis-www.cs.umass.edu/multiply)
* [RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback](https://arxiv.org/abs/2312.00849)
:star:[code](https://github.com/RLHF-V/RLHF-V)
* [DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model](https://arxiv.org/abs/2404.01342)
:star:[code](https://github.com/OpenGVLab/DiffAgent)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [Prompt Highlighter: Interactive Control for Multi-Modal LLMs](https://arxiv.org/abs/2312.04302)
:star:[code](https://github.com/dvlab-research/Prompt-Highlighter)
:house:[project](https://julianjuaner.github.io/projects/PromptHighlighter/)
* [Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft](https://arxiv.org/abs/2312.09238)
:house:[project](https://yangxue0827.github.io/auto_mc-reward.html)
* [General Object Foundation Model for Images and Videos at Scale](https://arxiv.org/abs/2312.09158)
:star:[code](https://github.com/FoundationVision/GLEE)
:house:[project](https://glee-vision.github.io/)
:thumbsup:[GLEE 华科与字节跳动联手打造全能目标感知基础模型](https://mp.weixin.qq.com/s/3RTxWRH7CM6_AbeLT6v0PA)
* [Link-Context Learning for Multimodal LLMs](https://arxiv.org/abs/2308.07891)
:star:[code](https://github.com/isekai-portal/Link-Context-Learning)LLMs
* [Cloud-Device Collaborative Learning for Multimodal Large Language Models](https://arxiv.org/abs/2312.16279)
* [LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model](https://arxiv.org/abs/2406.04659)
:star:[code](https://github.com/kennethwdk/LocLLM)
:thumbsup:[成果速览 | CVPR2024细粒度视觉感知多模态大模型Pink、LocLLM](https://idm.pku.edu.cn/info/1012/1839.htm)
* [Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs](https://arxiv.org/abs/2310.00582)
:star:[code](https://github.com/SY-Xuan/Pink)
:thumbsup:[成果速览 | CVPR2024细粒度视觉感知多模态大模型Pink、LocLLM](https://idm.pku.edu.cn/info/1012/1839.htm)
* [LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge](https://arxiv.org/abs/2311.11860)
:star:[code](https://github.com/rshaojimmy/JiuTian)
:house:[project](https://rshaojimmy.github.io/Projects/JiuTian-LION)MLLMs
* [GSVA: Generalized Segmentation via Multimodal Large Language Models](https://arxiv.org/abs/2312.10103)
* VLN
* [Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation](http://arxiv.org/abs/2404.01943v1)
:star:[code](https://github.com/MrZihan/HNR-VLN)
:thumbsup:[VILP](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)
* [Volumetric Environment Representation for Vision-Language Navigation](http://arxiv.org/abs/2403.14158v1)
* [OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation](http://arxiv.org/abs/2403.17334v1)
* [Vision-and-Language Navigation via Causal Learning](https://arxiv.org/abs/2404.10241)
:star:[code](https://github.com/CrystalSixone/VLN-GOAT)视觉和语言导航
* 视频语言
* [VidLA: Video-Language Alignment at Scale](http://arxiv.org/abs/2403.14870v1)
* [SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling](https://openaccess.thecvf.com/content/CVPR2024/papers/Lee_SRTube_Video-Language_Pre-Training_with_Action-Centric_Video_Tube_Features_and_Semantic_CVPR_2024_paper.pdf)
* [VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_VISTA-LLAMA_Reducing_Hallucination_in_Video_Language_Models_via_Equal_Distance_CVPR_2024_paper.pdf)
* [VideoLLM-online: Online Video Large Language Model for Streaming Video](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_VideoLLM-online_Online_Video_Large_Language_Model_for_Streaming_Video_CVPR_2024_paper.pdf)
:house:[project](https://showlab.github.io/videollm-online/)
* Visual Grounding
* [Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners](http://arxiv.org/abs/2404.19696)
* [MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding](http://arxiv.org/abs/2403.03077)
* [Viewpoint-Aware Visual Grounding in 3D Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Shi_Viewpoint-Aware_Visual_Grounding_in_3D_Scenes_CVPR_2024_paper.pdf)
* [Improved Visual Grounding through Self-Consistent Explanations](http://arxiv.org/abs/2312.04554)
* [Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding](https://arxiv.org/abs/2311.15383)
:house:[project](https://curryyuan.github.io/ZSVG3D/)
* [Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Towards_CLIP-driven_Language-free_3D_Visual_Grounding_via_2D-3D_Relational_Enhancement_CVPR_2024_paper.pdf)Visual Grounding
* [G3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_G3-LQ_Marrying_Hyperbolic_Alignment_with_Explicit_Semantic-Geometric_Modeling_for_3D_CVPR_2024_paper.pdf)
* [Multi-Attribute Interactions Matter for 3D Visual Grounding](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_Multi-Attribute_Interactions_Matter_for_3D_Visual_Grounding_CVPR_2024_paper.pdf)
* [Investigating Compositional Challenges in Vision-Language Models for Visual Grounding](https://openaccess.thecvf.com/content/CVPR2024/papers/Zeng_Investigating_Compositional_Challenges_in_Vision-Language_Models_for_Visual_Grounding_CVPR_2024_paper.pdf)
* [Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language](https://openaccess.thecvf.com/content/CVPR2024/papers/Hamilton_Separating_the_Chirp_from_the_Chat_Self-supervised_Visual_Grounding_of_CVPR_2024_paper.pdf)
* 多模态模型
* [GLaMM: Pixel Grounding Large Multimodal Model](https://arxiv.org/abs/2311.03356)
* [Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models](https://arxiv.org/abs/2311.06607)
:star:[code](https://github.com/Yuliang-Liu/Monkey)
* [What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models](https://arxiv.org/abs/2310.06627)
:house:[project](https://bzhao.me/C-VQA/)
* [Multi-modal Learning for Geospatial Vegetation Forecasting](http://arxiv.org/abs/2303.16198)
* [Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception](https://arxiv.org/abs/2403.02969)
* [MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception](https://arxiv.org/abs/2312.07472)
* [TRINS: Towards Multimodal Language Models that Can Read](https://arxiv.org/abs/2406.06730)
* [Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations](http://arxiv.org/abs/2403.07241v1)
* 视觉基础模型
* [Three Pillars improving Vision Foundation Model Distillation for Lidar](https://arxiv.org/abs/2310.17504)
:star:[code](https://github.com/valeoai/ScaLR)
* [Bridging Remote Sensors with Multisensor Geospatial Foundation Models](http://arxiv.org/abs/2404.01260)
* [Low-Resource Vision Challenges for Foundation Models](http://arxiv.org/abs/2401.04716)
* 多视图理解
* [Learning to Select Views for Efficient Multi-View Understanding](https://arxiv.org/abs/2303.06145)
:star:[code](https://github.com/hou-yz/MVSelect)
* 视觉定位
* [Learning to Produce Semi-dense Correspondences for Visual Localization](https://arxiv.org/abs/2402.08359)
:star:[code](https://github.com/TruongKhang/DeViLoc)
* [PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs](https://arxiv.org/abs/2402.08657)定位
* [Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds](https://openaccess.thecvf.com/content/CVPR2024/papers/Moon_Efficient_Privacy-Preserving_Visual_Localization_Using_3D_Ray_Clouds_CVPR_2024_paper.pdf)

## 26.Information Security(信息安全)
* [CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion](http://arxiv.org/abs/2403.11162v1)
:star:[code](https://github.com/Nicholas0228/Revelio)
* [WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights](http://arxiv.org/abs/2405.02066)
* 图像隐写术
* [Purified and Unified Steganographic Network](http://arxiv.org/abs/2402.17210v1)
:star:[code](https://github.com/albblgb/PUSNet)
* 知识产权保护
* [Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models](http://arxiv.org/abs/2404.09401)
* [MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection](http://arxiv.org/abs/2403.04149v1)
:star:[code](https://github.com/ispc-lab/MAP)
* [CPR: Retrieval Augmented Generation for Copyright Protection](http://arxiv.org/abs/2403.18920v1)
* [VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models](http://arxiv.org/abs/2312.00057)
:star:[code](https://github.com/South7X/VA3)
* [Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models](http://arxiv.org/abs/2404.04956v1)
* IP 保护
* [Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining](https://arxiv.org/abs/2404.02889v1)

## 25.Object Tracking(目标跟踪)
* [3D Feature Tracking via Event Camera](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_3D_Feature_Tracking_via_Event_Camera_CVPR_2024_paper.pdf)
* [Projecting Trackable Thermal Patterns for Dynamic Computer Vision](https://openaccess.thecvf.com/content/CVPR2024/papers/Sheinin_Projecting_Trackable_Thermal_Patterns_for_Dynamic_Computer_Vision_CVPR_2024_paper.pdf)
* [ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe](https://arxiv.org/abs/2312.17133)
* [DPHMs: Diffusion Parametric Head Models for Depth-based Tracking](https://arxiv.org/abs/2312.01068)
:house:[project](https://tangjiapeng.github.io/projects/DPHMs/)
* [NetTrack: Tracking Highly Dynamic Objects with a Net](http://arxiv.org/abs/2403.11186v1)
:star:[code](https://george-zhuang.github.io/nettrack/)
* [RTracker: Recoverable Tracking via PN Tree Structured Memory](http://arxiv.org/abs/2403.19242v1)
* [Context-Aware Integration of Language and Visual References for Natural Language Tracking](http://arxiv.org/abs/2403.19975v1)
* [CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras](https://openaccess.thecvf.com/content/CVPR2024/papers/Shah_CodedEvents_Optimal_Point-Spread-Function_Engineering_for_3D-Tracking_with_Event_Cameras_CVPR_2024_paper.pdf)
* [SpatialTracker: Tracking Any 2D Pixels in 3D Space](http://arxiv.org/abs/2404.04319v1)
:star:[code](https://henry123-boy.github.io/SpaTracker/)
* [Learning Tracking Representations from Single Point Annotations](http://arxiv.org/abs/2404.09504v1)
* 视觉目标跟踪
* [DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking](https://openaccess.thecvf.com/content/CVPR2024/papers/Xie_DiffusionTrack_Point_Set_Diffusion_Model_for_Visual_Object_Tracking_CVPR_2024_paper.pdf)
* [OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning](http://arxiv.org/abs/2403.09634v1)
* [HIPTrack: Visual Tracking with Historical Prompts](https://arxiv.org/abs/2311.02072)
:star:[code](https://github.com/WenRuiCai/HIPTrack)
* [Single-Model and Any-Modality for Video Object Tracking](https://arxiv.org/abs/2311.15851)
:star:[code](https://github.com/Zongwei97/UnTrack)
* [SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking](http://arxiv.org/abs/2403.16002v1)
:star:[code](https://github.com/hoqolo/SDSTrack)
* 多目标跟踪
* [Multi-Object Tracking in the Dark](https://arxiv.org/abs/2405.06600)
* [Towards Generalizable Multi-Object Tracking](https://arxiv.org/abs/2406.00429)
* [ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association](https://arxiv.org/abs/2405.08909)
* [DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking](https://arxiv.org/abs/2403.02767)
* [Delving into the Trajectory Long-tail Distribution for Muti-object Tracking](http://arxiv.org/abs/2403.04700v1)
:star:[code](https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT)
* [Self-Supervised Multi-Object Tracking with Path Consistency](http://arxiv.org/abs/2404.05136v1)
* [DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction](https://arxiv.org/abs/2403.02075)
:star:[code](https://github.com/Kroery/DiffMOT)
:house:[project](https://diffmot.github.io/)
* [iKUN: Speak to Trackers without Retraining](https://arxiv.org/abs/2312.16245)
:star:[code](https://github.com/dyhBUPT/iKUN)
* 点跟踪
* [LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry](https://arxiv.org/abs/2401.01887)

## 24.Machine Learning(机器学习)
* [Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision](https://openaccess.thecvf.com/content/CVPR2024/papers/Juan_Molecular_Data_Programming_Towards_Molecule_Pseudo-labeling_with_Systematic_Weak_Supervision_CVPR_2024_paper.pdf)
:thumbsup:[摘要](https://sai.jlu.edu.cn/info/1026/4601.htm)
* [Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization](https://arxiv.org/abs/2406.04155)
:house:[project](https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/lpo/)
* [Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiong_Circuit_Design_and_Efficient_Simulation_of_Quantum_Inner_Product_and_CVPR_2024_paper.pdf)
* 对抗
* [Infrared Adversarial Car Stickers](https://arxiv.org/abs/2405.09924)
* [Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples](https://openaccess.thecvf.com/content/CVPR2024/papers/Dong_Robust_Distillation_via_Untargeted_and_Targeted_Intermediate_Adversarial_Samples_CVPR_2024_paper.pdf)
* [PAD: Patch-Agnostic Defense against Adversarial Patch Attacks](https://export.arxiv.org/abs/2404.16452)
:star:[code](https://github.com/Lihua-Jing/PAD)
* [Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training](http://arxiv.org/abs/2404.04647v1)
* [MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_MimicDiffusion_Purifying_Adversarial_Perturbation_via_Mimicking_Clean_Diffusion_Model_CVPR_2024_paper.pdf)对抗性扰动
* [Towards Transferable Targeted 3D Adversarial Attack in the Physical World](https://arxiv.org/abs/2312.09558)
* [Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack](https://openaccess.thecvf.com/content/CVPR2024/papers/Ahmed_Deep-TROJ_An_Inference_Stage_Trojan_Insertion_Algorithm_through_Efficient_Weight_CVPR_2024_paper.pdf)攻击
* [Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Fares_Attack_To_Defend_Exploiting_Adversarial_Attacks_for_Detecting_Poisoned_Models_CVPR_2024_paper.pdf)
* [Re-thinking Data Availability Attacks Against Deep Neural Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Fang_Re-thinking_Data_Availability_Attacks_Against_Deep_Neural_Networks_CVPR_2024_paper.pdf)
* [SlowFormer: Adversarial Attack on Compute and Energy Consumption of Efficient Vision Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Navaneet_SlowFormer_Adversarial_Attack_on_Compute_and_Energy_Consumption_of_Efficient_CVPR_2024_paper.pdf)
* [NAPGuard: Towards Detecting Naturalistic Adversarial Patches](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_NAPGuard_Towards_Detecting_Naturalistic_Adversarial_Patches_CVPR_2024_paper.pdf)
* [Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training](https://arxiv.org/abs/2312.07067)
* [Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Not_All_Prompts_Are_Secure_A_Switchable_Backdoor_Attack_Against_CVPR_2024_paper.pdf)后门攻击
* [Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World](http://arxiv.org/abs/2404.19417)
* [Backdoor Defense via Test-Time Detecting and Repairing](https://openaccess.thecvf.com/content/CVPR2024/papers/Guan_Backdoor_Defense_via_Test-Time_Detecting_and_Repairing_CVPR_2024_paper.pdf)
* [Nearest Is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks](https://arxiv.org/abs/2405.12725)
* [Semantic-Aware Multi-Label Adversarial Attacks](https://openaccess.thecvf.com/content/CVPR2024/papers/Mahmood_Semantic-Aware_Multi-Label_Adversarial_Attacks_CVPR_2024_paper.pdf)对抗攻击
* [Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning](https://arxiv.org/abs/2209.11964v2)
* [Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Improving_Transferable_Targeted_Adversarial_Attacks_with_Model_Self-Enhancement_CVPR_2024_paper.pdf)对抗攻击
* [On the Robustness of Large Multimodal Models Against Image Adversarial Attacks](https://arxiv.org/abs/2312.03777)
* [Incremental Residual Concept Bottleneck Models](https://arxiv.org/abs/2404.08978)
* [Revisiting Adversarial Training at Scale](https://arxiv.org/abs/2401.04727)
:star:[code](https://github.com/UCSC-VLAA/AdvXL)
* [Language-Driven Anchors for Zero-Shot Adversarial Robustness](https://arxiv.org/abs/2301.13096)零样本对抗
* [Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training](https://openaccess.thecvf.com/content/CVPR2024/papers/Ming_Transferable_Structural_Sparse_Adversarial_Attack_Via_Exact_Group_Sparsity_Training_CVPR_2024_paper.pdf)
* [Learning to Transform Dynamically for Better Adversarial Transferability](https://arxiv.org/abs/2405.14077)
* [Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay](https://arxiv.org/abs/2404.01828)
* [Boosting Adversarial Transferability by Block Shuffle and Rotation](https://arxiv.org/abs/2308.10299)
:star:[code](https://github.com/Trustworthy-AI-Group/BSR)对抗性可转移性
* [MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models](https://arxiv.org/abs/2403.19080v1)
* [Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness](https://arxiv.org/abs/2401.04350)
:thumbsup:[VILP](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)
* [Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights](https://arxiv.org/abs/2311.15994)
* [PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor](http://arxiv.org/abs/2403.06668v1)
* [Revisiting Adversarial Training under Long-Tailed Distributions](http://arxiv.org/abs/2403.10073v1)
:star:[code](https://github.com/NISPLab/AT-BSL)
* [Towards Fairness-Aware Adversarial Learning](http://arxiv.org/abs/2402.17729v1)
* [Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Dispel_Darkness_for_Better_Fusion_A_Controllable_Visual_Enhancer_based_CVPR_2024_paper.pdf)
* [Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement](http://arxiv.org/abs/2403.09101v1)
* [Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM](http://arxiv.org/abs/2403.11448v1)
* [Boosting Adversarial Training via Fisher-Rao Norm-based Regularization](http://arxiv.org/abs/2403.17520v1)
:star:[code](https://github.com/TrustAI/LOAT)
* [A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning](http://arxiv.org/abs/2405.04115)攻击
* 后门攻击
* [LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning](https://arxiv.org/abs/2403.17188)
:star:[code](https://github.com/Megum1/LOTUS)
* [Data Poisoning based Backdoor Attacks to Contrastive Learning](https://arxiv.org/pdf/2211.08229.pdf)
:star:[code](https://github.com/jzhang538/CorruptEncoder)
* 持续学习
* [RCL: Reliable Continual Learning for Unified Failure Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_RCL_Reliable_Continual_Learning_for_Unified_Failure_Detection_CVPR_2024_paper.pdf)
* [Consistent Prompting for Rehearsal-Free Continual Learning](http://arxiv.org/abs/2403.08568)
* [Improving Plasticity in Online Continual Learning via Collaborative Learning](http://arxiv.org/abs/2312.00600)
* [Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters](http://arxiv.org/abs/2403.11549v1)
:star:[code](https://github.com/JiazuoYu/MoE-Adapters4CL)
* [Enhancing Visual Continual Learning with Language-Guided Supervision](http://arxiv.org/abs/2403.16124v1)
* [Convolutional Prompting meets Language Models for Continual Learning](http://arxiv.org/abs/2403.20317v1)
* [Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning](https://arxiv.org/abs/2405.19074)
* [Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation](http://arxiv.org/abs/2404.00417v1)
* [InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning](http://arxiv.org/abs/2404.00228v1)
* [Learning Equi-angular Representations for Online Continual Learning](http://arxiv.org/abs/2404.01628v1)
* [BrainWash: A Poisoning Attack to Forget in Continual Learning](https://arxiv.org/abs/2311.11995)
* [Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning](http://arxiv.org/abs/2405.16754)持续学习
* [Traceable Federated Continual Learning](https://openreview.net/forum?id=OkZ5UrVpo6)
* [Interactive Continual Learning: Fast and Slow Thinking](https://arxiv.org/abs/2403.02628)
* 增量学习
* [Towards Efficient Replay in Federated Incremental Learning](http://arxiv.org/abs/2403.05890)
* 类增量学习
* [Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Qiu_Dual-Consistency_Model_Inversion_for_Non-Exemplar_Class_Incremental_Learning_CVPR_2024_paper.pdf)
* [Class Incremental Learning with Multi-Teacher Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wen_Class_Incremental_Learning_with_Multi-Teacher_Distillation_CVPR_2024_paper.pdf)
* [Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Dual-Enhanced_Coreset_Selection_with_Class-wise_Collaboration_for_Online_Blurry_Class_CVPR_2024_paper.pdf)
* [FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_FCS_Feature_Calibration_and_Separation_for_Non-Exemplar_Class_Incremental_Learning_CVPR_2024_paper.pdf)
* [OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning](http://arxiv.org/abs/2403.18550)
* [Long-Tail Class Incremental Learning via Independent Sub-prototype Construction](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Long-Tail_Class_Incremental_Learning_via_Independent_Sub-prototype_Construction_CVPR_2024_paper.pdf)
* [Gradient Reweighting: Towards Imbalanced Class-Incremental Learning](http://arxiv.org/abs/2402.18528v1)
* [DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/He_DYSON_Dynamic_Feature_Space_Self-Organization_for_Online_Task-Free_Class_Incremental_CVPR_2024_paper.pdf)
* [NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Gurbuz_NICE_Neurogenesis_Inspired_Contextual_Encoding_for_Replay-free_Class_Incremental_Learning_CVPR_2024_paper.pdf)
:star:[code](https://github.com/BurakGurbuz97/NICE)
* [Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning](http://arxiv.org/abs/2403.12030v1)
:star:[code](https://github.com/sun-hailong/CVPR24-Ease)
* [Text-Enhanced Data-free Approach for Federated Class-Incremental Learning](http://arxiv.org/abs/2403.14101v1)
:star:[code](https://github.com/tmtuan1307/lander)
* [Generative Multi-modal Models are Good Class-Incremental Learners](http://arxiv.org/abs/2403.18383v1)
:star:[code](https://github.com/DoubleClass/GMM)
* [Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning](https://arxiv.org/abs/2212.08251)
:star:[code](https://github.com/scok30/tass)
* 多任务
* [Masked AutoDecoder is Effective Multi-Task Vision Generalist](http://arxiv.org/abs/2403.07692v1)
* [OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Srivastava_OmniVec2_-_A_Novel_Transformer_based_Network_for_Large_Scale_CVPR_2024_paper.pdf)
* [DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data](http://arxiv.org/abs/2403.15389v1)
:star:[code](https://prismformore.github.io/diffusionmtl/)
* [FedHCA2: Towards Hetero-Client Federated Multi-Task Learning](https://arxiv.org/abs/2311.13250)
:star:[code](https://github.com/innovator-zero/FedHCA2)
* [MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning](http://arxiv.org/abs/2403.20320v1)
* [Joint-Task Regularization for Partially Labeled Multi-Task Learning](http://arxiv.org/abs/2404.01976v1)
* [Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning](http://arxiv.org/abs/2402.07739)
* 多标签学习
* [View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Ou_View-Category_Interactive_Sharing_Transformer_for_Incomplete_Multi-View_Multi-Label_Learning_CVPR_2024_paper.pdf)
* 多视角学习
* [Rethinking Multi-view Representation Learning via Distilled Disentangling](http://arxiv.org/abs/2403.10897v1)
:star:[code](https://github.com/Guanzhou-Ke/MRDD)
* 元学习
* [FREE: Faster and Better Data-Free Meta-Learning](http://arxiv.org/abs/2405.00984)
* [Improving Generalization via Meta-Learning on Hard Samples](http://arxiv.org/abs/2403.12236v1)
* 联邦学习
* [An Aggregation-Free Federated Learning for Tackling Data Heterogeneity](http://arxiv.org/abs/2404.18962)
* [Decentralized Directed Collaboration for Personalized Federated Learning](http://arxiv.org/abs/2405.17876)
* [Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data](https://arxiv.org/abs/2403.16398)
* [Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_Byzantine-robust_Decentralized_Federated_Learning_via_Dual-domain_Clustering_and_Trust_Bootstrapping_CVPR_2024_paper.pdf)
* [FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_FLHetBench_Benchmarking_Device_and_State_Heterogeneity_in_Federated_Learning_CVPR_2024_paper.pdf)
* [Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space](https://openaccess.thecvf.com/content/CVPR2024/papers/Kumar_Revamping_Federated_Learning_Security_from_a_Defenders_Perspective_A_Unified_CVPR_2024_paper.pdf)
* [Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning](https://arxiv.org/abs/2310.18285)
* [Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices](https://arxiv.org/abs/2311.18129)
* [FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning](http://arxiv.org/abs/2404.02478)
* [Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Fair_Federated_Learning_under_Domain_Skew_with_Local_Consistency_and_CVPR_2024_paper.pdf)
:star:[code](https://github.com/yuhangchen0/FedHEAL)
* [Global and Local Prompts Cooperation via Optimal Transport for Federated Learning](https://arxiv.org/abs/2403.00041)
* [PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees](https://arxiv.org/abs/2302.06637)
:star:[code](https://github.com/NVlabs/PerAda)
* [Relaxed Contrastive Learning for Federated Learning](https://arxiv.org/abs/2401.04928)
* [DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning](https://arxiv.org/abs/2403.08506)
* [FedAS: Bridging Inconsistency in Personalized Federated Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_FedAS_Bridging_Inconsistency_in_Personalized_Federated_Learning_CVPR_2024_paper.pdf)
:star:[code](https://github.com/xiyuanyang45/FedAS)
* [Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning](http://arxiv.org/abs/2403.18144v1)
* [Data Valuation and Detections in Federated Learning](https://arxiv.org/abs/2311.05304)
:star:[code](https://github.com/muz1lee/MOTdata/tree/main)
* [An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning](https://arxiv.org/abs/2403.15760)
:star:[code](https://github.com/TsingZ0/FedKTL)
* [Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Qi_Adaptive_Hyper-graph_Aggregation_for_Modality-Agnostic_Federated_Learning_CVPR_2024_paper.pdf)
* [FedUV: Uniformity and Variance for Heterogeneous Federated Learning](https://arxiv.org/abs/2402.18372)
* [FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning](https://arxiv.org/abs/2308.12532v6)
* [Communication-Efficient Federated Learning with Accelerated Client Gradient](https://arxiv.org/abs/2201.03172)
* 强化学习
* [Improving Unsupervised Hierarchical Representation with Reinforcement Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/An_Improving_Unsupervised_Hierarchical_Representation_with_Reinforcement_Learning_CVPR_2024_paper.pdf)
* [AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning](https://arxiv.org/abs/2406.00480)强化学习
* [Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Miao_Training_Diffusion_Models_Towards_Diverse_Image_Generation_with_Reinforcement_Learning_CVPR_2024_paper.pdf)
* [POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Guan_POCE_Primal_Policy_Optimization_with_Conservative_Estimation_for_Multi-constraint_Offline_CVPR_2024_paper.pdf)
* [DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_DMR_Decomposed_Multi-Modality_Representations_for_Frames_and_Events_Fusion_in_CVPR_2024_paper.pdf)
* [Learning to Control Camera Exposure via Reinforcement Learning](http://arxiv.org/abs/2404.01636v1)
:house:[project](https://sites.google.com/view/drl-ae)
* [Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Moure_Regularized_Parameter_Uncertainty_for_Improving_Generalization_in_Reinforcement_Learning_CVPR_2024_paper.pdf)
* [Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World](https://arxiv.org/abs/2312.02976)
:house:[project](https://spoc-robot.github.io/)
* 多模态机器学习
* [DIEM: Decomposition-Integration Enhancing Multimodal Insights](https://openaccess.thecvf.com/content/CVPR2024/papers/Jiang_DIEM_Decomposition-Integration_Enhancing_Multimodal_Insights_CVPR_2024_paper.pdf)
* 迁移学习
* [Model Inversion Robustness: Can Transfer Learning Help?](https://arxiv.org/abs/2405.05588)
* [Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Enhanced_Motion-Text_Alignment_for_Image-to-Video_Transfer_Learning_CVPR_2024_paper.pdf)
* [Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Structured_Model_Probing_Empowering_Efficient_Transfer_Learning_by_Structured_Regularization_CVPR_2024_paper.pdf)
* [UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory](https://arxiv.org/abs/2308.14316)
:star:[code](https://github.com/Paranioar/UniPT)
* [Initialization Matters for Adversarial Transfer Learning](https://arxiv.org/abs/2312.05716)
* 对比学习
* [Improving Graph Contrastive Learning via Adaptive Positive Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhuo_Improving_Graph_Contrastive_Learning_via_Adaptive_Positive_Sampling_CVPR_2024_paper.pdf)
* [MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Abdelfattah_MaskCLR_Attention-Guided_Contrastive_Learning_for_Robust_Action_Representation_Learning_CVPR_2024_paper.pdf)
* [BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning](http://arxiv.org/abs/2311.12075)
* [Universal Novelty Detection Through Adaptive Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Mirzaei_Universal_Novelty_Detection_Through_Adaptive_Contrastive_Learning_CVPR_2024_paper.pdf)
* [NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models](https://arxiv.org/abs/2312.05390)
:house:[project](https://noiseclr.github.io/)
* 模仿学习
* [LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation](https://arxiv.org/abs/2403.17601)
* 上下文学习
* [Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning](https://arxiv.org/abs/2312.03703)
:star:[code](https://github.com/fanglaosi/Skeleton-in-Context)
* 弱监督学习
* [Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Virtual_Immunohistochemistry_Staining_for_Histological_Images_Assisted_by_Weakly-supervised_Learning_CVPR_2024_paper.pdf)
* 启示学习
* [One-Shot Open Affordance Learning with Foundation Models](http://arxiv.org/abs/2311.17776)

## 23.Sound
* [Hearing Anything Anywhere](https://arxiv.org/abs/2406.07532)
:house:[project](https://masonlwang.com/hearinganythinganywhere/)
* [Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling](https://arxiv.org/abs/2312.01017)
* [AV-RIR: Audio-Visual Room Impulse Response Estimation](https://arxiv.org/abs/2312.00834)
:tv:[video](https://www.youtube.com/watch?v=tTsKhviukAE)
* [DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction](http://arxiv.org/abs/2403.01226v1)
* [Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners](http://arxiv.org/abs/2402.17723v1)
:star:[code](https://yzxing87.github.io/Seeing-and-Hearing/)
* [Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling](https://openaccess.thecvf.com/content/CVPR2024/papers/Rachavarapu_Weakly-Supervised_Audio-Visual_Video_Parsing_with_Prototype-based_Pseudo-Labeling_CVPR_2024_paper.pdf)
* 视听对话
* [The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective](https://arxiv.org/abs/2312.12870)
:house:[project](https://vjwq.github.io/AV-CONV/)
* 视听导航
* [RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_RILA_Reflective_and_Imaginative_Language_Agent_for_Zero-Shot_Semantic_Audio-Visual_CVPR_2024_paper.pdf)
* 视听分割
* [Audio-Visual Segmentation via Unlabeled Frame Exploitation](http://arxiv.org/abs/2403.11074v1)
* [Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Benchmarking_Audio_Visual_Segmentation_for_Long-Untrimmed_Videos_CVPR_2024_paper.pdf)
* [Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation](http://arxiv.org/abs/2312.06462)
* [Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition](https://arxiv.org/abs/2310.00132)
:star:[code](https://github.com/lxa9867/QSD)
* [Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation](https://arxiv.org/abs/2304.02970)
:star:[code](https://github.com/cyh-0/CAVP)
* 语音识别
* [A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition](http://arxiv.org/abs/2403.04245v1)
:star:[code](https://github.com/dalision/ModalBiasAVSR)
* 语音定位
* [Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge](http://arxiv.org/abs/2403.17420v1)
:star:[code](https://github.com/VisualAIKHU/NoPrior_MultiSSL)
* 音-视语音表示学习
* [ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_ES3_Evolving_Self-Supervised_Learning_of_Robust_Audio-Visual_Speech_Representations_CVPR_2024_paper.pdf)
:house:[project](https://www.sailorzhang.com/publications/2024-02-es3)
:thumbsup:[VILP](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)
* 文本驱动的语音定位
* [T-VSL: Text-Guided Visual Sound Source Localization in Mixtures](http://arxiv.org/abs/2404.01751v1)
* 从图像和语言提示合成音乐
* [MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Chowdhury_MeLFusion_Synthesizing_Music_from_Image_and_Language_Cues_using_Diffusion_CVPR_2024_paper.pdf)
* 双耳音频生成和定位
* [Cyclic Learning for Binaural Audio Generation and Localization](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Cyclic_Learning_for_Binaural_Audio_Generation_and_Localization_CVPR_2024_paper.pdf)
* 视频和音频同步
* [DiVAS: Video and Audio Synchronization with Dynamic Frame Rates](https://openaccess.thecvf.com/content/CVPR2024/papers/Fernandez-Labrador_DiVAS_Video_and_Audio_Synchronization_with_Dynamic_Frame_Rates_CVPR_2024_paper.pdf)
* 视听表征学习
* [Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning](https://arxiv.org/abs/2304.05600)
* 说话人检测
* [LoCoNet: Long-Short Context Network for Active Speaker Detection](https://arxiv.org/pdf/2301.08237.pdf)
:star:[code](https://github.com/SJTUwxz/LoCoNet_ASD)
* 音频描述
* [MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning](https://arxiv.org/abs/2311.17435)
:house:[project](https://mm-narrator.github.io/)
* 视听语音翻译
* [AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation](https://arxiv.org/abs/2312.02512)
:house:[project](https://choijeongsoo.github.io/av2av)视听语音到视听语音翻译

## 22.Deepfake Detection
* [AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection](https://arxiv.org/abs/2406.02951)
* [Preserving Fairness Generalization in Deepfake Detection](http://arxiv.org/abs/2402.17229v1)
:star:[code](https://github.com/Purdue-M2/Fairness-Generalization)
* [Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection](https://arxiv.org/abs/2312.10461)
:star:[code](https://github.com/chuangchuangtan/NPR-DeepfakeDetection)
* [Exploiting Style Latent Flows for Generalizing Deepfake Video Detection](http://arxiv.org/abs/2403.06592)
* [LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Nguyen_LAA-Net_Localized_Artifact_Attention_Network_for_Quality-Agnostic_and_Generalizable_Deepfake_CVPR_2024_paper.pdf)
* [Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection](https://arxiv.org/abs/2311.11278)
* [Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking](https://openaccess.thecvf.com/content/CVPR2024/papers/Hong_Contrastive_Learning_for_DeepFake_Classification_and_Localization_via_Multi-Label_Ranking_CVPR_2024_paper.pdf)
* 图像篡改检测
* [DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_DiffForensics_Leveraging_Diffusion_Prior_to_Image_Forgery_Detection_and_Localization_CVPR_2024_paper.pdf)伪造图像检测
* [EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection](https://arxiv.org/abs/2312.08883)
:star:[code](https://github.com/xuanyuzhang21/EditGuard)用于篡改定位和版权保护的多功能图像水印
* [UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_UnionFormer_Unified-Learning_Transformer_with_Multi-View_Representation_for_Image_Manipulation_Detection_CVPR_2024_paper.pdf)图像操作检测和定位
* [CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image](https://openaccess.thecvf.com/content/CVPR2024/papers/Yoon_CORE-MPI_Consistency_Object_Removal_with_Embedding_MultiPlane_Image_CVPR_2024_paper.pdf)
* 合成图像检测
* [WinSyn: A High Resolution Testbed for Synthetic Data](http://arxiv.org/abs/2310.08471)
* [Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection](https://arxiv.org/abs/2312.16649)

## 21.Few/Zero-Shot Learning/DG/A(小/零样本/域泛化/域适应)
* [Transductive Zero-Shot and Few-Shot CLIP](https://arxiv.org/abs/2405.18437)
:star:[code](https://github.com/SegoleneMartin/transductive-CLIP)
* DG
* [Disentangled Prompt Representation for Domain Generalization](https://openaccess.thecvf.com/content/CVPR2024/papers/Cheng_Disentangled_Prompt_Representation_for_Domain_Generalization_CVPR_2024_paper.pdf)
* [A2XP: Towards Private Domain Generalization](https://arxiv.org/abs/2311.10339)
:star:[code](https://github.com/AIRLABkhu/A2XP)
:house:[project](https://airlabkhu.github.io/A2XP/)
* [PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization](http://arxiv.org/abs/2404.09011v1)
* [Towards Generalizing to Unseen Domains with Few Labels](http://arxiv.org/abs/2403.11674v1)
* [Rethinking the Evaluation Protocol of Domain Generalization](https://arxiv.org/abs/2305.15253)
* [Rethinking Multi-domain Generalization with A General Learning Objective](http://arxiv.org/abs/2402.18853v1)
* [Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization](http://arxiv.org/abs/2404.00710v1)
:star:[code](https://github.com/mainaksingha01/ODG-CLIP)
* [Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization](https://arxiv.org/abs/2402.18447)
* [Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization](https://arxiv.org/abs/2403.15605)
* DA
* [Parameter Efficient Self-Supervised Geospatial Domain Adaptation](https://openaccess.thecvf.com/content/CVPR2024/papers/Scheibenreif_Parameter_Efficient_Self-Supervised_Geospatial_Domain_Adaptation_CVPR_2024_paper.pdf)
* [Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation](http://arxiv.org/abs/2403.18360)
* [Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_Discriminative_Pattern_Calibration_Mechanism_for_Source-Free_Domain_Adaptation_CVPR_2024_paper.pdf)
* [Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective](https://openaccess.thecvf.com/content/CVPR2024/papers/Mitsuzumi_Understanding_and_Improving_Source-free_Domain_Adaptation_from_a_Theoretical_Perspective_CVPR_2024_paper.pdf)
* [A Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_A_Versatile_Framework_for_Continual_Test-Time_Domain_Adaptation_Balancing_Discriminability_CVPR_2024_paper.pdf)
* [Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation](https://arxiv.org/abs/2403.02899)
* [Unveiling the Unknown: Unleashing the Power of Unknown to Known in Open-Set Source-Free Domain Adaptation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wan_Unveiling_the_Unknown_Unleashing_the_Power_of_Unknown_to_Known_CVPR_2024_paper.pdf)
* [Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training](https://arxiv.org/abs/2312.02914)
* [Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer](https://arxiv.org/abs/2311.12905)
* [LEAD: Learning Decomposition for Source-free Universal Domain Adaptation](http://arxiv.org/abs/2403.03421v1)
:star:[code](https://github.com/ispc-lab/LEAD)
* [Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation](http://arxiv.org/abs/2403.06946v1)
:star:[code](https://github.com/TL-UESTC/UniMoS)
* [Source-Free Domain Adaptation with Frozen Multimodal Foundation Model](https://arxiv.org/pdf/2311.16510.pdf)
:star:[code](https://github.com/tntek/source-free-domain-adaptation)
* [Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias](http://arxiv.org/abs/2403.11234v1)
* [Unified Language-driven Zero-shot Domain Adaptation](http://arxiv.org/abs/2404.07155v1)
:house:[project](https://senqiaoyang.com/project/ULDA)
* FSL
* [Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners](https://openaccess.thecvf.com/content/CVPR2024/papers/Dong_Adversarially_Robust_Few-shot_Learning_via_Parameter_Co-distillation_of_Similarity_and_CVPR_2024_paper.pdf)
* [Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning](http://arxiv.org/abs/2311.13612)
* [Simple Semantic-Aided Few-Shot Learning](https://arxiv.org/abs/2311.18649)
:star:[code](https://github.com/zhangdoudou123/SemFew)
* [DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Shao_DeIL_Direct-and-Inverse_CLIP_for_Open-World_Few-Shot_Learning_CVPR_2024_paper.pdf)
* [AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_AMU-Tuning_Effective_Logit_Bias_for_CLIP-based_Few-shot_Learning_CVPR_2024_paper.pdf)
:thumbsup:[摘要](http://aiskyeye.com/%E5%9B%A2%E9%98%9F%E4%B8%A4%E7%AF%87%E8%AE%BA%E6%96%87%E8%A2%ABcvpr-2024%E5%BD%95%E7%94%A8/)
* [Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning](http://arxiv.org/abs/2403.04492v1)
:star:[code](https://github.com/rashindrie/DIPA)
* [Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning](https://arxiv.org/abs/2403.00567)
* [Few-shot Learner Parameterization by Diffusion Time-steps](https://arxiv.org/abs/2403.02649)
:star:[code](https://github.com/yue-zhongqi/tif)
* ZSL
* [Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning](http://arxiv.org/abs/2404.07713v1)
* [Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning](http://arxiv.org/abs/2404.14808)
:thumbsup:[提升生成式零样本学习能力,视觉增强动态语义原型方法](https://mp.weixin.qq.com/s/HEe185Yp4XWMAIlCmmudpQ)
* [Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning](https://arxiv.org/abs/2402.17251)
* [Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning](https://arxiv.org/abs/2303.15230)
:star:[code](https://github.com/bighuang624/Troika)
* [Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Improving_Generalized_Zero-Shot_Learning_by_Exploring_the_Diverse_Semantics_from_CVPR_2024_paper.pdf)

## 20.Optical Flow Estimation(光流估计)
* [Efficient Meshflow and Optical Flow Estimation from Event Cameras](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Efficient_Meshflow_and_Optical_Flow_Estimation_from_Event_Cameras_CVPR_2024_paper.pdf)
* [UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model](https://arxiv.org/abs/2405.02608)
:star:[code](https://github.com/facebookresearch/UnSAMFlow)
* [FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_FlowTrack_Revisiting_Optical_Flow_for_Long-Range_Dense_Tracking_CVPR_2024_paper.pdf)
* [FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_FlowDiffuser_Advancing_Optical_Flow_Estimation_with_Diffusion_Models_CVPR_2024_paper.pdf)
* [ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF](https://arxiv.org/abs/2311.04246)
* [Dense Optical Tracking: Connecting the Dots](https://arxiv.org/abs/2312.00786)
:star:[code](https://github.com/16lemoing/dot)
:house:[project](https://16lemoing.github.io/dot)光流
* [MemFlow: Optical Flow Estimation and Prediction with Memory](http://arxiv.org/abs/2404.04808v1)
:star:[code](https://dqiaole.github.io/MemFlow/)
* [OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation](http://arxiv.org/abs/2403.18092v1)
* 场景流
* [Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow](https://arxiv.org/pdf/2403.07432.pdf)
* [ICP-Flow: LiDAR Scene Flow Estimation with ICP](https://openaccess.thecvf.com/content/CVPR2024/papers/Lin_ICP-Flow_LiDAR_Scene_Flow_Estimation_with_ICP_CVPR_2024_paper.pdf)
* 3D 场景流估计
* [3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling](http://arxiv.org/abs/2402.18146v1)
* [DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_DifFlow3D_Toward_Robust_Uncertainty-Aware_Scene_Flow_Estimation_with_Iterative_Diffusion-Based_CVPR_2024_paper.pdf)

## 19.Object Pose Estimation(物体姿态估计)
* [3D-LFM: Lifting Foundation Model](https://arxiv.org/abs/2312.11894)
:house:[project](https://3dlfm.github.io/)
* [Efficient Solution of Point-Line Absolute Pose](https://export.arxiv.org/abs/2404.16552)
:star:[code](https://github.com/petrhruby97/efficient_absolute)
* [Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval](http://arxiv.org/abs/2403.00272v1)
* [DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses](http://arxiv.org/abs/2403.13683v1)
:star:[code](https://github.com/sailor-z/DVMNet/)
* [Dynamic Support Information Mining for Category-Agnostic Pose Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Ren_Dynamic_Support_Information_Mining_for_Category-Agnostic_Pose_Estimation_CVPR_2024_paper.pdf)
* [From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation](https://openaccess.thecvf.com/content/CVPR2024/papers/Tirado-Garin_From_Correspondences_to_Pose_Non-minimal_Certifiably_Optimal_Relative_Pose_without_CVPR_2024_paper.pdf)
* 物体姿态估计
* [Object Pose Estimation via the Aggregation of Diffusion Features](http://arxiv.org/abs/2403.18791v1)
:star:[code](https://github.com/Tianfu18/diff-feats-pose)
* [NOPE: Novel Object Pose Estimation from a Single Image](https://arxiv.org/abs/2303.13612)
:star:[code](https://github.com/nv-nguyen/nope)
* [GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence](https://arxiv.org/abs/2311.14155)
:star:[code](https://github.com/nv-nguyen/gigaPose)
* 6DoF
* [HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Lin_HiPose_Hierarchical_Binary_Surface_Encoding_and_Correspondence_Pruning_for_RGB-D_CVPR_2024_paper.pdf)
* [Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kalra_Towards_Co-Evaluation_of_Cameras_HDR_and_Algorithms_for_Industrial-Grade_6DoF_CVPR_2024_paper.pdf)
* [Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)](https://arxiv.org/abs/2305.15873)
* [SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation](https://arxiv.org/abs/2311.15707)
:star:[code](https://github.com/JiehongLin/SAM-6D)
* [FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation](https://arxiv.org/abs/2403.03221)
:star:[code](https://github.com/crockwell/far)
:house:[project](https://crockwell.github.io/far/)
* [6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation](https://arxiv.org/abs/2401.00029)
* [MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images](https://arxiv.org/abs/2403.01517)
* [FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects](https://arxiv.org/abs/2312.08344)
:house:[project](https://nvlabs.github.io/FoundationPose/)
* [GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects](https://arxiv.org/abs/2403.11510)
* [Open-Vocabulary Object 6D Pose Estimation](https://arxiv.org/abs/2312.00690)
:house:[project](https://jcorsetti.github.io/oryon/)
* [SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation](https://arxiv.org/abs/2311.11125)
:star:[code](https://github.com/NOrangeeroli/SecondPose)
* [A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization](http://arxiv.org/abs/2403.19412v1)
* [Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation](http://arxiv.org/abs/2403.19527v1)
* [MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation](http://arxiv.org/abs/2403.08019v1)
* [Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge](http://arxiv.org/abs/2404.01727v1)
* 重识别
* [Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification](http://arxiv.org/abs/2403.10254v1)
:star:[code](https://github.com/924973292/EDITOR)
* 计数
* [Referring Expression Counting](https://openaccess.thecvf.com/content/CVPR2024/papers/Dai_Referring_Expression_Counting_CVPR_2024_paper.pdf)
* [Learning to Count without Annotations](https://openreview.net/forum?id=DAs9X4mCpu)
* [Point, Segment and Count: A Generalized Framework for Object Counting](https://arxiv.org/abs/2311.12386)
:star:[code](https://github.com/Hzzone/PseCo)
* [DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting](https://export.arxiv.org/abs/2404.16622)
:star:[code](https://github.com/jerpelhan/DAVE)
* [Weakly Supervised Video Individual Counting](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Weakly_Supervised_Video_Individual_Counting_CVPR_2024_paper.pdf)

## 18.SLAM/AR/VR/Robotics(增强/虚拟现实/机器人)
* [Instance Tracking in 3D Scenes from Egocentric Videos](https://arxiv.org/abs/2312.04117)
* VPR
* [TransLoc4D: Transformer-based 4D Radar Place Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Peng_TransLoc4D_Transformer-based_4D_Radar_Place_Recognition_CVPR_2024_paper.pdf)
* [CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition](http://arxiv.org/abs/2402.19231v1)
:star:[code](https://github.com/Lu-Feng/CricaVPR)
* [On the Estimation of Image-matching Uncertainty in Visual Place Recognition](http://arxiv.org/abs/2404.00546v1)
* [Optimal Transport Aggregation for Visual Place Recognition](https://arxiv.org/abs/2311.15937)
:star:[code](https://github.com/serizba/salad)
* 导航
* [Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Imagine_Before_Go_Self-Supervised_Generative_Map_for_Object_Goal_Navigation_CVPR_2024_paper.pdf)
:thumbsup:[VILP](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)
* [Detours for Navigating Instructional Videos](https://arxiv.org/abs/2401.01823)旅游视频导航
* [MemoNav: Working Memory Model for Visual Navigation](http://arxiv.org/abs/2402.19161v1)
* [DiaLoc: An Iterative Approach to Embodied Dialog Localization](http://arxiv.org/abs/2403.06846v1)
* [F$^3$Loc: Fusion and Filtering for Floorplan Localization](http://arxiv.org/abs/2403.03370v1)
* [An Interactive Navigation Method with Effect-oriented Affordance](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_An_Interactive_Navigation_Method_with_Effect-oriented_Affordance_CVPR_2024_paper.pdf)交互式导航
* SLAM
* [SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System](https://arxiv.org/pdf/2312.01616.pdf)
:star:[code](https://github.com/bytedance/SchurVINS)
* [SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments](http://arxiv.org/abs/2307.07607)
* [SNI-SLAM: Semantic Neural Implicit SLAM](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_SNI-SLAM_Semantic_Neural_Implicit_SLAM_CVPR_2024_paper.pdf)
* [Gaussian Splatting SLAM](https://arxiv.org/abs/2312.06741)
:star:[code](https://github.com/muskie82/MonoGS)
:house:[project](https://rmurai.co.uk/projects/GaussianSplattingSLAM/)
* [SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM](https://arxiv.org/abs/2312.02126)
:house:[project](https://spla-tam.github.io/)
* [NARUTO: Neural Active Reconstruction from Uncertain Target Observations](http://arxiv.org/abs/2402.18771v1)
* [Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM](http://arxiv.org/abs/2403.19473v1)
* [Implicit Event-RGBD Neural SLAM](https://arxiv.org/abs/2311.11013)
:star:[code](https://github.com/DelinQu/EN-SLAM)
:house:[project](https://delinqu.github.io/EN-SLAM/)
* [Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization](http://arxiv.org/abs/2404.15263)
* [IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM](https://openaccess.thecvf.com/content/CVPR2024/papers/Yin_IBD-SLAM_Learning_Image-Based_Depth_Fusion_for_Generalizable_SLAM_CVPR_2024_paper.pdf)
* [Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular Stereo and RGB-D Cameras](https://arxiv.org/abs/2311.16728)
* [Loopy-SLAM: Dense Neural SLAM with Loop Closures](https://openaccess.thecvf.com/content/CVPR2024/papers/Liso_Loopy-SLAM_Dense_Neural_SLAM_with_Loop_Closures_CVPR_2024_paper.pdf)
:house:[project](http://notchla.github.io/Loopy-SLAM)
* [GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting](https://arxiv.org/abs/2311.11700)
:house:[project](https://gs-slam.github.io/)
* Robotics(机器人)
* [Retrieval-Augmented Embodied Agents](https://arxiv.org/abs/2404.11699)
* [ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation](http://arxiv.org/abs/2312.16217)
* [SUGAR: Pre-training 3D Visual Representations for Robotics](https://arxiv.org/abs/2404.01491)
:house:[project](https://cshizhe.github.io/projects/robot_sugar.html)
* [Learning to navigate efficiently and precisely in real environments](https://arxiv.org/abs/2401.14349)
* [Language-driven Grasp Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Vuong_Language-driven_Grasp_Detection_CVPR_2024_paper.pdf)
* [CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation](https://arxiv.org/pdf/2402.14795.pdf)
:house:[project](https://cyber-demo.github.io/)
* [Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation](http://arxiv.org/abs/2403.03890v1)
:star:[code](https://yusufma03.github.io/projects/hdp/)
* [Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation](https://openaccess.thecvf.com/content/CVPR2024/papers/Ryu_Diffusion-EDFs_Bi-equivariant_Denoising_Generative_Modeling_on_SE3_for_Visual_Robotic_CVPR_2024_paper.pdf)
:star:[code](https://github.com/tomato1mule/diffusion_edf)
:house:[project](https://sites.google.com/view/diffusion-edfs)
* [Rapid Motor Adaptation for Robotic Manipulator Arms](https://arxiv.org/abs/2312.04670)
* [Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts](https://openaccess.thecvf.com/content/CVPR2024/papers/Ni_Generate_Subgoal_Images_before_Act_Unlocking_the_Chain-of-Thought_Reasoning_in_CVPR_2024_paper.pdf)
* Avatar(虚拟建模)
* [SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting](https://arxiv.org/abs/2403.05087)
:star:[code](https://github.com/initialneil/SplattingAvatar)
:tv:[video](https://www.youtube.com/watch?v=IzC-fLvdntA)
* [MonoNPHM: Dynamic Head Reconstruction from Monocular Videos](http://arxiv.org/abs/2312.06740)
* [Relightable and Animatable Neural Avatar from Sparse-View Video](https://arxiv.org/abs/2308.07903)
:house:[project](https://zju3dv.github.io/relightable_avatar)
* [Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes](http://arxiv.org/abs/2404.01543v1)
:star:[code](https://augmentedperception.github.io/monoavatar-plus)
* [Artist-Friendly Relightable and Animatable Neural Heads](http://arxiv.org/abs/2312.03420)
* [GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image](http://arxiv.org/abs/2404.02152v1)
:star:[code](https://zju3dv.github.io/geneavatar/)
* [DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars](https://openaccess.thecvf.com/content/CVPR2024/papers/Kirschstein_DiffusionAvatars_Deferred_Diffusion_for_High-fidelity_3D_Head_Avatars_CVPR_2024_paper.pdf)
* [EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars](http://arxiv.org/abs/2404.19110)
* [Stratified Avatar Generation from Sparse Observations](http://arxiv.org/abs/2405.20786)
:house:[project](https://zerg-overmind.github.io/)
:tv:[video](https://www.youtube.com/watch?v=RkXaxyv1TOU)
* [Real-Time Simulated Avatar from Head-Mounted Sensors](http://arxiv.org/abs/2403.06862v1)
:house:[project](https://www.zhengyiluo.com/SimXR/)
* [Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling](https://arxiv.org/pdf/2311.16096.pdf)
:star:[code](https://github.com/lizhe00/AnimatableGaussians)
:house:[project](https://animatable-gaussians.github.io/)
* [NECA: Neural Customizable Human Avatar](http://arxiv.org/abs/2403.10335v1)
:star:[code](https://github.com/iSEE-Laboratory/NECA)
* [Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework](http://arxiv.org/abs/2403.16510v1)
:star:[code](https://github.com/ICTMCG/Make-Your-Anchor)
* [GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians](https://arxiv.org/abs/2312.02134)
:star:[code](https://github.com/huliangxiao/GaussianAvatar)
:house:[project](https://huliangxiao.github.io/GaussianAvatar)
* [Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians](https://arxiv.org/abs/2312.03029)
:star:[code](https://github.com/YuelangX/Gaussian-Head-Avatar)
:house:[project](https://yuelangx.github.io/gaussianheadavatar/)
* [GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning](https://arxiv.org/abs/2312.11461)
:house:[project](https://nvlabs.github.io/GAvatar)
* [UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures](https://arxiv.org/abs/2401.11078)
:star:[code](https://usrc-sea.github.io/UltrAvatar/)
:house:[project](https://usrc-sea.github.io/UltrAvatar/)
* [GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians](https://arxiv.org/abs/2312.02069)
:house:[project](https://shenhanqian.github.io/gaussian-avatars)
* [3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting](https://arxiv.org/abs/2312.09228)
:house:[project](https://neuralbodies.github.io/3DGS-Avatar)
* [AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing](https://arxiv.org/abs/2312.02209)
* [Human Gaussian Splatting: Real-time Rendering of Animatable Avatars](https://arxiv.org/abs/2311.17113)
* [GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh](http://arxiv.org/abs/2404.07991v1)
:star:[code](https://wenj.github.io/GoMAvatar/)
* [DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation](https://arxiv.org/abs/2311.12194)
:house:[project](https://people.csail.mit.edu/liyifei/publication/diffavatar/)
* [PEGASUS: Personalized Generative 3D Avatars with Composable Attributes](https://arxiv.org/abs/2402.10636)
:house:[project](https://snuvclab.github.io/pegasus/)
* [FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding](https://arxiv.org/abs/2312.02214)
:house:[project](https://ustc3dv.github.io/FlashAvatar/)
* [MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model](https://arxiv.org/abs/2311.16498)
:house:[project](https://showlab.github.io/magicanimate)
* [Relightable Gaussian Codec Avatars](https://arxiv.org/abs/2312.03704)
:house:[project](https://shunsukesaito.github.io/rgca/)
* [IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing](https://arxiv.org/abs/2312.05210)
:house:[project](https://neuralbodies.github.io/IntrinsicAvatar)
* Hair Modeling(头发建模)
* [MonoHair: High-Fidelity Hair Modeling from a Monocular Video](http://arxiv.org/abs/2403.18356v1)
:star:[code](https://keyuwu-cs.github.io/MonoHair/)
* Virtual Try-on(虚拟试穿)
* [Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On](http://arxiv.org/abs/2404.01089v1)
* [CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model](https://arxiv.org/abs/2311.18405)
* [M&M VTO: Multi-Garment Virtual Try-On and Editing](https://arxiv.org/abs/2406.04542)
:house:[project](https://mmvto.github.io/)
* [StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On](https://arxiv.org/abs/2312.01725)
:star:[code](https://github.com/rlawjdghek/StableVITON)
* [PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns](https://arxiv.org/abs/2312.04534)
:house:[project](https://ningshuliang.github.io/2023/Arxiv/index.html)
* Grasping(抓取)
* [MANUS: Markerless Grasp Capture using Articulated 3D Gaussians](https://arxiv.org/abs/2312.02137)
* Cartoon Characters(卡通人物)
* [Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text](https://arxiv.org/abs/2403.16897)
:house:[project](https://make-it-vivid.github.io/)

## 17.Automated Driving(自动驾驶)
* [Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs](https://openaccess.thecvf.com/content/CVPR2024/papers/Blayney_Bezier_Everywhere_All_at_Once_Learning_Drivable_Lanes_as_Bezier_CVPR_2024_paper.pdf)
:star:[code](https://github.com/driskai/BGFormer)
* [SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction](http://arxiv.org/abs/2404.09502v1)
* Autonomous Driving(自动驾驶)
* [Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory](https://openaccess.thecvf.com/content/CVPR2024/papers/Kalble_Accurate_Training_Data_for_Occupancy_Map_Prediction_in_Automated_Driving_CVPR_2024_paper.pdf)
* [VLP: Vision Language Planning for Autonomous Driving](http://arxiv.org/abs/2401.05577)
* [Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles](http://arxiv.org/abs/2402.07635)
* [DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes](http://arxiv.org/abs/2312.07920)
* [Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Diffusion-ES_Gradient-free_Planning_with_Diffusion_for_Autonomous_and_Instruction-guided_Driving_CVPR_2024_paper.pdf)
* [DualAD: Disentangling the Dynamic and Static World for End-to-End Driving](https://arxiv.org/abs/2406.06264)
:house:[project](https://simondoll.github.io/publications/dualad)
* [UniPAD: A Universal Pre-training Paradigm for Autonomous Driving](https://arxiv.org/abs/2310.08370)
:star:[code](https://github.com/Nightmare-n/UniPAD)
* [Generalized Predictive Model for Autonomous Driving](http://arxiv.org/abs/2403.09630v1)
:star:[code](https://github.com/OpenDriveLab/DriveAGI)
* [Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications](http://arxiv.org/abs/2311.17663)
* [ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles](http://arxiv.org/abs/2405.14062)
* [Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Ding_Holistic_Autonomous_Driving_Understanding_by_Birds-Eye-View_Injected_Multi-Modal_Large_Models_CVPR_2024_paper.pdf)
* [LMDrive: Closed-Loop End-to-End Driving with Large Language Models](https://arxiv.org/abs/2312.07488)
:star:[code](https://github.com/opendilab/LMDrive)
:house:[project](https://hao-shao.com/projects/lmdrive.html)
* [Feedback-Guided Autonomous Driving](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Feedback-Guided_Autonomous_Driving_CVPR_2024_paper.pdf)
* [PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving](https://openaccess.thecvf.com/content/CVPR2024/papers/Weng_PARA-Drive_Parallelized_Architecture_for_Real-time_Autonomous_Driving_CVPR_2024_paper.pdf)
* [Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving](https://arxiv.org/abs/2312.03031)
:star:[code](https://github.com/NVlabs/BEV-Planner)
* [On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving](http://arxiv.org/abs/2403.01238v1)
* [Visual Point Cloud Forecasting enables Scalable Autonomous Driving](https://arxiv.org/abs/2312.17655)
:star:[code](https://github.com/OpenDriveLab/ViDAR)
* [Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving](http://arxiv.org/abs/2403.07535v1)
:star:[code](https://github.com/Junda24/AFNet/)
* [CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow](http://arxiv.org/abs/2403.08919v1)
* [Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving](http://arxiv.org/abs/2403.17301v1)
* [AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving](http://arxiv.org/abs/2403.17373v1)
* [NeuRAD: Neural Rendering for Autonomous Driving](https://arxiv.org/abs/2311.15260)
:star:[code](https://github.com/georghess/neurad)
:house:[project](https://research.zenseact.com/publications/neurad/)
* [Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving](https://arxiv.org/abs/2311.17918)
:star:[code](https://github.com/BraveGroup/Drive-WM)
:house:[project](https://drive-wm.github.io/)
* [Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents](https://arxiv.org/abs/2402.05746)
:star:[code](https://github.com/yifanlu0227/ChatSim)
:house:[project](https://yifanlu0227.github.io/ChatSim/)
* [3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation](http://arxiv.org/abs/2405.03388)
:star:[code](https://github.com/PRBonn/4dNDF)
* [PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_PACER_On-Demand_Pedestrian_Animation_Controller_in_Driving_Scenarios_CVPR_2024_paper.pdf)
:star:[code](https://github.com/IDC-Flash/PacerPlus)
:tv:[video](https://www.youtube.com/watch?v=Pq10Q_ZBOrw)
* [Bootstrapping Autonomous Driving Radars with Self-Supervised Learning](https://arxiv.org/abs/2312.04519)
:star:[code](https://github.com/yiduohao/Radical)
* [SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving](https://arxiv.org/abs/2403.17094)
* Trajectory Prediction(轨迹预测)
* [Pose-Transformed Equivariant Network for 3D Point Trajectory Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Pose-Transformed_Equivariant_Network_for_3D_Point_Trajectory_Prediction_CVPR_2024_paper.pdf)
* [Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving](http://arxiv.org/abs/2306.15755)
* [CaDeT: a Causal Disentanglement Approach for Robust Trajectory Prediction in Autonomous Driving](https://openaccess.thecvf.com/content/CVPR2024/papers/Pourkeshavarz_CaDeT_a_Causal_Disentanglement_Approach_for_Robust_Trajectory_Prediction_in_CVPR_2024_paper.pdf)
* [Higher-order Relational Reasoning for Pedestrian Trajectory Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_Higher-order_Relational_Reasoning_for_Pedestrian_Trajectory_Prediction_CVPR_2024_paper.pdf)
* [Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Wen_Density-Adaptive_Model_Based_on_Motif_Matrix_for_Multi-Agent_Trajectory_Prediction_CVPR_2024_paper.pdf)
* [GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Lin_GigaTraj_Predicting_Long-term_Trajectories_of_Hundreds_of_Pedestrians_in_Gigapixel_CVPR_2024_paper.pdf)
* [ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_ERMVP_Communication-Efficient_and_Collaboration-Robust_Multi-Vehicle_Perception_in_Challenging_Environments_CVPR_2024_paper.pdf)
* [HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention](https://arxiv.org/abs/2404.06351)
:star:[code](https://github.com/XiaolongTang23/HPNet)
:thumbsup:[VILP](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)
* [Adapting to Length Shift: FlexiLength Network for Trajectory Prediction](http://arxiv.org/abs/2404.00742v1)
* [OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising](https://arxiv.org/abs/2404.02227)
:star:[code](https://github.com/Hai-chao-Zhang/OOSTraj)
* [SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction](https://arxiv.org/abs/2310.05370)
* [T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory](http://arxiv.org/abs/2403.10052v1)
:star:[code](https://github.com/daeheepark/T4P)
* [Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations](http://arxiv.org/abs/2403.13261v1)
* [SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction](http://arxiv.org/abs/2403.11492v1)
:star:[code](https://github.com/opendilab/SmartRefine/)
* [Producing and Leveraging Online Map Uncertainty in Trajectory Prediction](http://arxiv.org/abs/2403.16439v1)
* [SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model](http://arxiv.org/abs/2403.18452v1)
:star:[code](https://github.com/inhwanbae/SingularTrajectory)
* [Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction](http://arxiv.org/abs/2403.18447v1)
:star:[code](https://github.com/inhwanbae/LMTrajectory)
* [Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture](http://arxiv.org/abs/2404.03789v1)
:star:[code](https://github.com/PurdueDigitalTwin/seneva)
* Lane Detection(车道线检测)
* [LaneCPP: Continuous 3D Lane Detection using Physical Priors](https://openaccess.thecvf.com/content/CVPR2024/papers/Pittner_LaneCPP_Continuous_3D_Lane_Detection_using_Physical_Priors_CVPR_2024_paper.pdf)
* [Lane2Seq: Towards Unified Lane Detection via Sequence Generation](http://arxiv.org/abs/2402.17172v1)
:house:[project](https://zkyseu.github.io/lane2seq.github.io/)
* In-Vehicle Gaze Estimation(车载凝视估计)
* [What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation](http://arxiv.org/abs/2403.15664v1)
:house:[project](https://yihua.zone/work/ivgaze)
* 3D Occupancy Prediction
* [COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction](https://arxiv.org/abs/2312.01919)
:star:[code](https://github.com/NotACracker/COTR)
* [SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction](https://arxiv.org/abs/2311.12754)
:star:[code](https://github.com/huang-yh/SelfOcc)
* [StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation](https://openaccess.thecvf.com/content/CVPR2024/papers/Shi_StreamingFlow_Streaming_Occupancy_Forecasting_with_Asynchronous_Multi-modal_Data_Streams_via_CVPR_2024_paper.pdf)
* Vehicle Re-identification(车辆重识别)
* [Day-Night Cross-domain Vehicle Re-identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Day-Night_Cross-domain_Vehicle_Re-identification_CVPR_2024_paper.pdf)

## 16.Point Cloud(点云)
* [Single-View Scene Point Cloud Human Grasp Generation](http://arxiv.org/abs/2404.15815)
* [LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_LTA-PCS_Learnable_Task-Agnostic_Point_Cloud_Sampling_CVPR_2024_paper.pdf)
* [StraightPCF: Straight Point Cloud Filtering](https://arxiv.org/abs/2405.08322)
* [CurveCloudNet: Processing Point Clouds with 1D Structure](https://arxiv.org/abs/2303.12050)
* [Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange](http://arxiv.org/abs/2404.07504)
* [Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform](http://arxiv.org/abs/2404.11156v1)
* [Multiway Point Cloud Mosaicking with Diffusion and Global Optimization](http://arxiv.org/abs/2404.00429)
:tv:[video](https://www.youtube.com/watch?v=dnzhKfPIoWg)
* [Point Cloud Pre-training with Diffusion Models](https://arxiv.org/abs/2311.14960)
* [PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds](http://arxiv.org/abs/2311.12062)
* [Unsupervised Occupancy Learning from Sparse Point Cloud](https://arxiv.org/abs/2404.02759)
* [TULIP: Transformer for Upsampling of LiDAR Point Clouds](http://arxiv.org/abs/2312.06733)
* [Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_Draw_Step_by_Step_Reconstructing_CAD_Construction_Sequences_from_Point_CVPR_2024_paper.pdf)
* [Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis](http://arxiv.org/abs/2403.01439v1)
:star:[code](https://github.com/LMD0311/DAPT)
* [Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis](http://arxiv.org/abs/2403.11113v1)
:star:[code](https://github.com/wdttt/LocoTrans)
* [Unsupervised Template-assisted Point Cloud Shape Correspondence Network](http://arxiv.org/abs/2403.16412v1)
* [GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds](http://arxiv.org/abs/2403.19220v1)
* [Object Dynamics Modeling with Hierarchical Point Cloud-based Representations](https://arxiv.org/abs/2404.06044)
* [KPConvX: Modernizing Kernel Point Convolution with Kernel Attention](https://arxiv.org/abs/2405.13194)
* Point Cloud Registration(点云配准)
* [Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension](http://arxiv.org/abs/2403.03532v1)
* [Inlier Confidence Calibration for Point Cloud Registration](https://openaccess.thecvf.com/content/CVPR2024/papers/Yuan_Inlier_Confidence_Calibration_for_Point_Cloud_Registration_CVPR_2024_paper.pdf)
* [ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Mu_ColorPCR_Color_Point_Cloud_Registration_with_Multi-Stage_Geometric-Color_Fusion_CVPR_2024_paper.pdf)
* [Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Dynamic_Cues-Assisted_Transformer_for_Robust_Point_Cloud_Registration_CVPR_2024_paper.pdf)
* [Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes](http://arxiv.org/abs/2404.04557)
* 3D Point Cloud(3D 点云)
* [Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds](http://arxiv.org/abs/2403.05247v1)
:star:[code](https://github.com/TRLou/HiT-ADV)
* [Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds](http://arxiv.org/abs/2403.18469v1)
:star:[code](https://github.com/yuan-zm/DGT-ST)
:thumbsup:[abstract (in Chinese)](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds](http://arxiv.org/abs/2312.04962)
* [Text2Loc: 3D Point Cloud Localization from Natural Language](https://arxiv.org/abs/2311.15977)
:house:[project](https://yan-xia.github.io/projects/text2loc/)
* [3DInAction: Understanding Human Actions in 3D Point Clouds](https://arxiv.org/abs/2303.06346)
:star:[code](https://github.com/sitzikbs/3dincaction)
* [Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching](http://arxiv.org/abs/2402.17372v1)
:star:[code](https://github.com/matteo-bastico/CoupledLaplacian)
* Point Cloud Recognition(点云识别)
* [X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_X-3D_Explicit_3D_Structure_Modeling_for_Point_Cloud_Recognition_CVPR_2024_paper.pdf)
* Point Cloud Upsampling(点云上采样)
* [RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation](https://openaccess.thecvf.com/content/CVPR2024/papers/Rong_RepKPU_Point_Cloud_Upsampling_with_Kernel_Point_Representation_and_Deformation_CVPR_2024_paper.pdf)
* [A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling](http://arxiv.org/abs/2312.02719)
* [SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_SPU-PMD_Self-Supervised_Point_Cloud_Upsampling_via_Progressive_Mesh_Deformation_CVPR_2024_paper.pdf)
* Point Cloud Segmentation(点云分割)
* [Construct to Associate: Cooperative Context Learning for Domain Adaptive Point Cloud Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Construct_to_Associate_Cooperative_Context_Learning_for_Domain_Adaptive_Point_CVPR_2024_paper.pdf)
* [OneFormer3D: One Transformer for Unified Point Cloud Segmentation](https://arxiv.org/abs/2311.14405)
* Point Cloud Instance Segmentation(点云实例分割)
* [FreePoint: Unsupervised Point Cloud Instance Segmentation](https://arxiv.org/abs/2305.06973)
* [Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation](https://arxiv.org/abs/2312.11269)
* Weakly Supervised Point Cloud Semantic Segmentation(弱监督点云语义分割)
* [Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle](https://openaccess.thecvf.com/content/CVPR2024/papers/Kweon_Weakly_Supervised_Point_Cloud_Semantic_Segmentation_via_Artificial_Oracle_CVPR_2024_paper.pdf)
:house:[project](http://vi.kaist.ac.kr/2024/02/28/weakly-supervised-point-cloud-semantic-segmentation-via-artificial-oracle/)
* Point Cloud Analysis(点云分析)
* [TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis](https://arxiv.org/abs/2211.14456)
:star:[code](https://github.com/pavlo-melnyk/tetrasphere)
* Point Cloud Understanding(点云理解)
* [Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding](http://arxiv.org/abs/2312.02244)
* Point Cloud Generation(点云生成)
* [TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process](http://cvlab.cse.msu.edu/pdfs/Ren_Kim_Liu_Liu_TIGER.pdf)
:star:[code](https://github.com/Zhiyuan-R/Tiger-Diffusion)
* Point Cloud Denoising(点云去噪)
* [Denoising Point Cloud in Latent Space via Graph Convolution and Invertible Neural Network](https://openaccess.thecvf.com/content/CVPR2024/papers/Mao_Denoising_Point_Clouds_in_Latent_Space_via_Graph_Convolution_and_CVPR_2024_paper.pdf)
* Point Cloud Classification(点云分类)
* [CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Huang_CausalPC_Improving_the_Robustness_of_Point_Cloud_Classification_by_Causal_CVPR_2024_paper.pdf)
* Point Cloud Quality Assessment(点云质量评估)
* [Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment](https://arxiv.org/abs/2403.10066)

## 15.Object Detection(目标检测)
* [Semantic Line Combination Detector](http://arxiv.org/abs/2404.18399)
* [Language-conditioned Detection Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_Language-conditioned_Detection_Transformer_CVPR_2024_paper.pdf)
* [Unsupervised Salient Instance Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Tian_Unsupervised_Salient_Instance_Detection_CVPR_2024_paper.pdf)
* [Neural Exposure Fusion for High-Dynamic Range Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Onzon_Neural_Exposure_Fusion_for_High-Dynamic_Range_Object_Detection_CVPR_2024_paper.pdf)
* [LEOD: Label-Efficient Object Detection for Event Cameras](http://arxiv.org/abs/2311.17286)
* [SFOD: Spiking Fusion Object Detector](http://arxiv.org/abs/2403.15192v1)
:star:[code](https://github.com/yimeng-fan/SFOD)
* [Exploring Orthogonality in Open World Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_Exploring_Orthogonality_in_Open_World_Object_Detection_CVPR_2024_paper.pdf)
:star:[code](https://github.com/feifeiobama/OrthogonalDet)
* [What How and When Should Object Detectors Update in Continually Changing Test Domains?](http://arxiv.org/abs/2312.08875)
* [Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Depth-Aware_Concealed_Crop_Detection_in_Dense_Agricultural_Scenes_CVPR_2024_paper.pdf)
* [SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection](http://arxiv.org/abs/2402.17323v1)
* [Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement](http://arxiv.org/abs/2403.16131v1)
:star:[code](https://github.com/xiuqhou/Salience-DETR)
* [Theoretically Achieving Continuous Representation of Oriented Bounding Boxes](http://arxiv.org/abs/2402.18975v1)
:star:[code](https://github.com/Jittor/JDet)
* [RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features](http://arxiv.org/abs/2403.05061v1)
* [Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation](https://arxiv.org/abs/2312.01220)
:star:[code](https://github.com/ZPDu/DAI-Net)
* [CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation](https://arxiv.org/abs/2403.19104)
:house:[project](https://song-jingyu.github.io/CRKD/)
* [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069)
:house:[project](https://zhao-yian.github.io/RTDETR)
* [Hyperbolic Learning with Synthetic Captions for Open-World Detection](http://arxiv.org/abs/2404.05016v1)
* [Overload: Latency Attacks on Object Detection for Edge Devices](https://arxiv.org/abs/2304.05370)
* [YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection](http://arxiv.org/abs/2212.02081)
* [Active Domain Adaptation with False Negative Prediction for Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Nakamura_Active_Domain_Adaptation_with_False_Negative_Prediction_for_Object_Detection_CVPR_2024_paper.pdf)
* [RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation](https://arxiv.org/abs/2404.18150)
:star:[code](https://github.com/yuvalHG/RadSimReal)
* [Active Object Detection with Knowledge Aggregation and Distillation from Large Models](http://arxiv.org/abs/2405.12509)
* [GLOW: Global Layout Aware Attacks on Object Detection](https://arxiv.org/abs/2302.14166)
* [Plug and Play Active Learning for Object Detection](https://arxiv.org/abs/2211.11612)
:star:[code](https://github.com/ChenhongyiYang/PPAL)
* [InstaGen: Enhancing Object Detection by Training on Synthetic Dataset](https://arxiv.org/abs/2402.05937)
:house:[project](https://fcjian.github.io/InstaGen)
* [Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition](http://arxiv.org/abs/2401.01482)
* [Generating Enhanced Negatives for Training Language-Based Object Detectors](http://arxiv.org/abs/2401.00094)
* SAR Object Detection(SAR目标检测)
* [Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Unleashing_Channel_Potential_Space-Frequency_Selection_Convolution_for_SAR_Object_Detection_CVPR_2024_paper.pdf)
* 3D Object Detection(3D目标检测)
* [Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection](http://arxiv.org/abs/2404.19384)
* [PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection](http://arxiv.org/abs/2312.08371)
* [Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Prompt3D_Random_Prompt_Assisted_Weakly-Supervised_3D_Object_Detection_CVPR_2024_paper.pdf)
* [CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_CaKDP_Category-aware_Knowledge_Distillation_and_Pruning_Framework_for_Lightweight_3D_CVPR_2024_paper.pdf)
* [Weakly Supervised Monocular 3D Detection with a Single-View Image](http://arxiv.org/abs/2402.19144)
* [Weak-to-Strong 3D Object Detection with X-Ray Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Gambashidze_Weak-to-Strong_3D_Object_Detection_with_X-Ray_Distillation_CVPR_2024_paper.pdf)
* [GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_GAFusion_Adaptive_Fusing_LiDAR_and_Camera_with_Multiple_Guidance_for_CVPR_2024_paper.pdf)
* [BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection](https://arxiv.org/pdf/2312.01696.pdf)
* [Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions](https://openaccess.thecvf.com/content/CVPR2024/papers/Chae_Towards_Robust_3D_Object_Detection_with_LiDAR_and_4D_Radar_CVPR_2024_paper.pdf)
* [HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes](http://arxiv.org/abs/2403.02769)
* [Commonsense Prototype for Outdoor Unsupervised 3D Object Detection](http://arxiv.org/abs/2404.16493)
:star:[code](https://github.com/hailanyi/CPD)
:thumbsup:[abstract (in Chinese)](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [Multi-View Attentive Contextualization for Multi-View 3D Object Detection](https://arxiv.org/abs/2405.12200)
* [BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_BEVSpread_Spread_Voxel_Pooling_for_Birds-Eye-View_Representation_in_Vision-based_Roadside_CVPR_2024_paper.pdf)
* [An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains](https://arxiv.org/abs/2402.17562)
* [SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects](https://arxiv.org/abs/2403.20318)
:star:[code](https://github.com/abhi1kumar/SeaBird)
* [HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_HINTED_Hard_Instance_Enhanced_Detector_with_Mixed-Density_Feature_Fusion_for_CVPR_2024_paper.pdf)
:thumbsup:[abstract (in Chinese)](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features](https://arxiv.org/abs/2311.04391)
:house:[project](https://research.nvidia.com/labs/toronto-ai/3difftection/)
* [UniMODE: Unified Monocular 3D Object Detection](http://arxiv.org/abs/2402.18573v1)
* [Learning Occupancy for Monocular 3D Object Detection](https://arxiv.org/abs/2305.15694)
:star:[code](https://github.com/SPengLiang/OccupancyM3D)
* [CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Shen_CN-RMA_Combined_Network_with_Ray_Marching_Aggregation_for_3D_Indoor_CVPR_2024_paper.pdf)
:star:[code](https://github.com/SerCharles/CN-RMA)
* [VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection](http://arxiv.org/abs/2404.00149v1)
:star:[code](https://github.com/skmhrk1209/VSRD)
* [Improving Distant 3D Object Detection Using 2D Box Supervision](http://arxiv.org/abs/2403.09230v1)
* [SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection](http://arxiv.org/abs/2403.05817v1)
:star:[code](https://github.com/zhanggang001/HEDNet)
* [Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors](http://arxiv.org/abs/2403.06093v1)
:star:[code](https://github.com/nullmax-vision/QAF2D)
* [IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection](http://arxiv.org/abs/2403.15241v1)
:star:[code](https://github.com/yinjunbo/IS-Fusion)
* [RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection](http://arxiv.org/abs/2403.16440v1)
:star:[code](https://github.com/VDIGPKU/RCBEVDet)
* [Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection](http://arxiv.org/abs/2403.17387v1)
* [MonoCD: Monocular 3D Object Detection with Complementary Depths](https://arxiv.org/abs/2404.03181)
:star:[code](https://github.com/elvintanhust/MonoCD)
* Small Object Detection(小目标检测)
* [Infrared Small Target Detection with Scale and Location Sensitivity](http://arxiv.org/abs/2403.19366v1)
:star:[code](https://github.com/ying-fu/MSHNet)
* Salient Object Detection(显著目标检测)
* [VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning](https://arxiv.org/abs/2311.15011)
:star:[code](https://github.com/Sssssuperior/VSCode)
* Oriented Object Detection(定向目标检测)
* [Rethinking Boundary Discontinuity Problem for Oriented Object Detection](https://arxiv.org/pdf/2305.10061.pdf)
:star:[code](https://github.com/hangxu-cv/cvpr24acm)
* [PointOBB: Learning Oriented Object Detection via Single Point Supervision](https://arxiv.org/abs/2311.14757)
:star:[code](https://github.com/Luo-Z13/pointobb)
* Weakly Semi-Supervised Oriented Object Detection(弱半监督定向目标检测)
* [Relational Matching for Weakly Semi-Supervised Oriented Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Relational_Matching_for_Weakly_Semi-Supervised_Oriented_Object_Detection_CVPR_2024_paper.pdf)
* Few-Shot Object Detection(小样本目标检测)
* [SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_SNIDA_Unlocking_Few-Shot_Object_Detection_with_Non-linear_Semantic_Decoupling_Augmentation_CVPR_2024_paper.pdf)
* [Few-Shot Object Detection with Foundation Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Han_Few-Shot_Object_Detection_with_Foundation_Models_CVPR_2024_paper.pdf)
* Domain-Generalized Object Detection(域泛化目标检测)
* [Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection](https://arxiv.org/abs/2405.15225)
* [Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment](http://arxiv.org/abs/2405.14497)
* Domain-Adaptive Object Detection(域适应目标检测)
* [D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection](http://arxiv.org/abs/2403.09359v1)
:star:[code](https://github.com/EdwardDo69/D3T)
* [CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection](http://arxiv.org/abs/2403.19278v1)
* Open-Ended Object Detection(开放式目标检测)
* [Generative Region-Language Pretraining for Open-Ended Object Detection](http://arxiv.org/abs/2403.10191v1)
* Semi-Supervised Object Detection(半监督目标检测)
* [A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_A-Teacher_Asymmetric_Network_for_3D_Semi-Supervised_Object_Detection_CVPR_2024_paper.pdf)
* [Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection](http://arxiv.org/abs/2404.01819v1)
* End-to-End Object Detection(端到端目标检测)
* [Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision](https://arxiv.org/abs/2311.14758)
:star:[code](https://github.com/yuyi1005/point2rbox-mmrotate)
* Open-Vocabulary Object Detection(开放词汇目标检测)
* [Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Exploring_Region-Word_Alignment_in_Built-in_Detector_for_Open-Vocabulary_Object_Detection_CVPR_2024_paper.pdf)
* [The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding](https://arxiv.org/abs/2311.17518)
:house:[project](https://lorebianchi98.github.io/FG-OVD/)
* [Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection](https://arxiv.org/abs/2406.00510)
* [OVMR: Open-Vocabulary Recognition with Multi-Modal References](https://arxiv.org/abs/2406.04675)
* [SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection](https://arxiv.org/abs/2405.10053)
* [YOLO-World: Real-Time Open-Vocabulary Object Detection](https://arxiv.org/abs/2401.17270)
:star:[code](https://github.com/AILab-CVC/YOLO-World)
* [Retrieval-Augmented Open-Vocabulary Object Detection](http://arxiv.org/abs/2404.05687v1)
:star:[code](https://github.com/mlvlab/RALF)
* [Taming Self-Training for Open-Vocabulary Object Detection](https://arxiv.org/abs/2308.06412)
:star:[code](https://github.com/xiaofeng94/SAS-Det)
* [Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Scene-adaptive_and_Region-aware_Multi-modal_Prompt_for_Open_Vocabulary_Object_Detection_CVPR_2024_paper.pdf)
* [DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection](http://arxiv.org/abs/2404.09216v1)
* Video Camouflaged Object Detection(视频伪装目标检测)
* [Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Hui_Endow_SAM_with_Keen_Eyes_Temporal-spatial_Prompt_Learning_for_Video_CVPR_2024_paper.pdf)
* Event-based Object Detection(基于事件的目标检测)
* [EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition](http://arxiv.org/abs/2403.14082v1)
* [Scene Adaptive Sparse Transformer for Event-based Object Detection](https://arxiv.org/abs/2404.01882)
:star:[code](https://github.com/Peterande/SAST)
* Co-Salient Object Detection(联合显著性目标检测)
* [CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection](https://arxiv.org/abs/2403.18554)
* Open-Set Recognition(开集识别)
* [From Coarse to Fine-Grained Open-Set Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Lang_From_Coarse_to_Fine-Grained_Open-Set_Recognition_CVPR_2024_paper.pdf)
* Object Recognition(物体识别)
* [Object Recognition as Next Token Prediction](https://arxiv.org/abs/2312.02142)
:star:[code](https://github.com/kaiyuyue/nxtp)
* Object Discovery(目标发现)
* [DIOD: Self-Distillation Meets Object Discovery](https://openaccess.thecvf.com/content/CVPR2024/papers/Kara_DIOD_Self-Distillation_Meets_Object_Discovery_CVPR_2024_paper.pdf)
* [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number](https://openaccess.thecvf.com/content/CVPR2024/papers/Fan_Adaptive_Slot_Attention_Object_Discovery_with_Dynamic_Slot_Number_CVPR_2024_paper.pdf)
* Object Localization(目标定位)
* [CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective](http://arxiv.org/abs/2403.06676v1)
:star:[code](https://github.com/snskysk/CAM-Back-Again)

## 14.Human Action Recognition(人体动作识别)
* [STMixer: A One-Stage Sparse Action Detector](http://arxiv.org/abs/2404.09842v1)
* [Adapting Short-Term Transformers for Action Detection in Untrimmed Videos](https://arxiv.org/abs/2312.01897)
* [Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence](https://arxiv.org/abs/2401.00921)
:star:[code](https://github.com/Ruizhuo-Xu/Skeleton2vec)
* [Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition](https://arxiv.org/abs/2403.12710)
:star:[code](https://github.com/f-ilic/SelectivePrivacyPreservation)
:house:[project](https://f-ilic.github.io/SelectivePrivacyPreservation)
* [X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization](http://arxiv.org/abs/2403.19811v1)
:star:[code](https://github.com/annusha/xmic)
* [LLMs are Good Action Recognizers](http://arxiv.org/abs/2404.00532v1)
* [Action Detection via an Image Diffusion Process](http://arxiv.org/abs/2404.01051v1)
* [Language Model Guided Interpretable Video Action Reasoning](http://arxiv.org/abs/2404.01591v1)
:star:[code](https://github.com/NingWang2049/LaIAR)
* [SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos](http://arxiv.org/abs/2404.05206v1)
:house:[project](https://vision.cs.utexas.edu/projects/soundingactions)
* [TIM: A Time Interval Machine for Audio-Visual Action Recognition](http://arxiv.org/abs/2404.05559v1)
:star:[code](https://github.com/JacobChalk/TIM)
* [VicTR: Video-conditioned Text Representations for Activity Recognition](https://arxiv.org/abs/2304.02560)
* [Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes](https://arxiv.org/abs/2311.17948)
:house:[project](https://hcis-lab.github.io/Action-slot/)
* [Narrative Action Evaluation with Prompt-Guided Multimodal Interaction](https://arxiv.org/abs/2404.14471)
:star:[code](https://github.com/shiyi-zh0408/NAE_CVPR2024)
* [Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition](http://arxiv.org/abs/2311.15619)
* [Modality-Collaborative Test-Time Adaptation for Action Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiong_Modality-Collaborative_Test-Time_Adaptation_for_Action_Recognition_CVPR_2024_paper.pdf)
* [CPR-Coach: Recognizing Composite Error Actions based on Single-class Training](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_CPR-Coach_Recognizing_Composite_Error_Actions_based_on_Single-class_Training_CVPR_2024_paper.pdf)
* Skeleton-based Action Recognition(基于骨架的动作识别)
* [BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_BlockGCN_Redefine_Topology_Awareness_for_Skeleton-Based_Action_Recognition_CVPR_2024_paper.pdf)
* Event-based Action Recognition(基于事件的动作识别)
* [ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More](http://arxiv.org/abs/2403.12534v1)
* Zero-Shot Action Recognition(零样本动作识别)
* [Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_Part-aware_Unified_Representation_of_Language_and_Skeleton_for_Zero-shot_Action_CVPR_2024_paper.pdf)
* Fine-Grained Action Recognition(细粒度动作识别)
* [PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_PeVL_Pose-Enhanced_Vision-Language_Model_for_Fine-Grained_Human_Action_Recognition_CVPR_2024_paper.pdf)
* Action Localization(动作定位)
* [Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization](http://arxiv.org/abs/2312.17686)
* Temporal Action Localization(时序动作定位)
* [Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_Realigning_Confidence_with_Temporal_Saliency_Information_for_Point-Level_Weakly-Supervised_Temporal_CVPR_2024_paper.pdf)
:star:[code](https://github.com/zyxia1009/CVPR2024-TSPNet)
* [End-to-End Spatio-Temporal Action Localisation with Video Transformers](http://arxiv.org/abs/2304.12160)
* [Test-Time Zero-Shot Temporal Action Localization](http://arxiv.org/abs/2404.05426)
* Temporal Action Detection(时序动作检测)
* [Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions](http://arxiv.org/abs/2403.20254v1)
:star:[code](https://github.com/Alvin-Zeng/temporal-robustness-benchmark)
* [TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_TE-TAD_Towards_Full_End-to-End_Temporal_Action_Detection_via_Time-Aligned_Coordinate_CVPR_2024_paper.pdf)
* [End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames](https://arxiv.org/abs/2311.17241)
:star:[code](https://github.com/sming256/AdaTAD)
* [Low-power, Continuous Remote Behavioral Localization with Event Cameras](https://arxiv.org/abs/2312.03799)
:house:[project](https://tub-rip.github.io/eventpenguins/)
* [Dual DETRs for Multi-Label Temporal Action Detection](http://arxiv.org/abs/2404.00653v1)
* Action Quality Assessment(动作质量评估)
* [FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment](https://arxiv.org/abs/2405.06887)
* Group Activity Recognition(群体活动识别)
* [Bi-Causal: Group Activity Recognition via Bidirectional Causality](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Bi-Causal_Group_Activity_Recognition_via_Bidirectional_Causality_CVPR_2024_paper.pdf)
* [Learning from Synthetic Human Group Activities](http://arxiv.org/abs/2306.16772)
* Human Action Understanding(人体动作理解)
* [From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding](http://arxiv.org/abs/2304.00553)
* Action Anticipation(动作预期)
* [Can’t make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models](https://arxiv.org/abs/2405.20305)
* [Uncertainty-aware Action Decoupling Transformer for Action Anticipation](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_Uncertainty-aware_Action_Decoupling_Transformer_for_Action_Anticipation_CVPR_2024_paper.pdf)

## 13.Human Pose Estimation(人体姿态估计)
* [CLOAF: CoLlisiOn-Aware Human Flow](http://arxiv.org/abs/2403.09050v1)
* [Meta-Point Learning and Refining for Category-Agnostic Pose Estimation](http://arxiv.org/abs/2403.13647v1)
* [SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering](http://arxiv.org/abs/2404.01225v1)
:star:[code](https://taohuumd.github.io/projects/SurMo/)
* [GALA: Generating Animatable Layered Assets from a Single Scan](https://arxiv.org/abs/2401.12979)
:star:[code](https://github.com/snuvclab/GALA)
:house:[project](https://snuvclab.github.io/gala/)
* [ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation](https://arxiv.org/abs/2311.11106)
* Hands(手部)
* [Authentic Hand Avatar from a Phone Scan via Universal Hand Model](https://arxiv.org/abs/2405.07933)
* [URHand: Universal Relightable Hands](http://arxiv.org/abs/2401.05334)
:house:[project](https://frozenburning.github.io/projects/urhand/)
* [OHTA: One-shot Hand Avatar via Data-driven Implicit Priors](http://arxiv.org/abs/2402.18969v1)
:star:[code](https://github.com/zxz267/OHTA)
:house:[project](https://zxz267.github.io/OHTA/)
* [BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics](https://arxiv.org/abs/2312.07937)
:star:[code](https://github.com/Godheritage/BOTH2Hands)
:house:[project](https://godheritage.github.io/)
* [Reconstructing Hands in 3D with Transformers](http://arxiv.org/abs/2312.05251)
* 3D Hand Pose Estimation(3D手部姿态估计)
* [HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields](http://arxiv.org/abs/2402.17062v1)
:star:[code](https://github.com/amathislab/HOISDF)
* [Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation](http://arxiv.org/abs/2403.04381v1)
:star:[code](https://github.com/MickeyLLG/S2DHand)
* [HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud](https://arxiv.org/abs/2404.03159)
:star:[code](https://github.com/cwc1260/HandDiff)
* Hand Mesh Reconstruction(手部网格重建)
* [Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction](https://arxiv.org/abs/2403.07346)
:star:[code](https://github.com/AlanJiang98/EvRGBHand)
:house:[project](https://alanjiang98.github.io/evrgbhand.github.io/)
* [HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions](https://arxiv.org/abs/2403.18575)
:star:[code](https://github.com/hxwork/HandBooster_Pytorch)
* Hand Mesh Recovery(手部网格恢复)
* [A Simple Baseline for Efficient Hand Mesh Reconstruction](https://arxiv.org/pdf/2403.01813.pdf)
:star:[code](https://github.com/patienceFromZhou/simpleHand)
:house:[project](https://simplehand.github.io/)
* [HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models](https://arxiv.org/abs/2406.01334)
:house:[project](https://dw1010.github.io/project/HHMR/HHMR.html)
* Hand Pose Tracking(手部姿态跟踪)
* [MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints](http://arxiv.org/abs/2404.10227v1)
* Hand Texture Reconstruction(手部纹理重建)
* [BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image](http://arxiv.org/abs/2403.08262v1)
:star:[code](https://github.com/yunminjin2/BiTT)
:house:[project](https://yunminjin2.github.io/projects/bitt/)
* Gesture Synthesis(手势合成)
* [Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation](https://arxiv.org/abs/2311.17532)
:house:[project](https://xingqunqi-lab.github.io/Emo-Transition-Gesture/)
* [ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis](http://arxiv.org/abs/2403.17936v1)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/ConvoFusion/)
* [DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation](https://arxiv.org/abs/2401.04747)
:house:[project](https://jeremycjm.github.io/proj/DiffSHEG/)
* [EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling](https://arxiv.org/abs/2401.00374)
:house:[project](https://pantomatrix.github.io/EMAGE/)
* Human Body(人体)
* [LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment](http://arxiv.org/abs/2402.17171v1)
* [AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation](http://arxiv.org/abs/2403.17934)
* [LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging](http://arxiv.org/abs/2404.01941)
* [Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors](http://arxiv.org/abs/2312.02196)
* [Fast Adaptation for Human Pose Estimation via Meta-Optimization](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_Fast_Adaptation_for_Human_Pose_Estimation_via_Meta-Optimization_CVPR_2024_paper.pdf)
* [RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_RAM-Avatar_Real-time_Photo-Realistic_Avatar_from_Monocular_Videos_with_Full-body_Control_CVPR_2024_paper.pdf)
:star:[code](https://github.com/Xiang-Deng00/RAM-Avatar/)
* [SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation](https://arxiv.org/abs/2404.03518)
:star:[code](https://github.com/MartyrPenink/SDPose)
* Multi-Person Pose Estimation(多人姿势估计)
* [DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach](https://openaccess.thecvf.com/content/CVPR2024/papers/Tan_DiffusionRegPose_Enhancing_Multi-Person_Pose_Estimation_using_a_Diffusion-Based_End-to-End_Regression_CVPR_2024_paper.pdf)
* [RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation](http://arxiv.org/abs/2312.07526)
* 3D Human(3D 人体)
* [Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi](https://openaccess.thecvf.com/content/CVPR2024/papers/Yan_Person-in-WiFi_3D_End-to-End_Multi-Person_3D_Pose_Estimation_with_Wi-Fi_CVPR_2024_paper.pdf)
* [Cross-view and Cross-pose Completion for 3D Human Understanding](http://arxiv.org/abs/2311.09104)
* [TexVocab: Texture Vocabulary-conditioned Human Avatars](http://arxiv.org/abs/2404.00524)
:house:[project](https://texvocab.github.io/)
* [MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Ranasinghe_MonoDiff_Monocular_3D_Object_Detection_and_Pose_Estimation_with_Diffusion_CVPR_2024_paper.pdf)
* [ChatPose: Chatting about 3D Human Pose](https://arxiv.org/abs/2311.18836)
:house:[project](https://yfeng95.github.io/ChatPose/)
* [SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation](http://arxiv.org/abs/2404.02041v1)
:star:[code](https://github.com/CAMMA-public/SelfPose3D)
* [Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting](http://arxiv.org/abs/2402.18330v1)
* [FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations](https://arxiv.org/abs/2211.14309)
:house:[project](https://future-human-3d.christian-diller.de/)
* [FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models](https://arxiv.org/abs/2405.05216)
* [PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization](https://openaccess.thecvf.com/content/CVPR2024/papers/Cai_PoseIRM_Enhance_3D_Human_Pose_Estimation_on_Unseen_Camera_Settings_CVPR_2024_paper.pdf)
* [Score-Guided Diffusion for 3D Human Recovery](http://arxiv.org/abs/2403.09623v1)
:star:[code](https://statho.github.io/ScoreHMR)
* [A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation](http://arxiv.org/abs/2403.11310v1)
* [KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation](http://arxiv.org/abs/2404.00658v1)
:star:[code](https://github.com/JihuaPeng/KTPFormer)
* [Multiple View Geometry Transformers for 3D Human Pose Estimation](https://arxiv.org/abs/2311.10983)
:star:[code](https://github.com/XunshanMan/MVGFormer)
* [Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling](http://arxiv.org/abs/2404.05675v1)
* [Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning](http://arxiv.org/abs/2404.05218v1)
:star:[code](https://github.com/Jaewoo97/T2P)
* [EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams](http://arxiv.org/abs/2404.08640v1)
:house:[project](https://4dqv.mpi-inf.mpg.de/EventEgo3D/)
* [3D Human Pose Perception from Egocentric Stereo Videos](http://arxiv.org/abs/2401.00889)
:star:[code](https://github.com/hiroyasuakada/3D-Human-Pose-Perception-from-Egocentric-Stereo-Videos)
:house:[project](https://4dqv.mpi-inf.mpg.de/UnrealEgo2/)
* [Forecasting of 3D Whole-body Human Poses with Grasping Objects](https://openaccess.thecvf.com/content/CVPR2024/papers/Yan_Forecasting_of_3D_Whole-body_Human_Poses_with_Grasping_Objects_CVPR_2024_paper.pdf) 3D whole-body human pose
* [BodyMAP -- Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed](https://arxiv.org/abs/2404.03183)
:star:[code](https://github.com/RCHI-Lab/BodyMAP)
:house:[project](https://bodymap3d.github.io/)
* [MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction](https://openaccess.thecvf.com/content/CVPR2024/papers/Le_MeshPose_Unifying_DensePose_and_3D_Body_Mesh_Reconstruction_CVPR_2024_paper.pdf)
* [Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches](https://arxiv.org/abs/2405.04771)
* [Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation](https://arxiv.org/abs/2311.12028)
:star:[code](https://github.com/NationalGAILab/HoT)
:thumbsup:[Making video pose Transformers fast: Peking University proposes HoT, an efficient 3D human pose estimation framework](https://mp.weixin.qq.com/s/9R9FlYahCKYGErNgsniHYg)
* [Optimizing Diffusion Noise Can Serve As Universal Motion Priors](https://arxiv.org/abs/2312.11994)
:star:[code](https://github.com/korrawe/Diffusion-Noise-Optimization)
:house:[project](https://korrawe.github.io/dno-project/)
* [En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data](https://arxiv.org/abs/2401.01173)
:star:[code](https://github.com/menyifang/En3D)
:house:[project](https://menyifang.github.io/projects/En3D/index.html)
* Human Mesh Recovery/Reconstruction
* [DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery](http://arxiv.org/abs/2404.01424v1)
* [Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction](https://openaccess.thecvf.com/content/CVPR2024/papers/Gwon_Instance-aware_Contrastive_Learning_for_Occluded_Human_Mesh_Reconstruction_CVPR_2024_paper.pdf)
* [PostureHMR: Posture Transformation for 3D Human Mesh Recovery](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_PostureHMR_Posture_Transformation_for_3D_Human_Mesh_Recovery_CVPR_2024_paper.pdf)
* [ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_ScoreHypo_Probabilistic_Human_Mesh_Estimation_with_Hypothesis_Scoring_CVPR_2024_paper.pdf)
* [KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation](https://arxiv.org/abs/2405.19833)
:star:[code](https://github.com/MartaYang/KITRO)
* [Semantic Human Mesh Reconstruction with Textures](https://arxiv.org/abs/2403.02561)
:house:[project](https://zhanxy.xyz/projects/shert/)
* [TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation](https://export.arxiv.org/abs/2404.16752)
:house:[project](https://tokenhmr.is.tue.mpg.de/)
* [SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes](https://arxiv.org/abs/2308.10638) human mesh
* [R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization](https://openaccess.thecvf.com/content/CVPR2024/papers/Chan_R-Cyclic_Diffuser_Reductive_and_Cyclic_Latent_Diffusion_for_3D_Clothed_CVPR_2024_paper.pdf)
* [DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion](https://arxiv.org/abs/2308.16682)
:house:[project](https://diffusionposer.github.io/)
* [Synergistic Global-space Camera and Human Reconstruction from Videos](https://arxiv.org/abs/2405.14855)
* [SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion](http://arxiv.org/abs/2311.15855)
* [SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction](http://arxiv.org/abs/2312.06704)
* Motion Capture
* [ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning](http://arxiv.org/abs/2307.01200)
:house:[project](https://zhangyux15.github.io/ProxyCapV2/)
* [Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement](http://arxiv.org/abs/2311.16495)
* [Capturing Closely Interacted Two-Person Motions with Reaction Priors](https://openaccess.thecvf.com/content/CVPR2024/papers/Fang_Capturing_Closely_Interacted_Two-Person_Motions_with_Reaction_Priors_CVPR_2024_paper.pdf)
:house:[project](https://netease-gameai.github.io/Dual-Human/)
* [Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket](https://openaccess.thecvf.com/content/CVPR2024/papers/Zuo_Loose_Inertial_Poser_Motion_Capture_with_IMU-attached_Loose-Wear_Jacket_CVPR_2024_paper.pdf)
:star:[code](https://github.com/ZuoCX1996/Loose-Inertial-Poser)
* [Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera](https://arxiv.org/abs/2401.00847)
:star:[code](https://github.com/jiyewise/MocapEvery/)
:house:[project](https://jiyewise.github.io/projects/MocapEvery/)
* 3D Human Generation
* [HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation](https://arxiv.org/abs/2310.01406)
:star:[code](https://github.com/xhuangcv/humannorm)
:house:[project](https://humannorm.github.io/)
* [FlowMDM: Seamless Human Motion Composition with Blended Positional Encodings](https://arxiv.org/abs/2402.15509)
:star:[code](https://github.com/BarqueroGerman/FlowMDM)
:house:[project](https://barquerogerman.github.io/FlowMDM/) human motion composition
* [HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion](https://arxiv.org/abs/2311.16961)
:house:[project](https://eckertzhang.github.io/HumanRef.github.io/)
* [Gaussian Shell Maps for Efficient 3D Human Generation](http://arxiv.org/abs/2311.17857)
* Speech-Driven Human Animation
* [Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion](https://arxiv.org/abs/2312.04466)
:star:[code](https://github.com/kiranchhatre/amuse)
:house:[project](https://amuse.is.tue.mpg.de/)
* Text-Prompted Human Animation
* [HOIAnimator: Generating Text-prompt Human-object Animations using Novel Perceptive Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_HOIAnimator_Generating_Text-prompt_Human-object_Animations_using_Novel_Perceptive_Diffusion_Models_CVPR_2024_paper.pdf)
* Sign Language Translation
* [LLMs are Good Sign Language Translators](http://arxiv.org/abs/2404.00925v1)
* [Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text](http://arxiv.org/abs/2312.02702)
* 3D Pose Transfer
* [Towards Robust 3D Pose Transfer with Adversarial Learning](https://arxiv.org/abs/2404.02242)
* [Cinematic Behavior Transfer via NeRF-based Differentiable Filming](http://arxiv.org/abs/2311.17754)
* Human Reconstruction
* [MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild](https://openaccess.thecvf.com/content/CVPR2024/papers/Jiang_MultiPly_Reconstruction_of_Multiple_People_from_Monocular_Video_in_the_CVPR_2024_paper.pdf)
* [Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption](http://arxiv.org/abs/2404.11291v1)
:star:[code](https://github.com/boycehbz/HumanInteraction)
* [Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer](http://arxiv.org/abs/2404.04819v1)
:star:[code](https://github.com/dqj5182/CONTHO_RELEASE)
* [HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models](http://arxiv.org/abs/2404.04876v1)
* [ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D Image](https://openaccess.thecvf.com/content/CVPR2024/papers/Pesavento_ANIM_Accurate_Neural_Implicit_Model_for_Human_Reconstruction_from_a_CVPR_2024_paper.pdf)
* [Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Diffusion-FOF_Single-View_Clothed_Human_Reconstruction_via_Diffusion-Based_Fourier_Occupancy_Field_CVPR_2024_paper.pdf)
* [VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_VS_Reconstructing_Clothed_3D_Human_from_Single_Image_via_Vertex_CVPR_2024_paper.pdf)
:star:[code](https://github.com/starVisionTeam/VS)
* [WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion](https://arxiv.org/abs/2312.07531)
:house:[project](http://wham.is.tue.mpg.de/) 3D motion reconstruction
* Category-Agnostic Pose Estimation
* [ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Nguyen_ESCAPE_Encoding_Super-keypoints_for_Category-Agnostic_Pose_Estimation_CVPR_2024_paper.pdf)
* Human Dynamics Estimation from Video
* [PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos](http://arxiv.org/abs/2404.04430)
* Human Pose Regression
* [Video-Based Human Pose Regression via Decoupled Space-Time Aggregation](https://arxiv.org/abs/2403.19926)
:star:[code](https://github.com/zgspose/DSTA)
* 3D Human Models
* [GauHuman: Articulated Gaussian Splatting from Monocular Human Videos](https://arxiv.org/abs/2312.02973)
:star:[code](https://github.com/skhu101/GauHuman)
:house:[project](https://skhu101.github.io/GauHuman/)
* Human Generation
* [FairRAG: Fair Human Generation via Fair Retrieval Augmentation](https://arxiv.org/abs/2403.19964)
* [HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting](https://arxiv.org/abs/2311.17061)
:house:[project](https://alvinliu0.github.io/projects/HumanGaussian) text-driven 3D human generation
* [Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints](http://arxiv.org/abs/2312.08591)
:house:[project](http://cic.tju.edu.cn/faculty/likun/projects/Joint2Human)
* [MoMask: Generative Masked Modeling of 3D Human Motions](https://arxiv.org/abs/2312.00063)
:house:[project](https://ericguo5513.github.io/momask/) 3D human motion
* Human Motion Understanding
* [HumMUSS: Human Motion Understanding using State Space Models](http://arxiv.org/abs/2404.10880v1)
* Human Body Shape
* [Distilling CLIP with Dual Guidance for Learning Discriminative Human Body Shape Representation](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Distilling_CLIP_with_Dual_Guidance_for_Learning_Discriminative_Human_Body_CVPR_2024_paper.pdf)
* Dance Generation
* [DisCo: Disentangled Control for Realistic Human Dance Generation](https://arxiv.org/abs/2307.00040)
:house:[project](https://disco-dance.github.io/)
* [POPDG: Popular 3D Dance Generation with PopDanceSet](http://arxiv.org/abs/2405.03178)
* [DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance](http://arxiv.org/abs/2403.13667v1)
:star:[code](https://github.com/Carmenw1203/DanceCamera3D-Official)
* [Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives](https://arxiv.org/abs/2403.10518)
:house:[project](https://li-ronghui.github.io/lodge)
* [Bidirectional Autoregessive Diffusion Model for Dance Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Bidirectional_Autoregessive_Diffusion_Model_for_Dance_Generation_CVPR_2024_paper.pdf)

## 12.Video
* [Learning from One Continuous Video Stream](http://arxiv.org/abs/2312.00598)
* [Deep Video Inverse Tone Mapping Based on Temporal Clues](https://openaccess.thecvf.com/content/CVPR2024/papers/Ye_Deep_Video_Inverse_Tone_Mapping_Based_on_Temporal_Clues_CVPR_2024_paper.pdf)
* [VTimeLLM: Empower LLM to Grasp Video Moments](http://arxiv.org/abs/2311.18445)
* [Combining Frame and GOP Embeddings for Neural Video Representation](https://openaccess.thecvf.com/content/CVPR2024/papers/Saethre_Combining_Frame_and_GOP_Embeddings_for_Neural_Video_Representation_CVPR_2024_paper.pdf)
* [Learning to Predict Activity Progress by Self-Supervised Video Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Donahue_Learning_to_Predict_Activity_Progress_by_Self-Supervised_Video_Alignment_CVPR_2024_paper.pdf)
* [CoDeF: Content Deformation Fields for Temporally Consistent Video Processing](http://arxiv.org/abs/2308.07926)
* [vid-TLDR: Training Free Token Merging for Light-weight Video Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Choi_vid-TLDR_Training_Free_Token_Merging_for_Light-weight_Video_Transformer_CVPR_2024_paper.pdf)
:star:[code](https://github.com/mlvlab/vid-TLDR)
* [Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video](http://arxiv.org/abs/2404.09833v1)
:star:[code](https://video2game.github.io/)
* [Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement](https://arxiv.org/abs/2312.00362)
* [Understanding Video Transformers via Universal Concept Discovery](https://arxiv.org/abs/2401.10831)
* [Video Recognition in Portrait Mode](https://arxiv.org/abs/2312.13746)
:house:[project](http://mingfei.info/PMV)
* [VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams](https://arxiv.org/abs/2312.01407)
:house:[project](https://aoliao12138.github.io/VideoRF)
* [Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living](https://arxiv.org/abs/2311.18840)
:star:[code](https://github.com/dominickrei/pi-vit)
* [A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames](http://arxiv.org/abs/2312.07395)
* [Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens](https://arxiv.org/abs/2312.08870)
:house:[project](https://jinxxian.github.io/Vista-LLaMA/)
* [Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings](https://openaccess.thecvf.com/content/CVPR2024/papers/Chang_Towards_HDR_and_HFR_Video_from_Rolling-Mixed-Bit_Spikings_CVPR_2024_paper.pdf)
* [Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Stotko_Physics-guided_Shape-from-Template_Monocular_Video_Perception_through_Neural_Surrogate_Models_CVPR_2024_paper.pdf)
* Sleep Monitoring
* [SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers](http://arxiv.org/abs/2404.03831v1)
* Video Understanding
* [Compositional Video Understanding with Spatiotemporal Structure-based Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Yun_Compositional_Video_Understanding_with_Spatiotemporal_Structure-based_Transformers_CVPR_2024_paper.pdf)
* [Action Scene Graphs for Long-Form Understanding of Egocentric Videos](http://arxiv.org/abs/2312.03391)
* [HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding](http://arxiv.org/abs/2312.03050)
* [A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives](https://arxiv.org/abs/2403.03037)
:house:[project](https://sapeirone.github.io/EgoPack)
* [Koala: Key Frame-Conditioned Long Video-LLM](http://arxiv.org/abs/2404.04346)
* [MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding](http://arxiv.org/abs/2404.05726v1)
:star:[code](https://boheumd.github.io/MA-LMM/)
* [Abductive Ego-View Accident Video Understanding for Safe Driving Perception](http://arxiv.org/abs/2403.00436v1)
:house:[project](http://www.lotvsmmau.net)
* [OmniVid: A Generative Framework for Universal Video Understanding](http://arxiv.org/abs/2403.17935v1)
:star:[code](https://github.com/wangjk666/OmniVid)
* [A Unified Framework for Human-centric Point Cloud Video Understanding](http://arxiv.org/abs/2403.20031v1)
* [Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection](https://arxiv.org/abs/2311.16464)
* [MovieChat: From Dense Token to Sparse Memory for Long Video Understanding](https://arxiv.org/abs/2307.16449)
:house:[project](https://rese1f.github.io/MovieChat/)
* [TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding](https://arxiv.org/abs/2312.02051)
:star:[code](https://github.com/RenShuhuai-Andy/TimeChat)
* [Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding](https://arxiv.org/abs/2311.08046)
:star:[code](https://github.com/PKU-YuanGroup/Chat-UniVi)
* Video Summarization
* [Previously on ... From Recaps to Story Summarization](https://arxiv.org/abs/2405.11487)
:house:[project](https://katha-ai.github.io/projects/recap-story-summ/)
* [Scaling Up Video Summarization Pretraining with Large Language Models](https://arxiv.org/abs/2404.03398)
* [CSTA: CNN-based Spatiotemporal Attention for Video Summarization](https://arxiv.org/abs/2405.11905)
:star:[code](https://github.com/thswodnjs3/CSTA)
* Video Reconstruction
* [HDRFlow: Real-Time HDR Video Reconstruction with Large Motions](http://arxiv.org/abs/2403.03447v1)
:star:[code](https://openimaginglab.github.io/HDRFlow/)
* Video Representation
* [DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes](http://arxiv.org/abs/2403.15679v1)
:house:[project](https://haoyan14.github.io/DS-NeRV)
* Video Interpretation
* [Visual Objectification in Films: Towards a New AI Task for Video Interpretation](https://arxiv.org/abs/2401.13296)
* Movie Description
* [MICap: A Unified Model for Identity-Aware Movie Descriptions](http://arxiv.org/abs/2405.11483)
:house:[project](https://katha-ai.github.io/projects/micap/)
* Video Surveillance
* [Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges](http://arxiv.org/abs/2309.13925)
:sunflower:[dataset](https://xuange923.github.io/Surveillance-Video-Understanding)
* Video Prediction
* [Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes](https://openaccess.thecvf.com/content/CVPR2024/papers/Shrivastava_Video_Prediction_by_Modeling_Videos_as_Continuous_Multi-Dimensional_Processes_CVPR_2024_paper.pdf)
* [ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_ExtDM_Distribution_Extrapolation_Diffusion_Model_for_Video_Prediction_CVPR_2024_paper.pdf)
:star:[code](https://github.com/nku-zhichengzhang/ExtDM)
:house:[project](https://zzcheng.top/ExtDM/)
* Video Stabilization
* [Harnessing Meta-Learning for Improving Full-Frame Video Stabilization](https://arxiv.org/abs/2403.03662)
* [3D Multi-frame Fusion for Video Stabilization](http://arxiv.org/abs/2404.12887)
* Video Recognition
* [OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition](https://arxiv.org/abs/2312.00096)
:star:[code](https://github.com/tomchen-ctj/OST)
:house:[project](https://tomchen-ctj.github.io/OST/)
* Video Conversation
* [BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_BT-Adapter_Video_Conversation_is_Feasible_Without_Video_Instruction_Tuning_CVPR_2024_paper.pdf)
:star:[code](https://github.com/farewellthree/BT-Adapter)
* Video Relighting
* [Real-time 3D-aware Portrait Video Relighting](https://openaccess.thecvf.com/content/CVPR2024/papers/Cai_Real-time_3D-aware_Portrait_Video_Relighting_CVPR_2024_paper.pdf)
* Video Harmonization
* [Video Harmonization with Triplet Spatio-Temporal Variation Patterns](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_Video_Harmonization_with_Triplet_Spatio-Temporal_Variation_Patterns_CVPR_2024_paper.pdf)
:thumbsup:[VILP](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)
* Video Frame Interpolation
* [Video Frame Interpolation via Direct Synthesis with the Event-based Reference](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Video_Frame_Interpolation_via_Direct_Synthesis_with_the_Event-based_Reference_CVPR_2024_paper.pdf)
* [IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_IQ-VFI_Implicit_Quadratic_Motion_Estimation_for_Video_Frame_Interpolation_CVPR_2024_paper.pdf)
* [EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling](https://openaccess.thecvf.com/content/CVPR2024/papers/Jiang_EVS-assisted_Joint_Deblurring_Rolling-Shutter_Correction_and_Video_Frame_Interpolation_through_CVPR_2024_paper.pdf)
* [TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_TTA-EVF_Test-Time_Adaptation_for_Event-based_Video_Frame_Interpolation_via_Reliable_CVPR_2024_paper.pdf)
* [Sparse Global Matching for Video Frame Interpolation with Large Motion](http://arxiv.org/abs/2404.06913v1)
:star:[code](https://sgm-vfi.github.io/)
* [Perception-Oriented Video Frame Interpolation via Asymmetric Blending](http://arxiv.org/abs/2404.06692v1)
:star:[code](https://github.com/mulns/PerVFI)
:thumbsup:[A new breakthrough in video frame interpolation quality: Shanghai Jiao Tong University proposes PerVFI, a new paradigm for frame interpolation](https://mp.weixin.qq.com/s/WXNr5sX9Yzcj5xDaAJdjbw)
* [SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation](https://arxiv.org/abs/2308.16876)
:house:[project](https://neu-vi.github.io/SportsSlomo/)
* Video Subject Swapping
* [VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence](https://arxiv.org/abs/2312.02087)
:house:[project](https://videoswap.github.io/)
* Video Anomaly Detection
* [Open-Vocabulary Video Anomaly Detection](http://arxiv.org/abs/2311.07042)
* [Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Multi-Scale_Video_Anomaly_Detection_by_Multi-Grained_Spatio-Temporal_Representation_Learning_CVPR_2024_paper.pdf)
* [Harnessing Large Language Models for Training-free Video Anomaly Detection](http://arxiv.org/abs/2404.01014v1)
:star:[code](https://lucazanella.github.io/lavad/)
* [Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline](http://arxiv.org/abs/2404.00847v1)
:star:[code](https://github.com/AnasEmad11/CLAP)
* [Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Prompt-Enhanced_Multiple_Instance_Learning_for_Weakly_Supervised_Video_Anomaly_Detection_CVPR_2024_paper.pdf)
* [MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection](https://arxiv.org/abs/2403.14497)
* [PREGO: Online Mistake Detection in PRocedural EGOcentric Videos](http://arxiv.org/abs/2404.01933v1)
* [Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors](https://arxiv.org/abs/2306.12041)
:star:[code](https://github.com/ristea/aed-mae)
* [Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection](http://arxiv.org/abs/2404.08531v1)
* [GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?](https://arxiv.org/abs/2312.05291)
:house:[project](https://glitchbench.github.io/) Can large multimodal models detect video game glitches?
* Video Scene Detection
* [Neighbor Relations Matter in Video Scene Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Tan_Neighbor_Relations_Matter_in_Video_Scene_Detection_CVPR_2024_paper.pdf)
* Video Mirror Detection
* [Effective Video Mirror Detection with Inconsistent Motion Cues](https://openaccess.thecvf.com/content/CVPR2024/papers/Warren_Effective_Video_Mirror_Detection_with_Inconsistent_Motion_Cues_CVPR_2024_paper.pdf)
* Automated Movie Trailer Generation
* [Towards Automated Movie Trailer Generation](https://arxiv.org/abs/2404.03477)
* Conversational Music Recommendation for Videos
* [MuseChat: A Conversational Music Recommendation System for Videos](https://arxiv.org/abs/2310.06282)
* Video Paragraph Grounding
* [Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding](http://arxiv.org/abs/2403.11463v1)
* Video Grounding
* [SnAG: Scalable and Accurate Video Grounding](https://arxiv.org/abs/2404.02257)
:star:[code](https://github.com/fmu2/snag_release)
* [Context-Guided Spatio-Temporal Video Grounding](https://arxiv.org/abs/2401.01578)
:star:[code](https://github.com/HengLan/CGSTVG)
* [Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding](https://arxiv.org/abs/2401.00901)
* [What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions](https://arxiv.org/abs/2303.16990)

## 11.3D
* [Rapid 3D Model Generation with Intuitive 3D Input](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Rapid_3D_Model_Generation_with_Intuitive_3D_Input_CVPR_2024_paper.pdf)
* [Instantaneous Perception of Moving Objects in 3D](https://arxiv.org/abs/2405.02781)
* [NEAT: Distilling 3D Wireframes from Neural Attraction Fields](http://arxiv.org/abs/2307.10206)
:star:[code](https://github.com/cherubicXN/neat)
* [Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training](http://arxiv.org/abs/2311.01734)
* [LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_LowRankOcc_Tensor_Decomposition_and_Low-Rank_Recovery_for_Vision-based_3D_Semantic_CVPR_2024_paper.pdf)
* [TexOct: Generating Textures of 3D Models with Octree-based Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_TexOct_Generating_Textures_of_3D_Models_with_Octree-based_Diffusion_CVPR_2024_paper.pdf)
* [Unsupervised 3D Structure Inference from Category-Specific Image Collections](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Unsupervised_3D_Structure_Inference_from_Category-Specific_Image_Collections_CVPR_2024_paper.pdf)
* [Garment Recovery with Shape and Deformation Priors](http://arxiv.org/abs/2311.10356)
* [ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding](https://arxiv.org/abs/2305.08275)
:star:[code](https://github.com/salesforce/ULIP)
* [CAGE: Controllable Articulation GEneration](https://arxiv.org/abs/2312.09570)
:star:[code](https://github.com/3dlg-hcvc/cage)
:house:[project](https://3dlg-hcvc.github.io/cage/)3D
* [Sparse views, Near light: A practical paradigm for uncalibrated point-light photometric stereo](https://arxiv.org/abs/2404.00098)
* [Dispersed Structured Light for Hyperspectral 3D Imaging](https://arxiv.org/abs/2311.18287)
* [G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping](https://arxiv.org/abs/2405.06828)
:star:[code](https://github.com/J-F-Cheng/G-FARS-3DPartGrouping)
* [Wonder3D: Single Image to 3D using Cross-Domain Diffusion](https://arxiv.org/abs/2310.15008)
:house:[project](https://www.xxlong.site/Wonder3D/)
* [UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence](https://arxiv.org/abs/2405.06903)
:star:[code](https://github.com/luhr2003/UniGarmentManip) garment manipulation
* [GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo](http://arxiv.org/abs/2404.07992v1)
:star:[code](https://github.com/Wuuu3511/GoMVS)
:star:[code](https://wuuu3511.github.io/gomvs/)
* [EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Priors](https://arxiv.org/abs/2308.13223)
:house:[project](https://efficientdreamer.github.io/)
* [MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation](https://arxiv.org/abs/2404.03656)
:house:[project](https://mvd-fusion.github.io/)
* [Digital Life Project: Autonomous 3D Characters with Social Intelligence](https://arxiv.org/abs/2312.04547)
:house:[project](https://digital-life-project.com/)
* [Image Sculpting: Precise Object Editing with 3D Geometry Control](https://arxiv.org/abs/2401.01702)
:house:[project](https://image-sculpting.github.io/)
* [TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_TutteNet_Injective_3D_Deformations_by_Composition_of_2D_Mesh_Deformations_CVPR_2024_paper.pdf)
* [Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception](https://arxiv.org/abs/2405.07201)
* [GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting](https://arxiv.org/abs/2311.14521)
:star:[code](https://github.com/buaacyw/GaussianEditor)
:house:[project](https://buaacyw.github.io/gaussian-editor/)
* [SHAP-EDITOR: Instruction-Guided Latent 3D Editing in Seconds](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SHAP-EDITOR_Instruction-Guided_Latent_3D_Editing_in_Seconds_CVPR_2024_paper.pdf)
* [ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images](https://arxiv.org/abs/2404.15707)
* [Differentiable Display Photometric Stereo](https://arxiv.org/abs/2306.13325)
* [ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion](https://arxiv.org/abs/2310.10343)
:star:[code](https://github.com/JiayuYANG/ConsistNet)
:house:[project](https://jiayuyang.github.io/Consist_Net/)
* [Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps](https://arxiv.org/abs/2312.13216)
* [REACTO: Reconstructing Articulated Objects from a Single Video](http://arxiv.org/abs/2404.11151)
:star:[code](https://github.com/ChaoyueSong/REACTO)
* [Low-Latency Neural Stereo Streaming](http://arxiv.org/abs/2403.17879v1)
* [Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes](http://arxiv.org/abs/2403.01414v1)
* [Spectrum AUC Difference (SAUCD): Human-aligned 3D Shape Evaluation](http://arxiv.org/abs/2403.01619v1)
:house:[project](https://bit.ly/saucd)
* [Wired Perspectives: Multi-View Wire Art Embraces Generative AI](https://arxiv.org/abs/2311.15421)
:star:[code](https://github.com/WinKawaks/DreamWire)
:house:[project](https://dreamwireart.github.io/)
* [Memory-based Adapters for Online 3D Scene Perception](http://arxiv.org/abs/2403.06974v1)
:star:[code](https://xuxw98.github.io/Online3D/)
* [FastMAC: Stochastic Spectral Sampling of Correspondence Graph](http://arxiv.org/abs/2403.08770v1)
:star:[code](https://github.com/Forrest-110/FastMAC)
* [One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion](https://arxiv.org/pdf/2311.07885.pdf)
:star:[code](https://github.com/SUDO-AI-3D/One2345plus)
:house:[project](https://sudo-ai-3d.github.io/One2345plus_page/)
* [PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm](https://arxiv.org/abs/2310.08586)
:house:[project](https://github.com/OpenGVLab/PonderV2)
* [CityDreamer: Compositional Generative Model of Unbounded 3D Cities](https://arxiv.org/abs/2309.00610)
:house:[project](https://www.infinitescript.com/project/city-dreamer)
* [EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI](https://arxiv.org/abs/2312.16170)
:star:[code](https://github.com/OpenRobotLab/EmbodiedScan)
* [Mosaic-SDF for 3D Generative Models](https://arxiv.org/abs/2312.09222)
:house:[project](https://lioryariv.github.io/msdf)
* [Federated Online Adaptation for Deep Stereo](https://arxiv.org/abs/2405.14873)
:house:[project](https://fedstereo.github.io/)
* [ControlRoom3D: Room Generation using Semantic Proxy Rooms](https://openaccess.thecvf.com/content/CVPR2024/papers/Schult_ControlRoom3D_Room_Generation_using_Semantic_Proxy_Rooms_CVPR_2024_paper.pdf)
* 3D Vision
* [Situational Awareness Matters in 3D Vision Language Reasoning](https://arxiv.org/abs/2406.07544)
:house:[project](https://yunzeman.github.io/situation3d)
* [DUSt3R: Geometric 3D Vision Made Easy](https://arxiv.org/abs/2312.14132)
* [Towards 3D Vision with Low-Cost Single-Photon Cameras](https://arxiv.org/abs/2403.17801)
* 3D Reconstruction
* [3D Neural Edge Reconstruction](http://arxiv.org/abs/2405.19295)
* [3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces](https://arxiv.org/abs/2403.08768)
:house:[project](https://jinlinyi.github.io/3DFIRES/)
:tv:[video](https://www.youtube.com/watch?v=k_WJPOG9uMU)
* [PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_PanoRecon_Real-Time_Panoptic_3D_Reconstruction_from_Monocular_Video_CVPR_2024_paper.pdf)
* [NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Han_NeRSP_Neural_3D_Reconstruction_for_Reflective_Objects_with_Sparse_Polarized_CVPR_2024_paper.pdf)
* [NTO3D: Neural Target Object 3D Reconstruction with Segment Anything](http://arxiv.org/abs/2309.12790)
* [pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction](http://arxiv.org/abs/2312.12337)
* [ReconFusion: 3D Reconstruction with Diffusion Priors](http://arxiv.org/abs/2312.02981)
* [VGGSfM: Visual Geometry Grounded Deep Structure From Motion](https://arxiv.org/abs/2312.04563)
:star:[code](https://github.com/facebookresearch/vggsfm)
:house:[project](https://vggsfm.github.io/)
* [Slice3D: Multi-Slice Occlusion-Revealing Single View 3D Reconstruction](https://arxiv.org/abs/2312.02221)
:house:[project](https://yizhiwang96.github.io/Slice3D/)
* [GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction](https://arxiv.org/abs/2402.16174)
* [Coherence As Texture - Passive Textureless 3D Reconstruction by Self-interference](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Coherence_As_Texture_-_Passive_Textureless_3D_Reconstruction_by_Self-interference_CVPR_2024_paper.pdf)
:star:[code](https://github.com/Image-Science-Lab-cmu/CoherenceAsTexture)
* [Structure-Aware Sparse-View X-ray 3D Reconstruction](https://arxiv.org/abs/2311.10959)
:star:[code](https://github.com/caiyuanhao1998/SAX-NeRF)
:thumbsup:[How to give NeRF X-ray vision?](https://mp.weixin.qq.com/s/lh7LnwsHJlx2FvxBdkMryQ)
* [Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers](https://arxiv.org/abs/2312.09147)
:house:[project](https://zouzx.github.io/TriplaneGaussian/)
* [Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning](https://arxiv.org/abs/2312.13980)
:star:[code](https://github.com/desaixie/carve3d)
:house:[project](https://desaixie.github.io/carve-3d/) multi-view reconstruction
* [WonderJourney: Going from Anywhere to Everywhere](https://arxiv.org/abs/2312.03884)
:house:[project](https://kovenyu.com/WonderJourney/)
* [Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments](https://arxiv.org/abs/2312.09138)
:star:[code](https://github.com/GradientSpaces/LivingScenes)
:house:[project](https://www.zhuliyuan.net/livingscenes)
* [DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans](http://arxiv.org/abs/2404.00485v1)
* [IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images](http://arxiv.org/abs/2404.00269v1)
:star:[code](https://yushuang-wu.github.io/IPoD)
* [Splatter Image: Ultra-Fast Single-View 3D Reconstruction](https://arxiv.org/abs/2312.13150)
:star:[code](https://github.com/szymanowiczs/splatter-image)
:house:[project](https://szymanowiczs.github.io/splatter-image.html)
* [PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar](https://openaccess.thecvf.com/content/CVPR2024/papers/Klinghoffer_PlatoNeRF_3D_Reconstruction_in_Platos_Cave_via_Single-View_Two-Bounce_Lidar_CVPR_2024_paper.pdf)
:star:[code](https://github.com/facebookresearch/PlatoNeRF)
:house:[project](https://platonerf.github.io/)
* [MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections](http://arxiv.org/abs/2403.10815v1)
:star:[code](https://github.com/UCSC-VLAA/MicroDiffusion)
* [ZeroShape: Regression-based Zero-shot Shape Reconstruction](http://arxiv.org/abs/2312.14198)
:star:[code](https://github.com/zxhuang1698/ZeroShape)
:house:[project](https://zixuanh.com/projects/zeroshape.html)
* [DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction](http://arxiv.org/abs/2403.05005v1)
* [G3DR: Generative 3D Reconstruction in ImageNet](https://arxiv.org/abs/2403.00939)
:star:[code](https://github.com/preddy5/G3DR)
:house:[project](https://preddy5.github.io/g3dr_website/)
* [Bayesian Diffusion Models for 3D Shape Reconstruction](http://arxiv.org/abs/2403.06973v1)
* [RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction](https://openaccess.thecvf.com/content/CVPR2024/papers/Brument_RNb-NeuS_Reflectance_and_Normal-based_Multi-View_3D_Reconstruction_CVPR_2024_paper.pdf)
* [ZeroRF: Fast Sparse View 360deg Reconstruction with Zero Pretraining](https://arxiv.org/abs/2312.09249)
:house:[project](https://sarahweiii.github.io/zerorf/) 360° view reconstruction
* Surface Reconstruction
* [SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration](https://arxiv.org/abs/2312.04803)
* [MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction](https://openaccess.thecvf.com/content/CVPR2024/papers/Santo_MVCPS-NeuS_Multi-view_Constrained_Photometric_Stereo_for_Neural_Surface_Reconstruction_CVPR_2024_paper.pdf)
* [MorpheuS: Neural Dynamic 360deg Surface Reconstruction from Monocular RGB-D Video](https://arxiv.org/abs/2312.00778)
:star:[code](https://github.com/HengyiWang/MorpheuS)
:house:[project](https://hengyiwang.github.io/projects/morpheus)
* [UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets](http://arxiv.org/abs/2403.05086v1)
:star:[code](https://github.com/Youngju-Na/UFORecon)
:star:[code](https://youngju-na.github.io/uforecon.github.io/)
* 3D Mesh Reconstruction
* [SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering](https://arxiv.org/abs/2311.12775)
:house:[project](https://anttwo.github.io/sugar/)
* 3D Shape
* [GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors](https://openaccess.thecvf.com/content/CVPR2024/papers/Dong_GPLD3D_Latent_Diffusion_of_3D_Shape_Generative_Models_by_Enforcing_CVPR_2024_paper.pdf)
* [TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding](https://arxiv.org/abs/2402.18490)
:star:[code](https://alanzhangcs.github.io/tamm-page)
* [Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes](https://arxiv.org/abs/2312.04043)
:house:[project](https://hmrishavbandy.github.io/doodle23d/)
* [ShapeWalk: Compositional Shape Editing Through Language-Guided Chains](https://openaccess.thecvf.com/content/CVPR2024/papers/Slim_ShapeWalk_Compositional_Shape_Editing_Through_Language-Guided_Chains_CVPR_2024_paper.pdf)
:star:[code](https://shapewalk.github.io/TODO)
:house:[project](https://shapewalk.github.io/)
* [Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation](http://arxiv.org/abs/2402.18920v1)
* [Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships](https://arxiv.org/abs/2402.12259)
:house:[project](https://kochsebastian.com/open3dsg)
* [FSC: Few-point Shape Completion](http://arxiv.org/abs/2403.07359v1)
* [3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation](https://arxiv.org/abs/2311.09571)
:star:[code](https://github.com/threedle/3d-paintbrush)
:house:[project](https://threedle.github.io/3d-paintbrush/) 3D shapes
* [Category-Level Multi-Part Multi-Joint 3D Shape Assembly](http://arxiv.org/abs/2303.06163)
* [Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation](https://arxiv.org/abs/2312.14124)
* Stereo Matching
* [Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching](http://arxiv.org/abs/2403.00486v1)
:star:[code](https://github.com/Windsrain/Selective-Stereo)
* [LoS: Local Structure-Guided Stereo Matching](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_LoS_Local_Structure-Guided_Stereo_Matching_CVPR_2024_paper.pdf)
* [Robust Synthetic-to-Real Transfer for Stereo Matching](http://arxiv.org/abs/2403.07705v1)
* [Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching](https://arxiv.org/abs/2306.15612)
* [Neural Markov Random Field for Stereo Matching](http://arxiv.org/abs/2403.11193v1)
:star:[code](https://github.com/aeolusguan/NMRF)
* [Reusable Architecture Growth for Continual Stereo Matching](http://arxiv.org/abs/2404.00360v1)
* [MoCha-Stereo: Motif Channel Attention Network for Stereo Matching](http://arxiv.org/abs/2404.06842v1)
:star:[code](https://github.com/ZYangChen/MoCha-Stereo)
:house:[project](https://www.cvlibs.net/datasets/kitti/eval_scene_flow_detail.php?benchmark=stereo&result=8ad7a3fbb8e4bd9964afabac7e5a3babed26c0df)
* [Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching](https://arxiv.org/abs/2402.19270)
* Surface Normal Estimation
* [Rethinking Inductive Biases for Surface Normal Estimation](http://arxiv.org/abs/2403.00712v1)
:star:[code](https://github.com/baegwangbin/DSINE)
* Feature Matching
* [OmniGlue: Generalizable Feature Matching with Foundation Model Guidance](https://arxiv.org/abs/2405.12979)
* [RoMa: Robust Dense Feature Matching](https://openaccess.thecvf.com/content/CVPR2024/papers/Edstedt_RoMa_Robust_Dense_Feature_Matching_CVPR_2024_paper.pdf)
* [Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed](http://arxiv.org/abs/2403.04765v1)
:star:[code](https://zju3dv.github.io/efficientloftr)
* 3D Retrieval
* [KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation](http://arxiv.org/abs/2403.10099v1)
:star:[code](https://github.com/lolrudy/KP-RED)
* Depth Completion
* [Flexible Depth Completion for Sparse and Varying Point Densities](https://openaccess.thecvf.com/content/CVPR2024/papers/Park_Flexible_Depth_Completion_for_Sparse_and_Varying_Point_Densities_CVPR_2024_paper.pdf)
* [Improving Depth Completion via Depth Feature Upsampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Improving_Depth_Completion_via_Depth_Feature_Upsampling_CVPR_2024_paper.pdf)
* [Test-Time Adaptation for Depth Completion](https://arxiv.org/abs/2402.03312)
* [Bilateral Propagation Network for Depth Completion](http://arxiv.org/abs/2403.11270v1)
* [DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions](http://arxiv.org/abs/2403.12202v1)
* [Tri-Perspective View Decomposition for Geometry-Aware Depth Completion](http://arxiv.org/abs/2403.15008v1)
:star:[code](https://yanzq95.github.io/projectpage/TOFDC/index.html)
* Depth Estimation
* [Cross-spectral Gated-RGB Stereo Depth Estimation](http://arxiv.org/abs/2405.12759)
* [Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation](http://arxiv.org/abs/2404.14908)
* [Depth Prompting for Sensor-Agnostic Depth Estimation](https://arxiv.org/abs/2405.11867)
* [Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion](https://arxiv.org/abs/2312.12471)
:star:[code](https://github.com/zkawfanx/Atlantis)
* [On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation](http://arxiv.org/abs/2404.08540v1)
:house:[project](https://agneetchatterjee.com/robustness_depth_lang/)
* [Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation](https://arxiv.org/abs/2212.05315)
:star:[code](https://github.com/liortalker/MindTheEdge)
* [PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation](https://arxiv.org/abs/2312.02284)
* [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation](https://arxiv.org/abs/2312.02145)
:house:[project](https://marigoldmonodepth.github.io/)
* [Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion](http://arxiv.org/abs/2403.16376v1)
* [ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation](http://arxiv.org/abs/2403.18807v1)
:star:[code](https://github.com/Aradhye2002/EcoDepth)
* [From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior](https://arxiv.org/abs/2312.10118)
* [UniDepth: Universal Monocular Metric Depth Estimation](https://arxiv.org/abs/2403.18913)
:star:[code](https://github.com/lpiccinelli-eth/unidepth)
* [WorDepth: Variational Language Prior for Monocular Depth Estimation](https://arxiv.org/abs/2404.03635)
* [SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing](http://arxiv.org/abs/2312.04553)
* [Snapshot Lidar: Fourier Embedding of Amplitude and Phase for Single-Image Depth Reconstruction](https://openaccess.thecvf.com/content/CVPR2024/papers/Friday_Snapshot_Lidar_Fourier_Embedding_of_Amplitude_and_Phase_for_Single-Image_CVPR_2024_paper.pdf)
* Panoramic Localization
* [Fully Geometric Panoramic Localization](http://arxiv.org/abs/2403.19904v1)
:star:[code](https://82magnolia.github.io/fgpl/)
* 3D Keypoint Detection
* [Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features](https://arxiv.org/abs/2311.18113)
:house:[project](https://wimmerth.github.io/back-to-3d.html)
* Layout Reconstruction
* [Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction](https://arxiv.org/abs/2311.18695)
* [No More Ambiguity in 360deg Room Layout via Bi-Layout Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Tsai_No_More_Ambiguity_in_360deg_Room_Layout_via_Bi-Layout_Estimation_CVPR_2024_paper.pdf)
:star:[code](https://liagm.github.io/Bi_Layout/)
* CAD Reconstruction
* [SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_SfmCAD_Unsupervised_CAD_Reconstruction_by_Learning_Sketch-based_Feature_Modeling_Operations_CVPR_2024_paper.pdf) unsupervised CAD reconstruction
* Shape Matching
* [Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching](https://arxiv.org/abs/2312.03678)
* 3DGS
* [COLMAP-Free 3D Gaussian Splatting](https://arxiv.org/abs/2312.07504)
:star:[code](https://oasisyang.github.io/colmap-free-3dgs/)
:house:[project](https://oasisyang.github.io/colmap-free-3dgs/)
* [Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields](http://arxiv.org/abs/2312.03203)
* [GS-IR: 3D Gaussian Splatting for Inverse Rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Liang_GS-IR_3D_Gaussian_Splatting_for_Inverse_Rendering_CVPR_2024_paper.pdf)
* [FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization](https://arxiv.org/abs/2403.06908)
:house:[project](https://rogeraigc.github.io/FreGS-Page/)
* [Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering](https://arxiv.org/abs/2312.00109)
:house:[project](https://city-super.github.io/scaffold-gs/)
* [GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models](https://arxiv.org/abs/2310.08529)
:house:[project](https://taoranyi.com/gaussiandreamer/)
* [Mip-Splatting: Alias-free 3D Gaussian Splatting](https://arxiv.org/abs/2311.16493)
:star:[code](https://github.com/autonomousvision/mip-splatting)
:house:[project](https://niujinshuchong.github.io/mip-splatting/)
* [CoGS: Controllable Gaussian Splatting](https://arxiv.org/abs/2312.05664)
:star:[code](https://github.com/Heng14/CoGS/tree/main)
:house:[project](https://cogs2024.github.io/)
* [LangSplat: 3D Language Gaussian Splatting](https://arxiv.org/abs/2312.16084)
:star:[code](https://github.com/minghanqin/LangSplat)
:house:[project](https://langsplat.github.io/)
* [Compact 3D Gaussian Representation for Radiance Field](https://arxiv.org/abs/2311.13681)
:house:[project](https://maincold2.github.io/c3dgs/)
* [3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos](http://arxiv.org/abs/2403.01444v1)
:house:[project](https://sjojok.github.io/3dgstream)
* [HUGS: Human Gaussian Splats](https://arxiv.org/abs/2311.17910)
:star:[code](https://github.com/apple/ml-hugs)
:house:[project](https://machinelearning.apple.com/research/hugs)
* [Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering](https://arxiv.org/abs/2311.17089)3DGS
* [GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces](https://arxiv.org/abs/2311.17977)
* Scene Reconstruction
* [Gated Fields: Learning Scene Reconstruction from Gated Videos](http://arxiv.org/abs/2405.19819)
* [Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses](http://arxiv.org/abs/2404.14410)
* [SuperPrimitive: Scene Reconstruction at a Primitive Level](https://arxiv.org/abs/2312.05889)
:house:[project](https://makezur.github.io/SuperPrimitive/)
* [Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction](http://arxiv.org/abs/2403.19314v1)
:star:[code](https://github.com/CVMI-Lab/Total-Decom.git)
* [Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion](http://arxiv.org/abs/2404.03070)
* [OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees](http://arxiv.org/abs/2404.00678v1)
* [VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction](http://arxiv.org/abs/2402.17427v1)
:star:[code](https://vastgaussian.github.io)
* [Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction](https://arxiv.org/abs/2309.13101)
:star:[code](https://github.com/ingra14m/Deformable-3D-Gaussians)
:house:[project](https://ingra14m.github.io/Deformable-Gaussians/)
:thumbsup:[A full-score CVPR 2024 paper: Zhejiang University proposes a high-quality monocular dynamic reconstruction method based on deformable 3D Gaussians](https://mp.weixin.qq.com/s/VY3XdR2gsXsHcLfO2z1zWA)
* [Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts](https://arxiv.org/abs/2406.03461)
:house:[project](https://light.princeton.edu/publication/pollidar)
* 3D Scene Synthesis
* [GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs](https://arxiv.org/abs/2312.00093)
:house:[project](https://graphdreamer.github.io/)
* [DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation](https://arxiv.org/abs/2306.00519)
:star:[code](https://github.com/AkiraHero/diffindscene)
* [BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation](https://arxiv.org/abs/2312.02136)
:house:[project](https://zqh0253.github.io/BerfScene/) 3D scene generation
* [Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion](http://arxiv.org/abs/2401.10786)
* Text-Driven 3D Scene Generation
* [3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation](https://arxiv.org/abs/2403.09439)
* [Towards Text-guided 3D Scene Composition](https://arxiv.org/abs/2312.08885)
:house:[project](https://zqh0253.github.io/SceneWiz3D/) 3D scene composition
* 3D Scene Graphs
* [SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks](https://arxiv.org/abs/2403.19474)
* 3D Scene Editing
* [GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions](https://arxiv.org/abs/2311.16037)
:house:[project](https://gaussianeditor.github.io/)
* [Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training](http://arxiv.org/abs/2312.01663)
:thumbsup:[Precise 3D scene editing with text or image prompts: Meitu, IIE, Beihang, and Sun Yat-sen University jointly propose the 3D editing method CustomNeRF](https://mp.weixin.qq.com/s/iMOJdboRx7Z8X0JRakfXNA)
* [PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI](http://arxiv.org/abs/2404.09465v1)
:star:[code](http://physcene.github.io)
* [Neural 3D Strokes: Creating Stylized 3D Scenes with Vectorized 3D Strokes](https://arxiv.org/abs/2311.15637)
:house:[project](http://buaavrcg.github.io/Neural3DStrokes) 3D scenes
* [PAPR in Motion: Seamless Point-level 3D Scene Interpolation](https://openaccess.thecvf.com/content/CVPR2024/papers/Peng_PAPR_in_Motion_Seamless_Point-level_3D_Scene_Interpolation_CVPR_2024_paper.pdf) 3D scene interpolation
* [ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_ConsistDreamer_3D-Consistent_2D_Diffusion_for_High-Fidelity_Scene_Editing_CVPR_2024_paper.pdf)
* Semantic Matching
* [SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching](https://arxiv.org/abs/2310.17569)
:star:[code](https://github.com/ActiveVisionLab/SD4Match)
:house:[project](https://sd4match.active.vision/)
* Indoor Lighting Estimation
* [LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation](https://arxiv.org/abs/2404.03925)
* 3D Garment Generation
* [Design2Cloth: 3D Cloth Generation from 2D Masks](https://arxiv.org/abs/2404.02686)
:house:[project](https://jiali-zheng.github.io/Design2Cloth/)
* 3D Shape Matching
* [SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency](https://openaccess.thecvf.com/content/CVPR2024/papers/Roetzer_SpiderMatch_3D_Shape_Matching_with_Global_Optimality_and_Geometric_Consistency_CVPR_2024_paper.pdf)

## 10.Medical Image Progress(医学影响处理)
* [Brain Decodes Deep Nets](http://arxiv.org/abs/2312.01280)
* [Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology](http://arxiv.org/abs/2402.17228v1)
:star:[code](https://github.com/DearCaat/RRT-MIL)
* [MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning](http://arxiv.org/abs/2402.02045)
* [Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling](http://arxiv.org/abs/2403.01053v2)
* [Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning](https://arxiv.org/abs/2311.17597)
:star:[code](https://github.com/yeerwen/MedCoSS)
* [MindBridge: A Cross-Subject Brain Decoding Framework](http://arxiv.org/abs/2404.07850v1)
:star:[code](https://github.com/littlepure2333/MindBridge)
:star:[code](https://littlepure2333.github.io/MindBridge)
* [MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant](http://arxiv.org/abs/2403.04290v1)
* [Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images](http://arxiv.org/abs/2404.01464v1)
:star:[code](https://github.com/jungeun122333/UVI-Net)
* [PairAug: What Can Augmented Image-Text Pairs Do for Radiology?](http://arxiv.org/abs/2404.04960v1)
:star:[code](https://github.com/YtongXie/PairAug)
* [Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Shao_Tumor_Micro-environment_Interactions_Guided_Graph_Learning_for_Survival_Analysis_of_CVPR_2024_paper.pdf)
* [C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction](https://arxiv.org/abs/2406.03902)
:star:[code](https://github.com/xmed-lab/C2RV-CBCT)
* [VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis](http://arxiv.org/abs/2402.17300)
* [Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts](http://arxiv.org/abs/2312.02567)
* CT
* [QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction](http://arxiv.org/abs/2402.17951v1)
:star:[code](https://towzeur.github.io/QN-Mixer/)
* [Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models](http://arxiv.org/abs/2404.04936v1)
* Whole Slide Image Classification
* [Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction](http://arxiv.org/abs/2402.19326v1)
* [Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis](http://arxiv.org/abs/2403.07719v1)
:star:[code](https://github.com/WonderLandxD/WiKG)
* [Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification](http://arxiv.org/abs/2403.07939v1)
:house:[project](https://vilab.hit.edu.cn/projects/pamil)
* [ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification](https://openaccess.thecvf.com/content/CVPR2024/papers/Shi_ViLa-MIL_Dual-scale_Vision-Language_Multiple_Instance_Learning_for_Whole_Slide_Image_CVPR_2024_paper.pdf)
* Tumor Synthesis(肿瘤合成)
* [Towards Generalizable Tumor Synthesis](http://arxiv.org/abs/2402.19470v1)
* Pathology Detection(病理检测)
* [Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework](http://arxiv.org/abs/2403.07636)
:star:[code](https://github.com/HieuPhan33/MAVL)
* Gene Expression Prediction(基因表达预测)
* [Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features](https://arxiv.org/abs/2403.07592)
:star:[code](https://github.com/NEXGEM/TRIPLEX)
* Cancer Detection(癌症检测)
* [FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders](http://arxiv.org/abs/2403.08848v1)
:star:[code](https://github.com/sbasu276/FocusMAE)
* Medical Image Registration(医学图像配准)
* [Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration](https://arxiv.org/abs/2406.00123) (best paper award candidate)
* [Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration](http://arxiv.org/abs/2402.18933v1)
* Medical Image Classification(医学图像分类)
* [Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification](http://arxiv.org/abs/2307.08919)
* Medical Image Segmentation(医学图像分割)
* [Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation](http://arxiv.org/abs/2306.02416)
* [One-Prompt to Segment All Medical Images](https://arxiv.org/abs/2305.10300)
* [Diversified and Personalized Multi-rater Medical Image Segmentation](http://arxiv.org/abs/2403.13417v1)
:star:[code](https://github.com/ycwu1997/D-Persona)
* [Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding](http://arxiv.org/abs/2403.18271v1)
:star:[code](https://github.com/Cccccczh404/H-SAM)
* [Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation](http://arxiv.org/abs/2311.10696)
* [Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation](http://arxiv.org/abs/2405.00378)
* [MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling](http://arxiv.org/abs/2303.09373)
* [EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation](https://arxiv.org/abs/2405.06880)
* [Tyche: Stochastic In-Context Learning for Medical Image Segmentation](https://arxiv.org/abs/2401.13650)
* [Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention](https://arxiv.org/abs/2405.06284)
:house:[project](https://skawngus1111.github.io/MADGNet_project/)
* [Clustering Propagation for Universal Medical Image Segmentation](http://arxiv.org/abs/2403.16646v1)
* [Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling](https://arxiv.org/abs/2309.12378) (unsupervised semantic segmentation)
* [MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_MemSAM_Taming_Segment_Anything_Model_for_Echocardiography_Video_Segmentation_CVPR_2024_paper.pdf)
:star:[code](https://github.com/dengxl0520/MemSAM) (echocardiography video segmentation)
* [Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Fan_Bi-level_Learning_of_Task-Specific_Decoders_for_Joint_Registration_and_One-Shot_CVPR_2024_paper.pdf)
* [Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation](https://arxiv.org/abs/2404.08951)
:star:[code](https://github.com/MQinghe/MiDSS)
* [Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Incremental_Nuclei_Segmentation_from_Histopathological_Images_via_Future-class_Awareness_and_CVPR_2024_paper.pdf)
:star:[code](https://github.com/why19991/InSeg) (nuclei segmentation)
* [PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-wise Hardness](https://openaccess.thecvf.com/content/CVPR2024/papers/Jiang_PH-Net_Semi-Supervised_Breast_Lesion_Segmentation_via_Patch-wise_Hardness_CVPR_2024_paper.pdf)
:star:[code](https://github.com/jjjsyyy/PH-Net) (semi-supervised breast lesion segmentation)
* [PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation](https://arxiv.org/abs/2402.19286) (panoramic renal pathology segmentation)
* [Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation](https://arxiv.org/abs/2311.18363)
:star:[code](https://github.com/Chen-Ziyang/VPTTA)
* X-ray
* [Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering](https://arxiv.org/abs/2312.06358)
:star:[code](https://github.com/eigenvivek/DiffPose)
* MRI
* [Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI](http://arxiv.org/abs/2403.10064v1)
* [Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution](http://arxiv.org/abs/2404.04785v1)
* [Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI](https://arxiv.org/abs/2312.03102)
:star:[code](http://github.com/seannz/svr)
* Anomaly Detection(异常检测)
* [Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images](http://arxiv.org/abs/2403.12570v1)
:star:[code](https://github.com/MediaBrain-SJTU/MVFA-AD)
* Brain Activity(脑活动)
* [Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity](http://arxiv.org/abs/2403.20022v1)
* Survival Prediction(生存预测)
* [Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction](http://arxiv.org/abs/2304.06819)
:star:[code](https://github.com/mahmoodlab/SurvPath)
* Computational Pathology(计算病理学)
* [Transcriptomics-guided Slide Representation Learning in Computational Pathology](https://arxiv.org/abs/2405.11618)
* [XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Yin_XFibrosis_Explicit_Vessel-Fiber_Modeling_for_Fibrosis_Staging_from_Liver_Pathology_CVPR_2024_paper.pdf)
* Histopathology(组织病理学)
* [SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology](https://arxiv.org/abs/2312.15010)
* [CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Javed_CPLIP_Zero-Shot_Learning_for_Histopathology_with_Comprehensive_Vision-Language_Alignment_CVPR_2024_paper.pdf)
* [Prompting Vision Foundation Models for Pathology Image Analysis](https://openaccess.thecvf.com/content/CVPR2024/papers/Yin_Prompting_Vision_Foundation_Models_for_Pathology_Image_Analysis_CVPR_2024_paper.pdf)
* [Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos](https://arxiv.org/abs/2312.04746)
:house:[project](http://quilt-llava.github.io/)
* Medical Super-Resolution(医学超分辨率)
* [CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data](https://arxiv.org/abs/2404.04878)
* 3D Medical Imaging(3D医学影像)
* [ToNNO: Tomographic Reconstruction of a Neural Network's Output for Weakly Supervised Segmentation of 3D Medical Images](https://export.arxiv.org/abs/2404.13103)
* Radiology Report Generation(放射学报告生成)
* [Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Bu_Instance-level_Expert_Knowledge_and_Aggregate_Discriminative_Attention_for_Radiology_Report_CVPR_2024_paper.pdf)
* Radiology Report Retrieval(放射学报告检索)
* [AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Yan_AHIVE_Anatomy-aware_Hierarchical_Vision_Encoding_for_Interactive_Radiology_Report_Retrieval_CVPR_2024_paper.pdf)
* Medical Foundation Models(医学基础模型)
* [Low-Rank Knowledge Decomposition for Medical Foundation Models](http://arxiv.org/abs/2404.17184)
* Tumor Segmentation(肿瘤分割)
* [ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting](https://arxiv.org/abs/2312.04964)

## 9.Face(人脸)
* [Unsupervised Gaze Representation Learning from Multi-view Face Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Bao_Unsupervised_Gaze_Representation_Learning_from_Multi-view_Face_Images_CVPR_2024_paper.pdf)
* [ToonerGAN: Reinforcing GANs for Obfuscating Automated Facial Indexing](https://openaccess.thecvf.com/content/CVPR2024/papers/Thakral_ToonerGAN_Reinforcing_GANs_for_Obfuscating_Automated_Facial_Indexing_CVPR_2024_paper.pdf)
* [PairDETR : Joint Detection and Association of Human Bodies and Faces](https://openaccess.thecvf.com/content/CVPR2024/papers/Ali_PairDETR__Joint_Detection_and_Association_of_Human_Bodies_and_CVPR_2024_paper.pdf)
* [Neural Implicit Morphing of Face Images](http://arxiv.org/abs/2308.13888)
:house:[project](https://schardong.github.io/ifmorph/)
* [SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models](http://arxiv.org/abs/2312.07865)
* [Anatomically Constrained Implicit Face Models](https://arxiv.org/abs/2312.07538)
* [Face2Diffusion for Fast and Editable Face Personalization](http://arxiv.org/abs/2403.05094v1)
:house:[project](https://mapooon.github.io/Face2DiffusionPage/)
:star:[code](https://github.com/mapooon/Face2Diffusion)
* [Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection](http://arxiv.org/abs/2405.09882)
:star:[code](https://github.com/HansSunY/DiffAM)
* [Self-Supervised Facial Representation Learning with Facial Region Awareness](https://arxiv.org/abs/2403.02138)
* [Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction](https://openaccess.thecvf.com/content/CVPR2024/papers/Kuang_Facial_Identity_Anonymization_via_Intrinsic_and_Extrinsic_Attention_Distraction_CVPR_2024_paper.pdf)
* [VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment](http://arxiv.org/abs/2312.04651)
* Face Editing(人脸编辑)
* [StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_StrokeFaceNeRF_Stroke-based_Facial_Appearance_Editing_in_Neural_Radiance_Field_CVPR_2024_paper.pdf)
* [In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing](https://arxiv.org/abs/2302.04871)
:house:[project](https://in-n-out-3d.github.io/)
* Facial Expression(人脸表情)
* [Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Learning_Adaptive_Spatial_Coherent_Correlations_for_Speech-Preserving_Facial_Expression_Manipulation_CVPR_2024_paper.pdf)
* [3D Facial Expressions through Analysis-by-Neural-Synthesis](https://arxiv.org/abs/2404.04104)
:house:[project](https://georgeretsi.github.io/smirk/)
* Face Recognition(人脸识别)
* [OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition](http://arxiv.org/abs/2402.18786v1) (depression recognition)
* [Privacy-Preserving Face Recognition Using Trainable Feature Subtraction](http://arxiv.org/abs/2403.12457v1)
:star:[code](https://github.com/Tencent/TFace)
* [KeyPoint Relative Position Encoding for Face Recognition](http://arxiv.org/abs/2403.14852v1)
* [LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition](http://arxiv.org/abs/2403.08161v1)
* [Validating Privacy-Preserving Face Recognition under a Minimum Assumption](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Validating_Privacy-Preserving_Face_Recognition_under_a_Minimum_Assumption_CVPR_2024_paper.pdf)
* Face Synthesis(人脸合成)
* [Deformable One-shot Face Stylization via DINO Semantic Guidance](http://arxiv.org/abs/2403.00459v1)
:star:[code](https://github.com/zichongc/DoesFS)
* [Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation](https://arxiv.org/abs/2401.01207)
* [DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation](http://arxiv.org/abs/2403.19235)
* [LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example](https://arxiv.org/abs/2403.15227)
:star:[code](https://github.com/kwanyun/LeGO_code)
:house:[project](https://kwanyun.github.io/lego/)
* [Text-Guided 3D Face Synthesis - From Generation to Editing](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Text-Guided_3D_Face_Synthesis_-_From_Generation_to_Editing_CVPR_2024_paper.pdf)
* [Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_Text-conditional_Attribute_Alignment_across_Latent_Spaces_for_3D_Controllable_Face_CVPR_2024_paper.pdf)
* [UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_UV-IDM_Identity-Conditioned_Latent_Diffusion_Model_for_Face_UV-Texture_Generation_CVPR_2024_paper.pdf)
* [Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation](https://arxiv.org/abs/2405.04356)
* Face Reconstruction(人脸重建)
* [High-Quality Facial Geometry and Appearance Capture at Home](https://arxiv.org/abs/2312.03442)
:star:[code](https://github.com/yxuhan/CoRA)
:house:[project](https://yxuhan.github.io/CoRA/index.html)
* [Monocular Identity-Conditioned Facial Reflectance Reconstruction](http://arxiv.org/abs/2404.00301v1)
:star:[code](https://xingyuren.github.io/id2reflectance/)
:thumbsup:[3D digital human reconstruction, editing and driving (slides, in Chinese)](https://valser.org/webinar/slide/slides/20240403/Valse20240403%E6%99%8F%E8%BD%B6%E8%B6%85.pdf)
* [3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation](https://arxiv.org/abs/2312.00311)
:star:[code](https://github.com/wang-zidu/3DDFA-V3)
* [3D-Aware Face Editing via Warping-Guided Latent Direction Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Cheng_3D-Aware_Face_Editing_via_Warping-Guided_Latent_Direction_Learning_CVPR_2024_paper.pdf)
:house:[project](https://cyh-sj.github.io/FaceEdit3D/)
:thumbsup:[3D digital human reconstruction, editing and driving (slides, in Chinese)](https://valser.org/webinar/slide/slides/20240403/Valse20240403%E6%99%8F%E8%BD%B6%E8%B6%85.pdf)
* Face Retouching(人脸修饰)
* [VRetouchEr: Learning Cross-frame Feature Interdependence with Imperfection Flow for Face Retouching in Videos](https://openaccess.thecvf.com/content/CVPR2024/papers/Xue_VRetouchEr_Learning_Cross-frame_Feature_Interdependence_with_Imperfection_Flow_for_Face_CVPR_2024_paper.pdf)
* Face Reenactment(人脸重现)
* [Pose Adapted Shape Learning for Large-Pose Face Reenactment](https://openaccess.thecvf.com/content/CVPR2024/papers/Hsu_Pose_Adapted_Shape_Learning_for_Large-Pose_Face_Reenactment_CVPR_2024_paper.pdf)
* [FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features](http://arxiv.org/abs/2404.09736)
* Face Restoration(人脸恢复)
* [Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Xie_Learning_Degradation-unaware_Representation_with_Prior-based_Latent_Transformations_for_Blind_Face_CVPR_2024_paper.pdf)
* [PFStorer: Personalized Face Restoration and Super-Resolution](https://arxiv.org/abs/2403.08436)
* [WaveFace: Authentic Face Restoration with Efficient Frequency Recovery](http://arxiv.org/abs/2403.12760)
* Face De-identification(人脸去识别)
* [Privacy-preserving Optics for Enhancing Protection in Face De-identification](https://arxiv.org/abs/2404.00777)
:house:[project](https://carloshinojosa.me/project/privacy-face-deid/)
* Facial Makeup(人脸化妆)
* [Makeup Prior Models for 3D Facial Makeup Estimation and Applications](http://arxiv.org/abs/2403.17761v1)
:star:[code](https://yangxingchao.github.io/makeup-priors-page)
* Facial Landmarks(人脸关键点)
* [Generalizable Face Landmarking Guided by Conditional Face Warping](http://arxiv.org/abs/2404.12322v1)
:star:[code](https://github.com/plustwo0/generalized-face-landmarker)
:house:[project](https://plustwo0.github.io/project-face-landmarker/)
* [FaceLift: Semi-supervised 3D Facial Landmark Localization](https://arxiv.org/abs/2405.19646)
* Facial Attribute Classification(人脸属性分类)
* [Distributionally Generative Augmentation for Fair Facial Attribute Classification](http://arxiv.org/abs/2403.06606v1)
:star:[code](https://github.com/heqianpei/DiGA)
* Face Anti-Spoofing(人脸活体检测)
* [One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Huang_One-Class_Face_Anti-spoofing_via_Spoof_Cue_Map-Guided_Feature_Learning_CVPR_2024_paper.pdf)
* [Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_Rethinking_Generalizable_Face_Anti-spoofing_via_Hierarchical_Prototype-guided_Distribution_Refinement_in_CVPR_2024_paper.pdf)
* [CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_CFPL-FAS_Class_Free_Prompt_Learning_for_Generalizable_Face_Anti-spoofing_CVPR_2024_paper.pdf)
* [Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing](http://arxiv.org/abs/2402.19298v1)
:star:[code](https://github.com/OMGGGGG/mmdg)
* [Gradient Alignment for Cross-Domain Face Anti-Spoofing](http://arxiv.org/abs/2402.18817v1)
:star:[code](https://github.com/leminhbinh0209/CVPR24-FAS)
* [Test-Time Domain Generalization for Face Anti-Spoofing](http://arxiv.org/abs/2403.19334v1)
* Facial Action Units(人脸动作单元)
* [Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition](https://arxiv.org/abs/2404.06443)
* [AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement](http://arxiv.org/abs/2404.05063)
* Face Image Quality(人脸图像质量)
* [CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration](https://openaccess.thecvf.com/content/CVPR2024/papers/Ou_CLIB-FIQA_Face_Image_Quality_Assessment_with_Confidence_Calibration_CVPR_2024_paper.pdf)
* [DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_DSL-FIQA_Assessing_Facial_Image_Quality_via_Dual-Set_Degradation_Learning_and_CVPR_2024_paper.pdf)
* Portrait Editing(肖像编辑)
* [Control4D: Efficient 4D Portrait Editing with Text](https://arxiv.org/abs/2305.20082)
:house:[project](https://control4darxiv.github.io/)
* Hair Reconstruction(头发重建)
* [Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-Training via Differentiable Rendering of Line Segments](https://openaccess.thecvf.com/content/CVPR2024/papers/Takimoto_Dr.Hair_Reconstructing_Scalp-Connected_Hair_Strands_without_Pre-Training_via_Differentiable_Rendering_CVPR_2024_paper.pdf)
* 3D Face(三维人脸)
* [3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow](http://arxiv.org/abs/2404.09819v1)
* [FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance](https://arxiv.org/abs/2406.02074)
* [Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks Methods and Applications](http://arxiv.org/abs/2311.18168)
* 4D Head Avatar Synthesis(4D 头像合成)
* [Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data](https://arxiv.org/abs/2311.18729)
:star:[code](https://github.com/YuDeng/Portrait-4D)
:house:[project](https://yudeng.github.io/Portrait4D/)
* Avatar Reconstruction(头像重建)
* [HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images](https://arxiv.org/abs/2311.15672)
:house:[project](https://seanchenxy.github.io/HaveFunWeb/)
* [MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading](https://arxiv.org/abs/2312.13091)
:house:[project](https://ubisoft-laforge.github.io/character/mosar/)
* [Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation](https://arxiv.org/abs/2401.04728)
:house:[project](https://xiyichen.github.io/morphablediffusion/)
* Talking Head Synthesis(说话头合成)
* [Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis](http://arxiv.org/abs/2402.17364v1)
* [Faces that Speak: Jointly Synthesising Talking Face and Speech from Text](https://arxiv.org/abs/2405.10272)
* [CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation](http://arxiv.org/abs/2403.00274v1)
* [SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis](https://arxiv.org/abs/2311.17590)
:star:[code](https://github.com/ziqiaopeng/SyncTalk)
:house:[project](https://ziqiaopeng.github.io/synctalk/)
* [FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio](https://arxiv.org/abs/2403.01901)
:star:[code](https://github.com/modelscope/facechain)
* [FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models](https://arxiv.org/abs/2312.08459)
:star:[code](https://github.com/shivangi-aneja/FaceTalk)
:house:[project](https://shivangi-aneja.github.io/projects/facetalk/)
* [FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization](https://arxiv.org/abs/2403.06375)
* Defense Against Face Editing Abuse(防御人脸编辑滥用)
* [IDGuard: Robust General Identity-centric POI Proactive Defense Against Face Editing Abuse](https://openaccess.thecvf.com/content/CVPR2024/papers/Dai_IDGuard_Robust_General_Identity-centric_POI_Proactive_Defense_Against_Face_Editing_CVPR_2024_paper.pdf)
* 3D Avatar(3D 头像)
* [3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Men_3DToonify_Creating_Your_High-Fidelity_3D_Stylized_Avatar_Easily_from_2D_CVPR_2024_paper.pdf)
* [DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models](https://arxiv.org/abs/2304.00916)
:house:[project](https://yukangcao.github.io/DreamAvatar/)
* Makeup Transfer(化妆迁移)
* [Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth](https://arxiv.org/abs/2405.17240)
* Age Estimation(年龄估计)
* [A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark](https://openaccess.thecvf.com/content/CVPR2024/papers/Paplham_A_Call_to_Reflect_on_Evaluation_Practices_for_Age_Estimation_CVPR_2024_paper.pdf)
* Emotion Recognition(情绪识别)
* [Robust Emotion Recognition in Context Debiasing](http://arxiv.org/abs/2403.05963)

## 8.GAN/Image Synthesis(图像生成)
* [L-MAGIC: Language Model Assisted Generation of Images with Coherence](https://openaccess.thecvf.com/content/CVPR2024/papers/Cai_L-MAGIC_Language_Model_Assisted_Generation_of_Images_with_Coherence_CVPR_2024_paper.pdf)
* [CapsFusion: Rethinking Image-Text Data at Scale](http://arxiv.org/abs/2310.20550)
* [C3Net: Compound Conditioned ControlNet for Multimodal Content Generation](http://arxiv.org/abs/2311.17951)
* [Scaling Laws of Synthetic Images for Model Training ... for Now](https://arxiv.org/abs/2312.04567)
* [An Edit Friendly DDPM Noise Space: Inversion and Manipulations](https://arxiv.org/abs/2304.06140)
:star:[code](https://github.com/inbarhub/DDPM_inversion)
:house:[project](https://inbarhub.github.io/DDPM_inversion)
* [CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_CoDi-2_In-Context_Interleaved_and_Interactive_Any-to-Any_Generation_CVPR_2024_paper.pdf)
:star:[code](https://github.com/microsoft/i-Code/tree/main/CoDi-2)
:house:[project](https://codi-2.github.io/)
* [CapHuman: Capture Your Moments in Parallel Universes](https://arxiv.org/abs/2402.00627)
:star:[code](https://github.com/VamosC/CapHuman)
:house:[project](https://caphuman.github.io/)
* [Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles](https://arxiv.org/abs/2312.11666)
:house:[project](https://haar.is.tue.mpg.de/)
* [IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation](https://arxiv.org/abs/2403.10701)
* [TexTile: A Differentiable Metric for Texture Tileability](http://arxiv.org/abs/2403.12961v1)
:house:[project](https://mslab.es/projects/TexTile/)
* [SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer](http://arxiv.org/abs/2403.17004v1)
* [PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding](https://arxiv.org/abs/2312.04461)
:star:[code](https://github.com/TencentARC/PhotoMaker)
:house:[project](https://photo-maker.github.io/)
* [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://arxiv.org/pdf/2311.17049.pdf)
:star:[code](https://github.com/apple/ml-mobileclip)
* [Text-Image Alignment for Diffusion-Based Perception](http://arxiv.org/abs/2310.00031)
* [AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error](https://arxiv.org/pdf/2401.17879.pdf)
* [FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation](http://arxiv.org/abs/2403.06775v1)
:star:[code](https://github.com/modelscope/facechain)
* [It's All About Your Sketch: Democratising Sketch Control in Diffusion Models](http://arxiv.org/abs/2403.07234v1)
:star:[code](https://github.com/subhadeepkoley/DemoSketch2RGB)
* [Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling](http://arxiv.org/abs/2403.10071v1)
* [ProMark: Proactive Diffusion Watermarking for Causal Attribution](http://arxiv.org/abs/2403.09914v1)
* [DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception](http://arxiv.org/abs/2403.13304v1)
* GAN
* [StyLitGAN: Image-Based Relighting via Latent Control](https://openaccess.thecvf.com/content/CVPR2024/papers/Bhattad_StyLitGAN_Image-Based_Relighting_via_Latent_Control_CVPR_2024_paper.pdf)
:star:[code](https://github.com/anandbhattad/stylitgan)
:house:[project](https://anandbhattad.github.io/stylitgan/)
* [StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN](http://arxiv.org/abs/2403.14186)
* [What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs](https://arxiv.org/abs/2401.02411)
:house:[project](https://research.nvidia.com/labs/nxp/wysiwyg/)
* [Diversity-aware Channel Pruning for StyleGAN Compression](http://arxiv.org/abs/2403.13548v1)
:star:[code](https://jiwoogit.github.io/DCP-GAN_site)
* [Adversarial Score Distillation: When score distillation meets GAN](https://arxiv.org/abs/2312.00739)
:star:[code](https://github.com/2y7c3/asd)
:house:[project](https://2y7c3.github.io/ASD/asd.html)
* Diffusion(扩散)
* [Fixed Point Diffusion Models](https://arxiv.org/abs/2401.08741)
:house:[project](https://lukemelas.github.io/fixed-point-diffusion-models)
* [Diffusion Models Without Attention](http://arxiv.org/abs/2311.18257)
* [Image Neural Field Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Image_Neural_Field_Diffusion_Models_CVPR_2024_paper.pdf)
* [Functional Diffusion](https://arxiv.org/abs/2311.15435)
:house:[project](https://1zb.github.io/functional-diffusion/)
* [Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models](http://arxiv.org/abs/2312.04410)
* [Learned Representation-Guided Diffusion Models for Large-Image Generation](http://arxiv.org/abs/2312.07330)
* [ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Kong_ACT-Diffusion_Efficient_Adversarial_Consistency_Training_for_One-step_Diffusion_Models_CVPR_2024_paper.pdf)
* [Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model](http://arxiv.org/abs/2311.13231)
* [LightIt: Illumination Modeling and Control for Diffusion Models](http://arxiv.org/abs/2403.10615)
* [Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_Towards_More_Accurate_Diffusion_Model_Acceleration_with_A_Timestep_Tuner_CVPR_2024_paper.pdf)
* [MMA-Diffusion: MultiModal Attack on Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_MMA-Diffusion_MultiModal_Attack_on_Diffusion_Models_CVPR_2024_paper.pdf)
* [CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Gokaslan_CommonCanvas_Open_Diffusion_Models_Trained_on_Creative-Commons_Images_CVPR_2024_paper.pdf)
* [Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?](https://arxiv.org/abs/2312.00084)
* [Self-correcting LLM-controlled Diffusion Models](https://arxiv.org/abs/2311.16090)
* [Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models](https://arxiv.org/abs/2405.05252)
* [SODA: Bottleneck Diffusion Models for Representation Learning](http://arxiv.org/abs/2311.17901)
* [PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models](http://arxiv.org/abs/2402.08714)
* [Don't drop your samples! Coherence-aware training benefits Conditional diffusion](https://arxiv.org/abs/2405.20324)
:house:[project](https://nicolas-dufour.github.io/cad.html)
* [Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Improving_Training_Efficiency_of_Diffusion_Models_via_Multi-Stage_Framework_and_CVPR_2024_paper.pdf)
* [DiffLoc: Diffusion Model for Outdoor LiDAR Localization](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_DiffLoc_Diffusion_Model_for_Outdoor_LiDAR_Localization_CVPR_2024_paper.pdf)
:thumbsup:[abstract (in Chinese)](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [EasyDrag: Efficient Point-based Manipulation on Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Hou_EasyDrag_Efficient_Point-based_Manipulation_on_Diffusion_Models_CVPR_2024_paper.pdf)
* [Distilling ODE Solvers of Diffusion Models into Smaller Steps](https://arxiv.org/abs/2309.16421)
* [Cache Me if You Can: Accelerating Diffusion Models through Block Caching](https://arxiv.org/abs/2312.03209)
:house:[project](https://fwmb.github.io/blockcaching/)
* [Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Beyond_Textual_Constraints_Learning_Novel_Diffusion_Conditions_with_Fewer_Examples_CVPR_2024_paper.pdf)
* [AAMDM: Accelerated Auto-regressive Motion Diffusion Model](https://arxiv.org/abs/2401.06146)
* [DeepCache: Accelerating Diffusion Models for Free](https://arxiv.org/abs/2312.00858)
:house:[project](https://horseee.github.io/Diffusion_DeepCache/)
* [Diffusion Model Alignment Using Direct Preference Optimization](https://arxiv.org/abs/2311.12908)
* [Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models](http://arxiv.org/abs/2404.15081)
* [Analyzing and Improving the Training Dynamics of Diffusion Models](http://arxiv.org/abs/2312.02696)
* [Residual Learning in Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Residual_Learning_in_Diffusion_Models_CVPR_2024_paper.pdf)
* [FreeU: Free Lunch in Diffusion U-Net](https://arxiv.org/abs/2309.11497)
:star:[code](https://github.com/ChenyangSi/FreeU)
:house:[project](https://chenyangsi.top/FreeU/)
* [VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models](https://arxiv.org/abs/2401.09047)
:star:[code](https://github.com/AILab-CVC/VideoCrafter)
:house:[project](https://ailab-cvc.github.io/videocrafter)
* [Diff-BGM: A Diffusion Model for Video Background Music Generation](https://arxiv.org/abs/2405.11913) (diffusion model for video background music generation)
* [Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models](https://arxiv.org/abs/2312.10835v2)
* [Shadow Generation for Composite Image Using Diffusion Model](https://arxiv.org/abs/2403.15234)
:star:[code](https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2)
* [Alchemist: Parametric Control of Material Properties with Diffusion Models](https://arxiv.org/abs/2312.02970)
* [Orthogonal Adaptation for Modular Customization of Diffusion Models](https://arxiv.org/abs/2312.02432)
:house:[project](https://ryanpo.com/ortha/) (diffusion models)
* [Observation-Guided Diffusion Probabilistic Models](https://arxiv.org/abs/2310.04041)
:star:[code](https://github.com/Junoh-Kang/OGDM_edm)
* [TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models](https://arxiv.org/abs/2311.16503)
:star:[code](https://github.com/ModelTC/TFMQ-DM)
* [Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models](http://arxiv.org/abs/2311.17919)
* [SPAD: Spatially Aware Multi-View Diffusers](https://openaccess.thecvf.com/content/CVPR2024/papers/Kant_SPAD_Spatially_Aware_Multi-View_Diffusers_CVPR_2024_paper.pdf)
:house:[project](https://yashkant.github.io/spad)
* [Structure-Guided Adversarial Training of Diffusion Models](http://arxiv.org/abs/2402.17563v1)
* [One-step Diffusion with Distribution Matching Distillation](https://arxiv.org/abs/2311.18828)
:house:[project](https://tianweiy.github.io/dmd/)
* [Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance](http://arxiv.org/abs/2404.05384v1)
:star:[code](https://github.com/SmilesDZgk/S-CFG)
* [Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models](https://arxiv.org/abs/2306.00973)
:star:[code](https://github.com/haoningwu3639/StoryGen)
:house:[project](https://haoningwu3639.github.io/StoryGen_Webpage/)
* [X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model](https://arxiv.org/abs/2312.02238)
:house:[project](https://showlab.github.io/X-Adapter/)
* [Readout Guidance: Learning Control from Diffusion Features](https://arxiv.org/abs/2312.02150)
:house:[project](https://readout-guidance.github.io/)
* [PointInfinity: Resolution-Invariant Point Diffusion Models](https://arxiv.org/abs/2404.03566)
:house:[project](https://zixuanh.com/projects/pointinfinity)
* [Unsupervised Keypoints from Pretrained Diffusion Models](https://arxiv.org/abs/2312.00065)
:star:[code](https://ubc-vision.github.io/StableKeypoints/)
* [Amodal Completion via Progressive Mixed Context Diffusion](https://arxiv.org/pdf/2312.15540.pdf)
:star:[code](https://github.com/k8xu/amodal)
:house:[project](https://k8xu.github.io/amodal/)
* [SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution](https://arxiv.org/abs/2312.11598)
:house:[project](https://skilldiffuser.github.io/)
* [DREAM: Diffusion Rectification and Estimation-Adaptive Models](https://arxiv.org/abs/2312.00210)
* [Towards Memorization-Free Diffusion Models](http://arxiv.org/abs/2404.00922v1)
* [Efficient Dataset Distillation via Minimax Diffusion](https://arxiv.org/abs/2311.15529)
:star:[code](https://github.com/vimar-gu/MinimaxDiffusion)
* [MatFuse: Controllable Material Generation with Diffusion Models](https://arxiv.org/abs/2308.11408)
:star:[code](https://github.com/giuvecchio/matfuse-sd)
:house:[project](https://gvecchio.com/matfuse/)
* [Accelerating Diffusion Sampling with Optimized Time Steps](http://arxiv.org/abs/2402.17376v1)
* [Boosting Diffusion Models with Moving Average Sampling in Frequency Domain](http://arxiv.org/abs/2403.17870v1)
* [One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications](https://arxiv.org/abs/2312.16145)
:star:[code](https://github.com/Con6924/SPM)
:house:[project](https://lyumengyao.github.io/projects/spm)
* [Balancing Act: Distribution-Guided Debiasing in Diffusion Models](http://arxiv.org/abs/2402.18206v1)
:star:[code](https://ab-34.github.io/balancing_act/)
* [MACE: Mass Concept Erasure in Diffusion Models](http://arxiv.org/abs/2403.06135v1)
:star:[code](https://github.com/Shilin-LU/MACE)
* [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](http://arxiv.org/abs/2402.19481v1)
:house:[project](https://hanlab.mit.edu/blog/distrifusion)
:house:[project](https://hanlab.mit.edu/projects/distrifusion)
:star:[code](https://github.com/mit-han-lab/distrifuser)
* [Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models](http://arxiv.org/abs/2403.08381v1)
:star:[code](https://github.com/PangzeCheung/SingDiffusion)
:house:[project](https://pangzecheung.github.io/SingDiffusion/)
* [DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations](http://arxiv.org/abs/2403.06951v1)
:star:[code](https://github.com/Tianhao-Qi/DEADiff_code)
:house:[project](https://tianhao-qi.github.io/DEADiff/)
* [SVGDreamer: Text Guided SVG Generation with Diffusion Model](https://arxiv.org/abs/2312.16476)
:star:[code](https://github.com/ximinng/SVGDreamer)
:house:[project](https://ximinng.github.io/SVGDreamer-project/)
:thumbsup:[SVGDreamer: Beihang & HKU release a new text-guided differentiable vector graphics rendering method (in Chinese)](https://mp.weixin.qq.com/s/QEBiP-xLVvQVoV_9H2Id7g)
* [Relation Rectification in Diffusion Model](https://arxiv.org/abs/2403.20249)
:star:[code](https://github.com/WUyinwei-hah/RRNet)
:house:[project](https://wuyinwei-hah.github.io/rrnet.github.io/)
* Image Synthesis/Generation(图像合成/生成)
* Image Synthesis(图像合成)
* [One-Shot Structure-Aware Stylized Image Synthesis](http://arxiv.org/abs/2402.17275v1)
* [AnyScene: Customized Image Synthesis with Composited Foreground](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_AnyScene_Customized_Image_Synthesis_with_Composited_Foreground_CVPR_2024_paper.pdf)
* [Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance](http://arxiv.org/abs/2405.01356)
* [ViewFusion: Towards Multi-View Consistency via Interpolated Denoising](http://arxiv.org/abs/2402.18842v1)
:star:[code](https://wi-sc.github.io/ViewFusion.github.io/)
* [PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis](http://arxiv.org/abs/2403.01852)
* [Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks](http://arxiv.org/abs/2403.00644v1)
* [Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis](https://arxiv.org/abs/2212.03185)
* [Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Unmixing_Before_Fusion_A_Generalized_Paradigm_for_Multi-Source-based_Hyperspectral_Image_CVPR_2024_paper.pdf)
* [Unlocking Pre-trained Image Backbones for Semantic Image Synthesis](https://arxiv.org/abs/2312.13314)
* [Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis](https://arxiv.org/abs/2406.05478)
* Scene-Text Image Synthesis(场景-文本图像合成)
* [TextNeRF: A Novel Scene-Text Image Synthesis Method based on Neural Radiance Fields](https://openaccess.thecvf.com/content/CVPR2024/papers/Cui_TextNeRF_A_Novel_Scene-Text_Image_Synthesis_Method_based_on_Neural_CVPR_2024_paper.pdf)
* Image Generation(图像生成)
* [ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation](https://arxiv.org/abs/2311.18822)
:house:[project](https://elasticdiffusion.github.io/)
* [SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation](http://arxiv.org/abs/2401.08053)
* [AnyDoor: Zero-shot Object-level Image Customization](https://arxiv.org/abs/2307.09481)
:house:[project](https://damo-vilab.github.io/AnyDoor-Page/)
* [Taming Stable Diffusion for Text to 360 Panorama Image Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Taming_Stable_Diffusion_for_Text_to_360_Panorama_Image_Generation_CVPR_2024_paper.pdf)
:house:[project](https://chengzhag.github.io/publication/panfusion)
:star:[code](https://github.com/chengzhag/PanFusion)
* [Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations](https://arxiv.org/abs/2311.17938)
* [Generative Image Dynamics](https://arxiv.org/abs/2309.07906)
:house:[project](https://generative-dynamics.github.io/)
* [Clockwork Diffusion: Efficient Generation With Model-Step Distillation](https://arxiv.org/abs/2312.08128)
* [UniGS: Unified Representation for Image Generation and Segmentation](https://arxiv.org/abs/2312.01985)
:star:[code](https://github.com/qqlu/Entity) (image generation)
* [Exact Fusion via Feature Distribution Matching for Few-shot Image Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Exact_Fusion_via_Feature_Distribution_Matching_for_Few-shot_Image_Generation_CVPR_2024_paper.pdf)
* [FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition](https://arxiv.org/abs/2405.13870)
:star:[code](https://github.com/aim-uofa/FreeCustom/tree/main)
:house:[project](https://aim-uofa.github.io/FreeCustom/)
* [Adversarial Text to Continuous Image Generation](https://openreview.net/forum?id=9X3UZJSGIg9)
* [Style Aligned Image Generation via Shared Attention](https://arxiv.org/abs/2312.02133)
:house:[project](http://style-aligned-gen.github.io/)
* [CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation](https://arxiv.org/abs/2310.01407)
* [Instruct-Imagen: Image Generation with Multi-modal Instruction](https://arxiv.org/abs/2401.01952)
* [InstanceDiffusion: Instance-level Control for Image Generation](https://arxiv.org/abs/2402.03290)
:star:[code](https://github.com/frank-xwang/InstanceDiffusion)
:house:[project](https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/)
* [DemoFusion: Democratising High-Resolution Image Generation With No $$$](https://arxiv.org/pdf/2311.16973.pdf)
:star:[code](https://github.com/PRIS-CV/DemoFusion)
:house:[project](https://ruoyidu.github.io/demofusion/demofusion.html)
* [ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models](http://arxiv.org/abs/2403.01807v1)
:house:[project](https://lukashoel.github.io/ViewDiff/)
:house:[project](https://www.youtube.com/watch?v=SdjoCqHzMMk)
:star:[code](https://github.com/facebookresearch/ViewDiff)
* [When StyleGAN Meets Stable Diffusion:a W+ Adapter for Personalized Image Generation](https://arxiv.org/pdf/2311.17461.pdf)
:star:[code](https://github.com/csxmli2016/w-plus-adapter)
:house:[project](https://csxmli2016.github.io/projects/w-plus-adapter/)
* [Correcting Diffusion Generation through Resampling](https://arxiv.org/pdf/2312.06038.pdf)
:star:[code](https://github.com/UCSB-NLP-Chang/diffusion_resampling.git)
* [Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder](http://arxiv.org/abs/2403.10255v1)
* [Condition-Aware Neural Network for Controlled Image Generation](http://arxiv.org/abs/2404.01143v1)
* [A Unified and Interpretable Emotion Representation and Expression Generation](http://arxiv.org/abs/2404.01243v1)
:star:[code](https://emotion-diffusion.github.io)
* [Rethinking FID: Towards a Better Evaluation Metric for Image Generation](http://arxiv.org/abs/2401.09603)
* Subject-Driven Image Generation(主题驱动的图像生成)
* [SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation](https://arxiv.org/abs/2312.16272)
:star:[code](https://github.com/Xiaojiu-z/SSR_Encoder)
:house:[project](https://ssr-encoder.github.io/)
* Text-to-Image(文本-图像)
* [Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/Ohanyan_Zero-Painter_Training-Free_Layout_Control_for_Text-to-Image_Synthesis_CVPR_2024_paper.pdf)
* [Learning Multi-Dimensional Human Preference for Text-to-Image Generation](http://arxiv.org/abs/2405.14705)
* [Customization Assistant for Text-to-Image Generation](http://arxiv.org/abs/2312.03045)
* [TokenCompose: Text-to-Image Diffusion with Token-level Supervision](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_TokenCompose_Text-to-Image_Diffusion_with_Token-level_Supervision_CVPR_2024_paper.pdf)
* [FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition](http://arxiv.org/abs/2312.07536)
* [Personalized Residuals for Concept-Driven Text-to-Image Generation](https://arxiv.org/abs/2405.12978)
:house:[project](https://cusuh.github.io/personalized-residuals)
* [Rich Human Feedback for Text-to-Image Generation](http://arxiv.org/abs/2312.10240)
* [MarkovGen: Structured Prediction for Efficient Text-to-Image Generation](http://arxiv.org/abs/2308.10997)
* [Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models](http://arxiv.org/abs/2308.15692)
* [SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation](http://arxiv.org/abs/2312.05239)
* [JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zeng_JeDi_Joint-Image_Diffusion_Models_for_Finetuning-Free_Personalized_Text-to-Image_Generation_CVPR_2024_paper.pdf)
* [MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis](https://arxiv.org/abs/2402.05408)
:house:[project](https://migcproject.github.io/)
* [Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation](https://arxiv.org/abs/2311.15773)
:house:[project](https://simm-t2i.github.io/SimM)
* [Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models](http://arxiv.org/abs/2311.16117)
* [DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization](https://arxiv.org/abs/2402.09812)
:house:[project](https://ku-cvlab.github.io/DreamMatcher/)
* [UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs](https://arxiv.org/abs/2311.09257)
* [Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models](http://arxiv.org/abs/2312.12416)
* [Countering Personalized Text-to-Image Generation with Influence Watermarks](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Countering_Personalized_Text-to-Image_Generation_with_Influence_Watermarks_CVPR_2024_paper.pdf)
* [Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following](https://arxiv.org/abs/2311.17002)
:house:[project](https://ranni-t2i.github.io/Ranni)
* [Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting](https://arxiv.org/abs/2310.08129)
:star:[code](https://github.com/zzjchen/Tailored-Visions)
* [InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning](https://arxiv.org/abs/2304.03411)
* [FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Cazenavette_FakeInversion_Learning_to_Detect_Images_from_Unseen_Text-to-Image_Models_by_CVPR_2024_paper.pdf)
* [Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation](https://arxiv.org/abs/2311.15841)
:house:[project](https://adi-t2i.github.io/ADI)
* [LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model](https://arxiv.org/abs/2305.11577)
:star:[code](https://github.com/ewrfcas/LeftRefill)
:house:[project](https://ewrfcas.github.io/LeftRefill)
* [HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models](https://arxiv.org/abs/2307.06949)
:house:[project](https://hyperdreambooth.github.io/)
* [PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models](https://arxiv.org/abs/2312.13964)
:house:[project](https://pi-animator.github.io/)
* [On the Scalability of Diffusion-based Text-to-Image Generation](https://arxiv.org/abs/2404.02883)
* [Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation](https://arxiv.org/abs/2311.17216)
:house:[project](https://interpretdiffusion.github.io/)
* [EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models](https://arxiv.org/abs/2401.04608)
:star:[code](https://github.com/JingyuanYY/EmoGen)
* [Grounded Text-to-Image Synthesis with Attention Refocusing](https://arxiv.org/abs/2306.05427)
:house:[project](https://attention-refocusing.github.io/)
* [OpenBias: Open-set Bias Detection in Text-to-Image Generative Models](http://arxiv.org/abs/2404.07990v1)
:star:[code](https://github.com/Picsart-AI-Research/OpenBias)
* [Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models](https://arxiv.org/abs/2305.16223)
:star:[code](https://github.com/SHI-Labs/Prompt-Free-Diffusion)
* [CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models](https://arxiv.org/abs/2312.06059)
* [InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization](http://arxiv.org/abs/2404.04650v1)
:star:[code](https://github.com/xiefan-guo/initno)
* [Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models](http://arxiv.org/abs/2404.03913v1)
* [Cross Initialization for Face Personalization of Text-to-Image Models](https://arxiv.org/abs/2312.15905) (also listed as "Cross Initialization for Personalized Text-to-Image Generation")
* [CosmicMan: A Text-to-Image Foundation Model for Humans](http://arxiv.org/abs/2404.01294v1)
:star:[code](https://cosmicman-cvpr2024.github.io)
* [Dynamic Prompt Optimizing for Text-to-Image Generation](http://arxiv.org/abs/2404.04095v1)
:star:[code](https://github.com/Mowenyii/PAE)
* [WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models](https://arxiv.org/abs/2306.04744)
* [Attention Calibration for Disentangled Text-to-Image Personalization](http://arxiv.org/abs/2403.18551v1)
* [RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization](http://arxiv.org/abs/2403.00483v1)
:star:[code](https://corleone-huang.github.io/realcustom/)
* [InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models](https://arxiv.org/abs/2312.05849)
:star:[code](https://github.com/jiuntian/interactdiffusion)
:house:[project](https://jiuntian.github.io/interactdiffusion/)
* [Learning Continuous 3D Words for Text-to-Image Generation](https://ttchengab.github.io/continuous_3d_words/c3d_words.pdf)
:star:[code](https://github.com/ttchengab/continuous_3d_words_code/)
:house:[project](https://ttchengab.github.io/continuous_3d_words/)
* [NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging](http://arxiv.org/abs/2403.03485v1)
:star:[code](https://github.com/univ-esuty/noisecollage)
* [HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances](https://arxiv.org/abs/2403.01693)
:house:[project](https://supreethn.github.io/research/handiffuser/index.html)
* [Discriminative Probing and Tuning for Text-to-Image Generation](http://arxiv.org/abs/2403.04321v1)
:star:[code](https://github.com/LgQu/DPT-T2I)
:house:[project](https://dpt-t2i.github.io/)
* [Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization](http://arxiv.org/abs/2403.15330v1)
* [ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations](https://arxiv.org/abs/2312.04655)
:star:[code](https://github.com/eclipse-t2i/eclipse-inference)
:house:[project](https://eclipse-t2i.vercel.app/)
* [FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models](http://arxiv.org/abs/2403.16379v1)
:star:[code](https://github.com/thu-nics/FlashEval)
* [MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning](http://arxiv.org/abs/2311.13127)
* Subject-to-Image(主题-图像)
* [High-fidelity Person-centric Subject-to-Image Synthesis](https://arxiv.org/abs/2311.10329)
:star:[code](https://github.com/CodeGoat24/Face-diffuser)
* Video Synthesis/Generation(视频合成/生成)
* Video Generation(视频生成)
* [InstructVideo: Instructing Video Diffusion Models with Human Feedback](https://arxiv.org/abs/2312.12490)
:star:[code](https://github.com/damo-vilab/i2vgen-xl/blob/main/doc/InstructVideo.md)
:house:[project](https://instructvideo.github.io/)
* [Make Pixels Dance: High-Dynamic Video Generation](http://arxiv.org/abs/2311.10982)
* [GenTron: Diffusion Transformers for Image and Video Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_GenTron_Diffusion_Transformers_for_Image_and_Video_Generation_CVPR_2024_paper.pdf)
* [Panacea: Panoramic and Controllable Video Generation for Autonomous Driving](http://arxiv.org/abs/2311.16813)
* [Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models](https://arxiv.org/abs/2312.01409)
:house:[project](https://primecai.github.io/generative_rendering/)
* [DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_DiffPerformer_Iterative_Learning_of_Consistent_Latent_Guidance_for_Diffusion-based_Human_CVPR_2024_paper.pdf)
* [VideoBooth: Diffusion-based Video Generation with Image Prompts](https://arxiv.org/abs/2312.00777)
:house:[project](https://vchitect.github.io/VideoBooth-project/)
* [Hierarchical Patch Diffusion Models for High-Resolution Video Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Skorokhodov_Hierarchical_Patch_Diffusion_Models_for_High-Resolution_Video_Generation_CVPR_2024_paper.pdf)
:star:[code](https://github.com/snap-research/hpdm)
* [On the Content Bias in Fréchet Video Distance](http://arxiv.org/abs/2404.12391v1)
:star:[code](https://content-debiased-fvd.github.io/)
* [360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model](https://arxiv.org/abs/2401.06578)
* [SimDA: Simple Diffusion Adapter for Efficient Video Generation](https://arxiv.org/abs/2308.09710)
:star:[code](https://github.com/ChenHsing/SimDA)
:house:[project](https://chenhsing.github.io/SimDA/)
* [GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos](https://arxiv.org/abs/2312.07322) (video generation)
* [FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation](http://arxiv.org/abs/2403.12962v1)
:house:[project](https://www.mmlab-ntu.com/project/fresco/)
:star:[code](https://github.com/williamyang1991/FRESCO)
* [Vlogger: Make Your Dream A Vlog](https://arxiv.org/abs/2401.09414)
:star:[code](https://github.com/zhuangshaobin/Vlogger)
:house:[project](https://zhuangshaobin.github.io/Vlogger.github.io/)
* [LAMP: Learn A Motion Pattern for Few-Shot Video Generation](https://arxiv.org/abs/2310.10769)
:house:[project](https://rq-wu.github.io/projects/LAMP)
* [EvalCrafter: Benchmarking and Evaluating Large Video Generation Models](https://arxiv.org/abs/2310.11440)
:house:[project](https://evalcrafter.github.io/)
* [Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model](http://arxiv.org/abs/2404.01862v1)
:star:[code](https://github.com/thuhcsi/S2G-MDDiffusion)
* [BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models](https://arxiv.org/abs/2312.02813)
:star:[code](https://github.com/MCG-NJU/BIVDiff)
:house:[project](https://bivdiff.github.io/) (video synthesis)
* [DreamVideo: Composing Your Dream Videos with Customized Subject and Motion](https://arxiv.org/abs/2312.04433)
:house:[project](https://dreamvideo-t2v.github.io/)
* [PEEKABOO: Interactive Video Generation via Masked-Diffusion](https://arxiv.org/abs/2312.07509)
:house:[project](https://jinga-lala.github.io/projects/Peekaboo/)
* Text-to-Video(文本-视频)
* [Grid Diffusion Models for Text-to-Video Generation](http://arxiv.org/abs/2404.00234v1)
* [Breathing Life Into Sketches Using Text-to-Video Priors](https://arxiv.org/abs/2311.13608)
:star:[code](https://github.com/yael-vinker/live_sketch)
:house:[project](https://livesketch.github.io/)
* [Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs](https://arxiv.org/abs/2308.13812)
* [TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models](https://export.arxiv.org/abs/2404.16306)
:house:[project](https://merl.com/demos/TI2V-Zero)
* [Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation](http://arxiv.org/abs/2311.17117)
* [Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation](https://arxiv.org/abs/2312.04483)
:house:[project](https://higen-t2v.github.io/)
* [A Recipe for Scaling up Text-to-Video Generation with Text-free Videos](https://arxiv.org/abs/2312.15770)
:house:[project](https://tf-t2v.github.io/)
* [TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models](http://arxiv.org/abs/2403.17005v1)
:star:[code](https://trip-i2v.github.io/TRIP/)
* [Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis](https://arxiv.org/abs/2402.14797)
:house:[project](https://snap-research.github.io/snapvideo/)
* [VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models](https://arxiv.org/abs/2312.00845)
:house:[project](https://video-motion-customization.github.io/)
* [MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation](http://arxiv.org/abs/2311.18829)
* 图像-视频
* [Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion](https://arxiv.org/abs/2403.15194)
* 视频-视频
* [Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis](https://arxiv.org/abs/2312.13834)
:house:[project](https://fairy-video2video.github.io/)
* [FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis](http://arxiv.org/abs/2312.17681)
* 纹理生成/合成
* 文本-纹理合成
* [Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering](https://arxiv.org/abs/2312.11360)
:house:[project](https://kim-youwang.github.io/paint-it)
* 纹理合成
* [SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors](https://arxiv.org/abs/2311.17261)
:house:[project](https://daveredrum.github.io/SceneTex/)
* [Single Mesh Diffusion Models with Field Latents for Texture Generation](https://arxiv.org/abs/2312.09250)
* [TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion](http://arxiv.org/abs/2401.09416)
* 文本-3D
* [DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data](https://arxiv.org/abs/2406.04322)
:star:[code](https://github.com/qihao067/direct3d)
:house:[project](https://direct-3d.github.io/)
* [PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion](http://arxiv.org/abs/2312.09069)
* [Text-to-3D using Gaussian Splatting](https://arxiv.org/abs/2309.16585)
:star:[code](https://github.com/gsgen3d/gsgen)
:house:[project](https://gsgen3d.github.io/)
* [DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling](http://arxiv.org/abs/2311.17082)
* [RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D](http://arxiv.org/abs/2311.16918)
* [Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior](http://arxiv.org/abs/2401.09050)
* [Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior](http://arxiv.org/abs/2403.09140v1)
:star:[code](https://stellarcheng.github.io/Sculpt3D/)
* [LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching](https://arxiv.org/abs/2311.11284)
:star:[code](https://github.com/EnVision-Research/LucidDreamer)
* [Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion](https://arxiv.org/abs/2311.15980)
:house:[project](https://nju-3dv.github.io/projects/direct25)
* [Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior](https://arxiv.org/abs/2312.06655)
:house:[project](https://liuff19.github.io/Sherpa3D/)
* [Taming Mode Collapse in Score Distillation for Text-to-3D Generation](https://arxiv.org/abs/2401.00909)
:house:[project](https://vita-group.github.io/3D-Mode-Collapse/)
* [Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors](https://arxiv.org/abs/2312.04963)
:house:[project](https://bidiff.github.io/)
* [DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior](https://arxiv.org/abs/2312.06439)
:star:[code](https://github.com/tyhuang0428/DreamControl)
* [VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation](http://arxiv.org/abs/2403.17001v1)
:star:[code](https://vp3d-cvpr24.github.io)
* [GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation](https://arxiv.org/abs/2401.04092)
:star:[code](https://github.com/3DTopia/GPTEval3D)
:house:[project](https://gpteval3d.github.io/)
* [Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences](http://arxiv.org/abs/2404.10603v1)
* [DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors](https://arxiv.org/abs/2312.16837)
:house:[project](https://younglbw.github.io/DiffusionGAN3D-homepage/)
* [HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation](https://arxiv.org/abs/2403.00372)
* [Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D](https://arxiv.org/abs/2312.02190)
:house:[project](https://diffusionhandles.github.io/)
* [HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D](https://arxiv.org/abs/2312.15980)
:house:[project](https://byeongjun-park.github.io/HarmonyView/)
* 图像-3D
* [Diffusion Time-step Curriculum for One Image to 3D Generation](http://arxiv.org/abs/2404.04562v1)
:star:[code](https://github.com/yxymessi/DTC123)
* [MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_MPOD123_One_Image_to_3D_Content_Generation_Using_Mask-enhanced_Progressive_CVPR_2024_paper.pdf)
* [Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing](https://arxiv.org/abs/2402.17464)
* 文本-4D
* [4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling](https://arxiv.org/abs/2311.17984)
:house:[project](https://sherwinbahmani.github.io/4dfy)
* [Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models](https://arxiv.org/abs/2312.13763)
:house:[project](https://research.nvidia.com/labs/toronto-ai/AlignYourGaussians/)
* 3D生成
* [DreamComposer: Controllable 3D Object Generation via Multi-View Conditions](https://arxiv.org/abs/2312.03611)
:house:[project](https://yhyang-myron.github.io/DreamComposer/)
* [XCube (X3): Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies](https://arxiv.org/abs/2312.03806)
:house:[project](https://research.nvidia.com/labs/toronto-ai/xcube/)
* [CAD: Photorealistic 3D Generation via Adversarial Distillation](https://arxiv.org/abs/2312.06663)
:star:[code](https://github.com/raywzy/CAD)
:house:[project](http://raywzy.com/CAD/)
* [Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models](https://arxiv.org/abs/2312.13913)
:star:[code](https://github.com/OpenTexture/Paint3D)
* [Interactive3D: Create What You Want by Interactive 3D Generation](http://arxiv.org/abs/2404.16510)
* 语义场景生成
* [SemCity: Semantic Scene Generation with Triplane Diffusion](http://arxiv.org/abs/2403.07773v1)
:star:[code](https://github.com/zoomin-lee/SemCity)
* [Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion](https://openaccess.thecvf.com/content/CVPR2024/papers/Xue_Bi-SSC_Geometric-Semantic_Bidirectional_Fusion_for_Camera-based_3D_Semantic_Scene_Completion_CVPR_2024_paper.pdf)
* 场景补全
* [Unleashing Network Potentials for Semantic Scene Completion](http://arxiv.org/abs/2403.07560v1)
* [Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation](http://arxiv.org/abs/2404.11958v1)
:star:[code](https://github.com/songw-zju/HASSC)
* [Symphonize 3D Semantic Scene Completion with Contextual Instance Queries](https://arxiv.org/abs/2306.15670)
:star:[code](https://github.com/hustvl/Symphonies)
* [PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness](http://arxiv.org/abs/2312.02158)
* [Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion](https://arxiv.org/abs/2403.13470)
* 图像-图像翻译
* [StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation](http://arxiv.org/abs/2403.20142)
* 图像检测
* [LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection](http://arxiv.org/abs/2403.17465v1)
* 图像编辑
* [Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation](https://arxiv.org/abs/2312.10113)
:star:[code](https://github.com/guoqincode/Focus-on-Your-Instruction)
* [Emu Edit: Precise Image Editing via Recognition and Generation Tasks](https://arxiv.org/abs/2311.10089)
* [An Edit Friendly DDPM Noise Space: Inversion and Manipulations](http://arxiv.org/abs/2304.06140)
* [Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing](https://openaccess.thecvf.com/content/CVPR2024/papers/Lo_Distraction_is_All_You_Need_Memory-Efficient_Image_Immunization_against_Diffusion-Based_CVPR_2024_paper.pdf)
* [DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing](https://arxiv.org/abs/2402.02583)
:star:[code](https://github.com/MC-E/DragonDiffusion)
* [DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing](https://arxiv.org/pdf/2312.07409)
:star:[code](https://github.com/Kevin-thu/DiffMorpher)
:house:[project](https://kevin-thu.github.io/DiffMorpher_page/)
* [UniHuman: A Unified Model For Editing Human Images in the Wild](http://arxiv.org/abs/2312.14985)
:star:[code](https://github.com/NannanLi999/UniHuman)
* [Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing](https://arxiv.org/abs/2311.18608)
:house:[project](https://hyelinnam.github.io/CDS/)
* [Inversion-Free Image Editing with Language-Guided Diffusion Models](https://arxiv.org/abs/2312.04965)
:star:[code](https://github.com/sled-group/InfEdit)
:house:[project](https://sled-group.github.io/InfEdit/)
* [TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing](http://arxiv.org/abs/2404.11120v1)
:star:[code](https://github.com/SherryXTChen/TiNO-Edit)
* [Edit One for All: Interactive Batch Image Editing](https://arxiv.org/abs/2401.10219)
:star:[code](https://github.com/thaoshibe/edit-one-for-all)
:house:[project](https://thaoshibe.github.io/edit-one-for-all/)
* [SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing](https://arxiv.org/abs/2312.11392)
:house:[project](https://scedit.github.io/)
* [On Exact Inversion of DPM-Solvers](https://arxiv.org/abs/2311.18387)
:star:[code](https://github.com/smhongok/inv-dpm)
:house:[project](https://smhongok.github.io/inv-dpm.html)
* [Doubly Abductive Counterfactual Inference for Text-based Image Editing](https://arxiv.org/abs/2403.02981)
:star:[code](https://github.com/xuesong39/DAC)
* [Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing](https://arxiv.org/abs/2403.03431)
* [ZONE: Zero-Shot Instruction-Guided Local Editing](https://arxiv.org/abs/2312.16794)
* [HIVE: Harnessing Human Feedback for Instructional Visual Editing](https://arxiv.org/abs/2303.09618)
* [FreeDrag: Feature Dragging for Reliable Point-based Image Editing](https://arxiv.org/abs/2307.04684)
* [The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing](https://openaccess.thecvf.com/content/CVPR2024/papers/Bobkov_The_Devil_is_in_the_Details_StyleFeatureEditor_for_Detail-Rich_StyleGAN_CVPR_2024_paper.pdf)
* [DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing](https://arxiv.org/pdf/2306.14435.pdf)
:star:[code](https://github.com/Yujun-Shi/DragDiffusion)
* [Text-Driven Image Editing via Learnable Regions](https://arxiv.org/pdf/2311.16432.pdf)
:star:[code](https://github.com/yuanze-lin/Learnable_Regions)
:house:[project](https://yuanze-lin.me/LearnableRegions_page/)
* [LEDITS++: Limitless Image Editing using Text-to-Image Models](https://arxiv.org/pdf/2311.16711.pdf)
:star:[code](https://huggingface.co/spaces/editing-images/ledtisplusplus/tree/main)
:house:[project](https://leditsplusplus-project.static.hf.space/index.html)
* [SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models](http://arxiv.org/abs/2312.06739)
:star:[code](https://github.com/TencentARC/SmartEdit)
:house:[project](https://yuzhou914.github.io/SmartEdit/)
* [Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Person_in_Place_Generating_Associative_Skeleton-Guidance_Maps_for_Human-Object_Interaction_CVPR_2024_paper.pdf)
:star:[code](https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE?tab=readme-ov-file)
:house:[project](https://yangchanghee.github.io/Person-in-Place_page/)
* [PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor](https://arxiv.org/abs/2303.17546)
:star:[code](https://github.com/Picsart-AI-Research/PAIR-Diffusion)
:house:[project](https://vidit98.github.io/publication/conference-paper/pair_diff.html)
* [Referring Image Editing: Object-level Image Editing via Referring Expressions](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Referring_Image_Editing_Object-level_Image_Editing_via_Referring_Expressions_CVPR_2024_paper.pdf)
* [Prompt Augmentation for Self-supervised Text-guided Image Manipulation](https://openaccess.thecvf.com/content/CVPR2024/papers/Bodur_Prompt_Augmentation_for_Self-supervised_Text-guided_Image_Manipulation_CVPR_2024_paper.pdf)
* [Named Entity Driven Zero-Shot Image Manipulation](https://openaccess.thecvf.com/content/CVPR2024/papers/Feng_Named_Entity_Driven_Zero-Shot_Image_Manipulation_CVPR_2024_paper.pdf)
* 布局生成
* [Constrained Layout Generation with Factor Graphs](http://arxiv.org/abs/2404.00385v1)
* [SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control](http://arxiv.org/abs/2312.05039)
* [MaskPLAN: Masked Generative Layout Planning from Partial Input](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_MaskPLAN_Masked_Generative_Layout_Planning_from_Partial_Input_CVPR_2024_paper.pdf)
:star:[code](https://github.com/HangZhangZ/MaskPLAN)
* [Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation](https://arxiv.org/abs/2311.13602)
:house:[project](https://udonda.github.io/RALF/)
* [Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Shabani_Visual_Layout_Composer_Image-Vector_Dual_Diffusion_Model_for_Design_Layout_CVPR_2024_paper.pdf)
:house:[project](https://aminshabani.github.io/visual_layout_composer/index.html)
* 手写数学表达式
* [Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Generating_Handwritten_Mathematical_Expressions_From_Symbol_Graphs_An_End-to-End_Pipeline_CVPR_2024_paper.pdf)
* NeRF-to-NeRF
* [GenN2N: Generative NeRF2NeRF Translation](http://arxiv.org/abs/2404.02788)
* 生成伪装图像
* [LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion](http://arxiv.org/abs/2404.00292v1)
* 场景生成
* [Towards Realistic Scene Generation with LiDAR Diffusion Models](http://arxiv.org/abs/2404.00815v1)
:star:[code](https://github.com/hancyran/LiDAR-Diffusion)
:house:[project](https://lidar-diffusion.github.io/)
:thumbsup:[LiDM:首个可以根据多模态条件生成逼真的激光雷达场景方法,加速107倍](https://mp.weixin.qq.com/s/nFbY2mR1657gKKsOTwgDQw)
* [DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis](https://arxiv.org/abs/2303.14207)
* 交互式编辑
* [Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation](http://arxiv.org/abs/2404.01050v1)
:star:[code](https://github.com/haofengl/DragNoise)
* 视频编辑
* [CCEdit: Creative and Controllable Video Editing via Diffusion Models](https://arxiv.org/abs/2309.16496)
:house:[project](https://huggingface.co/papers/2309.16496)
:tv:[video](https://www.youtube.com/watch?v=UQw4jq-igN4)
* [MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers](https://arxiv.org/abs/2312.12468)
* [RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models](https://arxiv.org/abs/2312.04524)
:star:[code](http://github.com/rehg-lab/RAVE)
:house:[project](https://rave-video.github.io/)
* [A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing](https://arxiv.org/abs/2312.05856)
:house:[project](https://stem-inv.github.io/page/)
* [Video-P2P: Video Editing with Cross-attention Control](https://arxiv.org/abs/2303.04761)
:house:[project](https://video-p2p.github.io/)
* [VidToMe: Video Token Merging for Zero-Shot Video Editing](https://arxiv.org/abs/2312.10656)
:house:[project](https://vidtome-diffusion.github.io/)
* [Video Interpolation with Diffusion Models](http://arxiv.org/abs/2404.01203v1)
:star:[code](https://vidim-interpolation.github.io/)
* [MotionEditor: Editing Video Motion via Content-Aware Diffusion](https://arxiv.org/abs/2311.18830)
* [CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_CAMEL_CAusal_Motion_Enhancement_Tailored_for_Lifting_Text-driven_Video_Editing_CVPR_2024_paper.pdf)
:star:[code](https://github.com/zhangguiwei610/CAMEL)
* [DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing](https://arxiv.org/abs/2310.10624v2)
:house:[project](https://showlab.github.io/DynVideo-E/)
* 漫画生成
* [The Manga Whisperer: Automatically Generating Transcriptions for Comics](https://arxiv.org/abs/2401.10224)
:star:[code](https://github.com/ragavsachdeva/magi)
* 文本驱动 3D 风格化
* [TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes](https://arxiv.org/abs/2312.04248)
* Image Warping
* [Towards Progressive Multi-Frequency Representation for Image Warping](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_Towards_Progressive_Multi-Frequency_Representation_for_Image_Warping_CVPR_2024_paper.pdf)
* 图像重建
* [Equivariant Plug-and-Play Image Reconstruction](http://arxiv.org/abs/2312.01831)
* [SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction](https://openaccess.thecvf.com/content/CVPR2024/papers/Yao_SPECAT_SPatial-spEctral_Cumulative-Attention_Transformer_for_High-Resolution_Hyperspectral_Image_Reconstruction_CVPR_2024_paper.pdf)
* [Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Boosting_Spike_Camera_Image_Reconstruction_from_a_Perspective_of_Dealing_CVPR_2024_paper.pdf)
* 图像拼接
* [DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly](http://arxiv.org/abs/2402.19302v1)
:star:[code](https://github.com/IIT-PAVIS/DiffAssemble)
* [RecDiffusion: Rectangling for Image Stitching with Diffusion Models](https://arxiv.org/abs/2403.19164)
:star:[code](https://github.com/lhaippp/RecDiffusion)
* 姿势引导的人体图像合成
* [Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis](https://arxiv.org/abs/2402.18078)
:star:[code](https://github.com/YanzuoLu/CFLD)
* 文本引导的人体图像合成
* [Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation](http://arxiv.org/abs/2403.05239v1)
:star:[code](https://hcplayercvpr2024.github.io)
* 文本图像对齐
* [Text-image Alignment for Diffusion-based Perception](https://arxiv.org/abs/2310.00031)
:star:[code](https://github.com/damaggu/TADP)
:house:[project](https://www.vision.caltech.edu/tadp/)
* 基于文本的图像色调调整
* [CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment](https://arxiv.org/abs/2404.01123)
* 图像矢量化
* [Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Towards_High-fidelity_Artistic_Image_Vectorization_via_Texture-Encapsulated_Shape_Parameterization_CVPR_2024_paper.pdf)
* 文本-矢量
* [NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation](http://arxiv.org/abs/2405.15217)
* 矢量字体
* [VecFusion: Vector Font Generation with Diffusion](https://arxiv.org/abs/2312.10540)
* 矢量图形合成
* [SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_SuperSVG_Superpixel-based_Scalable_Vector_Graphics_Synthesis_CVPR_2024_paper.pdf)
* [Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Vector_Graphics_Generation_via_Mutually_Impulsed_Dual-domain_Diffusion_CVPR_2024_paper.pdf)
* 二维码生成
* [Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation](https://arxiv.org/abs/2403.06452)
:star:[code](https://github.com/mulns/Text2QR)
* 背景替换
* [Relightful Harmonization: Lighting-aware Portrait Background Replacement](https://arxiv.org/abs/2312.06886)
## 7.Image Progress(图像处理)
* 去鬼影
* [Generating Content for HDR Deghosting from Frequency View](http://arxiv.org/abs/2404.00849v1)
* 去阴影
* [HomoFormer: Homogenized Transformer for Image Shadow Removal](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_HomoFormer_Homogenized_Transformer_for_Image_Shadow_Removal_CVPR_2024_paper.pdf)
* 去模糊
* [Unsupervised Blind Image Deblurring Based on Self-Enhancement](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Unsupervised_Blind_Image_Deblurring_Based_on_Self-Enhancement_CVPR_2024_paper.pdf)
* [Latency Correction for Event-guided Deblurring and Frame Interpolation](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Latency_Correction_for_Event-guided_Deblurring_and_Frame_Interpolation_CVPR_2024_paper.pdf)
* [LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network](https://arxiv.org/abs/2307.09815)
* [ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation](https://arxiv.org/abs/2312.10998)
* [Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains](http://arxiv.org/abs/2403.16205v1)
:star:[code](https://zero1778.github.io/blur2blur/)
* [AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring](https://openaccess.thecvf.com/content/CVPR2024/papers/Mao_AdaRevD_Adaptive_Patch_Exiting_Reversible_Decoder_Pushes_the_Limit_of_CVPR_2024_paper.pdf)
:star:[code](https://github.com/DeepMed-Lab-ECNU/Single-Image-Deblur)
:star:[code](https://github.com/INVOKERer/AdaRevD)
* [A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning](https://arxiv.org/abs/2403.02611)
:star:[code](https://github.com/PieceZhang/MPT-CataBlur)
* 去雾
* [ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing](http://arxiv.org/abs/2404.17825)
* [Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing](https://arxiv.org/abs/2403.01105v1)
* [A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint](http://arxiv.org/abs/2403.18548v1)
:star:[code](https://github.com/Xiaofeng-life/SFSNiD)
* 去噪
* [Real-World Mobile Image Denoising Dataset with Efficient Baselines](https://openaccess.thecvf.com/content/CVPR2024/papers/Flepp_Real-World_Mobile_Image_Denoising_Dataset_with_Efficient_Baselines_CVPR_2024_paper.pdf)
* [GenesisTex: Adapting Image Denoising Diffusion to Texture Space](http://arxiv.org/abs/2403.17782)
* [Robust Image Denoising through Adversarial Frequency Mixup](https://openaccess.thecvf.com/content/CVPR2024/papers/Ryou_Robust_Image_Denoising_through_Adversarial_Frequency_Mixup_CVPR_2024_paper.pdf)
* [Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios](http://arxiv.org/abs/2303.16783)
* [Masked and Shuffled Blind Spot Denoising for Real-World Images](http://arxiv.org/abs/2404.09389)
* [LAN: Learning to Adapt Noise for Image Denoising](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_LAN_Learning_to_Adapt_Noise_for_Image_Denoising_CVPR_2024_paper.pdf)
* [Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising](https://openaccess.thecvf.com/content/CVPR2024/papers/Zeng_Unmixing_Diffusion_for_Self-Supervised_Hyperspectral_Image_Denoising_CVPR_2024_paper.pdf)
* [Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Stable_Neighbor_Denoising_for_Source-free_Domain_Adaptive_Segmentation_CVPR_2024_paper.pdf)
* [Transfer CLIP for Generalizable Image Denoising](https://arxiv.org/abs/2403.15132)
* [Residual Denoising Diffusion Models](https://arxiv.org/abs/2308.13712)
:star:[code](https://github.com/nachifur/RDDM)
* [Equivariant Plug-and-Play Image Reconstruction](https://arxiv.org/abs/2312.01831)
:star:[code](https://github.com/matthieutrs/EquivariantPnP)
* [Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching](https://openaccess.thecvf.com/content/CVPR2024/papers/Fadnavis_Patch2Self2_Self-supervised_Denoising_on_Coresets_via_Matrix_Sketching_CVPR_2024_paper.pdf)
* [Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Hyper-MD_Mesh_Denoising_with_Customized_Parameters_Aware_of_Noise_Intensity_CVPR_2024_paper.pdf)
* [ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Shi_ZERO-IG_Zero-Shot_Illumination-Guided_Joint_Denoising_and_Adaptive_Enhancement_for_Low-Light_CVPR_2024_paper.pdf)
:thumbsup:[中文简介](https://cstc.hrbeu.edu.cn/2024/0302/c3687a322183/page.htm)
* 去雨
* [Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining](http://arxiv.org/abs/2404.01547v1)
:star:[code](https://github.com/cschenxiang/NeRD-Rain)
* 去反射
* [Revisiting Single Image Reflection Removal In the Wild](https://arxiv.org/abs/2311.17320)
* [Language-guided Image Reflection Separation](https://arxiv.org/abs/2402.11874)
* 修图
* [Close Imitation of Expert Retouching for Black-and-White Photography](https://openaccess.thecvf.com/content/CVPR2024/papers/Shin_Close_Imitation_of_Expert_Retouching_for_Black-and-White_Photography_CVPR_2024_paper.pdf)
* 图像增强
* [Color Shift Estimation-and-Correction for Image Enhancement](https://arxiv.org/abs/2405.17725)
* [FlowIE: Efficient Image Enhancement via Rectified Flow](https://arxiv.org/abs/2406.00508)
* [Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring](https://openaccess.thecvf.com/content/CVPR2024/papers/Lv_Fourier_Priors-Guided_Diffusion_for_Zero-Shot_Joint_Low-Light_Enhancement_and_Deblurring_CVPR_2024_paper.pdf)
* [Specularity Factorization for Low-Light Enhancement](https://arxiv.org/abs/2404.01998)
* [Zero-Reference Low-Light Enhancement via Physical Quadruple Priors](http://arxiv.org/abs/2403.12933v1)
:star:[code](http://daooshee.github.io/QuadPrior-Website/)
* [Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach](http://arxiv.org/abs/2404.00834v1)
:star:[code](https://vlislab22.github.io/eg-lowlight/)
* [Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Empowering_Resampling_Operation_for_Ultra-High-Definition_Image_Enhancement_with_Model-Aware_Guidance_CVPR_2024_paper.pdf)
* [Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving](http://arxiv.org/abs/2404.04804v1)
* 图像恢复
* [Learning Diffusion Texture Priors for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Ye_Learning_Diffusion_Texture_Priors_for_Image_Restoration_CVPR_2024_paper.pdf)
* [CoDe: An Explicit Content Decoupling Framework for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Gu_CoDe_An_Explicit_Content_Decoupling_Framework_for_Image_Restoration_CVPR_2024_paper.pdf)
* [Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration](http://arxiv.org/abs/2311.16845)
* [Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild](http://arxiv.org/abs/2401.13627)
* [Look-Up Table Compression for Efficient Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Look-Up_Table_Compression_for_Efficient_Image_Restoration_CVPR_2024_paper.pdf)
* [HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Pang_HIR-Diff_Unsupervised_Hyperspectral_Image_Restoration_Via_Improved_Diffusion_Models_CVPR_2024_paper.pdf)
* [DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks](https://arxiv.org/abs/2405.04408)
* [Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance](http://arxiv.org/abs/2312.16519)
:star:[code](https://github.com/tirer-lab/DDPG)
* [Deep Equilibrium Diffusion Restoration with Parallel Sampling](https://arxiv.org/abs/2311.11600)
:star:[code](https://github.com/caojiezhang/DeqIR?tab=readme-ov-file)
* [Distilling Semantic Priors from SAM to Efficient Image Restoration Models](https://arxiv.org/abs/2403.16368)
* [Boosting Image Restoration via Priors from Pre-trained Models](http://arxiv.org/abs/2403.06793v1)
* [Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Adapt_or_Perish_Adaptive_Sparse_Transformer_with_Attentive_Feature_Refinement_CVPR_2024_paper.pdf)
* [Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model](http://arxiv.org/abs/2403.11157v1)
:star:[code](https://github.com/iSEE-Laboratory/DiffUIR)
* [Restoration by Generation with Constrained Priors](https://arxiv.org/abs/2312.17161)
:house:[project](https://gen2res.github.io/)
* [Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration](http://arxiv.org/abs/2312.02918)
* [Improving Image Restoration through Removing Degradations in Textual Representations](https://arxiv.org/abs/2312.17334)
:star:[code](https://github.com/mrluin/TextualDegRemoval)
* 图像修复
* [Brush2Prompt: Contextual Prompt Generator for Object Inpainting](https://openaccess.thecvf.com/content/CVPR2024/papers/Chiu_Brush2Prompt_Contextual_Prompt_Generator_for_Object_Inpainting_CVPR_2024_paper.pdf)
* [Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting](http://arxiv.org/abs/2403.18186v1)
* [NeRFiller: Completing Scenes via Generative 3D Inpainting](http://arxiv.org/abs/2312.04560)
* [MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior](https://arxiv.org/abs/2405.02859)
* [Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting](http://arxiv.org/abs/2403.19898v1)
:star:[code](https://github.com/htyjers/StrDiffusion)
* 图像超级补全
* [Shadow-Enlightened Image Outpainting](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Shadow-Enlightened_Image_Outpainting_CVPR_2024_paper.pdf)
* 图像质量
* [Blind Image Quality Assessment Based on Geometric Order Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Shin_Blind_Image_Quality_Assessment_Based_on_Geometric_Order_Learning_CVPR_2024_paper.pdf)
* [Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization](http://arxiv.org/abs/2403.11397v1)
* [Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment](https://arxiv.org/abs/2405.04167)
* [TextCraftor: Your Text Encoder Can be Image Quality Controller](https://arxiv.org/abs/2403.18978)
* [Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_Boosting_Image_Quality_Assessment_through_Efficient_Transformer_Adaptation_with_Local_CVPR_2024_paper.pdf)
* 恶劣天气消除
* [Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal](https://arxiv.org/abs/2403.07684)
:star:[code](https://github.com/scott-yjyang/DiffTTA)
* [Language-driven All-in-one Adverse Weather Removal](https://arxiv.org/abs/2312.01381)
* 大气湍流去除
* [NB-GTR: Narrow-Band Guided Turbulence Removal](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_NB-GTR_Narrow-Band_Guided_Turbulence_Removal_CVPR_2024_paper.pdf)
* Image Portrait Relighting(图像重照光)
* [SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting](http://arxiv.org/abs/2402.18848v1)
:house:[project](https://www.beeble.ai/)
* 图片缩小
* [Deep Generative Model based Rate-Distortion for Image Downscaling Assessment](http://arxiv.org/abs/2403.15139v1)
* 图像校正
* [Rolling Shutter Correction with Intermediate Distortion Flow Estimation](https://arxiv.org/abs/2404.06350)
* 图像着色
* [Learning Inclusion Matching for Animation Paint Bucket Colorization](http://arxiv.org/abs/2403.18342v1)
:star:[code](https://ykdai.github.io/projects/InclusionMatching)
* [Automatic Controllable Colorization via Imagination](http://arxiv.org/abs/2404.05661v1)
:star:[code](https://xy-cong.github.io/imagine-colorization)
* 运动(去)模糊
* [Motion Blur Decomposition with Cross-shutter Guidance](http://arxiv.org/abs/2404.01120v1)
* [Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Spike-guided_Motion_Deblurring_with_Unknown_Modal_Spatiotemporal_Alignment_CVPR_2024_paper.pdf)
:star:[code](https://github.com/Leozhangjiyuan/UaSDN)
* [Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization](http://arxiv.org/abs/2404.12168v1)
* [Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring](https://arxiv.org/abs/2401.00027)
* [Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring](http://arxiv.org/abs/2404.13153)
:star:[code](https://github.com/ChengxuLiu/MISCFilter)
* 视频修复
* [AVID: Any-Length Video Inpainting with Diffusion Model](https://arxiv.org/abs/2312.03816)
:star:[code](https://github.com/zhang-zx/AVID)
:house:[project](https://zhang-zx.github.io/AVID/)
* [Towards Language-Driven Video Inpainting via Multimodal Large Language Models](https://arxiv.org/abs/2401.10226)
:house:[project](https://jianzongwu.github.io/projects/rovi)
* 视频去雾
* [Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance](https://arxiv.org/abs/2405.09996)
* 视频去渲染
* [Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Leveraging_Frame_Affinity_for_sRGB-to-RAW_Video_De-rendering_CVPR_2024_paper.pdf)
* 视频去模糊
* [Frequency-aware Event-based Video Deblurring for Real-World Motion Blur](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_Frequency-aware_Event-based_Video_Deblurring_for_Real-World_Motion_Blur_CVPR_2024_paper.pdf)
* [Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring](https://arxiv.org/abs/2406.07551)
:star:[code](https://github.com/huicongzhang/BSSTNet)
:house:[project](https://vilab.hit.edu.cn/projects/bsstnet/)
* [FMA-Net: Flow Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring](https://arxiv.org/abs/2401.03707)
:star:[code](https://github.com/KAIST-VICLab/FMA-Net)
:house:[project](https://kaist-viclab.github.io/fmanet-site/)
* [DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video](https://arxiv.org/abs/2403.10103)
:house:[project](https://huiqiang-sun.github.io/dyblurf/)
* 视频增强
* [Binarized Low-light Raw Video Enhancement](http://arxiv.org/abs/2403.19944v1)
* 视频质量评估
* [PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild](https://arxiv.org/abs/2405.17765)
* [Learned Scanpaths Aid Blind Panoramic Video Quality Assessment](https://arxiv.org/abs/2404.00252)
* [Modular Blind Video Quality Assessment](http://arxiv.org/abs/2402.19276)
* [KVQ: Kwai Video Quality Assessment for Short-form Videos](https://arxiv.org/abs/2402.07220)
* [CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement](https://arxiv.org/abs/2403.10362)
:star:[code](https://github.com/CPGA/CPGA.git)
* 夜间颜色恒定
* [NightCC: Nighttime Color Constancy via Adaptive Channel Masking](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_NightCC_Nighttime_Color_Constancy_via_Adaptive_Channel_Masking_CVPR_2024_paper.pdf)
* 照明估计
* [Towards a Perceptual Evaluation Framework for Lighting Estimation](https://arxiv.org/abs/2312.04334)
## 6.Image/Video Captioning(图像/视频字幕)
* [Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation](https://arxiv.org/abs/2404.19752)
* [Polos: Multimodal Metric Learning from Human Feedback for Image Captioning](http://arxiv.org/abs/2402.18091v1)
:star:[code](https://github.com/keio-smilab24/polos)
:house:[project](https://yuiga.dev/polos/)
* [Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers](http://arxiv.org/abs/2402.19479v1)
:star:[code](https://snap-research.github.io/Panda-70M)
* [MeaCap: Memory-Augmented Zero-shot Image Captioning](http://arxiv.org/abs/2403.03715v1)
:star:[code](https://github.com/joeyz0z/MeaCap)
* [Sieve: Multimodal Dataset Pruning using Image Captioning Models](http://arxiv.org/abs/2310.02110)
* [EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension](https://arxiv.org/abs/2311.15879)
* 视频描述/字幕
* [Streaming Dense Video Captioning](http://arxiv.org/abs/2404.01297v1)
:star:[code](https://github.com/google-research/scenic/tree/main/scenic/projects/streaming_dvc)
* [Video ReCap: Recursive Captioning of Hour-Long Videos](https://arxiv.org/abs/2402.13250)
:star:[code](https://github.com/md-mohaiminul/VideoRecap)
:house:[project](https://sites.google.com/view/vidrecap)
:sunflower:[dataset](https://github.com/md-mohaiminul/VideoRecap/blob/master/datasets.md)
* [Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval](http://arxiv.org/abs/2404.07610v1)
* [VideoCon: Robust Video-Language Alignment via Contrast Captions](https://arxiv.org/abs/2311.10111)
:star:[code](https://github.com/Hritikbansal/videocon)
:house:[project](https://video-con.github.io/)
* [Retrieval-Augmented Egocentric Video Captioning](https://arxiv.org/abs/2401.00789)
* 密集字幕
* [A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions](https://arxiv.org/abs/2312.08578)
* [DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement](http://arxiv.org/abs/2404.02755)
* 生成图解说明
* [Generating Illustrated Instructions](https://arxiv.org/abs/2312.04552)
:star:[code](https://github.com/sachit-menon/generating-illustrated-instructions-reproduction)
:house:[project](http://facebookresearch.github.io/IllustratedInstructions)
## 5.Image/Video Compression(图像/视频压缩)
* 视频压缩
* [Neural Video Compression with Feature Modulation](http://arxiv.org/abs/2402.17414v1)
:star:[code](https://github.com/microsoft/DCVC)
* [C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video](http://arxiv.org/abs/2312.02753)
:star:[code](https://github.com/google-deepmind/c3_neural_compression)
:house:[project](https://c3-neural-compression.github.io/)
* [Task-Aware Encoder Control for Deep Video Compression](http://arxiv.org/abs/2404.04848v1)
* 图像压缩
* [Towards Backward-Compatible Continual Learning of Image Compression](https://arxiv.org/abs/2402.18862v1)
:star:[code](https://gitlab.com/viper-purdue/continual-compression)
* [Generative Latent Coding for Ultra-Low Bitrate Image Compression](https://openaccess.thecvf.com/content/CVPR2024/papers/Jia_Generative_Latent_Coding_for_Ultra-Low_Bitrate_Image_Compression_CVPR_2024_paper.pdf)
* [Dual Prior Unfolding for Snapshot Compressive Imaging](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Dual_Prior_Unfolding_for_Snapshot_Compressive_Imaging_CVPR_2024_paper.pdf)
* [Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain](https://arxiv.org/abs/2402.17200)
* [SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image](https://arxiv.org/abs/2403.20018)
:star:[code](https://github.com/WU-CVGL/SCINeRF)
* [JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients](http://arxiv.org/abs/2404.05558)
* [Learned Lossless Image Compression based on Bit Plane Slicing](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Learned_Lossless_Image_Compression_based_on_Bit_Plane_Slicing_CVPR_2024_paper.pdf)
## 4.Image/Video Super-Resolution(图像超分辨率)
* [Image Processing GNN: Breaking Rigidity in Super-Resolution](https://openaccess.thecvf.com/content/CVPR2024/papers/Tian_Image_Processing_GNN_Breaking_Rigidity_in_Super-Resolution_CVPR_2024_paper.pdf)
* [Learning Large-Factor EM Image Super-Resolution with Generative Priors](https://openaccess.thecvf.com/content/CVPR2024/papers/Shou_Learning_Large-Factor_EM_Image_Super-Resolution_with_Generative_Priors_CVPR_2024_paper.pdf)
* [Super-Resolution Reconstruction from Bayer-Pattern Spike Streams](https://openaccess.thecvf.com/content/CVPR2024/papers/Dong_Super-Resolution_Reconstruction_from_Bayer-Pattern_Spike_Streams_CVPR_2024_paper.pdf)
* [Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World](https://openaccess.thecvf.com/content/CVPR2024/papers/Fu_Continuous_Optical_Zooming_A_Benchmark_for_Arbitrary-Scale_Image_Super-Resolution_in_CVPR_2024_paper.pdf)
* [Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary](https://arxiv.org/abs/2401.08209)
:star:[code](https://github.com/LabShuHangGU/Adaptive-Token-Dictionary)
* [Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Learning_Coupled_Dictionaries_from_Unpaired_Data_for_Image_Super-Resolution_CVPR_2024_paper.pdf)
* [SinSR: Diffusion-Based Image Super-Resolution in a Single Step](https://arxiv.org/abs/2311.14760)
:star:[code](https://github.com/wyf0912/SinSR)
* [CAMixerSR: Only Details Need More "Attention"](http://arxiv.org/abs/2402.19289v1)
* [Text-guided Explorable Image Super-resolution](http://arxiv.org/abs/2403.01124v1)
* [CFAT: Unleashing Triangular Windows for Image Super-resolution](https://arxiv.org/abs/2403.16143)
* [SeD: Semantic-Aware Discriminator for Image Super-Resolution](http://arxiv.org/abs/2402.19387v1)
* [Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts](http://arxiv.org/abs/2402.19215v1)
* [Boosting Flow-based Generative Super-Resolution Models via Learned Prior](http://arxiv.org/abs/2403.10988v1)
:star:[code](https://github.com/liyuantsao/FlowSR-LP)
* [Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss](http://arxiv.org/abs/2404.01692v1)
:star:[code](https://github.com/JaehaKim97/SR4IR)
* [AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution](https://arxiv.org/abs/2404.03296)
:star:[code](https://github.com/Cheeun/AdaBM)
* [Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer](https://arxiv.org/abs/2303.17783)
* [DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF](https://arxiv.org/abs/2404.00874)
* [Neural Super-Resolution for Real-time Rendering with Radiance Demodulation](https://arxiv.org/abs/2308.06699)
* [Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution](https://arxiv.org/abs/2403.16643)
:star:[code](https://github.com/ProAirVerse/Self-Adaptive-Guidance-Diffusion.git)
* [Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning](https://arxiv.org/abs/2403.02601v1)
* [CoSeR: Bridging Image and Language for Cognitive Super-Resolution](https://arxiv.org/abs/2311.16512)
:star:[code](https://github.com/VINHYU/CoSeR)
:house:[project](https://coser-main.github.io/)
* [Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution](http://arxiv.org/abs/2402.18929)
* [Bilateral Event Mining and Complementary for Event Stream Super-Resolution](https://arxiv.org/abs/2405.10037)
* 盲图像超分辨率
* [CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution](http://arxiv.org/abs/2405.07648)
* [A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution](http://arxiv.org/abs/2404.15620)
* 真实世界超分辨率
* [Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution](http://arxiv.org/abs/2405.14934)
* [APISR: Anime Production Inspired Real-World Anime Super-Resolution](http://arxiv.org/abs/2403.01598)
* [SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution](https://arxiv.org/abs/2311.16518)
* VSR
* [Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution](http://arxiv.org/abs/2403.17000v1)
* [Enhancing Video Super-Resolution via Implicit Resampling-based Alignment](https://arxiv.org/abs/2305.00163)
* [Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution](https://arxiv.org/abs/2312.06640)
:house:[project](https://shangchenzhou.com/projects/upscale-a-video/)
* [Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention](https://arxiv.org/abs/2401.06312)
:star:[code](https://github.com/LabShuHangGU/MIA-VSR)
* 文本图像超分
* [Diffusion-based Blind Text Image Super-Resolution](https://arxiv.org/abs/2312.08886)
## 3.Image Classification(图像分类)
* [Fair-VPT: Fair Visual Prompt Tuning for Image Classification](https://openaccess.thecvf.com/content/CVPR2024/papers/Park_Fair-VPT_Fair_Visual_Prompt_Tuning_for_Image_Classification_CVPR_2024_paper.pdf)
* [Logarithmic Lenses: Exploring Log RGB Data for Image Classification](https://openaccess.thecvf.com/content/CVPR2024/papers/Maxwell_Logarithmic_Lenses_Exploring_Log_RGB_Data_for_Image_Classification_CVPR_2024_paper.pdf)
* [SLICE: Stabilized LIME for Consistent Explanations for Image Classification](https://openaccess.thecvf.com/content/CVPR2024/papers/Bora_SLICE_Stabilized_LIME_for_Consistent_Explanations_for_Image_Classification_CVPR_2024_paper.pdf)
* [Classes Are Not Equal: An Empirical Study on Image Recognition Fairness](http://arxiv.org/abs/2402.18133v1)
* [MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes](http://arxiv.org/abs/2404.08968v1)
* [SURE: SUrvey REcipes for building reliable and robust deep networks](http://arxiv.org/abs/2403.00543v1)
:star:[code](https://yutingli0606.github.io/SURE/)
* [A Bayesian Approach to OOD Robustness in Image Classification](http://arxiv.org/abs/2403.07277v1)
* [Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification](http://arxiv.org/abs/2403.01944)
* [Hyperspherical Classification with Dynamic Label-to-Prototype Assignment](http://arxiv.org/abs/2403.16937v1)
:star:[code](https://github.com/msed-Ebrahimi/DL2PA_CVPR24)
* [Discover and Mitigate Multiple Biased Subgroups in Image Classifiers](https://arxiv.org/abs/2403.12777)
:star:[code](https://github.com/ZhangAIPI/DIM)
* [Deep Imbalanced Regression via Hierarchical Classification Adjustment](https://arxiv.org/pdf/2310.17154.pdf)
* [Large Language Models are Good Prompt Learners for Low-Shot Image Classification](https://arxiv.org/abs/2312.04076)
:star:[code](https://github.com/zhaohengz/LLaMP)
* [Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use](https://arxiv.org/abs/2403.02626)
* [Bayesian Exploration of Pre-trained Models for Low-shot Image Classification](https://arxiv.org/abs/2404.00312)
* [Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model](http://arxiv.org/abs/2403.19600)
* [Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification](https://arxiv.org/abs/2404.17753)
:star:[code](https://github.com/YCaigogogo/CVPR24-CODER)
* [In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification](https://openaccess.thecvf.com/content/CVPR2024/papers/Park_In-distribution_Public_Data_Synthesis_with_Diffusion_Models_for_Differentially_Private_CVPR_2024_paper.pdf)
* 域泛化图像分类
* [Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification](https://arxiv.org/abs/2310.08255)
:house:[project](http://val.cds.iisc.ac.in/VL2V-ADiP/)
* 长尾识别
* [LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content](http://arxiv.org/abs/2403.05854v1)
* 小样本图像分类
* [Frozen Feature Augmentation for Few-Shot Image Classification](http://arxiv.org/abs/2403.10519v1)
* 零样本分类
* [Label Propagation for Zero-shot Classification with Vision-Language Models](http://arxiv.org/abs/2404.04072v1)
:star:[code](https://github.com/vladan-stojnic/ZLaP)
* [CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification](https://arxiv.org/abs/2402.17417)
:star:[code](https://github.com/laihaoran/CARZero)
* [Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions](https://arxiv.org/abs/2401.02460)
:star:[code](https://github.com/cvl-umass/AdaptCLIPZS)
* 细粒度
* [Fine-grained Bipartite Concept Factorization for Clustering](https://openaccess.thecvf.com/content/CVPR2024/papers/Peng_Fine-Grained_Bipartite_Concept_Factorization_for_Clustering_CVPR_2024_paper.pdf)
* [Novel Class Discovery for Ultra-Fine-Grained Visual Categorization](http://arxiv.org/abs/2405.06283)
:star:[code](https://github.com/SSDUT-Caiyq/UFG-NCD)
* 开集分类
* [ProTeCt: Prompt Tuning for Taxonomic Open Set Classification](https://arxiv.org/abs/2306.02240)
* 小样本识别
* [Instance-based Max-margin for Practical Few-shot Recognition](https://arxiv.org/abs/2305.17368)
* GCD(广义类别发现)
* [Federated Generalized Category Discovery](https://arxiv.org/abs/2305.14107)
* [Active Generalized Category Discovery](http://arxiv.org/abs/2403.04272v1)
:star:[code](https://github.com/mashijie1028/ActiveGCD)
* [Contrastive Mean-Shift Learning for Generalized Category Discovery](http://arxiv.org/abs/2404.09451v1)
* [Solving the Catastrophic Forgetting Problem in Generalized Category Discovery](https://openaccess.thecvf.com/content/CVPR2024/papers/Cao_Solving_the_Catastrophic_Forgetting_Problem_in_Generalized_Category_Discovery_CVPR_2024_paper.pdf)
## 2.Image Segmentation(图像分割)
* [Matching Anything by Segmenting Anything](https://arxiv.org/abs/2406.04221)
:star:[code](https://github.com/siyuanliii/masa)
* [Unsupervised Universal Image Segmentation](http://arxiv.org/abs/2312.17243)
* [MESA: Matching Everything by Segmenting Anything](https://arxiv.org/abs/2401.16741)
* [MRFS: Mutually Reinforcing Image Fusion and Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_MRFS_Mutually_Reinforcing_Image_Fusion_and_Segmentation_CVPR_2024_paper.pdf)
* [RobustSAM: Segment Anything Robustly on Degraded Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_RobustSAM_Segment_Anything_Robustly_on_Degraded_Images_CVPR_2024_paper.pdf)
* [Hierarchical Histogram Threshold Segmentation - Auto-terminating High-detail Oversegmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chang_Hierarchical_Histogram_Threshold_Segmentation_-_Auto-terminating_High-detail_Oversegmentation_CVPR_2024_paper.pdf)
* [Multi-Space Alignments Towards Universal LiDAR Segmentation](http://arxiv.org/abs/2405.01538)
* [CoralSCOP: Segment any COral Image on this Planet](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_CoralSCOP_Segment_any_COral_Image_on_this_Planet_CVPR_2024_paper.pdf)
* [SANeRF-HQ: Segment Anything for NeRF in High Quality](https://arxiv.org/abs/2312.01531)
:house:[project](https://lyclyc52.github.io/SANeRF-HQ/)
* [ASAM: Boosting Segment Anything Model with Adversarial Tuning](http://arxiv.org/abs/2405.00256)
* [ODIN: A Single Model for 2D and 3D Segmentation](https://arxiv.org/abs/2401.02416)
:star:[code](https://github.com/ayushjain1144/odin)
* [FocSAM: Delving Deeply into Focused Objects in Segmenting Anything](https://arxiv.org/abs/2405.18706)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything](https://arxiv.org/abs/2312.00863)
* [Universal Segmentation at Arbitrary Granularity with Language Instruction](https://arxiv.org/abs/2312.01623)
* [Segment and Caption Anything](https://arxiv.org/abs/2312.00869)
:house:[project](https://xk-huang.github.io/segment-caption-anything/)
* [COCONut: Modernizing COCO Segmentation](http://arxiv.org/abs/2404.08639v1)
:star:[code](https://xdeng7.github.io/coconut.github.io/)
* [Multi-view Aggregation Network for Dichotomous Image Segmentation](http://arxiv.org/abs/2404.07445v1)
:star:[code](https://github.com/qianyu-dlut/MVANet)
* [OMG-Seg: Is One Model Good Enough For All Segmentation?](https://arxiv.org/abs/2401.10229)
:house:[project](https://lxtgh.github.io/project/omg_seg/)
* [Unsegment Anything by Simulating Deformation](https://arxiv.org/abs/2404.02585v1)
* [BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model](https://arxiv.org/abs/2401.02317)
:star:[code](https://github.com/zongzi13545329/BA-SAM)
* [VRP-SAM: SAM with Visual Reference Prompt](http://arxiv.org/abs/2402.17726v1)
* [PEM: Prototype-based Efficient MaskFormer for Image Segmentation](http://arxiv.org/abs/2402.19422v1)
* [Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM](http://arxiv.org/abs/2404.04996v1)
:star:[code](https://github.com/Drchip61/Dual_SAM)
* [CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor](https://arxiv.org/abs/2312.07661)
:house:[project](https://torrvision.com/clip_as_rnn/)
* [Benchmarking Segmentation Models with Mask-Preserved Attribute Editing](http://arxiv.org/abs/2403.01231v1)
:star:[code](https://github.com/PRIS-CV/Pascal-EA)
* [CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers](http://arxiv.org/abs/2403.07700v1)
* [Continual Segmentation with Disentangled Objectness Learning and Class Recognition](http://arxiv.org/abs/2403.03477v1)
:star:[code](https://github.com/jordangong/CoMasTRe)
* [Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms](https://arxiv.org/abs/2311.11837)
:star:[code](https://github.com/NKI-AI/kandinsky-calibration)
* [Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation](http://arxiv.org/abs/2404.06542v1)
:house:[project](https://aimagelab.github.io/freeda/)
* [Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model](https://arxiv.org/abs/2311.17112)
* [A Simple Recipe for Language-guided Domain Generalized Segmentation](https://arxiv.org/pdf/2311.17922.pdf)
:house:[project](https://astra-vision.github.io/FAMix/)
* [Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts](http://arxiv.org/abs/2404.00741)
:star:[code](https://github.com/uncbiag/SegNext)
* [Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation](https://arxiv.org/pdf/2312.03502.pdf)
:star:[code](https://github.com/Zhang-Haojie/WeSAM)
:thumbsup:[分割一切模型SAM泛化能力差?域适应策略给解决了](https://mp.weixin.qq.com/s/LC1uwKgrzxU9vQkoMqo5nA)
* 开放词汇分割
* [Transferable and Principled Efficiency for Open-Vocabulary Segmentation](http://arxiv.org/abs/2404.07448)
* [USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_USE_Universal_Segment_Embeddings_for_Open-Vocabulary_Image_Segmentation_CVPR_2024_paper.pdf)
* [Open-Vocabulary Segmentation with Semantic-Assisted Calibration](https://arxiv.org/abs/2312.04089)
:star:[code](https://github.com/yongliu20/SCAN)
* [OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation](http://arxiv.org/abs/2404.01409v1)
* 视频分割
* [UniVS: Unified and Universal Video Segmentation with Prompts as Queries](http://arxiv.org/abs/2402.18115v1)
:star:[code](https://github.com/MinghanLi/UniVS)
* [Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence](https://export.arxiv.org/abs/2404.13605)
:house:[project](https://riponcs.github.io/TurbSegRes/)
* [Learning to Segment Referred Objects from Narrated Egocentric Videos](https://openaccess.thecvf.com/content/CVPR2024/papers/Shen_Learning_to_Segment_Referred_Objects_from_Narrated_Egocentric_Videos_CVPR_2024_paper.pdf)
* [Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation](https://arxiv.org/abs/2404.03645)
:star:[code](https://github.com/heshuting555/DsHmp)
* 语义分割
* [ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention](http://arxiv.org/abs/2311.16682)
* [MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation](https://openaccess.thecvf.com/content/CVPR2024/papers/Udupa_MRFP_Learning_Generalizable_Semantic_Segmentation_from_Sim-2-Real_with_Multi-Resolution_Feature_CVPR_2024_paper.pdf)
* [TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_TASeg_Temporal_Aggregation_Network_for_LiDAR_Semantic_Segmentation_CVPR_2024_paper.pdf)
* [ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Norouzi_ALGM_Adaptive_Local-then-Global_Token_Merging_for_Efficient_Semantic_Segmentation_with_CVPR_2024_paper.pdf)
* [HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation](https://arxiv.org/abs/2403.16788)
* [Contextrast: Contextual Contrastive Learning for Semantic Segmentation](https://arxiv.org/abs/2404.10633)
* [Open-Set Domain Adaptation for Semantic Segmentation](https://arxiv.org/abs/2405.19899)
* [SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation](https://arxiv.org/abs/2404.02638)
:star:[code](https://github.com/yejy53/SG-BEV)
* [Frequency-Adaptive Dilated Convolution for Semantic Segmentation](https://arxiv.org/abs/2403.05369)
:star:[code](https://github.com/Linwei-Chen/FADC)
* [GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation](http://arxiv.org/abs/2403.16370v1)
* [Improving Bird's Eye View Semantic Segmentation by Task Decomposition](http://arxiv.org/abs/2404.01925v1)
:star:[code](https://github.com/happytianhao/TaDe)
* [UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather](http://arxiv.org/abs/2404.05145v1)
* [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://arxiv.org/abs/2403.14291)
* [Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball](https://arxiv.org/abs/2404.03778)
* 3D 语义分割
* [Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kang_Hierarchical_Intra-modal_Correlation_Learning_for_Label-free_3D_Semantic_Segmentation_CVPR_2024_paper.pdf)
* [OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation](http://arxiv.org/abs/2403.14418v1)
* 点云语义分割
* [Rethinking Few-shot 3D Point Cloud Semantic Segmentation](http://arxiv.org/abs/2403.00592v1)
:star:[code](https://github.com/ZhaochongAn/COSeg)
* [PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation](https://arxiv.org/abs/2404.00979)
* 无监督语义分割
* [Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Learn_to_Rectify_the_Bias_of_CLIP_for_Unsupervised_Semantic_CVPR_2024_paper.pdf)
* [EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation](https://arxiv.org/abs/2403.01482)
:star:[code](https://github.com/MICV-yonsei/EAGLE)
:house:[project](https://micv-yonsei.github.io/eagle2024/)
* 小样本语义分割
* [APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/He_APSeg_Auto-Prompt_Network_for_Cross-Domain_Few-Shot_Semantic_Segmentation_CVPR_2024_paper.pdf)
* [Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Unlocking_the_Potential_of_Pre-trained_Vision_Transformers_for_Few-Shot_Semantic_CVPR_2024_paper.pdf)
* 零样本语义分割
* [Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Exploring_Regional_Clues_in_CLIP_for_Zero-Shot_Semantic_Segmentation_CVPR_2024_paper.pdf)
* 半监督语义分割
* [Training Vision Transformers for Semi-Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_Training_Vision_Transformers_for_Semi-Supervised_Semantic_Segmentation_CVPR_2024_paper.pdf)
* [Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Density-Guided_Semi-Supervised_3D_Semantic_Segmentation_with_Dual-Space_Hardness_Sampling_CVPR_2024_paper.pdf)
* [AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation](http://arxiv.org/abs/2403.01818v1)
:star:[code](https://github.com/xmed-lab/AllSpark)
* [CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation](https://arxiv.org/abs/2306.04300)
:star:[code](https://github.com/BBBBchan/CorrMatch)
* [Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation](http://arxiv.org/abs/2403.06462v1)
:star:[code](https://github.com/Gavinwxy/DDFP)
* [RankMatch: Exploring the Better Consistency Regularization for Semi-supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Mai_RankMatch_Exploring_the_Better_Consistency_Regularization_for_Semi-supervised_Semantic_Segmentation_CVPR_2024_paper.pdf)
* 弱监督语义分割
* [Class Tokens Infusion for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Yoon_Class_Tokens_Infusion_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2024_paper.pdf)
* [Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Frozen_CLIP_A_Strong_Backbone_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2024_paper.pdf)
* [DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation](http://arxiv.org/abs/2403.11184v1)
:star:[code](https://github.com/Wu0409/DuPL)
* [Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation](https://arxiv.org/abs/2403.07630)
:star:[code](https://github.com/Barrett-python/CPAL)
* [Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation](https://arxiv.org/abs/2402.18467)
:star:[code](https://github.com/zwyang6/SeCo.git)
* [PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_PSDPM_Prototype-based_Secondary_Discriminative_Pixels_Mining_for_Weakly_Supervised_Semantic_CVPR_2024_paper.pdf)
* [From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kweon_From_SAM_to_CAMs_Exploring_Segment_Anything_Model_for_Weakly_CVPR_2024_paper.pdf)
* 域泛化语义分割
* [Collaborating Foundation Models for Domain Generalized Semantic Segmentation](http://arxiv.org/abs/2312.09788)
:star:[code](https://github.com/yasserben/CLOUDS)
* [Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning](http://arxiv.org/abs/2403.06122v1)
:star:[code](https://github.com/root0yang/BlindNet)
* [Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation](https://arxiv.org/abs/2312.04265v4)
:star:[code](https://github.com/w1oves/Rein.git)
* 文本监督语义分割
* [Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation](http://arxiv.org/abs/2404.04231v1)
* 开放世界语义分割
* [Open-World Semantic Segmentation Including Class Similarity](http://arxiv.org/abs/2403.07532v1)
:star:[code](https://github.com/PRBonn/ContMAV)
* 开放词汇语义分割
* [Open-Vocabulary 3D Semantic Segmentation with Foundation Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Jiang_Open-Vocabulary_3D_Semantic_Segmentation_with_Foundation_Models_CVPR_2024_paper.pdf)
* [Open-Vocabulary Semantic Segmentation with Image Embedding Balancing](https://openaccess.thecvf.com/content/CVPR2024/papers/Shan_Open-Vocabulary_Semantic_Segmentation_with_Image_Embedding_Balancing_CVPR_2024_paper.pdf)
* [CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_CAT-Seg_Cost_Aggregation_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2024_paper.pdf)
* [Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation](http://arxiv.org/abs/2404.00262v1)
* [SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation](https://arxiv.org/abs/2311.15537)
:star:[code](https://github.com/xb534/SED.git)
* [Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models](https://arxiv.org/abs/2311.17095)
:star:[code](https://github.com/letitiabanana/PnP-OVSS)
* 全景分割
* [Semantics Distortion and Style Matter: Towards Source-free UDA for Panoramic Segmentation](http://arxiv.org/abs/2403.12505v1)
* [ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning](http://arxiv.org/abs/2403.20126v1)
:star:[code](https://github.com/clovaai/ECLIPSE)
* [PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation](https://arxiv.org/abs/2306.10013)
:star:[code](https://github.com/Robertwyq/PanoOcc)
* [Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations](https://openaccess.thecvf.com/content/CVPR2024/papers/de_Geus_Task-aligned_Part-aware_Panoptic_Segmentation_through_Joint_Object-Part_Representations_CVPR_2024_paper.pdf)
:star:[code](https://github.com/tue-mps/tapps)
* 实例分割
* [Extreme Point Supervised Instance Segmentation](https://arxiv.org/abs/2405.20729)
* [Mudslide: A Universal Nuclear Instance Segmentation Method](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Mudslide_A_Universal_Nuclear_Instance_Segmentation_Method_CVPR_2024_paper.pdf)
* [Semantic-aware SAM for Point-Prompted Instance Segmentation](https://arxiv.org/abs/2312.15895)
* [SAI3D: Segment Any Instance in 3D Scenes](https://arxiv.org/abs/2312.11557)
:house:[project](https://yd-yin.github.io/SAI3D)
* [DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data](https://arxiv.org/abs/2405.10185)
* [FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures](http://arxiv.org/abs/2404.00130)
:star:[code](https://kainmueller-lab.github.io/fisbe)
* [Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Multi-Scale Aggregation and Anthropic Prior Knowledge](https://openaccess.thecvf.com/content/CVPR2024/papers/Zou_Teeth-SEG_An_Efficient_Instance_Segmentation_Framework_for_Orthodontic_Treatment_based_CVPR_2024_paper.pdf)
* 开放词汇实例分割
* [MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation](https://arxiv.org/abs/2401.07745)
:house:[project](https://pku-epic.github.io/MaskClustering)
* 3D 实例分割
* [BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation](https://arxiv.org/abs/2403.15019)
:star:[code](https://github.com/peoplelu/BSNet)
* [Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance](http://arxiv.org/abs/2312.10671)
* [Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior](https://openaccess.thecvf.com/content/CVPR2024/papers/Roh_Edge-Aware_3D_Instance_Segmentation_Network_with_Intelligent_Semantic_Prior_CVPR_2024_paper.pdf)
* [UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes](http://arxiv.org/abs/2303.14541)
* 场景分割
* [No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation](https://arxiv.org/abs/2404.04050)
:star:[code](https://github.com/yangyangyang127/Seg-NN)
* [MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_MirageRoom_3D_Scene_Segmentation_with_2D_Pre-trained_Models_by_Mirage_CVPR_2024_paper.pdf)
* 动作分割
* [Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos](https://openaccess.thecvf.com/content/CVPR2024/papers/Shen_Progress-Aware_Online_Action_Segmentation_for_Egocentric_Procedural_Task_Videos_CVPR_2024_paper.pdf)
* [Coherent Temporal Synthesis for Incremental Action Segmentation](http://arxiv.org/abs/2403.06102v1)
* [Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment](http://arxiv.org/abs/2403.19225v1)
* [Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation](http://arxiv.org/abs/2404.01518v1)
* [FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Lu_FACT_Frame-Action_Cross-Attention_Temporal_Modeling_for_Efficient_Action_Segmentation_CVPR_2024_paper.pdf)
:star:[code](https://github.com/ZijiaLewisLu/CVPR2024-FACT)
* 参考图像分割
* [LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Shah_LQMFormer_Language-aware_Query_Mask_Transformer_for_Referring_Image_Segmentation_CVPR_2024_paper.pdf)
* [Mask Grounding for Referring Image Segmentation](https://arxiv.org/abs/2312.12198)
:house:[project](https://yxchng.github.io/projects/mask-grounding)
* [Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation](http://arxiv.org/abs/2404.11998v1)
* [Prompt-Driven Referring Image Segmentation with Instance Contrasting](https://openaccess.thecvf.com/content/CVPR2024/papers/Shang_Prompt-Driven_Referring_Image_Segmentation_with_Instance_Contrasting_CVPR_2024_paper.pdf)
* 指代表达式分割
* [Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation](http://arxiv.org/abs/2312.08007)
:star:[code](https://github.com/Rubics-Xuan/MRES)
* VOS
* [Point-VOS: Pointing Up Video Object Segmentation](https://arxiv.org/abs/2402.05917)
:house:[project](https://pointvos.github.io/)
* [Dual Prototype Attention for Unsupervised Video Object Segmentation](https://arxiv.org/abs/2211.12036)
:star:[code](https://github.com/Hydragon516/DPA)
* [Depth-aware Test-Time Training for Zero-shot Video Object Segmentation](http://arxiv.org/abs/2403.04258v1)
:star:[code](https://nifangbaage.github.io/DATTT)
* [Putting the Object Back into Video Object Segmentation](https://arxiv.org/abs/2310.12982v1)
:house:[project](https://hkchengrex.github.io/Cutie)
* [Event-assisted Low-Light Video Object Segmentation](http://arxiv.org/abs/2404.01945v1)
* [Guided Slot Attention for Unsupervised Video Object Segmentation](https://arxiv.org/abs/2303.08314)
* [LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation](https://arxiv.org/abs/2306.08736)
:star:[code](https://github.com/LinfengYuan1997/Losh)
* [RMem: Restricted Memory Banks Improve Video Object Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_RMem_Restricted_Memory_Banks_Improve_Video_Object_Segmentation_CVPR_2024_paper.pdf)
* VSS
* [Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhuang_Infer_from_What_You_Have_Seen_Before_Temporally-dependent_Classifier_for_CVPR_2024_paper.pdf)
* [Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes](https://arxiv.org/abs/2401.15261)
* VIS
* [VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation](http://arxiv.org/abs/2308.14710)
* 抠图
* [In-Context Matting](http://arxiv.org/abs/2403.15789v1)
:star:[code](https://github.com/tiny-smart/in-context-matting)
* [Unifying Automatic and Interactive Matting with Pretrained ViTs](https://openaccess.thecvf.com/content/CVPR2024/papers/Ye_Unifying_Automatic_and_Interactive_Matting_with_Pretrained_ViTs_CVPR_2024_paper.pdf)
* [MaGGIe: Masked Guided Gradual Human Instance Matting](http://arxiv.org/abs/2404.16035)
* [EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting](http://arxiv.org/abs/2308.12831)
* 少样本分割
* [Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation](https://arxiv.org/abs/2405.08458)
* [Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation](http://arxiv.org/abs/2404.10322v1)
:star:[code](https://github.com/Matt-Su/DR-Adapter)
* [Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach](http://arxiv.org/abs/2404.11732v1)
* [Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation](https://arxiv.org/abs/2402.17614)
* [LLaFS: When Large Language Models Meet Few-Shot Segmentation](https://arxiv.org/abs/2311.16926)
* [Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining](https://arxiv.org/abs/2401.08407)
:star:[code](https://github.com/niejiahao1998/IFA)
* [Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_Addressing_Background_Context_Bias_in_Few-Shot_Segmentation_through_Iterative_Modulation_CVPR_2024_paper.pdf)
* 零样本分割
* [Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion](https://arxiv.org/abs/2308.12469)
:star:[code](https://github.com/google/diffseg)
* 裂纹分割
* [Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Mind_Marginal_Non-Crack_Regions_Clustering-Inspired_Representation_Learning_for_Crack_Segmentation_CVPR_2024_paper.pdf)
* 交互式分割
* [GraCo: Granularity-Controllable Interactive Segmentation](http://arxiv.org/abs/2405.00587)
:tv:[video](https://www.youtube.com/watch?v=QE8Mi0k2nKg)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [MFP: Making Full Use of Probability Maps for Interactive Image Segmentation](https://arxiv.org/abs/2404.18448)
:star:[code](https://github.com/cwlee00/MFP)
* 无模态分割
* [pix2gestalt: Amodal Segmentation by Synthesizing Wholes](https://arxiv.org/abs/2401.14398)
:star:[code](https://github.com/cvlab-columbia/pix2gestalt)
:house:[project](https://gestalt.cs.columbia.edu/)
* 3D 分割
* [OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning](https://arxiv.org/abs/2311.11666)
* [PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation](http://arxiv.org/abs/2312.04016)
* [Cross-Dimension Affinity Distillation for 3D EM Neuron Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Cross-Dimension_Affinity_Distillation_for_3D_EM_Neuron_Segmentation_CVPR_2024_paper.pdf)
* [LASO: Language-guided Affordance Segmentation on 3D Object](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_LASO_Language-guided_Affordance_Segmentation_on_3D_Object_CVPR_2024_paper.pdf)

## 1.其它(other)
* [Implicit Motion Function](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_Implicit_Motion_Function_CVPR_2024_paper.pdf)
* [Rewrite the Stars](http://arxiv.org/abs/2403.19967)
* [Adapters Strike Back](https://arxiv.org/abs/2406.06820)
:star:[code](https://github.com/visinf/adapter_plus)
* [Detector-Free Structure from Motion](http://arxiv.org/abs/2306.15669)
* [Utility-Fairness Trade-Offs and How to Find Them](http://arxiv.org/abs/2404.09454)
* [Learning Degradation-Independent Representations for Camera ISP Pipelines](http://arxiv.org/abs/2307.00761)
* [Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion](http://arxiv.org/abs/2312.00852)
* [Event-based Visible and Infrared Fusion via Multi-task Collaboration](https://openaccess.thecvf.com/content/CVPR2024/papers/Geng_Event-based_Visible_and_Infrared_Fusion_via_Multi-task_Collaboration_CVPR_2024_paper.pdf)
* [DemoCaricature: Democratising Caricature Generation with a Rough Sketch](http://arxiv.org/abs/2312.04364)
* [PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_PolarRec_Improving_Radio_Interferometric_Data_Reconstruction_Using_Polar_Coordinates_CVPR_2024_paper.pdf)
* [Generative Multimodal Models are In-Context Learners](http://arxiv.org/abs/2312.13286)
* [CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update](http://arxiv.org/abs/2312.10908)
* [Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Fully_Exploiting_Every_Real_Sample_SuperPixel_Sample_Gradient_Model_Stealing_CVPR_2024_paper.pdf)
* [On the Content Bias in Frechet Video Distance](http://arxiv.org/abs/2404.12391)
* [Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Saini_Beyond_Seen_Primitive_Concepts_and_Attribute-Object_Compositional_Learning_CVPR_2024_paper.pdf)
* [Rotation-Agnostic Image Representation Learning for Digital Pathology](http://arxiv.org/abs/2311.08359)
* [Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences](http://arxiv.org/abs/2312.09337)
* [Neural Lineage](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Neural_Lineage_CVPR_2024_paper.pdf)
* [Scaled Decoupled Distillation](http://arxiv.org/abs/2403.13512)
* [Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?](http://arxiv.org/abs/2312.04548)
* [VAREN: Very Accurate and Realistic Equine Network](https://openaccess.thecvf.com/content/CVPR2024/papers/Zuffi_VAREN_Very_Accurate_and_Realistic_Equine_Network_CVPR_2024_paper.pdf)
* [Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging](https://openaccess.thecvf.com/content/CVPR2024/papers/Ghanekar_Passive_Snapshot_Coded_Aperture_Dual-Pixel_RGB-D_Imaging_CVPR_2024_paper.pdf)
* [Generative Proxemics: A Prior for 3D Social Interaction from Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Muller_Generative_Proxemics_A_Prior_for_3D_Social_Interaction_from_Images_CVPR_2024_paper.pdf)
* [Non-Rigid Structure-from-Motion: Temporally-Smooth Procrustean Alignment and Spatially-Variant Deformation Modeling](https://openaccess.thecvf.com/content/CVPR2024/papers/Shi_Non-Rigid_Structure-from-Motion_Temporally-Smooth_Procrustean_Alignment_and_Spatially-Variant_Deformation_Modeling_CVPR_2024_paper.pdf)
* [Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning](http://arxiv.org/abs/2405.16996)
* [General Point Model Pretraining with Autoencoding and Autoregressive](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_General_Point_Model_Pretraining_with_Autoencoding_and_Autoregressive_CVPR_2024_paper.pdf)
* [Estimating Extreme 3D Image Rotations using Cascaded Attention](https://openaccess.thecvf.com/content/CVPR2024/papers/Dekel_Estimating_Extreme_3D_Image_Rotations_using_Cascaded_Attention_CVPR_2024_paper.pdf)
* [Fitting Flats to Flats](https://openaccess.thecvf.com/content/CVPR2024/papers/Dogadov_Fitting_Flats_to_Flats_CVPR_2024_paper.pdf)
* [Generative Powers of Ten](http://arxiv.org/abs/2312.02149)
* [Identifying Important Group of Pixels using Interactions](http://arxiv.org/abs/2401.03785)
* [ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Han_ParameterNet_Parameters_Are_All_You_Need_for_Large-scale_Visual_Pretraining_CVPR_2024_paper.pdf)
* [HIT: Estimating Internal Human Implicit Tissues from the Body Surface](https://openaccess.thecvf.com/content/CVPR2024/papers/Keller_HIT_Estimating_Internal_Human_Implicit_Tissues_from_the_Body_Surface_CVPR_2024_paper.pdf)
* [Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions](http://arxiv.org/abs/2402.17065)
* [CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation](http://arxiv.org/abs/2404.02388)
* [Fooling Polarization-Based Vision using Locally Controllable Polarizing Projection](http://arxiv.org/abs/2303.17890)
* [Total Selfie: Generating Full-Body Selfies](http://arxiv.org/abs/2308.14740)
* [Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos](http://arxiv.org/abs/2403.02782)
* [Pixel-Aligned Language Model](http://arxiv.org/abs/2312.09237)
* [ReCoRe: Regularized Contrastive Representation Learning of World Model](http://arxiv.org/abs/2312.09056)
* [Self-Calibrating Vicinal Risk Minimisation for Model Calibration](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Self-Calibrating_Vicinal_Risk_Minimisation_for_Model_Calibration_CVPR_2024_paper.pdf)
* [From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration](https://openaccess.thecvf.com/content/CVPR2024/papers/Qian_From_a_Birds_Eye_View_to_See_Joint_Camera_and_CVPR_2024_paper.pdf)
* [Motion Diversification Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_Motion_Diversification_Networks_CVPR_2024_paper.pdf)
* [Ungeneralizable Examples](https://export.arxiv.org/abs/2404.14016)
* [Generalized Event Cameras](https://openaccess.thecvf.com/content/CVPR2024/papers/Sundar_Generalized_Event_Cameras_CVPR_2024_paper.pdf)
* [Event-based Structure-from-Orbit](https://arxiv.org/abs/2405.06216)
* [Seeing the World through Your Eyes](http://arxiv.org/abs/2306.09348)
* [ProMotion: Prototypes As Motion Learners](https://openaccess.thecvf.com/content/CVPR2024/papers/Lu_ProMotion_Prototypes_As_Motion_Learners_CVPR_2024_paper.pdf)
* [Move Anything with Layered Scene Diffusion](http://arxiv.org/abs/2404.07178v1)
* [GLACE: Global Local Accelerated Coordinate Encoding](https://arxiv.org/abs/2406.04340)
:star:[code](https://github.com/cvg/glace)
:house:[project](https://xjiangan.github.io/glace)
* [Quantifying Task Priority for Multi-Task Optimization](https://arxiv.org/abs/2406.02996)
* [Model Adaptation for Time Constrained Embodied Control](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_Model_Adaptation_for_Time_Constrained_Embodied_Control_CVPR_2024_paper.pdf)
* [Objects as Volumes: A Stochastic Geometry View of Opaque Solids](http://arxiv.org/abs/2312.15406)
* [EvDiG: Event-guided Direct and Global Components Separation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_EvDiG_Event-guided_Direct_and_Global_Components_Separation_CVPR_2024_paper.pdf)
* [Efficient Model Stealing Defense with Noise Transition Matrix](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Efficient_Model_Stealing_Defense_with_Noise_Transition_Matrix_CVPR_2024_paper.pdf)
* [OpenStreetView-5M: The Many Roads to Global Visual Geolocation](https://openaccess.thecvf.com/content/CVPR2024/papers/Astruc_OpenStreetView-5M_The_Many_Roads_to_Global_Visual_Geolocation_CVPR_2024_paper.pdf)
* [WaveMo: Learning Wavefront Modulations to See Through Scattering](http://arxiv.org/abs/2404.07985)
* [All Rivers Run to the Sea: Private Learning with Asymmetric Flows](http://arxiv.org/abs/2312.05264)
* [HDQMF: Holographic Feature Decomposition Using Quantum Algorithms](https://openaccess.thecvf.com/content/CVPR2024/papers/Poduval_HDQMF_Holographic_Feature_Decomposition_Using_Quantum_Algorithms_CVPR_2024_paper.pdf)
* [READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning](https://openaccess.thecvf.com/content/CVPR2024/papers/Oba_READ_Retrieval-Enhanced_Asymmetric_Diffusion_for_Motion_Planning_CVPR_2024_paper.pdf)
* [AssistGUI: Task-Oriented PC Graphical User Interface Automation](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_AssistGUI_Task-Oriented_PC_Graphical_User_Interface_Automation_CVPR_2024_paper.pdf)
* [Towards Robust Learning to Optimize with Theoretical Guarantees](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_Towards_Robust_Learning_to_Optimize_with_Theoretical_Guarantees_CVPR_2024_paper.pdf)
* [Selective Nonlinearities Removal from Digital Signals](https://openaccess.thecvf.com/content/CVPR2024/papers/Maliszewski_Selective_Nonlinearities_Removal_from_Digital_Signals_CVPR_2024_paper.pdf)
* [Ensemble Diversity Facilitates Adversarial Transferability](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_Ensemble_Diversity_Facilitates_Adversarial_Transferability_CVPR_2024_paper.pdf)
* [One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications](http://arxiv.org/abs/2312.16145)
* [DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations](https://openaccess.thecvf.com/content/CVPR2024/papers/Augustin_DiG-IN_Diffusion_Guidance_for_Investigating_Networks_-_Uncovering_Classifier_Differences_CVPR_2024_paper.pdf)
* [Spatial-Aware Regression for Keypoint Localization](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Spatial-Aware_Regression_for_Keypoint_Localization_CVPR_2024_paper.pdf)
* [LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP](https://openaccess.thecvf.com/content/CVPR2024/papers/Huang_LP_A_Surprisingly_Strong_Linear_Probe_for_Few-Shot_CLIP_CVPR_2024_paper.pdf)
* [One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls](http://arxiv.org/abs/2311.15744)
* [Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory](https://openaccess.thecvf.com/content/CVPR2024/papers/Ye_Online_Task-Free_Continual_Generative_and_Discriminative_Learning_via_Dynamic_Cluster_CVPR_2024_paper.pdf)
* [Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households](http://arxiv.org/abs/2404.09001)
* [SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers](http://arxiv.org/abs/2312.00648)
* [The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes](http://arxiv.org/abs/2402.08922)
* [Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Segment_Any_Event_Streams_via_Weighted_Adaptation_of_Pivotal_Tokens_CVPR_2024_paper.pdf)
* [The More You See in 2D the More You Perceive in 3D](http://arxiv.org/abs/2404.03652)
* [Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy](http://arxiv.org/abs/2404.02176)
* [DiffusionLight: Light Probes for Free by Painting a Chrome Ball](http://arxiv.org/abs/2312.09168)
* [Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement](http://arxiv.org/abs/2404.19294)
* [Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing](https://arxiv.org/abs/2310.12153)
* [QUADify: Extracting Meshes with Pixel-level Details and Materials from Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Fruhauf_QUADify_Extracting_Meshes_with_Pixel-level_Details_and_Materials_from_Images_CVPR_2024_paper.pdf)
* [Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities](http://arxiv.org/abs/2311.05698)
* [E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_E-GPS_Explainable_Geometry_Problem_Solving_via_Top-Down_Solver_and_Bottom-Up_CVPR_2024_paper.pdf)
* [Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_Zero-Shot_Structure-Preserving_Diffusion_Model_for_High_Dynamic_Range_Tone_Mapping_CVPR_2024_paper.pdf)
* [Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Outdoor_Scene_Extrapolation_with_Hierarchical_Generative_Cellular_Automata_CVPR_2024_paper.pdf)
* [Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion](https://openaccess.thecvf.com/content/CVPR2024/papers/Manam_Leveraging_Camera_Triplets_for_Efficient_and_Accurate_Structure-from-Motion_CVPR_2024_paper.pdf)
* [Partial-to-Partial Shape Matching with Geometric Consistency](http://arxiv.org/abs/2404.12209)
* [Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability Composability and Decomposability from Anatomy via Self Supervision](http://arxiv.org/abs/2404.15672)
* [Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding](http://arxiv.org/abs/2402.08919)
* [Resolution Limit of Single-Photon LiDAR](http://arxiv.org/abs/2403.17719)
* [Permutation Equivariance of Transformers and Its Applications](http://arxiv.org/abs/2304.07735)
* [From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations](http://arxiv.org/abs/2401.01885)
* [Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Continual-MAE_Adaptive_Distribution_Masked_Autoencoders_for_Continual_Test-Time_Adaptation_CVPR_2024_paper.pdf)
* [GDA: Generalized Diffusion for Robust Test-time Adaptation](http://arxiv.org/abs/2404.00095)
* [Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence](http://arxiv.org/abs/2311.17034)
* [Amodal Ground Truth and Completion in the Wild](http://arxiv.org/abs/2312.17247)
* [Spatio-Temporal Turbulence Mitigation: A Translational Perspective](http://arxiv.org/abs/2401.04244)
* [EgoGen: An Egocentric Synthetic Data Generator](http://arxiv.org/abs/2401.08739)
* [Time- Memory- and Parameter-Efficient Visual Adaptation](https://openaccess.thecvf.com/content/CVPR2024/papers/Mercea_Time-_Memory-_and_Parameter-Efficient_Visual_Adaptation_CVPR_2024_paper.pdf)
* [Finsler-Laplace-Beltrami Operators with Application to Shape Analysis](https://openaccess.thecvf.com/content/CVPR2024/papers/Weber_Finsler-Laplace-Beltrami_Operators_with_Application_to_Shape_Analysis_CVPR_2024_paper.pdf)
* [Intensity-Robust Autofocus for Spike Camera](https://openaccess.thecvf.com/content/CVPR2024/papers/Su_Intensity-Robust_Autofocus_for_Spike_Camera_CVPR_2024_paper.pdf)
* [LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset](http://arxiv.org/abs/2312.12418)
* [Learning to Navigate Efficiently and Precisely in Real Environments](http://arxiv.org/abs/2401.14349)
* [State Space Models for Event Cameras](http://arxiv.org/abs/2402.15584)
* [Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Real-Time_Exposure_Correction_via_Collaborative_Transformations_and_Adaptive_Sampling_CVPR_2024_paper.pdf)
* [Robust Self-calibration of Focal Lengths from the Fundamental Matrix](http://arxiv.org/abs/2311.16304)
* [Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability](http://arxiv.org/abs/2312.10634)
* [PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought](https://openaccess.thecvf.com/content/CVPR2024/papers/Yao_PromptCoT_Align_Prompt_Distribution_via_Adapted_Chain-of-Thought_CVPR_2024_paper.pdf)
* [Uncertainty-Guided Never-Ending Learning to Drive](https://openaccess.thecvf.com/content/CVPR2024/papers/Lai_Uncertainty-Guided_Never-Ending_Learning_to_Drive_CVPR_2024_paper.pdf)
* [Flow-Guided Online Stereo Rectification for Wide Baseline Stereo](https://openaccess.thecvf.com/content/CVPR2024/papers/Kumar_Flow-Guided_Online_Stereo_Rectification_for_Wide_Baseline_Stereo_CVPR_2024_paper.pdf)
* [Don't Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Dufour_Dont_Drop_Your_Samples_Coherence-Aware_Training_Benefits_Conditional_Diffusion_CVPR_2024_paper.pdf)
* [Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Improving_Spectral_Snapshot_Reconstruction_with_Spectral-Spatial_Rectification_CVPR_2024_paper.pdf)
* [Data-Efficient Multimodal Fusion on a Single GPU](http://arxiv.org/abs/2312.10144)
* [Learning the 3D Fauna of the Web](http://arxiv.org/abs/2401.02400)
* [iToF-flow-based High Frame Rate Depth Imaging](https://openaccess.thecvf.com/content/CVPR2024/papers/Meng_iToF-flow-based_High_Frame_Rate_Depth_Imaging_CVPR_2024_paper.pdf)
* [Adaptive Softassign via Hadamard-Equipped Sinkhorn](http://arxiv.org/abs/2309.13855)
* [Step Differences in Instructional Video](http://arxiv.org/abs/2404.16222)
* [EASE-DETR: Easing the Competition among Object Queries](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_EASE-DETR_Easing_the_Competition_among_Object_Queries_CVPR_2024_paper.pdf)
* [Making Visual Sense of Oracle Bones for You and Me](https://openaccess.thecvf.com/content/CVPR2024/papers/Qiao_Making_Visual_Sense_of_Oracle_Bones_for_You_and_Me_CVPR_2024_paper.pdf)
* [2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_2S-UDF_A_Novel_Two-stage_UDF_Learning_Method_for_Robust_Non-watertight_CVPR_2024_paper.pdf)多视图图像重建
* [Multimodal Representation Learning by Alternating Unimodal Adaptation](http://arxiv.org/abs/2311.10707)多模态
* [Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts](https://arxiv.org/abs/2312.00968)
* [Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering](http://arxiv.org/abs/2404.15655)多模态
* [Efficient Hyperparameter Optimization with Adaptive Fidelity Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Jiang_Efficient_Hyperparameter_Optimization_with_Adaptive_Fidelity_Identification_CVPR_2024_paper.pdf)
* [Multi-modal learning for geospatial vegetation forecasting](https://arxiv.org/abs/2303.16198)
:star:[code](https://github.com/vitusbenson/greenearthnet)
* [AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One](https://openaccess.thecvf.com/content/CVPR2024/papers/Ranzinger_AM-RADIO_Agglomerative_Vision_Foundation_Model_Reduce_All_Domains_Into_One_CVPR_2024_paper.pdf)
:star:[code](https://github.com/NVlabs/RADIO)
* [Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Practical_Measurements_of_Translucent_Materials_with_Inter-Pixel_Translucency_Prior_CVPR_2024_paper.pdf)
* [Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation](https://arxiv.org/abs/2406.00699)
:star:[code](https://github.com/xiaoyuanpigo/maxlin)
* [Seeing Motion at Nighttime with an Event Camera](http://arxiv.org/abs/2404.11884v1)
:star:[code](https://github.com/Liu-haoyue/NER-Net)
* [Batch Normalization Alleviates the Spectral Bias in Coordinate Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Cai_Batch_Normalization_Alleviates_the_Spectral_Bias_in_Coordinate_Networks_CVPR_2024_paper.pdf)
* [Affine Equivariant Networks Based on Differential Invariants](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Affine_Equivariant_Networks_Based_on_Differential_Invariants_CVPR_2024_paper.pdf)
* [NC-TTT: A Noise Constrastive Approach for Test-Time Training](https://openaccess.thecvf.com/content/CVPR2024/papers/Osowiechi_NC-TTT_A_Noise_Constrastive_Approach_for_Test-Time_Training_CVPR_2024_paper.pdf)
* [Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Raistrick_Infinigen_Indoors_Photorealistic_Indoor_Scenes_using_Procedural_Generation_CVPR_2024_paper.pdf)
* [Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains](https://openaccess.thecvf.com/content/CVPR2024/papers/Baek_Unexplored_Faces_of_Robustness_and_Out-of-Distribution_Covariate_Shifts_in_Environment_CVPR_2024_paper.pdf)
* [Pre-training Vision Models with Mandelbulb Variations](https://openaccess.thecvf.com/content/CVPR2024/papers/Chiche_Pre-training_Vision_Models_with_Mandelbulb_Variations_CVPR_2024_paper.pdf)
* [Noisy One-point Homographies are Surprisingly Good](https://openaccess.thecvf.com/content/CVPR2024/papers/Ding_Noisy_One-point_Homographies_are_Surprisingly_Good_CVPR_2024_paper.pdf)
* [Revisiting Global Translation Estimation with Feature Tracks](https://openaccess.thecvf.com/content/CVPR2024/papers/Tao_Revisiting_Global_Translation_Estimation_with_Feature_Tracks_CVPR_2024_paper.pdf)
:house:[project](http://www.3dv.ac.cn/en/publication/cvpr-c/)
* [Efficient Scene Recovery Using Luminous Flux Prior](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Efficient_Scene_Recovery_Using_Luminous_Flux_Prior_CVPR_2024_paper.pdf)
* [MR-VNet: Media Restoration using Volterra Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Roheda_MR-VNet_Media_Restoration_using_Volterra_Networks_CVPR_2024_paper.pdf)
* [LEAD: Exploring Logit Space Evolution for Model Selection](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_LEAD_Exploring_Logit_Space_Evolution_for_Model_Selection_CVPR_2024_paper.pdf)
* [EventPS: Real-Time Photometric Stereo Using an Event Camera](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_EventPS_Real-Time_Photometric_Stereo_Using_an_Event_Camera_CVPR_2024_paper.pdf)
* [Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Hong_Your_Transferability_Barrier_is_Fragile_Free-Lunch_for_Transferring_the_Non-Transferable_CVPR_2024_paper.pdf)
* [A Theory of Joint Light and Heat Transport for Lambertian Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Ramanagopal_A_Theory_of_Joint_Light_and_Heat_Transport_for_Lambertian_CVPR_2024_paper.pdf)
* [A Physics-informed Low-rank Deep Neural Network for Blind and Universal Lens Aberration Correction](https://openaccess.thecvf.com/content/CVPR2024/papers/Gong_A_Physics-informed_Low-rank_Deep_Neural_Network_for_Blind_and_Universal_CVPR_2024_paper.pdf)
* [MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_MCNet_Rethinking_the_Core_Ingredients_for_Accurate_and_Efficient_Homography_CVPR_2024_paper.pdf)
* [Animating General Image with Large Visual Motion Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Animating_General_Image_with_Large_Visual_Motion_Model_CVPR_2024_paper.pdf)
* [Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_Pixel-level_Semantic_Correspondence_through_Layout-aware_Representation_Learning_and_Multi-scale_Matching_CVPR_2024_paper.pdf)
* [Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning](https://openaccess.thecvf.com/content/CVPR2024/papers/Shen_Tuning_Stable_Rank_Shrinkage_Aiming_at_the_Overlooked_Structural_Risk_CVPR_2024_paper.pdf)
* [Domain Gap Embeddings for Generative Dataset Augmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Domain_Gap_Embeddings_for_Generative_Dataset_Augmentation_CVPR_2024_paper.pdf)
* [Absolute Pose from One or Two Scaled and Oriented Features](https://openaccess.thecvf.com/content/CVPR2024/papers/Ventura_Absolute_Pose_from_One_or_Two_Scaled_and_Oriented_Features_CVPR_2024_paper.pdf)
* [Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance](https://openaccess.thecvf.com/content/CVPR2024/papers/Koneputugodage_Small_Steps_and_Level_Sets_Fitting_Neural_Surface_Models_with_CVPR_2024_paper.pdf)
* [From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers](https://openaccess.thecvf.com/content/CVPR2024/papers/Gurumurthy_From_Variance_to_Veracity_Unbundling_and_Mitigating_Gradient_Variance_in_CVPR_2024_paper.pdf)
* [Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse](http://arxiv.org/abs/2405.05587)
* [Latent Modulated Function for Computational Optimal Continuous Image Representation](http://arxiv.org/abs/2404.16451)
* [Improved Self-Training for Test-Time Adaptation](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_Improved_Self-Training_for_Test-Time_Adaptation_CVPR_2024_paper.pdf)
* [Learning with Structural Labels for Learning with Noisy Labels](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_Learning_with_Structural_Labels_for_Learning_with_Noisy_Labels_CVPR_2024_paper.pdf)
* [An N-Point Linear Solver for Line and Motion Estimation with Event Cameras](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_An_N-Point_Linear_Solver_for_Line_and_Motion_Estimation_with_CVPR_2024_paper.pdf)
* [Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor](https://openaccess.thecvf.com/content/CVPR2024/papers/Park_Not_All_Classes_Stand_on_Same_Embeddings_Calibrating_a_Semantic_CVPR_2024_paper.pdf)
* [Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Hybrid_Proposal_Refiner_Revisiting_DETR_Series_from_the_Faster_R-CNN_CVPR_2024_paper.pdf)
* [FADES: Fair Disentanglement with Sensitive Relevance](https://openaccess.thecvf.com/content/CVPR2024/papers/Jang_FADES_Fair_Disentanglement_with_Sensitive_Relevance_CVPR_2024_paper.pdf)
* [D^4: Dataset Distillation via Disentangled Diffusion Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Su_D4_Dataset_Distillation_via_Disentangled_Diffusion_Model_CVPR_2024_paper.pdf)
* [Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_Embracing_Unimodal_Aleatoric_Uncertainty_for_Robust_Multimodal_Fusion_CVPR_2024_paper.pdf)
* [Adversarial Distillation Based on Slack Matching and Attribution Region Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Yin_Adversarial_Distillation_Based_on_Slack_Matching_and_Attribution_Region_Alignment_CVPR_2024_paper.pdf)
* [In Search of a Data Transformation That Accelerates Neural Field Training](http://arxiv.org/abs/2311.17094)
* [SIRA: Scalable Inter-frame Relation and Association for Radar Perception](https://openaccess.thecvf.com/content/CVPR2024/papers/Yataka_SIRA_Scalable_Inter-frame_Relation_and_Association_for_Radar_Perception_CVPR_2024_paper.pdf)
* [Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection](http://arxiv.org/abs/2405.19902)
* [Efficient Detection of Long Consistent Cycles and its Application to Distributed Synchronization](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Efficient_Detection_of_Long_Consistent_Cycles_and_its_Application_to_CVPR_2024_paper.pdf)
* [Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now](https://openaccess.thecvf.com/content/CVPR2024/papers/Sarkar_Shadows_Dont_Lie_and_Lines_Cant_Bend_Generative_Models_dont_CVPR_2024_paper.pdf)
* [Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions](http://arxiv.org/abs/2401.10217)
* [Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation](https://openaccess.thecvf.com/content/CVPR2024/papers/Noohdani_Decompose-and-Compose_A_Compositional_Approach_to_Mitigating_Spurious_Correlation_CVPR_2024_paper.pdf)
* [Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology](https://arxiv.org/abs/2405.11643)
* [Tune-An-Ellipse: CLIP Has Potential to Find What You Want](https://openaccess.thecvf.com/content/CVPR2024/papers/Xie_Tune-An-Ellipse_CLIP_Has_Potential_to_Find_What_You_Want_CVPR_2024_paper.pdf)
* [Task-Driven Wavelets using Constrained Empirical Risk Minimization](https://openaccess.thecvf.com/content/CVPR2024/papers/Marcus_Task-Driven_Wavelets_using_Constrained_Empirical_Risk_Minimization_CVPR_2024_paper.pdf)
* [TurboSL: Dense Accurate and Fast 3D by Neural Inverse Structured Light](https://openaccess.thecvf.com/content/CVPR2024/papers/Mirdehghan_TurboSL_Dense_Accurate_and_Fast_3D_by_Neural_Inverse_Structured_CVPR_2024_paper.pdf)
* [Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation](http://arxiv.org/abs/2405.03662)
* [Robust Noisy Correspondence Learning with Equivariant Similarity Consistency](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Robust_Noisy_Correspondence_Learning_with_Equivariant_Similarity_Consistency_CVPR_2024_paper.pdf)
* [Scaling Laws for Data Filtering-- Data Curation cannot be Compute Agnostic](https://openaccess.thecvf.com/content/CVPR2024/papers/Goyal_Scaling_Laws_for_Data_Filtering--_Data_Curation_cannot_be_Compute_CVPR_2024_paper.pdf)
:star:[code](https://github.com/locuslab/scaling_laws_data_filtering)
* [Learning to Rank Patches for Unbiased Image Redundancy Reduction](https://arxiv.org/abs/2404.00680)
* [Plug-and-Play Diffusion Distillation](https://arxiv.org/abs/2406.01954)
* [Minimal Perspective Autocalibration](https://arxiv.org/abs/2405.05605)
* [Differentiable Micro-Mesh Construction](https://openaccess.thecvf.com/content/CVPR2024/papers/Dou_Differentiable_Micro-Mesh_Construction_CVPR_2024_paper.pdf)
* [PointBeV: A Sparse Approach for BeV Predictions](https://openaccess.thecvf.com/content/CVPR2024/papers/Chambon_PointBeV_A_Sparse_Approach_for_BeV_Predictions_CVPR_2024_paper.pdf)
:star:[code](https://github.com/valeoai/PointBeV)
* [Sheared Backpropagation for Fine-tuning Foundation Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Sheared_Backpropagation_for_Fine-tuning_Foundation_Models_CVPR_2024_paper.pdf)
* [AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings](https://openaccess.thecvf.com/content/CVPR2024/papers/Watson_AirPlanes_Accurate_Plane_Estimation_via_3D-Consistent_Embeddings_CVPR_2024_paper.pdf)
* [Differentiable Neural Surface Refinement for Modeling Transparent Objects](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_Differentiable_Neural_Surface_Refinement_for_Modeling_Transparent_Objects_CVPR_2024_paper.pdf)
* [Communication-Efficient Collaborative Perception via Information Filling with Codebook](https://arxiv.org/abs/2405.04966)
* [Friendly Sharpness-Aware Minimization](https://arxiv.org/abs/2403.12350)
:star:[code](https://github.com/nblt/F-SAM)
* [RepAn: Enhanced Annealing through Re-parameterization](https://openaccess.thecvf.com/content/CVPR2024/papers/Fei_RepAn_Enhanced_Annealing_through_Re-parameterization_CVPR_2024_paper.pdf)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [LiSA: LiDAR Localization with Semantic Awareness](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_LiSA_LiDAR_Localization_with_Semantic_Awareness_CVPR_2024_paper.pdf)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations](https://arxiv.org/abs/2312.06716)
* [As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors](https://arxiv.org/abs/2311.16739)
:house:[project](https://as-plausible-aspossible.github.io/)
* [Fun with Flags: Robust Principal Directions via Flag Manifolds](https://arxiv.org/abs/2401.04071)
* [Steerers: A Framework for Rotation Equivariant Keypoint Descriptors](https://openaccess.thecvf.com/content/CVPR2024/papers/Bokman_Steerers_A_Framework_for_Rotation_Equivariant_Keypoint_Descriptors_CVPR_2024_paper.pdf)
:star:[code](https://github.com/georg-bn/rotation-steerers)
* [GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields](https://arxiv.org/abs/2404.00931)
* [PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors](https://arxiv.org/abs/2304.05440)
* [InceptionNeXt: When Inception Meets ConvNeXt](https://arxiv.org/abs/2303.16900)
:star:[code](https://github.com/sail-sg/inceptionnext)
* [Aligning and Prompting Everything All at Once for Universal Visual Perception](https://arxiv.org/abs/2312.02153)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [Backpropagation-free Network for 3D Test-Time Adaptation](https://arxiv.org/abs/2403.18442)
:star:[code](https://github.com/abie-e/BFTT3D)
* [Accept the Modality Gap: An Exploration in the Hyperbolic Space](https://assets.amazon.science/e7/45/19b17928486db6c912aed1fab96f/accept-the-modality-gap-an-exploration-in-the-hyperbolic-space.pdf)
* [Discontinuity-preserving Normal Integration with Auxiliary Edges](https://arxiv.org/abs/2404.03138)
:tv:[video](https://youtu.be/MTTcW5kAOFE)
* [1-Lipschitz Layers Compared: Memory Speed and Certifiable Robustness](https://arxiv.org/abs/2311.16833)
:star:[code](https://github.com/berndprach/1LipschitzLayersCompared)
* [PoNQ: a Neural QEM-based Mesh Representation](https://arxiv.org/abs/2403.12870)
* [Generating Non-Stationary Textures using Self-Rectification](https://arxiv.org/abs/2401.02847)
:star:[code](https://github.com/xiaorongjun000/Self-Rectification)
* [Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Degrees_of_Freedom_Matter_Inferring_Dynamics_from_Point_Trajectories_CVPR_2024_paper.pdf)
:house:[project](https://yz-cnsdqz.github.io/eigenmotion/DOMA/)
* [Multi-agent Collaborative Perception via Motion-aware Robust Communication Network](https://openaccess.thecvf.com/content/CVPR2024/papers/Hong_Multi-agent_Collaborative_Perception_via_Motion-aware_Robust_Communication_Network_CVPR_2024_paper.pdf)
* [ActiveDC: Distribution Calibration for Active Finetuning](https://arxiv.org/abs/2311.07634)
* [Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning](https://arxiv.org/abs/2311.13613)
* [Binding Touch to Everything: Learning Unified Multimodal Tactile Representations](https://arxiv.org/abs/2401.18084)
:house:[project](https://cfeng16.github.io/UniTouch/)
* [DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting](https://arxiv.org/abs/2312.06734)
:star:[code](https://github.com/DeminYu98/DiffCast)降水临近预报
* [Intrinsic Image Diffusion for Indoor Single-view Material Estimation](https://arxiv.org/abs/2312.12274)
:house:[project](https://peter-kocsis.github.io/IntrinsicImageDiffusion/)室内单视图材料估计
* [Neural Underwater Scene Representation](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_Neural_Underwater_Scene_Representation_CVPR_2024_paper.pdf)
* [UniPTS: A Unified Framework for Proficient Post-Training Sparsity](https://arxiv.org/abs/2405.18810)
:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [De-Diffusion Makes Text a Strong Cross-Modal Interface](https://arxiv.org/abs/2311.00618)
:house:[project](https://dediffusion.github.io/)
* [MMM: Generative Masked Motion Model](https://arxiv.org/abs/2312.03596)
:house:[project](https://exitudio.github.io/MMM-page)
* [Gaussian Shadow Casting for Neural Characters](https://arxiv.org/abs/2401.06116)
* [ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations](https://arxiv.org/abs/2403.13870)
:star:[code](https://github.com/rwchakra/exmap)
* [Learning Structure-from-Motion with Graph Attention Networks](https://arxiv.org/abs/2308.15984)
* [VINECS: Video-based Neural Character Skinning](https://arxiv.org/abs/2307.00842)
* [CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_CrossMAE_Cross-Modality_Masked_Autoencoders_for_Region-Aware_Audio-Visual_Pre-Training_CVPR_2024_paper.pdf)
* [PolarMatte: Fully Computational Ground-Truth-Quality Alpha Matte Extraction for Images and Video using Polarized Screen Matting](https://openaccess.thecvf.com/content/CVPR2024/papers/Enomoto_PolarMatte_Fully_Computational_Ground-Truth-Quality_Alpha_Matte_Extraction_for_Images_and_CVPR_2024_paper.pdf)
* [Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features](http://arxiv.org/abs/2311.17024)
* [Mind Artist: Creating Artistic Snapshots with Human Thought](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Mind_Artist_Creating_Artistic_Snapshots_with_Human_Thought_CVPR_2024_paper.pdf)
* [ViT-Lens: Towards Omni-modal Representations](https://arxiv.org/abs/2311.16081)
:star:[code](https://github.com/TencentARC/ViT-Lens)
* [Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_Florence-2_Advancing_a_Unified_Representation_for_a_Variety_of_Vision_CVPR_2024_paper.pdf)
* [Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Mask4Align_Aligned_Entity_Prompting_with_Color_Masks_for_Multi-Entity_Localization_CVPR_2024_paper.pdf)
* [Locally Adaptive Neural 3D Morphable Models](https://arxiv.org/abs/2401.02937)
:star:[code](https://github.com/michaeltrs/LAMM)
* [SignGraph: A Sign Sequence is Worth Graphs of Nodes](https://openaccess.thecvf.com/content/CVPR2024/papers/Gan_SignGraph_A_Sign_Sequence_is_Worth_Graphs_of_Nodes_CVPR_2024_paper.pdf)
* [PELA: Learning Parameter-Efficient Models with Low-Rank Approximation](https://arxiv.org/abs/2310.10700)
:star:[code](https://github.com/guoyang9/PELA)
* [Versatile Navigation under Partial Observability via Value-Guided Diffusion Policy](https://arxiv.org/abs/2404.02176)
* [Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning](https://arxiv.org/abs/2401.04105)
* [Discovering and Mitigating Visual Biases through Keyword Explanation](https://arxiv.org/abs/2301.11104)
* [PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment](https://arxiv.org/abs/2312.09866)
* [Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching](https://arxiv.org/abs/2311.17950)
:star:[code](https://github.com/shaoshitong/G_VBSM_Dataset_Condensation) dataset condensation
* [L2B: Learning to Bootstrap Robust Models for Combating Label Noise](https://arxiv.org/abs/2202.04291)
:star:[code](https://github.com/yuyinzhou/l2b)
* [GART: Gaussian Articulated Template Models](https://arxiv.org/abs/2311.16099)
:house:[project](https://www.cis.upenn.edu/~leijh/projects/gart/)
* [Towards Learning a Generalist Model for Embodied Navigation](https://arxiv.org/abs/2312.02010)
* [Revisiting Sampson Approximations for Geometric Estimation Problems](https://arxiv.org/abs/2401.07114)
* [Real-Time Neural BRDF with Spherically Distributed Primitives](https://arxiv.org/abs/2310.08332)
* [PIGEON: Predicting Image Geolocations](https://arxiv.org/abs/2307.05845) image geolocation
* [Uncertainty Visualization via Low-Dimensional Posterior Projections](https://arxiv.org/abs/2312.07804)
* [Eclipse: Disambiguating Illumination and Materials using Unintended Shadows](https://arxiv.org/abs/2305.16321)
:house:[project](https://dorverbin.github.io/eclipse/)
* [Fast ODE-based Sampling for Diffusion Models in Around 5 Steps](https://arxiv.org/abs/2312.00094)
:star:[code](https://github.com/zju-pi/diff-sampler)
* [CLiC: Concept Learning in Context](https://arxiv.org/abs/2311.17083)
* [Pick-or-Mix: Dynamic Channel Sampling for ConvNets](https://openreview.net/forum?id=Howb7fXB4V)
* [AutoAD III: The Prequel - Back to the Pixels](https://openaccess.thecvf.com/content/CVPR2024/papers/Han_AutoAD_III_The_Prequel_-_Back_to_the_Pixels_CVPR_2024_paper.pdf)
:house:[project](https://www.robots.ox.ac.uk/vgg/research/autoad/)
* [Training-free Pretrained Model Merging](https://arxiv.org/abs/2403.01753)
:star:[code](https://github.com/zju-vipa/training_free_model_merging)
* [Overcoming Generic Knowledge Loss with Selective Parameter Update](http://arxiv.org/abs/2308.12462)
* [Selective nonlinearities removal from digital signals](https://arxiv.org/abs/2403.09731)
* [Memory-Scalable and Simplified Functional Map Learning](https://arxiv.org/abs/2404.00330)
* [Fully Exploiting Every Real Sample: Super-Pixel Sample Gradient Model Stealing](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Fully_Exploiting_Every_Real_Sample_SuperPixel_Sample_Gradient_Model_Stealing_CVPR_2024_paper.pdf)
* [Hierarchical Correlation Clustering and Tree Preserving Embedding](https://arxiv.org/abs/2002.07756)
* [GLID: Pre-training a Generalist Encoder-Decoder Vision Model](http://arxiv.org/abs/2404.07603v1)
* [SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_SynSP_Synergy_of_Smoothness_and_Precision_in_Pose_Sequences_Refinement_CVPR_2024_paper.pdf)
* [MS-DETR: Efficient DETR Training with Mixed Supervision](https://arxiv.org/abs/2401.03989)
* [PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization](https://arxiv.org/abs/2312.06354)
:thumbsup:[Abstract](https://informatics.xmu.edu.cn/info/1053/36349.htm)
* [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891)
:star:[code](https://github.com/LiheYoung/Depth-Anything)
* [Holodeck: Language Guided Generation of 3D Embodied AI Environments](https://arxiv.org/abs/2312.09067)
* [Unified Entropy Optimization for Open-Set Test-Time Adaptation](https://arxiv.org/abs/2404.06065)
* [IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_IIRP-Net_Iterative_Inference_Residual_Pyramid_Network_for_Enhanced_Image_Registration_CVPR_2024_paper.pdf) image registration
* [H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration](https://openaccess.thecvf.com/content/CVPR2024/papers/Ghahremani_H-ViT_A_Hierarchical_Vision_Transformer_for_Deformable_Image_Registration_CVPR_2024_paper.pdf)
* [Generative Unlearning for Any Identity](https://arxiv.org/abs/2405.09879)
:star:[code](https://github.com/KHU-AGI/GUIDE)
* [Error Detection in Egocentric Procedural Task Videos](https://openaccess.thecvf.com/content/CVPR2024/papers/Lee_Error_Detection_in_Egocentric_Procedural_Task_Videos_CVPR_2024_paper.pdf)
* [Enhancing Multimodal Cooperation via Sample-level Modality Valuation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wei_Enhancing_Multimodal_Cooperation_via_Sample-level_Modality_Valuation_CVPR_2024_paper.pdf)
:star:[code](https://github.com/GeWu-Lab/Valuate-and-Enhance-Multimodal-Cooperation)
* [Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships](https://arxiv.org/abs/2403.17173)
* [Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning](https://openaccess.thecvf.com/content/CVPR2024/papers/Jiang_Ink_Dot-Oriented_Differentiable_Optimization_for_Neural_Image_Halftoning_CVPR_2024_paper.pdf)
* [SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction](https://jianweiguo.net/publications/papers/2024_CVPR_SVDTree_main.pdf)
:star:[code](https://github.com/RyuZhihao123/SVDTree) single-image tree reconstruction
* [Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair](http://arxiv.org/abs/2404.19250) debiasing
* [BoQ: A Place is Worth a Bag of Learnable Queries](https://arxiv.org/abs/2405.07364)
* [Distilled Datamodel with Reverse Gradient Matching](https://export.arxiv.org/abs/2404.14006)
* [Towards Calibrated Multi-label Deep Neural Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Cheng_Towards_Calibrated_Multi-label_Deep_Neural_Networks_CVPR_2024_paper.pdf)
* [BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation](https://arxiv.org/abs/2405.09546)
* [MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_MART_Masked_Affective_RepresenTation_Learning_via_Masked_Temporal_Distribution_Distillation_CVPR_2024_paper.pdf)
* [Gradient-based Parameter Selection for Efficient Fine-Tuning](https://arxiv.org/abs/2312.10136)
* [In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging](https://arxiv.org/abs/2312.13319)
* [ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images](https://arxiv.org/abs/2311.15264)
* [Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements](https://arxiv.org/abs/2405.02581)
* [Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation](https://arxiv.org/abs/2301.09209)
:star:[code](https://github.com/RazvanPasca/LanguageNAO)
:house:[project](https://eth-ait.github.io/transfusion-proj/)
* [SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective](https://arxiv.org/abs/2305.14912)
* [Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology](http://arxiv.org/abs/2404.10242v1)
* [Epistemic Uncertainty Quantification For Pre-trained Neural Network](http://arxiv.org/abs/2404.10124v1)
* [Fooling Polarization-based Vision using Locally Controllable Polarizing Projection](https://arxiv.org/abs/2303.17890)
* [TEA: Test-time Energy Adaptation](https://arxiv.org/abs/2311.14402v2)
:star:[code](https://github.com/yuanyige/tea)
* [Would Deep Generative Models Amplify Bias in Future Models?](https://arxiv.org/abs/2404.03242)
* [A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion](http://arxiv.org/abs/2404.11590v1)
* [Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation](http://arxiv.org/abs/2404.10966v1)
:star:[code](https://github.com/gist-ailab/domain-specific-block-selection-and-paired-view-pseudo-labeling-for-online-TTA)
* [DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_DeMatch_Deep_Decomposition_of_Motion_Field_for_Two-View_Correspondence_Learning_CVPR_2024_paper.pdf)
:star:[code](https://github.com/SuhZhang/DeMatch)
* [Explaining CLIP's Performance Disparities on Data from Blind/Low Vision Users](https://openaccess.thecvf.com/content/CVPR2024/papers/Massiceti_Explaining_CLIPs_Performance_Disparities_on_Data_from_BlindLow_Vision_Users_CVPR_2024_paper.pdf)
* [Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models](https://arxiv.org/abs/2312.15297)
* [CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition](https://arxiv.org/abs/2402.16594)
* [Bayesian Differentiable Physics for Cloth Digitalization](https://arxiv.org/abs/2402.17664)
:star:[code](https://github.com/realcrane/Bayesian-Differentiable-Physics-for-Cloth-Digitalization)
* [DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models](http://arxiv.org/abs/2404.08079v1)
* [FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions](http://arxiv.org/abs/2312.02434)
* [Physical Property Understanding from Language-Embedded Feature Fields](http://arxiv.org/abs/2404.04242v1)
:star:[code](https://ajzhai.github.io/NeRF2Physics/)
* [Clustering for Protein Representation Learning](https://arxiv.org/abs/2404.00254)
* [Learning Triangular Distribution in Visual World](https://arxiv.org/abs/2311.18605)
* [InstructDiffusion: A Generalist Modeling Interface for Vision Tasks](https://arxiv.org/abs/2309.03895)
* [NeISF: Neural Incident Stokes Field for Geometry and Material Estimation](https://arxiv.org/abs/2311.13187)
* [Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Correspondence-Free_Non-Rigid_Point_Set_Registration_Using_Unsupervised_Clustering_Analysis_CVPR_2024_paper.pdf)
* [Robust Depth Enhancement via Polarization Prompt Fusion Tuning](http://arxiv.org/abs/2404.04318v1)
:star:[code](https://lastbasket.github.io/PPFT/)
* [Dual-Scale Transformer for Large-Scale Single-Pixel Imaging](http://arxiv.org/abs/2404.05001v1)
:star:[code](https://github.com/Gang-Qu/HATNet-SPI)
* [Posterior Distillation Sampling](https://arxiv.org/abs/2311.13831)
:house:[project](https://posterior-distillation-sampling.github.io/)
* [Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo](http://arxiv.org/abs/2404.01612v1)
:star:[code](https://github.com/LMozart/CVPR2024-SpinUP)
* [Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications](https://arxiv.org/abs/2401.06197)
:star:[code](https://github.com/OpenGVLab/DCNv4)
* [MaxQ: Multi-Axis Query for N:M Sparsity Network](https://arxiv.org/abs/2312.07061)
:star:[code](https://github.com/JingyangXiang/MaxQ)
* [AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation](http://arxiv.org/abs/2404.01351v1)
:star:[code](https://github.com/taeckyung/AETTA)
* [Can Biases in ImageNet Models Explain Generalization?](http://arxiv.org/abs/2404.01509v1)
:star:[code](https://github.com/paulgavrikov/biases_vs_generalization)
* [LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction](https://arxiv.org/abs/2404.00913)
:house:[project](https://zoubo9034.github.io/Excitor/)
* [From Activation to Initialization: Scaling Insights for Optimizing Neural Fields](http://arxiv.org/abs/2403.19205v1)
* [Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation](http://arxiv.org/abs/2404.00563v1)
:star:[code](https://github.com/VincenDen/IID)
* [UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion](http://arxiv.org/abs/2404.06851v1)
:star:[code](https://weiqi-zhang.github.io/UDiFF)
* [PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks](http://arxiv.org/abs/2404.00103v1)
* [PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding](https://openaccess.thecvf.com/content/CVPR2024/papers/Nie_PredToken_Predicting_Unknown_Tokens_and_Beyond_with_Coarse-to-Fine_Iterative_Decoding_CVPR_2024_paper.pdf)
* [AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor](https://openaccess.thecvf.com/content/CVPR2024/papers/Cai_AdaShift_Learning_Discriminative_Self-Gated_Neural_Feature_Activation_With_an_Adaptive_CVPR_2024_paper.pdf)
* [Sparse Views, Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo](http://arxiv.org/abs/2404.00098v1)
* [MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction](http://arxiv.org/abs/2404.00876v1)
:star:[code](https://github.com/xiaolul2/MGMap)
* [Prompt Learning via Meta-Regularization](http://arxiv.org/abs/2404.00851v1)
:star:[code](https://github.com/mlvlab/ProMetaR)
* [Scalable 3D Registration via Truncated Entry-wise Absolute Residuals](http://arxiv.org/abs/2404.00915v1)
* [CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization](http://arxiv.org/abs/2404.00521v1)
* [Alpha-CLIP: A CLIP Model Focusing on Wherever You Want](https://arxiv.org/abs/2312.03818)
:star:[code](https://github.com/SunzeY/AlphaCLIP)
:house:[project](https://aleafy.github.io/alpha-clip)
* [Generative Quanta Color Imaging](http://arxiv.org/abs/2403.19066v1)
:star:[code](https://vishal-s-p.github.io/projects/2023/generative_quanta_color.html)
* [Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D](http://arxiv.org/abs/2403.18922v1)
* [MedBN: Robust Test-Time Adaptation against Malicious Test Samples](http://arxiv.org/abs/2403.19326v1)
* [Material Palette: Extraction of Materials from a Single Image](https://arxiv.org/abs/2311.17060)
:star:[code](https://github.com/astra-vision/MaterialPalette)
:house:[project](https://astra-vision.github.io/MaterialPalette/)
* [Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks](https://arxiv.org/abs/2403.10097)
* [Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training](https://arxiv.org/abs/2308.09718)
:star:[code](https://github.com/Pointcept/Pointcept)
* [Riemannian Multinomial Logistics Regression for SPD Neural Networks](https://arxiv.org/abs/2305.11288)
:star:[code](https://github.com/GitZH-Chen/SPDMLR)
* [A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network](https://arxiv.org/abs/2403.03739)
:star:[code](https://github.com/Ruichen0424/AB-BNN)
* [BiPer: Binary Neural Networks using a Periodic Function](http://arxiv.org/abs/2404.01278)
* [Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning](https://arxiv.org/abs/2405.05714)
:star:[code](https://github.com/RyanZhaoIc/PLM)
* [ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object](http://arxiv.org/abs/2403.18775v1)
:star:[code](https://github.com/chenshuang-zhang/imagenet_d)
* [Region-Based Representations Revisited](https://arxiv.org/abs/2402.02352)
* [Neural Clustering based Visual Representation Learning](http://arxiv.org/abs/2403.17409v1)
:star:[code](https://github.com/guikunchen/FEC/)
* [Efficient Stitchable Task Adaptation](https://arxiv.org/abs/2311.17352)
:star:[code](https://github.com/ziplab/Stitched_LLaMA)
* [Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery](http://arxiv.org/abs/2403.16194v1)
* [Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis](http://arxiv.org/abs/2403.16258v1)
* [PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution](http://arxiv.org/abs/2403.07589v1)
* [Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture](http://arxiv.org/abs/2403.07347v1)
:star:[code](https://github.com/Jiafei127/FD4MM)
* [LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels](http://arxiv.org/abs/2403.15173v1)
:star:[code](https://github.com/FengZicai/LSK3DNet)
* [Continual Forgetting for Pre-trained Vision Models](http://arxiv.org/abs/2403.11530v1)
:star:[code](https://github.com/bjzhb666/GS-LoRA)
* [EarthLoc: Astronaut Photography Localization by Indexing Earth from Space](http://arxiv.org/abs/2403.06758v1)
:star:[code](https://github.com/gmberton/EarthLoc)
* [SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks](http://arxiv.org/abs/2403.14302v1)
* [Learned Trajectory Embedding for Subspace Clustering](https://openaccess.thecvf.com/content/CVPR2024/papers/Lochman_Learned_Trajectory_Embedding_for_Subspace_Clustering_CVPR_2024_paper.pdf)
* [UnO: Unsupervised Occupancy Fields for Perception and Forecasting](https://openaccess.thecvf.com/content/CVPR2024/papers/Agro_UnO_Unsupervised_Occupancy_Fields_for_Perception_and_Forecasting_CVPR_2024_paper.pdf)
* [ParamISP: Learned Forward and Inverse ISPs using Camera Parameters](https://arxiv.org/abs/2312.13313)
* [PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos](https://arxiv.org/abs/2404.08921)
* [Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos](https://arxiv.org/abs/2307.04760)
:house:[project](http://vision.cs.utexas.edu/projects/ego_av_corr)
* [Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner](https://arxiv.org/abs/2310.09469)
* [Learning Object State Changes in Videos: An Open-World Perspective](https://arxiv.org/abs/2312.11782)
:house:[project](https://vision.cs.utexas.edu/projects/VidOSC/)
* [U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation](https://arxiv.org/abs/2403.20231)
* [Improved Implicit Neural Representation with Fourier Reparameterized Training](https://openaccess.thecvf.com/content/CVPR2024/papers/Shi_Improved_Implicit_Neural_Representation_with_Fourier_Reparameterized_Training_CVPR_2024_paper.pdf)
* [Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations](https://arxiv.org/abs/2403.02090)
* [AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis](http://arxiv.org/abs/2402.17483v1)
* [Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing](http://arxiv.org/abs/2402.18277v1)
* [Misalignment-Robust Frequency Distribution Loss for Image Transformation](http://arxiv.org/abs/2402.18192v1)
:star:[code](https://github.com/eezkni/FDL)
* [Boosting Neural Representations for Videos with a Conditional Decoder](http://arxiv.org/abs/2402.18152v1)
* [SeMoLi: What Moves Together Belongs Together](http://arxiv.org/abs/2402.19463v1)
* [VideoMAC: Video Masked Autoencoders Meet ConvNets](http://arxiv.org/abs/2402.19082v1)
* [WWW: A Unified Framework for Explaining What Where and Why of Neural Networks by Interpretation of Neuron Concepts](http://arxiv.org/abs/2402.18956)
* [Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning](http://arxiv.org/abs/2403.01781v1)
* [Neural Redshift: Random Networks are not Random Functions](http://arxiv.org/abs/2403.02241v2)
* [Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives](https://arxiv.org/abs/2311.18259)
* [SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation](https://arxiv.org/abs/2405.18322) self-supervised landmark estimation
* [Unsupervised Feature Learning with Emergent Data-Driven Prototypicality](https://arxiv.org/abs/2307.01421)
* [LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking](http://arxiv.org/abs/2403.04303v1)
:star:[code](https://github.com/li-jl16/LORS)
:thumbsup:[LORS: a low-rank residual structure for parameter-efficient network stacking, with fewer parameters, lower cost, and a smaller memory footprint](https://mp.weixin.qq.com/s/mNzyY45mB6A6JDE-XLhGTw)
* [HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction](http://arxiv.org/abs/2403.08639v1)
* [Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image](http://arxiv.org/abs/2403.09632v1)
* [Desigen: A Pipeline for Controllable Design Template Generation](http://arxiv.org/abs/2403.09093v1)
:star:[code](https://whaohan.github.io/desigen)
* [S2MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering](https://openaccess.thecvf.com/content/CVPR2024/papers/Long_S2MVTC_a_Simple_yet_Efficient_Scalable_Multi-View_Tensor_Clustering_CVPR_2024_paper.pdf)
:star:[code](https://github.com/longzhen520/S2MVTC)
* [Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer](http://arxiv.org/abs/2403.19979v1)
* [Rewrite the Stars](http://arxiv.org/abs/2403.19967v1)
:star:[code](https://github.com/ma-xu/Rewrite-the-Stars)
* [Neural Refinement for Absolute Pose Regression with Feature Synthesis](https://arxiv.org/abs/2303.10087)
:star:[code](https://github.com/ActiveVisionLab/NeFeS)
:house:[project](https://nefes.active.vision/)
* [Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Mou_Instruct_4D-to-4D_Editing_4D_Scenes_as_Pseudo-3D_Scenes_Using_2D_CVPR_2024_paper.pdf)
* [Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It](https://arxiv.org/abs/2312.06420)
:star:[code](https://github.com/LiljaAdam/geographical-splits)
* [FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication](http://arxiv.org/abs/2404.16123)
:house:[project](https://ericslyman.com/fairdedup/)
* [Unsupervised Deep Unrolling Networks for Phase Unwrapping](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Unsupervised_Deep_Unrolling_Networks_for_Phase_Unwrapping_CVPR_2024_paper.pdf) phase unwrapping
* Compressive Sensing
* [Reconstruction-free Cascaded Adaptive Compressive Sensing](https://openaccess.thecvf.com/content/CVPR2024/papers/Qiu_Reconstruction-free_Cascaded_Adaptive_Compressive_Sensing_CVPR_2024_paper.pdf)
* [UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_UFC-Net_Unrolling_Fixed-point_Continuous_Network_for_Deep_Compressive_Sensing_CVPR_2024_paper.pdf)
* [CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_CPP-Net_Embracing_Multi-Scale_Feature_Fusion_into_Deep_Unfolding_CP-PPA_Network_CVPR_2024_paper.pdf)
* Data Augmentation
* [DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models](https://arxiv.org/abs/2405.14881)

## Click here for the 2020 paper collections by category
↘️[CVPR-2020-Papers](https://github.com/52CV/CVPR-2020-Papers)
↘️[ECCV-2020-Papers](https://github.com/52CV/ECCV-2020-Papers)

## Click here for the 2021 paper collections by category
↘️[ICCV-2021-Papers](https://github.com/52CV/ICCV-2021-Papers)
↘️[CVPR-2021-Papers](https://github.com/52CV/CVPR-2021-Papers)

## Click here for the 2022 paper collections by category
↘️[CVPR-2022-Papers](https://github.com/52CV/CVPR-2022-Papers/blob/main/README.md)
↘️[WACV-2022-Papers](https://github.com/52CV/WACV-2022-Papers)
↘️[ECCV-2022-Papers](https://github.com/52CV/ECCV-2022-Papers/blob/main/README.md)

### Scan CV君's WeChat QR code (mention "CVPR") to join the WeChat discussion group:
![9475fa20fd5e95235d9fa23ae9587a2](https://user-images.githubusercontent.com/62801906/156720309-de92964f-a6da-464a-b21f-cfb270c13e27.png)