{"id":14669470,"url":"https://github.com/52CV/CVPR-2024-Papers","last_synced_at":"2025-09-08T23:31:25.967Z","repository":{"id":225282735,"uuid":"724968967","full_name":"52CV/CVPR-2024-Papers","owner":"52CV","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-12T03:27:45.000Z","size":116,"stargazers_count":149,"open_issues_count":1,"forks_count":8,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-04-12T13:24:29.590Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/52CV.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-11-29T06:53:31.000Z","updated_at":"2024-04-15T06:47:44.853Z","dependencies_parsed_at":"2024-04-15T06:47:41.093Z","dependency_job_id":null,"html_url":"https://github.com/52CV/CVPR-2024-Papers","commit_stats":null,"previous_names":["52cv/cvpr-2024-papers"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2024-Papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2024-Papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2024-Papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2024-Papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/52CV","download_url":"https://codeload.github.com/52CV/CVPR-2024-Papers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kin
d":"github","repositories_count":232362737,"owners_count":18511616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-12T02:02:58.351Z","updated_at":"2025-01-03T16:31:09.520Z","avatar_url":"https://github.com/52CV.png","language":null,"funding_links":[],"categories":["Update Logs"],"sub_categories":["World Foundation Model Platform"],"readme":"# CVPR-2024-Papers\n![homepage_image](https://github.com/52CV/CVPR-2024-Papers/assets/62801906/41a45750-bca8-4cb8-89dc-a04b0bbe7b2c)\n\n## Official website: https://cvpr.thecvf.com/\n\n### Workshops :bell:: June 17-18\u003cbr\u003e\n### Main conference :bell:: June 19-21\n\n## Categorized survey papers from previous years: ↘️[CV-Surveys](https://github.com/52CV/CV-Surveys) (under construction)\n\n## Categorized paper collections for 2024\n↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)\n↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)\n↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)\n\n## Categorized paper collections for 2023\n↘️[CVPR-2023-Papers](https://github.com/52CV/CVPR-2023-Papers)\n↘️[WACV-2023-Papers](https://github.com/52CV/WACV-2023-Papers)\n↘️[ICCV-2023-Papers](https://github.com/52CV/ICCV-2023-Papers)\n\n## [Categorized paper collections for 2022](#000)\n## [Categorized paper collections for 2021](#00)\n## [Categorized paper collections for 2020](#0)\n\n## 💥💥💥All accepted papers have been added and categorized!\n\n### 🏆Best Papers\n* [Generative Image Dynamics](https://arxiv.org/abs/2309.07906)\u003cbr\u003e:house:[project](https://generative-dynamics.github.io/)\n* [Rich Human Feedback for Text-to-Image Generation](http://arxiv.org/abs/2312.10240)\n\n### 🏅Best Paper Runners-Up\n* [EventPS: Real-Time Photometric Stereo Using an Event 
Camera](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_EventPS_Real-Time_Photometric_Stereo_Using_an_Event_Camera_CVPR_2024_paper.pdf)\n* [pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction](http://arxiv.org/abs/2312.12337)\n\n### 🥇Best Student Papers\n* [Mip-Splatting: Alias-free 3D Gaussian Splatting](https://arxiv.org/abs/2311.16493)\u003cbr\u003e:star:[code](https://github.com/autonomousvision/mip-splatting)\u003cbr\u003e:house:[project](https://niujinshuchong.github.io/mip-splatting/)\n* [BioCLIP: A Vision Foundation Model for the Tree of Life](https://arxiv.org/abs/2311.18803)\u003cbr\u003e:star:[code](https://github.com/Imageomics/bioclip)\n\n### 🥈Best Student Paper Runners-Up\n* [SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency](https://openaccess.thecvf.com/content/CVPR2024/papers/Roetzer_SpiderMatch_3D_Shape_Matching_with_Global_Optimality_and_Geometric_Consistency_CVPR_2024_paper.pdf)\n* [Image Processing GNN: Breaking Rigidity in Super-Resolution](https://openaccess.thecvf.com/content/CVPR2024/papers/Tian_Image_Processing_GNN_Breaking_Rigidity_in_Super-Resolution_CVPR_2024_paper.pdf)\n* [Objects as Volumes: A Stochastic Geometry View of Opaque Solids](http://arxiv.org/abs/2312.15406)\n* [Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods](https://arxiv.org/abs/2212.06872)\n\n\n## Contents\n\n|:cat:|:dog:|:tiger:|:wolf:|\n|------|------|------|------|\n|[1.Other](#1)|[2.Image Segmentation(图像分割)](#2)|[3.Image Classification(图像分类)](#3)|[4.Image/Video Super-Resolution(图像超分辨率)](#4)|\n|[5.Image/Video Compression(图像/视频压缩)](#5)|[6.Image/Video Captioning(图像/视频字幕)](#6)|[7.Image Processing(图像处理)](#7)|[8.Image Synthesis(图像生成)](#8)|\n|[9.Face(人脸)](#9)|[10.Medical Image Processing(医学影像处理)](#10)|[11.3D](#11)|[12.Video](#12)|\n|[13.HPE(人体姿态估计)](#13)|[14.HAR(人体动作识别检测)](#14)|[15.Object Detection(目标检测)](#15)|[16.Point Cloud(点云)](#16)|\n|[17.Automated 
Driving(自动驾驶)](#17)|[18.SLAM/AR/VR/Robotics(增强/虚拟现实/机器人)](#18)|[19.Object Pose Estimation(物体姿态估计)](#19)|[20.Optical Flow Estimation(光流估计)](#20)|\n|[21.Few/Zero-Shot Learning/DG/DA(小/零样本/域泛化/域适应)](#21)|[22.Deepfake Detection](#22)|[23.Sound(语音处理)](#23)|[24.ML(机器学习)](#24)|\n|[25.Object Tracking(目标跟踪)](#25)|[26.Information Security(信息安全)](#26)|[27.Vision-Language(视觉语言)](#27)|[28.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)](#28)|\n|[29.MC/KD/Pruning(模型压缩/知识蒸馏/剪枝)](#29)|[30.Person Re-Id(人员重识别)](#30)|[31.Edge Detection(边缘检测)](#31)|[32.NLP(自然语言处理)](#32)|\n|[33.NeRF](#33)|[34.Human–Computer Interaction(人机交互)](#34)|[35.Scene Understanding(场景理解)](#35)|[36.4D Reconstruction(4D 重建)](#36)|\n|[37.OCR](#37)|[38.VQA(视觉问答)](#38)|[39.Motion Generation(动作生成)](#39)|[40.Scene Graph Generation(场景图生成)](#40)|\n|[41.Graph Generative Network(GNN/GCN)](#41)|[42.Image Retrieval(图像检索)](#42)|[43.Image Matching(图像匹配)](#43)|[44.Image Fusion(图像融合)](#44)|\n|[45.NAS(神经架构搜索)](#45)|[46.Industrial Anomaly Detection(工业缺陷检测)](#46)|[47.Dense Predictions(密集预测)](#47)|[48.Semi/self-supervised learning(半/自监督)](#48)|\n|[49.Dataset(数据集)](#49)|[50.OOD Detection](#50)|[51.Style Transfer(风格迁移)](#51)|[52.Biomedical](#52)|\n|[53.Light-Field(光场)](#53)|[54.ViT](#54)|[55.REC(指代表达理解)](#55)|[56.Visual emotion recognition(视觉情绪识别)](#56)|\n|[57.Visual Relationship Detection(视觉关系检测)](#57)|[58.Fisheye Images(鱼眼图像)](#58)|[59.Clustering(聚类)](#59)|[60.Sketch(草图)](#60)|\n|[61.Gaze](#61)|[62.All-in-One](#62)|\n\n\n\u003ca name=\"62\"/\u003e\n\n## 62.All-in-One\n* [UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition](https://arxiv.org/abs/2311.15599)\u003cbr\u003e:star:[code](https://github.com/AILab-CVC/UniRepLKNet)\n* [GPT4Point: A Unified Framework for Point-Language Understanding and Generation](https://arxiv.org/abs/2312.02980)\n* [AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and 
Beyond](https://arxiv.org/abs/2311.16468)用于运动理解、规划、生成等的一体化框架\n\n\u003ca name=\"61\"/\u003e\n\n## 61.Gaze\n* [Sharingan: A Transformer Architecture for Multi-Person Gaze Following](https://arxiv.org/abs/2310.00816)目光跟随\n* [From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation](https://openaccess.thecvf.com/content/CVPR2024/papers/Bao_From_Feature_to_Gaze_A_Generalizable_Replacement_of_Linear_Layer_CVPR_2024_paper.pdf)\n\n\u003ca name=\"60\"/\u003e\n\n## 60.Sketch(草图)\n* [What Sketch Explainability Really Means for Downstream Tasks](http://arxiv.org/abs/2403.09480v1)\n* [SketchINR: A First Look into Sketches as Implicit Neural Representations](https://arxiv.org/abs/2403.09344)\n* [Open Vocabulary Semantic Scene Sketch Understanding](https://arxiv.org/abs/2312.12463)草图理解\n* [CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention](https://arxiv.org/abs/2402.17678)\n\n\u003ca name=\"59\"/\u003e\n\n## 59.Clustering(聚类)\n* [MoDE: CLIP Data Experts via Clustering](http://arxiv.org/abs/2404.16030)聚类\n* [Fine-Grained Bipartite Concept Factorization for Clustering](https://openaccess.thecvf.com/content/CVPR2024/papers/Peng_Fine-Grained_Bipartite_Concept_Factorization_for_Clustering_CVPR_2024_paper.pdf)\n* 多视图聚类\n  * [Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios](https://arxiv.org/abs/2303.17245)\n  * [Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Learn_from_View_Correlation_An_Anchor_Enhancement_Strategy_for_Multi-view_CVPR_2024_paper.pdf)\n  * [Differentiable Information Bottleneck for Deterministic Multi-view Clustering](https://arxiv.org/abs/2403.15681)\n\n\u003ca name=\"58\"/\u003e\n\n## 58.Fisheye Images(鱼眼图像)\n* [Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under 
Manhattan World Assumption](https://openaccess.thecvf.com/content/CVPR2024/papers/Wakai_Deep_Single_Image_Camera_Calibration_by_Heatmap_Regression_to_Recover_CVPR_2024_paper.pdf)鱼眼图像\n\n\u003ca name=\"57\"/\u003e\n\n## 57.Visual Relationship Detection(视觉关系检测)\n* [Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection](http://arxiv.org/abs/2403.17709v1)\u003cbr\u003e:star:[code](https://github.com/mlvlab/SpeaQ)\n\n\u003ca name=\"56\"/\u003e\n\n## 56.Visual emotion recognition(视觉情绪识别)\n* [EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning](https://export.arxiv.org/abs/2404.16670)\u003cbr\u003e:star:[code](https://github.com/aimmemotion/EmoVIT)视觉情感理解\n* 多模态意图识别\n  * [Contextual Augmented Global Contrast for Multimodal Intent Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_Contextual_Augmented_Global_Contrast_for_Multimodal_Intent_Recognition_CVPR_2024_paper.pdf)\n\n\u003ca name=\"55\"/\u003e\n\n## 55.Referring Expression Comprehension(指代表达理解)\n* [ScanFormer: Referring Expression Comprehension by Iteratively Scanning](https://openaccess.thecvf.com/content/CVPR2024/papers/Su_ScanFormer_Referring_Expression_Comprehension_by_Iteratively_Scanning_CVPR_2024_paper.pdf)\n* [Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions](https://arxiv.org/abs/2311.17048)\u003cbr\u003e:star:[code](https://github.com/Show-han/Zeroshot_REC)零样本指代表达理解\n* [Revisiting Counterfactual Problems in Referring Expression Comprehension](https://openaccess.thecvf.com/content/CVPR2024/papers/Yu_Revisiting_Counterfactual_Problems_in_Referring_Expression_Comprehension_CVPR_2024_paper.pdf)\n\n\u003ca name=\"54\"/\u003e\n\n## 54.Vision Transformers\n* [Dexterous Grasp Transformer](http://arxiv.org/abs/2404.18135)\n* [Mean-Shift Feature 
Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Kobayashi_Mean-Shift_Feature_Transformer_CVPR_2024_paper.pdf)\n* [MLP Can Be A Good Transformer Learner](https://arxiv.org/abs/2404.05657)\u003cbr\u003e:star:[code](https://github.com/sihaoevery/lambda_vit)\n* [Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers](http://arxiv.org/abs/2303.09383)\n* [Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers](http://arxiv.org/abs/2404.07292)\n* [Dual-scale Transformer for Large-scale Single-Pixel Imaging](http://arxiv.org/abs/2404.05001)\n* [DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets](http://arxiv.org/abs/2404.02900)\n* [Towards Understanding and Improving Adversarial Robustness of Vision Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Jain_Towards_Understanding_and_Improving_Adversarial_Robustness_of_Vision_Transformers_CVPR_2024_paper.pdf)\n* [RMT: Retentive Networks Meet Vision Transformers](https://arxiv.org/abs/2309.11523)\u003cbr\u003e:star:[code](https://github.com/qhfan/RMT)\n* [You Only Need Less Attention at Each Stage in Vision Transformers](https://arxiv.org/abs/2406.00427)\n* [MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers](https://arxiv.org/abs/2311.15475)\u003cbr\u003e:house:[project](https://nihalsid.github.io/mesh-gpt/)\n* [Instance-Aware Group Quantization for Vision Transformers](https://arxiv.org/abs/2404.00928)\u003cbr\u003e:house:[project](https://cvlab.yonsei.ac.kr/projects/IGQ-ViT/)\n* [Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers](http://arxiv.org/abs/2403.10030v1)\u003cbr\u003e:star:[code](https://github.com/mlvlab/MCTF)\n* [RepViT: Revisiting Mobile CNN From ViT Perspective](https://arxiv.org/abs/2307.09283)\u003cbr\u003e:star:[code](https://github.com/THU-MIG/RepViT)\n* 
[Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer](http://arxiv.org/abs/2403.14552v1)\n* [Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers](https://arxiv.org/abs/2403.10574)\u003cbr\u003e:thumbsup:[abstract](https://informatics.xmu.edu.cn/info/1053/36349.htm)\n* [Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods](https://arxiv.org/abs/2212.06872)\n* [On the Faithfulness of Vision Transformer Explanations](http://arxiv.org/abs/2404.01415v1)\n* [Learning Correlation Structures for Vision Transformers](http://arxiv.org/abs/2404.03924v1)\n* [Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach](https://arxiv.org/abs/2403.19067)\u003cbr\u003e:star:[code](https://github.com/zstarN70/RLRR.git)\n* [Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression](http://arxiv.org/abs/2403.15835v1)\n* [Point Transformer V3: Simpler Faster Stronger](https://arxiv.org/abs/2312.10035)\u003cbr\u003e:star:[code](https://github.com/Pointcept/PointTransformerV3)\n* [A General and Efficient Training for Transformer via Token Expansion](http://arxiv.org/abs/2404.00672v1)\u003cbr\u003e:star:[code](https://github.com/Osilly/TokenExpansion)\n* [HEAL-SWIN: A Vision Transformer On The Sphere](https://arxiv.org/abs/2307.07313)\u003cbr\u003e:star:[code](https://github.com/JanEGerken/HEAL-SWIN)\n* [SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design](https://arxiv.org/abs/2401.16456)\n* [TransNeXt: Robust Foveal Visual Perception for Vision Transformers](https://arxiv.org/abs/2311.17132)\u003cbr\u003e:star:[code](https://github.com/DaiShiResearch/TransNeXt)\n* [Making Vision Transformers Truly Shift-Equivariant](https://arxiv.org/abs/2305.16316)\n* [Multimodal Pathway: Improve Transformers with Irrelevant Data from Other 
Modalities](https://arxiv.org/abs/2401.14405)\u003cbr\u003e:star:[code](https://github.com/AILab-CVC/M2PT)\n* [Random Entangled Tokens for Adversarially Robust Vision Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Gong_Random_Entangled_Tokens_for_Adversarially_Robust_Vision_Transformer_CVPR_2024_paper.pdf)\n\n\u003ca name=\"53\"/\u003e\n\n## 53.Light-Field(光场)\n* [Time-Efficient Light-Field Acquisition Using Coded Aperture and Events](https://arxiv.org/abs/2403.07244)\u003cbr\u003e:house:[project](https://www.fujii.nuee.nagoya-u.ac.jp/Research/EventLF/)\n* [Continuous Pose for Monocular Cameras in Neural Implicit Representation](https://arxiv.org/abs/2311.17119)\u003cbr\u003e:star:[code](https://github.com/qimaqi/Continuous-Pose-in-NeRF)\n* [PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Tu_PanoPose_Self-supervised_Relative_Pose_Estimation_for_Panoramic_Images_CVPR_2024_paper.pdf)\u003cbr\u003e:house:[project](http://www.3dv.ac.cn/en/publication/cvpr-b/)\n* [Unbiased Estimator for Distorted Conics in Camera Calibration](http://arxiv.org/abs/2403.04583)\n* 相机姿态\n  * [Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences](https://arxiv.org/abs/2404.06337)\n  * [Map-Relative Pose Regression for Visual Re-Localization](http://arxiv.org/abs/2404.09884v1)\u003cbr\u003e:star:[code](https://nianticlabs.github.io/marepo)\n  * [The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement](http://arxiv.org/abs/2404.10438v1)\u003cbr\u003e:star:[code](https://github.com/ga1i13o/mcloc_poseref)\n* 快照压缩成像\n  * [DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model](https://arxiv.org/abs/2311.11417)\n\n\u003ca name=\"52\"/\u003e\n\n## 52.Biomedical\n* [ManiFPT: Defining and Analyzing Fingerprints of Generative Models](https://arxiv.org/abs/2402.10401)\n* [Flexible Biometrics Recognition: Bridging the 
Multimodality Gap through Attention Alignment and Prompt Tuning](https://openaccess.thecvf.com/content/CVPR2024/papers/Tiong_Flexible_Biometrics_Recognition_Bridging_the_Multimodality_Gap_through_Attention_Alignment_CVPR_2024_paper.pdf)生物识别\n* 人员识别\n  * [Activity-Biometrics: Person Identification from Daily Activities](http://arxiv.org/abs/2403.17360v1)\u003cbr\u003e:star:[code](https://github.com/sacrcv/Activity-Biometrics/)\n\n\u003ca name=\"51\"/\u003e\n\n## 51.Style Transfer(风格迁移)\n* [Z*: Zero-shot Style Transfer via Attention Reweighting](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_Z_Zero-shot_Style_Transfer_via_Attention_Reweighting_CVPR_2024_paper.pdf)\n* [MoST: Motion Style Transformer Between Diverse Action Contents](http://arxiv.org/abs/2403.06225)\u003cbr\u003e:star:[code](https://github.com/Boeun-Kim/MoST)\n* [ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation](https://arxiv.org/abs/2312.02109)\u003cbr\u003e:star:[code](https://github.com/cardinalblue/ArtAdapter)\u003cbr\u003e:house:[project](https://cardinalblue.github.io/artadapter.github.io/)\n* [Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Song_Arbitrary_Motion_Style_Transfer_with_Multi-condition_Motion_Latent_Diffusion_Model_CVPR_2024_paper.pdf)\n* [Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer](https://arxiv.org/abs/2312.09008v2)\u003cbr\u003e:house:[project](https://jiwoogit.github.io/StyleID_site)\n* [Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network](https://arxiv.org/abs/2405.19775)\u003cbr\u003e:thumbsup:[平衡效率与质量，南航提出新风格迁移算法Puff-Net](https://mp.weixin.qq.com/s/B-RkdeQNvIXmAYJMUkHkYQ)\n* 零样本文本驱动运动迁移\n  * [Space-Time Diffusion Features for Zero-Shot Text-Driven Motion 
Transfer](http://arxiv.org/abs/2311.17009)\u003cbr\u003e:house:[project](https://diffusion-motion-transfer.github.io/)\n\n\u003ca name=\"50\"/\u003e\n\n## 50.OOD Detection\n* [Test-Time Linear Out-of-Distribution Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Fan_Test-Time_Linear_Out-of-Distribution_Detection_CVPR_2024_paper.pdf)\n* [Segment Every Out-of-Distribution Object](https://arxiv.org/abs/2311.16516)\n* [Label-Efficient Group Robustness via Out-of-Distribution Concept Curation](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Label-Efficient_Group_Robustness_via_Out-of-Distribution_Concept_Curation_CVPR_2024_paper.pdf)\n* [Enhancing the Power of OOD Detection via Sample-Aware Model Selection](https://openaccess.thecvf.com/content/CVPR2024/papers/Xue_Enhancing_the_Power_of_OOD_Detection_via_Sample-Aware_Model_Selection_CVPR_2024_paper.pdf)\n* [Discriminability-Driven Channel Selection for Out-of-Distribution Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Yuan_Discriminability-Driven_Channel_Selection_for_Out-of-Distribution_Detection_CVPR_2024_paper.pdf)\n* [CORES: Convolutional Response-based Score for Out-of-distribution Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_CORES_Convolutional_Response-based_Score_for_Out-of-distribution_Detection_CVPR_2024_paper.pdf)\n* [Learning Transferable Negative Prompts for Out-of-Distribution Detection](https://arxiv.org/abs/2404.03248)\u003cbr\u003e:star:[code](https://github.com/mala-lab/negprompt)\n* [A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?](http://arxiv.org/abs/2404.01775v1)\u003cbr\u003e:star:[code](https://github.com/glhr/ood-labelnoise)\n* [Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments](https://arxiv.org/abs/2403.01773)\n* 
异常检测\n  * [Hyperbolic Anomaly Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Hyperbolic_Anomaly_Detection_CVPR_2024_paper.pdf)\n  * [Universal Novelty Detection through Adaptive Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Mirzaei_Universal_Novelty_Detection_Through_Adaptive_Contrastive_Learning_CVPR_2024_paper.pdf)\n  * [Looking 3D: Anomaly Detection with 2D-3D Alignment](https://openaccess.thecvf.com/content/CVPR2024/papers/Bhunia_Looking_3D_Anomaly_Detection_with_2D-3D_Alignment_CVPR_2024_paper.pdf)\n\n\u003ca name=\"49\"/\u003e\n\n## 49.Dataset(数据集)\n* 数据集\n  * [Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Multiagent_Multitraversal_Multimodal_Self-Driving_Open_MARS_Dataset_CVPR_2024_paper.pdf)\n  * [4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_4D-DRESS_A_4D_Dataset_of_Real-World_Human_Clothing_With_Semantic_CVPR_2024_paper.pdf)\n  * [DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_DiLiGenRT_A_Photometric_Stereo_Dataset_with_Quantified_Roughness_and_Translucency_CVPR_2024_paper.pdf)\n  * [MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation](http://arxiv.org/abs/2404.02790)\n  * [LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs](http://arxiv.org/abs/2312.04372)\n  * [360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries](http://arxiv.org/abs/2311.17389)\n  * [Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline](http://arxiv.org/abs/2312.02528)\n  * [MSU-4S - The Michigan State University Four Seasons 
Dataset](https://openaccess.thecvf.com/content/CVPR2024/papers/Kent_MSU-4S_-_The_Michigan_State_University_Four_Seasons_Dataset_CVPR_2024_paper.pdf)\n  * [DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields](https://openaccess.thecvf.com/content/CVPR2024/papers/Lu_DiVa-360_The_Dynamic_Visual_Dataset_for_Immersive_Neural_Fields_CVPR_2024_paper.pdf)\n  * [Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline](http://arxiv.org/abs/2309.14611)\n  * [LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Guo_LiDAR-Net_A_Real-scanned_3D_Point_Cloud_Dataset_for_Indoor_Scenes_CVPR_2024_paper.pdf)\n  * [Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_Advancing_Saliency_Ranking_with_Human_Fixations_Dataset_Models_and_Benchmarks_CVPR_2024_paper.pdf)\n  * [MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying](https://openaccess.thecvf.com/content/CVPR2024/papers/Burgert_MAGICK_A_Large-scale_Captioned_Dataset_from_Matting_Generated_Images_using_CVPR_2024_paper.pdf)\n  * [HardMo: A Large-Scale Hardcase Dataset for Motion Capture](https://openaccess.thecvf.com/content/CVPR2024/papers/Liao_HardMo_A_Large-Scale_Hardcase_Dataset_for_Motion_Capture_CVPR_2024_paper.pdf)\n  * [The STVchrono Dataset: Towards Continuous Change Recognition in Time](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_The_STVchrono_Dataset_Towards_Continuous_Change_Recognition_in_Time_CVPR_2024_paper.pdf)\n  * [Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Nguyen_Insect-Foundation_A_Foundation_Model_and_Large-scale_1M_Dataset_for_Visual_CVPR_2024_paper.pdf)\n  * [LED: A Large-scale Real-world Paired Dataset for Event Camera 
Denoising](https://arxiv.org/abs/2405.19718)\n  * [On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm](https://arxiv.org/abs/2312.03526)\n  * [Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods](https://openaccess.thecvf.com/content/CVPR2024/papers/Qu_Towards_Modern_Image_Manipulation_Localization_A_Large-Scale_Dataset_and_Novel_CVPR_2024_paper.pdf)\n  * [Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation](https://arxiv.org/abs/2306.11290)\n  * [FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_FineSports_A_Multi-person_Hierarchical_Sports_Video_Dataset_for_Fine-grained_Action_CVPR_2024_paper.pdf)细粒度动作理解\n  * [MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos](https://arxiv.org/abs/2306.04216)\u003cbr\u003e:house:[project](https://mmsum-dataset.github.io/)\n  * [Traffic Scene Parsing through the TSP6K Dataset](http://arxiv.org/abs/2303.02835)\n  * [Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset](https://arxiv.org/abs/2311.17396)\n  * [RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos](https://arxiv.org/abs/2401.12592)\u003cbr\u003e:house:[project](https://wildrgbd.github.io/)\u003cbr\u003e:sunflower:[dataset](https://github.com/wildrgbd/wildrgbd)RGB-D object数据集\n  * [eTraM: Event-based Traffic Monitoring Dataset](https://arxiv.org/abs/2403.19976)\u003cbr\u003e:star:[code](https://github.com/eventbasedvision/eTraM)\u003cbr\u003e:house:[project](https://eventbasedvision.github.io/eTraM/)流量监控数据集\n  * [Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network](http://arxiv.org/abs/2405.00244)\u003cbr\u003e:sunflower:[dataset](https://github.com/yungsyu99/Real-HDRV)\n  * 
[JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups](http://arxiv.org/abs/2404.04458v1)\u003cbr\u003e:house:[project](https://jrdb.erc.monash.edu/dataset/social)\n  * [TULIP: Multi-camera 3D Precision Assessment of Parkinson's Disease](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_TULIP_Multi-camera_3D_Precision_Assessment_of_Parkinsons_Disease_CVPR_2024_paper.pdf)\n  * [JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments](http://arxiv.org/abs/2404.01686v1)\n  * [OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion](http://arxiv.org/abs/2403.19417v1)\u003cbr\u003e:house:[project](https://oakink.net/v2)\n  * [SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos](http://arxiv.org/abs/2404.04565v1)\n  * [RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method](http://arxiv.org/abs/2403.19501v1)\u003cbr\u003e:house:[project](http://www.lidarhumanmotion.net/reli11d/)\u003cbr\u003e:thumbsup:[摘要](https://informatics.xmu.edu.cn/info/1053/36349.htm)\n  * [MatSynth: A Modern PBR Materials Dataset](https://arxiv.org/abs/2401.06056)\u003cbr\u003e:house:[project](https://gvecchio.com/matsynth/)\n  * [RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception](http://arxiv.org/abs/2403.10145v1)\u003cbr\u003e:star:[code](https://github.com/AIR-THU/DAIR-RCooper)\n  * [Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection](http://arxiv.org/abs/2403.12580v1)\n  * [EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World](http://arxiv.org/abs/2403.16182v1)\u003cbr\u003e:star:[code](https://github.com/OpenGVLab/EgoExoLearn)\n  * [MCD: Diverse Large-Scale Multi-Campus Dataset for Robot 
Perception](https://arxiv.org/abs/2403.11496)\u003cbr\u003e:sunflower:[dataset](https://mcdviral.github.io/)\n  * [HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios](https://arxiv.org/abs/2212.10428)\n  * [HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative](https://arxiv.org/abs/2403.02640)\u003cbr\u003e:sunflower:[dataset](https://holovic.net/)\n  * [DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision](https://arxiv.org/abs/2312.16256)\u003cbr\u003e:sunflower:[dataset](https://github.com/DL3DV-10K/Dataset)\n  * [EFHQ: Multi-purpose ExtremePose-Face-HQ dataset](https://arxiv.org/abs/2312.17205)\u003cbr\u003e:star:[code](https://www.vinai.io/)\u003cbr\u003e:house:[project](https://bomcon123456.github.io/efhq/)数据集\n  * [LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images](https://arxiv.org/abs/2403.13171)\n  * [MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors](http://arxiv.org/abs/2403.17610v1)\u003cbr\u003e:star:[code](https://haolyuan.github.io/MMVP-Dataset/)\n  * [FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions](https://arxiv.org/abs/2309.05073)\u003cbr\u003e:house:[project](https://wangjiongw.github.io/freeman/)\n  * [TUMTraf V2X Cooperative Perception Dataset](https://arxiv.org/pdf/2403.01316.pdf)\u003cbr\u003e:house:[project](https://tum-traffic-dataset.github.io/tumtraf-v2x/)\n  * [MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures](https://arxiv.org/abs/2312.02963)\u003cbr\u003e:sunflower:[dataset](https://x-zhangyang.github.io/MVHumanNet/)\n* 基准\n  * [When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and 
Approach](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_When_Visual_Grounding_Meets_Gigapixel-level_Large-scale_Scenes_Benchmark_and_Approach_CVPR_2024_paper.pdf)\n  * [THRONE: A Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models](http://arxiv.org/abs/2405.05256)\n  * [M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Pu_M3-UDA_A_New_Benchmark_for_Unsupervised_Domain_Adaptive_Fetal_Cardiac_CVPR_2024_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/LiwenWang919/M3-UDA)\n  * [DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos](https://arxiv.org/abs/2312.09523)现实视频中远程点跟踪的基准\n  * [SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge](https://arxiv.org/abs/2405.09713)\n  * [MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Cao_MAPLM_A_Real-World_Large-Scale_Vision-Language_Benchmark_for_Map_and_Traffic_CVPR_2024_paper.pdf)\n  * [RoDLA: Benchmarking the Robustness of Document Layout Analysis Models](http://arxiv.org/abs/2403.14442v1)\u003cbr\u003e:star:[code](https://yufanchen96.github.io/projects/RoDLA)\n  * [GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation](https://openaccess.thecvf.com/content/CVPR2024/papers/Khanna_GOAT-Bench_A_Benchmark_for_Multi-Modal_Lifelong_Navigation_CVPR_2024_paper.pdf)\n  * [MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI](http://arxiv.org/abs/2311.16502)\n  * [Advancing Saliency Ranking with Human Fixations: Dataset Models and Benchmarks](https://openaccess.thecvf.com/content/CVPR2024/papers/Deng_Advancing_Saliency_Ranking_with_Human_Fixations_Dataset_Models_and_Benchmarks_CVPR_2024_paper.pdf)\n  * [ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language 
Tasks](https://openaccess.thecvf.com/content/CVPR2024/papers/Rosasco_ConCon-Chi_Concept-Context_Chimera_Benchmark_for_Personalized_Vision-Language_Tasks_CVPR_2024_paper.pdf)\n  * [Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark](http://arxiv.org/abs/2403.18821v1)\u003cbr\u003e:star:[code](https://facebookresearch.github.io/real-acoustic-fields/)\n  * [UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement](http://arxiv.org/abs/2404.14542)\n  * [PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_PKU-DyMVHumans_A_Multi-View_Video_Benchmark_for_High-Fidelity_Dynamic_Human_Modeling_CVPR_2024_paper.pdf)\u003cbr\u003e:house:[project](https://pku-dymvhumans.github.io/)\n  * [MVBench: A Comprehensive Multi-modal Video Understanding Benchmark](https://arxiv.org/abs/2311.17005)\u003cbr\u003e:star:[code](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2)\n  * [Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly](http://arxiv.org/abs/2405.00181)\n  * [VBench : Comprehensive Benchmark Suite for Video Generative Models](https://arxiv.org/abs/2311.17982)\u003cbr\u003e:star:[code](https://arxiv.org/abs/2311.17982)\u003cbr\u003e:house:[project](https://vchitect.github.io/VBench-project/)\n  * [MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark](http://arxiv.org/abs/2403.20225v1)\n  * [CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs](https://arxiv.org/abs/2311.16703)\u003cbr\u003e:house:[project](https://enigma-li.github.io/CADTalk/)\n  * [How to Train Neural Field Representations: A Comprehensive Study and Benchmark](https://arxiv.org/abs/2312.10531)\n  * [OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM](https://arxiv.org/abs/2402.09181)\n\n\u003ca name=\"48\"/\u003e\n\n## 
48.Semi/self-supervised learning\n* Weakly supervised learning\n  * Partial-label learning\n    * [CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning](https://arxiv.org/abs/2303.10365) (partial-label learning, a weakly supervised learning problem)\n* Semi-supervised learning\n  * [Targeted Representation Alignment for Open-World Semi-Supervised Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_Targeted_Representation_Alignment_for_Open-World_Semi-Supervised_Learning_CVPR_2024_paper.pdf)\n  * [SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_SeNM-VAE_Semi-Supervised_Noise_Modeling_with_Hierarchical_Variational_Autoencoder_CVPR_2024_paper.pdf)\n  * [CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning](http://arxiv.org/abs/2403.10391v1)\n  * [BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning](http://arxiv.org/abs/2404.01179v1)\n  * Positive-unlabeled learning\n    * [Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation](https://openaccess.thecvf.com/content/CVPR2024/papers/Long_Positive-Unlabeled_Learning_by_Latent_Group-Aware_Meta_Disambiguation_CVPR_2024_paper.pdf) (positive-unlabeled learning, an important branch of semi-supervised learning)\n* Self-supervised learning\n  * [Self-supervised Representation Learning from Arbitrary Scenarios](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Self-Supervised_Representation_Learning_from_Arbitrary_Scenarios_CVPR_2024_paper.pdf)\n  * [Self-supervised Debiasing Using Low Rank Regularization](http://arxiv.org/abs/2210.05248)\n  * [Self-Supervised Dual Contouring](http://arxiv.org/abs/2405.18131)\n  * [Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces](https://arxiv.org/abs/2404.17620)\n  * [SD2Event: Self-supervised Learning of 
Dynamic Detectors and Contextual Descriptors for Event Cameras](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_SD2EventSelf-supervised_Learning_of_Dynamic_Detectors_and_Contextual_Descriptors_for_Event_CVPR_2024_paper.pdf)\n  * [An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing](https://openaccess.thecvf.com/content/CVPR2024/papers/Hu_An_Asymmetric_Augmented_Self-Supervised_Learning_Method_for_Unsupervised_Fine-Grained_Image_CVPR_2024_paper.pdf)\n  * [CNC-Net: Self-Supervised Learning for CNC Machining Operations](https://arxiv.org/abs/2312.09925)\n* Unsupervised learning\n  * [Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos](https://openaccess.thecvf.com/content/CVPR2024/papers/Sommer_Unsupervised_Learning_of_Category-Level_3D_Pose_from_Object-Centric_Videos_CVPR_2024_paper.pdf)\n\n\u003ca name=\"47\"/\u003e\n\n## 47.Dense Predictions\n* [Efficient Multitask Dense Predictor via Binarization](https://arxiv.org/abs/2405.14136) (dense prediction)\n* [Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Huang_Going_Beyond_Multi-Task_Dense_Prediction_with_Synergy_Embedding_Models_CVPR_2024_paper.pdf)\n* [Exploiting Diffusion Prior for Generalizable Dense Prediction](https://arxiv.org/abs/2311.18832)\u003cbr\u003e:house:[project](https://shinying.github.io/dmp)\n* [ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions](http://arxiv.org/abs/2403.07392v1)\u003cbr\u003e:star:[code](https://github.com/Traffic-X/ViT-CoMer)\u003cbr\u003e:thumbsup:[Baidu proposes ViT-CoMer, a new vision backbone that sets a new SOTA on dense prediction tasks](https://mp.weixin.qq.com/s/Q2xI_rU5_7Mv6jiYeu6NkA)\n* [Multi-Task Dense Prediction via Mixture of Low-Rank 
Experts](http://arxiv.org/abs/2403.17749v1)\u003cbr\u003e:star:[code](https://github.com/YuqiYang213/MLoRE)\n\n\n\u003ca name=\"46\"/\u003e\n\n## 46.Industrial Anomaly Detection\n* [Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection](https://arxiv.org/abs/2310.12790)\u003cbr\u003e:star:[code](https://github.com/mala-lab/AHL)\n* Anomaly detection\n  * [Supervised Anomaly Detection for Complex Industrial Images](http://arxiv.org/abs/2405.04953)\n  * [Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Prompt-Enhanced_Multiple_Instance_Learning_for_Weakly_Supervised_Video_Anomaly_Detection_CVPR_2024_paper.pdf) (weakly supervised anomaly detection)\n  * [Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping](https://arxiv.org/abs/2312.04521)\n  * [Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation](https://arxiv.org/abs/2403.06247)\n  * [Long-Tailed Anomaly Detection with Learnable Class Names](http://arxiv.org/abs/2403.20236v1)\u003cbr\u003e:house:[project](https://zenodo.org/records/10854201)\n  * [RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection](http://arxiv.org/abs/2403.05897v1)\u003cbr\u003e:star:[code](https://github.com/cnulab/RealNet)\n  * [Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts](http://arxiv.org/abs/2403.06495v1)\u003cbr\u003e:star:[code](https://github.com/mala-lab/InCTRL)\n  * [PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection](http://arxiv.org/abs/2404.05231v1)\u003cbr\u003e:star:[code](https://github.com/FuNz-0/PromptAD)\n* Film removal\n  * [Learning to Remove Wrinkled Transparent Film with Polarized Prior](http://arxiv.org/abs/2403.04368v1)\u003cbr\u003e:star:[code](https://github.com/jqtangust/FilmRemoval)\n* Benchmarks/datasets\n  * [Real-IAD: A Real-World Multi-view Dataset for Benchmarking Versatile 
Industrial Anomaly Detection](https://arxiv.org/abs/2403.12580)\u003cbr\u003e:star:[code](https://github.com/TencentYoutuResearch/AnomalyDetection_Real-IAD)\n  * [Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network](https://arxiv.org/abs/2311.14897)\u003cbr\u003e:star:[code](https://github.com/Chopper-233/Anomaly-ShapeNet)\n\n\u003ca name=\"45\"/\u003e\n\n## 45.Neural Architecture Search(神经架构搜索)\n* [Towards Accurate and Robust Architectures via Neural Architecture Search](https://arxiv.org/abs/2405.05502)\n* [Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach](http://arxiv.org/abs/2403.11380v1)\n* [Building Optimal Neural Architectures using Interpretable Knowledge](http://arxiv.org/abs/2403.13293v1)\u003cbr\u003e:star:[code](https://github.com/Ascend-Research/AutoBuild)\n* [AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search](http://arxiv.org/abs/2403.19232v1)\n* [SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model](https://arxiv.org/abs/2406.00195)\n* [Insights from the Use of Previously Unseen Neural Architecture Search Datasets](https://arxiv.org/abs/2404.02189)\n* [FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer](https://arxiv.org/abs/2403.12821)\u003cbr\u003e:star:[code](http://github.com/y0ngjaenius/CVPR2024_FLOWERFormer)\n\n\u003ca name=\"44\"/\u003e\n\n## 44.Image Fusion(图像融合)\n* [Equivariant Multi-Modality Image Fusion](https://arxiv.org/abs/2305.11443)图像融合\n* [Task-Customized Mixture of Adapters for General Image Fusion](http://arxiv.org/abs/2403.12494v1)\u003cbr\u003e:star:[code](https://github.com/YangSun22/TC-MoA)\n* [Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image 
Fusion](http://arxiv.org/abs/2403.16387v1)\u003cbr\u003e:star:[code](https://github.com/XunpengYi/Text-IF)\n* [Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Tan_Revisiting_Spatial-Frequency_Information_Integration_from_a_Hierarchical_Perspective_for_Panchromatic_CVPR_2024_paper.pdf)\n* [Neural Spline Fields for Burst Image Fusion and Layer Separation](https://arxiv.org/abs/2312.14235)\u003cbr\u003e:house:[project](https://light.princeton.edu/publication/nsf)\n* 红外和可见光图像融合\n  * [Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_Probing_Synergistic_High-Order_Interaction_in_Infrared_and_Visible_Image_Fusion_CVPR_2024_paper.pdf)\n\n\u003ca name=\"43\"/\u003e\n\n## 43.Image Matching(图像匹配)\n* [XFeat: Accelerated Features for Lightweight Image Matching](https://arxiv.org/abs/2404.19174)\u003cbr\u003e:house:[project](http://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24)图像匹配\n* 图像-文本\n  * [Composing Object Relations and Attributes for Image-Text Matching](https://openaccess.thecvf.com/content/CVPR2024/papers/Pham_Composing_Object_Relations_and_Attributes_for_Image-Text_Matching_CVPR_2024_paper.pdf)\n\n\u003ca name=\"42\"/\u003e\n\n## 42.Image Retrieval(图像检索)\n* [Language-only Training of Zero-shot Composed Image Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Gu_Language-only_Training_of_Zero-shot_Composed_Image_Retrieval_CVPR_2024_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/navervision/lincir)\n* [Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods](https://openaccess.thecvf.com/content/CVPR2024/papers/Dai_Evaluating_Transferability_in_Retrieval_Tasks_An_Approach_Using_MMD_and_CVPR_2024_paper.pdf)\n* [Knowledge-Enhanced Dual-stream Zero-shot Composed Image 
Retrieval](http://arxiv.org/abs/2403.16005v1)\n* [On Train-Test Class Overlap and Detection for Image Retrieval](http://arxiv.org/abs/2404.01524v1)\u003cbr\u003e:star:[code](https://github.com/dealicious-inc/RGLDv2-clean)\n* [D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Xie_D3still_Decoupled_Differential_Distillation_for_Asymmetric_Image_Retrieval_CVPR_2024_paper.pdf)\n* [Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection](http://arxiv.org/abs/2404.09263)\n* 跨域检索\n  * [ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval](https://arxiv.org/abs/2312.12478)\u003cbr\u003e:star:[code](https://github.com/fangkaipeng/ProS)\n* 视频检索\n  * [Composed Video Retrieval via Enriched Context and Discriminative Embeddings](http://arxiv.org/abs/2403.16997v1)\u003cbr\u003e:star:[code](https://github.com/OmkarThawakar/composed-video-retrieval)\n* 跨模态检索\n  * [Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval](http://arxiv.org/abs/2403.05105v1)\u003cbr\u003e:star:[code](https://github.com/hhc1997/L2RM)\n  * [Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Fine-grained_Prototypical_Voting_with_Heterogeneous_Mixup_for_Semi-supervised_2D-3D_Cross-modal_CVPR_2024_paper.pdf)\n* 文本-视频检索\n  * [Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval](http://arxiv.org/abs/2403.17998v1)\u003cbr\u003e:star:[code](https://github.com/Jiamian-Wang/T-MASS-text-video-retrieval)\n  * [Holistic Features are almost Sufficient for Text-to-Video Retrieval](https://www.researchgate.net/publication/379270657_Holistic_Features_are_almost_Sufficient_for_Text-to-Video_Retrieval)\n* 图像-文本检索\n  * [How to Make Cross Encoder a Good Teacher for Efficient Image-Text 
Retrieval?](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_How_to_Make_Cross_Encoder_a_Good_Teacher_for_Efficient_CVPR_2024_paper.pdf)\n* Video-text retrieval\n  * [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Jin_MV-Adapter_Multimodal_Video_Transfer_Learning_for_Video_Text_Retrieval_CVPR_2024_paper.pdf)\n* Composed image retrieval\n  * [Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval](http://arxiv.org/abs/2404.15516)\n* Fine-grained image retrieval\n  * [You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval](https://arxiv.org/abs/2403.07222)\u003cbr\u003e:house:[project](https://subhadeepkoley.github.io/Sketch2Word)\n  * [Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Characteristics_Matching_Based_Hash_Codes_Generation_for_Efficient_Fine-grained_Image_CVPR_2024_paper.pdf)\n* Sketch-based retrieval\n  * [How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?](http://arxiv.org/abs/2403.07203v1)\u003cbr\u003e:star:[code](https://subhadeepkoley.github.io/AbstractAway)\n  * [Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers](http://arxiv.org/abs/2403.07214v1)\u003cbr\u003e:house:[project](https://subhadeepkoley.github.io/DiffusionZSSBIR)\n\n\u003ca name=\"41\"/\u003e\n\n## 41.Graph Generative Network(GNN/GCN)\n* GNN\n  * [Domain Separation Graph Neural Networks for Saliency Object Ranking](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Domain_Separation_Graph_Neural_Networks_for_Saliency_Object_Ranking_CVPR_2024_paper.pdf)\n  * [GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs](https://arxiv.org/abs/2405.06849)\n  * [FC-GNN: Recovering Reliable and Accurate Correspondences from 
Interferences](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_FC-GNN_Recovering_Reliable_and_Accurate_Correspondences_from_Interferences_CVPR_2024_paper.pdf)\n  * [DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching](https://arxiv.org/abs/2306.12547)\n  * [GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds](https://arxiv.org/abs/2312.00068) (graph generative network)\n* GCN\n  * [Learning for Transductive Threshold Calibration in Open-World Recognition](https://arxiv.org/abs/2305.12039)\n\n\u003ca name=\"40\"/\u003e\n\n## 40.Scene Graph Generation\n* [Leveraging Predicate and Triplet Learning for Scene Graph Generation](https://arxiv.org/abs/2406.02038)\n* [OED: Towards One-stage End-to-End Dynamic Scene Graph Generation](https://arxiv.org/abs/2405.16925)\n* [CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_CLIP-Driven_Open-Vocabulary_3D_Scene_Graph_Generation_via_Cross-Modality_Contrastive_Learning_CVPR_2024_paper.pdf)\n* [Multi-Level Neural Scene Graphs for Dynamic Urban Environments](http://arxiv.org/abs/2404.00168v1)\u003cbr\u003e:star:[code](https://tobiasfshr.github.io/pub/ml-nsg/)\n* [HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation](http://arxiv.org/abs/2403.12033v1)\u003cbr\u003e:house:[project](https://zhangce01.github.io/HiKER-SGG)\u003cbr\u003e:star:[code](https://github.com/zhangce01/HiKER-SGG)\n* [DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation](http://arxiv.org/abs/2403.14886v1)\u003cbr\u003e:star:[code](https://github.com/zeeshanhayder/DSGG)\u003cbr\u003e:house:[project](https://zeeshanhayder.github.io/DSGG/)\n* [From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models](http://arxiv.org/abs/2404.00906v1)\n* [EGTR: Extracting Graph from Transformer for Scene Graph 
Generation](http://arxiv.org/abs/2404.02072v1)\u003cbr\u003e:star:[code](https://github.com/naver-ai/egtr)\n* [LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation](http://arxiv.org/abs/2310.10404)\n\n\n\u003ca name=\"39\"/\u003e\n\n## 39.Motion Generation(动作生成)\n* [Programmable Motion Generation for Open-Set Motion Control Tasks](https://arxiv.org/abs/2405.19283)\n* [Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance](http://arxiv.org/abs/2403.18036v1)\n* [AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents](https://arxiv.org/abs/2403.12835)\n* [Towards Variable and Coordinated Holistic Co-Speech Motion Generation](http://arxiv.org/abs/2404.00368v1)\u003cbr\u003e:star:[code](https://feifeifeiliu.github.io/probtalk/)\n* [Generating Human Motion in 3D Scenes from Text Descriptions](http://arxiv.org/abs/2405.07784)根据文本描述生成 3D 场景中的人体运动\n* [NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis](https://arxiv.org/abs/2307.07511)\u003cbr\u003e:house:[project](https://nileshkulkarni.github.io/nifty)人体运动合成\n* [OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers](https://arxiv.org/abs/2312.08985)\u003cbr\u003e:house:[project](https://tr3e.github.io/omg-page)\n* [WANDR: Intention-guided Human Motion Generation](http://arxiv.org/abs/2404.15383)\u003cbr\u003e:tv:[video](https://www.youtube.com/watch?v=9szizM-XUCg)\n* [MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion](http://arxiv.org/abs/2310.14729)\u003cbr\u003e:house:[project](https://guytevet.github.io/mas-page/)\n* [Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action](http://arxiv.org/abs/2312.17172)\n* [Multimodal Sense-Informed Forecasting of 3D Human Motions](https://arxiv.org/abs/2405.02911)\n* 运动检索\n  * [Tri-Modal Motion Retrieval by Learning a Joint Embedding Space](http://arxiv.org/abs/2403.00691)\n* 动物运动\n  * 
[OmniMotionGPT: Animal Motion Generation with Limited Data](https://arxiv.org/abs/2311.18303)\u003cbr\u003e:star:[code](https://zshyang.github.io/omgpt-website/)\u003cbr\u003e:house:[project](https://zshyang.github.io/omgpt-website/)\n* Human motion prediction\n  * [MoML: Online Meta Adaptation for 3D Human Motion Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_MoML_Online_Meta_Adaptation_for_3D_Human_Motion_Prediction_CVPR_2024_paper.pdf)\n  * [MoST: Multi-Modality Scene Tokenization for Motion Prediction](http://arxiv.org/abs/2404.19531)\n  * [Rethinking Human Motion Prediction with Symplectic Integral](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_Rethinking_Human_Motion_Prediction_with_Symplectic_Integral_CVPR_2024_paper.pdf)\n  * [Human Motion Prediction Under Unexpected Perturbation](https://openaccess.thecvf.com/content/CVPR2024/papers/Yue_Human_Motion_Prediction_Under_Unexpected_Perturbation_CVPR_2024_paper.pdf)\n  * [Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy](https://openaccess.thecvf.com/content/CVPR2024/papers/Kang_Continual_Learning_for_Motion_Prediction_Model_via_Meta-Representation_Learning_and_CVPR_2024_paper.pdf)\n* Human motion estimation\n  * [MultiPhys: Multi-Person Physics-aware 3D Motion Estimation](https://arxiv.org/abs/2404.11987)\u003cbr\u003e:house:[project](http://www.iri.upc.edu/people/nugrinovic/multiphys/)\n  * [A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals](https://arxiv.org/abs/2404.04890) (human motion estimation)\n* Human motion reconstruction\n  * [RoHM: Robust Human Motion Reconstruction via Diffusion](http://arxiv.org/abs/2401.08570)\n\n\u003ca name=\"38\"/\u003e\n\n## 38.Visual Question Answering\n* [GRAM: Global Reasoning for Multi-Page VQA](https://arxiv.org/abs/2401.03411)\n* [SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning 
Capabilities](http://arxiv.org/abs/2401.12168)\u003cbr\u003e:house:[project](https://spatial-vlm.github.io/)\n* [Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering](http://arxiv.org/abs/2404.10193v1)\n* [How to Configure Good In-Context Sequence for Visual Question Answering](https://arxiv.org/abs/2312.01571)\u003cbr\u003e:star:[code](https://github.com/GaryJiajia/OFv2_ICL_VQA)\n* [Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models](https://arxiv.org/abs/2312.06685)\n* [Question Aware Vision Transformer for Multimodal Reasoning](http://arxiv.org/abs/2402.05472)\n* [OpenEQA: Embodied Question Answering in the Era of Foundation Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Majumdar_OpenEQA_Embodied_Question_Answering_in_the_Era_of_Foundation_Models_CVPR_2024_paper.pdf)\n* Video-QA\n  * [Grounded Question-Answering in Long Egocentric Videos](https://arxiv.org/abs/2312.06505)\u003cbr\u003e:star:[code](https://github.com/Becomebright/GroundVQA)\n  * [Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels](http://arxiv.org/abs/2403.14430v1) \n  * [Language-aware Visual Semantic Distillation for Video Question Answering](https://openaccess.thecvf.com/content/CVPR2024/papers/Zou_Language-aware_Visual_Semantic_Distillation_for_Video_Question_Answering_CVPR_2024_paper.pdf)\n  * [MoReVQA: Exploring Modular Reasoning Models for Video Question Answering](https://arxiv.org/abs/2404.06511)\n  * [Can I Trust Your Answer? 
Visually Grounded Video Question Answering](https://arxiv.org/abs/2309.01327)\u003cbr\u003e:star:[code](https://github.com/doc-doc/NExT-GQA)\n  * [Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering](https://openaccess.thecvf.com/content/CVPR2024/papers/Liao_Align_and_Aggregate_Compositional_Reasoning_with_Video_Alignment_and_Answer_CVPR_2024_paper.pdf)\n* 图表问答\n  * [CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_CoG-DQA_Chain-of-Guiding_Learning_with_Large_Language_Models_for_Diagram_Question_CVPR_2024_paper.pdf)\n  * [Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA](http://arxiv.org/abs/2403.16385v1)\n* 视觉文本问答\n  * [VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning](https://arxiv.org/abs/2303.02635)\n\n\u003ca name=\"37\"/\u003e\n\n## 37.OCR\n* 场景文本识别\n  * [OTE: Exploring Accurate Scene Text Recognition Using One Token](https://openaccess.thecvf.com/content/CVPR2024/papers/Xu_OTE_Exploring_Accurate_Scene_Text_Recognition_Using_One_Token_CVPR_2024_paper.pdf)\n  * [An Empirical Study of Scaling Law for Scene Text Recognition](https://arxiv.org/abs/2401.00028)\u003cbr\u003e:star:[code](https://github.com/large-ocr-model/large-ocr-model.github.io)场景文本识别\n  * [Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer](https://arxiv.org/abs/2311.13120)\u003cbr\u003e:star:[code](https://github.com/bytedance/E2STR)\n  * [Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction](https://openaccess.thecvf.com/content/CVPR2024/papers/Zheng_Kernel_Adaptive_Convolution_for_Scene_Text_Detection_via_Distance_Map_CVPR_2024_paper.pdf)\n  * [Choose What You Need: Disentangled Representation Learning for Scene Text Recognition Removal and 
Editing](https://arxiv.org/abs/2405.04377) (scene text recognition, removal, and editing)\n  * [ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting](https://arxiv.org/abs/2403.00303)\u003cbr\u003e:star:[code](https://github.com/PriNing/ODM)\n* Scene text image synthesis\n  * [Layout-Agnostic Scene Text Image Synthesis with Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhangli_Layout-Agnostic_Scene_Text_Image_Synthesis_with_Diffusion_Models_CVPR_2024_paper.pdf)\n* Scene text understanding\n  * [LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Liang_LayoutFormer_Hierarchical_Text_Detection_Towards_Scene_Text_Understanding_CVPR_2024_paper.pdf)\n* Chemical structure recognition\n  * [Atom-Level Optical Chemical Structure Recognition with Limited Supervision](https://arxiv.org/abs/2404.01743)\u003cbr\u003e:star:[code](https://github.com/molden/atomlenz)\n* Document chromaticity detection\n  * [CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_CMA_A_Chromaticity_Map_Adapter_for_Robust_Detection_of_Screen-Recapture_CVPR_2024_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/chenlewis/Chromaticity-Map-Adapter-for-DPAD)\n* Text detection\n  * [OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition](http://arxiv.org/abs/2403.19128v1)\u003cbr\u003e:star:[code](https://github.com/AlibabaResearch/AdvancedLiterateMachinery)\n  * [Bridging the Gap Between End-to-End and Two-Step Text Spotting](http://arxiv.org/abs/2404.04624v1)\u003cbr\u003e:star:[code](https://github.com/mxin262/Bridging-Text-Spotting)\n  * [Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis](https://arxiv.org/abs/2405.07481)\n* Document understanding\n  * [LayoutLLM: Layout Instruction Tuning with Large Language Models for Document 
Understanding](http://arxiv.org/abs/2404.05225v1)\u003cbr\u003e:star:[code](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LayoutLLM)\n  * [HRVDA: High-Resolution Visual Document Assistant](http://arxiv.org/abs/2404.06918v1)\n* 字体生成\n  * [Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Fu_Generate_Like_Experts_Multi-Stage_Font_Generation_by_Incorporating_Font_Transfer_CVPR_2024_paper.pdf)\n\n\u003ca name=\"36\"/\u003e\n\n## 36.4D Reconstruction(4D 重建)\n* [Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle](https://arxiv.org/abs/2312.03431)\u003cbr\u003e:house:[project](https://nju-3dv.github.io/projects/Gaussian-Flow)\n* [Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking](https://arxiv.org/abs/2401.06614)\u003cbr\u003e:house:[project](https://vveicao.github.io/projects/Motion2VecSets/)\n* [4D Gaussian Splatting for Real-Time Dynamic Scene Rendering](https://arxiv.org/abs/2310.08528)\u003cbr\u003e:star:[code](https://github.com/hustvl/4DGaussians)\u003cbr\u003e:house:[project](https://guanjunwu.github.io/4dgs/)\n* 文本和图像引导 4D 场景生成\n  * [A Unified Approach for Text- and Image-guided 4D Scene Generation](https://arxiv.org/abs/2311.16854)\u003cbr\u003e:house:[project](https://research.nvidia.com/labs/nxp/dream-in-4d/)\n* 4D视图合成\n  * [4K4D: Real-Time 4D View Synthesis at 4K Resolution](https://arxiv.org/abs/2310.11448)\u003cbr\u003e:star:[code](https://github.com/zju3dv/4K4D)\u003cbr\u003e:house:[project](https://zju3dv.github.io/4k4d/)\n* 语言到 4D 建模\n  * [L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud 
Stream](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_L4D-Track_Language-to-4D_Modeling_Towards_6-DoF_Tracking_and_Shape_Reconstruction_in_CVPR_2024_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/S-JingTao/L4D_Track)\n\n\u003ca name=\"35\"/\u003e\n\n## 35.Scene Understanding\n* [Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Omni-Q_Omni-Directional_Scene_Understanding_for_Unsupervised_Visual_Grounding_CVPR_2024_paper.pdf)\n* [PanoContext-Former: Panoramic Total Scene Understanding with a Transformer](https://openaccess.thecvf.com/content/CVPR2024/papers/Dong_PanoContext-Former_Panoramic_Total_Scene_Understanding_with_a_Transformer_CVPR_2024_paper.pdf)\n* [DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving](https://arxiv.org/abs/2405.04390)\n* [OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies](https://arxiv.org/abs/2405.05259)\u003cbr\u003e:star:[code](https://github.com/ldkong1205/OpenESS)\n* [A Category Agnostic Model for Visual Rearrangment](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_A_Category_Agnostic_Model_for_Visual_Rearrangment_CVPR_2024_paper.pdf)\u003cbr\u003e:thumbsup:[VIPL](https://vipl.ict.ac.cn/news/research/202403/t20240315_207758.html)\n* [360+x: A Panoptic Multi-modal Scene Understanding Dataset](http://arxiv.org/abs/2404.00989v1)\u003cbr\u003e:star:[code](https://x360dataset.github.io)\n* Open-vocabulary scene understanding\n  * [Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding](https://arxiv.org/abs/2311.18482)\n* 3D scene understanding\n  * [HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting](https://arxiv.org/abs/2403.12722)\u003cbr\u003e:house:[project](https://xdimlab.github.io/hugs_website)\n  * [SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field](https://arxiv.org/abs/2403.14366)\n  * [GSNeRF: Generalizable Semantic Neural Radiance 
Fields with Enhanced 3D Scene Understanding](http://arxiv.org/abs/2403.03608v1)\n  * [GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_GP-NeRF_Generalized_Perception_NeRF_for_Context-Aware_3D_Scene_Understanding_CVPR_2024_paper.pdf)\n  * [RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding](https://arxiv.org/abs/2304.00962)\u003cbr\u003e:house:[project](https://jihanyang.github.io/projects/RegionPLC)\n  * [GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding](http://arxiv.org/abs/2403.09639v1)\u003cbr\u003e:star:[code](https://github.com/dvlab-research/GroupContrast)\n  * [SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Delitzas_SceneFun3D_Fine-Grained_Functionality_and_Affordance_Understanding_in_3D_Scenes_CVPR_2024_paper.pdf)\n\n\u003ca name=\"34\"/\u003e\n\n## 34.Human–Computer Interaction(人机交互)\n* [Exploring Pose-Aware Human-Object Interaction via Hybrid Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Wu_Exploring_Pose-Aware_Human-Object_Interaction_via_Hybrid_Learning_CVPR_2024_paper.pdf)\n* [Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Bilateral_Adaptation_for_Human-Object_Interaction_Detection_with_Occlusion-Robustness_CVPR_2024_paper.pdf)\n* [Scaling Up Dynamic Human-Scene Interaction Modeling](https://arxiv.org/abs/2403.08629)\u003cbr\u003e:star:[code](https://huggingface.co/spaces/jnnan/trumans/tree/main)\u003cbr\u003e:house:[project](https://jnnan.github.io/trumans/)\n* [ReGenNet: Towards Human Action-Reaction Synthesis](http://arxiv.org/abs/2403.11882v1)\u003cbr\u003e:star:[code](https://liangxuy.github.io/ReGenNet/)\n* [DRESS: Instructing Large Vision-Language Models to Align and 
Interact with Humans via Natural Language Feedback](https://arxiv.org/pdf/2311.10081.pdf)\u003cbr\u003e:star:[code](https://huggingface.co/datasets/YangyiYY/LVLM_NLF)\n* [HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_HOI-M3_Capture_Multiple_Humans_and_Objects_Interaction_within_Contextual_Environment_CVPR_2024_paper.pdf)\n* [GenZI: Zero-Shot 3D Human-Scene Interaction Generation](http://arxiv.org/abs/2311.17737)\n* [Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection](http://arxiv.org/abs/2404.06194)\n* Human Motion Tracking\n  * [HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations](http://arxiv.org/abs/2403.03561v1)\u003cbr\u003e:star:[code](https://pico-ai-team.github.io/hmd-poser)\u003cbr\u003e:house:[project](https://pico-ai-team.github.io/hmd-poser)\n* Novel Motion Synthesis\n  * [PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics](https://arxiv.org/abs/2311.12198)\u003cbr\u003e:star:[code](https://github.com/XPandora/PhysGaussian)\u003cbr\u003e:house:[project](https://xpandora.github.io/PhysGaussian/)\n* Hand Interaction\n  * [InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion](http://arxiv.org/abs/2403.17422v1)\u003cbr\u003e:star:[code](https://jyunlee.github.io/projects/interhandgen/)\n  * [HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data](http://arxiv.org/abs/2403.12011)\n  * [Physics-Aware Hand-Object Interaction Denoising](http://arxiv.org/abs/2405.11481)\n  * [HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video](https://arxiv.org/abs/2311.18448)\u003cbr\u003e:star:[code](https://github.com/zc-alexfan/hold)\u003cbr\u003e:house:[project](https://zc-alexfan.github.io/hold)\n  * [GEARS: Local Geometry-aware Hand-object Interaction Synthesis](https://arxiv.org/abs/2404.01758)\n  * [TACO: Benchmarking Generalizable Bimanual 
Tool-ACtion-Object Understanding](https://arxiv.org/abs/2401.08399)\u003cbr\u003e:house:[project](https://taco2024.github.io/)\n  * [Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction](http://arxiv.org/abs/2404.00562v1)\u003cbr\u003e:star:[code](https://github.com/JunukCha/Text2HOI)\n  * [G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis](http://arxiv.org/abs/2404.12383v1)\u003cbr\u003e:star:[code](https://judyye.github.io/ghop-www)\n  * [MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision](http://arxiv.org/abs/2310.11696)\n  * [HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild](https://openaccess.thecvf.com/content/CVPR2024/papers/Narasimhaswamy_HOIST-Former_Hand-held_Objects_Identification_Segmentation_and_Tracking_in_the_Wild_CVPR_2024_paper.pdf)\n* Human-Object Interaction\n  * [Discovering Syntactic Interaction Clues for Human-Object Interaction Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Luo_Discovering_Syntactic_Interaction_Clues_for_Human-Object_Interaction_Detection_CVPR_2024_paper.pdf)\n  * [Open-World Human-Object Interaction Detection via Multi-modal Prompts](https://arxiv.org/abs/2406.07221)\n  * [LEMON: Learning 3D Human-Object Interaction Relation from 2D Images](https://arxiv.org/pdf/2312.08963.pdf)\u003cbr\u003e:star:[code](https://github.com/yyvhang/lemon_3d)\u003cbr\u003e:house:[project](https://yyvhang.github.io/LEMON/)\n  * [Disentangled Pre-training for Human-Object Interaction Detection](http://arxiv.org/abs/2404.01725v1)\u003cbr\u003e:star:[code](https://github.com/xingaoli/DP-HOI)\n  * [GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation](https://arxiv.org/abs/2401.00929)\u003cbr\u003e:house:[project](https://genh2r.github.io/)\n  * [Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction 
Recognition](http://arxiv.org/abs/2405.09931)\u003cbr\u003e:house:[project](https://yuchen2199.github.io/Interactive-Gaze/)\n  * [Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation](https://arxiv.org/abs/2312.07063)\u003cbr\u003e:house:[project](https://virtualhumans.mpi-inf.mpg.de/procigen-hdm)\n  * 3D Human-Object Interaction\n    * [I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions](https://arxiv.org/abs/2312.08869)\u003cbr\u003e:house:[project](https://afterjourney00.github.io/IM-HOI.github.io/)\n    * [CG-HOI: Contact-Guided 3D Human-Object Interaction Generation](https://arxiv.org/abs/2311.16097)\u003cbr\u003e:house:[project](https://cg-hoi.christian-diller.de/)\n* Human-Human Interaction\n  * [Inter-X: Towards Versatile Human-Human Interaction Analysis](https://arxiv.org/abs/2312.16051)\u003cbr\u003e:star:[code](https://github.com/liangxuy/Inter-X)\u003cbr\u003e:house:[project](https://liangxuy.github.io/inter-x/)\u003cbr\u003e:thumbsup:[3D Digital Human Reconstruction, Editing, and Animation (slides, in Chinese)](https://valser.org/webinar/slide/slides/20240403/Valse20240403%E6%99%8F%E8%BD%B6%E8%B6%85.pdf)\n\n\n\n\u003ca name=\"33\"/\u003e\n\n## 33.NeRF\n* [GARField: Group Anything with Radiance Fields](http://arxiv.org/abs/2401.09419)\n* [IReNe: Instant Recoloring of Neural Radiance Fields](https://openaccess.thecvf.com/content/CVPR2024/papers/Mazzucchelli_IReNe_Instant_Recoloring_of_Neural_Radiance_Fields_CVPR_2024_paper.pdf)\n* [PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF](https://openaccess.thecvf.com/content/CVPR2024/papers/Feng_PIE-NeRF_Physics-based_Interactive_Elastodynamics_with_NeRF_CVPR_2024_paper.pdf)\n* [LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes](http://arxiv.org/abs/2405.00900)\n* [SIGNeRF: Scene Integrated Generation for Neural Radiance Fields](http://arxiv.org/abs/2401.01647)\n* [NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal 
Compensation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_NC-SDF_Enhancing_Indoor_Scene_Reconstruction_Using_Neural_SDFs_with_View-Dependent_CVPR_2024_paper.pdf)\n* [SpecNeRF: Gaussian Directional Encoding for Specular Reflections](http://arxiv.org/abs/2312.13102)\n* [PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_PaReNeRF_Toward_Fast_Large-scale_Dynamic_NeRF_with_Patch-based_Reference_CVPR_2024_paper.pdf)\n* [Global and Hierarchical Geometry Consistency Priors for Few-shot NeRFs in Indoor Scenes](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_Global_and_Hierarchical_Geometry_Consistency_Priors_for_Few-shot_NeRFs_in_CVPR_2024_paper.pdf)\u003cbr\u003e:thumbsup:[abstract (in Chinese)](https://informatics.xmu.edu.cn/info/1053/36349.htm)\n* [NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs](http://arxiv.org/abs/2402.08622)\n* [Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling](https://arxiv.org/abs/2405.14847)\u003cbr\u003e:star:[code](https://github.com/lwwu2/nde)\n* [Accelerating Neural Field Training via Soft Mining](http://arxiv.org/abs/2312.00075)\n* [Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling](https://arxiv.org/abs/2406.03723)\u003cbr\u003e:house:[project](https://merl.com/research/highlights/gear-nerf)\n* [How Far Can We Compress Instant-NGP-Based NeRF?](https://arxiv.org/abs/2406.04101)\u003cbr\u003e:star:[code](https://github.com/yihangchen-ee/cnc/)\u003cbr\u003e:house:[project](https://yihangchen-ee.github.io/project_cnc/)\n* [BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction](http://arxiv.org/abs/2404.13024)\u003cbr\u003e:star:[code](https://theialab.github.io/banf/)\u003cbr\u003e:house:[project](https://theialab.github.io/banf/)\n* [Tactile-Augmented Radiance 
Fields](https://arxiv.org/abs/2405.04534)\u003cbr\u003e:star:[code](https://github.com/Dou-Yiming/TaRF/)\u003cbr\u003e:house:[project](https://dou-yiming.github.io/TaRF)\n* [NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild](https://arxiv.org/abs/2405.18715)\u003cbr\u003e:house:[project](https://nerf-on-the-go.github.io/)\n* [L0-Sampler: An L0 Model Guided Volume Sampling for NeRF](https://arxiv.org/abs/2311.07044)\u003cbr\u003e:house:[project](https://ustc3dv.github.io/L0-Sampler/)\n* [HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses](https://arxiv.org/abs/2312.02232)\n* [Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields](https://arxiv.org/abs/2311.11845)\u003cbr\u003e:star:[code](https://github.com/tatakai1/EVENeRF)\n* [NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation](https://arxiv.org/abs/2404.02185)\n* [MuRF: Multi-Baseline Radiance Fields](https://arxiv.org/abs/2312.04565)\u003cbr\u003e:house:[project](https://haofeixu.github.io/murf/)\n* [InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields](https://arxiv.org/abs/2305.15094)\u003cbr\u003e:house:[project](https://ivrl.github.io/InNeRF360)\n* [NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors](https://arxiv.org/abs/2403.03122)\u003cbr\u003e:star:[code](https://github.com/hynann/NRDF)\u003cbr\u003e:house:[project](https://virtualhumans.mpi-inf.mpg.de/nrdf/)\n* [Neural Fields as Distributions: Signal Processing Beyond Euclidean Space](https://openaccess.thecvf.com/content/CVPR2024/papers/Rebain_Neural_Fields_as_Distributions_Signal_Processing_Beyond_Euclidean_Space_CVPR_2024_paper.pdf)\u003cbr\u003e:house:[project](https://ubc-vision.github.io/nfd/)\n* [CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse 
Inputs](http://arxiv.org/abs/2403.16885v1)\u003cbr\u003e:star:[code](https://zhongyingji.github.io/CVT-xRF)\n* [DaReNeRF: Direction-aware Representation for Dynamic Scenes](http://arxiv.org/abs/2403.02265v1)\n* [Geometry Transfer for Stylizing Radiance Fields](https://arxiv.org/abs/2402.00863)\u003cbr\u003e:house:[project](https://hyblue.github.io/geo-srf/)\n* [S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes](http://arxiv.org/abs/2403.06205v1)\u003cbr\u003e:star:[code](https://xingyi-li.github.io/s-dyrf/)\n* [SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream](http://arxiv.org/abs/2403.11222v1)\u003cbr\u003e:star:[code](https://github.com/BIT-Vision/SpikeNeRF)\n* [Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes](http://arxiv.org/abs/2403.16141v1)\u003cbr\u003e:star:[code](https://otonari726.github.io/entitynerf/)\n* [Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates](https://arxiv.org/abs/2309.11281)\u003cbr\u003e:star:[code](https://github.com/kcshum/pose-conditioned-NeRF-object-fusion)\n* [LAENeRF: Local Appearance Editing for Neural Radiance Fields](https://arxiv.org/abs/2312.09913)\u003cbr\u003e:star:[code](https://github.com/r4dl/LAENeRF)\u003cbr\u003e:house:[project](https://r4dl.github.io/LAENeRF/)\n* [Single View Refractive Index Tomography with Neural Fields](http://arxiv.org/abs/2309.04437)\n* [ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models](https://arxiv.org/abs/2406.06133)\n* [TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video](https://arxiv.org/abs/2312.06713)\n* [NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation](http://arxiv.org/abs/2403.17537v1)\u003cbr\u003e:star:[code](https://cnhaox.github.io/NeRF-HuGS/)\n* [Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric 
Consistency](http://arxiv.org/abs/2403.17638v1)\u003cbr\u003e:star:[code](https://github.com/HKCLynn/ReVoRF)\n* [Grounding and Enhancing Grid-based Models for Neural Fields](http://arxiv.org/abs/2403.20002v1)\u003cbr\u003e:house:[project](https://sites.google.com/view/cvpr24-2034-submission/home)\n* [Mitigating Motion Blur in Neural Radiance Fields with Events and Frames](http://arxiv.org/abs/2403.19780v1)\n* [OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos](http://arxiv.org/abs/2404.00676v1)\n* [Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects](http://arxiv.org/abs/2404.01440v1)\u003cbr\u003e:star:[code](https://github.com/NVlabs/DigitalTwinArt)\n* [Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields](https://openaccess.thecvf.com/content/CVPR2024/papers/Goli_Bayes_Rays_Uncertainty_Quantification_for_Neural_Radiance_Fields_CVPR_2024_paper.pdf)\n* [Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields](http://arxiv.org/abs/2404.02155v1)\u003cbr\u003e:house:[project](https://pals.ttic.edu/p/alpha-invariance)\n* [Dynamic LiDAR Re-simulation using Compositional Neural Fields](https://arxiv.org/abs/2312.05247)\u003cbr\u003e:house:[project](https://shengyuh.github.io/dynfl)\n* [SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields](https://arxiv.org/abs/2311.15803)\u003cbr\u003e:house:[project](https://qherau.github.io/SOAC/)\n* [ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization](https://arxiv.org/abs/2401.08937)\n* [NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows](https://openaccess.thecvf.com/content/CVPR2024/papers/Tang_NeRFDeformer_NeRF_Transformation_from_a_Single_View_via_3D_Scene_CVPR_2024_paper.pdf)\n* Novel View Synthesis\n  * [ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image](http://arxiv.org/abs/2310.17994)\n  * [Unifying Correspondence Pose and NeRF 
for Generalized Pose-Free Novel View Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/Hong_Unifying_Correspondence_Pose_and_NeRF_for_Generalized_Pose-Free_Novel_View_CVPR_2024_paper.pdf)\n  * [NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis](https://openaccess.thecvf.com/content/CVPR2024/papers/You_NeLF-Pro_Neural_Light_Field_Probes_for_Multi-Scale_Novel_View_Synthesis_CVPR_2024_paper.pdf)\n  * [G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images](http://arxiv.org/abs/2404.07474v1)\n  * [MultiDiff: Consistent Novel View Synthesis from a Single Image](https://openaccess.thecvf.com/content/CVPR2024/papers/Muller_MultiDiff_Consistent_Novel_View_Synthesis_from_a_Single_Image_CVPR_2024_paper.pdf)\n  * [Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis](https://arxiv.org/abs/2401.02436)\n  * [DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis](https://arxiv.org/abs/2312.13016)\n  * [3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis](https://arxiv.org/abs/2404.06270)\n  * [Generalizable Novel-View Synthesis using a Stereo Camera](http://arxiv.org/abs/2404.13541)\u003cbr\u003e:house:[project](https://jinwonjoon.github.io/stereonerf/)\n  * [DART: Implicit Doppler Tomography for Radar Novel View Synthesis](http://arxiv.org/abs/2403.03896v1)\u003cbr\u003e:house:[project](https://wiselabcmu.github.io/dart/)\n  * [XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold](http://arxiv.org/abs/2403.19517v1)\n  * [Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis](https://arxiv.org/abs/2312.16812)\u003cbr\u003e:star:[code](https://github.com/oppo-us-research/SpacetimeGaussians)\u003cbr\u003e:house:[project](https://oppo-us-research.github.io/SpacetimeGaussians-website/)\n  * [NViST: In the Wild New 
View Synthesis from a Single Image with Transformers](https://arxiv.org/abs/2312.08568)\u003cbr\u003e:star:[code](https://github.com/wbjang/nvist_official)\u003cbr\u003e:house:[project](https://wbjang.github.io/nvist_webpage/)\n  * [ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models](https://arxiv.org/abs/2312.01305)\u003cbr\u003e:house:[project](https://jgkwak95.github.io/ViVid-1-to-3/)\n  * [SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes](https://arxiv.org/abs/2312.14937)\u003cbr\u003e:star:[code](https://github.com/yihua7/SC-GS)\u003cbr\u003e:house:[project](https://yihua7.github.io/SC-GS-web/)\n  * [Neural Visibility Field for Uncertainty-Driven Active Mapping](https://arxiv.org/abs/2406.06948)\u003cbr\u003e:house:[project](https://sites.google.com/view/nvf-cvpr24/)\n  * [EscherNet: A Generative Model for Scalable View Synthesis](https://arxiv.org/abs/2402.03908)\u003cbr\u003e:star:[code](https://github.com/kxhit/EscherNet)\u003cbr\u003e:house:[project](https://kxhit.github.io/EscherNet)\n  * [GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis](https://arxiv.org/pdf/2312.02155.pdf)\u003cbr\u003e:star:[code](https://github.com/ShunyuanZheng/GPS-Gaussian)\u003cbr\u003e:house:[project](https://shunyuanzheng.github.io/GPS-Gaussian)\n  * [DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization](http://arxiv.org/abs/2403.06912v1)\u003cbr\u003e:star:[code](https://github.com/Fictionarry/DNGaussian)\u003cbr\u003e:house:[project](https://fictionarry.github.io/DNGaussian/)\n  * [LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis](https://arxiv.org/abs/2404.02742)\u003cbr\u003e:star:[code](https://github.com/ispc-lab/LiDAR4D)\u003cbr\u003e:house:[project](https://dyfcalid.github.io/LiDAR4D)\n  * [Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?](http://arxiv.org/abs/2403.06092v1)\n  * 
[Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models](https://github.com/Q-Future/Q-Instruct/tree/main/fig/Q_Instruct_v0_1_preview.pdf)\u003cbr\u003e:star:[code](https://huggingface.co/datasets/teowu/Q-Instruct)\u003cbr\u003e:house:[project](https://q-future.github.io/Q-Instruct/)\n  * [CoPoNeRF: Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs](https://arxiv.org/abs/2312.07246)\u003cbr\u003e:star:[code](https://github.com/KU-CVLAB/CoPoNeRF)\u003cbr\u003e:house:[project](https://ku-cvlab.github.io/CoPoNeRF/)\n  * [EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion](https://arxiv.org/abs/2312.06725)\u003cbr\u003e:star:[code](https://github.com/huanngzh/EpiDiff)\u003cbr\u003e:house:[project](https://huanngzh.github.io/EpiDiff/)\n  * [Free3D: Consistent Novel View Synthesis without 3D Representation](https://arxiv.org/abs/2312.04551)\u003cbr\u003e:star:[code](https://github.com/lyndonzheng/Free3D)\u003cbr\u003e:house:[project](https://chuanxiaz.com/free3d/)\n  * [Novel View Synthesis with View-Dependent Effects from a Single Image](https://arxiv.org/abs/2312.08071)\u003cbr\u003e:house:[project](https://kaist-viclab.github.io/monovde-site)\n* Rendering\n  * [NeRF Director: Revisiting View Selection in Neural Volume Rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_NeRF_Director_Revisiting_View_Selection_in_Neural_Volume_Rendering_CVPR_2024_paper.pdf)\n  * [Multiplane Prior Guided Few-Shot Aerial Scene Rendering](https://arxiv.org/abs/2406.04961)\n  * [Differentiable Point-based Inverse Rendering](https://arxiv.org/abs/2312.02480)\n  * [Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance](https://arxiv.org/abs/2312.04529)\n  * [Perceptual Assessment and Optimization of HDR Image 
Rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Cao_Perceptual_Assessment_and_Optimization_of_HDR_Image_Rendering_CVPR_2024_paper.pdf)\n  * [Global Latent Neural Rendering](https://arxiv.org/abs/2312.08338)\n  * [Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields](https://arxiv.org/abs/2404.17528)\u003cbr\u003e:star:[code](https://github.com/TQTQliu/GeFu)\u003cbr\u003e:house:[project](https://gefucvpr24.github.io/)\n  * [GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering](https://arxiv.org/abs/2402.10128)\u003cbr\u003e:house:[project](https://abdullahamdi.com/ges)\n  * [Real-time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination](https://openaccess.thecvf.com/content/CVPR2024/papers/Zeng_Real-time_Acquisition_and_Reconstruction_of_Dynamic_Volumes_with_Neural_Structured_CVPR_2024_paper.pdf)\u003cbr\u003e:house:[project](https://svbrdf.github.io/publications/realtimedynamic/project.html)\u003cbr\u003e:tv:[video](https://www.youtube.com/watch?v=XoTYTGSueh4)\u003cbr\u003e:thumbsup:[Zhejiang University achieves real-time acquisition and reconstruction of dynamic 3D phenomena with neural structured illumination (in Chinese)](https://mp.weixin.qq.com/s/cUnFIaL4xLaHBOWpNcI7Yg)\n  * [Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields](http://arxiv.org/abs/2403.16224v1)\u003cbr\u003e:house:[project](https://whyy.site/paper/nep)\n  * [Dr.Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering](https://arxiv.org/abs/2308.08843)\u003cbr\u003e:house:[project](https://shengcn.github.io/DrBokeh/)\n  * [HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting](https://arxiv.org/abs/2312.03461)\u003cbr\u003e:thumbsup:[HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussians (in Chinese)](https://cloud.tencent.com/developer/article/2383180)\n  * [ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering](https://arxiv.org/abs/2312.05941)\u003cbr\u003e:house:[project](https://vcai.mpi-inf.mpg.de/projects/ash/)\n  * [SHINOBI: Shape and 
Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild](https://arxiv.org/abs/2401.10171)\u003cbr\u003e:house:[project](https://shinobi.aengelhardt.com/)\n  * [LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering](https://openaccess.thecvf.com/content/CVPR2024/papers/Choi_LTM_Lightweight_Textured_Mesh_Extraction_and_Refinement_of_Large_Unbounded_CVPR_2024_paper.pdf)\n  * [HashPoint: Accelerated Point Searching and Sampling for Neural Rendering](https://export.arxiv.org/abs/2404.14044)\u003cbr\u003e:house:[project](https://jiahao-ma.github.io/hashpoint/)\n  * [HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces](https://arxiv.org/abs/2312.03160)\u003cbr\u003e:house:[project](https://haithemturki.com/hybrid-nerf/)\n  * [DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling](https://arxiv.org/abs/2402.08876)\u003cbr\u003e:star:[code](https://github.com/LIA-DiTella/DiffUDF)\u003cbr\u003e:house:[project](https://lia-ditella.github.io/DUDF/)\n  * [Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras](https://arxiv.org/abs/2312.07423)\u003cbr\u003e:house:[project](https://vcai.mpi-inf.mpg.de/projects/holochar/)\n  * [ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis](https://arxiv.org/abs/2311.17123)\u003cbr\u003e:house:[project](https://gaoxiangjun.github.io/contex_human/)\n* Multi-View Inverse Rendering\n  * [VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources](https://openaccess.thecvf.com/content/CVPR2024/papers/Fei_VMINer_Versatile_Multi-view_Inverse_Rendering_with_Near-_and_Far-field_Light_CVPR_2024_paper.pdf)\n* Object Reconstruction\n  * [Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction](https://arxiv.org/abs/2312.01196)\u003cbr\u003e:house:[project](https://geometric-rl.mpi-inf.mpg.de/npg)\n  * [SAOR: Single-View 
Articulated Object Reconstruction](https://arxiv.org/abs/2303.13514)\u003cbr\u003e:house:[project](https://mehmetaygun.github.io/saor)\n\n\n\n\n\u003ca name=\"32\"/\u003e\n\n## 32.NLP(自然语言处理)\n* [Describing Differences in Image Sets with Natural Language](http://arxiv.org/abs/2312.02974)\n* Entity Recognition\n  * [A Generative Approach for Wikipedia-Scale Visual Entity Recognition](http://arxiv.org/abs/2403.02041v1)\n* Prompt Learning\n  * [BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP](https://arxiv.org/abs/2311.16194)\n  * [Active Prompt Learning in Vision Language Models](https://arxiv.org/abs/2311.11178)\u003cbr\u003e:star:[code](https://github.com/kaist-dmlab/pcb)\n  * [Domain Prompt Learning with Quaternion Networks](https://arxiv.org/abs/2312.08878)\n  * [On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?](http://arxiv.org/abs/2405.02266)\n  * [ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection](http://arxiv.org/abs/2311.15243)\n* Foundation Models\n  * [Asymmetric Masked Distillation for Pre-Training Small Foundation Models](https://arxiv.org/abs/2311.03149)\u003cbr\u003e:star:[code](https://github.com/MCG-NJU/AMD)\n  * [Bootstrapping SparseFormers from Vision Foundation Models](https://arxiv.org/abs/2312.01987)\u003cbr\u003e:star:[code](https://github.com/showlab/sparseformer)\n\n\u003ca name=\"31\"/\u003e\n\n## 31.Edge Detection(边缘检测)\n* [MuGE: Multiple Granularity Edge Detection](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_MuGE_Multiple_Granularity_Edge_Detection_CVPR_2024_paper.pdf)\n* [RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses](http://arxiv.org/abs/2403.01795v1)\u003cbr\u003e:star:[code](https://ranked-cvpr24.github.io)\n\n\u003ca name=\"30\"/\u003e\n\n## 30.Person Re-Identification(人员重识别)\n* [Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person 
Views](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Fusing_Personal_and_Environmental_Cues_for_Identification_and_Segmentation_of_CVPR_2024_paper.pdf)\n* [Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception](https://arxiv.org/abs/2311.13793)\n* Pedestrian Detection\n  * [DAP: A Dynamic Adversarial Patch for Evading Person Detectors](https://arxiv.org/abs/2305.11618)\n  * [Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection](http://arxiv.org/abs/2403.01300v1)\u003cbr\u003e:star:[code](https://github.com/ssbin0914/Causal-Mode-Multiplexer)\n  * [WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion](http://arxiv.org/abs/2403.19022)\n  * Text-Based Person Retrieval\n    * [UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity](https://arxiv.org/abs/2312.03441)\u003cbr\u003e:star:[code](https://github.com/Zplusdragon/UFineBench)\n* Crowd Counting\n  * [Single Domain Generalization for Crowd Counting](http://arxiv.org/abs/2403.09124v1)\u003cbr\u003e:star:[code](https://github.com/Shimmer93/MPCount)\n  * [CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models](https://arxiv.org/abs/2303.12790)\u003cbr\u003e:house:[project](https://dylran.github.io/crowddiff.github.io)\n  * [Regressor-Segmenter Mutual Prompt Learning for Crowd Counting](https://arxiv.org/abs/2312.01711)\n* Pedestrian Attribute Detection\n  * [Learning Group Activity Features Through Person Attribute Prediction](https://arxiv.org/abs/2403.02753)\u003cbr\u003e:star:[code](https://github.com/chihina/GAFL-CVPR2024)\u003cbr\u003e:house:[project](https://www.toyota-ti.ac.jp/Lab/Denshi/iim/ukita/selection/CVPR2024-GAFL.html)\n* Re-Identification\n  * [SEAS: ShapE-Aligned Supervision for Person Re-Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_SEAS_ShapE-Aligned_Supervision_for_Person_Re-Identification_CVPR_2024_paper.pdf)\n  * [Learning Continual Compatible Representation for 
Re-indexing Free Lifelong Person Re-identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Cui_Learning_Continual_Compatible_Representation_for_Re-indexing_Free_Lifelong_Person_Re-identification_CVPR_2024_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/PKU-ICST-MIPL/C2R_CVPR2024)\n  * [View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network](http://arxiv.org/abs/2403.14513v1)\u003cbr\u003e:star:[code](https://github.com/LinlyAC/VDT-AGPReID)\n  * [CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification](https://arxiv.org/abs/2311.10605)\n  * [Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability](https://openaccess.thecvf.com/content/CVPR2024/papers/Huang_Attribute-Guided_Pedestrian_Retrieval_Bridging_Person_Re-ID_with_Internal_Attribute_Variability_CVPR_2024_paper.pdf)\n  * [All in One Framework for Multimodal Re-identification in the Wild](https://arxiv.org/abs/2405.04741)\n  * [A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_A_Pedestrian_is_Worth_One_Prompt_Towards_Language_Guidance_Person_CVPR_2024_paper.pdf)\n  * [Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification](https://zhoujiahuan1991.github.io/pub/CVPR2024_DKP.pdf)\n  * [Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions](https://arxiv.org/abs/2306.07520)\u003cbr\u003e:star:[code](https://github.com/hwz-zju/Instruct-ReID)\n  * LiDAR-Based Re-ID\n    * [LiDAR-based Person Re-identification](https://arxiv.org/abs/2312.03033)\n  * Visible-Infrared Person Re-Identification\n    * [Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification](http://arxiv.org/abs/2403.11708v1)\u003cbr\u003e:star:[code](https://github.com/1KK077/IDKL)\n    * [Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person 
Re-Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Yang_Shallow-Deep_Collaborative_Learning_for_Unsupervised_Visible-Infrared_Person_Re-Identification_CVPR_2024_paper.pdf)\n  * Text-to-Image Re-Identification\n    * [Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID](https://arxiv.org/abs/2405.04940)\n    * [Noisy-Correspondence Learning for Text-to-Image Person Re-identification](https://arxiv.org/abs/2308.09911)\u003cbr\u003e:star:[code](https://github.com/QinYang79/RDE)\n* Gait Recognition\n  * [Learning Visual Prompt for Gait Recognition](https://openaccess.thecvf.com/content/CVPR2024/papers/Ma_Learning_Visual_Prompt_for_Gait_Recognition_CVPR_2024_paper.pdf)\n  * [BigGait: Learning Gait Representation You Want by Large Vision Models](https://arxiv.org/abs/2402.19122)\u003cbr\u003e:star:[code](https://github.com/ShiqiYu/OpenGait)\n\n\u003ca name=\"29\"/\u003e\n\n## 29.Model Compression/Knowledge Distillation/Pruning(模型压缩/知识蒸馏/剪枝)\n* MC\n  * [Dense Vision Transformer Compression with Few Samples](http://arxiv.org/abs/2403.18708v1)\n* KD\n  * [Small Scale Data-Free Knowledge Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Small_Scale_Data-Free_Knowledge_Distillation_CVPR_2024_paper.pdf)\n  * [KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_KD-DETR_Knowledge_Distillation_for_Detection_Transformer_with_Consistent_Distillation_Points_CVPR_2024_paper.pdf)\n  * [Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation](http://arxiv.org/abs/2404.07933)\n  * [Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities](http://arxiv.org/abs/2404.16456)\n  * [C2KD: Bridging the Modality Gap for Cross-Modal Knowledge 
Distillation](https://openaccess.thecvf.com/content/CVPR2024/papers/Huo_C2KD_Bridging_the_Modality_Gap_for_Cross-Modal_Knowledge_Distillation_CVPR_2024_paper.pdf)\n  * [CrossKD: Cross-Head Knowledge Distillation for Object Detection](http://arxiv.org/abs/2306.11369)\n  * [CLIP-KD: An Empirical Study of CLIP Model Distillation](https://arxiv.org/abs/2307.12732)\u003cbr\u003e:star:[code](https://github.com/winycg/CLIP-KD)\n  * [Aligning Logits Generatively for Principled Black-Box Knowledge Distillation](https://arxiv.org/abs/2205.10490)\n  * [FreeKD: Knowledge Distillation via Semantic Frequency Prompt](https://arxiv.org/abs/2311.12079)\n  * [Logit Standardization in Knowledge Distillation](http://arxiv.org/abs/2403.01427v1)\n  * [$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections](http://arxiv.org/abs/2403.06213v1)\u003cbr\u003e:star:[code](https://github.com/roymiles/vkd)\n  * [Scale Decoupled Distillation](http://arxiv.org/abs/2403.13512v1)\u003cbr\u003e:star:[code](https://github.com/shicaiwei123/SDD-CVPR2024)\n  * [NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation](https://arxiv.org/abs/2310.00258v2)\u003cbr\u003e:star:[code](https://github.com/tmtuan1307/nayer)\n  * [De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts](http://arxiv.org/abs/2403.19539v1)\n  * [PromptKD: Unsupervised Prompt Distillation for Vision-Language Models](https://arxiv.org/abs/2403.02781)\u003cbr\u003e:star:[code](https://github.com/zhengli97/PromptKD)\u003cbr\u003e:house:[project](https://zhengli97.github.io/PromptKD/)\u003cbr\u003e:thumbsup:[explainer (in Chinese)](https://zhengli97.github.io/PromptKD/chinese_interpertation.html)\n* Pruning\n  * [Device-Wise Federated Network Pruning](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_Device-Wise_Federated_Network_Pruning_CVPR_2024_paper.pdf)\n  * [FedMef: Towards Memory-efficient Federated Dynamic Pruning](http://arxiv.org/abs/2403.14737)\n  * [OrthCaps: An 
Orthogonal CapsNet with Sparse Attention Routing and Pruning](http://arxiv.org/abs/2403.13351)\n  * [BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks](https://openaccess.thecvf.com/content/CVPR2024/papers/Gao_BilevelPruning_Unified_Dynamic_and_Static_Channel_Pruning_for_Convolutional_Neural_CVPR_2024_paper.pdf)\n  * [Resource-Efficient Transformer Pruning for Finetuning of Large Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Ilhan_Resource-Efficient_Transformer_Pruning_for_Finetuning_of_Large_Models_CVPR_2024_paper.pdf)\n  * [Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch](https://arxiv.org/abs/2403.14729)\n  * [Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning](https://arxiv.org/abs/2406.01820)\u003cbr\u003e:house:[project](https://iurada.github.io/PX)\n  * [Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Zero-TPrune_Zero-Shot_Token_Pruning_through_Leveraging_of_the_Attention_Graph_CVPR_2024_paper.pdf)\n  * [MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric](http://arxiv.org/abs/2403.07839v1)\n  * [Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment](http://arxiv.org/abs/2403.19490v1)\n  * [MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning](http://arxiv.org/abs/2404.05621v1)\u003cbr\u003e:star:[code](https://github.com/FarinaMatteo/multiflow)\n* Quantization\n  * [PTQ4SAM: Post-Training Quantization for Segment Anything](https://arxiv.org/abs/2405.03144)\n  * [Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector](https://openaccess.thecvf.com/content/CVPR2024/papers/Ding_Reg-PTQ_Regression-specialized_Post-training_Quantization_for_Fully_Quantized_Object_Detector_CVPR_2024_paper.pdf)\n  * 
[Data-Free Quantization via Pseudo-label Filtering](https://openaccess.thecvf.com/content/CVPR2024/papers/Fan_Data-Free_Quantization_via_Pseudo-label_Filtering_CVPR_2024_paper.pdf)\n  * [JointSQ: Joint Sparsification-Quantization for Distributed Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Xie_JointSQ_Joint_Sparsification-Quantization_for_Distributed_Learning_CVPR_2024_paper.pdf)\n  * [Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning](http://arxiv.org/abs/2401.01543)\n  * [Epistemic Uncertainty Quantification For Pre-Trained Neural Networks](http://arxiv.org/abs/2404.10124)\n  * [Enhancing Post-training Quantization Calibration through Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2024/papers/Shang_Enhancing_Post-training_Quantization_Calibration_through_Contrastive_Learning_CVPR_2024_paper.pdf)\n  * [Towards Accurate Post-training Quantization for Diffusion Models](http://arxiv.org/abs/2305.18723)\n  * [Is Conventional SNN Really Efficient? A Perspective from Network Quantization](https://arxiv.org/abs/2311.10802)\n  * [Are Conventional SNNs Really Efficient? 
A Perspective from Network Quantization](https://openaccess.thecvf.com/content/CVPR2024/papers/Shen_Are_Conventional_SNNs_Really_Efficient_A_Perspective_from_Network_Quantization_CVPR_2024_paper.pdf)\n\n\u003ca name=\"28\"/\u003e\n\n## 28.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)\n* [Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization](http://arxiv.org/abs/2403.14198v1)\u003cbr\u003e:star:[code](https://github.com/liguopeng0923/UCVGL)\n* [Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery](http://arxiv.org/abs/2403.05419v1)\u003cbr\u003e:star:[code](https://github.com/techmn/satmae_pp)\n* [Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery](http://arxiv.org/abs/2403.11812v1)\u003cbr\u003e:house:[project](https://zyqz97.github.io/Aerial_Lifting/)\n* [S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_S2MAE_A_Spatial-Spectral_Pretraining_Foundation_Model_for_Spectral_Remote_Sensing_CVPR_2024_paper.pdf)\n* [Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans](https://arxiv.org/abs/2304.09704)\u003cbr\u003e:house:[project](https://imagine.enpc.fr/~loiseaur/learnable-earth-parser)\n* [WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification](https://openaccess.thecvf.com/content/CVPR2024/papers/Kumar_WildlifeMapper_Aerial_Image_Analysis_for_Multi-Species_Detection_and_Identification_CVPR_2024_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/UCSB-VRL/WildlifeMapper)\n* [Learning without Exact Guidanc","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52CV%2FCVPR-2024-Papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F52CV%2FCVPR-2024-Papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52CV%2FCVPR-2024-Papers/lists"}