https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo

Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!

computer-vision cvpr cvpr2021 cvpr2022 cvpr2023 cvpr2024 llm multimodal-deep-learning object-detection segment-anything segmentation

Last synced: over 1 year ago
Host: GitHub
URL: https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo
Owner: DWCTOD
License: apache-2.0
Created: 2021-03-13T12:26:36.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2024-04-25T14:34:34.000Z (about 2 years ago)
Last Synced: 2025-03-23T12:41:42.767Z (over 1 year ago)
Topics: computer-vision, cvpr, cvpr2021, cvpr2022, cvpr2023, cvpr2024, llm, multimodal-deep-learning, object-detection, segment-anything, segmentation
Homepage:
Size: 137 KB
Stars: 1,335
Watchers: 27
Forks: 150
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

100-AI-Machine-learning-Deep-learning-Computer-vision-NLP - 👆

README

# CVPR2024-Papers-with-Code-Demo

:star_and_crescent:**Add WeChat: nvshenj125 and note your research area to join the discussion and study group**

Follow the WeChat official account: AI算法与图像处理 (AI Algorithms and Image Processing)

:star2: [CVPR 2024](https://cvpr.thecvf.com/Conferences/2024): continuously updated with the latest papers and their open-source code!

Bilibili demos: https://space.bilibili.com/288489574

> :hand: Note: you are welcome to open an issue to share CVPR 2024 papers and open-source projects and help improve this list!
>
> Paper collections from previous top conferences:
>
> [CVPR2021](https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo/blob/main/CVPR2021.md)
>
> [CVPR2022](https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo/blob/main/CVPR2022.md)
>
> [CVPR2023](https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo/blob/main/CVPR2023.md)
>
> [ICCV2021](https://github.com/DWCTOD/ICCV2021-Papers-with-Code-Demo)
>
> [ECCV2022](https://github.com/DWCTOD/ECCV2022-Papers-with-Code-Demo)

### **:fireworks: Join the group** | Welcome

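The CVPR 2024 paper discussion group is now open! If your paper has been accepted, add WeChat **nvshenj125** with the note **CVPR + name + school/company name**. Requests must follow this format to be added to the group.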

### :hammer: **Table of Contents (click to jump)**

Contents (click on the right to collapse)

- [Backbone](#Backbone)
- [数据集/Dataset](#Dataset)
- [Diffusion Model](#DiffusionModel)
- [Text-to-Image](#T2I)
- [NAS](#NAS)
- [NeRF](#NeRF)
- [Knowledge Distillation](#KnowledgeDistillation)
- [多模态 / Multimodal](#Multimodal)
- [对比学习/Contrastive Learning](#ContrastiveLearning)
- [图神经网络 / Graph Neural Networks](#GNN)
- [胶囊网络 / Capsule Network](#CapsuleNetwork)
- [图像分类 / Image Classification](#ImageClassification)
- [目标检测/Object Detection](#ObjectDetection)
- [目标跟踪/Object Tracking](#ObjectTracking)
- [轨迹预测/Trajectory Prediction](#TrajectoryPrediction)
- [语义分割/Segmentation](#Segmentation)
- [弱监督语义分割/Weakly Supervised Semantic Segmentation](#WSSS)
- [医学图像分割/Medical Image Segmentation](#MedicalImageSegmentation)
- [视频目标分割/Video Object Segmentation](#VideoObjectSegmentation)
- [交互式视频目标分割/Interactive Video Object Segmentation](#InteractiveVideoObjectSegmentation)
- [Visual Transformer](#VisualTransformer)
- [深度估计/Depth Estimation](#DepthEstimation)
- [人脸识别/Face Recognition](#FaceRecognition)
- [人脸检测/Face Detection](#FaceDetection)
- [人脸活体检测/Face Anti-Spoofing](#FaceAnti-Spoofing)
- [人脸年龄估计/Age Estimation](#AgeEstimation)
- [人脸表情识别/Facial Expression Recognition](#FacialExpressionRecognition)
- [人脸属性识别/Facial Attribute Recognition](#FacialAttributeRecognition)
- [人脸编辑/Facial Editing](#FacialEditing)
- [人脸重建/Face Reconstruction](#FaceReconstruction)
- [Talking Face](#TalkingFace)
- [换脸/Face Swap](#FaceSwap)
- [姿态估计/Pose Estimation](#HumanPoseEstimation)
- [手势姿态估计（重建）/Hand Pose Estimation (Hand Mesh Recovery)](#HandPoseEstimation)
- [视频动作检测/Video Action Detection](#VideoActionDetection)
- [手语翻译/Sign Language Translation](#SignLanguageTranslation)
- [3D人体重建/3D Human Reconstruction](#3D人体重建)
- [行人重识别/Person Re-identification](#PersonRe-identification)
- [行人搜索/Person Search](#PersonSearch)
- [人群计数 / Crowd Counting](#CrowdCounting)
- [GAN](#GAN)
- [彩妆迁移 / Color-Pattern Makeup Transfer](#CPM)
- [字体生成 / Font Generation](#FontGeneration)
- [场景文本检测、识别/Scene Text Detection/Recognition](#OCR)
- [图像、视频检索 / Image Retrieval/Video retrieval](#Retrieval)
- [Image Animation](#ImageAnimation)
- [抠图/Image Matting](#ImageMatting)
- [超分辨率/Super Resolution](#SuperResolution)
- [图像复原/Image Restoration](#ImageRestoration)
- [图像补全/Image Inpainting](#ImageInpainting)
- [图像去噪/Image Denoising](#ImageDenoising)
- [图像编辑/Image Editing](#ImageEditing)
- [图像拼接/Image stitching](#Imagestitching)
- [图像匹配/Image Matching](#ImageMatching)
- [图像融合/Image Blending](#ImageBlending)
- [图像去雾/Image Dehazing](#ImageDehazing)
- [图像去模糊/Image Deblur](#ImageDeblur)
- [图像压缩/Image Compression](#ImageCompression)
- [反光去除/Reflection Removal](#ReflectionRemoval)
- [车道线检测/Lane Detection](#LaneDetection)
- [自动驾驶 / Autonomous Driving](#AutonomousDriving)
- [流体重建/Fluid Reconstruction](#FluidReconstruction)
- [场景重建 / Scene Reconstruction](#SceneReconstruction)
- [3D Reconstruction](#3DReconstruction)
- [视频插帧/Frame Interpolation](#FrameInterpolation)
- [视频超分 / Video Super-Resolution](#VideoSuper-Resolution)
- [3D点云/3D point cloud](#3DPointCloud)
- [标签噪声 / Label-Noise](#Label-Noise)
- [对抗样本/Adversarial Examples](#AdversarialExamples)
- [Anomaly Detection](#AnomalyDetection)
- [其他/Other](#Other)

## Backbone

[返回目录/back](#Contents)

## 数据集/Dataset

**HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative**

- 论文/Paper: http://arxiv.org/pdf/2403.02640
- 代码/Code: None

**Traffic Scene Parsing through the TSP6K Dataset**

- 论文/Paper: https://arxiv.org/pdf/2303.02835.pdf
- 代码/Code: https://github.com/PengtaoJiang/TSP6K

[返回目录/back](#Contents)

## Diffusion Model

**Balancing Act: Distribution-Guided Debiasing in Diffusion Models**

- 论文/Paper: http://arxiv.org/pdf/2402.18206
- 代码/Code: None

**DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models**

- 论文/Paper: http://arxiv.org/pdf/2402.19481
- 代码/Code: https://github.com/mit-han-lab/distrifuser

**DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly**

- 论文/Paper: http://arxiv.org/pdf/2402.19302
- 代码/Code: https://github.com/iit-pavis/diffassemble

**Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks**

- 论文/Paper: http://arxiv.org/pdf/2403.00644
- 代码/Code: None

**Few-shot Learner Parameterization by Diffusion Time-steps**

- 论文/Paper: http://arxiv.org/pdf/2403.02649
- 代码/Code: https://github.com/yue-zhongqi/tif

**MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant**

- 论文/Paper: http://arxiv.org/pdf/2403.04290
- 代码/Code: None

**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**

- 论文/Paper: https://arxiv.org/abs/2403.06951
- 代码/Code: https://github.com/Tianhao-Qi/DEADiff_code

**Face2Diffusion for Fast and Editable Face Personalization**

- 论文/Paper: http://arxiv.org/pdf/2403.05094
- 代码/Code: https://github.com/mapooon/Face2Diffusion

**MACE: Mass Concept Erasure in Diffusion Models**

- 论文/Paper: http://arxiv.org/pdf/2403.06135
- 代码/Code: https://github.com/Shilin-LU/MACE

**It's All About Your Sketch: Democratising Sketch Control in Diffusion Models**

- 论文/Paper: http://arxiv.org/pdf/2403.07234
- 代码/Code: https://github.com/subhadeepkoley/demosketch2rgb

**SemCity: Semantic Scene Generation with Triplane Diffusion**

- 论文/Paper: http://arxiv.org/pdf/2403.07773
- 代码/Code: https://github.com/zoomin-lee/semcity

[返回目录/back](#Contents)

## Text-to-Image

**RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization**

- 论文/Paper: http://arxiv.org/pdf/2403.00483
- 代码/Code: None

**NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging**

- 论文/Paper: http://arxiv.org/pdf/2403.03485
- 代码/Code: https://github.com/univ-esuty/noisecollage

**Discriminative Probing and Tuning for Text-to-Image Generation**

- 论文/Paper: http://arxiv.org/pdf/2403.04321
- 代码/Code: None

**Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation**

- 论文/Paper: http://arxiv.org/pdf/2403.05239
- 代码/Code: None

**Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation**

- 论文/Paper: http://arxiv.org/pdf/2403.06452
- 代码/Code: https://github.com/mulns/Text2QR

**Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers**

- 论文/Paper: http://arxiv.org/pdf/2403.07214
- 代码/Code: None

[返回目录/back](#Contents)

## NAS

[返回目录/back](#Contents)

## NeRF

**GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding**

- 论文/Paper: http://arxiv.org/pdf/2403.03608
- 代码/Code: None

**DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**

- 论文/Paper: http://arxiv.org/pdf/2403.06912
- 代码/Code: https://github.com/fictionarry/dngaussian

**S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes**

- 论文/Paper: http://arxiv.org/pdf/2403.06205
- 代码/Code: None

[返回目录/back](#Contents)

## Knowledge Distillation

**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**

- 论文/Paper: http://arxiv.org/pdf/2403.02781
- 代码/Code: https://github.com/zhengli97/PromptKD

**Logit Standardization in Knowledge Distillation**

- 论文/Paper: https://arxiv.org/abs/2403.01427
- 代码/Code: https://github.com/sunshangquan/logit-standardization-KD

**RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features**

- 论文/Paper: http://arxiv.org/pdf/2403.05061
- 代码/Code: None

**$V_kD$: Improving Knowledge Distillation using Orthogonal Projections**

- 论文/Paper: http://arxiv.org/pdf/2403.06213
- 代码/Code: https://github.com/roymiles/vkd

[返回目录/back](#Contents)

## 多模态 / Multimodal

**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**

- 论文/Paper: https://arxiv.org/abs/2312.07472
- 代码/Code: https://github.com/IranQin/MP5
- 主页/Website: https://iranqin.github.io/MP5.github.io/

**Polos: Multimodal Metric Learning from Human Feedback for Image Captioning**

- 论文/Paper: http://arxiv.org/pdf/2402.18091
- 代码/Code: None

**MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer**

- 论文/Paper: http://arxiv.org/pdf/2403.02991
- 代码/Code: None

**Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval**

- 论文/Paper: http://arxiv.org/pdf/2403.05105
- 代码/Code: https://github.com/hhc1997/L2RM

**MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric**

- 论文/Paper: http://arxiv.org/pdf/2403.07839
- 代码/Code: None

**Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework**

- 论文/Paper: http://arxiv.org/pdf/2403.07636
- 代码/Code: https://github.com/hieuphan33/mavl

**Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations**

- 论文/Paper: http://arxiv.org/pdf/2403.07241
- 代码/Code: None

[返回目录/back](#Contents)

## Contrastive Learning

**Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning**

- 论文/Paper: http://arxiv.org/pdf/2403.06122
- 代码/Code: https://github.com/root0yang/blindnet

[返回目录/back](#Contents)

## 胶囊网络 / Capsule Network

[返回目录/back](#Contents)

## 图像分类 / Image Classification

[返回目录/back](#Contents)

## 目标检测/Object Detection

**UniMODE: Unified Monocular 3D Object Detection**

- 论文/Paper: http://arxiv.org/pdf/2402.18573
- 代码/Code: None

**CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images**

- 论文/Paper: http://arxiv.org/pdf/2403.04198
- 代码/Code: https://github.com/SerCharles/CN-RMA

**Memory-based Adapters for Online 3D Scene Perception**

- 论文/Paper: https://arxiv.org/abs/2403.06974
- 代码/Code: https://github.com/xuxw98/Online3D

**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**

- 论文/Paper: https://arxiv.org/abs/2403.16131
- 代码/Code: https://github.com/xiuqhou/Salience-DETR

**Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors**

- 论文/Paper: http://arxiv.org/pdf/2403.06093
- 代码/Code: https://github.com/nullmax-vision/QAF2D

**SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection**

- 论文/Paper: http://arxiv.org/pdf/2403.05817
- 代码/Code: https://github.com/zhanggang001/hednet

[返回目录/back](#Contents)

## 目标跟踪/Object Tracking

**DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking**

- 论文/Paper: http://arxiv.org/pdf/2403.02767
- 代码/Code: None

**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**

- 论文/Paper: http://arxiv.org/pdf/2403.04700
- 代码/Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT

[返回目录/back](#Contents)

## 3D Object Tracking

[返回目录/back](#Contents)

## 轨迹预测/Trajectory Prediction

[返回目录/back](#Contents)

## 语义分割/Segmentation

**PEM: Prototype-based Efficient MaskFormer for Image Segmentation**

- 论文/Paper: http://arxiv.org/pdf/2402.19422
- 代码/Code: https://github.com/niccolocavagnero/pem

**Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation**

- 论文/Paper: http://arxiv.org/pdf/2403.06462
- 代码/Code: https://github.com/Gavinwxy/DDFP

**Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation**

- 论文/Paper: http://arxiv.org/pdf/2403.06247
- 代码/Code: None

[返回目录/back](#Contents)

## 弱监督语义分割/Weakly Supervised Semantic Segmentation

[返回目录/back](#Contents)

## 医学图像/Medical Image

**Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration**

- 论文/Paper: http://arxiv.org/pdf/2402.18933
- 代码/Code: None

[返回目录/back](#Contents)

## 视频目标分割/Video Object Segmentation

**Depth-aware Test-Time Training for Zero-shot Video Object Segmentation**

- 论文/Paper: http://arxiv.org/pdf/2403.04258
- 代码/Code: None

[返回目录/back](#Contents)

## 交互式视频目标分割/Interactive Video Object Segmentation

[返回目录/back](#Contents)

## Visual Transformer

**Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery**

- 论文/Paper: http://arxiv.org/pdf/2403.05419
- 代码/Code: https://github.com/techmn/satmae_pp

[返回目录/back](#Contents)

## 深度估计/Depth Estimation

**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**

- 论文/Paper: https://arxiv.org/pdf/2403.07535.pdf
- 代码/Code: https://github.com/Junda24/AFNet

[返回目录/back](#Contents)

## 图像、视频检索 / Image Retrieval/Video Retrieval

**Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval**

- 论文/Paper: http://arxiv.org/pdf/2403.00272
- 代码/Code: None

**Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval**

- 论文/Paper: http://arxiv.org/pdf/2403.05105
- 代码/Code: https://github.com/hhc1997/L2RM

**How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?**

- 论文/Paper: http://arxiv.org/pdf/2403.07203
- 代码/Code: None

[返回目录/back](#Contents)

## 超分辨率/Super Resolution

**SeD: Semantic-Aware Discriminator for Image Super-Resolution**

- 论文/Paper: http://arxiv.org/pdf/2402.19387
- 代码/Code: None

**Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts**

- 论文/Paper: http://arxiv.org/pdf/2402.19215
- 代码/Code: https://github.com/mandalinadagi/wgsr

**CAMixerSR: Only Details Need More "Attention"**

- 论文/Paper: http://arxiv.org/pdf/2402.19289
- 代码/Code: https://github.com/icandle/camixersr

**Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning**

- 论文/Paper: http://arxiv.org/pdf/2403.02601
- 代码/Code: None

[返回目录/back](#Contents)

## 图像复原/Image Restoration

**Boosting Image Restoration via Priors from Pre-trained Models**

- 论文/Paper: http://arxiv.org/pdf/2403.06793
- 代码/Code: None

[返回目录/back](#Contents)

## 图像去噪/Image Denoising

[返回目录/back](#Contents)

## 图像编辑/Image Editing

**Doubly Abductive Counterfactual Inference for Text-based Image Editing**

- 论文/Paper: http://arxiv.org/pdf/2403.02981
- 代码/Code: https://github.com/xuesong39/DAC

[返回目录/back](#Contents)

## 图像压缩/Image Compression

[返回目录/back](#Contents)

## 图像去模糊/Image Deblur

**A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning**

- 论文/Paper: http://arxiv.org/pdf/2403.02611
- 代码/Code: https://github.com/PieceZhang/MPT-CataBlur

[返回目录/back](#Contents)

## 自动驾驶 / Autonomous Driving

**Abductive Ego-View Accident Video Understanding for Safe Driving Perception**

- 论文/Paper: http://arxiv.org/pdf/2403.00436
- 代码/Code: None

**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**

- 论文/Paper: http://arxiv.org/pdf/2403.07535
- 代码/Code: https://github.com/Junda24/AFNet/

[返回目录/back](#Contents)

## 人脸识别/Face Recognition

[返回目录/back](#Contents)

## 人脸检测/Face Detection

[返回目录/back](#Contents)

## 人脸活体检测/Face Anti-Spoofing

**Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing**

- 论文/Paper: http://arxiv.org/pdf/2402.19298
- 代码/Code: https://github.com/omggggg/mmdg

[返回目录/back](#Contents)

## 人脸重建/Face Reconstruction

[返回目录/back](#Contents)

## 视频动作检测/Video Action Detection

[返回目录/back](#Contents)

## 手语翻译/Sign Language Translation

[返回目录/back](#Contents)

## 行人重识别/Person Re-identification

[返回目录/back](#Contents)

## Talking Face

[返回目录/back](#Contents)

## 姿态估计/Pose Estimation

**FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation**

- 论文/Paper: http://arxiv.org/pdf/2403.03221
- 代码/Code: None

**Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation**

- 论文/Paper: http://arxiv.org/pdf/2403.04381
- 代码/Code: https://github.com/MickeyLLG/S2DHand

**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**

- 论文/Paper: https://arxiv.org/pdf/2311.12028.pdf
- 代码/Code: https://github.com/NationalGAILab/HoT

[返回目录/back](#Contents)

## GAN

[返回目录/back](#Contents)

## 人脸年龄估计/Age Estimation

[返回目录/back](#Contents)

## 人脸表情识别/Facial Expression Recognition

[返回目录/back](#Contents)

## 手势姿态估计（重建）/Hand Pose Estimation (Hand Mesh Recovery)

[返回目录/back](#Contents)

## 3D Reconstruction

**UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets**

- 论文/Paper: http://arxiv.org/pdf/2403.05086
- 代码/Code: https://github.com/Youngju-Na/UFORecon

**DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction**

- 论文/Paper: http://arxiv.org/pdf/2403.05005
- 代码/Code: None

**Memory-based Adapters for Online 3D Scene Perception**

- 论文/Paper: http://arxiv.org/pdf/2403.06974
- 代码/Code: https://github.com/xuxw98/Online3D

**Bayesian Diffusion Models for 3D Shape Reconstruction**

- 论文/Paper: http://arxiv.org/pdf/2403.06973
- 代码/Code: None

[返回目录/back](#Contents)

## 视频插帧/Frame Interpolation

[返回目录/back](#Contents)

## 3D点云/3D point cloud

**Rethinking Few-shot 3D Point Cloud Semantic Segmentation**

- 论文/Paper: http://arxiv.org/pdf/2403.00592
- 代码/Code: https://github.com/ZhaochongAn/COSeg

**Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension**

- 论文/Paper: http://arxiv.org/pdf/2403.03532
- 代码/Code: https://github.com/liuquan98/eyoc

**Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds**

- 论文/Paper: http://arxiv.org/pdf/2403.05247
- 代码/Code: https://github.com/TRLou/HiT-ADV

[返回目录/back](#Contents)

## Anomaly Detection

**Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts**

- 论文/Paper: http://arxiv.org/pdf/2403.06495
- 代码/Code: https://github.com/mala-lab/inctrl

**RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection**

- 论文/Paper: http://arxiv.org/pdf/2403.05897
- 代码/Code: https://github.com/cnulab/realnet

[返回目录/back](#Contents)

## 其他/Other

**DisCo: Disentangled Control for Realistic Human Dance Generation**

- 论文/Paper: https://arxiv.org/abs/2307.00040
- 代码/Code: https://github.com/Wangt-CN/DisCo

**Gradient Reweighting: Towards Imbalanced Class-Incremental Learning**

- 论文/Paper: http://arxiv.org/pdf/2402.18528
- 代码/Code: None

**TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding**

- 论文/Paper: http://arxiv.org/pdf/2402.18490
- 代码/Code: None

**Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting**

- 论文/Paper: http://arxiv.org/pdf/2402.18330
- 代码/Code: https://github.com/tho-kn/egotap

**Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing**

- 论文/Paper: http://arxiv.org/pdf/2402.18277
- 代码/Code: None

**Misalignment-Robust Frequency Distribution Loss for Image Transformation**

- 论文/Paper: http://arxiv.org/pdf/2402.18192
- 代码/Code: https://github.com/eezkni/FDL

**3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling**

- 论文/Paper: http://arxiv.org/pdf/2402.18146
- 代码/Code: https://github.com/jiangchaokang/3dsflabelling

**OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction**

- 论文/Paper: http://arxiv.org/pdf/2402.18140
- 代码/Code: None

**UniVS: Unified and Universal Video Segmentation with Prompts as Queries**

- 论文/Paper: http://arxiv.org/pdf/2402.18115
- 代码/Code: https://github.com/minghanli/univs

**Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis**

- 论文/Paper: http://arxiv.org/pdf/2402.18078
- 代码/Code: https://github.com/YanzuoLu/CFLD

**Boosting Neural Representations for Videos with a Conditional Decoder**

- 论文/Paper: http://arxiv.org/pdf/2402.18152
- 代码/Code: None

**Classes Are Not Equal: An Empirical Study on Image Recognition Fairness**

- 论文/Paper: http://arxiv.org/pdf/2402.18133
- 代码/Code: None

**QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction**

- 论文/Paper: http://arxiv.org/pdf/2402.17951
- 代码/Code: None

**Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers**

- 论文/Paper: http://arxiv.org/pdf/2402.19479
- 代码/Code: None

**SeMoLi: What Moves Together Belongs Together**

- 论文/Paper: http://arxiv.org/pdf/2402.19463
- 代码/Code: None

**Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction**

- 论文/Paper: http://arxiv.org/pdf/2402.19326
- 代码/Code: None

**CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition**

- 论文/Paper: http://arxiv.org/pdf/2402.19231
- 代码/Code: https://github.com/lu-feng/cricavpr

**MemoNav: Working Memory Model for Visual Navigation**

- 论文/Paper: http://arxiv.org/pdf/2402.19161
- 代码/Code: None

**VideoMAC: Video Masked Autoencoders Meet ConvNets**

- 论文/Paper: http://arxiv.org/pdf/2402.19082
- 代码/Code: https://github.com/nust-machine-intelligence-laboratory/videomac

**Theoretically Achieving Continuous Representation of Oriented Bounding Boxes**

- 论文/Paper: http://arxiv.org/pdf/2402.18975
- 代码/Code: https://github.com/Jittor/JDet

**OHTA: One-shot Hand Avatar via Data-driven Implicit Priors**

- 论文/Paper: http://arxiv.org/pdf/2402.18969
- 代码/Code: None

**WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts**

- 论文/Paper: http://arxiv.org/pdf/2402.18956
- 代码/Code: None

**Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation**

- 论文/Paper: http://arxiv.org/pdf/2402.18920
- 代码/Code: None

**SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting**

- 论文/Paper: http://arxiv.org/pdf/2402.18848
- 代码/Code: None

**ViewFusion: Towards Multi-View Consistency via Interpolated Denoising**

- 论文/Paper: http://arxiv.org/pdf/2402.18842
- 代码/Code: None

**OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition**

- 论文/Paper: http://arxiv.org/pdf/2402.18786
- 代码/Code: None

**NARUTO: Neural Active Reconstruction from Uncertain Target Observations**

- 论文/Paper: http://arxiv.org/pdf/2402.18771
- 代码/Code: None

**Towards Generalizable Tumor Synthesis**

- 论文/Paper: http://arxiv.org/pdf/2402.19470
- 代码/Code: None

**Rethinking Multi-domain Generalization with A General Learning Objective**

- 论文/Paper: http://arxiv.org/pdf/2402.18853
- 代码/Code: None

**Rethinking Inductive Biases for Surface Normal Estimation**

- 论文/Paper: http://arxiv.org/pdf/2403.00712
- 代码/Code: https://github.com/baegwangbin/DSINE

**SURE: SUrvey REcipes for building reliable and robust deep networks**

- 论文/Paper: http://arxiv.org/pdf/2403.00543
- 代码/Code: https://github.com/YutingLi0606/SURE

**Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching**

- 论文/Paper: http://arxiv.org/pdf/2403.00486
- 代码/Code: https://github.com/Windsrain/Selective-Stereo

**Deformable One-shot Face Stylization via DINO Semantic Guidance**

- 论文/Paper: http://arxiv.org/pdf/2403.00459
- 代码/Code: https://github.com/zichongc/DoesFS

**CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation**

- 论文/Paper: http://arxiv.org/pdf/2403.00274
- 代码/Code: None

**NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors**

- 论文/Paper: http://arxiv.org/pdf/2403.03122
- 代码/Code: None

**Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos**

- 论文/Paper: http://arxiv.org/pdf/2403.02782
- 代码/Code: None

**HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes**

- 论文/Paper: http://arxiv.org/pdf/2403.02769
- 代码/Code: None

**Learning Group Activity Features Through Person Attribute Prediction**

- 论文/Paper: http://arxiv.org/pdf/2403.02753
- 代码/Code: https://github.com/chihina/GAFL-CVPR2024

**Interactive Continual Learning: Fast and Slow Thinking**

- 论文/Paper: http://arxiv.org/pdf/2403.02628
- 代码/Code: None

**Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation**

- 论文/Paper: http://arxiv.org/pdf/2403.03890
- 代码/Code: None

**DART: Implicit Doppler Tomography for Radar Novel View Synthesis**

- 论文/Paper: http://arxiv.org/pdf/2403.03896
- 代码/Code: None

**MeaCap: Memory-Augmented Zero-shot Image Captioning**

- 论文/Paper: http://arxiv.org/pdf/2403.03715
- 代码/Code: https://github.com/joeyz0z/MeaCap

**HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations**

- 论文/Paper: http://arxiv.org/pdf/2403.03561
- 代码/Code: None

**Continual Segmentation with Disentangled Objectness Learning and Class Recognition**

- 论文/Paper: http://arxiv.org/pdf/2403.03477
- 代码/Code: https://github.com/jordangong/CoMasTRe

**HDRFlow: Real-Time HDR Video Reconstruction with Large Motions**

- 论文/Paper: http://arxiv.org/pdf/2403.03447
- 代码/Code: None

**LEAD: Learning Decomposition for Source-free Universal Domain Adaptation**

- 论文/Paper: http://arxiv.org/pdf/2403.03421
- 代码/Code: https://github.com/ispc-lab/lead

**F$^3$Loc: Fusion and Filtering for Floorplan Localization**

- 论文/Paper: http://arxiv.org/pdf/2403.03370
- 代码/Code: None

**Enhancing Vision-Language Pre-training with Rich Supervisions**

- 论文/Paper: http://arxiv.org/pdf/2403.03346
- 代码/Code: None

**Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed**

- 论文/Paper: http://arxiv.org/pdf/2403.04765
- 代码/Code: None

**Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning**

- 论文/Paper: http://arxiv.org/pdf/2403.04492
- 代码/Code: https://github.com/rashindrie/dipa

**Learning to Remove Wrinkled Transparent Film with Polarized Prior**

- 论文/Paper: http://arxiv.org/pdf/2403.04368
- 代码/Code: https://github.com/jqtangust/filmremoval

**LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking**

- 论文/Paper: http://arxiv.org/pdf/2403.04303
- 代码/Code: None

**Active Generalized Category Discovery**

- 论文/Paper: http://arxiv.org/pdf/2403.04272
- 代码/Code: https://github.com/mashijie1028/activegcd

**MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection**

- 论文/Paper: http://arxiv.org/pdf/2403.04149
- 代码/Code: https://github.com/ispc-lab/map

**A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition**

- 论文/Paper: http://arxiv.org/pdf/2403.04245
- 代码/Code: https://github.com/dalision/modalbiasavsr

**Seamless Human Motion Composition with Blended Positional Encodings**

- 论文/Paper: https://arxiv.org/abs/2402.15509
- 代码/Code: https://github.com/BarqueroGerman/FlowMDM

**DiffusionLight: Light Probes for Free by Painting a Chrome Ball**

- 论文/Paper: https://arxiv.org/abs/2312.09168
- 代码/Code: https://github.com/DiffusionLight/DiffusionLight

**SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting**

- 论文/Paper: http://arxiv.org/pdf/2403.05087
- 代码/Code: https://github.com/initialneil/SplattingAvatar

**Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation**

- 论文/Paper: http://arxiv.org/pdf/2403.06946
- 代码/Code: https://github.com/tl-uestc/unimos

**Real-Time Simulated Avatar from Head-Mounted Sensors**

- 论文/Paper: http://arxiv.org/pdf/2403.06862
- 代码/Code: None

**DiaLoc: An Iterative Approach to Embodied Dialog Localization**

- 论文/Paper: http://arxiv.org/pdf/2403.06846
- 代码/Code: None

**FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation**

- 论文/Paper: http://arxiv.org/pdf/2403.06775
- 代码/Code: https://github.com/modelscope/facechain

**EarthLoc: Astronaut Photography Localization by Indexing Earth from Space**

- 论文/Paper: http://arxiv.org/pdf/2403.06758
- 代码/Code: https://github.com/gmberton/earthloc

**CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective**

- 论文/Paper: http://arxiv.org/pdf/2403.06676
- 代码/Code: https://github.com/snskysk/cam-back-again

**Distributionally Generative Augmentation for Fair Facial Attribute Classification**

- 论文/Paper: http://arxiv.org/pdf/2403.06606
- 代码/Code: https://github.com/heqianpei/diga

**Exploiting Style Latent Flows for Generalizing Deepfake Video Detection**

- 论文/Paper: http://arxiv.org/pdf/2403.06592
- 代码/Code: None

**MoST: Motion Style Transformer between Diverse Action Contents**

- 论文/Paper: http://arxiv.org/pdf/2403.06225
- 代码/Code: https://github.com/Boeun-Kim/MoST

**Coherent Temporal Synthesis for Incremental Action Segmentation**

- 论文/Paper: http://arxiv.org/pdf/2403.06102
- 代码/Code: None

**Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?**

- 论文/Paper: http://arxiv.org/pdf/2403.06092
- 代码/Code: None

**LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content**

- 论文/Paper: http://arxiv.org/pdf/2403.05854
- 代码/Code: None

**PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor**

- 论文/Paper: http://arxiv.org/pdf/2403.06668
- 代码/Code: None

**SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection**

- 论文/Paper: http://arxiv.org/pdf/2403.03170
- 代码/Code: None

**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**

- 论文/Paper: https://arxiv.org/abs/2403.17749
- 代码/Code: https://github.com/YuqiYang213/MLoRE

**Beyond Text: Frozen Large Language Models in Visual Signal Comprehension**

- 论文/Paper: http://arxiv.org/pdf/2403.07874
- 代码/Code: https://github.com/zh460045050/v2l-tokenizer

**Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis**

- 论文/Paper: http://arxiv.org/pdf/2403.07719
- 代码/Code: https://github.com/wonderlandxd/wikg

**Robust Synthetic-to-Real Transfer for Stereo Matching**

- 论文/Paper: http://arxiv.org/pdf/2403.07705
- 代码/Code: https://github.com/jiaw-z/dkt-stereo

**CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers**

- 论文/Paper: http://arxiv.org/pdf/2403.07700
- 代码/Code: https://github.com/shahaf-arica/cuvler

**Masked AutoDecoder is Effective Multi-Task Vision Generalist**

- 论文/Paper: http://arxiv.org/pdf/2403.07692
- 代码/Code: https://github.com/hanqiu-hq/mad

**PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution**

- 论文/Paper: http://arxiv.org/pdf/2403.07589
- 代码/Code: None

**Unleashing Network Potentials for Semantic Scene Completion**

- 论文/Paper: http://arxiv.org/pdf/2403.07560
- 代码/Code: https://github.com/fereenwong/ammnet

**Open-World Semantic Segmentation Including Class Similarity**

- 论文/Paper: http://arxiv.org/pdf/2403.07532
- 代码/Code: https://github.com/PRBonn/ContMAV

**ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions**

- 论文/Paper: http://arxiv.org/pdf/2403.07392
- 代码/Code: https://github.com/Traffic-X/ViT-CoMer

**FSC: Few-point Shape Completion**

- 论文/Paper: http://arxiv.org/pdf/2403.07359
- 代码/Code: None

**Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture**

- 论文/Paper: http://arxiv.org/pdf/2403.07347
- 代码/Code: https://github.com/jiafei127/fd4mm

**A Bayesian Approach to OOD Robustness in Image Classification**

- 论文/Paper: http://arxiv.org/pdf/2403.07277
- 代码/Code: None

[返回目录/back](#Contents)
