Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo

Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
computer-vision cvpr cvpr2021 cvpr2022 cvpr2023 cvpr2024 llm multimodal-deep-learning object-detection segment-anything segmentation

# CVPR2024-Papers-with-Code-Demo

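:star_and_crescent:**Add WeChat: nvshenj125 and note your research area to join the discussion and study group**

Follow the WeChat official account: AI算法与图像处理 (AI Algorithms and Image Processing)

:star2: [CVPR 2024](https://cvpr.thecvf.com/Conferences/2024): continuously updated with the latest papers and their open-source code!

Bilibili demos: https://space.bilibili.com/288489574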

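> :hand: Note: you are welcome to open an issue to share CVPR 2024 papers and open-source projects and help improve this project.
>
> Paper collections from previous top conferences: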
>
> [CVPR2021](https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo/blob/main/CVPR2021.md)
>
> [CVPR2022](https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo/blob/main/CVPR2022.md)
>
> [CVPR2023](https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo/blob/main/CVPR2023.md)
>
> [ICCV2021](https://github.com/DWCTOD/ICCV2021-Papers-with-Code-Demo)
>
> [ECCV2022](https://github.com/DWCTOD/ECCV2022-Papers-with-Code-Demo)

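### **:fireworks: Join the group** | Welcome

The CVPR 2024 paper discussion group is now open! If your paper has been accepted, add WeChat **nvshenj125** with the note **CVPR + name + school/company name**. Please follow this format exactly so that you can be added to the group.

### :hammer: **Table of Contents (click to jump)**

Contents (click on the right to collapse)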

- [Backbone](#Backbone)
- [Dataset](#Dataset)
- [Diffusion Model](#DiffusionModel)
- [Text-to-Image](#T2I)
- [NAS](#NAS)
- [NeRF](#NeRF)
- [Knowledge Distillation](#KnowledgeDistillation)
- [Multimodal](#Multimodal)
- [Contrastive Learning](#ContrastiveLearning)
- [Graph Neural Networks](#GNN)
- [Capsule Network](#CapsuleNetwork)
- [Image Classification](#ImageClassification)
- [Object Detection](#ObjectDetection)
- [Object Tracking](#ObjectTracking)
- [Trajectory Prediction](#TrajectoryPrediction)
- [Segmentation](#Segmentation)
- [Weakly Supervised Semantic Segmentation](#WSSS)
- [Medical Image Segmentation](#MedicalImageSegmentation)
- [Video Object Segmentation](#VideoObjectSegmentation)
- [Interactive Video Object Segmentation](#InteractiveVideoObjectSegmentation)
- [Visual Transformer](#VisualTransformer)
- [Depth Estimation](#DepthEstimation)
- [Face Recognition](#FaceRecognition)
- [Face Detection](#FaceDetection)
- [Face Anti-Spoofing](#FaceAnti-Spoofing)
- [Age Estimation](#AgeEstimation)
- [Facial Expression Recognition](#FacialExpressionRecognition)
- [Facial Attribute Recognition](#FacialAttributeRecognition)
- [Facial Editing](#FacialEditing)
- [Face Reconstruction](#FaceReconstruction)
- [Talking Face](#TalkingFace)
- [Face Swap](#FaceSwap)
- [Pose Estimation](#HumanPoseEstimation)
- [Hand Pose Estimation (Hand Mesh Recovery)](#HandPoseEstimation)
- [Video Action Detection](#VideoActionDetection)
- [Sign Language Translation](#SignLanguageTranslation)
- [3D Human Reconstruction](#3D人体重建)
- [Person Re-identification](#PersonRe-identification)
- [Person Search](#PersonSearch)
- [Crowd Counting](#CrowdCounting)
- [GAN](#GAN)
- [Color-Pattern Makeup Transfer](#CPM)
- [Font Generation](#FontGeneration)
- [Scene Text Detection/Recognition](#OCR)
- [Image Retrieval/Video Retrieval](#Retrieval)
- [Image Animation](#ImageAnimation)
- [Image Matting](#ImageMatting)
- [Super Resolution](#SuperResolution)
- [Image Restoration](#ImageRestoration)
- [Image Inpainting](#ImageInpainting)
- [Image Denoising](#ImageDenoising)
- [Image Editing](#ImageEditing)
- [Image Stitching](#Imagestitching)
- [Image Matching](#ImageMatching)
- [Image Blending](#ImageBlending)
- [Image Dehazing](#ImageDehazing)
- [Image Deblur](#ImageDeblur)
- [Image Compression](#ImageCompression)
- [Reflection Removal](#ReflectionRemoval)
- [Lane Detection](#LaneDetection)
- [Autonomous Driving](#AutonomousDriving)
- [Fluid Reconstruction](#FluidReconstruction)
- [Scene Reconstruction](#SceneReconstruction)
- [3D Reconstruction](#3DReconstruction)
- [Frame Interpolation](#FrameInterpolation)
- [Video Super-Resolution](#VideoSuper-Resolution)
- [3D Point Cloud](#3DPointCloud)
- [Label-Noise](#Label-Noise)
- [Adversarial Examples](#AdversarialExamples)
- [Anomaly Detection](#AnomalyDetection)
- [Other](#Other)

## Backbone

[Back to contents](#Contents)

## Dataset

**HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative**

- Paper: http://arxiv.org/pdf/2403.02640
- Code: None

**Traffic Scene Parsing through the TSP6K Dataset**

- Paper: https://arxiv.org/pdf/2303.02835.pdf
- Code: https://github.com/PengtaoJiang/TSP6K

[Back to contents](#Contents)

## Diffusion Model

**Balancing Act: Distribution-Guided Debiasing in Diffusion Models**

- Paper: http://arxiv.org/pdf/2402.18206
- Code: None

**DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models**

- Paper: http://arxiv.org/pdf/2402.19481
- Code: https://github.com/mit-han-lab/distrifuser

**DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly**

- Paper: http://arxiv.org/pdf/2402.19302
- Code: https://github.com/iit-pavis/diffassemble

**Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks**

- Paper: http://arxiv.org/pdf/2403.00644
- Code: None

**Few-shot Learner Parameterization by Diffusion Time-steps**

- Paper: http://arxiv.org/pdf/2403.02649
- Code: https://github.com/yue-zhongqi/tif

**MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant**

- Paper: http://arxiv.org/pdf/2403.04290
- Code: None

**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**

- Paper: https://arxiv.org/abs/2403.06951
- Code: https://github.com/Tianhao-Qi/DEADiff_code

**Face2Diffusion for Fast and Editable Face Personalization**

- Paper: http://arxiv.org/pdf/2403.05094
- Code: https://github.com/mapooon/Face2Diffusion

**MACE: Mass Concept Erasure in Diffusion Models**

- Paper: http://arxiv.org/pdf/2403.06135
- Code: https://github.com/Shilin-LU/MACE

**It's All About Your Sketch: Democratising Sketch Control in Diffusion Models**

- Paper: http://arxiv.org/pdf/2403.07234
- Code: https://github.com/subhadeepkoley/demosketch2rgb

**SemCity: Semantic Scene Generation with Triplane Diffusion**

- Paper: http://arxiv.org/pdf/2403.07773
- Code: https://github.com/zoomin-lee/semcity

[Back to contents](#Contents)

## Text-to-Image

**RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization**

- Paper: http://arxiv.org/pdf/2403.00483
- Code: None

**NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging**

- Paper: http://arxiv.org/pdf/2403.03485
- Code: https://github.com/univ-esuty/noisecollage

**Discriminative Probing and Tuning for Text-to-Image Generation**

- Paper: http://arxiv.org/pdf/2403.04321
- Code: None

**Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation**

- Paper: http://arxiv.org/pdf/2403.05239
- Code: None

**Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation**

- Paper: http://arxiv.org/pdf/2403.06452
- Code: https://github.com/mulns/Text2QR

**Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers**

- Paper: http://arxiv.org/pdf/2403.07214
- Code: None

[Back to contents](#Contents)

## NAS

[Back to contents](#Contents)

## NeRF

**GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding**

- Paper: http://arxiv.org/pdf/2403.03608
- Code: None

**DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**

- Paper: http://arxiv.org/pdf/2403.06912
- Code: https://github.com/fictionarry/dngaussian

**S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes**

- Paper: http://arxiv.org/pdf/2403.06205
- Code: None

[Back to contents](#Contents)

## Knowledge Distillation

**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**

- Paper: http://arxiv.org/pdf/2403.02781
- Code: https://github.com/zhengli97/PromptKD

**Logit Standardization in Knowledge Distillation**

- Paper: https://arxiv.org/abs/2403.01427
- Code: https://github.com/sunshangquan/logit-standardization-KD

**RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features**

- Paper: http://arxiv.org/pdf/2403.05061
- Code: None

**$V_kD$: Improving Knowledge Distillation using Orthogonal Projections**

- Paper: http://arxiv.org/pdf/2403.06213
- Code: https://github.com/roymiles/vkd

[Back to contents](#Contents)

## Multimodal

**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**

- Paper: https://arxiv.org/abs/2312.07472
- Code: https://github.com/IranQin/MP5
- Website: https://iranqin.github.io/MP5.github.io/

**Polos: Multimodal Metric Learning from Human Feedback for Image Captioning**

- Paper: http://arxiv.org/pdf/2402.18091
- Code: None

**MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer**

- Paper: http://arxiv.org/pdf/2403.02991
- Code: None

**Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval**

- Paper: http://arxiv.org/pdf/2403.05105
- Code: https://github.com/hhc1997/L2RM

**MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric**

- Paper: http://arxiv.org/pdf/2403.07839
- Code: None

**Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework**

- Paper: http://arxiv.org/pdf/2403.07636
- Code: https://github.com/hieuphan33/mavl

**Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations**

- Paper: http://arxiv.org/pdf/2403.07241
- Code: None

[Back to contents](#Contents)

## Contrastive Learning

**Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning**

- Paper: http://arxiv.org/pdf/2403.06122
- Code: https://github.com/root0yang/blindnet

[Back to contents](#Contents)

## Capsule Network

[Back to contents](#Contents)

## Image Classification

[Back to contents](#Contents)

## Object Detection

**UniMODE: Unified Monocular 3D Object Detection**

- Paper: http://arxiv.org/pdf/2402.18573
- Code: None

**CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images**

- Paper: http://arxiv.org/pdf/2403.04198
- Code: https://github.com/SerCharles/CN-RMA

**Memory-based Adapters for Online 3D Scene Perception**

- Paper: https://arxiv.org/abs/2403.06974
- Code: https://github.com/xuxw98/Online3D

**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**

- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR

**Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors**

- Paper: http://arxiv.org/pdf/2403.06093
- Code: https://github.com/nullmax-vision/QAF2D

**SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection**

- Paper: http://arxiv.org/pdf/2403.05817
- Code: https://github.com/zhanggang001/hednet

[Back to contents](#Contents)

## Object Tracking

**DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking**

- Paper: http://arxiv.org/pdf/2403.02767
- Code: None

**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**

- Paper: http://arxiv.org/pdf/2403.04700
- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT

[Back to contents](#Contents)

## 3D Object Tracking

[Back to contents](#Contents)

## Trajectory Prediction

[Back to contents](#Contents)

## Segmentation

**PEM: Prototype-based Efficient MaskFormer for Image Segmentation**

- Paper: http://arxiv.org/pdf/2402.19422
- Code: https://github.com/niccolocavagnero/pem

**Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation**

- Paper: http://arxiv.org/pdf/2403.06462
- Code: https://github.com/Gavinwxy/DDFP

**Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation**

- Paper: http://arxiv.org/pdf/2403.06247
- Code: None

[Back to contents](#Contents)

## Weakly Supervised Semantic Segmentation

[Back to contents](#Contents)

## Medical Image

**Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration**

- Paper: http://arxiv.org/pdf/2402.18933
- Code: None

[Back to contents](#Contents)

## Video Object Segmentation

**Depth-aware Test-Time Training for Zero-shot Video Object Segmentation**

- Paper: http://arxiv.org/pdf/2403.04258
- Code: None

[Back to contents](#Contents)

## Interactive Video Object Segmentation

[Back to contents](#Contents)

## Visual Transformer

**Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery**

- Paper: http://arxiv.org/pdf/2403.05419
- Code: https://github.com/techmn/satmae_pp

[Back to contents](#Contents)

## Depth Estimation

**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**

- Paper: https://arxiv.org/pdf/2403.07535.pdf
- Code: https://github.com/Junda24/AFNet

[Back to contents](#Contents)

## Image Retrieval/Video Retrieval

**Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval**

- Paper: http://arxiv.org/pdf/2403.00272
- Code: None

**Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval**

- Paper: http://arxiv.org/pdf/2403.05105
- Code: https://github.com/hhc1997/L2RM

**How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?**

- Paper: http://arxiv.org/pdf/2403.07203
- Code: None

[Back to contents](#Contents)

## Super Resolution

**SeD: Semantic-Aware Discriminator for Image Super-Resolution**

- Paper: http://arxiv.org/pdf/2402.19387
- Code: None

**Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts**

- Paper: http://arxiv.org/pdf/2402.19215
- Code: https://github.com/mandalinadagi/wgsr

**CAMixerSR: Only Details Need More "Attention"**

- Paper: http://arxiv.org/pdf/2402.19289
- Code: https://github.com/icandle/camixersr

**Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning**

- Paper: http://arxiv.org/pdf/2403.02601
- Code: None

[Back to contents](#Contents)

## Image Restoration

**Boosting Image Restoration via Priors from Pre-trained Models**

- Paper: http://arxiv.org/pdf/2403.06793
- Code: None

[Back to contents](#Contents)

## Image Denoising

[Back to contents](#Contents)

## Image Editing

**Doubly Abductive Counterfactual Inference for Text-based Image Editing**

- Paper: http://arxiv.org/pdf/2403.02981
- Code: https://github.com/xuesong39/DAC

[Back to contents](#Contents)

## Image Compression

[Back to contents](#Contents)

## Image Deblur

**A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning**

- Paper: http://arxiv.org/pdf/2403.02611
- Code: https://github.com/PieceZhang/MPT-CataBlur

[Back to contents](#Contents)

## Autonomous Driving

**Abductive Ego-View Accident Video Understanding for Safe Driving Perception**

- Paper: http://arxiv.org/pdf/2403.00436
- Code: None

**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**

- Paper: http://arxiv.org/pdf/2403.07535
- Code: https://github.com/Junda24/AFNet/

[Back to contents](#Contents)

## Face Recognition

[Back to contents](#Contents)

## Face Detection

[Back to contents](#Contents)

## Face Anti-Spoofing

**Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing**

- Paper: http://arxiv.org/pdf/2402.19298
- Code: https://github.com/omggggg/mmdg

[Back to contents](#Contents)

## Face Reconstruction

[Back to contents](#Contents)

## Video Action Detection

[Back to contents](#Contents)

## Sign Language Translation

[Back to contents](#Contents)

## Person Re-identification

[Back to contents](#Contents)

## Talking Face

[Back to contents](#Contents)

## Pose Estimation

**FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation**

- Paper: http://arxiv.org/pdf/2403.03221
- Code: None

**Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation**

- Paper: http://arxiv.org/pdf/2403.04381
- Code: https://github.com/MickeyLLG/S2DHand

**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**

- Paper: https://arxiv.org/pdf/2311.12028.pdf
- Code: https://github.com/NationalGAILab/HoT

[Back to contents](#Contents)

## GAN

[Back to contents](#Contents)

## Age Estimation

[Back to contents](#Contents)

## Facial Expression Recognition

[Back to contents](#Contents)

## Hand Pose Estimation (Hand Mesh Recovery)

[Back to contents](#Contents)

## 3D Reconstruction

**UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets**

- Paper: http://arxiv.org/pdf/2403.05086
- Code: https://github.com/Youngju-Na/UFORecon

**DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction**

- Paper: http://arxiv.org/pdf/2403.05005
- Code: None

**Memory-based Adapters for Online 3D Scene Perception**

- Paper: http://arxiv.org/pdf/2403.06974
- Code: https://github.com/xuxw98/Online3D

**Bayesian Diffusion Models for 3D Shape Reconstruction**

- Paper: http://arxiv.org/pdf/2403.06973
- Code: None

[Back to contents](#Contents)

## Frame Interpolation

[Back to contents](#Contents)

## 3D Point Cloud

**Rethinking Few-shot 3D Point Cloud Semantic Segmentation**

- Paper: http://arxiv.org/pdf/2403.00592
- Code: https://github.com/ZhaochongAn/COSeg

**Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension**

- Paper: http://arxiv.org/pdf/2403.03532
- Code: https://github.com/liuquan98/eyoc

**Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds**

- Paper: http://arxiv.org/pdf/2403.05247
- Code: https://github.com/TRLou/HiT-ADV

[Back to contents](#Contents)

## Anomaly Detection

**Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts**

- Paper: http://arxiv.org/pdf/2403.06495
- Code: https://github.com/mala-lab/inctrl

**RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection**

- Paper: http://arxiv.org/pdf/2403.05897
- Code: https://github.com/cnulab/realnet

[Back to contents](#Contents)

## Other

**DisCo: Disentangled Control for Realistic Human Dance Generation**

- Paper: https://arxiv.org/abs/2307.00040
- Code: https://github.com/Wangt-CN/DisCo

**Gradient Reweighting: Towards Imbalanced Class-Incremental Learning**

- Paper: http://arxiv.org/pdf/2402.18528
- Code: None

**TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding**

- Paper: http://arxiv.org/pdf/2402.18490
- Code: None

**Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting**

- Paper: http://arxiv.org/pdf/2402.18330
- Code: https://github.com/tho-kn/egotap

**Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing**

- Paper: http://arxiv.org/pdf/2402.18277
- Code: None

**Misalignment-Robust Frequency Distribution Loss for Image Transformation**

- Paper: http://arxiv.org/pdf/2402.18192
- Code: https://github.com/eezkni/FDL

**3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling**

- Paper: http://arxiv.org/pdf/2402.18146
- Code: https://github.com/jiangchaokang/3dsflabelling

**OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction**

- Paper: http://arxiv.org/pdf/2402.18140
- Code: None

**UniVS: Unified and Universal Video Segmentation with Prompts as Queries**

- Paper: http://arxiv.org/pdf/2402.18115
- Code: https://github.com/minghanli/univs

**Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis**

- Paper: http://arxiv.org/pdf/2402.18078
- Code: https://github.com/YanzuoLu/CFLD

**Boosting Neural Representations for Videos with a Conditional Decoder**

- Paper: http://arxiv.org/pdf/2402.18152
- Code: None

**Classes Are Not Equal: An Empirical Study on Image Recognition Fairness**

- Paper: http://arxiv.org/pdf/2402.18133
- Code: None

**QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction**

- Paper: http://arxiv.org/pdf/2402.17951
- Code: None

**Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers**

- Paper: http://arxiv.org/pdf/2402.19479
- Code: None

**SeMoLi: What Moves Together Belongs Together**

- Paper: http://arxiv.org/pdf/2402.19463
- Code: None

**Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction**

- Paper: http://arxiv.org/pdf/2402.19326
- Code: None

**CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition**

- Paper: http://arxiv.org/pdf/2402.19231
- Code: https://github.com/lu-feng/cricavpr

**MemoNav: Working Memory Model for Visual Navigation**

- Paper: http://arxiv.org/pdf/2402.19161
- Code: None

**VideoMAC: Video Masked Autoencoders Meet ConvNets**

- Paper: http://arxiv.org/pdf/2402.19082
- Code: https://github.com/nust-machine-intelligence-laboratory/videomac

**Theoretically Achieving Continuous Representation of Oriented Bounding Boxes**

- Paper: http://arxiv.org/pdf/2402.18975
- Code: https://github.com/Jittor/JDet

**OHTA: One-shot Hand Avatar via Data-driven Implicit Priors**

- Paper: http://arxiv.org/pdf/2402.18969
- Code: None

**WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts**

- Paper: http://arxiv.org/pdf/2402.18956
- Code: None

**Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation**

- Paper: http://arxiv.org/pdf/2402.18920
- Code: None

**SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting**

- Paper: http://arxiv.org/pdf/2402.18848
- Code: None

**ViewFusion: Towards Multi-View Consistency via Interpolated Denoising**

- Paper: http://arxiv.org/pdf/2402.18842
- Code: None

**OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition**

- Paper: http://arxiv.org/pdf/2402.18786
- Code: None

**NARUTO: Neural Active Reconstruction from Uncertain Target Observations**

- Paper: http://arxiv.org/pdf/2402.18771
- Code: None

**Towards Generalizable Tumor Synthesis**

- Paper: http://arxiv.org/pdf/2402.19470
- Code: None

**Rethinking Multi-domain Generalization with A General Learning Objective**

- Paper: http://arxiv.org/pdf/2402.18853
- Code: None

**Rethinking Inductive Biases for Surface Normal Estimation**

- Paper: http://arxiv.org/pdf/2403.00712
- Code: https://github.com/baegwangbin/DSINE

**SURE: SUrvey REcipes for building reliable and robust deep networks**

- Paper: http://arxiv.org/pdf/2403.00543
- Code: https://github.com/YutingLi0606/SURE

**Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching**

- Paper: http://arxiv.org/pdf/2403.00486
- Code: https://github.com/Windsrain/Selective-Stereo

**Deformable One-shot Face Stylization via DINO Semantic Guidance**

- Paper: http://arxiv.org/pdf/2403.00459
- Code: https://github.com/zichongc/DoesFS

**CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation**

- Paper: http://arxiv.org/pdf/2403.00274
- Code: None

**NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors**

- Paper: http://arxiv.org/pdf/2403.03122
- Code: None

**Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos**

- Paper: http://arxiv.org/pdf/2403.02782
- Code: None

**HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes**

- Paper: http://arxiv.org/pdf/2403.02769
- Code: None

**Learning Group Activity Features Through Person Attribute Prediction**

- Paper: http://arxiv.org/pdf/2403.02753
- Code: https://github.com/chihina/GAFL-CVPR2024

**Interactive Continual Learning: Fast and Slow Thinking**

- Paper: http://arxiv.org/pdf/2403.02628
- Code: None

**Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation**

- Paper: http://arxiv.org/pdf/2403.03890
- Code: None

**DART: Implicit Doppler Tomography for Radar Novel View Synthesis**

- Paper: http://arxiv.org/pdf/2403.03896
- Code: None

**MeaCap: Memory-Augmented Zero-shot Image Captioning**

- Paper: http://arxiv.org/pdf/2403.03715
- Code: https://github.com/joeyz0z/MeaCap

**HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations**

- Paper: http://arxiv.org/pdf/2403.03561
- Code: None

**Continual Segmentation with Disentangled Objectness Learning and Class Recognition**

- Paper: http://arxiv.org/pdf/2403.03477
- Code: https://github.com/jordangong/CoMasTRe

**HDRFlow: Real-Time HDR Video Reconstruction with Large Motions**

- Paper: http://arxiv.org/pdf/2403.03447
- Code: None

**LEAD: Learning Decomposition for Source-free Universal Domain Adaptation**

- Paper: http://arxiv.org/pdf/2403.03421
- Code: https://github.com/ispc-lab/lead

**F$^3$Loc: Fusion and Filtering for Floorplan Localization**

- Paper: http://arxiv.org/pdf/2403.03370
- Code: None

**Enhancing Vision-Language Pre-training with Rich Supervisions**

- Paper: http://arxiv.org/pdf/2403.03346
- Code: None

**Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed**

- Paper: http://arxiv.org/pdf/2403.04765
- Code: None

**Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning**

- Paper: http://arxiv.org/pdf/2403.04492
- Code: https://github.com/rashindrie/dipa

**Learning to Remove Wrinkled Transparent Film with Polarized Prior**

- Paper: http://arxiv.org/pdf/2403.04368
- Code: https://github.com/jqtangust/filmremoval

**LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking**

- Paper: http://arxiv.org/pdf/2403.04303
- Code: None

**Active Generalized Category Discovery**

- Paper: http://arxiv.org/pdf/2403.04272
- Code: https://github.com/mashijie1028/activegcd

**MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection**

- Paper: http://arxiv.org/pdf/2403.04149
- Code: https://github.com/ispc-lab/map

**A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition**

- Paper: http://arxiv.org/pdf/2403.04245
- Code: https://github.com/dalision/modalbiasavsr

**Seamless Human Motion Composition with Blended Positional Encodings**

- Paper: https://arxiv.org/abs/2402.15509
- Code: https://github.com/BarqueroGerman/FlowMDM

**DiffusionLight: Light Probes for Free by Painting a Chrome Ball**

- Paper: https://arxiv.org/abs/2312.09168
- Code: https://github.com/DiffusionLight/DiffusionLight

**SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting**

- Paper: http://arxiv.org/pdf/2403.05087
- Code: https://github.com/initialneil/SplattingAvatar

**Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation**

- Paper: http://arxiv.org/pdf/2403.06946
- Code: https://github.com/tl-uestc/unimos

**Real-Time Simulated Avatar from Head-Mounted Sensors**

- Paper: http://arxiv.org/pdf/2403.06862
- Code: None

**DiaLoc: An Iterative Approach to Embodied Dialog Localization**

- Paper: http://arxiv.org/pdf/2403.06846
- Code: None

**FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation**

- Paper: http://arxiv.org/pdf/2403.06775
- Code: https://github.com/modelscope/facechain

**EarthLoc: Astronaut Photography Localization by Indexing Earth from Space**

- Paper: http://arxiv.org/pdf/2403.06758
- Code: https://github.com/gmberton/earthloc

**CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective**

- Paper: http://arxiv.org/pdf/2403.06676
- Code: https://github.com/snskysk/cam-back-again

**Distributionally Generative Augmentation for Fair Facial Attribute Classification**

- Paper: http://arxiv.org/pdf/2403.06606
- Code: https://github.com/heqianpei/diga

**Exploiting Style Latent Flows for Generalizing Deepfake Video Detection**

- Paper: http://arxiv.org/pdf/2403.06592
- Code: None

**MoST: Motion Style Transformer between Diverse Action Contents**

- Paper: http://arxiv.org/pdf/2403.06225
- Code: https://github.com/Boeun-Kim/MoST

**Coherent Temporal Synthesis for Incremental Action Segmentation**

- Paper: http://arxiv.org/pdf/2403.06102
- Code: None

**Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?**

- Paper: http://arxiv.org/pdf/2403.06092
- Code: None

**LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content**

- Paper: http://arxiv.org/pdf/2403.05854
- Code: None

**PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor**

- Paper: http://arxiv.org/pdf/2403.06668
- Code: None

**SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection**

- Paper: http://arxiv.org/pdf/2403.03170
- Code: None

**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**

- Paper: https://arxiv.org/abs/2403.17749
- Code: https://github.com/YuqiYang213/MLoRE

**Beyond Text: Frozen Large Language Models in Visual Signal Comprehension**

- Paper: http://arxiv.org/pdf/2403.07874
- Code: https://github.com/zh460045050/v2l-tokenizer

**Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis**

- Paper: http://arxiv.org/pdf/2403.07719
- Code: https://github.com/wonderlandxd/wikg

**Robust Synthetic-to-Real Transfer for Stereo Matching**

- Paper: http://arxiv.org/pdf/2403.07705
- Code: https://github.com/jiaw-z/dkt-stereo

**CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers**

- Paper: http://arxiv.org/pdf/2403.07700
- Code: https://github.com/shahaf-arica/cuvler

**Masked AutoDecoder is Effective Multi-Task Vision Generalist**

- Paper: http://arxiv.org/pdf/2403.07692
- Code: https://github.com/hanqiu-hq/mad

**PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution**

- Paper: http://arxiv.org/pdf/2403.07589
- Code: None

**Unleashing Network Potentials for Semantic Scene Completion**

- Paper: http://arxiv.org/pdf/2403.07560
- Code: https://github.com/fereenwong/ammnet

**Open-World Semantic Segmentation Including Class Similarity**

- Paper: http://arxiv.org/pdf/2403.07532
- Code: https://github.com/PRBonn/ContMAV

**ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions**

- Paper: http://arxiv.org/pdf/2403.07392
- Code: https://github.com/Traffic-X/ViT-CoMer

**FSC: Few-point Shape Completion**

- Paper: http://arxiv.org/pdf/2403.07359
- Code: None

**Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture**

- Paper: http://arxiv.org/pdf/2403.07347
- Code: https://github.com/jiafei127/fd4mm

**A Bayesian Approach to OOD Robustness in Image Classification**

- Paper: http://arxiv.org/pdf/2403.07277
- Code: None

[Back to contents](#Contents)