https://github.com/amusi/CVPR2025-Papers-with-Code

A collection of CVPR 2025 papers and open-source projects

computer-vision cvpr cvpr2020 cvpr2021 cvpr2022 cvpr2023 cvpr2024 cvpr2025 deep-learning image-processing image-segmentation machine-learning object-detection paper python semantic-segmentation transformer transformers visual-tracking

Last synced: 9 months ago


Host: GitHub
URL: https://github.com/amusi/CVPR2025-Papers-with-Code
Owner: amusi
Created: 2020-02-26T06:04:25.000Z (almost 6 years ago)
Default Branch: main
Last Pushed: 2025-02-27T09:35:33.000Z (10 months ago)
Last Synced: 2025-02-27T12:41:39.186Z (10 months ago)
Topics: computer-vision, cvpr, cvpr2020, cvpr2021, cvpr2022, cvpr2023, cvpr2024, cvpr2025, deep-learning, image-processing, image-segmentation, machine-learning, object-detection, paper, python, semantic-segmentation, transformer, transformers, visual-tracking
Homepage:
Size: 456 KB
Stars: 18,856
Watchers: 290
Forks: 2,630
Open Issues: 19
Metadata Files:
- Readme: README.md


README

# CVPR 2025 Papers and Open-Source Projects Collection (Papers with Code)

CVPR 2025 decisions are now available on OpenReview! Acceptance rate: 22.1% (2,878 / 13,008)

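> Note 1: Everyone is welcome to submit an issue to share CVPR 2025 papers and open-source projects!
>
> Note 2: For papers from past top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)
> - [CVPR 2024](CVPR2024-Papers-with-Code.md)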

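Scan the QR code to join the CVer Academic Exchange Group and get the latest work from CVPR 2025 and beyond! It is the largest computer vision and AI community on Knowledge Planet (知识星球). It is updated daily and shares the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join now and start learning!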

![](CVer学术交流群.png)

# CVPR 2025 Open-Source Papers: Table of Contents

- [3DGS(Gaussian Splatting)](#3DGS)

- [Avatars](#Avatars)

- [Backbone](#Backbone)

- [CLIP](#CLIP)

- [Mamba](#Mamba)

- [Embodied AI](#Embodied-AI)

- [GAN](#GAN)

- [GNN](#GNN)

- [Multimodal Large Language Models (MLLM)](#MLLM)

- [Large Language Models (LLM)](#LLM)

- [NAS](#NAS)

- [OCR](#OCR)

- [NeRF](#NeRF)

- [DETR](#DETR)

- [Diffusion Models](#Diffusion)

- [ReID (Re-Identification)](#ReID)

- [Long-Tail Distribution](#Long-Tail)

- [Vision Transformer](#Vision-Transformer)

- [Vision-Language](#VL)

- [Self-supervised Learning](#SSL)

- [Data Augmentation](#DA)

- [Object Detection](#Object-Detection)

- [Anomaly Detection](#Anomaly-Detection)

- [Visual Tracking](#VT)

- [Semantic Segmentation](#Semantic-Segmentation)

- [Instance Segmentation](#Instance-Segmentation)

- [Panoptic Segmentation](#Panoptic-Segmentation)

- [Medical Image](#MI)

- [Medical Image Segmentation](#MIS)

- [Video Object Segmentation](#VOS)

- [Video Instance Segmentation](#VIS)

- [Referring Image Segmentation](#RIS)

- [Image Matting](#Matting)

- [Image Editing](#Image-Editing)

- [Low-level Vision](#LLV)

- [Super-Resolution](#SR)

- [Denoising](#Denoising)

- [Deblur](#Deblur)

- [Autonomous Driving](#Autonomous-Driving)

- [3D Point Cloud](#3D-Point-Cloud)

- [3D Object Detection](#3DOD)

- [3D Semantic Segmentation](#3DSS)

- [3D Object Tracking](#3D-Object-Tracking)

- [3D Semantic Scene Completion](#3DSSC)

- [3D Registration](#3D-Registration)

- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)

- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)

- [Medical Image](#Medical-Image)

- [Image Generation](#Image-Generation)

- [Video Generation](#Video-Generation)

- [3D Generation](#3D-Generation)

- [Video Understanding](#Video-Understanding)

- [Action Detection](#Action-Detection)

- [Embodied AI](#Embodied)

- [Text Detection](#Text-Detection)

- [Knowledge Distillation](#KD)

- [Model Pruning](#Pruning)

- [Image Compression](#IC)

- [3D Reconstruction](#3D-Reconstruction)

- [Depth Estimation](#Depth-Estimation)

- [Trajectory Prediction](#TP)

- [Lane Detection](#Lane-Detection)

- [Image Captioning](#Image-Captioning)

- [Visual Question Answering](#VQA)

- [Sign Language Recognition](#SLR)

- [Video Prediction](#Video-Prediction)

- [Novel View Synthesis](#NVS)

- [Zero-Shot Learning](#ZSL)

- [Stereo Matching](#Stereo-Matching)

- [Feature Matching](#Feature-Matching)

- [Low-light Image Enhancement](#Low-light)

- [Scene Graph Generation](#SGG)

- [Style Transfer](#ST)

- [Implicit Neural Representations](#INR)

- [Image Quality Assessment](#IQA)

- [Video Quality Assessment](#Video-Quality-Assessment)

- [Datasets](#Datasets)

- [New Tasks](#New-Tasks)

- [Others](#Others)



# 3DGS(Gaussian Splatting)



# Avatars

# Backbone



# CLIP



# Mamba

**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**

- Paper: https://arxiv.org/abs/2407.08083

- Code: https://github.com/NVlabs/MambaVision

**MobileMamba: Lightweight Multi-Receptive Visual Mamba Network**

- Paper: https://arxiv.org/abs/2411.15941

- Code: https://github.com/lewandofskee/MobileMamba



# Embodied AI



# GAN



# OCR



# NeRF



# DETR



# Prompt



# Multimodal Large Language Models (MLLM)

**LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences**

- Paper: https://arxiv.org/abs/2412.01292

- Code: https://github.com/Hoyyyaard/LSceneLLM

**DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution**

- Paper: https://arxiv.org/abs/2405.16071

- Code: https://github.com/callsys/DynRefer



# Large Language Models (LLM)



# NAS



# ReID (Re-Identification)



# Diffusion Models

**TinyFusion: Diffusion Transformers Learned Shallow**

- Paper: https://arxiv.org/abs/2412.01199

- Code: https://github.com/VainF/TinyFusion



# Vision Transformer



# Vision-Language



# Object Detection

**LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models**

- Paper: https://arxiv.org/abs/2501.18954

- Code: https://github.com/iSEE-Laboratory/LLMDet



# Anomaly Detection



# Object Tracking

**Multiple Object Tracking as ID Prediction**

- Paper: https://arxiv.org/abs/2403.16848

- Code: https://github.com/MCG-NJU/MOTIP



# Medical Image

# Medical Image Segmentation



# Autonomous Driving

**LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes**

- Project: https://ldkong.com/LiMoE

- Paper: https://arxiv.org/abs/2501.04004

- Code: https://github.com/Xiangxu-0103/LiMoE

# 3D Point Cloud



# 3D Object Detection



# 3D Semantic Segmentation



# Image Editing

**Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing**

- Paper: https://arxiv.org/abs/2411.16832

- Code: https://github.com/taco-group/FaceLock



# Video Editing



# Low-level Vision



# Super-Resolution

**AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution**

- Paper: https://arxiv.org/abs/2412.00124

- Code: https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution



# Denoising

## Image Denoising



# 3D Human Pose Estimation



# Image Generation

**Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models**

- Paper: https://arxiv.org/abs/2501.01423

- Code: https://github.com/hustvl/LightningDiT

**SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models**

- Paper: https://arxiv.org/abs/2412.04852

- Code: https://github.com/taco-group/SleeperMark

**TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation**

- Homepage: https://byteflow-ai.github.io/TokenFlow/

- Code: https://github.com/ByteFlow-AI/TokenFlow

- Paper: https://arxiv.org/abs/2412.03069

**PAR: Parallelized Autoregressive Visual Generation**

- Project: https://epiphqny.github.io/PAR-project/

- Paper: https://arxiv.org/abs/2412.15119

- Code: https://github.com/Epiphqny/PAR



# Video Generation

**Identity-Preserving Text-to-Video Generation by Frequency Decomposition**

- Paper: https://arxiv.org/abs/2411.17440

- Code: https://github.com/PKU-YuanGroup/ConsisID

**Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models**

- Paper: https://arxiv.org/abs/2407.15642

- Code: https://github.com/maxin-cn/Cinemo

**X-Dyna: Expressive Dynamic Human Image Animation**

- Paper: https://arxiv.org/abs/2501.10021

- Code: https://github.com/bytedance/X-Dyna

**PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation**

- Paper: https://arxiv.org/pdf/2412.00596

- Code: https://github.com/pittisl/PhyT2V

**Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model**

- Project: https://liewfeng.github.io/TeaCache/

- Paper: https://arxiv.org/abs/2411.19108

- Code: https://github.com/ali-vilab/TeaCache



# 3D Generation

**Generative Gaussian Splatting for Unbounded 3D City Generation**

- Project: https://haozhexie.com/project/gaussian-city

- Paper: https://arxiv.org/abs/2406.06526

- Code: https://github.com/hzxie/GaussianCity



# Video Understanding



# Embodied AI

**Universal Actions for Enhanced Embodied Foundation Models**

- Project: https://2toinf.github.io/UniAct/

- Paper: https://arxiv.org/abs/2501.10105

- Code: https://github.com/2toinf/UniAct



# Knowledge Distillation



# Depth Estimation

**DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos**

- Project: https://depthcrafter.github.io

- Paper: https://arxiv.org/abs/2409.02095

- Code: https://github.com/Tencent/DepthCrafter

**MonSter: Marry Monodepth to Stereo Unleashes Power**

- Paper: https://arxiv.org/abs/2501.08643

- Code: https://github.com/Junda24/MonSter



# Stereo Matching

**MonSter: Marry Monodepth to Stereo Unleashes Power**

- Paper: https://arxiv.org/abs/2501.08643

- Code: https://github.com/Junda24/MonSter



# Low-light Image Enhancement

**HVI: A New Color Space for Low-light Image Enhancement**

- Paper: https://arxiv.org/abs/2502.20272

- Code: https://github.com/Fediory/HVI-CIDNet

- Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_



# Scene Graph Generation



# Style Transfer

**StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements**

- Project: https://stylestudio-official.github.io/

- Paper: https://arxiv.org/abs/2412.08503

- Code: https://github.com/Westlake-AGI-Lab/StyleStudio



# Video Quality Assessment



# Datasets



# Others
