{"id":26047138,"url":"https://github.com/amusi/CVPR2025-Papers-with-Code","last_synced_at":"2025-03-07T22:01:48.630Z","repository":{"id":37280109,"uuid":"243181735","full_name":"amusi/CVPR2025-Papers-with-Code","owner":"amusi","description":"CVPR 2025 论文和开源项目合集","archived":false,"fork":false,"pushed_at":"2025-02-27T09:35:33.000Z","size":467,"stargazers_count":18856,"open_issues_count":19,"forks_count":2630,"subscribers_count":290,"default_branch":"main","last_synced_at":"2025-02-27T12:41:39.186Z","etag":null,"topics":["computer-vision","cvpr","cvpr2020","cvpr2021","cvpr2022","cvpr2023","cvpr2024","cvpr2025","deep-learning","image-processing","image-segmentation","machine-learning","object-detection","paper","python","semantic-segmentation","transformer","transformers","visual-tracking"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amusi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-26T06:04:25.000Z","updated_at":"2025-02-27T12:39:51.000Z","dependencies_parsed_at":"2024-04-06T07:31:22.148Z","dependency_job_id":"2a79c989-ae05-4fb2-a25d-f5da922a4065","html_url":"https://github.com/amusi/CVPR2025-Papers-with-Code","commit_stats":null,"previous_names":["amusi/cvpr2024-papers-with-code","amusi/cvpr2021-papers-with-code","amusi/cvpr2022-papers-with-code","amusi/cvpr2023-papers-with-code","amusi/cvpr2025-papers-with-code"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2025-Papers
-with-Code","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2025-Papers-with-Code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2025-Papers-with-Code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2025-Papers-with-Code/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amusi","download_url":"https://codeload.github.com/amusi/CVPR2025-Papers-with-Code/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242467564,"owners_count":20133114,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cvpr","cvpr2020","cvpr2021","cvpr2022","cvpr2023","cvpr2024","cvpr2025","deep-learning","image-processing","image-segmentation","machine-learning","object-detection","paper","python","semantic-segmentation","transformer","transformers","visual-tracking"],"created_at":"2025-03-07T22:01:47.906Z","updated_at":"2025-03-07T22:01:48.616Z","avatar_url":"https://github.com/amusi.png","language":null,"funding_links":[],"categories":["Coding \u0026 Development","Others"],"sub_categories":[],"readme":"# CVPR 2025 Papers and Open-Source Projects (Papers with Code)\n\nCVPR 2025 decisions are now available on OpenReview! Acceptance rate: 22.1% = 2878 / 13008\n\n\n\u003e Note 1: Everyone is welcome to open an issue to share CVPR 2025 papers and open-source projects!\n\u003e\n\u003e Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision\n\u003e\n\u003e - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)\n\u003e - [CVPR 
2024](CVPR2024-Papers-with-Code.md)\n\nScan the QR code to join the CVer academic exchange group for the latest work, including CVPR 2025! It is the largest computer vision AI Knowledge Planet community, updated daily, and shares the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join and start learning!\n\n![](CVer学术交流群.png)\n\n# CVPR 2025 Open-Source Papers: Table of Contents\n\n- [3DGS(Gaussian Splatting)](#3DGS)\n- [Avatars](#Avatars)\n- [Backbone](#Backbone)\n- [CLIP](#CLIP)\n- [Mamba](#Mamba)\n- [Embodied AI](#Embodied-AI)\n- [GAN](#GAN)\n- [GNN](#GNN)\n- [Multimodal Large Language Models (MLLM)](#MLLM)\n- [Large Language Models (LLM)](#LLM)\n- [NAS](#NAS)\n- [OCR](#OCR)\n- [NeRF](#NeRF)\n- [DETR](#DETR)\n- [Diffusion Models](#Diffusion)\n- [ReID (Re-Identification)](#ReID)\n- [Long-Tail Distribution](#Long-Tail)\n- [Vision Transformer](#Vision-Transformer)\n- [Vision-Language](#VL)\n- [Self-supervised Learning](#SSL)\n- [Data Augmentation](#DA)\n- [Object Detection](#Object-Detection)\n- [Anomaly Detection](#Anomaly-Detection)\n- [Visual Tracking](#VT)\n- [Semantic Segmentation](#Semantic-Segmentation)\n- [Instance Segmentation](#Instance-Segmentation)\n- [Panoptic Segmentation](#Panoptic-Segmentation)\n- [Medical Image](#MI)\n- [Medical Image Segmentation](#MIS)\n- [Video Object Segmentation](#VOS)\n- [Video Instance Segmentation](#VIS)\n- [Referring Image Segmentation](#RIS)\n- [Image Matting](#Matting)\n- [Image Editing](#Image-Editing)\n- [Low-level Vision](#LLV)\n- [Super-Resolution](#SR)\n- [Denoising](#Denoising)\n- [Deblur](#Deblur)\n- [Autonomous Driving](#Autonomous-Driving)\n- [3D Point Cloud](#3D-Point-Cloud)\n- [3D Object Detection](#3DOD)\n- [3D Semantic Segmentation](#3DSS)\n- [3D Object Tracking](#3D-Object-Tracking)\n- [3D Semantic Scene Completion](#3DSSC)\n- [3D Registration](#3D-Registration)\n- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)\n- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)\n- [Medical Image](#Medical-Image)\n- [Image 
Generation](#Image-Generation)\n- [Video Generation](#Video-Generation)\n- [3D Generation](#3D-Generation)\n- [Video Understanding](#Video-Understanding)\n- [Action Detection](#Action-Detection)\n- [Embodied AI](#Embodied)\n- [Text Detection](#Text-Detection)\n- [Knowledge Distillation](#KD)\n- [Model Pruning](#Pruning)\n- [Image Compression](#IC)\n- [3D Reconstruction](#3D-Reconstruction)\n- [Depth Estimation](#Depth-Estimation)\n- [Trajectory Prediction](#TP)\n- [Lane Detection](#Lane-Detection)\n- [Image Captioning](#Image-Captioning)\n- [Visual Question Answering](#VQA)\n- [Sign Language Recognition](#SLR)\n- [Video Prediction](#Video-Prediction)\n- [Novel View Synthesis](#NVS)\n- [Zero-Shot Learning](#ZSL)\n- [Stereo Matching](#Stereo-Matching)\n- [Feature Matching](#Feature-Matching)\n- [Low-light Image Enhancement](#Low-light)\n- [Scene Graph Generation](#SGG)\n- [Style Transfer](#ST)\n- [Implicit Neural Representations](#INR)\n- [Image Quality Assessment](#IQA)\n- [Video Quality Assessment](#Video-Quality-Assessment)\n- [Datasets](#Datasets)\n- [New Tasks](#New-Tasks)\n- [Others](#Others)\n\n\u003ca name=\"3DGS\"\u003e\u003c/a\u003e\n\n# 3DGS(Gaussian Splatting)\n\n\n\n\n\u003ca name=\"Avatars\"\u003e\u003c/a\u003e\n\n# Avatars\n\n\n# Backbone\n\n\n\n\u003ca name=\"CLIP\"\u003e\u003c/a\u003e\n\n# CLIP\n\n\n\n\u003ca name=\"Mamba\"\u003e\u003c/a\u003e\n\n# Mamba\n\n\n**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**\n\n- Paper: https://arxiv.org/abs/2407.08083\n- Code: https://github.com/NVlabs/MambaVision\n\n**MobileMamba: Lightweight Multi-Receptive Visual Mamba Network**\n\n- Paper: https://arxiv.org/abs/2411.15941\n- Code: https://github.com/lewandofskee/MobileMamba\n\n\u003ca name=\"Embodied-AI\"\u003e\u003c/a\u003e\n\n# Embodied AI\n\n\n\u003ca 
name=\"GAN\"\u003e\u003c/a\u003e\n\n# GAN\n\n\u003ca name=\"OCR\"\u003e\u003c/a\u003e\n\n# OCR\n\n\n\u003ca name=\"NeRF\"\u003e\u003c/a\u003e\n\n# NeRF\n\n\n\n\u003ca name=\"DETR\"\u003e\u003c/a\u003e\n\n# DETR\n\n\n\n\u003ca name=\"Prompt\"\u003e\u003c/a\u003e\n\n# Prompt\n\n\u003ca name=\"MLLM\"\u003e\u003c/a\u003e\n\n# Multimodal Large Language Models (MLLM)\n\n**LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences**\n\n- Paper: https://arxiv.org/abs/2412.01292\n- Code: https://github.com/Hoyyyaard/LSceneLLM\n\n\n**DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution**\n\n- Paper: https://arxiv.org/abs/2405.16071\n- Code: https://github.com/callsys/DynRefer\n\n\n\u003ca name=\"LLM\"\u003e\u003c/a\u003e\n\n# Large Language Models (LLM)\n\n\n\n\n\u003ca name=\"NAS\"\u003e\u003c/a\u003e\n\n# NAS\n\n\u003ca name=\"ReID\"\u003e\u003c/a\u003e\n\n# ReID (Re-Identification)\n\n\n\n\u003ca name=\"Diffusion\"\u003e\u003c/a\u003e\n\n# Diffusion Models\n\n**TinyFusion: Diffusion Transformers Learned Shallow**\n\n- Paper: https://arxiv.org/abs/2412.01199\n- Code: https://github.com/VainF/TinyFusion\n\n\u003ca name=\"Vision-Transformer\"\u003e\u003c/a\u003e\n\n# Vision Transformer\n\n\n\n\u003ca name=\"VL\"\u003e\u003c/a\u003e\n\n# Vision-Language\n\n\n\n\u003ca name=\"Object-Detection\"\u003e\u003c/a\u003e\n\n# Object Detection\n\n\n**LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models**\n\n- Paper: https://arxiv.org/abs/2501.18954\n- Code: https://github.com/iSEE-Laboratory/LLMDet\n\n\n\u003ca name=\"Anomaly-Detection\"\u003e\u003c/a\u003e\n\n# Anomaly Detection\n\n\n\n\u003ca name=\"VT\"\u003e\u003c/a\u003e\n\n# Object Tracking\n\n**Multiple Object Tracking as ID Prediction**\n\n- Paper: https://arxiv.org/abs/2403.16848\n- Code: https://github.com/MCG-NJU/MOTIP\n\n\n\u003ca name=\"MI\"\u003e\u003c/a\u003e\n\n# Medical Image\n\n\n\n# Medical Image Segmentation\n\n\n\n\u003ca 
name=\"Autonomous-Driving\"\u003e\u003c/a\u003e\n\n# Autonomous Driving\n\n**LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes**\n\n- Project: https://ldkong.com/LiMoE\n- Paper: https://arxiv.org/abs/2501.04004\n- Code: https://github.com/Xiangxu-0103/LiMoE\n\n\n\n# 3D Point Cloud\n\n\n\n\u003ca name=\"3DOD\"\u003e\u003c/a\u003e\n\n# 3D Object Detection\n\n\n\n\u003ca name=\"3DSS\"\u003e\u003c/a\u003e\n\n# 3D Semantic Segmentation\n\n\u003ca name=\"Image-Editing\"\u003e\u003c/a\u003e\n\n# Image Editing\n\n**Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing**\n\n- Paper: https://arxiv.org/abs/2411.16832\n- Code: https://github.com/taco-group/FaceLock\n\n\n\u003ca name=\"Video-Editing\"\u003e\u003c/a\u003e\n\n# Video Editing\n\n\n\n\n\u003ca name=\"LLV\"\u003e\u003c/a\u003e\n\n# Low-level Vision\n\n\n\n\u003ca name=\"SR\"\u003e\u003c/a\u003e\n\n# Super-Resolution\n\n**AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2412.00124\n- Code: https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution\n\n\n\u003ca name=\"Denoising\"\u003e\u003c/a\u003e\n\n# Denoising\n\n## Image Denoising\n\n\u003ca name=\"3D-Human-Pose-Estimation\"\u003e\u003c/a\u003e\n\n# 3D Human Pose Estimation\n\n\n\n\n\u003ca name=\"Image-Generation\"\u003e\u003c/a\u003e\n\n# Image Generation\n\n**Reconstruction vs. 
Generation: Taming Optimization Dilemma in Latent Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2501.01423\n- Code: https://github.com/hustvl/LightningDiT\n\n**SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2412.04852\n- Code: https://github.com/taco-group/SleeperMark\n\n\n**TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation**\n\n- Homepage: https://byteflow-ai.github.io/TokenFlow/\n- Code: https://github.com/ByteFlow-AI/TokenFlow\n- Paper: https://arxiv.org/abs/2412.03069\n\n**PAR: Parallelized Autoregressive Visual Generation**\n\n- Project: https://epiphqny.github.io/PAR-project/\n- Paper: https://arxiv.org/abs/2412.15119\n- Code: https://github.com/Epiphqny/PAR\n\n\n\u003ca name=\"Video-Generation\"\u003e\u003c/a\u003e\n\n# Video Generation\n\n**Identity-Preserving Text-to-Video Generation by Frequency Decomposition**\n\n- Paper: https://arxiv.org/abs/2411.17440\n- Code: https://github.com/PKU-YuanGroup/ConsisID\n\n\n**Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2407.15642\n- Code: https://github.com/maxin-cn/Cinemo\n\n**X-Dyna: Expressive Dynamic Human Image Animation**\n\n- Paper: https://arxiv.org/abs/2501.10021\n- Code: https://github.com/bytedance/X-Dyna\n\n**PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation**\n\n- Paper: https://arxiv.org/pdf/2412.00596\n- Code: https://github.com/pittisl/PhyT2V\n\n\n**Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model**\n\n- Project: https://liewfeng.github.io/TeaCache/\n- Paper: https://arxiv.org/abs/2411.19108\n- Code: https://github.com/ali-vilab/TeaCache\n\n\n\n\u003ca name=\"3D-Generation\"\u003e\u003c/a\u003e\n\n# 3D Generation\n\n\n**Generative Gaussian Splatting for Unbounded 3D City Generation**\n\n- Project: https://haozhexie.com/project/gaussian-city\n- Paper: 
https://arxiv.org/abs/2406.06526\n- Code: https://github.com/hzxie/GaussianCity\n\n\n\n\u003ca name=\"Video-Understanding\"\u003e\u003c/a\u003e\n\n# Video Understanding\n\n\u003ca name=\"Embodied\"\u003e\u003c/a\u003e\n\n# Embodied AI\n\n**Universal Actions for Enhanced Embodied Foundation Models**\n\n- Project: https://2toinf.github.io/UniAct/\n- Paper: https://arxiv.org/abs/2501.10105\n- Code: https://github.com/2toinf/UniAct\n\n\n\n\u003ca name=\"KD\"\u003e\u003c/a\u003e\n\n# Knowledge Distillation\n\n\u003ca name=\"Depth-Estimation\"\u003e\u003c/a\u003e\n\n\n# Depth Estimation\n\n**DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos**\n\n- Project: https://depthcrafter.github.io\n- Paper: https://arxiv.org/abs/2409.02095\n- Code: https://github.com/Tencent/DepthCrafter\n\n\n**MonSter: Marry Monodepth to Stereo Unleashes Power**\n\n- Paper: https://arxiv.org/abs/2501.08643\n- Code: https://github.com/Junda24/MonSter\n\n\n\n\u003ca name=\"Stereo-Matching\"\u003e\u003c/a\u003e\n\n# Stereo Matching\n\n**MonSter: Marry Monodepth to Stereo Unleashes Power**\n\n- Paper: https://arxiv.org/abs/2501.08643\n- Code: https://github.com/Junda24/MonSter\n\n\n\u003ca name=\"Low-light\"\u003e\u003c/a\u003e\n\n# Low-light Image Enhancement\n\n\n**HVI: A New color space for Low-light Image Enhancement**\n\n- Paper: https://arxiv.org/abs/2502.20272\n- Code: https://github.com/Fediory/HVI-CIDNet\n- Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_\n\n\n\u003ca name=\"SGG\"\u003e\u003c/a\u003e\n\n# Scene Graph Generation\n\n\n\n\u003ca name=\"ST\"\u003e\u003c/a\u003e\n\n# Style Transfer\n\n**StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements**\n\n- Project: https://stylestudio-official.github.io/\n- Paper: https://arxiv.org/abs/2412.08503\n- Code: https://github.com/Westlake-AGI-Lab/StyleStudio\n\n\n\u003ca 
name=\"Video-Quality-Assessment\"\u003e\u003c/a\u003e\n\n# Video Quality Assessment\n\n\n\n\u003ca name=\"Datasets\"\u003e\u003c/a\u003e\n\n# Datasets\n\n\n\n\n\u003ca name=\"Others\"\u003e\u003c/a\u003e\n\n# Others\n\n\n  ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famusi%2FCVPR2025-Papers-with-Code","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famusi%2FCVPR2025-Papers-with-Code","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famusi%2FCVPR2025-Papers-with-Code/lists"}