{"id":15035080,"url":"https://github.com/amusi/cvpr2024-papers-with-code","last_synced_at":"2025-02-25T12:16:28.692Z","repository":{"id":37280109,"uuid":"243181735","full_name":"amusi/CVPR2024-Papers-with-Code","owner":"amusi","description":"CVPR 2024 论文和开源项目合集","archived":false,"fork":false,"pushed_at":"2024-07-04T10:00:19.000Z","size":454,"stargazers_count":18250,"open_issues_count":19,"forks_count":2591,"subscribers_count":291,"default_branch":"master","last_synced_at":"2024-11-12T18:02:29.330Z","etag":null,"topics":["computer-vision","cvpr","cvpr2020","cvpr2021","cvpr2022","cvpr2023","cvpr2024","deep-learning","image-processing","image-segmentation","machine-learning","object-detection","paper","python","semantic-segmentation","transformer","transformers","visual-tracking"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amusi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-26T06:04:25.000Z","updated_at":"2024-11-12T15:53:39.000Z","dependencies_parsed_at":"2024-04-06T07:31:22.148Z","dependency_job_id":"2a79c989-ae05-4fb2-a25d-f5da922a4065","html_url":"https://github.com/amusi/CVPR2024-Papers-with-Code","commit_stats":null,"previous_names":["amusi/cvpr2024-papers-with-code","amusi/cvpr2021-papers-with-code","amusi/cvpr2022-papers-with-code","amusi/cvpr2023-papers-with-code"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2024-Papers-with-Code","tags_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2024-Papers-with-Code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2024-Papers-with-Code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amusi%2FCVPR2024-Papers-with-Code/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amusi","download_url":"https://codeload.github.com/amusi/CVPR2024-Papers-with-Code/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240663555,"owners_count":19837378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cvpr","cvpr2020","cvpr2021","cvpr2022","cvpr2023","cvpr2024","deep-learning","image-processing","image-segmentation","machine-learning","object-detection","paper","python","semantic-segmentation","transformer","transformers","visual-tracking"],"created_at":"2024-09-24T20:27:26.858Z","updated_at":"2025-02-25T12:16:28.310Z","avatar_url":"https://github.com/amusi.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# CVPR 2024 Papers and Open-Source Projects (Papers with Code)\n\nCVPR 2024 decisions are now available on OpenReview!\n\n\n\u003e Note 1: You are welcome to open an issue to share CVPR 2024 papers and open-source projects!\n\u003e\n\u003e Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision\n\u003e\n\u003e - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)\n\u003e - [CVPR 
2023](CVPR2022-Papers-with-Code.md)\n\nScan the QR code to join the CVer academic exchange group, the largest computer vision and AI community on Knowledge Planet! It is updated daily and shares the latest, most cutting-edge learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and more.\n\n![](CVer学术交流群.png)\n\n# CVPR 2024 Open-Source Paper Index\n\n- [3DGS(Gaussian Splatting)](#3DGS)\n- [Avatars](#Avatars)\n- [Backbone](#Backbone)\n- [CLIP](#CLIP)\n- [MAE](#MAE)\n- [Embodied AI](#Embodied-AI)\n- [GAN](#GAN)\n- [GNN](#GNN)\n- [Multimodal Large Language Models (MLLM)](#MLLM)\n- [Large Language Models (LLM)](#LLM)\n- [NAS](#NAS)\n- [OCR](#OCR)\n- [NeRF](#NeRF)\n- [DETR](#DETR)\n- [Prompt](#Prompt)\n- [Diffusion Models](#Diffusion)\n- [ReID (Re-Identification)](#ReID)\n- [Long-Tail Distribution](#Long-Tail)\n- [Vision Transformer](#Vision-Transformer)\n- [Vision-Language](#VL)\n- [Self-supervised Learning](#SSL)\n- [Data Augmentation](#DA)\n- [Object Detection](#Object-Detection)\n- [Anomaly Detection](#Anomaly-Detection)\n- [Visual Tracking](#VT)\n- [Semantic Segmentation](#Semantic-Segmentation)\n- [Instance Segmentation](#Instance-Segmentation)\n- [Panoptic Segmentation](#Panoptic-Segmentation)\n- [Medical Image](#MI)\n- [Medical Image Segmentation](#MIS)\n- [Video Object Segmentation](#VOS)\n- [Video Instance Segmentation](#VIS)\n- [Referring Image Segmentation](#RIS)\n- [Image Matting](#Matting)\n- [Image Editing](#Image-Editing)\n- [Low-level Vision](#LLV)\n- [Super-Resolution](#SR)\n- [Denoising](#Denoising)\n- [Deblur](#Deblur)\n- [Autonomous Driving](#Autonomous-Driving)\n- [3D Point Cloud](#3D-Point-Cloud)\n- [3D Object Detection](#3DOD)\n- [3D Semantic Segmentation](#3DSS)\n- [3D Object Tracking](#3D-Object-Tracking)\n- [3D Semantic Scene Completion](#3DSSC)\n- [3D Registration](#3D-Registration)\n- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)\n- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)\n- [Medical Image](#Medical-Image)\n
- [Image Generation](#Image-Generation)\n- [Video Generation](#Video-Generation)\n- [3D Generation](#3D-Generation)\n- [Video Understanding](#Video-Understanding)\n- [Action Detection](#Action-Detection)\n- [Text Detection](#Text-Detection)\n- [Knowledge Distillation](#KD)\n- [Model Pruning](#Pruning)\n- [Image Compression](#IC)\n- [3D Reconstruction](#3D-Reconstruction)\n- [Depth Estimation](#Depth-Estimation)\n- [Trajectory Prediction](#TP)\n- [Lane Detection](#Lane-Detection)\n- [Image Captioning](#Image-Captioning)\n- [Visual Question Answering](#VQA)\n- [Sign Language Recognition](#SLR)\n- [Video Prediction](#Video-Prediction)\n- [Novel View Synthesis](#NVS)\n- [Zero-Shot Learning](#ZSL)\n- [Stereo Matching](#Stereo-Matching)\n- [Feature Matching](#Feature-Matching)\n- [Scene Graph Generation](#SGG)\n- [Implicit Neural Representations](#INR)\n- [Image Quality Assessment](#IQA)\n- [Video Quality Assessment](#Video-Quality-Assessment)\n- [Datasets](#Datasets)\n- [New Tasks](#New-Tasks)\n- [Others](#Others)\n\n\u003ca name=\"3DGS\"\u003e\u003c/a\u003e\n\n# 3DGS(Gaussian Splatting)\n\n**Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering**\n\n- Homepage: https://city-super.github.io/scaffold-gs/\n- Paper: https://arxiv.org/abs/2312.00109\n- Code: https://github.com/city-super/Scaffold-GS\n\n**GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis**\n\n- Homepage: https://shunyuanzheng.github.io/GPS-Gaussian \n- Paper: https://arxiv.org/abs/2312.02155\n- Code: https://github.com/ShunyuanZheng/GPS-Gaussian\n\n**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**\n\n- Paper: https://arxiv.org/abs/2312.02134\n- Code: https://github.com/huliangxiao/GaussianAvatar\n\n**GaussianEditor: Swift and 
Controllable 3D Editing with Gaussian Splatting**\n\n- Paper: https://arxiv.org/abs/2311.14521\n- Code: https://github.com/buaacyw/GaussianEditor \n\n**Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction**\n\n- Homepage: https://ingra14m.github.io/Deformable-Gaussians/ \n- Paper: https://arxiv.org/abs/2309.13101\n- Code: https://github.com/ingra14m/Deformable-3D-Gaussians\n\n**SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes**\n\n- Homepage: https://yihua7.github.io/SC-GS-web/ \n- Paper: https://arxiv.org/abs/2312.14937\n- Code: https://github.com/yihua7/SC-GS\n\n**Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis**\n\n- Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/ \n- Paper: https://arxiv.org/abs/2312.16812\n- Code: https://github.com/oppo-us-research/SpacetimeGaussians\n\n**DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**\n\n- Homepage: https://fictionarry.github.io/DNGaussian/\n- Paper: https://arxiv.org/abs/2403.06912\n- Code: https://github.com/Fictionarry/DNGaussian\n\n**4D Gaussian Splatting for Real-Time Dynamic Scene Rendering**\n\n- Paper: https://arxiv.org/abs/2310.08528\n- Code: https://github.com/hustvl/4DGaussians\n\n**GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2310.08529\n- Code: https://github.com/hustvl/GaussianDreamer\n\n\u003ca name=\"Avatars\"\u003e\u003c/a\u003e\n\n# Avatars\n\n**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**\n\n- Paper: https://arxiv.org/abs/2312.02134\n- Code: https://github.com/huliangxiao/GaussianAvatar\n\n**Real-Time Simulated Avatar from Head-Mounted Sensors**\n\n- Homepage: https://www.zhengyiluo.com/SimXR/\n- Paper: https://arxiv.org/abs/2403.06862\n\n\u003ca name=\"Backbone\"\u003e\u003c/a\u003e\n\n# 
Backbone\n\n**RepViT: Revisiting Mobile CNN From ViT Perspective**\n\n- Paper: https://arxiv.org/abs/2307.09283\n- Code: https://github.com/THU-MIG/RepViT\n\n**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**\n\n- Paper: https://arxiv.org/abs/2311.17132\n- Code: https://github.com/DaiShiResearch/TransNeXt\n\n\u003ca name=\"CLIP\"\u003e\u003c/a\u003e\n\n# CLIP\n\n**Alpha-CLIP: A CLIP Model Focusing on Wherever You Want**\n\n- Paper: https://arxiv.org/abs/2312.03818\n- Code: https://github.com/SunzeY/AlphaCLIP\n\n**FairCLIP: Harnessing Fairness in Vision-Language Learning**\n\n- Paper: https://arxiv.org/abs/2403.19949\n- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP\n\n\u003ca name=\"MAE\"\u003e\u003c/a\u003e\n\n# MAE\n\n\u003ca name=\"Embodied-AI\"\u003e\u003c/a\u003e\n\n# Embodied AI\n\n**EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI**\n\n- Homepage: https://tai-wang.github.io/embodiedscan/\n- Paper: https://arxiv.org/abs/2312.16170\n- Code: https://github.com/OpenRobotLab/EmbodiedScan\n\n**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**\n\n- Homepage: https://iranqin.github.io/MP5.github.io/ \n- Paper: https://arxiv.org/abs/2312.07472\n- Code: https://github.com/IranQin/MP5\n\n**LEMON: Learning 3D Human-Object Interaction Relation from 2D Images**\n\n- Paper: https://arxiv.org/abs/2312.08963\n- Code: https://github.com/yyvhang/lemon_3d \n\n\u003ca name=\"GAN\"\u003e\u003c/a\u003e\n\n# GAN\n\n\u003ca name=\"OCR\"\u003e\u003c/a\u003e\n\n# OCR\n\n**An Empirical Study of Scaling Law for OCR**\n\n- Paper: https://arxiv.org/abs/2401.00028\n- Code: https://github.com/large-ocr-model/large-ocr-model.github.io\n\n**ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting**\n\n- Paper: https://arxiv.org/abs/2403.00303\n- Code: https://github.com/PriNing/ODM \n\n\u003ca name=\"NeRF\"\u003e\u003c/a\u003e\n\n# NeRF\n\n**PIE-NeRF🍕: 
Physics-based Interactive Elastodynamics with NeRF**\n\n- Paper: https://arxiv.org/abs/2311.13099\n- Code: https://github.com/FYTalon/pienerf/ \n\n\u003ca name=\"DETR\"\u003e\u003c/a\u003e\n\n# DETR\n\n**DETRs Beat YOLOs on Real-time Object Detection**\n\n- Paper: https://arxiv.org/abs/2304.08069\n- Code: https://github.com/lyuwenyu/RT-DETR\n\n**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**\n\n- Paper: https://arxiv.org/abs/2403.16131\n- Code: https://github.com/xiuqhou/Salience-DETR\n\n\u003ca name=\"Prompt\"\u003e\u003c/a\u003e\n\n# Prompt\n\n\u003ca name=\"MLLM\"\u003e\u003c/a\u003e\n\n# Multimodal Large Language Models (MLLM)\n\n**mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration**\n\n- Paper: https://arxiv.org/abs/2311.04257\n- Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2\n\n**Link-Context Learning for Multimodal LLMs**\n\n- Paper: https://arxiv.org/abs/2308.07891\n- Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main \n\n**OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation**\n\n- Paper: https://arxiv.org/abs/2311.17911\n- Code: https://github.com/shikiw/OPERA\n\n**Making Large Multimodal Models Understand Arbitrary Visual Prompts**\n\n- Homepage: https://vip-llava.github.io/ \n- Paper: https://arxiv.org/abs/2312.00784\n\n**Pink: Unveiling the power of referential comprehension for multi-modal llms**\n\n- Paper: https://arxiv.org/abs/2310.00582\n- Code: https://github.com/SY-Xuan/Pink\n\n**Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding**\n\n- Paper: https://arxiv.org/abs/2311.08046\n- Code: https://github.com/PKU-YuanGroup/Chat-UniVi\n\n**OneLLM: One Framework to Align All Modalities with Language**\n\n- Paper: https://arxiv.org/abs/2312.03700\n- Code: https://github.com/csuhan/OneLLM\n\n\u003ca 
name=\"LLM\"\u003e\u003c/a\u003e\n\n# Large Language Models (LLM)\n\n**VTimeLLM: Empower LLM to Grasp Video Moments**\n\n- Paper: https://arxiv.org/abs/2311.18445\n- Code: https://github.com/huangb23/VTimeLLM \n\n\u003ca name=\"NAS\"\u003e\u003c/a\u003e\n\n# NAS\n\n\u003ca name=\"ReID\"\u003e\u003c/a\u003e\n\n# ReID (Re-Identification)\n\n**Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification**\n\n- Paper: https://arxiv.org/abs/2403.10254\n- Code: https://github.com/924973292/EDITOR \n\n**Noisy-Correspondence Learning for Text-to-Image Person Re-identification**\n\n- Paper: https://arxiv.org/abs/2308.09911\n\n- Code: https://github.com/QinYang79/RDE \n\n\u003ca name=\"Diffusion\"\u003e\u003c/a\u003e\n\n# Diffusion Models\n\n**InstanceDiffusion: Instance-level Control for Image Generation**\n\n- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/\n\n- Paper: https://arxiv.org/abs/2402.03290\n- Code: https://github.com/frank-xwang/InstanceDiffusion\n\n**Residual Denoising Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2308.13712\n- Code: https://github.com/nachifur/RDDM\n\n**DeepCache: Accelerating Diffusion Models for Free**\n\n- Paper: https://arxiv.org/abs/2312.00858\n- Code: https://github.com/horseee/DeepCache\n\n**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**\n\n- Homepage: https://tianhao-qi.github.io/DEADiff/ \n\n- Paper: https://arxiv.org/abs/2403.06951\n- Code: https://github.com/Tianhao-Qi/DEADiff_code\n\n**SVGDreamer: Text Guided SVG Generation with Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.16476\n- Code: https://ximinng.github.io/SVGDreamer-project/\n\n**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.05849\n- Code: https://github.com/jiuntian/interactdiffusion\n\n**MMA-Diffusion: MultiModal Attack on Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2311.17516\n- Code: 
https://github.com/yangyijune/MMA-Diffusion\n\n**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**\n\n- Homepage: https://video-motion-customization.github.io/ \n- Paper: https://arxiv.org/abs/2312.00845\n- Code: https://github.com/HyeonHo99/Video-Motion-Customization\n\n\u003ca name=\"Vision-Transformer\"\u003e\u003c/a\u003e\n\n# Vision Transformer\n\n**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**\n\n- Paper: https://arxiv.org/abs/2311.17132\n- Code: https://github.com/DaiShiResearch/TransNeXt\n\n**RepViT: Revisiting Mobile CNN From ViT Perspective**\n\n- Paper: https://arxiv.org/abs/2307.09283\n- Code: https://github.com/THU-MIG/RepViT\n\n**A General and Efficient Training for Transformer via Token Expansion**\n\n- Paper: https://arxiv.org/abs/2404.00672\n- Code: https://github.com/Osilly/TokenExpansion \n\n\u003ca name=\"VL\"\u003e\u003c/a\u003e\n\n# Vision-Language\n\n**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**\n\n- Paper: https://arxiv.org/abs/2403.02781\n- Code: https://github.com/zhengli97/PromptKD\n\n**FairCLIP: Harnessing Fairness in Vision-Language Learning**\n\n- Paper: https://arxiv.org/abs/2403.19949\n- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP\n\n\u003ca name=\"Object-Detection\"\u003e\u003c/a\u003e\n\n# Object Detection\n\n**DETRs Beat YOLOs on Real-time Object Detection**\n\n- Paper: https://arxiv.org/abs/2304.08069\n- Code: https://github.com/lyuwenyu/RT-DETR\n\n**Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation**\n\n- Paper: https://arxiv.org/abs/2312.01220\n- Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation \n\n**YOLO-World: Real-Time Open-Vocabulary Object Detection**\n\n- Paper: https://arxiv.org/abs/2401.17270\n- Code: https://github.com/AILab-CVC/YOLO-World\n\n**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience 
Filtering Refinement**\n\n- Paper: https://arxiv.org/abs/2403.16131\n- Code: https://github.com/xiuqhou/Salience-DETR\n\n\u003ca name=\"Anomaly-Detection\"\u003e\u003c/a\u003e\n\n# Anomaly Detection\n\n**Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection**\n\n- Paper: https://arxiv.org/abs/2310.12790\n- Code: https://github.com/mala-lab/AHL\n\n\u003ca name=\"VT\"\u003e\u003c/a\u003e\n\n# Object Tracking\n\n**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**\n\n- Paper: https://arxiv.org/abs/2403.04700\n- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT \n\n\u003ca name=\"Semantic-Segmentation\"\u003e\u003c/a\u003e\n\n# Semantic Segmentation\n\n**Stronger, Fewer, \u0026 Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2312.04265\n- Code: https://github.com/w1oves/Rein\n\n**SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation**\n\n- Paper: https://arxiv.org/abs/2311.15537\n- Code: https://github.com/xb534/SED \n\n\u003ca name=\"MI\"\u003e\u003c/a\u003e\n\n# Medical Image\n\n**Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology**\n\n- Paper: https://arxiv.org/abs/2402.17228\n- Code: https://github.com/DearCaat/RRT-MIL\n\n**VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis**\n\n- Paper: https://arxiv.org/abs/2402.17300\n- Code: https://github.com/Luffy03/VoCo\n\n**ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images**\n\n- Paper: https://arxiv.org/abs/2311.15264\n- Code: https://github.com/nicoboou/chada_vit \n\n\u003ca name=\"MIS\"\u003e\u003c/a\u003e\n\n# Medical Image Segmentation\n\n\n\n\u003ca name=\"Autonomous-Driving\"\u003e\u003c/a\u003e\n\n# Autonomous Driving\n\n**UniPAD: A Universal Pre-training Paradigm for 
Autonomous Driving**\n\n- Paper: https://arxiv.org/abs/2310.08370\n- Code: https://github.com/Nightmare-n/UniPAD\n\n**Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications**\n\n- Paper: https://arxiv.org/abs/2311.17663\n- Code: https://github.com/haomo-ai/Cam4DOcc\n\n**Memory-based Adapters for Online 3D Scene Perception**\n\n- Paper: https://arxiv.org/abs/2403.06974\n- Code: https://github.com/xuxw98/Online3D\n\n**Symphonize 3D Semantic Scene Completion with Contextual Instance Queries**\n\n- Paper: https://arxiv.org/abs/2306.15670\n- Code: https://github.com/hustvl/Symphonies\n\n**A Real-world Large-scale Dataset for Roadside Cooperative Perception**\n\n- Paper: https://arxiv.org/abs/2403.10145\n- Code: https://github.com/AIR-THU/DAIR-RCooper\n\n**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**\n\n- Paper: https://arxiv.org/abs/2403.07535\n- Code: https://github.com/Junda24/AFNet\n\n**Traffic Scene Parsing through the TSP6K Dataset**\n\n- Paper: https://arxiv.org/pdf/2303.02835.pdf\n- Code: https://github.com/PengtaoJiang/TSP6K \n\n\u003ca name=\"3D-Point-Cloud\"\u003e\u003c/a\u003e\n\n# 3D Point Cloud\n\n\n\n\u003ca name=\"3DOD\"\u003e\u003c/a\u003e\n\n# 3D Object Detection\n\n**PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2312.08371\n- Code: https://github.com/kuanchihhuang/PTT\n\n**UniMODE: Unified Monocular 3D Object Detection**\n\n- Paper: https://arxiv.org/abs/2402.18573\n\n\u003ca name=\"3DSS\"\u003e\u003c/a\u003e\n\n# 3D Semantic Segmentation\n\n\u003ca name=\"Image-Editing\"\u003e\u003c/a\u003e\n\n# Image Editing\n\n**Edit One for All: Interactive Batch Image Editing**\n\n- Homepage: https://thaoshibe.github.io/edit-one-for-all \n- Paper: https://arxiv.org/abs/2401.10219\n- Code: https://github.com/thaoshibe/edit-one-for-all\n\n\u003ca name=\"Video-Editing\"\u003e\u003c/a\u003e\n\n# 
Video Editing\n\n**MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers**\n\n- Homepage:  [https://maskint.github.io](https://maskint.github.io/) \n\n- Paper: https://arxiv.org/abs/2312.12468\n\n\u003ca name=\"LLV\"\u003e\u003c/a\u003e\n\n# Low-level Vision\n\n**Residual Denoising Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2308.13712\n- Code: https://github.com/nachifur/RDDM\n\n**Boosting Image Restoration via Priors from Pre-trained Models**\n\n- Paper: https://arxiv.org/abs/2403.06793\n\n\u003ca name=\"SR\"\u003e\u003c/a\u003e\n\n# Super-Resolution\n\n**SeD: Semantic-Aware Discriminator for Image Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2402.19387\n- Code: https://github.com/lbc12345/SeD\n\n**APISR: Anime Production Inspired Real-World Anime Super-Resolution**\n\n- Paper: https://arxiv.org/abs/2403.01598\n- Code: https://github.com/Kiteretsu77/APISR \n\n\u003ca name=\"Denoising\"\u003e\u003c/a\u003e\n\n# Denoising\n\n## Image Denoising\n\n\u003ca name=\"3D-Human-Pose-Estimation\"\u003e\u003c/a\u003e\n\n# 3D Human Pose Estimation\n\n**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**\n\n- Paper: https://arxiv.org/abs/2311.12028\n- Code: https://github.com/NationalGAILab/HoT \n\n\u003ca name=\"Image-Generation\"\u003e\u003c/a\u003e\n\n# Image Generation\n\n**InstanceDiffusion: Instance-level Control for Image Generation**\n\n- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/\n\n- Paper: https://arxiv.org/abs/2402.03290\n- Code: https://github.com/frank-xwang/InstanceDiffusion\n\n**ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations**\n\n- Homepage: https://eclipse-t2i.vercel.app/\n- Paper: https://arxiv.org/abs/2312.04655\n\n- Code: https://github.com/eclipse-t2i/eclipse-inference\n\n**Instruct-Imagen: Image Generation with Multi-modal Instruction**\n\n- Paper: https://arxiv.org/abs/2401.01952\n\n**Residual 
Denoising Diffusion Models**\n\n- Paper: https://arxiv.org/abs/2308.13712\n- Code: https://github.com/nachifur/RDDM\n\n**UniGS: Unified Representation for Image Generation and Segmentation**\n\n- Paper: https://arxiv.org/abs/2312.01985\n\n**Multi-Instance Generation Controller for Text-to-Image Synthesis**\n\n- Paper: https://arxiv.org/abs/2402.05408\n- Code: https://github.com/limuloo/migc\n\n**SVGDreamer: Text Guided SVG Generation with Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.16476\n- Code: https://ximinng.github.io/SVGDreamer-project/\n\n**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**\n\n- Paper: https://arxiv.org/abs/2312.05849\n- Code: https://github.com/jiuntian/interactdiffusion\n\n**Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**\n\n- Paper: https://arxiv.org/abs/2311.17002\n- Code: https://github.com/ali-vilab/Ranni\n\n\u003ca name=\"Video-Generation\"\u003e\u003c/a\u003e\n\n# Video Generation\n\n**Vlogger: Make Your Dream A Vlog**\n\n- Paper: https://arxiv.org/abs/2401.09414\n- Code: https://github.com/Vchitect/Vlogger\n\n**VBench: Comprehensive Benchmark Suite for Video Generative Models**\n\n- Homepage: https://vchitect.github.io/VBench-project/ \n- Paper: https://arxiv.org/abs/2311.17982\n- Code: https://github.com/Vchitect/VBench\n\n**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**\n\n- Homepage: https://video-motion-customization.github.io/ \n- Paper: https://arxiv.org/abs/2312.00845\n- Code: https://github.com/HyeonHo99/Video-Motion-Customization\n\n\u003ca name=\"3D-Generation\"\u003e\u003c/a\u003e\n\n# 3D Generation\n\n**CityDreamer: Compositional Generative Model of Unbounded 3D Cities**\n\n- Homepage: https://haozhexie.com/project/city-dreamer/ \n- Paper: https://arxiv.org/abs/2309.00610\n- Code: https://github.com/hzxie/city-dreamer\n\n**LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score 
Matching**\n\n- Paper: https://arxiv.org/abs/2311.11284\n- Code: https://github.com/EnVision-Research/LucidDreamer \n\n\u003ca name=\"Video-Understanding\"\u003e\u003c/a\u003e\n\n# Video Understanding\n\n**MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**\n\n- Paper: https://arxiv.org/abs/2311.17005\n- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2 \n\n\u003ca name=\"KD\"\u003e\u003c/a\u003e\n\n# Knowledge Distillation\n\n**Logit Standardization in Knowledge Distillation**\n\n- Paper: https://arxiv.org/abs/2403.01427\n- Code: https://github.com/sunshangquan/logit-standardization-KD\n\n**Efficient Dataset Distillation via Minimax Diffusion**\n\n- Paper: https://arxiv.org/abs/2311.15529\n- Code: https://github.com/vimar-gu/MinimaxDiffusion\n\n\u003ca name=\"Stereo-Matching\"\u003e\u003c/a\u003e\n\n# Stereo Matching\n\n**Neural Markov Random Field for Stereo Matching**\n\n- Paper: https://arxiv.org/abs/2403.11193\n- Code: https://github.com/aeolusguan/NMRF \n\n\u003ca name=\"SGG\"\u003e\u003c/a\u003e\n\n# Scene Graph Generation\n\n**HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**\n\n- Homepage: https://zhangce01.github.io/HiKER-SGG/ \n- Paper: https://arxiv.org/abs/2403.12033\n- Code: https://github.com/zhangce01/HiKER-SGG\n\n\u003ca name=\"Video-Quality-Assessment\"\u003e\u003c/a\u003e\n\n# Video Quality Assessment\n\n**KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**\n\n- Homepage: https://lixinustc.github.io/projects/KVQ/ \n\n- Paper: https://arxiv.org/abs/2402.07220\n- Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024\n\n\u003ca name=\"Datasets\"\u003e\u003c/a\u003e\n\n# Datasets\n\n**A Real-world Large-scale Dataset for Roadside Cooperative Perception**\n\n- Paper: https://arxiv.org/abs/2403.10145\n- Code: https://github.com/AIR-THU/DAIR-RCooper\n\n**Traffic Scene Parsing through the TSP6K Dataset**\n\n- Paper: 
https://arxiv.org/pdf/2303.02835.pdf\n- Code: https://github.com/PengtaoJiang/TSP6K \n\n\u003ca name=\"Others\"\u003e\u003c/a\u003e\n\n# Others\n\n**Object Recognition as Next Token Prediction**\n\n- Paper: https://arxiv.org/abs/2312.02142\n- Code: https://github.com/kaiyuyue/nxtp\n\n**ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**\n\n- Paper: https://arxiv.org/abs/2306.14525\n- Code: https://parameternet.github.io/ \n\n**Seamless Human Motion Composition with Blended Positional Encodings**\n\n- Paper: https://arxiv.org/abs/2402.15509\n- Code: https://github.com/BarqueroGerman/FlowMDM \n\n**LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**\n\n- Homepage: https://ll3da.github.io/ \n\n- Paper: https://arxiv.org/abs/2311.18651\n- Code: https://github.com/Open3DA/LL3DA\n\n**CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**\n\n- Homepage: https://clova-tool.github.io/ \n- Paper: https://arxiv.org/abs/2312.10908\n\n**MoMask: Generative Masked Modeling of 3D Human Motions**\n\n- Paper: https://arxiv.org/abs/2312.00063\n- Code: https://github.com/EricGuo5513/momask-codes\n\n**Amodal Ground Truth and Completion in the Wild**\n\n- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/ \n- Paper: https://arxiv.org/abs/2312.17247\n- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild\n\n**Improved Visual Grounding through Self-Consistent Explanations**\n\n- Paper: https://arxiv.org/abs/2312.04554\n- Code: https://github.com/uvavision/SelfEQ\n\n**ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**\n\n- Homepage: https://chenshuang-zhang.github.io/imagenet_d/\n- Paper: https://arxiv.org/abs/2403.18775\n- Code: https://github.com/chenshuang-zhang/imagenet_d\n\n**Learning from Synthetic Human Group Activities**\n\n- Homepage: https://cjerry1243.github.io/M3Act/ \n- Paper: https://arxiv.org/abs/2306.16772\n- 
Code: https://github.com/cjerry1243/M3Act\n\n**A Cross-Subject Brain Decoding Framework**\n\n- Homepage: https://littlepure2333.github.io/MindBridge/\n- Paper: https://arxiv.org/abs/2404.07850\n- Code: https://github.com/littlepure2333/MindBridge\n\n**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**\n\n- Paper: https://arxiv.org/abs/2403.17749\n- Code: https://github.com/YuqiYang213/MLoRE\n\n**Contrastive Mean-Shift Learning for Generalized Category Discovery**\n\n- Homepage: https://postech-cvlab.github.io/cms/ \n- Paper: https://arxiv.org/abs/2404.09451\n- Code: https://github.com/sua-choi/CMS\n  ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famusi%2Fcvpr2024-papers-with-code","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famusi%2Fcvpr2024-papers-with-code","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famusi%2Fcvpr2024-papers-with-code/lists"}