{"id":18110101,"url":"https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo","last_synced_at":"2025-03-29T17:32:17.601Z","repository":{"id":37368207,"uuid":"347364162","full_name":"DWCTOD/CVPR2024-Papers-with-Code-Demo","owner":"DWCTOD","description":"收集 CVPR 最新的成果，包括论文、代码和demo视频等，欢迎大家推荐！Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos, etc., and welcome recommendations from everyone!","archived":false,"fork":false,"pushed_at":"2024-04-25T14:34:34.000Z","size":140,"stargazers_count":1335,"open_issues_count":1,"forks_count":150,"subscribers_count":27,"default_branch":"main","last_synced_at":"2025-03-23T12:41:42.767Z","etag":null,"topics":["computer-vision","cvpr","cvpr2021","cvpr2022","cvpr2023","cvpr2024","llm","multimodal-deep-learning","object-detection","segment-anything","segmentation"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DWCTOD.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-13T12:26:36.000Z","updated_at":"2025-03-19T15:22:11.000Z","dependencies_parsed_at":"2024-03-17T06:30:13.436Z","dependency_job_id":"1e891a31-d824-4bc1-bf8c-b3ac4f508776","html_url":"https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo","commit_stats":null,"previous_names":["dwctod/cvpr2024-papers-with-code-demo","dwctod/cvpr2022-papers-with-code-demo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DWCTOD
%2FCVPR2024-Papers-with-Code-Demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DWCTOD%2FCVPR2024-Papers-with-Code-Demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DWCTOD%2FCVPR2024-Papers-with-Code-Demo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DWCTOD%2FCVPR2024-Papers-with-Code-Demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DWCTOD","download_url":"https://codeload.github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246220854,"owners_count":20742864,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cvpr","cvpr2021","cvpr2022","cvpr2023","cvpr2024","llm","multimodal-deep-learning","object-detection","segment-anything","segmentation"],"created_at":"2024-11-01T00:02:07.499Z","updated_at":"2025-03-29T17:32:17.306Z","avatar_url":"https://github.com/DWCTOD.png","language":null,"funding_links":[],"categories":["100 + 𝗔𝗿𝘁𝗶𝗳𝗶𝗰𝗶𝗮𝗹 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝗰𝗲 𝗣𝗿𝗼𝗷𝗲𝗰𝘁 𝗟𝗶𝘀𝘁 𝘄𝗶𝘁𝗵 𝗰𝗼𝗱𝗲"],"sub_categories":[],"readme":"# CVPR2024-Papers-with-Code-Demo\n\n :star_and_crescent:**Add WeChat: nvshenj125, note your research area, and join the discussion and study group**\n\n\nFollow the WeChat official account: AI算法与图像处理\n\n:star2: [CVPR 2024](https://cvpr.thecvf.com/Conferences/2024) Continuously updated with the latest papers and the corresponding open-source code!\n\n\n\nBilibili demo: https://space.bilibili.com/288489574\n\n\u003e :hand: Note: issues sharing CVPR 2024 papers and open-source projects are welcome! Let's improve this project together.\n\u003e\n\u003e Top-conference paper collections from previous years:\n\u003e\n\u003e 
[CVPR2021](https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo/blob/main/CVPR2021.md)\n\u003e\n\u003e [CVPR2022](https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo/blob/main/CVPR2022.md)\n\u003e\n\u003e [CVPR2023](https://github.com/DWCTOD/CVPR2024-Papers-with-Code-Demo/blob/main/CVPR2023.md)\n\u003e\n\u003e [ICCV2021](https://github.com/DWCTOD/ICCV2021-Papers-with-Code-Demo)\n\u003e\n\u003e [ECCV2022](https://github.com/DWCTOD/ECCV2022-Papers-with-Code-Demo)\n\n### **:fireworks: 欢迎进群** | Welcome\n\nThe CVPR 2024 paper discussion group is now open! If your paper has been accepted, add WeChat **nvshenj125** with the note **CVPR + name + school/company**. Requests must follow this format to be added to the group.\n\n\u003ca name=\"Contents\"\u003e\u003c/a\u003e\n\n\n\n### :hammer: **目录 | Table of Contents (click to jump)**\n\n\u003cdetails open\u003e\n\u003csummary\u003e Table of Contents (click on the right to collapse)\u003c/summary\u003e\n\n- [Backbone](#Backbone)\n- [数据集/Dataset](#Dataset)\n- [Diffusion Model](#DiffusionModel)\n- [Text-to-Image](#T2I)\n- [NAS](#NAS)\n- [NeRF](#NeRF)\n- [Knowledge Distillation](#KnowledgeDistillation)\n- [多模态 / Multimodal ](#Multimodal)\n- [对比学习/Contrastive Learning](#ContrastiveLearning)\n- [图神经网络 / Graph Neural Networks](#GNN)\n- [胶囊网络 / Capsule Network](#CapsuleNetwork)\n- [图像分类 / Image Classification](#ImageClassification)\n- [目标检测/Object Detection](#ObjectDetection)\n- [目标跟踪/Object Tracking](#ObjectTracking)\n- [轨迹预测/Trajectory Prediction](#TrajectoryPrediction)\n- [语义分割/Segmentation](#Segmentation)\n- [弱监督语义分割/Weakly Supervised Semantic Segmentation](#WSSS)\n- [医学图像分割/Medical Image Segmentation](#MedicalImageSegmentation)\n- [视频目标分割/Video Object Segmentation](#VideoObjectSegmentation)\n- [交互式视频目标分割/Interactive Video Object Segmentation](#InteractiveVideoObjectSegmentation)\n- [Visual Transformer](#VisualTransformer)\n- [深度估计/Depth Estimation](#DepthEstimation)\n- [人脸识别/Face Recognition](#FaceRecognition)\n- [人脸检测/Face Detection](#FaceDetection)\n- [人脸活体检测/Face Anti-Spoofing](#FaceAnti-Spoofing)\n- [人脸年龄估计/Age Estimation](#AgeEstimation)\n- [人脸表情识别/Facial Expression 
Recognition](#FacialExpressionRecognition)\n- [人脸属性识别/Facial Attribute Recognition](#FacialAttributeRecognition)\n- [人脸编辑/Facial Editing](#FacialEditing)\n- [人脸重建/Face Reconstruction](#FaceReconstruction)\n- [Talking Face](#TalkingFace)\n- [换脸/Face Swap](#FaceSwap)\n- [姿态估计/Pose Estimation](#HumanPoseEstimation)\n- [手势姿态估计（重建）/Hand Pose Estimation( Hand Mesh Recovery)](#HandPoseEstimation)\n- [视频动作检测/Video Action Detection](#VideoActionDetection)\n- [手语翻译/Sign Language Translation](#SignLanguageTranslation)\n- [3D人体重建](#3D人体重建)\n- [行人重识别/Person Re-identification](#PersonRe-identification)\n- [行人搜索/Person Search](#PersonSearch)\n- [人群计数 / Crowd Counting](#CrowdCounting)\n- [GAN](#GAN)\n- [彩妆迁移 / Color-Pattern Makeup Transfer](#CPM)\n- [字体生成 / Font Generation](#FontGeneration)\n- [场景文本检测、识别/Scene Text Detection/Recognition](#OCR)\n- [图像、视频检索 / Image Retrieval/Video retrieval](#Retrieval)\n- [Image Animation](#ImageAnimation)\n- [抠图/Image Matting](#ImageMatting)\n- [超分辨率/Super Resolution](#SuperResolution)\n- [图像复原/Image Restoration](#ImageRestoration)\n- [图像补全/Image Inpainting](#ImageInpainting)\n- [图像去噪/Image Denoising](#ImageDenoising)\n- [图像编辑/Image Editing](#ImageEditing)\n- [图像拼接/Image stitching](#Imagestitching)\n- [图像匹配/Image Matching](#ImageMatching)\n- [图像融合/Image Blending](#ImageBlending)\n- [图像去雾/Image Dehazing](#ImageDehazing)\n- [图像去模糊/Image Deblur](#ImageDeblur)\n- [图像压缩/Image Compression](#ImageCompression)\n- [反光去除/Reflection Removal](#ReflectionRemoval)\n- [车道线检测/Lane Detection](#LaneDetection)\n- [自动驾驶 / Autonomous Driving](#AutonomousDriving)\n- [流体重建/Fluid Reconstruction](#FluidReconstruction)\n- [场景重建 / Scene Reconstruction](#SceneReconstruction)\n- [3D Reconstruction](#3DReconstruction)\n- [视频插帧/Frame Interpolation](#FrameInterpolation)\n- [视频超分 / Video Super-Resolution](#VideoSuper-Resolution)\n- [3D点云/3D point cloud](#3DPointCloud)\n- [标签噪声 / Label-Noise](#Label-Noise)\n- [对抗样本/Adversarial Examples](#AdversarialExamples)\n- [Anomaly 
Detection](#AnomalyDetection)\n- [其他/Other](#Other)\n\n\n\u003c/details\u003e\n\n\u003ca name=\"Backbone\"\u003e\u003c/a\u003e\n\n## Backbone\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"Dataset\"\u003e\u003c/a\u003e \n\n## 数据集/Dataset\n\n**HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02640\n- 代码/Code: None\n\n**Traffic Scene Parsing through the TSP6K Dataset**\n\n- 论文/Paper: https://arxiv.org/pdf/2303.02835.pdf\n- 代码/Code: https://github.com/PengtaoJiang/TSP6K\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"DiffusionModel\"\u003e\u003c/a\u003e \n\n# Diffusion Model\n\n**Balancing Act: Distribution-Guided Debiasing in Diffusion Models**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18206\n- 代码/Code: None\n\n**DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19481\n- 代码/Code: https://github.com/mit-han-lab/distrifuser\n\n**DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19302\n- 代码/Code: https://github.com/iit-pavis/diffassemble\n\n**Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00644\n- 代码/Code: None\n\n**Few-shot Learner Parameterization by Diffusion Time-steps**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02649\n- 代码/Code: https://github.com/yue-zhongqi/tif\n\n**MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04290\n- 代码/Code: None\n\n**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**\n\n- 论文/Paper: https://arxiv.org/abs/2403.06951\n- 代码/Code: https://github.com/Tianhao-Qi/DEADiff_code\n\n**Face2Diffusion for Fast and Editable Face Personalization**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05094\n- 代码/Code: 
https://github.com/mapooon/Face2Diffusion\n\n**MACE: Mass Concept Erasure in Diffusion Models**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06135\n- 代码/Code: https://github.com/Shilin-LU/MACE\n\n**It's All About Your Sketch: Democratising Sketch Control in Diffusion Models**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07234\n- 代码/Code: https://github.com/subhadeepkoley/demosketch2rgb\n\n**SemCity: Semantic Scene Generation with Triplane Diffusion**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07773\n- 代码/Code: https://github.com/zoomin-lee/semcity\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"T2I\"\u003e\u003c/a\u003e \n\n## Text-to-Image\n\n**RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00483\n- 代码/Code: None\n\n**NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03485\n- 代码/Code: https://github.com/univ-esuty/noisecollage\n\n**Discriminative Probing and Tuning for Text-to-Image Generation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04321\n- 代码/Code: None\n\n**Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05239\n- 代码/Code: None\n\n**Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06452\n- 代码/Code: https://github.com/mulns/Text2QR\n\n**Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07214\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"NAS\"\u003e\u003c/a\u003e \n\n## NAS\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"NeRF\"\u003e\u003c/a\u003e \n\n# NeRF\n\n**GSNeRF: 
Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03608\n- 代码/Code: None\n\n**DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06912\n- 代码/Code: https://github.com/fictionarry/dngaussian\n\n**S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06205\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"KnowledgeDistillation\"\u003e\u003c/a\u003e \n\n## Knowledge Distillation\n\n**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02781\n- 代码/Code: https://github.com/zhengli97/PromptKD\n\n**Logit Standardization in Knowledge Distillation**\n\n- 论文/Paper: https://arxiv.org/abs/2403.01427\n- 代码/Code: https://github.com/sunshangquan/logit-standardization-KD\n\n**RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05061\n- 代码/Code: None\n\n**$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06213\n- 代码/Code: https://github.com/roymiles/vkd\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"Multimodal\"\u003e\u003c/a\u003e \n\n## 多模态 / Multimodal\n\n**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**\n\n- 论文/Paper: https://arxiv.org/abs/2312.07472\n- 代码/Code: https://github.com/IranQin/MP5\n- 主页/Website：https://iranqin.github.io/MP5.github.io/\n\n**Polos: Multimodal Metric Learning from Human Feedback for Image Captioning**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18091\n- 代码/Code: None\n\n**MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02991\n- 代码/Code: None\n\n**Learning to Rematch 
Mismatched Pairs for Robust Cross-Modal Retrieval**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05105\n- 代码/Code: https://github.com/hhc1997/L2RM\n\n**MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07839\n- 代码/Code: None\n\n**Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07636\n- 代码/Code: https://github.com/hieuphan33/mavl\n\n**Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07241\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ContrastiveLearning\"\u003e\u003c/a\u003e \n\n## Contrastive Learning\n\n**Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06122\n- 代码/Code: https://github.com/root0yang/blindnet\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"CapsuleNetwork\"\u003e\u003c/a\u003e \n\n# 胶囊网络 / Capsule Network\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ImageClassification\"\u003e\u003c/a\u003e \n\n# 图像分类 / Image Classification\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ObjectDetection\"\u003e\u003c/a\u003e \n\n## 目标检测/Object Detection\n\n**UniMODE: Unified Monocular 3D Object Detection**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18573\n- 代码/Code: None\n\n**CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04198\n- 代码/Code: https://github.com/SerCharles/CN-RMA\n\n**Memory-based Adapters for Online 3D Scene Perception**\n\n- 论文/Paper: https://arxiv.org/abs/2403.06974\n- 代码/Code:https://github.com/xuxw98/Online3D\n\n **Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**\n\n- 论文/Paper: 
https://arxiv.org/abs/2403.16131\n\n- 代码/Code:https://github.com/xiuqhou/Salience-DETR\n\n**Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06093\n- 代码/Code: https://github.com/nullmax-vision/QAF2D\n\n**SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05817\n- 代码/Code: https://github.com/zhanggang001/hednet\n\n[返回目录/back](#Contents)\n\n\n\n\u003ca name=\"ObjectTracking\"\u003e\u003c/a\u003e \n\n# 目标跟踪/Object Tracking\n\n**DeconfuseTrack:Dealing with Confusion for Multi-Object Tracking**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02767\n- 代码/Code: None\n\n**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04700\n- 代码/Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT\n\n[返回目录/back](#Contents)\n\n# 3D Object Tracking\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"TrajectoryPrediction\"\u003e\u003c/a\u003e \n\n## 轨迹预测/Trajectory Prediction\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"Segmentation\"\u003e\u003c/a\u003e \n\n## 语义分割/Segmentation\n\n**PEM: Prototype-based Efficient MaskFormer for Image Segmentation**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19422\n- 代码/Code: https://github.com/niccolocavagnero/pem\n\n**Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06462\n- 代码/Code: https://github.com/Gavinwxy/DDFP\n\n**Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06247\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"WSSS\"\u003e\u003c/a\u003e\n\n## 弱监督语义分割/Weakly Supervised Semantic Segmentation\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"MedicalImageSegmentation\"\u003e\u003c/a\u003e\n\n# 医学图像/Medical Image\n\n**Modality-Agnostic 
Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18933\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"VideoObjectSegmentation\"\u003e\u003c/a\u003e\n\n# 视频目标分割/Video Object Segmentation\n\n**Depth-aware Test-Time Training for Zero-shot Video Object Segmentation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04258\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"InteractiveVideoObjectSegmentation\"\u003e\u003c/a\u003e\n\n# 交互式视频目标分割/Interactive Video Object Segmentation\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"VisualTransformer\"\u003e\u003c/a\u003e\n\n# Visual Transformer\n\n**Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05419\n- 代码/Code: https://github.com/techmn/satmae_pp\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"DepthEstimation\"\u003e\u003c/a\u003e\n\n## 深度估计/Depth Estimation\n\n**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**\n\n- 论文/Paper: https://arxiv.org/pdf/2403.07535.pdf\n- 代码/Code: https://github.com/Junda24/AFNet\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"Retrieval\"\u003e\u003c/a\u003e\n\n# 图像、视频检索 / Image Retrieval/Video retrieval\n\n**Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00272\n- 代码/Code: None\n\n**Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05105\n- 代码/Code: https://github.com/hhc1997/L2RM\n\n**How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07203\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"SuperResolution\"\u003e\u003c/a\u003e\n\n## 超分辨率/Super Resolution\n\n**SeD: Semantic-Aware Discriminator for Image Super-Resolution**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19387\n- 
代码/Code: None\n\n**Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19215\n- 代码/Code: https://github.com/mandalinadagi/wgsr\n\n**CAMixerSR: Only Details Need More \"Attention\"**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19289\n- 代码/Code: https://github.com/icandle/camixersr\n\n**Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02601\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ImageRestoration\"\u003e\u003c/a\u003e\n\n## 图像复原/Image Restoration\n\n**Boosting Image Restoration via Priors from Pre-trained Models**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06793\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ImageDenoising\"\u003e\u003c/a\u003e\n\n## 图像去噪/Image Denoising\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ImageEditing\"\u003e\u003c/a\u003e\n\n# 图像编辑/Image Editing\n\n**Doubly Abductive Counterfactual Inference for Text-based Image Editing**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02981\n- 代码/Code: https://github.com/xuesong39/DAC\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ImageCompression\"\u003e\u003c/a\u003e\n\n# 图像压缩/Image Compression\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"ImageDeblur\"\u003e\u003c/a\u003e\n\n## 图像去模糊/Image Deblur\n\n**A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02611\n- 代码/Code: https://github.com/PieceZhang/MPT-CataBlur\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"AutonomousDriving\"\u003e\u003c/a\u003e\n\n## 自动驾驶 / Autonomous Driving\n\n**Abductive Ego-View Accident Video Understanding for Safe Driving Perception**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00436\n- 代码/Code: None\n\n**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**\n\n- 论文/Paper: 
http://arxiv.org/pdf/2403.07535\n- 代码/Code: https://github.com/Junda24/AFNet/\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"FaceRecognition\"\u003e\u003c/a\u003e\n\n# 人脸识别/Face Recognition\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"FaceDetection\"\u003e\u003c/a\u003e\n\n# 人脸检测/Face Detection\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"FaceAnti-Spoofing\"\u003e\u003c/a\u003e\n\n# 人脸活体检测/Face Anti-Spoofing\n\n**Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19298\n- 代码/Code: https://github.com/omggggg/mmdg\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"FaceReconstruction\"\u003e\u003c/a\u003e\n\n## 人脸重建/Face Reconstruction\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"VideoActionDetection\"\u003e\u003c/a\u003e\n\n# 视频动作检测/Video Action Detection\n\n\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"SignLanguageTranslation\"\u003e\u003c/a\u003e\n\n# 手语翻译/Sign Language Translation\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"PersonRe-identification\"\u003e\u003c/a\u003e\n\n# 行人重识别/Person Re-identification\n\n\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"TalkingFace\"\u003e\u003c/a\u003e\n\n# Talking Face\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"HumanPoseEstimation\"\u003e\u003c/a\u003e\n\n# 姿态估计/Pose Estimation\n\n**FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03221\n- 代码/Code: None\n\n**Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04381\n- 代码/Code: https://github.com/MickeyLLG/S2DHand\n\n**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**\n\n- 论文/Paper: https://arxiv.org/pdf/2311.12028.pdf\n- 代码/Code: https://github.com/NationalGAILab/HoT\n\n[返回目录/back](#Contents)\n\n\n\n\u003ca name=\"GAN\"\u003e\u003c/a\u003e\n\n# GAN\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"AgeEstimation\"\u003e\u003c/a\u003e\n\n# 
人脸年龄估计/Age Estimation\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"FacialExpressionRecognition\"\u003e\u003c/a\u003e\n\n# 人脸表情识别/Facial Expression Recognition\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"HandPoseEstimation\"\u003e\u003c/a\u003e\n\n## 手势姿态估计（重建）/Hand Pose Estimation( Hand Mesh Recovery)\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"3DReconstruction\"\u003e\u003c/a\u003e\n\n## 3D Reconstruction\n\n**UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05086\n- 代码/Code: https://github.com/Youngju-Na/UFORecon\n\n**DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05005\n- 代码/Code: None\n\n**Memory-based Adapters for Online 3D Scene Perception**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06974\n- 代码/Code: None\n\n**Bayesian Diffusion Models for 3D Shape Reconstruction**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06973\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"FrameInterpolation\"\u003e\u003c/a\u003e\n\n## 视频插帧/Frame Interpolation\n\n\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"3DPointCloud\"\u003e\u003c/a\u003e\n\n## 3D点云/3D point cloud\n\n**Rethinking Few-shot 3D Point Cloud Semantic Segmentation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00592\n- 代码/Code: https://github.com/ZhaochongAn/COSeg\n\n**Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03532\n- 代码/Code: https://github.com/liuquan98/eyoc\n\n**Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05247\n- 代码/Code: https://github.com/TRLou/HiT-ADV\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"AnomalyDetection\"\u003e\u003c/a\u003e\n\n# Anomaly Detection\n\n**Toward Generalist Anomaly Detection via In-context Residual Learning 
with Few-shot Sample Prompts**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06495\n- 代码/Code: https://github.com/mala-lab/inctrl\n\n**RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05897\n- 代码/Code: https://github.com/cnulab/realnet\n\n[返回目录/back](#Contents)\n\n\u003ca name=\"Other\"\u003e\u003c/a\u003e\n\n## 其他/Other\n\n**DisCo: Disentangled Control for Realistic Human Dance Generation**\n\n- 论文/Paper: https://arxiv.org/abs/2307.00040\n- 代码/Code: https://github.com/Wangt-CN/DisCo\n\n**Gradient Reweighting: Towards Imbalanced Class-Incremental Learning**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18528\n- 代码/Code: None\n\n**TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18490\n- 代码/Code: None\n\n**Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18330\n- 代码/Code: https://github.com/tho-kn/egotap\n\n**Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18277\n- 代码/Code: None\n\n**Misalignment-Robust Frequency Distribution Loss for Image Transformation**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18192\n- 代码/Code: https://github.com/eezkni/FDL\n\n**3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18146\n- 代码/Code: https://github.com/jiangchaokang/3dsflabelling\n\n**OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18140\n- 代码/Code: None\n\n**UniVS: Unified and Universal Video Segmentation with Prompts as Queries**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18115\n- 代码/Code: https://github.com/minghanli/univs\n\n**Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18078\n- 代码/Code: 
https://github.com/YanzuoLu/CFLD\n\n**Boosting Neural Representations for Videos with a Conditional Decoder**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18152\n- 代码/Code: None\n\n**Classes Are Not Equal: An Empirical Study on Image Recognition Fairness**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18133\n- 代码/Code: None\n\n**QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.17951\n- 代码/Code: None\n\n**Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19479\n- 代码/Code: None\n\n**SeMoLi: What Moves Together Belongs Together**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19463\n- 代码/Code: None\n\n**Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19326\n- 代码/Code: None\n\n**CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19231\n- 代码/Code: https://github.com/lu-feng/cricavpr\n\n**MemoNav: Working Memory Model for Visual Navigation**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19161\n- 代码/Code: None\n\n**VideoMAC: Video Masked Autoencoders Meet ConvNets**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19082\n- 代码/Code: https://github.com/nust-machine-intelligence-laboratory/videomac\n\n**Theoretically Achieving Continuous Representation of Oriented Bounding Boxes**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18975\n- 代码/Code: https://github.com/Jittor/JDet\n\n**OHTA: One-shot Hand Avatar via Data-driven Implicit Priors**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18969\n- 代码/Code: None\n\n**WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18956\n- 代码/Code: None\n\n**Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation**\n\n- 论文/Paper: 
http://arxiv.org/pdf/2402.18920\n- 代码/Code: None\n\n**SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18848\n- 代码/Code: None\n\n**ViewFusion: Towards Multi-View Consistency via Interpolated Denoising**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18842\n- 代码/Code: None\n\n**OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18786\n- 代码/Code: None\n\n**NARUTO: Neural Active Reconstruction from Uncertain Target Observations**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18771\n- 代码/Code: None\n\n**Towards Generalizable Tumor Synthesis**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.19470\n- 代码/Code: None\n\n**Rethinking Multi-domain Generalization with A General Learning Objective**\n\n- 论文/Paper: http://arxiv.org/pdf/2402.18853\n- 代码/Code: None\n\n**Rethinking Inductive Biases for Surface Normal Estimation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00712\n- 代码/Code: https://github.com/baegwangbin/DSINE\n\n**SURE: SUrvey REcipes for building reliable and robust deep networks**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00543\n- 代码/Code: https://github.com/YutingLi0606/SURE\n\n**Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00486\n- 代码/Code: https://github.com/Windsrain/Selective-Stereo\n\n**Deformable One-shot Face Stylization via DINO Semantic Guidance**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00459\n- 代码/Code: https://github.com/zichongc/DoesFS\n\n**CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.00274\n- 代码/Code: None\n\n**NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03122\n- 代码/Code: None\n\n**Why Not Use Your Textbook? 
Knowledge-Enhanced Procedure Planning of Instructional Videos**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02782\n- 代码/Code: None\n\n**HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02769\n- 代码/Code: None\n\n**Learning Group Activity Features Through Person Attribute Prediction**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02753\n- 代码/Code: https://github.com/chihina/GAFL-CVPR2024\n\n**Interactive Continual Learning: Fast and Slow Thinking**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.02628\n- 代码/Code: None\n\n**Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03890\n- 代码/Code: None\n\n**DART: Implicit Doppler Tomography for Radar Novel View Synthesis**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03896\n- 代码/Code: None\n\n**MeaCap: Memory-Augmented Zero-shot Image Captioning**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03715\n- 代码/Code: https://github.com/joeyz0z/MeaCap\n\n**HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03561\n- 
代码/Code: None\n\n**Continual Segmentation with Disentangled Objectness Learning and Class Recognition**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03477\n- 代码/Code: https://github.com/jordangong/CoMasTRe\n\n**HDRFlow: Real-Time HDR Video Reconstruction with Large Motions**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03447\n- 代码/Code: None\n\n**LEAD: Learning Decomposition for Source-free Universal Domain Adaptation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03421\n- 代码/Code: https://github.com/ispc-lab/lead\n\n**F$^3$Loc: Fusion and Filtering for Floorplan Localization**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03370\n- 代码/Code: None\n\n**Enhancing Vision-Language Pre-training with Rich Supervisions**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03346\n- 代码/Code: None\n\n**Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04765\n- 代码/Code: None\n\n**Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04492\n- 代码/Code: https://github.com/rashindrie/dipa\n\n**Learning to Remove Wrinkled Transparent Film with Polarized Prior**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04368\n- 代码/Code: https://github.com/jqtangust/filmremoval\n\n**LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04303\n- 代码/Code: None\n\n**Active Generalized Category Discovery**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04272\n- 代码/Code: https://github.com/mashijie1028/activegcd\n\n**MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04149\n- 代码/Code: https://github.com/ispc-lab/map\n\n**A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.04245\n- 代码/Code: 
https://github.com/dalision/modalbiasavsr\n\n**Seamless Human Motion Composition with Blended Positional Encodings**\n\n- 论文/Paper: https://arxiv.org/abs/2402.15509\n- 代码/Code: https://github.com/BarqueroGerman/FlowMDM\n\n**DiffusionLight: Light Probes for Free by Painting a Chrome Ball**\n\n- 论文/Paper: https://arxiv.org/abs/2312.09168\n- 代码/Code: https://github.com/DiffusionLight/DiffusionLight\n\n**SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05087\n- 代码/Code: https://github.com/initialneil/SplattingAvatar\n\n**Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06946\n- 代码/Code: https://github.com/tl-uestc/unimos\n\n**Real-Time Simulated Avatar from Head-Mounted Sensors**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06862\n- 代码/Code: None\n\n**DiaLoc: An Iterative Approach to Embodied Dialog Localization**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06846\n- 代码/Code: None\n\n**FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06775\n- 代码/Code: https://github.com/modelscope/facechain\n\n**EarthLoc: Astronaut Photography Localization by Indexing Earth from Space**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06758\n- 代码/Code: https://github.com/gmberton/earthloc\n\n**CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06676\n- 代码/Code: https://github.com/snskysk/cam-back-again\n\n**Distributionally Generative Augmentation for Fair Facial Attribute Classification**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06606\n- 代码/Code: https://github.com/heqianpei/diga\n\n**Exploiting Style Latent Flows for Generalizing Deepfake Video Detection**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06592\n- 代码/Code: None\n\n**MoST: Motion Style Transformer 
between Diverse Action Contents**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06225\n- 代码/Code: https://github.com/Boeun-Kim/MoST\n\n**Coherent Temporal Synthesis for Incremental Action Segmentation**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06102\n- 代码/Code: None\n\n**Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06092\n- 代码/Code: None\n\n**LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.05854\n- 代码/Code: None\n\n**PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.06668\n- 代码/Code: None\n\n**SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.03170\n- 代码/Code: None\n\n**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**\n\n- 论文/Paper: https://arxiv.org/abs/2403.17749\n- 代码/Code: https://github.com/YuqiYang213/MLoRE\n\n**Beyond Text: Frozen Large Language Models in Visual Signal Comprehension**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07874\n- 代码/Code: https://github.com/zh460045050/v2l-tokenizer\n\n**Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07719\n- 代码/Code: https://github.com/wonderlandxd/wikg\n\n**Robust Synthetic-to-Real Transfer for Stereo Matching**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07705\n- 代码/Code: https://github.com/jiaw-z/dkt-stereo\n\n**CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07700\n- 代码/Code: https://github.com/shahaf-arica/cuvler\n\n**Masked AutoDecoder is Effective Multi-Task Vision Generalist**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07692\n- 代码/Code: https://github.com/hanqiu-hq/mad\n\n**PeLK: Parameter-efficient Large Kernel ConvNets 
with Peripheral Convolution**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07589\n- 代码/Code: None\n\n**Unleashing Network Potentials for Semantic Scene Completion**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07560\n- 代码/Code: https://github.com/fereenwong/ammnet\n\n**Open-World Semantic Segmentation Including Class Similarity**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07532\n- 代码/Code: https://github.com/PRBonn/ContMAV\n\n**ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07392\n- 代码/Code: https://github.com/Traffic-X/ViT-CoMer\n\n**FSC: Few-point Shape Completion**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07359\n- 代码/Code: None\n\n**Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07347\n- 代码/Code: https://github.com/jiafei127/fd4mm\n\n**A Bayesian Approach to OOD Robustness in Image Classification**\n\n- 论文/Paper: http://arxiv.org/pdf/2403.07277\n- 代码/Code: None\n\n[返回目录/back](#Contents)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDWCTOD%2FCVPR2024-Papers-with-Code-Demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDWCTOD%2FCVPR2024-Papers-with-Code-Demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDWCTOD%2FCVPR2024-Papers-with-Code-Demo/lists"}