https://github.com/amusi/eccv2024-papers-with-code

A collection of ECCV 2024 papers and open-source projects. Everyone is welcome to submit issues to share ECCV 2024 papers and open-source projects.
Host: GitHub
URL: https://github.com/amusi/eccv2024-papers-with-code
Owner: amusi
Created: 2020-07-03T01:14:38.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2024-08-07T06:58:45.000Z (about 1 year ago)
Last Synced: 2025-02-01T01:51:18.104Z (8 months ago)
Topics: computer-vision, deep-learning, eccv, eccv-2020, eccv2020, eccv2022, eccv2024, image-classification, image-segmentation, neural-network, object-detection
Homepage: https://mp.weixin.qq.com/s/NRjCfZxJF2Z0Ugbhj-8G4g
Size: 293 KB
Stars: 2,098
Watchers: 38
Forks: 271
Open Issues: 10
Metadata Files:
- Readme: README.md

README

# ECCV 2024 Papers and Open-Source Projects (Papers with Code)

ECCV 2024 decisions are now available!

> Note 1: Everyone is welcome to submit issues to share ECCV 2024 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [CVPR 2024](https://github.com/amusi/CVPR2024-Papers-with-Code)
> - [ECCV 2022](ECCV2022-Papers-with-Code.md)
> - [ECCV 2020](ECCV2020-Papers-with-Code.md)

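To follow ECCV 2024 and the latest, most complete coverage of top-conference work, scan the QR code to join the CVer Academic Exchange Group (CVer学术交流群), the largest computer vision AI community on Knowledge Planet (知识星球). It is updated daily and is among the first to share the latest learning materials on computer vision, deep learning, autonomous driving, medical imaging, AIGC, and related areas.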

![](CVer学术交流群.png)

# ECCV 2024 Open-Source Paper Index

- [3DGS (Gaussian Splatting)](#3DGS)
- [Mamba / SSM](#Mamba)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail Distribution](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Recognition](#Action-Recognition)
- [Action Detection](#Action-Detection)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Scene Graph Generation](#SGG)
- [Counting](#Counting)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)



# 3DGS (Gaussian Splatting)

**MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images**

- Project: https://donydchen.github.io/mvsplat

- Paper: https://arxiv.org/abs/2403.14627

- Code: https://github.com/donydchen/mvsplat

**CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians**

- Paper: https://arxiv.org/abs/2404.01133

- Code: https://github.com/DekuLiuTesla/CityGaussian

**FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting**

- Project: https://zehaozhu.github.io/FSGS/

- Paper: https://arxiv.org/abs/2312.00451

- Code: https://github.com/VITA-Group/FSGS



# Mamba / SSM

**VideoMamba: State Space Model for Efficient Video Understanding**

- Paper: https://arxiv.org/abs/2403.06977

- Code: https://github.com/OpenGVLab/VideoMamba

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802

- Code: https://taohu.me/zigma/



# Avatars



# Backbone



# CLIP



# MAE



# Embodied AI



# GAN



# OCR

**Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors**

- Paper: https://arxiv.org/pdf/2312.05286

- Code: https://github.com/SJTU-DeepVisionLab/FreeReal 

**PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer**

- Paper: https://arxiv.org/abs/2407.07764

- Code: https://github.com/SJTU-DeepVisionLab/PosFormer



# Occupancy

**Fully Sparse 3D Occupancy Prediction**

- Paper: https://arxiv.org/abs/2312.17118

- Code: https://github.com/MCG-NJU/SparseOcc



# NeRF

**NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields**

- Project: https://nerf-mae.github.io/

- Paper: https://arxiv.org/pdf/2404.01300

- Code: https://github.com/zubair-irshad/NeRF-MAE 



# DETR



# Prompt



# Multimodal Large Language Models (MLLM)

**SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant**

- Paper: https://arxiv.org/abs/2403.11299

- Code: https://github.com/heliossun/SQ-LLaVA

**ControlCap: Controllable Region-level Captioning**

- Paper: https://arxiv.org/abs/2401.17910

- Code: https://github.com/callsys/ControlCap 



# Large Language Models (LLM)



# NAS



# ReID (Re-Identification)



# Diffusion Models

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802

- Code: https://taohu.me/zigma/

**Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation**

- Paper: https://arxiv.org/abs/2403.16394

- Code: https://github.com/zdxdsw/skewed_relations_T2I

**The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization**

- Project: https://ut-mao.github.io/noise.github.io/

- Paper: https://arxiv.org/abs/2312.08872

- Code: https://github.com/UT-Mao/Initial-Noise-Construction



# Vision Transformer

**GiT: Towards Generalist Vision Transformer through Universal Language Interface**

- Paper: https://arxiv.org/abs/2403.09394

- Code: https://github.com/Haiyang-W/GiT



# Vision-Language

**GalLoP: Learning Global and Local Prompts for Vision-Language Models**

- Paper: https://arxiv.org/abs/2407.01400



# Object Detection

**Relation DETR: Exploring Explicit Position Relation Prior for Object Detection**

- Paper: https://arxiv.org/abs/2407.11699v1

- Code: https://github.com/xiuqhou/Relation-DETR

- Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k 

**Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector**

- Project: http://yuqianfu.com/CDFSOD-benchmark/

- Paper: https://arxiv.org/pdf/2402.03094

- Code: https://github.com/lovelyqian/CDFSOD-benchmark 



# Anomaly Detection



# Object Tracking



# Semantic Segmentation

**Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation**

- Paper: https://arxiv.org/abs/2405.06228

- Code: https://github.com/nizhenliang/CGRSeg



# Medical Image

**Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging**

- Paper: https://arxiv.org/abs/2311.16914

- Code: https://github.com/peirong26/Brain-ID 

**FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification**

- Project: https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k

- Paper: https://arxiv.org/abs/2407.08813

- Dataset: https://drive.google.com/drive/u/1/folders/1huH93JVeXMj9rK6p1OZRub868vv0UK0O

- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain



# Medical Image Segmentation

**ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image**

- Project: https://scribbleprompt.csail.mit.edu/

- Paper: https://arxiv.org/abs/2312.07381

- Code: https://github.com/halleewong/ScribblePrompt

**AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking**

- Paper: https://arxiv.org/abs/2407.06468

- Code: https://github.com/ricklisz/AnatoMask

**Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures**

- Paper: https://arxiv.org/abs/2407.14754

- Code: https://github.com/cbmi-group/FFM-Multi-Decoder-Network 



# Video Object Segmentation

**DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries**

- Project: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/

- Paper: https://arxiv.org/abs/2404.00086

- Code: https://github.com/zhang-tao-whu/DVIS_Plus 



# Autonomous Driving

**Fully Sparse 3D Occupancy Prediction**

- Paper: https://arxiv.org/abs/2312.17118

- Code: https://github.com/MCG-NJU/SparseOcc

**milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing**

- Paper: https://arxiv.org/abs/2306.17010

- Code: https://github.com/Toytiny/milliFlow/

**4D Contrastive Superflows are Dense 3D Representation Learners**

- Paper: https://arxiv.org/abs/2407.06190

- Code: https://github.com/Xiangxu-0103/SuperFlow 



# 3D Point Cloud



# 3D Object Detection

**3D Small Object Detection with Dynamic Spatial Pruning**

- Project: https://xuxw98.github.io/DSPDet3D/

- Paper: https://arxiv.org/abs/2305.03716

- Code: https://github.com/xuxw98/DSPDet3D

**Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection**

- Paper: https://arxiv.org/abs/2402.03634

- Code: https://github.com/LiewFeng/RayDN 



# 3D Semantic Segmentation



# Image Editing



# Image Completion / Image Inpainting

**BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion**

- Project: https://tencentarc.github.io/BrushNet/

- Paper: https://arxiv.org/abs/2403.06976

- Code: https://github.com/TencentARC/BrushNet



# Video Editing



# Low-level Vision

**Restoring Images in Adverse Weather Conditions via Histogram Transformer**

- Paper: https://arxiv.org/abs/2407.10172

- Code: https://github.com/sunshangquan/Histoformer

**OneRestore: A Universal Restoration Framework for Composite Degradation**

- Project: https://gy65896.github.io/projects/ECCV2024_OneRestore

- Paper: https://arxiv.org/abs/2407.04621

- Code: https://github.com/gy65896/OneRestore 

# Super-Resolution



# Denoising

## Image Denoising



# 3D Human Pose Estimation



# Image Generation

**Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models**

- Paper: https://arxiv.org/abs/2404.07389

- Code: https://github.com/YasminZhang/EBAMA

**Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization**

- Project: https://kaminyou.com/Dense-Normalization/

- Paper: https://arxiv.org/abs/2407.04245

- Code: https://github.com/Kaminyou/Dense-Normalization 

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802

- Code: https://taohu.me/zigma/

**Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation**

- Paper: https://arxiv.org/abs/2403.16394

- Code: https://github.com/zdxdsw/skewed_relations_T2I 



# Video Generation

**VideoStudio: Generating Consistent-Content and Multi-Scene Videos**

- Project: https://vidstudio.github.io/

- Code: https://github.com/FuchenUSTC/VideoStudio 



# 3D Generation



# Video Understanding

**VideoMamba: State Space Model for Efficient Video Understanding**

- Paper: https://arxiv.org/abs/2403.06977

- Code: https://github.com/OpenGVLab/VideoMamba

**C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition**

- Paper: https://arxiv.org/abs/2407.06113

- Code: https://github.com/RongchangLi/ZSCAR_C2C



# Action Recognition

**SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders**

- Paper: https://arxiv.org/abs/2407.13460

- Code: https://github.com/pha123661/SA-DVAE 



# Knowledge Distillation



# Image Compression

**Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation**

- Code: https://github.com/qingshi9974/ECCV2024-AdpatICMH

- Paper: http://arxiv.org/abs/2407.09853 



# Stereo Matching



# Scene Graph Generation



# Counting

**Zero-shot Object Counting with Good Exemplars**

- Paper: https://arxiv.org/abs/2407.04948

- Code: https://github.com/HopooLinZ/VA-Count 



# Video Quality Assessment



# Datasets

# Others

**Multi-branch Collaborative Learning Network for 3D Visual Grounding**

- Paper: https://arxiv.org/abs/2407.05363v2

- Code: https://github.com/qzp2018/MCLN 

**PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers**

- Code: https://github.com/ananthu-aniraj/pdiscoformer

- Paper: https://arxiv.org/abs/2407.04538

**SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments**

- Project: https://fraunhoferhhi.github.io/spvloc/ 

- Paper: https://arxiv.org/abs/2404.10527

- Code: https://github.com/fraunhoferhhi/spvloc

**REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices**

- Project: https://xdimlab.github.io/REFRAME/

- Paper: https://arxiv.org/abs/2403.16481

- Code: https://github.com/MARVELOUSJI/REFRAME