https://github.com/52cv/eccv-2024-papers

Host: GitHub
URL: https://github.com/52cv/eccv-2024-papers
Owner: 52CV
Created: 2024-03-19T08:15:40.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-12-09T02:33:30.000Z (over 1 year ago)
Last Synced: 2025-06-30T11:02:01.858Z (about 1 year ago)
Size: 578 KB
Stars: 101
Watchers: 4
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# ECCV-2024-Papers

![Alt text](%E5%BE%AE%E4%BF%A1%E5%9B%BE%E7%89%87_20240319161853.png)

## Official website: https://eccv.ecva.net/

### Main conference :bell:: September 29 (Sunday) to October 4

## Categorized survey papers from past years: ↘️[CV-Surveys](https://github.com/52CV/CV-Surveys) (under construction)

## Categorized 2025 papers

↘️[WACV-2025-Papers](https://github.com/52CV/WACV-2025-Papers)

↘️[CVPR-2025-Papers](https://github.com/52CV/CVPR-2025-Papers)

## Categorized 2024 papers

↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)

↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)

↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)

## [Categorized 2022 papers](#0000)

## [Categorized 2022 papers](#000)

## [Categorized 2021 papers](#00)

## [Categorized 2020 papers](#0)

## 💥💥💥All papers have been categorized


:thumbsup:[ECCV 2024 awards announced: Columbia University takes the Best Paper Award](https://mp.weixin.qq.com/s/2uFlMQUW1TVrNOIC01U8Pg)

## 🏆Best Paper Award

* [Minimalist Vision with Freeform Pixels](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08113.pdf)
:house:[project](https://cave.cs.columbia.edu/projects/categories/project?cid=Computational+Imaging&pid=Minimalist+Vision+with+Freeform+Pixels)

## 🏅Best Paper Honorable Mention

* [Rasterized Edge Gradients: Handling Discontinuities Differentiably](https://arxiv.org/abs/2405.02508)

* [Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models](https://arxiv.org/abs/2404.13706)
:house:[project](https://cs-people.bu.edu/vpetsiuk/arc/)

## Table of Contents

|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|[1.Other](#1)|[2.3D Visual](#2)|[3.Face](#3)|[4.Pose](#4)|
|[5.OCR](#5)|[6.Object Tracking](#6)|[7.Object Detection](#7)|[8.Super-Resolution](#8)|
|[9.Image/Video Processing](#9)|[10.Image Classification](#10)|[11.Image Segmentation](#11)|[12.Image Retrieval](#12)|
|[13.Image/Video Compression](#13)|[14.Image/Video Captioning](#14)|[15.GAN/Image Synthesis](#15)|[16.Medical Image Processing](#16)|
|[17.Video](#17)|[18.Automated Driving](#18)|[19.UAV/Remote Sensing/Satellite Image](#19)|[20.Scene](#20)|
|[21.Vision-Language](#21)|[22.Few/Zero-Shot Learning/Domain Generalization/Domain Adaptation](#22)|[23.Machine Learning](#23)|[24.Vision Transformer](#24)|
|[25.Model Compression/Knowledge Distillation/Pruning](#25)|[26.NAS](#26)|[27.GNN/GCN](#27)|[28.Novel Class Discovery](#28)|
|[29.Semi/Self-Supervised Learning](#29)|[30.Anomaly Detection](#30)|[31.Point Clouds](#31)|[32.Person Re-Identification](#32)|
|[33.Motion Generation](#33)|[34.Visual Question Answering](#34)|[35.Action Detection](#35)|[36.Gaze Estimation](#36)|
|[37.Style Transfer](#37)|[38.Human-Object Interaction](#38)|[39.Robots](#39)|[40.Object Pose Estimation](#40)|
|[41.Biometrics](#41)|[42.Optical Flow Estimation](#42)|[43.Sound](#43)|[44.Dataset/Benchmark](#44)|
|[45.Neural Radiance Fields](#45)|[46.Rendering](#46)|[47.Animal](#47)|[48.Computer Graphics](#48)|
|[49.Light-Field](#49)|[50.Sketches](#50)|[51.Feature Matching](#51)|[52.Visual Entity Recognition](#52)|
|[53.Keypoint Detection](#53)|[54.Deepfake Detection](#54)|[55.Information Security](#55)|[56.Dense Prediction](#56)|
|[57.Visual Relationship Detection](#57)|[58.All-in-One](#58)| | |



## 58.All-in-One

* [X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06140.pdf)
:star:[code](https://github.com/salesforce/LAVIS/tree/main/projects/xinstructbl)



## 57.Visual Relationship Detection

* [Visual Relationship Transformation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08217.pdf)

* [Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection](https://arxiv.org/abs/2403.14270)



## 56.Dense Prediction

* [Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild](https://arxiv.org/abs/2404.18459)
:star:[code](https://github.com/GitGyun/chameleon) dense visual prediction

* [Unsupervised Dense Prediction using Differentiable Normalized Cuts](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05675.pdf)

* [Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05837.pdf)

* [Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09133.pdf)
:star:[code](https://github.com/MilknoCandy/Token-Adapter)



## 55.Information Security

* Copyright protection

  * [Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03084.pdf)
:star:[code](https://github.com/jjh6297/UndercoverBias) dataset copyright protection

* Image watermarking

  * [Certifiably Robust Image Watermark](http://arxiv.org/abs/2407.04086v1)
:star:[code](https://github.com/zhengyuan-jiang/Watermark-Library)

  * [A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05695.pdf) image watermarking

  * [Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data](https://arxiv.org/abs/2403.10663)
:star:[code](https://github.com/liyuxuan-github/MAT)

  * [A Watermark-Conditioned Diffusion Model for IP Protection](https://arxiv.org/abs/2403.10893)
:star:[code](https://github.com/rmin2000/WaDiff)

  * [A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08419.pdf)

  * [LaWa: Using Latent Space for In-Generation Image Watermarking](https://arxiv.org/abs/2408.05868)



## 54.Deepfake Detection

* [Real Appearance Modeling for More General Deepfake Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06913.pdf)

* [Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities](http://arxiv.org/abs/2407.20337v1)
:star:[code](https://github.com/aimagelab/CoDE)

* [Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection](http://arxiv.org/abs/2409.14444v1)

* [Common Sense Reasoning for Deep Fake Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12295.pdf)
:star:[code](https://github.com/Reality-Defender/Research-DD-VQA)

* Image forgery detection and localization

  * [Noise-assisted Prompt Learning for Image Forgery Detection and Localization](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01688.pdf)

  * [AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06023.pdf)
:star:[code](https://github.com/LMIAPC/AdaIFL)

* Document image tampering detection

  * [Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04834.pdf)
:thumbsup:[A document image tampering detection (DITD) method: the Feature Fusion and Decomposition Network (FFDN)](https://std.xmu.edu.cn/2024/0710/c4739a488273/page.htm)

* Synthetic image detection

  * [Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection](https://arxiv.org/abs/2402.19091)
:star:[code](https://github.com/mever-team/rine)



## 53.Keypoint Detection

* [OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection](https://arxiv.org/abs/2409.19899)
:star:[code](https://github.com/AlanLuSun/OpenKD)

* [KeypointDETR: An End-to-End 3D Keypoint Detector](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09481.pdf)
:star:[code](https://github.com/bibi547/KeypointDETR)



## 52.Visual Entity Recognition

* [Grounding Language Models for Visual Entity Recognition](https://arxiv.org/abs/2402.18695) visual entity recognition



## 51.Feature Matching 

* [Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching](http://arxiv.org/abs/2407.07789v1)

* Image matching

  * [CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching](https://arxiv.org/abs/2404.16972)
:star:[code](https://github.com/Samia067/CriSp)



## 50.Sketches

* [Do Generalised Classifiers really work on Human Drawn Sketches?](http://arxiv.org/abs/2407.03893v1)



## 49.Light-Field

* [Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation](http://arxiv.org/abs/2407.08149v1)

* Camera relocalization

  * [Differentiable Product Quantization for Memory Efficient Camera Relocalization](http://arxiv.org/abs/2407.15540v1)



## 48.Computer Graphics

* High dynamic range imaging

  * [SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging](http://arxiv.org/abs/2407.16308v1)
:star:[code](https://github.com/ltkong218/SAFNet)



## 47.Animal

* [Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos](https://arxiv.org/abs/2403.17103)
:house:[project](https://remysabathier.github.io/animalavatar.github.io)

* [Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos](https://arxiv.org/abs/2312.13604)
:house:[project](https://keqiangsun.github.io/projects/ponymation) 3D animal motion

* [Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification](https://arxiv.org/abs/2410.06977)
:star:[code](https://github.com/JigglypuffStitch/AdaFreq.git)



## 46.Rendering

* [City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web](https://arxiv.org/abs/2312.16457)
:star:[code](https://github.com/USTC3DV/MERFStudio)
:house:[project](https://ustc3dv.github.io/City-on-Web/)

* [A Probability-guided Sampler for Neural Implicit Surface Rendering](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05407.pdf)
:house:[project](https://merl.com/research/highlights/ps-neus) rendering

* [TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering](https://arxiv.org/abs/2311.16465)
:house:[project](https://aka.ms/textdiffuser-2)

* [AnyLens: A Generative Diffusion Model with Any Rendering Lens](https://arxiv.org/abs/2311.17609)
:house:[project](https://anylens-diffusion.github.io/)

* [CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians](http://arxiv.org/abs/2404.01133)
:star:[code](https://github.com/DekuLiuTesla/CityGaussian)
:house:[project](https://dekuliutesla.github.io/citygs/)

* [METACAP: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering](https://arxiv.org/pdf/2403.18820.pdf)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/MetaCap/)

* [GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views](http://arxiv.org/abs/2407.08221v1)

* [MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References](http://arxiv.org/abs/2407.13745v1)
:star:[code](https://boelukas.github.io/mariner/)

* [Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors](http://arxiv.org/abs/2407.16396v1)
:star:[code](https://wen-yuan-zhang.github.io/VolumeRenderingPriors/)

* [CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering](https://arxiv.org/abs/2311.15510)
:house:[project](https://haidongz-usc.github.io/project/caesarnerf)

* [IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination](https://arxiv.org/abs/2404.11593)
:star:[code](https://github.com/zju3dv/IntrinsicAnything) rendering

* [Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering](https://arxiv.org/abs/2408.09702)
:house:[project](https://research.nvidia.com/labs/toronto-ai/DiPIR/)

* [VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03032.pdf) neural rendering

* [UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation](http://arxiv.org/abs/2407.19542v1)
:star:[code](https://github.com/freemantom/UniVoxel)

* [GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering](https://arxiv.org/abs/2403.11324)
:star:[code](https://github.com/yanyan-li/GeoGaussian) scene rendering

* [GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer](https://arxiv.org/abs/2410.00672)
:star:[code](https://github.com/yh-yoon/GMT)

* [Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering](https://arxiv.org/abs/2407.10389)



## 45.Neural Radiance Fields

* [Invertible Neural Warp for NeRF](http://arxiv.org/abs/2407.12354v1)
:star:[code](https://sfchng.github.io/ineurowarping-github.io/)

* [VF-NeRF: Viewshed Fields for Rigid NeRF Registration](https://arxiv.org/abs/2404.03349)

* [NeRF-XL: NeRF at Any Scale with Multi-GPU](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06424.pdf)
:house:[project](https://research.nvidia.com/labs/toronto-ai/nerfxl/)

* [Regularizing Dynamic Radiance Fields with Kinematic Fields](http://arxiv.org/abs/2407.14059v1)

* [KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter](http://arxiv.org/abs/2407.13185v1)
:star:[code](https://github.com/Yifever20002/KFD-NeRF)

* [Dynamic Neural Radiance Field From Defocused Monocular Video](http://arxiv.org/abs/2407.05586v1)

* [Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering](https://arxiv.org/abs/2409.05867)
:house:[project](https://benattal.github.io/flash-cache/)

* [Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model](http://arxiv.org/abs/2407.07735v1)
:house:[project](https://qsong2001.github.io/NeRFProtector)

* [GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01453.pdf)
:star:[code](https://github.com/kevinhuangxf/GeometrySticker)
:house:[project](https://kevinhuangxf.github.io/GeometrySticker/)

* [Efficient NeRF Optimization - Not All Samples Remain Equally Hard](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05300.pdf)

* [MeshFeat: Multi-Resolution Features for Neural Fields on Meshes](http://arxiv.org/abs/2407.13592v1)
:house:[project](https://maharajamihir.github.io/MeshFeat/)

* [DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images](https://arxiv.org/abs/2403.13199)
:house:[project](https://zaidtas.github.io/decentnerfs/index.html)

* [TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks](http://arxiv.org/abs/2408.10739v1)
:star:[code](https://tracknerf.github.io/)

* [BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream](http://arxiv.org/abs/2407.02174v1)
:star:[code](https://github.com/WU-CVGL/BeNeRF)

* [TriNeRFLet: A Wavelet Based Multiscale Triplane NeRF Representation](https://arxiv.org/abs/2401.06191)
:house:[project](https://rajaeekh.github.io/trinerflet-web)

* [RS-NeRF: Neural Radiance Fields from Rolling Shutter Images](http://arxiv.org/abs/2407.10267v1)
:star:[code](https://github.com/MyNiuuu/RS-NeRF)

* [Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling](http://arxiv.org/abs/2407.11962v1)
:star:[code](https://github.com/stevejaehyeok/MoCo-NeRF)
:house:[project](https://stevejaehyeok.github.io/publications/moco-nerf)

* [RaFE: Generative Radiance Fields Restoration](https://arxiv.org/abs/2404.03654)
:house:[project](https://zkaiwu.github.io/RaFE-Project/)

* [Few-shot NeRF by Adaptive Rendering Loss Regularization](https://arxiv.org/abs/2410.17839)
:star:[code](https://github.com/GhiXu/AR-NeRF)

* [Depth-guided NeRF Training via Earth Mover’s Distance](https://arxiv.org/abs/2403.13206)

* [DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields](https://arxiv.org/abs/2311.12063)
:star:[code](https://ychgoaround.github.io/projects/DatasetNeRF/)

* [Flowed Time of Flight Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07941.pdf)

* [Volumetric Rendering with Baked Quadrature Fields](https://arxiv.org/abs/2312.02202)

* [Taming Latent Diffusion Model for Neural Radiance Field Inpainting](https://arxiv.org/abs/2404.09995)
:house:[project](https://hubert0527.github.io/MALD-NeRF)

* [Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation](https://arxiv.org/abs/2403.19319)
:house:[project](https://terencecyj.github.io/projects/Mesh2NeRF/)
🤗[huggingface](https://huggingface.co/papers/2403.19319)

* [SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields](https://www.arxiv.org/abs/2408.06697)
:house:[project](https://slotlifter.github.io/)

* [FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02130.pdf)
:star:[code](https://github.com/JiangWenPL/FisherRF)

* [DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07243.pdf)NeRF

* [Single-Mask Inpainting for Voxel-based Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07404.pdf)

* [Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization](https://arxiv.org/abs/2410.19483)
:star:[code](https://github.com/WeihangLiu2024/Content_Aware_NeRF)

* [Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering](https://arxiv.org/abs/2403.14554)
:house:[project](https://anttwo.github.io/frosting/)

* [Physically Plausible Color Correction for Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06042.pdf)

* [Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions](https://arxiv.org/abs/2403.14053)NeRF

* [PointNeRF++: A multi-scale, point-based Neural Radiance Field](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05521.pdf)
:house:[project](https://pointnerfpp.github.io/)

* [Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields](https://arxiv.org/abs/2403.11131)

* [High-Fidelity and Transferable NeRF Editing by Frequency Decomposition](https://arxiv.org/abs/2404.02514)
:house:[project](https://aigc3d.github.io/freditor)

* [Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction](https://arxiv.org/abs/2305.15171)
:house:[project](https://xinhangliu.com/deceptive-nerf-3dgs)

* [G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03259.pdf)

* [NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields](https://arxiv.org/abs/2404.01300)
:house:[project](https://nerf-mae.github.io/)

* Novel view synthesis

  * [Fast View Synthesis of Casual Videos](https://arxiv.org/abs/2312.02135)
:house:[project](https://casual-fvs.github.io/)

  * [PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis](https://arxiv.org/abs/2402.17986)
:house:[project](https://yorkucvil.github.io/PolyOculus-NVS/)

  * [RING-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields](https://arxiv.org/abs/2312.03357)

  * [Structured-NeRF: Hierarchical Scene Graph with Neural Representation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05154.pdf)

  * [URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields](https://arxiv.org/abs/2403.10119)

  * [A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis](https://arxiv.org/abs/2311.12897)
:star:[code](https://github.com/raven38/EfficientDynamic3DGaussian/)
:house:[project](https://compactdynamic3dgaussian.github.io/)

  * [High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00368.pdf)
:star:[code](https://github.com/XrKang/DL-GS)

  * [Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07158.pdf)
:star:[code](https://github.com/Yukun66/MemE)

  * [NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image](https://arxiv.org/abs/2312.07315)
:star:[code](https://github.com/kakaobrain/nvs-adapter)

  * [FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting](https://arxiv.org/abs/2312.00451)
:star:[code](https://github.com/VITA-Group/FSGS)

  * [CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians](https://arxiv.org/abs/2403.19495)
:house:[project](https://people.engr.tamu.edu/nimak/Papers/CoherentGS)

  * [MegaScenes: Scene-Level View Synthesis at Scale](https://arxiv.org/abs/2406.11819)
:star:[code](https://github.com/MegaScenes/nvs)

  * [Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis](https://arxiv.org/abs/2403.04116)
:star:[code](https://github.com/caiyuanhao1998/X-Gaussian) view synthesis

  * [NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis](http://arxiv.org/abs/2407.10482v1)

  * [Efficient Depth-Guided Urban View Synthesis](http://arxiv.org/abs/2407.12395v1)
:star:[code](https://xdimlab.github.io/EDUS/)

  * [Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis](https://arxiv.org/abs/2405.14868)
:star:[code](https://github.com/basilevh/gcd)

  * [Generalizable Human Gaussians for Sparse View Synthesis](https://arxiv.org/abs/2407.12777)
:house:[project](https://humansensinglab.github.io/Generalizable-Human-Gaussians/)

  * [Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis](http://arxiv.org/abs/2409.08042v1)
:star:[code](https://github.com/mzzcdf/Thermal3DGS)



## 44.Dataset/Benchmark

* [FYI: Flip Your Images for Dataset Distillation](http://arxiv.org/abs/2407.08113v1)

* [Neural Spectral Decomposition for Dataset Distillation](http://arxiv.org/abs/2408.16236v1)
:star:[code](https://github.com/slyang2021/NSD)

* [Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching](https://arxiv.org/abs/2410.07579)
:star:[code](https://github.com/Lexie-YU/Teddy)

* [Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation](https://arxiv.org/abs/2305.18381)
:star:[code](https://github.com/silicx/GoldFromOres-BiLP)

* [COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark](https://arxiv.org/abs/2408.02272)

* Benchmarks

  * [MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models](https://arxiv.org/abs/2311.17600)
:star:[code](https://github.com/isXinLiu/MM-SafetyBench)

  * [DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition](http://arxiv.org/abs/2407.05106v1)
:star:[code](https://github.com/QiWang233/DailyDVS-200)

  * [Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter](http://arxiv.org/abs/2407.08109v1)
:star:[code](https://github.com/zhang-chenxu/LSM-Adapter)

  * [MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes](http://arxiv.org/abs/2407.10121v1)

  * [BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events](https://arxiv.org/abs/2410.20451)
:house:[project](https://www.blinkvision.net/)

  * [SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09762.pdf)
:star:[code](https://github.com/aidecentralized/InferenceBenchmark)

  * [A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11606.pdf)
:star:[code](https://github.com/UnicomAI/UnicomBenchmark/tree/main/CDDMBench)

  * [BAFFLE: A Baseline of Backpropagation-Free Federated Learning](https://arxiv.org/abs/2301.12195)
:star:[code](https://github.com/FengHZ/BAFFLE)

  * [Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking](https://arxiv.org/abs/2406.04316)

  * [Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations](https://arxiv.org/abs/2308.16349)
:house:[project](https://affective-visual-dialog.github.io/)

  * [UniIR: Training and Benchmarking Universal Multimodal Information Retrievers](https://arxiv.org/abs/2311.17136)
:house:[project](https://tiger-ai-lab.github.io/UniIR/)

  * [HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis](http://arxiv.org/abs/2407.16269v1)

  * [OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding](https://arxiv.org/abs/2406.07471)
:house:[project](https://minghu0830.github.io/OphNet-benchmark/)

  * [PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines](https://arxiv.org/abs/2407.08418)
:star:[code](https://github.com/OpenEarthLab/PredBench)

  * [Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach](https://arxiv.org/abs/2408.07500)
:star:[code](https://github.com/FHR-L/VSLA-CLIP)

  * [R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations](https://arxiv.org/abs/2403.04924)
:star:[code](https://github.com/lxa9867/r2bench)

  * [m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01486.pdf)
:star:[code](https://github.com/RAIVNLab/mms)
🤗[huggingface](https://huggingface.co/datasets/zixianma/mms)

  * [PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology](https://arxiv.org/abs/2401.16355)
🤗[huggingface](https://huggingface.co/papers/2401.16355)

  * [LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow](http://arxiv.org/abs/2409.05688v1)
:house:[project](https://layeredflow.cs.princeton.edu)

  * [HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects](http://arxiv.org/abs/2407.12371v1)
:star:[code](https://lvxintao.github.io/himo)

  * [When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset](http://arxiv.org/abs/2407.10125v1)
:star:[code](https://github.com/BubblyYi/MMPedestron)

* Datasets

  * [VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models](https://arxiv.org/abs/2311.17404)
:star:[code](https://github.com/lscpku/VITATECS)

  * [HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning](http://arxiv.org/abs/2407.15680v1)
:star:[code](https://github.com/google/haloquest)

  * [OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web](https://arxiv.org/abs/2402.17553)

  * [COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08183.pdf)
:sunflower:[dataset](https://github.com/omron-sinicx/com_kitchens)

  * [Seeing Faces in Things: A Model and Dataset for Pareidolia](https://arxiv.org/abs/2409.16143)
:sunflower:[dataset](https://aka.ms/faces-in-things)

  * [Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08206.pdf)
:sunflower:[dataset](https://github.com/dualtransparency/TCLD)

  * [GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns](https://arxiv.org/abs/2405.17609)
:house:[project](https://igl.ethz.ch/projects/GarmentCodeData/)

  * [SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03555.pdf)
:sunflower:[dataset](https://github.com/sutdcv/SemTrack)

  * [WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing](https://arxiv.org/abs/2402.09430)
:star:[code](https://github.com/huangshk/WiMANS)

  * [BugNIST - a Large Volumetric Dataset for Detection under Domain Shift](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04613.pdf)

  * [Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics](https://arxiv.org/abs/2310.17316)
:star:[code](https://github.com/EnVision-Research/Defect_Spectrum)
:house:[project](https://envision-research.github.io/Defect_Spectrum/) large-scale defect dataset

  * [Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal](http://arxiv.org/abs/2407.16957v1)
:star:[code](https://github.com/jinyeying/RaindropClarity)

  * [PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition](http://arxiv.org/abs/2407.10918v1)
:star:[code](https://github.com/LixiaoTHU/PartImageNetPP)

  * [WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding](http://arxiv.org/abs/2407.15350v1)
:star:[code](https://woven-visionai.github.io/wts-dataset-homepage/)

  * [MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception](https://arxiv.org/abs/2406.10708)

  * [SkyScenes: A Synthetic Dataset for Aerial Scene Understanding](https://arxiv.org/abs/2312.06719)
:house:[project](https://hoffman-group.github.io/SkyScenes/)

  * [Caltech Aerial RGB-Thermal Dataset in the Wild](https://arxiv.org/abs/2403.08997)
:star:[code](https://github.com/aerorobotics/caltech-aerial-rgbt-dataset)

  * [V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception](https://arxiv.org/abs/2403.16034)

  * [H-V2X: A Large Scale Highway Dataset for BEV Perception](https://eccv.ecva.net/virtual/2024/poster/126)

  * [PetFace: A Large-Scale Dataset and Benchmark for Animal Identification](http://arxiv.org/abs/2407.13555v1)
:star:[code](https://dahlian00.github.io/PetFacePage/)

  * [Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework](http://arxiv.org/abs/2407.08377v1)

  * [OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects](http://arxiv.org/abs/2407.08711v1)
:star:[code](https://omninocs.github.io)

  * [SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark](https://arxiv.org/abs/2310.20436)
:star:[code](https://github.com/ZhengdiYu/SignAvatars)
:house:[project](https://signavatars.github.io/)

  * [Insect Identification in the Wild: The AMI Dataset](https://arxiv.org/abs/2406.12452)
:star:[code](https://github.com/RolnickLab/ami-dataset) insect identification in the wild: the AMI dataset

  * [RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception](https://arxiv.org/abs/2405.09883)
:sunflower:[dataset](https://github.com/xiaosu-zhu/RoScenes)

* Data augmentation

  * [SUMix: Mixup with Semantic and Uncertain Information](http://arxiv.org/abs/2407.07805v1)
:star:[code](https://github.com/JinXins/SUMix)

  * [Data Augmentation via Latent Diffusion for Saliency Prediction](http://arxiv.org/abs/2409.07307v1)

  * [FreeAugment: Data Augmentation Search Across All Degrees of Freedom](http://arxiv.org/abs/2409.04820v1)
:star:[code](https://tombekor.github.io/FreeAugment-web)

  * [Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective](https://arxiv.org/abs/2312.04763)
:star:[code](https://github.com/Noah888/DAR)



## 43.Sound

* [Audio-Synchronized Visual Animation](https://arxiv.org/abs/2403.05659)
:star:[code](https://github.com/lzhangbj/ASVA)
:house:[project](https://lzhangbj.github.io/projects/asva/asva.html)

* [Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation](https://arxiv.org/pdf/2305.03907.pdf)
:house:[project](https://bolinlai.github.io/CSTS-EgoGazeAnticipation/)

* [Label-anticipated Event Disentanglement for Audio-Visual Video Parsing](http://arxiv.org/abs/2407.08126v1)

* [Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity](http://arxiv.org/abs/2407.10387v1)
:star:[code](https://maskvat.github.io)

* [Spherical World-Locking for Audio-Visual Localization in Egocentric Videos](https://arxiv.org/abs/2408.05364)

* [Self-Supervised Audio-Visual Soundscape Stylization](http://arxiv.org/abs/2409.14340v1)
:house:[project](https://tinglok.netlify.app/files/avsoundscape/)

* [CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios](https://arxiv.org/abs/2403.04640)
:star:[code](https://github.com/rikeilong/Bay-CAT) audio-visual scenes

* [Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores](https://arxiv.org/abs/2404.07336)

* [Siamese Vision Transformers are Scalable Audio-visual Learners](https://arxiv.org/abs/2403.19638)
:star:[code](https://github.com/GenjiB/AVSiam) audio-visual learners

* [Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos](https://arxiv.org/abs/2406.09272)
:house:[project](https://vision.cs.utexas.edu/projects/action2sound) ambient-aware action sound generation

* [Audio-visual Generalized Zero-shot Learning the Easy Way](https://arxiv.org/abs/2407.13095)

* Audio-Visual Segmentation

  * [Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes](http://arxiv.org/abs/2407.10957v1)
:star:[code](https://gewu-lab.github.io/Ref-AVS)

  * [Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation](http://arxiv.org/abs/2407.11820v1)
:star:[code](https://gewu-lab.github.io/stepping_stones)

  * [CPM: Class-conditional Prompting Machine for Audio-visual Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01634.pdf) audio-visual segmentation



## 42.Optical Flow Estimation

* [SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow](https://arxiv.org/abs/2405.14793)
:star:[code](https://github.com/princeton-vl/SEA-RAFT)



## 41.Biometrics

* [Open-Set Biometrics: Beyond Good Closed-Set Models](http://arxiv.org/abs/2407.16133v1)
:star:[code](https://github.com/prevso1088/open-set-biometrics)



## 40.Object Pose Estimation

* [SCAPE: A Simple and Strong Category-Agnostic Pose Estimator](http://arxiv.org/abs/2407.13483v1)
:star:[code](https://github.com/tiny-smart/SCAPE)

* [SRPose: Two-view Relative Pose Estimation with Sparse Keypoints](http://arxiv.org/abs/2407.08199v1)
:house:[project](https://frickyinn.github.io/srpose)

* [FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation](http://arxiv.org/abs/2409.16600v1)
:star:[code](https://github.com/tjy0703/FAFA)

* [A Graph-Based Approach for Category-Agnostic Pose Estimation](https://arxiv.org/abs/2311.17891)
:house:[project](https://orhir.github.io/pose-anything/)

* [GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence](https://arxiv.org/abs/2311.13777)

* [OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation](http://arxiv.org/abs/2408.16547v1)
:star:[code](https://github.com/YC-Che/OP-Align)

* [FoundPose: Unseen Object Pose Estimation with Foundation Features](https://arxiv.org/abs/2311.18809)
:house:[project](http://evinpinar.github.io/foundpose)

* [LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation](http://arxiv.org/abs/2409.15727v1)
:star:[code](https://github.com/lolrudy/LaPose)

* [U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01566.pdf)

* [PACE: Pose Annotations in Cluttered Environments](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06837.pdf)
:star:[code](https://github.com/qq456cvb/PACE)

* 6-DoF

  * [An Economic Framework for 6-DoF Grasp Detection](http://arxiv.org/abs/2407.08366v1)
:star:[code](https://github.com/iSEE-Laboratory/EconomicGrasp)

  * [Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation](https://arxiv.org/abs/2311.09500)

  * [Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance](http://arxiv.org/abs/2407.13842v1)
:star:[code](https://airvlab.github.io/grasp-anything)

  * [Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation](https://arxiv.org/abs/2409.18261)
:star:[code](https://github.com/3dtopia/omni6d)

  * [6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model](http://arxiv.org/abs/2407.15484v1)
:star:[code](https://mbortolon97.github.io/6dgs/)

  * [FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models](https://arxiv.org/abs/2312.00947)
:house:[project](https://andreacaraffa.github.io/freeze/)

* Camera Pose Estimation

  * [ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation](http://arxiv.org/abs/2408.09042v1)

  * [Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection](https://arxiv.org/abs/2312.04527)

* Counting

  * [AFreeCA: Annotation-Free Counting for All](https://arxiv.org/abs/2403.04943) counting

  * [Zero-shot Object Counting with Good Exemplars](https://arxiv.org/abs/2407.04948)

  * [ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting](https://arxiv.org/abs/2309.04820)
:star:[code](https://github.com/ActiveVisionLab/ABC123)
:house:[project](https://abc123.active.vision/) counting

  * [Class-Agnostic Object Counting with Text-to-Image Diffusion Model](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08663.pdf)

  * [Shifted Autoencoders for Point Annotation Restoration in Object Counting](https://arxiv.org/abs/2312.07190)



## 39.Robots

* [See and Think: Embodied Agent in Virtual Environment](https://arxiv.org/abs/2311.15209)
:house:[project](https://rese1f.github.io/STEVE/)

* [SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs](https://arxiv.org/abs/2404.00469)

* [V-IRL: Grounding Virtual Intelligence in Real Life](https://arxiv.org/abs/2402.03310)
:star:[code](https://github.com/VIRL-Platform/VIRL)

* Robotics

  * [Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation](https://arxiv.org/abs/2401.07487)
:house:[project](https://tea-lab.github.io/Robo-ABC/)

  * [Learning Cross-hand Policies of High-DOF Reaching and Grasping](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04377.pdf) robotics

  * [DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control](http://arxiv.org/abs/2407.14758v1)
:star:[code](https://github.com/AllenXuuu/DISCO)

  * [Real-time Holistic Robot Pose Estimation with Unknown States](https://arxiv.org/abs/2402.05655)
:star:[code](https://github.com/Oliverbansk/Holistic-Robot-Pose-Estimation)

  * [ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation](https://arxiv.org/abs/2403.08321)
:star:[code](https://github.com/GuanxingLu/ManiGaussian)
:house:[project](https://guanxinglu.github.io/ManiGaussian/)

  * [Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts](http://arxiv.org/abs/2407.14872v1)

  * [GraspXL: Generating Grasping Motions for Diverse Objects at Scale](https://arxiv.org/pdf/2403.19649.pdf)
:star:[code](https://github.com/zdchan/graspxl)
:house:[project](https://eth-ait.github.io/graspxl/)

  * [UGG: Unified Generative Grasping](https://arxiv.org/abs/2311.16917)
:house:[project](https://jiaxin-lu.github.io/ugg/) robotics

  * [Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation](http://arxiv.org/abs/2407.14062v1)
:star:[code](https://github.com/florasion/D-VQVAE)

  * [Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation](https://arxiv.org/abs/2405.01527)
:house:[project](https://homangab.github.io/track2act/) robotics

* Navigation

  * [NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models](http://arxiv.org/abs/2407.12366v1)
:star:[code](https://github.com/GengzeZhou/NavGPT-2) 

  * [Prioritized Semantic Learning for Zero-shot Instance Navigation](https://arxiv.org/abs/2403.11650)
:star:[code](https://github.com/XinyuSun/PSL-InstanceNav) navigation

* VPR

  * [Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition](https://arxiv.org/abs/2407.02422)
:star:[code](https://github.com/serizba/cliquemining)

  * [Navigation Instruction Generation with BEV Perception and Large Language Models](http://arxiv.org/abs/2407.15087v1)
:star:[code](https://github.com/FanScy/BEVInstructor)

  * [Revisit Anything: Visual Place Recognition via Image Segment Retrieval](http://arxiv.org/abs/2409.18049v1)
:star:[code](https://github.com/AnyLoc/Revisit-Anything)

  * [VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition](https://arxiv.org/abs/2409.19293)
:star:[code](https://github.com/Ahmedest61/VLAD-BuFF/)

  * [MeshVPR: Citywide Visual Place Recognition Using 3D Meshes](https://arxiv.org/abs/2406.02776)
:star:[code](https://github.com/gmberton/MeshVPR)

* SLAM

  * [Deep Patch Visual SLAM](https://arxiv.org/abs/2408.01654)
:star:[code](https://github.com/princeton-vl/DPVO)

  * [RGBD GS-ICP SLAM](https://arxiv.org/abs/2403.12550)
:star:[code](https://github.com/Lab-of-AI-and-Robotics/GS_ICP_SLAM)

  * [I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM](https://arxiv.org/abs/2407.11347)

  * [Hyperion - A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM](http://arxiv.org/abs/2407.07074v1)
:star:[code](https://github.com/VIS4ROB-lab/hyperion)

  * [SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM](https://arxiv.org/abs/2402.03246)

  * [LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10364.pdf)

  * [Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM](https://arxiv.org/abs/2407.13338)

  * [Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11219.pdf)

  * [CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field](https://arxiv.org/abs/2403.16095)
:star:[code](https://github.com/hjr37/CG-SLAM)

* Try-On

  * [Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models](https://arxiv.org/abs/2403.07371)

  * [Improving Virtual Try-On with Garment-focused Diffusion Models](http://arxiv.org/abs/2409.08258v1)
:star:[code](https://github.com/siqi0905/GarDiff/tree/master)

  * [Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment](https://arxiv.org/abs/2403.12965)
:star:[code](https://github.com/mengtingchen/wear-any-way-page)
:house:[project](https://mengtingchen.github.io/wear-any-way-page/)

  * [Improving Diffusion Models for Authentic Virtual Try-on in the Wild](https://arxiv.org/abs/2403.05139)
:star:[code](https://github.com/yisol/IDM-VTON)

  * [D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On](https://arxiv.org/abs/2407.15111)
:star:[code](https://github.com/Jerome-Young/D4-VTON)

  * [WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models](https://arxiv.org/abs/2407.10625)
:star:[code](https://github.com/scnuhealthy/video_try_on)

* Cross-View Geo-Localization

  * [GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers](http://arxiv.org/abs/2408.02840v1)
:star:[code](https://github.com/manupillai308/GAReT)

  * [Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network](https://arxiv.org/abs/2408.05475)
:star:[code](https://github.com/yejy53/EP-BEV)

  * [ConGeo: Robust Cross-view Geo-localization across Ground View Variations](https://arxiv.org/abs/2403.13965)
:star:[code](https://github.com/eceo-epfl/ConGeo)
:house:[project](https://eceo-epfl.github.io/ConGeo/) cross-view geo-localization

  * [Benchmarking the Robustness of Cross-view Geo-localization Models](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11762.pdf)

  * [CityGuessr: City-Level Video Geo-Localization on a Global Scale](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08031.pdf)

* Geo-Localization

  * [Statewide Visual Geolocalization in the Wild](https://arxiv.org/abs/2409.16763)
:star:[code](https://github.com/fferflo/statewide-visual-geolocalization)

* Avatars

  * [CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images](http://arxiv.org/abs/2407.04345v1)
:star:[code](https://github.com/jsshin98/CanonicalFusion)

  * [RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models](http://arxiv.org/abs/2407.06938v1)
:star:[code](https://rodinhd.github.io/)

  * [MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos](https://arxiv.org/abs/2407.08414)
:star:[code](https://github.com/shad0wta9/meshavatar)

  * [PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations](https://arxiv.org/abs/2404.04421)
:house:[project](https://qingqing-zhao.github.io/PhysAvatar)

  * [iHuman: Instant Animatable Digital Humans From Monocular Videos](http://arxiv.org/abs/2407.11174v1)

  * [PAV: Personalized Head Avatar from Unstructured Video Collection](https://arxiv.org/abs/2407.21047)
:house:[project](https://akincaliskan3d.github.io/PAV)

  * [Disentangled Clothed Avatar Generation from Text Descriptions](https://arxiv.org/abs/2312.05295)
:house:[project](https://shanemankiw.github.io/SO-SMPL/) clothed avatar generation

  * [MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08299.pdf)
:house:[project](https://syntec-research.github.io/MagicMirror/)

  * [3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03433.pdf)

  * [FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10720.pdf)
:star:[code](https://github.com/humansensinglab/FAMOUS) 3D human digitization

  * [Instant 3D Human Avatar Generation using Image Diffusion Models](https://arxiv.org/abs/2406.07516)
:house:[project](https://www.nikoskolot.com/avatarpopup/)

  * [Let the Avatar Talk using Texts without Paired Training Data](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12305.pdf)

* VR

  * [EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10261.pdf)



## 38.Human-Object Interaction

* [Controllable Human-Object Interaction Synthesis](https://arxiv.org/pdf/2312.03913.pdf)
:house:[project](https://lijiaman.github.io/projects/chois/)

* [F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions](http://arxiv.org/abs/2407.12435v1)

* [Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04769.pdf)
:star:[code](https://github.com/southnx/IcH-Vid-HOI)

* [Look Hear: Gaze Prediction for Speech-directed Human Attention](http://arxiv.org/abs/2407.19605v1)
:star:[code](https://github.com/cvlab-stonybrook/ART)

* [Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model](http://arxiv.org/abs/2408.01044v1)
:star:[code](https://github.com/jinyang06/SamGOP)

* [Revisit Human-Scene Interaction via Space Occupancy](https://arxiv.org/abs/2312.02700)
:house:[project](https://foruck.github.io/occu-page/) human-scene interaction

* [Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection](https://arxiv.org/abs/2408.02484)
:star:[code](https://github.com/ltttpku/CMMP)

* [AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation](https://arxiv.org/abs/2406.01194)

* Hand-Object

  * [NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model](http://arxiv.org/abs/2407.12727v1)

  * [Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics](http://arxiv.org/abs/2409.04033v1)
:star:[code](https://hograspnet2024.github.io/)

  * [Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?](https://arxiv.org/abs/2312.02672)
:star:[code](https://github.com/fpv-iplab/HOI-Synth)

  * [Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06748.pdf) 



## 37.Style Transfer

* [Towards compact reversible image representations for neural style transfer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08321.pdf)

* Motion Transfer

  * [Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation](http://arxiv.org/abs/2407.11266v1)
:star:[code](https://github.com/rongakowang/MMDMC)



## 36.Gaze Estimation

* [De-confounded Gaze Estimation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03367.pdf)

* [3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03191.pdf)
:star:[code](https://github.com/eververas/3DGazeNet)

* [LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation](https://arxiv.org/abs/2411.08606)

* [Gaze Target Detection Based on Head-Local-Global Coordination](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03933.pdf)



## 35.Action Detection

* [LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning](https://arxiv.org/pdf/2312.03849.pdf)
:star:[code](https://github.com/BolinLai/LEGO)
:house:[project](https://bolinlai.github.io/Lego_EgoActGen/)

* [ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos](http://arxiv.org/abs/2407.12987v1)

* [Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition](https://arxiv.org/abs/2403.14113)

* [Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10749.pdf) motion keyframe interpolation

* Skeleton-Based Action Recognition

  * [SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders](http://arxiv.org/abs/2407.13460v1)
:star:[code](https://github.com/pha123661/SA-DVAE)

  * [Towards Physical World Backdoor Attacks against Skeleton Action Recognition](https://arxiv.org/abs/2408.08671)
:house:[project](https://qichenzheng.github.io/psba-website/)

  * [S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04755.pdf)
:house:[project](https://sjepa.github.io)

  * [Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03717.pdf)
:star:[code](https://github.com/LanglandsLin/IGM)

  * [CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner](https://arxiv.org/abs/2403.10082)

* Few-Shot Action Recognition

  * [Trajectory-aligned Space-time Tokens for Few-shot Action Recognition](http://arxiv.org/abs/2407.18249v1)
:house:[project](https://www.cs.umd.edu/~pulkit/tats)

  * [Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00305.pdf)
:star:[code](https://github.com/cong-wu/EMP-Net)

* Temporal Action Detection

  * [DyFADet: Dynamic Feature Aggregation for Temporal Action Detection](http://arxiv.org/abs/2407.03197v1)
:star:[code](https://github.com/yangle15/DyFADet-pytorch)

  * [UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection](https://arxiv.org/abs/2404.04933)
:star:[code](https://github.com/yingsen1/UniMD)

* Temporal Action Localization

  * [HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization](https://arxiv.org/abs/2408.06437)
:star:[code](https://github.com/sakibreza/ECCV24-HAT)

  * [Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization](http://arxiv.org/abs/2407.07673v1)

  * [Online Temporal Action Localization with Memory-Augmented Transformer](http://arxiv.org/abs/2408.02957v1)
:house:[project](https://cvlab.postech.ac.kr/research/MATR/)

  * [Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01159.pdf)

* Temporal Action Segmentation

  * [Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment](http://arxiv.org/abs/2408.09919v1)
:star:[code](https://github.com/pangzhan27/GTLA)

  * [Two-Stage Active Learning for Efficient Temporal Action Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06348.pdf)

  * [Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07145.pdf)
:star:[code](https://github.com/HaoyuJi/LaSA)

  * [Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs](https://arxiv.org/abs/2312.02638)
:star:[code](https://github.com/fpv-iplab/synchronization-is-all-you-need)

* Action Quality Assessment

  * [Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment](http://arxiv.org/abs/2407.19675v1)
:star:[code](https://github.com/wuli55555/TRS)

  * [RICA^2: Rubric-Informed, Calibrated Assessment of Actions](https://arxiv.org/abs/2408.02138)
:house:[project](https://abrarmajeedi.github.io/rica2_aqa/)

  * [Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05909.pdf) action quality assessment

  * [MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment](https://arxiv.org/abs/2403.04398)
:star:[code](https://github.com/ZhouKanglei/MAGR_CAQA)

* Action Anticipation

  * [Semantically Guided Representation Learning For Action Anticipation](http://arxiv.org/abs/2407.02309v1)
:star:[code](https://github.com/ADiko1997/S-GEAR)

  * [PALM: Predicting Actions through Language Models](https://arxiv.org/abs/2311.17944) action prediction

* Action Recognition

  * [Referring Atomic Video Action Recognition](https://arxiv.org/abs/2407.01872)
:star:[code](https://github.com/KPeng9510/RAVAR)

  * [DEAR: Depth-Enhanced Action Recognition](https://arxiv.org/abs/2408.15679)

  * [Bayesian Evidential Deep Learning for Online Action Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02475.pdf)

  * [C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition](http://arxiv.org/abs/2407.06113v1)
:star:[code](https://github.com/RongchangLi/ZSCAR_C2C)

  * [Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition](http://arxiv.org/abs/2407.06628v1)

  * [Classification Matters: Improving Video Action Detection with Class-Specific Attention](http://arxiv.org/abs/2407.19698v1)

  * [FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition](http://arxiv.org/abs/2409.01448v1)
:house:[project](https://daveishan.github.io/finepsuedo-webpage/)

  * [Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10056.pdf)

  * [Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition](https://arxiv.org/abs/2405.19917)
:house:[project](https://masashi-hatano.github.io/MM-CDFSL/)

  * [On the Utility of 3D Hand Poses for Action Recognition](https://arxiv.org/abs/2403.09805)
:house:[project](https://s-shamil.github.io/HandFormer/)

  * [POET: Prompt Offset Tuning for Continual Human Action Adaptation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08141.pdf)
:star:[code](https://github.com/humansensinglab/)

  * [Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01016.pdf)
:star:[code](https://github.com/BNU-IVC/OccGait)

  * [Leveraging temporal contextualization for video action recognition](https://arxiv.org/abs/2404.09490)
:star:[code](https://github.com/naver-ai/tc-clip)

  * [Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01635.pdf)

  * [SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition](https://arxiv.org/abs/2403.09508)
:house:[project](https://kaist-viclab.github.io/SkateFormer_site/)

* Action Understanding

  * [EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding](https://arxiv.org/abs/2406.08877)
:star:[code](https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main)

* Group Activity Recognition

  * [Towards More Practical Group Activity Detection: A New Benchmark and Model](https://arxiv.org/abs/2312.02878)
:house:[project](https://cvlab.postech.ac.kr/research/CAFE)

  * [Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition](https://arxiv.org/abs/2405.18012)

  * [Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph](https://arxiv.org/abs/2407.19497)
:star:[code](https://github.com/mgiant/MP-GCN)

* Seizure Detection

  * [VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG](https://arxiv.org/abs/2311.14775)



## 34.Visual Question Answering

* [DriveLM: Driving with Graph Visual Question Answering](https://arxiv.org/abs/2312.14150)
:star:[code](https://github.com/OpenDriveLab/DriveLM)

* [Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following](https://arxiv.org/abs/2406.02774)

* [WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering](http://arxiv.org/abs/2407.05603v1)
:star:[code](https://github.com/cpystan/WSI-VQA)

* [GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02569.pdf)

* [Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge](https://arxiv.org/abs/2401.10712)
:star:[code](https://github.com/WHB139426/QA-Prompts)

* [Compositional Substitutivity of Visual Reasoning for Visual Question Answering](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06434.pdf)
:star:[code](https://github.com/NeverMoreLCH/CG-SPS)

* [Fully Authentic Visual Question Answering Dataset from Online Communities](https://arxiv.org/abs/2311.15562)
:house:[project](https://vqaonline.github.io/)

* [An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08395.pdf)

* Audio-Visual Question Answering

  * [Learning Trimodal Relation for AVQA with Missing Modality](http://arxiv.org/abs/2407.16171v1)

* Video Question Answering

  * [Video Question Answering with Procedural Programs](https://arxiv.org/abs/2312.00937)
:house:[project](https://rccchoudhury.github.io/proviq2023/)

  * [ViLA: Efficient Video-Language Alignment for Video Question Answering](https://arxiv.org/abs/2312.08367)
:star:[code](https://github.com/xijun-cs/ViLA)

  * [TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00720.pdf)VQA

  * [AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering](https://arxiv.org/abs/2311.14906)
:star:[code](https://github.com/Xiuyuan-Chen/AutoEval-Video)



## 33.Motion Generation

* [Event-Based Motion Magnification](https://arxiv.org/abs/2402.11957)
:star:[code](https://github.com/OpenImagingLab/emm)

* [Learning-based Axial Video Motion Magnification](https://arxiv.org/abs/2312.09551)
:house:[project](https://axial-momag.github.io/axial-momag/)

* [SMooDi: Stylized Motion Diffusion Model](http://arxiv.org/abs/2407.12783v1)
:star:[code](https://neu-vi.github.io/SMooDi/)

* [Length-Aware Motion Synthesis via Latent Diffusion](http://arxiv.org/abs/2407.11532v1)
:star:[code](https://github.com/AlessioSam/LADiff)

* [HUMOS: Human Motion Model Conditioned on Body Shape](http://arxiv.org/abs/2409.03944v1)
:star:[code](https://CarstenEpic.github.io/humos/)

* [HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance](http://arxiv.org/abs/2407.06937v1)
:star:[code](https://github.com/Enderfga/HumanRefiner)

* [Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07885.pdf)
:house:[project](https://panwliu.github.io/mhc/)

* [Generating Human Interaction Motions in Scenes with Text Control](https://arxiv.org/abs/2404.10685)
:house:[project](https://research.nvidia.com/labs/toronto-ai/tesmo/) motion generation

* [Motion Mamba: Efficient and Long Sequence Motion Generation](https://arxiv.org/abs/2403.07487)
:star:[code](https://github.com/steve-zeyu-zhang/MotionMamba/)
:house:[project](https://steve-zeyu-zhang.github.io/MotionMamba/)

* [Large Motion Model for Unified Multi-Modal Motion Generation](https://arxiv.org/abs/2404.01284)
:house:[project](https://mingyuan-zhang.github.io/projects/LMM.html)

* [EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation](https://arxiv.org/abs/2312.02256)
:star:[code](https://github.com/Frank-ZY-Dou/EMDM)
:house:[project](https://frank-zy-dou.github.io/projects/EMDM/index.html)

* [Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases](https://arxiv.org/abs/2310.04189)
:house:[project](https://foruck.github.io/KP/) human motion

* [TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01796.pdf)
:house:[project](https://yufu-wang.github.io/tram4d/) human motion

* [Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03541.pdf) human motion

* [FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models](https://arxiv.org/abs/2406.10740)

* [MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model](https://arxiv.org/abs/2404.19759)
:star:[code](https://github.com/Dai-Wenxun/MotionLCM)

* [Realistic Human Motion Generation with Cross-Diffusion Models](https://arxiv.org/abs/2312.10993)
:house:[project](https://wonderno.github.io/CrossDiff-webpage/) human motion

* [CoMo: Controllable Motion Generation through Language Guided Pose Code Editing](https://arxiv.org/abs/2403.13900)
:house:[project](https://yh2371.github.io/como/) controllable motion generation

* [TLControl: Trajectory and Language Control for Human Motion Synthesis](https://arxiv.org/abs/2311.17135)
:house:[project](https://tlcontrol.weilinwl.com/) Human motion synthesis

* [Retrieval Robust to Object Motion Blur](https://arxiv.org/abs/2404.18025)
:star:[code](https://github.com/Rong-Zou/Retrieval-Robust-to-Object-Motion-Blur)

* 3D human motion synthesis

  * [ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions](https://arxiv.org/pdf/2311.17057.pdf)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/remos/)

* Text-to-motion synthesis

  * [FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis](https://arxiv.org/pdf/2405.15763)

  * [Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation](http://arxiv.org/abs/2407.10528v1)
:house:[project](https://jpthu17.github.io/GuidedMotion-project/)

  * [Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation](https://arxiv.org/abs/2312.14828)
:house:[project](https://moonsliu.github.io/Pro-Motion/)

  * [ParCo: Part-Coordinating Text-to-Motion Synthesis](https://arxiv.org/abs/2403.18512)
:star:[code](https://github.com/qrzou/ParCo)

* Human motion prediction

  * [Human Motion Forecasting in Dynamic Domain Shifts: A Homeostatic Continual Test-time Adaptation Framework](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04599.pdf)

  * [Scene-aware Human Motion Forecasting via Mutual Distance Prediction](https://arxiv.org/abs/2310.00615)

* Human motion estimation

  * [MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation](https://static.siplab.org/papers/eccv2024-manikin.pdf)
:house:[project](https://siplab.org/projects/MANIKIN)

* Motion estimation

  * [Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation](http://arxiv.org/abs/2407.10802v1)
:star:[code](https://github.com/tub-rip/MotionPriorCMax)

  * [COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation](https://arxiv.org/abs/2408.16426)

* Dance generation

  * [Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation](https://arxiv.org/abs/2407.07554)
:house:[project](https://zikaihuangscut.github.io/Beat-It/)

* Behavior generation

  * [DIM: Dyadic Interaction Modeling for Social Behavior Generation](https://arxiv.org/abs/2403.09069)
:star:[code](https://github.com/Boese0601/Dyadic-Interaction-Modeling)

* Motion transfer

  * [Temporal Residual Jacobians for Rig-free Motion Transfer](https://arxiv.org/abs/2407.14958)
:house:[project](https://temporaljacobians.github.io/)
🤗[huggingface](https://huggingface.co/papers/2407.14958)

* Motion prediction

  * [Enhanced Motion Forecasting with Visual Relation Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07336.pdf)



## 32.Person Re-Identification

* [Human-in-the-Loop Visual Re-ID for Population Size Estimation](https://arxiv.org/abs/2312.05287)
:star:[code](https://github.com/cvl-umass/counting-clusters)

* Pedestrian re-identification

  * [Keypoint Promptable Re-Identification](https://arxiv.org/abs/2407.18112)
:star:[code](https://github.com/VlSomers/keypoint_promptable_reidentification)

  * [Privacy-Preserving Adaptive Re-Identification without Image Transfer](http://arxiv.org/abs/2407.12589v1)

  * [Rethinking Normalization Layers for Domain Generalizable Person Re-identification](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08753.pdf)
:star:[code](https://github.com/3699nr/ReNorm)

  * [Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09119.pdf)

  * VI-ReID

    * [Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification](https://arxiv.org/abs/2401.06825)
:thumbsup:[Unsupervised visible-infrared person re-identification (USL-VI-ReID)](https://std.xmu.edu.cn/2024/0710/c4739a488273/page.htm)

    * [WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification](https://www.arxiv.org/abs/2408.10624)

* Person search

  * [PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery](https://arxiv.org/abs/2409.13475) Text-based person search

* Gait recognition

  * [Camera-LiDAR Cross-modality Gait Recognition](https://arxiv.org/abs/2407.02038)

  * [Free Lunch for Gait Recognition: A Novel Relation Descriptor](https://arxiv.org/abs/2308.11487)

  * [Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition](http://arxiv.org/abs/2407.12519v1)

  * [Cut out the Middleman: Revisiting Pose-based Gait Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04501.pdf)
:star:[code](https://github.com/BNU-IVC/FastPoseGait)

* Counting

  * [CountFormer: Multi-View Crowd Counting Transformer](http://arxiv.org/abs/2407.02047v1)

  * [Robust Zero-Shot Crowd Counting and Localization with Adaptive Resolution SAM](https://arxiv.org/abs/2402.17514)

  * [Multi-modal Crowd Counting via a Broker Modality](http://arxiv.org/abs/2407.07518v1)
:star:[code](https://github.com/HenryCilence/Broker-Modality-Crowd-Counting)

  * [Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance](https://arxiv.org/abs/2405.10589)
:star:[code](https://github.com/AaronCIH/APGCC)



## 31.Point Clouds

* [SEED: A Simple and Effective 3D DETR in Point Clouds](http://arxiv.org/abs/2407.10749v1)
:star:[code](https://github.com/happinesslz/SEED)

* [PointLLM: Empowering Large Language Models to Understand Point Clouds](https://arxiv.org/abs/2308.16911)
:star:[code](https://github.com/OpenRobotLab/PointLLM)
:house:[project](https://runsenxu.com/projects/PointLLM/)

* [TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds](https://arxiv.org/abs/2407.12702)

* [Learning to Adapt SAM for Segmenting Cross-domain Point Clouds](https://arxiv.org/abs/2310.08820)

* [Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time](https://export.arxiv.org/abs/2407.01851)

* [milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing](https://arxiv.org/abs/2306.17010)
:star:[code](https://github.com/Toytiny/milliFlow)

* [Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement](http://arxiv.org/abs/2408.02966v1)

* [Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes](http://arxiv.org/abs/2408.14279v1)
:star:[code](https://github.com/chenchao15/Unseen)

* [T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning](https://arxiv.org/abs/2312.10217)
:star:[code](https://github.com/codename1995/T-MAE)

* [Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds](https://arxiv.org/abs/2311.16474v2)
:star:[code](https://github.com/xiaoyao3302/PCFEA)

* [PFGS: High Fidelity Point Cloud Rendering via Feature Splatting](https://arxiv.org/abs/2407.03857)
:star:[code](https://github.com/Mercerai/PFGS)

* [Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09814.pdf)
:star:[code](https://github.com/yh-han/M2PSC.git)

* [To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning](https://arxiv.org/abs/2403.17869)

* [Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing](https://arxiv.org/abs/2311.16043)
:star:[code](https://github.com/NJU-3DV/Relightable3DGaussian)

* [FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation](https://arxiv.org/abs/2410.19573)
:star:[code](https://github.com/genuszty/FastPCI)

* Point cloud generation

  * [RangeLDM: Fast Realistic LiDAR Point Cloud Generation](https://arxiv.org/abs/2403.10094)
:star:[code](https://github.com/WoodwindHu/RangeLDM)

  * [Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07328.pdf)
:star:[code](https://github.com/wuyang98/Text2LiDAR)

  * [Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation](https://arxiv.org/abs/2312.07231)
:house:[project](https://dit-3d.github.io/FastDiT-3D/)

  * [FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation](https://arxiv.org/abs/2311.12090)
:house:[project](https://chenliang-zhou.github.io/FrePolad/)

* Point cloud completion

  * [Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion](http://arxiv.org/abs/2407.02887v1)
:star:[code](https://github.com/WHU-USI3DV/EGIInet)

  * [T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy](http://arxiv.org/abs/2407.05008v1)
:star:[code](https://github.com/df-boy/T-CorresNet)

  * [AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01714.pdf)

  * [EINet: Point Cloud Completion via Extrapolation and Interpolation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05687.pdf)
:star:[code](https://github.com/corecai163/EINet)

  * [Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06768.pdf)
:star:[code](https://github.com/yun-seo/PPCC)

  * [ProtoComp: Diverse Point Cloud Completion with Controllable Prototype](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06685.pdf)
:star:[code](https://github.com/Yanbo-23/Proto-Comp)

* Point cloud reconstruction

  * [DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction](https://arxiv.org/abs/2312.03298)
:star:[code](https://github.com/TyraelDLee/DiffPMAE)

* Point cloud understanding

  * [DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding](https://arxiv.org/abs/2407.08801)

* Point cloud registration

  * [ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency](http://arxiv.org/abs/2407.09862v1)
:star:[code](https://github.com/Laka-3DV/ML-SemReg)

  * [PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training](http://arxiv.org/abs/2407.14054v1)
:star:[code](https://github.com/Chen-Suyi/PointRegGPT)

  * [SemReg: Semantics Constrained Point Cloud Registration](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05759.pdf)
:star:[code](https://github.com/SheldonFung98/SemReg.git)

  * [Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning](http://arxiv.org/abs/2407.20223v1)
:house:[project](https://sites.google.com/view/eccv24-equivalign)

  * [UMERegRobust – Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11688.pdf)
:star:[code](https://github.com/yuvalH9/UMERegRobust)

  * [PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration](https://arxiv.org/abs/2407.10142)
:star:[code](https://github.com/yaorz97/PARENet)

  * [Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration](https://arxiv.org/abs/2410.05729)

* Point cloud segmentation

  * [Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation](http://arxiv.org/abs/2407.12489v1)

  * [HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation](http://arxiv.org/abs/2407.12387v1)
:star:[code](https://github.com/tpzou/HGL)

  * [SegPoint: Segment Any Point Cloud via Large Language Model](http://arxiv.org/abs/2407.13761v1)
:house:[project](https://heshuting555.github.io/SegPoint)

  * [Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation](https://arxiv.org/abs/2408.13752)

  * [Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05346.pdf)
:star:[code](https://github.com/jimtsai23/PseudoEmbed)

  * [Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation](https://www.arxiv.org/abs/2408.10537)
:star:[code](https://github.com/Javion11/PointLiBR.git)

* Point cloud understanding

  * [GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding](https://arxiv.org/abs/2407.13519)
:star:[code](https://github.com/changshuowang/GPSFormer)

* 3D point clouds

  * [Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds](http://arxiv.org/abs/2407.13342v1)
:house:[project](https://list17.github.io/ImplicitFilter)

  * [CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation](http://arxiv.org/abs/2407.16193v1)
:star:[code](https://github.com/shimazing/CloudFixer)

  * [FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00951.pdf)

  * [RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation](https://arxiv.org/abs/2408.06110)

  * [P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising](http://arxiv.org/abs/2408.16325v1)
:house:[project](https://p2p-bridge.github.io)

  * [Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03785.pdf)

  * [Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04444.pdf) 3D point cloud attack

  * [Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06843.pdf)
:star:[code](https://github.com/qpwodlsqp/CSEConv)

  * [Frugal 3D Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08282.pdf)



## 30.Anomaly Detection

* [Continuous Memory Representation for Anomaly Detection](https://arxiv.org/abs/2402.18293)
:star:[code](https://github.com/tae-mo/CRAD)

* [Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection](https://arxiv.org/abs/2302.14696)
:star:[code](https://github.com/shijianjian/DIA.git)

* [Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11002.pdf)
:star:[code](https://github.com/gaobb/AnoGen)

* [GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features](http://arxiv.org/abs/2407.12427v1)
:star:[code](https://github.com/LucStrater/GeneralAD)

* [Learning Diffusion Models for Multi-View Anomaly Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04907.pdf)

* [Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection](https://arxiv.org/abs/2403.13349)
:star:[code](https://github.com/xcyao00/HGAD)

* [TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection](https://arxiv.org/abs/2311.09999)
:star:[code](https://github.com/MaticFuc/ECCV_TransFusion)

* [Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07868.pdf)
:star:[code](https://github.com/DeclanMcIntosh/Online_InReaCh)

* [MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11465.pdf)
:star:[code](https://github.com/TheStarOfMSY/MoEAD)

* Defect detection

  * [An Incremental Unified Framework for Small Defect Inspection](https://arxiv.org/abs/2312.08917)
:star:[code](https://github.com/jqtangust/IUF)

* Failure detection

  * [DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation](http://arxiv.org/abs/2408.00331v1)
:star:[code](https://github.com/kowshikthopalli/DECIDER/)

* 3D anomaly detection

  * [R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection](http://arxiv.org/abs/2407.10862v1)

* Industrial anomaly detection

  * [Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection](https://arxiv.org/abs/2401.03145)

  * [A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization](https://arxiv.org/abs/2407.09359)
:star:[code](https://github.com/cqylunlun/GLASS)

  * [GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection](https://arxiv.org/abs/2406.07487)
:star:[code](https://github.com/hyao1/GLAD)

  * [AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08661.pdf)

  * [Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08462.pdf)

* Zero-shot anomaly detection

  * [AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection](http://arxiv.org/abs/2407.15795v1)
:star:[code](https://github.com/caoyunkang/AdaCLIP)

* Multi-class anomaly detection

  * [Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection](https://arxiv.org/abs/2403.11561)

* OOD

  * [Gradient-Regularized Out-of-Distribution Detection](https://export.arxiv.org/abs/2404.12368)

  * [SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning](https://arxiv.org/abs/2407.03036)

  * [PixOOD: Pixel-Level Out-of-Distribution Detection](https://arxiv.org/abs/2405.19882)
:star:[code](https://github.com/vojirt/PixOOD)

  * [An Information Theoretical View for Out-Of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07242.pdf)

  * [Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection](http://arxiv.org/abs/2407.04022v1)

  * [LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models](http://arxiv.org/abs/2407.08966v1)
:star:[code](https://github.com/YBZh/LAPT)

  * [ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection](http://arxiv.org/abs/2407.11735v1)
:star:[code](https://github.com/walline/prosub)

  * [Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond](http://arxiv.org/abs/2407.15739v1)
:house:[project](https://ade-ood.github.io/)

  * [Can Your Generative Model Detect Out-of-Distribution Covariate Shift?](http://arxiv.org/abs/2409.03043v1)

  * [Gradient-based Out-of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02138.pdf)

  * [Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11399.pdf)

  * [TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09304.pdf)
:star:[code](https://github.com/XixiLiu95/TAG)

* Outlier detection

  * [Rethinking Unsupervised Outlier Detection via Multiple Thresholding](https://arxiv.org/abs/2407.05382) Unsupervised outlier detection

* Zero-shot anomaly segmentation

  * [VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation](https://arxiv.org/abs/2407.12276)
:star:[code](https://github.com/xiaozhen228/VCP-CLIP)



## 29.Semi/self-supervised learning

* [SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers](https://arxiv.org/abs/2407.06305)
:house:[project](https://mingrui-zhao.github.io/SweepNet/)

* [Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning](https://arxiv.org/abs/2403.10252)
:star:[code](https://github.com/HereNowL/Region-aware-Distribution-Contrast)

* Self-supervised

  * [CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning](http://arxiv.org/abs/2407.12188v1)
:star:[code](https://github.com/ErumMushtaq/CroMo-Mixup)

  * [HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion](http://arxiv.org/abs/2407.05638v1)
:star:[code](https://github.com/Zeudfish/HPFF)

  * [SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning](http://arxiv.org/abs/2407.08148v1)
:star:[code](https://github.com/RM-Zhang/SCPNet)

  * [Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing](http://arxiv.org/abs/2407.11168v1)

  * [OmniSat: Self-Supervised Modality Fusion for Earth Observation](https://arxiv.org/pdf/2404.08351)
:star:[code](https://github.com/gastruc/OmniSat)
:house:[project](https://gastruc.github.io/projects/omnisat.html)
:sunflower:[dataset](https://huggingface.co/datasets/IGNF/PASTIS-HD)

  * [FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning](https://arxiv.org/abs/2310.02903)

  * [Self-supervised visual learning from interactions with objects](https://arxiv.org/abs/2407.06704)

  * [Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense](https://arxiv.org/abs/2409.08509)

  * [GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning](https://arxiv.org/abs/2403.12003)
:star:[code](https://github.com/xiaojieli0903/genview)

  * [On Pretraining Data Diversity for Self-Supervised Learning](https://arxiv.org/abs/2403.13808)
:star:[code](https://github.com/hammoudhasan/DiversitySSL)

  * [Decoupling Common and Unique Representations for Multimodal Self-supervised Learning](https://arxiv.org/abs/2309.05300)
:star:[code](https://github.com/zhu-xlab/DeCUR)

  * [POA: Pre-training Once for Models of All Sizes](http://arxiv.org/abs/2408.01031v1)
:star:[code](https://github.com/Qichuzyy/POA)

  * [ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders](https://arxiv.org/abs/2303.12001) Self-supervised representation learning

  * [Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization](https://arxiv.org/abs/2403.14973)
:house:[project](https://pwang.pw/trajSSL/) Self-supervised learning

  * [SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning](https://arxiv.org/abs/2303.09079)
:star:[code](https://github.com/ucf-ml-research/ssl-cleanse)

* Semi-supervised

  * [Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning](https://arxiv.org/abs/2408.12614)  

  * [Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data](http://arxiv.org/abs/2409.13977v1)
:star:[code](https://github.com/snehaputul/AllMatch)

  * [SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning](http://arxiv.org/abs/2409.17512v1)
:star:[code](https://github.com/komejisatori/SCOMatch)

  * [ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11377.pdf)

  * [Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03287.pdf) Semi-supervised learning

  * [Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning](https://arxiv.org/abs/2407.15837)
:star:[code](https://github.com/yibingwei-1/LatentMIM)

  * [Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration](https://arxiv.org/abs/2306.04621)
:star:[code](https://github.com/emasa/ADELLO-LTSSL)

 



## 28.Novel Class Discovery

* [Self-Cooperation Knowledge Distillation for Novel Class Discovery](http://arxiv.org/abs/2407.01930v1)



## 27.GNN/GCN

* [GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition](https://arxiv.org/abs/2308.14378)
:star:[code](https://github.com/jin-s13/GKGNet)GNN

* [Graph Neural Network Causal Explanation via Neural Causal Models](https://arxiv.org/abs/2407.09378)
:star:[code](https://github.com/ArmanBehnam/CXGNN)

* [On the Topology Awareness and Generalization Performance of Graph Neural Networks](https://arxiv.org/abs/2403.04482)

* [Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12325.pdf)



## 26.NAS

* [Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00668.pdf)
:star:[code](https://github.com/lliai/Auto-GAS)

* [Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00676.pdf)
:star:[code](https://github.com/lliai/Auto-DAS) Distillation-aware

* [SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference](https://arxiv.org/abs/2301.10879)

* [Dependency-aware Differentiable Neural Architecture Search](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07216.pdf)



## 25.MC/KD/Pruning(Model Compression/Knowledge Distillation/Pruning)

* [DεpS: Delayed ε-Shrinking for Faster Once-For-All Training](http://arxiv.org/abs/2407.06167v1)

* Model compression

  * [Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07761.pdf)

* Pruning

  * [Non-transferable Pruning](https://arxiv.org/abs/2410.08015)

  * [Straightforward Layer-wise Pruning for More Efficient Visual Adaptation](http://arxiv.org/abs/2407.14330v1)

  * [Isomorphic Pruning for Vision Models](https://arxiv.org/abs/2407.04616)
:star:[code](https://github.com/VainF/Isomorphic-Pruning)

  * [LPViT: Low-Power Semi-structured Pruning for Vision Transformers](https://arxiv.org/abs/2407.02068)

  * [PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference](https://arxiv.org/abs/2403.16020)
:star:[code](https://github.com/tanvir-utexas/PaPr)

  * [Enhanced Sparsification via Stimulative Training](https://arxiv.org/abs/2403.06417)
:star:[code](https://github.com/tsj-001/STP)

  * [SNP: Structured Neuron-level Pruning to Preserve Attention Scores](https://arxiv.org/abs/2404.11630)
:star:[code](https://github.com/Nota-NetsPresso/SNP)

* Quantization

  * [GenQ: Quantization in Low Data Regimes with Generative Synthetic Data](https://arxiv.org/abs/2312.05272v2)
:star:[code](https://github.com/Intelligent-Computing-Lab-Yale/GenQ)

  * [MetaAug: Meta-Data Augmentation for Post-Training Quantization](http://arxiv.org/abs/2407.14726v1)

  * [Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients](http://arxiv.org/abs/2407.12637v1)

  * [CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs](http://arxiv.org/abs/2407.05266v1)
:star:[code](https://github.com/georgia-tech-synergy-lab/CLAMP-ViT.git)

  * [AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer](http://arxiv.org/abs/2407.12951v1)
:star:[code](https://github.com/GoatWu/AdaLog)

  * [POCA: Post-training Quantization with Temporal Alignment for Codec Avatars](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05670.pdf)
:house:[project](https://mengjian0502.github.io/poca.github.io/)

* KD

  * [Simple Unsupervised Knowledge Distillation With Space Similarity](https://arxiv.org/abs/2409.13939) Knowledge distillation

  * [Direct Distillation between Different Domains](https://arxiv.org/abs/2401.06826)KD

  * [Harmonizing knowledge Transfer in Neural Network with Unified Distillation](https://arxiv.org/abs/2409.18565)

  * [Good Teachers Explain: Explanation-Enhanced Knowledge Distillation](https://arxiv.org/abs/2402.03119)

  * [The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers](https://arxiv.org/abs/2302.10494)

  * [Improving Knowledge Distillation via Regularizing Feature Direction and Norm](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03432.pdf)

  * [Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00499.pdf) Distillation

  * [Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation](https://arxiv.org/abs/2407.03056)
:star:[code](https://github.com/miccunifi/KDPL)

  * [UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation](https://arxiv.org/abs/2212.10950)
:star:[code](https://github.com/dreamguo/UNIKD)

  * [BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation](https://arxiv.org/abs/2407.09083)

  * [Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation](https://arxiv.org/abs/2405.11614)

  * [How to Train the Teacher Model for Effective Knowledge Distillation](https://arxiv.org/abs/2407.18041)

  * [Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12478.pdf)



## 24.Vision Transformer

* [Spline-based Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11525.pdf)

* [Denoising Vision Transformers](https://arxiv.org/abs/2401.02957)

* [FairViT: Fair Vision Transformer via Adaptive Masking](http://arxiv.org/abs/2407.14799v1)

* [Rotary Position Embedding for Vision Transformer](https://arxiv.org/abs/2403.13298)
:star:[code](https://github.com/naver-ai/rope-vit)

* [Bidirectional Progressive Transformer for Interaction Intention Anticipation](https://arxiv.org/abs/2405.05552)

* [Robustness Tokens: Towards Adversarial Robustness of Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07642.pdf)

* [SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization](https://arxiv.org/abs/2402.03317)
:star:[code](https://github.com/microsoft/robustlearn)

* [PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers](http://arxiv.org/abs/2407.04538v1)

* [OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction](http://arxiv.org/abs/2407.13335v1)
:star:[code](https://github.com/HKUST-NISL/oat_eccv24)

* [AugDETR: Improving Multi-scale Learning for Detection Transformer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03484.pdf)Transformer

* [AttnZero: Efficient Attention Discovery for Vision Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00666.pdf)
:star:[code](https://github.com/lliai/AttnZero)

* [SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02019.pdf)
:star:[code](https://github.com/Euphoria16/SpatialFormer)

* [Efficient Vision Transformers with Partial Attention](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11047.pdf)

* [SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers](https://arxiv.org/abs/2401.08740)
:star:[code](https://github.com/willisma/SiT)

* [Stitched ViTs are Flexible Vision Backbones](https://arxiv.org/abs/2307.00154)
:star:[code](https://github.com/ziplab/SN-Netv2)

* [Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02429.pdf)

* [Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00861.pdf)
:star:[code](https://github.com/bianlab/Specformer)

* [GiT: Towards Generalist Vision Transformer through Universal Language Interface](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04158.pdf)
:star:[code](https://github.com/Haiyang-W/GiT)

* [An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06958.pdf)

* [Fairness-aware Vision Transformer via Debiased Self-Attention](https://arxiv.org/abs/2301.13803)
:star:[code](https://github.com/qiangyao1988/DSA)

* [ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention](https://arxiv.org/abs/2401.00912)
:star:[code](https://github.com/skyhehe123/ScatterFormer)

* [LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors](https://arxiv.org/abs/2403.14625)
:house:[project](https://www.cs.umd.edu/~sakshams/LiFT/)

* [Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach](http://arxiv.org/abs/2407.06964v1)
:house:[project](https://synqt.github.io/)

* [LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer](https://arxiv.org/abs/2212.09877)
:star:[code](https://github.com/salesforce/LayoutDETR)

* [Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators](https://arxiv.org/abs/2408.05710)
:star:[code](https://github.com/LeapLabTHU/Attention-Mediators)

* [BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos](https://arxiv.org/abs/2312.00083)
:star:[code](https://github.com/Pilhyeon/BAM-DETR)

* [An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding](http://arxiv.org/abs/2408.01120v1)
:star:[code](https://github.com/chenwei746/EEVG)