https://github.com/52cv/eccv-2024-papers
- Host: GitHub
- URL: https://github.com/52cv/eccv-2024-papers
- Owner: 52CV
- Created: 2024-03-19T08:15:40.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-09T02:33:30.000Z (over 1 year ago)
- Last Synced: 2025-06-30T11:02:01.858Z (about 1 year ago)
- Size: 578 KB
- Stars: 101
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ECCV-2024-Papers

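## Official website: https://eccv.ecva.net/
### Main conference :bell:: September 29 (Sunday) to October 4
## Survey papers from past years, organized by category: ↘️[CV-Surveys](https://github.com/52CV/CV-Surveys) (under construction)
## 2025 papers organized by category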
↘️[WACV-2025-Papers](https://github.com/52CV/WACV-2025-Papers)
↘️[CVPR-2025-Papers](https://github.com/52CV/CVPR-2025-Papers)
## 2024 papers organized by category
↘️[WACV-2024-Papers](https://github.com/52CV/WACV-2024-Papers)
↘️[CVPR-2024-Papers](https://github.com/52CV/CVPR-2024-Papers)
↘️[ECCV-2024-Papers](https://github.com/52CV/ECCV-2024-Papers)
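## [2022 papers organized by category](#0000)
## [2022 papers organized by category](#000)
## [2021 papers organized by category](#00)
## [2020 papers organized by category](#0)
## 💥💥💥All papers have been categorized
:thumbsup:[ECCV 2024 awards announced: Columbia University takes the Best Paper Award](https://mp.weixin.qq.com/s/2uFlMQUW1TVrNOIC01U8Pg)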
## 🏆Best Paper Award
* [Minimalist Vision with Freeform Pixels](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08113.pdf)
:house:[project](https://cave.cs.columbia.edu/projects/categories/project?cid=Computational+Imaging&pid=Minimalist+Vision+with+Freeform+Pixels)
## 🏅Best Paper Honorable Mention
* [Rasterized Edge Gradients: Handling Discontinuities Differentiably](https://arxiv.org/abs/2405.02508)
* [Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models](https://arxiv.org/abs/2404.13706)
:house:[project](https://cs-people.bu.edu/vpetsiuk/arc/)
## Contents
|:cat:|:dog:|:tiger:|:wolf:|
|------|------|------|------|
|[1.Other](#1)|[2.3D Visual](#2)|[3.Face](#3)|[4.Pose Estimation](#4)|
|[5.OCR](#5)|[6.Object Tracking](#6)|[7.Object Detection](#7)|[8.Super-Resolution](#8)|
|[9.Image/Video Processing](#9)|[10.Image Classification](#10)|[11.Image Segmentation](#11)|[12.Image Retrieval](#12)|
|[13.Image/Video Compression](#13)|[14.Image/Video Captioning](#14)|[15.GAN/Image Synthesis](#15)|[16.Medical Image Processing](#16)|
|[17.Video](#17)|[18.Automated Driving](#18)|[19.UAV/Remote Sensing/Satellite Image](#19)|[20.Scene](#20)|
|[21.Vision-Language](#21)|[22.Few/Zero-Shot Learning/Domain Generalization/Adaptation](#22)|[23.Machine Learning](#23)|[24.Vision Transformer](#24)|
|[25.Model Compression/Knowledge Distillation/Pruning](#25)|[26.NAS](#26)|[27.GNN/GCN](#27)|[28.Novel Class Discovery](#28)|
|[29.Semi/Self-Supervised Learning](#29)|[30.Anomaly Detection](#30)|[31.Point Clouds](#31)|[32.Person Re-Identification](#32)|
|[33.Human Motion Generation](#33)|[34.Visual Question Answering](#34)|[35.Action Detection](#35)|[36.Gaze Estimation](#36)|
|[37.Style Transfer](#37)|[38.Human-Object Interaction](#38)|[39.Robots](#39)|[40.Object Pose Estimation](#40)|
|[41.Biometrics](#41)|[42.Optical Flow Estimation](#42)|[43.Sound](#43)|[44.Dataset/Benchmark](#44)|
|[45.Neural Radiance Fields](#45)|[46.Rendering](#46)|[47.Animal](#47)|[48.Computer Graphics](#48)|
|[49.Light-Field](#49)|[50.Sketches](#50)|[51.Feature Matching](#51)|[52.Visual Entity Recognition](#52)|
|[53.Keypoint Detection](#53)|[54.Deepfake Detection](#54)|[55.Information Security](#55)|[56.Dense Prediction](#56)|
|[57.Visual Relationship Detection](#57)|[58.All-in-One](#58)| | |
## 58.All-in-One
* [X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06140.pdf)
:star:[code](https://github.com/salesforce/LAVIS/tree/main/projects/xinstructbl)
## 57.Visual Relationship Detection
* [Visual Relationship Transformation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08217.pdf)
* [Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection](https://arxiv.org/abs/2403.14270)
## 56.Dense Prediction
* [Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild](https://arxiv.org/abs/2404.18459)
:star:[code](https://github.com/GitGyun/chameleon) dense visual prediction
* [Unsupervised Dense Prediction using Differentiable Normalized Cuts](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05675.pdf)
* [Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05837.pdf)
* [Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09133.pdf)
:star:[code](https://github.com/MilknoCandy/Token-Adapter)
## 55.Information Security
* Copyright protection
* [Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03084.pdf)
:star:[code](https://github.com/jjh6297/UndercoverBias) dataset copyright protection
* Image watermarking
* [Certifiably Robust Image Watermark](http://arxiv.org/abs/2407.04086v1)
:star:[code](https://github.com/zhengyuan-jiang/Watermark-Library)
* [A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05695.pdf) image watermarking
* [Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data](https://arxiv.org/abs/2403.10663)
:star:[code](https://github.com/liyuxuan-github/MAT)
* [A Watermark-Conditioned Diffusion Model for IP Protection](https://arxiv.org/abs/2403.10893)
:star:[code](https://github.com/rmin2000/WaDiff)
* [A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08419.pdf)
* [LaWa: Using Latent Space for In-Generation Image Watermarking](https://arxiv.org/abs/2408.05868)
## 54.Deepfake Detection
* [Real Appearance Modeling for More General Deepfake Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06913.pdf)
* [Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities](http://arxiv.org/abs/2407.20337v1)
:star:[code](https://github.com/aimagelab/CoDE)
* [Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection](http://arxiv.org/abs/2409.14444v1)
* [Common Sense Reasoning for Deep Fake Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12295.pdf)
:star:[code](https://github.com/Reality-Defender/Research-DD-VQA)
* Image forgery detection and localization
* [Noise-assisted Prompt Learning for Image Forgery Detection and Localization](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01688.pdf)
* [AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06023.pdf)
:star:[code](https://github.com/LMIAPC/AdaIFL)
* Document image tampering detection
* [Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04834.pdf)
:thumbsup:[Document image tampering detection (DITD) method: Feature Fusion and Decomposition Network (FFDN)](https://std.xmu.edu.cn/2024/0710/c4739a488273/page.htm)
* Synthetic image detection
* [Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection](https://arxiv.org/abs/2402.19091)
:star:[code](https://github.com/mever-team/rine)
## 53.Keypoint Detection
* [OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection](https://arxiv.org/abs/2409.19899)
:star:[code](https://github.com/AlanLuSun/OpenKD)
* [KeypointDETR: An End-to-End 3D Keypoint Detector](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09481.pdf)
:star:[code](https://github.com/bibi547/KeypointDETR)
## 52.Visual Entity Recognition
* [Grounding Language Models for Visual Entity Recognition](https://arxiv.org/abs/2402.18695) visual entity recognition
## 51.Feature Matching
* [Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching](http://arxiv.org/abs/2407.07789v1)
* Image matching
* [CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching](https://arxiv.org/abs/2404.16972)
:star:[code](https://github.com/Samia067/CriSp)
## 50.Sketches
* [Do Generalised Classifiers really work on Human Drawn Sketches?](http://arxiv.org/abs/2407.03893v1)
## 49.Light-Field
* [Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation](http://arxiv.org/abs/2407.08149v1)
* Camera relocalization
* [Differentiable Product Quantization for Memory Efficient Camera Relocalization](http://arxiv.org/abs/2407.15540v1)
## 48.Computer Graphics
* High dynamic range (HDR) imaging
* [SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging](http://arxiv.org/abs/2407.16308v1)
:star:[code](https://github.com/ltkong218/SAFNet)
## 47.Animal
* [Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos](https://arxiv.org/abs/2403.17103)
:house:[project](https://remysabathier.github.io/animalavatar.github.io)
* [Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos](https://arxiv.org/abs/2312.13604)
:house:[project](https://keqiangsun.github.io/projects/ponymation) 3D animal motion
* [Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification](https://arxiv.org/abs/2410.06977)
:star:[code](https://github.com/JigglypuffStitch/AdaFreq.git)
## 46.Rendering
* [City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web](https://arxiv.org/abs/2312.16457)
:star:[code](https://github.com/USTC3DV/MERFStudio)
:house:[project](https://ustc3dv.github.io/City-on-Web/)
* [A Probability-guided Sampler for Neural Implicit Surface Rendering](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05407.pdf)
:house:[project](https://merl.com/research/highlights/ps-neus) rendering
* [TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering](https://arxiv.org/abs/2311.16465)
:house:[project](https://aka.ms/textdiffuser-2)
* [AnyLens: A Generative Diffusion Model with Any Rendering Lens](https://arxiv.org/abs/2311.17609)
:house:[project](https://anylens-diffusion.github.io/)
* [CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians](http://arxiv.org/abs/2404.01133)
:star:[code](https://github.com/DekuLiuTesla/CityGaussian)
:house:[project](https://dekuliutesla.github.io/citygs/)
* [METACAP: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering](https://arxiv.org/pdf/2403.18820.pdf)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/MetaCap/)
* [GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views](http://arxiv.org/abs/2407.08221v1)
* [MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References](http://arxiv.org/abs/2407.13745v1)
:star:[code](https://boelukas.github.io/mariner/)
* [Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors](http://arxiv.org/abs/2407.16396v1)
:star:[code](https://wen-yuan-zhang.github.io/VolumeRenderingPriors/)
* [CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering](https://arxiv.org/abs/2311.15510)
:house:[project](https://haidongz-usc.github.io/project/caesarnerf)
* [IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination](https://arxiv.org/abs/2404.11593)
:star:[code](https://github.com/zju3dv/IntrinsicAnything) rendering
* [Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering](https://arxiv.org/abs/2408.09702)
:house:[project](https://research.nvidia.com/labs/toronto-ai/DiPIR/)
* [VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03032.pdf) neural rendering
* [UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation](http://arxiv.org/abs/2407.19542v1)
:star:[code](https://github.com/freemantom/UniVoxel)
* [GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering](https://arxiv.org/abs/2403.11324)
:star:[code](https://github.com/yanyan-li/GeoGaussian) scene rendering
* [GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer](https://arxiv.org/abs/2410.00672)
:star:[code](https://github.com/yh-yoon/GMT)
* [Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering](https://arxiv.org/abs/2407.10389)
## 45.Neural Radiance Fields
* [Invertible Neural Warp for NeRF](http://arxiv.org/abs/2407.12354v1)
:star:[code](https://sfchng.github.io/ineurowarping-github.io/)
* [VF-NeRF: Viewshed Fields for Rigid NeRF Registration](https://arxiv.org/abs/2404.03349)
* [NeRF-XL: NeRF at Any Scale with Multi-GPU](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06424.pdf)
:house:[project](https://research.nvidia.com/labs/toronto-ai/nerfxl/)
* [Regularizing Dynamic Radiance Fields with Kinematic Fields](http://arxiv.org/abs/2407.14059v1)
* [KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter](http://arxiv.org/abs/2407.13185v1)
:star:[code](https://github.com/Yifever20002/KFD-NeRF)
* [Dynamic Neural Radiance Field From Defocused Monocular Video](http://arxiv.org/abs/2407.05586v1)
* [Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering](https://arxiv.org/abs/2409.05867)
:house:[project](https://benattal.github.io/flash-cache/)
* [Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model](http://arxiv.org/abs/2407.07735v1)
:house:[project](https://qsong2001.github.io/NeRFProtector)
* [GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01453.pdf)
:star:[code](https://github.com/kevinhuangxf/GeometrySticker)
:house:[project](https://kevinhuangxf.github.io/GeometrySticker/)
* [Efficient NeRF Optimization - Not All Samples Remain Equally Hard](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05300.pdf)
* [MeshFeat: Multi-Resolution Features for Neural Fields on Meshes](http://arxiv.org/abs/2407.13592v1)
:house:[project](https://maharajamihir.github.io/MeshFeat/)
* [DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images](https://arxiv.org/abs/2403.13199)
:house:[project](https://zaidtas.github.io/decentnerfs/index.html)
* [TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks](http://arxiv.org/abs/2408.10739v1)
:star:[code](https://tracknerf.github.io/)
* [BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream](http://arxiv.org/abs/2407.02174v1)
:star:[code](https://github.com/WU-CVGL/BeNeRF)
* [TriNeRFLet: A Wavelet Based Multiscale Triplane NeRF Representation](https://arxiv.org/abs/2401.06191)
:house:[project](https://rajaeekh.github.io/trinerflet-web)
* [RS-NeRF: Neural Radiance Fields from Rolling Shutter Images](http://arxiv.org/abs/2407.10267v1)
:star:[code](https://github.com/MyNiuuu/RS-NeRF)
* [Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling](http://arxiv.org/abs/2407.11962v1)
:star:[code](https://github.com/stevejaehyeok/MoCo-NeRF)
:house:[project](https://stevejaehyeok.github.io/publications/moco-nerf)
* [RaFE: Generative Radiance Fields Restoration](https://arxiv.org/abs/2404.03654)
:house:[project](https://zkaiwu.github.io/RaFE-Project/)
* [Few-shot NeRF by Adaptive Rendering Loss Regularization](https://arxiv.org/abs/2410.17839)
:star:[code](https://github.com/GhiXu/AR-NeRF)
* [Depth-guided NeRF Training via Earth Mover’s Distance](https://arxiv.org/abs/2403.13206)
* [DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields](https://arxiv.org/abs/2311.12063)
:star:[code](https://ychgoaround.github.io/projects/DatasetNeRF/)
* [Flowed Time of Flight Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07941.pdf)
* [Volumetric Rendering with Baked Quadrature Fields](https://arxiv.org/abs/2312.02202)
* [Taming Latent Diffusion Model for Neural Radiance Field Inpainting](https://arxiv.org/abs/2404.09995)
:house:[project](https://hubert0527.github.io/MALD-NeRF)
* [Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation](https://arxiv.org/abs/2403.19319)
:house:[project](https://terencecyj.github.io/projects/Mesh2NeRF/)
🤗[huggingface](https://huggingface.co/papers/2403.19319)
* [SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields](https://www.arxiv.org/abs/2408.06697)
:house:[project](https://slotlifter.github.io/)
* [FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02130.pdf)
:star:[code](https://github.com/JiangWenPL/FisherRF)
* [DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07243.pdf)NeRF
* [Single-Mask Inpainting for Voxel-based Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07404.pdf)
* [Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization](https://arxiv.org/abs/2410.19483)
:star:[code](https://github.com/WeihangLiu2024/Content_Aware_NeRF)
* [Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering](https://arxiv.org/abs/2403.14554)
:house:[project](https://anttwo.github.io/frosting/)
* [Physically Plausible Color Correction for Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06042.pdf)
* [Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions](https://arxiv.org/abs/2403.14053)NeRF
* [PointNeRF++: A multi-scale, point-based Neural Radiance Field](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05521.pdf)
:house:[project](https://pointnerfpp.github.io/)
* [Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields](https://arxiv.org/abs/2403.11131)
* [High-Fidelity and Transferable NeRF Editing by Frequency Decomposition](https://arxiv.org/abs/2404.02514)
:house:[project](https://aigc3d.github.io/freditor)
* [Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction](https://arxiv.org/abs/2305.15171)
:house:[project](https://xinhangliu.com/deceptive-nerf-3dgs)
* [G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03259.pdf)
* [NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields](https://arxiv.org/abs/2404.01300)
:house:[project](https://nerf-mae.github.io/)
* Novel view synthesis
* [PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis](https://arxiv.org/abs/2402.17986)
:house:[project](https://yorkucvil.github.io/PolyOculus-NVS/)
* [RING-NeRF : Rethinking Inductive Biases for Versatile and Efficient Neural Fields](https://arxiv.org/abs/2312.03357)
* [Structured-NeRF: Hierarchical Scene Graph with Neural Representation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05154.pdf)
* [URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields](https://arxiv.org/abs/2403.10119)
* [A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis](https://arxiv.org/abs/2311.12897)
:star:[code](https://github.com/raven38/EfficientDynamic3DGaussian/)
:house:[project](https://compactdynamic3dgaussian.github.io/)
* [High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00368.pdf)
:star:[code](https://github.com/XrKang/DL-GS)
* [Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07158.pdf)
:star:[code](https://github.com/Yukun66/MemE)
* [NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image](https://arxiv.org/abs/2312.07315)
:star:[code](https://github.com/kakaobrain/nvs-adapter)
* [FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting](https://arxiv.org/abs/2312.00451)
:star:[code](https://github.com/VITA-Group/FSGS)
* [Fast View Synthesis of Casual Videos with Soup-of-Planes](https://arxiv.org/abs/2312.02135)
:house:[project](https://casual-fvs.github.io/)
* [CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians](https://arxiv.org/abs/2403.19495)
:house:[project](https://people.engr.tamu.edu/nimak/Papers/CoherentGS)
* [MegaScenes: Scene-Level View Synthesis at Scale](https://arxiv.org/abs/2406.11819)
:star:[code](https://github.com/MegaScenes/nvs)
* [Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis](https://arxiv.org/abs/2403.04116)
:star:[code](https://github.com/caiyuanhao1998/X-Gaussian) view synthesis
* [NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis](http://arxiv.org/abs/2407.10482v1)
* [Efficient Depth-Guided Urban View Synthesis](http://arxiv.org/abs/2407.12395v1)
:star:[code](https://xdimlab.github.io/EDUS/)
* [Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis](https://arxiv.org/abs/2405.14868)
:star:[code](https://github.com/basilevh/gcd)
* [Generalizable Human Gaussians for Sparse View Synthesis](https://arxiv.org/abs/2407.12777)
:house:[project](https://humansensinglab.github.io/Generalizable-Human-Gaussians/)
* [Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis](http://arxiv.org/abs/2409.08042v1)
:star:[code](https://github.com/mzzcdf/Thermal3DGS)
## 44.Dataset/Benchmark
* [FYI: Flip Your Images for Dataset Distillation](http://arxiv.org/abs/2407.08113v1)
* [Neural Spectral Decomposition for Dataset Distillation](http://arxiv.org/abs/2408.16236v1)
:star:[code](https://github.com/slyang2021/NSD)
* [Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching](https://arxiv.org/abs/2410.07579)
:star:[code](https://github.com/Lexie-YU/Teddy)
* [Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation](https://arxiv.org/abs/2305.18381)
:star:[code](https://github.com/silicx/GoldFromOres-BiLP)
* [COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark](https://arxiv.org/abs/2408.02272)
* Benchmarks
* [MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models](https://arxiv.org/abs/2311.17600)
:star:[code](https://github.com/isXinLiu/MM-SafetyBench)
* [DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition](http://arxiv.org/abs/2407.05106v1)
:star:[code](https://github.com/QiWang233/DailyDVS-200)
* [Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter](http://arxiv.org/abs/2407.08109v1)
:star:[code](https://github.com/zhang-chenxu/LSM-Adapter)
* [MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes](http://arxiv.org/abs/2407.10121v1)
* [BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events](https://arxiv.org/abs/2410.20451)
:house:[project](https://www.blinkvision.net/)
* [SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09762.pdf)
:star:[code](https://github.com/aidecentralized/InferenceBenchmark)
* [A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11606.pdf)
:star:[code](https://github.com/UnicomAI/UnicomBenchmark/tree/main/CDDMBench)
* [BAFFLE: A Baseline of Backpropagation-Free Federated Learning](https://arxiv.org/abs/2301.12195)
:star:[code](https://github.com/FengHZ/BAFFLE)
* [Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking](https://arxiv.org/abs/2406.04316)
* [Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations](https://arxiv.org/abs/2308.16349)
:house:[project](https://affective-visual-dialog.github.io/)
* [UniIR: Training and Benchmarking Universal Multimodal Information Retrievers](https://arxiv.org/abs/2311.17136)
:house:[project](https://tiger-ai-lab.github.io/UniIR/)
* [HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis](http://arxiv.org/abs/2407.16269v1)
* [OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding](https://arxiv.org/abs/2406.07471)
:house:[project](https://minghu0830.github.io/OphNet-benchmark/)
* [PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines](https://arxiv.org/abs/2407.08418)
:star:[code](https://github.com/OpenEarthLab/PredBench)
* [Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach](https://arxiv.org/abs/2408.07500)
:star:[code](https://github.com/FHR-L/VSLA-CLIP)
* [R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations](https://arxiv.org/abs/2403.04924)
:star:[code](https://github.com/lxa9867/r2bench)
* [m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01486.pdf)
:star:[code](https://github.com/RAIVNLab/mms)
🤗[huggingface](https://huggingface.co/datasets/zixianma/mms)
* [PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology](https://arxiv.org/abs/2401.16355)
🤗[huggingface](https://huggingface.co/papers/2401.16355)
* [LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow](http://arxiv.org/abs/2409.05688v1)
:house:[project](https://layeredflow.cs.princeton.edu)
* [HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects](http://arxiv.org/abs/2407.12371v1)
:star:[code](https://lvxintao.github.io/himo)
* [When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset](http://arxiv.org/abs/2407.10125v1)
:star:[code](https://github.com/BubblyYi/MMPedestron)
* Datasets
* [VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models](https://arxiv.org/abs/2311.17404)
:star:[code](https://github.com/lscpku/VITATECS)
* [HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning](http://arxiv.org/abs/2407.15680v1)
:star:[code](https://github.com/google/haloquest)
* [OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web](https://arxiv.org/abs/2402.17553)
* [COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08183.pdf)
:sunflower:[dataset](https://github.com/omron-sinicx/com_kitchens)
* [Seeing Faces in Things: A Model and Dataset for Pareidolia](https://arxiv.org/abs/2409.16143)
:sunflower:[dataset](https://aka.ms/faces-in-things)
* [Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08206.pdf)
:sunflower:[dataset](https://github.com/dualtransparency/TCLD)
* [GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns](https://arxiv.org/abs/2405.17609)
:house:[project](https://igl.ethz.ch/projects/GarmentCodeData/)
* [SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03555.pdf)
:sunflower:[dataset](https://github.com/sutdcv/SemTrack)
* [WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing](https://arxiv.org/abs/2402.09430)
:star:[code](https://github.com/huangshk/WiMANS)
* [BugNIST - a Large Volumetric Dataset for Detection under Domain Shift](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04613.pdf)
* [Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics](https://arxiv.org/abs/2310.17316)
:star:[code](https://github.com/EnVision-Research/Defect_Spectrum)
:house:[project](https://envision-research.github.io/Defect_Spectrum/) large-scale defect dataset
* [Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal](http://arxiv.org/abs/2407.16957v1)
:star:[code](https://github.com/jinyeying/RaindropClarity)
* [PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition](http://arxiv.org/abs/2407.10918v1)
:star:[code](https://github.com/LixiaoTHU/PartImageNetPP)
* [WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding](http://arxiv.org/abs/2407.15350v1)
:star:[code](https://woven-visionai.github.io/wts-dataset-homepage/)
* [MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception](https://arxiv.org/abs/2406.10708)
* [SkyScenes: A Synthetic Dataset for Aerial Scene Understanding](https://arxiv.org/abs/2312.06719)
:house:[project](https://hoffman-group.github.io/SkyScenes/)
* [Caltech Aerial RGB-Thermal Dataset in the Wild](https://arxiv.org/abs/2403.08997)
:star:[code](https://github.com/aerorobotics/caltech-aerial-rgbt-dataset)
* [V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception](https://arxiv.org/abs/2403.16034)
* [H-V2X: A Large Scale Highway Dataset for BEV Perception](https://eccv.ecva.net/virtual/2024/poster/126)
* [PetFace: A Large-Scale Dataset and Benchmark for Animal Identification](http://arxiv.org/abs/2407.13555v1)
:star:[code](https://dahlian00.github.io/PetFacePage/)
* [Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework](http://arxiv.org/abs/2407.08377v1)
* [OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects](http://arxiv.org/abs/2407.08711v1)
:star:[code](https://omninocs.github.io)
* [SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark](https://arxiv.org/abs/2310.20436)
:star:[code](https://github.com/ZhengdiYu/SignAvatars)
:house:[project](https://signavatars.github.io/)
* [Insect Identification in the Wild: The AMI Dataset](https://arxiv.org/abs/2406.12452)
:star:[code](https://github.com/RolnickLab/ami-dataset)
* [RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception](https://arxiv.org/abs/2405.09883)
:sunflower:[dataset](https://github.com/xiaosu-zhu/RoScenes)
* Data augmentation
* [SUMix: Mixup with Semantic and Uncertain Information](http://arxiv.org/abs/2407.07805v1)
:star:[code](https://github.com/JinXins/SUMix)
* [Data Augmentation via Latent Diffusion for Saliency Prediction](http://arxiv.org/abs/2409.07307v1)
* [FreeAugment: Data Augmentation Search Across All Degrees of Freedom](http://arxiv.org/abs/2409.04820v1)
:star:[code](https://tombekor.github.io/FreeAugment-web)
* [Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective](https://arxiv.org/abs/2312.04763)
:star:[code](https://github.com/Noah888/DAR)
## 43.Sound
* [Audio-Synchronized Visual Animation](https://arxiv.org/abs/2403.05659)
:star:[code](https://github.com/lzhangbj/ASVA)
:house:[project](https://lzhangbj.github.io/projects/asva/asva.html)
* [Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation](https://arxiv.org/pdf/2305.03907.pdf)
:house:[project](https://bolinlai.github.io/CSTS-EgoGazeAnticipation/)
* [Label-anticipated Event Disentanglement for Audio-Visual Video Parsing](http://arxiv.org/abs/2407.08126v1)
* [Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity](http://arxiv.org/abs/2407.10387v1)
:star:[code](https://maskvat.github.io)
* [Spherical World-Locking for Audio-Visual Localization in Egocentric Videos](https://arxiv.org/abs/2408.05364)
* [Self-Supervised Audio-Visual Soundscape Stylization](http://arxiv.org/abs/2409.14340v1)
:house:[project](https://tinglok.netlify.app/files/avsoundscape/)
* [CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios](https://arxiv.org/abs/2403.04640)
:star:[code](https://github.com/rikeilong/Bay-CAT) audio-visual scenarios
* [Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores](https://arxiv.org/abs/2404.07336)
* [Siamese Vision Transformers are Scalable Audio-visual Learners](https://arxiv.org/abs/2403.19638)
:star:[code](https://github.com/GenjiB/AVSiam) audio-visual learners
* [Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos](https://arxiv.org/abs/2406.09272)
:house:[project](https://vision.cs.utexas.edu/projects/action2sound) ambient-aware action sound generation
* [Audio-visual Generalized Zero-shot Learning the Easy Way](https://arxiv.org/abs/2407.13095)
* Audio-Visual Segmentation
* [Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes](http://arxiv.org/abs/2407.10957v1)
:star:[code](https://gewu-lab.github.io/Ref-AVS)
* [Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation](http://arxiv.org/abs/2407.11820v1)
:star:[code](https://gewu-lab.github.io/stepping_stones)
* [CPM: Class-conditional Prompting Machine for Audio-visual Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01634.pdf) audio-visual segmentation
## 42.Optical Flow Estimation
* [SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow](https://arxiv.org/abs/2405.14793)
:star:[code](https://github.com/princeton-vl/SEA-RAFT)
## 41.Biometrics
* [Open-Set Biometrics: Beyond Good Closed-Set Models](http://arxiv.org/abs/2407.16133v1)
:star:[code](https://github.com/prevso1088/open-set-biometrics)
## 40.Object Pose Estimation
* [SCAPE: A Simple and Strong Category-Agnostic Pose Estimator](http://arxiv.org/abs/2407.13483v1)
:star:[code](https://github.com/tiny-smart/SCAPE)
* [SRPose: Two-view Relative Pose Estimation with Sparse Keypoints](http://arxiv.org/abs/2407.08199v1)
:house:[project](https://frickyinn.github.io/srpose)
* [FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation](http://arxiv.org/abs/2409.16600v1)
:star:[code](https://github.com/tjy0703/FAFA)
* [A Graph-Based Approach for Category-Agnostic Pose Estimation](https://arxiv.org/abs/2311.17891)
:house:[project](https://orhir.github.io/pose-anything/)
* [GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence](https://arxiv.org/abs/2311.13777)
* [OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation](http://arxiv.org/abs/2408.16547v1)
:star:[code](https://github.com/YC-Che/OP-Align)
* [FoundPose: Unseen Object Pose Estimation with Foundation Features](https://arxiv.org/abs/2311.18809)
:house:[project](http://evinpinar.github.io/foundpose)
* [LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation](http://arxiv.org/abs/2409.15727v1)
:star:[code](https://github.com/lolrudy/LaPose)
* [U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01566.pdf)
* [PACE: Pose Annotations in Cluttered Environments](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06837.pdf)
:star:[code](https://github.com/qq456cvb/PACE)
* 6-DoF
* [An Economic Framework for 6-DoF Grasp Detection](http://arxiv.org/abs/2407.08366v1)
:star:[code](https://github.com/iSEE-Laboratory/EconomicGrasp)
* [Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation](https://arxiv.org/abs/2311.09500)
* [Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance](http://arxiv.org/abs/2407.13842v1)
:star:[code](https://airvlab.github.io/grasp-anything)
* [Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation](https://arxiv.org/abs/2409.18261)
:star:[code](https://github.com/3dtopia/omni6d)
* [6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model](http://arxiv.org/abs/2407.15484v1)
:star:[code](https://mbortolon97.github.io/6dgs/)
* [FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models](https://arxiv.org/abs/2312.00947)
:house:[project](https://andreacaraffa.github.io/freeze/)
* Camera Pose Estimation
* [ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation](http://arxiv.org/abs/2408.09042v1)
* [Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection](https://arxiv.org/abs/2312.04527)
* Counting
* [AFreeCA: Annotation-Free Counting for All](https://arxiv.org/abs/2403.04943) counting
* [Zero-shot Object Counting with Good Exemplars](https://arxiv.org/abs/2407.04948)
* [ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting](https://arxiv.org/abs/2309.04820)
:star:[code](https://github.com/ActiveVisionLab/ABC123)
:house:[project](https://abc123.active.vision/) counting
* [Class-Agnostic Object Counting with Text-to-Image Diffusion Model](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08663.pdf)
* [Shifted Autoencoders for Point Annotation Restoration in Object Counting](https://arxiv.org/abs/2312.07190)
## 39.Robots
* [See and Think: Embodied Agent in Virtual Environment](https://arxiv.org/abs/2311.15209)
:house:[project](https://rese1f.github.io/STEVE/)
* [SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs](https://arxiv.org/abs/2404.00469)
* [V-IRL: Grounding Virtual Intelligence in Real Life](https://arxiv.org/abs/2402.03310)
:star:[code](https://github.com/VIRL-Platform/VIRL)
* Robotics
* [Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation](https://arxiv.org/abs/2401.07487)
:house:[project](https://tea-lab.github.io/Robo-ABC/)
* [Learning Cross-hand Policies of High-DOF Reaching and Grasping](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04377.pdf) robotics
* [DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control](http://arxiv.org/abs/2407.14758v1)
:star:[code](https://github.com/AllenXuuu/DISCO)
* [Real-time Holistic Robot Pose Estimation with Unknown States](https://arxiv.org/abs/2402.05655)
:star:[code](https://github.com/Oliverbansk/Holistic-Robot-Pose-Estimation)
* [ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation](https://arxiv.org/abs/2403.08321)
:star:[code](https://github.com/GuanxingLu/ManiGaussian)
:house:[project](https://guanxinglu.github.io/ManiGaussian/)
* [Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts](http://arxiv.org/abs/2407.14872v1)
* [GraspXL: Generating Grasping Motions for Diverse Objects at Scale](https://arxiv.org/pdf/2403.19649.pdf)
:star:[code](https://github.com/zdchan/graspxl)
:house:[project](https://eth-ait.github.io/graspxl/)
* [UGG: Unified Generative Grasping](https://arxiv.org/abs/2311.16917)
:house:[project](https://jiaxin-lu.github.io/ugg/) robotics
* [Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation](http://arxiv.org/abs/2407.14062v1)
:star:[code](https://github.com/florasion/D-VQVAE)
* [Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation](https://arxiv.org/abs/2405.01527)
:house:[project](https://homangab.github.io/track2act/) robotics
* Navigation
* [NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models](http://arxiv.org/abs/2407.12366v1)
:star:[code](https://github.com/GengzeZhou/NavGPT-2)
* [Prioritized Semantic Learning for Zero-shot Instance Navigation](https://arxiv.org/abs/2403.11650)
:star:[code](https://github.com/XinyuSun/PSL-InstanceNav) navigation
* VPR
* [Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition](https://arxiv.org/abs/2407.02422)
:star:[code](https://github.com/serizba/cliquemining)
* [Navigation Instruction Generation with BEV Perception and Large Language Models](http://arxiv.org/abs/2407.15087v1)
:star:[code](https://github.com/FanScy/BEVInstructor)
* [Revisit Anything: Visual Place Recognition via Image Segment Retrieval](http://arxiv.org/abs/2409.18049v1)
:star:[code](https://github.com/AnyLoc/Revisit-Anything)
* [VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition](https://arxiv.org/abs/2409.19293)
:star:[code](https://github.com/Ahmedest61/VLAD-BuFF/)
* [MeshVPR: Citywide Visual Place Recognition Using 3D Meshes](https://arxiv.org/abs/2406.02776)
:star:[code](https://github.com/gmberton/MeshVPR)
* SLAM
* [Deep Patch Visual SLAM](https://arxiv.org/abs/2408.01654)
:star:[code](https://github.com/princeton-vl/DPVO)
* [RGBD GS-ICP SLAM](https://arxiv.org/abs/2403.12550)
:star:[code](https://github.com/Lab-of-AI-and-Robotics/GS_ICP_SLAM)
* [I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM](https://arxiv.org/abs/2407.11347)
* [Hyperion - A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM](http://arxiv.org/abs/2407.07074v1)
:star:[code](https://github.com/VIS4ROB-lab/hyperion)
* [SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM](https://arxiv.org/abs/2402.03246)
* [LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10364.pdf)
* [Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM](https://arxiv.org/abs/2407.13338)
* [Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11219.pdf)
* [CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field](https://arxiv.org/abs/2403.16095)
:star:[code](https://github.com/hjr37/CG-SLAM)
* Try-On
* [Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models](https://arxiv.org/abs/2403.07371)
* [Improving Virtual Try-On with Garment-focused Diffusion Models](http://arxiv.org/abs/2409.08258v1)
:star:[code](https://github.com/siqi0905/GarDiff/tree/master)
* [Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment](https://arxiv.org/abs/2403.12965)
:star:[code](https://github.com/mengtingchen/wear-any-way-page)
:house:[project](https://mengtingchen.github.io/wear-any-way-page/)
* [Improving Diffusion Models for Authentic Virtual Try-on in the Wild](https://arxiv.org/abs/2403.05139)
:star:[code](https://github.com/yisol/IDM-VTON)
* [D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On](https://arxiv.org/abs/2407.15111)
:star:[code](https://github.com/Jerome-Young/D4-VTON)
* [WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models](https://arxiv.org/abs/2407.10625)
:star:[code](https://github.com/scnuhealthy/video_try_on)
* Cross-View Geo-Localization
* [GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers](http://arxiv.org/abs/2408.02840v1)
:star:[code](https://github.com/manupillai308/GAReT)
* [Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network](https://arxiv.org/abs/2408.05475)
:star:[code](https://github.com/yejy53/EP-BEV)
* [ConGeo: Robust Cross-view Geo-localization across Ground View Variations](https://arxiv.org/abs/2403.13965)
:star:[code](https://github.com/eceo-epfl/ConGeo)
:house:[project](https://eceo-epfl.github.io/ConGeo/) cross-view geo-localization
* [Benchmarking the Robustness of Cross-view Geo-localization Models](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11762.pdf)
* [CityGuessr: City-Level Video Geo-Localization on a Global Scale](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08031.pdf)
* Geo-Localization
* [Statewide Visual Geolocalization in the Wild](https://arxiv.org/abs/2409.16763)
:star:[code](https://github.com/fferflo/statewide-visual-geolocalization)
* Avatars
* [CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images](http://arxiv.org/abs/2407.04345v1)
:star:[code](https://github.com/jsshin98/CanonicalFusion)
* [RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models](http://arxiv.org/abs/2407.06938v1)
:star:[code](https://rodinhd.github.io/)
* [MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos](https://arxiv.org/abs/2407.08414)
:star:[code](https://github.com/shad0wta9/meshavatar)
* [PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations](https://arxiv.org/abs/2404.04421)
:house:[project](https://qingqing-zhao.github.io/PhysAvatar)
* [iHuman: Instant Animatable Digital Humans From Monocular Videos](http://arxiv.org/abs/2407.11174v1)
* [PAV: Personalized Head Avatar from Unstructured Video Collection](https://arxiv.org/abs/2407.21047)
:house:[project](https://akincaliskan3d.github.io/PAV)
* [Disentangled Clothed Avatar Generation from Text Descriptions](https://arxiv.org/abs/2312.05295)
:house:[project](https://shanemankiw.github.io/SO-SMPL/) clothed avatar generation
* [MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08299.pdf)
:house:[project](https://syntec-research.github.io/MagicMirror/)
* [3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03433.pdf)
* [FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10720.pdf)
:star:[code](https://github.com/humansensinglab/FAMOUS) 3D human digitization
* [Instant 3D Human Avatar Generation using Image Diffusion Models](https://arxiv.org/abs/2406.07516)
:house:[project](https://www.nikoskolot.com/avatarpopup/)
* [Let the Avatar Talk using Texts without Paired Training Data](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12305.pdf)
* VR
* [EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10261.pdf)
## 38.Human-Object Interaction
* [Controllable Human-Object Interaction Synthesis](https://arxiv.org/pdf/2312.03913.pdf)
:house:[project](https://lijiaman.github.io/projects/chois/)
* [F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions](http://arxiv.org/abs/2407.12435v1)
* [Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04769.pdf)
:star:[code](https://github.com/southnx/IcH-Vid-HOI)
* [Look Hear: Gaze Prediction for Speech-directed Human Attention](http://arxiv.org/abs/2407.19605v1)
:star:[code](https://github.com/cvlab-stonybrook/ART)
* [Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model](http://arxiv.org/abs/2408.01044v1)
:star:[code](https://github.com/jinyang06/SamGOP)
* [Revisit Human-Scene Interaction via Space Occupancy](https://arxiv.org/abs/2312.02700)
:house:[project](https://foruck.github.io/occu-page/) human-object interaction
* [Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection](https://arxiv.org/abs/2408.02484)
:star:[code](https://github.com/ltttpku/CMMP)
* [AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation](https://arxiv.org/abs/2406.01194)
* Hand-Object
* [NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model](http://arxiv.org/abs/2407.12727v1)
* [Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics](http://arxiv.org/abs/2409.04033v1)
:star:[code](https://hograspnet2024.github.io/)
* [Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?](https://arxiv.org/abs/2312.02672)
:star:[code](https://github.com/fpv-iplab/HOI-Synth)
* [Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06748.pdf)
## 37.Style Transfer
* [Towards compact reversible image representations for neural style transfer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08321.pdf)
* Motion Transfer
* [Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation](http://arxiv.org/abs/2407.11266v1)
:star:[code](https://github.com/rongakowang/MMDMC)
## 36.Gaze Estimation
* [De-confounded Gaze Estimation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03367.pdf)
* [3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03191.pdf)
:star:[code](https://github.com/eververas/3DGazeNet)
* [LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation](https://arxiv.org/abs/2411.08606)
* [Gaze Target Detection Based on Head-Local-Global Coordination](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03933.pdf)
## 35.Action Detection
* [LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning](https://arxiv.org/pdf/2312.03849.pdf)
:star:[code](https://github.com/BolinLai/LEGO)
:house:[project](https://bolinlai.github.io/Lego_EgoActGen/)
* [ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos](http://arxiv.org/abs/2407.12987v1)
* [Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition](https://arxiv.org/abs/2403.14113)
* [Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10749.pdf) motion keyframe interpolation
* Skeleton-Based Action Recognition
* [SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders](http://arxiv.org/abs/2407.13460v1)
:star:[code](https://github.com/pha123661/SA-DVAE)
* [Towards Physical World Backdoor Attacks against Skeleton Action Recognition](https://arxiv.org/abs/2408.08671)
:house:[project](https://qichenzheng.github.io/psba-website/)
* [S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04755.pdf)
:house:[project](https://sjepa.github.io)
* [Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03717.pdf)
:star:[code](https://github.com/LanglandsLin/IGM)
* [CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner](https://arxiv.org/abs/2403.10082)
* Few-Shot Action Recognition
* [Trajectory-aligned Space-time Tokens for Few-shot Action Recognition](http://arxiv.org/abs/2407.18249v1)
:house:[project](https://www.cs.umd.edu/~pulkit/tats)
* [Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00305.pdf)
:star:[code](https://github.com/cong-wu/EMP-Net)
* Temporal Action Detection
* [DyFADet: Dynamic Feature Aggregation for Temporal Action Detection](http://arxiv.org/abs/2407.03197v1)
:star:[code](https://github.com/yangle15/DyFADet-pytorch)
* [UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection](https://arxiv.org/abs/2404.04933)
:star:[code](https://github.com/yingsen1/UniMD)
* Temporal Action Localization
* [HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization](https://arxiv.org/abs/2408.06437)
:star:[code](https://github.com/sakibreza/ECCV24-HAT)
* [Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization](http://arxiv.org/abs/2407.07673v1)
* [Online Temporal Action Localization with Memory-Augmented Transformer](http://arxiv.org/abs/2408.02957v1)
:house:[project](https://cvlab.postech.ac.kr/research/MATR/)
* [Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01159.pdf)
* Temporal Action Segmentation
* [Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment](http://arxiv.org/abs/2408.09919v1)
:star:[code](https://github.com/pangzhan27/GTLA)
* [Two-Stage Active Learning for Efficient Temporal Action Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06348.pdf)
* [Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07145.pdf)
:star:[code](https://github.com/HaoyuJi/LaSA)
* [Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs](https://arxiv.org/abs/2312.02638)
:star:[code](https://github.com/fpv-iplab/synchronization-is-all-you-need)
* Action Quality Assessment
* [Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment](http://arxiv.org/abs/2407.19675v1)
:star:[code](https://github.com/wuli55555/TRS)
* [RICA^2: Rubric-Informed, Calibrated Assessment of Actions](https://arxiv.org/abs/2408.02138)
:house:[project](https://abrarmajeedi.github.io/rica2_aqa/)
* [Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05909.pdf) action quality assessment
* [MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment](https://arxiv.org/abs/2403.04398)
:star:[code](https://github.com/ZhouKanglei/MAGR_CAQA)
* Action Anticipation
* [Semantically Guided Representation Learning For Action Anticipation](http://arxiv.org/abs/2407.02309v1)
:star:[code](https://github.com/ADiko1997/S-GEAR)
* [PALM: Predicting Actions through Language Models](https://arxiv.org/abs/2311.17944) action prediction
* Action Recognition
* [Referring Atomic Video Action Recognition](https://arxiv.org/abs/2407.01872)
:star:[code](https://github.com/KPeng9510/RAVAR)
* [DEAR: Depth-Enhanced Action Recognition](https://arxiv.org/abs/2408.15679)
* [Bayesian Evidential Deep Learning for Online Action Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02475.pdf)
* [C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition](http://arxiv.org/abs/2407.06113v1)
:star:[code](https://github.com/RongchangLi/ZSCAR_C2C)
* [Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition](http://arxiv.org/abs/2407.06628v1)
* [Classification Matters: Improving Video Action Detection with Class-Specific Attention](http://arxiv.org/abs/2407.19698v1)
* [FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition](http://arxiv.org/abs/2409.01448v1)
:house:[project](https://daveishan.github.io/finepsuedo-webpage/)
* [Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/10056.pdf)
* [Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition](https://arxiv.org/abs/2405.19917)
:house:[project](https://masashi-hatano.github.io/MM-CDFSL/)
* [On the Utility of 3D Hand Poses for Action Recognition](https://arxiv.org/abs/2403.09805)
:house:[project](https://s-shamil.github.io/HandFormer/)
* [POET: Prompt Offset Tuning for Continual Human Action Adaptation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08141.pdf)
:star:[code](https://github.com/humansensinglab/)
* [Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01016.pdf)
:star:[code](https://github.com/BNU-IVC/OccGait)
* [Leveraging temporal contextualization for video action recognition](https://arxiv.org/abs/2404.09490)
:star:[code](https://github.com/naver-ai/tc-clip)
* [Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01635.pdf)
* [SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition](https://arxiv.org/abs/2403.09508)
:house:[project](https://kaist-viclab.github.io/SkateFormer_site/)
* Action Understanding
* [EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding](https://arxiv.org/abs/2406.08877)
:star:[code](https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main)
* Group Activity Recognition
* [Towards More Practical Group Activity Detection: A New Benchmark and Model](https://arxiv.org/abs/2312.02878)
:house:[project](https://cvlab.postech.ac.kr/research/CAFE)
* [Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition](https://arxiv.org/abs/2405.18012)
* [Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph](https://arxiv.org/abs/2407.19497)
:star:[code](https://github.com/mgiant/MP-GCN)
* Seizure Detection
* [VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG](https://arxiv.org/abs/2311.14775)
## 34.Visual Question Answering
* [DriveLM: Driving with Graph Visual Question Answering](https://arxiv.org/abs/2312.14150)
:star:[code](https://github.com/OpenDriveLab/DriveLM)
* [Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following](https://arxiv.org/abs/2406.02774)
* [WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering](http://arxiv.org/abs/2407.05603v1)
:star:[code](https://github.com/cpystan/WSI-VQA)
* [GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02569.pdf)
* [Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge](https://arxiv.org/abs/2401.10712)
:star:[code](https://github.com/WHB139426/QA-Prompts)
* [Compositional Substitutivity of Visual Reasoning for Visual Question Answering](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06434.pdf)
:star:[code](https://github.com/NeverMoreLCH/CG-SPS)
* [Fully Authentic Visual Question Answering Dataset from Online Communities](https://arxiv.org/abs/2311.15562)
:house:[project](https://vqaonline.github.io/)
* [An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08395.pdf)
* Video Question Answering
* [Video Question Answering with Procedural Programs](https://arxiv.org/abs/2312.00937)
:house:[project](https://rccchoudhury.github.io/proviq2023/)
* [ViLA: Efficient Video-Language Alignment for Video Question Answering](https://arxiv.org/abs/2312.08367)
:star:[code](https://github.com/xijun-cs/ViLA)
* [TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00720.pdf)VQA
* [AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering](https://arxiv.org/abs/2311.14906)
:star:[code](https://github.com/Xiuyuan-Chen/AutoEval-Video)
* Audio-Visual Question Answering
* [Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality](https://arxiv.org/abs/2407.16171)
## 33.Motion Generation
* [Event-Based Motion Magnification](https://arxiv.org/abs/2402.11957)
:star:[code](https://github.com/OpenImagingLab/emm)
* [Learning-based Axial Video Motion Magnification](https://arxiv.org/abs/2312.09551)
:house:[project](https://axial-momag.github.io/axial-momag/)
* [SMooDi: Stylized Motion Diffusion Model](http://arxiv.org/abs/2407.12783v1)
:star:[code](https://neu-vi.github.io/SMooDi/)
* [Length-Aware Motion Synthesis via Latent Diffusion](http://arxiv.org/abs/2407.11532v1)
:star:[code](https://github.com/AlessioSam/LADiff)
* [HUMOS: Human Motion Model Conditioned on Body Shape](http://arxiv.org/abs/2409.03944v1)
:star:[code](https://CarstenEpic.github.io/humos/)
* [HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance](http://arxiv.org/abs/2407.06937v1)
:star:[code](https://github.com/Enderfga/HumanRefiner)
* [Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07885.pdf)
:house:[project](https://panwliu.github.io/mhc/)
* [Generating Human Interaction Motions in Scenes with Text Control](https://arxiv.org/abs/2404.10685)
:house:[project](https://research.nvidia.com/labs/toronto-ai/tesmo/) motion generation
* [Motion Mamba: Efficient and Long Sequence Motion Generation](https://arxiv.org/abs/2403.07487)
:star:[code](https://github.com/steve-zeyu-zhang/MotionMamba/)
:house:[project](https://steve-zeyu-zhang.github.io/MotionMamba/)
* [Large Motion Model for Unified Multi-Modal Motion Generation](https://arxiv.org/abs/2404.01284)
:house:[project](https://mingyuan-zhang.github.io/projects/LMM.html)
* [EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation](https://arxiv.org/abs/2312.02256)
:star:[code](https://github.com/Frank-ZY-Dou/EMDM)
:house:[project](https://frank-zy-dou.github.io/projects/EMDM/index.html)
* [Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases](https://arxiv.org/abs/2310.04189)
:house:[project](https://foruck.github.io/KP/) human motion
* [TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01796.pdf)
:house:[project](https://yufu-wang.github.io/tram4d/) human motion
* [Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03541.pdf) human motion
* [FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models](https://arxiv.org/abs/2406.10740)
* [MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model](https://arxiv.org/abs/2404.19759)
:star:[code](https://github.com/Dai-Wenxun/MotionLCM)
* [Realistic Human Motion Generation with Cross-Diffusion Models](https://arxiv.org/abs/2312.10993)
:house:[project](https://wonderno.github.io/CrossDiff-webpage/) human motion
* [CoMo: Controllable Motion Generation through Language Guided Pose Code Editing](https://arxiv.org/abs/2403.13900)
:house:[project](https://yh2371.github.io/como/) controllable motion generation
* [TLControl: Trajectory and Language Control for Human Motion Synthesis](https://arxiv.org/abs/2311.17135)
:house:[project](https://tlcontrol.weilinwl.com/) human motion synthesis
* [Retrieval Robust to Object Motion Blur](https://arxiv.org/abs/2404.18025)
:star:[code](https://github.com/Rong-Zou/Retrieval-Robust-to-Object-Motion-Blur)
* 3D Human Motion Synthesis
* [ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions](https://arxiv.org/pdf/2311.17057.pdf)
:house:[project](https://vcai.mpi-inf.mpg.de/projects/remos/)
* Text-to-Motion Synthesis
* [FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis](https://arxiv.org/pdf/2405.15763)
* [Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation](http://arxiv.org/abs/2407.10528v1)
:star:[code](https://jpthu17.github.io/GuidedMotion-project/)
* [Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation](https://arxiv.org/abs/2312.14828)
:house:[project](https://moonsliu.github.io/Pro-Motion/)
* [ParCo: Part-Coordinating Text-to-Motion Synthesis](https://arxiv.org/abs/2403.18512)
:star:[code](https://github.com/qrzou/ParCo)
* Human motion forecasting
* [Human Motion Forecasting in Dynamic Domain Shifts: A Homeostatic Continual Test-time Adaptation Framework](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04599.pdf) Human motion forecasting
* [Scene-aware Human Motion Forecasting via Mutual Distance Prediction](https://arxiv.org/abs/2310.00615)
* Human motion estimation
* [MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation](https://static.siplab.org/papers/eccv2024-manikin.pdf)
:house:[project](https://siplab.org/projects/MANIKIN)
* Motion estimation
* [Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation](http://arxiv.org/abs/2407.10802v1)
:star:[code](https://github.com/tub-rip/MotionPriorCMax)
* [COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation](https://arxiv.org/abs/2408.16426)
* Dance generation
* [Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation](https://arxiv.org/abs/2407.07554)
:house:[project](https://zikaihuangscut.github.io/Beat-It/)
* Behavior generation
* [DIM: Dyadic Interaction Modeling for Social Behavior Generation](https://arxiv.org/abs/2403.09069)
:star:[code](https://github.com/Boese0601/Dyadic-Interaction-Modeling)
* Motion transfer
* [Temporal Residual Jacobians for Rig-free Motion Transfer](https://arxiv.org/abs/2407.14958)
:house:[project](https://temporaljacobians.github.io/)
🤗[huggingface](https://huggingface.co/papers/2407.14958)
* Motion forecasting
* [Enhanced Motion Forecasting with Visual Relation Reasoning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07336.pdf)
## 32.Person Re-Identification
* [Human-in-the-Loop Visual Re-ID for Population Size Estimation](https://arxiv.org/abs/2312.05287)
:star:[code](https://github.com/cvl-umass/counting-clusters)
* Pedestrian re-identification
* [Keypoint Promptable Re-Identification](https://arxiv.org/abs/2407.18112)
:star:[code](https://github.com/VlSomers/keypoint_promptable_reidentification)
* [Privacy-Preserving Adaptive Re-Identification without Image Transfer](http://arxiv.org/abs/2407.12589v1)
* [Rethinking Normalization Layers for Domain Generalizable Person Re-identification](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08753.pdf)
:star:[code](https://github.com/3699nr/ReNorm)
* [Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09119.pdf)
* VI-ReID
* [Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification](https://arxiv.org/abs/2401.06825)
:thumbsup:[Unsupervised visible-infrared person re-identification (USL-VI-ReID)](https://std.xmu.edu.cn/2024/0710/c4739a488273/page.htm)
* [WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification](https://www.arxiv.org/abs/2408.10624)
* Person search
* [PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery](https://arxiv.org/abs/2409.13475) Text-based person search
* Gait recognition
* [Camera-LiDAR Cross-modality Gait Recognition](https://arxiv.org/abs/2407.02038)
* [Free Lunch for Gait Recognition: A Novel Relation Descriptor](https://arxiv.org/abs/2308.11487)
* [Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition](http://arxiv.org/abs/2407.12519v1)
* [Cut out the Middleman: Revisiting Pose-based Gait Recognition](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04501.pdf)
:star:[code](https://github.com/BNU-IVC/FastPoseGait)
* Counting
* [CountFormer: Multi-View Crowd Counting Transformer](http://arxiv.org/abs/2407.02047v1)
* [Robust Zero-Shot Crowd Counting and Localization with Adaptive Resolution SAM](https://arxiv.org/abs/2402.17514)
* [Multi-modal Crowd Counting via a Broker Modality](http://arxiv.org/abs/2407.07518v1)
:star:[code](https://github.com/HenryCilence/Broker-Modality-Crowd-Counting)
* [Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance](https://arxiv.org/abs/2405.10589)
:star:[code](https://github.com/AaronCIH/APGCC)
## 31.Point Clouds
* [SEED: A Simple and Effective 3D DETR in Point Clouds](http://arxiv.org/abs/2407.10749v1)
:star:[code](https://github.com/happinesslz/SEED)
* [PointLLM: Empowering Large Language Models to Understand Point Clouds](https://arxiv.org/abs/2308.16911)
:star:[code](https://github.com/OpenRobotLab/PointLLM)
:house:[project](https://runsenxu.com/projects/PointLLM/)
* [TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds](https://arxiv.org/abs/2407.12702)
* [Learning to Adapt SAM for Segmenting Cross-domain Point Clouds](https://arxiv.org/abs/2310.08820)
* [Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time](https://export.arxiv.org/abs/2407.01851)
* [milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing](https://arxiv.org/abs/2306.17010)
:star:[code](https://github.com/Toytiny/milliFlow)
* [Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement](http://arxiv.org/abs/2408.02966v1)
* [Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes](http://arxiv.org/abs/2408.14279v1)
:star:[code](https://github.com/chenchao15/Unseen)
* [T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning](https://arxiv.org/abs/2312.10217)
:star:[code](https://github.com/codename1995/T-MAE)
* [Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds](https://arxiv.org/abs/2311.16474v2)
:star:[code](https://github.com/xiaoyao3302/PCFEA)
* [PFGS: High Fidelity Point Cloud Rendering via Feature Splatting](https://arxiv.org/abs/2407.03857)
:star:[code](https://github.com/Mercerai/PFGS)
* [Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09814.pdf)
:star:[code](https://github.com/yh-han/M2PSC.git)
* [To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning](https://arxiv.org/abs/2403.17869)
* [Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing](https://arxiv.org/abs/2311.16043)
:star:[code](https://github.com/NJU-3DV/Relightable3DGaussian)
* [FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation](https://arxiv.org/abs/2410.19573)
:star:[code](https://github.com/genuszty/FastPCI)
* Point cloud generation
* [RangeLDM: Fast Realistic LiDAR Point Cloud Generation](https://arxiv.org/abs/2403.10094)
:star:[code](https://github.com/WoodwindHu/RangeLDM)
* [Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07328.pdf)
:star:[code](https://github.com/wuyang98/Text2LiDAR)
* [Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation](https://arxiv.org/abs/2312.07231)
:house:[project](https://dit-3d.github.io/FastDiT-3D/)
* [FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation](https://arxiv.org/abs/2311.12090)
:house:[project](https://chenliang-zhou.github.io/FrePolad/)
* Point cloud completion
* [Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion](http://arxiv.org/abs/2407.02887v1)
:star:[code](https://github.com/WHU-USI3DV/EGIInet)
* [T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy](http://arxiv.org/abs/2407.05008v1)
:star:[code](https://github.com/df-boy/T-CorresNet)
* [AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01714.pdf)
* [EINet: Point Cloud Completion via Extrapolation and Interpolation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05687.pdf)
:star:[code](https://github.com/corecai163/EINet)
* [Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06768.pdf)
:star:[code](https://github.com/yun-seo/PPCC)
* [ProtoComp: Diverse Point Cloud Completion with Controllable Prototype](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06685.pdf)
:star:[code](https://github.com/Yanbo-23/Proto-Comp)
* Point cloud reconstruction
* [DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction](https://arxiv.org/abs/2312.03298)
:star:[code](https://github.com/TyraelDLee/DiffPMAE)
* Point cloud understanding
* [DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding](https://arxiv.org/abs/2407.08801)
* Point cloud registration
* [ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency](http://arxiv.org/abs/2407.09862v1)
:star:[code](https://github.com/Laka-3DV/ML-SemReg)
* [PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training](http://arxiv.org/abs/2407.14054v1)
:star:[code](https://github.com/Chen-Suyi/PointRegGPT)
* [SemReg: Semantics Constrained Point Cloud Registration](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05759.pdf)
:star:[code](https://github.com/SheldonFung98/SemReg.git)
* [Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning](http://arxiv.org/abs/2407.20223v1)
:house:[project](https://sites.google.com/view/eccv24-equivalign)
* [UMERegRobust – Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11688.pdf)
:star:[code](https://github.com/yuvalH9/UMERegRobust)
* [PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration](https://arxiv.org/abs/2407.10142)
:star:[code](https://github.com/yaorz97/PARENet)
* [Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration](https://arxiv.org/abs/2410.05729) Point cloud registration
* Point cloud segmentation
* [Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation](http://arxiv.org/abs/2407.12489v1)
* [HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation](http://arxiv.org/abs/2407.12387v1)
:star:[code](https://github.com/tpzou/HGL)
* [SegPoint: Segment Any Point Cloud via Large Language Model](http://arxiv.org/abs/2407.13761v1)
:star:[code](https://heshuting555.github.io/SegPoint)
* [Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation](https://arxiv.org/abs/2408.13752)
* [Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05346.pdf)
:star:[code](https://github.com/jimtsai23/PseudoEmbed)
* [Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation](https://www.arxiv.org/abs/2408.10537)
:star:[code](https://github.com/Javion11/PointLiBR.git)
* Point cloud understanding
* [GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding](https://arxiv.org/abs/2407.13519)
:star:[code](https://github.com/changshuowang/GPSFormer)
* 3D point clouds
* [Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds](http://arxiv.org/abs/2407.13342v1)
:star:[code](https://list17.github.io/ImplicitFilter)
* [CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation](http://arxiv.org/abs/2407.16193v1)
:star:[code](https://github.com/shimazing/CloudFixer)
* [FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00951.pdf)
* [RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation](https://arxiv.org/abs/2408.06110)
* [P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising](http://arxiv.org/abs/2408.16325v1)
:star:[code](https://p2p-bridge.github.io)
* [Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03785.pdf)
* [Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04444.pdf) 3D point cloud attack
* [Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06843.pdf)
:star:[code](https://github.com/qpwodlsqp/CSEConv)
* [Frugal 3D Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08282.pdf)
## 30.Anomaly Detection
* [Continuous Memory Representation for Anomaly Detection](https://arxiv.org/abs/2402.18293)
:star:[code](https://github.com/tae-mo/CRAD)
* [Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection](https://arxiv.org/abs/2302.14696)
:star:[code](https://github.com/shijianjian/DIA.git)
* [Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11002.pdf)
:star:[code](https://github.com/gaobb/AnoGen)
* [GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features](http://arxiv.org/abs/2407.12427v1)
:star:[code](https://github.com/LucStrater/GeneralAD)
* [Learning Diffusion Models for Multi-View Anomaly Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04907.pdf)
* [Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection](https://arxiv.org/abs/2403.13349)
:star:[code](https://github.com/xcyao00/HGAD)
* [TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection](https://arxiv.org/abs/2311.09999)
:star:[code](https://github.com/MaticFuc/ECCV_TransFusion)
* [Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07868.pdf)
:star:[code](https://github.com/DeclanMcIntosh/Online_InReaCh)
* [MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11465.pdf)
:star:[code](https://github.com/TheStarOfMSY/MoEAD)
* Defect detection
* [An Incremental Unified Framework for Small Defect Inspection](https://arxiv.org/abs/2312.08917)
:star:[code](https://github.com/jqtangust/IUF)
* Failure detection
* [DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation](http://arxiv.org/abs/2408.00331v1)
:star:[code](https://github.com/kowshikthopalli/DECIDER/)
* 3D anomaly detection
* [R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection](http://arxiv.org/abs/2407.10862v1)
* Industrial anomaly detection
* [Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection](https://arxiv.org/abs/2401.03145)
* [A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization](https://arxiv.org/abs/2407.09359)
:star:[code](https://github.com/cqylunlun/GLASS)
* [GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection](https://arxiv.org/abs/2406.07487)
:star:[code](https://github.com/hyao1/GLAD)
* [AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08661.pdf)
* [Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08462.pdf)
* Zero-shot anomaly detection
* [AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection](http://arxiv.org/abs/2407.15795v1)
:star:[code](https://github.com/caoyunkang/AdaCLIP)
* Multi-class anomaly detection
* [Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection](https://arxiv.org/abs/2403.11561)
* OOD
* [Gradient-Regularized Out-of-Distribution Detection](https://export.arxiv.org/abs/2404.12368)
* [SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning](https://arxiv.org/abs/2407.03036)
* [PixOOD: Pixel-Level Out-of-Distribution Detection](https://arxiv.org/abs/2405.19882)
:star:[code](https://github.com/vojirt/PixOOD)
* [An Information Theoretical View for Out-Of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07242.pdf)
* [Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection](http://arxiv.org/abs/2407.04022v1)
* [LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models](http://arxiv.org/abs/2407.08966v1)
:star:[code](https://github.com/YBZh/LAPT)
* [ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection](http://arxiv.org/abs/2407.11735v1)
:star:[code](https://github.com/walline/prosub)
* [Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond](http://arxiv.org/abs/2407.15739v1)
:star:[code](https://ade-ood.github.io/)
* [Can Your Generative Model Detect Out-of-Distribution Covariate Shift?](http://arxiv.org/abs/2409.03043v1)
* [Gradient-based Out-of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02138.pdf)
* [Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11399.pdf)
* [TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/09304.pdf)
:star:[code](https://github.com/XixiLiu95/TAG)
* Outlier detection
* [Rethinking Unsupervised Outlier Detection via Multiple Thresholding](https://arxiv.org/abs/2407.05382) Unsupervised outlier detection
* Zero-shot anomaly segmentation
* [VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation](https://arxiv.org/abs/2407.12276)
:star:[code](https://github.com/xiaozhen228/VCP-CLIP)
## 29.Semi/self-supervised learning
* [SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers](https://arxiv.org/abs/2407.06305)
:house:[project](https://mingrui-zhao.github.io/SweepNet/)
* [Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning](https://arxiv.org/abs/2403.10252)
:star:[code](https://github.com/HereNowL/Region-aware-Distribution-Contrast)
* Self-supervised
* [CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning](http://arxiv.org/abs/2407.12188v1)
:star:[code](https://github.com/ErumMushtaq/CroMo-Mixup)
* [HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion](http://arxiv.org/abs/2407.05638v1)
:star:[code](https://github.com/Zeudfish/HPFF)
* [SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning](http://arxiv.org/abs/2407.08148v1)
:star:[code](https://github.com/RM-Zhang/SCPNet)
* [Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing](http://arxiv.org/abs/2407.11168v1)
* [OmniSat: Self-Supervised Modality Fusion for Earth Observation](https://arxiv.org/pdf/2404.08351)
:star:[code](https://github.com/gastruc/OmniSat)
:house:[project](https://gastruc.github.io/projects/omnisat.html)
:sunflower:[dataset](https://huggingface.co/datasets/IGNF/PASTIS-HD)
* [FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning](https://arxiv.org/abs/2310.02903)
* [Self-supervised visual learning from interactions with objects](https://arxiv.org/abs/2407.06704)
* [Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense](https://arxiv.org/abs/2409.08509)
* [GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning](https://arxiv.org/abs/2403.12003)
:star:[code](https://github.com/xiaojieli0903/genview)
* [On Pretraining Data Diversity for Self-Supervised Learning](https://arxiv.org/abs/2403.13808)
:star:[code](https://github.com/hammoudhasan/DiversitySSL)
* [Decoupling Common and Unique Representations for Multimodal Self-supervised Learning](https://arxiv.org/abs/2309.05300)
:star:[code](https://github.com/zhu-xlab/DeCUR)
* [POA: Pre-training Once for Models of All Sizes](http://arxiv.org/abs/2408.01031v1)
:star:[code](https://github.com/Qichuzyy/POA)
* [ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders](https://arxiv.org/abs/2303.12001) Self-supervised representation learning
* [Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization](https://arxiv.org/abs/2403.14973)
:house:[project](https://pwang.pw/trajSSL/) Self-supervised learning
* [SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning](https://arxiv.org/abs/2303.09079)
:star:[code](https://github.com/ucf-ml-research/ssl-cleanse)
* Semi-supervised
* [Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning](https://arxiv.org/abs/2408.12614)
* [Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data](http://arxiv.org/abs/2409.13977v1)
:star:[code](https://github.com/snehaputul/AllMatch)
* [SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning](http://arxiv.org/abs/2409.17512v1)
:star:[code](https://github.com/komejisatori/SCOMatch)
* [ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11377.pdf)
* [Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03287.pdf) Semi-supervised learning
* [Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning](https://arxiv.org/abs/2407.15837)
:star:[code](https://github.com/yibingwei-1/LatentMIM)
* [Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration](https://arxiv.org/abs/2306.04621)
:star:[code](https://github.com/emasa/ADELLO-LTSSL)
## 28.Novel Class Discovery
* [Self-Cooperation Knowledge Distillation for Novel Class Discovery](http://arxiv.org/abs/2407.01930v1)
## 27.GNN/GCN
* [GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition](https://arxiv.org/abs/2308.14378)
:star:[code](https://github.com/jin-s13/GKGNet)GNN
* [Graph Neural Network Causal Explanation via Neural Causal Models](https://arxiv.org/abs/2407.09378)
:star:[code](https://github.com/ArmanBehnam/CXGNN)
* [On the Topology Awareness and Generalization Performance of Graph Neural Networks](https://arxiv.org/abs/2403.04482)
* [Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12325.pdf)
## 26.NAS
* [Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00668.pdf)
:star:[code](https://github.com/lliai/Auto-GAS)
* [Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00676.pdf)
:star:[code](https://github.com/lliai/Auto-DAS) Distillation-aware
* [SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference](https://arxiv.org/abs/2301.10879)
* [Dependency-aware Differentiable Neural Architecture Search](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07216.pdf)
## 25.MC/KD/Pruning (Model Compression/Knowledge Distillation/Pruning)
* [DεpS: Delayed ε-Shrinking for Faster Once-For-All Training](http://arxiv.org/abs/2407.06167v1)
* Model compression
* [Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07761.pdf)
* Pruning
* [Non-transferable Pruning](https://arxiv.org/abs/2410.08015)
* [Straightforward Layer-wise Pruning for More Efficient Visual Adaptation](http://arxiv.org/abs/2407.14330v1)
* [Isomorphic Pruning for Vision Models](https://arxiv.org/abs/2407.04616)
:star:[code](https://github.com/VainF/Isomorphic-Pruning)
* [LPViT: Low-Power Semi-structured Pruning for Vision Transformers](https://arxiv.org/abs/2407.02068)
* [PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference](https://arxiv.org/abs/2403.16020)
:star:[code](https://github.com/tanvir-utexas/PaPr) Pruning
* [Enhanced Sparsification via Stimulative Training](https://arxiv.org/abs/2403.06417)
:star:[code](https://github.com/tsj-001/STP)
* [SNP: Structured Neuron-level Pruning to Preserve Attention Scores](https://arxiv.org/abs/2404.11630)
:star:[code](https://github.com/Nota-NetsPresso/SNP)
* Quantization
* [GenQ: Quantization in Low Data Regimes with Generative Synthetic Data](https://arxiv.org/abs/2312.05272v2)
:star:[code](https://github.com/Intelligent-Computing-Lab-Yale/GenQ)
* [MetaAug: Meta-Data Augmentation for Post-Training Quantization](http://arxiv.org/abs/2407.14726v1)
* [Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients](http://arxiv.org/abs/2407.12637v1)
* [CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs](http://arxiv.org/abs/2407.05266v1)
:star:[code](https://github.com/georgia-tech-synergy-lab/CLAMP-ViT.git)
* [AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer](http://arxiv.org/abs/2407.12951v1)
:star:[code](https://github.com/GoatWu/AdaLog)
* [POCA: Post-training Quantization with Temporal Alignment for Codec Avatars](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05670.pdf)
:house:[project](https://mengjian0502.github.io/poca.github.io/) Quantization
* KD
* [Simple Unsupervised Knowledge Distillation With Space Similarity](https://arxiv.org/abs/2409.13939) Knowledge distillation
* [Direct Distillation between Different Domains](https://arxiv.org/abs/2401.06826) KD
* [Harmonizing knowledge Transfer in Neural Network with Unified Distillation](https://arxiv.org/abs/2409.18565)
* [Good Teachers Explain: Explanation-Enhanced Knowledge Distillation](https://arxiv.org/abs/2402.03119)
* [The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers](https://arxiv.org/abs/2302.10494)
* [Improving Knowledge Distillation via Regularizing Feature Direction and Norm](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03432.pdf)
* [Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00499.pdf) Distillation
* [Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation](https://arxiv.org/abs/2407.03056)
:star:[code](https://github.com/miccunifi/KDPL)
* [UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation](https://arxiv.org/abs/2212.10950)
:star:[code](https://github.com/dreamguo/UNIKD)
* [BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation](https://arxiv.org/abs/2407.09083)
* [Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation](https://arxiv.org/abs/2405.11614)
* [How to Train the Teacher Model for Effective Knowledge Distillation](https://arxiv.org/abs/2407.18041)
* [Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/12478.pdf)
## 24.Vision Transformer
* [Spline-based Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11525.pdf)
* [Denoising Vision Transformers](https://arxiv.org/abs/2401.02957)
* [FairViT: Fair Vision Transformer via Adaptive Masking](http://arxiv.org/abs/2407.14799v1)
* [Rotary Position Embedding for Vision Transformer](https://arxiv.org/abs/2403.13298)
:star:[code](https://github.com/naver-ai/rope-vit)
* [Bidirectional Progressive Transformer for Interaction Intention Anticipation](https://arxiv.org/abs/2405.05552)
* [Robustness Tokens: Towards Adversarial Robustness of Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07642.pdf)
* [SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization](https://arxiv.org/abs/2402.03317)
:star:[code](https://github.com/microsoft/robustlearn)
* [PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers](http://arxiv.org/abs/2407.04538v1)
* [OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction](http://arxiv.org/abs/2407.13335v1)
:star:[code](https://github.com/HKUST-NISL/oat_eccv24)
* [AugDETR: Improving Multi-scale Learning for Detection Transformer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03484.pdf)Transformer
* [AttnZero: Efficient Attention Discovery for Vision Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00666.pdf)
:star:[code](https://github.com/lliai/AttnZero)
* [SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02019.pdf)
:star:[code](https://github.com/Euphoria16/SpatialFormer)
* [Efficient Vision Transformers with Partial Attention](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/11047.pdf)
* [SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers](https://arxiv.org/abs/2401.08740)
:star:[code](https://github.com/willisma/SiT)
* [Stitched ViTs are Flexible Vision Backbones](https://arxiv.org/abs/2307.00154)
:star:[code](https://github.com/ziplab/SN-Netv2)
* [Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02429.pdf)
* [Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00861.pdf)
:star:[code](https://github.com/bianlab/Specformer)
* [GiT: Towards Generalist Vision Transformer through Universal Language Interface](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04158.pdf)
:star:[code](https://github.com/Haiyang-W/GiT)
* [An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06958.pdf)
* [Fairness-aware Vision Transformer via Debiased Self-Attention](https://arxiv.org/abs/2301.13803)
:star:[code](https://github.com/qiangyao1988/DSA)
* [ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention](https://arxiv.org/abs/2401.00912)
:star:[code](https://github.com/skyhehe123/ScatterFormer)
* [LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors](https://arxiv.org/abs/2403.14625)
:house:[project](https://www.cs.umd.edu/~sakshams/LiFT/)
* [Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach](http://arxiv.org/abs/2407.06964v1)
:house:[project](https://synqt.github.io/)
* [LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer](https://arxiv.org/abs/2212.09877)
:star:[code](https://github.com/salesforce/LayoutDETR)
* [Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators](https://arxiv.org/abs/2408.05710)
:star:[code](https://github.com/LeapLabTHU/Attention-Mediators)
* [BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos](https://arxiv.org/abs/2312.00083)
:star:[code](https://github.com/Pilhyeon/BAM-DETR)
* [An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding](http://arxiv.org/abs/2408.01120v1)
:star:[code](https://github.com/chenwei746/EEVG)