Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/amusi/iccv2023-papers-with-code
ICCV 2023 papers and open-source projects collection
Last synced: 22 days ago
- Host: GitHub
- URL: https://github.com/amusi/iccv2023-papers-with-code
- Owner: amusi
- Created: 2021-07-23T00:47:16.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-01T01:06:42.000Z (about 1 year ago)
- Last Synced: 2024-10-15T10:05:15.330Z (2 months ago)
- Topics: artificial-intelligence, computer-vision, iccv, iccv2021, iccv2023, object-detection, semantic-segmentation, transformer
- Homepage:
- Size: 102 KB
- Stars: 2,495
- Watchers: 39
- Forks: 250
- Open Issues: 41
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ICCV2023-Papers-with-Code
A collection of [ICCV 2023](http://iccv2023.thecvf.com/) papers with code!
2160 papers accepted!
ICCV 2023 accepted paper IDs: https://t.co/A0mCH8gbOi
> Note 1: Feel free to open an issue to share ICCV 2023 papers and open-source projects!
>
> Note 2: For papers from previous years' top CV conferences and other high-quality CV paper roundups, see: https://github.com/amusi/daily-paper-computer-vision

[ICCV 2021](ICCV2021-Papers-with-Code.md)

If you want to follow the latest and best CV papers, open-source projects, and learning resources, scan the QR code to join the [CVer学术交流群](https://t.zsxq.com/10OGjThDw)! Learn from each other and improve together~
![](https://github.com/amusi/CVPR2023-Papers-with-Code/raw/master/CVer%E5%AD%A6%E6%9C%AF%E4%BA%A4%E6%B5%81%E7%BE%A4.png)
# ICCV 2023 Papers and Open-Source Code Directory
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models](#Diffusion)
- [Avatars](#Avatars)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail Distribution](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image Classification](#MIC)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [Image Editing](#Image-Editing)
- [Video Editing](#Video-Editing)
- [Video Understanding](#Video-Understanding)
- [Human Motion Generation](#Human-Motion-Generation)
- [Low-light Image Enhancement](#Low-light-Image-Enhancement)
- [Scene Text Recognition](#STR)
- [Image Retrieval](#Image-Retrieval)
- [Image Fusion](#Image-Fusion)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Crowd Counting](#Crowd-Counting)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Others](#Others)

# Avatars
**Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**
- Paper: https://arxiv.org/abs/2303.17606
- Code: https://github.com/songrise/AvatarCraft
# Backbone
**Rethinking Mobile Block for Efficient Attention-based Models**
- Paper: https://arxiv.org/abs/2301.01146
- Code: https://github.com/zhangzjn/EMO

# CLIP
**PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization**
- Paper: https://arxiv.org/abs/2307.15199
- Code: [https://PromptStyler.github.io/](https://promptstyler.github.io/)

**CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation**
- Paper: https://arxiv.org/abs/2308.15226
- Code: http://www.github.com/devaansh100/CLIPTrans

# NeRF
**IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis**
- Homepage: https://zju3dv.github.io/intrinsic_nerf/
- Paper: https://arxiv.org/abs/2210.00647
- Code: https://github.com/zju3dv/IntrinsicNeRF

**Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**
- Paper: https://arxiv.org/abs/2303.17606
- Code: https://github.com/songrise/AvatarCraft
**FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis**
- Homepage: https://shawn615.github.io/flipnerf/
- Code: https://github.com/shawn615/FlipNeRF
- Paper: https://arxiv.org/abs/2306.17723

**Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields**
- Homepage: https://wbhu.github.io/projects/Tri-MipRF
- Paper: https://arxiv.org/abs/2307.11335
- Code: https://github.com/wbhu/Tri-MipRF

# Diffusion Models
**PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment**
- Paper: https://arxiv.org/abs/2306.15667
- Code: https://github.com/facebookresearch/PoseDiffusion

**FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**
- Paper: https://arxiv.org/abs/2303.09833
- Code: https://github.com/vvictoryuki/FreeDoM

**BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**
- Paper: https://arxiv.org/abs/2307.10816
- Code: https://github.com/Sierkinhane/BoxDiff

**BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**
- Paper: https://arxiv.org/abs/2211.14304
- Code: https://github.com/BarqueroGerman/BeLFusion

**DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**
- Paper: https://arxiv.org/abs/2303.06840
- Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

**DIRE for Diffusion-Generated Image Detection**
- Paper: https://arxiv.org/abs/2303.09295
- Code: https://github.com/ZhendongWang6/DIRE

# Prompt
**Read-only Prompt Optimization for Vision-Language Few-shot Learning**
- Paper: https://arxiv.org/abs/2308.14960
- Code: https://github.com/mlvlab/RPO

**Introducing Language Guidance in Prompt-based Continual Learning**
- Paper: https://arxiv.org/abs/2308.15827
- Code: None

# Vision-Language
**Read-only Prompt Optimization for Vision-Language Few-shot Learning**
- Paper: https://arxiv.org/abs/2308.14960
- Code: https://github.com/mlvlab/RPO

# Object Detection
**FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs**
- Paper: https://arxiv.org/abs/2301.06719
- Code: https://github.com/yh-pengtu/FemtoDet

**Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment**
- Paper: https://arxiv.org/abs/2207.13085
- Code: https://github.com/Atten4Vis/GroupDETR

**Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection**
- Paper: https://arxiv.org/abs/2205.09613
- Code: https://github.com/LiewFeng/imTED

**ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation**
- Paper: https://arxiv.org/abs/2308.09242
- Code: https://github.com/iSEE-Laboratory/ASAG

# Visual Tracking
**Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers**
- Paper: https://arxiv.org/abs/2307.04129
- Code: https://github.com/ZHU-Zhiyu/High-Rank_RGB-Event_Tracker

# Semantic Segmentation
**Segment Anything**
- Homepage: https://segment-anything.com/
- Paper: https://arxiv.org/abs/2304.02643
- Code: https://github.com/facebookresearch/segment-anything

**MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2304.09913
- Code: https://github.com/shjo-april/MARS

**FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation**
- Paper: https://arxiv.org/abs/2307.07245
- Code: https://github.com/TY-Shi/FreeCOS

**Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation**
- Paper: https://arxiv.org/abs/2211.14512
- Code: https://github.com/yyliu01

**Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement**
- Paper: https://arxiv.org/abs/2307.09362
- Code: https://github.com/w1oves/DTP

# Video Object Segmentation
**Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus**
- Paper: https://arxiv.org/abs/2207.01203
- Code: https://github.com/lxa9867/R2VOS
# Video Instance Segmentation
**DVIS: Decoupled Video Instance Segmentation Framework**
- Paper: https://arxiv.org/abs/2306.03413
- Code: https://github.com/zhang-tao-whu/DVIS

# Medical Image Classification
**BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification**
- Paper: https://arxiv.org/abs/2203.01937
- Code: https://github.com/cyh-0/BoMD
# Medical Image Segmentation
**CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection**
- Paper: https://arxiv.org/abs/2301.00785
- Code: https://github.com/ljwztc/CLIP-Driven-Universal-Model

# Low-level Vision
**Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive**
- Paper: https://arxiv.org/abs/2305.19862
- Code: https://github.com/shangwei5/SelfDRSC

# Super-Resolution
**Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution**
- Paper: https://arxiv.org/abs/2303.08942
- Code: https://github.com/Zhaozixiang1228/GDSR-SSDNet

# 3D Point Cloud
**Robo3D: Towards Robust and Reliable 3D Perception against Corruptions**
- Homepage: https://ldkong.com/Robo3D
- Paper: https://arxiv.org/abs/2303.17597
- Code: https://github.com/ldkong1205/Robo3D

**Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models**
- Paper: https://arxiv.org/abs/2304.07221
- Code: https://github.com/zyh16143998882/ICCV23-IDPT

**Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos**
- Paper: https://arxiv.org/abs/2308.09247
- Code: None

# 3D Object Detection
**PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images**
- Paper: https://arxiv.org/abs/2206.01256
- Code: https://github.com/megvii-research/PETR

**DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection**
- Paper: https://arxiv.org/abs/2304.13031
- Code: https://github.com/AIR-DISCOVER/DQS3D

**SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection**
- Paper: https://arxiv.org/abs/2304.14340
- Code: https://github.com/yichen928/SparseFusion

**StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection**
- Paper: https://arxiv.org/abs/2303.11926
- Code: https://github.com/exiawsh/StreamPETR.git

**Cross Modal Transformer: Towards Fast and Robust 3D Object Detection**
- Paper: https://arxiv.org/abs/2301.01283
- Code: https://github.com/junjie18/CMT.git

**MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation**
- Paper: https://arxiv.org/abs/2304.09801
- Project: https://chongjiange.github.io/metabev.html
- Code: https://github.com/ChongjianGE/MetaBEV

**Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling**
- Paper: https://arxiv.org/abs/2307.07944
- Code: https://github.com/zhuoxiao-chen/ReDB-DA-3Ddet

**SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection**
- Paper: https://arxiv.org/abs/2307.11477
- Code: https://github.com/mengtan00/SA-BEV

# 3D Semantic Segmentation
**Rethinking Range View Representation for LiDAR Segmentation**
- Homepage: https://ldkong.com/RangeFormer
- Paper: https://arxiv.org/abs/2303.05367
- Code: None

# 3D Object Tracking
**MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors**
- Paper: https://arxiv.org/abs/2303.05071
- Code: https://github.com/slothfulxtx/MBPTrack3D

# Video Understanding
**Unmasked Teacher: Towards Training-Efficient Video Foundation Models**
- Paper: https://arxiv.org/abs/2303.16058
- Code: https://github.com/OpenGVLab/unmasked_teacher
# Image Generation
**FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**
- Paper: https://arxiv.org/abs/2303.09833
- Code: https://github.com/vvictoryuki/FreeDoM

**BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**
- Paper: https://arxiv.org/abs/2307.10816
- Code: https://github.com/Sierkinhane/BoxDiff

# Video Generation
**Simulating Fluids in Real-World Still Images**
- Homepage: https://slr-sfs.github.io/
- Paper: https://arxiv.org/abs/2204.11335
- Code: https://github.com/simon3dv/SLR-SFS

# Image Editing
**Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing**
- Paper: https://arxiv.org/abs/2304.02051
- Code: https://github.com/aimagelab/multimodal-garment-designer

# Video Editing
**FateZero: Fusing Attentions for Zero-shot Text-based Video Editing**
- Project: https://fate-zero-edit.github.io/
- Paper: https://arxiv.org/abs/2303.09535
- Code: https://github.com/ChenyangQiQi/FateZero

# Human Motion Generation
**BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**
- Paper: https://arxiv.org/abs/2211.14304
- Code: https://github.com/BarqueroGerman/BeLFusion

# Low-light Image Enhancement
**Implicit Neural Representation for Cooperative Low-light Image Enhancement**
- Paper: https://arxiv.org/abs/2303.11722
- Code: https://github.com/Ysz2022/NeRCo

# Scene Text Detection
# Scene Text Recognition
**Self-supervised Character-to-Character Distillation for Text Recognition**
- Paper: https://arxiv.org/abs/2211.00288
- Code: https://github.com/TongkunGuan/CCD

**MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition**
- Paper: https://arxiv.org/abs/2305.14758
- Code: https://github.com/simplify23/MRN
- Chinese write-up: https://zhuanlan.zhihu.com/p/643948935

# Image Retrieval
**Zero-Shot Composed Image Retrieval with Textual Inversion**
- Paper: https://arxiv.org/abs/2303.15247
- Code: https://github.com/miccunifi/SEARLE

# Image Fusion
**DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**
- Paper: https://arxiv.org/abs/2303.06840
- Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

# Trajectory Prediction
**EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting**
- Homepage: https://inhwanbae.github.io/publication/eigentrajectory/
- Paper: https://arxiv.org/abs/2307.09306
- Code: https://github.com/InhwanBae/EigenTrajectory

# Crowd Counting
**Point-Query Quadtree for Crowd Counting, Localization, and More**
- Paper: https://arxiv.org/abs/2308.13814
- Code: https://github.com/cxliu0/PET

# Video Quality Assessment
**Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives**
- Paper: https://arxiv.org/abs/2211.04894
- Code: https://github.com/VQAssessment/DOVER

# Others
**MotionBERT: A Unified Perspective on Learning Human Motion Representations**
- Homepage: https://motionbert.github.io/
- Paper: https://arxiv.org/abs/2210.06551
- Code: https://github.com/Walter0807/MotionBERT

**Graph Matching with Bi-level Noisy Correspondence**
- Paper: https://arxiv.org/pdf/2212.04085.pdf
- Code: https://github.com/Lin-Yijie/Graph-Matching-Networks/tree/main/COMMON

**LDL: Line Distance Functions for Panoramic Localization**
- Paper: https://arxiv.org/abs/2308.13989
- Code: https://github.com/82magnolia/panoramic-localization

**Active Neural Mapping**
- Homepage: https://zikeyan.github.io/active-INR/index.html
- Paper: https://arxiv.org/abs/2308.16246
- Code: https://zikeyan.github.io/active-INR/index.html#

**Reconstructing Groups of People with Hypergraph Relational Reasoning**
- Paper: https://arxiv.org/abs/2308.15844
- Code: https://github.com/boycehbz/GroupRec