{"id":20972849,"url":"https://github.com/ashishpatel26/cvpr2024","last_synced_at":"2026-02-04T14:31:04.163Z","repository":{"id":246155585,"uuid":"820262639","full_name":"ashishpatel26/CVPR2024","owner":"ashishpatel26","description":"CVPR 2024 Research Paper with Code","archived":false,"fork":false,"pushed_at":"2024-06-28T08:10:00.000Z","size":558,"stargazers_count":48,"open_issues_count":1,"forks_count":9,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-07-04T09:48:38.212Z","etag":null,"topics":["computervision","cvpr","cvpr2024"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashishpatel26.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-26T06:06:29.000Z","updated_at":"2024-11-22T05:44:20.000Z","dependencies_parsed_at":"2025-03-13T08:40:01.957Z","dependency_job_id":null,"html_url":"https://github.com/ashishpatel26/CVPR2024","commit_stats":null,"previous_names":["ashishpatel26/cvpr2024"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ashishpatel26/CVPR2024","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashishpatel26%2FCVPR2024","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashishpatel26%2FCVPR2024/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashishpatel26%2FCVPR2024/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashishpatel26%2FCVPR2024/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/ashishpatel26","download_url":"https://codeload.github.com/ashishpatel26/CVPR2024/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashishpatel26%2FCVPR2024/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29087345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-04T03:31:03.593Z","status":"ssl_error","status_checked_at":"2026-02-04T03:29:50.742Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computervision","cvpr","cvpr2024"],"created_at":"2024-11-19T04:10:08.251Z","updated_at":"2026-02-04T14:31:04.145Z","avatar_url":"https://github.com/ashishpatel26.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# CVPR 2024 \n\n![](https://camo.githubusercontent.com/e98004b4a9a1fdbad3c3fe1be700c0f0546286942108c54fa7f009eb786df0d0/68747470733a2f2f6869726f6b617473756b6174616f6b6131362e6769746875622e696f2f435650522d323032342d4c494d49542f696d672f435650525f4c6f676f53656174746c655f323032345f5072696d6172792e6a7067)\n\n### Research Paper with Code\n\n![](mindmap/fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png)\n\n---\n## Table of Contents\n- [3DGS (Gaussian Splatting)](#3dgs-gaussian-splatting)\n- [Avatars](#avatars)\n- [Backbone](#backbone)\n- [CLIP](#clip)\n- 
[Embodied AI](#embodied-ai)\n- [OCR](#ocr)\n- [NeRF](#nerf)\n- [DETR](#detr)\n- [ReID](#reid)\n- [Long-Tail](#long-tail)\n- [Vision Transformer](#vision-transformer)\n- [Vision-Language](#vision-language)\n- [Self-supervised Learning](#self-supervised-learning)\n- [Data Augmentation](#data-augmentation)\n- [Object Detection](#object-detection)\n- [Anomaly Detection](#anomaly-detection)\n- [Visual Tracking](#visual-tracking)\n- [Semantic Segmentation](#semantic-segmentation)\n- [Instance Segmentation](#instance-segmentation)\n- [Panoptic Segmentation](#panoptic-segmentation)\n- [Medical Image](#medical-image)\n- [Medical Image Segmentation](#medical-image-segmentation)\n- [Video Object Segmentation](#video-object-segmentation)\n- [Video Instance Segmentation](#video-instance-segmentation)\n- [Referring Image Segmentation](#referring-image-segmentation)\n- [Image Matting](#image-matting)\n- [Image Editing](#image-editing)\n- [Low-level Vision](#low-level-vision)\n- [Super-Resolution](#super-resolution)\n- [Denoising](#denoising)\n- [Deblur](#deblur)\n- [Autonomous Driving](#autonomous-driving)\n- [3D Point Cloud](#3d-point-cloud)\n- [3D Object Detection](#3d-object-detection)\n- [3D Semantic Segmentation](#3d-semantic-segmentation)\n- [3D Object Tracking](#3d-object-tracking)\n- [3D Semantic Scene Completion](#3d-semantic-scene-completion)\n- [3D Registration](#3d-registration)\n- [3D Human Pose Estimation](#3d-human-pose-estimation)\n- [3D Human Mesh Estimation](#3d-human-mesh-estimation)\n- [Image Generation](#image-generation)\n- [Video Generation](#video-generation)\n- [Video Understanding](#video-understanding)\n- [Knowledge Distillation](#knowledge-distillation)\n- [Stereo Matching](#stereo-matching)\n- [Scene Graph Generation](#scene-graph-generation)\n- [Video Quality Assessment](#video-quality-assessment)\n- [Datasets](#datasets)\n- [Others](#others)\n\n### Domain-wise Table\n\n#### 3DGS (Gaussian Splatting)\n\n| Index | Paper Title                           
                       | Paper Link                                | Code                                                        | Official Repo                                                |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ----------------------------------------------------------- | ------------------------------------------------------------ |\n| 1     | Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering | [Paper](https://arxiv.org/abs/2312.00109) | [Code](https://github.com/city-super/Scaffold-GS)           | [Homepage](https://city-super.github.io/scaffold-gs/)        |\n| 2     | GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis | [Paper](https://arxiv.org/abs/2312.02155) | [Code](https://github.com/ShunyuanZheng/GPS-Gaussian)       | [Homepage](https://shunyuanzheng.github.io/GPS-Gaussian)     |\n| 3     | GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | [Paper](https://arxiv.org/abs/2312.02134) | [Code](https://github.com/huliangxiao/GaussianAvatar)       | N/A                                                          |\n| 4     | GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting | [Paper](https://arxiv.org/abs/2311.14521) | [Code](https://github.com/buaacyw/GaussianEditor)           | N/A                                                          |\n| 5     | Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction | [Paper](https://arxiv.org/abs/2309.13101) | [Code](https://github.com/ingra14m/Deformable-3D-Gaussians) | [Homepage](https://ingra14m.github.io/Deformable-Gaussians/) |\n\n#### Avatars\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                  | Official Repo                      
           |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ----------------------------------------------------- | --------------------------------------------- |\n| 6     | GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | [Paper](https://arxiv.org/abs/2312.02134) | [Code](https://github.com/huliangxiao/GaussianAvatar) | N/A                                           |\n| 7     | Real-Time Simulated Avatar from Head-Mounted Sensors         | [Paper](https://arxiv.org/abs/2403.06862) | N/A                                                   | [Homepage](https://www.zhengyiluo.com/SimXR/) |\n\n#### Backbone\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------- | ------------- |\n| 8     | RepViT: Revisiting Mobile CNN From ViT Perspective           | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT)           | N/A           |\n| 9     | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A           |\n\n#### CLIP\n\n| Index | Paper Title                                               | Paper Link                                | Code                                                         | Official Repo |\n| ----- | --------------------------------------------------------- | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 10    | Alpha-CLIP: A CLIP Model Focusing on Wherever You Want    | 
[Paper](https://arxiv.org/abs/2312.03818) | [Code](https://github.com/SunzeY/AlphaCLIP)                  | N/A           |\n| 11    | FairCLIP: Harnessing Fairness in Vision-Language Learning | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A           |\n\n#### Embodied AI\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                 | Official Repo                                        |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------------------- | ---------------------------------------------------- |\n| 12    | EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | [Paper](https://arxiv.org/abs/2312.16170) | [Code](https://github.com/OpenRobotLab/EmbodiedScan) | [Homepage](https://tai-wang.github.io/embodiedscan/) |\n| 13    | MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception | [Paper](https://arxiv.org/abs/2312.07472) | [Code](https://github.com/IranQin/MP5)               | [Homepage](https://iranqin.github.io/MP5.github.io/) |\n\n#### OCR\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 14    | An Empirical Study of Scaling Law for OCR                    | [Paper](https://arxiv.org/abs/2401.00028) | [Code](https://github.com/large-ocr-model/large-ocr-model.github.io) | N/A           |\n| 15    | ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection 
and Spotting | [Paper](https://arxiv.org/abs/2403.00303) | [Code](https://github.com/PriNing/ODM)                       | N/A           |\n\n#### NeRF\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                        | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------- | ------------- |\n| 16    | PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF | [Paper](https://arxiv.org/abs/2311.13099) | [Code](https://github.com/FYTalon/pienerf/) | N/A           |\n\n#### DETR\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                             | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------ | ------------- |\n| 17    | DETRs Beat YOLOs on Real-time Object Detection               | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR)      | N/A           |\n| 18    | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR) | N/A           |\n\n#### ReID\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                        | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------- | ------------- |\n| 19    | Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification | [Paper](https://arxiv.org/abs/2403.10254) | 
[Code](https://github.com/924973292/EDITOR) | N/A           |\n| 20    | Noisy-Correspondence Learning for Text-to-Image Person Re-identification | [Paper](https://arxiv.org/abs/2308.09911) | [Code](https://github.com/QinYang79/RDE)    | N/A           |\n\n#### Long-Tail\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 1     | Delving into the Trajectory Long-tail Distribution for Multi-object Tracking | [Paper](https://arxiv.org/abs/2403.04700) | [Code](https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT) | N/A           |\n\n#### Vision Transformer\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------- | ------------- |\n| 2     | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A           |\n| 3     | RepViT: Revisiting Mobile CNN From ViT Perspective           | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT)           | N/A           |\n\n#### Vision-Language\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | 
----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 4     | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | [Paper](https://arxiv.org/abs/2403.02781) | [Code](https://github.com/zhengli97/PromptKD)                | N/A           |\n| 5     | FairCLIP: Harnessing Fairness in Vision-Language Learning    | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A           |\n\n#### Self-supervised Learning\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 6     | N/A         | N/A        | N/A  | N/A           |\n\n#### Data Augmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 7     | N/A         | N/A        | N/A  | N/A           |\n\n#### Object Detection\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 8     | DETRs Beat YOLOs on Real-time Object Detection               | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR)                  | N/A           |\n| 9     | Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation | [Paper](https://arxiv.org/abs/2312.01220) | [Code](https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation) | N/A           |\n| 10    | YOLO-World: Real-Time Open-Vocabulary Object Detection       | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World)             
 | N/A           |\n| 11    | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR)             | N/A           |\n\n#### Anomaly Detection\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                    | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------- | ------------- |\n| 12    | Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection | [Paper](https://arxiv.org/abs/2310.12790) | [Code](https://github.com/mala-lab/AHL) | N/A           |\n\n#### Visual Tracking\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 13    | N/A         | N/A        | N/A  | N/A           |\n\n#### Semantic Segmentation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                   | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------- | ------------- |\n| 14    | Stronger, Fewer, \u0026 Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [Paper](https://arxiv.org/abs/2312.04265) | [Code](https://github.com/w1oves/Rein) | N/A           |\n| 15    | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [Paper](https://arxiv.org/abs/2311.15537) | [Code](https://github.com/xb534/SED)   | N/A           |\n\n#### Instance Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 16    | N/A        
 | N/A        | N/A  | N/A           |\n\n#### Panoptic Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 17    | N/A         | N/A        | N/A  | N/A           |\n\n#### Medical Image\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                          | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |\n| 18    | Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology | [Paper](https://arxiv.org/abs/2402.17228) | [Code](https://github.com/DearCaat/RRT-MIL)   | N/A           |\n| 19    | VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | [Paper](https://arxiv.org/abs/2402.17300) | [Code](https://github.com/Luffy03/VoCo)       | N/A           |\n| 20    | ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images | [Paper](https://arxiv.org/abs/2311.15264) | [Code](https://github.com/nicoboou/chada_vit) | N/A           |\n\n#### Medical Image Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 21    | N/A         | N/A        | N/A  | N/A           |\n\n#### Video Object Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 22    | N/A         | N/A        | N/A  | N/A           |\n\n#### Video Instance Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 23    | N/A         | N/A        | N/A  | N/A           |\n\n#### Referring Image 
Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 24    | N/A         | N/A        | N/A  | N/A           |\n\n#### Image Matting\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 25    | N/A         | N/A        | N/A  | N/A           |\n\n#### Image Editing\n\n| Index | Paper Title                                       | Paper Link                                | Code                                                  | Official Repo                                            |\n| ----- | ------------------------------------------------- | ----------------------------------------- | ----------------------------------------------------- | -------------------------------------------------------- |\n| 26    | Edit One for All: Interactive Batch Image Editing | [Paper](https://arxiv.org/abs/2401.10219) | [Code](https://github.com/thaoshibe/edit-one-for-all) | [Homepage](https://thaoshibe.github.io/edit-one-for-all) |\n\n#### Low-level Vision\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                     | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------- | ------------- |\n| 27    | Residual Denoising Diffusion Models                          | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A           |\n| 28    | Boosting Image Restoration via Priors from Pre-trained Models | [Paper](https://arxiv.org/abs/2403.06793) | N/A                                      | N/A           |\n\n#### Super-Resolution\n\n| Index | Paper Title                                                  | Paper Link                                | Code                   
                              | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------------------- | ------------- |\n| 29    | SeD: Semantic-Aware Discriminator for Image Super-Resolution | [Paper](https://arxiv.org/abs/2402.19387) | [Code](https://github.com/lbc12345/SeD)              | N/A           |\n| 30    | APISR: Anime Production Inspired Real-World Anime Super-Resolution | [Paper](https://arxiv.org/abs/2403.01598) | [Code](https://github.com/Kiter) |               |\n\n#### Denoising\n\n| Index | Paper Title                         | Paper Link                                | Code                                     | Official Repo |\n| ----- | ----------------------------------- | ----------------------------------------- | ---------------------------------------- | ------------- |\n| 31    | Residual Denoising Diffusion Models | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A           |\n\n#### Deblur\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 32    | N/A         | N/A        | N/A  | N/A           |\n\n#### Autonomous Driving\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                            | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ----------------------------------------------- | ------------- |\n| 33    | UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | [Paper](https://arxiv.org/abs/2310.08370) | [Code](https://github.com/Nightmare-n/UniPAD)   | N/A           |\n| 34    | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | 
[Paper](https://arxiv.org/abs/2311.17663) | [Code](https://github.com/haomo-ai/Cam4DOcc)    | N/A           |\n| 35    | Memory-based Adapters for Online 3D Scene Perception         | [Paper](https://arxiv.org/abs/2403.06974) | [Code](https://github.com/xuxw98/Online3D)      | N/A           |\n| 36    | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670) | [Code](https://github.com/hustvl/Symphonies)    | N/A           |\n| 37    | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145) | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A           |\n| 38    | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | [Paper](https://arxiv.org/abs/2403.07535) | [Code](https://github.com/Junda24/AFNet)        | N/A           |\n\n#### 3D Point Cloud\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 40    | N/A         | N/A        | N/A  | N/A           |\n\n#### 3D Object Detection\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |\n| 41    | PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection | [Paper](https://arxiv.org/abs/2312.08371) | [Code](https://github.com/kuanchihhuang/PTT) | N/A           |\n| 42    | UniMODE: Unified Monocular 3D Object Detection               | [Paper](https://arxiv.org/abs/2402.18573) | N/A                                          | N/A           |\n\n#### 3D Semantic Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | 
------------- |\n| 43    | N/A         | N/A        | N/A  | N/A           |\n\n#### 3D Object Tracking\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 44    | N/A         | N/A        | N/A  | N/A           |\n\n#### 3D Semantic Scene Completion\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |\n| 45    | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670) | [Code](https://github.com/hustvl/Symphonies) | N/A           |\n\n#### 3D Registration\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 46    | N/A         | N/A        | N/A  | N/A           |\n\n#### 3D Human Pose Estimation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                          | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |\n| 47    | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | [Paper](https://arxiv.org/abs/2311.12028) | [Code](https://github.com/NationalGAILab/HoT) | N/A           |\n\n#### 3D Human Mesh Estimation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 48    | N/A         | N/A        | N/A  | N/A           |\n\n#### Image Generation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                     | Official Repo                                                |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------ |\n| 52    | InstanceDiffusion: Instance-level Control for Image Generation | [Paper](https://arxiv.org/abs/2402.03290) | [Code](https://github.com/frank-xwang/InstanceDiffusion) | [Homepage](https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/) |\n| 53    | ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations | [Paper](https://arxiv.org/abs/2312.04655) | [Code](https://github.com/eclipse-t2i/eclipse-inference) | [Homepage](https://eclipse-t2i.vercel.app/)                  |\n| 54    | Instruct-Imagen: Image 
Generation with Multi-modal Instruction | [Paper](https://arxiv.org/abs/2401.01952) | N/A                                                      | N/A                                                          |\n| 55    | UniGS: Unified Representation for Image Generation and Segmentation | [Paper](https://arxiv.org/abs/2312.01985) | N/A                                                      | N/A                                                          |\n| 56    | Multi-Instance Generation Controller for Text-to-Image Synthesis | [Paper](https://arxiv.org/abs/2402.05408) | [Code](https://github.com/limuloo/migc)                  | N/A                                                          |\n| 57    | SVGDreamer: Text Guided SVG Generation with Diffusion Model  | [Paper](https://arxiv.org/abs/2312.16476) | [Code](https://ximinng.github.io/SVGDreamer-project/)    | N/A                                                          |\n| 58    | InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model | [Paper](https://arxiv.org/abs/2312.05849) | [Code](https://github.com/jiuntian/interactdiffusion)    | N/A                                                          |\n| 59    | Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following | [Paper](https://arxiv.org/abs/2311.17002) | [Code](https://github.com/ali-vilab/Ranni)               | N/A                                                          |\n\n#### Video Generation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo                                                |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| 60    | Vlogger: Make Your Dream A Vlog     
                         | [Paper](https://arxiv.org/abs/2401.09414) | [Code](https://github.com/Vchitect/Vlogger)                  | N/A                                                          |\n| 61    | VBench: Comprehensive Benchmark Suite for Video Generative Models | [Paper](https://arxiv.org/abs/2311.17982) | [Code](https://github.com/Vchitect/VBench)                   | [Homepage](https://vchitect.github.io/VBench-project/)       |\n| 62    | VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models | [Paper](https://arxiv.org/abs/2312.00845) | [Code](https://github.com/HyeonHo99/Video-Motion-Customization) | [Homepage](https://video-motion-customization.github.io/) |\n\n#### Vision Transformer\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------- | ------------- |\n| 63    | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A           |\n| 64    | RepViT: Revisiting Mobile CNN From ViT Perspective           | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT)           | N/A           |\n| 65    | A General and Efficient Training for Transformer via Token Expansion | [Paper](https://arxiv.org/abs/2404.00672) | [Code](https://github.com/Osilly/TokenExpansion)    | N/A           |\n\n#### Vision-Language\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | 
------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 66    | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | [Paper](https://arxiv.org/abs/2403.02781) | [Code](https://github.com/zhengli97/PromptKD)                | N/A           |\n| 67    | FairCLIP: Harnessing Fairness in Vision-Language Learning    | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A           |\n\n#### Object Detection\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 68    | DETRs Beat YOLOs on Real-time Object Detection               | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR)                  | N/A           |\n| 69    | Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation | [Paper](https://arxiv.org/abs/2312.01220) | [Code](https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation) | N/A           |\n| 70    | YOLO-World: Real-Time Open-Vocabulary Object Detection       | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World)              | N/A           |\n| 71    | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR)             | N/A           |\n\n#### Anomaly Detection\n\n| Index | Paper Title                                                  | Paper Link    
                             | Code                                    | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------- | ------------- |\n| 72    | Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection | [Paper](https://arxiv.org/abs/2310.12790) | [Code](https://github.com/mala-lab/AHL) | N/A           |\n\n#### Object Tracking\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 73    | Delving into the Trajectory Long-tail Distribution for Multi-object Tracking | [Paper](https://arxiv.org/abs/2403.04700) | [Code](https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT) | N/A           |\n\n#### Semantic Segmentation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                   | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------- | ------------- |\n| 74    | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [Paper](https://arxiv.org/abs/2312.04265) | [Code](https://github.com/w1oves/Rein) | N/A           |\n| 75    | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [Paper](https://arxiv.org/abs/2311.15537) | [Code](https://github.com/xb534/SED)   | N/A           |\n\n#### Medical Image\n\n| Index | Paper Title                                                  | 
Paper Link                                | Code                                          | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |\n| 76    | Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology | [Paper](https://arxiv.org/abs/2402.17228) | [Code](https://github.com/DearCaat/RRT-MIL)   | N/A           |\n| 77    | VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | [Paper](https://arxiv.org/abs/2402.17300) | [Code](https://github.com/Luffy03/VoCo)       | N/A           |\n| 78    | ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images | [Paper](https://arxiv.org/abs/2311.15264) | [Code](https://github.com/nicoboou/chada_vit) | N/A           |\n\n#### Medical Image Segmentation\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 76    | N/A         | N/A        | N/A  | N/A           |\n\n#### Autonomous Driving\n\n| Index | Paper Title                                                  | Paper Link                                    | Code                                            | Official Repo |\n| ----- | ------------------------------------------------------------ | --------------------------------------------- | ----------------------------------------------- | ------------- |\n| 77    | UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | [Paper](https://arxiv.org/abs/2310.08370)     | [Code](https://github.com/Nightmare-n/UniPAD)   | N/A           |\n| 78    | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | [Paper](https://arxiv.org/abs/2311.17663)     | [Code](https://github.com/haomo-ai/Cam4DOcc)    | N/A           |\n| 79    
| Memory-based Adapters for Online 3D Scene Perception         | [Paper](https://arxiv.org/abs/2403.06974)     | [Code](https://github.com/xuxw98/Online3D)      | N/A           |\n| 80    | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670)     | [Code](https://github.com/hustvl/Symphonies)    | N/A           |\n| 81    | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145)     | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A           |\n| 82    | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | [Paper](https://arxiv.org/abs/2403.07535)     | [Code](https://github.com/Junda24/AFNet)        | N/A           |\n| 83    | Traffic Scene Parsing through the TSP6K Dataset              | [Paper](https://arxiv.org/pdf/2303.02835.pdf) | [Code](https://github.com/PengtaoJiang/TSP6K)   | N/A           |\n\n#### 3D Point Cloud\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 84    | N/A         | N/A        | N/A  | N/A           |\n\n#### 3D Object Detection\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |\n| 85    | PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection | [Paper](https://arxiv.org/abs/2312.08371) | [Code](https://github.com/kuanchihhuang/PTT) | N/A           |\n| 86    | UniMODE: Unified Monocular 3D Object Detection               | [Paper](https://arxiv.org/abs/2402.18573) | N/A                                          | N/A           |\n\n#### 3D Semantic Segmentation\n\n| Index | Paper 
Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 87    | N/A         | N/A        | N/A  | N/A           |\n\n#### Image Editing\n\n| Index | Paper Title                                       | Paper Link                                | Code                                                  | Official Repo                                            |\n| ----- | ------------------------------------------------- | ----------------------------------------- | ----------------------------------------------------- | -------------------------------------------------------- |\n| 88    | Edit One for All: Interactive Batch Image Editing | [Paper](https://arxiv.org/abs/2401.10219) | [Code](https://github.com/thaoshibe/edit-one-for-all) | [Homepage](https://thaoshibe.github.io/edit-one-for-all) |\n\n#### Video Editing\n\n| Index | Paper Title                                                  | Paper Link                                | Code | Official Repo                         |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ---- | ------------------------------------- |\n| 89    | MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers | [Paper](https://arxiv.org/abs/2312.12468) | N/A  | [Homepage](https://maskint.github.io) |\n\n#### Low-level Vision\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                     | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------- | ------------- |\n| 90    | Residual Denoising Diffusion Models                          | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A           |\n| 91    | Boosting Image Restoration via 
Priors from Pre-trained Models | [Paper](https://arxiv.org/abs/2403.06793) | N/A                                      | N/A           |\n\n#### Super-Resolution\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |\n| 92    | SeD: Semantic-Aware Discriminator for Image Super-Resolution | [Paper](https://arxiv.org/abs/2402.19387) | [Code](https://github.com/lbc12345/SeD)      | N/A           |\n| 93    | APISR: Anime Production Inspired Real-World Anime Super-Resolution | [Paper](https://arxiv.org/abs/2403.01598) | [Code](https://github.com/Kiteretsu77/APISR) | N/A           |\n\n#### Denoising\n\n| Index | Paper Title | Paper Link | Code | Official Repo |\n| ----- | ----------- | ---------- | ---- | ------------- |\n| 94    | N/A         | N/A        | N/A  | N/A           |\n\n#### 3D Human Pose Estimation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                          | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |\n| 95    | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | [Paper](https://arxiv.org/abs/2311.12028) | [Code](https://github.com/NationalGAILab/HoT) | N/A           |\n\n#### Image Generation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                     | Official Repo                                                |\n| ----- | 
------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------ |\n| 96    | InstanceDiffusion: Instance-level Control for Image Generation | [Paper](https://arxiv.org/abs/2402.03290) | [Code](https://github.com/frank-xwang/InstanceDiffusion) | [Homepage](https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/) |\n| 97    | ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations | [Paper](https://arxiv.org/abs/2312.04655) | [Code](https://github.com/eclipse-t2i/eclipse-inference) | [Homepage](https://eclipse-t2i.vercel.app/)                  |\n| 98    | Instruct-Imagen: Image Generation with Multi-modal Instruction | [Paper](https://arxiv.org/abs/2401.01952) | N/A                                                      | N/A                                                          |\n| 99    | Residual Denoising Diffusion Models                          | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM)                 | N/A                                                          |\n| 100   | UniGS: Unified Representation for Image Generation and Segmentation | [Paper](https://arxiv.org/abs/2312.01985) | N/A                                                      | N/A                                                          |\n| 101   | Multi-Instance Generation Controller for Text-to-Image Synthesis | [Paper](https://arxiv.org/abs/2402.05408) | [Code](https://github.com/limuloo/migc)                  | N/A                                                          |\n| 102   | SVGDreamer: Text Guided SVG Generation with Diffusion Model  | [Paper](https://arxiv.org/abs/2312.16476) | [Code](https://ximinng.github.io/SVGDreamer-project/)    | N/A                                                          |\n| 103   | InteractDiffusion: Interaction-Control for 
Text-to-Image Diffusion Model | [Paper](https://arxiv.org/abs/2312.05849) | [Code](https://github.com/jiuntian/interactdiffusion)    | N/A                                                          |\n| 104   | Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following | [Paper](https://arxiv.org/abs/2311.17002) | [Code](https://github.com/ali-vilab/Ranni)               | N/A                                                          |\n\n#### Video Generation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo                                             |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | --------------------------------------------------------- |\n| 105   | Vlogger: Make Your Dream A Vlog                              | [Paper](https://arxiv.org/abs/2401.09414) | [Code](https://github.com/Vchitect/Vlogger)                  | N/A                                                       |\n| 106   | VBench: Comprehensive Benchmark Suite for Video Generative Models | [Paper](https://arxiv.org/abs/2311.17982) | [Code](https://github.com/Vchitect/VBench)                   | [Homepage](https://vchitect.github.io/VBench-project/)    |\n| 107   | VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models | [Paper](https://arxiv.org/abs/2312.00845) | [Code](https://github.com/HyeonHo99/Video-Motion-Customization) | [Homepage](https://video-motion-customization.github.io/) |\n\n#### 3D Generation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                      | Official Repo                                           |\n| ----- | 
------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------- |\n| 108   | CityDreamer: Compositional Generative Model of Unbounded 3D Cities | [Paper](https://arxiv.org/abs/2309.00610) | [Code](https://github.com/hzxie/city-dreamer)             | [Homepage](https://haozhexie.com/project/city-dreamer/) |\n| 109   | LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching | [Paper](https://arxiv.org/abs/2311.11284) | [Code](https://github.com/EnVision-Research/LucidDreamer) | N/A                                                     |\n\n#### Video Understanding\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 110   | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | [Paper](https://arxiv.org/abs/2311.17005) | [Code](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2) | N/A           |\n\n#### Knowledge Distillation\n\n| Index | Paper Title                                          | Paper Link                                | Code                                                         | Official Repo |\n| ----- | ---------------------------------------------------- | ----------------------------------------- | ------------------------------------------------------------ | ------------- |\n| 111   | Logit Standardization in Knowledge Distillation      | [Paper](https://arxiv.org/abs/2403.01427) | [Code](https://github.com/sunshangquan/logit-standardization-KD) | N/A           |\n| 112   | Efficient Dataset 
Distillation via Minimax Diffusion | [Paper](https://arxiv.org/abs/2311.15529) | [Code](https://github.com/vimar-gu/MinimaxDiffusion)         | N/A           |\n\n#### Stereo Matching\n\n| Index | Paper Title                                    | Paper Link                                | Code                                       | Official Repo |\n| ----- | ---------------------------------------------- | ----------------------------------------- | ------------------------------------------ | ------------- |\n| 113   | Neural Markov Random Field for Stereo Matching | [Paper](https://arxiv.org/abs/2403.11193) | [Code](https://github.com/aeolusguan/NMRF) | N/A           |\n\n#### Scene Graph Generation\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                           | Official Repo                                      |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------------- | -------------------------------------------------- |\n| 114   | HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation | [Paper](https://arxiv.org/abs/2403.12033) | [Code](https://github.com/zhangce01/HiKER-SGG) | [Homepage](https://zhangce01.github.io/HiKER-SGG/) |\n\n#### Video Quality Assessment\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo                                         |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------- |\n| 115   | KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos | 
[Paper](https://arxiv.org/abs/2402.07220) | [Code](https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024) | [Homepage](https://lixinustc.github.io/projects/KVQ/) |\n\n#### Datasets\n\n| Index | Paper Title                                                  | Paper Link                                    | Code                                            | Official Repo |\n| ----- | ------------------------------------------------------------ | --------------------------------------------- | ----------------------------------------------- | ------------- |\n| 116   | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145)     | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A           |\n| 117   | Traffic Scene Parsing through the TSP6K Dataset              | [Paper](https://arxiv.org/pdf/2303.02835.pdf) | [Code](https://github.com/PengtaoJiang/TSP6K)   | N/A           |\n\n#### Others\n\n| Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo                                                |\n| ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| 118   | Object Recognition as Next Token Prediction                  | [Paper](https://arxiv.org/abs/2312.02142) | [Code](https://github.com/kaiyuyue/nxtp)                     | N/A                                                          |\n| 119   | ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks | [Paper](https://arxiv.org/abs/2306.14525) | [Code](https://parameternet.github.io/)                      | N/A                                                          |\n| 120   | Seamless 
Human Motion Composition with Blended Positional Encodings | [Paper](https://arxiv.org/abs/2402.15509) | [Code](https://github.com/BarqueroGerman/FlowMDM)            | N/A                                                          |\n| 121   | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | [Paper](https://arxiv.org/abs/2311.18651) | [Code](https://github.com/Open3DA/LL3DA)                     | [Homepage](https://ll3da.github.io/)                         |\n| 122   | CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update | [Paper](https://arxiv.org/abs/2312.10908) | N/A                                                          | [Homepage](https://clova-tool.github.io/)                    |\n| 123   | MoMask: Generative Masked Modeling of 3D Human Motions       | [Paper](https://arxiv.org/abs/2312.00063) | [Code](https://github.com/EricGuo5513/momask-codes)          | N/A                                                          |\n| 124   | Amodal Ground Truth and Completion in the Wild               | [Paper](https://arxiv.org/abs/2312.17247) | [Code](https://github.com/Championchess/Amodal-Completion-in-the-Wild) | [Homepage](https://www.robots.ox.ac.uk/~vgg/research/amodal/) |\n| 125   | Improved Visual Grounding through Self-Consistent Explanations | [Paper](https://arxiv.org/abs/2312.04554) | [Code](https://github.com/uvavision/SelfEQ)                  | N/A                                                          |\n| 126   | ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | [Paper](https://arxiv.org/abs/2403.18775) | [Code](https://github.com/chenshuang-zhang/imagenet_d)       | [Homepage](https://chenshuang-zhang.github.io/imagenet_d/)   |\n| 127   | Learning from Synthetic Human Group Activities               | [Paper](https://arxiv.org/abs/2306.16772) | [Code](https://github.com/cjerry1243/M3Act)                  | [Homepage](https://cjerry1243.github.io/M3Act/)     
         |\n| 128   | A Cross-Subject Brain Decoding Framework                     | [Paper](https://arxiv.org/abs/2404.07850) | [Code](https://github.com/littlepure2333/MindBridge)         | [Homepage](https://littlepure2333.github.io/MindBridge/)     |\n| 129   | Multi-Task Dense Prediction via Mixture of Low-Rank Experts  | [Paper](https://arxiv.org/abs/2403.17749) | [Code](https://github.com/YuqiYang213/MLoRE)                 | N/A                                                          |\n| 130   | Contrastive Mean-Shift Learning for Generalized Category Discovery | [Paper](https://arxiv.org/abs/2404.09451) | [Code](https://github.com/sua-choi/CMS)                      | [Homepage](https://postech-cvlab.github.io/cms/)             |\n\n#### Thank you for Reading","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashishpatel26%2Fcvpr2024","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashishpatel26%2Fcvpr2024","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashishpatel26%2Fcvpr2024/lists"}