{"id":13935272,"url":"https://github.com/52CV/CVPR-2021-Papers","last_synced_at":"2025-07-19T20:31:39.109Z","repository":{"id":38267573,"uuid":"343980863","full_name":"52CV/CVPR-2021-Papers","owner":"52CV","description":null,"archived":false,"fork":false,"pushed_at":"2022-04-11T05:50:36.000Z","size":2006,"stargazers_count":2546,"open_issues_count":0,"forks_count":315,"subscribers_count":65,"default_branch":"main","last_synced_at":"2025-03-26T22:22:07.781Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/52CV.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-03T02:46:53.000Z","updated_at":"2025-03-26T01:31:01.000Z","dependencies_parsed_at":"2022-08-03T07:30:16.629Z","dependency_job_id":null,"html_url":"https://github.com/52CV/CVPR-2021-Papers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/52CV/CVPR-2021-Papers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2021-Papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2021-Papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2021-Papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2021-Papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/52CV","download_url":"https://codeload.github.com/52CV/CVPR-2021-Papers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/52CV%2FCVPR-2021-Pap
ers/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266007397,"owners_count":23863529,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-07T23:01:32.232Z","updated_at":"2025-07-19T20:31:39.083Z","avatar_url":"https://github.com/52CV.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# CVPR 2021: Latest Information and Accepted Papers/Code (continuously updated)\n\n\nOfficial website: http://cvpr2021.thecvf.com\u003cbr\u003e\nConference dates: June 19-25, 2021\u003cbr\u003e\nAcceptance announcement date: February 28, 2021\u003cbr\u003e\n\nAccepted paper IDs:\u003cbr\u003e\n\n* [CVPR 2021 accepted paper list: 27% acceptance rate](https://zhuanlan.zhihu.com/p/353686917)\n\n# :exclamation::exclamation::exclamation:🌟🌟🌟 All CVPR 2021 accepted papers have been published, 1660 in total. To download them, reply “CVPR2021” to the official account 【我爱计算机视觉】.\n\n# :exclamation::exclamation::exclamation:🌟🌟🌟 All papers have been roughly categorized; see the lists below.\n\n### :exclamation::exclamation::exclamation:Note: a finer-grained categorization of the papers will be published on the official account 【OpenCV中文网】.\n\n## Survey papers from past years, by category ↘️[CV-Surveys](https://github.com/52CV/CV-Surveys) (under construction)\n\n## 2022 papers by category\n↘️[CVPR-2022-Papers](https://github.com/52CV/CVPR-2022-Papers)\n↘️[WACV-2022-Papers](https://github.com/52CV/WACV-2022-Papers)\n\n## 2021 papers by category\n↘️[ICCV-2021-Papers](https://github.com/52CV/ICCV-2021-Papers)\n↘️[CVPR-2021-Papers](https://github.com/52CV/CVPR-2021-Papers)\n\n## 2020 papers by category\n↘️[CVPR-2020-Papers](https://github.com/52CV/CVPR-2020-Papers)\n↘️[ECCV-2020-Papers](https://github.com/52CV/ECCV-2020-Papers)\n\n# Contents\n\n|:dog:|:mouse:|:hamster:|:tiger:|\n|------|------|------|------|\n|[73.Object Re-identification(物体重识别)](#73)|[72.Gaze 
Estimation(视线估计)](#72)|[71.Image-to-Image Translation(图像到图像翻译)](#71)|[70.NLP(自然语言处理)](#70)|[69.Transfer learning(迁移学习)](#69)|\n|[68.Crowd Counting(计数)](#68)|[67.Defect Detection(缺陷检测)](#67)|[66.Optical Flow Estimation(光流估计)](#66)|[65.Style Transfer(风格迁移)](#65)\n|[64.Speech processing(语音处理)](#64)|[63.Image Processing(图像处理)](#63)|[62.Free-Hand Sketches(手绘草图识别)](#62)|[61.算法](#61)|\n|[60. SLAM/AR/机器人](#60)|[59.深度学习模型](#59)|[58.Metric Learning(度量学习/相似度学习)](#58)|[57.Sign Language Recognition(手语识别)](#57)|\n|[56.Computational Photography(光学、几何、光场成像、计算摄影)](#56)|[55.Graph Matching(图匹配)](#55)|[54.Emotion Perception(情绪感知/情感预测)](#54)|[53.Dataset(数据集)](#53)|\n|[52. Image Generation/Synthesis(图像生成)](#52)|[51.Contrastive Learning(对比学习)](#51)|[50.OCR](#50)|[49.Adversarial Learning(对抗学习)](#49)|\n|[48.Image Representation(图像表示)](#48)|[47.Vision-Language(视觉语言)](#47)|[46.Human-Object Interaction(人物交互)](#46)|[45.Camera Localization(相机定位)](#45)|\n|[44. Image/video Captioning(图像/视频字幕)](#44)|[43.Active Learning(主动学习)](#43)|[42.Scene Flow Estimation(场景流估计)](#42)|[41. 
Representation Learning(表示学习（图像+字幕）)](#41)|\n|[40.Superpixel (超像素)](#40)|[39.Debiasing(去偏见)](#39)|[38.Class-Incremental learning(类增量学习)](#38)|[37.Continual Learning(持续学习)](#37)|\n|[36.Action Detection and Recognition(动作检测与识别)](#36)|[35.Image Clustering(图像聚类) ](#35)|[34.Image/Fine-Grained Classification(图像分类/细粒度分类)](#34)|[33.6D Pose Estimation(6D位姿估计)](#33)|\n|[32.View Synthesis(视图合成)](#32)|[31.Open-Set Recognition(开放集识别)](#31)|[30.Neural rendering(神经渲染)](#30)|[29.Human Pose Estimation(人体姿态估计)](#29)|\n|[28.Dense prediction(密集预测)](#28)|[27.Semantic Line Detection(语义线检测)](#27)|[26.Video Processing(视频相关技术)](#26)|[25.3D(三维视觉)](#25)|\n|[24.Reinforcement Learning(强化学习)](#24)|[23.Autonomous Driving(自动驾驶)](#23)|[22.Medical Imaging(医学影像)](#22)|[21.Transformer/Self-attention](#21)|\n|[20.Person Re-Identification(人员重识别)](#20)|[19.Quantization/Pruning/Knowledge Distillation/Model Compression(量化、剪枝、蒸馏、模型压缩/扩展与优化)](#19)|[18.Aeria/Drones/Satellite/RS Image(航空影像/无人机)](#18)|[17.Super-Resolution(超分辨率)](#17)|\n|[16.Visual Question Answering(视觉问答)](#16)|[15.GAN](#15)|[14.Few-Shot/Zero-Shot Learning,Domain Generalization/Adaptation(小/零样本学习，域适应，域泛化)](#14)|[13.Image/Video Retrieval(图像/视频检索)](#13)|\n|[12.Image Quality Assessment(图像质量评估)](#12)|[11. 
Face(人脸技术)](#11)|[10.Neural Architecture Search(神经架构搜索)](#10)|[9.Object Tracking(目标跟踪)](#9)|\n|[8.Image Segmentation(图像分割)](#8)|[7.Object Detection(目标检测)](#7)|[6.Data Augmentation(数据增广)](#6)|[5.Anomaly Detection(异常检测)](#5)|\n|[4.Weakly Supervised/Semi-Supervised/Self-supervised/Unsupervised Learning(自/半/弱监督学习)](#4)|[3.Point Cloud(点云)](#3)|[2.Graph Neural Networks(图卷积网络GNN)](#2)|[1.Unknown(未分类)](#1)|\n\n\n\u003ca name=\"74\"/\u003e\n\n## 74.Place Recognition(位置识别)\n  * [SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud Based Place Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Xia_SOE-Net_A_Self-Attention_and_Orientation_Encoding_Network_for_Point_Cloud_CVPR_2021_paper.pdf)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/Yan-Xia/SOE-Net)\n\n\u003ca name=\"73\"/\u003e\n\n## 73.Object Re-identification(物体重识别)\n  * [Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification](https://arxiv.org/abs/2106.06133)\n\n\u003ca name=\"72\"/\u003e\n\n## 72.Gaze Estimation(视线估计)\n* [Weakly-Supervised Physically Unconstrained Gaze Estimation](https://arxiv.org/abs/2105.09803)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/NVlabs/weakly-supervised-gaze)\n* Gaze target detection\n  * [Dual Attention Guided Gaze Target Detection in the Wild](https://openaccess.thecvf.com/content/CVPR2021/papers/Fang_Dual_Attention_Guided_Gaze_Target_Detection_in_the_Wild_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/Crystal2333/DAM)\n \n\u003ca name=\"71\"/\u003e\n\n## 71.Image-to-Image Translation(图像到图像翻译)\n* [High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network](https://arxiv.org/abs/2105.09188)\u003cbr\u003e:star:[code](https://github.com/csjliang/LPTN)\n* [CoCosNet v2: Full-Resolution Correspondence Learning for Image 
Translation](https://arxiv.org/abs/2012.02047)\u003cbr\u003e:open_mouth:oral:house:[project](https://tmux.top/publication/geosim/)\u003cbr\u003e解读：[CoCosNet v2解锁“高配版”图像翻译](https://mp.weixin.qq.com/s/UIxdBXGN7sO8m01Q83PLew)\n* [Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation](https://arxiv.org/abs/2106.09016)\n* [Saliency-Guided Image Translation](https://openaccess.thecvf.com/content/CVPR2021/papers/Jiang_Saliency-Guided_Image_Translation_CVPR_2021_paper.pdf)\n* [Not Just Compete, but Collaborate: Local Image-to-Image Translation via Cooperative Mask Prediction](https://openaccess.thecvf.com/content/CVPR2021/papers/Kim_Not_Just_Compete_but_Collaborate_Local_Image-to-Image_Translation_via_Cooperative_CVPR_2021_paper.pdf) \n* [Unpaired Image-to-Image Translation via Latent Energy Transport](https://arxiv.org/abs/2012.00649)\u003cbr\u003e:star:[code](https://github.com/YangNaruto/latent-energy-transport)\n* 图像翻译\n  * [Unbalanced Feature Transport for Exemplar-Based Image Translation](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhan_Unbalanced_Feature_Transport_for_Exemplar-Based_Image_Translation_CVPR_2021_paper.pdf)\n  * [The Spatially-Correlative Loss for Various Image Translation Tasks](https://openaccess.thecvf.com/content/CVPR2021/papers/Zheng_The_Spatially-Correlative_Loss_for_Various_Image_Translation_Tasks_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/lyndonzheng/F-LSeSim):house:[project](http://www.chuanxiaz.com/publication/flsesim/):tv:[video](https://www.youtube.com/watch?v=pu6PT1om2r0)\n\n\n\u003ca name=\"70\"/\u003e\n\n## 70.NLP(自然语言处理)\n\n  * [Learning Graphs for Knowledge Transfer With Limited 
Labels](https://openaccess.thecvf.com/content/CVPR2021/papers/Ghosh_Learning_Graphs_for_Knowledge_Transfer_With_Limited_Labels_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/pallabig/LearningGraphsForGCN):house:[project](https://pallabig.github.io/LearningGraphsForGCN/)\n\n\n\u003ca name=\"69\"/\u003e\n\n## 69.Transfer learning(迁移学习)\n* Domain transfer\n  * [Visualizing Adapted Knowledge in Domain Transfer](https://arxiv.org/abs/2104.10602)\u003cbr\u003e:star:[code](https://github.com/hou-yz/DA_visualization)\n\n\n\u003ca name=\"68\"/\u003e\n\n## 68.Crowd Counting(计数)\n  * [Learning To Count Everything](https://arxiv.org/abs/2104.08391)\u003cbr\u003e:star:[code](https://github.com/cvlab-stonybrook)\n\n\u003ca name=\"67\"/\u003e\n\n## 67.Defect Detection(缺陷检测)\n  * [CutPaste: Self-Supervised Learning for Anomaly Detection and Localization](https://arxiv.org/abs/2104.04015)\n\n\u003ca name=\"66\"/\u003e\n\n## 66.Optical Flow Estimation(光流估计)\n* [UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning](https://arxiv.org/abs/2012.00212)\u003cbr\u003e:star:[code](https://github.com/coolbeam/UPFlow_pytorch)\u003cbr\u003eBrief notes: [8](https://mp.weixin.qq.com/s/lL1cz_L523TSdYJFfHA2lQ)\n* [Learning Optical Flow from a Few Matches](https://arxiv.org/abs/2104.02166)\u003cbr\u003e:star:[code](https://github.com/zacjiang/scv)\n* [Learning optical flow from still images](https://arxiv.org/abs/2104.03965)\u003cbr\u003e:star:[code](https://github.com/mattpoggi/depthstillation):house:[project](https://mattpoggi.github.io/projects/cvpr2021aleotti/)\n* [AutoFlow: Learning a Better Training Set for Optical Flow](https://arxiv.org/abs/2104.14544)\u003cbr\u003e:house:[project](https://autoflow-google.github.io/)\u003cbr\u003eAutoFlow (CVPR 2021 oral): the authors propose a data rendering method designed specifically for training optical flow models; PWC-Net and RAFT trained on the resulting data reach state-of-the-art results. Code and data will be open-sourced.\n\n\u003ca name=\"65\"/\u003e\n\n## 65.Style 
Transfer(风格迁移)\n* [Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes](https://arxiv.org/abs/2103.17185)\u003cbr\u003e:star:[code](https://github.com/CompVis/brushstroke-parameterized-style-transfer)\n* [ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows](https://arxiv.org/abs/2103.16877)\u003cbr\u003e:star:[code](https://github.com/pkuanjie/ArtFlow) \n* [Lipstick ain't enough: Beyond Color Matching for In-the-Wild Makeup Transfer](https://arxiv.org/abs/2104.01867)\u003cbr\u003e:star:[code](https://github.com/VinAIResearch/CPM)\n* [Rethinking and Improving the Robustness of Image Style Transfer](https://arxiv.org/abs/2104.05623)\u003cbr\u003e:open_mouth:oral\u003cbr\u003e解读：[CVPR2021 最佳论文候选—提高图像风格迁移的鲁棒性](https://mp.weixin.qq.com/s/OMu941IynGtY9GU8dh4Icg)\n* [Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer](https://arxiv.org/abs/2104.05376)\u003cbr\u003e:star:[code](https://github.com/PaddlePaddle/PaddleGAN/)\n* [Style-Aware Normalized Loss for Improving Arbitrary Style Transfer](https://arxiv.org/abs/2104.10064)\u003cbr\u003e:open_mouth:oral\n* [In the Light of Feature Distributions: Moment Matching for Neural Style Transfer](https://arxiv.org/abs/2103.07208)\u003cbr\u003e:star:[code](https://github.com/D1noFuzi/cmd_styletransfer):house:[project](https://cmdnst.github.io/)\n* [ArtCoder: An End-to-End Method for Generating Scanning-Robust Stylized QR Codes](https://openaccess.thecvf.com/content/CVPR2021/papers/Su_ArtCoder_An_End-to-End_Method_for_Generating_Scanning-Robust_Stylized_QR_Codes_CVPR_2021_paper.pdf)\n* [Adaptive Convolutions for Structure-Aware Style Transfer](https://openaccess.thecvf.com/content/CVPR2021/papers/Chandran_Adaptive_Convolutions_for_Structure-Aware_Style_Transfer_CVPR_2021_paper.pdf) \n* [Learning To Warp for Style 
Transfer](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Learning_To_Warp_for_Style_Transfer_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/xch-liu/learning-warp-st)\n* [Single-Shot Freestyle Dance Reenactment](https://arxiv.org/abs/2012.01158)\n* [CT-Net: Complementary Transfering Network for Garment Transfer With Arbitrary Geometric Changes](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_CT-Net_Complementary_Transfering_Network_for_Garment_Transfer_With_Arbitrary_Geometric_CVPR_2021_paper.pdf)\n* [DualAST: Dual Style-Learning Networks for Artistic Style Transfer](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_DualAST_Dual_Style-Learning_Networks_for_Artistic_Style_Transfer_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/HalbertCH/DualAST)\n* [What Can Style Transfer and Paintings Do For Model Robustness?](https://arxiv.org/abs/2011.14477)\u003cbr\u003e:star:[code](https://github.com/hubertsgithub/style_painting_robustness)\n* 运动迁移\n  * [Autoregressive Stylized Motion Synthesis with Generative Flow](https://openaccess.thecvf.com/content/CVPR2021/papers/Wen_Autoregressive_Stylized_Motion_Synthesis_With_Generative_Flow_CVPR_2021_paper.pdf)\n\n  \n\u003ca name=\"64\"/\u003e\n\n## 64.Speech processing(语音处理)\n  \n* [Can audio-visual integration strengthen robustness under multimodal attacks?](https://arxiv.org/abs/2104.02000)\u003cbr\u003e:star:[code](https://github.com/YapengTian/AV-Robustness-CVPR21)\n* [Robust Audio-Visual Instance Discrimination](https://arxiv.org/abs/2103.15916)\n* 立体音频生成\n  * [Visually Informed Binaural Audio Generation without Binaural Audios](https://arxiv.org/abs/2104.06162)\u003cbr\u003e:star:[code](https://github.com/SheldonTsui/PseudoBinaural_CVPR2021):house:[project](https://sheldontsui.github.io/projects/PseudoBinaural):tv:[video](https://youtu.be/r-uC2MyAWQc)\n* 视听分离\n  * [Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech 
Separation](https://arxiv.org/abs/2104.02775)\u003cbr\u003e:house:[project](https://caffnet.github.io/):tv:[video](https://youtu.be/9R2qQ7dGTp8)\n  * [Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation](https://arxiv.org/abs/2104.02026)\u003cbr\u003e:star:[code](https://github.com/YapengTian/CCOL-CVPR21)\n  * [VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency](https://arxiv.org/abs/2101.03149)\u003cbr\u003e:star:[code](https://github.com/facebookresearch/VisualVoice):house:[project](http://vision.cs.utexas.edu/projects/VisualVoice/)\n* 声音-视频解析\n  * [Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing](https://openaccess.thecvf.com/content/CVPR2021/papers/Wu_Exploring_Heterogeneous_Clues_for_Weakly-Supervised_Audio-Visual_Video_Parsing_CVPR_2021_paper.pdf) \n* A-V\n  * [Positive Sample Propagation Along the Audio-Visual Event Line](https://arxiv.org/abs/2104.00239)\u003cbr\u003e:star:[code](https://github.com/jasongief/PSP_CVPR_2021)\n* 语音人脸关联\n  * [Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association](https://arxiv.org/abs/2103.07293)\n\n\n\n\u003ca name=\"63\"/\u003e\n\n## 63.Image Processing(图像处理)\n* 图像信号处理\n  * [Invertible Image Signal Processing](https://arxiv.org/abs/2103.15061)\u003cbr\u003e:star:[code](https://github.com/yzxing87/Invertible-ISP):house:[project](https://yzxing87.github.io/InvISP/index.html)\n* 光谱重建\n  * [Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB](https://arxiv.org/abs/2103.14708)\u003cbr\u003e:open_mouth:oral\n\n\u003ca name=\"62\"/\u003e\n\n## 62.Free-Hand Sketches(手绘草图识别)\n  * [Cloud2Curve: Generation and Vectorization of Parametric Sketches](https://arxiv.org/abs/2103.15536)\n\n\u003ca name=\"61\"/\u003e\n\n## 61.算法\n* 因果推理算法\n  * [ACRE: Abstract Causal REasoning Beyond 
Covariation](https://arxiv.org/abs/2103.14232)\u003cbr\u003e:star:[code](https://github.com/WellyZhang/ACRE):house:[project](http://wellyzhang.github.io/project/acre.html)\n* 抽象时空推理算法\n  * [Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution](https://arxiv.org/abs/2103.14230)\u003cbr\u003e:star:[code](https://github.com/WellyZhang/PrAE):house:[project](http://wellyzhang.github.io/project/prae.html)\n\n\n\u003ca name=\"60\"/\u003e\n\n## 60. SLAM/AR/机器人\n* [Tangent Space Backpropagation for 3D Transformation Groups](https://arxiv.org/abs/2103.12032)\u003cbr\u003e:star:[code](https://github.com/princeton-vl/lietorch)\n* 视觉里程计\n  * [Generalizing to the Open World: Deep Visual Odometry with Online Adaptation](https://arxiv.org/abs/2103.15279)\n* 机器人\n  * [Visual Room Rearrangement](https://arxiv.org/abs/2103.16544)\u003cbr\u003e:open_mouth:oral:house:[project](https://ai2thor.allenai.org/rearrangement/):tv:[video](https://www.youtube.com/watch?v=1APxaOC9U-A)\n  * [GATSBI: Generative Agent-centric Spatio-temporal Object Interaction](https://arxiv.org/abs/2104.04275)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/mch5048/gatsbi):tv:[video](https://www.youtube.com/watch?v=3urXFiU9kao)\n  * [DexYCB: A Benchmark for Capturing Hand Grasping of Objects](https://arxiv.org/abs/2104.04631)\u003cbr\u003e:star:[code](https://github.com/NVlabs/dex-ycb-toolkit):house:[project](https://dex-ycb.github.io/):tv:[video](https://youtu.be/Q4wyBaZeBw0)\n  * [ContactOpt: Optimizing Contact to Improve Grasps](https://arxiv.org/abs/2104.07267)\u003cbr\u003e:star:[code](https://github.com/facebookresearch/contactopt)\u003cbr\u003e机器人手抓取\n  * [ManipulaTHOR: A Framework for Visual Object Manipulation](https://arxiv.org/abs/2104.11213)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/allenai/manipulathor/):house:[project](https://ai2thor.allenai.org/manipulathor/):tv:[video](https://www.youtube.com/watch?v=nINZ52nlzX0)\n  * 视觉导航\n    * [Pushing it 
out of the Way: Interactive Visual Navigation](https://arxiv.org/abs/2104.14040)\u003cbr\u003e:house:[project](https://prior.allenai.org/projects/interactive-visual-navigation):tv:[video](https://www.youtube.com/watch?v=GvTI5XCMvPw)\n    * [Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation](https://arxiv.org/abs/2105.07593)\u003cbr\u003e:house:[project](https://sites.google.com/view/slamnet):tv:[video](https://youtu.be/dk1fdtf3fNI)\n* AR\n  * [Stay Positive: Non-Negative Image Synthesis for Augmented Reality](https://openaccess.thecvf.com/content/CVPR2021/papers/Luo_Stay_Positive_Non-Negative_Image_Synthesis_for_Augmented_Reality_CVPR_2021_paper.pdf)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/katieluo88/StayPositive)\n  * [HDR Environment Map Estimation for Real-Time Augmented Reality](https://arxiv.org/abs/2011.10687):tv:[video](https://docs-assets.developer.apple.com/ml-research/papers/hdr-environment-map.mp4)\n  * [NeuralHumanFVV: Real-Time Neural Volumetric Human Performance Rendering Using RGB Cameras](https://arxiv.org/abs/2103.07700)\n  * 虚拟试穿\n    * [VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization](https://arxiv.org/abs/2103.16874)\n    * [Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On](https://arxiv.org/abs/2105.06462)\u003cbr\u003e:house:[project](http://mslab.es/projects/SelfSupervisedGarmentCollisions/):tv:[video](https://youtu.be/9AnBNco6i2U)\n    * [Toward Accurate and Realistic Outfits Visualization with Attention to Details](https://arxiv.org/abs/2106.06593)\n    * [ANR: Articulated Neural Rendering for Virtual Avatars](https://arxiv.org/abs/2012.12890)\u003cbr\u003e:house:[project](https://anr-avatars.github.io/)\n    * [Parser-Free Virtual Try-On via Distilling Appearance 
Flows](https://openaccess.thecvf.com/content/CVPR2021/papers/Ge_Parser-Free_Virtual_Try-On_via_Distilling_Appearance_Flows_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/geyuying/PF-AFN)\n\n\u003ca name=\"59\"/\u003e\n\n## 59.Deep Learning Models(深度学习模型)\n* [Dynamic Slimmable Network](https://arxiv.org/abs/2103.13258)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/changlin31/DS-Net)\n* [Towards Evaluating and Training Verifiably Robust Neural Networks](https://arxiv.org/abs/2104.00447)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/ZhaoyangLyu/VerifiablyRobustNN)\n* [Activate or Not: Learning Customized Activation](https://arxiv.org/abs/2009.04759)\u003cbr\u003e:star:[code](https://github.com/nmaac/acon)\u003cbr\u003eBrief notes: [4](https://mp.weixin.qq.com/s/lL1cz_L523TSdYJFfHA2lQ)\u003cbr\u003eExplainer: [CVPR 2021 | ACON adaptive activation function: a new paradigm unifying ReLU and Swish](https://mp.weixin.qq.com/s/pbeA2w54MZ_-wXsmGoo3hg)\n* [DISCO: Dynamic and Invariant Sensitive Channel Obfuscation for Deep Neural Networks](https://arxiv.org/abs/2012.11025)\u003cbr\u003e:star:[code](https://github.com/splitlearning/InferenceBenchmark)\n* Capsule Network(胶囊网络)\n  * [Capsule Network Is Not More Robust Than Convolutional Network](https://openaccess.thecvf.com/content/CVPR2021/papers/Gu_Capsule_Network_Is_Not_More_Robust_Than_Convolutional_Network_CVPR_2021_paper.pdf)\n\n\u003ca name=\"58\"/\u003e\n\n## 58.Metric Learning(度量学习/相似度学习)\n* [Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales](https://arxiv.org/abs/2103.11781)\u003cbr\u003e:star:[code](https://github.com/SupetZYK/DynamicMetricLearning)\n* [Embedding Transfer with Label Relaxation for Improved Metric Learning](https://arxiv.org/abs/2103.14908)\n* [Noise-resistant Deep Metric Learning with Ranking-based Instance Selection](https://arxiv.org/abs/2103.16047)\u003cbr\u003e:star:[code](https://github.com/alibaba-edu/Ranking-based-Instance-Selection)\n* [Unsupervised 
Hyperbolic Metric Learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Yan_Unsupervised_Hyperbolic_Metric_Learning_CVPR_2021_paper.pdf)\n* [Deep Compositional Metric Learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Zheng_Deep_Compositional_Metric_Learning_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/wzzheng/DCML)\n* [SLADE: A Self-Training Framework for Distance Metric Learning](https://arxiv.org/abs/2011.10269)\n* [Asymmetric Metric Learning for Knowledge Transfer](https://arxiv.org/abs/2006.16331)\u003cbr\u003e:star:[code](https://github.com/budnikm/aml)\n* [Relative Order Analysis and Optimization for Unsupervised Deep Metric Learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Kan_Relative_Order_Analysis_and_Optimization_for_Unsupervised_Deep_Metric_Learning_CVPR_2021_paper.pdf)\n\n\n\u003ca name=\"57\"/\u003e\n\n## 57.Sign Language Recognition(手语识别)\n  * [Skeleton Based Sign Language Recognition Using Whole-body Keypoints](https://arxiv.org/abs/2103.08833)\u003cbr\u003e:star:[code](https://github.com/jackyjsy/CVPR21Chal-SLR)\n  * [Read and Attend: Temporal Localisation in Sign Language Videos](https://arxiv.org/abs/2103.16481)\u003cbr\u003e:house:[project](https://www.robots.ox.ac.uk/~vgg/research/bslattend/)\n  * [Fingerspelling Detection in American Sign Language](https://arxiv.org/abs/2104.01291)\n* 手语翻译\n  * [Improving Sign Language Translation with Monolingual Data by Sign Back-Translation](https://arxiv.org/abs/2105.12397)\u003cbr\u003e:sunflower:[dataset](http://home.ustc.edu.cn/~zhouh156/dataset/csl-daily/)\n\n\n\n\u003ca name=\"56\"/\u003e\n\n## 56.Computational Photography(光学、几何、光场成像、计算摄影)\n  * [Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging](https://arxiv.org/abs/2103.07152)\u003cbr\u003e:star:[code](https://github.com/TaoHuang95/DGSMP):house:[project](https://see.xidian.edu.cn/faculty/wsdong/Projects/DGSM-SCI.htm)\n  * [Mask-ToF: Learning Microlens Masks for Flying Pixel 
Correction in Time-of-Flight Imaging](https://arxiv.org/abs/2103.16693)\u003cbr\u003e:house:[project](https://light.princeton.edu/publication/mask-tof/)\n  * [Passive Inter-Photon Imaging](https://arxiv.org/abs/2104.00059)\u003cbr\u003e:open_mouth:oral\n  * [Shape and Material Capture at Home](https://arxiv.org/abs/2104.06397)\u003cbr\u003e:star:[code](https://github.com/dlichy/ShapeAndMaterial):house:[project](https://dlichy.github.io/ShapeAndMaterialAtHome/)\n  * [Event-based Synthetic Aperture Imaging with a Hybrid Network](https://arxiv.org/abs/2103.02376)\u003cbr\u003e分享会\n  * [High-Speed Image Reconstruction Through Short-Term Plasticity for Spiking Cameras](https://openaccess.thecvf.com/content/CVPR2021/papers/Zheng_High-Speed_Image_Reconstruction_Through_Short-Term_Plasticity_for_Spiking_Cameras_CVPR_2021_paper.pdf)\n  * [Leveraging the Availability of Two Cameras for Illuminant Estimation](https://openaccess.thecvf.com/content/CVPR2021/papers/Abdelhamed_Leveraging_the_Availability_of_Two_Cameras_for_Illuminant_Estimation_CVPR_2021_paper.pdf)\n* 相机姿势\n  * [Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty](https://arxiv.org/abs/2104.08278)\u003cbr\u003e:open_mouth:oral\n  * [Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis](https://arxiv.org/abs/2104.01508)\u003cbr\u003e:star:[code](https://github.com/AlvinZhuyx/camera_pose_representation)\n  * [Uncertainty-Aware Camera Pose Estimation From Points and Lines](https://openaccess.thecvf.com/content/CVPR2021/papers/Vakhitov_Uncertainty-Aware_Camera_Pose_Estimation_From_Points_and_Lines_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/alexandervakhitov/uncertain-pnp):house:[project](https://alexandervakhitov.github.io/uncertain-pnp/) \n  * [Neural Reprojection Error: Merging Feature Learning and Camera Pose 
Estimation](https://arxiv.org/abs/2103.07153)\u003cbr\u003e:star:[code](https://github.com/germain-hug/NRE):house:[project](https://www.hugogermain.com/nre)\n  * [Wide-Baseline Relative Camera Pose Estimation with Directional Learning](https://arxiv.org/abs/2106.03336)\n  * [Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias](https://arxiv.org/abs/2007.03887)\u003cbr\u003e:open_mouth:oral\n* 室内照明估计\n  * [Indoor Lighting Estimation Using an Event Camera](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Indoor_Lighting_Estimation_Using_an_Event_Camera_CVPR_2021_paper.pdf)\n* Phase Retrieval相位恢复算法\n  * [Physics-Based Iterative Projection Complex Neural Network for Phase Retrieval in Lensless Microscopy Imaging](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Physics-Based_Iterative_Projection_Complex_Neural_Network_for_Phase_Retrieval_in_CVPR_2021_paper.pdf)\n\n  \n\u003ca name=\"55\"/\u003e\n\n## 55.Graph Matching(图匹配)\n  * [Deep Graph Matching under Quadratic Constraint](https://arxiv.org/abs/2103.06643)\u003cbr\u003e:star:[code](https://github.com/Zerg-Overmind/QC-DGM)\n\n\u003ca name=\"54\"/\u003e\n\n## 54.Emotion Perception(情绪感知/情感预测)\n* [Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality](https://arxiv.org/abs/2103.06541)\u003cbr\u003e:house:[project](https://gamma.umd.edu/researchdirections/affectivecomputing/affect2mm/)\n* Human Multimodal Emotion Recognition(人类多模态情感识别)\n  * [Progressive Modality Reinforcement for Human Multimodal Emotion Recognition From Unaligned Multimodal Sequences](https://openaccess.thecvf.com/content/CVPR2021/papers/Lv_Progressive_Modality_Reinforcement_for_Human_Multimodal_Emotion_Recognition_From_Unaligned_CVPR_2021_paper.pdf)\n\n\u003ca name=\"53\"/\u003e\n\n## 53.Dataset(数据集)\n  * [Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual 
Concepts](https://arxiv.org/abs/2102.08981)\u003cbr\u003e:sunflower:[dataset](https://github.com/google-research-datasets/conceptual-12m)\n  * [Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark](https://arxiv.org/abs/2103.10895)\u003cbr\u003e:house:[project](https://vap.aau.dk/sewer-ml/)\n  * [Benchmarking Representation Learning for Natural World Image Collections](https://arxiv.org/abs/2103.16483)\u003cbr\u003e:sunflower:[dataset](https://github.com/visipedia/newt)\n  * [SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data](https://arxiv.org/abs/2105.08612)\u003cbr\u003e:open_mouth:oral:sunflower:[dataset](http://sailvos.web.illinois.edu/_site/index.html)\n  * [Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback](https://arxiv.org/abs/1905.12794)\u003cbr\u003e:sunflower:[dataset](https://github.com/XiaoxiaoGuo/fashion-iq)\n  * [Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges](https://arxiv.org/abs/2009.03137)\u003cbr\u003e:sunflower:[dataset](https://github.com/QingyongHu/SensatUrban):tv:[video](https://www.youtube.com/watch?v=IG0tTdqB3L8)\u003cbr\u003e\n* Portrait photo retouching dataset\n  * [PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency](https://arxiv.org/abs/2105.09180)\u003cbr\u003e:star:[code](https://github.com/csjliang/PPR10K)\n* Indoor scenes\n  * [OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets](https://openaccess.thecvf.com/content/CVPR2021/papers/Li_OpenRooms_An_Open_Framework_for_Photorealistic_Indoor_Scene_Datasets_CVPR_2021_paper.pdf)\u003cbr\u003e:open_mouth:oral:sunflower:[dataset](https://vilab-ucsd.github.io/ucsd-openrooms/dataset/):house:[project](https://vilab-ucsd.github.io/ucsd-openrooms/)\n* Visual art\n  * [ArtEmis: Affective Language for Visual 
Art](https://arxiv.org/abs/2101.07396)\u003cbr\u003e:house:[project](https://www.artemisdataset.org/)主页中包含全部：数据集、代码、视频等\n* UGC 视频质量评估\n  * [Rich Features for Perceptual Quality Assessment of UGC Videos](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_Rich_Features_for_Perceptual_Quality_Assessment_of_UGC_Videos_CVPR_2021_paper.pdf)\u003cbr\u003e:sunflower:[dataset](https://media.withyoutube.com/ugc-dataset)\n* 室内定位数据集\n  * [Large-Scale Localization Datasets in Crowded Indoor Spaces](https://arxiv.org/abs/2105.08941)\u003cbr\u003e:sunflower:[dataset](https://naverlabs.com/datasets)\n  * [Zillow Indoor Dataset: Annotated Floor Plans With 360deg Panoramas and 3D Room Layouts](https://openaccess.thecvf.com/content/CVPR2021/papers/Cruz_Zillow_Indoor_Dataset_Annotated_Floor_Plans_With_360deg_Panoramas_and_CVPR_2021_paper.pdf)\n* 数据集(人类意图研究)\n  * [Intentonomy: A Dataset and Study Towards Human Intent Understanding](https://arxiv.org/abs/2011.05558)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/kmnp/intentonomy)\n* 人脸识别数据集\n  * [Virtual Fully-Connected Layer: Training a Large-Scale Face Recognition Dataset With Limited Computational Resources](https://openaccess.thecvf.com/content/CVPR2021/papers/Li_Virtual_Fully-Connected_Layer_Training_a_Large-Scale_Face_Recognition_Dataset_With_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/pengyuLPY/Virtual-Fully-Connected-Layer)\n* 视觉属性预测数据集\n  * [Learning To Predict Visual Attributes in the Wild](https://openaccess.thecvf.com/content/CVPR2021/papers/Pham_Learning_To_Predict_Visual_Attributes_in_the_Wild_CVPR_2021_paper.pdf)\u003cbr\u003e:sunflower:[dataset](https://vawdataset.com/)\n* 数据集(Object-Centric Videos)\n  * [Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild With Pose Annotations](https://arxiv.org/abs/2012.09988)\u003cbr\u003e:sunflower:[dataset](https://github.com/google-research-datasets/Objectron)\n* 视频场景解析\n  * [VSPW: A Large-scale Dataset for Video Scene 
Parsing in the Wild](https://openaccess.thecvf.com/content/CVPR2021/papers/Miao_VSPW_A_Large-scale_Dataset_for_Video_Scene_Parsing_in_the_CVPR_2021_paper.pdf)\u003cbr\u003e:sunflower:[dataset](https://github.com/VSPW-dataset/VSPW_code):house:[project](https://www.vspwdataset.com/)\n* Dataset (sign language)\n  * [How2Sign: A Large-Scale Multimodal Dataset for Continuous American Sign Language](https://arxiv.org/abs/2008.08143)\u003cbr\u003e:sunflower:[dataset](https://how2sign.github.io/)\n\n\u003ca name=\"52\"/\u003e\n\n## 52. Image Generation/Synthesis\n\n- [Spatially-Adaptive Pixelwise Networks for Fast Image Translation](https://arxiv.org/abs/2012.02992)\u003cbr\u003e:house:[project](https://tamarott.github.io/ASAPNet_web/)\u003cbr\u003eUses hypernetworks and implicit functions for very fast image-to-image translation (18x faster than the baseline)\n- [Image Generators with Conditionally-Independent Pixel Synthesis](https://arxiv.org/abs/2011.13775)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/saic-mdal/CIPS)\n* [Im2Vec: Synthesizing Vector Graphics without Vector Supervision](https://arxiv.org/abs/2102.02798)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/preddy5/Im2Vec):house:[project](http://geometry.cs.ucl.ac.uk/projects/2021/im2vec/)\n* [Context-Aware Layout to Image Generation with Enhanced Object Appearance](https://arxiv.org/abs/2103.11897)\u003cbr\u003e:star:[code](https://github.com/wtliao/layout2img) \n* [Adversarial Generation of Continuous Images](https://arxiv.org/pdf/2011.12026.pdf)\u003cbr\u003e:star:[code](https://github.com/universome/inr-gan)\n* [StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis](https://arxiv.org/abs/2104.07098)\n* [IMAGINE: Image Synthesis by Image-Guided Model Inversion](https://arxiv.org/abs/2104.05895)\n* [SSN: Soft Shadow Network for Image Compositing](https://arxiv.org/abs/2007.08211)\n* [Mask-Embedded Discriminator With Region-Based Semantic Regularization for Semi-Supervised Class-Conditional Image 
Synthesis](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Mask-Embedded_Discriminator_With_Region-Based_Semantic_Regularization_for_Semi-Supervised_Class-Conditional_Image_CVPR_2021_paper.pdf)\n* [Learning Semantic Person Image Generation by Region-Adaptive Normalization](https://arxiv.org/abs/2104.06650)\u003cbr\u003e:star:[code](https://github.com/cszy98/SPGNet)\n* [MUST-GAN: Multi-Level Statistics Transfer for Self-Driven Person Image Generation](https://openaccess.thecvf.com/content/CVPR2021/papers/Ma_MUST-GAN_Multi-Level_Statistics_Transfer_for_Self-Driven_Person_Image_Generation_CVPR_2021_paper.pdf)\n* [Combining Semantic Guidance and Deep Reinforcement Learning for Generating Human Level Paintings](https://arxiv.org/abs/2011.12589)\u003cbr\u003e:star:[code](https://github.com/1jsingh/semantic-guidance)\n* [Diverse Semantic Image Synthesis via Probability Distribution Modeling](https://arxiv.org/abs/2103.06878)\u003cbr\u003e:star:[code](https://github.com/tzt101/INADE)\n* [Mol2Image: Improved Conditional Flow Models for Molecule to Image Synthesis](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Mol2Image_Improved_Conditional_Flow_Models_for_Molecule_to_Image_Synthesis_CVPR_2021_paper.pdf) \n\n \n\u003ca name=\"51\"/\u003e\n\n## 51.Contrastive Learning\n* [AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries](https://arxiv.org/abs/2011.08435)\u003cbr\u003e:star:[code](https://arxiv.org/abs/2011.08435)\u003cbr\u003eExplainer: [CVPR 2021 accepted paper: AdCo, adversarial contrastive learning](https://mp.weixin.qq.com/s/u7Lhzh8uYEEHfWiM32-4yQ)\n* [LAFEAT: Piercing Through Adversarial Defenses with Latent Features](https://arxiv.org/abs/2104.09284)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/lafeat/lafeat)\n* [Distilling Audio-Visual Knowledge by Compositional Contrastive Learning](https://arxiv.org/abs/2104.10955)\u003cbr\u003e:star:[code](https://github.com/yanbeic/CCL)\n* [Mining Better 
Samples for Contrastive Learning of Temporal Correspondence](https://openaccess.thecvf.com/content/CVPR2021/papers/Jeon_Mining_Better_Samples_for_Contrastive_Learning_of_Temporal_Correspondence_CVPR_2021_paper.pdf)\n* [Jo-SRC: A Contrastive Approach for Combating Noisy Labels](https://openaccess.thecvf.com/content/CVPR2021/papers/Yao_Jo-SRC_A_Contrastive_Approach_for_Combating_Noisy_Labels_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/NUST-Machine-Intelligence-Laboratory/Jo-SRC)\n* [Neighborhood Contrastive Learning for Novel Class Discovery](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhong_Neighborhood_Contrastive_Learning_for_Novel_Class_Discovery_CVPR_2021_paper.pdf) \n\n  \n  \n\u003ca name=\"50\"/\u003e\n\n## 50.OCR\n\n* [Fourier Contour Embedding for Arbitrary-Shaped Text Detection](https://arxiv.org/abs/2104.10442)\n* [Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter](https://arxiv.org/abs/2106.05920)\n* [Sequence-to-Sequence Contrastive Learning for Text Recognition](http://arxiv.org/abs/2012.10873)\n* [A Multiplexed Network for End-to-End, Multilingual OCR](https://arxiv.org/abs/2103.15992)\n* [TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption](https://arxiv.org/abs/2012.04638)\n* Scene text detection and recognition\n  * [What If We Only Use Real Datasets for Scene Text Recognition? 
Toward Scene Text Recognition With Fewer Labels](https://arxiv.org/abs/2103.04400)\u003cbr\u003e:star:[code](https://github.com/ku21fan/STR-Fewer-Labels)\n  * [Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/abs/2103.06495)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/FangShancheng/ABINet)\n  * [MOST: A Multi-Oriented Scene Text Detector with Localization Refinement](https://arxiv.org/abs/2104.01070)\n  * [Scene Text Retrieval via Joint Text Detection and Similarity Learning](https://arxiv.org/abs/2104.01552)\u003cbr\u003e:star:[code](https://github.com/lanfeng4659/STR-TDSL)\n  * [TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text](https://arxiv.org/abs/2105.05486)\u003cbr\u003e:house:[project](https://textvqa.org/textocr)\n  * [Progressive Contour Regression for Arbitrary-Shape Scene Text Detection](https://openaccess.thecvf.com/content/CVPR2021/papers/Dai_Progressive_Contour_Regression_for_Arbitrary-Shape_Scene_Text_Detection_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/dpengwen/PCR)\n  * [Dictionary-guided Scene Text Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Nguyen_Dictionary-Guided_Scene_Text_Recognition_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/VinAIResearch/dict-guided)\n  * [Primitive Representation Learning for Scene Text Recognition](https://arxiv.org/abs/2105.04286)\n* Handwritten text recognition\n  * [MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition](https://arxiv.org/abs/2104.01876)\n* Text segmentation\n  * [Rethinking Text Segmentation: A Novel Dataset and a Text-Specific Refinement Approach](https://arxiv.org/abs/2011.14021)\u003cbr\u003e:star:[code](https://github.com/SHI-Labs/Rethinking-Text-Segmentation)\n* Video text detection\n  * [Semantic-Aware Video Text Detection](https://openaccess.thecvf.com/content/CVPR2021/papers/Feng_Semantic-Aware_Video_Text_Detection_CVPR_2021_paper.pdf)\n* Text detection\n  
* [Self-Attention Based Text Knowledge Mining for Text Detection](https://openaccess.thecvf.com/content/CVPR2021/papers/Wan_Self-Attention_Based_Text_Knowledge_Mining_for_Text_Detection_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/CVI-SZU/STKM)\n\n\n\n\u003ca name=\"49\"/\u003e\n\n## 49.Adversarial Learning\n\n- [Simulating Unknown Target Models for Query-Efficient Black-box Attacks](https://arxiv.org/abs/2009.00960)\u003cbr\u003e:star:[code](https://github.com/machanic/MetaSimulator)\u003cbr\u003eBlack-box adversarial attack\n- [Delving into Data: Effectively Substitute Training for Black-box Attack](https://arxiv.org/abs/2104.12378)\u003cbr\u003eA black-box attack method based on efficiently training a substitute model\u003cbr\u003eExplainer: [8](https://mp.weixin.qq.com/s/yNDkHMhOIb76b4KcEhx4XQ)\n- [LiBRe: A Practical Bayesian Approach to Adversarial Detection](https://arxiv.org/abs/2103.14835)\u003cbr\u003e:star:[code](https://github.com/thudzj/ScalableBDL)\n* [Invisible Perturbations: Physical Adversarial Examples Exploiting the Rolling Shutter Effect](https://arxiv.org/abs/2011.13375)\n* [Enhancing the Transferability of Adversarial Attacks Through Variance Tuning](https://arxiv.org/abs/2103.15571)\u003cbr\u003e:star:[code](https://github.com/JHL-HUST/VT)\n* [Natural Adversarial Examples](https://arxiv.org/abs/1907.07174)\u003cbr\u003e:star:[code](https://github.com/hendrycks/natural-adv-examples)\n* [SurFree: A Fast Surrogate-Free Black-Box Attack](https://arxiv.org/abs/2011.12807)\u003cbr\u003e:star:[code](https://github.com/t-maho/SurFree)\n* [Regularizing Neural Networks via Adversarial Model Perturbation](https://arxiv.org/abs/2010.04925)\u003cbr\u003e:star:[code](https://github.com/hiyouga/AMP-Regularizer)\n* [Adversarial Imaging Pipelines](https://arxiv.org/abs/2102.03728)\n* [MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation](https://arxiv.org/abs/2005.03161)\n* [Universal Spectral Adversarial Attacks for Deformable Shapes](https://arxiv.org/abs/2104.03356)\n* [Adversarial Robustness 
Across Representation Spaces](https://arxiv.org/abs/2012.00802)\u003cbr\u003e:star:[code](https://github.com/tensorflow/neural-structured-learning/tree/master/research/multi_representation_adversary)\n* [Protecting Intellectual Property of Generative Adversarial Networks From Ambiguity Attacks](https://openaccess.thecvf.com/content/CVPR2021/papers/Ong_Protecting_Intellectual_Property_of_Generative_Adversarial_Networks_From_Ambiguity_Attacks_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/dingsheng-ong/ipr-gan)\n* [Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World](https://arxiv.org/abs/2103.01050)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/nlsde-safety-team/DualAttentionAttack)\n* [Learning Compositional Representation for 4D Captures with Neural ODE](https://arxiv.org/abs/2103.08271)\n* Adversarial attacks\n  * [Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink](https://arxiv.org/abs/2103.06504)\n\n\n\n\u003ca name=\"48\"/\u003e\n\n## 48.Image Representation\n\n- [Learning Continuous Image Representation with Local Implicit Image Function](https://arxiv.org/abs/2012.09161)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/yinboc/liif):house:[project](https://yinboc.github.io/liif/):tv:[video](https://youtu.be/6f2roieSY_8)\n\n\u003ca name=\"47\"/\u003e\n\n## 47.Vision-Language\n\n- [Structured Scene Memory for Vision-Language Navigation](https://arxiv.org/abs/2103.03454)\u003cbr\u003e:star:[code](https://github.com/HanqingWangAI/SSM-VLN)\n* [Kaleido-BERT: Vision-Language Pre-training on Fashion Domain](https://arxiv.org/abs/2103.16110)\u003cbr\u003e\n* [Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning](https://arxiv.org/abs/2104.03135)\u003cbr\u003e:star:[code](https://github.com/researchmm/soho)\n* [UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training](https://arxiv.org/abs/2104.00332)\n* [VinVL: 
Revisiting Visual Representations in Vision-Language Models](https://arxiv.org/abs/2101.00529)\u003cbr\u003e:star:[code](https://github.com/pzzhang/VinVL)\n* [Connecting What To Say With Where To Look by Modeling Human Attention Traces](https://arxiv.org/abs/2105.05964)\u003cbr\u003e:star:[code](https://github.com/facebookresearch/connect-caption-and-trace):house:[project](http://pages.cs.wisc.edu/~zihangm/connect_caption_trace)\n* [Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Adaptive_Cross-Modal_Prototypes_for_Cross-Domain_Visual-Language_Retrieval_CVPR_2021_paper.pdf) \n* [VLN BERT: A Recurrent Vision-and-Language BERT for Navigation](https://openaccess.thecvf.com/content/CVPR2021/papers/Hong_VLN_BERT_A_Recurrent_Vision-and-Language_BERT_for_Navigation_CVPR_2021_paper.pdf)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/YicongHong/Recurrent-VLN-BERT)\n* [Transitional Adaptation of Pretrained Models for Visual Storytelling](https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Transitional_Adaptation_of_Pretrained_Models_for_Visual_Storytelling_CVPR_2021_paper.pdf) \n* [Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation](https://arxiv.org/abs/2105.11541)\u003cbr\u003e:star:[code](https://github.com/amazon-research/read-up)\n* [Causal Attention for Vision-Language Tasks](https://arxiv.org/abs/2103.03493)\u003cbr\u003e:star:[code](https://github.com/yangxuntu/catt) \n \n\u003ca name=\"46\"/\u003e\n\n## 46.Human-Object Interaction\n\n- [Learning Asynchronous and Sparse Human-Object Interaction in Videos](https://arxiv.org/abs/2103.02758)\n- [QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information](https://arxiv.org/abs/2103.05399)\u003cbr\u003e:star:[code](https://github.com/hitachi-rd-cv/qpic)\n- [Reformulating HOI Detection as Adaptive Set 
Prediction](https://arxiv.org/abs/2103.05983)\u003cbr\u003e:star:[code](https://github.com/yoyomimi/AS-Net)\n* [Detecting Human-Object Interaction via Fabricated Compositional Learning](https://arxiv.org/abs/2103.08214)\u003cbr\u003e:star:[code](https://github.com/zhihou7/FCL)\n* [Affordance Transfer Learning for Human-Object Interaction Detection](https://arxiv.org/abs/2104.02867)\u003cbr\u003e:star:[code](https://github.com/zhihou7/HOI-CL)\n* [Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection](https://arxiv.org/abs/2104.05269)\u003cbr\u003e:star:[code](https://github.com/SherlockHolmes221/GGNet)\n* [Hierarchical Video Prediction Using Relational Layouts for Human-Object Interactions](https://openaccess.thecvf.com/content/CVPR2021/papers/Bodla_Hierarchical_Video_Prediction_Using_Relational_Layouts_for_Human-Object_Interactions_CVPR_2021_paper.pdf)\n\n\u003ca name=\"45\"/\u003e\n\n## 45.Camera Localization\n\n- [Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments](https://arxiv.org/abs/2012.04746)\u003cbr\u003e:open_mouth:oral\n- [Back to the Feature: Learning Robust Camera Localization from Pixels to Pose](https://arxiv.org/abs/2103.09213)\u003cbr\u003e:star:[code](https://github.com/cvg/pixloc)\n- [Learning Camera Localization via Dense Scene Matching](https://arxiv.org/abs/2103.16792)\u003cbr\u003e:star:[code](https://github.com/Tangshitao/Dense-Scene-Matching)\n- [Privacy Preserving Localization and Mapping From Uncalibrated Cameras](https://openaccess.thecvf.com/content/CVPR2021/papers/Geppert_Privacy_Preserving_Localization_and_Mapping_From_Uncalibrated_Cameras_CVPR_2021_paper.pdf)\n* Visual localization\n  * [VS-Net: Voting with Segmentation for Visual 
Localization](https://arxiv.org/abs/2105.10886)\u003cbr\u003e:star:[code](https://github.com/zju3dv/VS-Net):house:[project](https://drinkingcoder.github.io/publication/vs-net/):tv:[video](https://youtu.be/5WLEyyLdxAs)\n\n\u003ca name=\"44\"/\u003e\n\n## 44. Image/Video Captioning\n\n- [Scan2Cap: Context-aware Dense Captioning in RGB-D Scans](https://arxiv.org/abs/2012.02206)\u003cbr\u003e:star:[code](https://github.com/daveredrum/Scan2Cap):house:[project](https://daveredrum.github.io/Scan2Cap/):tv:[video](https://youtu.be/AgmIpDbwTCY)\n- [VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs](https://arxiv.org/abs/2101.12059)\u003cbr\u003eA multimodal framework for video captioning, video question answering, and video dialog\n- [Open-book Video Captioning with Retrieve-Copy-Generate Network](https://arxiv.org/abs/2103.05284)\n* Image captioning\n  * [Human-like Controllable Image Captioning with Verb-specific Semantic Roles](https://arxiv.org/abs/2103.12204)\u003cbr\u003e:star:[code](https://github.com/mad-red/VSR-guided-CIC)\n  * [Towards Accurate Text-based Image Captioning with Content Diversity Exploration](https://arxiv.org/abs/2105.03236)\u003cbr\u003e:star:[code](https://github.com/guanghuixu/AnchorCaptioner)\n  * [Image Change Captioning by Learning From an Auxiliary Task](https://openaccess.thecvf.com/content/CVPR2021/papers/Hosseinzadeh_Image_Change_Captioning_by_Learning_From_an_Auxiliary_Task_CVPR_2021_paper.pdf)\n  * [FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_FAIEr_Fidelity_and_Adequacy_Ensured_Image_Caption_Evaluation_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://vipl.ict.ac.cn/view_database.php?id=6)\n  * [Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Towards_Bridging_Event_Captioner_and_Sentence_Localizer_for_Weakly_Supervised_CVPR_2021_paper.pdf)\n  * [Improving OCR-Based Image 
Captioning by Incorporating Geometrical Relationship](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_Improving_OCR-Based_Image_Captioning_by_Incorporating_Geometrical_Relationship_CVPR_2021_paper.pdf)\n  \n\u003ca name=\"43\"/\u003e\n\n## 43.Active Learning\n\n- [Vab-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning](https://arxiv.org/abs/2003.11249)\n* [Task-Aware Variational Adversarial Active Learning](https://arxiv.org/abs/2002.04709)\u003cbr\u003e:star:[code](https://github.com/cubeyoung/TA-VAAL)\n\n\u003ca name=\"42\"/\u003e\n\n## 42.Scene Flow Estimation\n* Scene flow estimation\n  * [Self-Supervised Multi-Frame Monocular Scene Flow](https://arxiv.org/abs/2105.02216)\u003cbr\u003e:star:[code](https://github.com/visinf/multi-mono-sf)\n  * [HCRF-Flow: Scene Flow from Point Clouds with Continuous High-order CRFs and Position-aware Flow Embedding](https://arxiv.org/abs/2105.07751)\n  * [Self-Point-Flow: Self-Supervised Scene Flow Estimation from Point Clouds with Optimal Transport and Random Walk](https://arxiv.org/abs/2105.08248)\u003cbr\u003e:open_mouth:oral\n  * [FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation](https://arxiv.org/abs/2011.10147)\u003cbr\u003e:star:[code](https://github.com/yairkit/flowstep3d)\n  * [RAFT-3D: Scene Flow Using Rigid-Motion Embeddings](https://openaccess.thecvf.com/content/CVPR2021/papers/Teed_RAFT-3D_Scene_Flow_Using_Rigid-Motion_Embeddings_CVPR_2021_paper.pdf)\n\n\u003ca name=\"41\"/\u003e\n\n## 41. 
Representation Learning (images + captions)\n\n- [VirTex: Learning Visual Representations from Textual Annotations](https://arxiv.org/abs/2006.06666)\u003cbr\u003e:star:[code](https://github.com/kdexd/virtex)\n- [Exploring Simple Siamese Representation Learning](https://arxiv.org/abs/2011.10566)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/facebookresearch/simsiam)\n- [Representation Learning via Global Temporal Alignment and Cycle-Consistency](https://arxiv.org/abs/2105.05217)\u003cbr\u003e:star:[code](https://github.com/hadjisma/VideoAlignment)\n* [SelfDoc: Self-Supervised Document Representation Learning](https://arxiv.org/abs/2106.03331)\n* [CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models](https://arxiv.org/abs/2004.08697)\n* [Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders](https://arxiv.org/abs/2103.16046)\u003cbr\u003e:star:[code](https://github.com/junhocho/HGCAE)\n* [Boosting Video Representation Learning With Multi-Faceted Integration](https://openaccess.thecvf.com/content/CVPR2021/papers/Qiu_Boosting_Video_Representation_Learning_With_Multi-Faceted_Integration_CVPR_2021_paper.pdf)\n\n\u003ca name=\"40\"/\u003e\n\n## 40.Superpixel\n\n- [Learning the Superpixel in a Non-iterative and Lifelong Manner](https://arxiv.org/abs/2103.10681)\u003cbr\u003e:star:[code](https://github.com/zh460045050/LNSNet)\n\n\u003ca name=\"39\"/\u003e\n\n## 39.Debiasing\n\n- [Fair Attribute Classification through Latent Space De-biasing](https://arxiv.org/abs/2012.01469)\u003cbr\u003e:star:[code](https://github.com/princetonvisualai/gan-debiasing):house:[project](https://princetonvisualai.github.io/gan-debiasing/)\u003cbr\u003e\n- [Reducing Domain Gap by Reducing Style Bias](https://arxiv.org/abs/1910.11645)\u003cbr\u003e:star:[code](https://github.com/hyeonseobnam/sagnet)\n* Bias correction\n  * [EnD: Entangling and Disentangling deep representations for bias 
correction](https://arxiv.org/abs/2103.02023)\u003cbr\u003e:star:[code](https://github.com/EIDOSlab/entangling-disentangling-bias)\n\n\u003ca name=\"38\"/\u003e\n\n## 38.Class-Incremental Learning\n\n- [IIRC: Incremental Implicitly-Refined Classification](https://arxiv.org/abs/2012.12477)\u003cbr\u003e:house:[project](https://chandar-lab.github.io/IIRC/)\u003cbr\u003e\n- [Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning](https://arxiv.org/abs/2103.04059)\u003cbr\u003e:star:[code](https://github.com/ali-chr/Semantic-aware-Knowledge-Distillation-for-Few-ShotClass-Incremental-Learning)\n- [DER: Dynamically Expandable Representation for Class Incremental Learning](https://arxiv.org/abs/2103.16788)\u003cbr\u003e:star:[code](https://github.com/Rhyssiyan/DER-ClassIL.pytorch)\n- [Distilling Causal Effect of Data in Class-Incremental Learning](https://arxiv.org/abs/2103.01737)\u003cbr\u003e:star:[code](https://github.com/JoyHuYY1412/DDE_CIL)\n- [Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhu_Self-Promoted_Prototype_Refinement_for_Few-Shot_Class-Incremental_Learning_CVPR_2021_paper.pdf)\n* [Adaptive Aggregation Networks for Class-Incremental Learning](https://arxiv.org/abs/2010.05063)\u003cbr\u003e:star:[code](https://github.com/yaoyao-liu/class-incremental-learning)\n* Incremental learning\n  * [Few-Shot Incremental Learning with Continually Evolved Classifiers](https://arxiv.org/abs/2104.03047)\n  * [On Learning the Geodesic Path for Incremental Learning](https://arxiv.org/abs/2104.08572)\u003cbr\u003e:star:[code](https://github.com/chrysts/geodesic_continual_learning)\n  * [Prototype Augmentation and Self-Supervision for Incremental 
Learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhu_Prototype_Augmentation_and_Self-Supervision_for_Incremental_Learning_CVPR_2021_paper.pdf)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/Impression2805/CVPR21_PASS)\n  * [Incremental Learning via Rate Reduction](https://arxiv.org/abs/2011.14593)\n\n\u003ca name=\"37\"/\u003e\n\n## 37. Continual Learning\n\n- [Training Networks in Null Space of Feature Covariance for Continual Learning](https://arxiv.org/abs/2103.07113)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/ShipengWang/Adam-NSCL)\n* [Efficient Feature Transformations for Discriminative and Generative Continual Learning](https://arxiv.org/abs/2103.13558)\n* [Rainbow Memory: Continual Learning with a Memory of Diverse Samples](https://arxiv.org/abs/2103.17230) \n* [Rectification-based Knowledge Retention for Continual Learning](https://arxiv.org/abs/2103.16597) \n* [Layerwise Optimization by Gradient Decomposition for Continual Learning](https://arxiv.org/abs/2105.07561)\n* [Continual Learning via Bit-Level Information Preserving](https://arxiv.org/abs/2105.04444)\u003cbr\u003e:star:[code](https://github.com/Yujun-Shi/BLIP)\n* [ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-Supervised Continual Learning](https://arxiv.org/abs/2101.00407)\n\n\u003ca name=\"36\"/\u003e\n\n## 36.Action Detection and Recognition\n\n- [Coarse-Fine Networks for Temporal Activity Detection in Videos](https://arxiv.org/abs/2103.01302)\u003cbr\u003e:star:[code](https://github.com/kkahatapitiya/Coarse-Fine-Networks)\n- [3D CNNs with Adaptive Temporal Feature Resolutions](https://arxiv.org/abs/2011.08652)\u003cbr\u003e:star:[code](https://github.com/SimilarityGuidedSampling/Similarity-Guided-Sampling):house:[project](https://similarityguidedsampling.github.io/)\n- [Understanding the Robustness of 
Skeleton-based Action Recognition under Adversarial Attack](https://arxiv.org/abs/2103.05347)\u003cbr\u003e:tv:[video](https://www.youtube.com/watch?v=DeMkN3efp9s)\n- [BASAR: Black-box Attack on Skeletal Action Recognition](https://arxiv.org/abs/2103.05266)\u003cbr\u003e:house:[project](http://drhewang.com/pages/AAHAR.html):tv:[video](https://www.youtube.com/watch?v=PjWgwnAkV8g)\u003cbr\u003eExplainer: [A new direction in adversarial attack and defense: action recognition algorithms are easy to attack!](https://mp.weixin.qq.com/s/AKxGfguKZK5QAT_k0CdbtQ)\n- [TDN: Temporal Difference Networks for Efficient Action Recognition](https://arxiv.org/abs/2012.10071)\u003cbr\u003e:star:[code](https://github.com/MCG-NJU/TDN)\n- [ACTION-Net: Multipath Excitation for Action Recognition](https://arxiv.org/abs/2103.07372)\u003cbr\u003e:star:[code](https://github.com/V-Sense/ACTION-Net)\u003cbr\u003eExplainer: [CVPR 2021 | The ACTION module: plug-and-play hybrid attention for action recognition](https://mp.weixin.qq.com/s/L2_lkhKbVhW8fjAaDdsyWQ)\u003cbr\u003eExplainer: [CVPR 2021 | The ACTION module: plug-and-play hybrid attention for strong temporal dependencies](https://mp.weixin.qq.com/s/tonyk649KzU1Y_c6p8isuQ)\n- [No frame left behind: Full Video Action Recognition](https://arxiv.org/abs/2103.15395)\n* [Recognizing Actions in Videos from Unseen Viewpoints](https://arxiv.org/abs/2103.16516)\n* [Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories](https://arxiv.org/abs/2104.01198)\n* [Motion Representations for Articulated Animation](https://arxiv.org/abs/2104.11280)\u003cbr\u003e:star:[code](https://github.com/snap-research/articulated-animation):house:[project](https://snap-research.github.io/articulated-animation/):tv:[video](https://www.youtube.com/watch?v=gpBYN8t8_yY)\n* [Home Action Genome: Cooperative Compositional Action Understanding](https://arxiv.org/abs/2105.05226)\n* [Anticipating human actions by correlating past with the future with Jaccard similarity measures](https://arxiv.org/abs/2105.12414)\n* [Graph-Based High-Order Relation Modeling for Long-Term Action 
Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhou_Graph-Based_High-Order_Relation_Modeling_for_Long-Term_Action_Recognition_CVPR_2021_paper.pdf)\n* [Representing Videos As Discriminative Sub-Graphs for Action Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Li_Representing_Videos_As_Discriminative_Sub-Graphs_for_Action_Recognition_CVPR_2021_paper.pdf)\n* [Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations](https://openaccess.thecvf.com/content/CVPR2021/papers/Li_Three_Birds_with_One_Stone_Multi-Task_Temporal_Action_Detection_via_CVPR_2021_paper.pdf)\n* [Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization](https://arxiv.org/abs/2012.01405)\u003cbr\u003e:star:[code](https://github.com/google-research/google-research/tree/master/poem) \n* [Spatio-temporal Contrastive Domain Adaptation for Action Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Song_Spatio-temporal_Contrastive_Domain_Adaptation_for_Action_Recognition_CVPR_2021_paper.pdf) \n* [Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition](https://arxiv.org/abs/2010.11757)\u003cbr\u003e:star:[code](https://github.com/IBM/action-recognition-pytorch)\n* [Semi-Supervised Action Recognition With Temporal Contrastive Learning](https://arxiv.org/abs/2102.02751)\u003cbr\u003e:star:[code](https://github.com/CVIR/TCL):house:[project](https://cvir.github.io/TCL/):tv:[video](https://www.youtube.com/watch?v=_qIYu3EU2kY)\n* [WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos](https://arxiv.org/abs/2006.03732)\n* [BABEL: Bodies, Action and Behavior With English 
Labels](https://openaccess.thecvf.com/content/CVPR2021/papers/Punnakkal_BABEL_Bodies_Action_and_Behavior_With_English_Labels_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/abhinanda-punnakkal/BABEL):house:[project](https://babel.is.tue.mpg.de/):tv:[video](https://www.youtube.com/watch?v=BYWxvjKpCqA)\n* Action localization\n  * [Few-Shot Transformation of Common Actions into Time and Space](https://arxiv.org/abs/2104.02439)\u003cbr\u003e:star:[code](https://github.com/PengWan-Yang) \n* Temporal action localization\n  * [Modeling Multi-Label Action Dependencies for Temporal Action Localization](https://arxiv.org/abs/2103.03027)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/ptirupat/MLAD)\u003cbr\u003eProposes an attention-based network architecture that learns action dependencies in videos to address multi-label temporal action localization.\n  * [The Blessings of Unlabeled Background in Untrimmed Videos](https://arxiv.org/abs/2103.13183)\u003cbr\u003e:star:[code](https://github.com/liuyuancv/WTAL_blessing)\n  * [Temporal Context Aggregation Network for Temporal Action Proposal Refinement](https://arxiv.org/abs/2103.13141)\n  * [Learning Salient Boundary Feature for Anchor-free Temporal Action Localization](https://arxiv.org/abs/2103.13137)\u003cbr\u003eAnchor-free temporal action localization based on learning salient boundary features\u003cbr\u003eExplainer: [10](https://mp.weixin.qq.com/s/yNDkHMhOIb76b4KcEhx4XQ)\n  * [CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning](https://arxiv.org/abs/2103.16392)\n  * [Action Unit Memory Network for Weakly Supervised Temporal Action Localization](https://arxiv.org/abs/2104.14135)\n  * [Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization](https://arxiv.org/abs/2006.07976)\u003cbr\u003e:star:[code](https://github.com/Siyu-C/ACAR-Net) \n  * [Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Uncertainty_Guided_Collaborative_Training_for_Weakly_Supervised_Temporal_Action_Detection_CVPR_2021_paper.pdf)\n* Video Actor Segmentation\n  * 
[Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation](https://arxiv.org/abs/2105.06818)\n* Action segmentation\n  * [Learning To Segment Actions From Visual and Language Instructions via Differentiable Weak Sequence Alignment](https://openaccess.thecvf.com/content/CVPR2021/papers/Shen_Learning_To_Segment_Actions_From_Visual_and_Language_Instructions_via_CVPR_2021_paper.pdf) \n  * Temporal action segmentation\n    * [Temporal Action Segmentation from Timestamp Supervision](https://arxiv.org/abs/2103.06669)\u003cbr\u003e:star:[code](https://github.com/ZheLi2020/TimestampActionSeg)\n    * [Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation](https://arxiv.org/abs/2103.11264)\u003cbr\u003e:star:[code](https://github.com/ssarfraz/FINCH-Clustering/tree/master/TW-FINCH)\n  * Unsupervised action segmentation\n    * [Action Shuffle Alternating Learning for Unsupervised Action Segmentation](https://arxiv.org/abs/2104.02116)\n  * Set-supervised action segmentation\n    * [Anchor-Constrained Viterbi for Set-Supervised Action Segmentation](https://arxiv.org/abs/2104.02113)\n  * Video action segmentation\n    * [Global2Local: Efficient Structure Search for Video Action Segmentation](https://arxiv.org/abs/2101.00910)\u003cbr\u003e:star:[code](https://github.com/ShangHua-Gao/G2L-search)\u003cbr\u003eFrom global to local: efficient network structure search for video action segmentation\u003cbr\u003eExplainer: [19](https://mp.weixin.qq.com/s/yNDkHMhOIb76b4KcEhx4XQ)\n* Video Moment Localization\n  * [Structured Multi-Level Interaction Network for Video Moment Localization via Language Query](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_Structured_Multi-Level_Interaction_Network_for_Video_Moment_Localization_via_Language_CVPR_2021_paper.pdf)   \n* Temporal event localization\n  * [Multi-Shot Temporal Event Localization: A Benchmark](https://arxiv.org/abs/2012.09434)\u003cbr\u003e:star:[code](https://github.com/xlliu7/MUSES):house:[project](https://songbai.site/muses/) \n\n\u003ca name=\"35\"/\u003e\n\n## 35.Image Clustering \n\n- [Improving Unsupervised Image Clustering With Robust 
Learning](https://arxiv.org/abs/2012.11150)\u003cbr\u003e:star:[code](https://github.com/deu30303/RUC)\u003cbr\u003eImproves unsupervised image clustering using robust learning\u003cbr\u003e\n- [Jigsaw Clustering for Unsupervised Visual Representation Learning](https://arxiv.org/abs/2104.00323)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/Jia-Research-Lab/JigsawClustering)\n- [COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction](https://openaccess.thecvf.com/content/CVPR2021/papers/Lin_COMPLETER_Incomplete_Multi-View_Clustering_via_Contrastive_Prediction_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://pengxi.me/)\n\n\u003ca name=\"34\"/\u003e\n\n## 34.Image Classification\n\n- [Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels](https://arxiv.org/abs/2101.05022)\u003cbr\u003e:star:[code](https://github.com/naver-ai/relabel_imagenet)\u003cbr\u003e\n- [Differentiable Patch Selection for Image Recognition](https://arxiv.org/abs/2104.03059)\u003cbr\u003e:star:[code](https://github.com/google-research/google-research/tree/master/ptopk_patch_selection/)\n- [Achieving Robustness in Classification Using Optimal Transport With Hinge Regularization](https://openaccess.thecvf.com/content/CVPR2021/papers/Serrurier_Achieving_Robustness_in_Classification_Using_Optimal_Transport_With_Hinge_Regularization_CVPR_2021_paper.pdf)\n- [Are Labels Always Necessary for Classifier Accuracy Evaluation?](https://arxiv.org/abs/2007.02915)\n* Fine-grained classification\n  * [Fine-grained Angular Contrastive Learning with Coarse Labels](https://arxiv.org/abs/2012.03515)\u003cbr\u003e:open_mouth:oral\u003cbr\u003e:star:[code](https://github.com/guybuk/ANCOR)\u003cbr\u003eUses self-supervision for fine-grained classification from coarse labels. Coarse labels are easier and cheaper to obtain than fine-grained labels, which usually require domain experts.\n  * [Graph-based High-Order Relation Discovery for Fine-grained 
Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhao_Graph-Based_High-Order_Relation_Discovery_for_Fine-Grained_Recognition_CVPR_2021_paper.pdf)\u003cbr\u003eA fine-grained recognition method based on mining high-order relations between features.\u003cbr\u003eExplainer: [20](https://mp.weixin.qq.com/s/yNDkHMhOIb76b4KcEhx4XQ)\n  * [Few-Shot Classification with Feature Map Reconstruction Networks](https://arxiv.org/abs/2012.01506)\u003cbr\u003e:star:[code](https://github.com/Tsingularity/FRN):tv:[video](https://www.youtube.com/watch?v=kbsRsbQKTRc)\n  * [A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification](https://arxiv.org/abs/2104.00679)\u003cbr\u003e:open_mouth:oral\n  * [GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Shi_GLAVNet_Global-Local_Audio-Visual_Cues_for_Fine-Grained_Material_Recognition_CVPR_2021_paper.pdf)\n  * [Learning Deep Classifiers Consistent With Fine-Grained Novelty Detection](https://openaccess.thecvf.com/content/CVPR2021/papers/Cheng_Learning_Deep_Classifiers_Consistent_With_Fine-Grained_Novelty_Detection_CVPR_2021_paper.pdf)\n  * [Your \"Flamingo\" is My \"Bird\": Fine-Grained, or Not](https://arxiv.org/abs/2011.09040)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/PRIS-CV/Fine-Grained-or-Not)\n  * [Discrimination-Aware Mechanism for Fine-Grained Representation Learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Xu_Discrimination-Aware_Mechanism_for_Fine-Grained_Representation_Learning_CVPR_2021_paper.pdf)\n  * [Neural Prototype Trees for Interpretable Fine-grained Image Recognition](https://arxiv.org/abs/2012.02046)\u003cbr\u003e:star:[code](https://github.com/M-Nauta/ProtoTree)\n* Image Classification\n  * [MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition](https://arxiv.org/abs/2103.12579)\u003cbr\u003e:star:[code](https://github.com/BIT-DA/MetaSAug)\n  * [PML: Progressive Margin Loss for Long-tailed Age 
Classification](https://arxiv.org/abs/2103.02140)\u003cbr\u003e\n  * [Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification](https://arxiv.org/abs/2103.14267)\u003cbr\u003e:house:[project](https://www.kaihan.org/HybridLT/)\n  * [Capsule Network is Not More Robust than Convolutional Network](https://arxiv.org/abs/2103.15459)\n  * [Model-Contrastive Federated Learning](https://arxiv.org/abs/2103.16257)\u003cbr\u003e:star:[code](https://github.com/QinbinLi/MOON)\n  * [Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets](https://arxiv.org/abs/2104.12690)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/fidler-lab/efficient-annotation-cookbook):house:[project](https://fidler-lab.github.io/efficient-annotation-cookbook/)\n  * [Correlated Input-Dependent Label Noise in Large-Scale Image Classification](https://arxiv.org/abs/2105.10305)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet)\n  * [Towards Robust Classification Model by Counterfactual and Invariant Data Generation](https://arxiv.org/abs/2106.01127)\u003cbr\u003e:star:[code](https://github.com/zzzace2000/robust_cls_model)\n  *  [Dual-Stream Multiple Instance Learning Network for Whole Slide Image Classification With Self-Supervised Contrastive Learning](https://arxiv.org/abs/2011.08939)\u003cbr\u003e:star:[code](https://github.com/binli123/dsmil-wsi)\n  * [Generative Classifiers as a Basis for Trustworthy Image Classification](https://arxiv.org/abs/2007.15036)\u003cbr\u003e:star:[code](https://github.com/VLL-HD/trustworthy_GCs)\n  * [Synthesize-It-Classifier: Learning a Generative Classifier Through Recurrent Self-Analysis](https://openaccess.thecvf.com/content/CVPR2021/papers/Pal_Synthesize-It-Classifier_Learning_a_Generative_Classifier_Through_Recurrent_Self-Analysis_CVPR_2021_paper.pdf)\n  * [Background Splitting: Finding Rare Classes in a Sea of 
Background](https://arxiv.org/abs/2008.12873)\n  * [Permuted AdaIN: Reducing the Bias Towards Global Statistics in Image Classification](https://arxiv.org/abs/2010.05785)\n  * [Self-Supervised Wasserstein Pseudo-Labeling for Semi-Supervised Image Classification](https://openaccess.thecvf.com/content/CVPR2021/papers/Taherkhani_Self-Supervised_Wasserstein_Pseudo-Labeling_for_Semi-Supervised_Image_Classification_CVPR_2021_paper.pdf)\n  * [DAP: Detection-Aware Pre-training with Weak Supervision](https://arxiv.org/abs/2103.16651)\n* Semi-Supervised Image Classification\n  * [SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification](https://arxiv.org/abs/2103.16725)\u003cbr\u003e:star:[code](https://github.com/zijian-hu/SimPLE)\n* Visual Recognition \n  * [Fair Feature Distillation for Visual Recognition](https://openaccess.thecvf.com/content/CVPR2021/papers/Jung_Fair_Feature_Distillation_for_Visual_Recognition_CVPR_2021_paper.pdf)\n  * Long-Tailed Visual Recognition\n    * [Distribution Alignment: A Unified Framework for Long-tail Visual Recognition](https://arxiv.org/abs/2103.16370)\u003cbr\u003e:star:[code](https://github.com/Megvii-BaseDetection/DisAlign)\n    * [Improving Calibration for Long-Tailed Recognition](https://arxiv.org/abs/2104.00466)\u003cbr\u003e:star:[code](https://github.com/Jia-Research-Lab/MiSLAS)\n    * [Adversarial Robustness under Long-Tailed Distribution](https://arxiv.org/abs/2104.02703)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/wutong16/Adversarial_Long-Tail)\n    * [Disentangling Label Distribution for Long-Tailed Visual Recognition](https://arxiv.org/abs/2012.00321)\u003cbr\u003e:star:[code](https://github.com/hyperconnect/LADE)\n    * [Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings](https://openaccess.thecvf.com/content/CVPR2021/papers/Guo_Long-Tailed_Multi-Label_Visual_Recognition_by_Collaborative_Training_on_Uniform_and_CVPR_2021_paper.pdf) \n* Object Classification\n  * [Object Classification From Randomized EEG 
Trials](https://arxiv.org/abs/2004.06046)\n* Nearest Neighbor Matching\n  * [Nearest Neighbor Matching for Deep Clustering](https://openaccess.thecvf.com/content/CVPR2021/papers/Dang_Nearest_Neighbor_Matching_for_Deep_Clustering_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/ZhiyuanDang/NNM) \n* OOD Detection\n  * [MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space](https://arxiv.org/abs/2105.01879)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/deeplearning-wisc/large_scale_ood)\n  * [MOOD: Multi-level Out-of-distribution Detection](https://arxiv.org/abs/2104.14726)\u003cbr\u003e:star:[code](https://github.com/deeplearning-wisc/MOOD)\n  * [Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces](https://openaccess.thecvf.com/content/CVPR2021/papers/Zaeemzadeh_Out-of-Distribution_Detection_Using_Union_of_1-Dimensional_Subspaces_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/zaeemzadeh/OOD)\n \n\u003ca name=\"33\"/\u003e\n\n## 33.6D Pose Estimation\n\n- [FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation](https://arxiv.org/abs/2103.02242)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/ethnhe/FFB6D)\u003cbr\u003e\n- [GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation](http://arxiv.org/abs/2102.12145)\u003cbr\u003e:star:[code](https://github.com/THU-DA-6D-Pose-Group/GDR-Net)\n- [FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism](https://arxiv.org/abs/2103.07054)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/DC1991/FS-Net)\n- [Wide-Depth-Range 6D Object Pose Estimation in Space](https://arxiv.org/abs/2104.00337)\u003cbr\u003e:star:[code](https://github.com/cvlab-epfl/wide-depth-range-pose)\n- [DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency](https://arxiv.org/abs/2104.03658)\n- [Single-view robot 
pose and joint angle estimation via render \u0026 compare](https://arxiv.org/abs/2104.09359)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/ylabbe/robopose):house:[project](https://www.di.ens.fr/willow/research/robopose/):tv:[video](https://www.youtube.com/watch?v=3yzwS99sgLI)\n- [Keypoint-Graph-Driven Learning Framework for Object Pose Estimation](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Keypoint-Graph-Driven_Learning_Framework_for_Object_Pose_Estimation_CVPR_2021_paper.pdf)  \n- [StablePose: Learning 6D Object Poses From Geometrically Stable Patches](https://arxiv.org/abs/2102.09334)\n\n\u003ca name=\"32\"/\u003e\n\n## 32.View Synthesis\n\n- [ID-Unet: Iterative Soft and Hard Deformation for View Synthesis](https://arxiv.org/abs/2103.02264)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/MingyuY/Iterative-view-synthesis)\n- [NeX: Real-time View Synthesis with Neural Basis Expansion](https://arxiv.org/abs/2103.05606)\u003cbr\u003e:open_mouth:oral:house:[project](https://nex-mpi.github.io/):tv:[video](https://www.youtube.com/watch?v=HyfkF7Z-ddA)\n- [Layout-Guided Novel View Synthesis from a Single Indoor Panorama](https://arxiv.org/abs/2103.17022)\u003cbr\u003e:star:[code](https://github.com/bluestyle97/PNVS)\n- [Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes](https://arxiv.org/abs/2104.06935)\u003cbr\u003e:house:[project](https://virtualhumans.mpi-inf.mpg.de/)\n- [Stable View Synthesis](https://arxiv.org/abs/2011.07233)\u003cbr\u003e:star:[code](https://github.com/intel-isl/StableViewSynthesis)\n- [Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes](https://arxiv.org/abs/2011.13084)\u003cbr\u003e:house:[project](http://www.cs.cornell.edu/~zl548/NSFF/):tv:[video](https://youtu.be/qsMIH7gYRCc)\n\n\u003ca name=\"31\"/\u003e\n\n## 31.Open-Set Recognition\n\n- [Counterfactual Zero-Shot and Open-Set Visual 
Recognition](https://arxiv.org/abs/2103.00887)\u003cbr\u003e:star:[code](https://github.com/yue-zhongqi/gcm-cf)\u003cbr\u003e\n- [Few-shot Open-set Recognition by Transformation Consistency](https://arxiv.org/abs/2103.01537)\u003cbr\u003e\n- [Learning Placeholders for Open-Set Recognition](https://arxiv.org/abs/2103.15086)\u003cbr\u003e:open_mouth:oral\n\n\u003ca name=\"30\"/\u003e\n\n## 30.Neural rendering\n\n* [DeRF: Decomposed Radiance Fields](https://arxiv.org/abs/2011.12490)\u003cbr\u003e:house:[project](https://ubc-vision.github.io/derf/)\u003cbr\u003e\n* [D-NeRF: Neural Radiance Fields for Dynamic Scenes](https://arxiv.org/abs/2011.13961)\u003cbr\u003e:house:[project](https://www.albertpumarola.com/research/D-NeRF/index.html)\u003cbr\u003e\n* [Neural Lumigraph Rendering](https://arxiv.org/abs/2103.11571)\u003cbr\u003e:sunflower:[dataset](https://drive.google.com/file/d/1BBpIfrqwZNYmG1TiFljlCnwsmL2OUxNT/view):house:[project](http://www.computationalimaging.org/publications/nlr/):tv:[video](https://www.youtube.com/watch?v=maVF-7x9644)\u003cbr\u003eStanford University\n* [AutoInt: Automatic Integration for Fast Neural Volume Rendering](https://arxiv.org/abs/2012.01714)\u003cbr\u003e:open_mouth:oral:house:[project](http://www.computationalimaging.org/publications/automatic-integration/):tv:[video](https://youtu.be/GYxFYbih0PU)\u003cbr\u003eStanford University\n* [pixelNeRF: Neural Radiance Fields from One or Few Images](https://arxiv.org/abs/2012.02190)\u003cbr\u003e:star:[code](https://github.com/sxyu/pixel-nerf):house:[project](https://alexyu.net/pixelnerf/):tv:[video](https://youtu.be/voebZx7f32g)\n* [IBRNet: Learning Multi-View Image-Based Rendering](https://arxiv.org/abs/2102.13090)\u003cbr\u003e:house:[project](https://ibrnet.github.io/)\u003cbr\u003eNote: some researchers have commented that pixelNeRF and IBRNet are based on similar ideas, but IBRNet appears to be more mature.\n* [Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic 
Humans](https://arxiv.org/abs/2012.15838)\u003cbr\u003e:star:[code](https://github.com/zju3dv/neuralbody):house:[project](https://zju3dv.github.io/neuralbody/):tv:[video](https://youtu.be/BPCAMeBCE-8)\u003cbr\u003eNeural Body, developed by researchers from Zhejiang University and collaborators, takes multi-view video as input and outputs a 3D human body and novel-view renderings.\n* [NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis](https://arxiv.org/abs/2012.03927)\u003cbr\u003e:house:[project](https://pratulsrinivasan.github.io/nerv/):tv:[video](https://youtu.be/4XyDdvhhjVo)\u003cbr\u003eGenerates a complete 3D scene under arbitrary lighting conditions from a set of input images.\n* [Self-Supervised Visibility Learning for Novel View Synthesis](https://arxiv.org/abs/2103.15407)\u003cbr\u003e:star:[code](https://github.com/shiyujiao/SVNVS)\n* [STaR: Self-Supervised Tracking and Reconstruction of Rigid Objects in Motion With Neural Rendering](https://arxiv.org/abs/2101.01602)\u003cbr\u003e:star:[code](https://github.com/wentaoyuan):house:[project](https://wentaoyuan.github.io/star/):tv:[video](https://wentaoyuan.github.io/star/videos/overview.mp4)\n* [Pulsar: Efficient Sphere-Based Neural Rendering](https://arxiv.org/abs/2004.07484)\n* [Learning Compositional Radiance Fields of Dynamic Human Heads](https://arxiv.org/abs/2012.09955)\u003cbr\u003e:open_mouth:oral:house:[project](https://ziyanw1.github.io/hybrid_nerf/)\n* [NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections](https://openaccess.thecvf.com/content/CVPR2021/papers/Martin-Brualla_NeRF_in_the_Wild_Neural_Radiance_Fields_for_Unconstrained_Photo_CVPR_2021_paper.pdf)  \n* [Neural Geometric Level of Detail: Real-Time Rendering With Implicit 3D Shapes](https://arxiv.org/abs/2101.10994)\u003cbr\u003e:star:[code](https://github.com/nv-tlabs/nglod):house:[project](https://nv-tlabs.github.io/nglod/)\n* [Space-Time Neural Irradiance Fields for Free-Viewpoint Video](https://arxiv.org/abs/2011.12950)\u003cbr\u003e:house:[project](https://video-nerf.github.io/):tv:[video](https://youtu.be/2tN8ghNu2sI)   \n* [Neural Scene Graphs for Dynamic 
Scenes](https://arxiv.org/abs/2011.10379)\u003cbr\u003e:open_mouth:oral:house:[project](https://light.princeton.edu/publication/neural-scene-graphs/):tv:[video](https://youtu.be/ea4Y6P0Hk3o)\n* [NeuTex: Neural Texture Mapping for Volumetric Neural Rendering](https://arxiv.org/abs/2103.00762) \n\n\u003ca name=\"29\"/\u003e\n\n## 29.Human Pose Estimation\n\n- [Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration](https://arxiv.org/abs/2103.02845)\u003cbr\u003e:star:[code](https://github.com/SeanChenxy/HandMesh)\u003cbr\u003e\n- [Monocular Real-time Full Body Capture with Inter-part Correlations](https://arxiv.org/abs/2012.06087)\u003cbr\u003e:tv:[video](https://www.youtube.com/watch?v=pAcywTUTv-E)\u003cbr\u003eHuman motion capture is a key technology for film visual effects, and high-quality capture usually requires special equipment; motion capture with an ordinary RGB camera would let anyone produce such effects. The video shows results from this CVPR 2021 paper by researchers from Tsinghua University, the Max Planck Institute, and other institutions, which performs motion capture with a monocular RGB camera.\n- [Behavior-Driven Synthesis of Human Dynamics](https://arxiv.org/abs/2103.04677)\u003cbr\u003e:star:[code](https://github.com/CompVis/behavior-driven-video-synthesis):house:[project](https://compvis.github.io/behavior-driven-video-synthesis/)\n- [Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation](https://arxiv.org/abs/2012.15175)\u003cbr\u003e:star:[code](https://github.com/greatlog/SWAHR-HumanPose)\u003cbr\u003eBrief explainer: [2](https://mp.weixin.qq.com/s/lL1cz_L523TSdYJFfHA2lQ)\n- [Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression](https://arxiv.org/abs/2104.02300)\u003cbr\u003e:star:[code](https://github.com/HRNet/DEKR)\n- [SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks](https://arxiv.org/abs/2104.03313)\u003cbr\u003e:open_mouth:oral:house:[project](https://scanimate.is.tue.mpg.de/)\n- [On Self-Contact and Human Pose](https://arxiv.org/abs/2104.03176)\u003cbr\u003e:house:[project](https://tuch.is.tue.mpg.de/)\n- [Lite-HRNet: A Lightweight High-Resolution 
Network](https://arxiv.org/abs/2104.06403)\u003cbr\u003e:star:[code](https://github.com/HRNet/)\u003cbr\u003eExplainer: [Lite-HRNet: a lightweight HRNet with greatly reduced FLOPs](https://mp.weixin.qq.com/s/4V6EOYVSybMR9oxpcsWv9w)\n- [Deep Dual Consecutive Network for Human Pose Estimation](https://arxiv.org/abs/2103.07254)\u003cbr\u003e:star:[code](https://github.com/Pose-Group/DCPose)\n- [3D Human Action Representation Learning via Cross-View Consistency Pursuit](https://arxiv.org/abs/2104.14466)\u003cbr\u003e:star:[code](https://github.com/LinguoLi/CrosSCLR)\n- [Body Meshes as Points](https://arxiv.org/abs/2105.02467)\u003cbr\u003e:star:[code](https://github.com/jfzhang95/BMP)\n- [Unsupervised Human Pose Estimation through Transforming Shape Templates](https://arxiv.org/abs/2105.04154)\u003cbr\u003e:house:[project](https://infantmotion.github.io/)\n- [When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks](https://arxiv.org/abs/2105.06152)\n- [Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking](https://arxiv.org/abs/2106.03772) \n* 3D Hand Reconstruction\n  * [Model-based 3D Hand Reconstruction via Self-Supervised Learning](https://arxiv.org/abs/2103.11703)\u003cbr\u003e:star:[code](https://github.com/TerenceCYJ/S2HAND):tv:[video](https://www.youtube.com/watch?v=tuQzu-UfSe8\u0026feature=youtu.be)\n* Human Motion Transfer\n  * [Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling](https://arxiv.org/abs/2103.14338)\u003cbr\u003e:star:[code](https://github.com/HuangZhiChao95/FewShotMotionTransfer):tv:[video](https://www.youtube.com/watch?v=ZJ15X-sdKSU)\n* Human Volumetric Capture\n  * [POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture](https://arxiv.org/abs/2103.15331)\u003cbr\u003e:open_mouth:oral:house:[project](http://www.liuyebin.com/posefusion/posefusion.html)\n  * [High-Fidelity Neural Human Motion Transfer from Monocular Video](https://arxiv.org/abs/2012.10974)\n* 3D Human Pose Estimation\n  * [CanonPose: Self-supervised 
Monocular 3D Human Pose Estimation in the Wild](https://arxiv.org/abs/2011.14679)\u003cbr\u003e:star:[code](https://github.com/bastianwandt/CanonPose)\n  * [Context Modeling in 3D Human Pose Estimation: A Unified Perspective](https://arxiv.org/abs/2103.15507)\n  * [PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers](https://arxiv.org/abs/2011.13607)\u003cbr\u003e:star:[code](https://github.com/yu-frank/PerspectiveCropLayers):tv:[video](https://twitter.com/i/status/1334395954644930560)\u003cbr\u003eImproves 3D human pose estimation by removing location-dependent perspective effects.\u003cbr\u003e\n  * [Graph Stacked Hourglass Networks for 3D Human Pose Estimation](https://arxiv.org/abs/2103.16385)\n  * [Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors](https://arxiv.org/abs/2103.17265)\u003cbr\u003e:open_mouth:oral:house:[project](http://virtualhumans.mpi-inf.mpg.de/hps/)\n  * [SimPoE: Simulated Character Control for 3D Human Pose Estimation](https://arxiv.org/abs/2104.00683)\u003cbr\u003e:open_mouth:oral:house:[project](https://www.ye-yuan.com/simpoe/)\n  * [Reconstructing 3D Human Pose by Watching Humans in the Mirror](https://arxiv.org/abs/2104.00340)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/zju3dv/Mirrored-Human):house:[project](https://zju3dv.github.io/Mirrored-Human/)\n  * [Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo](https://arxiv.org/abs/2104.02273)\u003cbr\u003e:star:[code](https://github.com/jiahaoLjh/PlaneSweepPose)\n  * [PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation](https://arxiv.org/abs/2105.02465)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/jfzhang95/PoseAug)\n  * [AGORA: Avatars in Geography Optimized for Regression Analysis](https://arxiv.org/abs/2104.14643)\u003cbr\u003e:house:[project](https://agora.is.tue.mpg.de/)\n  * [Intelligent Carpet: Inferring 3D Human Pose From Tactile 
Signals](https://openaccess.thecvf.com/content/CVPR2021/papers/Luo_Intelligent_Carpet_Inferring_3D_Human_Pose_From_Tactile_Signals_CVPR_2021_paper.pdf)\n  * [HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation](https://openaccess.thecvf.com/content/CVPR2021/papers/Li_HybrIK_A_Hybrid_Analytical-Neural_Inverse_Kinematics_Solution_for_3D_Human_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/Jeff-sjtu/HybrIK)\n  * [Neural Descent for Visual 3D Human Pose and Shape](https://arxiv.org/abs/2008.06910)\n  * [Probabilistic 3D Human Shape and Pose Estimation from Multiple Unconstrained Images in the Wild](https://arxiv.org/abs/2103.10978) \n* Animal Pose Estimation\n  * [From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation](https://arxiv.org/abs/2103.14843)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/chaneyddtt/UDA-Animal-Pose):tv:[video](https://www.youtube.com/watch?v=uF8BE9J7wNw)\n* 3D Human Mesh Registration\n  * [Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration](https://arxiv.org/abs/2104.08160)\u003cbr\u003e:star:[code](https://github.com/taconite/PTF):house:[project](https://taconite.github.io/PTF/website/PTF.html):tv:[video](https://youtu.be/XNk4o2Z0S2c)\n* Multi-Person Human Reconstruction\n  * [Multi-person Implicit Reconstruction from a Single Image](https://arxiv.org/abs/2104.09283)\n* 3D Human Motion\n  * [We are More than Our Joints: Predicting how 3D Bodies Move](https://arxiv.org/pdf/2012.00619.pdf)\u003cbr\u003e:house:[project](https://yz-cnsdqz.github.io/MOJO/MOJO.html):tv:[video](https://youtu.be/5DqLWAb37X0)\u003cbr\u003eSharing session\n* Human Motion Capture\n  * [Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors](http://www.liuyebin.com/Function4D/assets/Function4D.pdf)\u003cbr\u003e:open_mouth:oral:house:[project](http://www.liuyebin.com/Function4D/Function4D.html):tv:[video](https://www.youtube.com/watch?v=-rWUn4fEQNU)\n  * [ChallenCap: Monocular 3D Capture of 
Challenging Human Performances Using Multi-Modal References](https://arxiv.org/abs/2103.06747)\n* Multi-Person Pose Estimation\n  * [FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions](https://arxiv.org/abs/2105.14185)\u003cbr\u003e:star:[code](https://git.io/AdelaiDet)\u003cbr\u003eFCPose is an ROI-free and grouping-free, end-to-end trainable human pose estimator that achieves better accuracy and speed. On the COCO dataset, the real-time version of FCPose with a DLA-34 backbone is 4.5x faster than Mask R-CNN (ResNet-101) (41.67 FPS vs. 9.26 FPS) while also achieving improved performance. Compared with recent top-down and bottom-up methods, FCPose also achieves a better speed/accuracy trade-off.\n  * [Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks](https://arxiv.org/abs/2104.01797)\u003cbr\u003e:star:[code](https://github.com/3dpose/3D-Multi-Person-Pose)\n* Hand-Object Interaction Pose Estimation\n  * [Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time](https://arxiv.org/abs/2106.05266)\u003cbr\u003e:star:[code](https://github.com/stevenlsw/Semi-Hand-Object):house:[project](https://stevenlsw.github.io/Semi-Hand-Object/):tv:[video](https://youtu.be/7bnl2olUt-0)\n* Human Keypoint Detection\n  * [Regressive Domain Adaptation for Unsupervised Keypoint Detection](https://arxiv.org/abs/2103.06175)\u003cbr\u003e:star:[code](https://github.com/thuml/Transfer-Learning-Library)\n* 3D Human Shape\n  * [LEAP: Learning Articulated Occupancy of People](https://arxiv.org/abs/2104.06849)\u003cbr\u003e:star:[code](https://github.com/neuralbodies/leap):house:[project](https://neuralbodies.github.io/LEAP/):tv:[video](https://youtu.be/UVB8A_T5e3c)\n  * [Beyond Static Features for Temporally Consistent 3D Human Pose and Shape From a Video](https://arxiv.org/abs/2011.08627)\u003cbr\u003e:star:[code](https://github.com/hongsukchoi/TCMR_RELEASE):tv:[video](https://www.youtube.com/watch?v=WB3nTnSQDII)\n* Human Animation (Pose Transfer)\n  * [Pose-Guided Human Animation From a Single Image in the Wild](https://arxiv.org/abs/2012.03796)\n* Automatic 3D Fitness Training System Based on Human Sensing\n  * [AIFit: Automatic 3D Human-Interpretable Feedback Models for Fitness 
Training](https://openaccess.thecvf.com/content/CVPR2021/papers/Fieraru_AIFit_Automatic_3D_Human-Interpretable_Feedback_Models_for_Fitness_Training_CVPR_2021_paper.pdf)\u003cbr\u003e:house:[project](http://vision.imar.ro/fit3d/)\n* 3D Human Motion\n  * [Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes](https://arxiv.org/abs/2012.05522)\u003cbr\u003e:star:[code](https://github.com/jiashunwang/Long-term-Motion-in-3D-Scenes):house:[project](https://jiashunwang.github.io/Long-term-Motion-in-3D-Scenes/):tv:[video](https://youtu.be/qQ0GmCP1Ksw)\n* 3D Human Reconstruction\n  * [StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision](https://arxiv.org/abs/2104.05289)\u003cbr\u003e:star:[code](https://github.com/CrisHY1995/StereoPIFu_Code):house:[project](https://hy1995.top/StereoPIFuProject/)\n* Gesture-to-Gesture Translation\n  * [Model-Aware Gesture-to-Gesture Translation](https://openaccess.thecvf.com/content/CVPR2021/papers/Hu_Model-Aware_Gesture-to-Gesture_Translation_CVPR_2021_paper.pdf)\n* 3D Human Motion Prediction\n  * [Towards Accurate 3D Human Motion Prediction From Incomplete Observations](https://openaccess.thecvf.com/content/CVPR2021/papers/Cui_Towards_Accurate_3D_Human_Motion_Prediction_From_Incomplete_Observations_CVPR_2021_paper.pdf)\n* Gesture Recognition\n  * [Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics](https://arxiv.org/abs/2007.12287)\u003cbr\u003e:star:[code](https://github.com/facebookresearch/body2hands):house:[project](http://people.eecs.berkeley.edu/~evonne_ng/projects/body2hands/):tv:[video](http://people.eecs.berkeley.edu/~evonne_ng/projects/body2hands/supp_pres_vCVPR_blur.mp4)\n* 3D Human Mesh Reconstruction\n  * [Holistic 3D Human and Scene Mesh Estimation From Single View Images](https://arxiv.org/abs/2012.01591)\n* Micro-Gesture Emotion Analysis\n  * [iMiGUE: An Identity-Free Video Dataset for Micro-Gesture Understanding and Emotion 
Analysis](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_iMiGUE_An_Identity-Free_Video_Dataset_for_Micro-Gesture_Understanding_and_Emotion_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/linuxsino/iMiGUE)\n* Dense Human Correspondences\n  * [HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences](https://arxiv.org/abs/2103.15573)\u003cbr\u003e:star:[code](https://github.com/googleinterns/humangps):house:[project](https://feitongt.github.io/HumanGPS/):tv:[video](https://youtu.be/Ji34XtrJQ5o)\n\n\u003ca name=\"28\"/\u003e\n\n## 28.Dense prediction\n\n- [Densely connected multidilated convolutional networks for dense prediction tasks](https://arxiv.org/abs/2011.11844)\u003cbr\u003eThe proposed D3Net outperforms SOTA networks on semantic segmentation and music source separation tasks.\u003cbr\u003e\n- [Dense Contrastive Learning for Self-Supervised Visual Pre-Training](https://arxiv.org/abs/2011.09157)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/WXinlong/DenseCL)\n- [Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning](https://arxiv.org/abs/2011.10043)\u003cbr\u003e:star:[code](https://github.com/zdaxie/PixPro)\n- [Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks](https://openaccess.thecvf.com/content/CVPR2021/papers/Takahashi_Densely_Connected_Multi-Dilated_Convolutional_Networks_for_Dense_Prediction_Tasks_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/sony/ai-research-code/tree/master/d3net)\n\n\u003ca name=\"27\"/\u003e\n\n## 27.Semantic Line Detection\n* [Harmonious Semantic Line Detection via Maximal Weight Clique Selection](https://arxiv.org/abs/2104.06903)\u003cbr\u003e:star:[code](https://github.com/dongkwonjin)\n \n\n\u003ca name=\"26\"/\u003e\n\n## 26.Video Processing\n* [Skip-Convolutions for Efficient Video Processing](https://arxiv.org/abs/2104.11487)\n* [VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial 
Examples](https://arxiv.org/abs/2103.05905)\u003cbr\u003e:star:[code](https://github.com/tinapan-pt/VideoMoCo)\n* [Learning by Aligning Videos in Time](https://arxiv.org/abs/2103.17260)\n* [Hierarchical Motion Understanding via Motion Programs](https://arxiv.org/abs/2104.11216)\u003cbr\u003e:house:[project](https://sumith1896.github.io/motion2prog/):tv:[video](https://youtu.be/OpyY-s0LKAs)\n* [Stochastic Image-to-Video Synthesis using cINNs](https://arxiv.org/abs/2105.04551)\u003cbr\u003e:star:[code](https://github.com/CompVis/image2video-synthesis-using-cINNs):house:[project](https://compvis.github.io/image2video-synthesis-using-cINNs/)\n* [Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions](https://arxiv.org/abs/2105.04489)\u003cbr\u003e:house:[project](http://moments.csail.mit.edu/spoken.html)\n* [Gradient Forward-Propagation for Large-Scale Temporal Video Modelling](https://arxiv.org/abs/2106.08318)\n* [Learning To Reconstruct High Speed and High Dynamic Range Videos From Events](https://openaccess.thecvf.com/content/CVPR2021/papers/Zou_Learning_To_Reconstruct_High_Speed_and_High_Dynamic_Range_Videos_CVPR_2021_paper.pdf) \n* Video Summarization\n  * [Learning Discriminative Prototypes with Dynamic Time Warping](https://arxiv.org/abs/2103.09458)\u003cbr\u003e:star:[code](https://github.com/BorealisAI/TSC-Disc-Proto)\n  * [Learning Triadic Belief Dynamics in Nonverbal Communication from Videos](https://arxiv.org/abs/2104.02841)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/LifengFan/Triadic-Belief-Dynamics)\n* Video Coding\n  * [MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing](https://arxiv.org/abs/2103.01786)\u003cbr\u003e:star:[code](https://github.com/xyvirtualgroup/MetaSCI-CVPR2021)\n  * [FVC: A New Framework towards Deep Video Compression in Feature Space](https://arxiv.org/abs/2105.09600)\u003cbr\u003e:open_mouth:oral\n  * [Memory-Efficient Network for Large-Scale Video Compressive 
Sensing](https://arxiv.org/abs/2103.03089)\u003cbr\u003e:star:[code](https://github.com/BoChenGroup/RevSCI-net)\n  * [Deep Learning in Latent Space for Video Prediction and Compression](https://openaccess.thecvf.com/content/CVPR2021/papers/Liu_Deep_Learning_in_Latent_Space_for_Video_Prediction_and_Compression_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/BowenL0218/Video_Compression)\n* Video Frame Interpolation\n  * [FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation](https://arxiv.org/pdf/2012.08512.pdf)\u003cbr\u003e:star:[code](https://tarun005.github.io/FLAVR/Code):house:[project](https://tarun005.github.io/FLAVR/)\u003cbr\u003e\n  * [Deep Animation Video Interpolation in the Wild](https://arxiv.org/abs/2104.02495)\u003cbr\u003e:star:[code](https://github.com/lisiyao21/AnimeInterp/)\n  * [TimeLens: Event-based Video Frame Interpolation](https://arxiv.org/abs/2106.07286)\u003cbr\u003e:star:[code](https://github.com/uzh-rpg/rpg_timelens):sunflower:[dataset](http://rpg.ifi.uzh.ch/TimeLens.html):tv:[video](https://youtu.be/dVLyia-ezvo)\n  * [Time Lens: Event-based Video Frame Interpolation](https://openaccess.thecvf.com/content/CVPR2021/papers/Tulyakov_Time_Lens_Event-Based_Video_Frame_Interpolation_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/uzh-rpg/rpg_timelens):house:[project](http://rpg.ifi.uzh.ch/TimeLens.html):tv:[video](https://www.youtube.com/watch?v=dVLyia-ezvo)\n* Video-and-Language Learning\n  * [Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling](https://arxiv.org/pdf/2102.06183.pdf)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/jayleicn/ClipBERT)\u003cbr\u003e\n* Video Prediction\n  * [Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction](https://arxiv.org/abs/2103.04174)\u003cbr\u003e:house:[project](https://sites.google.com/view/ghvae):tv:[video](https://youtu.be/C8_-z8SEGOU)\n  * [Learning Semantic-Aware Dynamics for Video 
Prediction](https://arxiv.org/abs/2104.09762)\n  * [Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning](https://arxiv.org/abs/2104.00924)\u003cbr\u003e:star:[code](https://github.com/sangmin-git/LMC-Memory)\u003cbr\u003eExplainer: [Introducing a memory module to break the performance bottleneck of video prediction with long-range dependencies](https://mp.weixin.qq.com/s/GXcoHk9ks_ekVv-o14fVGg)\n  * [Learning Goals from Failure](https://arxiv.org/abs/2006.15657)\u003cbr\u003e:star:[code](https://github.com/cvlab-columbia/aha):house:[project](https://aha.cs.columbia.edu/)\n  * [MotionRNN: A Flexible Model for Video Prediction With Spacetime-Varying Motions](https://arxiv.org/abs/2103.02243)\n* Video Understanding\n  * [Context-aware Biaffine Localizing Network for Temporal Sentence Grounding](https://arxiv.org/abs/2103.11555)\u003cbr\u003e:star:[code](https://github.com/liudaizong/CBLN)\n  * [Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos](https://arxiv.org/abs/2103.12346)\u003cbr\u003e:house:[project](https://sijiesong.github.io/co-grounding/)\n  * [Visual Semantic Role Labeling for Video Understanding](https://arxiv.org/abs/2104.00990)\u003cbr\u003e:house:[project](https://vidsitu.org/)\n  * [Temporal Query Networks for Fine-grained Video Understanding](https://arxiv.org/abs/2104.09496)\u003cbr\u003e:open_mouth:oral:house:[project](https://www.robots.ox.ac.uk/~vgg/research/tqn/) \n  * [Shot Contrastive Self-Supervised Learning for Scene Boundary Detection](https://arxiv.org/abs/2104.13537)\n  * [FrameExit: Conditional Early Exiting for Efficient Video Recognition](https://arxiv.org/abs/2104.13400)\u003cbr\u003e:open_mouth:oral\n  * [Towards Long-Form Video Understanding](https://openaccess.thecvf.com/content/CVPR2021/papers/Wu_Towards_Long-Form_Video_Understanding_CVPR_2021_paper.pdf)\n* Video Rescaling\n  * [Video Rescaling Networks with Joint Optimization Strategies for Downscaling and 
Upscaling](https://arxiv.org/abs/2103.14858)\u003cbr\u003e:star:[code](https://github.com/ding3820/MIMO-VRN):house:[project](https://ding3820.github.io/MIMO-VRN/)\n* Video Anomaly Detection\n  * [MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection](https://arxiv.org/abs/2104.01633)\n  * [Learning Normal Dynamics in Videos With Meta Prototype Network](https://arxiv.org/abs/2104.06689)\u003cbr\u003e:star:[code](https://github.com/ktr-hubrt/MPN/)\u003cbr\u003eExplainer: [Fast and accurate video anomaly detection with a meta-learned dynamic prototype learning component](https://mp.weixin.qq.com/s/osEi-MtD6ViYT9_mzWDS-Q)\n  * [Anomaly Detection in Video via Self-Supervised and Multi-Task Learning](https://arxiv.org/abs/2011.07491)\n* Sound Source Localization in Video\n  * [Localizing Visual Sounds the Hard Way](https://arxiv.org/abs/2104.02691)\u003cbr\u003e:star:[code](https://github.com/hche11/Localizing-Visual-Sounds-the-Hard-Way):house:[project](https://www.robots.ox.ac.uk/~vgg/research/lvs/)\n* Video Analysis\n  * [Self-Supervised Learning for Semi-Supervised Temporal Action Proposal](https://arxiv.org/abs/2104.03214)\u003cbr\u003e:star:[code](https://github.com/wangxiang1230/SSTAP)\n* Video Generation\n  * [Playable Video Generation](https://arxiv.org/abs/2101.12195)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/willi-menapace/PlayableVideoGeneration):house:[project](https://willi-menapace.github.io/playable-video-generation-website/):tv:[video](https://www.youtube.com/watch?v=QtDjSyZERpg)\n  * [One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing](https://arxiv.org/abs/2011.15126)\u003cbr\u003e:open_mouth:oral:star:[code](https://github.com/NVlabs/imaginaire):house:[project](https://nvlabs.github.io/face-vid2vid/):tv:[video](https://youtu.be/nLYg9Waw72U)\u003cbr\u003eExplainer: [What upends video compression may not be a new compression algorithm but a GAN! NVIDIA's new algorithm cuts bandwidth by up to 90%](https://mp.weixin.qq.com/s/UpfgxiIaSU4iIjbrkS--zA)\u003cbr\u003eNew research from NVIDIA reconstructs video calls from facial keypoints plus a GAN, saving 90% of bandwidth compared with conventional H.264. The code has not been open-sourced, but NVIDIA's GAN framework has.\n* Video Viewpoint Transfer\n  * [Ego-Exo: Transferring Visual Representations from Third-person to First-person 
Videos](https://arxiv.org/abs/2104.07905)\n* Action Selection Learning\n  * [Weakly Supervised Action Selection Learning in Video](https://arxiv.org/abs/2105.02439)\u003cbr\u003e:star:[code](https://github.com/layer6ai-labs/ASL)\n* Video Description\n  * [Towards Diverse Paragraph Captioning for Untrimmed Videos](https://arxiv.org/abs/2105.14477)\u003cbr\u003e:star:[code](https://github.com/syuqings/video-paragraph)\n* Video Classification\n  * [Over-the-Air Adversarial Flickering Attacks Against Video Recognition Networks](https://arxiv.org/abs/2002.05123)\u003cbr\u003e:star:[code](https://github.com/roiponytch/Flickering_Adversarial_Video)\n* Video Captioning\n  * [Sketch, Ground, and Refine: Top-Down Dense Video Captioning](https://openaccess.thecvf.com/content/CVPR2021/papers/Deng_Sketch_Ground_and_Refine_Top-Down_Dense_Video_Captioning_CVPR_2021_paper.pdf)\u003cbr\u003e:star:[code](https://github.com/bearcatt/SGR)\n* Video Grounding\n  * [Cascaded Prediction Network via Segment Tree for Temporal Video Grounding](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhao_Cascaded_Prediction_Network_via_Segment_Tree_for_Temporal_Video_Grounding_CVPR_2021_paper.pdf)\n  * [Interventional Video Grounding With Dual Contrastive Lear","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52CV%2FCVPR-2021-Papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F52CV%2FCVPR-2021-Papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F52CV%2FCVPR-2021-Papers/lists"}