{"id":15034101,"url":"https://github.com/patrick-llgc/learning-deep-learning","last_synced_at":"2025-05-14T19:07:59.659Z","repository":{"id":40001903,"uuid":"86965622","full_name":"patrick-llgc/Learning-Deep-Learning","owner":"patrick-llgc","description":"Paper reading notes on Deep Learning and Machine Learning","archived":false,"fork":false,"pushed_at":"2024-11-29T15:28:50.000Z","size":91704,"stargazers_count":1181,"open_issues_count":1,"forks_count":176,"subscribers_count":111,"default_branch":"master","last_synced_at":"2025-05-14T19:07:51.505Z","etag":null,"topics":["3d-object-detection","3d-object-recognition","cnn","computer-vision","deep-learning","literature-review","machine-learning","medical","medical-imaging","paper","paper-reading","paper-review","point-cloud","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/patrick-llgc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-02T05:29:55.000Z","updated_at":"2025-05-13T04:42:30.000Z","dependencies_parsed_at":"2023-02-16T20:01:07.173Z","dependency_job_id":"0ed8d9ce-fd69-4ff7-a9ce-ff25bb866663","html_url":"https://github.com/patrick-llgc/Learning-Deep-Learning","commit_stats":{"total_commits":903,"total_committers":7,"mean_commits":129.0,"dds":0.2646733111849391,"last_synced_commit":"49661042086810d400fca0a24dcb730db5d8e1ad"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-
llgc%2FLearning-Deep-Learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-llgc%2FLearning-Deep-Learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-llgc%2FLearning-Deep-Learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-llgc%2FLearning-Deep-Learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/patrick-llgc","download_url":"https://codeload.github.com/patrick-llgc/Learning-Deep-Learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254209859,"owners_count":22032897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","3d-object-recognition","cnn","computer-vision","deep-learning","literature-review","machine-learning","medical","medical-imaging","paper","paper-reading","paper-review","point-cloud","reinforcement-learning"],"created_at":"2024-09-24T20:23:55.823Z","updated_at":"2025-05-14T19:07:57.829Z","avatar_url":"https://github.com/patrick-llgc.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Paper notes\nThis repository contains my paper reading notes on deep learning and machine learning. It is inspired by [Denny Britz](https://github.com/dennybritz/deeplearning-papernotes) and [Daniel Takeshi](https://github.com/DanielTakeshi/Paper_Notes). 
A minimalistic webpage generated with GitHub Pages can be found [here](https://patrick-llgc.github.io/Learning-Deep-Learning/).\n\n## About me\nMy name is [Patrick Langechuan Liu](https://www.linkedin.com/in/patrick-llgc/). After about a decade of education and research in physics, I found my passion in deep learning and autonomous driving.\n\n## What to read\nIf you are new to deep learning in computer vision and don't know where to start, I suggest you spend your first month or so diving deep into [this list of papers](start/first_cnn_papers.md). I did so ([see my notes](start/first_cnn_papers_notes.md)) and it served me well.\n\nHere is [a list of trustworthy sources of papers](trusty.md) in case I run out of papers to read.\n\n## My review posts by topic\nI regularly update [my blog on Towards Data Science](https://medium.com/@patrickllgc).\n\n- [BEV Perception in Mass Production Autonomous Driving](https://towardsdatascience.com/bev-perception-in-mass-production-autonomous-driving-c6e3f1e46ae0)\n- [Challenges of Mass Production Autonomous Driving in China](https://towardsdatascience.com/challenges-of-mass-production-autonomous-driving-in-china-407c7e2dc5d8)\n- [Vision-centric Semantic Occupancy Prediction for Autonomous Driving](https://towardsdatascience.com/vision-centric-semantic-occupancy-prediction-for-autonomous-driving-16a46dbd6f65) ([related paper notes](topics/topic_occupancy_network.md))\n- [Drivable Space in Autonomous Driving — The Industry](https://medium.com/@patrickllgc/drivable-space-in-autonomous-driving-the-industry-7a4624b94d41)\n- [Drivable Space in Autonomous Driving — The Academia](https://towardsdatascience.com/drivable-space-in-autonomous-driving-a-review-of-academia-ef1a6aa4dc15)\n- [Drivable Space in Autonomous Driving — The Concept](https://towardsdatascience.com/drivable-space-in-autonomous-driving-the-concept-df699bb8682f)\n- [Monocular BEV Perception with Transformers in Autonomous 
Driving](https://towardsdatascience.com/monocular-bev-perception-with-transformers-in-autonomous-driving-c41e4a893944) ([related paper notes](topics/topic_transformers_bev.md))\n- [Illustrated Differences between MLP and Transformers for Tensor Reshaping in Deep Learning](https://towardsdatascience.com/illustrated-difference-between-mlp-and-transformers-for-tensor-reshaping-52569edaf89)\n- [Monocular 3D Lane Line Detection in Autonomous Driving](https://towardsdatascience.com/monocular-3d-lane-line-detection-in-autonomous-driving-4d7cdfabf3b6) ([related paper notes](topics/topic_3d_lld.md))\n- [Deep-Learning based Object detection in Crowded Scenes](https://towardsdatascience.com/deep-learning-based-object-detection-in-crowded-scenes-1c9fddbd7bc4) ([related paper notes](topics/topic_crowd_detection.md))\n- [Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving](https://towardsdatascience.com/monocular-birds-eye-view-semantic-segmentation-for-autonomous-driving-ee2f771afb59) ([related paper notes](topics/topic_bev_segmentation.md))\n- [Deep Learning in Mapping for Autonomous Driving](https://towardsdatascience.com/deep-learning-in-mapping-for-autonomous-driving-9e33ee951a44)\n- [Monocular Dynamic Object SLAM in Autonomous Driving](https://towardsdatascience.com/monocular-dynamic-object-slam-in-autonomous-driving-f12249052bf1)\n- [Monocular 3D Object Detection in Autonomous Driving — A Review](https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e)\n- [Self-supervised Keypoint Learning — A Review](https://towardsdatascience.com/self-supervised-keypoint-learning-aade18081fc3)\n- [Single Stage Instance Segmentation — A Review](https://towardsdatascience.com/single-stage-instance-segmentation-a-review-1eeb66e0cc49)\n- [Self-paced Multitask Learning — A Review](https://towardsdatascience.com/self-paced-multitask-learning-76c26e9532d0)\n- [Convolutional Neural Networks with Heterogeneous 
Metadata](https://towardsdatascience.com/convolutional-neural-networks-with-heterogeneous-metadata-2af9241218a9)\n- [Lifting 2D object detection to 3D in autonomous driving](https://towardsdatascience.com/geometric-reasoning-based-cuboid-generation-in-monocular-3d-object-detection-5ee2996270d1)\n- [Multimodal Regression](https://towardsdatascience.com/anchors-and-multi-bin-loss-for-multi-modal-target-regression-647ea1974617)\n- [Paper Reading in 2019](https://towardsdatascience.com/the-200-deep-learning-papers-i-read-in-2019-7fb7034f05f7?source=friends_link\u0026sk=7628c5be39f876b2c05e43c13d0b48a3)\n\n## 2024-11 (1)\n- [On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258) [[Notes](paper_notes/opportunities_foundation_models.md)]\n\n## 2024-06 (8)\n- [LINGO-1: Exploring Natural Language for Autonomous Driving](https://wayve.ai/thinking/lingo-natural-language-autonomous-driving/) [[Notes](paper_notes/lingo_1.md)] [Wayve, open-loop world model]\n- [LINGO-2: Driving with Natural Language](https://wayve.ai/thinking/lingo-2-driving-with-language/) [[Notes](paper_notes/lingo_2.md)] [Wayve, closed-loop world model]\n- [OpenVLA: An Open-Source Vision-Language-Action Model](https://arxiv.org/abs/2406.09246) [open source RT-2]\n- [Parting with Misconceptions about Learning-based Vehicle Motion Planning](https://arxiv.org/abs/2306.07962) \u003ckbd\u003eCoRL 2023\u003c/kbd\u003e [Simple non-learning based baseline]\n- [QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving](https://arxiv.org/abs/2404.01486) [Waabi]\n- [MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving](https://ieeexplore.ieee.org/document/7139412) [[Notes](paper_notes/mpdm.md)] \u003ckbd\u003eICRA 2015\u003c/kbd\u003e [Behavior planning, UMich, May Autonomy]\n- [MPDM2: Multipolicy Decision-Making for Autonomous Driving via Changepoint-based Behavior Prediction](https://www.roboticsproceedings.org/rss11/p43.pdf) 
[[Notes](paper_notes/mpdm2.md)] \u003ckbd\u003eRSS 2015\u003c/kbd\u003e [Behavior planning]\n- [MPDM3: Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment](https://link.springer.com/article/10.1007/s10514-017-9619-z) \u003ckbd\u003eAutonomous Robots 2017\u003c/kbd\u003e [Behavior planning]\n- [EUDM: Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching](https://arxiv.org/abs/2003.02746) [[Notes](paper_notes/eudm.md)] \u003ckbd\u003eICRA 2020\u003c/kbd\u003e [Wenchao Ding, Shaojie Shen, Behavior planning]\n- [TPP: Tree-structured Policy Planning with Learned Behavior Models](https://arxiv.org/abs/2301.11902) \u003ckbd\u003eICRA 2023\u003c/kbd\u003e [Marco Pavone, Nvidia, Behavior planning]\n- [MARC: Multipolicy and Risk-aware Contingency Planning for Autonomous Driving](https://arxiv.org/abs/2308.12021) [[Notes](paper_notes/marc.md)] \u003ckbd\u003eRAL 2023\u003c/kbd\u003e [Shaojie Shen, Behavior planning]\n- [EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments](https://arxiv.org/abs/2108.07993) \u003ckbd\u003eTRO 2021\u003c/kbd\u003e [Wenchao Ding, encyclopedia of pnc]\n- [trajdata: A Unified Interface to Multiple Human Trajectory Datasets](https://arxiv.org/abs/2307.13924) \u003ckbd\u003eNeurIPS 2023\u003c/kbd\u003e [Marco Pavone, Nvidia]\n- [Optimal Vehicle Trajectory Planning for Static Obstacle Avoidance using Nonlinear Optimization](https://arxiv.org/abs/2307.09466) [Xpeng]\n- [Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles](https://arxiv.org/abs/1910.04586) [[Notes](paper_notes/joint_learned_bptp.md)] \u003ckbd\u003eIROS 2019 Oral\u003c/kbd\u003e [Uber ATG, behavioral planning, motion planning]\n- [Enhancing End-to-End Autonomous Driving with Latent World Model](https://arxiv.org/abs/2406.08481)\n- [OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments](https://arxiv.org/abs/2312.09243) 
[Jiwen Lu]\n- [RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision](https://arxiv.org/abs/2309.09502) \u003ckbd\u003eICRA 2024\u003c/kbd\u003e\n- [EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision](https://arxiv.org/pdf/2311.02077) [Sanja, Marco, NV]\n- [FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation](https://opendrivelab.com/e2ead/AD23Challenge/Track_3_NVOCC.pdf?=\u0026linkId=100000205404832)\n- [Trajeglish: Traffic Modeling as Next-Token Prediction](https://arxiv.org/abs/2312.04535) \u003ckbd\u003eICLR 2024\u003c/kbd\u003e\n- [Autonomous Driving Strategies at Intersections: Scenarios, State-of-the-Art, and Future Outlooks](https://arxiv.org/pdf/2106.13052) \u003ckbd\u003eITSC 2021\u003c/kbd\u003e\n- [Learning-Based Approach for Online Lane Change Intention Prediction](https://ieeexplore.ieee.org/document/6629564/) \u003ckbd\u003eIV 2013\u003c/kbd\u003e [SVM, LC intention prediction]\n- [Traffic Flow-Based Crowdsourced Mapping in Complex Urban Scenario](https://ieeexplore.ieee.org/document/10171417) \u003ckbd\u003eRAL 2023\u003c/kbd\u003e [Wenchao Ding, Huawei, crowdsourced map]\n- [FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow](https://arxiv.org/abs/2305.01622) \u003ckbd\u003eICRA 2023\u003c/kbd\u003e\n- [Hybrid A-star: Path Planning for Autonomous Vehicles in Unknown Semi-structured Environments](https://www.semanticscholar.org/paper/Path-Planning-for-Autonomous-Vehicles-in-Unknown-Dolgov-Thrun/0e8c927d9c2c46b87816a0f8b7b8b17ed1263e9c) \u003ckbd\u003eIJRR 2010\u003c/kbd\u003e [Dolgov, Thrun, Searching]\n- [Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame](https://www.semanticscholar.org/paper/Optimal-trajectory-generation-for-dynamic-street-in-Werling-Ziegler/6bda8fc13bda8cffb3bb426a73ce5c12cc0a1760) \u003ckbd\u003eICRA 2010\u003c/kbd\u003e [Werling, Thrun, Sampling] [MUST READ for planning folks]\n- [Autonomous 
Driving on Curvy Roads Without Reliance on Frenet Frame: A Cartesian-Based Trajectory Planning Method](https://ieeexplore.ieee.org/document/9703250) \u003ckbd\u003eTITS 2022\u003c/kbd\u003e\n- [Baidu Apollo EM Motion Planner](https://arxiv.org/abs/1807.08048) [[Notes](paper_notes/apollo_em_planner.md)][Optimization]\n- [Spatio-temporal Joint Planning Method for Intelligent Vehicles Based on Improved Hybrid A*](https://www.qichegongcheng.com/CN/abstract/abstract1500.shtml) \u003ckbd\u003eAutomotive Engineering: Planning \u0026 Decision-Making, 2023\u003c/kbd\u003e [Joint optimization, search]\n- [Enable Faster and Smoother Spatio-temporal Trajectory Planning for Autonomous Vehicles in Constrained Dynamic Environment](https://journals.sagepub.com/doi/abs/10.1177/0954407020906627) \u003ckbd\u003eJAE 2020\u003c/kbd\u003e [Joint optimization, search]\n- [Focused Trajectory Planning for Autonomous On-Road Driving](https://www.ri.cmu.edu/pub_files/2013/6/IV2013-Tianyu.pdf) \u003ckbd\u003eIV 2013\u003c/kbd\u003e [Joint optimization, Iteration]\n- [SSC: Safe Trajectory Generation for Complex Urban Environments Using Spatio-Temporal Semantic Corridor](https://arxiv.org/abs/1906.09788) \u003ckbd\u003eRAL 2019\u003c/kbd\u003e [Joint optimization, SSC, Wenchao Ding, Motion planning]\n- [AlphaGo: Mastering the game of Go with deep neural networks and tree search](https://www.nature.com/articles/nature16961) [[Notes](paper_notes/alphago.md)] \u003ckbd\u003eNature 2016\u003c/kbd\u003e [DeepMind, MCTS]\n- [AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://www.science.org/doi/full/10.1126/science.aar6404) \u003ckbd\u003eScience 2017\u003c/kbd\u003e [DeepMind]\n- [MuZero: Mastering Atari, Go, chess and shogi by planning with a learned model](https://www.nature.com/articles/s41586-020-03051-4) \u003ckbd\u003eNature 2020\u003c/kbd\u003e [DeepMind]\n- [Grandmaster-Level Chess Without Search](https://arxiv.org/abs/2402.04494) [DeepMind]\n- [Safe, Multi-Agent, Reinforcement Learning for Autonomous 
Driving](https://arxiv.org/abs/1610.03295) [Mobileye, desire and traj optimization]\n- [Comprehensive Reactive Safety: No Need For A Trajectory If You Have A Strategy](https://arxiv.org/abs/2207.00198) \u003ckbd\u003eIROS 2022\u003c/kbd\u003e [Da Fang, Qcraft]\n- [BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning](https://arxiv.org/abs/2310.10357) \u003ckbd\u003eAAAI 2024\u003c/kbd\u003e\n- [LLM-MCTS: Large Language Models as Commonsense Knowledge for Large-Scale Task Planning](https://arxiv.org/abs/2305.14078) \u003ckbd\u003eNeurIPS 2023\u003c/kbd\u003e\n- [HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf) \u003ckbd\u003eCVPR 2022\u003c/kbd\u003e [Zikang Zhou, agent-centric, motion prediction]\n- [QCNet: Query-Centric Trajectory Prediction](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_Query-Centric_Trajectory_Prediction_CVPR_2023_paper.pdf) [[Notes](paper_notes/qcnet.md)] \u003ckbd\u003eCVPR 2023\u003c/kbd\u003e [Zikang Zhou, scene-centric, motion prediction]\n\n## 2024-03 (11)\n- [Genie: Generative Interactive Environments](https://arxiv.org/abs/2402.15391) [[Notes](paper_notes/genie.md)] [DeepMind, World Model]\n- [DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving](https://arxiv.org/abs/2309.09777) [[Notes](paper_notes/drive_dreamer.md)] [Jiwen Lu, World Model]\n- [WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens](https://arxiv.org/abs/2401.09985) [[Notes](paper_notes/world_dreamer.md)] [Jiwen Lu, World Model]\n- [VideoPoet: A Large Language Model for Zero-Shot Video Generation](https://arxiv.org/abs/2312.14125) [Like sora, but LLM, NOT world model]\n- [Align your Latents: High-Resolution Video Synthesis with Latent Diffusion 
Models](https://arxiv.org/abs/2304.08818) [[Notes](paper_notes/video_ldm.md)] \u003ckbd\u003eCVPR 2023\u003c/kbd\u003e [Sanja, Nvidia, VideoLDM, Video prediction]\n- [Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos](https://arxiv.org/abs/2206.11795) \u003ckbd\u003eNeurIPS 2022\u003c/kbd\u003e [[Notes](paper_notes/vpt.md)] [OpenAI]\n- [MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge](https://arxiv.org/abs/2206.08853) \u003ckbd\u003eNeurIPS 2022\u003c/kbd\u003e [NVidia, Outstanding paper award]\n- [Humanoid Locomotion as Next Token Prediction](https://arxiv.org/abs/2402.19469) [[Notes](paper_notes/locomotion_next_token_pred.md)] [Berkeley, EAI]\n- [RPT: Robot Learning with Sensorimotor Pre-training](https://arxiv.org/abs/2306.10007) [[Notes](paper_notes/rpt.md)] \u003ckbd\u003eCoRL 2023 Oral\u003c/kbd\u003e [Berkeley, EAI]\n- [MVP: Real-World Robot Learning with Masked Visual Pre-training](https://arxiv.org/abs/2210.03109) [[Notes](paper_notes/mvp.md)] \u003ckbd\u003eCoRL 2022\u003c/kbd\u003e [Berkeley, EAI]\n- [BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning](https://arxiv.org/abs/2202.02005) [[Notes](paper_notes/bc_z.md)] \u003ckbd\u003eCoRL 2021\u003c/kbd\u003e [Eric Jang, 1X]\n- [GenAD: Generalized Predictive Model for Autonomous Driving](https://arxiv.org/abs/2403.09630) [[Notes](paper_notes/genad.md)] \u003ckbd\u003eCVPR 2024\u003c/kbd\u003e\n- [HG-DAgger: Interactive Imitation Learning with Human Experts](https://arxiv.org/abs/1810.02890) [DAgger]\n- [DriveGAN: Towards a Controllable High-Quality Neural Simulation](https://arxiv.org/abs/2104.15060) [[Notes](paper_notes/drive_gan.md)] \u003ckbd\u003eCVPR 2021 oral\u003c/kbd\u003e [Nvidia, Sanja]\n- [VideoGPT: Video Generation using VQ-VAE and Transformers](https://arxiv.org/abs/2104.10157) [[Notes](paper_notes/videogpt.md)] [Pieter Abbeel]\n- [LLM, Vision Tokenizer and Vision Intelligence, by Lu 
Jiang](https://mp.weixin.qq.com/s/Hamz5XMT1tSZHKdPaCBTKg) [[Notes](paper_notes/llm_vision_intel.md)] [Interview Lu Jiang]\n- [AV2.0: Reimagining an autonomous vehicle](https://arxiv.org/abs/2108.05805) [[Notes](paper_notes/av20.md)] [Wayve, Alex Kendall]\n- [Simulation for E2E AD](https://www.youtube.com/watch?v=8fivoXbT1Ao\u0026ab_channel=Wayve) [Wayve, Tech Sharing, E2E]\n- [E2E lateral planning](https://blog.comma.ai/end-to-end-lateral-planning/) [Comma.ai, E2E planning]\n- [Learning and Leveraging World Models in Visual Representation Learning](https://arxiv.org/abs/2403.00504) [LeCun, JEPA series]\n- [LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models](https://arxiv.org/abs/2312.00785) [Large Vision Models, Jitendra Malik]\n- [LWM: World Model on Million-Length Video And Language With RingAttention](https://arxiv.org/abs/2402.08268) [Pieter Abbeel]\n- [OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving](https://arxiv.org/abs/2311.16038) [Jiwen Lu, World Model]\n- [GenAD: Generative End-to-End Autonomous Driving](https://arxiv.org/abs/2402.11502)\n- [TCP: Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline](https://arxiv.org/abs/2206.08129) \u003ckbd\u003eNeurIPS 2022\u003c/kbd\u003e [E2E planning, Hongyang]\n- [Transfuser: Multi-Modal Fusion Transformer for End-to-End Autonomous Driving](https://arxiv.org/abs/2104.09224) \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [E2E planning, Geiger]\n- [Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving](https://arxiv.org/abs/2310.01957) [Wayve, LLM + AD]\n- [LingoQA: Video Question Answering for Autonomous Driving](https://arxiv.org/abs/2312.14115) [Wayve, LLM + AD]\n- [Panacea: Panoramic and Controllable Video Generation for Autonomous Driving](https://arxiv.org/abs/2311.16813) \u003ckbd\u003eCVPR 2024\u003c/kbd\u003e [Megvii]\n- [PlanT: Explainable Planning Transformers via Object-Level 
Representations](https://arxiv.org/abs/2210.14222) \u003ckbd\u003eCoRL 2022\u003c/kbd\u003e\n- [Scene as Occupancy](https://arxiv.org/abs/2306.02851) \u003ckbd\u003eICCV 2023\u003c/kbd\u003e\n- [AD-MLP: Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes](https://arxiv.org/abs/2305.10430) [Baidu]\n- [The Shift from Models to Compound AI Systems](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/)\n- [Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach](https://arxiv.org/abs/2108.08265) \u003ckbd\u003eICCV 2021\u003c/kbd\u003e\n- [LBC: Learning by Cheating](https://arxiv.org/abs/1912.12294) \u003ckbd\u003eCoRL 2019\u003c/kbd\u003e\n- [Learning to drive from a world on rails](https://arxiv.org/abs/2105.00636) \u003ckbd\u003eICCV 2021 oral\u003c/kbd\u003e [Philipp Krähenbühl]\n- [Learning from All Vehicles](https://arxiv.org/abs/2203.11934) \u003ckbd\u003eCVPR 2022\u003c/kbd\u003e [Philipp Krähenbühl]\n- [VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning](https://arxiv.org/abs/2402.13243) [Horizon]\n- [VQ-VAE: Neural Discrete Representation Learning](https://arxiv.org/abs/1711.00937) \u003ckbd\u003eNeurIPS 2017\u003c/kbd\u003e [Image Tokenizer]\n- [VQ-GAN: Taming Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2012.09841) \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [Image Tokenizer]\n- [ViT-VQGAN: Vector-quantized Image Modeling with Improved VQGAN](https://arxiv.org/abs/2110.04627) \u003ckbd\u003eICLR 2022\u003c/kbd\u003e [Image Tokenizer]\n- [MaskGIT: Masked Generative Image Transformer](https://arxiv.org/abs/2202.04200) \u003ckbd\u003eCVPR 2022\u003c/kbd\u003e [LLM, non-autoregressive]\n- [MAGVIT: Masked Generative Video Transformer](https://arxiv.org/abs/2212.05199) \u003ckbd\u003eCVPR 2023 highlight\u003c/kbd\u003e [Video Tokenizer]\n- [MAGVIT-v2: 
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation](https://arxiv.org/abs/2310.05737) \u003ckbd\u003eICLR 2024\u003c/kbd\u003e [Video Tokenizer]\n- [Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models](https://arxiv.org/abs/2402.17177) [Reverse Engineering of Sora]\n- [GLaM: Efficient Scaling of Language Models with Mixture-of-Experts](https://arxiv.org/abs/2112.06905) \u003ckbd\u003eICML 2022\u003c/kbd\u003e [MoE, LLM]\n- [Lifelong Language Pretraining with Distribution-Specialized Experts](https://arxiv.org/abs/2305.12281) \u003ckbd\u003eICML 2023\u003c/kbd\u003e [MoE, LLM]\n- [DriveLM: Drive on Language](https://arxiv.org/abs/2312.14150) [Hongyang Li]\n- [MotionLM: Multi-Agent Motion Forecasting as Language Modeling](https://arxiv.org/abs/2309.16534) \u003ckbd\u003eICCV 2023\u003c/kbd\u003e [Waymo, LLM + AD]\n- [AD-MLP: Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes](https://arxiv.org/abs/2305.10430) [No perception]\n- CubeLLM: align 2D/3D with language\n- EmerNeRF: ICLR 2024\n- A Language Agent for Autonomous Driving\n- [Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal]\n- [DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation](https://arxiv.org/abs/2403.06845)\n- [DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving](https://arxiv.org/abs/2405.04390) \u003ckbd\u003eCVPR 2024\u003c/kbd\u003e [Zheng Zhu]\n- [Is Sora a World Simulator? 
A Comprehensive Survey on General World Models and Beyond](https://arxiv.org/abs/2405.03520) [Zheng Zhu]\n\n## 2024-02 (7)\n- [End-to-end Autonomous Driving: Challenges and Frontiers](https://arxiv.org/abs/2306.16927) [[Notes](paper_notes/e2e_review_hongyang.md)] [Hongyang Li, Shanghai AI labs]\n- [DriveVLM: The convergence of Autonomous Driving and Large Vision-Language Models](https://arxiv.org/abs/2402.12289) [[Notes](paper_notes/drivevlm.md)] [Hang Zhao]\n- [DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model](https://arxiv.org/abs/2310.01412) [[Notes](paper_notes/drivegpt4.md)] [HKU]\n- [GAIA-1: A Generative World Model for Autonomous Driving](https://arxiv.org/abs/2309.17080) [[Notes](paper_notes/gaia_1.md)] [Wayve, vision foundation model]\n- [ADriver-I: A General World Model for Autonomous Driving](https://arxiv.org/abs/2311.13549) [[Notes](paper_notes/adriver_i.md)] [Megvii, Xiangyu]\n- [Drive-WM: Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving](https://arxiv.org/abs/2311.17918) [[Notes](paper_notes/drive_wm.md)]\n- [X]() [[Notes](paper_notes/x.md)] [E2E planning]\n\n\n## 2023-12 (4)\n- [ChatGPT for Robotics: Design Principles and Model Abilities](https://arxiv.org/abs/2306.17582) [[Notes](paper_notes/prompt_craft.md)] [Microsoft, LLM for robotics]\n- [RoboVQA: Multimodal Long-Horizon Reasoning for Robotics](https://arxiv.org/abs/2311.00899) [[Notes](paper_notes/robovqa.md)] [Google DeepMind, LLM for robotics]\n- [ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application](https://ieeexplore.ieee.org/document/10235949) [Microsoft Robotics]\n- [GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration](https://arxiv.org/abs/2311.12015) [[Notes](paper_notes/gpt4v_robotics.md)] [LLM for robotics, Microsoft Robotics]\n- [LLM-Brain: LLM as A Robotic Brain: Unifying Egocentric Memory and Control](https://arxiv.org/abs/2304.09349) 
[[Notes](paper_notes/llm_brain.md)]\n- [Voyager: An Open-Ended Embodied Agent with Large Language Models](https://arxiv.org/abs/2305.16291) [[Notes](paper_notes/voyager.md)] [Reasoning Critique, Linxi Jim Fan]\n\n## 2023-09 (3)\n- [RetNet: Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621) [[Notes](paper_notes/retnet.md)] [MSRA]\n- [Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention](https://arxiv.org/abs/2006.16236) [[Notes](paper_notes/transformers_are_rnns.md)] \u003ckbd\u003eICML 2020\u003c/kbd\u003e [Linear attention]\n- [AFT: An Attention Free Transformer](https://arxiv.org/abs/2105.14103) [[Notes](paper_notes/aft.md)] [Apple]\n\n\n## 2023-08 (3)\n- [RT-1: Robotics Transformer for Real-World Control at Scale](https://arxiv.org/abs/2212.06817) [[Notes](paper_notes/rt1.md)] [DeepMind]\n- [RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control](https://robotics-transformer2.github.io/assets/rt2.pdf) [[Notes](paper_notes/rt2.md)] [DeepMind, end-to-end visuomotor]\n- [RWKV: Reinventing RNNs for the Transformer Era](https://arxiv.org/abs/2305.13048) [[Notes](paper_notes/rwkv.md)]\n\n## 2023-07 (6)\n- [MILE: Model-Based Imitation Learning for Urban Driving](https://arxiv.org/abs/2210.07729) [[Notes](paper_notes/mile.md)] \u003ckbd\u003eNeurIPS 2022\u003c/kbd\u003e [Alex Kendall]\n- [PaLM-E: An embodied multimodal language model](https://arxiv.org/abs/2303.03378) [[Notes](paper_notes/palm_e.md)] [Google Robotics]\n- [VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models](https://voxposer.github.io/voxposer.pdf) [[Notes](paper_notes/voxposer.md)] [Feifei Li]\n- [CaP: Code as Policies: Language Model Programs for Embodied Control](https://arxiv.org/abs/2209.07753) [[Notes](paper_notes/cap.md)] [[Project](https://code-as-policies.github.io/)]\n- [ProgPrompt: Generating Situated Robot Task Plans using Large Language 
Models](https://arxiv.org/abs/2209.11302) \u003ckbd\u003eICRA 2023\u003c/kbd\u003e\n- [TidyBot: Personalized Robot Assistance with Large Language Models](https://arxiv.org/abs/2305.05658) [[Notes](paper_notes/tidybot.md)] [[Project](https://tidybot.cs.princeton.edu/)]\n- [SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances](https://arxiv.org/abs/2204.01691) [[Notes](paper_notes/saycan.md)] [[Project](https://say-can.github.io/)]\n\n\n## 2023-06 (5)\n- [End-to-end review by Shanghai AI Labs](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)\n- [Pix2seq v2: A Unified Sequence Interface for Vision Tasks](https://arxiv.org/abs/2206.07669) [[Notes](paper_notes/pix2seq_v2.md)] \u003ckbd\u003eNeurIPS 2022\u003c/kbd\u003e [Geoffrey Hinton]\n- 🦩 [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198) [[Notes](paper_notes/flamingo.md)] \u003ckbd\u003eNeurIPS 2022\u003c/kbd\u003e [DeepMind]\n- 😼 [Gato: A Generalist Agent](https://arxiv.org/abs/2205.06175) [[Notes](paper_notes/gato.md)] \u003ckbd\u003eTMLR 2022\u003c/kbd\u003e [DeepMind]\n- [BC-SAC: Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios](https://arxiv.org/abs/2212.11419) [[Notes](paper_notes/bc_sac.md)] \u003ckbd\u003eNeurIPS 2022\u003c/kbd\u003e [Waymo]\n- [MGAIL-AD: Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving](https://arxiv.org/abs/2210.09539) [[Notes](paper_notes/mgail_ad.md)] \u003ckbd\u003eIROS 2022\u003c/kbd\u003e [Waymo]\n\n\n\n## 2023-05 (7)\n- [SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving](https://arxiv.org/abs/2303.09551) [[Notes](paper_notes/surroundocc.md)] [Occupancy Network, Wei Yi, Jiwen Lu]\n- [Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving](https://arxiv.org/abs/2304.14365) [[Notes](paper_notes/occ3d.md)] [Occupancy Network, Zhao Hang]\n- [Occupancy Networks: Learning 3D 
Reconstruction in Function Space](https://arxiv.org/abs/1812.03828) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [[Notes](paper_notes/occupancy_networks.md)] [Andreas Geiger]\n- [OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction](https://arxiv.org/abs/2304.05316) [Occupancy Network, PhiGent]\n- [Pix2seq: A Language Modeling Framework for Object Detection](https://arxiv.org/abs/2109.10852) [[Notes](paper_notes/pix2seq.md)] \u003ckbd\u003eICLR 2022\u003c/kbd\u003e [Geoffrey Hinton]\n- [VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks](https://arxiv.org/abs/2305.11175) [[Notes](paper_notes/vision_llm.md)] [Jifeng Dai]\n- [HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](https://arxiv.org/abs/2303.17580) [[Notes](paper_notes/hugging_gpt.md)]\n\n\n## 2023-04 (1)\n- [UniAD: Planning-oriented Autonomous Driving](https://arxiv.org/abs/2212.10156) [[Notes](paper_notes/uniad.md)] \u003ckbd\u003eCVPR 2023 best paper\u003c/kbd\u003e [BEV, e2e, Hongyang Li]\n\n\n\n## 2023-03 (5)\n- [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) [[Notes](paper_notes/gpt4.md)] [OpenAI, GPT]\n- [OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception](https://arxiv.org/abs/2303.03991) [[Notes](paper_notes/openoccupancy.md)] [Occupancy Network, Jiwen Lu]\n- [VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion](https://arxiv.org/abs/2302.12251) [[Note](paper_notes/voxformer.md)] \u003ckbd\u003eCVPR 2023 highlight\u003c/kbd\u003e [Occupancy Network, Nvidia]\n- [MonoScene: Monocular 3D Semantic Scene Completion](https://arxiv.org/abs/2112.00726) \u003ckbd\u003eCVPR 2022\u003c/kbd\u003e [[Notes](paper_notes/monoscene.md)] [Occupancy Network, single cam]\n- [CoReNet: Coherent 3D scene reconstruction from a single RGB image](https://arxiv.org/abs/2004.12989) [[Notes](paper_notes/corenet.md)] \u003ckbd\u003eECCV 2020 
oral\u003c/kbd\u003e\n\n\n## 2023-02 (4)\n- [Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning](https://arxiv.org/abs/2211.04325) [[Notes](paper_notes/out_of_data.md)] [Epoch.ai industry report]\n- [Codex: Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374) [[Notes](paper_notes/codex.md)] [GPT, OpenAI]\n- [InstructGPT: Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155) [[Notes](paper_notes/instructgpt.md)] [GPT, OpenAI]\n- [TPVFormer: Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction](https://arxiv.org/abs/2302.07817) [[Notes](paper_notes/tpvformer.md)] \u003ckbd\u003eCVPR 2023\u003c/kbd\u003e [Occupancy Network, Jiwen Lu]\n\n\n## 2023-01 (2)\n- [PPGeo: Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling](https://arxiv.org/abs/2301.01006) [[Notes](paper_notes/ppgeo.md)] \u003ckbd\u003eICLR 2023\u003c/kbd\u003e\n- [nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles](https://arxiv.org/abs/2106.11810) [[Notes](paper_notes/nuplan.md)]\n\n\n\n## 2022-11 (1)\n- [M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction](https://arxiv.org/abs/2202.11884) [[Notes](paper_notes/m2i.md)] \u003ckbd\u003eCVPR 2022\u003c/kbd\u003e\n\n\n## 2022-10 (1)\n- [Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe](https://arxiv.org/abs/2209.05324) [[Notes](paper_notes/delving_bev.md)] [PJLab]\n\n## 2022-09 (3)\n- [ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries](https://arxiv.org/abs/2208.01582) [[Notes](paper_notes/vip3d.md)] [BEV, perception + prediction, Hang Zhao]\n- [MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction](https://arxiv.org/abs/2208.14437) [[Notes](paper_notes/maptr.md)] [Horizon, BEVNet]\n- [StopNet: Scalable Trajectory and Occupancy Prediction for Urban 
Autonomous Driving](https://arxiv.org/abs/2206.00991) \u003ckbd\u003eICRA 2022\u003c/kbd\u003e\n- [MOTR: End-to-End Multiple-Object Tracking with Transformer](https://arxiv.org/abs/2105.03247) \u003ckbd\u003eECCV 2022\u003c/kbd\u003e [Megvii, MOT]\n- [Anchor DETR: Query Design for Transformer-Based Object Detection](https://arxiv.org/abs/2109.07107) [[Notes](paper_notes/anchor_detr.md)] \u003ckbd\u003eAAAI 2022\u003c/kbd\u003e [Megvii]\n\n\n## 2022-08 (1)\n- [HOME: Heatmap Output for future Motion Estimation](https://arxiv.org/abs/2105.10968) [[Notes](paper_notes/home.md)] \u003ckbd\u003eITSC 2021\u003c/kbd\u003e [behavior prediction, Huawei Paris]\n\n## 2022-07 (8)\n- [PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark](https://arxiv.org/abs/2203.11089) [[Notes](paper_notes/persformer.md)] [BEVNet, lane line]\n- [VectorMapNet: End-to-end Vectorized HD Map Learning](https://arxiv.org/abs/2206.08920) [[Notes](paper_notes/vectormapnet.md)] [BEVNet, LLD, Hang Zhao]\n- [PETR: Position Embedding Transformation for Multi-View 3D Object Detection](https://arxiv.org/abs/2203.05625) [[Notes](paper_notes/petr.md)] \u003ckbd\u003eECCV 2022\u003c/kbd\u003e [BEVNet]\n- [PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images](https://arxiv.org/abs/2206.01256) [[Notes](paper_notes/petrv2.md)] [BEVNet, MegVii]\n- [M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation](https://arxiv.org/abs/2204.05088) [[Notes](paper_notes/m2bev.md)] [BEVNet, nvidia]\n- [BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection](https://arxiv.org/abs/2206.10092) [[Notes](paper_notes/bevdepth.md)] [BEVNet, NuScenes SOTA, Megvii]\n- [CVT: Cross-view Transformers for real-time Map-view Semantic Segmentation](https://arxiv.org/abs/2205.02833) [[Notes](paper_notes/cvt.md)] \u003ckbd\u003eCVPR 2022 oral\u003c/kbd\u003e [UTAustin, Philipp]\n- [Wayformer: Motion Forecasting via Simple \u0026 
Efficient Attention Networks](https://arxiv.org/abs/2207.05844) [[Notes](paper_notes/wayformer.md)] [Behavior prediction, Waymo]\n\n## 2022-06 (3)\n- [BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection](https://arxiv.org/abs/2203.17054) [[Notes](paper_notes/bevdet4d.md)] [BEVNet]\n- [BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving](https://arxiv.org/abs/2205.09743) [[Notes](paper_notes/beverse.md)] [Jiwen Lu, BEVNet, perception + prediction]\n- [BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation](https://arxiv.org/abs/2205.13542) [[Notes](paper_notes/bevfusion.md)] [BEVNet, Han Song]\n\n## 2022-03 (1)\n- [BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers](https://arxiv.org/abs/2203.17270) [[Notes](paper_notes/bevformer.md)] \u003ckbd\u003eECCV 2022\u003c/kbd\u003e [BEVNet, Hongyang Li, Jifeng Dai]\n\n## 2022-02 (1)\n- [TNT: Target-driveN Trajectory Prediction](https://arxiv.org/abs/2008.08294) [[Notes](paper_notes/tnt.md)] \u003ckbd\u003eCoRL 2020\u003c/kbd\u003e [prediction, Waymo, Hang Zhao]\n- [DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets](https://arxiv.org/abs/2108.09640) [[Notes](paper_notes/dense_tnt.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [prediction, Waymo, 1st place winner WOMD]\n\n## 2022-01 (1)\n- [Manydepth: The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth](https://arxiv.org/abs/2104.14540) [[Notes](paper_notes/manydepth.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [monodepth, Niantic]\n- [DEKR: Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression](https://arxiv.org/abs/2104.02300) [[Notes](paper_notes/dekr.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e\n\n## 2021-12 (5)\n- [BN-FFN-BN: Leveraging Batch Normalization for Vision 
Transformers](https://openaccess.thecvf.com/content/ICCV2021W/NeurArch/papers/Yao_Leveraging_Batch_Normalization_for_Vision_Transformers_ICCVW_2021_paper.pdf) [[Notes](paper_notes/bn_ffn_bn.md)] \u003ckbd\u003eICCVW 2021\u003c/kbd\u003e [BN, transformers]\n- [PowerNorm: Rethinking Batch Normalization in Transformers](https://arxiv.org/abs/2003.07845) [[Notes](paper_notes/powernorm.md)] \u003ckbd\u003eICML 2020\u003c/kbd\u003e [BN, transformers]\n- [MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction](https://arxiv.org/abs/2111.14973) [[Notes](paper_notes/multipath++.md)] \u003ckbd\u003eICRA 2022\u003c/kbd\u003e [Waymo, behavior prediction]\n- [BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View](https://arxiv.org/abs/2112.11790) [[Notes](paper_note/bevdet.md)]\n- [Translating Images into Maps](https://arxiv.org/abs/2110.00966) [[Notes](paper_notes/translating_images_to_maps.md)] \u003ckbd\u003eICRA 2022\u003c/kbd\u003e [BEVNet, transformers]\n\n## 2021-11 (4)\n- [DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries](https://arxiv.org/abs/2110.06922) [[Notes](paper_notes/detr3d.md)] \u003ckbd\u003eCoRL 2021\u003c/kbd\u003e [BEVNet, transformers]\n- [Robust-CVD: Robust Consistent Video Depth Estimation](https://arxiv.org/abs/2012.05901) \u003ckbd\u003eCVPR 2021 oral\u003c/kbd\u003e [[website](https://robust-cvd.github.io/)]\n- [MAE: Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) [[Notes](paper_notes/mae.md)] [Kaiming He, unsupervised learning]\n- [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886) [[Notes](paper_notes/simmim.md)] [MSRA, unsupervised learning, MAE]\n- [iBOT: Image BERT Pre-Training with Online Tokenizer](https://arxiv.org/abs/2111.07832)\n\n## 2021-10 (3)\n- [STSU: Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images](https://arxiv.org/abs/2110.01997) 
[[Notes](paper_notes/stsu.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [BEV feat stitching, Luc Van Gool]\n- [PanopticBEV: Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images](https://arxiv.org/abs/2108.03227) [[Notes](paper_notes/panoptic_bev.md)] \u003ckbd\u003eRAL 2022\u003c/kbd\u003e [BEVNet, vertical/horizontal features]\n- [NEAT: Neural Attention Fields for End-to-End Autonomous Driving](https://arxiv.org/abs/2109.04456) [[Notes](paper_notes/neat.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [[supplementary](http://www.cvlibs.net/publications/Chitta2021ICCV_supplementary.pdf)] [BEVNet]\n\n\n## 2021-09 (11)\n- [DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection?](https://arxiv.org/abs/2108.06417) [[Notes](paper_notes/dd3d.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [mono3D, Toyota]\n- [EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070) [[Notes](paper_notes/efficientdet.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [BiFPN, Tesla AI day]\n- [PnPNet: End-to-End Perception and Prediction with Tracking in the Loop](https://arxiv.org/abs/2005.14711) [[Notes](paper_notes/pnpnet.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [Uber ATG]\n- [MP3: A Unified Model to Map, Perceive, Predict and Plan](https://arxiv.org/abs/2101.06806) [[Notes](paper_notes/mp3.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [Uber, planning]\n- [BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning](http://arxiv.org/abs/2110.04931) [[Notes](paper_notes/bevnet_sdca.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [BEVNet, surveillance]\n- [LiDAR R-CNN: An Efficient and Universal 3D Object Detector](https://arxiv.org/abs/2103.15297) [[Notes](paper_notes/lidar_rcnn.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [TuSimple, Naiyan Wang]\n- [Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches](https://arxiv.org/abs/2102.05897) 
[[Notes](paper_notes/corner_case_vision_arxiv.md)] [corner cases]\n- [Systematization of Corner Cases for Visual Perception in Automated Driving](https://ieeexplore.ieee.org/document/9304789) [[Notes](paper_notes/corner_case_vision_iv.md)] \u003ckbd\u003eIV 2020\u003c/kbd\u003e [corner cases]\n- [An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving](https://arxiv.org/abs/2103.03678) [[Notes](paper_notes/corner_case_multisensor.md)] \u003ckbd\u003eIV 2021\u003c/kbd\u003e [corner cases]\n- [PYVA: Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation](https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Projecting_Your_View_Attentively_Monocular_Road_Scene_Layout_Estimation_via_CVPR_2021_paper.html) [[Notes](paper_notes/pyva.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [[Supplementary](https://openaccess.thecvf.com/content/CVPR2021/supplemental/Yang_Projecting_Your_View_CVPR_2021_supplemental.zip)] [BEVNet]\n- [YOLOF: You Only Look One-level Feature](https://arxiv.org/abs/2103.09460) [[Notes](paper_notes/yolof.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [megvii]\n- [Perceiving Humans: from Monocular 3D Localization to Social Distancing](https://arxiv.org/abs/2009.00984) [[Notes](paper_notes/perceiving_humans.md)] \u003ckbd\u003eTITS 2021\u003c/kbd\u003e [monoloco++]\n- [PifPaf: Composite Fields for Human Pose Estimation](https://arxiv.org/abs/1903.06593) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e\n- [Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images](https://arxiv.org/abs/2108.03227) [BEVNet]\n- [TransformerFusion: Monocular RGB Scene Reconstruction using Transformers](https://arxiv.org/abs/2107.02191)\n- [Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view 
Transformation](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Projecting_Your_View_Attentively_Monocular_Road_Scene_Layout_Estimation_via_CVPR_2021_paper.pdf) \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e\n- [Multi-Modal Fusion Transformer for End-to-End Autonomous Driving](https://arxiv.org/abs/2104.09224) \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e\n- [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152)\n- [Probabilistic and Geometric Depth: Detecting Objects in Perspective](https://arxiv.org/abs/2107.14160) \u003ckbd\u003eCoRL 2021\u003c/kbd\u003e\n\n\n## 2021-08 (11)\n- [EgoNet: Exploring Intermediate Representation for Monocular Vehicle Pose Estimation](https://arxiv.org/abs/2011.08464) [[Notes](paper_notes/egonet.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [mono3D]\n- [MonoEF: Monocular 3D Object Detection: An Extrinsic Parameter Free Approach](https://arxiv.org/abs/2106.15796) [[Notes](paper_notes/monoef.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [mono3D]\n- [GAC: Ground-aware Monocular 3D Object Detection for Autonomous Driving](https://arxiv.org/abs/2102.00690) [[Notes](paper_notes/gac.md)] \u003ckbd\u003eRAL 2021\u003c/kbd\u003e [mono3D]\n- [FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection](https://arxiv.org/abs/2104.10956) [[Notes](paper_notes/fcos3d.md)] \u003ckbd\u003eNeurIPS 2020\u003c/kbd\u003e [mono3D, senseTime]\n- [GUPNet: Geometry Uncertainty Projection Network for Monocular 3D Object Detection](https://arxiv.org/abs/2107.13774) [[Notes](paper_notes/gupnet.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [mono3D, Wanli Ouyang]\n- [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) [[Notes](paper_notes/darts.md)] \u003ckbd\u003eICLR 2019\u003c/kbd\u003e [VGG author]\n- [FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search](https://arxiv.org/abs/1812.03443) [[Notes](paper_notes/fbnet.md)] \u003ckbd\u003eCVPR 
2019\u003c/kbd\u003e [DARTS]\n- [FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions](https://arxiv.org/abs/2004.05565) \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining](https://arxiv.org/abs/2006.02049) \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e\n- [Perceiver: General Perception with Iterative Attention](https://arxiv.org/abs/2103.03206) [[Notes](paper_notes/perceiver.md)] \u003ckbd\u003eICML 2021\u003c/kbd\u003e [transformers, multimodal]\n- [Perceiver IO: A General Architecture for Structured Inputs \u0026 Outputs](https://arxiv.org/abs/2107.14795) [[Notes](paper_notes/perceiver_io.md)]\n- [PillarMotion: Self-Supervised Pillar Motion Learning for Autonomous Driving](https://arxiv.org/abs/2104.08683) [[Notes](paper_notes/pillar_motion.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [QCraft, Alan Yuille]\n- [SimTrack: Exploring Simple 3D Multi-Object Tracking for Autonomous Driving](https://arxiv.org/abs/2108.10312) [[Notes](paper_notes/simtrack.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [QCraft, Alan Yuille]\n\n\n## 2021-07 (1)\n- [HDMapNet: An Online HD Map Construction and Evaluation Framework](https://arxiv.org/abs/2107.06307) [[Notes](paper_notes/hdmapnet.md)] \u003ckbd\u003eCVPR 2021 workshop\u003c/kbd\u003e [youtube video only, Li Auto]\n\n\n## 2021-06 (2)\n- [FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras](https://arxiv.org/abs/2104.10490) [[Notes](paper_notes/fiery.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [BEVNet, perception + prediction]\n- [Baidu's CNN seg](https://zhuanlan.zhihu.com/p/35034215) [[Notes](paper_notes/cnn_seg.md)]\n\n## 2021-04 (5)\n- [Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation](https://arxiv.org/abs/2012.15175) [[Notes](paper_notes/swahr.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [megvii] \n- [CrowdPose: Efficient Crowded Scenes Pose Estimation and A New
Benchmark](https://arxiv.org/abs/1812.00324) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e\n- [The Overlooked Elephant of Object Detection: Open Set](https://openaccess.thecvf.com/content_WACV_2020/html/Dhamija_The_Overlooked_Elephant_of_Object_Detection_Open_Set_WACV_2020_paper.html) \u003ckbd\u003eWACV 2020\u003c/kbd\u003e\n- [Class-Agnostic Object Detection](https://arxiv.org/abs/2011.14204) \u003ckbd\u003eWACV 2021\u003c/kbd\u003e\n- [OWOD: Towards Open World Object Detection](https://arxiv.org/abs/2103.02603) [[Notes](paper_notes/owod.md)] \u003ckbd\u003eCVPR 2021 oral\u003c/kbd\u003e\n- [FsDet: Frustratingly Simple Few-Shot Object Detection](https://arxiv.org/abs/2003.06957) \u003ckbd\u003eICML 2020\u003c/kbd\u003e\n- [MonoFlex: Objects are Different: Flexible Monocular 3D Object Detection](https://arxiv.org/abs/2104.02323) [[Notes](paper_notes/monoflex.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [mono3D, Jiwen Lu, cropped]\n- [monoDLE: Delving into Localization Errors for Monocular 3D Object Detection](https://arxiv.org/abs/2103.16237) [[Notes](paper_notes/monodle.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [mono3D]\n- [Exploring 2D Data Augmentation for 3D Monocular Object Detection](https://arxiv.org/abs/2104.10786)\n- [OCM3D: Object-Centric Monocular 3D Object Detection](https://arxiv.org/abs/2104.06041) [mono3D]\n- [FSM: Full Surround Monodepth from Multiple Cameras](https://arxiv.org/abs/2104.00152) [[Notes](paper_notes/fsm.md)] \u003ckbd\u003eICRA 2021\u003c/kbd\u003e [monodepth, Xnet]\n\n\n## 2021-03 (4)\n- [CaDDN: Categorical Depth Distribution Network for Monocular 3D Object Detection](https://arxiv.org/abs/2103.01100) [[Notes](paper_notes/caddn.md)] \u003ckbd\u003eCVPR 2021 oral\u003c/kbd\u003e [mono3D, BEVNet]\n- [DSNT: Numerical Coordinate Regression with Convolutional Neural Networks](https://arxiv.org/abs/1801.07372) [[Notes](paper_notes/dsnt.md)] [differentiable spatial to numerical transform]\n- [Soft-Argmax: Human pose regression by combining
indirect part detection and contextual information](https://arxiv.org/abs/1710.02322)\n- [INSTA-YOLO: Real-Time Instance Segmentation](https://arxiv.org/abs/2102.06777) [[Notes](paper_notes/insta_yolo.md)] \u003ckbd\u003eICML workshop 2020\u003c/kbd\u003e [single stage instance segmentation]\n- [CenterNet2: Probabilistic two-stage detection](https://arxiv.org/abs/2103.07461) [[Notes](paper_notes/centernet2.md)] [CenterNet, two-stage]\n\n\n## 2021-01 (7)\n- [Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection](https://arxiv.org/abs/2012.00257) [[Notes](paper_notes/confluence.md)] [NMS]\n- [BoxInst: High-Performance Instance Segmentation with Box Annotations](https://arxiv.org/abs/2012.02310) [[Notes](paper_notes/boxinst.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [Chunhua Shen, Tian Zhi]\n- [3DSSD: Point-based 3D Single Stage Object Detector](https://arxiv.org/abs/2002.10187) [[Notes](paper_notes/3dssd.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [RepVGG: Making VGG-style ConvNets Great Again](https://arxiv.org/abs/2101.03697) [[Notes](paper_notes/repvgg.md)] [Megvii, Xiangyu Zhang, ACNet]\n- [ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks](https://arxiv.org/abs/1908.03930) [[Notes](paper_notes/acnet.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e\n- [BEV-Feat-Stitching: Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera](https://arxiv.org/abs/2012.03040) [[Notes](paper_notes/bev_feat_stitching.md)] [BEVNet, mono3D, Luc Van Gool]\n- [PSS: Object Detection Made Simpler by Eliminating Heuristic NMS](https://arxiv.org/abs/2101.11782) [[Notes](paper_notes/pss.md)] [Transformer, DETR]\n\n## 2020-12 (17)\n- [DeFCN: End-to-End Object Detection with Fully Convolutional Network](https://arxiv.org/abs/2012.03544) [[Notes](paper_notes/defcn.md)] [Transformer, DETR]\n- [OneNet: End-to-End One-Stage Object Detection by Classification 
Cost](https://arxiv.org/abs/2012.05780) [[Notes](paper_notes/onenet.md)] [Transformer, DETR]\n- [Traffic Light Mapping, Localization, and State Detection for Autonomous Vehicles](http://driving.stanford.edu/papers/ICRA2011.pdf) [[Notes](paper_notes/tfl_stanford.md)] \u003ckbd\u003eICRA 2011\u003c/kbd\u003e [traffic light, Sebastian Thrun]\n- [Towards lifelong feature-based mapping in semi-static environments](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43966.pdf) [[Notes](paper_notes/lifelong_feature_mapping_google.md)] \u003ckbd\u003eICRA 2016\u003c/kbd\u003e\n- [How to Keep HD Maps for Automated Driving Up To Date](http://www.lewissoft.com/pdf/ICRA2020/1484.pdf) [[Notes](paper_notes/keep_hd_maps_updated_bmw.md)] \u003ckbd\u003eICRA 2020\u003c/kbd\u003e [BMW]\n- [Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection](https://arxiv.org/abs/2011.12885) [[Notes](paper_notes/gfocalv2.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [focal loss]\n- [Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning](http://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w9/Milz_Visual_SLAM_for_CVPR_2018_paper.pdf) [[Notes](paper_notes/vslam_for_ad.md)] \u003ckbd\u003eCVPR 2018 workshop\u003c/kbd\u003e\n- [Centroid Voting: Object-Aware Centroid Voting for Monocular 3D Object Detection](https://arxiv.org/abs/2007.09836) [[Notes](paper_notes/centroid_voting.md)] \u003ckbd\u003eIROS 2020\u003c/kbd\u003e [mono3D, geometry + appearance = distance]\n- [Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras](https://arxiv.org/abs/2003.03759) [[Notes](paper_notes/mono3d_fisheye.md)] [GM Israel, mono3D]\n- [DeepPS: Vision-Based Parking-Slot Detection: A DCNN-Based Approach and a Large-Scale Benchmark Dataset](https://cslinzhang.github.io/deepps/parkingslot.pdf) \u003ckbd\u003eTIP 2018\u003c/kbd\u003e [Parking slot detection, PS2.0 dataset]\n- [PSDet: Efficient and 
Universal Parking Slot Detection](https://arxiv.org/abs/2005.05528) [[Notes](paper_notes/psdet.md)] \u003ckbd\u003eIV 2020\u003c/kbd\u003e [Zongmu, Parking slot detection]\n- [PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning](https://arxiv.org/abs/2001.00138) [[Notes](paper_notes/patdnn.md)] \u003ckbd\u003eASPLOS 2020\u003c/kbd\u003e [pruning]\n- [Scaled-YOLOv4: Scaling Cross Stage Partial Network](https://arxiv.org/abs/2011.08036) [[Notes](paper_notes/scaled_yolov4.md)] [yolo]\n- [Yolov5 by Ultralytics](https://github.com/ultralytics/yolov5) [[Notes](paper_notes/yolov5.md)] [yolo, spatial2channel]\n- [PP-YOLO: An Effective and Efficient Implementation of Object Detector](https://arxiv.org/abs/2007.12099) [[Notes](paper_notes/pp_yolo.md)] [yolo, paddle-paddle, baidu]\n- [PointPainting: Sequential Fusion for 3D Object Detection](https://arxiv.org/pdf/1911.10150.pdf) [[Notes](paper_notes/point_painting.md)] [nuScenes]\n- [MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps](https://arxiv.org/abs/2003.06754) [[Notes](paper_notes/motionnet.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [Unseen moving objects, BEV]\n- [Locating Objects Without Bounding Boxes](https://arxiv.org/abs/1806.07564) [[Notes](paper_notes/objects_without_bboxes.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [weighted Hausdorff distance, NMS-free]\n\n\n## 2020-11 (18)\n- [TSP: Rethinking Transformer-based Set Prediction for Object Detection](https://arxiv.org/abs/2011.10881) [[Notes](paper_notes/tsp.md)] \u003ckbd\u003eICCV 2021\u003c/kbd\u003e [DETR, transformers, Kris Kitani]\n- [Sparse R-CNN: End-to-End Object Detection with Learnable Proposals](https://arxiv.org/abs/2011.12450) [[Notes](paper_notes/sparse_rcnn.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [DETR, Transformer]\n- [Unsupervised Monocular Depth Learning in Dynamic Scenes](https://arxiv.org/abs/2010.16404)
[[Notes](paper_notes/learn_depth_and_motion.md)] \u003ckbd\u003eCoRL 2020\u003c/kbd\u003e [LearnK improved ver, Google]\n- [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) [[Notes](paper_notes/monet3d.md)] \u003ckbd\u003eICML 2020\u003c/kbd\u003e [Mono3D, pairwise relationship]\n- [Argoverse: 3D Tracking and Forecasting with Rich Maps](https://arxiv.org/abs/1911.02620) [[Notes](paper_notes/argoverse.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [HD maps, dataset, CV lidar]\n- [The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes](https://arxiv.org/abs/1903.01568) [[Notes](paper_notes/h3d.md)] \u003ckbd\u003eICRA 2019\u003c/kbd\u003e\n- [Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection](https://arxiv.org/abs/2006.07864) \u003ckbd\u003eCVPRW 2020\u003c/kbd\u003e [dataset, Daimler, mono3D]\n- [NYC3DCars: A Dataset of 3D Vehicles in Geographic Context](https://www.cs.cornell.edu/~snavely/publications/papers/nyc3dcars_iccv13.pdf) \u003ckbd\u003eICCV 2013\u003c/kbd\u003e\n- [Towards Fully Autonomous Driving: Systems and Algorithms](https://www.ri.cmu.edu/wp-content/uploads/2017/12/levinson-iv2011.pdf) \u003ckbd\u003eIV 2011\u003c/kbd\u003e\n- [Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding](https://arxiv.org/abs/2005.13423) [[Notes](paper_notes/center3d.md)] [mono3D, LID+DepJoint]\n- [ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection](https://arxiv.org/abs/2003.00529) \u003ckbd\u003eAAAI 2020 oral\u003c/kbd\u003e [mono3D] \n- [CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection](https://arxiv.org/abs/2011.04841) [[Notes](paper_notes/centerfusion.md)] \u003ckbd\u003eWACV 2021\u003c/kbd\u003e [early fusion, camera, radar]\n- [3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation](https://arxiv.org/abs/2011.01535) [[Notes](paper_notes/3d_lanenet+.md)] 
\u003ckbd\u003eNeurIPS 2020 workshop\u003c/kbd\u003e [GM Israel, 3D LLD]\n- [LSTR: End-to-end Lane Shape Prediction with Transformers](https://arxiv.org/abs/2011.04233) [[Notes](paper_notes/lstr.md)] \u003ckbd\u003eWACV 2021\u003c/kbd\u003e [LLD, transformers]\n- [PIXOR: Real-time 3D Object Detection from Point Clouds](https://arxiv.org/abs/1902.06326) [[Notes](paper_notes/pixor.md)] \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e (birds eye view)\n- [HDNET/PIXOR++: Exploiting HD Maps for 3D Object Detection](http://proceedings.mlr.press/v87/yang18b/yang18b.pdf) [[Notes](paper_notes/pixor++.md)] \u003ckbd\u003eCoRL 2018\u003c/kbd\u003e\n- [CPNDet: Corner Proposal Network for Anchor-free, Two-stage Object Detection](https://arxiv.org/abs/2007.13816) \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [anchor free, two stage]\n- [MVF: End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds](https://arxiv.org/abs/1910.06528) [[Notes](paper_notes/mvf.md)] \u003ckbd\u003eCoRL 2019\u003c/kbd\u003e [Waymo, VoxelNet 1st author]\n- [Pillar-based Object Detection for Autonomous Driving](https://arxiv.org/abs/2007.10323) [[Notes](paper_notes/pillar_od.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e\n- [Training-Time-Friendly Network for Real-Time Object Detection](https://arxiv.org/abs/1909.00700) \u003ckbd\u003eAAAI 2020\u003c/kbd\u003e [anchor-free, fast training]\n- [Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies](https://arxiv.org/abs/2006.06091) [Review of autonomous stack, Yu Huang]\n- [Dense Monocular Depth Estimation in Complex Dynamic Scenes](https://openaccess.thecvf.com/content_cvpr_2016/papers/Ranftl_Dense_Monocular_Depth_CVPR_2016_paper.pdf) \u003ckbd\u003eCVPR 2016\u003c/kbd\u003e\n- [Probabilistic Future Prediction for Video Scene Understanding](https://anthonyhu.github.io/research/probabilistic-future-prediction/)\n- [AB3D: A Baseline for 3D Multi-Object Tracking](https://arxiv.org/abs/1907.03961) \u003ckbd\u003eIROS 
2020\u003c/kbd\u003e [3D MOT]\n- [Spatial-Temporal Relation Networks for Multi-Object Tracking](https://arxiv.org/abs/1904.11489) \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [MOT, feature location over time]\n- [Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking](https://arxiv.org/abs/1802.09298) \u003ckbd\u003eICRA 2018\u003c/kbd\u003e [MOT, IIT, 3D shape]\n- [ST-3D: Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking](https://arxiv.org/abs/2004.09305) \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [Peiliang Li, author of VINS and S3DOT]\n- [Augment Your Batch: Improving Generalization Through Instance Repetition](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hoffer_Augment_Your_Batch_Improving_Generalization_Through_Instance_Repetition_CVPR_2020_paper.pdf) \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [RetinaTrack: Online Single Stage Joint Detection and Tracking](https://arxiv.org/abs/2003.13870) \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [MOT]\n- [Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots](https://arxiv.org/abs/1912.12791)\n- [Gradient Centralization: A New Optimization Technique for Deep Neural Networks](https://arxiv.org/abs/2004.01461) \u003ckbd\u003eECCV 2020 oral\u003c/kbd\u003e\n- [Depth Completion via Deep Basis Fitting](https://arxiv.org/abs/1912.10336) \u003ckbd\u003eWACV 2020\u003c/kbd\u003e\n- [BTS: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation](https://arxiv.org/abs/1907.10326) [monodepth, supervised]\n- [The Edge of Depth: Explicit Constraints between Segmentation and Depth](https://arxiv.org/abs/2004.00171) \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [monodepth, Xiaoming Liu]\n- [On the Continuity of Rotation Representations in Neural Networks](https://arxiv.org/abs/1812.07035) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [rotational representation]\n- [VDO-SLAM: A Visual Dynamic Object-aware SLAM
System](https://arxiv.org/abs/2005.11052) \u003ckbd\u003eIJRR 2020\u003c/kbd\u003e\n- [Dynamic SLAM: The Need For Speed](https://arxiv.org/abs/2002.08584)\n- [Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction](https://arxiv.org/abs/2004.10681) \u003ckbd\u003eECCV 2020\u003c/kbd\u003e\n- [Traffic Light Mapping and Detection](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37259.pdf) [[Notes](paper_notes/tfl_mapping_google.md)] \u003ckbd\u003eICRA 2011\u003c/kbd\u003e [traffic light, Google, Chris Urmson]\n- [Traffic light recognition exploiting map and localization at every stage](https://web.yonsei.ac.kr/jksuhr/papers/Traffic%20light%20recognition%20exploiting%20map%20and%20localization%20at%20every%20stage.pdf) [[Notes](paper_notes/tfl_exploting_map_korea.md)] \u003ckbd\u003eExpert Systems 2017\u003c/kbd\u003e [traffic light, 鲜于明镐，徐在圭，郑浩奇]\n- [Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars](https://arxiv.org/abs/1906.11886) [[Notes](paper_notes/tfl_lidar_map_building_brazil.md)] \u003ckbd\u003e IJCNN 2019\u003c/kbd\u003e [traffic light, Espirito Santo Brazil]\n\n\n## 2020-10 (14)\n- [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383) [[Notes](paper_notes/tsm.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [Song Han, video, object detection]\n- [WOD: Waymo Dataset: Scalability in Perception for Autonomous Driving: Waymo Open Dataset](https://arxiv.org/abs/1912.04838) [[Notes](paper_notes/wod.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection](https://arxiv.org/abs/2006.04388) [[Notes](paper_notes/gfocal.md)] \u003ckbd\u003eNeurIPS 2020\u003c/kbd\u003e [classification as regression]\n- [A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection](https://arxiv.org/abs/2009.13592) \u003ckbd\u003eNeurIPS 
2020 spotlight\u003c/kbd\u003e\n- [Rethinking the Value of Labels for Improving Class-Imbalanced Learning](https://arxiv.org/abs/2006.07529) \u003ckbd\u003eNeurIPS 2020\u003c/kbd\u003e\n- [RepLoss: Repulsion Loss: Detecting Pedestrians in a Crowd](https://arxiv.org/abs/1711.07752) [[Notes](paper_notes/rep_loss.md)] \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [crowd detection, Megvii]\n- [Adaptive NMS: Refining Pedestrian Detection in a Crowd](https://arxiv.org/abs/1904.03629) [[Notes](paper_notes/adaptive_nms.md)] \u003ckbd\u003eCVPR 2019 oral\u003c/kbd\u003e [crowd detection, NMS]\n- [AggLoss: Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd](https://arxiv.org/abs/1807.08407) [[Notes](paper_notes/agg_loss.md)] \u003ckbd\u003eECCV 2018\u003c/kbd\u003e [crowd detection]\n- [CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163) [[Notes](paper_notes/crowd_det.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [crowd detection, Megvii, Earth mover's distance]\n- [R2-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing](https://arxiv.org/abs/2003.12729) [[Notes](paper_notes/r2_nms.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [Double Anchor R-CNN for Human Detection in a Crowd](https://arxiv.org/abs/1909.09998) [[Notes](paper_notes/double_anchor.md)] [head-body bundle]\n- [Review: AP vs MR](paper_notes/ap_mr.md)\n- [SKU110K: Precise Detection in Densely Packed Scenes](https://arxiv.org/abs/1904.00853) [[Notes](paper_notes/sku110k.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [crowd detection, no occlusion]\n- [GossipNet: Learning non-maximum suppression](https://arxiv.org/abs/1705.02950) \u003ckbd\u003eCVPR 2017\u003c/kbd\u003e \n- [TLL: Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation](https://arxiv.org/abs/1807.01438) \u003ckbd\u003eECCV 2018\u003c/kbd\u003e\n- [Learning Monocular 3D Vehicle Detection without 3D 
Bounding Box Labels](https://arxiv.org/abs/2010.03506) \u003ckbd\u003eGCPR 2020\u003c/kbd\u003e [mono3D, Daniel Cremers, TUM]\n- [CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection](https://arxiv.org/abs/2006.04080) [[Notes](paper_notes/cubifae_3d.md)] [mono3D, depth AE pretraining]\n- [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) [[Notes](paper_notes/deformable_detr.md)] \u003ckbd\u003eICLR 2021\u003c/kbd\u003e [Jifeng Dai, DETR]\n- [ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) [[Notes](paper_notes/vit.md)] \u003ckbd\u003eICLR 2021\u003c/kbd\u003e\n- [BYOL: Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733) [self-supervised]\n\n## 2020-09 (15)\n- [SDFLabel: Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors](https://arxiv.org/abs/1911.11288) [[Notes](paper_notes/sdflabel.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [TRI, differentiable rendering]\n- [DensePose: Dense Human Pose Estimation In The Wild](https://arxiv.org/abs/1802.00434) [[Notes](paper_notes/densepose.md)] \u003ckbd\u003eCVPR 2018 oral\u003c/kbd\u003e [FAIR]\n- [NOCS: Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation](https://arxiv.org/abs/1901.02970) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e\n- [monoDR: Monocular Differentiable Rendering for Self-Supervised 3D Object Detection](https://arxiv.org/abs/2009.14524) [[Notes](paper_notes/monodr.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [TRI, mono3D]\n- [Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D](https://arxiv.org/abs/2008.05711) [[Notes](paper_notes/lift_splat_shoot.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [BEV-Net, Utoronto, Sanja Fidler]\n- [Implicit Latent Variable Model for 
Scene-Consistent Motion Forecasting](https://arxiv.org/abs/2007.12036) \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [Uber ATG, Rachel Urtasun]\n- [FISHING Net: Future Inference of Semantic Heatmaps In Grids](https://arxiv.org/abs/2006.09917) [[Notes](paper_notes/fishing_net.md)] \u003ckbd\u003eCVPRW 2020\u003c/kbd\u003e [BEV-Net, Mapping, Zoox]\n- [VPN: Cross-view Semantic Segmentation for Sensing Surroundings](https://arxiv.org/abs/1906.03560) [[Notes](paper_notes/vpn.md)] \u003ckbd\u003eRAL 2020\u003c/kbd\u003e [Bolei Zhou, BEV-Net]\n- [VED: Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks](https://arxiv.org/abs/1804.02176) [[Notes](paper_notes/ved.md)] \u003ckbd\u003eICRA 2019\u003c/kbd\u003e [BEV-Net]\n- [Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View](https://arxiv.org/abs/2005.04078) [[Notes](paper_notes/cam2bev.md)] \u003ckbd\u003eITSC 2020\u003c/kbd\u003e [BEV-Net] \n- [Learning to Look around Objects for Top-View Representations of Outdoor Scenes](https://arxiv.org/abs/1803.10870) [[Notes](paper_notes/learning_to_look_around_objects.md)] \u003ckbd\u003eECCV 2018\u003c/kbd\u003e [BEV-Net, UCSD, Manmohan Chandraker]\n- [A Parametric Top-View Representation of Complex Road Scenes](https://arxiv.org/abs/1812.06152) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [BEV-Net, UCSD, Manmohan Chandraker]\n- [FTM: Understanding Road Layout from Videos as a Whole](https://arxiv.org/abs/2007.00822) \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [BEV-Net, UCSD, Manmohan Chandraker]\n- [KM3D-Net: Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training](https://arxiv.org/abs/2009.00764) [[Notes](paper_notes/km3d_net.md)] \u003ckbd\u003eRAL 2021\u003c/kbd\u003e [RTM3D, Peixuan Li]\n- [InstanceMotSeg: Real-time Instance Motion Segmentation for Autonomous 
Driving](https://arxiv.org/abs/2008.07008) [[Notes](paper_notes/instance_mot_seg.md)] \u003ckbd\u003eIROS 2020\u003c/kbd\u003e [motion segmentation]\n- [MPV-Nets: Monocular Plan View Networks for Autonomous Driving](https://arxiv.org/abs/1905.06937) [[Notes](paper_notes/mpv_nets.md)] \u003ckbd\u003eIROS 2019\u003c/kbd\u003e [BEV-Net]\n- [Class-Balanced Loss Based on Effective Number of Samples](https://arxiv.org/abs/1901.05555) [[Notes](paper_notes/class_balanced_loss.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [Focal loss authors]\n- [Geometric Pretraining for Monocular Depth Estimation](http://lewissoft.com/pdf/ICRA2020/0035.pdf) [[Notes](paper_notes/geometric_pretraining.md)] \u003ckbd\u003eICRA 2020\u003c/kbd\u003e\n- [Robust Traffic Light and Arrow Detection Using Digital Map with Spatial Prior Information for Automated Driving](https://www.mdpi.com/1424-8220/20/4/1181) [[Notes](paper_notes/tfl_robust_japan.md)] \u003ckbd\u003eSensors 2020\u003c/kbd\u003e [traffic light, 金沢]\n\n\n## 2020-08 (26)\n- [Feature-metric Loss for Self-supervised Learning of Depth and Egomotion](https://arxiv.org/abs/2007.10603) [[Notes](paper_notes/feature_metric.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [feature-metric, local minima, monodepth]\n- [Depth-VO-Feat: Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction](https://arxiv.org/abs/1803.03893) \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [feature-metric, monodepth]\n- [MonoResMatch: Learning monocular depth estimation infusing traditional stereo knowledge](https://arxiv.org/abs/1904.04144) [[Notes](paper_notes/monoresmatch.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [monodepth, local minima, cheap stereo GT]\n- [SGDepth: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance](https://arxiv.org/abs/2007.06936) [[Notes](paper_notes/sgdepth.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [Moving objects]\n- [Every Pixel Counts: 
Unsupervised Geometry Learning with Holistic 3D Motion Understanding](https://arxiv.org/abs/1806.10556) \u003ckbd\u003eECCV 2018\u003c/kbd\u003e [dynamic objects, rigid and dynamic motion]\n- [Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding](https://arxiv.org/abs/1810.06125) \u003ckbd\u003eTPAMI 2018\u003c/kbd\u003e\n- [CC: Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation](https://arxiv.org/abs/1805.09806) [[Notes](paper_notes/cc.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e\n- [ObjMotionNet: Self-supervised Object Motion and Depth Estimation from Video](https://arxiv.org/abs/1912.04250) [[Notes](paper_notes/obj_motion_net.md)] \u003ckbd\u003eCVPRW 2020\u003c/kbd\u003e [object motion prediction, velocity prediction]\n- [Instance-wise Depth and Motion Learning from Monocular Videos](https://arxiv.org/abs/1912.09351)\n- [Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation](https://arxiv.org/abs/2006.04371)\n- [Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues](https://arxiv.org/abs/2006.09876)\n- [DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency](https://arxiv.org/abs/1809.01649) \u003ckbd\u003eECCV 2018\u003c/kbd\u003e\n- [LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments](https://arxiv.org/abs/1807.05696) [mapping]\n- [Road-SLAM: Road Marking based SLAM with Lane-level Accuracy](https://www.naverlabs.com/img/autonomousDriving/intelligence/dissertation/Road-SLAM_Road%20Marking%20based%20SLAM%20with%20Lane-level%20Accuracy.pdf) [[Notes](paper_notes/road_slam.md)] [HD mapping]\n- [AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot](https://arxiv.org/abs/2007.01813) [[Notes](paper_notes/avp_slam.md)] \u003ckbd\u003eIROS 2020\u003c/kbd\u003e [Huawei, HD mapping, Tong Qin, VINS 
author, autonomous valet parking]\n- [AVP-SLAM-Late-Fusion: Mapping and Localization using Semantic Road Marking with Centimeter-level Accuracy in Indoor Parking Lots](https://ieeexplore.ieee.org/abstract/document/8917529) [[Notes](paper_notes/avp_slam_late_fusion.md)] \u003ckbd\u003eITSC 2019\u003c/kbd\u003e\n- [Lane markings-based relocalization on highway](https://ieeexplore.ieee.org/abstract/document/8917254) \u003ckbd\u003eITSC 2019\u003c/kbd\u003e\n- [DeepRoadMapper: Extracting Road Topology from Aerial Images](https://openaccess.thecvf.com/content_ICCV_2017/papers/Mattyus_DeepRoadMapper_Extracting_Road_ICCV_2017_paper.pdf) [[Notes](paper_notes/deep_road_mapper.md)] \u003ckbd\u003eICCV 2017\u003c/kbd\u003e [Uber ATG, NOT HD maps]\n- [RoadTracer: Automatic Extraction of Road Networks from Aerial Images](https://openaccess.thecvf.com/content_cvpr_2018/papers/Bastani_RoadTracer_Automatic_Extraction_CVPR_2018_paper.pdf) \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [NOT HD maps]\n- [PolyMapper: Topological Map Extraction From Overhead Images](https://arxiv.org/abs/1812.01497) [[Notes](paper_notes/polymapper.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [mapping, polygon, NOT HD maps]\n- [HRAN: Hierarchical Recurrent Attention Networks for Structured Online Maps](https://openaccess.thecvf.com/content_cvpr_2018/papers/Homayounfar_Hierarchical_Recurrent_Attention_CVPR_2018_paper.pdf) [[Notes](paper_notes/hran.md)] \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [HD mapping, highway, polyline loss, Chamfer distance]\n- [Deep Structured Crosswalk: End-to-End Deep Structured Models for Drawing Crosswalks](https://openaccess.thecvf.com/content_ECCV_2018/papers/Justin_Liang_End-to-End_Deep_Structured_ECCV_2018_paper.pdf) [[Notes](paper_notes/deep_structured_crosswalk.md)] \u003ckbd\u003eECCV 2018\u003c/kbd\u003e\n- [DeepBoundaryExtractor: Convolutional Recurrent Network for Road Boundary 
Extraction](http://openaccess.thecvf.com/content_CVPR_2019/html/Liang_Convolutional_Recurrent_Network_for_Road_Boundary_Extraction_CVPR_2019_paper.html) [[Notes](paper_notes/deep_boundary_extractor.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [HD mapping, boundary, polyline loss]\n- [DAGMapper: Learning to Map by Discovering Lane Topology](http://openaccess.thecvf.com/content_ICCV_2019/papers/Homayounfar_DAGMapper_Learning_to_Map_by_Discovering_Lane_Topology_ICCV_2019_paper.pdf) [[Notes](paper_notes/dagmapper.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [HD mapping, highway, forks and merges, polyline loss]\n- [Sparse-HD-Maps: Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization](https://arxiv.org/abs/1908.03274) [[Notes](paper_notes/sparse_hd_maps.md)] \u003ckbd\u003eIROS 2019 oral\u003c/kbd\u003e [Uber ATG, metadata, mapping, localization]\n- [Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural Networks](https://arxiv.org/abs/1803.06904) \u003ckbd\u003eIEEE TGRS 2018\u003c/kbd\u003e\n- [Monocular Localization with Vector HD Map (MLVHM): A Low-Cost Method for Commercial IVs](https://www.mdpi.com/1424-8220/20/7/1870/htm) \u003ckbd\u003eSensors 2020\u003c/kbd\u003e [Tsinghua, 3D HD maps]\n- [PatchNet: Rethinking Pseudo-LiDAR Representation](https://arxiv.org/abs/2008.04582) [[Notes](paper_notes/patchnet.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [SenseTime, Wanli Ouyang]\n- [D4LCN: Learning Depth-Guided Convolutions for Monocular 3D Object Detection](https://arxiv.org/abs/1912.04799) [[Notes](paper_notes/d4lcn.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [mono3D]\n- [MfS: Learning Stereo from Single Images](https://arxiv.org/abs/2008.01484) [[Notes](paper_notes/mfs.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [mono for stereo, learn stereo matching with mono]\n- [BorderDet: Border Feature for Dense Object Detection](https://arxiv.org/abs/2007.11056) 
\u003ckbd\u003eECCV 2020 oral\u003c/kbd\u003e [Megvii]\n- [Scale-Aware Trident Networks for Object Detection](https://arxiv.org/abs/1901.01892) \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [different heads for different scales]\n- [Learning Depth from Monocular Videos using Direct Methods](https://arxiv.org/abs/1712.00175)\n- [Vid2Depth: Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints](https://arxiv.org/abs/1802.05522) \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [Google]\n- [NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections](https://arxiv.org/abs/2008.02268)\n- [Supervising the new with the old: learning SFM from SFM](http://openaccess.thecvf.com/content_ECCV_2018/papers/Maria_Klodt_Supervising_the_new_ECCV_2018_paper.pdf) [[Notes](paper_notes/learn_sfm_from_sfm.md)] \u003ckbd\u003eECCV 2018\u003c/kbd\u003e\n- [Neural RGB-\u003eD Sensing: Depth and Uncertainty from a Video Camera](https://arxiv.org/abs/1901.02571) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [multi-frame monodepth]\n- [Don't Forget The Past: Recurrent Depth Estimation from Monocular Video](https://arxiv.org/abs/2001.02613) [multi-frame monodepth, RNN]\n- [Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth](https://arxiv.org/abs/1904.07087) [multi-frame monodepth, RNN]\n- [Exploiting temporal consistency for real-time video depth estimation](https://arxiv.org/abs/1908.03706) \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [multi-frame monodepth, RNN, indoor]\n- [SfM-Net: Learning of Structure and Motion from Video](https://arxiv.org/abs/1704.07804) [dynamic object, SfM]\n- [MB-Net: MergeBoxes for Real-Time 3D Vehicles Detection](https://ieeexplore.ieee.org/document/8500395) [[Notes](paper_notes/mb_net.md)] \u003ckbd\u003eIV 2018\u003c/kbd\u003e [mono3D, Daimler]\n- [BS3D: Beyond Bounding Boxes: Using Bounding Shapes for Real-Time 3D Vehicle Detection from Monocular RGB 
Images](https://ieeexplore.ieee.org/abstract/document/8814036/) [[Notes](paper_notes/bs3d.md)] \u003ckbd\u003eIV 2019\u003c/kbd\u003e [mono3D, Daimler]\n- [3D-GCK: Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometrically Constrained Keypoints in Real-Time](https://arxiv.org/abs/2006.13084) [[Notes](paper_notes/3d_gck.md)] \u003ckbd\u003eIV 2020\u003c/kbd\u003e [mono3D, Daimler]\n- [UR3D: Distance-Normalized Unified Representation for Monocular 3D Object Detection](https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/6559_ECCV_2020_paper.php) [[Notes](paper_notes/ur3d.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [mono3D]\n- [DA-3Det: Monocular 3D Object Detection via Feature Domain Adaptation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123540018.pdf) [[Notes](paper_notes/da_3det.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [mono3D]\n- [RAR-Net: Reinforced Axial Refinement Network for Monocular 3D Object Detection](https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/2822_ECCV_2020_paper.php) [[Notes](paper_notes/rarnet.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [mono3D]\n\n\n## 2020-07 (25)\n- [CenterTrack: Tracking Objects as Points](https://arxiv.org/abs/2004.01177) [[Notes](paper_notes/centertrack.md)] \u003ckbd\u003eECCV 2020 spotlight\u003c/kbd\u003e [camera based 3D MOD, MOT SOTA, CenterNet, video based object detection, Philipp Krähenbühl]\n- [CenterPoint: Center-based 3D Object Detection and Tracking](https://arxiv.org/abs/2006.11275) [[Notes](paper_notes/centerpoint.md)] \u003ckbd\u003eCVPR 2021\u003c/kbd\u003e [lidar based 3D MOD, CenterNet]\n- [Tracktor: Tracking without bells and whistles](https://arxiv.org/abs/1903.05625) [[Notes](paper_notes/tracktor.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [Tracktor/Tracktor++, Laura Leal-Taixe@TUM]\n- [FairMOT: A Simple Baseline for Multi-Object Tracking](https://arxiv.org/abs/2004.01888) [[Notes](paper_notes/fairmot.md)]\n- [DeepMOT: A Differentiable Framework 
for Training Multiple Object Trackers](https://arxiv.org/abs/1906.06618) [[Notes](paper_notes/deepmot.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [trainable Hungarian, Laura Leal-Taixe@TUM]\n- [MPNTracker: Learning a Neural Solver for Multiple Object Tracking](https://arxiv.org/abs/1912.07515) \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [trainable Hungarian, Laura Leal-Taixe@TUM]\n- [nuScenes: A multimodal dataset for autonomous driving](https://arxiv.org/abs/1903.11027) [[Notes](paper_notes/nuscenes.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [dataset, point cloud, radar]\n- [CBGS: Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection](https://arxiv.org/abs/1908.09492) [[Notes](paper_notes/cbgs.md)] \u003ckbd\u003eCVPRW 2019\u003c/kbd\u003e [Megvii, lidar, WAD challenge winner]\n- [AFDet: Anchor Free One Stage 3D Object Detection](https://arxiv.org/abs/2006.12671) and [Competition solution](https://arxiv.org/pdf/2006.15505.pdf) [[Notes](paper_notes/afdet.md)] \u003ckbd\u003eCVPRW 2020\u003c/kbd\u003e [Horizon robotics, lidar, Waymo challenge winner] \n- Review of MOT and SOT [[Notes](paper_notes/mot_and_sot.md)]\n- [CrowdHuman: A Benchmark for Detecting Human in a Crowd](https://arxiv.org/abs/1805.00123) [[Notes](paper_notes/crowdhuman.md)] [Megvii, pedestrian, dataset]\n- [WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild](https://arxiv.org/abs/1909.12118) [[Notes](paper_notes/widerperson.md)] \u003ckbd\u003eTMM 2019\u003c/kbd\u003e [dataset, pedestrian]\n- [Tsinghua-Daimler Cyclists: A New Benchmark for Vision-Based Cyclist Detection](http://www.gavrila.net/Publications/iv16_cyclist_benchmark.pdf) [[Notes](paper_notes/tsinghua_daimler_cyclist.md)] \u003ckbd\u003eIV 2016\u003c/kbd\u003e [dataset, cyclist detection]\n- [Specialized Cyclist Detection Dataset: Challenging Real-World Computer Vision Dataset for Cyclist Detection Using a Monocular RGB 
Camera](https://drive.google.com/drive/u/0/folders/1inawrX9NVcchDQZepnBeJY4i9aAI5mg9) [[Notes](paper_notes/specialized_cyclists.md)] \u003ckbd\u003eIV 2019\u003c/kbd\u003e [Extension to KITTI]\n- [PointTrack: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01550) [[Notes](paper_notes/pointtrack.md)] \u003ckbd\u003eECCV 2020 oral\u003c/kbd\u003e [MOTS]\n- [PointTrack++ for Effective Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01549) [[Notes](paper_notes/pointtrack++.md)] \u003ckbd\u003eCVPR 2020 workshop\u003c/kbd\u003e [CVPR2020 MOTS Challenge Winner. PointTrack++ ranks first on KITTI MOTS]\n- [SpatialEmbedding: Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth](https://arxiv.org/abs/1906.11109) [[Notes](paper_notes/spatial_embedding.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [one-stage, instance segmentation]\n- [BA-Net: Dense Bundle Adjustment Networks](https://arxiv.org/abs/1806.04807) [[Notes](paper_notes/banet.md)] \u003ckbd\u003eICLR 2019\u003c/kbd\u003e [Bundle adjustment, multi-frame monodepth, feature-metric]\n- [DeepSFM: Structure From Motion Via Deep Bundle Adjustment](https://arxiv.org/abs/1912.09697) \u003ckbd\u003eECCV 2020 oral\u003c/kbd\u003e [multi-frame monodepth, indoor scene]\n- [CVD: Consistent Video Depth Estimation](https://arxiv.org/abs/2004.15021) [[Notes](paper_notes/cvd.md)] \u003ckbd\u003eSIGGRAPH 2020\u003c/kbd\u003e [multi-frame monodepth, online finetune]\n- [DeepV2D: Video to Depth with Differentiable Structure from Motion](https://arxiv.org/abs/1812.04605) [[Notes](paper_notes/deepv2d.md)] \u003ckbd\u003eICLR 2020\u003c/kbd\u003e [multi-frame monodepth, Jia Deng]\n- [GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose](https://arxiv.org/abs/1803.02276) [[Notes](paper_notes/geonet.md)] \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [residual optical flow, monodepth, rigid and dynamic 
motion]\n- [GLNet: Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera](https://arxiv.org/abs/1907.05820) [[Notes](paper_notes/glnet.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [online finetune, rigid and dynamic motion]\n- [Depth Hints: Self-Supervised Monocular Depth Hints](https://arxiv.org/abs/1909.09051) [[Notes](paper_notes/depth_hints.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [monodepth, local minima, cheap stereo GT]\n- [MonoUncertainty: On the uncertainty of self-supervised monocular depth estimation](https://arxiv.org/abs/2005.06209) [[Notes](paper_notes/mono_uncertainty.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [depth uncertainty]\n- [Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment](https://arxiv.org/abs/1909.13163) [[Notes](paper_notes/ba_sfm_learner.md)] [Bundle adjustment, xmotors.ai, multi-frame monodepth]\n- [Kinematic 3D Object Detection in Monocular Video](https://arxiv.org/abs/2007.09548) [[Notes](paper_notes/kinematic_mono3d.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [multi-frame mono3D, Xiaoming Liu]\n- [VelocityNet: Camera-based vehicle velocity estimation from monocular video](https://arxiv.org/abs/1802.07094) [[Notes](paper_notes/velocity_net.md)] \u003ckbd\u003eCVPR 2017 workshop\u003c/kbd\u003e [monocular velocity estimation, CVPR 2017 challenge winner]\n- [Vehicle Centric VelocityNet: End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera](https://arxiv.org/abs/2006.04082) [[Notes](paper_notes/vehicle_centric_velocity_net.md)] [monocular velocity estimation, monocular distance, SOTA]\n\n## 2020-06 (20)\n- [LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain](http://personal.stevens.edu/~benglot/Shan_Englot_IROS_2018_Preprint.pdf) [[Notes](paper_notes/lego_loam.md)] \u003ckbd\u003eIROS 2018\u003c/kbd\u003e [lidar, mapping]\n- [PIE: A 
Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction](http://openaccess.thecvf.com/content_ICCV_2019/papers/Rasouli_PIE_A_Large-Scale_Dataset_and_Models_for_Pedestrian_Intention_Estimation_ICCV_2019_paper.pdf) [[Notes](paper_notes/pie.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e\n- [JAAD: Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian\nCrosswalk Behavior](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w3/Rasouli_Are_They_Going_ICCV_2017_paper.pdf) \u003ckbd\u003eICCV 2017\u003c/kbd\u003e\n- [Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs](https://bmvc2019.org/wp-content/uploads/papers/0283-paper.pdf) \u003ckbd\u003eBMVC 2019\u003c/kbd\u003e\n- [Is the Pedestrian going to Cross? Answering by 2D Pose Estimation](https://arxiv.org/abs/1807.10580) \u003ckbd\u003eIV 2018\u003c/kbd\u003e\n- [Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation](https://arxiv.org/abs/1910.03858) \u003ckbd\u003eITSC 2019\u003c/kbd\u003e [skeleton, pedestrian, cyclist intention]\n- [Attentive Single-Tasking of Multiple Tasks](https://arxiv.org/abs/1904.08918) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e\n- [DETR: End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) [[Notes](paper_notes/detr.md)] \u003ckbd\u003eECCV 2020 oral\u003c/kbd\u003e [FAIR]\n- [Transformer: Attention Is All You Need](https://arxiv.org/abs/1706.03762) [[Notes](paper_notes/transformer.md)] \u003ckbd\u003eNIPS 2017\u003c/kbd\u003e\n- [SpeedNet: Learning the Speediness in Videos](https://arxiv.org/abs/2004.06130) [[Notes](paper_notes/speednet.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e\n- [MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships](https://arxiv.org/abs/2003.00504) [[Notes](paper_notes/monopair.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [Mono3D, pairwise relationship]\n- [SMOKE: Single-Stage Monocular 3D Object Detection 
via Keypoint Estimation](https://arxiv.org/abs/2002.10111) [[Notes](paper_notes/smoke.md)] \u003ckbd\u003eCVPRW 2020\u003c/kbd\u003e [Mono3D, Zongmu]\n- [Vehicle Re-ID for Surround-view Camera System](https://drive.google.com/file/d/1e6y8wtHAricaEHS9CpasSGOx0aAxCGib/view) [[Notes](paper_notes/reid_surround_fisheye.md)] \u003ckbd\u003eCVPRW 2020\u003c/kbd\u003e [tireline, vehicle ReID, Zongmu]\n- [End-to-End Lane Marker Detection via Row-wise Classification](https://arxiv.org/abs/2005.08630) [[Notes](paper_notes/e2e_lmd.md)] [Qualcomm Korea, LLD as cls]\n- [Reliable multilane detection and classification by utilizing CNN as a regression network](http://openaccess.thecvf.com/content_ECCVW_2018/papers/11133/Chougule_Reliable_multilane_detection_and_classification_by_utilizing_CNN_as_a_ECCVW_2018_paper.pdf) \u003ckbd\u003eECCV 2018\u003c/kbd\u003e [LLD as reg]\n- [SUPER: A Novel Lane Detection System](https://arxiv.org/abs/2005.07277) [[Notes](paper_notes/super.md)]\n- [Learning Lightweight Lane Detection CNNs by Self Attention Distillation](https://arxiv.org/abs/1908.00821) \u003ckbd\u003eICCV 2019\u003c/kbd\u003e\n- [StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation](http://www.bmva.org/bmvc/2015/papers/paper109/paper109.pdf) \u003ckbd\u003eBMVC 2015\u003c/kbd\u003e\n- [StixelNetV2: Real-time category-based and general obstacle detection for autonomous driving](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w3/Garnett_Real-Time_Category-Based_and_ICCV_2017_paper.pdf) [[Notes](paper_notes/stixelnetv2.md)] \u003ckbd\u003eICCV 2017\u003c/kbd\u003e [DS]\n- [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://arxiv.org/abs/1609.05158) [[Notes](paper_notes/subpixel_conv.md)] \u003ckbd\u003eCVPR 2016\u003c/kbd\u003e [channel-to-pixel]\n- [Car Pose in Context: Accurate Pose Estimation with Ground Plane Constraints](https://arxiv.org/abs/1912.04363) [mono3D]\n- 
[Self-Mono-SF: Self-Supervised Monocular Scene Flow Estimation](https://arxiv.org/abs/2004.04143) [[Notes](paper_notes/self_mono_sf.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [scene-flow, Stereo input]\n- [MEBOW: Monocular Estimation of Body Orientation In the Wild](https://arxiv.org/abs/2011.13688) [[Notes](paper_notes/mebow.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [VG-NMS: Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes](https://arxiv.org/abs/2006.08547) [[Notes](paper_notes/vg_nms.md)] \u003ckbd\u003eNeurIPS 2019 workshop\u003c/kbd\u003e [Crowded scene, NMS, Daimler]\n- [WYSIWYG: What You See is What You Get: Exploiting Visibility for 3D Object Detection](https://arxiv.org/abs/1912.04986) [[Notes](paper_notes/wysiwyg.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [occupancy grid]\n- [Real-Time Panoptic Segmentation From Dense Detections](https://arxiv.org/abs/1912.01202) [[Notes](paper_notes/realtime_panoptic.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [bbox + semantic segmentation = panoptic segmentation, Toyota]\n- [Human-Centric Efficiency Improvements in Image Annotation for Autonomous Driving](https://drive.google.com/file/d/1DY95vfWBLKOOZZyq8gLDd0heZ6aBSdji/view) [[Notes](paper_notes/human_centric_annotation.md)] \u003ckbd\u003eCVPRW 2020\u003c/kbd\u003e [efficient annotation]\n- [SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving](https://arxiv.org/abs/2005.03844) [[Notes](paper_notes/surfel_gan.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [Waymo, auto data generation, surfel]\n- [LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World](https://arxiv.org/abs/2006.09348) [[Notes](paper_notes/lidarsim.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [Uber ATG, auto data generation, surfel]\n- [SuMa++: Efficient LiDAR-based Semantic SLAM](http://www.ipb.uni-bonn.de/wp-content/papercite-data/pdf/chen2019iros.pdf) \u003ckbd\u003eIROS 
2019\u003c/kbd\u003e [semantic segmentation, lidar, SLAM]\n- [PON/PyrOccNet: Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks](https://arxiv.org/abs/2003.13402) [[Notes](paper_notes/pyroccnet.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [BEV-Net, OFT]\n- [MonoLayout: Amodal scene layout from a single image](https://arxiv.org/abs/2002.08394) [[Notes](paper_notes/monolayout.md)] \u003ckbd\u003eWACV 2020\u003c/kbd\u003e [BEV-Net]\n- [BEV-Seg: Bird’s Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud](https://arxiv.org/abs/2006.11436) [[Notes](paper_notes/bev_seg.md)] \u003ckbd\u003eCVPR 2020 workshop\u003c/kbd\u003e [BEV-Net, Mapping]\n- [A Geometric Approach to Obtain a Bird's Eye View from an Image](https://arxiv.org/abs/1905.02231) \u003ckbd\u003eICCVW 2019\u003c/kbd\u003e [mapping, geometry, Andrew Zisserman]\n- [FrozenDepth: Learning the Depths of Moving People by Watching Frozen People](https://arxiv.org/abs/1904.11111) [[Notes](paper_notes/frozen_depth.md)] \u003ckbd\u003eCVPR 2019 oral\u003c/kbd\u003e\n- [ORB-SLAM: a Versatile and Accurate Monocular SLAM System](https://arxiv.org/abs/1502.00956) \u003ckbd\u003eTRO 2015\u003c/kbd\u003e\n- [ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras](https://arxiv.org/abs/1610.06475) \u003ckbd\u003eTRO 2016\u003c/kbd\u003e\n- [CubeSLAM: Monocular 3D Object SLAM](https://arxiv.org/abs/1806.00557) [[Notes](paper_notes/cube_slam.md)] \u003ckbd\u003eTRO 2019\u003c/kbd\u003e [dynamic SLAM, orb slam + mono3D]\n- [ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings](https://arxiv.org/abs/2003.12980) [[Notes](paper_notes/cluster_vo.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [general dynamic SLAM]\n- [S3DOT: Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving](https://arxiv.org/abs/1807.02062) [[Notes](paper_notes/s3dot.md)] \u003ckbd\u003eECCV 
2018\u003c/kbd\u003e [Peiliang Li]\n- [Multi-object Monocular SLAM for Dynamic Environments](https://arxiv.org/abs/2002.03528) [[Notes](paper_notes/multi_object_mono_slam.md)] \u003ckbd\u003eIV 2020\u003c/kbd\u003e [monolayout authors]\n- [PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume](https://arxiv.org/abs/1709.02371) [[Notes](paper_notes/pwc_net.md)] \u003ckbd\u003eCVPR 2018 oral\u003c/kbd\u003e [Optical flow]\n- [LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation](https://arxiv.org/abs/1805.07036) \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [Optical flow]\n- [FlowNet: Learning Optical Flow With Convolutional Networks](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Dosovitskiy_FlowNet_Learning_Optical_ICCV_2015_paper.pdf) \u003ckbd\u003eICCV 2015\u003c/kbd\u003e [Optical flow]\n- [FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks](https://arxiv.org/abs/1612.01925) \u003ckbd\u003eCVPR 2017\u003c/kbd\u003e [Optical flow]\n- [ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network](https://arxiv.org/abs/1811.11431) \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [semantic segmentation, lightweight]\n- [Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes](https://arxiv.org/abs/1908.06316) \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [depth uncertainty]\n\n  \n## 2020-05 (19)\n- [Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems](https://arxiv.org/abs/1809.07408) [[Notes](paper_notes/hevi.md)] [Honda] \u003ckbd\u003eICRA 2019\u003c/kbd\u003e\n- [PackNet: 3D Packing for Self-Supervised Monocular Depth Estimation](https://arxiv.org/abs/1905.02693) [[Notes](paper_notes/packnet.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [Scale aware depth]\n- [PackNet-SG: Semantically-Guided Representation Learning for Self-Supervised Monocular 
Depth](https://arxiv.org/abs/2002.12319) [[Notes](paper_notes/packnet_sg.md)] \u003ckbd\u003eICLR 2020\u003c/kbd\u003e [TRI, infinite-depth problem]\n- [TrianFlow: Towards Better Generalization: Joint Depth-Pose Learning without PoseNet](https://arxiv.org/abs/2004.01314) [[Notes](paper_notes/trianflow.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e [Scale aware]\n- [Understanding the Limitations of CNN-based Absolute Camera Pose Regression](https://arxiv.org/abs/1903.07504) [[Notes](paper_notes/understanding_apr.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [Drawbacks of PoseNet, MapNet, Laura Leal-Taixe@TUM]\n- [To Learn or Not to Learn: Visual Localization from Essential Matrices](https://arxiv.org/abs/1908.01293) [[Notes](paper_notes/to_learn_or_not.md)] \u003ckbd\u003eICRA 2020\u003c/kbd\u003e [SIFT + 5 pt solver \u003e\u003e others for VO, Laura Leal-Taixe@TUM]\n- [DF-VO: Visual Odometry Revisited: What Should Be Learnt?](https://arxiv.org/abs/1909.09803) [[Notes](paper_notes/df_vo.md)] \u003ckbd\u003eICRA 2020\u003c/kbd\u003e [Depth and Flow for accurate VO]\n- [D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry](https://arxiv.org/abs/2003.01060) [[Notes](paper_notes/d3vo.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [Daniel Cremers, TUM, depth uncertainty]\n- [Network Slimming: Learning Efficient Convolutional Networks through Network Slimming](https://arxiv.org/abs/1708.06519) [[Notes](paper_notes/network_slimming.md)] \u003ckbd\u003eICCV 2017\u003c/kbd\u003e\n- [BatchNorm Pruning: Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers](https://arxiv.org/abs/1802.00124) [[Notes](paper_notes/batchnorm_pruning.md)] \u003ckbd\u003eICLR 2018\u003c/kbd\u003e\n- [Direct Sparse Odometry](https://arxiv.org/abs/1607.02565) \u003ckbd\u003ePAMI 2018\u003c/kbd\u003e\n- [Train in Germany, Test in The USA: Making 3D Object Detectors Generalize](https://arxiv.org/abs/2005.08139) 
[[Notes](paper_notes/train_in_germany.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [PseudoLidarV3: End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection](https://arxiv.org/abs/2004.03080) [[Notes](paper_notes/pseudo_lidar_v3.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection](https://arxiv.org/abs/1912.02424) [[Notes](paper_notes/atss.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e\n- [Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression](https://arxiv.org/abs/1911.08287) \u003ckbd\u003eAAAI 2020\u003c/kbd\u003e\n- [Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation](https://arxiv.org/abs/2005.03572) [Journal version]\n- [YOLOv4: Optimal Speed and Accuracy of Object Detection](https://arxiv.org/abs/2004.10934) [[Notes](paper_notes/yolov4.md)]\n- [CBN: Cross-Iteration Batch Normalization](https://arxiv.org/abs/2002.05712) [[Notes](paper_notes/cbn.md)]\n- [Stitcher: Feedback-driven Data Provider for Object Detection](https://arxiv.org/abs/2004.12432) [[Notes](paper_notes/stitcher.md)]\n- [SKNet: Selective Kernel Networks](https://arxiv.org/abs/1903.06586) [[Notes](paper_notes/sknet.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e\n- [CBAM: Convolutional Block Attention Module](https://arxiv.org/abs/1807.06521) [[Notes](paper_notes/cbam.md)] \u003ckbd\u003eECCV 2018\u003c/kbd\u003e \n- [ResNeSt: Split-Attention Networks](https://arxiv.org/abs/2004.08955) [[Notes](paper_notes/resnest.md)]\n\n## 2020-04 (14)\n- [ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst](https://arxiv.org/pdf/1812.03079.pdf) [[Notes](paper_notes/chauffeurnet.md)] \u003ckbd\u003eRSS 2019\u003c/kbd\u003e [Waymo]\n- [IntentNet: Learning to Predict Intention from Raw Sensor Data](http://www.cs.toronto.edu/~wenjie/papers/intentnet_corl18.pdf) [[Notes](paper_notes/intentnet.md)] 
\u003ckbd\u003eCoRL 2018\u003c/kbd\u003e [Uber ATG, perception and prediction, Lidar+Map]\n- [RoR: Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions](https://arxiv.org/abs/1906.08945) [[Notes](paper_notes/ror.md)] \u003ckbd\u003eCVPR 2019\u003c/kbd\u003e [Zoox]\n- [MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction](https://arxiv.org/abs/1910.05449) [[Notes](paper_notes/multipath.md)] \u003ckbd\u003eCoRL 2019\u003c/kbd\u003e [Waymo, authors from RoR and ChauffeurNet]\n- [NMP: End-to-end Interpretable Neural Motion Planner](http://www.cs.toronto.edu/~wenjie/papers/cvpr19/nmp.pdf) [[Notes](paper_notes/nmp.md)] \u003ckbd\u003eCVPR 2019 oral\u003c/kbd\u003e [Uber ATG]\n- [Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks](https://arxiv.org/abs/1809.10732) [[Notes](paper_notes/multipath_uber.md)] \u003ckbd\u003eICRA 2019\u003c/kbd\u003e [Henggang Cui, Multimodal, Uber ATG Pittsburgh]\n- [Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving](https://arxiv.org/abs/1808.05819) \u003ckbd\u003eWACV 2020\u003c/kbd\u003e [Uber ATG Pittsburgh] \n- [TensorMask: A Foundation for Dense Object Segmentation](https://arxiv.org/abs/1903.12174) [[Notes](paper_notes/tensormask.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [single-stage instance seg]\n- [BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation](https://arxiv.org/abs/2001.00309) [[Notes](paper_notes/blendmask.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e\n- [Mask Encoding for Single Shot Instance Segmentation](https://arxiv.org/abs/2003.11712) [[Notes](paper_notes/meinst.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e [single-stage instance seg, Chunhua Shen]\n- [PolarMask: Single Shot Instance Segmentation with Polar Representation](https://arxiv.org/abs/1909.13226) [[Notes](paper_notes/polarmask.md)] \u003ckbd\u003eCVPR 2020 oral\u003c/kbd\u003e 
[single-stage instance seg]\n- [SOLO: Segmenting Objects by Locations](https://arxiv.org/abs/1912.04488) [[Notes](paper_notes/solo.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [single-stage instance seg, Chunhua Shen]\n- [SOLOv2: Dynamic, Faster and Stronger](https://arxiv.org/abs/2003.10152) [[Notes](paper_notes/solov2.md)] [single-stage instance seg, Chunhua Shen]\n- [CondInst: Conditional Convolutions for Instance Segmentation](https://arxiv.org/abs/2003.05664) [[Notes](paper_notes/condinst.md)] \u003ckbd\u003eECCV 2020 oral\u003c/kbd\u003e [single-stage instance seg, Chunhua Shen]\n- [CenterMask: Single Shot Instance Segmentation With Point Representation](https://arxiv.org/abs/2004.04446) [[Notes](paper_notes/centermask.md)]\u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n\n\n## 2020-03 (15)\n- [VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition](https://arxiv.org/abs/1710.06288) [[Notes](paper_notes/vpgnet.md)] \u003ckbd\u003eICCV 2017\u003c/kbd\u003e\n- [Which Tasks Should Be Learned Together in Multi-task Learning?](https://arxiv.org/abs/1905.07553) [[Notes](paper_notes/task_grouping.md)] [Stanford, MTL] \u003ckbd\u003eICML 2020\u003c/kbd\u003e\n- [MGDA: Multi-Task Learning as Multi-Objective Optimization](https://arxiv.org/abs/1810.04650) \u003ckbd\u003eNeurIPS 2018\u003c/kbd\u003e\n- [Taskonomy: Disentangling Task Transfer Learning](https://arxiv.org/abs/1804.08328) [[Notes](paper_notes/taskonomy.md)] \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e\n- [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883) [[Notes](paper_notes/rethinking_pretraining.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [Kaiming He]\n- [UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor](https://arxiv.org/abs/1907.04011) [[Notes](paper_notes/unsuperpoint.md)] [superpoint]\n- [KP2D: Neural Outlier Rejection for Self-Supervised Keypoint Learning](https://arxiv.org/abs/1912.10615) [[Notes](paper_notes/kp2d.md)] 
\u003ckbd\u003eICLR 2020\u003c/kbd\u003e (pointNet)\n- [KP3D: Self-Supervised 3D Keypoint Learning for Ego-motion Estimation](https://arxiv.org/abs/1912.03426) [[Notes](paper_notes/kp3d.md)] \u003ckbd\u003eCoRL 2020\u003c/kbd\u003e [Toyota, superpoint]\n- [NG-RANSAC: Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses](https://arxiv.org/abs/1905.04132) [[Notes](paper_notes/ng_ransac.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e [pointNet]\n- [Learning to Find Good Correspondences](https://arxiv.org/abs/1711.05971) [[Notes](paper_notes/learning_correspondence.md)] \u003ckbd\u003eCVPR 2018 Oral\u003c/kbd\u003e (pointNet)\n- [RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving](https://arxiv.org/abs/1911.09712) [[Notes](paper_notes/refined_mpl.md)] [Huawei, Mono3D]\n- [DSP: Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation](https://arxiv.org/abs/2002.01619) [[Notes](paper_notes/dsp.md)] \u003ckbd\u003eAAAI 2020\u003c/kbd\u003e (SenseTime, Mono3D)\n- [Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks](https://arxiv.org/abs/1903.02193) (LLD, LSTM)\n- [LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach](https://arxiv.org/abs/1802.05591) [[Notes](paper_notes/lanenet.md)] \u003ckbd\u003eIV 2018\u003c/kbd\u003e (LaneNet)\n- [3D-LaneNet: End-to-End 3D Multiple Lane Detection](http://openaccess.thecvf.com/content_ICCV_2019/papers/Garnett_3D-LaneNet_End-to-End_3D_Multiple_Lane_Detection_ICCV_2019_paper.pdf) [[Notes](paper_notes/3d_lanenet.md)] \u003ckbd\u003eICCV 2019\u003c/kbd\u003e\n- [Semi-Local 3D Lane Detection and Uncertainty Estimation](https://arxiv.org/abs/2003.05257) [[Notes](paper_notes/semilocal_3d_lanenet.md)] [GM Israel, 3D LLD]\n- [Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection](https://arxiv.org/abs/2003.10656) [[Notes](paper_notes/gen_lanenet.md)] \u003ckbd\u003eECCV 
2020\u003c/kbd\u003e [Apollo, 3D LLD]\n- [Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty](https://arxiv.org/abs/1711.09026) \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e [Egocentric prediction]\n- [It’s Not All About Size: On the Role of Data Properties in Pedestrian Detection](http://openaccess.thecvf.com/content_ECCVW_2018/papers/11129/Rasouli_Its_Not_All_About_Size_On_the_Role_of_Data_ECCVW_2018_paper.pdf) \u003ckbd\u003eECCV 2018\u003c/kbd\u003e [pedestrian]\n\n\n## 2020-02 (12)\n- [Associative Embedding: End-to-End Learning for Joint Detection and Grouping](https://arxiv.org/abs/1611.05424) [[Notes](paper_notes/associative_embedding.md)] \u003ckbd\u003eNIPS 2017\u003c/kbd\u003e\n- [Pixels to Graphs by Associative Embedding](https://arxiv.org/abs/1706.07365) [[Notes](paper_notes/pixels_to_graphs.md)] \u003ckbd\u003eNIPS 2017\u003c/kbd\u003e\n- [Social LSTM: Human Trajectory Prediction in Crowded Spaces](http://cvgl.stanford.edu/papers/CVPR16_Social_LSTM.pdf) [[Notes](paper_notes/social_lstm.md)] \u003ckbd\u003eCVPR 2016\u003c/kbd\u003e \n- [Online Video Object Detection using Association LSTM](http://openaccess.thecvf.com/content_ICCV_2017/papers/Lu__Online_Video_ICCV_2017_paper.pdf) [[Notes](paper_notes/association_lstm.md)] [single stage, recurrent]\n- [SuperPoint: Self-Supervised Interest Point Detection and Description](https://arxiv.org/abs/1712.07629) [[Notes](paper_notes/superpoint.md)] \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e (channel-to-pixel, deep SLAM, Magic Leap)\n- [PointRend: Image Segmentation as Rendering](https://arxiv.org/abs/1912.08193) [[Notes](paper_notes/pointrend.md)] \u003ckbd\u003eCVPR 2020 Oral\u003c/kbd\u003e [Kaiming He, FAIR]\n- [Multigrid: A Multigrid Method for Efficiently Training Video Models](https://arxiv.org/abs/1912.00998) [[Notes](paper_notes/multigrid_training.md)] \u003ckbd\u003eCVPR 2020 Oral\u003c/kbd\u003e [Kaiming He, FAIR]\n- [GhostNet: More Features from Cheap 
Operations](https://arxiv.org/abs/1911.11907) [[Notes](paper_notes/ghostnet.md)] \u003ckbd\u003eCVPR 2020\u003c/kbd\u003e\n- [FixRes: Fixing the train-test resolution discrepancy](https://arxiv.org/abs/1906.06423) [[Notes](paper_notes/fixres.md)] \u003ckbd\u003eNIPS 2019\u003c/kbd\u003e [FAIR]\n- [MoVi-3D: Towards Generalization Across Depth for Monocular 3D Object Detection](https://arxiv.org/abs/1912.08035) [[Notes](paper_notes/movi_3d.md)] \u003ckbd\u003eECCV 2020\u003c/kbd\u003e [Virtual Cam, viewport, Mapillary/Facebook, Mono3D] \n- [Amodal Completion and Size Constancy in Natural Scenes](https://arxiv.org/abs/1509.08147) [[Notes](paper_notes/amodal_completion.md)] \u003ckbd\u003eICCV 2015\u003c/kbd\u003e (Amodal completion)\n- [MoCo: Momentum Contrast for Unsupervised Visual Representation Learning](https://arxiv.org/abs/1911.05722) [[Notes](paper_notes/moco.md)] \u003ckbd\u003eCVPR 2020 Oral\u003c/kbd\u003e [FAIR, Kaiming He]\n\n\n## 2020-01 (19)\n- [Double Descent: Reconciling modern machine learning practice and the bias-variance trade-off](https://arxiv.org/abs/1812.11118) [[Notes](paper_notes/double_descent.md)] \u003ckbd\u003ePNAS 2019\u003c/kbd\u003e\n- [Deep Double Descent: Where Bigger Models and More Data Hurt](https://arxiv.org/abs/1912.02292) [[Notes](paper_notes/deep_double_descent.md)]\n- [Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/abs/1712.09913) \u003ckbd\u003eNIPS 2018\u003c/kbd\u003e\n- [The ApolloScape Open Dataset for Autonomous Driving and its Application](https://arxiv.org/pdf/1803.06184.pdf) \u003ckbd\u003eCVPR 2018\u003c/kbd\u003e (dataset)\n- [ApolloCar3D: A La","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatrick-llgc%2Flearning-deep-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpatrick-llgc%2Flearning-deep-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatrick-llgc%2Flearning-deep-learning/lists"}