https://github.com/patrick-llgc/learning-deep-learning

Paper reading notes on Deep Learning and Machine Learning
Host: GitHub
URL: https://github.com/patrick-llgc/learning-deep-learning
Owner: patrick-llgc
Created: 2017-04-02T05:29:55.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2024-11-29T15:28:50.000Z (over 1 year ago)
Last Synced: 2025-05-14T19:07:51.505Z (about 1 year ago)
Topics: 3d-object-detection, 3d-object-recognition, cnn, computer-vision, deep-learning, literature-review, machine-learning, medical, medical-imaging, paper, paper-reading, paper-review, point-cloud, reinforcement-learning
Language: Jupyter Notebook
Homepage:
Size: 87.5 MB
Stars: 1,181
Watchers: 111
Forks: 176
Open Issues: 1
Metadata Files:
- Readme: README.md
README

# Paper notes

This repository contains my paper reading notes on deep learning and machine learning. It is inspired by [Denny Britz](https://github.com/dennybritz/deeplearning-papernotes) and [Daniel Takeshi](https://github.com/DanielTakeshi/Paper_Notes). A minimalistic webpage generated with GitHub Pages can be found [here](https://patrick-llgc.github.io/Learning-Deep-Learning/).

## About me

My name is [Patrick Langechuan Liu](https://www.linkedin.com/in/patrick-llgc/). After about a decade of education and research in physics, I found my passion in deep learning and autonomous driving.

## What to read

If you are new to deep learning in computer vision and don't know where to start, I suggest you spend your first month or so diving deep into [this list of papers](start/first_cnn_papers.md). I did so ([see my notes](start/first_cnn_papers_notes.md)) and it served me well.

Here is [a list of trustworthy sources of papers](trusty.md) in case I run out of papers to read.

## My review posts by topics

I regularly update [my blog on Towards Data Science](https://medium.com/@patrickllgc).

- [BEV Perception in Mass Production Autonomous Driving](https://towardsdatascience.com/bev-perception-in-mass-production-autonomous-driving-c6e3f1e46ae0)

- [Challenges of Mass Production Autonomous Driving in China](https://towardsdatascience.com/challenges-of-mass-production-autonomous-driving-in-china-407c7e2dc5d8)

- [Vision-centric Semantic Occupancy Prediction for Autonomous Driving](https://towardsdatascience.com/vision-centric-semantic-occupancy-prediction-for-autonomous-driving-16a46dbd6f65) ([related paper notes](topics/topic_occupancy_network.md))

- [Drivable Space in Autonomous Driving — The Industry](https://medium.com/@patrickllgc/drivable-space-in-autonomous-driving-the-industry-7a4624b94d41)

- [Drivable Space in Autonomous Driving — The Academia](https://towardsdatascience.com/drivable-space-in-autonomous-driving-a-review-of-academia-ef1a6aa4dc15)

- [Drivable Space in Autonomous Driving — The Concept](https://towardsdatascience.com/drivable-space-in-autonomous-driving-the-concept-df699bb8682f)

- [Monocular BEV Perception with Transformers in Autonomous Driving](https://towardsdatascience.com/monocular-bev-perception-with-transformers-in-autonomous-driving-c41e4a893944) ([related paper notes](topics/topic_transformers_bev.md))

- [Illustrated Differences between MLP and Transformers for Tensor Reshaping in Deep Learning](https://towardsdatascience.com/illustrated-difference-between-mlp-and-transformers-for-tensor-reshaping-52569edaf89)

- [Monocular 3D Lane Line Detection in Autonomous Driving](https://towardsdatascience.com/monocular-3d-lane-line-detection-in-autonomous-driving-4d7cdfabf3b6) ([related paper notes](topics/topic_3d_lld.md))

- [Deep-Learning based Object detection in Crowded Scenes](https://towardsdatascience.com/deep-learning-based-object-detection-in-crowded-scenes-1c9fddbd7bc4) ([related paper notes](topics/topic_crowd_detection.md))

- [Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving](https://towardsdatascience.com/monocular-birds-eye-view-semantic-segmentation-for-autonomous-driving-ee2f771afb59) ([related paper notes](topics/topic_bev_segmentation.md))

- [Deep Learning in Mapping for Autonomous Driving](https://towardsdatascience.com/deep-learning-in-mapping-for-autonomous-driving-9e33ee951a44)

- [Monocular Dynamic Object SLAM in Autonomous Driving](https://towardsdatascience.com/monocular-dynamic-object-slam-in-autonomous-driving-f12249052bf1)

- [Monocular 3D Object Detection in Autonomous Driving — A Review](https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e)

- [Self-supervised Keypoint Learning — A Review](https://towardsdatascience.com/self-supervised-keypoint-learning-aade18081fc3)

- [Single Stage Instance Segmentation — A Review](https://towardsdatascience.com/single-stage-instance-segmentation-a-review-1eeb66e0cc49)

- [Self-paced Multitask Learning — A Review](https://towardsdatascience.com/self-paced-multitask-learning-76c26e9532d0)

- [Convolutional Neural Networks with Heterogeneous Metadata](https://towardsdatascience.com/convolutional-neural-networks-with-heterogeneous-metadata-2af9241218a9)

- [Lifting 2D object detection to 3D in autonomous driving](https://towardsdatascience.com/geometric-reasoning-based-cuboid-generation-in-monocular-3d-object-detection-5ee2996270d1)

- [Multimodal Regression](https://towardsdatascience.com/anchors-and-multi-bin-loss-for-multi-modal-target-regression-647ea1974617)

- [Paper Reading in 2019](https://towardsdatascience.com/the-200-deep-learning-papers-i-read-in-2019-7fb7034f05f7?source=friends_link&sk=7628c5be39f876b2c05e43c13d0b48a3)

## 2024-11 (1)

- [On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258) [[Notes](paper_notes/opportunities_foundation_models.md)]

## 2024-06 (8)

- [LINGO-1: Exploring Natural Language for Autonomous Driving](https://wayve.ai/thinking/lingo-natural-language-autonomous-driving/) [[Notes](paper_notes/lingo_1.md)] [Wayve, open-loop world model]

- [LINGO-2: Driving with Natural Language](https://wayve.ai/thinking/lingo-2-driving-with-language/) [[Notes](paper_notes/lingo_2.md)] [Wayve, closed-loop world model]

- [OpenVLA: An Open-Source Vision-Language-Action Model](https://arxiv.org/abs/2406.09246) [open source RT-2]

- [Parting with Misconceptions about Learning-based Vehicle Motion Planning](https://arxiv.org/abs/2306.07962) CoRL 2023 [Simple non-learning based baseline]

- [QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving](https://arxiv.org/abs/2404.01486) [Waabi]

- [MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving](https://ieeexplore.ieee.org/document/7139412) [[Notes](paper_notes/mpdm.md)] ICRA 2015 [Behavior planning, UMich, May Autonomy]

- [MPDM2: Multipolicy Decision-Making for Autonomous Driving via Changepoint-based Behavior Prediction](https://www.roboticsproceedings.org/rss11/p43.pdf) [[Notes](paper_notes/mpdm2.md)] RSS 2015 [Behavior planning]

- [MPDM3: Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment](https://link.springer.com/article/10.1007/s10514-017-9619-z) Autonomous Robots 2017 [Behavior planning]

- [EUDM: Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching](https://arxiv.org/abs/2003.02746) [[Notes](paper_notes/eudm.md)] ICRA 2020 [Wenchao Ding, Shaojie Shen, Behavior planning]

- [TPP: Tree-structured Policy Planning with Learned Behavior Models](https://arxiv.org/abs/2301.11902) ICRA 2023 [Marco Pavone, Nvidia, Behavior planning]

- [MARC: Multipolicy and Risk-aware Contingency Planning for Autonomous Driving](https://arxiv.org/abs/2308.12021) [[Notes](paper_notes/marc.md)] RAL 2023 [Shaojie Shen, Behavior planning]

- [EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments](https://arxiv.org/abs/2108.07993) TRO 2021 [Wenchao Ding, encyclopedia of pnc]

- [trajdata: A Unified Interface to Multiple Human Trajectory Datasets](https://arxiv.org/abs/2307.13924) NeurIPS 2023 [Marco Pavone, Nvidia]

- [Optimal Vehicle Trajectory Planning for Static Obstacle Avoidance using Nonlinear Optimization](https://arxiv.org/abs/2307.09466) [Xpeng]

- [Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles](https://arxiv.org/abs/1910.04586) [[Notes](paper_notes/joint_learned_bptp.md)] IROS 2019 Oral [Uber ATG, behavioral planning, motion planning]

- [Enhancing End-to-End Autonomous Driving with Latent World Model](https://arxiv.org/abs/2406.08481)

- [OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments](https://arxiv.org/abs/2312.09243) [Jiwen Lu]

- [RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision](https://arxiv.org/abs/2309.09502) ICRA 2024

- [EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision](https://arxiv.org/pdf/2311.02077) [Sanja, Marco, NV]

- [FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation](https://opendrivelab.com/e2ead/AD23Challenge/Track_3_NVOCC.pdf?=&linkId=100000205404832)

- [Trajeglish: Traffic Modeling as Next-Token Prediction](https://arxiv.org/abs/2312.04535) ICLR 2024

- [Autonomous Driving Strategies at Intersections: Scenarios, State-of-the-Art, and Future Outlooks](https://arxiv.org/pdf/2106.13052) ITSC 2021

- [Learning-Based Approach for Online Lane Change Intention Prediction](https://ieeexplore.ieee.org/document/6629564/) IV 2013 [SVM, LC intention prediction]

- [Traffic Flow-Based Crowdsourced Mapping in Complex Urban Scenario](https://ieeexplore.ieee.org/document/10171417) RAL 2023 [Wenchao Ding, Huawei, crowdsourced map]

- [FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow](https://arxiv.org/abs/2305.01622) ICRA 2023

- [Hybrid A-star: Path Planning for Autonomous Vehicles in Unknown Semi-structured Environments](https://www.semanticscholar.org/paper/Path-Planning-for-Autonomous-Vehicles-in-Unknown-Dolgov-Thrun/0e8c927d9c2c46b87816a0f8b7b8b17ed1263e9c) IJRR 2010 [Dolgov, Thrun, Searching]

- [Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame](https://www.semanticscholar.org/paper/Optimal-trajectory-generation-for-dynamic-street-in-Werling-Ziegler/6bda8fc13bda8cffb3bb426a73ce5c12cc0a1760) ICRA 2010 [Werling, Thrun, Sampling] [MUST READ for planning folks]

- [Autonomous Driving on Curvy Roads Without Reliance on Frenet Frame: A Cartesian-Based Trajectory Planning Method](https://ieeexplore.ieee.org/document/9703250) TITS 2022

- [Baidu Apollo EM Motion Planner](https://arxiv.org/abs/1807.08048) [[Notes](paper_notes/apollo_em_planner.md)][Optimization]

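- [Spatio-temporal Joint Planning Method for Intelligent Vehicles Based on Improved Hybrid A*](https://www.qichegongcheng.com/CN/abstract/abstract1500.shtml) Automotive Engineering (汽车工程): Planning & Decision-Making, 2023 [Joint optimization, search]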

- [Enable Faster and Smoother Spatio-temporal Trajectory Planning for Autonomous Vehicles in Constrained Dynamic Environment](https://journals.sagepub.com/doi/abs/10.1177/0954407020906627) JAE 2020 [Joint optimization, search]

- [Focused Trajectory Planning for Autonomous On-Road Driving](https://www.ri.cmu.edu/pub_files/2013/6/IV2013-Tianyu.pdf) IV 2013 [Joint optimization, Iteration]

- [SSC: Safe Trajectory Generation for Complex Urban Environments Using Spatio-Temporal Semantic Corridor](https://arxiv.org/abs/1906.09788) RAL 2019 [Joint optimization, SSC, Wenchao Ding, Motion planning]

- [AlphaGo: Mastering the game of Go with deep neural networks and tree search](https://www.nature.com/articles/nature16961) [[Notes](paper_notes/alphago.md)] Nature 2016 [DeepMind, MCTS]

- [AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://www.science.org/doi/full/10.1126/science.aar6404) Science 2018 [DeepMind]

- [MuZero: Mastering Atari, Go, chess and shogi by planning with a learned model](https://www.nature.com/articles/s41586-020-03051-4) Nature 2020 [DeepMind]

- [Grandmaster-Level Chess Without Search](https://arxiv.org/abs/2402.04494) [DeepMind]

- [Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving](https://arxiv.org/abs/1610.03295) [Mobileye, desire and traj optimization]

- [Comprehensive Reactive Safety: No Need For A Trajectory If You Have A Strategy](https://arxiv.org/abs/2207.00198) IROS 2022 [Da Fang, Qcraft]

- [BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning](https://arxiv.org/abs/2310.10357) AAAI 2024

- [LLM-MCTS: Large Language Models as Commonsense Knowledge for Large-Scale Task Planning](https://arxiv.org/abs/2305.14078) NeurIPS 2023

- [HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf) CVPR 2022 [Zikang Zhou, agent-centric, motion prediction]

- [QCNet: Query-Centric Trajectory Prediction](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_Query-Centric_Trajectory_Prediction_CVPR_2023_paper.pdf) [[Notes](paper_notes/qcnet.md)] CVPR 2023 [Zikang Zhou, scene-centric, motion prediction]

## 2024-03 (11)

- [Genie: Generative Interactive Environments](https://arxiv.org/abs/2402.15391) [[Notes](paper_notes/genie.md)] [DeepMind, World Model]

- [DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving](https://arxiv.org/abs/2309.09777) [[Notes](paper_notes/drive_dreamer.md)] [Jiwen Lu, World Model]

- [WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens](https://arxiv.org/abs/2401.09985) [[Notes](paper_notes/world_dreamer.md)] [Jiwen Lu, World Model]

- [VideoPoet: A Large Language Model for Zero-Shot Video Generation](https://arxiv.org/abs/2312.14125) [Like Sora, but LLM, NOT world model]

- [Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2304.08818) [[Notes](paper_notes/video_ldm.md)] CVPR 2023 [Sanja, Nvidia, VideoLDM, Video prediction]

- [Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos](https://arxiv.org/abs/2206.11795) NeurIPS 2022 [[Notes](paper_notes/vpt.md)] [OpenAI]

- [MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge](https://arxiv.org/abs/2206.08853) NeurIPS 2022 [Nvidia, Outstanding paper award]

- [Humanoid Locomotion as Next Token Prediction](https://arxiv.org/abs/2402.19469) [[Notes](paper_notes/locomotion_next_token_pred.md)] [Berkeley, EAI]

- [RPT: Robot Learning with Sensorimotor Pre-training](https://arxiv.org/abs/2306.10007) [[Notes](paper_notes/rpt.md)] CoRL 2023 Oral [Berkeley, EAI]

- [MVP: Real-World Robot Learning with Masked Visual Pre-training](https://arxiv.org/abs/2210.03109) [[Notes](paper_notes/mvp.md)] CoRL 2022 [Berkeley, EAI]

- [BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning](https://arxiv.org/abs/2202.02005) [[Notes](paper_notes/bc_z.md)] CoRL 2021 [Eric Jang, 1X]

- [GenAD: Generalized Predictive Model for Autonomous Driving](https://arxiv.org/abs/2403.09630) [[Notes](paper_notes/genad.md)] CVPR 2024

- [HG-DAgger: Interactive Imitation Learning with Human Experts](https://arxiv.org/abs/1810.02890) [DAgger]

- [DriveGAN: Towards a Controllable High-Quality Neural Simulation](https://arxiv.org/abs/2104.15060) [[Notes](paper_notes/drive_gan.md)] CVPR 2021 oral [Nvidia, Sanja]

- [VideoGPT: Video Generation using VQ-VAE and Transformers](https://arxiv.org/abs/2104.10157) [[Notes](paper_notes/videogpt.md)] [Pieter Abbeel]

- [LLM, Vision Tokenizer and Vision Intelligence, by Lu Jiang](https://mp.weixin.qq.com/s/Hamz5XMT1tSZHKdPaCBTKg) [[Notes](paper_notes/llm_vision_intel.md)] [Interview Lu Jiang]

- [AV2.0: Reimagining an autonomous vehicle](https://arxiv.org/abs/2108.05805) [[Notes](paper_notes/av20.md)] [Wayve, Alex Kendall]

- [Simulation for E2E AD](https://www.youtube.com/watch?v=8fivoXbT1Ao&ab_channel=Wayve) [Wayve, Tech Sharing, E2E]

- [E2E lateral planning](https://blog.comma.ai/end-to-end-lateral-planning/) [Comma.ai, E2E planning]

- [Learning and Leveraging World Models in Visual Representation Learning](https://arxiv.org/abs/2403.00504) [LeCun, JEPA series]

- [LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models](https://arxiv.org/abs/2312.00785) [Large Vision Models, Jitendra Malik]

- [LWM: World Model on Million-Length Video And Language With RingAttention](https://arxiv.org/abs/2402.08268) [Pieter Abbeel]

- [OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving](https://arxiv.org/abs/2311.16038) [Jiwen Lu, World Model]

- [GenAD: Generative End-to-End Autonomous Driving](https://arxiv.org/abs/2402.11502)

- [TCP: Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline](https://arxiv.org/abs/2206.08129) NeurIPS 2022 [E2E planning, Hongyang]

- [Transfuser: Multi-Modal Fusion Transformer for End-to-End Autonomous Driving](https://arxiv.org/abs/2104.09224) CVPR 2021 [E2E planning, Geiger]

- [Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving](https://arxiv.org/abs/2310.01957) [Wayve, LLM + AD]

- [LingoQA: Video Question Answering for Autonomous Driving](https://arxiv.org/abs/2312.14115) [Wayve, LLM + AD]

- [Panacea: Panoramic and Controllable Video Generation for Autonomous Driving](https://arxiv.org/abs/2311.16813) CVPR 2024 [Megvii]

- [PlanT: Explainable Planning Transformers via Object-Level Representations](https://arxiv.org/abs/2210.14222) CoRL 2022

- [Scene as Occupancy](https://arxiv.org/abs/2306.02851) ICCV 2023

- [AD-MLP: Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes](https://arxiv.org/abs/2305.10430) [Baidu]

- [The Shift from Models to Compound AI Systems](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/)

- [Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach](https://arxiv.org/abs/2108.08265) ICCV 2021

- [LBC: Learning by Cheating](https://arxiv.org/abs/1912.12294) CoRL 2019

- [Learning to drive from a world on rails](https://arxiv.org/abs/2105.00636) ICCV 2021 oral [Philipp Krähenbühl]

- [Learning from All Vehicles](https://arxiv.org/abs/2203.11934) CVPR 2022 [Philipp Krähenbühl]

- [VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning](https://arxiv.org/abs/2402.13243) [Horizon]

- [VQ-VAE: Neural Discrete Representation Learning](https://arxiv.org/abs/1711.00937) NeurIPS 2017 [Image Tokenizer]

- [VQ-GAN: Taming Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2012.09841) CVPR 2021 [Image Tokenizer]

- [ViT-VQGAN: Vector-quantized Image Modeling with Improved VQGAN](https://arxiv.org/abs/2110.04627) ICLR 2022 [Image Tokenizer]

- [MaskGIT: Masked Generative Image Transformer](https://arxiv.org/abs/2202.04200) CVPR 2022 [LLM, non-autoregressive]

- [MAGVIT: Masked Generative Video Transformer](https://arxiv.org/abs/2212.05199) CVPR 2023 highlight [Video Tokenizer]

- [MAGVIT-v2: Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation](https://arxiv.org/abs/2310.05737) ICLR 2024 [Video Tokenizer]

- [Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models](https://arxiv.org/abs/2402.17177) [Reverse Engineering of Sora]

- [GLaM: Efficient Scaling of Language Models with Mixture-of-Experts](https://arxiv.org/abs/2112.06905) ICML 2022 [MoE, LLM]

- [Lifelong Language Pretraining with Distribution-Specialized Experts](https://arxiv.org/abs/2305.12281) ICML 2023 [MoE, LLM]

- [DriveLM: Drive on Language](https://arxiv.org/abs/2312.14150) [Hongyang Li]

- [MotionLM: Multi-Agent Motion Forecasting as Language Modeling](https://arxiv.org/abs/2309.16534) ICCV 2023 [Waymo, LLM + AD]

- [AD-MLP: Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes](https://arxiv.org/abs/2305.10430) [No perception]

- CubeLLM: align 2D/3D with language

- EmerNeRF: ICLR 2024

- A Language Agent for Autonomous Driving

- [Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal]

- [DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation](https://arxiv.org/abs/2403.06845)

- [DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving](https://arxiv.org/abs/2405.04390) CVPR 2024 [Zheng Zhu]

- [Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond](https://arxiv.org/abs/2405.03520) [Zheng Zhu]

## 2024-02 (7)

- [End-to-end Autonomous Driving: Challenges and Frontiers](https://arxiv.org/abs/2306.16927) [[Notes](paper_notes/e2e_review_hongyang.md)] [Hongyang Li, Shanghai AI labs]

- [DriveVLM: The convergence of Autonomous Driving and Large Vision-Language Models](https://arxiv.org/abs/2402.12289) [[Notes](paper_notes/drivevlm.md)] [Hang Zhao]

- [DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model](https://arxiv.org/abs/2310.01412) [[Notes](paper_notes/drivegpt4.md)] [HKU]

- [GAIA-1: A Generative World Model for Autonomous Driving](https://arxiv.org/abs/2309.17080) [[Notes](paper_notes/gaia_1.md)] [Wayve, vision foundation model]

- [ADriver-I: A General World Model for Autonomous Driving](https://arxiv.org/abs/2311.13549) [[Notes](paper_notes/adriver_i.md)] [Megvii, Xiangyu]

- [Drive-WM: Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving](https://arxiv.org/abs/2311.17918) [[Notes](paper_notes/drive_wm.md)]

- [X]() [[Notes](paper_notes/x.md)] [E2E planning]

## 2023-12 (4)

- [ChatGPT for Robotics: Design Principles and Model Abilities](https://arxiv.org/abs/2306.17582) [[Notes](paper_notes/prompt_craft.md)] [Microsoft, LLM for robotics]

- [RoboVQA: Multimodal Long-Horizon Reasoning for Robotics](https://arxiv.org/abs/2311.00899) [[Notes](paper_notes/robovqa.md)] [Google DeepMind, LLM for robotics]

- [ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application](https://ieeexplore.ieee.org/document/10235949) [Microsoft Robotics]

- [GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration](https://arxiv.org/abs/2311.12015) [[Notes](paper_notes/gpt4v_robotics.md)] [LLM for robotics, Microsoft Robotics]

- [LLM-Brain: LLM as A Robotic Brain: Unifying Egocentric Memory and Control](https://arxiv.org/abs/2304.09349) [[Notes](paper_notes/llm_brain.md)]

- [Voyager: An Open-Ended Embodied Agent with Large Language Models](https://arxiv.org/abs/2305.16291) [[Notes](paper_notes/voyager.md)] [Reasoning Critique, Linxi Jim Fan]

## 2023-09 (3)

- [RetNet: Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/abs/2307.08621) [[Notes](paper_notes/retnet.md)] [MSRA]

- [Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention](https://arxiv.org/abs/2006.16236) [[Notes](paper_notes/transformers_are_rnns.md)] ICML 2020 [Linear attention]

- [AFT: An Attention Free Transformer](https://arxiv.org/abs/2105.14103) [[Notes](paper_notes/aft.md)] [Apple]

## 2023-08 (3)

- [RT-1: Robotics Transformer for Real-World Control at Scale](https://arxiv.org/abs/2212.06817) [[Notes](paper_notes/rt1.md)] [DeepMind]

- [RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control](https://robotics-transformer2.github.io/assets/rt2.pdf) [[Notes](paper_notes/rt2.md)] [DeepMind, end-to-end visuomotor]

- [RWKV: Reinventing RNNs for the Transformer Era](https://arxiv.org/abs/2305.13048) [[Notes](paper_notes/rwkv.md)]

## 2023-07 (6)

- [MILE: Model-Based Imitation Learning for Urban Driving](https://arxiv.org/abs/2210.07729) [[Notes](paper_notes/mile.md)] NeurIPS 2022 [Alex Kendall]

- [PaLM-E: An embodied multimodal language model](https://arxiv.org/abs/2303.03378) [[Notes](paper_notes/palm_e.md)] [Google Robotics]

- [VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models](https://voxposer.github.io/voxposer.pdf) [[Notes](paper_notes/voxposer.md)] [Fei-Fei Li]

- [CaP: Code as Policies: Language Model Programs for Embodied Control](https://arxiv.org/abs/2209.07753) [[Notes](paper_notes/cap.md)] [[Project](https://code-as-policies.github.io/)]

- [ProgPrompt: Generating Situated Robot Task Plans using Large Language Models](https://arxiv.org/abs/2209.11302) ICRA 2023

- [TidyBot: Personalized Robot Assistance with Large Language Models](https://arxiv.org/abs/2305.05658) [[Notes](paper_notes/tidybot.md)] [[Project](https://tidybot.cs.princeton.edu/)]

- [SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances](https://arxiv.org/abs/2204.01691) [[Notes](paper_notes/saycan.md)] [[Project](https://say-can.github.io/)]

## 2023-06 (5)

- [End-to-end review by Shanghai AI Labs](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)

- [Pix2seq v2: A Unified Sequence Interface for Vision Tasks](https://arxiv.org/abs/2206.07669) [[Notes](paper_notes/pix2seq_v2.md)] NeurIPS 2022 [Geoffrey Hinton]

- 🦩 [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198) [[Notes](paper_notes/flamingo.md)] NeurIPS 2022 [DeepMind]

- 😼 [Gato: A Generalist Agent](https://arxiv.org/abs/2205.06175) [[Notes](paper_notes/gato.md)] TMLR 2022 [DeepMind]

- [BC-SAC: Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios](https://arxiv.org/abs/2212.11419) [[Notes](paper_notes/bc_sac.md)] NeurIPS 2022 [Waymo]

- [MGAIL-AD: Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving](https://arxiv.org/abs/2210.09539) [[Notes](paper_notes/mgail_ad.md)] IROS 2022 [Waymo]

## 2023-05 (7)

- [SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving](https://arxiv.org/abs/2303.09551) [[Notes](paper_notes/surroundocc.md)] [Occupancy Network, Wei Yi, Jiwen Lu]

- [Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving](https://arxiv.org/abs/2304.14365) [[Notes](paper_notes/occ3d.md)] [Occupancy Network, Zhao Hang]

- [Occupancy Networks: Learning 3D Reconstruction in Function Space](https://arxiv.org/abs/1812.03828) CVPR 2019 [[Notes](paper_notes/occupancy_networks.md)] [Andreas Geiger]

- [OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction](https://arxiv.org/abs/2304.05316) [Occupancy Network, PhiGent]

- [Pix2seq: A Language Modeling Framework for Object Detection](https://arxiv.org/abs/2109.10852) [[Notes](paper_notes/pix2seq.md)] ICLR 2022 [Geoffrey Hinton]

- [VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks](https://arxiv.org/abs/2305.11175) [[Notes](paper_notes/vision_llm.md)] [Jifeng Dai]

- [HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](https://arxiv.org/abs/2303.17580) [[Notes](paper_notes/hugging_gpt.md)]

## 2023-04 (1)

- [UniAD: Planning-oriented Autonomous Driving](https://arxiv.org/abs/2212.10156) [[Notes](paper_notes/uniad.md)] CVPR 2023 best paper [BEV, e2e, Hongyang Li]

## 2023-03 (5)

- [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) [[Notes](paper_notes/gpt4.md)] [OpenAI, GPT]

- [OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception](https://arxiv.org/abs/2303.03991) [[Notes](paper_notes/openoccupancy.md)] [Occupancy Network, Jiwen Lu]

- [VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion](https://arxiv.org/abs/2302.12251) [[Notes](paper_notes/voxformer.md)] CVPR 2023 highlight [Occupancy Network, Nvidia]

- [MonoScene: Monocular 3D Semantic Scene Completion](https://arxiv.org/abs/2112.00726) CVPR 2022 [[Notes](paper_notes/monoscene.md)] [Occupancy Network, single cam]

- [CoReNet: Coherent 3D scene reconstruction from a single RGB image](https://arxiv.org/abs/2004.12989) [[Notes](paper_notes/corenet.md)] ECCV 2020 oral

## 2023-02 (4)

- [Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning](https://arxiv.org/abs/2211.04325) [[Notes](paper_notes/out_of_data.md)] [Epoch.ai industry report]

- [Codex: Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374) [[Notes](paper_notes/codex.md)] [GPT, OpenAI]

- [InstructGPT: Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155) [[Notes](paper_notes/instructgpt.md)] [GPT, OpenAI]

- [TPVFormer: Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction](https://arxiv.org/abs/2302.07817) [[Notes](paper_notes/tpvformer.md)] CVPR 2023 [Occupancy Network, Jiwen Lu]

## 2023-01 (2)

- [PPGeo: Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling](https://arxiv.org/abs/2301.01006) [[Notes](paper_notes/ppgeo.md)] ICLR 2023

- [nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles](https://arxiv.org/abs/2106.11810) [[Notes](paper_notes/nuplan.md)]

## 2022-11 (1)

- [M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction](https://arxiv.org/abs/2202.11884) [[Notes](paper_notes/m2i.md)] CVPR 2022

## 2022-10 (1)

- [Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe](https://arxiv.org/abs/2209.05324) [[Notes](paper_notes/delving_bev.md)] [PJLab]

## 2022-09 (3)

- [ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries](https://arxiv.org/abs/2208.01582) [[Notes](paper_notes/vip3d.md)] [BEV, perception + prediction, Hang Zhao]

- [MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction](https://arxiv.org/abs/2208.14437) [[Notes](paper_notes/maptr.md)] [Horizon, BEVNet]

- [StopNet: Scalable Trajectory and Occupancy Prediction for Urban Autonomous Driving](https://arxiv.org/abs/2206.00991) ICRA 2022

- [MOTR: End-to-End Multiple-Object Tracking with Transformer](https://arxiv.org/abs/2105.03247) ECCV 2022 [Megvii, MOT]

- [Anchor DETR: Query Design for Transformer-Based Object Detection](https://arxiv.org/abs/2109.07107) [[Notes](paper_notes/anchor_detr.md)] AAAI 2022 [Megvii]

## 2022-08 (1)

- [HOME: Heatmap Output for future Motion Estimation](https://arxiv.org/abs/2105.10968) [[Notes](paper_notes/home.md)] ITSC 2021 [behavior prediction, Huawei Paris]

## 2022-07 (8)

- [PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark](https://arxiv.org/abs/2203.11089) [[Notes](paper_notes/persformer.md)] [BEVNet, lane line]

- [VectorMapNet: End-to-end Vectorized HD Map Learning](https://arxiv.org/abs/2206.08920) [[Notes](paper_notes/vectormapnet.md)] [BEVNet, LLD, Hang Zhao]

- [PETR: Position Embedding Transformation for Multi-View 3D Object Detection](https://arxiv.org/abs/2203.05625) [[Notes](paper_notes/petr.md)] ECCV 2022 [BEVNet]

- [PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images](https://arxiv.org/abs/2206.01256) [[Notes](paper_notes/petrv2.md)] [BEVNet, Megvii]

- [M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation](https://arxiv.org/abs/2204.05088) [[Notes](paper_notes/m2bev.md)] [BEVNet, Nvidia]

- [BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection](https://arxiv.org/abs/2206.10092) [[Notes](paper_notes/bevdepth.md)] [BEVNet, nuScenes SOTA, Megvii]

- [CVT: Cross-view Transformers for real-time Map-view Semantic Segmentation](https://arxiv.org/abs/2205.02833) [[Notes](paper_notes/cvt.md)] CVPR 2022 oral [UTAustin, Philipp]

- [Wayformer: Motion Forecasting via Simple & Efficient Attention Networks](https://arxiv.org/abs/2207.05844) [[Notes](paper_notes/wayformer.md)] [Behavior prediction, Waymo]

## 2022-06 (3)

- [BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection](https://arxiv.org/abs/2203.17054) [[Notes](paper_notes/bevdet4d.md)] [BEVNet]

- [BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving](https://arxiv.org/abs/2205.09743) [[Notes](paper_notes/beverse.md)] [Jiwen Lu, BEVNet, perception + prediction]

- [BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation](https://arxiv.org/abs/2205.13542) [[Notes](paper_notes/bevfusion.md)] [BEVNet, Song Han]

## 2022-03 (1)

- [BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers](https://arxiv.org/abs/2203.17270) [[Notes](paper_notes/bevformer.md)] ECCV 2022 [BEVNet, Hongyang Li, Jifeng Dai]

## 2022-02 (1)

- [TNT: Target-driveN Trajectory Prediction](https://arxiv.org/abs/2008.08294) [[Notes](paper_notes/tnt.md)] CoRL 2020 [prediction, Waymo, Hang Zhao]

- [DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets](https://arxiv.org/abs/2108.09640) [[Notes](paper_notes/dense_tnt.md)] ICCV 2021 [prediction, Waymo, 1st place winner WOMD]

## 2022-01 (1)

- [Manydepth: The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth](https://arxiv.org/abs/2104.14540) [[Notes](paper_notes/manydepth.md)] CVPR 2021 [monodepth, Niantic]

- [DEKR: Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression](https://arxiv.org/abs/2104.02300) [[Notes](paper_notes/dekr.md)] CVPR 2021

## 2021-12 (5)

- [BN-FFN-BN: Leveraging Batch Normalization for Vision Transformers](https://openaccess.thecvf.com/content/ICCV2021W/NeurArch/papers/Yao_Leveraging_Batch_Normalization_for_Vision_Transformers_ICCVW_2021_paper.pdf) [[Notes](paper_notes/bn_ffn_bn.md)] ICCVW 2021 [BN, transformers]

- [PowerNorm: Rethinking Batch Normalization in Transformers](https://arxiv.org/abs/2003.07845) [[Notes](paper_notes/powernorm.md)] ICML 2020 [BN, transformers]

- [MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction](https://arxiv.org/abs/2111.14973) [[Notes](paper_notes/multipath++.md)] ICRA 2022 [Waymo, behavior prediction]

- [BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View](https://arxiv.org/abs/2112.11790) [[Notes](paper_notes/bevdet.md)]

- [Translating Images into Maps](https://arxiv.org/abs/2110.00966) [[Notes](paper_notes/translating_images_to_maps.md)] ICRA 2022 [BEVNet, transformers]

## 2021-11 (4)

- [DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries](https://arxiv.org/abs/2110.06922) [[Notes](paper_notes/detr3d.md)] CoRL 2021 [BEVNet, transformers]

- [Robust-CVD: Robust Consistent Video Depth Estimation](https://arxiv.org/abs/2012.05901) CVPR 2021 oral [[website](https://robust-cvd.github.io/)]

- [MAE: Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) [[Notes](paper_notes/mae.md)] [Kaiming He, unsupervised learning]

- [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886) [[Notes](paper_notes/simmim.md)] [MSRA, unsupervised learning, MAE]

- [iBOT: Image BERT Pre-Training with Online Tokenizer](https://arxiv.org/abs/2111.07832)

## 2021-10 (3)

- [STSU: Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images](https://arxiv.org/abs/2110.01997) [[Notes](paper_notes/stsu.md)] ICCV 2021 [BEV feat stitching, Luc Van Gool]

- [PanopticBEV: Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images](https://arxiv.org/abs/2108.03227) [[Notes](paper_notes/panoptic_bev.md)] RAL 2022 [BEVNet, vertical/horizontal features]

- [NEAT: Neural Attention Fields for End-to-End Autonomous Driving](https://arxiv.org/abs/2109.04456) [[Notes](paper_notes/neat.md)] ICCV 2021 [[supplementary](http://www.cvlibs.net/publications/Chitta2021ICCV_supplementary.pdf)] [BEVNet]

## 2021-09 (11)

- [DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection?](https://arxiv.org/abs/2108.06417) [[Notes](paper_notes/dd3d.md)] ICCV 2021 [mono3D, Toyota]

- [EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070) [[Notes](paper_notes/efficientdet.md)] CVPR 2020 [BiFPN, Tesla AI day]

- [PnPNet: End-to-End Perception and Prediction with Tracking in the Loop](https://arxiv.org/abs/2005.14711) [[Notes](paper_notes/pnpnet.md)] CVPR 2020 [Uber ATG]

- [MP3: A Unified Model to Map, Perceive, Predict and Plan](https://arxiv.org/abs/2101.06806) [[Notes](paper_notes/mp3.md)] CVPR 2021 [Uber, planning]

- [BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning](http://arxiv.org/abs/2110.04931) [[Notes](paper_notes/bevnet_sdca.md)] ICCV 2021 [BEVNet, surveillance]

- [LiDAR R-CNN: An Efficient and Universal 3D Object Detector](https://arxiv.org/abs/2103.15297) [[Notes](paper_notes/lidar_rcnn.md)] CVPR 2021 [TuSimple, Naiyan Wang]

- [Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches](https://arxiv.org/abs/2102.05897) [[Notes](paper_notes/corner_case_vision_arxiv.md)] [corner cases]

- [Systematization of Corner Cases for Visual Perception in Automated Driving](https://ieeexplore.ieee.org/document/9304789) [[Notes](paper_notes/corner_case_vision_iv.md)] IV 2020 [corner cases]

- [An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving](https://arxiv.org/abs/2103.03678) [[Notes](paper_notes/corner_case_multisensor.md)] IV 2021 [corner cases]

- [PYVA: Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation](https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Projecting_Your_View_Attentively_Monocular_Road_Scene_Layout_Estimation_via_CVPR_2021_paper.html) [[Notes](paper_notes/pyva.md)] CVPR 2021 [[Supplementary](https://openaccess.thecvf.com/content/CVPR2021/supplemental/Yang_Projecting_Your_View_CVPR_2021_supplemental.zip)] [BEVNet]

- [YOLOF: You Only Look One-level Feature](https://arxiv.org/abs/2103.09460) [[Notes](paper_notes/yolof.md)] CVPR 2021 [Megvii]

- [Perceiving Humans: from Monocular 3D Localization to Social Distancing](https://arxiv.org/abs/2009.00984) [[Notes](paper_notes/perceiving_humans.md)] TITS 2021 [monoloco++]

- [PifPaf: Composite Fields for Human Pose Estimation](https://arxiv.org/abs/1903.06593) CVPR 2019

- [Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images](https://arxiv.org/abs/2108.03227) [BEVNet]

- [TransformerFusion: Monocular RGB Scene Reconstruction using Transformers](https://arxiv.org/abs/2107.02191)

- [Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Projecting_Your_View_Attentively_Monocular_Road_Scene_Layout_Estimation_via_CVPR_2021_paper.pdf) CVPR 2021

- [Multi-Modal Fusion Transformer for End-to-End Autonomous Driving](https://arxiv.org/abs/2104.09224) CVPR 2021

- [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152)

- [Probabilistic and Geometric Depth: Detecting Objects in Perspective](https://arxiv.org/abs/2107.14160) CoRL 2021

## 2021-08 (11)

- [EgoNet: Exploring Intermediate Representation for Monocular Vehicle Pose Estimation](https://arxiv.org/abs/2011.08464) [[Notes](paper_notes/egonet.md)] CVPR 2021 [mono3D]

- [MonoEF: Monocular 3D Object Detection: An Extrinsic Parameter Free Approach](https://arxiv.org/abs/2106.15796) [[Notes](paper_notes/monoef.md)] CVPR 2021 [mono3D]

- [GAC: Ground-aware Monocular 3D Object Detection for Autonomous Driving](https://arxiv.org/abs/2102.00690) [[Notes](paper_notes/gac.md)] RAL 2021 [mono3D]

- [FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection](https://arxiv.org/abs/2104.10956) [[Notes](paper_notes/fcos3d.md)] NeurIPS 2020 [mono3D, senseTime]

- [GUPNet: Geometry Uncertainty Projection Network for Monocular 3D Object Detection](https://arxiv.org/abs/2107.13774) [[Notes](paper_notes/gupnet.md)] ICCV 2021 [mono3D, Wanli Ouyang]

- [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) [[Notes](paper_notes/darts.md)] ICLR 2019 [VGG author]

- [FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search](https://arxiv.org/abs/1812.03443) [[Notes](paper_notes/fbnet.md)] CVPR 2019 [DARTS]

- [FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions](https://arxiv.org/abs/2004.05565) CVPR 2020

- [FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining](https://arxiv.org/abs/2006.02049) CVPR 2021

- [Perceiver: General Perception with Iterative Attention](https://arxiv.org/abs/2103.03206) [[Notes](paper_notes/perceiver.md)] ICML 2021 [transformers, multimodal]

- [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) [[Notes](paper_notes/perceiver_io.md)]

- [PillarMotion: Self-Supervised Pillar Motion Learning for Autonomous Driving](https://arxiv.org/abs/2104.08683) [[Notes](paper_notes/pillar_motion.md)] CVPR 2021 [QCraft, Alan Yuille]

- [SimTrack: Exploring Simple 3D Multi-Object Tracking for Autonomous Driving](https://arxiv.org/abs/2108.10312) [[Notes](paper_notes/simtrack.md)] ICCV 2021 [QCraft, Alan Yuille]

## 2021-07 (1)

- [HDMapNet: An Online HD Map Construction and Evaluation Framework](https://arxiv.org/abs/2107.06307) [[Notes](paper_notes/hdmapnet.md)] CVPR 2021 workshop [youtube video only, Li Auto]

## 2021-06 (2)

- [FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras](https://arxiv.org/abs/2104.10490) [[Notes](paper_notes/fiery.md)] ICCV 2021 [BEVNet, perception + prediction]

- [Baidu's CNN seg](https://zhuanlan.zhihu.com/p/35034215) [[Notes](paper_notes/cnn_seg.md)]

## 2021-04 (5)

- [Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation](https://arxiv.org/abs/2012.15175) [[Notes](paper_notes/swahr.md)] CVPR 2021 [Megvii]

- [CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark](https://arxiv.org/abs/1812.00324) CVPR 2019

- [The Overlooked Elephant of Object Detection: Open Set](https://openaccess.thecvf.com/content_WACV_2020/html/Dhamija_The_Overlooked_Elephant_of_Object_Detection_Open_Set_WACV_2020_paper.html) WACV 2020

- [Class-Agnostic Object Detection](https://arxiv.org/abs/2011.14204) WACV 2021

- [OWOD: Towards Open World Object Detection](https://arxiv.org/abs/2103.02603) [[Notes](paper_notes/owod.md)] CVPR 2021 oral

- [FsDet: Frustratingly Simple Few-Shot Object Detection](https://arxiv.org/abs/2003.06957) ICML 2020

- [MonoFlex: Objects are Different: Flexible Monocular 3D Object Detection](https://arxiv.org/abs/2104.02323) [[Notes](paper_notes/monoflex.md)] CVPR 2021 [mono3D, Jiwen Lu, cropped]

- [monoDLE: Delving into Localization Errors for Monocular 3D Object Detection](https://arxiv.org/abs/2103.16237) [[Notes](paper_notes/monodle.md)] CVPR 2021 [mono3D]

- [Exploring 2D Data Augmentation for 3D Monocular Object Detection](https://arxiv.org/abs/2104.10786)

- [OCM3D: Object-Centric Monocular 3D Object Detection](https://arxiv.org/abs/2104.06041) [mono3D]

- [FSM: Full Surround Monodepth from Multiple Cameras](https://arxiv.org/abs/2104.00152) [[Notes](paper_notes/fsm.md)] ICRA 2021 [monodepth, Xnet]

## 2021-03 (4)

- [CaDDN: Categorical Depth Distribution Network for Monocular 3D Object Detection](https://arxiv.org/abs/2103.01100) [[Notes](paper_notes/caddn.md)] CVPR 2021 oral [mono3D, BEVNet]

- [DSNT: Numerical Coordinate Regression with Convolutional Neural Networks](https://arxiv.org/abs/1801.07372) [[Notes](paper_notes/dsnt.md)] [differentiable spatial to numerical transform]

- [Soft-Argmax: Human pose regression by combining indirect part detection and contextual information](https://arxiv.org/abs/1710.02322)

- [INSTA-YOLO: Real-Time Instance Segmentation](https://arxiv.org/abs/2102.06777) [[Notes](paper_notes/insta_yolo.md)] ICML workshop 2020 [single stage instance segmentation]

- [CenterNet2: Probabilistic two-stage detection](https://arxiv.org/abs/2103.07461) [[Notes](paper_notes/centernet2.md)] [CenterNet, two-stage]

## 2021-01 (7)

- [Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection](https://arxiv.org/abs/2012.00257) [[Notes](paper_notes/confluence.md)] [NMS]

- [BoxInst: High-Performance Instance Segmentation with Box Annotations](https://arxiv.org/abs/2012.02310) [[Notes](paper_notes/boxinst.md)] CVPR 2021 [Chunhua Shen, Tian Zhi]

- [3DSSD: Point-based 3D Single Stage Object Detector](https://arxiv.org/abs/2002.10187) [[Notes](paper_notes/3dssd.md)] CVPR 2020

- [RepVGG: Making VGG-style ConvNets Great Again](https://arxiv.org/abs/2101.03697) [[Notes](paper_notes/repvgg.md)] [Megvii, Xiangyu Zhang, ACNet]

- [ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks](https://arxiv.org/abs/1908.03930) [[Notes](paper_notes/acnet.md)] ICCV 2019

- [BEV-Feat-Stitching: Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera](https://arxiv.org/abs/2012.03040) [[Notes](paper_notes/bev_feat_stitching.md)] [BEVNet, mono3D, Luc Van Gool]

- [PSS: Object Detection Made Simpler by Eliminating Heuristic NMS](https://arxiv.org/abs/2101.11782) [[Notes](paper_notes/pss.md)] [Transformer, DETR]

## 2020-12 (17)

- [DeFCN: End-to-End Object Detection with Fully Convolutional Network](https://arxiv.org/abs/2012.03544) [[Notes](paper_notes/defcn.md)] [Transformer, DETR]

- [OneNet: End-to-End One-Stage Object Detection by Classification Cost](https://arxiv.org/abs/2012.05780) [[Notes](paper_notes/onenet.md)] [Transformer, DETR]

- [Traffic Light Mapping, Localization, and State Detection for Autonomous Vehicles](http://driving.stanford.edu/papers/ICRA2011.pdf) [[Notes](paper_notes/tfl_stanford.md)] ICRA 2011 [traffic light, Sebastian Thrun]

- [Towards lifelong feature-based mapping in semi-static environments](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43966.pdf) [[Notes](paper_notes/lifelong_feature_mapping_google.md)] ICRA 2016

- [How to Keep HD Maps for Automated Driving Up To Date](http://www.lewissoft.com/pdf/ICRA2020/1484.pdf) [[Notes](paper_notes/keep_hd_maps_updated_bmw.md)] ICRA 2020 [BMW]

- [Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection](https://arxiv.org/abs/2011.12885) [[Notes](paper_notes/gfocalv2.md)] CVPR 2021 [focal loss]

- [Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning](http://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w9/Milz_Visual_SLAM_for_CVPR_2018_paper.pdf) [[Notes](paper_notes/vslam_for_ad.md)] CVPR 2018 workshop

- [Centroid Voting: Object-Aware Centroid Voting for Monocular 3D Object Detection](https://arxiv.org/abs/2007.09836) [[Notes](paper_notes/centroid_voting.md)] IROS 2020 [mono3D, geometry + appearance = distance]

- [Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras](https://arxiv.org/abs/2003.03759) [[Notes](paper_notes/mono3d_fisheye.md)] [GM Israel, mono3D]

- [DeepPS: Vision-Based Parking-Slot Detection: A DCNN-Based Approach and a Large-Scale Benchmark Dataset](https://cslinzhang.github.io/deepps/parkingslot.pdf) TIP 2018 [Parking slot detection, PS2.0 dataset]

- [PSDet: Efficient and Universal Parking Slot Detection](https://arxiv.org/abs/2005.05528) [[Notes](paper_notes/psdet.md)] IV 2020 [Zongmu, Parking slot detection]

- [PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning](https://arxiv.org/abs/2001.00138) [[Notes](paper_notes/patdnn.md)] ASPLOS 2020 [pruning]

- [Scaled-YOLOv4: Scaling Cross Stage Partial Network](https://arxiv.org/abs/2011.08036) [[Notes](paper_notes/scaled_yolov4.md)] [yolo]

- [Yolov5 by Ultralytics](https://github.com/ultralytics/yolov5) [[Notes](paper_notes/yolov5.md)] [yolo, spatial2channel]

- [PP-YOLO: An Effective and Efficient Implementation of Object Detector](https://arxiv.org/abs/2007.12099) [[Notes](paper_notes/pp_yolo.md)] [yolo, paddle-paddle, baidu]

- [PointPainting: Sequential Fusion for 3D Object Detection](https://arxiv.org/pdf/1911.10150.pdf) [[Notes](paper_notes/point_painting.md)] [nuScenes]

- [MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps](https://arxiv.org/abs/2003.06754) [[Notes](paper_notes/motionnet.md)] CVPR 2020 [Unseen moving objects, BEV]

- [Locating Objects Without Bounding Boxes](https://arxiv.org/abs/1806.07564) [[Notes](paper_notes/objects_without_bboxes.md)] CVPR 2019 [weighted Hausdorff distance, NMS-free]

## 2020-11 (18)

- [TSP: Rethinking Transformer-based Set Prediction for Object Detection](https://arxiv.org/abs/2011.10881) [[Notes](paper_notes/tsp.md)] ICCV 2021 [DETR, transformers, Kris Kitani]

- [Sparse R-CNN: End-to-End Object Detection with Learnable Proposals](https://arxiv.org/abs/2011.12450) [[Notes](paper_notes/sparse_rcnn.md)] CVPR 2021 [DETR, Transformer]

- [Unsupervised Monocular Depth Learning in Dynamic Scenes](https://arxiv.org/abs/2010.16404) [[Notes](paper_notes/learn_depth_and_motion.md)] CoRL 2020 [LearnK improved ver, Google]

- [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) [[Notes](paper_notes/monet3d.md)] ICML 2020 [Mono3D, pairwise relationship]

- [Argoverse: 3D Tracking and Forecasting with Rich Maps](https://arxiv.org/abs/1911.02620) [[Notes](paper_notes/argoverse.md)] CVPR 2019 [HD maps, dataset, CV lidar]

- [The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes](https://arxiv.org/abs/1903.01568) [[Notes](paper_notes/h3d.md)] ICRA 2019

- [Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection](https://arxiv.org/abs/2006.07864) CVPRW 2020 [dataset, Daimler, mono3D]

- [NYC3DCars: A Dataset of 3D Vehicles in Geographic Context](https://www.cs.cornell.edu/~snavely/publications/papers/nyc3dcars_iccv13.pdf) ICCV 2013

- [Towards Fully Autonomous Driving: Systems and Algorithms](https://www.ri.cmu.edu/wp-content/uploads/2017/12/levinson-iv2011.pdf) IV 2011

- [Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding](https://arxiv.org/abs/2005.13423) [[Notes](paper_notes/center3d.md)] [mono3D, LID+DepJoint]

- [ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection](https://arxiv.org/abs/2003.00529) AAAI 2020 oral [mono3D] 

- [CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection](https://arxiv.org/abs/2011.04841) [[Notes](paper_notes/centerfusion.md)] WACV 2021 [early fusion, camera, radar]

- [3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation](https://arxiv.org/abs/2011.01535) [[Notes](paper_notes/3d_lanenet+.md)] NeurIPS 2020 workshop [GM Israel, 3D LLD]

- [LSTR: End-to-end Lane Shape Prediction with Transformers](https://arxiv.org/abs/2011.04233) [[Notes](paper_notes/lstr.md)] WACV 2021 [LLD, transformers]

- [PIXOR: Real-time 3D Object Detection from Point Clouds](https://arxiv.org/abs/1902.06326) [[Notes](paper_notes/pixor.md)] CVPR 2018 [bird's-eye view]

- [HDNET/PIXOR++: Exploiting HD Maps for 3D Object Detection](http://proceedings.mlr.press/v87/yang18b/yang18b.pdf) [[Notes](paper_notes/pixor++.md)] CoRL 2018

- [CPNDet: Corner Proposal Network for Anchor-free, Two-stage Object Detection](https://arxiv.org/abs/2007.13816) ECCV 2020 [anchor free, two stage]

- [MVF: End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds](https://arxiv.org/abs/1910.06528) [[Notes](paper_notes/mvf.md)] CoRL 2019 [Waymo, VoxelNet 1st author]

- [Pillar-based Object Detection for Autonomous Driving](https://arxiv.org/abs/2007.10323) [[Notes](paper_notes/pillar_od.md)] ECCV 2020

- [Training-Time-Friendly Network for Real-Time Object Detection](https://arxiv.org/abs/1909.00700) AAAI 2020 [anchor-free, fast training]

- [Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies](https://arxiv.org/abs/2006.06091) [Review of autonomous stack, Yu Huang]

- [Dense Monocular Depth Estimation in Complex Dynamic Scenes](https://openaccess.thecvf.com/content_cvpr_2016/papers/Ranftl_Dense_Monocular_Depth_CVPR_2016_paper.pdf) CVPR 2016

- [Probabilistic Future Prediction for Video Scene Understanding](https://anthonyhu.github.io/research/probabilistic-future-prediction/)

- [AB3D: A Baseline for 3D Multi-Object Tracking](https://arxiv.org/abs/1907.03961) IROS 2020 [3D MOT]

- [Spatial-Temporal Relation Networks for Multi-Object Tracking](https://arxiv.org/abs/1904.11489) ICCV 2019 [MOT, feature location over time]

- [Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking](https://arxiv.org/abs/1802.09298) ICRA 2018 [MOT, IIT, 3D shape]

- [ST-3D: Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking](https://arxiv.org/abs/2004.09305) CVPR 2020 [Peiliang Li, author of VINS and S3DOT]

- [Augment Your Batch: Improving Generalization Through Instance Repetition](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hoffer_Augment_Your_Batch_Improving_Generalization_Through_Instance_Repetition_CVPR_2020_paper.pdf) CVPR 2020

- [RetinaTrack: Online Single Stage Joint Detection and Tracking](https://arxiv.org/abs/2003.13870) CVPR 2020 [MOT]

- [Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots](https://arxiv.org/abs/1912.12791)

- [Gradient Centralization: A New Optimization Technique for Deep Neural Networks](https://arxiv.org/abs/2004.01461) ECCV 2020 oral

- [Depth Completion via Deep Basis Fitting](https://arxiv.org/abs/1912.10336) WACV 2020

- [BTS: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation](https://arxiv.org/abs/1907.10326) [monodepth, supervised]

- [The Edge of Depth: Explicit Constraints between Segmentation and Depth](https://arxiv.org/abs/2004.00171) CVPR 2020 [monodepth, Xiaoming Liu]

- [On the Continuity of Rotation Representations in Neural Networks](https://arxiv.org/abs/1812.07035) CVPR 2019 [rotational representation]

- [VDO-SLAM: A Visual Dynamic Object-aware SLAM System](https://arxiv.org/abs/2005.11052) IJRR 2020

- [Dynamic SLAM: The Need For Speed](https://arxiv.org/abs/2002.08584)

- [Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction](https://arxiv.org/abs/2004.10681) ECCV 2020

- [Traffic Light Mapping and Detection](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37259.pdf) [[Notes](paper_notes/tfl_mapping_google.md)] ICRA 2011 [traffic light, Google, Chris Urmson]

- [Traffic light recognition exploiting map and localization at every stage](https://web.yonsei.ac.kr/jksuhr/papers/Traffic%20light%20recognition%20exploiting%20map%20and%20localization%20at%20every%20stage.pdf) [[Notes](paper_notes/tfl_exploting_map_korea.md)] Expert Systems 2017 [traffic light, Myoungho Sunwoo, Jae Kyu Suhr, Ho Gi Jung]

- [Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars](https://arxiv.org/abs/1906.11886) [[Notes](paper_notes/tfl_lidar_map_building_brazil.md)] IJCNN 2019 [traffic light, Espirito Santo Brazil]

## 2020-10 (14)

- [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383) [[Notes](paper_notes/tsm.md)] ICCV 2019 [Song Han, video, object detection]

- [WOD: Waymo Dataset: Scalability in Perception for Autonomous Driving: Waymo Open Dataset](https://arxiv.org/abs/1912.04838) [[Notes](paper_notes/wod.md)] CVPR 2020

- [Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection](https://arxiv.org/abs/2006.04388) [[Notes](paper_notes/gfocal.md)] NeurIPS 2020 [classification as regression]

- [A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection](https://arxiv.org/abs/2009.13592) NeurIPS 2020 spotlight

- [Rethinking the Value of Labels for Improving Class-Imbalanced Learning](https://arxiv.org/abs/2006.07529) NeurIPS 2020

- [RepLoss: Repulsion Loss: Detecting Pedestrians in a Crowd](https://arxiv.org/abs/1711.07752) [[Notes](paper_notes/rep_loss.md)] CVPR 2018 [crowd detection, Megvii]

- [Adaptive NMS: Refining Pedestrian Detection in a Crowd](https://arxiv.org/abs/1904.03629) [[Notes](paper_notes/adaptive_nms.md)] CVPR 2019 oral [crowd detection, NMS]

- [AggLoss: Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd](https://arxiv.org/abs/1807.08407) [[Notes](paper_notes/agg_loss.md)] ECCV 2018 [crowd detection]

- [CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163) [[Notes](paper_notes/crowd_det.md)] CVPR 2020 oral [crowd detection, Megvii, Earth mover's distance]

- [R2-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing](https://arxiv.org/abs/2003.12729) [[Notes](paper_notes/r2_nms.md)] CVPR 2020

- [Double Anchor R-CNN for Human Detection in a Crowd](https://arxiv.org/abs/1909.09998) [[Notes](paper_notes/double_anchor.md)] [head-body bundle]

- [Review: AP vs MR](paper_notes/ap_mr.md)

- [SKU110K: Precise Detection in Densely Packed Scenes](https://arxiv.org/abs/1904.00853) [[Notes](paper_notes/sku110k.md)] CVPR 2019 [crowd detection, no occlusion]

- [GossipNet: Learning non-maximum suppression](https://arxiv.org/abs/1705.02950) CVPR 2017 

- [TLL: Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation](https://arxiv.org/abs/1807.01438) ECCV 2018

- [Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels](https://arxiv.org/abs/2010.03506) GCPR 2020 [mono3D, Daniel Cremers, TUM]

- [CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection](https://arxiv.org/abs/2006.04080) [[Notes](paper_notes/cubifae_3d.md)] [mono3D, depth AE pretraining]

- [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) [[Notes](paper_notes/deformable_detr.md)] ICLR 2021 [Jifeng Dai, DETR]

- [ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) [[Notes](paper_notes/vit.md)] ICLR 2021

- [BYOL: Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733) [self-supervised]

## 2020-09 (15)

- [SDFLabel: Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors](https://arxiv.org/abs/1911.11288) [[Notes](paper_notes/sdflabel.md)] CVPR 2020 oral [TRI, differentiable rendering]

- [DensePose: Dense Human Pose Estimation In The Wild](https://arxiv.org/abs/1802.00434) [[Notes](paper_notes/densepose.md)] CVPR 2018 oral [FAIR]

- [NOCS: Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation](https://arxiv.org/abs/1901.02970) CVPR 2019

- [monoDR: Monocular Differentiable Rendering for Self-Supervised 3D Object Detection](https://arxiv.org/abs/2009.14524) [[Notes](paper_notes/monodr.md)] ECCV 2020 [TRI, mono3D]

- [Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D](https://arxiv.org/abs/2008.05711) [[Notes](paper_notes/lift_splat_shoot.md)] ECCV 2020 [BEV-Net, Utoronto, Sanja Fidler]

- [Implicit Latent Variable Model for Scene-Consistent Motion Forecasting](https://arxiv.org/abs/2007.12036) ECCV 2020 [Uber ATG, Raquel Urtasun]

- [FISHING Net: Future Inference of Semantic Heatmaps In Grids](https://arxiv.org/abs/2006.09917) [[Notes](paper_notes/fishing_net.md)] CVPRW 2020 [BEV-Net, Mapping, Zoox]

- [VPN: Cross-view Semantic Segmentation for Sensing Surroundings](https://arxiv.org/abs/1906.03560) [[Notes](paper_notes/vpn.md)] RAL 2020 [Bolei Zhou, BEV-Net]

- [VED: Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks](https://arxiv.org/abs/1804.02176) [[Notes](paper_notes/ved.md)] ICRA 2019 [BEV-Net]

- [Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View](https://arxiv.org/abs/2005.04078) [[Notes](paper_notes/cam2bev.md)] ITSC 2020 [BEV-Net] 

- [Learning to Look around Objects for Top-View Representations of Outdoor Scenes](https://arxiv.org/abs/1803.10870) [[Notes](paper_notes/learning_to_look_around_objects.md)] ECCV 2018 [BEV-Net, UCSD, Manmohan Chandraker]

- [A Parametric Top-View Representation of Complex Road Scenes](https://arxiv.org/abs/1812.06152) CVPR 2019 [BEV-Net, UCSD, Manmohan Chandraker]

- [FTM: Understanding Road Layout from Videos as a Whole](https://arxiv.org/abs/2007.00822) CVPR 2020 [BEV-Net, UCSD, Manmohan Chandraker]

- [KM3D-Net: Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training](https://arxiv.org/abs/2009.00764) [[Notes](paper_notes/km3d_net.md)] RAL 2021 [RTM3D, Peixuan Li]

- [InstanceMotSeg: Real-time Instance Motion Segmentation for Autonomous Driving](https://arxiv.org/abs/2008.07008) [[Notes](paper_notes/instance_mot_seg.md)] IROS 2020 [motion segmentation]

- [MPV-Nets: Monocular Plan View Networks for Autonomous Driving](https://arxiv.org/abs/1905.06937) [[Notes](paper_notes/mpv_nets.md)] IROS 2019 [BEV-Net]

- [Class-Balanced Loss Based on Effective Number of Samples](https://arxiv.org/abs/1901.05555) [[Notes](paper_notes/class_balanced_loss.md)] CVPR 2019 [Focal loss authors]

- [Geometric Pretraining for Monocular Depth Estimation](http://lewissoft.com/pdf/ICRA2020/0035.pdf) [[Notes](paper_notes/geometric_pretraining.md)] ICRA 2020

- [Robust Traffic Light and Arrow Detection Using Digital Map with Spatial Prior Information for Automated Driving](https://www.mdpi.com/1424-8220/20/4/1181) [[Notes](paper_notes/tfl_robust_japan.md)] Sensors 2020 [traffic light, Kanazawa]

## 2020-08 (26)

- [Feature-metric Loss for Self-supervised Learning of Depth and Egomotion](https://arxiv.org/abs/2007.10603) [[Notes](paper_notes/feature_metric.md)] ECCV 2020 [feature-metric, local minima, monodepth]

- [Depth-VO-Feat: Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction](https://arxiv.org/abs/1803.03893) CVPR 2018 [feature-metric, monodepth]

- [MonoResMatch: Learning monocular depth estimation infusing traditional stereo knowledge](https://arxiv.org/abs/1904.04144) [[Notes](paper_notes/monoresmatch.md)] CVPR 2019 [monodepth, local minima, cheap stereo GT]

- [SGDepth: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance](https://arxiv.org/abs/2007.06936) [[Notes](paper_notes/sgdepth.md)] ECCV 2020 [Moving objects]

- [Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding](https://arxiv.org/abs/1806.10556) ECCV 2018 [dynamic objects, rigid and dynamic motion]

- [Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding](https://arxiv.org/abs/1810.06125) TPAMI 2018

- [CC: Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation](https://arxiv.org/abs/1805.09806) [[Notes](paper_notes/cc.md)] CVPR 2019

- [ObjMotionNet: Self-supervised Object Motion and Depth Estimation from Video](https://arxiv.org/abs/1912.04250) [[Notes](paper_notes/obj_motion_net.md)] CVPRW 2020 [object motion prediction, velocity prediction]

- [Instance-wise Depth and Motion Learning from Monocular Videos](https://arxiv.org/abs/1912.09351)

- [Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation](https://arxiv.org/abs/2006.04371)

- [Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues](https://arxiv.org/abs/2006.09876)

- [DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency](https://arxiv.org/abs/1809.01649) ECCV 2018

- [LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments](https://arxiv.org/abs/1807.05696) [mapping]

- [Road-SLAM: Road Marking based SLAM with Lane-level Accuracy](https://www.naverlabs.com/img/autonomousDriving/intelligence/dissertation/Road-SLAM_Road%20Marking%20based%20SLAM%20with%20Lane-level%20Accuracy.pdf) [[Notes](paper_notes/road_slam.md)] [HD mapping]

- [AVP-SLAM: Semantic Visual Mapping and Localization for Autonomous Vehicles in the Parking Lot](https://arxiv.org/abs/2007.01813) [[Notes](paper_notes/avp_slam.md)] IROS 2020 [Huawei, HD mapping, Tong Qin, VINS author, autonomous valet parking]

- [AVP-SLAM-Late-Fusion: Mapping and Localization using Semantic Road Marking with Centimeter-level Accuracy in Indoor Parking Lots](https://ieeexplore.ieee.org/abstract/document/8917529) [[Notes](paper_notes/avp_slam_late_fusion.md)] ITSC 2019

- [Lane markings-based relocalization on highway](https://ieeexplore.ieee.org/abstract/document/8917254) ITSC 2019

- [DeepRoadMapper: Extracting Road Topology from Aerial Images](https://openaccess.thecvf.com/content_ICCV_2017/papers/Mattyus_DeepRoadMapper_Extracting_Road_ICCV_2017_paper.pdf) [[Notes](paper_notes/deep_road_mapper.md)] ICCV 2017 [Uber ATG, NOT HD maps]

- [RoadTracer: Automatic Extraction of Road Networks from Aerial Images](https://openaccess.thecvf.com/content_cvpr_2018/papers/Bastani_RoadTracer_Automatic_Extraction_CVPR_2018_paper.pdf) CVPR 2018 [NOT HD maps]

- [PolyMapper: Topological Map Extraction From Overhead Images](https://arxiv.org/abs/1812.01497) [[Notes](paper_notes/polymapper.md)] ICCV 2019 [mapping, polygon, NOT HD maps]

- [HRAN: Hierarchical Recurrent Attention Networks for Structured Online Maps](https://openaccess.thecvf.com/content_cvpr_2018/papers/Homayounfar_Hierarchical_Recurrent_Attention_CVPR_2018_paper.pdf) [[Notes](paper_notes/hran.md)] CVPR 2018 [HD mapping, highway, polyline loss, Chamfer distance]

- [Deep Structured Crosswalk: End-to-End Deep Structured Models for Drawing Crosswalks](https://openaccess.thecvf.com/content_ECCV_2018/papers/Justin_Liang_End-to-End_Deep_Structured_ECCV_2018_paper.pdf) [[Notes](paper_notes/deep_structured_crosswalk.md)] ECCV 2018

- [DeepBoundaryExtractor: Convolutional Recurrent Network for Road Boundary Extraction](http://openaccess.thecvf.com/content_CVPR_2019/html/Liang_Convolutional_Recurrent_Network_for_Road_Boundary_Extraction_CVPR_2019_paper.html) [[Notes](paper_notes/deep_boundary_extractor.md)] CVPR 2019 [HD mapping, boundary, polyline loss]

- [DAGMapper: Learning to Map by Discovering Lane Topology](http://openaccess.thecvf.com/content_ICCV_2019/papers/Homayounfar_DAGMapper_Learning_to_Map_by_Discovering_Lane_Topology_ICCV_2019_paper.pdf) [[Notes](paper_notes/dagmapper.md)] ICCV 2019 [HD mapping, highway, forks and merges, polyline loss]

- [Sparse-HD-Maps: Exploiting Sparse Semantic HD Maps for Self-Driving Vehicle Localization](https://arxiv.org/abs/1908.03274) [[Notes](paper_notes/sparse_hd_maps.md)] IROS 2019 oral [Uber ATG, metadata, mapping, localization]

- [Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural Networks](https://arxiv.org/abs/1803.06904) IEEE TGRS 2018

- [Monocular Localization with Vector HD Map (MLVHM): A Low-Cost Method for Commercial IVs](https://www.mdpi.com/1424-8220/20/7/1870/htm) Sensors 2020 [Tsinghua, 3D HD maps]

- [PatchNet: Rethinking Pseudo-LiDAR Representation](https://arxiv.org/abs/2008.04582) [[Notes](paper_notes/patchnet.md)] ECCV 2020 [SenseTime, Wanli Ouyang]

- [D4LCN: Learning Depth-Guided Convolutions for Monocular 3D Object Detection](https://arxiv.org/abs/1912.04799) [[Notes](paper_notes/d4lcn.md)] CVPR 2020 [mono3D]

- [MfS: Learning Stereo from Single Images](https://arxiv.org/abs/2008.01484) [[Notes](paper_notes/mfs.md)] ECCV 2020 [mono for stereo, learn stereo matching with mono]

- [BorderDet: Border Feature for Dense Object Detection](https://arxiv.org/abs/2007.11056) ECCV 2020 oral [Megvii]

- [Scale-Aware Trident Networks for Object Detection](https://arxiv.org/abs/1901.01892) ICCV 2019 [different heads for different scales]

- [Learning Depth from Monocular Videos using Direct Methods](https://arxiv.org/abs/1712.00175)

- [Vid2Depth: Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints](https://arxiv.org/abs/1802.05522) CVPR 2018 [Google]

- [NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections](https://arxiv.org/abs/2008.02268)

- [Supervising the new with the old: learning SFM from SFM](http://openaccess.thecvf.com/content_ECCV_2018/papers/Maria_Klodt_Supervising_the_new_ECCV_2018_paper.pdf) [[Notes](paper_notes/learn_sfm_from_sfm.md)] ECCV 2018

- [Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera](https://arxiv.org/abs/1901.02571) CVPR 2019 [multi-frame monodepth]

- [Don't Forget The Past: Recurrent Depth Estimation from Monocular Video](https://arxiv.org/abs/2001.02613) [multi-frame monodepth, RNN]

- [Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth](https://arxiv.org/abs/1904.07087) [multi-frame monodepth, RNN]

- [Exploiting temporal consistency for real-time video depth estimation](https://arxiv.org/abs/1908.03706) ICCV 2019 [multi-frame monodepth, RNN, indoor]

- [SfM-Net: Learning of Structure and Motion from Video](https://arxiv.org/abs/1704.07804) [dynamic object, SfM]

- [MB-Net: MergeBoxes for Real-Time 3D Vehicles Detection](https://ieeexplore.ieee.org/document/8500395) [[Notes](paper_notes/mb_net.md)] IV 2018 [mono3D, Daimler]

- [BS3D: Beyond Bounding Boxes: Using Bounding Shapes for Real-Time 3D Vehicle Detection from Monocular RGB Images](https://ieeexplore.ieee.org/abstract/document/8814036/) [[Notes](paper_notes/bs3d.md)] IV 2019 [mono3D, Daimler]

- [3D-GCK: Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometrically Constrained Keypoints in Real-Time](https://arxiv.org/abs/2006.13084) [[Notes](paper_notes/3d_gck.md)] IV 2020 [mono3D, Daimler]

- [UR3D: Distance-Normalized Unified Representation for Monocular 3D Object Detection](https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/6559_ECCV_2020_paper.php) [[Notes](paper_notes/ur3d.md)] ECCV 2020 [mono3D]

- [DA-3Det: Monocular 3D Object Detection via Feature Domain Adaptation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123540018.pdf) [[Notes](paper_notes/da_3det.md)] ECCV 2020 [mono3D]

- [RAR-Net: Reinforced Axial Refinement Network for Monocular 3D Object Detection](https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/2822_ECCV_2020_paper.php) [[Notes](paper_notes/rarnet.md)] ECCV 2020 [mono3D]

## 2020-07 (25)

- [CenterTrack: Tracking Objects as Points](https://arxiv.org/abs/2004.01177) [[Notes](paper_notes/centertrack.md)] ECCV 2020 spotlight [camera based 3D MOD, MOT SOTA, CenterNet, video based object detection, Philipp Krähenbühl]

- [CenterPoint: Center-based 3D Object Detection and Tracking](https://arxiv.org/abs/2006.11275) [[Notes](paper_notes/centerpoint.md)] CVPR 2021 [lidar based 3D MOD, CenterNet]

- [Tracktor: Tracking without bells and whistles](https://arxiv.org/abs/1903.05625) [[Notes](paper_notes/tracktor.md)] ICCV 2019 [Tracktor/Tracktor++, Laura Leal-Taixe@TUM]

- [FairMOT: A Simple Baseline for Multi-Object Tracking](https://arxiv.org/abs/2004.01888) [[Notes](paper_notes/fairmot.md)]

- [DeepMOT: A Differentiable Framework for Training Multiple Object Trackers](https://arxiv.org/abs/1906.06618) [[Notes](paper_notes/deepmot.md)] CVPR 2020 [trainable Hungarian, Laura Leal-Taixe@TUM]

- [MPNTracker: Learning a Neural Solver for Multiple Object Tracking](https://arxiv.org/abs/1912.07515) CVPR 2020 oral [trainable Hungarian, Laura Leal-Taixe@TUM]

- [nuScenes: A multimodal dataset for autonomous driving](https://arxiv.org/abs/1903.11027) [[Notes](paper_notes/nuscenes.md)] CVPR 2020 [dataset, point cloud, radar]

- [CBGS: Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection](https://arxiv.org/abs/1908.09492) [[Notes](paper_notes/cbgs.md)] CVPRW 2019 [Megvii, lidar, WAD challenge winner]

- [AFDet: Anchor Free One Stage 3D Object Detection](https://arxiv.org/abs/2006.12671) and [Competition solution](https://arxiv.org/pdf/2006.15505.pdf) [[Notes](paper_notes/afdet.md)] CVPRW 2020 [Horizon Robotics, lidar, Waymo challenge winner]

- Review of MOT and SOT [[Notes](paper_notes/mot_and_sot.md)]

- [CrowdHuman: A Benchmark for Detecting Human in a Crowd](https://arxiv.org/abs/1805.00123) [[Notes](paper_notes/crowdhuman.md)] [Megvii, pedestrian, dataset]

- [WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild](https://arxiv.org/abs/1909.12118) [[Notes](paper_notes/widerperson.md)] TMM 2019 [dataset, pedestrian]

- [Tsinghua-Daimler Cyclists: A New Benchmark for Vision-Based Cyclist Detection](http://www.gavrila.net/Publications/iv16_cyclist_benchmark.pdf) [[Notes](paper_notes/tsinghua_daimler_cyclist.md)] IV 2016 [dataset, cyclist detection]

- [Specialized Cyclist Detection Dataset: Challenging Real-World Computer Vision Dataset for Cyclist Detection Using a Monocular RGB Camera](https://drive.google.com/drive/u/0/folders/1inawrX9NVcchDQZepnBeJY4i9aAI5mg9) [[Notes](paper_notes/specialized_cyclists.md)] IV 2019 [Extension to KITTI]

- [PointTrack: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01550) [[Notes](paper_notes/pointtrack.md)] ECCV 2020 oral [MOTS]

- [PointTrack++ for Effective Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01549) [[Notes](paper_notes/pointtrack++.md)] CVPR 2020 workshop [CVPR2020 MOTS Challenge Winner. PointTrack++ ranks first on KITTI MOTS]

- [SpatialEmbedding: Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth](https://arxiv.org/abs/1906.11109) [[Notes](paper_notes/spatial_embedding.md)] ICCV 2019 [one-stage, instance segmentation]

- [BA-Net: Dense Bundle Adjustment Networks](https://arxiv.org/abs/1806.04807) [[Notes](paper_notes/banet.md)] ICLR 2019 [Bundle adjustment, multi-frame monodepth, feature-metric]

- [DeepSFM: Structure From Motion Via Deep Bundle Adjustment](https://arxiv.org/abs/1912.09697) ECCV 2020 oral [multi-frame monodepth, indoor scene]

- [CVD: Consistent Video Depth Estimation](https://arxiv.org/abs/2004.15021) [[Notes](paper_notes/cvd.md)] SIGGRAPH 2020 [multi-frame monodepth, online finetune]

- [DeepV2D: Video to Depth with Differentiable Structure from Motion](https://arxiv.org/abs/1812.04605) [[Notes](paper_notes/deepv2d.md)] ICLR 2020 [multi-frame monodepth, Jia Deng]

- [GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose](https://arxiv.org/abs/1803.02276) [[Notes](paper_notes/geonet.md)] CVPR 2018 [residual optical flow, monodepth, rigid and dynamic motion]

- [GLNet: Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera](https://arxiv.org/abs/1907.05820) [[Notes](paper_notes/glnet.md)] ICCV 2019 [online finetune, rigid and dynamic motion]

- [Depth Hints: Self-Supervised Monocular Depth Hints](https://arxiv.org/abs/1909.09051) [[Notes](paper_notes/depth_hints.md)] ICCV 2019 [monodepth, local minima, cheap stereo GT]

- [MonoUncertainty: On the uncertainty of self-supervised monocular depth estimation](https://arxiv.org/abs/2005.06209) [[Notes](paper_notes/mono_uncertainty.md)] CVPR 2020 [depth uncertainty]

- [Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment](https://arxiv.org/abs/1909.13163) [[Notes](paper_notes/ba_sfm_learner.md)] [Bundle adjustment, xmotors.ai, multi-frame monodepth]

- [Kinematic 3D Object Detection in Monocular Video](https://arxiv.org/abs/2007.09548) [[Notes](paper_notes/kinematic_mono3d.md)] ECCV 2020 [multi-frame mono3D, Xiaoming Liu]

- [VelocityNet: Camera-based vehicle velocity estimation from monocular video](https://arxiv.org/abs/1802.07094) [[Notes](paper_notes/velocity_net.md)] CVPR 2017 workshop [monocular velocity estimation, CVPR 2017 challenge winner]

- [Vehicle Centric VelocityNet: End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera](https://arxiv.org/abs/2006.04082) [[Notes](paper_notes/vehicle_centric_velocity_net.md)] [monocular velocity estimation, monocular distance, SOTA]

## 2020-06 (20)

- [LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain](http://personal.stevens.edu/~benglot/Shan_Englot_IROS_2018_Preprint.pdf) [[Notes](paper_notes/lego_loam.md)] IROS 2018 [lidar, mapping]

- [PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction](http://openaccess.thecvf.com/content_ICCV_2019/papers/Rasouli_PIE_A_Large-Scale_Dataset_and_Models_for_Pedestrian_Intention_Estimation_ICCV_2019_paper.pdf) [[Notes](paper_notes/pie.md)] ICCV 2019

- [JAAD: Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w3/Rasouli_Are_They_Going_ICCV_2017_paper.pdf) ICCV 2017

- [Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs](https://bmvc2019.org/wp-content/uploads/papers/0283-paper.pdf) BMVC 2019

- [Is the Pedestrian going to Cross? Answering by 2D Pose Estimation](https://arxiv.org/abs/1807.10580) IV 2018

- [Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation](https://arxiv.org/abs/1910.03858) ITSC 2019 [skeleton, pedestrian, cyclist intention]

- [Attentive Single-Tasking of Multiple Tasks](https://arxiv.org/abs/1904.08918) CVPR 2019

- [DETR: End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) [[Notes](paper_notes/detr.md)] ECCV 2020 oral [FAIR]

- [Transformer: Attention Is All You Need](https://arxiv.org/abs/1706.03762) [[Notes](paper_notes/transformer.md)] NIPS 2017

- [SpeedNet: Learning the Speediness in Videos](https://arxiv.org/abs/2004.06130) [[Notes](paper_notes/speednet.md)] CVPR 2020 oral

- [MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships](https://arxiv.org/abs/2003.00504) [[Notes](paper_notes/monopair.md)] CVPR 2020 [Mono3D, pairwise relationship]

- [SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation](https://arxiv.org/abs/2002.10111) [[Notes](paper_notes/smoke.md)] CVPRW 2020 [Mono3D, Zongmu]

- [Vehicle Re-ID for Surround-view Camera System](https://drive.google.com/file/d/1e6y8wtHAricaEHS9CpasSGOx0aAxCGib/view) [[Notes](paper_notes/reid_surround_fisheye.md)] CVPRW 2020 [tireline, vehicle ReID, Zongmu]

- [End-to-End Lane Marker Detection via Row-wise Classification](https://arxiv.org/abs/2005.08630) [[Notes](paper_notes/e2e_lmd.md)] [Qualcomm Korea, LLD as cls]

- [Reliable multilane detection and classification by utilizing CNN as a regression network](http://openaccess.thecvf.com/content_ECCVW_2018/papers/11133/Chougule_Reliable_multilane_detection_and_classification_by_utilizing_CNN_as_a_ECCVW_2018_paper.pdf) ECCV 2018 [LLD as reg]

- [SUPER: A Novel Lane Detection System](https://arxiv.org/abs/2005.07277) [[Notes](paper_notes/super.md)]

- [Learning Lightweight Lane Detection CNNs by Self Attention Distillation](https://arxiv.org/abs/1908.00821) ICCV 2019

- [StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation](http://www.bmva.org/bmvc/2015/papers/paper109/paper109.pdf) BMVC 2015

- [StixelNetV2: Real-time category-based and general obstacle detection for autonomous driving](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w3/Garnett_Real-Time_Category-Based_and_ICCV_2017_paper.pdf) [[Notes](paper_notes/stixelnetv2.md)] ICCV 2017 [DS]

- [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://arxiv.org/abs/1609.05158) [[Notes](paper_notes/subpixel_conv.md)] CVPR 2016 [channel-to-pixel]

- [Car Pose in Context: Accurate Pose Estimation with Ground Plane Constraints](https://arxiv.org/abs/1912.04363) [mono3D]

- [Self-Mono-SF: Self-Supervised Monocular Scene Flow Estimation](https://arxiv.org/abs/2004.04143) [[Notes](paper_notes/self_mono_sf.md)] CVPR 2020 oral [scene-flow, Stereo input]

- [MEBOW: Monocular Estimation of Body Orientation In the Wild](https://arxiv.org/abs/2011.13688) [[Notes](paper_notes/mebow.md)] CVPR 2020

- [VG-NMS: Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes](https://arxiv.org/abs/2006.08547) [[Notes](paper_notes/vg_nms.md)] NeurIPS 2019 workshop [Crowded scene, NMS, Daimler]

- [WYSIWYG: What You See is What You Get: Exploiting Visibility for 3D Object Detection](https://arxiv.org/abs/1912.04986) [[Notes](paper_notes/wysiwyg.md)] CVPR 2020 oral [occupancy grid]

- [Real-Time Panoptic Segmentation From Dense Detections](https://arxiv.org/abs/1912.01202) [[Notes](paper_notes/realtime_panoptic.md)] CVPR 2020 oral [bbox + semantic segmentation = panoptic segmentation, Toyota]

- [Human-Centric Efficiency Improvements in Image Annotation for Autonomous Driving](https://drive.google.com/file/d/1DY95vfWBLKOOZZyq8gLDd0heZ6aBSdji/view) [[Notes](paper_notes/human_centric_annotation.md)] CVPRW 2020 [efficient annotation]

- [SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving](https://arxiv.org/abs/2005.03844) [[Notes](paper_notes/surfel_gan.md)] CVPR 2020 oral [Waymo, auto data generation, surfel]

- [LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World](https://arxiv.org/abs/2006.09348) [[Notes](paper_notes/lidarsim.md)] CVPR 2020 oral [Uber ATG, auto data generation, surfel]

- [SuMa++: Efficient LiDAR-based Semantic SLAM](http://www.ipb.uni-bonn.de/wp-content/papercite-data/pdf/chen2019iros.pdf) IROS 2019 [semantic segmentation, lidar, SLAM]

- [PON/PyrOccNet: Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks](https://arxiv.org/abs/2003.13402) [[Notes](paper_notes/pyroccnet.md)] CVPR 2020 oral [BEV-Net, OFT]

- [MonoLayout: Amodal scene layout from a single image](https://arxiv.org/abs/2002.08394) [[Notes](paper_notes/monolayout.md)] WACV 2020 [BEV-Net]

- [BEV-Seg: Bird’s Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud](https://arxiv.org/abs/2006.11436) [[Notes](paper_notes/bev_seg.md)] CVPR 2020 workshop [BEV-Net, Mapping]

- [A Geometric Approach to Obtain a Bird's Eye View from an Image](https://arxiv.org/abs/1905.02231) ICCVW 2019 [mapping, geometry, Andrew Zisserman]

- [FrozenDepth: Learning the Depths of Moving People by Watching Frozen People](https://arxiv.org/abs/1904.11111) [[Notes](paper_notes/frozen_depth.md)] CVPR 2019 oral

- [ORB-SLAM: a Versatile and Accurate Monocular SLAM System](https://arxiv.org/abs/1502.00956) TRO 2015

- [ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras](https://arxiv.org/abs/1610.06475) TRO 2016

- [CubeSLAM: Monocular 3D Object SLAM](https://arxiv.org/abs/1806.00557) [[Notes](paper_notes/cube_slam.md)] TRO 2019 [dynamic SLAM, orb slam + mono3D]

- [ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings](https://arxiv.org/abs/2003.12980) [[Notes](paper_notes/cluster_vo.md)] CVPR 2020 [general dynamic SLAM]

- [S3DOT: Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving](https://arxiv.org/abs/1807.02062) [[Notes](paper_notes/s3dot.md)] ECCV 2018 [Peiliang Li]

- [Multi-object Monocular SLAM for Dynamic Environments](https://arxiv.org/abs/2002.03528) [[Notes](paper_notes/multi_object_mono_slam.md)] IV 2020 [monolayout authors]

- [PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume](https://arxiv.org/abs/1709.02371) [[Notes](paper_notes/pwc_net.md)] CVPR 2018 oral [Optical flow]

- [LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation](https://arxiv.org/abs/1805.07036) CVPR 2018 [Optical flow]

- [FlowNet: Learning Optical Flow With Convolutional Networks](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Dosovitskiy_FlowNet_Learning_Optical_ICCV_2015_paper.pdf) ICCV 2015 [Optical flow]

- [FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks](https://arxiv.org/abs/1612.01925) CVPR 2017 [Optical flow]

- [ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network](https://arxiv.org/abs/1811.11431) CVPR 2019 [semantic segmentation, lightweight]

- [Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes](https://arxiv.org/abs/1908.06316) ICCV 2019 [depth uncertainty]

  

## 2020-05 (19)

- [Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems](https://arxiv.org/abs/1809.07408) [[Notes](paper_notes/hevi.md)] [Honda] ICRA 2019

- [PackNet: 3D Packing for Self-Supervised Monocular Depth Estimation](https://arxiv.org/abs/1905.02693) [[Notes](paper_notes/packnet.md)] CVPR 2020 oral [Scale aware depth]

- [PackNet-SG: Semantically-Guided Representation Learning for Self-Supervised Monocular Depth](https://arxiv.org/abs/2002.12319) [[Notes](paper_notes/packnet_sg.md)] ICLR 2020 [TRI, infinite-depth problem]

- [TrianFlow: Towards Better Generalization: Joint Depth-Pose Learning without PoseNet](https://arxiv.org/abs/2004.01314) [[Notes](paper_notes/trianflow.md)] CVPR 2020 [Scale aware]

- [Understanding the Limitations of CNN-based Absolute Camera Pose Regression](https://arxiv.org/abs/1903.07504) [[Notes](paper_notes/understanding_apr.md)] CVPR 2019 [Drawbacks of PoseNet, MapNet, Laura Leal-Taixe@TUM]

- [To Learn or Not to Learn: Visual Localization from Essential Matrices](https://arxiv.org/abs/1908.01293) [[Notes](paper_notes/to_learn_or_not.md)] ICRA 2020 [SIFT + 5 pt solver >> others for VO, Laura Leal-Taixe@TUM]

- [DF-VO: Visual Odometry Revisited: What Should Be Learnt?](https://arxiv.org/abs/1909.09803) [[Notes](paper_notes/df_vo.md)] ICRA 2020 [Depth and Flow for accurate VO]

- [D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry](https://arxiv.org/abs/2003.01060) [[Notes](paper_notes/d3vo.md)] CVPR 2020 oral [Daniel Cremers, TUM, depth uncertainty]

- [Network Slimming: Learning Efficient Convolutional Networks through Network Slimming](https://arxiv.org/abs/1708.06519) [[Notes](paper_notes/network_slimming.md)] ICCV 2017

- [BatchNorm Pruning: Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers](https://arxiv.org/abs/1802.00124) [[Notes](paper_notes/batchnorm_pruning.md)] ICLR 2018

- [Direct Sparse Odometry](https://arxiv.org/abs/1607.02565) PAMI 2018

- [Train in Germany, Test in The USA: Making 3D Object Detectors Generalize](https://arxiv.org/abs/2005.08139) [[Notes](paper_notes/train_in_germany.md)] CVPR 2020

- [PseudoLidarV3: End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection](https://arxiv.org/abs/2004.03080) [[Notes](paper_notes/pseudo_lidar_v3.md)] CVPR 2020

- [ATSS: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection](https://arxiv.org/abs/1912.02424) [[Notes](paper_notes/atss.md)] CVPR 2020 oral

- [Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression](https://arxiv.org/abs/1911.08287) AAAI 2020

- [Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation](https://arxiv.org/abs/2005.03572) [Journal version]

- [YOLOv4: Optimal Speed and Accuracy of Object Detection](https://arxiv.org/abs/2004.10934) [[Notes](paper_notes/yolov4.md)]

- [CBN: Cross-Iteration Batch Normalization](https://arxiv.org/abs/2002.05712) [[Notes](paper_notes/cbn.md)]

- [Stitcher: Feedback-driven Data Provider for Object Detection](https://arxiv.org/abs/2004.12432) [[Notes](paper_notes/stitcher.md)]

- [SKNet: Selective Kernel Networks](https://arxiv.org/abs/1903.06586) [[Notes](paper_notes/sknet.md)] CVPR 2019

- [CBAM: Convolutional Block Attention Module](https://arxiv.org/abs/1807.06521) [[Notes](paper_notes/cbam.md)] ECCV 2018 

- [ResNeSt: Split-Attention Networks](https://arxiv.org/abs/2004.08955) [[Notes](paper_notes/resnest.md)]

## 2020-04 (14)

- [ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst](https://arxiv.org/pdf/1812.03079.pdf) [[Notes](paper_notes/chauffeurnet.md)] RSS 2019 [Waymo]

- [IntentNet: Learning to Predict Intention from Raw Sensor Data](http://www.cs.toronto.edu/~wenjie/papers/intentnet_corl18.pdf) [[Notes](paper_notes/intentnet.md)] CoRL 2018 [Uber ATG, perception and prediction, Lidar+Map]

- [RoR: Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions](https://arxiv.org/abs/1906.08945) [[Notes](paper_notes/ror.md)] CVPR 2019 [Zoox]

- [MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction](https://arxiv.org/abs/1910.05449) [[Notes](paper_notes/multipath.md)] CoRL 2019 [Waymo, authors from RoR and ChauffeurNet]

- [NMP: End-to-end Interpretable Neural Motion Planner](http://www.cs.toronto.edu/~wenjie/papers/cvpr19/nmp.pdf) [[Notes](paper_notes/nmp.md)] CVPR 2019 oral [Uber ATG]

- [Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks](https://arxiv.org/abs/1809.10732) [[Notes](paper_notes/multipath_uber.md)] ICRA 2019 [Henggang Cui, Multimodal, Uber ATG Pittsburgh]

- [Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving](https://arxiv.org/abs/1808.05819) WACV 2020 [Uber ATG Pittsburgh] 

- [TensorMask: A Foundation for Dense Object Segmentation](https://arxiv.org/abs/1903.12174) [[Notes](paper_notes/tensormask.md)] ICCV 2019 [single-stage instance seg]

- [BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation](https://arxiv.org/abs/2001.00309) [[Notes](paper_notes/blendmask.md)] CVPR 2020 oral

- [Mask Encoding for Single Shot Instance Segmentation](https://arxiv.org/abs/2003.11712) [[Notes](paper_notes/meinst.md)] CVPR 2020 oral [single-stage instance seg, Chunhua Shen]

- [PolarMask: Single Shot Instance Segmentation with Polar Representation](https://arxiv.org/abs/1909.13226) [[Notes](paper_notes/polarmask.md)] CVPR 2020 oral [single-stage instance seg]

- [SOLO: Segmenting Objects by Locations](https://arxiv.org/abs/1912.04488) [[Notes](paper_notes/solo.md)] ECCV 2020 [single-stage instance seg, Chunhua Shen]

- [SOLOv2: Dynamic, Faster and Stronger](https://arxiv.org/abs/2003.10152) [[Notes](paper_notes/solov2.md)] [single-stage instance seg, Chunhua Shen]

- [CondInst: Conditional Convolutions for Instance Segmentation](https://arxiv.org/abs/2003.05664) [[Notes](paper_notes/condinst.md)] ECCV 2020 oral [single-stage instance seg, Chunhua Shen]

- [CenterMask: Single Shot Instance Segmentation With Point Representation](https://arxiv.org/abs/2004.04446) [[Notes](paper_notes/centermask.md)] CVPR 2020

## 2020-03 (15)

- [VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition](https://arxiv.org/abs/1710.06288) [[Notes](paper_notes/vpgnet.md)] ICCV 2017

- [Which Tasks Should Be Learned Together in Multi-task Learning?](https://arxiv.org/abs/1905.07553) [[Notes](paper_notes/task_grouping.md)] [Stanford, MTL] ICML 2020

- [MGDA: Multi-Task Learning as Multi-Objective Optimization](https://arxiv.org/abs/1810.04650) NeurIPS 2018

- [Taskonomy: Disentangling Task Transfer Learning](https://arxiv.org/abs/1804.08328) [[Notes](paper_notes/taskonomy.md)] CVPR 2018

- [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883) [[Notes](paper_notes/rethinking_pretraining.md)] ICCV 2019 [Kaiming He]

- [UnsuperPoint: End-to-end Unsupervised Interest Point Detector and Descriptor](https://arxiv.org/abs/1907.04011) [[Notes](paper_notes/unsuperpoint.md)] [superpoint]

- [KP2D: Neural Outlier Rejection for Self-Supervised Keypoint Learning](https://arxiv.org/abs/1912.10615) [[Notes](paper_notes/kp2d.md)] ICLR 2020 (pointNet)

- [KP3D: Self-Supervised 3D Keypoint Learning for Ego-motion Estimation](https://arxiv.org/abs/1912.03426) [[Notes](paper_notes/kp3d.md)] CoRL 2020 [Toyota, superpoint]

- [NG-RANSAC: Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses](https://arxiv.org/abs/1905.04132) [[Notes](paper_notes/ng_ransac.md)] ICCV 2019 [pointNet]

- [Learning to Find Good Correspondences](https://arxiv.org/abs/1711.05971) [[Notes](paper_notes/learning_correspondence.md)] CVPR 2018 Oral (pointNet)

- [RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving](https://arxiv.org/abs/1911.09712) [[Notes](paper_notes/refined_mpl.md)] [Huawei, Mono3D]

- [DSP: Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation](https://arxiv.org/abs/2002.01619) [[Notes](paper_notes/dsp.md)] AAAI 2020 (SenseTime, Mono3D)

- [Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks](https://arxiv.org/abs/1903.02193) (LLD, LSTM)

- [LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach](https://arxiv.org/abs/1802.05591) [[Notes](paper_notes/lanenet.md)] IV 2018 (LaneNet)

- [3D-LaneNet: End-to-End 3D Multiple Lane Detection](http://openaccess.thecvf.com/content_ICCV_2019/papers/Garnett_3D-LaneNet_End-to-End_3D_Multiple_Lane_Detection_ICCV_2019_paper.pdf) [[Notes](paper_notes/3d_lanenet.md)] ICCV 2019

- [Semi-Local 3D Lane Detection and Uncertainty Estimation](https://arxiv.org/abs/2003.05257) [[Notes](paper_notes/semilocal_3d_lanenet.md)] [GM Israel, 3D LLD]

- [Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection](https://arxiv.org/abs/2003.10656) [[Notes](paper_notes/gen_lanenet.md)] ECCV 2020 [Apollo, 3D LLD]

- [Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty](https://arxiv.org/abs/1711.09026) CVPR 2018 [Egocentric prediction]

- [It’s Not All About Size: On the Role of Data Properties in Pedestrian Detection](http://openaccess.thecvf.com/content_ECCVW_2018/papers/11129/Rasouli_Its_Not_All_About_Size_On_the_Role_of_Data_ECCVW_2018_paper.pdf) ECCV 2018 [pedestrian]

## 2020-02 (12)

- [Associative Embedding: End-to-End Learning for Joint Detection and Grouping](https://arxiv.org/abs/1611.05424) [[Notes](paper_notes/associative_embedding.md)] NIPS 2017

- [Pixels to Graphs by Associative Embedding](https://arxiv.org/abs/1706.07365) [[Notes](paper_notes/pixels_to_graphs.md)] NIPS 2017

- [Social LSTM: Human Trajectory Prediction in Crowded Spaces](http://cvgl.stanford.edu/papers/CVPR16_Social_LSTM.pdf) [[Notes](paper_notes/social_lstm.md)] CVPR 2016

- [Online Video Object Detection using Association LSTM](http://openaccess.thecvf.com/content_ICCV_2017/papers/Lu__Online_Video_ICCV_2017_paper.pdf) [[Notes](paper_notes/association_lstm.md)] [single stage, recurrent]

- [SuperPoint: Self-Supervised Interest Point Detection and Description](https://arxiv.org/abs/1712.07629) [[Notes](paper_notes/superpoint.md)] CVPR 2018 (channel-to-pixel, deep SLAM, Magic Leap)

- [PointRend: Image Segmentation as Rendering](https://arxiv.org/abs/1912.08193) [[Notes](paper_notes/pointrend.md)] CVPR 2020 Oral [Kaiming He, FAIR]

- [Multigrid: A Multigrid Method for Efficiently Training Video Models](https://arxiv.org/abs/1912.00998) [[Notes](paper_notes/multigrid_training.md)] CVPR 2020 Oral [Kaiming He, FAIR]

- [GhostNet: More Features from Cheap Operations](https://arxiv.org/abs/1911.11907) [[Notes](paper_notes/ghostnet.md)] CVPR 2020

- [FixRes: Fixing the train-test resolution discrepancy](https://arxiv.org/abs/1906.06423) [[Notes](paper_notes/fixres.md)] NIPS 2019 [FAIR]

- [MoVi-3D: Towards Generalization Across Depth for Monocular 3D Object Detection](https://arxiv.org/abs/1912.08035) [[Notes](paper_notes/movi_3d.md)] ECCV 2020 [Virtual Cam, viewport, Mapillary/Facebook, Mono3D] 

- [Amodal Completion and Size Constancy in Natural Scenes](https://arxiv.org/abs/1509.08147) [[Notes](paper_notes/amodal_completion.md)] ICCV 2015 (Amodal completion)

- [MoCo: Momentum Contrast for Unsupervised Visual Representation Learning](https://arxiv.org/abs/1911.05722) [[Notes](paper_notes/moco.md)] CVPR 2020 Oral [FAIR, Kaiming He]

## 2020-01 (19)

- [Double Descent: Reconciling modern machine learning practice and the bias-variance trade-off](https://arxiv.org/abs/1812.11118) [[Notes](paper_notes/double_descent.md)] PNAS 2019

- [Deep Double Descent: Where Bigger Models and More Data Hurt](https://arxiv.org/abs/1912.02292) [[Notes](paper_notes/deep_double_descent.md)]

- [Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/abs/1712.09913) NIPS 2018

- [The ApolloScape Open Dataset for Autonomous Driving and its Application](https://arxiv.org/pdf/1803.06184.pdf) CVPR 2018 (dataset)

- [ApolloCar3D: A La