https://github.com/tigerneil/awesome-deep-rl

For deep RL and the future of AI.

Host: GitHub
URL: https://github.com/tigerneil/awesome-deep-rl
Owner: tigerneil
License: mit
Created: 2017-02-03T12:12:02.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2024-03-01T08:20:45.000Z (over 2 years ago)
Last Synced: 2024-11-24T20:54:02.704Z (over 1 year ago)
Topics: aaai, aamas, agi, aistats, artificial-general-intelligence, deep-reinforcement-learning, distributional, exploration-exploitation, game, hierarchical-reinforcement-learning, iclr, icml, ijcai, inverse-rl, multiagent-reinforcement-learning, planning, reinforcement-learning, reward, theoretical-computer-science, uai
Language: HTML
Homepage:
Size: 1.92 MB
Stars: 1,422
Watchers: 108
Forks: 218
Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

Awesome Lists containing this project

fucking-awesome-awesomeness - by @tigerneil
awesome-deep-reinforcement-learning - tigerneil/awesome-deep-rl
fucking-lists - awesome-deep-rl
awesomelist - awesome-deep-rl
more-awesome - Deep Learning: by @tigerneil - For deep RL and the future of AI. (To Sort)
awesome-ai-list-guide - awesome-deep-rl
awesome-github-projects - awesome-deep-rl - For deep RL and the future of AI. ⭐1,514 `HTML` (📦 Legacy & Inactive Projects)
collection - awesome-deep-rl
lists - awesome-deep-rl
awesome-awesomeness - by @tigerneil
awesome-reinforcement-learning - This project is for learning and researching on Deep RL. Maintained by University AI researchers
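ultimate-awesome - awesome-deep-rl - For deep RL and the future of AI. (Other Lists / TeX Lists)
StarryDivineSky - tigerneil/awesome-deep-rl - deep-rl is a curated list of deep reinforcement learning resources that aims to gather excellent projects and resources related to deep reinforcement learning for the convenience of researchers and developers. It may include papers, code repositories, tutorials, blog posts, and other useful materials. The project focuses on deep reinforcement learning and its role in the future development of AI, and may cover various deep reinforcement learning algorithms, applications, and recent advances. By organizing these resources, the project aims to promote learning, research, and innovation in deep reinforcement learning. See the README.md file for details such as how resources are categorized and why they are recommended. The project may be updated regularly to reflect the latest developments in the field. (Vulnerability Databases and Ranges / Resource Transfer and Download)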
awesome-deep-rl - awesome-deep-rl
awesome-artificial-intelligence-research - Awesome Deep RL - deep reinforcement learning papers. (Core Machine Learning Research / Reinforcement Learning)
awesome-of-awesome-ml - awesome-deep-rl (by tigerneil)
awesome-game-ai - Awesome Deep Reinforcement Learning

README

# Awesome Deep Reinforcement Learning

> **Mar 1 2024 update: HILP added**
>
> **July 2022 update: EDDICT added**
>
> **Mar 2022 update: a few papers released in early 2022**
>
> **Dec 2021 update: Unsupervised RL**

## Introduction to awesome drl

Reinforcement learning is the fundamental framework for building AGI, so this project collects important contributions to deep RL.

## Landscape of Deep RL

![updated Landscape of **DRL**](images/awesome-drl.png)

## Content

- [Awesome Deep Reinforcement Learning](#awesome-deep-reinforcement-learning)

  - [Introduction to awesome drl](#introduction-to-awesome-drl)

  - [Landscape of Deep RL](#landscape-of-deep-rl)

  - [Content](#content)

  - [General guidances](#general-guidances)

  - [2024](#2024)

  - [2022](#2022)

  - [Generalist policies](#generalist-policies)

  - [Foundations and theory](#foundations-and-theory)

  - [General benchmark frameworks](#general-benchmark-frameworks)

  - [Unsupervised](#unsupervised)

  - [Offline](#offline)

  - [Value based](#value-based)

  - [Policy gradient](#policy-gradient)

  - [Explorations](#explorations)

  - [Actor-Critic](#actor-critic)

  - [Model-based](#model-based)

  - [Model-free + Model-based](#model-free--model-based)

  - [Hierarchical](#hierarchical)

  - [Option](#option)

  - [Connection with other methods](#connection-with-other-methods)

  - [Connecting value and policy methods](#connecting-value-and-policy-methods)

  - [Reward design](#reward-design)

  - [Unifying](#unifying)

  - [Faster DRL](#faster-drl)

  - [Multi-agent](#multi-agent)

  - [New design](#new-design)

  - [Multitask](#multitask)

  - [Observational Learning](#observational-learning)

  - [Meta Learning](#meta-learning)

  - [Distributional](#distributional)

  - [Planning](#planning)

  - [Safety](#safety)

  - [Inverse RL](#inverse-rl)

  - [No reward RL](#no-reward-rl)

  - [Time](#time)

  - [Adversarial learning](#adversarial-learning)

  - [Use Natural Language](#use-natural-language)

  - [Generative and contrastive representation learning](#generative-and-contrastive-representation-learning)

  - [Belief](#belief)

  - [PAC](#pac)

  - [Applications](#applications)

Illustrations:

![](images/ACER.png)

**Recommendations and suggestions are welcome**. 

## General guidances

* [Awesome Offline RL](https://github.com/hanjuku-kaso/awesome-offline-rl)

* [Reinforcement Learning Today](http://reinforcementlearning.today/)

* [Multiagent Reinforcement Learning by Marc Lanctot RLSS @ Lille](http://mlanctot.info/files/papers/Lanctot_MARL_RLSS2019_Lille.pdf) 11 July 2019

* [RLDM 2019 Notes by David Abel](https://david-abel.github.io/notes/rldm_2019.pdf) 11 July 2019

* [A Survey of Reinforcement Learning Informed by Natural Language](RLNL.md) 10 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.03926.pdf)

* [Challenges of Real-World Reinforcement Learning](ChallengesRealWorldRL.md) 29 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.12901.pdf)

* [Ray Interference: a Source of Plateaus in Deep Reinforcement Learning](RayInterference.md) 25 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.11455.pdf)

* [Principles of Deep RL by David Silver](p10.md)

* [University AI's General introduction to deep rl (in Chinese)](https://www.jianshu.com/p/dfd987aa765a)

* [OpenAI's spinningup](https://spinningup.openai.com/en/latest/)

* [The Promise of Hierarchical Reinforcement Learning](https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/) 9 Mar 2019

* [Deep Reinforcement Learning that Matters](reproducing.md) 30 Jan 2019 [arxiv](https://arxiv.org/pdf/1709.06560.pdf)

## 2024

* [Foundation Policies with Hilbert Representations](HILP.md) [arxiv](https://arxiv.org/abs/2402.15567) [repo](https://github.com/seohongpark/HILP) 23 Feb 2024

## 2022

* Reinforcement Learning with Action-Free Pre-Training from Videos [arxiv](https://arxiv.org/abs/2203.13880) [repo](https://github.com/younggyoseo/apv)

## Generalist policies

* [Foundation Policies with Hilbert Representations](HILP.md) [arxiv](https://arxiv.org/abs/2402.15567) [repo](https://github.com/seohongpark/HILP) 23 Feb 2024

## Foundations and theory

* [General non-linear Bellman equations](GNLBE.md) 9 July 2019 [arxiv](https://arxiv.org/pdf/1907.07331.pdf)

* [Monte Carlo Gradient Estimation in Machine Learning](MCGE.md) 25 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.10652.pdf)

## General benchmark frameworks

* [Brax](https://github.com/google/brax/) 

![](https://github.com/google/brax/raw/main/docs/img/fetch.gif)

* [Android-Env](https://github.com/deepmind/android_env) 

  * ![](https://github.com/deepmind/android_env/raw/main/docs/images/device_control.gif)

* [MuJoCo](http://mujoco.org/) | [MuJoCo Chinese version](https://github.com/tigerneil/mujoco-zh)

* [Unsupervised RL Benchmark](https://github.com/rll-research/url_benchmark)

* [Dataset for Offline RL](https://github.com/rail-berkeley/d4rl)

* [Spriteworld: a flexible, configurable python-based reinforcement learning environment](https://github.com/deepmind/spriteworld)

* [Chainerrl Visualizer](https://github.com/chainer/chainerrl-visualizer)

* [Behaviour Suite for Reinforcement Learning](BSRL.md) 13 Aug 2019 [arxiv](https://arxiv.org/pdf/1908.03568.pdf) | [code](https://github.com/deepmind/bsuite)

* [Quantifying Generalization in Reinforcement Learning](Coinrun.md) 20 Dec 2018 [arxiv](https://arxiv.org/pdf/1812.02341.pdf)

* [S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning](SRL.md) 25 Sept 2018

* [dopamine](https://github.com/google/dopamine)

* [StarCraft II](https://github.com/deepmind/pysc2)

* [trfl](https://github.com/deepmind/trfl)

* [chainerrl](https://github.com/chainer/chainerrl)

* [PARL](https://github.com/PaddlePaddle/PARL) 

* [DI-engine: a generalized decision intelligence engine. It supports various Deep RL algorithms](https://github.com/opendilab/DI-engine)

* [PPO x Family: Course in Chinese for Deep RL](https://github.com/opendilab/PPOxFamily)

## Unsupervised

* [URLB: Unsupervised Reinforcement Learning Benchmark](https://arxiv.org/abs/2110.15191) 28 Oct 2021

* [APS: Active Pretraining with Successor Features](https://arxiv.org/abs/2108.13956) 31 Aug 2021

* [Behavior From the Void: Unsupervised Active Pre-Training](https://arxiv.org/abs/2103.04551) 8 Mar 2021

* [Reinforcement Learning with Prototypical Representations](https://arxiv.org/abs/2102.11271) 22 Feb 2021

* [Efficient Exploration via State Marginal Matching](https://arxiv.org/abs/1906.05274) 12 Jun 2019

* [Self-Supervised Exploration via Disagreement](https://arxiv.org/abs/1906.04161) 10 Jun 2019

* [Exploration by Random Network Distillation](https://arxiv.org/abs/1810.12894) 30 Oct 2018

* [Diversity is All You Need: Learning Skills without a Reward Function](https://arxiv.org/abs/1802.06070) 16 Feb 2018

* [Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363) 15 May 2017 

## Offline

* [PerSim: Data-efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators](https://arxiv.org/abs/2102.06961) 10 Nov 2021

* A General Offline Reinforcement Learning Framework for Interactive Recommendation, AAAI 2021

## Value based

* [Harnessing Structures for Value-Based Planning and Reinforcement Learning](SVRL.md) 5 Feb 2020 [arxiv](https://arxiv.org/abs/1909.12255) | [code](https://github.com/YyzHarry/SV-RL)

* [Recurrent Value Functions](RVF.md) 23 May 2019 [arxiv](https://arxiv.org/pdf/1905.09562.pdf)

* [Stochastic Lipschitz Q-Learning](LipschitzQ.md) 24 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.10653.pdf)

* [TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning](https://arxiv.org/pdf/1710.11417) 8 Mar 2018

* [Distributed Prioritized Experience Replay](https://arxiv.org/pdf/1803.00933.pdf) 2 Mar 2018

* [Rainbow: Combining Improvements in Deep Reinforcement Learning](Rainbow.md) 6 Oct 2017

* [Learning from Demonstrations for Real World Reinforcement Learning](DQfD.md) 12 Apr 2017

* [Dueling Network Architecture](Dueling.md)

* [Double DQN](DDQN.md)

* [Prioritized Experience Replay](PER.md)

* [Deep Q-Networks](DQN.md)

## Policy gradient

* [Phasic Policy Gradient](PPG.md) 9 Sep 2020 [arxiv](https://arxiv.org/pdf/2009.04416.pdf) [code](https://github.com/openai/phasic-policy-gradient)

* [An operator view of policy gradient methods](OVPG.md) 22 Jun 2020 [arxiv](https://arxiv.org/pdf/2006.11266.pdf)

* [Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces](DirPG.md) 14 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.06062.pdf)

* [Policy Gradient Search: Online Planning and Expert Iteration without Search Trees](PGS.md) 7 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.03646.pdf)

* [Supervised Policy Update for Deep Reinforcement Learning](SPU.md) 24 Dec 2018 [arxiv](https://arxiv.org/pdf/1805.11706v4.pdf)

* [PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation](PPO-CMA.md) 5 Oct 2018 [arxiv](https://arxiv.org/pdf/1810.02541v6.pdf)

* [Clipped Action Policy Gradient](CAPG.md) 22 June 2018

* [Expected Policy Gradients for Reinforcement Learning](EPG.md) 10 Jan 2018

* [Proximal Policy Optimization Algorithms](PPO.md) 20 July 2017

* [Emergence of Locomotion Behaviours in Rich Environments](DPPO.md) 7 July 2017

* [Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning](IPG.md) 1 Jun 2017

* [Equivalence Between Policy Gradients and Soft Q-Learning](PGSQL.md)

* [Trust Region Policy Optimization](TRPO.md)

* [Reinforcement Learning with Deep Energy-Based Policies](DEBP.md)

* [Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic](QPROP.md)

## Explorations

* [Entropic Desired Dynamics for Intrinsic Control](EDDICT.md) 2021 [openreview](https://openreview.net/pdf?id=lBSSxTgXmiK)

* [Self-Supervised Exploration via Disagreement](Disagreement.md) 10 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.04161.pdf)

* [Approximate Exploration through State Abstraction](MBIE-EB.md) 24 Jan 2019

* [The Uncertainty Bellman Equation and Exploration](UBE.md) 15 Sep 2017

* [Noisy Networks for Exploration](NoisyNet.md) 30 Jun 2017 [implementation](https://github.com/Kaixhin/NoisyNet-A3C)

* [Count-Based Exploration in Feature Space for Reinforcement Learning](PhiEB.md) 25 Jun 2017

* [Count-Based Exploration with Neural Density Models](NDM.md) 14 Jun 2017

* [UCB and InfoGain Exploration via Q-Ensembles](QEnsemble.md) 11 Jun 2017

* [Minimax Regret Bounds for Reinforcement Learning](MMRB.md) 16 Mar 2017

* [Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models](incentivizing.md)

* [EX2: Exploration with Exemplar Models for Deep Reinforcement Learning](EX2.md)

## Actor-Critic

* [Generalized Off-Policy Actor-Critic](Geoff-PAC.md) 27 Mar 2019

* [Soft Actor-Critic Algorithms and Applications](https://arxiv.org/pdf/1812.05905.pdf) 29 Jan 2019

* [The Reactor: A Sample-Efficient Actor-Critic Architecture](REACTOR.md) 15 Apr 2017

* [Sample Efficient Actor-Critic with Experience Replay](ACER.md)

* [Reinforcement Learning with Unsupervised Auxiliary Tasks](UNREAL.md)

* [Continuous control with deep reinforcement learning](DDPG.md)

## Model-based

 

* [Self-Consistent Models and Values](sc.md) 25 Oct 2021 [arxiv](https://arxiv.org/pdf/2110.12840.pdf)

* [When to use parametric models in reinforcement learning?](parametric.md) 12 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.05243.pdf)

* [Model Based Reinforcement Learning for Atari](https://arxiv.org/pdf/1903.00374.pdf) 5 Mar 2019

* [Model-Based Stabilisation of Deep Reinforcement Learning](MBDQN.md) 6 Sep 2018

* [Learning model-based planning from scratch](IBP.md) 19 July 2017

## Model-free + Model-based

* [Imagination-Augmented Agents for Deep Reinforcement Learning](I2As.md) 19 July 2017

## Hierarchical

* [Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?](HIRO.md) 23 Sep 2019 [arxiv](https://arxiv.org/pdf/1909.10618.pdf)

* [Language as an Abstraction for Hierarchical Deep Reinforcement Learning](HAL.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07343.pdf)

## Option

* [Variational Option Discovery Algorithms](VALOR.md) 26 July 2018

* [A Laplacian Framework for Option Discovery in Reinforcement Learning](LFOD.md) 16 Jun 2017

## Connection with other methods

* [Robust Imitation of Diverse Behaviors](GVG.md)

* [Learning human behaviors from motion capture by adversarial imitation](GAIL.md)

* [Connecting Generative Adversarial Networks and Actor-Critic Methods](GANAC.md)

## Connecting value and policy methods

* [Bridging the Gap Between Value and Policy Based Reinforcement Learning](PCL.md)

* [Policy gradient and Q-learning](PGQ.md)

## Reward design

* [End-to-End Robotic Reinforcement Learning without Reward Engineering](VICE.md) 16 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.07854.pdf)

* [Reinforcement Learning with Corrupted Reward Channel](RLCRC.md) 23 May 2017

## Unifying

* [Multi-step Reinforcement Learning: A Unifying Algorithm](MSRL.md)

## Faster DRL

* [Neural Episodic Control](NEC.md)

## Multi-agent

* [No Press Diplomacy: Modeling Multi-Agent Gameplay](Dip.md) 4 Sep 2019 [arxiv](https://arxiv.org/pdf/1909.02128.pdf)

* [Options as responses: Grounding behavioural hierarchies in multi-agent RL](OPRE.md) 6 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.01470.pdf)

* [Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination](MERL.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07315.pdf)

* [A Regularized Opponent Model with Maximum Entropy Objective](ROMMEO.md) 17 May 2019 [arxiv](https://arxiv.org/pdf/1905.08087.pdf)

* [Deep Q-Learning for Nash Equilibria: Nash-DQN](NashDQN.md) 23 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.10554.pdf)

* [Malthusian Reinforcement Learning](MRL.md) 3 Mar 2019 [arxiv](https://arxiv.org/pdf/1812.07019.pdf)

* [Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning](bad.md) 4 Nov 2018

* [Intrinsic Social Motivation via Causal Influence in Multi-Agent RL](ISMCI.md) 19 Oct 2018

* [QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning](http://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/rashidicml18.pdf) 30 Mar 2018

* [Modeling Others using Oneself in Multi-Agent Reinforcement Learning](SOM.md) 26 Feb 2018

* [The Mechanics of n-Player Differentiable Games](SGA.md) 15 Feb 2018 

* [Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments](RoboSumo.md) 10 Oct 2017

* [Learning with Opponent-Learning Awareness](LOLA.md) 13 Sep 2017

* [Counterfactual Multi-Agent Policy Gradients](COMA.md) 

* [Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](MADDPG.md) 7 Jun 2017

* [Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games](BiCNet.md) 29 Mar 2017

## New design

* [IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/pdf/1802.01561.pdf) 9 Feb 2018

* [Reverse Curriculum Generation for Reinforcement Learning](RECUR.md)

* [Trial without Error: Towards Safe Reinforcement Learning via Human Intervention](HIRL.md)

* [Learning to Design Games: Strategic Environments in Deep Reinforcement Learning](DualMDP.md) 5 July 2017

## Multitask

* [Kickstarting Deep Reinforcement Learning](https://arxiv.org/pdf/1803.03835.pdf) 10 Mar 2018

* [Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning](ZSTG.md) 7 Nov 2017

* [Distral: Robust Multitask Reinforcement Learning](Distral.md) 13 July 2017

## Observational Learning

* [Observational Learning by Reinforcement Learning](OLRL.md) 20 Jun 2017

## Meta Learning

* [Discovery of Useful Questions as Auxiliary Tasks](GVF.md) 10 Sep 2019 [arxiv](https://arxiv.org/pdf/1909.04607.pdf)

* [Meta-learning of Sequential Strategies](MetaSS.md) 8 May 2019 [arxiv](https://arxiv.org/pdf/1905.03030.pdf)

* [Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables](PEARL.md) 19 Mar 2019 [arxiv](https://arxiv.org/pdf/1903.08254.pdf)

* [Some Considerations on Learning to Explore via Meta-Reinforcement Learning](E2.md) 11 Jan 2019 [arxiv](https://arxiv.org/pdf/1803.01118.pdf)

* [Meta-Gradient Reinforcement Learning](MGRL.md) 24 May 2018 [arxiv](https://arxiv.org/pdf/1805.09801.pdf)

* [ProMP: Proximal Meta-Policy Search](ProMP.md) 16 Oct 2018 [arxiv](https://arxiv.org/pdf/1810.06784)

* [Unsupervised Meta-Learning for Reinforcement Learning](UML.md) 12 Jun 2018

## Distributional

* [GAN Q-learning](GANQL.md) 20 July 2018

* [Implicit Quantile Networks for Distributional Reinforcement Learning](IQN.md) 14 Jun 2018

* [Nonlinear Distributional Gradient Temporal-Difference Learning](GTD.md) 20 May 2018

* [Distributed Distributional Deterministic Policy Gradients](D4PG.md) 23 Apr 2018

* [An Analysis of Categorical Distributional Reinforcement Learning](C51-analysis.md) 22 Feb 2018

* [Distributional Reinforcement Learning with Quantile Regression](QR-DQN.md) 27 Oct 2017

* [A Distributional Perspective on Reinforcement Learning](C51.md) 21 July 2017

## Planning

* [Search on the Replay Buffer: Bridging Planning and Reinforcement Learning](SoRB.md) 12 June 2019 [arxiv](https://arxiv.org/pdf/1906.05253.pdf)

## Safety

* [Robust Reinforcement Learning for Continuous Control with Model Misspecification](MPO.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07516.pdf)

* [Verifiable Reinforcement Learning via Policy Extraction](Viper.md) 22 May 2018 [arxiv](https://arxiv.org/pdf/1805.08328.pdf)

## Inverse RL

* [Addressing Sample Inefficiency and Reward Bias in Inverse Reinforcement Learning](OP-GAIL.md) 9 Sep 2018

## No reward RL

* [Fast Task Inference with Variational Intrinsic Successor Features](VISR.md) 2 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.05030.pdf)

* [Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363) 15 May 2017 

## Time

* [Interval timing in deep reinforcement learning agents](Intervaltime.md) 31 May 2019 [arxiv](https://arxiv.org/pdf/1905.13469.pdf)

* [Time Limits in Reinforcement Learning](PEB.md)

## Adversarial learning

* [Sample-efficient Adversarial Imitation Learning from Observation](LQR+GAIfO.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07374.pdf) 

## Use Natural Language

* [Using Natural Language for Reward Shaping in Reinforcement Learning](LEARN.md) 31 May 2019 [pdf](https://www.cs.utexas.edu/~ai-lab/downloadPublication.php?filename=http://www.cs.utexas.edu/users/ml/papers/goyal.ijcai19.pdf&pubid=127757)

## Generative and contrastive representation learning

* [Unsupervised State Representation Learning in Atari](ST-DIM.md) 19 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.08226.pdf)

## Belief

* [Shaping Belief States with Generative Environment Models for RL](GenerativeBelief.md) 24 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.09237v2.pdf)

## PAC

* [Provably Convergent Off-Policy Actor-Critic with Function Approximation](COF-PAC.md) 11 Nov 2019 [arxiv](https://arxiv.org/pdf/1911.04384.pdf)

## Applications

* [Benchmarks for Deep Off-Policy Evaluation](bdope.md) 30 Mar 2021 [arxiv](https://arxiv.org/pdf/2103.16596.pdf)

* [Learning Reciprocity in Complex Sequential Social Dilemmas](Reciprocity.md) 19 Mar 2019 [arxiv](https://arxiv.org/pdf/1903.08082.pdf)

* [DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills](dmimic.md) 9 Apr 2018

* [Tuning Recurrent Neural Networks with Reinforcement Learning](RLTUNER.md)
