awesome-offline-rl
An index of algorithms for offline reinforcement learning (offline-rl)
https://github.com/hanjuku-kaso/awesome-offline-rl
Papers
Offline RL: Theory/Methods
- Instabilities of Offline RL with Pre-Trained Neural Representation
- Safe Offline Reinforcement Learning Through Hierarchical Policies
- Critic-Guided Decision Transformer for Offline Reinforcement Learning
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- The Generalization Gap in Offline Reinforcement Learning
- Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills
- MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
- Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning
- Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
- Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
- Hierarchical Decision Transformer
- Prompt-Tuning Decision Transformer with Preference Ranking
- Context Shift Reduction for Offline Meta-Reinforcement Learning
- Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
- Score Models for Offline Goal-Conditioned Reinforcement Learning
- Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
- Expressive Modeling Is Insufficient for Offline RL: A Tractable Inference Perspective
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
- GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
- SERA: Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
- Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
- CROP: Conservative Reward for Model-based Offline Policy Optimization
- Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
- Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
- Boosting Continuous Control with Consistency Policy
- Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
- Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
- DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
- Self-Confirming Transformer for Locally Consistent Online Adaptation in Multi-Agent Reinforcement Learning
- Learning to Reach Goals via Diffusion
- Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making
- Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
- Reasoning with Latent Diffusion in Offline Reinforcement Learning
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
- Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
- Robust Offline Reinforcement Learning -- Certify the Confidence Interval
- Stackelberg Batch Policy Learning
- H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
- DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning
- Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration
- Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning
- Multi-Objective Decision Transformers for Offline Reinforcement Learning
- AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
- Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
- PASTA: Pretrained Action-State Transformer Agents
- Towards A Unified Agent with Foundation Models
- Offline Reinforcement Learning with Imbalanced Datasets
- LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
- Elastic Decision Transformer
- Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
- Is RLHF More Difficult than Standard RL?
- Supervised Pretraining Can Learn In-Context Reinforcement Learning
- Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery
- CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
- Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
- Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning
- A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
- HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach
- Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration
- In-Sample Policy Iteration for Offline Reinforcement Learning
- Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning
- Offline Prioritized Experience Replay
- Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
- Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
- MADiff: Offline Multi-agent Learning with Diffusion Models
- Provable Offline Reinforcement Learning with Human Feedback
- Think Before You Act: Decision Transformers with Internal Working Memory
- Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning
- Offline Primal-Dual Reinforcement Learning for Linear MDPs
- Federated Offline Policy Learning with Heterogeneous Observational Data
- Offline Reinforcement Learning with Additional Covering Distributions
- Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems
- Federated Ensemble-Directed Offline Reinforcement Learning
- IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
- Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
- Reinforcement Learning from Passive Data via Latent Intentions
- Uncertainty-driven Trajectory Truncation for Model-based Offline Reinforcement Learning
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- Batch Quantum Reinforcement Learning
- Accelerating exploration and representation learning with offline pre-training
- On Context Distribution Shift in Task Representation Learning for Offline Meta RL
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
- Learning Excavation of Rigid Objects with Offline Reinforcement Learning
- Goal-conditioned Offline Reinforcement Learning through State Space Partitioning
- Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
- Deploying Offline Reinforcement Learning with Human Feedback
- Synthetic Experience Replay
- ENTROPY: Environment Transformer and Offline Policy Optimization
- Graph Decision Transformer
- Selective Uncertainty Propagation in Offline RL
- Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning
- Skill Decision Transformer
- Guiding Online Reinforcement Learning with Action-Free Offline Pretraining
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning
- APAC: Authorized Probability-controlled Actor-Critic For Offline Reinforcement Learning
- Designing an offline reinforcement learning objective from scratch
- Behaviour Discriminator: A Simple Data Filtering Method to Improve Offline Policy Learning
- Learning to View: Decision Transformers for Active Object Detection
- Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
- Contextual Conservative Q-Learning for Offline Reinforcement Learning
- Offline Policy Optimization in RL with Variance Regularizaton
- Transformer in Transformer as Backbone for Deep Reinforcement Learning
- Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
- Supported Value Regularization for Offline Reinforcement Learning
- Conservative State Value Estimation for Offline Reinforcement Learning
- Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
- Adversarial Model for Offline Reinforcement Learning
- Percentile Criterion Optimization in Offline Reinforcement Learning
- Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions
- Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning
- Offline RL with Discrete Proxy Representations for Generalizability in POMDPs
- Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization
- Bi-Level Offline Policy Optimization with Limited Exploration
- Provably (More) Sample-Efficient Offline RL with Options
- Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
- Budgeting Counterfactual for Offline RL
- Efficient Diffusion Policies for Offline Reinforcement Learning
- Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
- Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
- Provably Efficient Offline Reinforcement Learning in Regular Decision Processes
- Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
- On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling and Beyond
- Conservative Offline Policy Adaptation in Multi-Agent Games
- Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
- Survival Instinct in Offline Reinforcement Learning
- Learning from Visual Observation via Offline Pretrained State-to-Go Transformer
- Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization
- Learning to Influence Human Behavior with Offline Reinforcement Learning
- Residual Q-Learning: Offline and Online Policy Customization without Value
- Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
- Corruption-Robust Offline Reinforcement Learning with General Function Approximation
- Learning to Modulate pre-trained Models in RL
- Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
- One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning
- Mutual Information Regularized Offline Reinforcement Learning
- Offline RL With Heteroskedastic Datasets and Support Constraints
- Offline Reinforcement Learning with Differential Privacy
- Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples
- Reining Generalization in Offline Reinforcement Learning via Representation Distinction
- VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
- SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
- Hierarchical Diffusion for Offline Decision Making
- MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
- Safe Offline Reinforcement Learning with Real-Time Budget Constraints
- Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints
- A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
- Anti-Exploration by Random Network Distillation
- PASTA: Pessimistic Assortment Optimization
- Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
- Supported Trust Region Optimization for Offline Reinforcement Learning
- Principled Offline RL in the Presence of Rich Exogenous Information
- Efficient Online Reinforcement Learning with Offline Data
- Boosting Offline Reinforcement Learning with Action Preference Query
- Model-based Offline Reinforcement Learning with Count-based Conservatism
- Constrained Decision Transformer for Offline Safe Reinforcement Learning
- Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
- Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
- What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
- Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
- MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
- Distance Weighted Supervised Learning for Offline Interaction Data
- Masked Trajectory Models for Prediction, Representation, and Control
- Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
- Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
- Future-conditioned Unsupervised Pretraining for Decision Transformer
- PAC-Bayesian Offline Contextual Bandits With Guarantees
- Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
- Jump-Start Reinforcement Learning
- Learning Temporally Abstract World Models without Online Experimentation
- A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
- Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
- Actor-Critic Alignment for Offline-to-Online Reinforcement Learning
- Leveraging Offline Data in Online Reinforcement Learning
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
- Offline Learning in Markov Games with General Function Approximation
- Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL
- Confidence-Conditioned Value Functions for Offline Reinforcement Learning
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
- Is Conditional Generative Modeling all you need for Decision-Making?
- Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
- Extreme Q-Learning: MaxEnt RL without Entropy
- Dichotomy of Control: Separating What You Can Control from What You Cannot
- From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data
- Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
- The In-Sample Softmax for Offline Reinforcement Learning
- VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training [[code](https://github.com/facebookresearch/vip)]
- Does Zero-Shot Reinforcement Learning Exist?
- Behavior Prior Representation learning for Offline Reinforcement Learning
- Mind the Gap: Offline Policy Optimization for Imperfect Rewards
- Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
- User-Interactive Offline Reinforcement Learning
- Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
- Efficient Offline Policy Optimization with a Learned Model
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
- In-sample Actor Critic for Offline Reinforcement Learning
- Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
- Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
- Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
- Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
- Hyper-Decision Transformer for Efficient Online Policy Adaptation
- Efficient Planning in a Compact Latent Action Space
- Preference Transformer: Modeling Human Preferences using Transformers for RL
- Behavior Proximal Policy Optimization
- The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
- Decision Transformer under Random Frame Dropping
- Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
- Finetuning Offline World Models in the Real World
- On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples
- Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
- Safe Policy Improvement for POMDPs via Finite-State Controllers
- Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
- On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
- Contrastive Example-Based Control
- Curriculum Offline Reinforcement Learning
- Offline Reinforcement Learning with On-Policy Q-Function Regularization
- Model-based Offline Policy Optimization with Adversarial Network
- Efficient experience replay architecture for offline reinforcement learning
- Automatic Trade-off Adaptation in Offline RL
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling
- Latent Variable Representation for Reinforcement Learning
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning
- State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning
- Masked Autoencoding for Scalable and Generalizable Decision Making
- Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning
- Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
- Model-based Trajectory Stitching for Improved Offline Reinforcement Learning
- Offline Reinforcement Learning with Adaptive Behavior Regularization
- Contextual Transformer for Offline Meta Reinforcement Learning
- Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning
- ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
- Contrastive Value Learning: Implicit Models for Simple Offline RL
- Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
- Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
- Provable Safe Reinforcement Learning with Binary Feedback
- Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision
- Implicit Offline Reinforcement Learning via Supervised Learning
- Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation
- Boosting Offline Reinforcement Learning via Data Rebalancing
- ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning
- State Advantage Weighting for Offline RL
- Blessing from Experts: Super Reinforcement Learning in Confounded Environments
- DCE: Offline Reinforcement Learning With Double Conservative Estimates
- On the Opportunities and Challenges of using Animals Videos in Reinforcement Learning
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes
- Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation
- C^2: Co-design of Robots via Concurrent Networks Coupling Online and Offline Reinforcement Learning
- Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
- AdaCat: Adaptive Categorical Discretization for Autoregressive Models
- Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning
- Offline Reinforcement Learning at Multiple Frequencies
- General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States
- Behavior Transformers: Cloning k modes with one stone
- Contrastive Learning as Goal-Conditioned Reinforcement Learning
- Federated Offline Reinforcement Learning
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
- Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games
- Offline Reinforcement Learning with Causal Structured World Models
- Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning
- Model Generation with Provable Coverability for Offline Reinforcement Learning
- You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments
- Multi-Game Decision Transformers
- Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning
- No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL
- How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation
- Offline Visual Representation Learning for Embodied Navigation
- Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers
- BATS: Best Action Trajectory Stitching
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning
- PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations
- Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps
- Meta Reinforcement Learning for Adaptive Control: An Offline Approach
- The Efficacy of Pessimism in Asynchronous Q-Learning
- Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation
- A Regularized Implicit Policy for Offline Reinforcement Learning
- Reinforcement Learning in Possibly Nonstationary Environments
- Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
- VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
- Retrieval-Augmented Reinforcement Learning
- Online Decision Transformer
- Transferred Q-learning
- Settling the Communication Complexity for Distributed Offline Reinforcement Learning
- Offline Reinforcement Learning with Realizability and Single-policy Concentrability
- Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL
- Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning
- Can Wikipedia Help Offline Reinforcement Learning?
- MOORe: Model-based Offline-to-Online Reinforcement Learning
- Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning
- Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning
- Single-Shot Pruning for Offline Reinforcement Learning
- Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations [[code](https://github.com/albertwilcox/mcac)]
- Data-Driven Offline Decision-Making via Invariant Representation Learning
- Bellman Residual Orthogonalization for Offline Reinforcement Learning
- A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
- On Gap-dependent Bounds for Offline Reinforcement Learning
- Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
- Supported Policy Optimization for Offline Reinforcement Learning
- When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
- Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters
- When does return-conditioned supervised learning work for offline reinforcement learning?
- RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- When is Offline Two-Player Zero-Sum Markov Game Solvable?
- Bidirectional Learning for Offline Infinite-width Model-based Optimization
- Mildly Conservative Q-Learning for Offline Reinforcement Learning
- Bootstrapped Transformer for Offline Reinforcement Learning
- LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL
- Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
- Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression
- Dual Generator Offline Reinforcement Learning
- MoCoDA: Model-based Counterfactual Data Augmentation
- A Policy-Guided Imitation Approach for Offline Reinforcement Learning
- A Unified Framework for Alternating Offline Model Training and Policy Learning
- Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
- S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
- ASPiRe: Adaptive Skill Priors for Reinforcement Learning
- Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning
- Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
- Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
- Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
- Offline RL Policies Should be Trained to be Adaptive
- Adversarially Trained Actor Critic for Offline Reinforcement Learning
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
- How to Leverage Unlabeled Data in Offline Reinforcement Learning
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
- Learning Pseudometric-based Action Representations for Offline Reinforcement Learning
- Offline Meta-Reinforcement Learning with Online Self-Supervision
- Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching
- Constrained Offline Policy Optimization
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations
- Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
- Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
- Prompting Decision Transformer for Few-Shot Policy Generalization
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
- On the Role of Discount Factor in Offline Reinforcement Learning
- Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
- Representation Learning for Online and Offline RL in Low-rank MDPs
- Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Revisiting Design Choices in Model-Based Offline Reinforcement Learning
- DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
- POETREE: Interpretable Policy Learning with Adaptive Decision Trees
- Planning in Stochastic Environments with a Learned Model
- Offline Reinforcement Learning with Value-based Episodic Memory
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
- Learning Value Functions from Undirected State-only Experience
- Offline Reinforcement Learning with Implicit Q-Learning
- RvS: What is Essential for Offline RL via Supervised Learning?
- Pareto Policy Pool for Model-based Offline Reinforcement Learning
- CrowdPlay: Crowdsourcing Human Demonstrations for Offline Learning
- COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
- DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
- Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
- Generalized Decision Transformer for Offline Hindsight Information Matching
- Model-Based Offline Meta-Reinforcement Learning with Regularization
- AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale
- Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
- You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL
- S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning
- A Workflow for Offline Model-Free Robotic Reinforcement Learning
- Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes [[video](https://www.youtube.com/watch?v=BxOKPEtMuZw)] [[code](https://github.com/deepmind/rgb_stacking)]
- Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions
- Offline Reinforcement Learning with Representations for Actions
- Towards Off-Policy Learning for Ranking Policies with Logged Feedback
- TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets
- Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
- Model Selection in Batch Policy Optimization
- Learning Contraction Policies from Offline Data
- CoMPS: Continual Meta Policy Search
- MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
- Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks
- Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning
- Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning
- Batch Reinforcement Learning from Crowds
- SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning
- Safely Bridging Offline and Online Reinforcement Learning
- Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information
- Value Penalized Q-Learning for Recommender Systems
- Offline Reinforcement Learning with Soft Behavior Regularization
- Planning from Pixels in Environments with Combinatorially Hard Search Spaces
- StARformer: Transformer with State-Action-Reward Representations
- Offline RL With Resource Constrained Online Deployment
- Lifelong Robotic Reinforcement Learning by Retaining Experiences
- Dual Behavior Regularized Reinforcement Learning
- DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning [[code](https://github.com/DanielTakeshi/DCUR)]
- DROMO: Distributionally Robust Offline Model-based Policy Optimization
- Implicit Behavioral Cloning
- Reducing Conservativeness Oriented Offline Reinforcement Learning
- Policy Gradients Incorporating the Future
- Offline Decentralized Multi-Agent Reinforcement Learning
- OPAL: Offline Preference-Based Apprenticeship Learning
- Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning
- Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning
- The Least Restriction for Offline Reinforcement Learning
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
- Causal Reinforcement Learning using Observational and Interventional Data
- On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL
- On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
- Offline Reinforcement Learning as Anti-Exploration
- Corruption-Robust Offline Reinforcement Learning
- Offline Inverse Reinforcement Learning
- Heuristic-Guided Reinforcement Learning
- Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Model-Based Offline Planning with Trajectory Pruning
- InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
- MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
- Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
- Regularized Behavior Value Estimation
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
- Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
- Continuous Doubly Constrained Batch Reinforcement Learning
- Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
- Fast Rates for the Regret of Offline Reinforcement Learning
- Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
- Weighted Model Estimation for Offline Model-based Reinforcement Learning
- A Minimalist Approach to Offline Reinforcement Learning
- Conservative Offline Distributional Reinforcement Learning
- Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
- Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
- Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
- Offline Reinforcement Learning with Reverse Model-based Imagination
- Nearly Horizon-Free Offline Reinforcement Learning
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
- Online and Offline Reinforcement Learning by Planning with a Learned Model
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
- Offline RL Without Off-Policy Evaluation
- Offline Model-based Adaptable Policy Learning
- COMBO: Conservative Offline Model-Based Policy Optimization
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
- Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
- Bellman-consistent Pessimism for Offline Reinforcement Learning
- The Difficulty of Passive Learning in Deep Reinforcement Learning
- Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
- Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
- Is Pessimism Provably Efficient for Offline RL?
- Representation Matters: Offline Pretraining for Sequential Decision Making
- Offline Reinforcement Learning with Pseudometric Learning
- Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
- Offline Reinforcement Learning with Fisher Divergence Critic Regularization
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
- Vector Quantized Models for Planning
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
- Offline Meta-Reinforcement Learning with Advantage Weighting
- Model-Based Offline Planning
- Batch Reinforcement Learning Through Continuation Method
- Model-Based Visual Planning with Self-Supervised Functional Distances
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
- What are the Statistical Limits of Offline RL with Linear Function Approximation?
- Reset-Free Lifelong Learning with Skill-Space Planning
- Risk-Averse Offline Reinforcement Learning
- Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
- PAC-Bayesian Policy Evaluation for Reinforcement Learning
- Tree-Based Batch Mode Reinforcement Learning
- Efficient Self-Supervised Data Collection for Offline Robot Learning
- Boosting Offline Reinforcement Learning with Residual Generative Modeling
- BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
- Behavior Constraining in Weight Space for Offline Reinforcement Learning
- Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents
- Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
- Reinforcement Learning via Fenchel-Rockafellar Duality
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
- Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains
- Goal-conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion
- Semi-Supervised Reward Learning for Offline Reinforcement Learning
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
- A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
- Offline Reinforcement Learning from Images with Latent Space Models
- POPO: Pessimistic Offline Policy Optimization
- Reinforcement Learning with Videos: Combining Offline Observations with Interaction
- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
- Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- Batch Value-function Approximation with Only Realizability
- DRIFT: Deep Reinforcement Learning for Functional Software Testing
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
- Learning Dexterous Manipulation from Suboptimal Experts
- The Reinforcement Learning-Based Multi-Agent Cooperative Approach for the Adaptive Speed Regulation on a Metallurgical Pickling Line
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
- Offline Meta Learning of Exploration
- Hyperparameter Selection for Offline Reinforcement Learning
- Interpretable Control by Reinforcement Learning
- Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
- Critic Regularized Regression
- Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
- Conservative Q-Learning for Offline Reinforcement Learning [[code](https://github.com/aviralkumar2907/CQL)] [[blog](https://bair.berkeley.edu/blog/2020/12/07/offline/)]
- BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
- MOPO: Model-based Offline Policy Optimization
- MOReL: Model-Based Offline Reinforcement Learning
- Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
- Multi-task Batch Reinforcement Learning with Metric Learning
- Safe Policy Improvement with Baseline Bootstrapping
- Information-Theoretic Considerations in Batch Reinforcement Learning
- Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents
- Counterfactual Data Augmentation using Locally Factored Dynamics
- On Reward-Free Reinforcement Learning with Linear Function Approximation
- Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
- BRPO: Batch Residual Policy Optimization
- Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [[blog](https://bair.berkeley.edu/blog/2020/12/07/offline/)] [[code](https://github.com/avisingh599/cog)]
- Accelerating Reinforcement Learning with Learned Skill Priors
- PLAS: Latent Action Space for Offline Reinforcement Learning [[code](https://github.com/Wenxuan-Zhou/PLAS)]
- Scaling data-driven robotics with reward sketching and batch reinforcement learning
- Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
- Batch-Constrained Reinforcement Learning for Dynamic Distribution Network Reconfiguration
- Behavior Regularized Offline Reinforcement Learning
- Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction [[blog](https://bair.berkeley.edu/blog/2019/12/05/bear/)] [[code](https://github.com/aviralkumar2907/BEAR)]
- Off-Policy Deep Reinforcement Learning without Exploration
- Safe Policy Improvement with Soft Baseline Bootstrapping
- Importance Weighted Transfer of Samples in Reinforcement Learning
- Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
- Off-Policy Policy Gradient with State Distribution Correction
- Behavioral Cloning from Observation
- Diverse Exploration for Fast and Safe Policy Improvement
- Deep Exploration via Bootstrapped DQN
- Safe Policy Improvement by Minimizing Robust Baseline Regret
- Residential Demand Response Applications Using Batch Reinforcement Learning
- Structural Return Maximization for Reinforcement Learning
- Simultaneous Perturbation Algorithms for Batch Off-Policy Search
- Guided Policy Search
- Accelerating Online Reinforcement Learning with Offline Datasets
- Offline Contextual Bandits with Overparameterized Models
- Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards
- Value-Aided Conditional Supervised Learning for Offline RL
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
- DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
- Context-Former: Stitching via Latent Conditioned Sequence Modeling
- Adversarially Trained Actor Critic for offline CMDPs
- Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
- Solving Continual Offline Reinforcement Learning with Decision Transformer
- MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
- Reframing Offline Reinforcement Learning as a Regression Problem
- Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- Policy-regularized Offline Multi-objective Reinforcement Learning
- Differentiable Tree Search in Latent State Space
- Learning from Sparse Offline Datasets via Conservative Density Estimation
- Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
- PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
- SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
- On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling and Beyond
- Offline Reinforcement Learning as One Big Sequence Modeling Problem
- Pessimism for Offline Linear Contextual Bandits using ℓp Confidence Sets
- Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
- Curriculum Offline Reinforcement Learning
- Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
- Distance-Sensitive Offline Reinforcement Learning
- Exploiting Reward Shifting in Value-Based Deep RL
- Revisiting the Minimalist Approach to Offline Reinforcement Learning
- Robust Reinforcement Learning using Offline Data
-
Review/Survey/Position Papers
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- A Survey on Offline Model-Based Reinforcement Learning
- Foundation Models for Decision Making: Problems, Methods, and Opportunities
- A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- A Review of Off-Policy Evaluation in Reinforcement Learning
- On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
- Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization
- Offline Evaluation for Reinforcement Learning-based Recommendation: A Critical Issue and Some Alternatives
- A Survey on Transformers in Reinforcement Learning
- Deep Reinforcement Learning: Opportunities and Challenges
- A Survey on Model-based Reinforcement Learning
- Survey on Fair Reinforcement Learning: Theory and Practice
- Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
- A Survey of Generalisation in Deep Reinforcement Learning
-
Offline RL: Applications
- Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare
- Multi-objective Optimization of Notifications Using Offline Reinforcement Learning
- Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
- GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems
- RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System
- Compressive Features in Offline Reinforcement Learning for Recommender Systems
- Causal-aware Safe Policy Improvement for Task-oriented dialogue
- Offline Contextual Bandits for Wireless Network Optimization
- Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment
- Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement
- Medical Dead-ends and Learning to Identify High-risk States and Treatments
- An Offline Deep Reinforcement Learning for Maintenance Decision-Making
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
- Offline Reinforcement Learning for Visual Navigation
- Semi-Markov Offline Reinforcement Learning for Healthcare
- Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs
- Offline reinforcement learning with uncertainty for treatment strategies in sepsis
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
- Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles
- pH-RL: A personalization architecture to bring reinforcement learning to health practice
- DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning
- Personalization for Web-based Services using Offline Reinforcement Learning
- Safe Driving via Expert Guided Policy Optimization
- A General Offline Reinforcement Learning Framework for Interactive Recommendation
- Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms
- Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
- Learning robust driving policies without online exploration
- Network Intrusion Detection Based on Extended RBF Neural Network With Offline Reinforcement Learning
- Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation
- Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation
- An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare
- Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP
- End-to-end Offline Reinforcement Learning for Glycemia Control
- Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach
- Uncertainty-Aware Decision Transformer for Stochastic Driving Environments
- Advancing RAN Slicing with Offline Reinforcement Learning
- Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach
- Self-Driving Telescopes: Autonomous Scheduling of Astronomical Observation Campaigns with Offline Reinforcement Learning
- A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning
- Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets
- STEER: Unified Style Transfer with Expert Reinforcement
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning
- Offline Reinforcement Learning for Optimizing Production Bidding Policies
- Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills
- Robotic Offline RL from Internet Videos via Value-Function Pre-Training
- VAPOR: Holonomic Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning
- RLSynC: Offline-Online Reinforcement Learning for Synthon Completion
- Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World
- Reinforced Self-Training (ReST) for Language Modeling
- Aligning Language Models with Offline Reinforcement Learning from Human Feedback
- Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation
- Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills
- Improving Offline RL by Blending Heuristics
- IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control
- Robust Reinforcement Learning Objectives for Sequential Recommender Systems
- The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
- Remote Electrical Tilt Optimization via Safe Reinforcement Learning
- An Optimistic Perspective on Offline Reinforcement Learning [[blog](https://ai.googleblog.com/2020/04/an-optimistic-perspective-on-offline.html)]
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
- Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
- Human-centric Dialog Training via Offline Reinforcement Learning
- Batch Reinforcement Learning on the Industrial Benchmark: First Experiences
- Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning
- Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning
- Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
- Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning
- A Clustering-Based Reinforcement Learning Approach for Tailored Personalization of E-Health Interventions
- Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming
- End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient
- Policy Networks with Two-Stage Training for Dialogue Systems
- Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
- PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning
- Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure
- Offline Experience Replay for Continual Offline Reinforcement Learning
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning
- Data Might be Enough: Bridge Real-World Traffic Signal Control Using Offline Reinforcement Learning
- User Retention-oriented Recommendation with Decision Transformer
- Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning
- INVICTUS: Optimizing Boolean Logic Circuit Synthesis via Synergistic Learning and Search
- Learning-based MPC from Big Data Using Reinforcement Learning
- Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
- Learning Vision-based Robotic Manipulation Tasks Sequentially in Offline Reinforcement Learning Settings
- Winning Solution of Real Robot Challenge III
- Beyond Reward: Offline Preference-guided Policy Optimization
- Dialog Action-Aware Transformer for Dialog Policy Learning
- Can Offline Reinforcement Learning Help Natural Language Understanding?
- NeurIPS 2022 Competition: Driving SMARTS
- Controlling Commercial Cooling Systems Using Reinforcement Learning
- Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials
- Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning
- Learning-to-defer for sequential medical decision-making under uncertainty
- Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios
- DevFormer: A Symmetric Transformer for Context-Aware Device Placement
- On the Effectiveness of Offline RL for Dialogue Response Generation
- Bidirectional Learning for Offline Model-based Biological Sequence Design
- ChiPFormer: Transferable Chip Placement via Offline Decision Transformer
- Semi-Offline Reinforcement Learning for Optimized Text Generation
- Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement
- Offline RL for Natural Language Generation with Implicit Language Q Learning
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
- Dialogue Evaluation with Offline Reinforcement Learning
- Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems
- A Maintenance Planning Framework using Online and Offline Deep Reinforcement Learning
- BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion
- Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space
- Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective
- ARLO: A Framework for Automated Reinforcement Learning
- A Reinforcement Learning-based Volt-VAR Control Dataset and Testing Environment
- CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning
- Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes
- CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System
- A Conservative Q-Learning approach for handling distribution shift in sepsis treatment strategies
- Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning
- Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit
- Offline Reinforcement Learning for Mobile Notifications
- Offline Reinforcement Learning for Road Traffic Control
- Sustainable Online Reinforcement Learning for Auto-bidding
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning
- P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer
- Online Symbolic Music Alignment with Offline Reinforcement Learning
- BCORLE(λ): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market
-
Offline RL: Benchmarks/Experiments
- Real World Offline Reinforcement Learning with Realistic Data Source [[dataset](https://drive.google.com/drive/folders/1nyMPlbwkjsJ_FyMwVp9ynOvz_ykGtbA8)]
- Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets
- B2RL: An open-source Dataset for Building Batch Reinforcement Learning
- An Empirical Study of Implicit Regularization in Deep Offline RL
- Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
- ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning
- Pearl: A Production-ready Reinforcement Learning Agent
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
- Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
- Datasets and Benchmarks for Offline Safe Reinforcement Learning
- Improving and Benchmarking Offline Reinforcement Learning Algorithms
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning
- Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
- CORL: Research-oriented Deep Offline Reinforcement Learning Library
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
- Train Offline, Test Online: A Real Robot Learning Benchmark
- Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning
- The Challenges of Exploration for Offline Reinforcement Learning
- Offline Equilibrium Finding
- Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
- Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
- Dungeons and Data: A Large-Scale NetHack Dataset
- NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning
- A Closer Look at Offline RL Agents
- Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
- On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
- Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
- d3rlpy: An Offline Deep Reinforcement Learning Library
- Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning
- Interpretable performance analysis towards offline reinforcement learning: A dataset perspective
- Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
- RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
- Measuring Data Quality for Dataset Selection in Offline Reinforcement Learning
- Offline Reinforcement Learning Hands-On
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning
- RL Unplugged: Benchmarks for Offline Reinforcement Learning [[dataset](https://console.cloud.google.com/storage/browser/rl_unplugged?pli=1)]
- Benchmarking Batch Deep Reinforcement Learning Algorithms
-
Off-Policy Evaluation and Learning: Theory/Methods
- Off-Policy Evaluation of Slate Policies under Bayes Risk
- A Practical Guide of Off-Policy Evaluation for Bandit Problems
- Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model
- Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation
- Anytime-valid off-policy inference for contextual bandits
- Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency
- Off-Policy Evaluation in Embedded Spaces
- Safe Exploration for Efficient Policy Evaluation and Comparison
- High-Confidence Off-Policy (or Counterfactual) Variance Estimation
- Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
- Multiply Robust Off-policy Evaluation and Learning under Truncation by Death
- Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
- Policy-Adaptive Estimator Selection for Off-Policy Evaluation
- Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution
- Distributional Off-Policy Evaluation for Slate Recommendations
- Debiased Machine Learning and Network Cohesion for Doubly-Robust Differential Reward Models in Contextual Bandits
- Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces
- Offline Policy Evaluation with Out-of-Sample Guarantees
- Quantile Off-Policy Evaluation via Deep Conditional Generative Learning
- Debiased Off-Policy Evaluation for Recommendation Systems
- Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias
- Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
- Control Variates for Slate Off-Policy Evaluation
- Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings
- Optimal Off-Policy Evaluation from Multiple Logging Policies
- Off-policy Confidence Sequences
- Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting
- Off-Policy Evaluation Using Information Borrowing and Context-Based Switching
- Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation
- Robust On-Policy Data Collection for Data-Efficient Policy Evaluation
- Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits
- Off-Policy Risk Assessment in Contextual Bandits
- Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model
- Off-Policy Evaluation for Large Action Spaces via Embeddings [[video](https://youtu.be/Hrqhv-AsMRE)]
- Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
- Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions
- Conformal Off-Policy Prediction in Contextual Bandits
- Off-Policy Evaluation with Policy-Dependent Optimization Response
- Off-Policy Evaluation with Deficient Support Using Side Information
- Towards Robust Off-Policy Evaluation via Human Inputs
- Pessimistic Model Selection for Offline Deep Reinforcement Learning
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes
- Off-Policy Evaluation in Partially Observed Markov Decision Processes
- A Spectral Approach to Off-Policy Evaluation for POMDPs
- Projected State-action Balancing Weights for Offline Reinforcement Learning
- Active Offline Policy Selection
- Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
- Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions
- Doubly robust off-policy evaluation with shrinkage
- Adaptive Estimator Selection for Off-Policy Evaluation
- Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits
- Improving Offline Contextual Bandits with Distributional Robustness
- Balanced Off-Policy Evaluation in General Action Spaces
- Policy Evaluation with Latent Confounders via Optimal Balance
- On the Design of Estimators for Bandit Off-Policy Evaluation
- CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
- Policy Evaluation and Optimization with Continuous Treatments
- Confounding-Robust Policy Improvement
- Balanced Policy Evaluation and Learning
- Effective Evaluation using Logged Bandit Feedback from Multiple Loggers
- Off-policy Evaluation for Slate Recommendation
- Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
- Data-Efficient Policy Evaluation Through Behavior Policy Search
- Doubly Robust Policy Evaluation and Optimization
- Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
- Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits
- State-Action Similarity-Based Representations for Off-Policy Evaluation
- Off-Policy Evaluation for Human Feedback
- Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation
- An Instrumental Variable Approach to Confounded Off-Policy Evaluation
- Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes
- Distributional Offline Policy Evaluation with Predictive Error Guarantees
- The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
- Revisiting Bellman Errors for Offline Model Selection
- Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction
- Multiple-policy High-confidence Policy Evaluation
- Off-Policy Evaluation with Online Adaptation for Robot Exploration in Challenging Environments
- Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
- Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards
- When is Offline Policy Selection Sample Efficient for Reinforcement Learning?
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning
- Variational Latent Branching Model for Off-Policy Evaluation
- Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks
- Evaluation of Active Feature Acquisition Methods for Static Feature Settings
- Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
- Marginalized Importance Sampling for Off-Environment Policy Evaluation
- Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments
- Off-policy Evaluation in Doubly Inhomogeneous Environments
- Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
- π2vec: Policy Representations with Successor Features
- Conformal Off-Policy Evaluation in Markov Decision Processes
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation
- Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
- Minimax Weight Learning for Absorbing MDPs
- Improving Monte Carlo Evaluation with Offline Data
- First-order Policy Optimization for Robust Policy Evaluation
- A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
- On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
- Learning Bellman Complete Representations for Offline Policy Evaluation
- Supervised Off-Policy Ranking
- Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
- Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
- Oracle Inequalities for Model Selection in Offline Reinforcement Learning
- Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models
- Off-Policy Evaluation for Action-Dependent Non-stationary Environments
- Stateful Offline Contextual Policy Evaluation and Learning
- Off-Policy Risk Assessment for Markov Decision Processes
- Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information
- Offline Policy Evaluation and Optimization under Confounding
- Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies
- Safe Evaluation For Offline Learning: Are We Ready To Deploy?
- Low Variance Off-policy Evaluation with State-based Importance Sampling
- Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach
- Offline Estimation of Controlled Markov Chains: Minimax Nonparametric Estimators and Sample Efficiency
- Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks
- A Sharp Characterization of Linear Estimators for Offline Policy Evaluation
- A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets
- A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation
- SOPE: Spectrum of Off-Policy Estimators
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation
- Universal Off-Policy Evaluation
- Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
- Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
- State Relevance for Off-Policy Evaluation
- Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
- Deeply-Debiased Off-Policy Interval Estimation
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
- Minimax Model Learning
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
- On Instrumental Variable Regression for Deep Offline Policy Evaluation
- Average-Reward Off-Policy Policy Evaluation with Function Approximation
- Sequential causal inference in a single world of connected units
- Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
- CoinDICE: Off-Policy Confidence Interval Estimation
- Off-Policy Interval Estimation with Lipschitz Value Iteration
- Off-Policy Evaluation via the Regularized Lagrangian
- Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
- GenDICE: Generalized Offline Estimation of Stationary Values
- Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
- Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
- Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
- Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
- Minimax Weight and Q-Function Learning for Off-Policy Evaluation
- Accountable Off-Policy Evaluation With Kernel Bellman Statistics
- Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
- Batch Stationary Distribution Estimation
- Offline Policy Selection under Uncertainty
- Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
- Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
- Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning
- Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
- Off-Policy Evaluation in Partially Observable Environments
- Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
- Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
- Off-Policy Evaluation via Off-Policy Classification
- DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
- Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
- Batch Policy Learning under Constraints
- More Efficient Off-Policy Evaluation through Regularized Targeted Learning
- Combining parametric and nonparametric models for off-policy evaluation
- Generalizing Off-Policy Learning under Sample Selection Bias
- Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
- Importance Sampling Policy Evaluation with an Estimated Behavior Policy
- Representation Balancing MDPs for Off-policy Policy Evaluation
- Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
- More Robust Doubly Robust Off-policy Evaluation
- Importance Sampling for Fair Policy Selection
- Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing
- Consistent On-Line Off-Policy Evaluation
- Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
- Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
- Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
- High Confidence Policy Improvement
- High Confidence Off-Policy Evaluation
- Sequential Counterfactual Risk Minimization
- Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
- Multi-Task Off-Policy Learning from Bandit Feedback
- Exponential Smoothing for Off-Policy Learning
- Counterfactual Learning with General Data-generating Policies
- Distributionally Robust Policy Gradient for Offline Contextual Bandits
- Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
- Pessimistic Off-Policy Multi-Objective Optimization
- Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
- Uncertainty-Aware Off-Policy Learning
- Fair Off-Policy Learning from Observational Data
- Interpretable Off-Policy Learning via Hyperbox Search
- Offline Policy Optimization with Eligible Actions
- Towards Robust Off-policy Learning for Runtime Uncertainty
- Safe Optimal Design with Applications in Off-Policy Learning
- Distributionally Robust Policy Learning with Wasserstein Distance
- Local Policy Improvement for Recommender Systems
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
- Fast Offline Policy Optimization for Large Scale Recommendation
- Boosted Off-Policy Learning
- Semi-Counterfactual Risk Minimization Via Neural Networks
- IMO^3: Interactive Multi-Objective Off-Policy Optimization
- Pessimistic Off-Policy Optimization for Learning to Rank
- Non-Stationary Off-Policy Optimization
- Learning from eXtreme Bandit Feedback
- Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
- From Importance Sampling to Doubly Robust Policy Gradient
- Efficient Policy Learning from Surrogate-Loss Classification Reductions
- More Efficient Policy Learning via Optimal Retargeting
- Learning When-to-Treat Policies
- Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks
- Counterfactual Learning of Continuous Stochastic Policies
- Top-K Off-Policy Correction for a REINFORCE Recommender System
- Semi-Parametric Efficient Policy Learning with Continuous Actions
- Efficient Counterfactual Learning from Bandit Feedback
- Deep Learning with Logged Bandit Feedback
- The Self-Normalized Estimator for Counterfactual Learning
- Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
- Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control
- Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
- Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits
- Distributional Off-policy Evaluation with Bellman Residual Minimization
-
Off-Policy Evaluation and Learning: Applications
- Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
- HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
- When is Off-Policy Evaluation Useful? A Data-Centric Perspective
- Counterfactual Evaluation of Peer-Review Assignment Policies
- Balanced Off-Policy Evaluation for Personalized Pricing
- Multi-Action Dialog Policy Learning from Logged User Feedback
- CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong
- Reward Shaping for User Satisfaction in a REINFORCE Recommender
- Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
- Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
- Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
- Evaluating Reinforcement Learning Algorithms in Observational Health Settings
- Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems
-
Off-Policy Evaluation and Learning: Benchmarks/Experiments
- Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
- SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
- Offline Policy Comparison with Confidence: Benchmarks and Baselines
- Extending Open Bandit Pipeline to Simulate Industry Challenges
- Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation [[public dataset](https://research.zozo.com/data.html)]
- Evaluating the Robustness of Off-Policy Evaluation
- Benchmarks for Deep Off-Policy Evaluation
- Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
-
-
Open Source Software/Implementations
- NeoRL: Near Real-World Benchmarks for Offline Reinforcement Learning
- The Industrial Benchmark Offline RL Datasets
- SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection
- Open Bandit Pipeline: a research framework for bandit algorithms and off-policy evaluation [[dataset](https://research.zozo.com/data.html)]
- pyIEOE: Towards An Interpretable Evaluation for Offline Evaluation
- d3rlpy: An Offline Deep Reinforcement Learning Library
- MINERVA: An out-of-the-box GUI tool for data-driven deep reinforcement learning
- COBS: Caltech OPE Benchmarking Suite
- Minari
- CORL: Clean Offline Reinforcement Learning
- Benchmarks for Deep Off-Policy Evaluation
- DICE: The DIstribution Correction Estimation Library
- V-D4RL: Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
- RLDS: Reinforcement Learning Datasets
- OEF: Offline Equilibrium Finding
- ExORL: Exploratory Data for Offline Reinforcement Learning
- RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System
- ARLO: A Framework for Automated Reinforcement Learning
- RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising
- MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces
- A Reinforcement Learning-based Volt-VAR Control Dataset
-
-
Blog/Podcast
-
Blog
- Introducing completely free datasets for data-driven deep reinforcement learning
- Counterfactual Evaluation for Recommendation Systems
- Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
- D4RL: Building Better Benchmarks for Offline Reinforcement Learning
- Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning?
- Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning
- Offline (Batch) Reinforcement Learning: A Review of Literature and Applications
- Data-Driven Deep Reinforcement Learning
- Tackling Open Challenges in Offline Reinforcement Learning
- An Optimistic Perspective on Offline Reinforcement Learning
-
Podcast
- AI Trends 2023: Reinforcement Learning – RLHF, Robotic Pre-Training, and Offline RL with Sergey Levine
- Bandits and Simulators for Recommenders with Olivier Jeunen
- Sergey Levine on Robot Learning & Offline RL
- Off-Line, Off-Policy RL for Real-World Decision Making at Facebook
- Xianyuan Zhan | TalkRL: The Reinforcement Learning Podcast
- MOReL: Model-Based Offline Reinforcement Learning with Aravind Rajeswaran
- Trends in Reinforcement Learning with Chelsea Finn
- Nan Jiang | TalkRL: The Reinforcement Learning Podcast
- Scott Fujimoto | TalkRL: The Reinforcement Learning Podcast
-
-
Related Workshops
- CONSEQUENCES (RecSys 2023)
- CONSEQUENCES + REVEAL (RecSys 2022)
- Reinforcement Learning Day 2021
- Offline Reinforcement Learning (NeurIPS 2020)
- Reinforcement Learning from Batch Data and Simulation
- Reinforcement Learning for Real Life (RL4RealLife 2020)
- Safety and Robustness in Decision Making (NeurIPS 2019)
- Reinforcement Learning for Real Life (ICML 2019)
- Real-world Sequential Decision Making (ICML 2019)
- Offline Reinforcement Learning (NeurIPS 2022)
- Offline Reinforcement Learning (NeurIPS 2021)
- Reinforcement Learning for Real Life (ICML 2021)
-
-
Tutorials/Talks/Lectures
- Reinforcement Learning with Large Datasets: Robotics, Image Generation, and LLMs
- Representation Learning for Online and Offline RL in Low-rank MDPs
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
- Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
- Deep Reinforcement Learning with Real-World Data
- Planning with Reinforcement Learning
- Imitation learning vs. offline reinforcement learning
- Tutorial on the Foundations of Offline Reinforcement Learning
- Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances
- Offline Reinforcement Learning
- Offline Reinforcement Learning
- Fast Rates for the Regret of Offline Reinforcement Learning
- Bellman-consistent Pessimism for Offline Reinforcement Learning
- Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
- Is Pessimism Provably Efficient for Offline RL?
- Adaptive Estimator Selection for Off-Policy Evaluation
- What are the Statistical Limits of Offline RL with Linear Function Approximation?
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
- A Gentle Introduction to Offline Reinforcement Learning
- Principles for Tackling Distribution Shift: Pessimism, Adaptation, and Anticipation
- Offline Reinforcement Learning: Incorporating Knowledge from Data into RL
- Offline RL
- Learning a Multi-Agent Simulator from Offline Demonstrations
- Towards Reliable Validation and Evaluation for Offline RL
- Batch RL Models Built for Validation
- Offline Reinforcement Learning: From Algorithms to Practical Challenges
- Data Scalability for Robot Learning
- Statistically Efficient Offline Reinforcement Learning
- Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
- Beyond the Training Distribution: Embodiment, Adaptation, and Symmetry
- Combining Statistical methods with Human Input for Evaluation and Optimization in Batch Settings
- Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning
- Scaling Probabilistically Safe Learning to Robotics
- Deep Reinforcement Learning in the Real World
- Counterfactual Evaluation and Learning for Interactive Systems
-
-
Uncategorized
- Haruka Kiyohara
- Yuta Saito - kaso Co., Ltd. / Cornell University)
-