RLDP: Reinforcement Learning Decision-Time Planner
Reinforcement learning (RL) is a state-of-the-art approach to solving sequential decision-making problems in stochastic environments. However, most model-free RL algorithms produce only one action at each timestep; that is, they give no indication of the actions they plan to take in the future. This lack of interpretability is a major hindrance to using reinforcement learning in real-life applications. Our work, Action Forecasting Reinforcement Learning (AFRL), generalizes these 0-step (single-action) RL algorithms to provide reliable n-step plans, with implications for improved interpretability and safety. We propose combining a dynamics transition model with a model-free agent to create a reliable plan. The plan that AFRL produces is shown to be consistent in that the agent will not change it unless necessary. Our experiments evaluate AFRL on a range of environments and RL algorithms to show that it is possible to produce reliable plans without sacrificing performance.
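The core idea of combining a dynamics model with a model-free agent can be sketched as rolling the model forward under the current policy to forecast the next n actions. The function and parameter names below (`policy`, `dynamics_model`, `forecast_plan`) are illustrative assumptions, not AFRL's actual interfaces:

```python
import numpy as np

def forecast_plan(state, policy, dynamics_model, n):
    """Roll a learned dynamics model forward under the policy to
    produce an n-step action plan from the current state.

    policy(state) -> action; dynamics_model(state, action) -> next state.
    (Hypothetical interfaces for illustration only.)
    """
    plan = []
    s = state
    for _ in range(n):
        a = policy(s)          # model-free agent picks an action
        plan.append(a)
        s = dynamics_model(s, a)  # predicted next state
    return plan

# Toy example: a 1-D point that steps toward the origin.
policy = lambda s: -np.sign(s)     # step toward 0
dynamics = lambda s, a: s + a      # deterministic toy transition
plan = forecast_plan(np.float64(3.0), policy, dynamics, n=3)
# plan -> [-1.0, -1.0, -1.0]
```

At each real timestep the agent would execute only the first action of the plan, re-forecasting as needed, which is where AFRL's consistency property (not changing the plan unless necessary) comes in.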
Coad, Josiah D (2022). RLDP: Reinforcement Learning Decision-Time Planner. Undergraduate Research Scholars Program. Available electronically from