Perturbation Feedback Approaches in Stochastic Optimal Control: Applications to Model-Based and Model-Free Problems in Robotics


Date

2019-10-18


Abstract

Decision making under uncertainty is an important problem in engineering that is traditionally approached differently in each of the stochastic optimal control, reinforcement learning, and motion planning disciplines. One prominent challenge common to all three is the ‘curse of dimensionality’, i.e., the complexity of the problem grows exponentially as the state dimension increases. As a consequence, traditional stochastic optimal control methods that attempt to obtain an optimal feedback policy for nonlinear systems are computationally intractable. This thesis explores the application of a near-optimal decoupling principle to obtain tractable solutions in both model-based and model-free problems in robotics. The thesis begins with the derivation of a near-optimal decoupling principle between the open-loop plan and the closed-loop linear feedback gains, based on an analysis of the second-order expansion of the cost-to-go function. This leads to a deterministic perturbation feedback control based solution to fully observable stochastic optimal control problems. Building on this idea of near-optimal decoupling, a model-based trajectory optimization algorithm called the ‘Trajectory-optimized Perturbation Feedback Controller’ (T-PFC) is proposed. Rather than aiming to solve for the general optimal policy, this algorithm first solves for an open-loop trajectory, followed by the feedback, which the algorithm obtains automatically from the open-loop plan. Its performance is compared against a set of baselines in several difficult robotic planning and control examples, which show near-identical performance to nonlinear model predictive control (NMPC) while requiring much less computational effort. Next, we turn to the model-free version of the problem, where a policy is learnt from data without incorporating a theoretical model of the system.
We present a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled ‘open loop - closed loop’ approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a closed-loop control is developed around this open-loop trajectory by linearizing the dynamics about this nominal trajectory. Because the dynamics are linearized, a linear quadratic regulator (LQR) based algorithm is used to demonstrate the closed-loop control. Simulation results suggest a significant reduction in training time compared to other state-of-the-art reinforcement learning algorithms. Finally, an alternative method for solving for the open-loop trajectory in D2C is presented (called ‘D2C-2.0’). Stemming from the idea of model-based ‘Differential Dynamic Programming’ (DDP), it has a second-order convergence property (under certain assumptions) and hence computes the solution significantly faster than the original D2C algorithm. An efficient way of sampling from the environment to convert it into a model-free algorithm is presented, along with suitable line-search and regularization schemes. Comparisons are made with the original version of D2C and a state-of-the-art reinforcement learning algorithm using a variety of examples in the MuJoCo simulator. In conclusion, the limitations of each of the above methods are discussed and, accordingly, some possible directions for future work are provided.
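
The decoupled ‘open loop - closed loop’ idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the pendulum simulator `step`, the helper names `linearize`, `tvlqr_gains` and `rollout`, the cost weights, and the fixed control sequence standing in for the result of the open-loop optimization are all assumptions made for illustration. The sketch linearizes a black-box simulator about a nominal trajectory by finite differences and computes time-varying LQR gains with a backward Riccati recursion.

```python
import numpy as np

def step(x, u, dt=0.05):
    """Hypothetical black-box simulator: damped pendulum, Euler-discretized."""
    theta, omega = x
    domega = -9.81 * np.sin(theta) - 0.1 * omega + u[0]
    return np.array([theta + dt * omega, omega + dt * domega])

def linearize(f, x, u, eps=1e-5):
    """Central finite-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    n, m = x.size, u.size
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        B[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B

def tvlqr_gains(f, x_nom, u_nom, Q, R, Qf):
    """Backward Riccati recursion for time-varying LQR gains about the nominal."""
    P = Qf
    gains = [None] * len(u_nom)
    for t in reversed(range(len(u_nom))):
        A, B = linearize(f, x_nom[t], u_nom[t])
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains[t] = K
    return gains

def rollout(f, x0, u_nom, x_nom=None, gains=None):
    """Simulate; if gains are given, apply u = u_nom - K (x - x_nom)."""
    x = np.array(x0, dtype=float)
    xs = [x]
    for t in range(len(u_nom)):
        u = u_nom[t].copy()
        if gains is not None:
            u = u - gains[t] @ (x - x_nom[t])
        x = f(x, u)
        xs.append(x)
    return np.array(xs)

# Stand-in for the open-loop plan (assumption: no optimization is done here).
T = 60
u_nom = np.array([[0.5 * np.sin(0.1 * t)] for t in range(T)])
x0 = np.array([0.5, 0.0])
x_nom = rollout(step, x0, u_nom)

# Closed-loop gains from the linearization about the nominal trajectory.
gains = tvlqr_gains(step, x_nom, u_nom,
                    Q=np.eye(2), R=0.1 * np.eye(1), Qf=10.0 * np.eye(2))

# Compare final deviation from the nominal for a perturbed initial state.
x0_pert = x0 + np.array([0.1, 0.0])
ol_err = np.linalg.norm(rollout(step, x0_pert, u_nom)[-1] - x_nom[-1])
fb_err = np.linalg.norm(
    rollout(step, x0_pert, u_nom, x_nom, gains)[-1] - x_nom[-1])
print(f"open-loop error: {ol_err:.4f}, with feedback: {fb_err:.4f}")
```

In this sketch the simulator is only ever queried through `step`, so the same gain computation would apply to any simulator with that call shape; the number of simulator calls per time step grows linearly with the state and control dimensions.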


Keywords

Reinforcement Learning, Stochastic Optimal Control, Motion Planning, Trajectory Optimization, Robotics
