Show simple item record

dc.contributor.advisor: Chakravorty, Suman
dc.contributor.advisor: Kalathil, Dileep
dc.creator: Wang, Ran
dc.date.accessioned: 2023-05-26T17:54:27Z
dc.date.available: 2023-05-26T17:54:27Z
dc.date.created: 2022-08
dc.date.issued: 2022-06-22
dc.date.submitted: August 2022
dc.identifier.uri: https://hdl.handle.net/1969.1/197917
dc.description.abstract: The problem of Reinforcement Learning (RL) is equivalent to the search for an optimal feedback control policy from data, without knowledge of the system dynamics. Most RL techniques search over a complex global nonlinear parametrization, such as deep neural networks, which incurs drawbacks in training efficiency and solution variance. In this dissertation, we propose a decoupled data-based framework for RL/data-based control that is highly efficient, robust and optimal when compared to state-of-the-art RL approaches. The efforts are primarily in three directions: learning to control 1) efficiently and reliably, 2) for high-dimensional, nonlinear, complex systems with partial state observations, and 3) under process and sensing uncertainties. First, we propose a decoupling principle that leads to the decoupled data-based control (D2C) framework, which designs the open-loop optimal trajectory and the closed-loop feedback law separately to achieve high training efficiency. Its convergence to the global optimum is proved. Simulation results on benchmark examples show its significant advantages in training efficiency, training reliability and robustness to noise over state-of-the-art RL methods. Second, D2C is extended to partially observed problems using a suitably defined "information state", which is implemented using autoregressive–moving-average (ARMA) system identification. We show that the resulting solution is the global optimum and satisfies a generalized minimum principle for the partially observed problem. The extended D2C technique allows us to solve the optimal control problem for partially observed, high-dimensional and nonlinear robotic systems. Finally, we show that when learning to control in the fully observed case with process noise only, the extended D2C method converges to the global optimum. However, it is also shown that the method gives a biased result in the partially observed case with both process and measurement noise, where multiple rollouts need to be averaged to recover optimality.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Optimal Control
dc.subject: Partial Observation
dc.subject: Reinforcement Learning
dc.subject: Nonlinear Systems
dc.title: Decoupled Data-Based Control (D2C) for Complex Robotic Systems
dc.type: Thesis
thesis.degree.department: Aerospace Engineering
thesis.degree.discipline: Aerospace Engineering
thesis.degree.grantor: Texas A&M University
thesis.degree.name: Doctor of Philosophy
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Valasek, John
dc.contributor.committeeMember: Bhattacharya, Raktim
dc.type.material: text
dc.date.updated: 2023-05-26T17:54:27Z
local.etdauthor.orcid: 0000-0003-1698-5978

