Learning and Control with Delay and Safety Constraints in Networked Systems
Abstract
Many physical systems have underlying safety or capacity considerations that require that the control policy employed ensures the satisfaction of a set of constraints. For instance, in a data network, in addition to maximizing the utility of the users, the controller has to maintain necessary link capacities constraints. Such systems can often be modeled as a constrained Markov Decision Process (CMDP), but the model itself might have unknown or rapidly changing system parameters, which calls for a learning-based solution approach.
Our goal in this thesis is to develop Reinforcement Learning (RL) algorithms to learn a generic CMDP problem and explore applications to communication networks. Here, our goal is to characterize the relationship between constraints and the number of samples needed to ensure a desired level of accuracy. We explore two classes of algorithms, (i) algorithms based on Linear Programming (LP), (ii) algorithms based on a Lagrangian approach. Each of these classes is divided into two sub-classes according to sample collection process. On the one hand, we may collect samples uniformly across state-action pairs, and then develop a control policy based on these samples— called the generative model-based approach. On the other hand, we may collect samples in an online manner by applying a policy on the system, and then continually refining the policy as more samples become available—called the online learning approach. We characterize the sample complexity of the algorithms following both these approaches to obtain near-optimal policies.
We then consider the question of CMDPs in the context of data networks. We desire to solve the problem of serving real-time flows over a multi-hop wireless network. Each flow is composed of packets that have strict deadlines, and the goal is to maximize the weighted timely throughput of the system. Consistent with recent developments using mm-wave communications, we assume that the links are directional, but are lossy, and have unknown probabilities of successful packet transmission. An average link utilization budget constrains the system. The problem thus takes the form of a CMDP with an unknown transition kernel, and we develop new algorithms well suited for data network problems using the insights of RL algorithms for generic CMDPs.
Citation
Hasanzadezonuzy, Aria (2021). Learning and Control with Delay and Safety Constraints in Networked Systems. Doctoral dissertation, Texas A&M University. Available electronically from https : / /hdl .handle .net /1969 .1 /196276.