Performance Guarantee of a Sub-Optimal Policy for a Discrete Markov Decision Process and Its Application to a Robotic Surveillance Problem
Abstract
This dissertation deals with the development and analysis of sub-optimal decision
algorithms for a collection of robots that assist a remotely located operator in
perimeter surveillance. The operator is tasked with the classification of incursions
across the perimeter. Whenever there is an incursion into the perimeter, a nearby
Unattended Ground Sensor (UGS) signals an alert. A robot services the alert by
visiting the alert location, collecting evidence in the form of video imagery, and
transmitting it to the operator.
The accuracy of the operator's classification depends on the volume and freshness of
the information gathered and transmitted by the robots at the locations where incursions occur.
There are two competing needs for a robot: it must spend adequate time at an
alert location collecting evidence that aids the operator in accurate classification,
but it must also service other alerts as soon as possible so that the evidence it
collects remains relevant. The control problem is to determine the optimal amount of time a robot
must spend servicing an alert. The incursions are stochastic, and their statistics are
assumed to be known.
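To make this trade-off concrete, consider an illustrative model (an assumption for exposition, not necessarily the dissertation's exact formulation): dwelling for d time units at an alert yields evidence of value g(d), increasing and concave in d, while evidence relevance decays by a factor \beta \in (0,1) per unit time. A robot choosing its dwell time then weighs

\[ g(d) \quad \text{against} \quad \beta^{\,d + t_{\mathrm{travel}}}\, g(d'), \]

the decayed value of servicing the next alert after a travel time t_{\mathrm{travel}}: a longer dwell improves the current classification, but leaves staler evidence for the alerts still waiting.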
The control problem may be posed as a Markov Decision Problem (MDP). Dynamic
Programming (DP) provides the optimal policy for the MDP. However, because
of the "curse of dimensionality" of DP, computing the optimal policy is not practical in
many applications; for a perimeter surveillance problem with two robots and five
UGS locations, the number of states is on the order of billions. Approximate Dynamic
Programming (ADP) via Linear Programming (LP) provides a way to approximate
the value function and derive sub-optimal strategies. Using state partitioning
and ADP, this dissertation provides different LP formulations for upper and lower
bounds on the value function of the MDP, and shows the relationship between the LPs
and the MDP. The novel features of this dissertation are (1) the derivation of a tractable
lower bound via LP and state partitioning, (2) the construction of a sub-optimal policy
whose performance exceeds the lower bound, and (3) the derivation of an upper
bound using a non-linear programming formulation. Together, the upper and lower bounds
provide an approximation ratio for the value function. Finally, illustrative perimeter
surveillance examples corroborate the results derived in this dissertation.
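As background for the LP-based bounds, the following is the standard characterization of a discounted MDP's value function and its linear-programming form, given for orientation rather than as the dissertation's exact formulation. With states s, actions a, rewards r(s,a), transition probabilities P(s' \mid s,a), and discount factor \gamma \in (0,1), the optimal value function V^* satisfies the Bellman equation

\[ V^*(s) = \max_{a} \Bigl[\, r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \,\Bigr], \]

and is the unique optimal solution of the exact LP

\[ \min_{V} \ \sum_{s} c(s)\, V(s) \quad \text{s.t.} \quad V(s) \ \ge\ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \quad \forall\, s, a, \]

for any state-relevance weights c(s) > 0. Because any feasible V dominates V^* pointwise, restricting V to a partition-based architecture V(s) = \sum_{k} w_k\, \mathbf{1}\{s \in S_k\}, with one weight per partition block rather than one variable per state, keeps the LP tractable while its solution remains an upper bound on V^*. The tractable lower bound, and the sub-optimal policy whose performance exceeds it, are the dissertation's contributions and are not reproduced in this sketch.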
Citation
Park, Myoungkuk (2014). Performance Guarantee of a Sub-Optimal Policy for a Discrete Markov Decision Process and Its Application to a Robotic Surveillance Problem. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/152659.