Show simple item record

dc.contributor.advisor  Tian, Chao
dc.creator  Zhou, Ruida
dc.date.accessioned  2023-10-12T13:56:22Z
dc.date.available  2023-10-12T13:56:22Z
dc.date.created  2023-08
dc.date.issued  2023-07-23
dc.date.submitted  August 2023
dc.identifier.uri  https://hdl.handle.net/1969.1/199859
dc.description.abstract  We study the use of information-theoretic measures in learning problems. The first problem considered is algorithm-dependent generalization error bounds. Conceptually, the mutual information between the output of a learning algorithm and the training samples captures the amount of information the algorithm has learned from the samples, which reflects overfitting; this has motivated studies of mutual information-based generalization error bounds. We propose the individually conditional individual mutual information (ICIMI) bound, based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou. It combines the merits of the existing approaches and provides a tighter bound; in the process of establishing this bound, we introduce a conditional decoupling lemma, which allows us to view the existing bounds in a unified framework. We further propose a stochastic chaining method, which applies the chaining technique in conjunction with information-theoretic measures to bound the generalization error. The stochastic chaining method borrows intuition from successive refinement and is more flexible than the previous deterministic chaining approach used with information-theoretic bounds. We finally provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian mean estimation problem. Despite considerable existing effort in deriving information-theoretic generalization error bounds, applying them to this simple setting, where the sample average is used as the estimate of the mean of Gaussian data, has not yielded satisfying results.

Besides capturing the interplay between learning algorithms and samples, information measures can also characterize the complexity of learning problems. We study the effect of reward variance heterogeneity in the approximate top-m arm identification problem. In this problem, the rewards of pulling each arm are sub-Gaussian with different variance proxies, and the agent must incorporate this knowledge to minimize the expected number of arm pulls needed to identify the m arms with the largest means in the probably approximately correct (PAC) sense. The worst-case sample complexity of this problem is characterized by a divide-and-conquer style algorithm and a matching lower bound. The sample complexity reveals that the effect of reward variance heterogeneity is quantified by an entropy-like function of the variances.

In addition to bounding and characterizing performance metrics, information measures can also facilitate the design of algorithms. We study policy optimization in multi-objective reinforcement learning and propose an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which systematically incorporates ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective Markov decision process (MDP) problems. The ARNPG framework introduces Kullback-Leibler divergences with changing anchors as regularization in the intermediate policy update, which enables acceleration and bridges the analysis between the policy gradient update and the incorporated first-order methods. Under softmax parameterization with exact gradients, the proposed algorithms inherit the advantages of the integrated first-order methods and are guaranteed to achieve Õ(1/T) global convergence without further assumptions on the underlying MDP. Experiments further demonstrate the superior performance of the proposed algorithms.
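For illustration only, a minimal sketch of an anchor-changing, KL-regularized policy update in the spirit of the ARNPG idea described above; the symbols $\eta$, $\lambda$, and the anchor policy $\bar{\pi}_t$ are assumptions for this sketch and may differ from the exact formulation in the dissertation:

% Illustrative sketch, not the dissertation's exact update rule.
\[
  \pi_{t+1} \;=\; \arg\max_{\pi \in \Pi}
  \Big\{ \big\langle \nabla_{\pi} V(\pi_t),\, \pi \big\rangle
  \;-\; \tfrac{1}{\eta}\, D_{\mathrm{KL}}\!\big(\pi \,\|\, \pi_t\big)
  \;-\; \lambda\, D_{\mathrm{KL}}\!\big(\pi \,\|\, \bar{\pi}_t\big) \Big\},
\]
where $\pi_t$ is the current policy, $\bar{\pi}_t$ is an anchor policy that changes across iterations, and $\eta$, $\lambda$ control the strength of the two KL regularizers.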
dc.format.mimetype  application/pdf
dc.language.iso  en
dc.subject  Information-theoretic measures
dc.subject  generalization error bound
dc.subject  best arm identification
dc.subject  reinforcement learning
dc.title  Information-Theoretic Measures in Selected Learning Problems
dc.type  Thesis
thesis.degree.department  Electrical and Computer Engineering
thesis.degree.discipline  Electrical Engineering
thesis.degree.grantor  Texas A&M University
thesis.degree.name  Doctor of Philosophy
thesis.degree.level  Doctoral
dc.contributor.committeeMember  Kalathil, Dileep
dc.contributor.committeeMember  Kumar, Panganamala R
dc.contributor.committeeMember  Liu, Tie
dc.contributor.committeeMember  Wang, Zhangyang
dc.type.material  text
dc.date.updated  2023-10-12T13:56:23Z
local.etdauthor.orcid  0000-0002-8855-2031

