Optimization Methods for Cluster Analysis in Network-based Data Mining
Abstract
This dissertation focuses on two optimization problems that arise in network-based data mining, both concerning the identification of basic community structures (clusters) in graphs: the maximum edge weight clique problem and the maximum induced cluster subgraph problem. We propose a continuous quadratic formulation for the maximum edge weight clique problem and establish the correspondence between its local optima and the maximal cliques of the graph. We then present a combinatorial branch-and-bound algorithm for this problem that takes advantage of a polynomial-time solvable nonconvex relaxation of the proposed formulation. We also introduce a linear-time-computable analytic upper bound on the clique number of a graph, as well as a new method of upper-bounding the maximum edge weight clique problem, which leads to another exact algorithm for this problem. For the maximum induced cluster subgraph problem, we present the results of a comprehensive polyhedral analysis. We derive several families of facet-defining valid inequalities for the independent union of cliques (IUC) polytope associated with a graph, and we provide a complete description of this polytope for some special classes of graphs. We establish the computational complexity of the separation problems for most of the considered families of valid inequalities, and explore the effectiveness of employing the corresponding cutting planes in an integer (linear) programming framework for the maximum induced cluster subgraph problem.
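To make the two problems named in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not any of the dissertation's algorithms (no continuous formulation, branch-and-bound, or cutting planes); it only states the problem definitions by exhaustive enumeration. The toy graph, its edge weights, and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy graph: a triangle {0, 1, 2} with a path 2-3-4 attached.
# Edge weights are illustrative assumptions, keyed by unordered vertex pair.
weight = {
    frozenset((0, 1)): 2,
    frozenset((0, 2)): 3,
    frozenset((1, 2)): 1,
    frozenset((2, 3)): 5,
    frozenset((3, 4)): 4,
}

# Build an adjacency map from the edge list.
adj = {}
for edge in weight:
    u, v = tuple(edge)
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def is_clique(vertices, adj):
    """True if every pair of the given vertices is adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def is_cluster_subgraph(vertices, adj):
    """True if the induced subgraph is a disjoint union of cliques,
    i.e. it contains no induced path on three vertices."""
    vs = set(vertices)
    for v in vs:
        nbrs = [u for u in adj[v] if u in vs]
        # Two non-adjacent neighbors of v form an induced 3-vertex path.
        if any(b not in adj[a] for a, b in combinations(nbrs, 2)):
            return False
    return True


def max_edge_weight_clique(adj, weight):
    """Enumerate all vertex subsets; keep the clique with the largest
    total edge weight. Exponential time, for illustration only."""
    nodes = sorted(adj)
    best, best_w = (), 0
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_clique(subset, adj):
                w = sum(weight[frozenset(e)] for e in combinations(subset, 2))
                if w > best_w:
                    best, best_w = subset, w
    return best, best_w


def max_induced_cluster_subgraph(adj):
    """Return a largest vertex subset inducing a cluster subgraph,
    searching from the largest subsets downward."""
    nodes = sorted(adj)
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if is_cluster_subgraph(subset, adj):
                return subset
    return ()


clique, clique_weight = max_edge_weight_clique(adj, weight)
cluster = max_induced_cluster_subgraph(adj)
print(clique, clique_weight)
print(cluster)
```

On this toy graph the heaviest clique is the triangle, even though the single edge between vertices 2 and 3 carries the largest individual weight, and the largest induced cluster subgraph is obtained by dropping vertex 3, which leaves the triangle plus an isolated vertex.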
Citation
Hosseinian, Seyedmohammadhos (2021). Optimization Methods for Cluster Analysis in Network-based Data Mining. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/193265.