Big Data Algorithms with Limited Local Resources

Huang, Qin

The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.

View/ Open

HUANG-DISSERTATION-2022.pdf (693.9Kb)

Date

2022-07-26

Author

Huang, Qin

Metadata

Show full item record

Abstract

In this thesis, we will propose a new (theoretical) computational model to study massive data processing. Our model has limited “local" computing resources (e.g., a normal computer) that characterizes the computational power. Specifically, problems in our consideration have two pa-rameters N and k, where N is the input size and is assumed to be extremely large, and k is a parameter measuring the computational power. We point out that N and k are independent. Since N is huge, superlinear-time (e.g., quadratic-time) algorithms would be considered impractical. Therefore, we will study algorithms dealing with massive data that run in time nearly linear in N plus a polynomial of k and space polynomial in k. In particular, we will study algorithms that run in time O(N log k + kO(1)) and space O(kO(1)). Moreover, algorithms are not allowed to modify on the input data. We developed new algorithmic techniques that implement algorithms for solving well-known computational problems on the proposed model. In particular, we study graph matching problems and LINE COVER problem in this model. For the graph matching problems, we present a randomized algorithm that finds a k-matching in a general unweighted graph of size N in time O(N+k5/2), and a randomized algorithm that constructs a maximum weighted k-matching in a general weighted graph of size N in time O(N+k3 log k), where the size N is the number of vertices plus the number of edges of the graph. Both algorithms run in space O(k2) and succeed with very high probability. For the LINE COVER problem, which, given a set S of n points and a parameter k, asks for a set of k lines that cover the the points of S, we present two kernelization algorithms: the first one is a randomized algorithm that runs in time O(n log k +k2(log2 k)(log log k)2) and space O(k2 log2 k), and computes a kernel of size at most k2 with probability at least 1 – 2 ; the other algorithm is deterministic and computes a kernel of size at most k2 in time O(n log k + k3(log3 k) log log k). Both algorithms improve the running time of existing kernelization algorithms for LINE COVER. Moreover, we derive a time lower-bound Ω(n log k) for LINE COVER kernelization algorithms in the algebraic computation trees model, which shows that our randomized kernelization algorithm is nearly optimal. Finally, we study graph matching problems in the popular streaming model. We showed that the new algorithmic techniques developed in our proposed model can be applied to obtain better streaming algorithms for graph matching problems in the streaming model. In particular, we present streaming algorithms for graph matching problems in both the insert-only and dynamic streaming models. In the insert-only streaming model, we present a one-pass randomized algorithm that, with high probability, computes a maximum weighted k-matching in a general weighted graph in space O(k2) and with O(1) update time. In the dynamic streaming model, we present a one-pass randomized algorithm that, with high probability, finds a maximum weighted k-matching in a weighted graph in space O˜(Wk2) 1 and with O˜(1) update time, where W is the number of distinct edge weights. Both algorithms significantly improve on previous results in terms of update time while keeping the same space upper bound. Additionally, we show that any randomized streaming algorithm that finds a k-matching of an unweighted graph requires at least Ω(k2) bits and any randomized streaming algorithm that computes a maximum weighted k-matching of a weighted graph requires at least k2 ·Ω(W log W +1)) bits. Consequently, both algorithms are near-optimal in terms of space and update time complexities. We also proposed a one-pass randomized 1/(1 − )-approximation algorithm for P-WT-MATCHING in the dynamic streaming model that, with high probability, finds a k-matching in space O˜(k2−1 log W ) and with O˜(1) update time, where W  is the ratio of max weight to min weight. Finally, we considered a more general version of P-WT-MATCHING in the dynamic streaming model, in which the weight of an edge is allowed to change during the stream. We showed that the space needed is at least linear in the input size.

Citation

Huang, Qin (2022). Big Data Algorithms with Limited Local Resources. Doctoral dissertation, Texas A&M University. Available electronically from https : / /hdl .handle .net /1969 .1 /197991.