The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.
Big Data Algorithms with Limited Local Resources
Abstract
In this thesis, we will propose a new (theoretical) computational model to study massive data processing. Our model has limited “local" computing resources (e.g., a normal computer) that characterizes the computational power. Specifically, problems in our consideration have two pa-rameters N and k, where N is the input size and is assumed to be extremely large, and k is a parameter measuring the computational power. We point out that N and k are independent. Since N is huge, superlinear-time (e.g., quadratic-time) algorithms would be considered impractical. Therefore, we will study algorithms dealing with massive data that run in time nearly linear in N plus a polynomial of k and space polynomial in k. In particular, we will study algorithms that run in time O(N log k + kO(1)) and space O(kO(1)). Moreover, algorithms are not allowed to modify on the input data.
We developed new algorithmic techniques that implement algorithms for solving well-known computational problems on the proposed model. In particular, we study graph matching problems and LINE COVER problem in this model. For the graph matching problems, we present a randomized algorithm that finds a k-matching in a general unweighted graph of size N in time O(N+k5/2), and a randomized algorithm that constructs a maximum weighted k-matching in a general weighted graph of size N in time O(N+k3 log k), where the size N is the number of vertices plus the number of edges of the graph. Both algorithms run in space O(k2) and succeed with very high probability. For the LINE COVER problem, which, given a set S of n points and a parameter k, asks for a set of k lines that cover the the points of S, we present two kernelization algorithms: the first one is a randomized algorithm that runs in time O(n log k +k2(log2 k)(log log k)2) and space O(k2 log2 k), and computes a kernel of size at most k2 with probability at least 1 – 2 ; the other algorithm is deterministic and computes a kernel of size at most k2 in time O(n log k + k3(log3 k) log log k). Both algorithms improve the running time of existing kernelization algorithms for LINE COVER. Moreover, we derive a time lower-bound Ω(n log k) for LINE COVER kernelization algorithms in the algebraic computation trees model, which shows that our randomized kernelization algorithm is nearly optimal.
Finally, we study graph matching problems in the popular streaming model. We showed that the new algorithmic techniques developed in our proposed model can be applied to obtain better streaming algorithms for graph matching problems in the streaming model. In particular, we present streaming algorithms for graph matching problems in both the insert-only and dynamic streaming models. In the insert-only streaming model, we present a one-pass randomized algorithm that, with high probability, computes a maximum weighted k-matching in a general weighted graph in space O(k2) and with O(1) update time. In the dynamic streaming model, we present a one-pass randomized algorithm that, with high probability, finds a maximum weighted k-matching in a weighted graph in space O˜(Wk2) 1 and with O˜(1) update time, where W is the number of distinct edge weights. Both algorithms significantly improve on previous results in terms of update time while keeping the same space upper bound. Additionally, we show that any randomized streaming algorithm that finds a k-matching of an unweighted graph requires at least Ω(k2) bits and any randomized streaming algorithm that computes a maximum weighted k-matching of a weighted graph requires at least k2 ·Ω(W log W +1)) bits. Consequently, both algorithms are near-optimal in terms of space and update time complexities. We also proposed a one-pass randomized 1/(1 − )-approximation algorithm for P-WT-MATCHING in the dynamic streaming model that, with high probability, finds a k-matching in space O˜(k2−1 log W ) and with O˜(1) update time, where W is the ratio of max weight to min weight. Finally, we considered a more general version of P-WT-MATCHING in the dynamic streaming model, in which the weight of an edge is allowed to change during the stream. We showed that the space needed is at least linear in the input size.
Subject
bigdata(nearly) linear-time algorithm
space complexity
graph matching
line cover
rich lines
exact fitting
kernelization
randomized algorithms
complexity lower bounds
algebraic computation trees
streaming algorithms
parameterized algorithms
Citation
Huang, Qin (2022). Big Data Algorithms with Limited Local Resources. Doctoral dissertation, Texas A&M University. Available electronically from https : / /hdl .handle .net /1969 .1 /197991.
Related items
Showing items related by title, author, creator and subject.
-
Morsy, Hazem Kamal (Texas A&M University, 1997)Distributed systems have become a popular computing environment. Due to their high potentials in improving performance and resource sharing, the evolution and maturing of technologies such as networks and computer hardware, ...
-
Lin, Shyh-Horng (Texas A&M University, 1993)The fast progressing of Integrated Circuit demands a highly efficient Automatic Test Pattern Generation algorithm. The problem in ATPG is how to minimize the huge redundant space in a highly reconvergent fanout circuit. ...
-
Kim, Mansuck; Yoon, Byung-Jun (BMC Genomics, 2012)