Big Data Algorithms with Limited Local Resources

Huang, Qin

The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period, even for Texas A&M users with NetID.

Show simple item record

dc.contributor.advisor	Chen, Jianer
dc.creator	Huang, Qin
dc.date.accessioned	2023-05-26T18:05:30Z
dc.date.created	2022-08
dc.date.issued	2022-07-26
dc.date.submitted	August 2022
dc.identifier.uri	https://hdl.handle.net/1969.1/197991
dc.description.abstract	In this thesis, we will propose a new (theoretical) computational model to study massive data processing. Our model has limited “local" computing resources (e.g., a normal computer) that characterizes the computational power. Specifically, problems in our consideration have two pa-rameters N and k, where N is the input size and is assumed to be extremely large, and k is a parameter measuring the computational power. We point out that N and k are independent. Since N is huge, superlinear-time (e.g., quadratic-time) algorithms would be considered impractical. Therefore, we will study algorithms dealing with massive data that run in time nearly linear in N plus a polynomial of k and space polynomial in k. In particular, we will study algorithms that run in time O(N log k + kO(1)) and space O(kO(1)). Moreover, algorithms are not allowed to modify on the input data. We developed new algorithmic techniques that implement algorithms for solving well-known computational problems on the proposed model. In particular, we study graph matching problems and LINE COVER problem in this model. For the graph matching problems, we present a randomized algorithm that finds a k-matching in a general unweighted graph of size N in time O(N+k5/2), and a randomized algorithm that constructs a maximum weighted k-matching in a general weighted graph of size N in time O(N+k3 log k), where the size N is the number of vertices plus the number of edges of the graph. Both algorithms run in space O(k2) and succeed with very high probability. For the LINE COVER problem, which, given a set S of n points and a parameter k, asks for a set of k lines that cover the the points of S, we present two kernelization algorithms: the first one is a randomized algorithm that runs in time O(n log k +k2(log2 k)(log log k)2) and space O(k2 log2 k), and computes a kernel of size at most k2 with probability at least 1 – 2 ; the other algorithm is deterministic and computes a kernel of size at most k2 in time O(n log k + k3(log3 k) log log k). Both algorithms improve the running time of existing kernelization algorithms for LINE COVER. Moreover, we derive a time lower-bound Ω(n log k) for LINE COVER kernelization algorithms in the algebraic computation trees model, which shows that our randomized kernelization algorithm is nearly optimal. Finally, we study graph matching problems in the popular streaming model. We showed that the new algorithmic techniques developed in our proposed model can be applied to obtain better streaming algorithms for graph matching problems in the streaming model. In particular, we present streaming algorithms for graph matching problems in both the insert-only and dynamic streaming models. In the insert-only streaming model, we present a one-pass randomized algorithm that, with high probability, computes a maximum weighted k-matching in a general weighted graph in space O(k2) and with O(1) update time. In the dynamic streaming model, we present a one-pass randomized algorithm that, with high probability, finds a maximum weighted k-matching in a weighted graph in space O˜(Wk2) 1 and with O˜(1) update time, where W is the number of distinct edge weights. Both algorithms significantly improve on previous results in terms of update time while keeping the same space upper bound. Additionally, we show that any randomized streaming algorithm that finds a k-matching of an unweighted graph requires at least Ω(k2) bits and any randomized streaming algorithm that computes a maximum weighted k-matching of a weighted graph requires at least k2 ·Ω(W log W +1)) bits. Consequently, both algorithms are near-optimal in terms of space and update time complexities. We also proposed a one-pass randomized 1/(1 − )-approximation algorithm for P-WT-MATCHING in the dynamic streaming model that, with high probability, finds a k-matching in space O˜(k2−1 log W ) and with O˜(1) update time, where W  is the ratio of max weight to min weight. Finally, we considered a more general version of P-WT-MATCHING in the dynamic streaming model, in which the weight of an edge is allowed to change during the stream. We showed that the space needed is at least linear in the input size.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	bigdata
dc.subject	(nearly) linear-time algorithm
dc.subject	space complexity
dc.subject	graph matching
dc.subject	line cover
dc.subject	rich lines
dc.subject	exact fitting
dc.subject	kernelization
dc.subject	randomized algorithms
dc.subject	complexity lower bounds
dc.subject	algebraic computation trees
dc.subject	streaming algorithms
dc.subject	parameterized algorithms
dc.title	Big Data Algorithms with Limited Local Resources
dc.type	Thesis
thesis.degree.department	Computer Science and Engineering
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Doctor of Philosophy
thesis.degree.level	Doctoral
dc.contributor.committeeMember	Jiang, Anxiao
dc.contributor.committeeMember	Klappenecker, Andreas
dc.contributor.committeeMember	Yan, Catherine
dc.type.material	text
dc.date.updated	2023-05-26T18:05:31Z
local.embargo.terms	2024-08-01
local.embargo.lift	2024-08-01
local.etdauthor.orcid	0000-0002-2528-6733

Files in this item

Name:: HUANG-DISSERTATION-2022.pdf
Size:: 693.9Kb
Format:: PDF

View/ Open

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )

Show simple item record

Big Data Algorithms with Limited Local Resources

Files in this item

This item appears in the following Collection(s)

Related items