Show simple item record

dc.contributor.advisorChen, Jianer
dc.creatorHuang, Qin
dc.date.accessioned2023-05-26T18:05:30Z
dc.date.created2022-08
dc.date.issued2022-07-26
dc.date.submittedAugust 2022
dc.identifier.urihttps://hdl.handle.net/1969.1/197991
dc.description.abstractIn this thesis, we will propose a new (theoretical) computational model to study massive data processing. Our model has limited “local" computing resources (e.g., a normal computer) that characterizes the computational power. Specifically, problems in our consideration have two pa-rameters N and k, where N is the input size and is assumed to be extremely large, and k is a parameter measuring the computational power. We point out that N and k are independent. Since N is huge, superlinear-time (e.g., quadratic-time) algorithms would be considered impractical. Therefore, we will study algorithms dealing with massive data that run in time nearly linear in N plus a polynomial of k and space polynomial in k. In particular, we will study algorithms that run in time O(N log k + kO(1)) and space O(kO(1)). Moreover, algorithms are not allowed to modify on the input data. We developed new algorithmic techniques that implement algorithms for solving well-known computational problems on the proposed model. In particular, we study graph matching problems and LINE COVER problem in this model. For the graph matching problems, we present a randomized algorithm that finds a k-matching in a general unweighted graph of size N in time O(N+k5/2), and a randomized algorithm that constructs a maximum weighted k-matching in a general weighted graph of size N in time O(N+k3 log k), where the size N is the number of vertices plus the number of edges of the graph. Both algorithms run in space O(k2) and succeed with very high probability. For the LINE COVER problem, which, given a set S of n points and a parameter k, asks for a set of k lines that cover the the points of S, we present two kernelization algorithms: the first one is a randomized algorithm that runs in time O(n log k +k2(log2 k)(log log k)2) and space O(k2 log2 k), and computes a kernel of size at most k2 with probability at least 1 – 2 ; the other algorithm is deterministic and computes a kernel of size at most k2 in time O(n log k + k3(log3 k) log log k). Both algorithms improve the running time of existing kernelization algorithms for LINE COVER. Moreover, we derive a time lower-bound Ω(n log k) for LINE COVER kernelization algorithms in the algebraic computation trees model, which shows that our randomized kernelization algorithm is nearly optimal. Finally, we study graph matching problems in the popular streaming model. We showed that the new algorithmic techniques developed in our proposed model can be applied to obtain better streaming algorithms for graph matching problems in the streaming model. In particular, we present streaming algorithms for graph matching problems in both the insert-only and dynamic streaming models. In the insert-only streaming model, we present a one-pass randomized algorithm that, with high probability, computes a maximum weighted k-matching in a general weighted graph in space O(k2) and with O(1) update time. In the dynamic streaming model, we present a one-pass randomized algorithm that, with high probability, finds a maximum weighted k-matching in a weighted graph in space O˜(Wk2) 1 and with O˜(1) update time, where W is the number of distinct edge weights. Both algorithms significantly improve on previous results in terms of update time while keeping the same space upper bound. Additionally, we show that any randomized streaming algorithm that finds a k-matching of an unweighted graph requires at least Ω(k2) bits and any randomized streaming algorithm that computes a maximum weighted k-matching of a weighted graph requires at least k2 ·Ω(W log W +1)) bits. Consequently, both algorithms are near-optimal in terms of space and update time complexities. We also proposed a one-pass randomized 1/(1 − )-approximation algorithm for P-WT-MATCHING in the dynamic streaming model that, with high probability, finds a k-matching in space O˜(k2−1 log W ) and with O˜(1) update time, where W  is the ratio of max weight to min weight. Finally, we considered a more general version of P-WT-MATCHING in the dynamic streaming model, in which the weight of an edge is allowed to change during the stream. We showed that the space needed is at least linear in the input size.
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.subjectbigdata
dc.subject(nearly) linear-time algorithm
dc.subjectspace complexity
dc.subjectgraph matching
dc.subjectline cover
dc.subjectrich lines
dc.subjectexact fitting
dc.subjectkernelization
dc.subjectrandomized algorithms
dc.subjectcomplexity lower bounds
dc.subjectalgebraic computation trees
dc.subjectstreaming algorithms
dc.subjectparameterized algorithms
dc.titleBig Data Algorithms with Limited Local Resources
dc.typeThesis
thesis.degree.departmentComputer Science and Engineering
thesis.degree.disciplineComputer Science
thesis.degree.grantorTexas A&M University
thesis.degree.nameDoctor of Philosophy
thesis.degree.levelDoctoral
dc.contributor.committeeMemberJiang, Anxiao
dc.contributor.committeeMemberKlappenecker, Andreas
dc.contributor.committeeMemberYan, Catherine
dc.type.materialtext
dc.date.updated2023-05-26T18:05:31Z
local.embargo.terms2024-08-01
local.embargo.lift2024-08-01
local.etdauthor.orcid0000-0002-2528-6733


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record