ParU: A Task Based Parallel Multifrontal and Unsymmetric Sparse LU Factorization

Mahmoudi Aznaveh, Mohsen

dc.contributor.advisor	Davis, Tim
dc.creator	Mahmoudi Aznaveh, Mohsen
dc.date.accessioned	2023-09-18T16:38:34Z
dc.date.available	2023-09-18T16:38:34Z
dc.date.created	2022-12
dc.date.issued	2022-12-05
dc.date.submitted	December 2022
dc.identifier.uri	https://hdl.handle.net/1969.1/198633
dc.description.abstract	We are introducing a new sparse direct solver with a parallel multifrontal algorithm. The algorithm is designed by borrowing the symbolic analysis phase and some ideas from UMFPACK. UMFPACK is a part of the SuiteSparse package and appears as a built-in routine in MATLAB. Although the general idea of the algorithm is the same, we designed a new algorithm from scratch that is more amenable to parallelism. The new algorithm is right-looking multifrontal with rectangular fronts and uses the sparsest pivot like UMFPACK; however, its data structures are different, and the factorization is more coarse-grained. One of the most significant contributions of this work is the algorithm that cuts various dependencies of the data structure. The only dependency comes from the matrix’s pattern from the elimination tree. Independent tasks can start working in different computing cores using OpenMP tasking. Each task can call BLAS kernels, specifically matrix multiplication and triangular solve. Therefore, there is nested parallelism in this algorithm. To better manage hardware resources, we can exploit parallel BLAS only if we have more computing cores than tasks. In practice, the performance depends on the BLAS library and the input matrix. It is better to have a mixed strategy to have both parallel BLAS with parallel fronts. Therefore, our algorithm is less affected by parallel BLAS and shows good performance compared to UMFPACK. Data structure and memory management are different in ParU, so it needs less memory to solve a system. The current algorithm is implemented for a shared memory environment. However, the algorithm can be an excellent candidate to be implemented in a distributed environment. Moreover, while there are parallel BLAS calls (typically larger than what UMFPACK has), it is also an excel-lent candidate for using hardware accelerators like GPUs.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Sparse linear algebra
dc.subject	Matrix computation
dc.title	ParU: A Task Based Parallel Multifrontal and Unsymmetric Sparse LU Factorization
dc.type	Thesis
thesis.degree.department	Computer Science and Engineering
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Texas A&M University
thesis.degree.name	Doctor of Philosophy
thesis.degree.level	Doctoral
dc.contributor.committeeMember	Sarin, Vivek
dc.contributor.committeeMember	Song, Dezhen
dc.contributor.committeeMember	Foucart, Simon
dc.type.material	text
dc.date.updated	2023-09-18T16:38:35Z
local.etdauthor.orcid	0000-0003-4860-4762

Files in this item

Name:: MAHMOUDIAZNAVEH-DISSERTATION-2 ...
Size:: 1.313Mb
Format:: PDF

View/ Open

This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )

Show simple item record