Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms

Belsare, Aditya Sanjay

dc.contributor.advisor	Liu, Jyh-Charn (Steve)
dc.contributor.advisor	Hu, Jiang
dc.creator	Belsare, Aditya Sanjay
dc.date.accessioned	2015-02-05T17:22:02Z
dc.date.available	2016-08-01T05:30:05Z
dc.date.created	2014-08
dc.date.issued	2014-05-27
dc.date.submitted	August 2014
dc.identifier.uri	https://hdl.handle.net/1969.1/153210
dc.description.abstract	Direct sparse solvers are traditionally known to be robust, yet difficult to parallelize. In circuit simulators they are an important bottleneck, because the key steps of LU factorization and forward-backward substitution are performed repeatedly to reach the solution. Only limited speedups have been obtained on multi-core CPUs and GPUs, owing to the strong data dependencies in these steps. With the advent of many-core coprocessors like the Intel Xeon Phi, with fewer yet powerful cores and wider vector units, sparse LU factorization can be optimized for higher speedups compared to traditional LU decomposition methods like the Gilbert-Peierls algorithm. In this thesis, we first establish Compressed Sparse Row (CSR) as the preferred data structure among popular sparse matrix representations for parallelizing sparse circuit solvers, irrespective of the architecture used. Next, we propose and implement a sparse circuit solver suited for parallelization on both the Nvidia GPU and the Intel Xeon Phi platform, which is amenable to vectorization and takes advantage of hardware support, if any, for gather-scatter operations. Finally, we analyze our implementation on multi-core, SIMD, and SIMT architectures, namely an Intel Xeon CPU, an Intel Xeon Phi coprocessor, and an Nvidia GPU respectively, each using the programming model suited to that platform, to determine the architecture best suited for parallelizing direct sparse matrix solvers. Our parallel sparse LU factorization achieves an average speedup of 7.18x on the Xeon Phi and 2.75x for the GPU implementation on a GTX 680, over an Intel 4-core i7 CPU, which is up to 13x faster than a single-threaded implementation.	en
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Sparse matrix solver	en
dc.subject	LU Factorization	en
dc.title	Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms	en
dc.type	Thesis	en
thesis.degree.department	Electrical and Computer Engineering	en
thesis.degree.discipline	Computer Engineering	en
thesis.degree.grantor	Texas A & M University	en
thesis.degree.name	Master of Science	en
thesis.degree.level	Masters	en
dc.contributor.committeeMember	Gratz, Paul V
dc.type.material	text	en
dc.date.updated	2015-02-05T17:22:02Z
local.embargo.terms	2016-08-01
local.etdauthor.orcid	0000-0003-1034-8433
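
The abstract names Compressed Sparse Row (CSR) storage and forward substitution as building blocks of the solver. The following is a minimal, sequential sketch of both, written for illustration only; it is not taken from the thesis, and the function names `to_csr` and `forward_substitute` are hypothetical. It assumes a lower-triangular matrix with a nonzero diagonal.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    Returns (values, col_idx, row_ptr), where row i occupies the slice
    values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def forward_substitute(values, col_idx, row_ptr, b):
    """Solve L x = b for a lower-triangular L stored in CSR.

    Illustrative assumption: every row stores a nonzero diagonal entry.
    """
    n = len(row_ptr) - 1
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                # x[j] must already be known: this is the data dependency
                # that makes the step hard to parallelize.
                acc -= values[k] * x[j]
            elif j == i:
                diag = values[k]
        if diag is None:
            raise ValueError(f"row {i} has no diagonal entry")
        x[i] = acc / diag
    return x


L = [[2, 0, 0],
     [1, 1, 0],
     [0, 3, 4]]
values, col_idx, row_ptr = to_csr(L)
x = forward_substitute(values, col_idx, row_ptr, [2, 3, 10])
```

Because each row's nonzeros are contiguous in `values`, a row can be traversed with unit stride, while reads of `x[j]` are indexed through `col_idx`, which is the kind of access pattern that gather operations serve.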

Files in this item

Name: BELSARE-THESIS-2014.pdf
Size: 461.2Kb
Format: PDF


This item appears in the following Collection(s)

Electronic Theses, Dissertations, and Records of Study (2002– )
Texas A&M University Theses, Dissertations, and Records of Study (2002– )
