dc.contributor.advisor: Liu, Jyh-Charn (Steve)
dc.contributor.advisor: Hu, Jiang
dc.creator: Belsare, Aditya Sanjay
dc.date.accessioned: 2015-02-05T17:22:02Z
dc.date.available: 2016-08-01T05:30:05Z
dc.date.created: 2014-08
dc.date.issued: 2014-05-27
dc.date.submitted: August 2014
dc.identifier.uri: https://hdl.handle.net/1969.1/153210
dc.description.abstract: Direct sparse solvers are traditionally known to be robust, yet difficult to parallelize. In circuit simulators they are an important bottleneck, because the key steps of LU factorization and forward-backward substitution are performed repeatedly to reach the solution. Only limited speedups have been obtained on multi-core CPUs and GPUs, owing to the strong data dependencies in these steps. With the advent of many-core coprocessors like the Intel Xeon Phi, with fewer yet powerful cores and wider vector units, sparse LU factorization can be optimized for higher speedups compared to traditional LU decomposition methods like the Gilbert-Peierls algorithm. In this thesis, we first establish Compressed Sparse Row (CSR) as the preferred data structure, among other popular sparse matrix representations, for parallelizing sparse circuit solvers, irrespective of the architecture used. Next, we propose and implement a sparse circuit solver suited for parallelization on both the Nvidia GPU and Intel Xeon Phi platforms, which is amenable to vectorization and takes advantage of hardware support, if any, for gather-scatter operations. Finally, we analyze our implementation on multi-core, SIMD, and SIMT architectures, namely an Intel Xeon CPU, an Intel Xeon Phi coprocessor, and an Nvidia GPU respectively, each using a programming model suited to the platform, to determine the architecture best suited for parallelizing direct sparse matrix solvers. Our parallel sparse LU factorization achieves an average speedup of 7.18x on the Xeon Phi and 2.75x in the case of the GPU implementation on a GTX 680 over an Intel 4-core i7 CPU, which is up to 13x faster than a single-threaded implementation. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Sparse matrix solver [en]
dc.subject: LU Factorization [en]
dc.title: Sparse LU Factorization for Large Circuit Matrices on Heterogenous Parallel Computing Platforms [en]
dc.type: Thesis [en]
thesis.degree.department: Electrical and Computer Engineering [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Texas A & M University [en]
thesis.degree.name: Master of Science [en]
thesis.degree.level: Masters [en]
dc.contributor.committeeMember: Gratz, Paul V
dc.type.material: text [en]
dc.date.updated: 2015-02-05T17:22:02Z
local.embargo.terms: 2016-08-01
local.etdauthor.orcid: 0000-0003-1034-8433

