
dc.creator: Andros, Jacob
dc.creator: Guhaniyogi, Rajarshi
dc.creator: Francom, Devin
dc.creator: Pasqualini, Donatella
dc.date.accessioned: 2024-05-01T21:20:04Z
dc.date.available: 2024-05-01T21:20:04Z
dc.date.issued: 2024-05-01
dc.identifier.uri: https://hdl.handle.net/1969.1/200959
dc.description.abstract: Realistic simulations are crucial for comprehending complex systems in climate and environmental studies. Yet, running sophisticated computational models across a wide range of input settings often overwhelms large computer systems. Statistical surrogate models, or emulators, play a vital role in efficiently exploring the simulator input space. Functional data models involving Gaussian processes (GPs) and their computationally efficient variants have become standard tools for achieving this goal. The conventional centralized processing of such models requires substantial computational and storage resources at the central server. To counter this, emerging distributed Bayesian learning frameworks partition raw data into shards and distribute computations of these shards across machines. While this strategy mitigates data storage costs and improves computation within each machine, concerns arise regarding the sensitivity of distributed inference to shard selection. Motivated by the concept of data sketching in the literature, this article proposes an innovative alternative. Instead of creating data shards, our approach employs multiple random matrices to construct multiple random linear projections, or "random sketches," of the complete dataset. Posterior inference on functional data models is performed using random sketches on various machines in parallel. These individual inferences are then combined across machines at a central server. By aggregating inference across diverse random matrices, our approach proves resilient to the selection of data sketches, leading to the development of a novel robust distributed Bayesian learning approach. An important advantage of our approach is its ability to maintain the privacy of sampling units, as the inference is based on random data sketches that do not allow the recovery of raw data.
We illustrate the significance of our approach through various simulated data examples in the realm of Bayesian distributed learning techniques. Finally, we demonstrate the performance of our proposed approach as an emulator with surrogates of the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) simulator, a simulator of choice for government agencies. [en_US]
dc.description.sponsorship: National Science Foundation, Los Alamos National Laboratories [en_US]
dc.language.iso: en_US [en_US]
dc.rights: CC0 1.0 Universal [*]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [*]
dc.title: Robust Distributed Learning of Functional Data From Simulators through Data Sketching [en_US]
dc.type: Technical Report [en_US]
local.department: Statistics [en_US]



Except where otherwise noted, this item's license is described as CC0 1.0 Universal