
dc.contributor.advisor Ghrayeb, Ali
dc.contributor.advisor Abu-Rub, Haithem
dc.creator Zainab, Ameema
dc.date.accessioned 2022-07-27T16:39:29Z
dc.date.available 2023-12-01T09:21:47Z
dc.date.created 2021-12
dc.date.issued 2021-11-24
dc.date.submitted December 2021
dc.identifier.uri https://hdl.handle.net/1969.1/196325
dc.description.abstract The proliferation of smart meters in power grids has produced an explosion of large energy datasets. Processing such big data is challenging and usually takes longer than a short-term load forecast allows. In the era of big data, where information is a key factor in decision making, this study draws attention to the need for data management in smart grids. For a utility to plan resources accurately and balance electricity supply and demand, accurate and timely forecasting is required. Machine learning algorithms have been applied intensively to load forecasting because they achieve better accuracy than traditional statistical methods. However, as data size grows, more sophisticated algorithms are needed, and these require big data platforms with adequate computational resources. The available resources are used most effectively when the computational nodes of a big data platform are kept efficiently utilized, so parallel computing is needed to handle smart grid big data. This research addresses these concerns by deploying parallel computing capabilities to minimize execution time while maintaining highly accurate load forecasting models. It uses multi-node and multi-core processing, mapping simultaneous jobs to available processors, to minimize the overall execution time of the forecasting models while ensuring acceptable accuracy. The results demonstrate the efficacy of the proposed approach through real-time adoption of machine learning (ML) models, reduced execution time, and enhanced scalability. They also show that tree-based models outperform the other models, achieving a trade-off between model accuracy and execution time.
The proposed approach is validated on real big data provided by Iberdrola, a Spanish utility company. The data is acquired from one hundred thousand different data sources in the electrical distribution system and amounts to approximately 2.2 billion records. To enhance the analysis further, a master-slave parallel computing paradigm for load forecasting is deployed and experimentally verified. The work proposes a concurrent job scheduling algorithm for a multi-energy data source environment using Apache Spark. An efficient resource utilization strategy is developed for optimizing multiple Spark jobs to reduce job completion time. A clustering method is implemented to group the electrical distribution nodes into clusters, which reduces the number of required forecasting models and further reduces computational time.
dc.format.mimetype application/pdf
dc.language.iso en
dc.subject Big Data
dc.subject Load Forecasting
dc.subject Machine Learning
dc.title Real-Time Big Data Platform for Distributed Energy Load Forecasting with Computing Approaches
dc.type Thesis
thesis.degree.department Electrical and Computer Engineering
thesis.degree.discipline Electrical Engineering
thesis.degree.grantor Texas A&M University
thesis.degree.name Doctor of Philosophy
thesis.degree.level Doctoral
dc.contributor.committeeMember Bouhali, Othmane
dc.contributor.committeeMember Masad, Eyad
dc.contributor.committeeMember Serpedin, Erchin
dc.contributor.committeeMember Xie, Le
dc.type.material text
dc.date.updated 2022-07-27T16:39:30Z
local.embargo.terms 2023-12-01
local.etdauthor.orcid 0000-0002-3754-4162

