Real-Time Big Data Platform for Distributed Energy Load Forecasting with Computing Approaches
Abstract
The proliferation of smart meters in power grids has resulted in an explosion of large energy datasets. Processing such big data is challenging and often takes longer than a short-term load forecast allows. In the era of big data, where information is one of the key factors in decision-making, this study draws attention to the need for data management in smart grids. To plan resources accurately and balance electricity supply and demand, a utility requires accurate and timely forecasts. Machine learning algorithms have been applied extensively to load forecasting to obtain better accuracy than traditional statistical methods. However, as data volumes grow, more sophisticated algorithms are needed, and these require big data platforms with adequate computational resources. Making optimal use of those resources means maximizing the utilization of the platform's computational nodes, so parallel computing is required to handle smart grid big data efficiently. This research addresses these concerns by deploying parallel computing capabilities to minimize execution time while maintaining highly accurate load forecasting models. It uses multi-node and multi-core processing, mapping simultaneous jobs to the available processors, to reduce the overall execution time of the forecasting models while ensuring acceptable accuracy. The results demonstrate the efficacy of the proposed approach through the real-time adoption of machine learning (ML) models, reduced execution time, and improved scalability. They also show that tree-based models outperformed the other models, striking a balance between model accuracy and execution time. The proposed approach is validated on real big data provided by Iberdrola, a Spanish utility company.
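The idea of mapping simultaneous forecasting jobs to the available processors to cut overall execution time can be illustrated with a small scheduler. The abstract does not state the scheduling rule, so this sketch swaps in the classic longest-processing-time-first (LPT) greedy heuristic: sort jobs by estimated cost and always give the next job to the least-loaded worker. The job names, cost estimates, and the `lpt_schedule` function are hypothetical; in practice such a plan could be handed to worker processes or Spark executors.

```python
# Illustrative sketch: the LPT heuristic is an assumed stand-in, not necessarily
# the scheduling algorithm used in the dissertation.
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(job_costs: Dict[str, float], n_workers: int) -> Tuple[List[List[str]], float]:
    """Assign independent jobs to workers with the longest-processing-time-first heuristic.

    job_costs maps a (hypothetical) job name to its estimated run time.
    Returns the jobs assigned to each worker and the makespan, i.e. the
    finish time of the most heavily loaded worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    assignments: List[List[str]] = [[] for _ in range(n_workers)]
    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    loads: List[Tuple[float, int]] = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(loads)
    # Longest jobs first; ties are broken by name so the plan is deterministic.
    for job, cost in sorted(job_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(loads)
        assignments[worker].append(job)
        heapq.heappush(loads, (load + cost, worker))
    makespan = max(load for load, _ in loads)
    return assignments, makespan


if __name__ == "__main__":
    # Hypothetical per-cluster forecasting jobs with estimated run times (minutes).
    jobs = {"cluster_A": 7, "cluster_B": 5, "cluster_C": 4,
            "cluster_D": 3, "cluster_E": 3, "cluster_F": 2}
    plan, makespan = lpt_schedule(jobs, n_workers=2)
    for worker, assigned in enumerate(plan):
        print(f"worker {worker}: {assigned}")
    print(f"makespan={makespan}, sequential={sum(jobs.values())}")
```

With two workers, the six hypothetical jobs finish at time 12 instead of 24 when run one after another. LPT is cheap and works well for independent jobs, but it is a heuristic and is not always optimal.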
The data were acquired from one hundred thousand different data sources in the electrical distribution system and amount to approximately 2.2 billion records. To extend the analysis, a master-slave parallel computing paradigm for load forecasting is deployed and experimentally verified. The work proposes a concurrent job scheduling algorithm for a multi-energy data source environment using Apache Spark, together with an efficient resource utilization strategy that optimizes multiple Spark jobs to reduce job completion time. Finally, a clustering method groups the electrical distribution nodes into clusters, which reduces the number of forecasting models required and, in turn, the computational time.
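The node-clustering step can be sketched with k-means on peak-normalized load profiles. The abstract does not name the clustering algorithm or the features it uses, so k-means, peak normalization, farthest-first initialization, the sample profiles, and the helper names (`normalize`, `kmeans`, `group_nodes`) are all assumptions for illustration. The payoff is the one the text describes: nodes with similar consumption shapes share one forecasting model, so the number of models drops from one per node to one per cluster.

```python
# Illustrative sketch: k-means on peak-normalized profiles is an assumed stand-in;
# the dissertation's clustering method and features may differ.
from typing import Dict, List, Sequence, Tuple


def normalize(profile: Sequence[float]) -> List[float]:
    """Divide a load profile by its peak so nodes of different size but similar shape look alike."""
    peak = max(profile)
    return [x / peak for x in profile] if peak > 0 else [0.0] * len(profile)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means with deterministic farthest-first initialization."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-first init: start from the first point, then repeatedly add the
    # point whose nearest existing centroid is farthest away.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def group_nodes(profiles: Dict[str, Sequence[float]], k: int) -> Dict[int, List[str]]:
    """Group distribution nodes by load-profile shape; each group would get one forecasting model."""
    names = list(profiles)
    labels, _ = kmeans([normalize(profiles[n]) for n in names], k)
    groups: Dict[int, List[str]] = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical load profiles: n1-n3 peak early, n4-n6 peak late.
    nodes = {
        "n1": [10, 50, 20, 10], "n2": [2, 10, 4, 2], "n3": [1, 4.5, 2, 1],
        "n4": [10, 10, 20, 50], "n5": [3, 3, 6, 14], "n6": [1, 1, 2, 5],
    }
    print(group_nodes(nodes, k=2))  # → {0: ['n1', 'n2', 'n3'], 1: ['n4', 'n5', 'n6']}
```

Here six hypothetical nodes collapse into two groups, so two models would be trained instead of six. Normalizing by the peak groups nodes by the shape of their demand rather than its magnitude.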
Citation
Zainab, Ameema (2021). Real-Time Big Data Platform for Distributed Energy Load Forecasting with Computing Approaches. Doctoral dissertation, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/196325.