Machine Learning for Subsurface Data Analysis: Applications in Outlier Detection, Signal Synthesis and Core & Completion Data Analysis
Abstract
The application of machine learning has become prominent in many fields and has captured the imagination of various industries. The development of data-driven algorithms and the ongoing digitization of subsurface geological measurements provide a world of opportunities to maximize the exploration and production of resources such as oil, gas, coal and geothermal energy. The current proliferation of data, together with the democratization of state-of-the-art processing technology and computation power, provides an avenue for both large and small industry players to make fuller use of their data and run more economical and efficient operations. The aim of this thesis is to discuss the development of robust data-driven methods and their effectiveness in providing insightful information about subsurface properties. The study opens with a brief overview of the current literature on the application of data-driven methods in the oil and gas industry.
Outlier detection can be a strenuous task when preprocessing data for data-driven modeling. The thesis assesses the efficacy of unsupervised outlier detection algorithms in various practical cases by comparing the performance of four outlier detection algorithms using appropriate metrics. Three cases were created, simulating noisy measurements, measurements from washout formations, and measurements from formations with several thin shale layers. It was observed that the Isolation Forest based model is efficient in detecting a wide range of outlier types, with balanced accuracy scores of 0.88, 0.93 and 0.96 for the respective cases, while the DBSCAN based model was effective at detecting outliers due to noisy measurements, with a balanced accuracy score of 0.93.
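For readers unfamiliar with the metric, the following is a minimal sketch of balanced accuracy for a binary outlier/inlier labeling, assuming the standard definition (the mean of the recall on outliers and the recall on inliers). The abstract does not state how the thesis computed the score, and the labels below are invented for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on outliers (label 1) and recall on inliers (label 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outliers caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # outliers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inliers kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inliers flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels (not thesis data): 10 samples, 2 true outliers.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

score = balanced_accuracy(y_true, y_pred)

# A detector that never flags anything is right on 8 of 10 samples,
# yet its balanced accuracy is only 0.5.
trivial_score = balanced_accuracy(y_true, [0] * len(y_true))
```

Because outliers are rare, plain accuracy rewards a detector that flags nothing; balanced accuracy weights both classes equally, which is presumably why it suits this comparison.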
NMR measurements provide a wealth of geological information for petrophysical analysis and can be key to accurately characterizing a reservoir; however, they are expensive and technically challenging to deploy. Research has shown that machine learning models can be effective in synthesizing some log data. However, predicting an NMR distribution, where each depth is represented by several bins, poses a different challenge. In this study, a Random Forest model was used to predict the NMR T1 distribution in a well from relatively inexpensive and readily available well logs, with an r2 score and corrected mean absolute percentage error of 0.14 and 0.84. The predictions fall within the margin of error. An index based on a quantile regression forest was proposed to evaluate the reliability of each prediction, giving the user more information on the accuracy of a prediction when no data is available to test the model, as will be the case in real-world applications. Using this method, engineers and geologists can obtain NMR-derived information from a well in which no NMR tool has been run, with a measure of reliability for each predicted sample/depth.
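The abstract does not define the reliability index, so the sketch below is only one hypothetical way such an index could be built: it measures how widely the individual trees of an ensemble disagree for a single depth and bin, and maps a narrow 5% to 95% spread to a value near 1. The function names, the `scale` parameter and the sample values are all assumptions. A true quantile regression forest derives quantiles from the training targets stored in the leaves rather than from per-tree predictions, so this is a simplification.

```python
from statistics import quantiles


def interval_width(tree_predictions):
    """Width of the 5%-95% interval of the per-tree predictions."""
    # n=20 yields 19 cut points: the first is the 5th percentile,
    # the last is the 95th. "inclusive" avoids extrapolating past the data.
    cuts = quantiles(tree_predictions, n=20, method="inclusive")
    return cuts[-1] - cuts[0]


def reliability_index(tree_predictions, scale):
    """Hypothetical index in [0, 1]: 1 when all trees agree,
    0 when the 5%-95% spread reaches or exceeds `scale`."""
    width = interval_width(tree_predictions)
    return max(0.0, 1.0 - width / scale)


# Invented per-tree predictions for one bin at two depths.
agreeing_trees = [0.50, 0.51, 0.49, 0.50, 0.52, 0.50, 0.49, 0.51]
scattered_trees = [0.20, 0.80, 0.50, 0.35, 0.65, 0.10, 0.90, 0.50]

high = reliability_index(agreeing_trees, scale=1.0)
low = reliability_index(scattered_trees, scale=1.0)
```

The point of such an index is that it needs no measured NMR data at prediction time: it is computed from the model's own spread, which matches the use case described above.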
Identifying sweet spots in unconventional formations can be the difference between an economically viable well and a money pit. In this study, clustering techniques in conjunction with feature extraction methods were used to identify potential sweet spots in the Sycamore formation. Elemental analysis of the clusters identified the carbonate concentration in Sycamore siltstones as the key marker for porosity, which explained why some layers had more production potential than others. Machine learning algorithms were also used to identify key parameters that affect the productivity of an unconventional well, using data from simulation software. Eleven completion parameters (lateral spacing, area (areal spacing), total vertical depth, lateral length, stages, perforation cluster, sand intensity, fluid intensity, pay thickness, fracture ½ length and fracture conductivity lateral length) were used to predict the EUR and IP90 using a random forest model, and the normalized mean decrease in impurity was used to identify the key parameter. Lateral length was identified as the key parameter for estimated ultimate recovery (EUR) and perforation clusters as the key parameter for higher IP90, with normalized mean decreases in impurity of 0.73 and 0.88, respectively.
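As an illustration of what a normalized mean decrease in impurity can look like, the sketch below rescales raw per-feature impurity decreases so that they sum to 1 and then ranks the features. Normalizing by the sum is an assumption (it is the convention scikit-learn uses for `feature_importances_`); the abstract does not say which normalization the thesis applied, and the raw values here are invented.

```python
def normalize_importances(raw):
    """Rescale raw impurity decreases so they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("total impurity decrease must be positive")
    return {name: value / total for name, value in raw.items()}


# Hypothetical raw impurity decreases for three of the completion
# parameters (illustrative numbers, not results from the thesis).
raw_mdi = {
    "lateral_length": 6.0,
    "sand_intensity": 3.0,
    "pay_thickness": 1.0,
}

normalized = normalize_importances(raw_mdi)

# Feature names ordered from most to least important.
ranked = sorted(normalized, key=normalized.get, reverse=True)
```

Under this convention, a value such as 0.73 would mean that a single feature accounts for 73% of the total impurity reduction across the forest.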
Machine learning methods can be integrated to optimize numerous industry workflows and therefore have huge potential in the oil and gas industry. They have found wide application in automating mundane tasks such as outlier detection, synthesizing pseudo-data when true data is not available, and providing more information on technical operations for sound decision making.
Citation
Osogba, Oghenekaro Jefferson (2020). Machine Learning for Subsurface Data Analysis: Applications in Outlier Detection, Signal Synthesis and Core & Completion Data Analysis. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/192757.