eprintid: 19772
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/97/72
datestamp: 2024-06-04 14:19:30
lastmod: 2024-06-04 14:19:30
status_changed: 2024-06-04 14:15:47
type: article
metadata_visibility: show
creators_name: Otchere, D.A.
title: Fundamental error in tree-based machine learning model selection for reservoir characterisation
ispublished: pub
keywords: Forecasting; Forestry; Machine learning; Petroleum prospecting; Petroleum reservoir engineering; Sandstone; Statistics; Visualization, Extra-trees; Machine learning models; Machine learning techniques; Machine-learning; Model Selection; Permeability; Reservoir characterization; Statistical metric; Tree modeling; Tree-based, Data visualization, machine learning; permeability; reservoir characterization; visualization
note: cited By 2
abstract: Over the past two decades, machine learning techniques have been extensively used in predicting reservoir properties. While this approach has significantly contributed to the industry, selecting an appropriate model is still challenging for most researchers. Relying solely on statistical metrics to select the best model for a particular problem may not always be the most effective approach. This study encourages researchers to incorporate data visualization in their analysis and model selection process. To evaluate the suitability of different models in predicting horizontal permeability in the Volve field, wireline logs were used to train Extra-Trees, Ridge, Bagging, and XGBoost models. The Random Forest feature selection technique was applied to select the relevant logs as inputs for the models. Based on statistical metrics, the Extra-Trees model achieved the highest test accuracy of 0.996, RMSE of 19.54 mD, and MAE of 3.18 mD, with XGBoost coming in second. However, when the results were visualised, it was discovered that the XGBoost model was more suitable for the problem being tackled. The XGBoost model was a better predictor within the sandstone interval, while the Extra-Trees model was more appropriate in non-sandstone intervals. Since this study aims to predict permeability in the reservoir interval, the XGBoost model is the most suitable. These contrasting results demonstrate the importance of incorporating data visualisation techniques as an evaluation metric. Given the heterogeneity of the subsurface, relying solely on statistical metrics may not be sufficient to determine which model is best suited for a particular problem. © 2023 Sinopec Petroleum Exploration and Production Research Institute
date: 2024
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85168369473&doi=10.1016%2fj.engeos.2023.100229&partnerID=40&md5=1b6d5ad5a4de1116022a2f6df38be6f1
id_number: 10.1016/j.engeos.2023.100229
full_text_status: none
publication: Energy Geoscience
volume: 5
number: 2
refereed: TRUE
citation:   Otchere, D.A.  (2024) Fundamental error in tree-based machine learning model selection for reservoir characterisation.  Energy Geoscience, 5 (2).