Data Driven Approaches to Model Building: Applications to Energy Industries






George Box’s famous observation that “All models are wrong, but some are useful” is now widely known. Mathematical models can be built from a combination of first principles and available data. This work applies data-driven modelling approaches to two specific problems in the upstream (oil & gas extraction) and downstream (refining & chemicals) industries: (a) cementing of wells drilled for production of oil and gas from unconventional resources, such as shales; and (b) design of robust control-relevant models for oil refineries and chemical plants.

Shale gas production from horizontal wells faces potential problems related to gas leakage from the cemented annulus of the well into the air and water reserves, with obvious environmental and productivity implications. Whether a well leaks depends on several factors, including cement composition and preparation, the cementing process, and well conditions. A model that predicts ahead of time whether a cementing job will produce a non-leaking well would therefore be useful. Such a model could be based on first principles, but it would be extremely complicated. Alternatively, as done in this work, a model can be built using multivariate statistics and available data from several leaking and non-leaking wells cemented under sufficiently different scenarios. The resulting model has 35 input variables (in the broad categories of casing properties, cement and drilling mud properties, and operating conditions) and correctly classifies, with confidence, 81% of wells as leaking or non-leaking in cross-validation tests.

An advanced control system relies on a good control-relevant model, one that is not merely a good approximation of the actual process under control but also satisfies additional properties necessary for controller design. Control-relevant models are typically identified through industrial experiments whose design is considerably more involved than standard experiment design for parameter estimation. This part of the study addresses how to design control-relevant identification experiments when elements of the model are already known. A new theoretical framework is developed, and its significant advantages over standard methods are illustrated through numerical simulations. Several possibilities for future development are suggested.
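
As a rough illustration of what "correctly classified in cross-validation tests" means for the well-leakage model, the following minimal sketch estimates cross-validated accuracy for a two-class problem with 35 input variables. Everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and the nearest-centroid rule is a simple stand-in rather than the multivariate statistical method used in this work.

```python
import random

def make_synthetic_wells(n_wells=200, n_features=35, seed=0):
    """Generate synthetic data (NOT real well data): label 1 = leaking, 0 = non-leaking."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_wells):
        label = i % 2
        shift = 0.4 if label == 1 else 0.0  # assumed class separation
        X.append([rng.gauss(shift, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def cross_validated_accuracy(X, y, n_folds=5, seed=0):
    """Fraction of samples classified correctly when each is held out once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        # Fit the stand-in classifier on the training folds only.
        c0 = centroid([X[i] for i in train if y[i] == 0])
        c1 = centroid([X[i] for i in train if y[i] == 1])
        # Score it on the held-out fold.
        for i in test_idx:
            pred = 1 if sq_dist(X[i], c1) < sq_dist(X[i], c0) else 0
            correct += int(pred == y[i])
    return correct / len(X)

X, y = make_synthetic_wells()
accuracy = cross_validated_accuracy(X, y)
print(f"Cross-validated accuracy: {accuracy:.2f}")
```

The key point of the sketch is that each well is classified by a model fitted without it, so the reported fraction reflects performance on unseen cementing jobs rather than fit to the training data.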



Data-driven modeling, Energy Industries


Portions of this document appear in: Panjwani, Shyam, and Michael Nikolaou. "Ensuring integral controllability for robust multivariable control." Computers & Chemical Engineering 92 (2016): 172-179. And in: Panjwani, Shyam, and Michael Nikolaou. "Experiment design for control‐relevant identification of partially known stable multivariable systems." AIChE Journal 62, no. 9 (2016): 2986-3001. And in: Panjwani, Shyam, Jessica McDaniel, and Michael Nikolaou. "Improvement of zonal isolation in horizontal shale gas wells: A data-driven model-based approach." Journal of Natural Gas Science and Engineering 47 (2017): 101-113.