Optimal Abort Policy for Mission-Critical Systems Under Imperfect Condition Monitoring

Published Online:https://doi.org/10.1287/opre.2022.0643

Although most on-demand mission-critical systems are engineered to be reliable to support critical tasks, occasional failures may still occur during missions. To increase system survivability, a common practice is to abort the mission before an imminent failure. We consider optimal mission abort for a system whose deterioration follows a general three-state (normal, defective, failed) semi-Markov chain. The failure is assumed self-revealed, whereas the healthy and defective states have to be inferred from imperfect condition-monitoring data. Because of the non-Markovian process dynamics, optimal mission abort for this partially observable system is an intractable stopping problem. For a tractable solution, we introduce a novel tool of Erlang mixtures to approximate nonexponential sojourn times in the semi-Markov chain. This allows us to approximate the original process by a surrogate continuous-time Markov chain whose optimal control policy can be solved through a partially observable Markov decision process (POMDP). We show that the POMDP optimal policies converge almost surely to the optimal abort decision rules when the Erlang rate parameter diverges. This implies that the expected cost by adopting the POMDP solution converges to the optimal expected cost. Next, we provide comprehensive structural results on the optimal policy of the surrogate POMDP. Based on the results, we develop a modified point-based value iteration algorithm to numerically solve the surrogate POMDP. We further consider mission abort in a multitask setting where a system executes several tasks consecutively before a thorough inspection. Through a case study on an unmanned aerial vehicle, we demonstrate the capability of real-time implementation of our model, even when the condition-monitoring signals are generated with high frequency.

Funding: This work was supported in part by the National Science Foundation of China [Grants 72171037, 72471144, 72371161, and 72071071], Singapore MOE AcRF Tier 2 grants [Grants A-8001052-00-00 and A-8002472-00-00], and the Future Resilient Systems project supported by the National Research Foundation Singapore under its CREATE program.

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2022.0643.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.