Online Planning in Nonstationary Environments

Published Online:https://doi.org/10.1287/opre.2022.0604

A central issue in (finite horizon) online planning problems is to synthesize the impact of real-time decisions on the subsequent states of the system and the performance in the remaining time horizon (cost-to-go function). A complete resolution often leads to intractable dynamic programming problems. We propose a computationally efficient approach to this problem that attains near-optimal performance in nonstationary environments. More specifically, we study a general class of online planning problems with concave objective functions and convex (global) feasibility constraints. A wide range of operational problems (e.g., coupon assignment with multiple objectives, order fulfillment with service level requirements, and resource allocation with budget constraints) can be appropriately modeled using our online planning framework. We consider two different settings for online planning: the simulation setting (Simu) where the decision maker (DM) has unlimited access to data samples via a simulator and the sampling-based setting (Samp) where the DM is constrained to a finite set data samples. Leveraging the value of the “gradient” information obtained from offline simulation in Simu or Samp, we develop a generic offline-to-online approach to facilitate online planning. Our proposed approach produces near optimal solutions in both Simu and Samp. Specifically, in Samp, our performance guarantee improves when the number of data samples or the length of planning horizon increases. We present extensive numerical evidence to validate the performance of our approach in a novel coupon assignment problem and a classic supply chain management problem, and discuss its improvement over existing techniques that assume model stationarity.

Funding: This work was supported by the National Natural Science Foundation of China [Grant 72422006], the Singapore Ministry of Education AcRF Tier 2 Grant [Grant T2EP20124-0037], the Hong Kong Research Grants Council General Research Fund [Grant 16502224], and Theme-based Research Scheme [Grant T32-615/24-R].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2022.0604.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.