Finite-Memory Suboptimal Design for Partially Observed Markov Decision Processes

Published Online:https://doi.org/10.1287/opre.42.3.439

We develop bounds on the value function and a suboptimal design for the partially observed Markov decision process. These bounds and suboptimal design are based on the M most recent observations and actions. An a priori measure of the quality of these bounds is given. We show that larger M implies tighter bounds. An operations count analysis indicates that (#A#Z)M+1(#S) multiplications and additions are required per successive approximations iteration of the suboptimal design algorithm, where A, Z, and S are the action, observation, and state spaces, respectively, suggesting the algorithm is of potential use for problems with large state spaces. A preliminary numerical study indicates that the quality of the suboptimal design can be excellent.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.