The Optimal Reward Operator in Negative Dynamic Programming

Published Online: https://doi.org/10.1287/moor.17.4.921

We consider the negative dynamic programming model of Strauch [12] and prove that the optimal reward function can be obtained by a transfinite iteration of the optimal reward operator. We show that a player loses nothing by restricting himself to measurable policies, if the returns from nonmeasurable policies are evaluated by lower integrals.
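For orientation, the transfinite iteration mentioned above can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: $S$ is the state space, $A(x)$ the set of actions available at state $x$, $r \le 0$ the one-stage reward, $q(\cdot \mid x, a)$ the transition law, $T$ the optimal reward operator, and $V$ the optimal reward function. Measurability questions, which the paper treats carefully, are glossed over here.

```latex
% Illustrative sketch only; all symbols are assumed notation, not the paper's.
% Optimal reward operator acting on a function u : S -> [-\infty, 0]:
\[
  (Tu)(x) \;=\; \sup_{a \in A(x)} \Big[\, r(x,a) + \int_S u(y)\, q(dy \mid x, a) \Big],
  \qquad x \in S .
\]
% Transfinite iteration, indexed by ordinals \alpha:
\[
  U_0 = 0, \qquad
  U_{\alpha+1} = T\,U_\alpha, \qquad
  U_\lambda = \inf_{\alpha < \lambda} U_\alpha \quad \text{for limit ordinals } \lambda .
\]
% Because r \le 0 and T is monotone, the sequence is nonincreasing:
% U_0 \ge U_1 \ge \dots \ge U_\omega \ge U_{\omega+1} \ge \dots
% The abstract's claim is that this iteration yields V, i.e. U_\alpha = V
% for some ordinal \alpha.
```

In the negative model the ordinary iterates $T^n 0$ need not converge to $V$ after countably many steps indexed by the natural numbers, which is why continuing the iteration through limit ordinals is of interest.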
