The Optimal Reward Operator in Negative Dynamic Programming
Abstract
We consider the negative dynamic programming model of Strauch [12] and prove that the optimal reward function can be obtained by a transfinite iteration of the optimal reward operator. We also show that a player loses nothing by restricting himself to measurable policies if the returns from nonmeasurable policies are evaluated by lower integrals.

