Stochastic Optimization Forests
Abstract
We study contextual stochastic optimization problems, where we leverage rich auxiliary observations (e.g., product characteristics) to improve decision making with uncertain variables (e.g., demand). We show how to train forest decision policies for this problem by growing trees that choose splits to directly optimize the downstream decision quality rather than split to improve prediction accuracy as in the standard random forest algorithm. We realize this seemingly computationally intractable problem by developing approximate splitting criteria that use optimization perturbation analysis to eschew burdensome reoptimization for every candidate split, so that our method scales to large-scale problems. We prove that our splitting criteria consistently approximate the true risk and that our method achieves asymptotic optimality. We extensively validate our method empirically, demonstrating the value of optimization-aware construction of forests and the success of our efficient approximations. We show that our approximate splitting criteria can reduce running time hundredfold while achieving performance close to forest algorithms that exactly reoptimize for every candidate split.
This paper was accepted by Hamid Nazerzadeh, data science.
Funding: This work was supported by the National Science Foundation [Grant 1846210].
Supplemental Material: The data files and online appendices are available at https://doi.org/10.1287/mnsc.2022.4458.