Decision Making with Side Information: A Causal Transport Robust Approach

Jincheng Yang
[email protected]
https://orcid.org/0000-0002-3581-9425
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218

Luhao Zhang
[email protected]
https://orcid.org/0000-0001-8568-3581
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218

Ningyuan Chen
[email protected]
https://orcid.org/0000-0002-3948-1011
Department of Management, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada; and Rotman School of Management, University of Toronto, Toronto, Ontario M5S 1A1, Canada

Rui Gao (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-0145-8577
Department of Information, Risk and Operations Management, The University of Texas at Austin, Austin, Texas 78712

Ming Hu
[email protected]
https://orcid.org/0000-0003-0900-7631
Rotman School of Management, University of Toronto, Toronto, Ontario M5S 1A1, Canada

Published Online: 29 Apr 2026
https://doi.org/10.1287/opre.2024.0997


Information

Received: May 03, 2024
Accepted: February 24, 2026
Published Online: April 29, 2026

Cite as

Jincheng Yang, Luhao Zhang, Ningyuan Chen, Rui Gao, Ming Hu (2026) Decision Making with Side Information: A Causal Transport Robust Approach. Operations Research 0(0).

https://doi.org/10.1287/opre.2024.0997
