Open Access

Proxy-Aided Demand Learning with an Application to Various Pricing Problems

Tao Shen
https://orcid.org/0009-0001-5464-4464
School of Management & Center for Data Science, Zhejiang University, Hangzhou, Zhejiang 310058, China

Yifan Cui (Corresponding Author)
https://orcid.org/0000-0002-9957-7955
School of Management & Center for Data Science, Zhejiang University, Hangzhou, Zhejiang 310058, China

Published Online: 13 Nov 2025, https://doi.org/10.1287/opre.2025.1793

Volume 74, Issue 2

March-April 2026

Pages v-ix, 573-1152, iii-iv

Information

Received: August 06, 2024
Accepted: September 10, 2025
Published Online: November 13, 2025

Cite as

Tao Shen, Yifan Cui (2025) Proxy-Aided Demand Learning with an Application to Various Pricing Problems. Operations Research 74(2):770-787.

https://doi.org/10.1287/opre.2025.1793

Acknowledgments

The authors thank the referees, associate editor, and area editor for helpful comments, which led to an improved manuscript. This research was conducted while Tao Shen visited Zhejiang University, and he is grateful for Professor Yifan Cui’s invitation.
