Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk

Published Online: https://doi.org/10.1287/mnsc.2023.03209

References

  • Abeille M, Lazaric A (2017) Linear Thompson sampling revisited. Singh A, Zhu J, eds. Proc. 20th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 54 (PMLR, New York), 176–184.
  • Adusumilli K (2021) Risk and optimal policies in bandit experiments. Preprint, submitted December 13, https://arxiv.org/abs/2112.06363.
  • Aizer A, Doyle JJ Jr (2015) Juvenile incarceration, human capital, and future crime: Evidence from randomly assigned judges. Quart. J. Econom. 130(2):759–803.
  • Atan O, Zame WR, van der Schaar M (2019) Sequential patient recruitment and allocation for adaptive clinical trials. Chaudhuri K, Sugiyama M, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 89 (PMLR, New York), 1891–1900.
  • Athey S, Wager S (2021) Policy learning with observational data. Econometrica 89(1):133–161.
  • Athey S, Eckles D, Imbens GW (2018) Exact p-values for network interference. J. Amer. Statist. Assoc. 113(521):230–240.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
  • Azevedo EM, Deng A, Montiel Olea JL, Rao J, Weyl EG (2020) A/B testing with fat tails. J. Political Econom. 128(12):4614–4672.
  • Bakshy E, Eckles D, Bernstein MS (2014) Designing and deploying online field experiments. Proc. 23rd Internat. Conf. World Wide Web (Association for Computing Machinery, New York), 283–292.
  • Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.
  • Bareinboim E, Forney A, Pearl J (2015) Bandits with unobserved confounders: A causal approach. Proc. 29th Internat. Conf. Neural Inform. Processing Systems, vol. 1 (MIT Press, Cambridge, MA), 1342–1350.
  • Bastani H, Simchi-Levi D, Zhu R (2022) Meta dynamic pricing: Transfer learning across experiments. Management Sci. 68(3):1865–1881.
  • Baudry D, Gautron R, Kaufmann E, Maillard O (2021) Optimal Thompson sampling strategies for support-aware CVaR bandits. Internat. Conf. Machine Learn. (PMLR, New York), 716–726.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.
  • Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.
  • Bhat N, Farias VF, Moallemi CC, Sinha D (2020) Near-optimal A-B testing. Management Sci. 66(10):4477–4495.
  • Bibaut A, Kallus N, Lindon M (2022) Near-optimal non-parametric sequential tests and confidence sequences with possibly dependent observations. Preprint, submitted December 29, https://arxiv.org/abs/2212.14411.
  • Bijmolt TH, Van Heerde HJ, Pieters RG (2005) New empirical generalizations on the determinants of price elasticity. J. Marketing Res. 42(2):141–156.
  • Bojinov I, Rambachan A, Shephard N (2021) Panel experiments and dynamic causal effects: A finite population perspective. Quant. Econom. 12(4):1171–1196.
  • Bojinov I, Simchi-Levi D, Zhao J (2023) Design and analysis of switchback experiments. Management Sci. 69(7):3759–3777.
  • Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
  • Bu J, Simchi-Levi D, Wang C (2022) Context-based dynamic pricing with partially linear demand model. Adv. Neural Inform. Processing Systems 35(1):23780–23791.
  • Cao Y, Kleywegt A, Wang H (2022) Dynamic pricing for two-sided marketplaces with offer expiration. Preprint, submitted January 31, http://dx.doi.org/10.2139/ssrn.3700227.
  • Cassel A, Mannor S, Zeevi A (2018) A general approach to multi-armed bandits under risk criteria. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory, Proceedings of Machine Learning Research (PMLR, New York), 1295–1306.
  • Chang JQL, Tan VYF (2022) A unifying theory of Thompson sampling for continuous risk-averse bandits. Proc. AAAI Conf. Artificial Intelligence, vol. 36(6) (AAAI Press, Palo Alto, CA), 6159–6166.
  • Chen N, Gallego G (2021) Nonparametric pricing analytics with customer covariates. Oper. Res. 69(3):974–984.
  • Chen N, Hu M (2023) Frontiers in service science: Data-driven revenue management: The interplay of data, model, and decisions. Service Sci. 15(2):79–91.
  • Chen N, Gao X, Xiong Y (2022a) Debiasing samples from online learning using bootstrap. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 8514–8533.
  • Chen X, Jasin S, Shi C (2022b) The Elements of Joint Learning and Optimization in Operations Management, vol. 18 (Springer Nature, Cham, Switzerland).
  • Chen H, Lu W, Song R (2021) Statistical inference for online decision making: In a contextual bandit setting. J. Amer. Statist. Assoc. 116(533):240–255.
  • Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, Robins J (2018) Double/debiased machine learning for treatment and structural parameters. Preprint, submitted July 30, https://arxiv.org/abs/1608.00060.
  • Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.
  • Chintagunta PK, Bonfrer A, Song I (2002) Investigating the effects of store-brand introduction on retailer demand and pricing behavior. Management Sci. 48(10):1242–1267.
  • Cohen MC, Lobel I, Paes Leme R (2020) Feature-based dynamic pricing. Management Sci. 66(11):4921–4943.
  • Datta H, van Heerde HJ, Dekimpe M, Steenkamp J (2022) Cross-national differences in market response: Line-length, price, and distribution elasticities in 14 Indo-Pacific rim economies. J. Marketing Res. 59(2):251–270.
  • Dimakopoulou M, Ren Z, Zhou Z (2021) Online multi-armed bandits with adaptive inference. Adv. Neural Inform. Processing Systems 34(1):1939–1951.
  • Dimakopoulou M, Zhou Z, Athey S, Imbens G (2017) Estimation considerations in contextual bandits. Preprint, submitted November 19, https://arxiv.org/abs/1711.07077.
  • Dimakopoulou M, Zhou Z, Athey S, Imbens G (2019) Balanced linear contextual bandits. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 3445–3453.
  • Dudík M, Erhan D, Langford J, Li L (2014) Doubly robust policy evaluation and optimization. Statist. Sci. 29(4):485–511.
  • Fan L, Glynn PW (2021) The fragility of optimized bandit algorithms. Preprint, submitted September 28, https://arxiv.org/abs/2109.13595.
  • Fan J, Guo Y, Yu M (2021) Policy optimization using semiparametric models for dynamic pricing. Preprint, submitted September 13, https://arxiv.org/abs/2109.06368.
  • Farajtabar M, Chow Y, Ghavamzadeh M (2018) More robust doubly robust off-policy evaluation. Internat. Conf. Machine Learn. (PMLR, New York), 1447–1456.
  • Farias VF, Li AA, Peng T, Zheng AT (2022b) Markovian interference in experiments. Preprint, submitted June 6, https://arxiv.org/abs/2206.02371.
  • Farias V, Moallemi C, Peng T, Zheng A (2022a) Synthetically controlled bandits. Preprint, submitted February 14, https://arxiv.org/abs/2202.07079.
  • Feng Y, Xiao B (1999) Maximizing revenues of perishable assets with a risk factor. Oper. Res. 47(2):337–341.
  • Ferreira KJ, Lee BHA, Simchi-Levi D (2016) Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing Service Oper. Management 18(1):69–88.
  • Fisher M, Gallino S, Li J (2018) Competition-based dynamic pricing in online retailing: A methodology validated with field experiments. Management Sci. 64(6):2496–2514.
  • Galichet N, Sebag M, Teytaud O (2013) Exploration vs exploitation vs safety: Risk-aware multi-armed bandits. Asian Conf. Machine Learn. (PMLR, New York), 245–260.
  • Glynn PW, Johari R, Rasouli M (2020) Adaptive experimental design with temporal interference: A maximum likelihood approach. Adv. Neural Inform. Processing Systems 33(1):15054–15064.
  • Gönsch J (2017) A survey on risk-averse and robust revenue management. Eur. J. Oper. Res. 263(2):337–348.
  • Gönsch J, Hassler M, Schur R (2018) Optimizing conditional value-at-risk in dynamic pricing. OR Spectrum 40(3):711–750.
  • Goyal V, Perivier N (2021) Dynamic pricing and assortment under a contextual MNL demand. Preprint, submitted October 19, https://arxiv.org/abs/2110.10018.
  • Guo R, Zhu H, Chow SM, Ibrahim JG (2012) Bayesian lasso for semiparametric structural equation models. Biometrics 68(2):567–577.
  • Hadad V, Hirshberg DA, Zhan R, Wager S, Athey S (2021) Confidence intervals for policy evaluation in adaptive experiments. Proc. Natl. Acad. Sci. USA 118(15):e2014602118.
  • Hahn J, Hirano K, Karlan D (2011) Adaptive experimental design using the propensity score. J. Bus. Econom. Statist. 29(1):96–108.
  • Han Q, Sun WW, Zhang Y (2022) Online statistical inference for matrix contextual bandit. Preprint, submitted December 21, https://arxiv.org/abs/2212.11385.
  • Howard SR, Ramdas A, McAuliffe J, Sekhon J (2021) Time-uniform, nonparametric, nonasymptotic confidence sequences. Preprint, submitted October 18, https://arxiv.org/abs/1810.08240.
  • Javanmard A, Nazerzadeh H (2019) Dynamic pricing in high-dimensions. J. Machine Learn. Res. 20(1):315–363.
  • Jin Y, Ren Z, Yang Z, Wang Z (2022) Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality. Preprint, submitted December 19, https://arxiv.org/abs/2212.09900.
  • Johari R, Pekelis L, Walsh DJ (2015) Always valid inference: Bringing sequential analysis to A/B testing. Preprint, submitted December 15, https://arxiv.org/abs/1512.04922.
  • Johari R, Li H, Liskovich I, Weintraub GY (2022) Experimental design in two-sided platforms: An analysis of bias. Management Sci. 68(10):7069–7089.
  • Kallus N, Zhou A (2018) Confounding-robust policy improvement. Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9289–9299.
  • Kato M, Ishihara T, Honda J, Narita Y (2020) Efficient adaptive experimental design for average treatment effect estimation. Preprint, submitted February 13, https://arxiv.org/abs/2002.05308.
  • Keskin N, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.
  • Keskin NB, Zeevi A (2018) On incomplete learning and certainty-equivalence control. Oper. Res. 66(4):1136–1167.
  • Keskin NB, Li Y, Song JS (2022) Data-driven dynamic pricing and ordering with perishable inventory in a changing environment. Management Sci. 68(3):1938–1958.
  • Keskin NB, Li Y, Sunar N (2024) Data-driven clustering and feature-based retail electricity pricing with smart meters. Oper. Res., ePub ahead of print September 3, https://doi.org/10.1287/opre.2022.0112.
  • Khajonchotpanya N, Xue Y, Rujeerapaiboon N (2021) A revised approach for risk-averse multi-armed bandits under CVaR criterion. Oper. Res. Lett. 49(4):465–472.
  • Kim Y, Telang R, Vogt WB, Krishnan R (2010) An empirical analysis of mobile voice service and SMS: A structural model. Management Sci. 56(2):234–252.
  • Kocabıyıkoğlu A, Popescu I (2011) An elasticity approach to the newsvendor with price-sensitive demand. Oper. Res. 59(2):301–312.
  • Kohavi R, Henne RM, Sommerfield D (2007) Practical guide to controlled experiments on the web: Listen to your customers not to the HiPPO. Proc. 13th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 959–967.
  • Kohavi R, Tang D, Xu Y (2020) Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing (Cambridge University Press, Cambridge, UK).
  • Kuang X, Wager S (2021) Weak signal asymptotics for sequentially randomized experiments. Preprint, submitted January 25, https://arxiv.org/abs/2101.09855.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Levin Y, McGill J, Nediak M (2008) Risk in revenue management and dynamic pricing. Oper. Res. 56(2):326–343.
  • Li X, Zheng Z (2023) Dynamic pricing with external information and inventory constraint. Management Sci. 70(9):5985–6001.
  • Li L, Munos R, Szepesvári C (2015) Toward minimax off-policy value estimation. Artificial Intelligence Statistics (PMLR, New York), 608–616.
  • Liu P, Yang Z, Wang Z, Sun WW (2023) Contextual dynamic pricing with strategic buyers. Preprint, submitted July 8, https://arxiv.org/abs/2307.04055.
  • Luo Y, Sun WW, Liu Y (2023) Distribution-free contextual dynamic pricing. Math. Oper. Res. 49(1):599–618.
  • Mao J, Leme R, Schneider J (2018) Contextual pricing for Lipschitz buyers. Adv. Neural Inform. Processing Systems 31(1):5648–5656.
  • Miao S, Wang Y (2021) Network revenue management with nonparametric demand learning: √T-regret and polynomial dimension dependency. Preprint, submitted October 22, http://dx.doi.org/10.2139/ssrn.3948140.
  • Miao S, Chen X, Chao X, Liu J, Zhang Y (2022) Context-based dynamic pricing with online clustering. Production Oper. Management 31(9):3559–3575.
  • Nambiar M, Simchi-Levi D, Wang H (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.
  • Offer-Westort M, Coppock A, Green DP (2021) Adaptive experimental design: Prospects and applications in political science. Amer. J. Political Sci. 65(4):826–844.
  • Prashanth L, Jagannathan K, Kolla RK (2020) Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions. Proc. 37th Internat. Conf. Machine Learn. (PMLR, New York), 5577–5586.
  • Qiang S, Bayati M (2016) Dynamic pricing with demand covariates. Preprint, submitted April 25, https://arxiv.org/abs/1604.07463.
  • Qin C, Russo D (2022) Adaptivity and confounding in multi-armed bandit experiments. Preprint, submitted February 18, https://arxiv.org/abs/2202.09036.
  • Robinson PM (1988) Root-n-consistent semiparametric regression. Econometrica 56(4):931–954.
  • Sani A, Lazaric A, Munos R (2012) Risk-aversion in multi-armed bandits. Proc. 26th Internat. Conf. Neural Inform. Processing Systems, vol. 2 (Curran Associates Inc., Red Hook, NY), 3275–3283.
  • Schur R, Gönsch J, Hassler M (2019) Time-consistent, risk-averse dynamic pricing. Eur. J. Oper. Res. 277(2):587–603.
  • Scott SL (2015) Multi-armed bandit experiments in the online service economy. Appl. Stoch. Models Bus. Indust. 31(1):37–45.
  • Shah V, Johari R, Blanchet J (2019) Semi-parametric dynamic contextual pricing. Adv. Neural Inform. Processing Systems 32(1):2363–2373.
  • Simchi-Levi D, Wang C (2023) Pricing experimental design: Causal effect, expected revenue and tail risk. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. 40th Internat. Conf. Machine Learn., Proc. Machine Learn. Res., vol. 202 (PMLR, New York), 31788–31799.
  • Simchi-Levi D, Wang C (2024) Multi-armed bandit experimental design: Online decision-making and adaptive inference. Management Sci., ePub ahead of print September 20, https://doi.org/10.1287/mnsc.2023.00492.
  • Simchi-Levi D, Zheng Z, Zhu F (2022) A simple and optimal policy design with safety against heavy-tailed risk for multi-armed bandits. Preprint, submitted June 7, https://arxiv.org/abs/2206.02969.
  • Slivkins A (2011) Contextual bandits with similarity information. Kakade SM, von Luxburg U, eds. Proc. 24th Annual Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 19 (PMLR, New York), 679–702.
  • Swaminathan A, Joachims T (2015) Batch learning from logged bandit feedback through counterfactual risk minimization. J. Machine Learn. Res. 16(1):1731–1755.
  • Tellis GJ (1988) The price elasticity of selective demand: A meta-analysis of econometric models of sales. J. Marketing Res. 25(4):331–341.
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
  • Wager S, Xu K (2021) Experimenting in equilibrium. Management Sci. 67(11):6694–6715.
  • Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).
  • Wang YX, Agarwal A, Dudík M (2017) Optimal and adaptive off-policy evaluation in contextual bandits. Internat. Conf. Machine Learn. (PMLR, New York), 3589–3597.
  • Wang Y, Chen B, Simchi-Levi D (2021b) Multimodal dynamic pricing. Management Sci. 67(10):6136–6152.
  • Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.
  • Wang H, Talluri K, Li X (2021a) On dynamic pricing with covariates. Preprint, submitted December 25, https://arxiv.org/abs/2112.13254.
  • Wang CH, Wang Z, Sun WW, Cheng G (2024) Online regularization toward always-valid high-dimensional dynamic pricing. J. Amer. Statist. Assoc. 119(548):2895–2907.
  • Wen X, Sun WW, Zhang Y (2023) Online tensor inference. Preprint, submitted December 28, https://arxiv.org/abs/2312.17111.
  • Xiong R, Athey S, Bayati M, Imbens G (2019) Optimal experimental design for staggered rollouts. Preprint, submitted November 9, https://arxiv.org/abs/1911.03764.
  • Xu J, Wang YX (2021) Logarithmic regret in feature-based dynamic pricing. Adv. Neural Inform. Processing Systems 34(1):13898–13910.
  • Xu JJ, Fader PS, Veeraraghavan S (2019) Designing and evaluating dynamic pricing policies for major league baseball tickets. Manufacturing Service Oper. Management 21(1):121–138.
  • Zhan R, Hadad V, Hirshberg DA, Athey S (2021) Off-policy evaluation via adaptive weighting with data from contextual bandits. Proc. 27th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2125–2135.
  • Zhou Z, Athey S, Wager S (2022) Offline multi-action policy learning: Generalization and optimization. Oper. Res. 71(1):148–183.
  • Zhu Q, Tan V (2020) Thompson sampling algorithms for mean-variance bandits. Internat. Conf. Machine Learn. (PMLR, New York), 11599–11608.
  • Zimin A, Ibsen-Jensen R, Chatterjee K (2014) Generalized risk-aversion in stochastic multi-armed bandits. Preprint, submitted May 5, https://arxiv.org/abs/1405.0833.