Multimodal Dynamic Pricing
Published Online: 27 Jan 2021
https://doi.org/10.1287/mnsc.2020.3819