Feature Misspecification in Sequential Learning Problems

Published Online: https://doi.org/10.1287/mnsc.2022.00328

References

  • Agrawal S (2019) Recent advances in multiarmed bandits for sequential decision making. INFORMS TutORials Oper. Res. 167–188.
  • Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 2012 Conf. Learning Theory, vol. 23 (JMLR: Workshop and Conference Proceedings, New York), 39:1–39:26.
  • Ahn D, Shin D (2020) Ordinal optimization with generalized linear model. Bae KH, Feng B, Kim S, Lazarova-Molnar S, Zheng Z, Roeder T, Thiesing R, eds. Proc. 2020 Winter Simulation Conf. (IEEE, Piscataway, NJ), 3008–3019.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Branke J, Chick SE, Schmidt C (2007) Selecting a selection procedure. Management Sci. 53(12):1916–1932.
  • Chang YI (1999) Strong consistency of maximum quasi-likelihood estimate in generalized linear models via a last time. Statist. Probab. Lett. 45(3):237–246.
  • Chen CH, Lee LH (2011) Stochastic Simulation Optimization: An Optimal Computing Budget Allocation, vol. 1 (World Scientific, Singapore).
  • Chen B, Chao X, Ahn H (2019) Coordinating pricing and inventory replenishment with nonparametric demand learning. Oper. Res. 67(4):1035–1052.
  • Chen CH, He D, Fu M, Lee LH (2008) Efficient simulation budget allocation for selecting an optimal subset. INFORMS J. Comput. 20(4):579–595.
  • Chen CH, Lin J, Yücesan E, Chick SE (2000) Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems 10:251–270.
  • Chick SE, Branke J, Schmidt C (2010) Sequential sampling to myopically maximize the expected value of information. INFORMS J. Comput. 22(1):71–80.
  • Cooper WL, Homem-de-Mello T, Kleywegt AJ (2006) Models of the spiral-down effect in revenue management. Oper. Res. 54(5):968–987.
  • Cooper WL, Homem-de-Mello T, Kleywegt AJ (2015) Learning and pricing with models that do not explicitly incorporate competition. Oper. Res. 63(1):86–103.
  • Dai L (1996) Convergence properties of ordinal comparison in the simulation of discrete event dynamic systems. J. Optim. Theory Appl. 91(2):363–388.
  • den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.
  • den Boer AV, Zwart B (2015) Mean square convergence rates for maximum quasi-likelihood estimators. Stochastic Systems 4(2):375–403.
  • Elmachtoub AN, Grigas P (2021) Smart “predict, then optimize.” Management Sci. 68(1):9–26.
  • Foster DJ, Krishnamurthy A, Luo H (2019) Model selection for contextual bandits. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY), 14741–14752.
  • Foster DJ, Gentile C, Mohri M, Zimmert J (2020) Adapting to misspecification in contextual bandits. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 11478–11489.
  • Frazier P, Powell WB, Dayanik S (2008) A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47:2410–2439.
  • Gabillon V, Ghavamzadeh M, Lazaric A (2012) Best arm identification: A unified approach to fixed budget and fixed confidence. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 25 (Curran Associates, Inc., Red Hook, NY), 3212–3220.
  • Ghosh A, Chowdhury SR, Gopalan A (2017) Misspecified linear bandits. Proc. AAAI Conf. Artificial Intelligence, vol. 31 (AAAI, Washington, DC).
  • Glynn P, Juneja S (2004) A large deviations perspective on ordinal optimization. Ingalls RG, Rossetti MD, Smith JS, Peters BA, eds. Proc. 2004 Winter Simulation Conf. (IEEE, Piscataway, NJ), 577–585.
  • Godfrey LG (1991) Misspecification Tests in Econometrics: The Lagrange Multiplier Principle and Other Approaches (Cambridge University Press, Cambridge, UK).
  • Gupta SS (1965) On some multiple decision (selection and ranking) rules. Technometrics 7(2):225–245.
  • Joutard C (2004) Large deviations for m-estimators. Math. Methods Statist. 13(2):179–200.
  • Kao Y, Van Roy B, Yan X (2009) Directed regression. Bengio Y, Schuurmans D, Lafferty J, Williams C, Culotta A, eds. Advances in Neural Information Processing Systems, vol. 22 (Curran Associates, Inc., Red Hook, NY), 889–897.
  • Kaufmann E, Cappé O, Garivier A (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learning Res. 17(1):1–42.
  • Kazerouni A, Wein LM (2021) Best arm identification in generalized linear bandits. Oper. Res. Lett. 49(3):365–371.
  • Keskin NB, Zeevi A (2018) On incomplete learning and certainty-equivalence control. Oper. Res. 66(4):1136–1167.
  • Kim SH, Nelson BL (2001) A fully sequential procedure for indifference-zone selection in simulation. ACM Trans. Modeling Comput. Simulation 11:251–273.
  • Kim SH, Nelson BL (2006) Selecting the best system. Henderson SG, Nelson BL, eds. Handbooks in Operations Research and Management Science: Simulation, vol. 13 (Elsevier, Boston), 501–534.
  • Kohavi R, Longbotham R, Sommerfield D, Henne RM (2009) Controlled experiments on the web: Survey and practical guide. Data Mining Knowledge Discovery 18(1):140–181.
  • Lai T, Robbins H (1982) Iterated least squares in multiperiod control. Advances Appl. Math. 3(1):50–73.
  • Lai TL, Wei CZ (1982) Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist. 10(1):154–166.
  • Lattimore T, Szepesvari C, Weisz G (2020) Learning with good feature representations in bandits and in RL with a generative model. Internat. Conf. Machine Learning (PMLR, New York), 5662–5670.
  • Lau TE, Ho YC (1997) Universal alignment probabilities and subset selection for ordinal optimization. J. Optim. Theory Appl. 93(3):455–489.
  • Li X, Zhang X, Zheng Z (2018) Data-driven ranking and selection: High-dimensional covariates and general dependence. Rabe M, Juan AA, Mustafee N, Skoogh A, Jain S, Johansson B, eds. Proc. 2018 Winter Simulation Conf. (IEEE, Piscataway, NJ), 1933–1944.
  • Lu J, Wu D, Mao M, Wang W, Zhang G (2015) Recommender system application developments: A survey. Decision Support Systems 74:12–32.
  • McCullagh P, Nelder JA (1989) Generalized Linear Models, 2nd ed. (Chapman & Hall, London).
  • Nambiar M, Simchi-Levi D, Wang H (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.
  • Pacchiano A, Phan M, Abbasi Yadkori Y, Rao A, Zimmert J, Lattimore T, Szepesvari C (2020) Model selection in contextual stochastic bandit problems. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 10328–10337.
  • Rusmevichientong P, Van Roy B, Glynn PW (2006) A nonparametric approach to multiproduct pricing. Oper. Res. 54(1):82–98.
  • Russo D (2020) Simple Bayesian algorithms for best arm identification. Oper. Res. 68(6):1625–1647.
  • Salemi P, Nelson BL, Staum J (2014) Discrete optimization via simulation using Gaussian Markov random fields. Tolk A, Diallo SY, Ryzhov IO, Yilmaz L, Buckley S, Miller JA, eds. Proc. 2014 Winter Simulation Conf. (IEEE, Piscataway, NJ), 3809–3820.
  • Shen H, Hong LJ, Zhang X (2017) Ranking and selection with covariates. Chan WKV, D’Ambrogio A, Zacharewicz G, Mustafee N, Wainer G, Page E, eds. Proc. 2017 Winter Simulation Conf. (IEEE, Piscataway, NJ), 2137–2148.
  • Shin D, Broadie M, Zeevi A (2018) Tractable sampling strategies for ordinal optimization. Oper. Res. 66(6):1693–1712.
  • Soare M, Lazaric A, Munos R (2014) Best-arm identification in linear bandits. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Inc., Red Hook, NY), 828–836.
  • Szechtman R, Yücesan E (2008) A new perspective on feasibility determination. Mason SJ, Hill RR, Mönch L, Rose O, Jefferson T, Fowler JW, eds. Proc. 2008 Winter Simulation Conf. (IEEE, Piscataway, NJ), 273–280.
  • Taylor JB (1974) Asymptotic properties of multiperiod control rules in the linear regression model. Internat. Econom. Rev. 15(2):472–484.
  • van Ryzin G, Vulcano G (2015) A market discovery algorithm to estimate a general class of nonparametric choice models. Management Sci. 61(2):281–300.
  • Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.
  • White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–25.
  • White H (1996) Estimation, Inference and Specification Analysis (Cambridge University Press, New York).
  • Xu L, Honda J, Sugiyama M (2018) A fully adaptive algorithm for pure exploration in linear bandits. Storkey A, Perez-Cruz F, eds. Proc. 21st Internat. Conf. Artificial Intelligence Statist., vol. 84 (PMLR, New York), 843–851.