Feature Misspecification in Sequential Learning Problems
References
- (2019) Recent advances in multiarmed bandits for sequential decision making. INFORMS TutORials Oper. Res. 167–188.
- (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 2012 Conf. Learning Theory, vol. 23 (JMLR: Workshop and Conference Proceedings, New York), 39:1–39:26.
- (2020) Ordinal optimization with generalized linear model. Bae KH, Feng B, Kim S, Lazarova-Molnar S, Zheng Z, Roeder T, Thiesing R, eds. Proc. 2020 Winter Simulation Conf. (IEEE, Piscataway, NJ), 3008–3019.
- (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
- (2007) Selecting a selection procedure. Management Sci. 53(12):1916–1932.
- (1999) Strong consistency of maximum quasi-likelihood estimate in generalized linear models via a last time. Statist. Probab. Lett. 45(3):237–246.
- (2011) Stochastic Simulation Optimization: An Optimal Computing Budget Allocation, vol. 1 (World Scientific, Singapore).
- (2019) Coordinating pricing and inventory replenishment with nonparametric demand learning. Oper. Res. 67(4):1035–1052.
- (2008) Efficient simulation budget allocation for selecting an optimal subset. INFORMS J. Comput. 20(4):579–595.
- (2000) Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems 10:251–270.
- (2010) Sequential sampling to myopically maximize the expected value of information. INFORMS J. Comput. 22(1):71–80.
- (2006) Models of the spiral-down effect in revenue management. Oper. Res. 54(5):968–987.
- (2015) Learning and pricing with models that do not explicitly incorporate competition. Oper. Res. 63(1):86–103.
- (1996) Convergence properties of ordinal comparison in the simulation of discrete event dynamic systems. J. Optim. Theory Appl. 91(2):363–388.
- (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.
- (2015) Mean square convergence rates for maximum quasi-likelihood estimators. Stochastic Systems 4(2):375–403.
- (2021) Smart “predict, then optimize.” Management Sci. 68(1):9–26.
- (2019) Model selection for contextual bandits. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY), 14741–14752.
- (2020) Adapting to misspecification in contextual bandits. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 11478–11489.
- (2008) A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47:2410–2439.
- (2012) Best arm identification: A unified approach to fixed budget and fixed confidence. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 25 (Curran Associates, Inc., Red Hook, NY), 3212–3220.
- (2017) Misspecified linear bandits. Proc. AAAI Conf. Artificial Intelligence, vol. 31 (AAAI, Washington, DC).
- (2004) A large deviations perspective on ordinal optimization. Ingalls RG, Rossetti MD, Smith JS, Peters BA, eds. Proc. 2004 Winter Simulation Conf. (IEEE, Piscataway, NJ), 577–585.
- (1991) Misspecification Tests in Econometrics: The Lagrange Multiplier Principle and Other Approaches (Cambridge University Press, Cambridge, UK).
- (1965) On some multiple decision (selection and ranking) rules. Technometrics 7(2):225–245.
- (2004) Large deviations for m-estimators. Math. Methods Statist. 13(2):179–200.
- (2009) Directed regression. Bengio Y, Schuurmans D, Lafferty J, Williams C, Culotta A, eds. Advances in Neural Information Processing Systems, vol. 22 (Curran Associates, Inc., Red Hook, NY), 889–897.
- (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learning Res. 17(1):1–42.
- (2021) Best arm identification in generalized linear bandits. Oper. Res. Lett. 49(3):365–371.
- (2018) On incomplete learning and certainty-equivalence control. Oper. Res. 66(4):1136–1167.
- (2001) A fully sequential procedure for indifference-zone selection in simulation. ACM Trans. Modeling Comput. Simulation 11:251–273.
- (2006) Selecting the best system. Henderson SG, Nelson BL, eds. Handbooks in Operations Research and Management Science: Simulation, vol. 13 (Elsevier, Boston), 501–534.
- (2009) Controlled experiments on the web: Survey and practical guide. Data Mining Knowledge Discovery 18(1):140–181.
- (1982) Iterated least squares in multiperiod control. Advances Appl. Math. 3(1):50–73.
- (1982) Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist. 10(1):154–166.
- (2020) Learning with good feature representations in bandits and in RL with a generative model. Internat. Conf. Machine Learning (PMLR, New York), 5662–5670.
- (1997) Universal alignment probabilities and subset selection for ordinal optimization. J. Optim. Theory Appl. 93(3):455–489.
- (2018) Data-driven ranking and selection: High-dimensional covariates and general dependence. Rabe M, Juan AA, Mustafee N, Skoogh A, Jain S, Johansson B, eds. Proc. 2018 Winter Simulation Conf. (IEEE, Piscataway, NJ), 1933–1944.
- (2015) Recommender system application developments: A survey. Decision Support Systems 74:12–32.
- (1989) Generalized Linear Models, 2nd ed. (Chapman & Hall, London).
- (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.
- (2020) Model selection in contextual stochastic bandit problems. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 10328–10337.
- (2006) A nonparametric approach to multiproduct pricing. Oper. Res. 54(1):82–98.
- (2020) Simple Bayesian algorithms for best arm identification. Oper. Res. 68(6):1625–1647.
- (2014) Discrete optimization via simulation using Gaussian Markov random fields. Tolk A, Diallo SY, Ryzhov IO, Yilmaz L, Buckley S, Miller JA, eds. Proc. 2014 Winter Simulation Conf. (IEEE, Piscataway, NJ), 3809–3820.
- (2017) Ranking and selection with covariates. Chan WKV, D’Ambrogio A, Zacharewicz G, Mustafee N, Wainer G, Page E, eds. Proc. 2017 Winter Simulation Conf. (IEEE, Piscataway, NJ), 2137–2148.
- (2018) Tractable sampling strategies for ordinal optimization. Oper. Res. 66(6):1693–1712.
- (2014) Best-arm identification in linear bandits. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Inc., Red Hook, NY), 828–836.
- (2008) A new perspective on feasibility determination. Mason SJ, Hill RR, Mönch L, Rose O, Jefferson T, Fowler JW, eds. Proc. 2008 Winter Simulation Conf. (IEEE, Piscataway, NJ), 273–280.
- (1974) Asymptotic properties of multiperiod control rules in the linear regression model. Internat. Econom. Rev. 15(2):472–484.
- (2015) A market discovery algorithm to estimate a general class of nonparametric choice models. Management Sci. 61(2):281–300.
- (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.
- (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–25.
- (1996) Estimation, Inference and Specification Analysis (Cambridge University Press, New York).
- (2018) A fully adaptive algorithm for pure exploration in linear bandits. Storkey A, Perez-Cruz F, eds. Proc. 21st Internat. Conf. Artificial Intelligence Statist., vol. 84 (PMLR, New York), 843–851.

