Feature Misspecification in Sequential Learning Problems

Dohyun Ahn
https://orcid.org/0000-0002-0304-0636
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong

Dongwook Shin
Corresponding Author
https://orcid.org/0000-0002-2984-0148
HKUST Business School, Clear Water Bay, Kowloon, Hong Kong

Assaf Zeevi
https://orcid.org/0000-0003-1075-6664
Graduate School of Business, Columbia University, New York, New York 10025

Published Online: 29 Aug 2024
https://doi.org/10.1287/mnsc.2022.00328

Volume 71, Issue 5

May 2025

Pages iv-vi, 3641-4531

Article Information

Received: February 2, 2022
Accepted: March 20, 2024
Published Online: August 29, 2024

Cite as

Dohyun Ahn, Dongwook Shin, Assaf Zeevi (2024) Feature Misspecification in Sequential Learning Problems. Management Science 71(5):4066-4086.

https://doi.org/10.1287/mnsc.2022.00328

Acknowledgments

The authors thank Vivek Farias, the department editor, for providing valuable feedback that helped improve the paper. The authors also thank the associate editor and the referees for their thoughtful and detailed comments on the earlier version of the manuscript; their suggestions greatly enhanced the quality and the presentation of the paper.
