Online Learning with Sample Selection Bias

Divya Singhvi (Corresponding Author)
[email protected]
https://orcid.org/0000-0001-8763-015X
Leonard N. Stern School of Business, New York University, New York, New York 10012

Somya Singhvi
[email protected]
https://orcid.org/0000-0003-3999-7189
Marshall School of Business, University of Southern California, Los Angeles, California 90005

Published Online: 19 Mar 2025
https://doi.org/10.1287/opre.2023.0223

Volume 73, Issue 5

September-October 2025

Pages iii-vii, 2297-2866, C2-C3

Received: April 27, 2023
Accepted: January 21, 2025
Published Online: March 19, 2025

Cite as

Divya Singhvi, Somya Singhvi (2025) Online Learning with Sample Selection Bias. Operations Research 73(5):2458-2476.

https://doi.org/10.1287/opre.2023.0223

Acknowledgments

The authors thank Gustavo Vulcano, the anonymous associate editor, and two anonymous reviewers for constructive comments; Jackie Baek and Fiorin Ciocan for valuable advice that improved the paper; and Tiantong (Frank) Li for assistance in data analysis.
