Online Learning with Sample Selection Bias

Published Online: https://doi.org/10.1287/opre.2023.0223

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates, Inc., Red Hook, NY), 2312–2320.
  • Abernethy JD, Amin K, Zhu R (2016) Threshold bandits, with and without censored feedback. Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates, Inc., Red Hook, NY), 4889–4897.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the MNL-bandit. Kale S, Shamir O, eds. Proc. 30th Conf. Learn. Theory, Proceedings of Machine Learning Research (PMLR, New York), 76–78.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
  • Ahn D, Shin D, Zeevi A (2023) Feature misspecification in sequential learning problems. Preprint, submitted May 11, https://dx.doi.org/10.2139/ssrn.3860650.
  • Alaei S, Malekian A, Mostagir M (2016) A dynamic model of crowdfunding. Working paper, Ross School of Business, Ann Arbor, MI.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.
  • Ban G-Y, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.
  • Barry TE (1987) The development of the hierarchy of effects: An historical perspective. Current Issues Res. Advertising 10(1–2):251–295.
  • Bastani H, Bayati M (2020) Online decision making with high-dimensional covariates. Oper. Res. 68(1):276–294.
  • Bastani H, Bayati M, Khosravi K (2021) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.
  • Bastani H, Simchi-Levi D, Zhu R (2022b) Meta dynamic pricing: Transfer learning across experiments. Management Sci. 68(3):1865–1881.
  • Bastani H, Harsha P, Perakis G, Singhvi D (2022a) Learning personalized product recommendations with customer disengagement. Manufacturing Service Oper. Management 24(4):2010–2028.
  • Bekkers R, Wiepking P (2011) Who gives? A literature review of predictors of charitable giving part one: Religion, education, age and socialisation. Voluntary Sector Rev. 2(3):337–365.
  • Bhatia R (2007) Perturbation Bounds for Matrix Eigenvalues (SIAM, Philadelphia).
  • Boudet J, Gregg B, Rathje K, Stein E, Vollhardt K (2019) The future of personalization: And how to get ready for it. Accessed January 8, https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/the-future-of-personalization-and-how-to-get-ready-for-it.
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Preprint, submitted April 25, https://arxiv.org/abs/1204.5721.
  • Cameron AC, Trivedi PK (2005) Microeconometrics: Methods and Applications (Cambridge University Press, Cambridge, UK).
  • Cao J, Sun W (2019) Dynamic learning of sequential choice bandit problem under marketing fatigue. Van Hentenryck P, Zhou ZH, eds. Proc. 33rd AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3264–3271.
  • Chen B, Chao X, Shi C (2021) Nonparametric learning algorithms for joint pricing and inventory control with lost sales and censored demand. Math. Oper. Res. 46(2):726–756.
  • Chen B, Chao X, Wang Y (2020) Data-based dynamic pricing and inventory control with censored demand and limited price changes. Oper. Res. 68(5):1445–1456.
  • Cheung WC, Simchi-Levi D, Zhu R (2023) Nonstationary reinforcement learning: The blessing of (more) optimism. Management Sci. 69(10):5722–5739.
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. 14th Internat. Conf. Artificial Intelligence Statist., 208–214.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory (COLT) (Omnipress, Madison, WI), 355–366.
  • Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Advances in Neural Information Processing Systems, vol. 23 (Curran Associates, Inc., Red Hook, NY), 586–594.
  • Foster DJ, Krishnamurthy A, Luo H (2019) Model selection for contextual bandits. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY), 14846–14857.
  • Foster DJ, Gentile C, Mohri M, Zimmert J (2020) Adapting to misspecification in contextual bandits. Larochelle H, Ranzato M, Hadsell R, Balcan M-F, Lin H-T, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 11478–11489.
  • Garg N, Johari R (2021) Designing informative rating systems: Evidence from an online labor market. Manufacturing Service Oper. Management 23(3):589–605.
  • Ghosh A, Chowdhury SR, Gopalan A (2017) Misspecified linear bandits. Singh SP, Markovitch S, eds. Proc. Thirty-First AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3761–3767.
  • GoFundMe (2020) Inspire hope: The GoFundMe 2020 giving report. Accessed May 10, 2022, https://www.gofundme.com/2020.
  • Heckman JJ (1979) Sample selection bias as a specification error. Econometrica 47(1):153–161.
  • Howard JA, Sheth JN (1969) The theory of buyer behavior. New York 63:145.
  • Hu M, Li X, Shi M (2015) Product and pricing decisions in crowdfunding. Marketing Sci. 34(3):331–345.
  • Jain L, Jamieson K (2018) Firing bandits: Optimizing crowdfunding. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 2206–2214.
  • Jaksch T, Ortner R, Auer P (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11:1563–1600.
  • Johari R, Schmit S (2018) Learning with abandonment. Preprint, submitted February 23, https://arxiv.org/abs/1802.08718.
  • Johari R, Kamble V, Kanoria Y (2021) Matching while learning. Oper. Res. 69(2):655–681.
  • Kao Y-M, Keskin NB, Shang K (2020) Bayesian dynamic pricing and subscription period selection with unknown customer utility. Preprint, submitted December 16, https://dx.doi.org/10.2139/ssrn.3722376.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Li L, Lu Y, Zhou D (2017) Provably optimal algorithms for generalized linear contextual bandits. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 2071–2080.
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Rappa M, Jones P, Freire J, Chakrabarti S, eds. Proc. 19th Internat. Conf. World Wide Web (Association for Computing Machinery, New York), 661–670.
  • Lo I, Manshadi V, Rodilitz S, Shameli A (2024) Commitment on volunteer crowdsourcing platforms: Implications for growth and engagement. Manufacturing Service Oper. Management 26(5):1787–1805.
  • Manshadi V, Rodilitz S (2020) Online policies for efficient volunteer crowdsourcing. Manea M, Syrgkanis V, Weinberg SM, eds. Proc. 21st ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 315–316.
  • Manshadi V, Rodilitz S, Saban D, Suresh A (2022) Online algorithms for matching platforms with multi-channel traffic. Preprint, submitted March 28, https://arxiv.org/abs/2203.15037.
  • Maystre L, Russo D, Zhao Y (2023) Optimizing audio recommendations for the long-term: A reinforcement learning perspective. Preprint, submitted February 7, https://arxiv.org/abs/2302.03561.
  • Mejia J, Urrea G, Pedraza-Martinez AJ (2019) Operational transparency on crowdfunding platforms: Effect on donations for emergency response. Production Oper. Management 28(7):1773–1791.
  • Mersereau AJ (2015) Demand estimation from censored observations with inventory record inaccuracy. Manufacturing Service Oper. Management 17(3):335–349.
  • Nambiar M, Simchi-Levi D, Wang H (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.
  • Oh M-h, Iyengar G (2019) Thompson sampling for multinomial logit contextual bandits. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY), 1–11.
  • Papanastasiou Y, Bimpikis K, Savva N (2018) Crowdsourcing exploration. Management Sci. 64(4):1727–1746.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Russo D, Van Roy B (2018) Learning to optimize via information-directed sampling. Oper. Res. 66(1):230–252.
  • Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.
  • Sisco MR, Weber EU (2019) Examining charitable giving in real-world online donations. Nature Comm. 10(1):1–8.
  • Slivkins A (2019) Introduction to multi-armed bandits. Preprint, submitted April 15, https://arxiv.org/abs/1904.07272.
  • Smith VH, Kehoe MR, Cremer ME (1995) The private provision of public goods: Altruism and voluntary giving. J. Public Econom. 58(1):107–126.
  • Tropp JA (2012) User-friendly tail bounds for sums of random matrices. Foundations Comput. Math. 12:389–434.
  • Verhaert GA, Van den Poel D (2011) Empathy as added value in predicting donation behavior. J. Bus. Res. 64(12):1288–1295.
  • Verma A, Hanawal M, Rajkumar A, Sankaran R (2019) Censored semi-bandits: A framework for resource allocation with censored feedback. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY), 14526–14536.
  • Wunderink SR (2002) Individual financial donations to charities in the Netherlands: Why, how and how much? J. Nonprofit Public Sector Marketing 10(2):21–39.
  • Xu Z, Meisami A, Tewari A (2021) Decision making problems with funnel structure: A multi-task learning approach with application to email marketing campaigns. Banerjee A, Fukumizu K, eds. Proc. 24th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 130 (PMLR, New York), 127–135.