Online Learning with Sample Selection Bias

Divya Singhvi (Corresponding Author)
https://orcid.org/0000-0001-8763-015X
Leonard N. Stern School of Business, New York University, New York, New York 10012

Somya Singhvi
https://orcid.org/0000-0003-3999-7189
Marshall School of Business, University of Southern California, Los Angeles, California 90005

Published Online: 19 Mar 2025
https://doi.org/10.1287/opre.2023.0223

Abstract

We consider the problem of personalized recommendations on online platforms, where user preferences are unknown and users interact with the platform through a series of sequential decisions (such as clicking to watch on video platforms or clicking to donate on donation platforms). The platform aims to maximize the final outcome (e.g., viewing duration on video platforms or donations on donation platforms). However, the platform observes the final outcome only for users who complete the first stage (clicking on the recommendation). For users who do not complete the first stage (not clicking on the recommendation), the final outcome remains unobserved, a phenomenon also referred to as funneling. This censoring of outcomes creates a selection bias issue, as the outcomes at different stages are often correlated. We demonstrate that failing to account for this selection bias results in biased estimates and suboptimal recommendations. In fact, personalized learning algorithms that otherwise perform well incur linear regret in this setting. Therefore, we propose the sample selection bandit (SSB) algorithm, which combines Heckman’s two-step estimator with the “optimism under uncertainty” principle to address the sample selection bias issue. We show that the SSB algorithm achieves rate-optimal regret (up to logarithmic terms) of $\tilde{O}(\sqrt{T})$. Furthermore, we conduct extensive numerical experiments on both synthetic data and real donation data collected from GoFundMe (a crowdfunding platform), demonstrating significant improvements over benchmark state-of-the-art learning algorithms in this setting.
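To make the selection bias described in the abstract concrete, the following is a minimal simulation sketch, not the authors' SSB algorithm or code. It assumes the textbook setup behind Heckman's estimator: a user clicks when `index + e1 > 0`, the final outcome is `true_mean + e2`, and `(e1, e2)` are standard normal with correlation `rho`. Under that assumption, the average outcome among clickers is shifted by `rho * phi(index) / Phi(index)`, the inverse Mills ratio term. All names and parameter values here are hypothetical, and the sketch treats `index` and `rho` as known, whereas a two-step estimator would have to estimate them from data.

```python
import math
import random


def inverse_mills(z: float) -> float:
    """Inverse Mills ratio phi(z) / Phi(z) for the standard normal."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / cdf


def simulate_selected_mean(index, mean_outcome, rho, n, seed=0):
    """Average final outcome among simulated users who click.

    A user clicks when index + e1 > 0. The outcome is mean_outcome + e2,
    but it is only recorded for clickers (the censoring in the abstract).
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        # Build e2 so that corr(e1, e2) = rho and var(e2) = 1.
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if index + e1 > 0.0:
            total += mean_outcome + e2
            count += 1
    return total / count


# Hypothetical values chosen only for illustration.
true_mean = 2.0  # expected final outcome for this user and recommendation
index = 0.0      # first-stage (click) index
rho = 0.6        # correlation between first- and second-stage noise

naive = simulate_selected_mean(index, true_mean, rho, n=200_000)
predicted = true_mean + rho * inverse_mills(index)
corrected = naive - rho * inverse_mills(index)

print(f"naive mean over clickers:   {naive:.3f}")
print(f"mean implied by correction: {predicted:.3f}")
print(f"corrected estimate:         {corrected:.3f} (true {true_mean})")
```

With a positive `rho`, the naive average over clickers overstates the true mean, which is the kind of bias the abstract attributes to ignoring funneling. Subtracting the inverse Mills ratio term removes it in this simplified setting.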

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2023.0223.

Volume 73, Issue 5

September-October 2025

Pages iii-vii, 2297-2866, C2-C3

Article Information

Received: April 27, 2023
Accepted: January 21, 2025
Published Online: March 19, 2025

Cite as

Divya Singhvi, Somya Singhvi (2025) Online Learning with Sample Selection Bias. Operations Research 73(5):2458-2476.

https://doi.org/10.1287/opre.2023.0223

Acknowledgments

The authors thank Gustavo Vulcano, the anonymous associate editor, and two anonymous reviewers for constructive comments; Jackie Baek and Fiorin Ciocan for valuable advice that improved the paper; and Tiantong (Frank) Li for assistance in data analysis.
