Trade-Offs in Leveraging External Data Capabilities: Evidence from a Field Experiment in an Online Search Market

Xiaoxia Lei
[email protected]
https://orcid.org/0000-0003-4781-1305
Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200030, China

Yixing Chen
[email protected]
https://orcid.org/0000-0001-5509-4161
Mendoza College of Business, University of Notre Dame, Notre Dame, Indiana 46556

Ananya Sen (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-9082-6871
Heinz College, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213


Published Online: 11 Aug 2025
https://doi.org/10.1287/mnsc.2023.01834

References

Agrawal A, Gans J, Goldfarb A (2018) Prediction Machines: The Simple Economics of Artificial Intelligence (Harvard Business Press, Boston).
Allcott H, Castillo JC, Gentzkow M, Musolff L, Salz T (2024) Sources of market power in web search: Evidence from a field experiment. Technical report, National Bureau of Economic Research, Cambridge, MA.
Alrashed T, Almahmoud J, Zhang AX, Karger DR (2020) ScrAPIr: Making web data APIs accessible to end users. Proc. 2020 CHI Conf. Human Factors Comput. Systems (Association for Computing Machinery, New York), 1–12.
Analytical Methods Committee (1994) Is my calibration linear? Analyst 119(11):2363–2366.
Benzell SG, Hersh J, Van Alstyne M (2023) How APIs create growth by inverting the firm. Management Sci. 70(10):7120–7141.
Beraja M, Yang DY, Yuchtman N (2023) Data-intensive innovation and the state: Evidence from AI firms in China. Rev. Econ. Stud. 90(4):1701–1723.
Brennan J, Cong Y, Yu Y, Lin L, Peng Y, Meng C, Han N, Pouget-Abadie J, Holtz DM (2025) Reducing symbiosis bias through better A/B tests of recommendation algorithms. Proc. 34th ACM Web Conf. (WWW) (Association for Computing Machinery, New York), 3702–3715.
Brinkmann D (2022) Why real-time data pipelines are so hard. MLOps Community. https://mlops.community/why-real-time-data-pipelines-are-so-hard/.
Cai F, de Rijke M (2016) A survey of query auto completion in information retrieval. Foundations Trends Inform. Retrieval 10(4):273–363.
Casella G, Berger RL (2002) Statistical Inference, 2nd ed. (Duxbury Press, Pacific Grove, CA).
Chan T, Hamdi N, Hui X, Jiang Z (2022) The value of verified employment data for consumer lending: Evidence from Equifax. Marketing Sci. 41(4):795–814.
Chiou L, Tucker C (2017) Search engines and data retention: Implications for privacy and antitrust. Technical report, National Bureau of Economic Research, Cambridge, MA.
Covington P, Adams J, Sargin E (2016) Deep neural networks for YouTube recommendations. Proc. 10th ACM Conf. Recommender Systems (Association for Computing Machinery, New York), 191–198.
Deng A, Knoblich U, Lu J (2018) Applying the delta method in metric analytics: A practical guide with novel ideas. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 233–242.
Duranton S, Gourévitch A, Baltassis E, Khendek Y, Quarta L, Fernández M, Rubio MM (2021) Is your company gaining momentum in data? Boston Consulting Group (November 17), https://www.bcg.com/publications/2021/companies-data-capabilities-progress.
Fang L, Chen Y, Farronato C, Yuan Z, Wang Y (2024) Platform information provision and consumer search: A field experiment. Technical report, National Bureau of Economic Research, Cambridge, MA.
Goli A, Reiley DH, Zhang H (2024) Personalizing ad load to optimize subscription and ad revenues: Product strategies constructed from experiments on Pandora. Marketing Sci. 44(2):327–352.
Goli A, Huang J, Reiley D, Riabov N (2018) Measuring consumer sensitivity to audio advertising: A field experiment on Pandora internet radio. Preprint, submitted April 22, https://dx.doi.org/10.2139/ssrn.3166676.
Google (2019) How Google fights disinformation. White paper. https://safety.google/intl/en_uk/stories/fighting-misinformation-online/.
Gordon BR, Moakler R, Zettelmeyer F (2023) Close enough? A large-scale exploration of non-experimental approaches to advertising measurement. Marketing Sci. 42(4):768–793.
Gulli A (2013) A deeper look at autosuggest. Bing blog. https://blogs.bing.com/search/March-2013/A-Deeper-Look-at-Autosuggest.
Gupta AK, Smith KG, Shalley CE (2006) The interplay between exploration and exploitation. Acad. Management J. 49(4):693–706.
Hagiu A, Wright J (2023) Data-enabled learning, network effects, and competitive advantage. RAND J. Econom. 54(4):638–667.
Havakhor T, Rahman MS, Zhang T, Zhu C (2024) Tech-enabled financial data access, retail investors, and gambling-like behavior in the stock market. Management Sci. 71(2):1646–1670.
Johansson JK (1979) Advertising and the s-curve: A new approach. J. Marketing Res. 16(3):346–354.
Klein TJ, Kurmangaliyeva M, Prüfer J, Prüfer P, Park NN (2022) How important are user-generated data for search result quality? Experimental evidence. Technical report, CEPR Discussion Paper DP17934, Center for Economic and Policy Research, Washington, DC.
Kohavi R, Deng A, Frasca B, Walker T, Xu Y, Pohlmann N (2013) Online controlled experiments at large scale. Proc. 19th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1168–1176.
Kopliku A, Pinel-Sauvagnat K, Boughanem M (2014) Aggregated search: A new information retrieval paradigm. ACM Comput. Surveys 46(3):1–31.
Kucharavy A, Schillaci Z, Maréchal L, Würsch M, Dolamic L, Sabonnadiere R, David DP, Mermoud A, Lenders V (2023) Fundamentals of generative large language models and perspectives in cyber-defense. Preprint, submitted March 21, https://arxiv.org/abs/2303.12132.
Laverty KJ (1996) Economic “short-termism”: The debate, the unresolved issues, and the implications for management practice and research. Acad. Management Rev. 21(3):825–860.
Li H, Kettinger WBJ (2021) The building blocks of software platforms: Understanding the past to forge the future. J. Assoc. Inform. Systems 22(6):1524–1555.
Little JD (1979) Aggregate advertising models: The state of the art. Oper. Res. 27(4):629–667.
Mitra B, Craswell N (2015) Query auto-completion for rare prefixes. Proc. 24th ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 1755–1758.
Nagaraj A (2022) The private impact of public data: Landsat satellite maps increased gold discoveries and encouraged entry. Management Sci. 68(1):564–582.
Nandy P, Venugopalan D, Lo C, Chatterjee S (2021) A/B testing for recommender systems in a two-sided marketplace. Proc. 35th Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6466–6477.
OpenAI (2024) Terms of use. OpenAI blog. https://openai.com/policies/terms-of-use/.
Park DH, Chiba R (2017) A neural language model for query auto-completion. Proc. 40th Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (Association for Computing Machinery, New York), 1189–1192.
Peukert C, Sen A, Claussen J (2024) The editor and the algorithm: Recommendation technology in online news. Management Sci. 70(9):5816–5831.
Rahmandad H (2012) Impact of growth opportunities and competition on firm-level capability development trade-offs. Organ. Sci. 23(1):138–154.
Sanderson M (2008) Ambiguous queries: Test collections need more sense. Proc. 31st ACM SIGIR Conf. Res. Development Inform. Retrieval (Association for Computing Machinery, New York), 499–506.
Sarwar BM (2001) Sparsity, scalability, and distribution in recommender systems. University of Minnesota.
Schaefer M, Sapi G (2023) Complementarities in learning from data: Insights from general search. Inform. Econom. Policy 65:101063.
Serban I, Sordoni A, Bengio Y, Courville A, Pineau J (2016) Building end-to-end dialogue systems using generative hierarchical neural network models. Proc. 30th AAAI Conf. Artificial Intelligence, vol. 30 (AAAI Press, Palo Alto, CA).
Stoica I, Song D, Popa RA, Patterson D, Mahoney MW, Katz R, Joseph AD, et al. (2017) A Berkeley view of systems challenges for AI. Preprint, submitted December 15, https://arxiv.org/abs/1712.05855.
Sullivan D (2018) How Google autocomplete works in Search. Google blog. https://blog.google/products/search/how-google-autocomplete-works-search/.
Sun T, Yuan Z, Li C, Zhang K, Xu J (2024) The value of personal data in internet commerce: A high-stakes field experiment on data regulation policy. Management Sci. 70(4):2645–2660.
Tang D, Agarwal A, O’Brien D, Meyer M (2010) Overlapping experiment infrastructure: More, better, faster experimentation. Proc. 16th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 17–26.
Ubaldi B (2013) Open government data: Towards empirical analysis of open government data initiatives. Technical report, Organisation for Economic Co-operation and Development, Paris.
Ursu RM (2018) The power of rankings: Quantifying the effect of rankings on online consumer search and purchase decisions. Marketing Sci. 37(4):530–552.
Wernerfelt N, Tuchman A, Shapiro B, Moakler R (2024) Estimating the value of offsite data to advertisers on Meta. Marketing Sci. 44(2):268–286.
Xu Y, Chen N, Fernandez A, Sinno O, Bhasin A (2015) From infrastructure to culture: A/B testing challenges in large scale social networks. Proc. 21st ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2227–2236.
Xue L, Song P, Rai A, Zhang C, Zhao X (2019) Implications of application programming interfaces for third‐party new app development and copycatting. Production Oper. Management 28(8):1887–1902.
Yang J, Sahni NS, Nair HS, Xiong X (2024) Advertising as information for ranking e-commerce search listings. Marketing Sci. 43(2):360–377.
Yoganarasimhan H (2020) Search personalization using machine learning. Management Sci. 66(3):1045–1070.
Zaif Z (2023) Search box optimization. The Medium (December 22), https://googlerankingexpert.medium.com/search-box-optimization-ff883b2fb56e.
Zaveri A, Dastgheib S, Wu C, Whetzel T, Verborgh R, Avillach P, Korodi G, et al. (2017) SmartAPI: Towards a more intelligent network of web APIs. Semantic Web 14th Internat. Conf. Proc. Part II 14 (Springer, Cham, Switzerland), 154–169.
Zeleti FA, Ojo A (2017) Open data value capability architecture. Inform. Systems Frontiers 19(2):337–360.
Zhao Y, Yildirim P, Chintagunta PK (2023) Privacy regulations and online search friction: Evidence from GDPR. Working paper, Marketing Science Institute, New York.
Zheng S, Tong S, Kwon HE, Burtch G, Li X (2025) Frontiers: Recommending what to search: Sales volume and consumption diversity effects of a query recommender system. Marketing Sci. 44(3):516–524.

Volume 72, Issue 4

April 2026

Pages 2681-3628, iv-vi

Article Information

Received: June 15, 2023
Accepted: December 06, 2024
Published Online: August 11, 2025

Cite as

Xiaoxia Lei, Yixing Chen, Ananya Sen (2025) Trade-Offs in Leveraging External Data Capabilities: Evidence from a Field Experiment in an Online Search Market. Management Science 72(4):2998-3015.

https://doi.org/10.1287/mnsc.2023.01834
