Toward a Liquid Biopsy: Greedy Approximation Algorithms for Active Sequential Hypothesis Testing

Kyra Gan
Corresponding Author
Kyra Gan
[email protected]
https://orcid.org/0000-0002-5147-8747
Cornell Tech, Cornell University, New York, New York 10044
Search for more papers by this author
,
Su Jia
Su Jia
[email protected]
https://orcid.org/0000-0002-8948-0834
Center for Data Science for Enterprise and Society, Cornell University, Ithaca, New York 10044
Search for more papers by this author
,
Andrew Li
Andrew Li
[email protected]
https://orcid.org/0000-0002-9552-6421
Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Search for more papers by this author
,
Sridhar Tayur
Sridhar Tayur
[email protected]
https://orcid.org/0000-0002-8008-400X
Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Search for more papers by this author

Kyra Gan

Corresponding Author

Kyra Gan

[email protected]

https://orcid.org/0000-0002-5147-8747

Cornell Tech, Cornell University, New York, New York 10044

Search for more papers by this author

Su Jia

[email protected]

https://orcid.org/0000-0002-8948-0834

Center for Data Science for Enterprise and Society, Cornell University, Ithaca, New York 10044

Search for more papers by this author

Andrew Li

[email protected]

https://orcid.org/0000-0002-9552-6421

Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Search for more papers by this author

Sridhar Tayur

[email protected]

https://orcid.org/0000-0002-8008-400X

Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Search for more papers by this author

Published Online:19 Aug 2025https://doi.org/10.1287/mnsc.2023.00829

References

Adler M, Heeringa B (2008) Approximating optimal binary decision trees. Goel A, Jansen K, Rolim JDP, Rubinfeld R, eds. Approximation Randomization Combin. Optim. Algorithms Techniques. APPROX RANDOM 2008, Lecture Notes in Computer Science, vol. 5171 (Springer, Berlin, Heidelberg), 1–9.Crossref, Google Scholar
American Cancer Society (2024) Cancer statistics center. Retrieved January 23, https://cancerstatisticscenter.cancer.org/.Google Scholar
Arkin EM, Meijer H, Mitchell JSB, Rappaport D, Skiena SS (1998) Decision trees for geometric models. Internat. J. Comparative Geometry Appl. 8(3):343–363. Crossref, Google Scholar
Armitage P (1950) Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis. J. Roy. Statist. Soc. Ser. B 12(1):137–144.Crossref, Google Scholar
Awasthi P, Balcan MF, Long PM (2017) The power of localization for efficiently learning linear separators with noise. J. ACM 63(6):1–27.Crossref, Google Scholar
Azar Y, Gamzu I (2011) Ranking with submodular valuations. Proc. Twenty-Second Annual ACM-SIAM Sympos. Discrete Algorithms (SODA '11) (Society for Industrial and Applied Mathematics, Philadelphia), 1070–1079. Google Scholar
Azar Y, Gamzu I, Yin X (2009) Multiple intents re-ranking. Proc. Forty-First Annual ACM Sympos. Theory Comput. (STOC '09) (Association for Computing Machinery, New York), 669–678.Google Scholar
Balcan MF, Beygelzimer A, Langford J (2006) Agnostic active learning. Proc. 23rd Internat. Conf. Machine Learn. (ICML '06) (Association for Computing Machinery, New York), 65–72.Google Scholar
Banerjee S, Karri SPK, Chatterjee S, Pal M, Paul RR, Chatterjee J (2016) Multimodal diagnostic segregation of oral leukoplakia and cancer. 2016 Internat. Conf. Systems Medicine Biol. (ICSMB) (IEEE, Piscataway, NJ), 67–70.Google Scholar
Bendich A, Wilczok T, Borenfreund E (1965) Circulating DNA as a possible factor in oncogenesis. Science 148(3668):374–376. Crossref, Google Scholar
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, et al. (2018) The UK biobank resource with deep phenotyping and genomic data. Nature 562(7726):203–209. Crossref, Google Scholar
Castro RM, Nowak RD (2007) Minimax bounds for active learning. Bshouty NH, Gentile C, eds. Learn. Theory. COLT 2007, Lecture Notes in Computer Science, vol. 4539 (Springer, Berlin, Heidelberg), 5–19.Google Scholar
Chakaravarthy VT, Pandit V, Roy S, Sabharwal Y (2009) Approximating decision trees with multiway branches. Albers S, Marchetti-Spaccamela A, Matias Y, Nikoletseas S, Thomas W, eds. Automata Languages Programming. ICALP 2009, Lecture Notes in Computer Science, vol. 5555 (Springer, Berlin, Heidelberg), 210–221.Google Scholar
Chan KA, Woo JK, King A, Zee BC, Lam WJ, Chan SL, Chu SW, et al. (2017) Analysis of plasma Epstein–Barr virus DNA to screen for nasopharyngeal cancer. New England J. Medicine 377(6):513–522. Crossref, Google Scholar
Chawla S, Gergatsouli E, Teng Y, Tzamos C, Zhang R (2020) Pandora’s box with correlations: Learning and approximation. 2020 IEEE 61st Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 1214–1225.Google Scholar
Chen Y, Hassani SH, Karbasi A, Krause A (2015) Sequential information maximization: When is greedy near-optimal? Grünwald P, Hazan E, Kale S, eds. Proc. 28th Conf. Learn. Theory, vol. 40 (PMLR, New York), 338–363.Google Scholar
Chernoff H (1959) Sequential design of experiments. Ann. Math. Statist. 30(3):755–770.Crossref, Google Scholar
Cohen K, Zhao Q (2014) Quickest anomaly detection: A case of active hypothesis testing. 2014 Inform. Theory Appl. Workshop (ITA) (IEEE, Piscataway, NJ), 1–5.Google Scholar
Cohen JD, Li LU, Wang Y, Thoburn C, Afsari B, Danilova L, Douville C, et al.. (2018) Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359(6378):926–930.Crossref, Google Scholar
Cosmic (2019) Cosmic: Catalogue of somatic mutations in cancer. https://cancer.sanger.ac.uk/.Google Scholar
Crowley E, Di Nicolantonio F, Loupakis F, Bardelli A (2013) Liquid biopsy: Monitoring cancer-genetics in the blood. Nature Rev. Clinical Oncology 10(8):472–484. Crossref, Google Scholar
Dasgupta S (2005) Analysis of a greedy active learning strategy. Proc. 18th Internat. Conf. Neural Inform. Processing Systems (NIPS '04) (MIT Press, Cambridge, MA), 337–344.Google Scholar
Du D, Hwang FK (2000) Combinatorial Group Testing and Its Applications, Series on Applied Mathematics, vol. 12 (World Scientific, Singapore).Google Scholar
Even-Dar E, Mannor S, Mansour Y (2002) Pac bounds for multi-armed bandit and Markov decision processes. Proc. 15th Annual Conf. Comput. Learn. Theory (COLT '02) (Springer-Verlag, Berlin, Heidelberg), 255–270. Google Scholar
Feige U (1998) A threshold of ln n for approximating set cover. J. ACM 45(4):634–652.Crossref, Google Scholar
Ferguson MK, Wang J, Hoffman PC, Haraf DJ, Olak J, Masters GA, Vokes EE (2000) Sex-associated differences in survival of patients undergoing resection for lung cancer. Ann. Thoracic Surgery 69(1):245–249. Crossref, Google Scholar
Freund Y, et al. (1998) Large margin classification using the perceptron algorithm. Proc. Eleventh Annual Conf. Comput. Learn. Theory (COLT '98) (Association for Computing Machinery, New York), 209–217.Google Scholar
Galleri (2024) What is Galleri?: Test performance. Accessed October 15, 2024, https://www.galleri.com/what-is-galleri/test-performance.Google Scholar
Gan K, Jia S, Li A (2021) Greedy approximation algorithms for active sequential hypothesis testing. Proc. 35th Internat. Conf. Neural Inform. Processing Systems (NIPS '21) (Curran Associates Inc., Red Hook, NY), Article 383, 5012–5024.Google Scholar
Gan K, et al. (2021) Causal inference with selectively deconfounded data. Internat. Conf. Artificial Intelligence Statist. (AISTATS '21) (PMLR, New York), 2791–2799.Google Scholar
Golovin D, Krause A (2011) Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artificial Intelligence Res. 42:427–486.Google Scholar
Guillory A, Bilmes JA (2011) Simultaneous learning and covering with adversarial noise. Proc. 28th Internat. Conf. Machine Learn. (ICML '11) (Omnipress, Madison, WI), 369–379.Google Scholar
Hanneke S, Yang L (2015) Minimax analysis of active learning. J. Machine Learn. Res. 16(1):3487–3602. Google Scholar
Hanneke S (2014) Theory of disagreement-based active learning. Foundations Trends Machine Learn. 7(2–3):131–309.Crossref, Google Scholar
Im S, et al. (2016) Minimum latency submodular cover. ACM Trans. Algorithms 13(1):1–28.Crossref, Google Scholar
Jemal A, et al. (2010) Cancer statistics, 2010. CA Cancer J. Clinicians 60(5):277–300. Google Scholar
Jia S, et al. (2019) Optimal decision tree with noisy outcomes. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).Google Scholar
Jung K, Fleischhacker M, Rabien A (2010) Cell-free DNA in the blood as a solid tumor biomarker—A critical appraisal of the literature. Clinica Chimica Acta 411(21–22):1611–1624.Crossref, Google Scholar
Kamath G, Tzamos C (2019) Anaconda: A non-adaptive conditional sampling algorithm for distribution testing. Proc. Thirtieth Annual ACM-SIAM Sympos. Discrete Algorithms (SODA '19) (Society for Industrial and Applied Mathematics, Philadelphia), 679–693.Google Scholar
Kim Y, Jeon J, Mejia S, Yao CQ, Ignatchenko V, Nyalwidhe JO, Garmolini AO, et al. (2016) Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer. Nature Comm. 7:11906.Google Scholar
Kosaraju SR, et al. (1999) On an optimal split tree problem. Proc. 6th Internat. Workshop Algorithms Data Structures (WADS '99) (Springer-Verlag, Berlin, Heidelberg), 157–168.Google Scholar
Krause A, et al. (2008) Robust submodular observation selection. JMLR 9(12):2761–2801.Google Scholar
Laurent H, et al. (1976) Constructing optimal binary decision trees is np-complete. Informs. Proc. Lett. 5(1):15–17. Crossref, Google Scholar
Li CN, et al. (2003) Cell-free DNA is released from tumor cells upon cell death: A study of tissue cultures of tumor cell lines. J. Clinical Laboratory Anal. (Oxford) 17(4):103–107. Crossref, Google Scholar
Liu M, Oxnard G, Klein E, Swanton CS, Seiden MV, Liu MC, Oxnard GR, et al. (2020) Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann. Oncology 31(6):745–759. Crossref, Google Scholar
Lorden G (1977) Nearly-optimal sequential tests for finitely many parameter values. Ann. Statist. 5(1):1–21.Google Scholar
Naghshvar M, Javidi T (2013) Active sequential hypothesis testing. Ann. Statist. 41(6):2703–2738.Crossref, Google Scholar
Navidi F, Kambadur P, Nagarajan V (2020) Adaptive submodular ranking and routing. Oper. Res. 68(3):856–877.Link, Google Scholar
Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions—I. Math. Programming 14(1):265–294.Crossref, Google Scholar
Nowak RD (2009) Noisy generalized binary search. Proc. 23rd Internat. Conf. Neural Inform. Processing Systems (NIPS '09) (Curran Associates Inc., Red Hook, NY), 1366–1374.Google Scholar
O’Rourke N, et al. (2000) Lung cancer treatment waiting times and tumour growth. Clinical Oncology 12(3):141–144.Crossref, Google Scholar
Paez JG, Jänne PA, Lee JC, Tracy S, Greulich H, Gabriel S, Herman P, et al. (2004) EGFR mutations in lung cancer: Correlation with clinical response to gefitinib therapy. Science 304(5676):1497–1500.Crossref, Google Scholar
Razavi P, Li BT, Abida W, Aravanis A, Jung B, Shen R, Hou C, et al. (2017) Performance of a high-intensity 508-gene circulating-tumor DNA (ctDNA) assay in patients with metastatic breast, lung, and prostate cancer. J Clin Oncol. 35(18):LBA11516-LBA11516.Crossref, Google Scholar
Settles B (2009) Active learning literature survey. Technical report, University of Wisconsin-Madison, Madison.Google Scholar
Siegel RL, Miller KD, Jemal A (2015) Cancer statistics, 2015. Cancer J. Clinicians 65(1):5–29.Crossref, Google Scholar
Suzuki N, et al. (2008) Characterization of circulating DNA in healthy human plasma. Clinica Chimica Acta 387(1–2):55–58. Crossref, Google Scholar
Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, et al. (2019) Cosmic: The catalogue of somatic mutations in cancer. Nucleic Acids Res. 47(D1):D941–D947.Crossref, Google Scholar
Wald A (1945) Sequential tests of statistical hypotheses. Ann. Math. Statist. 16(2):117–186.Crossref, Google Scholar
Wang Y, et al. (2016) Noise-adaptive margin-based active learning and lower bounds under tsybakov noise condition. Proc. Thirtieth AAAI Conf. Artificial Intelligence (AAAI '16), vol. 30 (AAAI Press, Palo Alto, CA), 2180–2186.Google Scholar

Volume 72, Issue 4

April 2026

Pages 2681-3628, iv-vi

Article Information

Supplemental Material

Metrics

Information

Received:March 14, 2023
Accepted:October 24, 2024
Published Online:August 19, 2025

Cite as

Kyra Gan, Su Jia, Andrew Li, Sridhar Tayur (2025) Toward a Liquid Biopsy: Greedy Approximation Algorithms for Active Sequential Hypothesis Testing. Management Science 72(4):3291-3308.

https://doi.org/10.1287/mnsc.2023.00829

Keywords

Acknowledgments

Kyra Gan and Su Jia contributed equally to the manuscript. Kyra Gan is part of School of Operations Research and Information Engineering at Cornell Andrew Li is affiliated with Operations Research and Sridhar with Operations Management within Tepper School of Business.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Toward a Liquid Biopsy: Greedy Approximation Algorithms for Active Sequential Hypothesis Testing

References

Volume 72, Issue 4

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News