A Reduced Modeling Approach for Making Predictions with Incomplete Data Having Blockwise Missing Patterns

Karthik Srinivasan (Corresponding Author)
https://orcid.org/0000-0002-1608-6190
School of Business, University of Kansas, Lawrence, Kansas 66045

Faiz Currim
https://orcid.org/0000-0002-5025-811X
Department of MIS, Eller College of Management, University of Arizona, Tucson, Arizona 85721

Sudha Ram
https://orcid.org/0000-0001-6053-1311
Department of MIS, Eller College of Management, University of Arizona, Tucson, Arizona 85721

Published Online: 26 Nov 2024
https://doi.org/10.1287/ijds.2022.9016

References

Apley DW, Zhu J (2020) Visualizing the effects of predictor variables in black box supervised learning models. J. Roy. Statist. Soc. Ser. B Statist. Methodology 82(4).
Banner KM, Higgs MD (2017) Considerations for assessing model averaging of regression coefficients. Ecological Appl. 27(1).
Batista GEAPA, Monard MC (2003) An analysis of four missing data treatment methods for supervised learning. Appl. Artificial Intelligence 17(5–6):519–533.
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and Regression Trees, vol. 19 (Routledge).
Camino RD, Hammerschmidt CA, State R (2020) Working with deep generative models and tabular data imputation. Proc. Internat. Conf. Machine Learn. (Vienna).
Chen X, Xie MG (2014) A split-and-conquer approach for analysis of extraordinarily large data. Statist. Sinica 24(4).
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B Methodological 39(1):1–22.
Fanaee-T H, Gama J (2013) Event labeling combining ensemble detectors and background knowledge. Progress Artificial Intelligence 2:113–127.
Friedman N (1997) Learning belief networks in the presence of missing values and hidden variables. Proc. 14th Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers Inc., San Francisco), 125–133.
Friedman JH, Kohavi R, Yun Y (1996) Lazy decision tree. Proc. Thirteenth Natl. Conf. Artificial Intelligence, vol. 1 (AAAI Press, Palo Alto, CA), 717–724.
GSA (2021) Wellbuilt for wellbeing. Accessed November 18, 2024, https://www.gsa.gov/governmentwide-initiatives/federal-highperformance-green-buildings/resource-library/health/wellbuilt-for-wellbeing.
Han J, Kamber M, Pei J (2022) Data Mining: Concepts and Techniques, 4th ed. (Morgan Kaufmann).
Hansen BE, Racine JS (2012) Jackknife model averaging. J. Econometrics 167(1):38–46.
Hastie T, Mazumder R, Lee JD, Zadeh R (2015) Matrix completion and low-rank SVD via fast alternating least squares. J. Machine Learn. Res. 16:3367–3402.
Ipsen NB, Mattei PA, Frellsen J (2020) How to deal with missing data in supervised deep learning? Proc. Internat. Conf. Machine Learn.
Kowarik A, Templ M (2016) Imputation with the R package VIM. J. Statist. Software 74:1–16.
Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(December):18–22.
Little RJA, Rubin DB (2002) Statistical Analysis with Missing Data (John Wiley & Sons, Hoboken, NJ).
Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 4766–4775.
Mattei PA, Frellsen J (2019) MIWAE: Deep generative modelling and imputation of incomplete data sets. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York).
Mazumder R, Hastie T, Tibshirani R (2010) Spectral regularization algorithms for learning large incomplete matrices. J. Machine Learn. Res. 11:2287–2322.
Rubin DB (1976) Inference and missing data. Biometrika 63(3):581–592.
Rubin DB (2004) Multiple Imputation for Nonresponse in Surveys (John Wiley & Sons, Hoboken, NJ).
Saar-Tsechansky M, Provost F (2007) Handling missing values when applying classification models. J. Machine Learn. Res. 8:1625–1657.
Salomon JA, Reinhart A, Bilinski A, Chua EJ, la Motte-Kerr W, Rönn MM, Reitsma MB, et al. (2021) The US COVID-19 Trends and Impact Survey: Continuous real-time measurement of COVID-19 symptoms, risks, protective behaviors, testing, and vaccination. Proc. Natl. Acad. Sci. USA 118(51):e2111454118.
Schafer JL, Graham JW (2002) Missing data: Our view of the state of the art. Psych. Methods 7(2):142–177.
Schuurmans D, Greiner R (1997) Learning to classify incomplete examples. Comput. Learn. Theory Natural Learn. Systems Making Learn. Systems Practice, vol. 4 (MIT Press, Cambridge, MA), 87–105.
Smieja M, Struski L, Tabor J, Zielinski B, Spurek P (2018) Processing of missing data by neural networks. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY).
Srinivasan K, Currim F, Lindberg CM, Razjouyan J, Gilligan B, Lee H, Canada KJ, et al. (2023) Discovery of associative patterns between workplace sound level and physiological wellbeing using wearable devices and empirical Bayes modeling. NPJ Digital Medicine 6(1):1–10.
Srinivasan K, Currim F, Ram S, Lindberg C, Sternberg E, Skeath P, Najafi B, et al. (2016) Feature importance and prediction modeling for multi-source healthcare data with missing values. Proc. 6th Internat. Conf. Digital Health (ACM, New York), 1–8.
Stekhoven DJ, Buhlmann P (2012) Missforest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1):112–118.
Ushey K, Allaire J, Tang Y (2018) reticulate: Interface to “Python.” CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.package.reticulate.
van Buuren S (2018) Flexible Imputation of Missing Data, 2nd ed. (CRC Press, Boca Raton, FL).
Van Buuren S, Groothuis-Oudshoorn K (2011) Multivariate imputation by chained equations. J. Statist. Software 45(3):1–67.
Woźnica K, Biecek P (2020) Does imputation matter? Benchmark for predictive models. Proc. Internat. Conf. Machine Learn.
Xiang S, Yuan L, Fan W, Wang Y, Thompson PM, Ye J (2013) Multi-source learning with block-wise missing data for Alzheimer’s disease prediction. Proc. 19th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York).
Xue F, Qu A (2021) Integrating multisource block-wise missing data in model selection. J. Amer. Statist. Assoc. 116(536):1914–1927.
Yoon J, Jordon J, Van Der Schaar M (2018) GAIN: Missing data imputation using generative adversarial nets. Proc. 35th Internat. Conf. Machine Learn., vol. 13 (PMLR, New York).
Yu G, Li Q, Shen D, Liu Y (2020) Optimal sparse linear prediction for block-missing multi-modality data without imputation. J. Amer. Statist. Assoc. 115(531):1406–1419.
Yuan L, Wang Y, Thompson PM, Narayan VA, Ye J (2012) Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage 61(3):622–632.
Zhou D, Cai T, Lu J (2023) Multi-source learning via completion of block-wise overlapping noisy matrices. J. Machine Learn. Res. 221:1–43.
Zhu H, Li G, Lock EF (2020) Generalized integrative principal component analysis for multi-type data with block-wise missing structure. Biostatistics (Oxford, England) 21(2):302–318.

INFORMS Journal on Data Science

Volume 4, Issue 1

January-March 2025

Pages 1-99, C2

Information

Received: May 27, 2022
Accepted: September 18, 2024
Published Online: November 26, 2024

Cite as

Karthik Srinivasan, Faiz Currim, Sudha Ram (2024) A Reduced Modeling Approach for Making Predictions with Incomplete Data Having Blockwise Missing Patterns. INFORMS Journal on Data Science 4(1):85-99.

https://doi.org/10.1287/ijds.2022.9016
