A Reduced Modeling Approach for Making Predictions with Incomplete Data Having Blockwise Missing Patterns

Published Online: https://doi.org/10.1287/ijds.2022.9016

References

  • Apley DW, Zhu J (2020) Visualizing the effects of predictor variables in black box supervised learning models. J. Roy. Statist. Soc. Ser. B Statist. Methodology 82(4).
  • Banner KM, Higgs MD (2017) Considerations for assessing model averaging of regression coefficients. Ecological Appl. 27(1).
  • Batista GEAPA, Monard MC (2003) An analysis of four missing data treatment methods for supervised learning. Appl. Artificial Intelligence 17(5–6):519–533.
  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and Regression Trees, vol. 19 (Routledge).
  • Camino RD, Hammerschmidt CA, State R (2020) Working with deep generative models and tabular data imputation. Proc. Internat. Conf. Machine Learn. (Vienna).
  • Chen X, Xie MG (2014) A split-and-conquer approach for analysis of extraordinarily large data. Statist. Sinica 24(4).
  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B Methodological 39(1):1–22.
  • Fanaee-T H, Gama J (2013) Event labeling combining ensemble detectors and background knowledge. Progress Artificial Intelligence 2:113–127.
  • Friedman N (1997) Learning belief networks in the presence of missing values and hidden variables. Proc. 14th Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers Inc., San Francisco), 125–133.
  • Friedman JH, Kohavi R, Yun Y (1996) Lazy decision tree. Proc. Thirteenth Natl. Conf. Artificial Intelligence, vol. 1 (AAAI Press, Palo Alto, CA), 717–724.
  • GSA (2021) Wellbuilt for wellbeing. Accessed November 18, 2024, https://www.gsa.gov/governmentwide-initiatives/federal-highperformance-green-buildings/resource-library/health/wellbuilt-for-wellbeing.
  • Han J, Kamber M, Pei J (2022) Data Mining: Concepts and Techniques, 4th ed. (Morgan Kaufmann).
  • Hansen BE, Racine JS (2012) Jackknife model averaging. J. Econometrics 167(1):38–46.
  • Hastie T, Mazumder R, Lee JD, Zadeh R (2015) Matrix completion and low-rank SVD via fast alternating least squares. J. Machine Learn. Res. 16:3367–3402.
  • Ipsen NB, Mattei PA, Frellsen J (2020) How to deal with missing data in supervised deep learning? Proc. Internat. Conf. Machine Learn.
  • Kowarik A, Templ M (2016) Imputation with the R package VIM. J. Statist. Software 74:1–16.
  • Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(December):18–22.
  • Little RJA, Rubin DB (2002) Statistical Analysis with Missing Data (John Wiley & Sons, Hoboken, NJ).
  • Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY), 4766–4775.
  • Mattei PA, Frellsen J (2019) MIWAE: Deep generative modelling and imputation of incomplete data sets. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York).
  • Mazumder R, Hastie T, Tibshirani R (2010) Spectral regularization algorithms for learning large incomplete matrices. J. Machine Learn. Res. 11:2287–2322.
  • Rubin DB (1976) Inference and missing data. Biometrika 63(3):581–592.
  • Rubin DB (2004) Multiple Imputation for Nonresponse in Surveys (John Wiley & Sons, Hoboken, NJ).
  • Saar-Tsechansky M, Provost F (2007) Handling missing values when applying classification models. J. Machine Learn. Res. 8:1625–1657.
  • Salomon JA, Reinhart A, Bilinski A, Chua EJ, la Motte-Kerr W, Rönn MM, Reitsma MB, et al. (2021) The US COVID-19 Trends and Impact Survey: Continuous real-time measurement of COVID-19 symptoms, risks, protective behaviors, testing, and vaccination. Proc. Natl. Acad. Sci. USA 118(51):e2111454118.
  • Schafer JL, Graham JW (2002) Missing data: Our view of the state of the art. Psych. Methods 7(2):142–177.
  • Schuurmans D, Greiner R (1997) Learning to classify incomplete examples. Comput. Learn. Theory Natural Learn. Systems Making Learn. Systems Practice, vol. 4 (MIT Press, Cambridge, MA), 87–105.
  • Smieja M, Struski L, Tabor J, Zielinski B, Spurek P (2018) Processing of missing data by neural networks. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY).
  • Srinivasan K, Currim F, Lindberg CM, Razjouyan J, Gilligan B, Lee H, Canada KJ, et al. (2023) Discovery of associative patterns between workplace sound level and physiological wellbeing using wearable devices and empirical Bayes modeling. NPJ Digital Medicine 6(1):1–10.
  • Srinivasan K, Currim F, Ram S, Lindberg C, Sternberg E, Skeath P, Najafi B, et al. (2016) Feature importance and prediction modeling for multi-source healthcare data with missing values. Proc. 6th Internat. Conf. Digital Health (ACM, New York), 1–8.
  • Stekhoven DJ, Bühlmann P (2012) MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1):112–118.
  • Ushey K, Allaire J, Tang Y (2018) reticulate: Interface to “Python.” CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.package.reticulate.
  • van Buuren S (2018) Flexible Imputation of Missing Data, 2nd ed. (CRC Press, Boca Raton, FL).
  • Van Buuren S, Groothuis-Oudshoorn K (2011) Multivariate imputation by chained equations. J. Statist. Software 45(3):1–67.
  • Woźnica K, Biecek P (2020) Does imputation matter? Benchmark for predictive models. Proc. Internat. Conf. Machine Learn.
  • Xiang S, Yuan L, Fan W, Wang Y, Thompson PM, Ye J (2013) Multi-source learning with block-wise missing data for Alzheimer’s disease prediction. Proc. 19th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York).
  • Xue F, Qu A (2021) Integrating multisource block-wise missing data in model selection. J. Amer. Statist. Assoc. 116(536):1914–1927.
  • Yoon J, Jordon J, Van Der Schaar M (2018) GAIN: Missing data imputation using generative adversarial nets. Proc. 35th Internat. Conf. Machine Learn., vol. 13 (PMLR, New York).
  • Yu G, Li Q, Shen D, Liu Y (2020) Optimal sparse linear prediction for block-missing multi-modality data without imputation. J. Amer. Statist. Assoc. 115(531):1406–1419.
  • Yuan L, Wang Y, Thompson PM, Narayan VA, Ye J (2012) Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage 61(3):622–632.
  • Zhou D, Cai T, Lu J (2023) Multi-source learning via completion of block-wise overlapping noisy matrices. J. Machine Learn. Res. 221:1–43.
  • Zhu H, Li G, Lock EF (2020) Generalized integrative principal component analysis for multi-type data with block-wise missing structure. Biostatistics (Oxford, England) 21(2):302–318.