Reproducible Feature Selection for High-Dimensional Measurement Error Models

Xin Zhou
https://orcid.org/0009-0007-3480-9317
International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

Yang Li (Corresponding Author)
https://orcid.org/0000-0002-1202-1082
International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

Zemin Zheng (Corresponding Author)
https://orcid.org/0000-0002-0240-9411
International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

Jie Wu
https://orcid.org/0009-0003-2771-365X
School of Big Data and Statistics, Anhui University, Hefei 230601, China

Jiarui Zhang
https://orcid.org/0000-0001-8675-1662
Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong 999077, China


Published Online: 7 Nov 2024
https://doi.org/10.1287/ijoc.2023.0282


INFORMS Journal on Computing

Volume 37, Issue 5

September-October 2025

Pages v-viii, 1143-1432, iii

Information

Received: August 11, 2023
Accepted: September 23, 2024
Published Online: November 7, 2024

Cite as

Xin Zhou, Yang Li, Zemin Zheng, Jie Wu, Jiarui Zhang (2024) Reproducible Feature Selection for High-Dimensional Measurement Error Models. INFORMS Journal on Computing 37(5):1350-1368.

https://doi.org/10.1287/ijoc.2023.0282

Acknowledgments

The authors sincerely thank the editors and referees for their valuable comments that helped improve the article substantially.
