Diversity Subsampling: Custom Subsamples from Large Data Sets
Published Online: 22 Nov 2023. https://doi.org/10.1287/ijds.2022.00017
References
- (2019) dpp_sampler.py. Accessed March 26, 2020, https://github.com/Stanford-ILIAD/DPP-Batch-Active-Learning/blob/master/classification_synthetic/dpp_sampler.py.
- (2019) Batch active learning using determinantal point processes. Preprint, submitted June 19, https://arxiv.org/abs/1906.07975.
- (2022) Density regression with conditional support points. Technometrics 64(3):1–13.
- (1986) Stochastic sampling in computer graphics. ACM Trans. Graphics 5(1):51–72.
- (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. B 39(1):1–22.
- (2014) Event labeling combining ensemble detectors and background knowledge. Progress Artificial Intelligence 2:113–127.
- (1995) Bayesian Data Analysis (Chapman and Hall/CRC Press, Boca Raton, FL).
- (2020) MAP inference for customized determinantal point processes via maximum inner product search. Chiappa S, Calandra R, eds. Proc. Internat. Conf. on Artificial Intelligence and Statist. (PMLR, New York), 2797–2807.
- (2020) Scalable active learning for object detection. Proc. IEEE Intelligent Vehicles Sympos. (IEEE, New York), 1430–1435.
- (2022) Population quasi-Monte Carlo. J. Comput. Graphics Statist. 31(3):1–14.
- (2021) Supervised compression of big data. Statist. Analysis Data Mining 14(3):217–229.
- (1969) Computer aided design of experiments. Technometrics 11(1):137–148.
- (1995) An exact algorithm for maximum entropy sampling. Oper. Res. 43(4):684–691.
- (1979) Multivariate k-nearest neighbor density estimates. J. Multivariate Anal. 9(1):1–15.
- (1967) Some methods for classification and analysis of multivariate observations. Le Cam LM, Neyman J, eds. Proc. 5th Berkeley Sympos. on Math. Statist. and Probability, vol. 1 (University of California Press, Downtown Oakland, CA), 281–297.
- (2018) Support points. Ann. Statist. 46(6A):2562–2592.
- (1992) Hierarchical Poisson disk sampling distributions. Fiume E, ed. Proc. Conf. on Graphics Interface, vol. 92 (Canadian Information Processing Society, Mississauga, ON, Canada), 94–105.
- (1962) On estimation of a probability density function and mode. Ann. Math. Statist. 33(3):1065–1076.
- (2011) Scikit-learn: Machine learning in Python. J. Machine Learn. Res. 12:2825–2830.
- (2011) Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models. Structural Chemistry 22(4):795–804.
- et al. (2021) A survey of deep active learning. ACM Comput. Surveys 54(9):1–40.
- (2009) Gaussian mixture models. Encyclopedia Biometrics 741:659–663.
- (2022) energy: E-statistics: Multivariate inference via the energy of data. https://CRAN.R-project.org/package=energy.
- (1956) Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27(3):832–837.
- (1987) A noniterative sampling/importance resampling alternative to data augmentation for creating a few imputations when fractions of missing information are modest: The SIR algorithm. J. Amer. Statist. Assoc. 82:544–546.
- (1988) Using the SIR algorithm to simulate posterior distributions. Bayesian Statist. 3:395–402.
- (2022) Fast diversity subsampling from a data set. Accessed June 2, 2022, https://pypi.org/project/FADS/.
- (2022) A fast and low-cost approach for the discrimination of commercial aged cachaças using synchronous fluorescence spectroscopy and multivariate classification. J. Sci. Food Agriculture 102(11):4918–4926.
- (2003) Improved sampling-importance resampling and reduced bias importance sampling. Scandinavian J. Statist. 30(4):719–737.
- (2022a) scsampler. Accessed February 9, 2022, https://github.com/SONGDONGYUAN1994/scsampler.
- (2022b) scSampler: Fast diversity-preserving subsampling of large-scale single-cell transcriptomic data. Bioinformatics 38(11):3126–3127.
- (2003) E-statistics: The energy of statistical samples. Technical report, Bowling Green State University, Department of Mathematics and Statistics, Bowling Green, Ohio.
- (2004) Testing for equal distributions in high dimension. InterStat 5:1–6.
- (1992) Variable kernel density estimation. Ann. Statist. 20(3):1236–1265.
- (2018) Active model learning and diverse action sampling for task and motion planning. Maciejewski AA, ed. Proc. IEEE/RSJ Internat. Conf. on Intelligent Robots and Systems (IEEE, New York), 4107–4114.
- (2018) Pool-based sequential active learning for regression. IEEE Trans. Neural Networks Learn. Systems 30(5):1348–1359.
- (2010) Passive sampling for regression. Proc. IEEE Internat. Conf. on Data Mining (IEEE, New York), 1151–1156.
- (2015) Sample elimination for generating Poisson disk sample sets. Comput. Graphics Forum 34(2):25–32.
- (2016) cySampleElim.h. Accessed September 17, 2020, https://github.com/cemyuksel/cyCodeBase/blob/master/cySampleElim.h.

