A Conditional Gradient Approach for Nonparametric Estimation of Mixing Distributions

Srikanth Jagabathula
http://orcid.org/0000-0002-4854-3181
Stern School of Business, New York University, New York, New York 10012; Harvard Business School, Harvard University, Boston, Massachusetts 02163

Lakshminarayanan Subramanian
Courant Institute of Mathematical Sciences, New York University, New York, New York 10012

Ashwin Venkataraman
Harvard Institute for Quantitative Social Science, Harvard University, Cambridge, Massachusetts 02138

Published Online: 21 May 2020
https://doi.org/10.1287/mnsc.2019.3373

References

Bach F (2013) Learning with submodular functions: A convex optimization perspective. Foundations Trends Machine Learn. 6(2–3):145–373.
Berry S, Levinsohn J, Pakes A (1995) Automobile prices in market equilibrium. Econometrica 63(4):841–890.
Bhat CR (1997) An endogenous segmentation mode choice model with an application to intercity travel. Transportation Sci. 31(1):34–48.
Bohning D, Schlattmann P, Lindsay B (1992) Computer-assisted analysis of mixtures (C.A.MAN): Statistical algorithms. Biometrics 48(1):283–303.
Bronnenberg BJ, Kruger MW, Mela CF (2008) Database paper—the IRI marketing data set. Marketing Sci. 27(4):745–748.
Clarkson KL (2010) Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Trans. Algorithms 6(4):63.
Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the web. Proc. 10th Internat. Conf. World Wide Web (ACM, New York), 613–622.
Feng L, Dicker LH (2018) Approximate nonparametric maximum likelihood for mixture models: A convex optimization approach to fitting arbitrary multivariate mixing distributions. Comput. Statist. Data Anal. 122:80–91.
Fox JT, il Kim K, Ryan SP, Bajari P (2011) A simple estimator for the distribution of random coefficients. Quant. Econom. 2(3):381–418.
Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.
Garber D, Hazan E (2015) Faster rates for the Frank-Wolfe method over strongly-convex sets. Proc. 32nd Internat. Conf. Machine Learn. (ICML-15) (ACM, New York), 541–549.
Guélat J, Marcotte P (1986) Some comments on Wolfe’s ‘away step’. Math. Programming 35(1):110–119.
Harchaoui Z, Juditsky A, Nemirovski A (2015) Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Programming 152(1–2):75–112.
Hauser JR (2014) Consideration-set heuristics. J. Bus. Res. 67(8):1688–1699.
Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Ann. Statist. 32(1):384–406.
Jagabathula S, Rusmevichientong P (2016) A nonparametric joint assortment and price choice model. Management Sci. 63(9):3128–3145.
Jagabathula S, Rusmevichientong P (2019) The limit of rationality in choice modeling: Formulation, computation, and implications. Management Sci. 65(5):2196–2215.
Jaggi M (2011) Sparse convex optimization methods for machine learning. Unpublished PhD thesis, ETH Zürich, Zurich.
Jaggi M (2013) Revisiting Frank-Wolfe: Projection-free sparse convex optimization. Proc. 30th Internat. Conf. Machine Learn. (ICML-13) (ACM, New York), 427–435.
Jaggi M, Sulovský M (2010) A simple algorithm for nuclear norm regularized problems. Proc. 27th Internat. Conf. Machine Learn. (ICML-10) (ACM, New York), 471–478.
James J (2017) MM algorithm for general mixed multinomial logit models. J. Appl. Econometrics 32(4):841–857.
Jiang W, Zhang CH (2009) General maximum likelihood empirical Bayes estimation of normal means. Ann. Statist. 37(4):1647–1684.
Joulin A, Tang K, Fei-Fei L (2014) Efficient image and video co-localization with Frank-Wolfe algorithm. Fleet D, Pajdla T, Schiele B, Tuytelaars T, eds. Computer Vision–ECCV 2014, Lecture Notes in Computer Science, vol. 8694 (Springer, Cham, Switzerland), 253–268.
Kamishima T, Kazawa H, Akaho S (2005) Supervised ordering—an empirical survey. 5th IEEE Internat. Conf. Data Mining (IEEE, Piscataway, NJ), 673–676.
Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27(4):887–906.
Krishnan RG, Lacoste-Julien S, Sontag D (2015) Barrier Frank-Wolfe for marginal inference. Adv. Neural Inform. Processing Systems 28:532–540.
Lacoste-Julien S, Jaggi M (2015) On the global linear convergence of Frank-Wolfe optimization variants. Adv. Neural Inform. Processing Systems 28:496–504.
Laird N (1978) Nonparametric maximum likelihood estimation of a mixing distribution. J. Amer. Statist. Assoc. 73(364):805–811.
Lapersonne E, Laurent G, Le Goff JJ (1995) Consideration sets of size one: An empirical investigation of automobile purchases. Internat. J. Res. Marketing 12(1):55–66.
Lindsay BG (1983) The geometry of mixture likelihoods: A general theory. Ann. Statist. 11(1):86–94.
Lindsay BG (1995) Mixture Models: Theory, Geometry and Applications, NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5 (Institute of Mathematical Statistics, Hayward, CA).
Liu TY (2009) Learning to rank for information retrieval. Foundations Trends Inform. Retrieval 3(3):225–331.
McFadden D, Train K (2000) Mixed MNL models for discrete response. J. Appl. Econometrics 15(5):447–470.
McLachlan G, Peel D (2000) Finite Mixture Models (John Wiley & Sons, New York).
Méndez-Díaz I, Miranda-Bront JJ, Vulcano G, Zabala P (2014) A branch-and-cut algorithm for the latent-class logit assortment problem. Discrete Appl. Math. 164(1):246–263.
Nocedal J, Wright SJ (2006) Numerical Optimization, 2nd ed. (Springer, New York).
Petrin A, Train K (2010) A control function approach to endogeneity in consumer choice models. J. Marketing Res. 47(1):3–13.
Prechelt L (2012) Early stopping—but when? Montavon G, Orr GB, Müller KR, eds. Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 7700 (Springer, Berlin), 53–67.
Robbins H (1950) A generalization of the method of maximum likelihood-estimating a mixing distribution. Ann. Math. Statist. 21(2):314–315.
Shalev-Shwartz S, Srebro N, Zhang T (2010) Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 20(6):2807–2832.
Train KE (2008) EM algorithms for nonparametric estimation of mixing distributions. J. Choice Model. 1(1):40–69.
Train KE (2009) Discrete Choice Methods with Simulation, 2nd ed. (Cambridge University Press, Cambridge, UK).
Wang YX, Sadhanala V, Dai W, Neiswanger W, Sra S, Xing E (2016) Parallel and distributed block-coordinate Frank-Wolfe algorithms. Proc. 33rd Internat. Conf. Machine Learn. (ICML-16) (ACM, New York), 1548–1557.
Yao Y, Rosasco L, Caponnetto A (2007) On early stopping in gradient descent learning. Constructive Approximation 26(2):289–315.
Zangwill WI (1969) Nonlinear Programming: A Unified Approach (Prentice-Hall, Englewood Cliffs, NJ).
Zhang T (2003) Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Inform. Theory 49(3):682–691.

Volume 66, Issue 8

August 2020

Pages v, 3295-3798, iii-iv

Received: May 04, 2017
Accepted: March 21, 2019
Published Online: May 21, 2020

Cite as

Srikanth Jagabathula, Lakshminarayanan Subramanian, Ashwin Venkataraman (2020) A Conditional Gradient Approach for Nonparametric Estimation of Mixing Distributions. Management Science 66(8):3635-3656.

https://doi.org/10.1287/mnsc.2019.3373
