A Conditional Gradient Approach for Nonparametric Estimation of Mixing Distributions

Published Online: https://doi.org/10.1287/mnsc.2019.3373

References

  • Bach F (2013) Learning with submodular functions: A convex optimization perspective. Foundations Trends Machine Learn. 6(2–3):145–373.
  • Berry S, Levinsohn J, Pakes A (1995) Automobile prices in market equilibrium. Econometrica 63(4):841–890.
  • Bhat CR (1997) An endogenous segmentation mode choice model with an application to intercity travel. Transportation Sci. 31(1):34–48.
  • Bohning D, Schlattmann P, Lindsay B (1992) Computer-assisted analysis of mixtures (C.A.MAN): Statistical algorithms. Biometrics 48(1):283–303.
  • Bronnenberg BJ, Kruger MW, Mela CF (2008) Database paper—the IRI marketing data set. Marketing Sci. 27(4):745–748.
  • Clarkson KL (2010) Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Trans. Algorithms 6(4):63.
  • Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the web. Proc. 10th Internat. Conf. World Wide Web (ACM, New York), 613–622.
  • Feng L, Dicker LH (2018) Approximate nonparametric maximum likelihood for mixture models: A convex optimization approach to fitting arbitrary multivariate mixing distributions. Comput. Statist. Data Anal. 122:80–91.
  • Fox JT, il Kim K, Ryan SP, Bajari P (2011) A simple estimator for the distribution of random coefficients. Quant. Econom. 2(3):381–418.
  • Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.
  • Garber D, Hazan E (2015) Faster rates for the Frank-Wolfe method over strongly-convex sets. Proc. 32nd Internat. Conf. Machine Learn. (ICML-15) (ACM, New York), 541–549.
  • Guélat J, Marcotte P (1986) Some comments on Wolfe’s ‘away step’. Math. Programming 35(1):110–119.
  • Harchaoui Z, Juditsky A, Nemirovski A (2015) Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Programming 152(1–2):75–112.
  • Hauser JR (2014) Consideration-set heuristics. J. Bus. Res. 67(8):1688–1699.
  • Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Ann. Statist. 32(1):384–406.
  • Jagabathula S, Rusmevichientong P (2016) A nonparametric joint assortment and price choice model. Management Sci. 63(9):3128–3145.
  • Jagabathula S, Rusmevichientong P (2019) The limit of rationality in choice modeling: Formulation, computation, and implications. Management Sci. 65(5):2196–2215.
  • Jaggi M (2011) Sparse convex optimization methods for machine learning. Unpublished PhD thesis, ETH Zürich, Zurich.
  • Jaggi M (2013) Revisiting Frank-Wolfe: Projection-free sparse convex optimization. Proc. 30th Internat. Conf. Machine Learn. (ICML-13) (ACM, New York), 427–435.
  • Jaggi M, Sulovský M (2010) A simple algorithm for nuclear norm regularized problems. Proc. 27th Internat. Conf. Machine Learn. (ICML-10) (ACM, New York), 471–478.
  • James J (2017) MM algorithm for general mixed multinomial logit models. J. Appl. Econometrics 32(4):841–857.
  • Jiang W, Zhang CH (2009) General maximum likelihood empirical Bayes estimation of normal means. Ann. Statist. 37(4):1647–1684.
  • Joulin A, Tang K, Fei-Fei L (2014) Efficient image and video co-localization with Frank-Wolfe algorithm. Fleet D, Pajdla T, Schiele B, Tuytelaars T, eds. Computer Vision–ECCV 2014, Lecture Notes in Computer Science, vol. 8694 (Springer, Cham, Switzerland), 253–268.
  • Kamishima T, Kazawa H, Akaho S (2005) Supervised ordering—an empirical survey. 5th IEEE Internat. Conf. Data Mining (IEEE, Piscataway, NJ), 673–676.
  • Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27(4):887–906.
  • Krishnan RG, Lacoste-Julien S, Sontag D (2015) Barrier Frank-Wolfe for marginal inference. Adv. Neural Inform. Processing Systems 28:532–540.
  • Lacoste-Julien S, Jaggi M (2015) On the global linear convergence of Frank-Wolfe optimization variants. Adv. Neural Inform. Processing Systems 28:496–504.
  • Laird N (1978) Nonparametric maximum likelihood estimation of a mixing distribution. J. Amer. Statist. Assoc. 73(364):805–811.
  • Lapersonne E, Laurent G, Le Goff JJ (1995) Consideration sets of size one: An empirical investigation of automobile purchases. Internat. J. Res. Marketing 12(1):55–66.
  • Lindsay BG (1983) The geometry of mixture likelihoods: A general theory. Ann. Statist. 11(1):86–94.
  • Lindsay BG (1995) Mixture Models: Theory, Geometry and Applications, NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5 (Institute of Mathematical Statistics, Hayward, CA).
  • Liu TY (2009) Learning to rank for information retrieval. Foundations Trends Inform. Retrieval 3(3):225–331.
  • McFadden D, Train K (2000) Mixed MNL models for discrete response. J. Appl. Econometrics 15(5):447–470.
  • McLachlan G, Peel D (2000) Finite Mixture Models (John Wiley & Sons, New York).
  • Méndez-Díaz I, Miranda-Bront JJ, Vulcano G, Zabala P (2014) A branch-and-cut algorithm for the latent-class logit assortment problem. Discrete Appl. Math. 164(1):246–263.
  • Nocedal J, Wright SJ (2006) Numerical Optimization, 2nd ed. (Springer, New York).
  • Petrin A, Train K (2010) A control function approach to endogeneity in consumer choice models. J. Marketing Res. 47(1):3–13.
  • Prechelt L (2012) Early stopping—but when? Montavon G, Orr GB, Müller KR, eds. Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 7700 (Springer, Berlin), 53–67.
  • Robbins H (1950) A generalization of the method of maximum likelihood-estimating a mixing distribution. Ann. Math. Statist. 21(2):314–315.
  • Shalev-Shwartz S, Srebro N, Zhang T (2010) Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 20(6):2807–2832.
  • Train KE (2008) EM algorithms for nonparametric estimation of mixing distributions. J. Choice Model. 1(1):40–69.
  • Train KE (2009) Discrete Choice Methods with Simulation, 2nd ed. (Cambridge University Press, Cambridge, UK).
  • Wang YX, Sadhanala V, Dai W, Neiswanger W, Sra S, Xing E (2016) Parallel and distributed block-coordinate Frank-Wolfe algorithms. Proc. 33rd Internat. Conf. Machine Learn. (ICML-16) (ACM, New York), 1548–1557.
  • Yao Y, Rosasco L, Caponnetto A (2007) On early stopping in gradient descent learning. Constructive Approximation 26(2):289–315.
  • Zangwill WI (1969) Nonlinear Programming: A Unified Approach (Prentice-Hall, Englewood Cliffs, NJ).
  • Zhang T (2003) Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Inform. Theory 49(3):682–691.