A Consensus Algorithm for Linear Support Vector Machines
Published Online:30 Aug 2021https://doi.org/10.1287/mnsc.2021.4042
References
- (2011) Distributed delayed stochastic optimization. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 873–881.Google Scholar
- (2017) QSGD: Communication-efficient SGD via gradient quantization and encoding. von Luxburg U, Guyon IM, Guyon I, Bengio S, Wallach H, Fergus R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Red Hook, NY), 1709–1720.Google Scholar
- (2003) Estimating aggregates on a peer-to-peer network. Technical Report 2003-24, Stanford InfoLab, Stanford University, Stanford, CA.Google Scholar
- (2011) Scaling Up Machine Learning: Parallel and Distributed Approaches (Cambridge University Press, New York).Crossref, Google Scholar
- (1997) Parallel and Distributed Computation: Numerical Methods (Athena Scientific, Nashua NH).Google Scholar
- (2008) Average consensus problems in networks of agents with delayed communications. Automatica 44(8):1985–1995.Crossref, Google Scholar
- (2005) Convergence in multi-agent coordination, consensus and flocking. Proc. 44th IEEE Conf. Decision Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 2996–3000.Google Scholar
- (2019) Distributed optimization for deep learning with gossip exchange. Neurocomputing 330:287–296.Crossref, Google Scholar
- (1994) Deliberation scheduling for problem solving in time-constrained environments. Artificial Intelligence 67(2):245–285.Crossref, Google Scholar
- (2011) The tradeoffs of large scale learning. Sra S, Nowozin S, Wright SJ, eds. Optimization for Machine Learning (MIT Press, Cambridge, MA), 351–368.Google Scholar
- (2007) Large-Scale Kernel Machines (MIT Press, Cambridge, MA).Crossref, Google Scholar
- (2005) Gossip algorithms: Design, analysis and applications. Proc. IEEE 24th Annual Joint Conf. IEEE Comput. Comm. Soc. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1653–1664.Google Scholar
- (2006) Randomized gossip algorithms. IEEE/ACM Trans. Networking 14(SI):2508–2530.Google Scholar
- (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations Trends Machine Learn. 3(1):1–122.Crossref, Google Scholar
- (2019) Consensus based ADMM implementation discussion via email communication with the authors, June.Google Scholar
- (2001) Random forests. Machine Learn. 45(1):5–32.Crossref, Google Scholar
- (2005) Learning support vector machines from distributed data sources. Proc. 20th Natl. Conf. Artificial Intelligence, vol. 4 (Association for the Advancement of Artificial Intelligence, Menlo Park, CA), 1602–1603.Google Scholar
- (2007) Average consensus on networks with transmission noise or quantization. Proc. Eur. Control Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1852–1857.Google Scholar
- (2010) Detecting management fraud in public companies. Management Sci. 56(7):1146–1160.Link, Google Scholar
- (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):27. Software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm.Crossref, Google Scholar
- (2008) Coordinate descent method for large-scale L2-loss linear support vector machines. J. Machine Learn. Res. 9:1369–1398.Google Scholar
- (2007) Training a support vector machine in the primal. Neural Comput. 19(5):1155–1178.Crossref, Google Scholar
- (2002) SMOTE: Synthetic minority over-sampling technique. J. Artificial Intelligence Res. 16(1):321–357.Crossref, Google Scholar
- (2018) GossipGraD: Scalable deep learning using gossip communication based asynchronous gradient descent. Preprint, submitted March 15, https://arxiv.org/abs/1803.05880.Google Scholar
- (2006) Distributed data mining in peer-to-peer networks. IEEE Internet Comput. 10(4):18–26.Crossref, Google Scholar
- (1987) Epidemic algorithms for replicated database maintenance. Schneider FB, ed. Proc. ACM Sympos. Principles Distributed Comput. (Association for Computing Machinery, New York), 1–12.Google Scholar
- (2006) Geographic gossip: efficient aggregation for sensor networks. Proc. Fifth Internat. Conf. Inform. Processing Sensor Networks (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 69–76.Google Scholar
- (2010) Gossip algorithms for distributed signal processing. Proc. IEEE 98(11):1847–1864.Crossref, Google Scholar
- (2009) Efficient online and batch learning using forward backward splitting. J. Machine. Learn. Res. 10:2899–2934.Google Scholar
- (2012) Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Trans. Automat. Control 57(3):592–606.Crossref, Google Scholar
- (2018) Consensus-based modeling using distributed feature construction with ILP. Machine Learn. 107(5):825–858.Crossref, Google Scholar
- (2018) Waste collection in urban areas: A case study. INFORMS J. Appl. Anal. 48(4):307–322.Link, Google Scholar
- (2008) LIBLINEAR: A library for large linear classification. J. Machine Learn. Res. 9:1871–1874.Google Scholar
- (2006) Training a support vector machine based classifier in distributed sensor networks. Proc. 14th Eur. Signal Processing Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1–5.Google Scholar
- (2009) Optimal gossip algorithm for distributed consensus SVM training in wireless sensor networks. Proc. 16th Internat. Conf. Digital Signal Processing (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1–6.Google Scholar
- (2010) Consensus-based distributed support vector machines. J. Machine Learn. Res. 99:1663–1707.Google Scholar
- (1998) Large margin classification using the perceptron algorithm. Proc. 11th Annual Conf. Comput. Learn. Theory, 209–217.Google Scholar
- (2010) Regularization paths for generalized linear models via coordinate descent. J. Statist. Software 33(1):1–22.Crossref, Google Scholar
- (1997) Using on-line sensors in statistical process control. Management Sci. 43(7):1017–1028.Link, Google Scholar
- (2005) Parallel support vector machines: The cascade SVM. Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), 521–528.Google Scholar
- (2004) Result analysis of the NIPS 2003 feature selection challenge. Proc. 17th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 545–552.Google Scholar
- (2008) A parallel decomposition solver for SVM: Distributed dual ascend using Fenchel duality. IEEE Conf. Comput. Vision Pattern Recognition (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1–8.Google Scholar
- (2009) Gadget SVM: A gossip-based sub-gradient SVM solver. Proc. Internat. Conf. Machine Learn., Numerical Math. Machine Learn. Workshop (Association for Computing Machinery, New York).Google Scholar
- (2019) Scaling up sparse support vector machines by simultaneous feature and sample reduction. J. Machine Learn. Res. 20(121):1–39.Google Scholar
- (1987) Reasoning about beliefs and actions under computational resource constraints. Lemmer J, Levitt T, Kanal L, eds. Proc. Third Conf. Uncertainty Artificial Intelligence (AUAI Press, Arlington, VA), 429–447.Google Scholar
- (2003) Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans. Automat. Control 48(6):988–1001.Crossref, Google Scholar
- (1998) Text categorization with support vector machines: Learning with many relevant features. Proc. Eur. Conf. Machine Learn. (Springer, Berlin), 137–142.Google Scholar
- (1999) Making large-scale support vector machine learning practical. Advances in Kernel Methods (MIT Press, Cambridge, MA), 169–184.Google Scholar
- (2006) Training linear SVMs in linear time. Proc. 12th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 217–226.Google Scholar
- (2009) Sparse kernel SVMs via cutting-plane training. Machine Learn. 76(2–3):179–193.Crossref, Google Scholar
- , Chan Pe (2000) Advances in Distributed and Parallel Knowledge Discovery (AAAI Press, Menlo Park, CA).Google Scholar
- (2010) Minefleet: An overview of a widely adopted distributed vehicle performance data mining system. Proc. ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 37–46.Google Scholar
- (2005) A modified finite Newton method for fast solution of large scale linear SVMs. J. Machine Learn. Res. 6:341–361.Google Scholar
- (2006) Building support vector machines with reduced classifier complexity. J. Machine Learn. Res. 7:1493–1515.Google Scholar
- (2003) Gossip-based computation of aggregate information. Proc. 44th Annual IEEE Sympos. Foundations Comput. Sci. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 482–491.Google Scholar
- (2015) A distributed support vector machine learning over wireless sensor networks. IEEE Trans. Cybern. 45(11):2599–2611.Crossref, Google Scholar
- (1971) Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. 33(1):82–95.Crossref, Google Scholar
- (2019a) Decentralized stochastic optimization and gossip algorithms with compressed communication. Proc. 36th Internat. Conf. Machine Learn. (Proceedings of Machine Learning Research), 97:3478–3487.Google Scholar
- (2019b) Decentralized deep learning with arbitrary communication compression. Preprint, submitted July 22, https://arxiv.org/abs/1907.09356.Google Scholar
- (2018) Randomized distributed mean estimation: Accuracy vs. communication. Frontiers Appl. Math. Statist. 4:62.Crossref, Google Scholar
- (2012) Separable approximate optimization of support vector machines for distributed sensing. Flach PA, De Bie T, Cristianini N, eds. Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science, vol. 7524 (Springer, Berlin), 387–402.Google Scholar
- (1988) Approximate processing in real-time problem solving. AI Magazine 9(1):49–61.Google Scholar
- (2017) Utility-based link recommendation for online social networks. Management Sci. 63(6):1938–1952.Link, Google Scholar
- (2015) Data sparseness in linear SVM. Yang Q, Wooldridge M, eds. Proc. 24th Internat. Conf. Artificial Intelligence (AAAI Press, Menlo Park, CA), 3628–3634.Google Scholar
- (2014) Efficient mini-batch training for stochastic optimization. Proc. 20th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 661–670.Google Scholar
- (2018) Deep gradient compression: Reducing the communication bandwidth for distributed training. Proc. Fifth Internat. Conf. Learn. Representations (OpenReview.net, Vancouver).Google Scholar
- (2008) Distributed parallel support vector machines in strongly connected networks. IEEE Trans. Neural Networks 19(7):1167–1178.Crossref, Google Scholar
- (1996) Distributed Algorithms (Morgan Kaufmann Publishers, San Francisco).Google Scholar
- (2002) A finite Newton method for classification. Optim. Methods Software 17(5):913–929.Crossref, Google Scholar
- (2015) The internet of things: Mapping the value beyond the hype. Technical report, McKinsey Global Institute, Washington, DC.Google Scholar
- (2009) Large-scale support vector machines: Algorithms and theory. Technical report, University of California, San Diego.Google Scholar
- (1995) Knowledge-based anytime computation. Mellish CS, ed. Proc. 14th Internat. Joint Conf. Artificial Intelligence (Morgan Kaufmann Publishers, San Francisco), 775–781.Google Scholar
- (2007) Geographic gossip on geometric random graphs via affine combinations. Proc. ACM Sympos. Principles Distributed Comput. (Association for Computing Machinery, New York), 388–389.Google Scholar
- (2009a) Cooperative distributed multi-agent optimization. Palomar DP, Eldar YC, eds. Convex Optimization in Signal Processing and Communications (Cambridge University Press, Cambridge, UK), 340–384.Google Scholar
- (2009b) Distributed subgradient methods for multi-agent optimization. IEEE Trans. Automat. Control. 54(1):48–61.Crossref, Google Scholar
- (2010) Constrained consensus and optimization in multi-agent networks. IEEE Trans. Automatic Control 55(4):922–938.Crossref, Google Scholar
- (2013) Gossip learning with linear models on fully distributed data. Concurrency Comput. 25(4):556–571.Crossref, Google Scholar
- (1997a) An improved training algorithm for support vector machines. Neural Networks Signal Processing VII. Proc. 1997 IEEE Signal Processing Soc. Workshop (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 130–136.Google Scholar
- (1997b) Training support vector machines: An application to face detection. Proc. IEEE Comp. Soc. Conf. Computer Vision and Pattern Recognition (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 130–136.Google Scholar
- (2013) Data Science for Business (OReilly, New York).Google Scholar
- (2011) Mining of Massive Datasets (Cambridge University Press, New York).Crossref, Google Scholar
- (2010) Distributed stochastic subgradient projection algorithms for convex optimization. J. Optim. Theory Appl. 147(3):516–545.Crossref, Google Scholar
- (1994) The basic ideas in neural networks. Comm. ACM 37(3):87–92.Crossref, Google Scholar
- (2016) BPPGD: Budgeted parallel primal gradient descent kernel SVM on spark. IEEE First Internat. Conf. Data Sci. Cyberspace (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 74–79.Google Scholar
- (2016) Distributed semi-supervised support vector machines. Neural Networks 80(C):43–52.Crossref, Google Scholar
- (2009) Gossip algorithms. Foundations Trends Networking 3(1):1–125.Crossref, Google Scholar
- (2012) Boosting: Foundations and Algorithms (MIT Press, Cambridge, MA).Crossref, Google Scholar
- (2007) Pegasos: Primal estimated sub-gradient solver for SVM. Ghahramani Z, ed. Proc. 24th Annual Internat. Conf. Machine Learn (Association for Computing Machinery, New York), 807–814 .Google Scholar
- (2017) Regularization and feature selection for large dimensional data. Preprint, submitted December 6, https://arxiv.org/abs/1712.01975v3.Google Scholar
- (2019) Targeting prospective customers: Robustness of machine-learning methods to typical data challenges. Management Sci. 66(6):2495–2522.Link, Google Scholar
- (1982) Models of Bounded Rationality, vol. 2 (MIT Press, Cambridge, MA).Google Scholar
- (2017) Motivating process compliance through individual electronic monitoring: An empirical examination of hand hygiene in healthcare. Management Sci. 63(5):1563–1585.Link, Google Scholar
- (2003) Sparseness of support vector machines. J. Machine. Learn. Res. 4:1071–1105.Google Scholar
- (2016) Distributed Support Vector Machines: An Overview (Springer International Publishing, Berlin), 109–138.Google Scholar
- (1999) Handling concept drifts in incremental learning with support vector machines. Proc. ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 317–321.Google Scholar
- (2013) Mini-batch primal and dual methods for SVMs. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (PMLR), 28(3):1022–1030.Google Scholar
- (2006) Distributed Systems: Principles and Paradigms, 2nd ed. (Prentice-Hall, Upper Saddle River, NJ).Google Scholar
- (2018) Communication compression for decentralized training. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 7652–7662.Google Scholar
- (2012) Multiple-UAV coordination and communications in tactical edge networks. IEEE Comm. Magazine 50(10):48–55.Crossref, Google Scholar
- (1984) Problems in decentralized decision making and computation. Unpublished PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
- (1986) Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Automatic Control 31(9):803–812.Crossref, Google Scholar
- (2010) A scalable support vector machine for distributed classification in ad hoc sensor networks. Neurocomputing 74(1–3):394–400.Crossref, Google Scholar
- (2018) Gradient sparsification for communication-efficient distributed optimization. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 1299–1309.Google Scholar
- (1992) Control charts in the presence of data correlation. Management Sci. 38:1084–1105.Link, Google Scholar
- (2017) Terngrad: Ternary gradients to reduce communication in distributed deep learning. von Luxburg U, Guyon IM, Guyon I, Bengio S, Wallach H, Fergus R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Red Hook, NY), 1509–1519.Google Scholar
- (2016) RD-SVM: A resilient distributed support vector machine. Internat. Conf. Acoustics, Speech Signal Processing (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 2444–2448.Google Scholar
- (2012) Large linear classification when data cannot fit in memory. ACM Trans. Knowledge Discovery Data 5(4):23.Google Scholar
- (2017) IBM cognitive technology helps Aqualia to reduce costs and save resources in wastewater treatment. INFORMS J. Applied Anal. 47(5):411–424.Link, Google Scholar
- (2000) Large Scale Parallel Data Mining. Lecture Notes in Artificial Intelligence, vol. 1759 (Springer-Verlag, Berlin).Crossref, Google Scholar
- (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. Proc. 21st Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 116.Google Scholar
- (2003) Modified logistic regression: An approximation to SVM and its applications in large-scale text categorization. Proc. 20th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 888–895.Google Scholar
- (2019) The value of pop-up stores on retailing platforms: Evidence from a field experiment with Alibaba. Management Sci. 65(11):5143–5151.Google Scholar
- (2017) ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. Proc. 34th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 4035–4043.Google Scholar
- (1993) Operational rationality through compilation of anytime algorithms. Unpublished PhD thesis, Computer Science Division, University of California Berkeley, Berkeley.Google Scholar
- (1996) Using anytime algorithms in intelligent systems. AI Magazine 17(3):73–83.Google Scholar

