A New Likelihood Ratio Method for Training Artificial Neural Networks

Published Online: https://doi.org/10.1287/ijoc.2021.1088

References

  • Azulay A, Weiss Y (2018) Why do deep convolutional networks generalize so poorly to small image transformations? Preprint, submitted May 30, http://arxiv.org/abs/1805.12177.
  • Byrd RH, Hansen SL, Nocedal J, Singer Y (2016) A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2):1008–1031.
  • Cao X, Gong NL (2017) Mitigating evasion attacks to deep neural networks via region-based classification. Proc. 33rd Annual Comput. Security Appl. Conf., 278–287.
  • Carlini N, Wagner D (2017) Toward evaluating the robustness of neural networks. Proc. 2017 IEEE Symp. Security Privacy, 39–57.
  • Carlini N, Mishra P, Vaidya T, Zhang Y, Sherr M, Shields C, Wagner D, Zhou W (2016) Hidden voice commands. USENIX Security Symp., 513–530.
  • Choromanski K, Rowland M, Sindhwani V, Turner RE, Weller A (2018) Structured evolution with compact architectures for scalable policy optimization. Preprint, submitted April 6, https://arxiv.org/abs/1804.02395.
  • Cui Z, Fu MC, Hu J-Q, Liu Y, Peng Y, Zhu L (2020) On the variance of single-run unbiased stochastic derivative estimators. INFORMS J. Comput. 32(2):390–407.
  • Elsayed GF, Shankar S, Cheung B, Papernot N, Kurakin A, Goodfellow IJ, Sohl-Dickstein J (2018) Adversarial examples that fool both human and computer vision. Preprint, submitted February 22, https://arxiv.org/abs/1802.08195.
  • Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D (2018) Robust physical-world attacks on deep learning visual classification. Preprint, revised April 10, https://arxiv.org/abs/1707.08945.
  • Fu MC (2015) Stochastic gradient estimation. Fu MC, ed. Handbook of Simulation Optimization (Springer, New York), 105–147.
  • Fu MC, Hong LJ, Hu J-Q (2009) Conditional Monte Carlo estimation of quantile sensitivities. Management Sci. 55(12):2019–2027.
  • Glynn PW (1990) Likelihood ratio gradient estimation for stochastic systems. Comm. ACM 33(10):75–84.
  • Goodfellow I, Bengio Y, Courville A (2016) Deep Learning (MIT Press, Cambridge, MA).
  • Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. Preprint, submitted December 20, https://arxiv.org/abs/1412.6572.
  • Haykin S (1998) Neural Networks: A Comprehensive Foundation (Prentice Hall, Hoboken, NJ).
  • Haykin SS (2009) Neural Networks and Learning Machines, vol. 3 (Pearson, Upper Saddle River, NJ).
  • Heidelberger P, Cao X-R, Zazanis MA, Suri R (1988) Convergence properties of infinitesimal perturbation analysis estimates. Management Sci. 34(11):1281–1302.
  • Heidergott B, Leahu H (2010) Weak differentiability of product measures. Math. Oper. Res. 35(1):27–51.
  • Heidergott B, Volk-Makarewicz W (2016) A measure-valued differentiation approach to sensitivity analysis of quantiles. Math. Oper. Res. 41(1):293–317.
  • Hendrycks D, Dietterich TG (2018) Benchmarking neural network robustness to common corruptions and perturbations. Preprint, submitted July 4, http://arxiv.org/abs/1807.01697.
  • Ho Y-C, Cao X-R (1991) Discrete Event Dynamic Systems and Perturbation Analysis (Kluwer Academic Publishers, Boston).
  • Ho Y-C, Cao X, Cassandras C (1983) Infinitesimal and finite perturbation analysis for queueing networks. Automatica 19(4):439–445.
  • Hong LJ (2009) Estimating quantile sensitivities. Oper. Res. 57(1):118–130.
  • Jin J, Dundar A, Culurciello E (2015) Robust convolutional neural networks under adversarial noise. Preprint, submitted November 19, https://arxiv.org/abs/1511.06306.
  • Jordan MI (2018) Dynamical, symplectic and stochastic perspectives on gradient-based optimization. Proc. Internat. Congress of Mathematicians: Rio de Janeiro 2018.
  • LeCun YA, Bottou L, Orr GB, Müller K-R (2012) Efficient backprop. Montavon G, Orr GB, Müller K-R, eds. Neural Networks: Tricks of the Trade (Springer, New York), 9–48.
  • L’Ecuyer P (1990) A unified view of the IPA, SF, and LR gradient estimation techniques. Management Sci. 36(11):1364–1383.
  • Lehman J, Chen J, Clune J, Stanley KO (2018) ES is more than just a traditional finite-difference approximator. Proc. Genetic Evolutionary Comput. Conf. (ACM), 450–457.
  • Liao F, Liang M, Dong Y, Pang T, Zhu J, Hu X (2017) Defense against adversarial attacks using high-level representation guided denoiser. Preprint, submitted December 8, https://arxiv.org/abs/1712.02976.
  • Liu G, Hong LJ (2011) Kernel estimation of the Greeks for options with discontinuous payoffs. Oper. Res. 59(1):96–108.
  • Mohamed S, Rosca M, Figurnov M, Mnih A (2020) Monte Carlo gradient estimation in machine learning. J. Machine Learn. Res. 21(132):1–62.
  • Nesterov Y, Spokoiny V (2017) Random gradient-free minimization of convex functions. Foundations Comput. Math. 17(2):527–566.
  • Norgaard M, Ravn O, Poulsen NK, Hansen LK (2000) Neural Networks for Modelling and Control of Dynamic Systems: A Practitioner’s Handbook. Advanced Textbooks in Control and Signal Processing (Springer, Berlin).
  • Papernot N, Jha S, Fredrikson M, Celik ZB, McDaniel P, Swami A (2016) The limitations of deep learning in adversarial settings. 2016 IEEE Eur. Symp. Security Privacy, 372–387.
  • Pascanu R, Mikolov T, Bengio Y (2013) On the difficulty of training recurrent neural networks. Internat. Conf. Machine Learn., 1310–1318.
  • Peng Y, Fu MC, Hu J-Q, Heidergott B (2018) A new unbiased stochastic derivative estimator for discontinuous sample performances with structural parameters. Oper. Res. 66(2):487–499.
  • Pflug GC (1996) Optimization of Stochastic Models (Kluwer Academic, Boston).
  • Qian N (1999) On the momentum term in gradient descent learning algorithms. Neural Networks 12(1):145–151.
  • Recht B, Roelofs R, Schmidt L, Shankar V (2018) Do CIFAR-10 classifiers generalize to CIFAR-10? Preprint, submitted June 1, http://arxiv.org/abs/1806.00451.
  • Reiman MI, Weiss A (1989) Sensitivity analysis for simulations via likelihood ratios. Oper. Res. 37(5):830–844.
  • Rubinstein RY (1986) The score function approach for sensitivity analysis of computer simulation models. Math. Comput. Simulation 28(5):351–379.
  • Rubinstein RY (1992) Sensitivity analysis of discrete event systems by the “push out” method. Ann. Oper. Res. 39(1):229–250.
  • Rubinstein RY, Shapiro A (1993) Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method (Wiley, New York).
  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536.
  • Sabour S, Faghri F, Cao Y, Fleet DJ (2015) Adversarial manipulation of deep representations. Preprint, submitted November 16, https://arxiv.org/abs/1511.05122.
  • Schalkoff RJ (1997) Artificial Neural Networks, vol. 1 (McGraw-Hill, New York).
  • Sharda R (1994) Neural networks for the MS/OR analyst: An application bibliography. Interfaces 24(2):116–130.
  • Szegedy C, Sutskever I, Bruna J, Erhan D, Goodfellow I, Zaremba W, Fergus R (2014) Intriguing properties of neural networks. Preprint, submitted December 21, https://arxiv.org/abs/1312.6199.
  • Tam KY, Kiang MY (1992) Managerial applications of neural networks: The case of bank failure predictions. Management Sci. 38(7):926–947.
  • Wang Y, Fu MC, Marcus SI (2012) A new stochastic derivative estimator for discontinuous payoff functions with application to financial derivatives. Oper. Res. 60(2):447–460.
  • Xiao L, Peng Y, Hong LJ, Ke Z, Yang S (2020) Training artificial neural networks by generalized likelihood ratio method: An effective way to improve robustness. Proc. IEEE Trans. Automation Sci. Engrg (Institute of Electrical and Electronics Engineers (IEEE), New York), 1343–1348.
  • Zantedeschi V, Nicolae M-I, Rawat A (2017) Efficient defenses against adversarial attacks. Proc. 10th ACM Workshop Artificial Intelligence Security (Association for Computing Machinery, New York), 39–49.