Optimization algorithm for artificial neural networks
This article is about the computer algorithm. For the biological process, see neural backpropagation.
Backpropagation can also refer to the way the result of a playout is propagated up the search tree in Monte Carlo tree search.
Strictly speaking, the term backpropagation refers only to the algorithm for computing the gradient, not to how the gradient is used; however, the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, for example by stochastic gradient descent.[14] In 1986, David E. Rumelhart et al. published an experimental analysis of the technique.[15] This contributed to the popularization of backpropagation and helped to initiate an active period of research in multilayer perceptrons.
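The distinction can be illustrated with a minimal sketch, assuming a toy network with a single tanh hidden unit, a linear output and a squared-error loss. The function names (`forward`, `backprop`, `sgd_step`, `loss`) and the learning rate are hypothetical choices made for this illustration and are not taken from any cited source. Here `backprop` only computes the gradient, while the separate `sgd_step` decides how that gradient is used.

```python
import math


def forward(params, x):
    """Toy network (assumed for illustration): y = w2 * tanh(w1 * x + b1) + b2."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    """Squared error 0.5 * (y - target)**2 for a single example."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Backpropagation in the strict sense: compute the gradient only.

    The chain rule is applied from the output back to the input.
    No parameter is modified here.
    """
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target              # dL/dy
    dw2 = dy * h                 # dL/dw2
    db2 = dy                     # dL/db2
    dh = dy * w2                 # dL/dh
    dz = dh * (1.0 - h * h)      # tanh'(z) = 1 - tanh(z)**2
    dw1 = dz * x                 # dL/dw1
    db1 = dz                     # dL/db1
    return [dw1, db1, dw2, db2]


def sgd_step(params, grads, lr=0.1):
    """The learning rule: one gradient descent update using a given gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


# One training step on a single example: gradient first, update second.
params = [0.5, -0.3, 0.8, 0.1]
grads = backprop(params, x=1.5, target=1.0)
new_params = sgd_step(params, grads)
```

Because the two steps are separate, `sgd_step` could be swapped for another optimizer without changing `backprop`, which is the sense in which the gradient computation and the learning algorithm are distinct.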
Linnainmaa, Seppo (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors (Master's thesis) (in Finnish). University of Helsinki. pp. 6–7.
Griewank, Andreas (2012). "Who Invented the Reverse Mode of Differentiation?". Optimization Stories. Documenta Mathematica, Extra Volume ISMP. pp. 389–400. S2CID 15568746.
Goodfellow, Bengio & Courville 2016, pp. 217–218, "The back-propagation algorithm described here is only one approach to automatic differentiation. It is a special case of a broader class of techniques called reverse mode accumulation."
Rosenblatt, Frank (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Cornell Aeronautical Laboratory, Report no. VG-1196-G-8. Spartan. pp. xiii (table of contents), 292 ("13.3 Back-Propagating Error Correction Procedures"), 301 ("Figure 39: Back-Propagating Error-Correction Experiments").
Bryson, Arthur E. (1962). "A gradient method for optimizing multi-stage allocation processes". Proceedings of the Harvard Univ. Symposium on digital computers and their applications, 3–6 April 1961. Cambridge: Harvard University Press. OCLC 498866871.
Goodfellow, Bengio & Courville 2016, p. 200, "The term back-propagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. Backpropagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient."