In machine learning, backpropagation is a gradient computation method commonly used to train a neural network by computing the gradients needed for its parameter updates.
It is an efficient application of the chain rule to neural networks. Backpropagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through dynamic programming.[1][2][3]
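The layer-by-layer application of the chain rule can be made concrete with a small sketch. The following Python code is an illustrative example only, assuming a two-layer network with a tanh hidden layer, a linear output layer, and a mean-squared-error loss; the shapes, variable names, and activation choice are assumptions made for the sketch, not part of any standard definition.

```python
import numpy as np

def forward_backward(x, y, W1, W2):
    # Forward pass: cache the intermediate values needed by the backward pass.
    z1 = W1 @ x                 # hidden-layer pre-activation
    h = np.tanh(z1)             # hidden-layer activation
    y_hat = W2 @ h              # linear output layer
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass: apply the chain rule one layer at a time, starting
    # from the loss and reusing each intermediate gradient instead of
    # recomputing it (the source of backpropagation's efficiency).
    d_yhat = y_hat - y                        # dL/d(y_hat)
    dW2 = np.outer(d_yhat, h)                 # dL/dW2
    d_h = W2.T @ d_yhat                       # dL/dh, propagated backward
    d_z1 = d_h * (1.0 - np.tanh(z1) ** 2)     # chain rule through tanh
    dW1 = np.outer(d_z1, x)                   # dL/dW1
    return loss, dW1, dW2

# Illustrative usage on a single input–output example with random data.
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
loss, dW1, dW2 = forward_backward(x, y, W1, W2)
```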
Strictly speaking, the term backpropagation refers only to the algorithm for efficiently computing the gradient, not to how the gradient is used; but the term is often used loosely to refer to the entire learning algorithm – including how the gradient is used, such as by stochastic gradient descent, or as an intermediate step in a more complicated optimizer such as Adaptive Moment Estimation (Adam).[4] Convergence to local minima, exploding gradients, vanishing gradients, and weak control of the learning rate are the main disadvantages of these optimization algorithms. Hessian and quasi-Hessian optimizers address only the local-minimum convergence problem, while making backpropagation run longer. These problems led researchers to develop hybrid[5] and fractional[6] optimization algorithms.
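To illustrate the distinction between computing the gradient and using it, the fragment below continues the sketch above with a single plain stochastic gradient descent step; the learning rate of 0.01 is an arbitrary assumption for the example.

```python
# How an optimizer might consume the gradients produced by backpropagation:
# one stochastic gradient descent step on the weights from the sketch above.
learning_rate = 0.01          # hypothetical step size
W1 -= learning_rate * dW1     # update hidden-layer weights
W2 -= learning_rate * dW2     # update output-layer weights
```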
Backpropagation had multiple discoveries and partial discoveries, with a tangled history and terminology. See the history section for details. Some other names for the technique include "reverse mode of automatic differentiation" or "reverse accumulation".[7]