Additive smoothing

In statistics, additive smoothing, also called Laplace smoothing[1] or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts from a -dimensional multinomial distribution with trials, a "smoothed" version of the counts gives the estimator

where the smoothed count , and the "pseudocount" α > 0 is a smoothing parameter, with α = 0 corresponding to no smoothing (this parameter is explained in § Pseudocount below). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical probability (relative frequency) and the uniform probability Invoking Laplace's rule of succession, some authors have argued[citation needed] that α should be 1 (in which case the term add-one smoothing[2][3] is also used)[further explanation needed], though in practice a smaller value is typically chosen.

From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.

  1. ^ C. D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.
  2. ^ Jurafsky, Daniel; Martin, James H. (June 2008). Speech and Language Processing (2nd ed.). Prentice Hall. p. 132. ISBN 978-0-13-187321-6.
  3. ^ Russell, Stuart; Norvig, Peter (2010). Artificial Intelligence: A Modern Approach (2nd ed.). Pearson Education, Inc. p. 863.

© MMXXIII Rich X Search. We shall prevail. All rights reserved. Rich X Search