Sample complexity

The sample complexity of a machine learning algorithm is the number of training samples it needs in order to successfully learn a target function.

More precisely, the sample complexity is the number of training samples that must be supplied to the algorithm so that, with probability arbitrarily close to 1, the function it returns is within an arbitrarily small error of the best possible function.
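To make the definition concrete, the sketch below estimates a sample complexity by simulation in one toy setting chosen purely as an assumption: inputs drawn uniformly from [0, 1], a target threshold function that labels x as 1 when x >= 0.5, and a hypothetical learner that places its threshold at the smallest positively labelled training point. The best possible function here has zero error, and the learned threshold's error is its distance from 0.5, so the probability of error at least epsilon is exactly (1 - epsilon)^n. The simulated answer should therefore match that closed form up to Monte Carlo noise.

```python
import math
import random


def learn_threshold(sample):
    """Hypothetical learner for h_t(x) = 1[x >= t]: threshold at the smallest positive point."""
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else 1.0


def failure_rate(n, epsilon, trials, rng, target=0.5):
    """Fraction of runs in which the learned threshold has true error >= epsilon."""
    failures = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]            # n i.i.d. Uniform[0, 1] inputs
        sample = [(x, int(x >= target)) for x in xs]     # labels from the target threshold
        t_hat = learn_threshold(sample)
        # Under Uniform[0, 1] the error of h_t_hat against h_target is |t_hat - target|.
        if abs(t_hat - target) >= epsilon:
            failures += 1
    return failures / trials


def empirical_sample_complexity(epsilon, delta, trials=2000, seed=0, n_max=10_000):
    """Smallest n whose estimated failure probability drops below delta."""
    rng = random.Random(seed)
    for n in range(1, n_max + 1):
        if failure_rate(n, epsilon, trials, rng) < delta:
            return n
    raise RuntimeError("no n <= n_max met the (epsilon, delta) requirement")


def exact_sample_complexity(epsilon, delta):
    """Smallest n with (1 - epsilon)**n < delta, the exact failure probability in this toy setting."""
    return math.floor(math.log(delta) / math.log(1 - epsilon)) + 1


if __name__ == "__main__":
    print("exact:", exact_sample_complexity(0.1, 0.1))
    print("simulated:", empirical_sample_complexity(0.1, 0.1))
```

With epsilon = delta = 0.1 the closed form gives 22 samples. The simulated estimate lands near that value, with the exact figure depending on the seed and the number of trials.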

There are two variants of sample complexity:

  • The weak variant fixes a particular input-output distribution;
  • The strong variant takes the worst-case sample complexity over all input-output distributions.
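In symbols, using a common PAC-style formalization (the notation is an assumption, not taken from the text above): let A be the learning algorithm, ρ a distribution over input-output pairs, h_n = A(S_n) the function returned from n i.i.d. samples S_n ~ ρ^n, 𝓔(h) the expected error (risk) of h under ρ, and 𝓔*_𝓗 the smallest risk achievable within the class 𝓗 of functions under consideration. The two variants can then be written as follows, taking N to be infinite when no finite value satisfies the condition:

```latex
% Weak variant: the distribution \rho is fixed.
N(\rho, \varepsilon, \delta) = \min\Big\{ N \in \mathbb{N} : \forall n \ge N,\;
  \Pr_{S_n \sim \rho^n}\big[\, \mathcal{E}(h_n) - \mathcal{E}^*_{\mathcal{H}} \ge \varepsilon \,\big] < \delta \Big\}

% Strong variant: worst case over all input-output distributions.
N(\varepsilon, \delta) = \sup_{\rho}\, N(\rho, \varepsilon, \delta)
```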

The No free lunch theorem, discussed below, proves that, in general, the strong sample complexity is infinite, i.e. that there is no algorithm that can learn the globally optimal target function using a finite number of training samples.
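The averaging argument behind the No free lunch theorem can be checked by brute force on a tiny finite domain. The sketch below (the domain size, the training points and both learners are hypothetical choices for illustration) enumerates every binary target function on {0, ..., n-1}, trains a learner on the labels of a fixed set of points, and measures its accuracy on the remaining points. When all targets are weighted equally, an unseen label is independent of the training labels, so any learner averages exactly 1/2; hence for every learner some target leaves it no better than chance off the training set. This illustrates the idea only and is not a proof of the infinite strong sample complexity stated above.

```python
from fractions import Fraction
from itertools import product


def majority_learner(train):
    """Hypothetical learner: predict the most common training label everywhere."""
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess


def nearest_neighbour_learner(train):
    """Hypothetical learner: copy the label of the closest training point (ties go to the lower x)."""
    return lambda x: min(train, key=lambda pair: (abs(pair[0] - x), pair[0]))[1]


def average_unseen_accuracy(domain_size, train_points, learner):
    """Exact average accuracy on unseen points, over all 2**domain_size binary targets."""
    unseen = [x for x in range(domain_size) if x not in train_points]
    targets = list(product((0, 1), repeat=domain_size))  # every f: {0..n-1} -> {0, 1}
    total = Fraction(0)
    for labels in targets:
        train = [(x, labels[x]) for x in train_points]
        h = learner(train)
        correct = sum(h(x) == labels[x] for x in unseen)
        total += Fraction(correct, len(unseen))
    return total / len(targets)


if __name__ == "__main__":
    for learner in (majority_learner, nearest_neighbour_learner):
        print(learner.__name__, average_unseen_accuracy(8, [0, 2, 5], learner))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the average comes out as exactly one half rather than a floating-point approximation.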

However, if we are only interested in a particular class of target functions (e.g., only linear functions), then the sample complexity is finite, and it depends linearly on the VC dimension of the class of target functions.[1]
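As a rough illustration of the linear dependence on the VC dimension d, the sketch below evaluates the order-of-magnitude agnostic PAC bound m ≈ C (d + ln(1/δ)) / ε², where ε is the allowed excess error and 1 − δ the required confidence. The constant C is a placeholder set to 1 as an assumption; published upper and lower bounds of this form use different constants. For linear classifiers with a bias term (affine half-spaces) in R^k, d = k + 1.

```python
import math


def vc_sample_bound(vc_dim: int, epsilon: float, delta: float, c: float = 1.0) -> int:
    """Samples suggested by the order-of-magnitude bound c * (d + ln(1/delta)) / epsilon**2.

    `c` is a placeholder constant (assumption), not a value taken from any specific theorem.
    """
    if vc_dim < 0:
        raise ValueError("vc_dim must be non-negative")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(c * (vc_dim + math.log(1.0 / delta)) / epsilon ** 2)


def linear_classifier_vc_dim(num_features: int) -> int:
    """VC dimension of affine half-spaces (linear classifiers with a bias) in R^k, i.e. k + 1."""
    return num_features + 1


if __name__ == "__main__":
    for k in (1, 10, 100):
        d = linear_classifier_vc_dim(k)
        print(f"k={k:3d}  d={d:3d}  m={vc_sample_bound(d, epsilon=0.1, delta=0.05)}")
```

Each extra unit of VC dimension adds about 1/ε² samples (100 for ε = 0.1), so once d dominates ln(1/δ) the bound grows in proportion to d, while tightening ε costs quadratically.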

