No free lunch in search and optimization

In computational complexity and optimization, the no free lunch theorem is a result stating that, for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "no such thing as a free lunch": no method offers a "short cut". The result holds under the assumption that problems are drawn from a probability distribution over the whole class, such as a uniform one, rather than being known to share some structure. It does not apply when the problem has underlying structure (e.g., a differentiable objective function) that can be exploited more efficiently (e.g., by Newton's method in optimization) than by random search, or when the problem has a closed-form solution (e.g., the extrema of a quadratic polynomial) that can be determined without any search at all. Under such probabilistic assumptions, the outputs of all procedures solving a particular type of problem are statistically identical.

A colourful way of describing such a circumstance, introduced by David Wolpert and William G. Macready in connection with the problems of search^[1] and optimization,^[2] is to say that there is no free lunch. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference).^[3] Before Wolpert's article was published, Cullen Schaffer independently proved a restricted version of one of Wolpert's theorems and used it to critique the then-current state of machine learning research on the problem of induction.^[4]

In the "no free lunch" metaphor, each "restaurant" (problem-solving procedure) has a "menu" associating each "lunch plate" (problem) with a "price" (the performance of the procedure in solving the problem). The menus of restaurants are identical except in one regard – the prices are shuffled from one restaurant to the next. For an omnivore who is as likely to order each plate as any other, the average cost of lunch does not depend on the choice of restaurant. But a vegan who goes to lunch regularly with a carnivore who seeks economy might pay a high average cost for lunch. To methodically reduce the average cost, one must use advance knowledge of a) what one will order and b) what the order will cost at various restaurants. That is, improvement of performance in problem-solving hinges on using prior information to match procedures to problems.^[2]^[4]
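The restaurant metaphor can be made concrete with a small sketch. The plates, prices, and ordering habits below are invented purely for illustration; the only property taken from the text is that every menu lists the same prices in a different arrangement.

```python
# Hypothetical plates and menus: each restaurant uses the same four prices
# (4, 7, 9, 12), only assigned to different plates.
plates = ("salad", "soup", "steak", "pasta")
menus = {
    "A": {"salad": 4, "soup": 7, "steak": 9, "pasta": 12},
    "B": {"salad": 12, "soup": 9, "steak": 4, "pasta": 7},
    "C": {"salad": 9, "soup": 4, "steak": 12, "pasta": 7},
}

def average_cost(menu, order_probs):
    """Expected price of lunch given the probability of ordering each plate."""
    return sum(order_probs[plate] * menu[plate] for plate in plates)

# An omnivore is equally likely to order any plate.
omnivore = {plate: 1 / len(plates) for plate in plates}
# A diner assumed to order only salad or soup (a stand-in for the vegan).
vegan = {"salad": 0.5, "soup": 0.5, "steak": 0.0, "pasta": 0.0}

omnivore_costs = {name: average_cost(menu, omnivore) for name, menu in menus.items()}
vegan_costs = {name: average_cost(menu, vegan) for name, menu in menus.items()}

# Knowing both the ordering habits and the menus lets the diner pick a restaurant.
best_for_vegan = min(vegan_costs, key=vegan_costs.get)

print(omnivore_costs)  # identical for every restaurant
print(vegan_costs)     # differs between restaurants
print(best_for_vegan)
```

For the uniform orderer the choice of restaurant is irrelevant, whereas the diner with known preferences can lower the average cost only by consulting both pieces of prior information.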

In formal terms, there is no free lunch when the probability distribution on problem instances is such that all problem solvers have identically distributed results. In the case of search, a problem instance in this context is a particular objective function, and a result is a sequence of values obtained in evaluation of candidate solutions in the domain of the function. For typical interpretations of results, search is an optimization process. There is no free lunch in search if and only if the distribution on objective functions is invariant under permutation of the space of candidate solutions.^[5]^[6]^[7] This condition does not hold precisely in practice,^[6] but an "(almost) no free lunch" theorem suggests that it holds approximately.^[8]
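The formal statement can be checked by brute force on a tiny example. The sketch below is an illustration under stated assumptions, not a construction taken from the cited papers: the three-point domain, the binary value set, the three searchers, and the "best value after k evaluations" measure are all chosen here for convenience. Each searcher is deterministic and never revisits a point.

```python
from collections import Counter
from itertools import product

X = (0, 1, 2)   # candidate solutions (toy domain, assumed for illustration)
Y = (0, 1)      # possible objective values

def fixed_order(order):
    """Build a non-adaptive searcher that evaluates points in a fixed order."""
    def search(f):
        return tuple(f[x] for x in order)
    return search

def adaptive(f):
    """Adaptive searcher: the second point depends on the first value seen."""
    first = 0
    second = 2 if f[first] == 1 else 1
    third = 3 - first - second  # the one point not yet visited
    return (f[first], f[second], f[third])

def trace_counts(search, functions):
    """Count how often each sequence of observed values occurs."""
    return Counter(search(f) for f in functions)

def mean_best_after(search, functions, k):
    """Average of the largest value seen in the first k evaluations."""
    return sum(max(search(f)[:k]) for f in functions) / len(functions)

# Uniform distribution: every function X -> Y, each listed exactly once.
all_functions = [dict(zip(X, values)) for values in product(Y, repeat=len(X))]

searchers = {
    "ascending": fixed_order((0, 1, 2)),
    "descending": fixed_order((2, 1, 0)),
    "adaptive": adaptive,
}

uniform_traces = {n: trace_counts(s, all_functions) for n, s in searchers.items()}
uniform_scores = {n: mean_best_after(s, all_functions, 2) for n, s in searchers.items()}

# A distribution that is not permutation-invariant: all mass on one function.
skewed = [{0: 0, 1: 0, 2: 1}]
skewed_scores = {n: mean_best_after(s, skewed, 2) for n, s in searchers.items()}

print(uniform_scores)
print(skewed_scores)
```

Under the uniform distribution every searcher produces the same multiset of value sequences, so any performance measure computed from those sequences has the same average. Once the distribution favours particular functions, the searchers can be told apart.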

[1] Wolpert, D. H.; Macready, W. G. (1995). "No Free Lunch Theorems for Search". Technical Report SFI-TR-95-02-010. Santa Fe Institute. S2CID 12890367.
[2] Wolpert, D. H.; Macready, W. G. (1997). "No Free Lunch Theorems for Optimization". IEEE Transactions on Evolutionary Computation. 1: 67–82. CiteSeerX 10.1.1.138.6606. doi:10.1109/4235.585893. S2CID 5553697.
[3] Wolpert, David (1996). "The Lack of A Priori Distinctions between Learning Algorithms". Neural Computation. Vol. 8. pp. 1341–1390. doi:10.1162/neco.1996.8.7.1341. S2CID 207609360.
[4] Schaffer, Cullen (1994). "A conservation law for generalization performance". In Willian, H.; Cohen, W. (eds.). International Conference on Machine Learning. San Francisco: Morgan Kaufmann. pp. 259–265.
[5] Streeter, M. (2003). "Two Broad Classes of Functions for Which a No Free Lunch Result Does Not Hold". Genetic and Evolutionary Computation – GECCO 2003. pp. 1418–1430.
[6] Igel, C.; Toussaint, M. (2004). "A No-Free-Lunch Theorem for Non-Uniform Distributions of Target Functions". Journal of Mathematical Modelling and Algorithms. 3: 313–322.
[7] English, T. (2004). "No More Lunch: Analysis of Sequential Search". Proceedings of the 2004 IEEE Congress on Evolutionary Computation. pp. 227–234.
[8] Droste, S.; Jansen, T.; Wegener, I. (2002). "Optimization with randomized search heuristics: the (A)NFL theorem, realistic scenarios, and difficult functions". Theoretical Computer Science. 287 (1): 131–144.
