TD-Gammon

TD-Gammon is a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's Thomas J. Watson Research Center. Its name reflects how it works: it is an artificial neural network trained by a form of temporal-difference learning, specifically TD(λ).
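
To give a feel for the TD(λ) update, here is a minimal sketch on a toy problem. It is not Tesauro's code and does not play backgammon: it assumes a five-state random walk that ends with reward 0 on the left and reward 1 on the right, and it uses one weight per state in place of a neural network. The function name td_lambda_random_walk and all of its parameters are hypothetical, chosen only for this illustration.

```python
import random


def td_lambda_random_walk(episodes=5000, alpha=0.01, lam=0.7, n_states=5, seed=0):
    """Estimate state values of a toy random walk with TD(lambda).

    Illustrative assumptions: no discounting, one weight per non-terminal
    state (a one-hot linear value function), reward 1 only when the walk
    exits on the right.
    """
    rng = random.Random(seed)
    w = [0.5] * n_states  # value estimates, one per non-terminal state

    for _ in range(episodes):
        trace = [0.0] * n_states  # eligibility traces, reset each episode
        s = n_states // 2         # every episode starts in the middle state

        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:                # fell off the left end
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:      # fell off the right end
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, w[s_next], False

            # Temporal-difference error: how much the next prediction
            # (or the final outcome) disagrees with the current one.
            delta = reward + v_next - w[s]

            # Decay all traces by lambda, then mark the current state.
            for i in range(n_states):
                trace[i] *= lam
            trace[s] += 1.0

            # Move every recently visited state's estimate toward the target.
            for i in range(n_states):
                w[i] += alpha * delta * trace[i]

            if done:
                break
            s = s_next

    return w


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

In this toy setting the true values are 1/6, 2/6, ..., 5/6 from left to right, so the learned estimates should rise toward the right-hand end. The eligibility trace is what lets a single prediction error adjust the estimates of several earlier states at once, with λ controlling how far back that credit reaches.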

The final version of TD-Gammon (2.1) was trained on 1.5 million games of self-play and reached a level of play only slightly below that of the top human backgammon players of the time. It explored strategies that humans had not pursued and led to advances in the theory of correct backgammon play.

In 1998, in a 100-game series, it lost to the world champion by a margin of only 8 points. Its unconventional assessment of some opening strategies has since been accepted and adopted by expert players.[1]

[1] Sammut, Claude; Webb, Geoffrey I., eds. (2010), "TD-Gammon", Encyclopedia of Machine Learning, Boston, MA: Springer US, pp. 955–956, doi:10.1007/978-0-387-30164-8_813, ISBN 978-0-387-30164-8, retrieved 2023-12-25