The IBM alignment models are a sequence of increasingly complex models used in statistical machine translation to train a translation model and an alignment model, starting with lexical translation probabilities and moving on to reordering and word duplication.[1][2] They underpinned the majority of statistical machine translation systems for almost twenty years, starting in the early 1990s, until neural machine translation began to dominate. These models offer a principled probabilistic formulation and (mostly) tractable inference.[3]
The IBM alignment models were published in parts in 1988[4] and 1990,[5] and the entire series was published in 1993.[1] Every author of the 1993 paper subsequently moved to the hedge fund Renaissance Technologies.[6]
The original work on statistical machine translation at IBM proposed five models, and a Model 6 was proposed later. The sequence of the six models can be summarized as follows:
Model 1: lexical translation (a minimal training sketch follows this list)
Model 2: adds an absolute alignment model
Model 3: adds a fertility model
Model 4: adds a relative alignment model
Model 5: fixes the deficiency problem
Model 6: Model 4 combined with an HMM alignment model in a log-linear way
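Model 1, the simplest of the series, assumes each source word is generated independently from exactly one target word (or a special NULL token) and is typically trained with the expectation-maximization (EM) algorithm. The following is a minimal illustrative Python sketch of Model 1 EM training under those assumptions; the function name, corpus format, and toy data are hypothetical and not taken from the original papers.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical translation probabilities t(f | e).

    corpus: list of (source_tokens, target_tokens) pairs; in the IBM formulation
    the target side plays the role of the sentence that generates the source side.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    # Source-side vocabulary, used for uniform initialisation of t(f | e).
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (f, e) alignments
        total = defaultdict(float)  # expected counts of e being aligned to
        for fs, es in corpus:
            es = ["NULL"] + es  # allow source words to align to nothing
            for f in fs:
                # E-step: split one unit of count for f across all target words,
                # in proportion to the current t(f | e).
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f | e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

# Toy parallel corpus (German-English): EM learns that "das" translates "the"
# because it co-occurs with "the" in both sentence pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]
t = train_ibm_model1(corpus)
print(round(t[("das", "the")], 2), round(t[("haus", "the")], 2))  # "das" dominates t(. | "the")
```

The higher-numbered models keep these lexical translation probabilities and layer alignment, fertility, and distortion components on top of them.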
Brown, P.; Cocke, J.; Della Pietra, S.; Della Pietra, V.; Jelinek, F.; Mercer, R.; Roossin, P. (1988). "A Statistical Approach to Language Translation". Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics.
Brown, Peter F.; Cocke, John; Della Pietra, Stephen A.; Della Pietra, Vincent J.; Jelinek, Fredrick; Lafferty, John D.; Mercer, Robert L.; Roossin, Paul S. (1990). "A Statistical Approach to Machine Translation". Computational Linguistics. 16 (2): 79–85.