Scalable modified Kneser-Ney language model estimation

K Heafield, I Pouzyrevsky, JH Clark… Proceedings of the 51st Annual Meeting of the Association for …, 2013 (aclanthology.org)
Abstract
We present an efficient algorithm to estimate large modified Kneser-Ney models, including interpolation. Streaming and sorting enable the algorithm to scale to much larger models by using a fixed amount of RAM and a variable amount of disk. Using one machine with 140 GB of RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show an improvement of 0.8 BLEU points over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
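The abstract names streaming and sorting but not the estimation math. As a rough illustration only (not KenLM's code), the Python sketch below shows two standard ingredients of modified Kneser-Ney estimation. The first is lower-order "adjusted" counts: each lower-order n-gram is scored by how many distinct words precede it, computed here by sorting n-grams in suffix order so that all left extensions of a suffix sit next to each other and can be counted in one sequential pass. The second is the Chen and Goodman discounts D1, D2 and D3+ derived from counts of counts. The names `adjusted_counts` and `mkn_discounts` are hypothetical; everything is held in memory rather than streamed to disk, sentence boundaries are simplified, and normalization and interpolation are not shown.

```python
from collections import Counter
from itertools import groupby


def adjusted_counts(tokens, order):
    """Return {n: {ngram: adjusted count}} for n = 1..order (hypothetical helper).

    The highest order keeps raw counts. Each lower-order n-gram gets the
    number of distinct words seen immediately to its left (its continuation
    count). Simplification: n-grams occurring only at the very start of the
    token list are never a suffix and are omitted; the full method pads
    sentences with <s> and keeps raw counts for n-grams beginning with <s>.
    """
    raw = Counter(tuple(tokens[i:i + order])
                  for i in range(len(tokens) - order + 1))
    result = {order: dict(raw)}
    # Sort distinct n-grams by their reversed form ("suffix order"): all
    # n-grams sharing the suffix g[1:] become adjacent, so one sequential
    # pass counts their distinct left extensions.
    stream = sorted(raw, key=lambda g: g[::-1])
    for n in range(order - 1, 0, -1):
        counts = {}
        for suffix, group in groupby(stream, key=lambda g: g[1:]):
            counts[suffix] = sum(1 for _ in group)  # distinct left extensions
        result[n] = counts
        # Suffixes were emitted in suffix order already, so no re-sort is needed.
        stream = list(counts)
    return result


def mkn_discounts(counts):
    """Chen & Goodman modified Kneser-Ney discounts (D1, D2, D3+) for one order.

    n[k] is the number of n-grams whose (adjusted) count is exactly k.
    Assumes n[1], n[2] and n[3] are all nonzero.
    """
    n = Counter(c for c in counts.values() if c <= 4)
    y = n[1] / (n[1] + 2 * n[2])
    d1 = 1 - 2 * y * n[2] / n[1]
    d2 = 2 - 3 * y * n[3] / n[2]
    d3_plus = 3 - 4 * y * n[4] / n[3]
    return d1, d2, d3_plus


adj = adjusted_counts("x a b x a b y a b".split(), 3)
print(adj[2][("a", "b")])  # 2: "a b" follows only x and y, though it occurs 3 times
print(mkn_discounts({"a": 1, "b": 1, "c": 1, "d": 1,
                     "e": 2, "f": 2, "g": 3, "h": 4}))  # (0.5, 1.25, 1.0)
```

The sort-then-scan shape is the point of the sketch: once n-grams are in suffix order, each adjusted count depends only on a contiguous run of the stream, which is the kind of access pattern that lets an estimator work with a fixed RAM budget and spill sorted data to disk.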