Synthetic and natural noise both break neural machine translation

Y Belinkov, Y Bisk - arXiv preprint arXiv:1711.02173, 2017 - arxiv.org


Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
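The abstract does not spell out the noise types or the word representation, so the following is only an illustrative Python sketch under stated assumptions. It assumes one simple kind of synthetic noise (swapping two adjacent inner characters) and uses plain character counts as a stand-in for a structure-invariant word representation. The names swap_noise and bag_of_chars are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def swap_noise(word, rng):
    """Swap two adjacent inner characters, keeping the first and last in place.

    Assumed noise model for illustration only.
    """
    if len(word) < 4:
        return word  # too short to have two inner characters to swap
    i = rng.randrange(1, len(word) - 2)  # positions i and i+1 are both inner
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def bag_of_chars(word):
    """Order-insensitive stand-in representation: character counts only."""
    return Counter(word)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = "translation"
noisy = swap_noise(clean, rng)

# The surface string changes, but the order-insensitive representation does not.
print(noisy != clean)
print(bag_of_chars(noisy) == bag_of_chars(clean))
```

Because the representation ignores character order, any reordering of a word's letters maps to the same counts, which is the sense in which such a representation is insensitive to scrambling noise. A trained model would use learned embeddings rather than raw counts.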


Cited by 765
