Hierarchical probabilistic neural network language model

F. Morin, Y. Bengio - International Workshop on Artificial Intelligence and Statistics, 2005 - proceedings.mlr.press
Abstract
In recent years, variants of a neural network architecture for statistical language modeling have been proposed and successfully applied, e.g., in the language modeling component of speech recognizers. The main advantage of these architectures is that they learn an embedding for words (or other symbols) in a continuous space that helps to smooth the language model and provide good generalization even when the number of training examples is insufficient. However, these models are extremely slow in comparison to the more commonly used n-gram models, both for training and recognition. As an alternative to an importance sampling method proposed to speed up training, we introduce a hierarchical decomposition of the conditional probabilities that yields a speed-up factor of about 200 both during training and recognition. The hierarchical decomposition is a binary hierarchical clustering constrained by the prior knowledge extracted from the WordNet semantic hierarchy.
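
The source of the speed-up is that the single V-way softmax over the vocabulary is replaced by a chain of binary decisions along a root-to-leaf path in a word tree, so scoring one word costs O(log V) logistic evaluations instead of O(V). Below is a minimal sketch of that decomposition, assuming a toy complete binary tree over the vocabulary rather than the WordNet-constrained clustering the paper actually uses; the class and attribute names (HierarchicalSoftmax, node_vecs) are illustrative and not from the authors' code.

```python
import numpy as np

class HierarchicalSoftmax:
    """Minimal sketch of a hierarchical (tree-structured) softmax.

    The paper derives the word tree from WordNet; here the vocabulary
    is laid out as an implicit complete binary tree (vocab_size must
    be a power of two for this toy layout), purely for illustration.
    """

    def __init__(self, vocab_size, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.vocab_size = vocab_size
        # One logistic-regression vector per internal node; a complete
        # binary tree with V leaves has V - 1 internal nodes.
        self.node_vecs = rng.normal(0.0, 0.1, size=(vocab_size - 1, dim))

    def _path(self, word_id):
        """Root-to-leaf path as (internal_node, bit) pairs.

        In the heap layout, internal nodes occupy indices 0..V-2 and
        leaves V-1..2V-2; bit is 0 for a left branch, 1 for a right.
        """
        node = word_id + self.vocab_size - 1  # the word's leaf index
        path = []
        while node > 0:
            path.append(((node - 1) // 2, (node - 1) % 2))
            node = (node - 1) // 2
        return reversed(path)

    def log_prob(self, word_id, hidden):
        """log P(word | context) as a sum of binary log-decisions."""
        logp = 0.0
        for node, bit in self._path(word_id):
            # Each internal node makes a logistic left/right decision
            # conditioned on the context representation `hidden`.
            p_right = 1.0 / (1.0 + np.exp(-self.node_vecs[node] @ hidden))
            logp += np.log(p_right if bit else 1.0 - p_right)
        return logp

if __name__ == "__main__":
    # Sanity check: the leaf probabilities form a distribution.
    hs = HierarchicalSoftmax(vocab_size=8, dim=16)
    h = np.random.default_rng(1).normal(size=16)
    print(sum(np.exp(hs.log_prob(w, h)) for w in range(8)))  # ~1.0
```

Because each context vector is scored against about log2(V) node vectors rather than all V output vectors, the per-word cost drops by roughly V / log2(V), consistent in order of magnitude with the factor of about 200 the abstract reports; the WordNet constraint serves to make the binary splits semantically coherent rather than arbitrary.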