Hierarchical probabilistic neural network language model

F. Morin, Y. Bengio - International Workshop on Artificial Intelligence and Statistics, 2005 - proceedings.mlr.press
Abstract
In recent years, variants of a neural network architecture for statistical language modeling have been proposed and successfully applied, e.g., in the language modeling component of speech recognizers. The main advantage of these architectures is that they learn an embedding for words (or other symbols) in a continuous space that helps to smooth the language model and provide good generalization even when the number of training examples is insufficient. However, these models are extremely slow in comparison to the more commonly used n-gram models, both for training and recognition. As an alternative to an importance sampling method proposed to speed up training, we introduce a hierarchical decomposition of the conditional probabilities that yields a speed-up factor of about 200 both during training and recognition. The hierarchical decomposition is a binary hierarchical clustering constrained by the prior knowledge extracted from the WordNet semantic hierarchy.
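
The source of the speed-up is that the single V-way softmax over the vocabulary is replaced by a chain of binary decisions along a root-to-leaf path in a word tree, so scoring one word costs O(log V) logistic evaluations instead of O(V). Below is a minimal sketch of that decomposition, assuming a toy complete binary tree over the vocabulary rather than the WordNet-constrained clustering the paper actually uses; the class and attribute names (HierarchicalSoftmax, node_vecs) are illustrative and not from the authors' code.

```python
import numpy as np

class HierarchicalSoftmax:
    """Minimal sketch of a hierarchical (tree-structured) softmax.

    The paper derives the word tree from WordNet; here the vocabulary
    is laid out as an implicit complete binary tree (vocab_size must
    be a power of two for this toy layout), purely for illustration.
    """

    def __init__(self, vocab_size, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.vocab_size = vocab_size
        # One logistic-regression vector per internal node; a complete
        # binary tree with V leaves has V - 1 internal nodes.
        self.node_vecs = rng.normal(0.0, 0.1, size=(vocab_size - 1, dim))

    def _path(self, word_id):
        """Root-to-leaf path as (internal_node, bit) pairs.

        In the heap layout, internal nodes occupy indices 0..V-2 and
        leaves V-1..2V-2; bit is 0 for a left branch, 1 for a right.
        """
        node = word_id + self.vocab_size - 1  # the word's leaf index
        path = []
        while node > 0:
            path.append(((node - 1) // 2, (node - 1) % 2))
            node = (node - 1) // 2
        return reversed(path)

    def log_prob(self, word_id, hidden):
        """log P(word | context) as a sum of binary log-decisions."""
        logp = 0.0
        for node, bit in self._path(word_id):
            # Each internal node makes a logistic left/right decision
            # conditioned on the context representation `hidden`.
            p_right = 1.0 / (1.0 + np.exp(-self.node_vecs[node] @ hidden))
            logp += np.log(p_right if bit else 1.0 - p_right)
        return logp

if __name__ == "__main__":
    # Sanity check: the leaf probabilities form a distribution.
    hs = HierarchicalSoftmax(vocab_size=8, dim=16)
    h = np.random.default_rng(1).normal(size=16)
    print(sum(np.exp(hs.log_prob(w, h)) for w in range(8)))  # ~1.0
```

Because each context vector is scored against about log2(V) node vectors rather than all V output vectors, the per-word cost drops by roughly V / log2(V), consistent in order of magnitude with the factor of about 200 the abstract reports; the WordNet constraint serves to make the binary splits semantically coherent rather than arbitrary.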