Fix your classifier: the marginal value of training the last weight layer

E Hoffer, I Hubara, D Soudry - arXiv preprint arXiv:1801.04540, 2018 - arxiv.org

Neural networks are commonly used as models for classification for a wide variety of tasks. Typically, a learned affine transformation is placed at the end of such models, yielding a per-class value used for classification. This classifier can have a vast number of parameters, which grows linearly with the number of possible classes, thus requiring increasingly more resources. In this work we argue that this classifier can be fixed, up to a global scale constant, with little or no loss of accuracy for most tasks, allowing memory and computational benefits. Moreover, we show that by initializing the classifier with a Hadamard matrix we can speed up inference as well. We discuss the implications for current understanding of neural network models.
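The idea in the abstract can be illustrated with a minimal sketch: the final layer's weights are taken from a Hadamard matrix and never trained, and only a single global scale is left free. The function names (`hadamard`, `fixed_classifier_logits`), the Sylvester construction, and the L2 normalization of the feature vector are assumptions made for this illustration; the abstract itself only states that the classifier is fixed up to a global scale constant and initialized with a Hadamard matrix.

```python
import math


def hadamard(n):
    """Build an n x n Hadamard matrix (entries +1/-1) by the Sylvester construction.

    Assumption for this sketch: n is a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size and keeps rows mutually orthogonal.
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def fixed_classifier_logits(features, num_classes, scale=1.0):
    """Per-class scores from a fixed (untrained) Hadamard classifier.

    `scale` stands in for the single global scale constant; the feature
    normalization is an illustrative assumption, not taken from the abstract.
    """
    dim = len(features)
    if num_classes > dim:
        raise ValueError("this sketch needs num_classes <= feature dimension")
    weights = hadamard(dim)[:num_classes]  # one fixed +1/-1 row per class
    norm = math.sqrt(sum(v * v for v in features)) or 1.0
    unit = [v / norm for v in features]
    # Entries are +1/-1, so each dot product needs only additions and subtractions.
    return [scale * sum(w * u for w, u in zip(row, unit)) for row in weights]


if __name__ == "__main__":
    logits = fixed_classifier_logits([1.0, -1.0, 1.0, -1.0], num_classes=3, scale=2.0)
    predicted = max(range(len(logits)), key=logits.__getitem__)
    print(logits, predicted)
```

In this sketch the classifier stores no per-class parameters at all: the weight rows can be regenerated on demand, which is one way to read the memory benefit the abstract claims.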

Cited by 104
