Learning two layer rectified neural networks in polynomial time

A Bakshi, R Jayaram… - Conference on Learning …, 2019 - proceedings.mlr.press
Abstract
We consider the following fundamental problem in the study of neural networks: given input examples $x \in \mathbb{R}^d$ and their vector-valued labels, as defined by an underlying generative neural network, recover the weight matrices of this network. We consider two-layer networks, mapping $\mathbb{R}^d$ to $\mathbb{R}^m$, with a single hidden layer and $k$ non-linear activation units $f(\cdot)$, where $f(x) = \max\{x, 0\}$ is the ReLU activation function. Such a network is specified by two weight matrices, $\mathbf{U}^* \in \mathbb{R}^{m \times k}$ and $\mathbf{V}^* \in \mathbb{R}^{k \times d}$, such that the label of an example $x \in \mathbb{R}^d$ is given by $\mathbf{U}^* f(\mathbf{V}^* x)$, where $f$ is applied coordinate-wise. Given $n$ samples as a matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$ and the label $\mathbf{U}^* f(\mathbf{V}^* \mathbf{X})$ of the network on these samples, our goal is to recover the weight matrices $\mathbf{U}^*$ and $\mathbf{V}^*$. More generally, our labels may be corrupted by noise, and instead we observe $\mathbf{U}^* f(\mathbf{V}^* \mathbf{X}) + \mathbf{E}$, where $\mathbf{E}$ is some noise matrix. Even in this case, we may still be interested in recovering good approximations to the weight matrices $\mathbf{U}^*$ and $\mathbf{V}^*$. In this work, we develop algorithms and hardness results under varying assumptions on the input and noise. Although the problem is NP-hard even for $k = 2$, by assuming Gaussian marginals over the input $\mathbf{X}$ we are able to develop polynomial time algorithms for the approximate recovery of $\mathbf{U}^*$ and $\mathbf{V}^*$. Perhaps surprisingly, in the noiseless case our algorithms recover $\mathbf{U}^*$ and $\mathbf{V}^*$ \textit{exactly}, i.e., with no error, in \textit{strongly} polynomial time. To the best of our knowledge, this is the first algorithm to accomplish exact recovery for the ReLU activation function. For the noisy case, we give the first polynomial time algorithm that approximately recovers the weights in the presence of mean-zero noise $\mathbf{E}$. Our algorithms generalize to a larger class of \textit{rectified} activation functions, with $f(x) = 0$ when $x \leq 0$, and $f(x) > 0$ otherwise. Although our polynomial time results require $\mathbf{U}^*$ to have full column rank, we also give a fixed-parameter tractable algorithm (in $k$) when $\mathbf{U}^*$ does not have this property. Lastly, we give a fixed-parameter tractable algorithm for more arbitrary noise matrices $\mathbf{E}$, so long as they are independent of $\mathbf{X}$.
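To make the generative model concrete, the following is a minimal numpy sketch of how the observed labels $\mathbf{U}^* f(\mathbf{V}^* \mathbf{X}) + \mathbf{E}$ are produced; it illustrates only the problem setup, not the paper's recovery algorithm, and the specific dimensions, noise scale, and variable names (U_star, V_star, X, E) are illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions: input dim d, output dim m, k hidden ReLU units, n samples.
d, m, k, n = 20, 10, 5, 1000
rng = np.random.default_rng(0)

U_star = rng.standard_normal((m, k))  # U* in R^{m x k} (assumed full column rank)
V_star = rng.standard_normal((k, d))  # V* in R^{k x d}
X = rng.standard_normal((d, n))       # samples with Gaussian marginals, one per column

def relu(z):
    # f(x) = max{x, 0}, applied coordinate-wise
    return np.maximum(z, 0.0)

A_noiseless = U_star @ relu(V_star @ X)       # labels in the noiseless case
E = 0.1 * rng.standard_normal((m, n))         # mean-zero noise matrix (scale is arbitrary here)
A_noisy = U_star @ relu(V_star @ X) + E       # observed labels in the noisy case
```

The recovery question studied in the paper is the inverse of this forward computation: given only X and A_noiseless (or A_noisy), reconstruct U_star and V_star.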