Beyond the neural-net application, this post may be interesting to aficionados of floating-point weirdness.
The gist of it is: Neural networks usually have an explicit nonlinearity in them, like a tanh function, to allow them to learn nonlinear functions. But when implemented on IEEE floating-point hardware, you can omit the explicit nonlinearity and still learn nonlinear functions with deep “linear” networks. That’s because IEEE floating point has a highly nonlinear representation when very near to zero, which an appropriate training algorithm can exploit.
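As one small illustration of that near-zero nonlinearity (my own sketch, not necessarily the exact construction used for the networks): in float32, numbers below roughly 1.2e-38 are stored as subnormals with reduced precision, so a multiply-then-divide round trip that is essentially exact for ordinary values can visibly change a subnormal one.

```python
import numpy as np

three = np.float32(3.0)

# A value in the normal float32 range: dividing by 3 and multiplying
# back rounds so mildly that we recover the input exactly.
x_normal = np.float32(1.0)
roundtrip_normal = (x_normal / three) * three

# A subnormal float32 value (below ~1.2e-38): the same round trip
# loses low-order bits, because subnormals have fewer significand
# bits available, so scaling is no longer (approximately) linear.
x_subnormal = np.float32(1e-40)
roundtrip_subnormal = (x_subnormal / three) * three

print(roundtrip_normal == x_normal)        # True
print(roundtrip_subnormal == x_subnormal)  # False
```

The same arithmetic expression behaves differently depending on where its inputs sit on the number line, which is exactly the kind of input-dependent behavior a training algorithm can shape into a nonlinearity.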