Beyond the neural-net application, this post may be interesting to aficionados of floating-point weirdness.

The gist of it is: neural networks usually include an explicit nonlinearity, such as a tanh function, so that they can learn nonlinear functions. But when implemented on IEEE floating-point hardware, you can omit the explicit nonlinearity and still learn nonlinear functions with deep “linear” networks. That’s because IEEE floating point behaves highly nonlinearly for values very near zero (in the subnormal range, where precision degrades), and an appropriate training algorithm can exploit that.
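As a small illustration of the kind of nonlinearity being described (a sketch I'm adding, not code from the original post): a nominally linear map like `y = (x / c) * c` is the identity in exact arithmetic, but in IEEE float64 it becomes lossy, and hence nonlinear, when the intermediate value lands in the subnormal range near zero.

```python
# A nominally linear float64 map that is nonlinear near zero.
# In exact arithmetic, (x / c) * c == x for any nonzero c.
# In IEEE double precision, dividing by a large power of two can push
# x into the subnormal range (below ~2.2e-308), where the significand
# loses precision and the round trip no longer recovers x.

c = 2.0 ** 60  # exact power of two, so scaling itself introduces no
               # rounding unless the result is subnormal

def scaled_identity(x):
    # "Linear" on paper; nonlinear in float64 for tiny inputs.
    return (x / c) * c

# A normal-range input survives the round trip exactly.
print(scaled_identity(1.0) == 1.0)        # True

# 1e-300 / 2**60 is subnormal, so bits are rounded away.
print(scaled_identity(1e-300) == 1e-300)  # False: precision lost near zero
```

This is the flavor of effect the post describes: the "linear" layers are only linear over the normal range of the format, and values driven close enough to zero pass through a genuinely nonlinear region that training can take advantage of.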