
      How can a “first-principles understanding of deep neural networks” make sense?

      I mean, look at the general linear model. It is considered a fairly well worked-out statistical model. You could say there is a fairly good theoretical understanding of how it works (I remember it as: if the errors are normally distributed, least-squares regression yields the maximum likelihood estimator. I’m probably butchering the formulation, but it doesn’t matter here). But if you apply that model to, say, a physical system, you need some understanding of the system, or your application of the model doesn’t really have a theoretical basis.
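      (A minimal numpy sketch of that remembered claim, under the assumption of i.i.d. Gaussian errors with known variance: the closed-form least-squares solution sits at the minimum of the Gaussian negative log-likelihood in the coefficients, i.e. it is the MLE. The data here are synthetic, just for illustration.)

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      # Synthetic linear data: intercept 2.0, slope -1.5, Gaussian noise.
      X = np.column_stack([np.ones(200), rng.normal(size=200)])
      y = X @ np.array([2.0, -1.5]) + rng.normal(scale=0.5, size=200)

      # Closed-form least-squares estimate.
      beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

      # Gaussian negative log-likelihood in beta (sigma held fixed);
      # up to constants this is just the sum of squared residuals.
      def neg_log_lik(beta, sigma=0.5):
          r = y - X @ beta
          return 0.5 * np.sum(r**2) / sigma**2

      # Perturbing the least-squares solution in any coordinate
      # direction only increases the negative log-likelihood.
      base = neg_log_lik(beta_ls)
      for d in np.eye(2):
          assert neg_log_lik(beta_ls + 0.1 * d) > base
          assert neg_log_lik(beta_ls - 0.1 * d) > base
      ```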

      Now, jump to neural networks. These are in practice just curve fitting. Even more, their intended use case is “stuff we have no model of”. Moreover, they approximate targets using a variety of concrete internal structures that the fitting process doesn’t directly take into account. Which is to say, they model on the level of “it’s probably X” but not on the level of “here’s the estimated distribution of the error around X”, and I don’t see how they could get something like that.

    🇬🇧 The UK geoblock is lifted, hopefully permanently.