1. 5

  2. 17

    Replacing statistics by clueless button pushing, aka machine learning, would be a disaster.

    I understand that statistics is unpopular: it requires people to think and carefully collect lots of data. But it can be used to provide insight about how things work.

    Machine learning is a black box. Chuck in lots of data, push some buttons and spout clueless techno-babble.

    Machine learning in SE research is a train wreck

    Software effort estimation is mostly fake research

    1. 4

      I understand that statistics is unpopular: it requires people to think and carefully collect lots of data. But it can be used to provide insight about how things work.

      That’s not why statistics are different from ML methods. The term you’re looking for is “explainability”, the ability for an inferred model to actually explain the situation being observed to the practitioner. You’re strawmanning statistics and ML while trying to make your point.

      Machine learning is a black box. Chuck in lots of data, push some buttons and spout clueless techno-babble.

      Could you please actually respond to the article instead of grinding your axe? Or, just don’t respond at all. I think axe grinding just decreases the quality of commentary on the forum. Not answering and not voting is a way to deal with topics that you don’t have a substantive critique for but you still dislike.

      1. 3
        1. The links you provided are non-specific whining related to… ahm, some talks about ML at UCL and a claim that software effort estimation research is wrong. I’m unsure how this is at all related to the OP.

        Replacing statistics by clueless button pushing, aka machine learning, would be a disaster.

        There is no hard line between the two, but IF you grant that people are clueless about stat models, which they are, we might as well switch to a paradigm that generates a more powerful model that doesn’t require people to understand the model to begin with in order to ensure valid results.

        Machine learning is a black box. Chuck in lots of data, push some buttons and spout clueless techno-babble.

        So are statistical models, unless you understand them, in which case they are obviously outright wrong in a lot of cases, and thus they are worse than a black box: they are a fake model of reality (see, for example, assuming everything fits a Gaussian, or making binary/multiclass problems out of continuous ones). So if people are already taking a black-box approach, we might as well use less biased black boxes that generate more powerful models, with validation techniques that don’t require strong assumptions.

        On the whole, I’d love to actually understand your issues with the content, other than the title, because based on the posts you linked I don’t think our views of “proper research” are actually very different.

        1. 3

          Using statistics does not guarantee that the model is correct. No technique guarantees correctness.

          The appropriate adage is: “All models are wrong, but some are useful”.

          1. 2

            Exactly, which is why any modelling effort requires:

            1. An explicit target (What does “useful” mean?)
            2. The best model possible to achieve that target

            The statistical approaches a lot of fields currently use don’t do either very well; hence the article.

      2. 7

        This is an interesting article. I have a few “big picture” issues with it that I discuss below.

        The argument in the introduction seems somewhat all over the place. The points that the author raises are interesting, but I am not sure if I agree with the conclusion. For example, the author claims that

        Furthermore, the complexity of the statistical apparatus being used leads to the general public being unable to understand scientific research.

        as a critique of “classical statistical” methods. While I agree with the author that scientific and statistical literacy is a problem, I am at a loss as to how the eventual conclusion follows: that statistical learning models, that are even less interpretable than classical models, are a solution for this problem.

        I am also not sure how these three conclusions that the author highlights for the post follow from the article:

        1. Simplicity; It can be explained to an average 14-year-old (without obfuscating its shortcomings)
        2. Inferential power; It allows for more insight to be extracted out of data
        3. Safety against data manipulation; It disallows techniques that permit clever uses of “classical statistics” to generate fallacious conclusions

        As for 1, a 14-year-old in a regular algebra class will be able to carry out a simple linear regression with one variable. That’s because the methods are rather simple and it is easy enough to understand the difference between the line of best fit and the observed points. I am not completely sure the same is true for even the simplest statistical learning models.
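
        To illustrate (a made-up example of my own, not from the article), the one-variable fit really is just two formulas:

        ```python
        # Hypothetical numbers; the point is how little machinery the fit needs.
        import numpy as np

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

        slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # cov(x, y) / var(x)
        intercept = y.mean() - slope * x.mean()
        print(f"y ≈ {slope:.2f} * x + {intercept:.2f}")
        ```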

        Maybe I am not understanding what “inferential power” means here, but my understanding of machine learning is that those models excel at predictions, as in out of sample predictions, rather than inferences. I am also fuzzy about the kind of insights that the author is going to discuss.

        Finally, I am not sure how data manipulation (as in data cleaning, for example) is connected to data analysis and how statistical learning “fixes” them. All the ideas that are discussed further in the post seem to be on the data analysis side, so I don’t see the connection between this last insight to the rest of the post.

        As a final comment, most research is done to answer specific research questions. Usually a researcher asks a question and picks the best methods to answer it. The author seems to advocate moving from inferential questions (e.g., what is the difference between a group that received X and a group that didn’t) towards predictive questions (e.g., what will happen if this group receives X?). I wonder if a better motivation for the post needs to be an epistemological one first rather than a methodological/technical one.

        A few more fine-grained questions:

        • In point iii), isn’t the author just describing jackknifing/leave-one-out? The same idea can be applied to classical methods to calculate non-parametric standard errors/p-values. What am I missing here?
        • In point vi), there seems to be a strong assumption that all confounding variables are observed and that the deconfounding technique that the author describes can be applied to the data (by the way, if I understood the procedure that is described, it should be the same idea behind classical multiple regression). The most difficult thing about deconfounding is trying to account for the effects of unobserved variables (e.g., omitted variable bias). I am not sure if machine learning methods are better or worse than classical methods to address these issues.
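
        Roughly, the kind of regression adjustment that last point alludes to (a synthetic, hypothetical example of my own, not the author’s procedure):

        ```python
        # Including the observed confounder C as a regressor recovers the treatment
        # coefficient; unobserved confounders are, of course, not handled.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 5_000
        C = rng.normal(size=n)             # observed confounder
        T = 0.8 * C + rng.normal(size=n)   # "treatment" influenced by C
        Y = 2.0 * T + 3.0 * C + rng.normal(size=n)

        naive, *_ = np.linalg.lstsq(np.c_[T, np.ones(n)], Y, rcond=None)      # omits C
        adjusted, *_ = np.linalg.lstsq(np.c_[T, C, np.ones(n)], Y, rcond=None)
        print(f"naive T coefficient:    {naive[0]:.2f}")     # inflated by the confounder
        print(f"adjusted T coefficient: {adjusted[0]:.2f}")  # close to the true 2.0
        ```
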
        1. 3

          that are even less interpretable than classical models, are a solution for this problem.

          I agree that ML models are less interpretable. But what I think lends itself to explainability is performing CV and reporting accuracy based on that (relatively easy to get), making the error function explicit, i.e. informing people of what you’re trying to optimize, and comparing your model to a “base” assumption or to previous models on that metric.
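
          A minimal sketch of what I mean, assuming scikit-learn (the dataset and models here are just placeholders):

          ```python
          # Cross-validate a model and a trivial baseline on the same explicit metric,
          # so readers can judge the model without understanding its internals.
          from sklearn.datasets import load_breast_cancer
          from sklearn.dummy import DummyClassifier
          from sklearn.ensemble import RandomForestClassifier
          from sklearn.model_selection import cross_val_score

          X, y = load_breast_cancer(return_X_y=True)

          # The error function is made explicit via the scoring argument.
          baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y,
                                     cv=5, scoring="accuracy")
          model = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                                  cv=5, scoring="accuracy")
          print(f"baseline accuracy: {baseline.mean():.3f}")
          print(f"model accuracy:    {model.mean():.3f}")
          ```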

          You don’t need to understand how the model operates to get the above, but in statistics the underlying model (assumptions about its error distribution) is actually relevant to interpreting e.g. the p-values.

          All the ideas that are discussed further in the post seem to be on the data analysis side, so I don’t see the connection between this last insight to the rest of the post.

          I focus on how data analysis can be done under this paradigm because a lot of the strongest arguments for using outdated statistical models seem to focus on their power to “explain” the data while making predictions at the same time. So I’m trying to showcase why that also works just as well, if not better, with more complex models.

          I agree it’s in part a different point.

          I wonder if a better motivation for the post needs to be an epistemological one first rather than a methodological/technical one.

          In part, the post came out of a long series of posts about why a predictive-based epistemology is the only way to actually make sense of the world, and why e.g. an “equation first” epistemology is just a subset of that which happens to work decently in some domains (e.g. medieval physics).

          But I guess the article specifically was aimed mostly at people who already agree that a predictive epistemology makes more sense for pursuing knowledge about the world, to actually lay out how it could be done better than it is now.

          In part my fault for not tackling this assumption, but at that point I’d have had a literal book instead of a way-too-long article.

          In point iii), isn’t the author just describing jackknifing/leave-one-out? The same idea can be applied to classical methods to calculate non-parametric standard errors/p-values. What am I missing here?

          You aren’t missing anything. Classical methods would work just fine under this paradigm and I assume for many problems a linear regression would still end up being a sufficiently complex model to capture all relevant connections.
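
          For instance, a plain linear regression evaluated under this paradigm might look like the following (again a sketch assuming scikit-learn, with a placeholder dataset):

          ```python
          # Leave-one-out evaluation of a classical linear regression: the model is
          # judged purely by its out-of-sample error.
          from sklearn.datasets import load_diabetes
          from sklearn.linear_model import LinearRegression
          from sklearn.model_selection import LeaveOneOut, cross_val_score

          X, y = load_diabetes(return_X_y=True)

          scores = cross_val_score(LinearRegression(), X, y,
                                   cv=LeaveOneOut(),
                                   scoring="neg_mean_absolute_error")
          print(f"leave-one-out mean absolute error: {-scores.mean():.1f}")
          ```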

          In point vi), there seems to be a strong assumption that all confounding variables are observed and that the deconfounding technique that the author describes can be applied to the data (by the way, if I understood the procedure that is described, it should be the same idea behind classical multiple regression). The most difficult thing about deconfounding is trying to account for the effects of unobserved variables (e.g., omitted variable bias). I am not sure if machine learning methods are better or worse than classical methods to address these issues.

          I’d argue accounting for the effect of unobserved variables is essentially impossible without making ridiculous assumptions about the external world (see psychology and degrees of freedom). I agree that ML models don’t add anything new here.

        2. 6

          Reader: feel free to read the article, but you won’t find anything valuable. The parts that are radical aren’t accurate, and the parts that are accurate aren’t radical.

          1. 2

            See the end of the article: 50% of what I’m saying is a rehash of ideas that have existed since the 70s. I even gave a link to a ’73 paper presented to the Royal Society outlining basically the same CV method I propose, but these ideas were never implemented widely enough or (to my knowledge) explained in modern terminology for a layman to understand.

            I’m glad most things are obvious and not radical, that’s my whole point: these things are obvious and are methods that should have been used everywhere since forever.

            I would love to hear what parts aren’t accurate though, as it stands this just seems to be a drive-by sneer.

          2. 3

            That being said, code is systematized rigour; it’s mathematics/logic that can be computer-validated rather than left to the shaky parsing of the human mind. So, ahm, just take a few weeks and learn some basic Python?

            [earlier]

            I read papers, I don’t run experiments

            I can tell.

            1. 3

              This article leaves me uneasy, caught between its broad-brush strokes against not-sure-what and its push from one fallacy (scientists don’t know enough statistics, the system is rigged with stupid indices for evaluating research, i.e. the p-value) to another (highly complex models without any control or understanding, which may or may not predict something based on another set of indices). I find it dishonest to call out only the social sciences when medicine and the biomedical sciences are at the heart of the replication crisis. The first list about the differences between “classical statistics” and ML seems false based on how I learned: cross-validation was always part of model evaluation, and experimental design is utterly essential knowledge for analysing an experiment (in general: how, why, and by whom your data were created is essential knowledge for identifying bias at the origin of the dataset).

              One really good book for anyone wanting to better understand statistics is Statistics Done Wrong by Alex Reinhart.

              The first quote I learned about models, and the only one to always keep in mind: “All models are wrong, but some are useful”, attributed to G. Box.

              1. 1

                There is a replication crisis in all areas of research. It just so happens that medicine and biomedical researchers were the first to make a major fuss about the issue, and to start doing something about it.

                The replication crisis is driven in part by the low status of performing replications.

                Software engineering is a long way away from a replication crisis, because so few experiments are done in the first place.

              2. 3

                I think this article’s heart is in the right place but I’m not getting what I think the author would like out of it. When I’m deciding on a model, I rarely think of “ML vs statistics”, but my decision tree goes as follows:

                1. Does OLS Linear/Logistic Regression work?
                2. If not, how much does explainability matter? (e.g. will I go with a Bayesian Model/Net or something else)
                3. How do I arrive at meaningful conclusions? (Decision functions, bias variance tradeoff, etc)
                4. Choose an overall model architecture
                5. Run it. Profit!!
                6. (Not really…… Cry because that didn’t work and try again)

                Moreover, while I realize p-values were just a bit of a footnote in your overall piece, I’d love to see a more expanded discussion of “fidelity” in models. I’m partial to effect sizes or Bayesian credible intervals myself, since they sidestep a lot of the problems with p-values.
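
                To illustrate what I mean (made-up numbers, assuming NumPy/SciPy):

                ```python
                # Report an effect size (Cohen's d) alongside the p-value, so the reader
                # sees how big the difference is, not just whether it is "significant".
                import numpy as np
                from scipy import stats

                rng = np.random.default_rng(1)
                control = rng.normal(loc=10.0, scale=2.0, size=200)
                treated = rng.normal(loc=10.6, scale=2.0, size=200)

                t, p = stats.ttest_ind(treated, control)
                pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
                cohens_d = (treated.mean() - control.mean()) / pooled_sd
                print(f"p-value: {p:.3f}, Cohen's d: {cohens_d:.2f}")
                ```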

                1. 2

                  Correlation is not transitive, but causation is transitive. We cannot simply replace causality with correlations. (Additionally, causality is bounded by the laws of spacetime, while correlations can happen in concomitant elsewheres.)

                  The parts of machine learning which are formalizeable and explicable are not replacements for statistics, but part of its standard techniques; polynomial regression and SVMs can make it easier to see which inputs are important.

                  1. 1

                    Not sure I get your point. Are you saying: correlation expresses relationships between/among items (events, observations) and causation expresses the direction of a relationship and the value of that direction? Is your point that correlation is like a state machine (depicting “as is”), and that causation can only be determined when we have a large collection of observations over a period of time (“transitive”)?

                    1. 1

                      (Pearson) correlation is a computed coefficient between datasets. Correlation is not transitive; there exist datasets X, Y, and Z such that the correlations between X and Y, and Y and Z, are both high; but the correlation between X and Z is low.
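
                      A concrete synthetic demonstration (my own toy example):

                      ```python
                      # X and Z are independent; Y = X + Z. Both pairwise correlations with
                      # Y are high, yet X and Z are uncorrelated: correlation does not chain.
                      import numpy as np

                      rng = np.random.default_rng(42)
                      X = rng.normal(size=100_000)
                      Z = rng.normal(size=100_000)
                      Y = X + Z

                      print(f"corr(X, Y) = {np.corrcoef(X, Y)[0, 1]:.2f}")  # ~0.71
                      print(f"corr(Y, Z) = {np.corrcoef(Y, Z)[0, 1]:.2f}")  # ~0.71
                      print(f"corr(X, Z) = {np.corrcoef(X, Z)[0, 1]:.2f}")  # ~0.00
                      ```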

                      Causation is usually transitive by definition. In Bayesian networks, an interpretation of directed acyclic graphs, causality is encoded by graph edges, and transitivity of causality is encoded by graph paths. Each edge is weighted with a probability, and we can multiply the probabilities along each path in a form of Bayes’ rule to give a weight for that path.

                      Then, my point is that it’s not always possible to recover a correct Bayesian network from a family of Pearson correlations. This is because, while we can easily map each measured dataset to a putative causal vertex, we cannot map correlations between datasets to edges in a Bayesian network by simply transforming the correlation coefficients into edge weights. We cannot tell whether a correlation is actually causal or merely a spurious correlation.

                      This is but one special case of the general concept that correlation does not necessarily imply causation.