statisticsdonewrong.com

The way p-values are used in this world is a bit silly. We’ve decided, completely arbitrarily, that we’re okay with 5% of published results being noise, but that is not even what `p* = 0.05` means, because we can’t meaningfully quantify over “possibly publishable results”. Instead we have the inverse situation: we accept that, 5 percent of the time, we’ll get a “significant” finding in the contrived example where there is only noise (i.e. there is not a very slight positive difference or a very slight negative one, but none by construction). In truth, if 96% of the things we tried to test were false, we would expect more than half of published “significant” results to be false, even if every real effect were detected.
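The arithmetic behind that last claim can be checked in a few lines. This is only a sketch under stated assumptions: every significant result gets published, nothing else does, and the `power` values (the chance of detecting a real effect) are hypothetical numbers picked for illustration, not figures from any study.

```python
def false_share_of_significant(prior_true, alpha=0.05, power=1.0):
    """Fraction of 'significant' results that come from hypotheses with no real effect.

    prior_true: share of tested hypotheses that are actually true (assumed).
    alpha:      significance threshold, i.e. false-positive rate under pure noise.
    power:      probability that a real effect reaches significance (assumed).
    """
    false_positives = (1.0 - prior_true) * alpha
    true_positives = prior_true * power
    return false_positives / (false_positives + true_positives)


# 96% of tested hypotheses false, so 4% true.
share_perfect_power = false_share_of_significant(0.04, power=1.0)
share_lower_power = false_share_of_significant(0.04, power=0.8)

print(f"false share with perfect power: {share_perfect_power:.3f}")
print(f"false share with 80% power:     {share_lower_power:.3f}")
```

Even with perfect power the false share is above one half, and it only grows as power drops.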

To me it’s obvious that the right approach to statistics would be Bayesianism, if we had any good way to determine priors. In truth, much of frequentism (maximum-likelihood estimation in particular) amounts to applying a uniform[1] prior, but the existence of meaningful nonlinear transformations of variables is alone enough to prove that there’s no such thing as a transform-independent uniform prior: a prior that is flat in one parameterization is not flat in another. The issue, of course, is that for many questions we have no idea what the correct priors should be. It’s especially hard to choose unbiased priors on questions that seem binary, like “Does this drug work?”
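To make the transformation point concrete, here is a small sketch. The parameter `theta` and the transform `phi = theta ** 2` are made up for illustration; any nonlinear transform would do. If `theta` is uniform on (0, 1), then `phi` is not: P(phi < t) = sqrt(t), whereas a uniform prior placed directly on `phi` would give P(phi < t) = t.

```python
import random


def prob_phi_below(threshold, n_samples=200_000, seed=0):
    """Estimate P(theta**2 < threshold) when theta is drawn uniformly from (0, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if rng.random() ** 2 < threshold)
    return hits / n_samples


threshold = 0.25
simulated = prob_phi_below(threshold)
exact_flat_on_theta = threshold ** 0.5  # P(theta**2 < t) = sqrt(t)
exact_flat_on_phi = threshold           # what a flat prior on phi itself would say

print(f"simulated, flat prior on theta: {simulated:.3f}")
print(f"exact, flat prior on theta:     {exact_flat_on_theta:.3f}")
print(f"exact, flat prior on phi:       {exact_flat_on_phi:.3f}")
```

Two people who both claim "no prior opinion", one about `theta` and one about `phi`, are assigning different probabilities to the same event.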

Bayesian results can be unsatisfying insofar as they often involve generating possible worlds (from a posterior distribution) and using a Monte Carlo simulation to answer the question we care about, but that’s probably the most truthful assessment of what we can actually infer about the world based on probability. It’s easy enough to come up with scenarios where the maximum-likelihood solution is misleading (e.g. the taxicab problem) or even wrong.
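A quick simulation shows why the taxicab problem is a fair example. The setup is an assumption made for illustration: taxis are numbered 1 to N, we see `k_seen` distinct taxis chosen uniformly at random, and we want to estimate N. The maximum-likelihood estimate is the largest number seen, which can never overshoot and so is biased low on average. For comparison the sketch also computes a standard bias-corrected estimate, `largest * (1 + 1/k) - 1`. The values N = 1000 and k = 5 are arbitrary.

```python
import random


def simulate_taxicab(n_taxis=1000, k_seen=5, n_trials=20_000, seed=0):
    """Average the MLE and a bias-corrected estimate of n_taxis over many trials."""
    rng = random.Random(seed)
    mle_total = 0.0
    corrected_total = 0.0
    for _ in range(n_trials):
        # Observe k_seen distinct taxi numbers (sampling without replacement).
        largest = max(rng.sample(range(1, n_taxis + 1), k_seen))
        mle_total += largest                                # maximum-likelihood estimate
        corrected_total += largest * (1 + 1 / k_seen) - 1   # bias-corrected estimate
    return mle_total / n_trials, corrected_total / n_trials


mean_mle, mean_corrected = simulate_taxicab()
print(f"true N: 1000, mean MLE: {mean_mle:.1f}, mean corrected: {mean_corrected:.1f}")
```

Under these assumptions the expected maximum is k(N + 1)/(k + 1), roughly 834 here, so the most likely answer is systematically short of the truth.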

[1] Of course, this “prior” isn’t always a proper one, i.e. a probability distribution. For example, maximum-likelihood fitting asserts a prior where each model is equally probable, but on many spaces (even R^n) there is no distribution for which this is true. To me, this isn’t a major issue because while probability measures have certain limitations, probabilities as odds-ratios still work, even for “possible worlds” spaces and topologies far more pathological than R^n.