1. 18
speakerdeck.com
1.

2. 2

“Non-intuitive”… The only intuition we’re born with is suckling. Everything else is habituation.

While I understand the value of sampling methods, they also kind of move your uncertainty from the data into the sampling method itself. What if you run a sampling method only once and you happen to actually draw a very weird sample that leads you to an unlikely conclusion? If you actually perform a computation, the only uncertainty is in your data and you can have mathematical certainty of what your uncertainty is!

Understanding Gaussian or t distributions is not that complicated. Making it look horrid by showing the formulae for their density functions is unnecessary. Most statisticians don’t know the t density’s formula off the top of their heads, but still find it intuitive: it’s a Gaussian with thicker tails and is practically a Gaussian once it hits a couple dozen degrees of freedom.
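
For anyone who wants to check the "practically a Gaussian" claim numerically, here is a minimal sketch using only the Python standard library. The function names and the grid of x values are illustrative choices, not anything from the deck. It measures the largest gap between the standard normal density and the t density for a few degrees of freedom.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def t_pdf(x, df):
    """Density of Student's t with `df` degrees of freedom.

    Computed in log space with lgamma so large df does not overflow.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def max_density_gap(df, lo=-6.0, hi=6.0, steps=1201):
    """Largest absolute difference between the two densities on a grid."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in xs)

if __name__ == "__main__":
    for df in (1, 5, 30, 100):
        print(f"df={df:>3}  max |t - normal| = {max_density_gap(df):.4f}")
```

The gap shrinks steadily as the degrees of freedom grow, while the t density stays above the normal one out in the tails.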

1. 2

I have a Facebook friend who has been describing his struggle writing a social science dissertation with little statistical background. When he says “I hate it that R doesn’t just let you easily plug in your data with default parameters and simply give you an answer”, that’s a sausage factory tour I didn’t ask for but it brings home the point that you should never just learn and instantly apply statistical methods without some broader understanding.

Yes, the reason statistics is hard isn’t simply because algorithm or function X is hard.

Statistics is hard because you always need to consider assumptions beyond what’s available in a spreadsheet full of data.

Statistics is hard because your own actions create a different situation outside of the data, i.e., what does a value Y of statistic Z(data) “mean”? Well, it tends to mean one thing if you chose that statistic going into the experiment, and another if you sifted between methods until you got the one that gave you the result you were after.

Statistics is hard because no statistical routine should be applied automatically or naively. The reason statistical language is obscure isn’t that Neyman et al. were intentionally obtuse, but rather that a value on its own doesn’t necessarily tell you anything. Instead, the effective and meaningful application of a statistic (a value calculated from data) requires one to have a handle on one’s data and a good, consistent methodology going from data collection through evaluation of results.

1. 2

What if you run a sampling method only once and you happen to actually draw a very weird sample that leads you to an unlikely conclusion?

That’s why you don’t sample once. You’ll notice all the sampling methods in the slide-deck repeated the trial many times. Statistically, you will find samples that lie far out on the distribution. If it really was an outlier in some sense, the noise will be washed out by the signal. And if you continue finding these “anomalies”, the stats will inform you that these oddities are not due to random chance at all.

Obviously, there is always an order of preference in data collection:

1. Collect the entire population’s data. If you have all the data, you can make much more definitive claims.
2. Realistically, you cannot always sample the entire population. So instead you collect samples from the population, and endeavor to collect unbiased samples (stratified sampling, etc.). In an ideal world, you collect enough samples to do your stats.
3. Realistically, you cannot even collect enough samples from your population to perform strong stats (e.g., not enough turtles stacking shells to make any claim). So you start using methods described in the deck (shuffling, bootstrapping, etc.) to draw inference from your limited sample data.
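
As a rough illustration of the third case, here is a minimal percentile-bootstrap sketch in Python. The sample values are made up for the example, and the function name, resample count, and fixed seed are assumptions of mine rather than anything taken from the deck. It resamples the small sample with replacement many times and reads an interval for the mean off the resulting distribution.

```python
import random
import statistics

def bootstrap_mean_interval(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Each resample draws n values from the sample, with replacement.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Made-up measurements standing in for a limited sample.
observations = [12, 15, 9, 22, 17, 30, 11, 14, 19, 25]

if __name__ == "__main__":
    low, high = bootstrap_mean_interval(observations)
    print(f"sample mean = {statistics.fmean(observations):.2f}")
    print(f"95% bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

The interval only reflects the variation present in the sample itself, so it inherits any bias in how that sample was collected.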

And in the end, I believe the sampling methods listed in this deck are just proxies for the underlying statistical formulas. Simulating coin flips is the same as calculating the probability distribution directly… but some people may understand the simulation more easily than the math.
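
To make the equivalence concrete, here is a small Python sketch that computes the chance of seeing at least k heads in n fair flips in two ways: directly from the binomial formula, and by simulation. The numbers (30 flips, 22 heads, 100,000 trials) and the function names are illustrative assumptions.

```python
import math
import random

def exact_tail(n_flips, k_heads):
    """P(at least k_heads heads in n_flips fair flips), from the binomial formula."""
    favourable = sum(math.comb(n_flips, k) for k in range(k_heads, n_flips + 1))
    return favourable / 2 ** n_flips

def simulated_tail(n_flips, k_heads, n_trials=100_000, seed=0):
    """Estimate of the same probability obtained by simulating the flips."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # n_flips random bits stand in for n_flips fair coin flips; 1 = heads.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        if heads >= k_heads:
            hits += 1
    return hits / n_trials

if __name__ == "__main__":
    print(f"exact:     {exact_tail(30, 22):.5f}")
    print(f"simulated: {simulated_tail(30, 22):.5f}")
```

The two numbers agree up to simulation noise, which shrinks as the number of trials grows.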

1. 3

That’s why you don’t sample once. You’ll notice all the sampling methods in the slide-deck repeated the trial many times.

I understand. You can still sample many times and reach a very unlikely conclusion, just like you can flip 20 heads in a row.
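
Both sides of this can be put into numbers. The sketch below (Python, with names of my own choosing) computes the probability of 20 heads in a row with a fair coin, plus the standard error of a simulated proportion, which is the usual way to quantify how far a repeated-sampling estimate is likely to stray.

```python
import math

# Probability of 20 heads in a row with a fair coin: (1/2)^20.
p_twenty_heads = 0.5 ** 20

def monte_carlo_standard_error(p, n_trials):
    """Standard error of a proportion estimated from n_trials independent trials."""
    return math.sqrt(p * (1 - p) / n_trials)

if __name__ == "__main__":
    print(f"P(20 heads in a row) = {p_twenty_heads:.3e}")
    for n in (100, 10_000, 1_000_000):
        se = monte_carlo_standard_error(0.5, n)
        print(f"trials={n:>9,}  standard error of the estimate = {se:.5f}")
```

The error never reaches zero, but it falls with the square root of the number of trials, so quadrupling the trials halves it.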

My point is that I would prefer if the actual mathematics were explained instead of being hidden away for fear of scaring away your audience. If we can handle programming, we can handle mathematics.

1. 2

Ah, I misinterpreted your comment then, apologies.

I think explaining things in terms of computation is a valid teaching strategy, though. If someone can understand the reasoning behind a computation, they can apply that knowledge to understanding the underlying math. People unfortunately have a gut reaction to statistics being boring/dull/hard, so this could be a nice “gateway” to a more complete understanding.