Because everything is better when delivered in a cultured accent. Includes excellent deconstructions of arguments against the evolution of the eye, bacterial flagella, and others.
And rebuttals of some criticisms of the video, which include some excellent examples of common creationist tactics.
Statistics refers to various methods of looking at subsets of whole populations, called samples. Because it is usually impossible to use the entire population of whatever you are looking at (e.g., when polling you can’t call /every/ American), we use samples of those populations and then extrapolate our results from that sample.
In science as in polling, you do want to try to have a representative sample, or to be comparing similar groups, so that your sample resembles the population. Single traits in the population, like height, tend to be distributed along a bell curve, as below – there are a lot of people of middle height, with some very short and some very tall people.
That’s a ‘normal’ curve, where the mean (average) equals the mode (most frequent score). There’s some fancier stuff about how tall or spread-out the curve is, but that’s essentially it.
Now, most statistical tests that you do in science assume that whatever you’re measuring is normally distributed (as in that curve above). They then compare the means of the different groups in the experiment. See, there’s no way of knowing just from looking at two groups if they’re really that different, or if those differences are due only to chance. If you’ve measured, say, the reaction times of a medicated vs. a non-medicated group, you can’t just say ‘The medicated group had a mean reaction time of 500ms, and the non-medicated group had a mean reaction time of 450ms, therefore the medication slows reaction time.’ This might not be true – because of all the potential sources of error, the differences in reaction time can’t be solely attributed to the medication. What you need to know is /how/ different the reaction times are, and if you can say those differences are /significant/.
In the most general terms, what you would do here is find the difference between the two group averages, and then divide that by the total error measure of both groups. That’s a bit tricky to get into, so please for now take my word for it that you can get a mathematical representation of the error in the measurement (it’s like the +/- 0.05mL from high school science).
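To make that "difference divided by error" idea concrete, here is a minimal sketch in Python. It assumes the error measure is the combined standard error of the two group means (the form used in Welch's t-test), which is one common choice and not necessarily the only one. The function name and the reaction times are made up for illustration; the numbers were simply picked so the group means come out at 500ms and 450ms.

```python
import math
from statistics import mean, variance

def difference_over_error(group_a, group_b):
    """Difference between the group means divided by their combined standard error.

    Assumption: the 'total error measure' is sqrt(var_a/n_a + var_b/n_b),
    as in Welch's t-test.
    """
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical reaction times in ms, invented for this example.
medicated = [510, 495, 520, 480, 495]      # mean 500
unmedicated = [455, 440, 460, 445, 450]    # mean 450

score = difference_over_error(medicated, unmedicated)
print(f"score = {score:.2f}")
```

The bigger the score, the larger the difference is relative to the noise in the measurements. The same 50ms gap would give a much smaller score if the individual reaction times were more spread out.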
If you’re doing this by hand, you then take your score and compare it to a table of scores, based on your sample size. Normally, what we use as the significance cut-off is a p-value of < 0.05. What this means is that our score falls into the top or bottom 2.5% of the curve above, so a difference this large would be pretty unlikely to turn up by chance alone if the groups were really the same. This is why polling data is reported as '19 times out of 20': the remaining 1 time in 20 is the 5% chance of being misled by coincidence.
Now, this is fine as far as it goes. However, if you want to compare more than two groups, things get tricky, because you'll often need to compare each group to every other one, meaning instead of one comparison you'll have 3 (3 groups), 6 (4 groups), 10 (5 groups), and so on (the count grows quickly as you add groups).
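If you'd rather check the comparison counts than trust my arithmetic: comparing every group with every other group among n groups takes n(n-1)/2 comparisons. A small sketch, with a made-up function name and group labels:

```python
from itertools import combinations
from math import comb

def pairwise_comparisons(n_groups):
    """Number of distinct pairs among n_groups, i.e. n(n-1)/2."""
    return comb(n_groups, 2)

# Counts for 2 through 5 groups.
counts = {n: pairwise_comparisons(n) for n in range(2, 6)}
print(counts)

# Listing the actual pairs for four hypothetical groups A-D.
pairs = list(combinations(["A", "B", "C", "D"], 2))
print(pairs)
```

So two groups need 1 comparison, three need 3, four need 6, and five need 10.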
Now, if I ask you to pull the one gold coin out of a jar that also holds 19 brass ones, you have a 5% chance of getting the right one. And the next time I ask you to do it, you still have a 5% chance, assuming you put the coin you pulled out first back inside. But – by the second draw you have a 9.75% /cumulative/ chance of having drawn the gold coin at least once. By the third, you have a 14.3% chance. And so on. So how do we keep the overall chance at 5%?
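The cumulative figures come from working out the chance of missing every time and subtracting that from 1. A quick sketch, assuming each draw is independent (the coin goes back in the jar) and using a function name of my own invention:

```python
def chance_of_at_least_one_gold(draws, p_single=0.05):
    """Chance of drawing the gold coin at least once in `draws` tries.

    Assumes independent draws with replacement: the chance of missing
    every time is (1 - p_single) ** draws.
    """
    return 1 - (1 - p_single) ** draws

for draws in (1, 2, 3, 10, 20):
    print(draws, f"{chance_of_at_least_one_gold(draws):.1%}")
```

Two draws give 9.75% and three give about 14.3%, and the number keeps climbing the more times you reach into the jar.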
The same problem occurs in statistics. The more comparisons you do, the higher the likelihood of you getting a significant result through chance alone. Thus, we use a variety of techniques to make sure we're still looking at results which are unlikely to be found through chance alone. The method I use most often is a fairly conservative one (i.e., I'm more likely to discard a real difference as non-significant) called Bonferroni correction, which makes my significance criterion stricter for every comparison I do – essentially adding 20 coins for every time I plan to draw. Others do the equivalent of adding 20 coins after every new draw.
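In its simplest form, Bonferroni correction divides the significance cut-off by the number of comparisons you plan to make. Here's a rough sketch of what that does to the overall chance of a fluke; the function names are mine, and the overall-chance calculation assumes the comparisons are independent, which real comparisons often aren't.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison cut-off under Bonferroni correction: alpha / n."""
    return alpha / n_comparisons

def overall_false_positive_chance(per_test_alpha, n_comparisons):
    """Chance of at least one fluke 'significant' result.

    Assumption: the comparisons are independent of one another.
    """
    return 1 - (1 - per_test_alpha) ** n_comparisons

n = 10  # e.g. all pairwise comparisons among 5 groups
corrected_alpha = bonferroni_threshold(0.05, n)
uncorrected = overall_false_positive_chance(0.05, n)
corrected = overall_false_positive_chance(corrected_alpha, n)

print(f"per-comparison cut-off: {corrected_alpha}")
print(f"overall chance without correction: {uncorrected:.1%}")
print(f"overall chance with correction: {corrected:.1%}")
```

With ten comparisons each one has to clear p < 0.005, which pulls the overall chance of a fluke from roughly 40% back down to just under 5%. The price is the conservatism mentioned above: a real but modest difference has a harder time clearing the stricter bar.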
Now, where am I going with all this, other than giving some nifty insight into science and statistics?
Well, watch this and then come back:
From about 1:20 and again from 6:30 or so is what I'm thinking of. Coincidences seem marvellously unlikely to us, but the fact is that they are simply due to multiple comparisons. The more times we 'compare' things, the more likely we are to get a match, no matter how unlikely. The more people you meet, the more likely you are to meet someone else with a one-legged uncle named Charles. Spectacularly unlikely, yes, but if every person you meet counts as another comparison of means, or uncles, it becomes less and less unlikely.
This generalizes quite a bit. Lightning striking a church? There is a lot of lightning, and a lot of churches. Ignoring any real effects (churches are often on hills, have spires with metal bits, etc), by chance alone you'd almost have to /expect/ it to happen to some church over time.
Many things that seem unlikely are unlikely only in isolation – given the large number of things on Earth, and the fact that things happen over time, seemingly unlikely things are going to happen fairly regularly, especially when viewed in retrospect.
We need to try to correct for this. And that's what science strives for.