The central limit theorem
As stated earlier, the importance of the normal distribution lies in the fact that it arises from the averaging process. The theoretical foundation of this statement is the Central Limit Theorem, or CLT.
In its purest form, the CLT deals with a sequence \[(X_i)_{i=1}^{\infty} = (X_1,X_2,X_3,\ldots)\] of independent, identically distributed random variables; we often refer to such a sequence as i.i.d.
Averaging
Given a sequence of random variables, we can form the sequence of averages: \[\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i.\]
The first big theorem in the theory of random variables is the law of large numbers, which states that, if \((X_i)_{i=1}^{\infty}\) is an i.i.d. sequence of random variables, each with mean \(\mu\), then \(\bar{X}_n\) converges almost surely to \(\mu\).
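We can watch this convergence happen with a minimal Python sketch; the choice of an exponential distribution with mean 1 is our own, and any distribution with a finite mean would do:

```python
import numpy as np

rng = np.random.default_rng(1)

# An i.i.d. sequence drawn from an exponential distribution with mean mu = 1
# (our own choice of distribution for illustration).
mu = 1.0
x = rng.exponential(scale=mu, size=100_000)

# Running averages X-bar_n = (X_1 + ... + X_n) / n.
running_avg = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>6}:  X-bar_n = {running_avg[n - 1]:.4f}")
# The printed averages settle down near mu = 1 as n grows.
```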
The central limit theorem is related but gives us more information about how the convergence occurs.
Statement from Devore’s text
Let \(X_1,X_2,X_3,\ldots,X_n\) be a random sample from a distribution with mean \(\mu\) and variance \(\sigma^2\). Then if \(n\) is sufficiently large, \(\bar{X}\) has approximately a normal distribution with mean \(\mu\) and variance \(\sigma^2/n\). The larger the value of \(n\), the better the approximation.
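We can check the claims about the mean \(\mu\) and variance \(\sigma^2/n\) numerically. Here is a sketch, again using an exponential distribution (our own choice, with \(\mu = \sigma = 1\)) and a sample size and trial count we picked for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, trials = 1.0, 1.0, 50, 20_000

# Draw many random samples of size n and average each one.
xbars = rng.exponential(scale=mu, size=(trials, n)).mean(axis=1)

print("mean of X-bar:", xbars.mean())   # close to mu = 1
print("sd of X-bar:  ", xbars.std())    # close to sigma / sqrt(n)
print("sigma/sqrt(n):", sigma / np.sqrt(n))  # about 0.1414
```

Even though the exponential distribution is quite skewed, the sample means already cluster around \(\mu\) with standard deviation close to \(\sigma/\sqrt{n}\).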
Stated like a true statistician!!
In practice, we'll often think of each \(X_i\) as representing an individual in a sample chosen from a whole population.
A computer experiment
We can illustrate the central limit theorem with a little computer experiment. Suppose we have a large data set of 20000 values. We repeatedly grab a small sample of size 1, 4, 16, 32, or 64 from that data set, compute the average of each sample, and draw a histogram of the resulting averages for each sample size. As it turns out, the spread of the histograms shrinks like \(1/\sqrt{n}\): each time the sample size quadruples (from 1 to 4, 4 to 16, or 16 to 64), the spread is cut roughly in half, just as the \(\sigma^2/n\) in the theorem predicts.
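Here is a sketch of the experiment in Python. The data set is simulated with a skewed distribution as a stand-in for the one described above, and the number of repetitions is our own choice:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Stand-in for the large data set of 20,000 values (deliberately skewed).
data = rng.exponential(scale=2.0, size=20_000)

fig, axes = plt.subplots(1, 5, figsize=(15, 3), sharex=True)
for ax, n in zip(axes, [1, 4, 16, 32, 64]):
    # Repeatedly grab a sample of size n from the data set and record its average.
    means = np.array([rng.choice(data, size=n).mean() for _ in range(2_000)])
    ax.hist(means, bins=30)
    ax.set_title(f"n = {n}, sd = {means.std():.2f}")

plt.tight_layout()
plt.show()
```

The printed standard deviations in the panel titles make the \(1/\sqrt{n}\) scaling visible: the histogram for \(n = 64\) is about half as spread out as the one for \(n = 16\), and the histograms look increasingly bell-shaped even though the underlying data set is skewed.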