What do we do if we want to use the statistical techniques that we've learned to this point to model data, but our samples just aren't large enough? We use a new distribution (called the $t$-distribution) that's specifically designed to account for the extra variability inherent in smaller samples.
This is essentially section 7.1 of our textbook.
Suppose we have data on the level of mercury in dolphin muscle arising from 19 dolphins. We'd like a confidence interval for the average amount of mercury in dolphins. Unfortunately, we can't measure mercury in dolphin muscle without killing the dolphin. While it might be important for researchers to assess the threat of mercury in the ocean, they do not want to go kill more dolphins to get that data. What can they conclude from this data set, even though it's a bit too small to use a normal distribution?
The normal distribution, as awesome as it is, requires that we work with large sample sizes - at least 30, and more is better.
The $t$-distribution is similar but better suited to small sample sizes.
Just as with the normal distribution, there's not just one $t$-distribution but, rather, a family of distributions.
Like all continuous distributions, we compute probabilities with the $t$-distribution by computing the area under a curve. We do so using either a computer or a table.
The $t$-distribution is closely related to the standard normal distribution but has heavier tails to account for the extra variability inherent in small sample sizes. As the degrees of freedom increases, the corresponding $t$-distribution gets closer and closer to the standard normal.
The mean of the $t$-distribution is zero and its variance is related to the degrees of freedom $\nu$ by $$\sigma^2 = \frac{\nu}{\nu-2},$$ provided $\nu > 2$.
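We can see this approach to the normal numerically. Here's a quick sketch (using scipy.stats, just as in the computations below) comparing the 97.5th percentiles of a few $t$-distributions with that of the standard normal; the values shrink toward the normal's roughly $1.96$ as the degrees of freedom grow:
from scipy.stats import t, norm
# 97.5th percentiles of t-distributions with increasing degrees of freedom
[t.ppf(0.975, df) for df in [2, 5, 18, 100, 1000]]
# the corresponding percentile of the standard normal, about 1.96
norm.ppf(0.975)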
Unlike the normal distribution, there's no easy way to translate from a $t$-distribution with one number of degrees of freedom to a single standard one. As a result, it's less common to use tables and more common to use software than it is with the normal distribution.
Given a particular number of degrees of freedom, however, there is a standard way to derive a $t$-score that's analogous to the $z$-score for the normal distribution. This $t$-score (which is also called a test statistic) is a crucial thing that you need to know when using tables for the $t$-distribution.
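Concretely (to preview the formula we'll use below), if a sample of size $n$ has mean $\bar{x}$ and sample standard deviation $s$, and $\mu_0$ is the value we're comparing against, then the $t$-score is $$ t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}, $$ and we evaluate it against the $t$-distribution with $n-1$ degrees of freedom.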
Finding a confidence interval using a $t$-distribution is a lot like finding one using the normal. It'll have the form $$ [\overline{x}-ME, \overline{x}+ME], $$ where the margin of error $ME$ is $$ ME = t^* \frac{s}{\sqrt{n}}. $$ Here $s$ is the sample standard deviation. Note that the familiar $z^*$ multiplier has been replaced with a $t^*$ multiplier; it plays the exact same role, but it comes from the $t$-distribution rather than the normal distribution.
Here's the actual data recording the mercury content in dolphin muscle of 19 Risso’s dolphins from the Taiji area in Japan:
d = [
2.57,4.43,2.09,7.68,4.77,2.12,5.13,5.71,5.33,3.31,
7.49,4.91,2.58,1.08,6.60,3.91,3.97,6.18,5.90
]
Let's try to use this to compute a 95% confidence interval for the average mercury content in Risso's dolphins.
Here's how we would use Python to compute the mean and standard deviation for our data:
import numpy as np
m = np.mean(d)
s = np.std(d, ddof=1)
[m,s]
[4.513684210526317, 1.8806388114630852]
The crazy looking ddof parameter forces np.std to compute the sample standard deviation, rather than the population standard deviation - i.e. it uses an $n-1$ in the denominator, rather than an $n$.
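To see the effect on our data, here's a quick comparison (just a sanity check) of the two versions; the ddof=1 version is always slightly larger:
# population (divide by n) versus sample (divide by n-1) standard deviation
[np.std(d), np.std(d, ddof=1)]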
Alternatively, you can get the mean and standard deviation with our basic calculator for numeric data.
Next, the multiplier $t^*$ can be computed using t.ppf from the scipy.stats module:
from scipy.stats import t
tt = t.ppf(0.975,df=18)
tt
2.10092204024096
Note that $t^* > 2$, while $2$ (more precisely, $1.96$) is roughly the multiplier for the normal distribution at the 95% level. This makes some sense because the $t$-distribution is more spread out than the normal.
Alternatively, you can try our new t-distribution calculator.
Finally, the confidence interval is:
[m-tt*s/np.sqrt(19), m+tt*s/np.sqrt(19)]
[3.6072453185234767, 5.420123102529157]
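As a cross-check, scipy can also produce this interval directly via t.interval; this is just an alternative to the hand computation above and should give the same interval:
# confidence level, degrees of freedom, center at the sample mean, scale by the standard error
t.interval(0.95, df=18, loc=m, scale=s/np.sqrt(19))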
We can also use the $t$-distribution to do hypothesis tests with small sample size. When running a hypothesis test for the mean of numerical data, we again have a hypothesis test that looks like so:
\begin{align} H_0 : \mu&=\mu_0 \\ H_A : \mu&\neq \mu_0 \: (\text{or } > \text{ or } <). \end{align} As before, $\mu$ denotes the actual population mean and $\mu_0$ is an assumed, specific value. The question is whether recently collected data supports the alternative hypothesis (two-sided here, though it could be one-sided).
As before, we compute something that looks like a $Z$-score, though more generally, it's typically called a test statistic. Assuming the data has sample size $n$, mean $\bar{x}$, and standard deviation $s$, then the test-statistic is $$\frac{\bar{x}-\mu_0}{s/\sqrt{n}}.$$ We then evaluate the test statistic using the appropriate $t$-distribution to determine whether to reject the null hypothesis or not.
Returning to the dolphin example, let's suppose that the desired average level of mercury is 3. Let's run a hypothesis test to see if the data supports the alternative hypothesis that the actual average is larger than $3$ to a 99% level of confidence.
Our hypothesis test should look like
\begin{align*} H_0 : \mu&=\mu_0 \\ H_A : \mu&> \mu_0. \end{align*}Recall that the computed mean and standard deviation are
$$\bar{x} = 4.51368 \text{ and } s = 1.8806388.$$Also, the sample size is 19 so our test statistic is
from numpy import sqrt
mu0 = 3         # hypothesized mean under the null
xbar = 4.51368  # sample mean
n = 19          # sample size
s = 1.8806388   # sample standard deviation
se = s/sqrt(n)  # standard error
T = (xbar - mu0)/se
T
3.50837074767289
Here's the p-value:
1-t.cdf(T,n-1)
0.0012548211679794807
Since the p-value is less than $0.01$, we reject the null!
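As a cross-check (assuming scipy 1.6 or later, which added the alternative keyword), scipy.stats.ttest_1samp can run the same one-sided test directly on the raw data; its statistic and p-value should agree with the ones above, up to the rounding of $\bar{x}$:
from scipy.stats import ttest_1samp
# one-sided test of the null mean 3 against the alternative that the mean is greater
res = ttest_1samp(d, popmean=3, alternative='greater')
[res.statistic, res.pvalue]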
Our new t-distribution calculator can do this, too!
We could also note that our test statistic lies within the so-called rejection region, which is everything to the right of the critical point of about $2.55$, as shown in the following plot of the $t$-distribution with 18 degrees of freedom:
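(The plot itself isn't reproduced here, but here's a minimal matplotlib sketch, assuming matplotlib is installed, that draws the same picture: the density, the shaded rejection region, and a dashed line at our test statistic.)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

xs = np.linspace(-4, 4, 400)
crit = t.ppf(0.99, 18)                  # the critical point, about 2.55
plt.plot(xs, t.pdf(xs, 18))             # density of the t-distribution with 18 degrees of freedom
right = np.linspace(crit, 4, 200)
plt.fill_between(right, t.pdf(right, 18), alpha=0.3)  # shade the rejection region
plt.axvline(3.50837, linestyle='--')    # our test statistic lands inside it
plt.show()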
The rejection region starts at a critical point that can be computed with Python code like so:
t.ppf(0.99,18)
2.552379630179453
That's where the $2.55$ comes from!
But again, you can try our new t-distribution calculator.
We can apply the $t$-distribution to either a sample mean for numerical data or a sample proportion for categorical data - just as we would use the normal distribution when the sample size is large enough.
Conventional wisdom states that about 5% of the general population has blue eyes. Scanning our class data, I find that 2 out of the 13 women in the class, or 15.38%, have blue eyes. Should this evidence dissuade us from believing the generally accepted value at the 90% level of confidence?
More precisely, if $p$ represents the proportion of folks with blue eyes, we are considering the hypothesis test:
\begin{align*} H_0 &: p=0.05 \\ H_A &: p\neq0.05. \end{align*}We compute the test-statistic exactly as we would a Z-score.
from numpy import sqrt
p = 0.05              # hypothesized proportion under the null
n = 13                # sample size
phat = 2/n            # observed sample proportion
se = sqrt(p*(1-p)/n)  # standard error under the null
T = (phat - p)/se
T
1.7179688600346001
Here's the p-value:
from scipy.stats import t
2*t.cdf(-abs(T), 12)
0.11147190137424734
Note that we take twice the tail area, since this is a two-sided hypothesis test. The resulting p-value is larger than 0.1, so we fail to reject the null.
Once again, we could have done this computation with our new t-distribution calculator.