Small samples

Thu, Mar 28, 2024

An example problem

Suppose we have the following data on the level of mercury in dolphin muscle arising from 19 Risso’s dolphins from the Taiji area in Japan:

2.57 4.43 2.09 7.68 4.77 2.12 5.13 5.71 5.33 3.31
7.49 4.91 2.58 1.08 6.60 3.91 3.97 6.18 5.90

The units are micrograms of mercury per gram of dolphin muscle. We now ask - what kinds of conclusions could we draw from this data? Could we

  • Compute a confidence interval in which we genuinely have confidence?
  • Or, run a hypothesis test to see if the average concentration is larger than some acceptable amount?

Sample size

The data we have is one of many possible samples and, as such, its mean may be considered to be a random variable.

The basic idea in statistics is to first determine the distribution of these types of random variables and then to model the data based on that distribution.

To this point, we’ve used the normal distribution to model our data and with good reason - the central limit theorem tells us that this process should work well for means provided that the sample size is sufficiently large.

A bare minimum for “sufficiently large” is 30. Even that assumes, though, that the underlying data is not too heavily skewed; for strongly skewed data, 100 or more is better.
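To get a feel for why sample size matters, here's a quick simulation sketch; the exponential population and the particular sample sizes are illustrative choices, not anything drawn from the dolphin data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately skewed population: exponential with mean 1.
# For each sample size n, draw 10,000 samples and record their means.
for n in [5, 30, 100]:
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    # The central limit theorem predicts the sample means cluster
    # around 1 with spread close to 1/sqrt(n), and that the pile of
    # means looks more and more normal as n grows.
    print(n, means.mean().round(3), means.std().round(3),
          round(1 / np.sqrt(n), 3))
```

The spread of the simulated means tracks \(1/\sqrt{n}\) closely, but the *shape* of their distribution inherits the skew of the population when \(n\) is small.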

The \(t\)-Distribution

We now discuss a new distribution (called the \(t\)-distribution) that’s specifically designed to account for the extra variability inherent in smaller samples.

General comments

  • As with the normal distribution, there’s not just one \(t\)-distribution but, rather, a family of distributions. In fact, there’s one for each sample size.
  • Each \(t\)-distribution looks qualitatively like the normal distribution. The larger the sample size, the closer the corresponding \(t\)-distribution is to normal.
  • The smaller the sample size, the heavier are the tails of the corresponding \(t\)-distribution. That’s exactly what accounts for the extra variability and, for example, leads to larger confidence intervals.
  • Like all continuous distributions, we compute probabilities with the \(t\)-distribution by computing the area under a curve. We do so using either a computer or a table.
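For instance, here's a sketch of computing such an area with scipy.stats; the cutoff of 2 and the degrees of freedom are just illustrative:

```python
from scipy.stats import t, norm

# P(T < 2) for t-distributions with increasing degrees of freedom
p5 = t.cdf(2, df=5)
p100 = t.cdf(2, df=100)

# Both sit below the standard normal value P(Z < 2),
# with the larger-sample curve closer to it
print(round(p5, 4), round(p100, 4), round(norm.cdf(2), 4))
```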

A picture

The \(t\)-distribution is closely related to the standard normal distribution but has heavier tails to account for the extra variability inherent in small sample sizes. As the degrees of freedom increases, the corresponding \(t\)-distribution gets closer and closer to the standard normal.
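A quick numerical check of that convergence, comparing the heights of the density curves at the peak and out in the tail (the particular degrees of freedom are arbitrary):

```python
from scipy.stats import t, norm

# Height of each t-density at 0 (the peak) and at 3 (the tail)
for df in [2, 10, 50, 200]:
    print(df, round(t.pdf(0, df), 4), round(t.pdf(3, df), 4))

# The standard normal for comparison: taller peak, thinner tail
print("normal", round(norm.pdf(0), 4), round(norm.pdf(3), 4))
```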

A formula

There’s even a formula, in which \(\nu\) denotes the degrees of freedom:

\[f(t) = \frac{\left(\frac{\nu-1}{2}\right)!} {\sqrt{\nu\pi}\,\left(\frac{\nu-2}{2}\right)!} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}}\]

You can check this on Desmos.
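We can also check the formula against SciPy directly, writing the half-integer factorials via the gamma function (recall \(x! = \Gamma(x+1)\)):

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import t

def t_pdf(x, nu):
    # The density above, with ((nu-1)/2)! = gamma((nu+1)/2)
    # and ((nu-2)/2)! = gamma(nu/2)
    return (gamma((nu + 1) / 2)
            / (np.sqrt(nu * np.pi) * gamma(nu / 2))
            * (1 + x**2 / nu) ** (-(nu + 1) / 2))

xs = np.linspace(-4, 4, 9)
print(np.allclose(t_pdf(xs, 18), t.pdf(xs, df=18)))  # True
```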

Some more detailed comments

  • The mean of the \(t\)-distribution is zero and, for \(\nu>2\), its variance is related to the degrees of freedom \(\nu\) by \[\sigma^2 = \frac{\nu}{\nu-2}.\]
  • Unlike the normal distribution, there’s no way to rescale an arbitrary \(t\)-distribution into a single standard one; each number of degrees of freedom yields a genuinely different curve. As a result, it’s less common to use tables and more common to use software than it is with the normal.
  • Given a particular number of degrees of freedom, however, there is a standard way to derive a \(t\)-score that’s analogous to the \(z\)-score for the normal distribution. This \(t\)-score (which is also called a test statistic) is a crucial thing that you need to know when using tables for the \(t\)-distribution.
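A quick sketch checking that variance claim against SciPy (the degrees of freedom below are arbitrary):

```python
from scipy.stats import t

# The variance nu/(nu-2) is finite only for nu > 2;
# compare SciPy's built-in variance to the formula
for nu in [3, 10, 30]:
    print(nu, round(t.var(df=nu), 4), round(nu / (nu - 2), 4))
```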

Confidence intervals

Finding a confidence interval using a \(t\)-distribution is a lot like finding one using the normal. It’ll have the form \[ [\overline{x}-ME, \overline{x}+ME], \] where the margin of error \(ME\) is \[ ME = t^* \frac{s}{\sqrt{n}}. \] Here \(s\) is the sample standard deviation, which we use because a small sample rarely comes with a known population \(\sigma\). Note that the familiar \(z^*\) multiplier has been replaced with a \(t^*\)-multiplier; it plays the exact same role but comes from the \(t\)-distribution, rather than the normal distribution.

Data

Recall that we have the following actual data recording the mercury content in dolphin muscle of 19 dolphins:

data = [
    2.57,4.43,2.09,7.68,4.77,2.12,5.13,5.71,5.33,3.31,
    7.49,4.91,2.58,1.08,6.60,3.91,3.97,6.18,5.90
]

We’ve now typed out the data into a Python list. Thus, we can attempt to use it, together with NumPy and SciPy tools, to compute a 95% confidence interval for the average mercury content in Risso’s dolphins.

Computations with Python

Here’s how we would use Python to compute the mean and standard deviation for our data:

import numpy as np
m = np.mean(data)
s = np.std(data, ddof=1)
[m,s]
[4.513684210526317, 1.8806388114630852]

Note that we specify ddof=1 so that np.std computes the sample standard deviation, rather than the population standard deviation - i.e. it uses an \(n-1\) in the denominator, rather than an \(n\). That’s particularly important when dealing with small sample sizes!
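To see exactly what ddof changes, here's a small sketch using the same data:

```python
import numpy as np

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.60, 3.91, 3.97, 6.18, 5.90]

s_pop = np.std(data)           # ddof=0 (the default): divides by n
s_samp = np.std(data, ddof=1)  # divides by n-1, so slightly larger
print(s_pop, s_samp)

# The two differ by exactly a factor of sqrt(n/(n-1))
n = len(data)
print(np.isclose(s_samp, s_pop * np.sqrt(n / (n - 1))))  # True
```

For \(n = 19\) the factor is close to 1, but for very small samples the difference becomes substantial.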

The standard error

The standard error, of course, is the underlying standard deviation divided by the square root of the sample size:

n = len(data)
se = s/np.sqrt(n)
se
0.4314481330772647

The \(t^*\)-multiplier

Next, the multiplier \(t^*\) can be computed using t.ppf from the scipy.stats module:

from scipy.stats import t
tt = t.ppf(0.975, df = len(data)-1)
tt
2.10092204024096

Note that \(t^* > 1.96\); \(1.96\), of course, would be the multiplier for the normal distribution at the 95% level. This makes some sense because the \(t\)-distribution is more spread out than the normal.
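Here's that comparison made concrete; the exact normal multiplier at the 95% level is \(z^* \approx 1.96\), and the degrees of freedom other than 18 are just for illustration:

```python
from scipy.stats import t, norm

# t* multipliers for a 95% interval at several degrees of freedom,
# versus the normal multiplier z*
for df in [5, 18, 100]:
    print(df, round(t.ppf(0.975, df), 3))
print("normal", round(norm.ppf(0.975), 3))
```

The multipliers shrink toward \(z^*\) as the degrees of freedom grow.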

The confidence interval

Finally, the confidence interval is:

[m - tt*se, m + tt*se]
[3.6072453185234767, 5.420123102529157]
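As a sanity check, SciPy can produce this interval in one call; loc and scale here are the sample mean and standard error:

```python
import numpy as np
from scipy.stats import t

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.60, 3.91, 3.97, 6.18, 5.90]

n = len(data)
m = np.mean(data)
se = np.std(data, ddof=1) / np.sqrt(n)

# 95% confidence interval centered at the sample mean
lo, hi = t.interval(0.95, df=n - 1, loc=m, scale=se)
print(lo, hi)  # matches the interval computed by hand
```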

A hypothesis test

Suppose that scientists have determined that we really need to keep the mercury concentration down to 3.2 micrograms per gram of dolphin muscle. We can ask:

Does the data support the conclusion that the average dolphin has more than 3.2 micrograms of mercury per gram of muscle?

We’d like to work at a 90% level of confidence.

Statement

We should clearly state our hypothesis test. That is, if \(\mu\) represents the actual average concentration, then our hypotheses are

\[\begin{align} H_0 &: \mu=3.2 \longleftarrow \text{sometimes written } \mu\leq3.2 \\ H_A &: \mu > 3.2. \end{align}\]

Since we are working at the 90% level of confidence, we specify a significance level of \(\alpha=0.1\). Thus, a \(p\)-value less than \(\alpha\) will lead us to reject the null hypothesis.

A picture

The dots show the actual measurements; we’d like to stay below the red line.

Pictorially, it doesn’t look so good. :(

Mean and standard error

We’ve still got the mean of the data m, the assumed or desired mean of m0=3.2 and the standard error:

[m, se]
[4.513684210526317, 0.4314481330772647]

Thus, we can compute a test statistic:

m0 = 3.2
test_stat = (m-m0)/se
test_stat
3.0448253447212372

Conclusion

Finally, we compute the \(p\)-value:

1 - t.cdf(test_stat, df=18)
0.003485335996901595

Doesn’t look so good!

More specifically, since our \(p\)-value is less than \(\alpha=0.1\), we reject the null hypothesis that the average concentration doesn’t exceed 3.2 in favor of the alternative hypothesis that it does.
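The entire test can also be run in one line with scipy.stats.ttest_1samp; the alternative keyword (available in SciPy 1.6+) makes it one-sided:

```python
from scipy.stats import ttest_1samp

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.60, 3.91, 3.97, 6.18, 5.90]

# One-sample t-test of H0: mu = 3.2 against HA: mu > 3.2
result = ttest_1samp(data, popmean=3.2, alternative='greater')
print(result.statistic, result.pvalue)  # agrees with the hand computation
```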