Thu, Mar 28, 2024

Suppose we have the following data on the level of mercury in dolphin muscle arising from 19 Risso’s dolphins from the Taiji area in Japan:

2.57 | 4.43 | 2.09 | 7.68 | 4.77 | 2.12 | 5.13 | 5.71 | 5.33 | 3.31 |

7.49 | 4.91 | 2.58 | 1.08 | 6.6 | 3.91 | 3.97 | 6.18 | 5.9 |

The units are micrograms of mercury per gram of dolphin muscle. We now ask - what kinds of conclusions could we draw from this data? Could we

- Compute a confidence interval in which we genuinely have confidence?
- Or, run a hypothesis test to see if the average concentration is larger than some desirable amount?

The data we have is one of many possible samples and, as such, its mean may be considered to be a random variable.

The basic idea in statistics is to first determine the distribution of these types of random variables and then model the data based on that distribution.

To this point, we’ve used the normal distribution to model our data and with good reason - the central limit theorem tells us that this process should work well for means *provided that the sample size is sufficiently large*.

A bare minimum for “sufficiently large” is 30. That rule of thumb assumes, though, that the underlying data isn’t too badly skewed; for heavily skewed data, 100 or more is better.

We now discuss a new distribution (called the \(t\)-distribution) that’s specifically designed to account for the extra variability inherent in smaller samples.

- As with the normal distribution, there’s not just one \(t\)-distribution but, rather, a *family* of distributions. In fact, there’s one for each sample size.
- Each \(t\)-distribution looks qualitatively like the normal distribution. The larger the sample size, the closer the corresponding \(t\)-distribution is to normal.
- The smaller the sample size, the heavier are the tails of the corresponding \(t\)-distribution. That’s exactly what accounts for the extra variability and, for example, leads to larger confidence intervals.
- Like all continuous distributions, we compute probabilities with the \(t\)-distribution by computing the area under a curve. We do so using either a computer or a table.

The \(t\)-distribution is closely related to the standard normal distribution but has heavier tails to account for the extra variability inherent in small sample sizes. As the degrees of freedom increases, the corresponding \(t\)-distribution gets closer and closer to the standard normal.
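To see the heavier tails concretely, we can compare the probability of landing more than two units above the mean under a few \(t\)-distributions and under the standard normal. A quick check with SciPy (the particular degrees of freedom here are just illustrative):

```python
from scipy.stats import norm, t

# Tail probability P(X > 2) shrinks toward the normal value as the
# degrees of freedom grow.
for df in [2, 5, 30]:
    print(f"df = {df:2d}: P(T > 2) = {t.sf(2, df):.4f}")
print(f"normal:  P(Z > 2) = {norm.sf(2):.4f}")
```

The tail probabilities decrease toward the normal value as the degrees of freedom increase, which is exactly the "heavier tails" claim above.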

There’s even a formula:

\[f(t) = \frac{\left(\frac{\nu-1}{2}\right)!} {\sqrt{\nu\pi}\,\left(\frac{\nu-2}{2}\right)!} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}}\]

You can check this on Desmos.
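You can also check it in Python. Since the half-integer factorials in the formula can be computed via the gamma function (using \(x! = \Gamma(x+1)\)), we can implement the density directly and compare it against SciPy’s built-in version at a few points:

```python
from math import gamma, pi, sqrt
from scipy.stats import t

def t_pdf(x, nu):
    # The density above, with x! computed as gamma(x + 1).
    return (gamma((nu + 1) / 2)
            / (sqrt(nu * pi) * gamma(nu / 2))
            * (1 + x**2 / nu) ** (-(nu + 1) / 2))

# The hand-rolled density agrees with scipy.stats.t.pdf:
for x in [-1.5, 0.0, 2.0]:
    print(x, t_pdf(x, 5), t.pdf(x, 5))
```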

- The mean of the \(t\)-distribution is zero and its variance is related to the degrees of freedom \(\nu\) by \[\sigma^2 = \frac{\nu}{\nu-2} \qquad (\nu > 2).\]
- Unlike the normal distribution, there’s no easy way to translate from one \(t\)-distribution to a single standard one. As a result, it’s less common to use tables and more common to use software than it is with the normal.
- Given a particular number of degrees of freedom, however, there is a standard way to derive a \(t\)-score that’s analogous to the \(z\)-score for the normal distribution. This \(t\)-score (which is also called a *test statistic*) is a crucial thing that you need to know when using tables for the \(t\)-distribution.
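The variance formula is easy to sanity-check with SciPy. For example, with \(\nu = 5\) degrees of freedom we expect \(\sigma^2 = 5/3\):

```python
from scipy.stats import t

nu = 5
print(t(nu).var())    # variance of the frozen t-distribution
print(nu / (nu - 2))  # the formula: 5/3
```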

Finding a confidence interval using a \(t\)-distribution is a lot like finding one using the normal. It’ll have the form \[ [\overline{x}-ME, \overline{x}+ME], \] where the margin of error \(ME\) is \[ ME = t^* \frac{s}{\sqrt{n}}, \] with \(s\) the sample standard deviation. Note that the familiar \(z^*\) multiplier has been replaced with a \(t^*\)-multiplier; it plays the exact same role, but it comes from the \(t\)-distribution rather than the normal distribution.

Recall the actual data above recording the mercury content in the muscle of 19 dolphins.

We’ve now typed out the data into a Python list. Thus, we can attempt to use it, together with NumPy and SciPy tools, to compute a 95% confidence interval for the average mercury content in Risso’s dolphins.

Here’s how we would use Python to compute the mean and standard deviation for our data:
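The original code cell isn’t shown here, but a minimal version (using NumPy, with the data list transcribed from the table above) might look like:

```python
import numpy as np

# Mercury concentrations (micrograms per gram of muscle) for the 19 dolphins.
data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.6, 3.91, 3.97, 6.18, 5.9]

m = np.mean(data)
s = np.std(data, ddof=1)  # sample standard deviation: n - 1 in the denominator
print([float(m), float(s)])
```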

`[4.513684210526317, 1.8806388114630852]`

Note that we specify `ddof=1` so that `np.std` computes the sample standard deviation, rather than the population standard deviation - i.e. it uses an \(n-1\) in the denominator, rather than an \(n\). That’s particularly important when dealing with small sample sizes!

The standard error, of course, is the sample standard deviation divided by the square root of the sample size:
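In code (a self-contained sketch, repeating the data list from above):

```python
import numpy as np

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.6, 3.91, 3.97, 6.18, 5.9]

s = np.std(data, ddof=1)
se = s / np.sqrt(len(data))  # standard error of the mean
print(se)
```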

Next, the multiplier \(t^*\) can be computed using `t.ppf` from the `scipy.stats` module:
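For a 95% confidence interval with \(n=19\), we have \(n-1=18\) degrees of freedom and we want the point with 97.5% of the distribution to its left, leaving 2.5% in each tail. A sketch of the missing cell:

```python
from scipy.stats import t

t_star = t.ppf(0.975, df=18)  # 97.5th percentile of the t-distribution
print(t_star)                 # roughly 2.101
```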

Note that \(t^*>2\), while the corresponding multiplier for the normal distribution is about \(1.96 \approx 2\). This makes some sense because the \(t\)-distribution is more spread out than the normal.
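That comparison is easy to check directly:

```python
from scipy.stats import norm, t

print(t.ppf(0.975, df=18))  # the t* multiplier, a bit larger than 2
print(norm.ppf(0.975))      # the z* multiplier, about 1.96
```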

Finally, the confidence interval is:
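Putting the pieces together (a self-contained sketch of the missing cell):

```python
import numpy as np
from scipy.stats import t

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.6, 3.91, 3.97, 6.18, 5.9]

m = np.mean(data)
s = np.std(data, ddof=1)
se = s / np.sqrt(len(data))
t_star = t.ppf(0.975, df=len(data) - 1)

ci = [m - t_star * se, m + t_star * se]
print(ci)  # roughly [3.61, 5.42]
```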

Suppose that scientists have determined that we really need to keep the mercury concentration down to 3.2 micrograms per gram of dolphin muscle. We can ask:

Does the data support the conclusion that the average dolphin has more than 3.2 micrograms of mercury per gram of muscle?

We’d like to work at a 90% level of confidence.

We should clearly state our hypothesis test. That is, if \(\mu\) represents the actual average concentration, then our hypotheses are

\[\begin{align} H_0 &: \mu=3.2 \longleftarrow \text{sometimes written } \mu\leq3.2 \\ H_A &: \mu > 3.2. \end{align}\]

Since we are working at the 90% level of confidence, we specify a significance level of \(\alpha=0.1\). Thus, a \(p\)-value less than this indicates that we reject the null.

The dots show the actual measurements; we’d like to stay below the red line.

Pictorially, it doesn’t look so good. :(

We’ve still got the mean of the data `m`, the assumed or desired mean `m0=3.2`, and the standard error:
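A self-contained sketch of those quantities, repeating the data list from above:

```python
import numpy as np

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.6, 3.91, 3.97, 6.18, 5.9]

m = np.mean(data)                                # sample mean
m0 = 3.2                                         # hypothesized mean under the null
se = np.std(data, ddof=1) / np.sqrt(len(data))   # standard error
print([float(m), m0, float(se)])
```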

Thus, we can compute a test statistic:
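The test statistic measures how many standard errors the sample mean lies above the hypothesized mean (again, a self-contained sketch):

```python
import numpy as np

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.6, 3.91, 3.97, 6.18, 5.9]

m = np.mean(data)
m0 = 3.2
se = np.std(data, ddof=1) / np.sqrt(len(data))

T = (m - m0) / se  # the t-score, i.e. the test statistic
print(T)           # roughly 3.04
```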

Finally, we compute the \(p\)-value:
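Since the alternative hypothesis is one-sided (\(\mu > 3.2\)), the \(p\)-value is the probability of seeing a \(t\)-score this large or larger under the null, which is the upper tail of the \(t\)-distribution with 18 degrees of freedom:

```python
import numpy as np
from scipy.stats import t

data = [2.57, 4.43, 2.09, 7.68, 4.77, 2.12, 5.13, 5.71, 5.33, 3.31,
        7.49, 4.91, 2.58, 1.08, 6.6, 3.91, 3.97, 6.18, 5.9]

m = np.mean(data)
m0 = 3.2
se = np.std(data, ddof=1) / np.sqrt(len(data))
T = (m - m0) / se

p = t.sf(T, df=len(data) - 1)  # one-sided p-value: P(t > T) under H0
print(p)
```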

Doesn’t look so good!

More specifically, since our \(p\)-value is less than \(\alpha=0.1\), we reject the null hypothesis that the average concentration doesn’t exceed 3.2 in favor of the alternative hypothesis that it does.