At this point, we’ve learned quite a few statistical tests. Here’s an outline with a few more details below:
During our last week, we'll meet a few more.
This is the first and simplest situation that we dealt with: we are measuring the mean of numerical data. In the most basic case, we have one data sample, just a single list of numbers.
The question is: does that data support the hypothesis that the mean of the population from which it was drawn is some particular number? If our data has sample mean \(\bar{x}\) and we suspect the population mean is \(\mu_0\), then our two-sided hypothesis can be written
\[ \begin{array}{ll} H_0: & \mu = \mu_0 \\ H_A: & \mu \neq \mu_0 \end{array} \]
A one-sided hypothesis can be written with a greater-than or less-than, rather than a not-equal.
The \(z\)-score for our mean is \[Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.\] We then compare this against the standard normal distribution or a \(t\)-distribution (depending on the sample size) to compute the \(p\)-value. There are a couple of examples in our notes on the \(t\)-test.
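As a concrete illustration, here is a minimal sketch of this test in Python using SciPy; the data values and the hypothesized mean \(\mu_0\) are made up, and the sample standard deviation stands in for \(\sigma\).

```python
# A minimal sketch of a one-sample test for a mean.
# The data values and the hypothesized mean mu0 are made up for illustration.
import numpy as np
from scipy import stats

data = np.array([9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.3])
mu0 = 10.0

xbar = data.mean()
s = data.std(ddof=1)          # sample standard deviation
n = len(data)

# t-statistic, as in the formula above (with s in place of sigma)
t_stat = (xbar - mu0) / (s / np.sqrt(n))

# two-sided p-value from the t-distribution with n-1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Equivalently, in one line:
# t_stat, p_value = stats.ttest_1samp(data, mu0)
print(t_stat, p_value)
```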
These are very much like our tests for means, but we are now dealing with proportions of categorical data. We often think of this in terms of a random variable \(X\) that is binomially distributed; the sample proportion \(\hat{p} = X/n\) then has
\[\begin{align} \mu &= p &\sigma^2 &= p(1-p)/n &\sigma &= \sqrt{p(1-p)/n} \end{align}\]Our hypothesis can be written
\[\begin{align} H_0 : p = p_0 \\ H_A : p \neq p_0 \end{align}\]Ultimately, we compute the \(p\)-value using either a normal distribution (if the sample size is large) or a \(t\)-distribution (if the sample size is small). There are a couple of examples in our intro notes on Hypothesis Testing.
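Here is a similar sketch for the one-proportion test, again with made-up counts, using the normal approximation appropriate for a large sample.

```python
# A minimal sketch of a one-proportion z-test, assuming a large sample.
# The counts (x successes out of n) and p0 are made up for illustration.
import numpy as np
from scipy import stats

x, n = 58, 100          # observed successes and sample size
p0 = 0.5                # hypothesized population proportion

p_hat = x / n
se = np.sqrt(p0 * (1 - p0) / n)   # standard error under the null hypothesis

z = (p_hat - p0) / se
p_value = 2 * stats.norm.sf(abs(z))   # two-sided p-value from the standard normal

print(z, p_value)
```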
We use these tests when we have two numerical data sets that are independent of one another and we want to compute the difference between their means.
If the sets have sizes \(n_1\) and \(n_2\), we analyze the difference of the two means using a \(t\)-test with \[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\] where \(s_1\) and \(s_2\) are the sample standard deviations, and the minimum of \(n_1-1\) and \(n_2-1\) as the degrees of freedom.
Our hypothesis test again looks like
\[ \begin{array}{ll} H_0: & \mu_1 = \mu_2 \\ H_A: & \mu_1 \neq \mu_2 \end{array} \]
There are some examples of this in our notes on Relating Data Sets.
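A sketch of this computation, with made-up samples, might look like the following; it uses the conservative minimum of \(n_1-1\) and \(n_2-1\) degrees of freedom as above, while SciPy's built-in Welch test uses a more refined degrees-of-freedom formula.

```python
# A minimal sketch of a two-sample t-test for the difference of two means.
# Both samples are made up for illustration.
import numpy as np
from scipy import stats

x1 = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3])
x2 = np.array([11.2, 11.5, 10.9, 11.8, 11.3, 11.1, 11.6])

n1, n2 = len(x1), len(x2)
se = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)

t_stat = (x1.mean() - x2.mean()) / se
df = min(n1 - 1, n2 - 1)                      # conservative degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=df)  # two-sided p-value

# SciPy's Welch test, for comparison:
# t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=False)
print(t_stat, p_value)
```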
We use these tests when we have two categorical data sets that are independent of one another and we want to compute the difference between their proportions.
This is very similar to the difference of the two means but we now use
\[\hat{p} = \hat{p}_1 - \hat{p}_2\] and \[SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.\]
We again use the minimum of \(n_1-1\) and \(n_2-1\) as the degrees of freedom.
Our hypothesis test again looks like
\[ \begin{array}{ll} H_0: & p_1 = p_2 \\ H_A: & p_1 \neq p_2 \\ \end{array} \]
There are again some examples of this in our notes on Relating Data Sets.
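The corresponding sketch for two proportions, again with made-up counts, follows the same pattern using the standard error and degrees of freedom given above.

```python
# A minimal sketch of a test for the difference of two proportions.
# The counts are made up for illustration.
import numpy as np
from scipy import stats

x1, n1 = 45, 80     # successes and sample size in group 1
x2, n2 = 52, 110    # successes and sample size in group 2

p1, p2 = x1 / n1, x2 / n2
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

t_stat = (p1 - p2) / se
df = min(n1 - 1, n2 - 1)                      # conservative degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=df)  # two-sided p-value

print(t_stat, p_value)
```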
We use this when we have two data sets that are paired in a natural way; that is, each data point in one set corresponds to a particular data point in the other set.
Such a pair of data sets can be reduced to a single data set by simply subtracting pair-wise; we then test whether the mean of the differences is zero using a one-sample \(t\)-test.
Our hypothesis test looks like
\[ \begin{array}{ll} H_0: & \mu_1 = \mu_2 \\ H_A: & \mu_1 \neq \mu_2 \\ \end{array} \]
There are again some examples of this in our notes on Relating Data Sets.
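A sketch of the paired case, with made-up before/after measurements, just subtracts pair-wise and runs a one-sample test on the differences.

```python
# A minimal sketch of a paired t-test: subtract pair-wise and test whether
# the mean difference is zero. The measurements are made up for illustration.
import numpy as np
from scipy import stats

before = np.array([140, 152, 138, 147, 160, 151, 149, 156])
after  = np.array([135, 150, 136, 140, 155, 148, 146, 150])

diffs = before - after

# One-sample t-test on the differences against a mean of 0
t_stat, p_value = stats.ttest_1samp(diffs, 0)

# Equivalently: t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)
```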
The chi-square test is a method for assessing a model when the data are binned.
In this situation, we have two lists of counts over \(k\) bins: call them the observed counts \(O_1, O_2, \ldots, O_k\) from the data and the expected counts \(E_1, E_2, \ldots, E_k\) predicted by the model.
Our hypothesis test asks whether the observed counts are consistent with the model: the null hypothesis \(H_0\) says that they are, and the alternative \(H_A\) says that they are not.
We then compute the \(\chi^2\) statistic \[\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_k - E_k)^2}{E_k}\] and use the \(\chi^2\) distribution with \(k-1\) degrees of freedom.
There are some examples in our notes on the \(\chi^2\)-test.
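Here is a sketch of the \(\chi^2\) computation, with made-up observed counts and a made-up model (a fair six-sided die).

```python
# A minimal sketch of a chi-square goodness-of-fit test.
# The observed counts and the model (a fair die) are made up for illustration.
import numpy as np
from scipy import stats

observed = np.array([18, 22, 16, 25, 19, 20])        # binned counts
expected = np.full(6, observed.sum() / 6)             # expected counts under the model

chi2 = np.sum((observed - expected) ** 2 / expected)
p_value = stats.chi2.sf(chi2, df=len(observed) - 1)   # k-1 degrees of freedom

# Equivalently: chi2, p_value = stats.chisquare(observed, expected)
print(chi2, p_value)
```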
Comments
Commonalities
All of the tests have a few things in common: each states a null hypothesis \(H_0\) and an alternative hypothesis \(H_A\), computes a test statistic from the data, and compares that statistic against a reference distribution to obtain a \(p\)-value.
Differences
Perhaps the most obvious difference centers on the type of data being considered: numerical vs categorical.
There are other differences too, though: whether we have one sample or two and, with two samples, whether they are independent of one another or naturally paired. Understanding these differences helps you know which test to apply in a given situation.