Very commonly, we wish to compare two data sets. To do so, we’ll apply the same hypothesis test that we’ve used before but with a mean and standard deviation specially chosen to match the problem at hand. Depending upon the degrees of freedom, we might choose to compute the \(p\)-value using a \(t\)-distribution or a normal distribution.
As an example, we can compare the running speeds of two age groups as I happen to have the results from the 2015 Peachtree Road Race stored on my webspace. Suppose, for example, that we wish to explore the question: Are men between the ages of 35 and 40 generally faster than men between the ages of 45 and 50?
To answer this question, we’ll read in our entire data set and grab a random sample of size 100 from each group of interest. While we’re at it, let’s compute the mean of each group.
df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
men = subset(df, Gender == 'M')
young = subset(men, 35<=Age & Age<40)
old = subset(men, 45<=Age & Age<50)
set.seed(1) # For reproducibility
young_times = sample(young$Net.Time, 100)
old_times = sample(old$Net.Time, 100)
mu_young = mean(young_times)
mu_old = mean(old_times)
c(mu_young,mu_old)
## [1] 73.41334 78.38337
As expected, the average of the younger times is less than the average of the older times. What can we infer, though, about the general population from this small sample?
The fabulous t.test command provides the ability to compare two means directly. If we have two groups with means \(\mu_1\) and \(\mu_2\), the logical hypothesis statements are something like \[H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_A: \mu_1 \neq \mu_2.\]
In our case, if \(\mu_1\) represents the average of the younger times and \(\mu_2\) represents the average of the older times, we might be more interested in one-sided hypotheses like \[H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_A: \mu_1 < \mu_2.\]
We can run this like so:
t.test(young_times, old_times, alternative = "less")
##
## Welch Two Sample t-test
##
## data: young_times and old_times
## t = -1.7147, df = 191.39, p-value = 0.04401
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.1792582
## sample estimates:
## mean of x mean of y
## 73.41334 78.38337
The p-value of about 0.044 is less than 0.05, so, at the 95% confidence level, we reject the null hypothesis and conclude that there is a genuine difference.
The ideas behind the computations that produce the above \(t\)-test are quite similar to the hypothesis tests we’ve done before. The major new ingredients are formulae to combine the measures of the two original datasets into one. To do so, suppose that our two samples have means \(\bar{x}_1\) and \(\bar{x}_2\), standard deviations \(\sigma_1\) and \(\sigma_2\), and sizes \(n_1\) and \(n_2\).
Then, we analyze the difference of the two means using a \(t\)-test with mean \(\bar{x}_1 - \bar{x}_2\), standard error \[SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\] and degrees of freedom \(\min(n_1, n_2) - 1\).
In this set up, the expression \[\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\] is often called the test statistic.
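As a quick sanity check, we can compute this test statistic by hand on two small samples (the numbers below are hypothetical, just for illustration) and compare with the statistic that t.test reports — Welch’s test uses this very same statistic; only its degrees of freedom formula differs:

```r
# Two small hypothetical samples, just to check the formula
x1 <- c(70, 75, 72, 78, 74)
x2 <- c(80, 77, 83, 79, 81, 76)

# Test statistic computed directly from the formula above
se <- sqrt(var(x1)/length(x1) + var(x2)/length(x2))
t_by_hand <- (mean(x1) - mean(x2))/se

# Compare with the statistic that t.test reports
t_by_hand
unname(t.test(x1, x2)$statistic)
```

The two numbers agree exactly.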
We should point out that these same techniques work for large sample sizes as well; when the sample sizes are so large that the degrees of freedom exceed 30, we can simply use the normal distribution rather than the \(t\)-distribution. This works because the normal distribution is a limiting case of the \(t\)-distributions. Thus, the t.test command works regardless of sample size.
Sam thinks that there is a difference in quality of life between rural and urban living. He collects information from obituaries in newspapers from urban and rural towns in Idaho to see if there is a difference in life expectancy. A sample of 4 people from rural towns gives a life expectancy of \(\bar{x}_r = 72\) years with a standard deviation of \(\sigma_r=6.99\) years. A sample of 6 people from larger towns gives \(\bar{x}_u=81.9\) years and \(\sigma_u=5.64\) years. Does this provide evidence that people living in rural Idaho communities have a shorter life expectancy than those in more urban communities to a \(95\%\) level of confidence?
Solution: Our hypothesis test looks like \[H_0: \mu_r = \mu_u \quad \text{vs.} \quad H_A: \mu_r \neq \mu_u.\]
There are three degrees of freedom and the test statistic is
\[\frac{\bar{x}_r - \bar{x}_u}{\sqrt{\frac{\sigma_r^2}{n_r} + \frac{\sigma_u^2}{n_u}}} = \frac{72-81.9}{\sqrt{\frac{6.99^2}{4}+\frac{5.64^2}{6}}} = -2.36543.\]
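We can carry out this arithmetic in R directly from the summary statistics:

```r
# Summary statistics from the problem
xbar_r <- 72;   s_r <- 6.99; n_r <- 4
xbar_u <- 81.9; s_u <- 5.64; n_u <- 6

# The test statistic
t_stat <- (xbar_r - xbar_u)/sqrt(s_r^2/n_r + s_u^2/n_u)
t_stat
```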
We have a two-sided alternative hypothesis, thus we compute the \(p\)-value via
2*pt(-2.36543, 3)
## [1] 0.09891221
Since the \(p\)-value exceeds \(0.05\), we cannot reject the null hypothesis.
Alternatively, we could use a table. If we look at our \(t\)-table, we see something that looks like so:
one tail  | 0.100 | 0.050 | 0.025 | 0.010 | 0.005 |
two tails | 0.200 | 0.100 | 0.050 | 0.020 | 0.010 |
df = 1    |  3.08 |  6.31 | 12.71 | 31.82 | 63.66 |
df = 2    |  1.89 |  2.92 |  4.30 |  6.96 |  9.92 |
df = 3    |  1.64 |  2.35 |  3.18 |  4.54 |  5.84 |
df = 4    |  1.53 |  2.13 |  2.78 |  3.75 |  4.60 |
The entries in this table are critical \(t^*\) values. The columns indicate several common choices for confidence level and are alternately labeled either one-sided or two. The rows correspond to degrees of freedom. Thus, you can figure out where a given \(t\)-score lies relative to your critical value.
Look in the row where \(df=3\). As we move from left to right along this row, the corresponding \(p\)-values decrease. We are interested in the column where the two-sided tail probability is 0.05. The corresponding \(t^*\) value in our row and column is 3.18. Since the absolute value of our \(t\)-score, 2.37, is less than that, the \(p\)-value must be larger than \(0.05\); thus we (again) fail to reject the null hypothesis.
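Rather than consulting a table, we can ask R for this critical value directly. For a two-sided test at the 0.05 level with 3 degrees of freedom, each tail holds probability 0.025, so we want the 0.975 quantile of the \(t\)-distribution:

```r
# Two-sided 0.05 critical value with df = 3
t_star <- qt(0.975, df = 3)
t_star
```

This yields about 3.18, matching the table entry.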
The ideas behind the comparison of two sample proportions are very similar to the ideas behind the comparison of two sample means. We’ve just got to figure out the correct formulation and parameters to use in our \(t\)-test.
Let’s illustrate the ideas in the context of a problem. It’s widely believed that Trump’s support among men is stronger than his support among women. Let’s use some data to test this.
According to a recent Reuters poll, Trump’s most recent approval rating stands at 40%, but there appears to be a difference between the views of men and the views of women. Among the 1009 men surveyed, 44% approve of Trump. Among the 1266 women surveyed, only 36% approve of Trump.
Does this data support our conjecture that Trump’s support among men is higher than that among women to a 95% level of confidence?
Solution: Let’s first clearly state our hypotheses. Let’s suppose that \(p_m\) represents the proportion of men who support Trump and \(p_w\) represents the proportion of women who support Trump. Our hypothesis test can be written \[H_0: p_m - p_w = 0 \quad \text{vs.} \quad H_A: p_m - p_w > 0.\]
The point behind the reformulation to compare with zero is that it gives us just one number that we can apply a standard \(t\)-test to. Now, we have measured proportions of \(\hat{p}_m = 0.44\) and \(\hat{p}_w = 0.36\). Thus, we want to run our test with \[\hat{p} = \hat{p}_m - \hat{p}_w = 0.44 - 0.36 = 0.08.\] We want just one standard error as well, which we get by adding the variances in the two samples. That is, \[SE = \sqrt{\frac{\hat{p}_m(1-\hat{p}_m)}{n_m} + \frac{\hat{p}_w(1-\hat{p}_w)}{n_w}} = \sqrt{\frac{0.44 \times 0.56}{1009} + \frac{0.36 \times 0.64}{1266}} \approx 0.02064.\]
Of course, I computed this with R:
pm = 0.44
pw = 0.36
se = sqrt( pm*(1-pm)/1009 + pw*(1-pw)/1266 )
se
## [1] 0.02064444
We can now compute our test statistic: \[T = \frac{\hat{p}_m - \hat{p}_w}{SE}.\] via
t = 0.08/se
t
## [1] 3.875136
With this very large test statistic, we can reject the null hypothesis and conclude with confidence that there is a difference between the way that men and women view Trump.
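Since both sample sizes are large, we can attach a one-sided \(p\)-value to this test statistic using the normal distribution; a quick sketch:

```r
# Standard error and test statistic, as computed above
se <- sqrt(0.44*0.56/1009 + 0.36*0.64/1266)
t <- (0.44 - 0.36)/se

# One-sided p-value from the normal distribution
p <- pnorm(t, lower.tail = FALSE)
p
```

The \(p\)-value is tiny, well below 0.001.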
We have learned the most basic version of the Student’s \(t\)-test. There are a number of variations which involve slightly different formulae. R’s t.test command, for example, uses Welch’s \(t\)-test, which has a much more complicated formula for the degrees of freedom parameter.
Importantly, MyOpenMath uses the following formula for computing standard error for proportions:
\[SE = \sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}},\] where \[\bar{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1+n_2}.\]
Thus, \(\bar{p}\) is simply a weighted average of the two measured proportions \(\hat{p}_1\) and \(\hat{p}_2\). In the Trump approval poll example, we’d get:
\[\bar{p} = \frac{0.44 \times 1009 + 0.36 \times 1266}{1009+1266} = 0.395481\] so that \[SE = \sqrt{\frac{0.395\times0.605}{1009} + \frac{0.395\times0.605}{1266}}=0.02063.\]
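Here is that computation carried out in R:

```r
pm <- 0.44; nm <- 1009
pw <- 0.36; nw <- 1266

# Weighted average of the two measured proportions
p_bar <- (pm*nm + pw*nw)/(nm + nw)

# Pooled standard error
se_pooled <- sqrt(p_bar*(1 - p_bar)/nm + p_bar*(1 - p_bar)/nw)
c(p_bar, se_pooled)
```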
Note that our prior computation of the standard error yielded \(0.02064\), so the choice between the two formulae is rarely a big deal.