Here are the solutions to the practice problems for next week's final exam.
Suppose we randomly select 105 competitors in the 2012 Boston Marathon and find an average time of 253.77 minutes with a standard deviation of 45.72 minutes. We wish to write down a $98\%$ confidence interval for the mean time based on this data.
As we know, the confidence interval should have the form $$\bar{x} \pm z^* SE,$$ where $\bar{x}=253.77$ is the computed mean and the standard error is the computed standard deviation divided by the square root of the sample size: $$SE = s/\sqrt{n} = 45.72/\sqrt{105} \approx 4.46.$$ That last part is the solution to part (a).
To find $z^*$ for a $98\%$ confidence level, look for the tail area $(1-0.98)/2 = 0.01$ in the normal table. It looks like we get a value between $2.32$ and $2.33$. (Either of those is fine, though I prefer to round up.)
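As a quick numerical check of the table lookup, here is a minimal Python sketch that uses only the standard library. The variable names are my own, and `NormalDist().inv_cdf` stands in for the normal table; it returns the unrounded $z^*$, so the endpoints may differ slightly from a hand computation with $2.33$.

```python
from math import sqrt
from statistics import NormalDist

n = 105          # sample size
x_bar = 253.77   # sample mean, in minutes
s = 45.72        # sample standard deviation, in minutes
conf = 0.98      # confidence level

# Standard error of the mean
se = s / sqrt(n)

# z* leaves (1 - conf)/2 = 0.01 in each tail of the standard normal
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)

lower = x_bar - z_star * se
upper = x_bar + z_star * se

print(f"SE = {se:.2f}, z* = {z_star:.3f}")
print(f"98% CI: ({lower:.2f}, {upper:.2f})")
```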
For the final answer, we get
$$\bar{x} \pm z^* SE = 253.77 \pm 2.33 \times 4.46.$$

Suppose we randomly select 4 runners from the 2012 Boston Marathon and find their times in minutes to be
273.8 | 203.5 | 259.4 | 246.1 |
The mean is $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{273.8 + 203.5 + 259.4 +246.1}{4} = 245.7.$$
The standard deviation is $$\begin{aligned} s &= \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}} \\ &= \sqrt{\frac{(273.8 - 245.7)^2 + (203.5 - 245.7)^2 + (259.4 - 245.7)^2 + (246.1 - 245.7)^2}{3}} \\ &\approx 30.3222. \end{aligned}$$
The standard error is $$SE = \frac{s}{\sqrt{n}} = \frac{30.3222}{\sqrt{4}} \approx 15.16.$$
To write down the confidence interval, we'll need a $t^*$ multiplier, which we look up in our $t$-table. For a two-tailed significance level of $0.05$ and $n-1=3$ degrees of freedom, we find that we'll need $t^*=3.18$.
Finally, our confidence interval is $$\bar{x} \pm t^* \times SE = 245.7 \pm 3.18 \times 15.16.$$
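The small-sample computation can be reproduced from the four raw times. This is only a sketch using Python's standard library: the variable names are mine, and since the standard library has no $t$-distribution, the multiplier $t^*=3.18$ is simply copied from the table lookup above rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

times = [273.8, 203.5, 259.4, 246.1]  # the four sampled times, in minutes
t_star = 3.18  # assumption: copied from the t-table lookup (3 degrees of freedom)

n = len(times)
x_bar = mean(times)
s = stdev(times)    # sample standard deviation, divides by n - 1
se = s / sqrt(n)    # standard error of the mean

lower = x_bar - t_star * se
upper = x_bar + t_star * se

print(f"mean = {x_bar:.1f}, s = {s:.4f}, SE = {se:.2f}")
print(f"CI: ({lower:.2f}, {upper:.2f})")
```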
A random sample of 1200 runners in the 2012 Boston Marathon found that 507 of them were women. Run a hypothesis test to check the null hypothesis that half of marathon runners are women against the alternative hypothesis that less than half of marathon runners are women. Be sure to
The hypothesis test is $$\begin{aligned} & H_0: p = 0.5 \\ & H_A: p < 0.5, \end{aligned}$$ where $p$ denotes the proportion of marathon runners who are women.
The standard error is $$SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.5\times0.5}{1200}} \approx 0.0144.$$
The test statistic is $$T = \frac{\hat{p} - p}{SE} = \frac{\frac{507}{1200} - 0.5}{0.0144} \approx -5.38.$$
To conclude the test, we'll clearly reject the null hypothesis with such an extreme test statistic.
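To see just how extreme that test statistic is, here is a hedged sketch of the same one-proportion test in Python, with the one-sided $p$-value taken from the standard normal distribution. The variable names are my own. Because the code keeps the unrounded standard error, the statistic can differ from the hand computation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

n = 1200      # sample size
women = 507   # number of women in the sample
p0 = 0.5      # proportion assumed under the null hypothesis

p_hat = women / n
se = sqrt(p0 * (1 - p0) / n)   # standard error computed under H_0
z = (p_hat - p0) / se          # test statistic

# One-sided p-value for H_A: p < 0.5, i.e. P(Z <= z)
p_value = NormalDist().cdf(z)

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, z = {z:.2f}")
print(f"one-sided p-value = {p_value:.2e}")
```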
In the 2012 Boston Marathon, there were
Use this information to test the null hypothesis that $\mu_{40}=\mu_{50}$ against the alternative hypothesis that $\mu_{40} < \mu_{50}$, where $\mu_{40}$ denotes the average time of runners in their forties and $\mu_{50}$ denotes the average time of runners in their fifties. Be sure to
The hypothesis test is $$\begin{aligned} & H_0: \mu_{40} - \mu_{50} = 0 \\ & H_A: \mu_{40} - \mu_{50} < 0 \end{aligned}$$ where $\mu_{40}$ denotes the average time of marathon runners in their 40s and $\mu_{50}$ denotes the average time of marathon runners in their 50s.
The standard error is $$SE = \sqrt{\frac{s_{40}^2}{n_{40}} + \frac{s_{50}^2}{n_{50}}} = \sqrt{\frac{43.7^2}{7217} + \frac{44.7^2}{4156}} \approx 0.863.$$
The test statistic is $$\frac{\bar{x}_{40}-\bar{x}_{50}}{SE} = \frac{255.2-270.8}{0.863} \approx -18.0765,$$ where $\bar{x}_{40}$ and $\bar{x}_{50}$ are the observed sample means. We, again, immediately reject the null hypothesis.
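The two-sample arithmetic is easy to check by machine. The sketch below plugs in the summary statistics that appear in the formulas above; the variable names are my own choice. It keeps the standard error unrounded, so the statistic may differ slightly from the hand computation.

```python
from math import sqrt

# Summary statistics as they appear in the computation above:
# (group size, mean time in minutes, standard deviation in minutes)
n40, mean40, s40 = 7217, 255.2, 43.7   # runners in their 40s
n50, mean50, s50 = 4156, 270.8, 44.7   # runners in their 50s

# Standard error of the difference of two independent sample means
se = sqrt(s40**2 / n40 + s50**2 / n50)

# Test statistic for H_0: mu_40 - mu_50 = 0
z = (mean40 - mean50) / se

print(f"SE = {se:.3f}, test statistic = {z:.2f}")
```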
In the 2012 Boston Marathon, there were 59 runners under the age of 40 who had also run the Boston Marathon in 2002 when they were under the age of 30. I computed the pairwise difference of those runners' times in 2012 minus their times in 2002 and found a mean of 26.1 minutes with a standard deviation of 32.7 minutes. Let's use this data to run a hypothesis test to see if runners slow down over this age range. Specifically, let $\mu_1$ denote their mean time in 2002 and let $\mu_2$ denote their mean time in 2012. Test the null hypothesis that $\mu_1 = \mu_2$ vs the alternative hypothesis that $\mu_1<\mu_2$ at the $1\%$ significance level (that is, at the $99\%$ confidence level). Be sure to
The hypothesis test is $$\begin{aligned} & H_0: \mu_{1} - \mu_{2} = 0 \\ & H_A: \mu_{1} - \mu_{2} < 0. \\ \end{aligned}$$
The standard error is $$SE = s/\sqrt{n} = 32.7/\sqrt{59} = 4.257.$$
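The test statistic is $$\frac{\bar{x}-0}{SE} = \frac{26.1}{4.257} \approx 6.13,$$ where $\bar{x}=26.1$ is the mean of the paired differences (2012 time minus 2002 time). Since $\bar{x}$ estimates $\mu_2-\mu_1$, a large positive value supports the alternative $\mu_1<\mu_2$. Once again, we immediately reject the null.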
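For completeness, the paired computation can be sketched in a few lines of Python. The names `d_bar` and `s_d` are my own labels for the mean and standard deviation of the differences. The sketch stops at the test statistic, since the standard library does not provide the $t$-distribution needed for a $p$-value.

```python
from math import sqrt

n = 59         # number of runners with times in both years
d_bar = 26.1   # mean of (2012 time - 2002 time), in minutes
s_d = 32.7     # standard deviation of those differences, in minutes

se = s_d / sqrt(n)          # standard error of the mean difference
t_stat = (d_bar - 0) / se   # test statistic under H_0: no change

print(f"SE = {se:.3f}, test statistic = {t_stat:.2f}")
```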
The picture below shows a scatter plot for a random sample of 1200 runners in the Boston Marathon. The $x$-coordinate of each point corresponds to the runner's age and the $y$-coordinate corresponds to the runner's time in minutes. The regression line for the data is also shown and has formula $y = 0.719x + 227.6.$
LinregressResult(slope=0.71868458889427973, intercept=227.5951853339659, rvalue=0.258752071669993, pvalue=8.263097704087485e-20, stderr=0.13124270715070266)
Simply plug the number $x=56$ into the regression formula to get a prediction of $$0.719\times56 + 227.6 = 267.864.$$
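The same prediction can be written as a tiny Python function. This is a sketch only: `predict_time` is a hypothetical helper name, and it uses the rounded slope and intercept from the regression formula rather than the full-precision values.

```python
def predict_time(age, slope=0.719, intercept=227.6):
    """Predicted marathon time in minutes from the rounded regression line."""
    return slope * age + intercept


print(round(predict_time(56), 3))
```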
The positive slope of the regression line indicates a positive relationship between the variables, which rules out the negative numbers. Given how scattered about the points are, I'd have to go with $0.2$ over $0.9$.
The outrageously small $p$-value of $8.26\times10^{-20}$ (reported as pvalue=8.263097704087485e-20 in the output above) indicates that we certainly reject the null hypothesis of no relationship, in favor of the alternative hypothesis that there is a relationship between the variables.