Here are the solutions to the practice problems for next week's midterm exam.

Min | 1st Q | Med | 3rd Q | Max |
---|---|---|---|---|
82 | 141 | 163 | 185 | 276 |

Your box-plot ought to look something like so:

Of course, it need not be quite so precise, but some sort of scale is essential.

To be clear, the various vertical hash-marks correspond to the following labels:

Of course, your version will be hand-drawn.
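To show how the five-number summary maps onto a scale, here is a minimal text-only sketch in Python. The function name `text_boxplot` is hypothetical, and the printed scale (80 to 280 with ticks every 40) is an assumption chosen for illustration. The whiskers run all the way to the minimum and maximum, which matches the five-number summary above; the sketch does not flag outliers.

```python
def text_boxplot(five_num, lo=80, hi=280, width=41, step=40):
    """Draw a box plot as three lines of text: plot row, axis row, tick labels.

    five_num is (min, Q1, median, Q3, max); [lo, hi] is the printed scale
    (an illustrative choice, not part of the original problem).
    """
    min_, q1, med, q3, max_ = five_num

    def col(v):
        # Map a data value onto a character column of the scale [lo, hi].
        return round((v - lo) * (width - 1) / (hi - lo))

    row = [" "] * width
    for c in range(col(min_), col(max_) + 1):
        row[c] = "-"                        # whiskers: min to max
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                        # box: Q1 to Q3
    row[col(min_)] = row[col(max_)] = "|"   # whisker ends
    row[col(q1)], row[col(q3)] = "[", "]"   # box edges
    row[col(med)] = "+"                     # median

    axis = ["-"] * width
    labels = [" "] * (width + len(str(hi)))
    for tick in range(lo, hi + 1, step):
        c = col(tick)
        axis[c] = "+"                       # tick mark on the scale
        for i, ch in enumerate(str(tick)):
            labels[c + i] = ch              # each label starts at its tick
    return "\n".join("".join(line).rstrip() for line in (row, axis, labels))


print(text_boxplot((82, 141, 163, 185, 276)))
```

The `[` and `]` mark the quartiles, `+` marks the median, and the `|` ends mark the minimum and maximum, much like the hash-marks on a hand-drawn version.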

The CDC recently released the results of its National Health Interview Survey (NHIS). Data in this report come from the combined 2010-2015 NHIS, a large health survey of a random sample of U.S. households. The main objective of the NHIS is to monitor the health of the U.S. population. The data matrix below shows the first two rows of a simplified version of some of the data.

gender | age | height | weight | frequency | duration |
---|---|---|---|---|---|
F | 40 | 5.58 | 115 | 4 | 30 |
M | 54 | 5.8 | 160 | 6 | 60 |

- What type of study is this - observational study or controlled experiment?
- Identify the variables in the table and classify them as numerical or categorical.

This is an observational study, since it literally records observations using a survey. It's clearly not a controlled experiment, since there are no control and treatment groups.

The variables are given by the column names. Their types (presented in another table) are:

variable | gender | age | height | weight | frequency | duration |
---|---|---|---|---|---|---|
type | categorical | numeric | numeric | numeric | numeric | numeric |
subtype | nominal | discrete | continuous | discrete | discrete | discrete |
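The next problem computes a mean and a sample standard deviation by hand. As a hedged sketch for checking that kind of arithmetic, Python's standard library is enough. The variable names below are illustrative only.

```python
import math
import statistics

data = [9, 2, 4, 5]
n = len(data)

# Mean: the sum of the values divided by n.
mean = sum(data) / n

# Sample standard deviation: squared deviations from the mean, divided by n - 1.
sample_sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# statistics.stdev uses the same n - 1 denominator, so the two should agree.
assert math.isclose(sample_sd, statistics.stdev(data))

print(mean, round(sample_sd, 5))  # 5.0 2.94392
```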

The mean $\mu$ of $\{9,2,4,5\}$ is

$$\frac{9+2+4+5}{4} = \frac{20}{4}=5.$$

Note that we are simply using the formula

$$\mu=\frac{x_1+x_2+\cdots+x_n}{n}.$$

The sample standard deviation of $\{9,2,4,5\}$ is

$$\sqrt{\frac{(9-5)^2+(2-5)^2+(4-5)^2+(5-5)^2}{3}} = \sqrt{\frac{16+9+1+0}{3}} = \sqrt{\frac{26}{3}} \approx 2.94392.$$

We are now using the formula

$$\sigma = \sqrt{\frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \cdots + (x_n-\mu)^2}{n-1}}.$$

- Using the normal distribution rules of thumb, what is the percentile of a score of 700?
- Referring to a normal table, what is the percentile of a score of 640?

By "the normal distribution rules of thumb", we mean the 68-95-99.7 rule as pictured here:

The scores have mean 500 and standard deviation 100, so since $700=500 + 2\times100$, a score of 700 is two standard deviations above the mean. Our rules of thumb tell us that 95% of the population lies within two standard deviations of the mean. Geometrically, that corresponds to the fact that the shaded area under the standard normal curve in the figure below is about 0.95. The unshaded area to the left of the shaded region is half of the remaining 0.05, that is, $0.05/2=0.025$, so the total area to the left of $Z=2$ under the standard normal is $0.95+0.025=0.975$. Thus, 700 should be at percentile 97.5.

The $Z$-score for 640 is $$Z = \frac{640-500}{100} = 1.4.$$ Looking up 1.4 in this normal table, we find that the area to the left of $Z=1.4$ is 0.9192, so a score of 640 should put you at percentile 91.92.

Note: You'll be provided the table in the previous link during the exam. It's simply a static version of our interactive table.
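For checking table lookups outside the exam, the area to the left of a $Z$-score can also be computed from the error function. This is a minimal sketch, assuming (as in the solutions above) that scores are normal with mean 500 and standard deviation 100; the helper names `normal_cdf` and `percentile` are hypothetical.

```python
import math

def normal_cdf(z):
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percentile(score, mean=500, sd=100):
    # Convert the score to a Z-score, then report the area to its left as a percent.
    z = (score - mean) / sd
    return 100 * normal_cdf(z)

print(round(percentile(640), 2))  # 91.92
print(round(percentile(700), 2))  # 97.72
```

The exact area to the left of $Z=2$ is about 0.9772, slightly more than the 0.975 given by the 68-95-99.7 rule, since that rule rounds the area within two standard deviations down to 95%.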

Suppose a random sample of 100 people from a population produces an average weight of 165.84 with a standard deviation of 34.44. Use this data to write down a 95% confidence interval for the weights of people in the population.

Our solution should look something like $$ [\bar{x}-ME,\bar{x}+ME] = \left[ \bar{x} - z^*\times \frac{\sigma}{\sqrt{n}}, \bar{x} + z^*\times \frac{\sigma}{\sqrt{n}} \right], $$ where

- $\bar{x}=165.84$ is the measured average,
- $z^*=2$ is the $z^*$-multiplier for a 95% level of confidence,
- $\sigma=34.44$ is the standard deviation of the sample, and
- $n=100$ is the sample size.

Taking that all into account, our margin of error is: $$ME = z^* \times SE = 2 \times \frac{34.44}{\sqrt{100}} = 6.888.$$ Thus, our interval is: $$[165.84 - 6.888, 165.84 + 6.888] = [158.952, 172.728].$$
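The same computation can be sketched in a few lines of Python. The function name `mean_confidence_interval` is hypothetical, and the default `z_star=2` mirrors the rounded 95% multiplier used above.

```python
import math

def mean_confidence_interval(xbar, sd, n, z_star=2):
    # Margin of error = z* times the standard error sd / sqrt(n).
    margin = z_star * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_confidence_interval(165.84, 34.44, 100)
print(round(low, 3), round(high, 3))  # 158.952 172.728
```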

According to FiveThirtyEight, a recent poll of 1005 adults conducted by Ipsos for Reuters found an approval rating of 56% for Joe Biden. Use this data to construct a 95% confidence interval for Biden's approval rating.

Our solution should look something like $$ [\hat{p}-ME,\hat{p}+ME] = \left[ \hat{p} - z^*\times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p} + z^*\times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right], $$ where

- $\hat{p}=0.56$ is the measured proportion,
- $z^*=2$ is the $z^*$-multiplier for a 95% level of confidence,
- $\sigma=\sqrt{\hat{p}(1-\hat{p})}$ is the standard deviation computed from the measured proportion, and
- $n=1005$ is the sample size.

We again use a $z^*$-multiplier of $z^*=2$, but we now have a different formulation of the standard error: $$ME = z^* \times SE = 2 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2\times\sqrt{\frac{0.56\times0.44}{1005}}\approx0.031316.$$ Thus, our interval is $$[0.56-0.031316,0.56+0.031316] = [0.528684, 0.591316].$$
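As with the interval for a mean, here is a minimal Python sketch of the proportion interval. The function name `proportion_confidence_interval` is hypothetical, and `z_star=2` is the rounded 95% multiplier used in these solutions.

```python
import math

def proportion_confidence_interval(p_hat, n, z_star=2):
    # Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n).
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(0.56, 1005)
print(round(low, 6), round(high, 6))  # 0.528684 0.591316
```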

I'd like to construct a poll to determine a confidence interval for Joe Biden's approval rating. If I'd like the margin of error to be $\pm2\%$, how large should my sample size be?

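We again use a 95% level of confidence, so $z^*=2$. Since $p(1-p)\le 1/4$ for every $p$ between 0 and 1 (with equality at $p=1/2$), our margin of error satisfies $$ME = z^* \sqrt{\frac{p(1-p)}{n}} \le 2 \sqrt{\frac{1/4}{n}} \stackrel{?}{\le}0.02.$$ Solving that last inequality for $n$, we find $$n\ge\frac{1}{4} \left(\frac{2}{0.02}\right)^2 = 2500.$$ This sample size works no matter what the true approval rating turns out to be.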
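Here is a hedged Python sketch of the same calculation. It returns the smallest whole-number sample size whose margin of error is at most the target, using the conservative choice $p=1/2$. The function name `required_sample_size` is hypothetical.

```python
import math

def required_sample_size(margin, z_star=2, p=0.5):
    # Solve z* * sqrt(p(1 - p) / n) <= margin for n; p = 0.5 maximizes p(1 - p).
    n = p * (1 - p) * (z_star / margin) ** 2
    # Round off floating-point noise before taking the ceiling.
    return math.ceil(round(n, 9))

print(required_sample_size(0.02))  # 2500
```

Tightening or loosening the target shows how quickly the required sample grows: a margin of $\pm3\%$ needs only 1112 people, and using the more precise multiplier $z^*=1.96$ with a $\pm2\%$ margin gives 2401.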