Last time, we learned about confidence intervals for means.
Today we'll do something very similar with proportions.
Recap on Means
We collect data on \(n\) individuals and compute the sample mean \(\bar{x}\) of a numeric variable from that data set. The corresponding confidence interval has the form
\[[\bar{x} - ME, \bar{x} + ME],\]
where \(ME\) stands for the margin of error.
Margin of error
Margin of error has the form
\[ME = z^* \times SE,\]
where \(z^*\), the z-star multiplier, is chosen from the standard normal table to yield the desired degree of confidence and
\[SE = \sigma/\sqrt{n}\]
denotes the standard error, where \(\sigma\) is the standard deviation of the underlying population and \(n\) is the sample size.
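As a rough sketch, the recipe above can be written in Python using only the standard library. The helper names `z_star` and `mean_confidence_interval` are hypothetical, chosen for illustration; \(z^*\) is computed with `statistics.NormalDist().inv_cdf` instead of being read from a printed table, and the example numbers are made up.

```python
from math import sqrt
from statistics import NormalDist


def z_star(confidence):
    """Return the z* multiplier for a two-sided interval at the given confidence level."""
    # Leave (1 - confidence) / 2 of the area in each tail of the standard normal curve.
    return NormalDist().inv_cdf((1 + confidence) / 2)


def mean_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Hypothetical helper: return [xbar - ME, xbar + ME] with ME = z* * sigma / sqrt(n)."""
    se = sigma / sqrt(n)            # standard error
    me = z_star(confidence) * se    # margin of error
    return [xbar - me, xbar + me]


# Made-up numbers: sample mean 67, standard deviation 4, sample size 100.
print(round(z_star(0.95), 2))  # 1.96
print(mean_confidence_interval(67, 4, 100))
```

Changing `confidence` to `0.99` or `0.90` changes only the multiplier; the standard error stays the same.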
A computer-based example
Suppose we'd like to use a small sample to estimate the average height of the 20000 people in our CDC data set. We could draw a sample (perhaps of size 100), compute the mean, standard deviation, and standard error of the sample, and use all that to compute our confidence interval. The code to do so might look like this:
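```python
import pandas as pd

# Load the CDC data set (20000 people)
cdc_data = pd.read_csv('https://marksmath.org/data/cdc.csv')

# Population mean and standard deviation of height (known here, since we have everyone)
m = cdc_data.height.mean()
s = cdc_data.height.std()

# Draw a random sample of 100 people and compute its statistics
sample = cdc_data.sample(100)
sm = sample.height.mean()
ss = sample.height.std()
se = ss / 10  # standard error: ss / sqrt(100)

# A multiplier of 2 (close to z* = 1.96) gives roughly 95% confidence
print({
    "population_mean": m,
    "sample_mean": sm,
    "standard_error": se,
    "margin_of_error": 2 * se,
    "confidence_interval": [sm - 2 * se, sm + 2 * se],
    "in_there": sm - 2 * se < m < sm + 2 * se,
})
```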
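The `in_there` entry only tells us whether a single interval caught the population mean. A sketch of what "95% confidence" means is to repeat the experiment many times and count how often the interval succeeds. Because the CDC file may not be at hand, this version assumes a hypothetical synthetic population of 20000 normally distributed heights (mean 67, standard deviation 4, made-up values); the helper `covers` is likewise an illustrative name.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical stand-in for the CDC heights: 20000 made-up normal values.
population = [random.gauss(67, 4) for _ in range(20000)]
m = statistics.mean(population)


def covers(sample_size=100):
    """Draw one sample and report whether the interval sm +/- 2*se contains m."""
    sample = random.sample(population, sample_size)
    sm = statistics.mean(sample)
    se = statistics.stdev(sample) / sqrt(sample_size)  # sample SD, like pandas .std()
    return sm - 2 * se < m < sm + 2 * se


trials = 1000
coverage = sum(covers() for _ in range(trials)) / trials
print(coverage)  # should land near 0.95; the exact value depends on the seed
```

Each individual interval either contains \(m\) or it doesn't; the confidence level describes the long-run fraction of intervals that do.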
Suppose we take a random sample of 100 North Carolinians and check whether each of them is left-handed or right-handed. If 13 of them are left-handed, we would say that the proportion of them who are left-handed is \(13\%\). That \(13\%\) is a sample proportion \(\hat{p}\) that estimates the population proportion \(p\).
Note that a proportion is a numerical quantity, even though the data is categorical. Thus, we can compute confidence intervals in a very similar way. Just as with sample means, the sampling process leads to a random variable and, if certain assumptions are met, then we can expect that random variable to be normally distributed.
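A quick simulation can make the claim about the random variable concrete. This sketch assumes a hypothetical population proportion \(p = 0.13\), draws many samples of size 100, records each sample proportion \(\hat{p}\), and checks that the values center on \(p\) with about 95% of them within two standard deviations of their mean, as a normal distribution would predict.

```python
import random
import statistics

random.seed(2)  # fixed seed for reproducibility

p = 0.13     # hypothetical population proportion of left-handers
n = 100      # sample size
reps = 5000  # number of simulated samples

# Each sample proportion is the fraction of n trials that come up "success".
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

center = statistics.mean(p_hats)
spread = statistics.stdev(p_hats)
# For a roughly normal distribution, about 95% of values lie within 2 SDs of the mean.
within_two_sd = sum(abs(x - center) < 2 * spread for x in p_hats) / reps
print(center, spread, within_two_sd)
```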
Standard deviation for a proportion
One notable computational difference between finding confidence intervals for proportions and finding them for means is how we get the underlying standard deviation. For numerical data, we simply estimate the population standard deviation with the standard deviation of the sample.
For a sample proportion, if we identify success (being left-handed, for example) with a \(1\) and failure with a \(0\), then the resulting standard deviation is
\[\sigma = \sqrt{p(1-p)}.\]
This is simply the standard deviation associated with one Bernoulli trial.
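To check the formula numerically, we can encode the 100 people from the left-handed example as 13 ones and 87 zeros and compute the standard deviation of that list directly. The sketch uses `statistics.pstdev` (which divides by \(n\)) because \(\sqrt{p(1-p)}\) is a population standard deviation; here the 100 values are treated as the whole population for the purpose of the check.

```python
import statistics
from math import sqrt

# 13 left-handed people (success = 1) and 87 right-handed people (failure = 0).
data = [1] * 13 + [0] * 87
p = sum(data) / len(data)  # proportion of successes

# Standard deviation of the 0/1 values computed directly ...
direct = statistics.pstdev(data)
# ... and from the Bernoulli formula sqrt(p(1 - p)).
formula = sqrt(p * (1 - p))
print(direct, formula)
```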
It follows that the standard deviation of the sample proportion computed from \(n\) trials, which is its standard error, is
\[SE = \sqrt{\frac{p(1-p)}{n}}.\]
In the NC left/right-handed example, we don't know \(p\), so we use \(\hat{p} = 0.13\) in its place: \[SE = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{0.13\times0.87}{100}} \approx 0.0336303.\]
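Putting the pieces together, here is a minimal sketch of a normal-approximation confidence interval for a proportion, assuming, as in the computation above, that \(\hat{p}\) stands in for the unknown \(p\) inside the standard error. The function name `proportion_confidence_interval` is hypothetical, and \(z^*\) comes from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Hypothetical helper: normal-approximation interval for a proportion.

    The unknown p in the standard error is replaced by the sample proportion p_hat.
    """
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)              # standard error of p_hat
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # z* multiplier
    me = z * se                                     # margin of error
    return p_hat, se, (p_hat - me, p_hat + me)


# Left-handed example: 13 of 100 sampled North Carolinians.
p_hat, se, (lo, hi) = proportion_confidence_interval(13, 100)
print(round(se, 7))  # 0.0336303
print(lo, hi)
```

The structure is the same as for means: an estimate plus or minus \(z^* \times SE\). Only the standard error changes.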
Example
Suppose we draw a random sample of 132 people and find that 16 of them have blue eyes. Use this data to write down a \(95\%\) confidence interval for the proportion of people with blue eyes.
Solution: We have \(\hat{p}=16/132 \approx 0.1212\) and