Confidence intervals for proportions

Last week, we talked about confidence intervals for means coming from numerical data. Today, we'll do something quite similar for proportions that arise from categorical data.

Like last Wednesday's presentation, this one is related to sections 5.1 and 5.2 of our text, but it follows the text even more closely.

Sample proportions

Suppose we take a random sample of 100 North Carolinians and check whether they are left handed or right handed. If 13 of them are left handed, we would say that the proportion of them who are left handed is $13\%$. That $13\%$ is a sample proportion $\hat{p}$ that estimates the population proportion $p$.

Note that a proportion is a numerical quantity, even though the data is categorical. Thus, we can compute confidence intervals in a very similar way. Just as with sample means, the sampling process leads to a random variable and, if certain assumptions are met, then we can expect that random variable to be normally distributed.

Standard deviation for a proportion

One notable computational difference between finding confidence intervals for proportions and finding them for means is how we find the underlying standard deviation. For numerical data, we simply estimate the population standard deviation with the standard deviation of the sample. For a sample proportion, if we identify success (being left handed, for example) with a $1$ and failure with a $0$, then (as we know from our discussion of the binomial distribution) the resulting standard deviation is

$$\sigma = \sqrt{p(1-p)}.$$

It follows that the standard error is

$$SE = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n}}.$$
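As a rough check on these two formulas, here is a minimal Python sketch. It reuses the left-handedness numbers from above (13 successes out of 100); the function name `standard_error`, the seed, and the number of simulated samples are illustrative choices, not part of the text.

```python
import math
import random
import statistics

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Code each success as 1 and each failure as 0.
data = [1] * 13 + [0] * 87
p = statistics.mean(data)                   # proportion of 1s, here 0.13
sd_from_data = statistics.pstdev(data)      # SD computed directly from the 0/1 values
sd_from_formula = math.sqrt(p * (1 - p))    # SD from the formula sqrt(p(1-p))

# Simulate many samples of size n and measure the spread of the sample proportions.
random.seed(0)                              # assumed seed, only for reproducibility
n, trials = 100, 5000
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(trials)]
simulated_se = statistics.pstdev(p_hats)

print("SD from data:   ", sd_from_data)
print("SD from formula:", sd_from_formula)
print("simulated SE:   ", simulated_se)
print("formula SE:     ", standard_error(p, n))
```

The first two numbers should agree up to rounding, and the simulated standard error should land close to the value given by the formula.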

Example

Suppose we draw a random sample of 132 people and find that 16 of them have blue eyes. Use this data to write down a $95\%$ confidence interval for the proportion of people with blue eyes.

Solution: We have $\hat{p}=16/132 \approx 0.1212$ and

$$SE(\hat{p}) = \sqrt{(16/132)\times(116/132)/132} \approx 0.02840718.$$

Thus, our confidence interval is

$$0.1212 \pm 2\times0.0284 = [0.0644, 0.178].$$
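The same computation can be sketched in Python. The helper name `proportion_ci` and its default multiplier `z_star=2.0` are assumptions made for this illustration, matching the rough $z^*=2$ used in these notes.

```python
import math

def proportion_ci(successes, n, z_star=2.0):
    """Confidence interval for a proportion: p_hat +/- z_star * SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    margin = z_star * se                       # margin of error
    return p_hat - margin, p_hat + margin

# The blue-eyes example: 16 successes in a sample of 132.
low, high = proportion_ci(16, 132)
print(f"[{low:.4f}, {high:.4f}]")
```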

More on margin of error

If you read the details of political surveys, you're likely to come across the term "margin of error" at some point. FiveThirtyEight, for example, maintains a running Trump approval rating page. The page also points to poll details for a slew of polls. Check out the first one, namely the Gallup poll. There, we read "Daily results are based on telephone interviews with approximately 1,500 national adults; Margin of error is $\pm 3$ percentage points". What does that mean?

Definition

When we write a confidence interval as $$s \pm z^* \times SE,$$ then $z^* \times SE$ is the margin of error. Geometrically, it's the distance that the interval extends in either direction from the measured statistic $s$.

So, where does the $\pm 3$ come from?

Computation

Suppose we're writing down a confidence interval for a proportion; in this case, the proportion of respondents who approve. If the actual proportion is $p$ and our sample size is $n$, then the standard error is

$$\sqrt{\frac{p(1-p)}{n}}.$$

In our case, we take $n\approx 1500$. Furthermore, the biggest that $p(1-p)$ can be is $1/4$. You can see this from the graph of $p(1-p)$, which is a downward-opening parabola with its vertex at $p=1/2$, where its value is $1/4$.
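A quick numerical check of that claim, as a sketch: evaluate $p(1-p)$ on a grid of values between 0 and 1 and find where it is largest. The grid spacing of 0.01 is an arbitrary choice for this illustration.

```python
# Evaluate p(1-p) at p = 0.00, 0.01, ..., 1.00 and locate the maximum.
grid = [k / 100 for k in range(101)]
values = [p * (1 - p) for p in grid]

largest = max(values)
best_p = grid[values.index(largest)]

print("p(1-p) is largest at p =", best_p, "where it equals", largest)
```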

Finishing the computation

In our example, the sample size was 1500. Thus, the standard error is at most

$$SE \leq \sqrt{\frac{1/4}{1500}} \approx 0.01290994.$$

Now, for a $95\%$ confidence interval, we take $z^* = 2$ so that our margin of error is at most

$$ME \leq 2\times0.01290994 \approx 0.026,$$

which is rounded up to 3 percentage points.

This is a common thing to shoot for in political polls, which is why you often see sample sizes close to 1500.
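To see how this worst-case margin of error depends on the sample size, here is a small Python sketch. The function name `max_margin_of_error` and the list of sample sizes are made up for this illustration; the multiplier defaults to the rough $z^*=2$ used above.

```python
import math

def max_margin_of_error(n, z_star=2.0):
    """Largest possible margin of error for a proportion, using p(1-p) <= 1/4."""
    return z_star * math.sqrt(0.25 / n)

# Worst-case margin of error, in percentage points, for a few sample sizes.
for n in (500, 1000, 1500, 2500):
    print(n, round(100 * max_margin_of_error(n), 1))
```

Because the margin of error shrinks like $1/\sqrt{n}$, cutting it in half takes roughly four times as many respondents.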

Examples

A hypothetical political race

In a political contest, where there are two candidates and victory requires a simple majority, a candidate likes to be more than 3 percentage points above $50\%$. Can you see why?

An actual political race

In early November of last year, FiveThirtyEight reported that Donald Trump was only 3.3 percentage points behind Hillary Clinton - almost within the margin of error. In fact, Clinton ended up winning the popular vote by $2.1\%$.

A poll on the first amendment

A recent poll by the Brookings Institution asked 1578 college students the following question: "Is hate speech constitutionally protected?" Here are the results expressed as percentages:

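The columns are grouped as all respondents, then political affiliation (Dem, Rep, Ind), type of college (Public, Private), and gender (Female, Male).

| Response | All | Dem | Rep | Ind | Public | Private | Female | Male |
|---|---|---|---|---|---|---|---|---|
| Yes | 39 | 39 | 44 | 40 | 38 | 43 | 31 | 51 |
| No | 44 | 41 | 39 | 44 | 44 | 44 | 49 | 38 |
| Don’t know | 16 | 15 | 17 | 17 | 17 | 13 | 21 | 11 |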

Use this to write down a 95% confidence interval for the percentage of students who believe that hate speech is not constitutionally protected.

Solution for the first amendment poll

We have $\hat{p}=0.44$ yielding an estimate of the standard error of

$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.44\times0.56}{1578}}\approx0.0125.$$

Thus, our confidence interval is

$$0.44 \pm 2\times0.0125 = [0.415,0.465].$$

Choosing sample size

Recall that the margin of error generally depends on three things:

  • the confidence level,
  • the underlying standard deviation, and
  • the sample size.

Sometimes, we require a specific confidence level and margin of error. For a sample proportion, the underlying standard deviation $\sqrt{p(1-p)}$ is never larger than $1/2$. Thus, we can always obtain the desired confidence level and margin of error by choosing a large enough sample size.

The inequality

In order to choose the sample size, we simply set up the inequality

$$z^*\sqrt{\frac{p(1-p)}{n}} < ME,$$

where $z^*$ corresponds to the desired confidence level and $ME$ is the desired margin of error. Since $\sqrt{p(1-p)}\leq 1/2$, it suffices to require

$$z^*\frac{1/2}{\sqrt{n}} < ME \: \text{ or } \: n>\frac{{z^*}^2}{4ME^2}.$$
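The inequality translates directly into a short computation. The sketch below is only an illustration: the function name `required_sample_size` is hypothetical, and the inputs ($z^*=1.96$ with a margin of error of 3 percentage points) are assumed values rather than numbers from these notes.

```python
import math

def required_sample_size(margin_of_error, z_star):
    """Smallest whole number n at least as large as z*^2 / (4 ME^2).

    This uses the worst case sqrt(p(1-p)) = 1/2, so it works for any p.
    """
    bound = z_star ** 2 / (4 * margin_of_error ** 2)
    return math.ceil(bound)

# Assumed inputs for illustration: z* = 1.96 and a margin of error of 0.03.
n_needed = required_sample_size(0.03, 1.96)
print(n_needed)
```

Since the bound does not involve $p$, the resulting sample size is conservative: if $p$ is far from $1/2$, a somewhat smaller sample would already achieve the same margin of error.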

Example

Suppose we wish to determine the percentage of voters who support our candidate. We'd like a $95\%$ level of confidence with a margin of error of $\pm2$ percentage points. What sample size will guarantee this?

Simple solution: For a 95% level of confidence, we might take $z^*=2$ together with the given margin of error $ME=0.02$ to get

$$n>\frac{{z^*}^2}{4ME^2} = \frac{2^2}{4\times0.02^2} = 2500.$$

Thus, a pollster would probably be happy with $n=2500$ folks in the poll.

More precision

We can get a more precise (possibly smaller) sample size by using more precise values of $z^*$. In fact, for the homework, you need three digits of precision for your $z^*$ multiplier. For 90, 95, and 99%, these are

| Confidence level | 90% | 95% | 99% |
|---|---|---|---|
| $z^*$ | 1.645 | 1.960 | 2.576 |

In the previous problem, we would have:

$$n>\frac{{z^*}^2}{4ME^2} = \frac{1.960^2}{4\times0.02^2} = 2401.$$

Thus, the HW would like to see 2401.
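If you'd like to check multipliers like these yourself, Python's standard library can compute them from the standard normal distribution. This is a sketch under the assumption that we want two-sided intervals, so that the area left out is split evenly between the two tails; the helper name `z_star` is made up for this illustration.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Two-sided multiplier: leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

# Multipliers for the usual confidence levels, to three decimal places.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")

# Sample size for 95% confidence and a margin of error of 2 percentage points.
margin = 0.02
bound = z_star(0.95) ** 2 / (4 * margin ** 2)
n_needed = math.ceil(bound)
print("bound:", round(bound, 2), "sample size:", n_needed)
```

With the multiplier carried to full precision, the bound comes out slightly below 2401, so rounding up gives a sample size of 2401.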