This presentation discusses a technique to evaluate political polls in the context of an election. The general topic, called the difference of two proportions, is discussed in our text but without any political examples. The specific techniques presented here are discussed in this document from ABC News.
Most of our examples for inference thus far have involved a single mean or proportion but we are often interested in the comparison between two or more quantities. A simple example along these lines is provided by the so-called difference of two proportions.
There are a number of ways this type of computation can arise but one way of tremendous current interest is in an election poll.
FiveThirtyEight provides information on quite a lot of polls, including some that are specific to North Carolina. One such poll (performed by Redfield & Wilton Strategies and added on October 9 of 2020) surveyed 938 likely North Carolina voters and found that 49% of them planned to vote for Biden while only 44% planned to vote for Trump. How can we assess this data?
Suppose we think in terms of 95% confidence intervals for the two proportions. First, we need a $z^*$-multiplier:
from scipy.stats import norm
zz = norm.ppf(0.975)
zz
1.959963984540054
Of course, we could have used our normal calculator page.
Here's the 95% confidence interval for the proportion of Biden voters:
import numpy as np
n = 938
p1 = 0.49
se1 = np.sqrt(p1*(1-p1)/n)
[p1-zz*se1,p1+zz*se1]
[0.45800885384859663, 0.5219911461514033]
The problem is that Trump has a confidence interval as well, which might just overlap.
Here's the 95% confidence interval for the proportion of Trump voters:
p2 = 0.44
se2 = np.sqrt(p2*(1-p2)/n)
[p2-zz*se2,p2+zz*se2]
[0.4082336714348243, 0.4717663285651757]
Unfortunately, the right-hand endpoint of Trump's interval (about 0.472) is just a little bigger than the left-hand endpoint of Biden's interval (about 0.458), so the two intervals overlap. So what do we do?
I guess the real question we're interested in is whether or not $p_1>p_2$ or, equivalently, $p_1-p_2>0$.
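The overlap can also be checked in code. The following is only a minimal sketch: the helper names `proportion_ci` and `intervals_overlap` are hypothetical (they do not come from any library), and the interval is the same normal-approximation interval computed above.

```python
import numpy as np
from scipy.stats import norm

def proportion_ci(p, n, level=0.95):
    """Normal-approximation confidence interval for a single proportion."""
    z = norm.ppf(1 - (1 - level) / 2)   # two-sided z* multiplier
    se = np.sqrt(p * (1 - p) / n)       # standard error of one proportion
    return p - z * se, p + z * se

def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

n = 938
biden = proportion_ci(0.49, n)
trump = proportion_ci(0.44, n)
overlap = intervals_overlap(biden, trump)
print(biden, trump, overlap)
```

Overlapping intervals alone do not settle the question, since the two proportions come from the same sample and are not independent of one another.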
Two proportions that are linked in this fashion are called multinomial proportions. Their difference is a random variable and we can write down a hypothesis test for this situation that looks like:
$$ \begin{align} H_0: p_1-p_2 &= 0 \\ H_A: p_1-p_2 &> 0. \end{align} $$

We need a standard error for the difference of two multinomial proportions, just as we always do when running a hypothesis test. The formula for this case is a bit different from what we're used to though:
$$SE=\sqrt{\frac{(p_1+p_2)-(p_1-p_2)^2}{n}}.$$

We can redefine everything we need and compute the standard error as follows:
import numpy as np
n = 938
p1 = 0.49
p2 = 0.44
se = np.sqrt(((p1+p2)-(p1-p2)**2)/n)
se
0.03144528534056026
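One way to see where this standard error formula comes from is sketched below. It assumes the $n$ responses are independent draws from a multinomial distribution; the hats on $\hat{p}_1$ and $\hat{p}_2$, denoting the sample proportions, are notation introduced here for illustration. In that setting the two sample proportions are negatively correlated, with $\mathrm{Cov}(\hat{p}_1,\hat{p}_2)=-p_1p_2/n$, so

$$ \begin{align} \mathrm{Var}(\hat{p}_1-\hat{p}_2) &= \mathrm{Var}(\hat{p}_1)+\mathrm{Var}(\hat{p}_2)-2\,\mathrm{Cov}(\hat{p}_1,\hat{p}_2) \\ &= \frac{p_1(1-p_1)}{n}+\frac{p_2(1-p_2)}{n}+\frac{2p_1p_2}{n} \\ &= \frac{(p_1+p_2)-(p_1-p_2)^2}{n}. \end{align} $$

Taking the square root gives the formula for $SE$. The extra $2p_1p_2/n$ term is what distinguishes this case from the difference of two independent proportions.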
Here's the test statistic (which is a general term including the concept of a Z-score). Note that we're really comparing the difference $p_1-p_2$ to zero; thus we get just $p_1-p_2$ in the numerator:
T = (p1-p2)/se
T
1.5900634851453106
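As a sanity check on the standard error that goes into this test statistic, we might simulate a large number of polls and look at how much the difference of the two sample proportions actually varies. This is only a sketch under stated assumptions: it treats the observed 49% and 44% as if they were the true proportions, lumps all other responses into a third category, and uses an arbitrary seed and number of simulated polls.

```python
import numpy as np

n, p1, p2 = 938, 0.49, 0.44

# Standard error from the formula for the difference of two multinomial proportions
se_formula = np.sqrt(((p1 + p2) - (p1 - p2)**2) / n)

# Simulate many polls of size n; the third category collects everyone else
rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
counts = rng.multinomial(n, [p1, p2, 1 - p1 - p2], size=100_000)

# Difference of the two sample proportions in each simulated poll
diffs = (counts[:, 0] - counts[:, 1]) / n
se_simulated = diffs.std()

print(se_formula, se_simulated)
```

The two printed values should agree closely, which supports using the multinomial formula rather than treating the two proportions as independent.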
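Now that we have the test statistic, we can compare it against the standard normal distribution to compute a $p$-value. Since the alternative hypothesis is one-sided, the $p$-value is the area under the normal curve to the right of $T$: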
from scipy.stats import norm
1-norm.cdf(T)
0.05591024783765053
Since this is greater than 0.05, we fail to reject the null hypothesis at the 5% significance level. In other words, this poll alone does not provide sufficient evidence that the proportion of Biden voters is larger than the proportion of Trump voters.
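The steps of this test can be collected into a single function so that they are easy to repeat for another poll. This is a minimal sketch: the function name `multinomial_diff_test` is hypothetical, and it uses `norm.sf` (the upper-tail area of the normal distribution) in place of `1-norm.cdf`, which computes the same quantity.

```python
import numpy as np
from scipy.stats import norm

def multinomial_diff_test(p1, p2, n):
    """One-sided test of H0: p1 - p2 = 0 against HA: p1 - p2 > 0
    for two proportions taken from the same poll of n respondents."""
    se = np.sqrt(((p1 + p2) - (p1 - p2)**2) / n)  # SE of the difference
    T = (p1 - p2) / se                            # test statistic
    p_value = norm.sf(T)                          # area to the right of T
    return se, T, p_value

se, T, p_value = multinomial_diff_test(0.49, 0.44, 938)
print(se, T, p_value)
```

Calling the function with the same proportions but a different value of `n` shows how strongly the conclusion depends on the size of the poll.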