Analyzing elections

This presentation discusses a technique for evaluating political polls in the context of an election. The general topic, called the difference of two proportions, is discussed in our text, but without any political examples. The specific techniques presented here are discussed in this document from ABC News.

The difference of two proportions

Most of our examples for inference thus far have involved a single mean or proportion, but we are often interested in the comparison between two or more quantities. A simple example along these lines is provided by the so-called difference of two proportions.

There are a number of ways this type of computation can arise, but one of tremendous current interest is the election poll.

Example

FiveThirtyEight provides information on quite a lot of polls, including some that are specific to North Carolina. One such poll (performed by Redfield & Wilton Strategies and added on October 9 of 2020) surveyed 938 likely North Carolina voters and found that 49% of them planned to vote for Biden while only 44% planned to vote for Trump. How can we assess this data?

An oversimplified analysis using confidence intervals

Suppose we think in terms of 90% confidence intervals for the two proportions. First, we need a $z^*$-multiplier:

from scipy.stats import norm
# A 90% interval leaves 5% in each tail, so we want the 95th percentile
zz = norm.ppf(0.95)
zz
1.6448536269514722

Of course, we could have used our normal calculator page.

Biden's confidence interval

Here's the 90% confidence interval for the proportion of Biden voters:

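import numpy as np
n = 938    # number of likely voters polled
p1 = 0.49  # sample proportion for Biden
se1 = np.sqrt(p1*(1-p1)/n)  # standard error of a single proportion
[p1-zz*se1,p1+zz*se1]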
[0.4631521838194292, 0.5168478161805707]

The problem is that Trump has a confidence interval as well, and it might overlap Biden's.

Trump's confidence interval

Here's the 90% confidence interval for the proportion of Trump voters:

p2 = 0.44  # sample proportion for Trump
se2 = np.sqrt(p2*(1-p2)/n)  # standard error of a single proportion
[p2-zz*se2,p2+zz*se2]
[0.4133408566853727, 0.4666591433146273]

Unfortunately, the right-hand endpoint of Trump's interval (about 0.4667) is just a little bigger than the left-hand endpoint of Biden's interval (about 0.4632), so the two intervals overlap. So what do we do?
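
The comparison of endpoints can also be done in code. The following is a minimal sketch, not part of the original analysis; the helper name `proportion_ci` is made up for illustration, and it simply repeats the normal-approximation interval used above for both candidates before checking whether the intervals overlap.

```python
import numpy as np
from scipy.stats import norm

def proportion_ci(p_hat, n, confidence=0.90):
    """Normal-approximation confidence interval for a single proportion.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Two-sided interval: split the leftover probability between both tails
    z_star = norm.ppf(1 - (1 - confidence) / 2)
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

biden_lo, biden_hi = proportion_ci(0.49, 938)
trump_lo, trump_hi = proportion_ci(0.44, 938)

# The intervals overlap when Trump's upper endpoint exceeds Biden's lower endpoint
intervals_overlap = trump_hi > biden_lo
print(intervals_overlap, trump_hi - biden_lo)
```

The printed gap is small, which is why comparing the two intervals separately is too blunt a tool for this question.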

Dealing with multinomial proportions

I guess the real question we're interested in is whether or not $p_1>p_2$ or, equivalently, whether $p_1-p_2>0$.

Two proportions that are linked in this fashion, both estimated from the same sample so that a respondent who chooses one candidate cannot also choose the other, are called multinomial proportions. Their difference is a random variable and we can write down a hypothesis test for this situation that looks like:

$$ \begin{align} H_0: p_1-p_2 &= 0 \\ H_A: p_1-p_2 &> 0. \end{align} $$

Standard error for multinomial proportions

We need a standard error for the difference of two multinomial proportions, just as we always do when running a hypothesis test. The formula for this case is a bit different from what we're used to, though:

$$SE=\sqrt{\frac{(p_1+p_2)-(p_1-p_2)^2}{n}}.$$
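
One way to see where this formula comes from, sketched here under the assumption that the poll counts follow a multinomial distribution with $n$ trials (so that the covariance of the two sample proportions is $-p_1p_2/n$), is to expand the variance of the difference:

$$ \begin{align} \operatorname{Var}(\hat{p}_1-\hat{p}_2) &= \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2) - 2\operatorname{Cov}(\hat{p}_1,\hat{p}_2) \\ &= \frac{p_1(1-p_1)}{n} + \frac{p_2(1-p_2)}{n} + \frac{2p_1p_2}{n} \\ &= \frac{(p_1+p_2)-(p_1-p_2)^2}{n}. \end{align} $$

Taking the square root gives the standard error. The extra $2p_1p_2/n$ term reflects the negative correlation between the two proportions: within a single sample, every additional Biden respondent is one fewer respondent available to Trump.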

Computing the standard error

We can redefine everything we need and compute the standard error as follows:

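import numpy as np
n = 938    # number of likely voters polled
p1 = 0.49  # sample proportion for Biden
p2 = 0.44  # sample proportion for Trump
# Standard error for the difference of two multinomial proportions
se = np.sqrt(((p1+p2)-(p1-p2)**2)/n)
se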
0.03144528534056026
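
As a sanity check on this number, we can simulate many polls and look at how much the difference in sample proportions actually varies. This is only a sketch under stated assumptions: it treats the observed proportions as if they were the true ones, lumps all other responses into a third category, and uses an arbitrary seed and number of simulated polls. It also computes the "naive" standard error that ignores the link between the two proportions, for comparison.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility only
n, p1, p2 = 938, 0.49, 0.44

# Assumption: treat the observed proportions as the true ones and
# put every other response into a third "other/undecided" category.
counts = rng.multinomial(n, [p1, p2, 1 - p1 - p2], size=200_000)

# Difference in sample proportions for each simulated poll
diffs = (counts[:, 0] - counts[:, 1]) / n
simulated_se = diffs.std()

# Formula for multinomial proportions versus the formula that
# (incorrectly) treats the two proportions as independent
formula_se = np.sqrt(((p1 + p2) - (p1 - p2) ** 2) / n)
naive_se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)

print(simulated_se, formula_se, naive_se)
```

The simulated value should land close to the multinomial formula and noticeably above the naive one, since the naive formula leaves out the negative correlation between the two counts.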

The test statistic

Here's the test statistic (which is a general term including the concept of a Z-score). Note that we're really comparing the difference $p_1-p_2$ to zero; thus we get just $p_1-p_2$ in the numerator:

# Observed difference minus the hypothesized difference (zero), in standard errors
T = (p1-p2)/se
T
1.5900634851453106

The $p$-value

Now that we have the test statistic, we can compare it against the standard normal distribution to compute a $p$-value:

from scipy.stats import norm
# One-sided p-value: area under the standard normal curve to the right of T
1-norm.cdf(T)
0.05591024783765053

Since this is less than 0.1, the significance level that matches our 90% confidence level, we reject the null hypothesis in favor of the alternative hypothesis that the proportion of Biden voters is larger than the proportion of Trump voters.
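
For other polls, the whole computation can be wrapped up in one function. The sketch below is one possible packaging of the steps above rather than a standard library routine; the name `multinomial_difference_test` is invented for illustration. It uses `norm.sf`, SciPy's survival function, which equals `1 - norm.cdf`.

```python
import numpy as np
from scipy.stats import norm

def multinomial_difference_test(p1, p2, n):
    """One-sided test of H0: p1 - p2 = 0 against HA: p1 - p2 > 0.

    Hypothetical helper: p1 and p2 are sample proportions taken from the
    same poll of n respondents. Returns (standard error, test statistic,
    p-value).
    """
    se = np.sqrt(((p1 + p2) - (p1 - p2) ** 2) / n)
    t = (p1 - p2) / se
    p_value = norm.sf(t)  # upper-tail area, the same as 1 - norm.cdf(t)
    return se, t, p_value

# The Redfield & Wilton Strategies poll discussed above
se, t, p_value = multinomial_difference_test(0.49, 0.44, 938)
print(se, t, p_value)
```

Applied to this poll, the function should reproduce the standard error, test statistic, and $p$-value computed above.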