Often in statistics, we want to answer a simple yes or no question.

The idea behind hypothesis testing is to

  1. Clearly state the question we are trying to answer in terms of two competing hypotheses and
  2. Assess the likelihood of the two hypotheses in light of data that’s been collected.

The two competing statements in a hypothesis test are typically called the null hypothesis and the alternative hypothesis.

A silly coin flipping example

Suppose I’ve got a coin that I tell you is fair. We then flip the coin 10 times and it comes up heads every time. You might feel you have cause to doubt my claim.

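We can test the claim by first articulating the hypotheses: the null hypothesis is that the coin is fair, and the alternative hypothesis is that the coin is biased toward heads.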

Often, we’ll try to make the hypotheses more quantitative. For example, if we let \(p\) denote the probability of getting a head, then we could rephrase those statements as

\[\begin{align} H_0 : p = 0.5 \\ H_A : p > 0.5 \end{align}\]

We now examine the null hypothesis in light of the observed data: we got 10 heads and no tails. Assuming the coin is fair, the probability of that happening is \(1/2^{10} = 0.0009765625\), a little less than 1 in 1000.

This computation yields a \(p\)-value: the probability, under the assumption of the null hypothesis, of generating data at least as favorable to the alternative hypothesis as the data actually observed.
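As a quick check, the coin-flip \(p\)-value can be reproduced in R. This is only a sketch: the variable names are made up for illustration, and the one-sided alternative (the coin favors heads) is an assumption chosen to match the computation above.

```r
# Number of flips and number of heads observed in the example
n_flips <- 10
n_heads <- 10

# Probability of 10 heads in 10 flips of a fair coin
p_direct <- 0.5^n_flips

# The same value as an upper-tail binomial probability P(X >= 10)
p_tail <- sum(dbinom(n_heads:n_flips, size = n_flips, prob = 0.5))

# Base R's exact binomial test with a one-sided alternative (assumed here)
p_test <- binom.test(n_heads, n_flips, p = 0.5,
                     alternative = "greater")$p.value

print(c(p_direct, p_tail, p_test))
```

All three values should agree with \(1/2^{10}\).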

The computation of the \(p\)-value above was relatively easy, since we had a simple coin flipping example. In the context of statistical studies, we often use a normal distribution to compute \(p\)-values.

However we compute the \(p\)-value, a very small \(p\)-value represents evidence against the null hypothesis and in favor of the alternative hypothesis. A commonly used threshold for the \(p\)-value is \(0.05\), called the significance level: if the \(p\)-value falls below it, we reject the null hypothesis. With this threshold, a null hypothesis that is actually true gets rejected only about \(5\%\) of the time.

A normal example

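According to Wikipedia, around 10% of the population is left handed. A random sample of 211 people found that 29 were left handed. Do these data support Wikipedia’s estimate?

The null hypothesis is that the true proportion of left-handed people is 10%. For the alternative hypothesis, we could ask either whether the true proportion is larger than 10% or whether it simply differs from 10%.

Note the distinction between the two versions of the alternative hypothesis. The first is called a one-sided hypothesis and the second is called a two-sided hypothesis.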

Thus, there are basically two problems here. In both, we must compare the null hypothesis to one of the two alternative hypotheses. Written symbolically, with \(p\) denoting the true proportion of left-handed people, our null and alternative hypotheses are

\[\begin{align} H_0 : p=0.1 \\ H_A : p > 0.1 \end{align}\] or \[\begin{align} H_0 : p=0.1 \\ H_A : p \neq 0.1 \end{align}\]

The first hypothesis test is one-sided; the second is two-sided.

The fundamental definition of a p-value is still the same: the probability of obtaining data at least as extreme as the observed data, under the assumption of the null hypothesis. In this problem, the sample proportion \(\hat{p}\) is approximately normally distributed under the null hypothesis, with mean \(0.1\) and standard deviation \[\sqrt{0.1\times0.9/211} \approx 0.02065285.\] Our observed data is \(\hat{p} = 29/211 \approx 0.1374\), which is larger than \(0.1\).
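The quantities just described can be computed step by step in R. The variable names below are illustrative assumptions; the numbers come from the example. The last line also expresses the observed proportion as a \(z\)-score, that is, the number of null standard deviations it lies above the null mean.

```r
# Values from the left-handedness example
p0 <- 0.1        # proportion claimed by the null hypothesis
n  <- 211        # sample size
x  <- 29         # number of left-handed people observed

p_hat <- x / n                    # observed sample proportion
se <- sqrt(p0 * (1 - p0) / n)     # standard deviation of p_hat under the null
z  <- (p_hat - p0) / se           # standardized distance from the null mean

print(c(p_hat = p_hat, se = se, z = z))
```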

For the first, one-sided test, the p-value is

1 - pnorm(29/211, 0.1, sqrt(0.1*0.9/211))
## [1] 0.0349266

As this is smaller than \(0.05\), we reject the null hypothesis. For the second, two-sided test, the p-value is twice this, or about \(0.0699\). Since that is larger than \(0.05\), we do not reject the null hypothesis.
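Both p-values can be sketched in R as follows. The variable names are illustrative. The cross-check uses base R's `prop.test` with `correct = FALSE`, which turns off the continuity correction so that it should match the plain normal computation; that pairing is a choice made here for illustration, not something the example itself prescribes.

```r
# Values from the left-handedness example
p0 <- 0.1
n  <- 211
x  <- 29
se <- sqrt(p0 * (1 - p0) / n)

# One-sided p-value: upper tail of the null normal distribution
p_one <- pnorm(x / n, mean = p0, sd = se, lower.tail = FALSE)

# Two-sided p-value: double the one-sided tail, by symmetry of the normal
p_two <- 2 * p_one

# Cross-check with prop.test, continuity correction turned off
p_one_check <- prop.test(x, n, p = p0, alternative = "greater",
                         correct = FALSE)$p.value
p_two_check <- prop.test(x, n, p = p0, alternative = "two.sided",
                         correct = FALSE)$p.value

print(c(one_sided = p_one, two_sided = p_two))
print(c(reject_one_sided = p_one < 0.05, reject_two_sided = p_two < 0.05))
```

The same data lead to different conclusions at the \(0.05\) threshold depending on which alternative hypothesis was chosen, which is why the alternative should be fixed before looking at the data.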