Hypothesis Testing

Framework

The null hypothesis \(H_0\) represents a skeptical perspective or the status quo.
The alternative hypothesis \(H_A\) represents an alternative claim under consideration.
We don’t reject the null hypothesis unless we have very strong evidence in favor of the alternative hypothesis.
In the case of numerical variables, this evidence is often stated in terms of confidence intervals.

Example

Students from the 2011 YRBSS (Youth Risk Behavior Surveillance System) lifted weights (or performed other strength training exercises) 3.09 days per week on average. We want to determine if the YRBSS sample data set provides strong evidence that YRBSS students selected in 2013 are lifting more or less than the 2011 YRBSS students, versus the other possibility that there has been no change.

We simplify these three options into two competing hypotheses:

\(H_0\): The average days per week that YRBSS students lifted weights was the same for 2011 and 2013.
\(H_A\): The average days per week that YRBSS students lifted weights was different for 2013 than in 2011.

Confidence intervals

Denote the average days per week that YRBSS students lifted weights in 2011 by \(\mu_{11}\), also known as the null value.
Denote the average days per week that YRBSS students lifted weights in 2013 by \(\mu_{13}\).
To reject the null hypothesis, we would need to find a confidence interval for \(\mu_{13}\) that does not contain \(\mu_{11}\).

Pushing the example further

Suppose \(\mu_{11}\) is known to be 3.09
To reject \(H_0\), we need a confidence interval for \(\mu_{13}\) that doesn’t contain 3.09.
Suppose our sample of 100 students from the 2013 YRBSS survey has an average of \(\bar{x} = 2.78\) days with a standard deviation of \(s = 2.56\) days.
General confidence interval: \[\bar{x} \pm z^{*} SE_{\bar{x}}\]
If we’d like a 95% confidence interval, we take \(z^{*} = 1.96\).
The standard error is \[SE_{\bar{x}} = \frac{s}{\sqrt{n}}= \frac{2.56}{\sqrt{100}}=0.256.\]
The confidence interval is \(2.78 \pm 1.96 \times 0.256\), or about \((2.28, 3.28)\).
Since this interval contains the null value 3.09, we do not reject the null hypothesis.
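
The arithmetic in this example can be reproduced in R. This is a minimal sketch that uses only the summary statistics quoted above; the variable names are illustrative, and `qnorm(0.975)` stands in for the rounded value 1.96.

```r
# Summary statistics quoted in the example (2013 YRBSS sample)
x_bar <- 2.78      # sample mean, days per week
s     <- 2.56      # sample standard deviation
n     <- 100       # sample size
mu_11 <- 3.09      # null value (2011 average)

z_star <- qnorm(0.975)              # about 1.96 for a 95% interval
se     <- s / sqrt(n)               # standard error of the sample mean
ci     <- x_bar + c(-1, 1) * z_star * se

ci                                  # lower and upper endpoints

# TRUE means the null value lies inside the interval, so we do not reject H_0
contains_null <- ci[1] <= mu_11 && mu_11 <= ci[2]
contains_null
```

Changing `0.975` to `0.995` would give a 99% interval, which is wider and therefore even less likely to exclude the null value.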

Types of errors

Type 1: rejecting the null hypothesis when it is actually true
Type 2: failing to reject the null hypothesis when it is actually false
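
A short simulation can make the Type 1 error rate concrete. The sketch below is an illustration, not part of the original example: it assumes, purely for convenience, that the data are normally distributed with mean 3.09 and standard deviation 2.56, so that \(H_0\) is true by construction. It then counts how often a 95% confidence interval wrongly excludes the null value. The seed and the number of simulations are arbitrary choices.

```r
set.seed(42)        # arbitrary seed, for reproducibility
n_sims <- 2000      # number of simulated studies (arbitrary)
n      <- 100       # sample size in each study
mu0    <- 3.09      # true mean, equal to the null value
sigma  <- 2.56      # assumed population standard deviation

reject <- replicate(n_sims, {
  x  <- rnorm(n, mean = mu0, sd = sigma)        # H_0 is true here
  se <- sd(x) / sqrt(n)
  ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se  # 95% confidence interval
  mu0 < ci[1] || mu0 > ci[2]                    # TRUE = Type 1 error
})

type1_rate <- mean(reject)   # fraction of studies that wrongly reject H_0
type1_rate
```

The resulting rate should land near 0.05, the error rate that a 95% interval is designed to have.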

Significance levels

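The p-value is a concrete measure of the strength of the evidence against the null hypothesis.
Formally, it is the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis is true.
We reject the null hypothesis when the p-value falls below a chosen significance level \(\alpha\), commonly \(\alpha = 0.05\).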

Example - Coin flipping

If we flip a coin 10 times then, assuming the coin is fair, the probability of 10 straight heads is \(1/2^{10} = 0.0009765625\), or roughly 1 in 1000. If that actually happened, you’d have cause to doubt the fairness of the coin.

In the context of statistical studies, we often use a normal distribution to compute p-values. If, in the case of the coin, we suspect the coin is weighted towards heads, we’d write:

\(H_0\): \(\mu=5\) (the expected number of heads in 10 flips)
\(H_A\): \(\mu>5\)

The estimated probability of the observed outcome (10 heads), using a normal distribution with mean \(\mu = 5\) and standard deviation \(\sqrt{10}/2\), is

1 - pnorm(10,5,sqrt(10)/2)

## [1] 0.0007827011
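
For comparison, the exact binomial probability and the normal estimate can be computed side by side. This is a small sketch with illustrative variable names; it uses `lower.tail = FALSE`, which is equivalent to subtracting from 1.

```r
n_flips <- 10
p_fair  <- 0.5

# Exact: P(X >= 10) = P(X > 9) for X ~ Binomial(10, 0.5)
p_exact <- pbinom(9, size = n_flips, prob = p_fair, lower.tail = FALSE)

# Normal approximation with the matching mean and standard deviation
mu     <- n_flips * p_fair                        # 5
sigma  <- sqrt(n_flips * p_fair * (1 - p_fair))   # sqrt(10)/2
p_norm <- pnorm(10, mean = mu, sd = sigma, lower.tail = FALSE)

c(exact = p_exact, normal = p_norm)
```

The two values are of the same order of magnitude but not identical; with only 10 flips the normal curve is a rough stand-in for the binomial distribution.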

Conditions to check for normality

The data come from a random sample.
The sample is less than 10% of the population, so that the observations are roughly independent.
The sample is large enough: typically, at least 30 observations.
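
The two numerical conditions are easy to check mechanically. The helper below is hypothetical (its name and arguments are invented for illustration), and the sample and population sizes in the calls are made-up numbers. Whether the sample is random cannot be checked this way; that depends on how the data were collected.

```r
# Hypothetical helper: reports whether each size condition holds.
check_sample_size <- function(n, population_size, min_n = 30) {
  c(large_enough = n >= min_n,                    # at least 30 observations
    under_10_pct = n < 0.10 * population_size)    # less than 10% of population
}

check_sample_size(n = 100, population_size = 5000)   # passes both checks
check_sample_size(n = 20,  population_size = 150)    # fails both checks
```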

Another example

The internet will happily tell you that we all slow down with age. Let’s test that using some data from the 2015 Peachtree road race. I’ve got a CSV file that contains the times for all 54796 non-professional runners. Let’s read it in and take a look:

df <- read.csv('https://www.marksmath.org/classes/Summer2017Stat185/data/peach_tree2015.csv')
dim(df)

## [1] 54796    11

library(knitr)
kable(head(df))

X	Div.Place	Name	Bib	Age	Place	Gender.Place	Clock.Time	Net.Time	Hometown	Gender
6451	1	SCOTT OVERALL	72	32	1	1	29.500	29.500	SUTTON, UNITED KINGDOM	M
6452	2	BEN PAYNE	74	33	2	2	29.517	29.517	COLORADO SPRINGS, CO	M
4092	1	GRIFFITH GRAVES	79	25	3	3	29.633	29.633	BLOWING ROCK, NC	M
4093	2	SCOTT MACPHERSON	87	28	4	4	29.800	29.783	COLUMBIA, MO	M
6453	3	ELKANAH KIBET	77	32	5	5	29.883	29.883	FAYETTEVILLE, NC	M
4094	3	MATT LLANO	71	26	6	6	30.200	30.200	FLAGSTAFF, AZ	M

Let’s grab a “young” subset of men aged 35 to 39 and an “old” subset of men aged 40 to 44.

men <- subset(df, Gender == 'M') # note ==; a single = does not filter any rows
young <- subset(men, 35<=Age & Age<40)
old <- subset(men, 40<=Age & Age<45)

We’ll then select a random sample of size 100 from each age group and compute the sample means.

set.seed(1) # For reproducibility
young_times <- sample(young$Net.Time, 100)
old_times <- sample(old$Net.Time, 100)

mu_young = mean(young_times)
mu_old = mean(old_times)
c(mu_young,mu_old)

## [1] 73.41334 77.00703

Perhaps we’re not surprised to see that the sample means satisfy mu_old > mu_young, but is the result statistically significant, or is it likely just due to chance?

Put another way, let \(\mu\) be the population mean of the old_times. Our null and alternative hypotheses may be written symbolically:

\(H_0\): \(\mu=73.41334\)
\(H_A\): \(\mu>73.41334\)

Do we have sufficient evidence to reject \(H_0\)?

We use the \(p\)-value to explore this question. That is, we compute the probability that we could get the observed sample mean mu_old or higher under the assumption that the sample mean is normally distributed with mean mu_young. Since we are investigating the distribution of the sample mean, we use the standard error as the standard deviation.

Before going to all this trouble, we should mention that (1) we have genuine random samples of (2) large enough size. While the data are a bit skewed, it’s not so bad with a sample of size 100.

hist(old_times, 6)

Now, here’s the critical computation:

se = sd(old_times)/10 # standard error s/sqrt(n), with n = 100
se

## [1] 2.175528

1 - pnorm(mu_old, mu_young, se)

## [1] 0.04928052
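
The same p-value can be obtained by standardizing first. The sketch below plugs in the rounded values printed above rather than re-reading the data, so it should agree with that output only up to rounding; the variable names are illustrative.

```r
mu_old_obs   <- 77.00703   # sample mean of old_times, from the output above
mu_young_obs <- 73.41334   # sample mean of young_times, used as the null value
se_obs       <- 2.175528   # standard error, from the output above

z       <- (mu_old_obs - mu_young_obs) / se_obs   # test statistic (Z-score)
p_value <- pnorm(z, lower.tail = FALSE)           # upper-tail area under N(0, 1)

c(z = z, p_value = p_value)
```

Working with the Z-score makes it easy to see how close the call is: the one-sided 5% cutoff is `qnorm(0.95)`, about 1.645, and the observed statistic sits just above it.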

Thus, at the 5% significance level, we (barely) reject the null hypothesis.

Sample proportions

Data (often, categorical data) are frequently more easily stated in terms of proportions rather than raw quantities. For example, we might be interested in the proportion of people who respond to a medical treatment, or the proportion of gun owners in a city, or the proportion of folks who are left handed.

To test hypotheses about a proportion, we need to understand the mean and standard deviation associated with proportions. Recall first the mean and standard deviation associated with the binomial distribution.

Suppose the probability of a single trial being a success is \(p\). Then, the probability of observing exactly \(k\) successes in \(n\) independent trials is given by

\[{n\choose k}p^k(1-p)^{n-k} = \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}.\]

Additionally, the mean, variance, and standard deviation of the number of observed successes are \[\begin{align} \mu &= np &\sigma^2 &= np(1-p) &\sigma &= \sqrt{np(1-p)} \end{align}\]

Now, if \(X\) is a random variable that tells us the raw quantity, then \[\hat{p} = \frac{X}{n}\] is a random variable that tells us the proportion. To get the mean and standard deviation of \(\hat{p}\), we just divide the mean and standard deviation of \(X\) by \(n\) (so the variance is divided by \(n^2\)). Thus, \[\begin{align} \mu_{\hat{p}} &= p &\sigma_{\hat{p}}^2 &= p(1-p)/n &\sigma_{\hat{p}} &= \sqrt{p(1-p)/n} \end{align}\]
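
These formulas can be checked numerically by summing over the binomial distribution directly. The sketch below is an illustration with arbitrarily chosen values of \(n\) and \(p\); it computes the mean and standard deviation of \(X/n\) from the definition and compares them with the formulas.

```r
n <- 50     # illustrative number of trials
p <- 0.3    # illustrative success probability

k     <- 0:n
probs <- dbinom(k, size = n, prob = p)   # P(X = k) for each possible count
p_hat <- k / n                           # corresponding values of the proportion

mean_p_hat <- sum(p_hat * probs)                       # E[p_hat]
var_p_hat  <- sum((p_hat - mean_p_hat)^2 * probs)      # Var[p_hat]

c(mean = mean_p_hat, formula_mean = p)
c(sd = sqrt(var_p_hat), formula_sd = sqrt(p * (1 - p) / n))
```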

Example

Suppose that about 10% of people are left handed. A random sample of 211 people found that 29 were left handed. We can ask several questions about this data:

Does this data support the null hypothesis that 10% of the population is left handed?
Does this data support the alternative hypothesis that more than 10% of the population is left handed?
Does this data support the alternative hypothesis that the proportion of the population that is left handed is not 10%?

There are basically two problems here. In both, we must compare the null hypothesis to one of the two alternative hypotheses. Written symbolically in terms of the population proportion \(p\), our null and alternative hypotheses are \[\begin{align} H_0 : p=0.1 \\ H_A : p > 0.1 \end{align}\] or \[\begin{align} H_0 : p=0.1 \\ H_A : p \neq 0.1 \end{align}\]

The first hypothesis test is one-sided; the second is two-sided.

The fundamental definition of a p-value is still the same: the probability of obtaining data at least as extreme as what we observed, under the assumption of the null hypothesis. In this problem, our null mean and standard deviation are \(0.1\) and \[\sqrt{0.1\times0.9/211} = 0.02065285.\] Our observed data is \(\hat{p} = 29/211\), which is larger than \(0.1\).

For the first, one-sided test, the p-value is

1 - pnorm(29/211, 0.1, sqrt(0.1*0.9/211))

## [1] 0.0349266

As this is smaller than 0.05, we reject the null hypothesis at the 5% significance level. For the second, two-sided test, the p-value is twice this, about 0.0699; since that is larger than 0.05, we don’t reject the null hypothesis.
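
Both tests can be run together. The sketch below assumes the conventional significance level of 0.05 (an assumption, since the example does not name one) and uses illustrative variable names. As a cross-check, it also calls R’s built-in `prop.test` without continuity correction, which should give the same two-sided p-value as the normal calculation.

```r
x     <- 29      # left-handed people in the sample
n     <- 211     # sample size
p0    <- 0.1     # null proportion
alpha <- 0.05    # assumed significance level

p_hat <- x / n
se    <- sqrt(p0 * (1 - p0) / n)        # standard error under H_0

p_one_sided <- pnorm(p_hat, mean = p0, sd = se, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided          # doubling works because the normal curve is symmetric

c(one_sided = p_one_sided, two_sided = p_two_sided)
c(reject_one_sided = p_one_sided < alpha,
  reject_two_sided = p_two_sided < alpha)

# Cross-check: built-in test of a single proportion, no continuity correction
p_builtin <- prop.test(x, n, p = p0, alternative = "two.sided",
                       correct = FALSE)$p.value
p_builtin
```

The same data lead to different decisions depending on the alternative, which is why the alternative hypothesis should be chosen before looking at the data.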