Students from the 2011 YRBSS (Youth Risk Behavior Surveillance System) lifted weights (or performed other strength-training exercises) 3.09 days per week on average. We want to determine whether the 2013 YRBSS sample provides strong evidence that students selected in 2013 are lifting more or less than the 2011 YRBSS students, versus the other possibility that there has been no change.
We simplify these three options (more, less, or no change) into two competing hypotheses. Letting \(\mu\) denote the average number of days per week that 2013 students lift weights:

\(H_0\): \(\mu = 3.09\). There has been no change since 2011.

\(H_A\): \(\mu \neq 3.09\). The average has changed.
If we flip a coin 10 times and see 10 straight heads then, assuming the coin is fair, the probability of that outcome is \(1/2^{10} = 0.0009765625\); you’d have cause to doubt the fairness of the coin.
In the context of statistical studies, we often use a normal distribution to compute p-values. If, in the case of the coin, we suspect the coin is weighted towards heads, we’d write

\(H_0\): \(p = 1/2\). The coin is fair.

\(H_A\): \(p > 1/2\). The coin is weighted towards heads.

The estimated probability of observing 10 heads, using a normal approximation with mean \(np = 5\) and standard deviation \(\sqrt{np(1-p)} = \sqrt{10}/2\), is
1 - pnorm(10,5,sqrt(10)/2)
## [1] 0.0007827011
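For comparison, the exact probability can be computed directly from the binomial distribution with `dbinom`; the normal approximation above is in the right ballpark but not exact:

```r
# Exact probability of 10 heads in 10 flips of a fair coin: (1/2)^10
dbinom(10, size = 10, prob = 0.5)
## [1] 0.0009765625
```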
The internet will happily tell you that we all slow down with age. Let’s test that using some data from the 2015 Peachtree road race. I’ve got a CSV file that contains the times for all 54796 non-professional runners. Let’s read it in and take a look:
df <- read.csv('https://www.marksmath.org/classes/Summer2017Stat185/data/peach_tree2015.csv')
dim(df)
## [1] 54796 11
library(knitr)
kable(head(df))
| X | Div.Place | Name | Bib | Age | Place | Gender.Place | Clock.Time | Net.Time | Hometown | Gender |
|---|---|---|---|---|---|---|---|---|---|---|
| 6451 | 1 | SCOTT OVERALL | 72 | 32 | 1 | 1 | 29.500 | 29.500 | SUTTON, UNITED KINGDOM | M |
| 6452 | 2 | BEN PAYNE | 74 | 33 | 2 | 2 | 29.517 | 29.517 | COLORADO SPRINGS, CO | M |
| 4092 | 1 | GRIFFITH GRAVES | 79 | 25 | 3 | 3 | 29.633 | 29.633 | BLOWING ROCK, NC | M |
| 4093 | 2 | SCOTT MACPHERSON | 87 | 28 | 4 | 4 | 29.800 | 29.783 | COLUMBIA, MO | M |
| 6453 | 3 | ELKANAH KIBET | 77 | 32 | 5 | 5 | 29.883 | 29.883 | FAYETTEVILLE, NC | M |
| 4094 | 3 | MATT LLANO | 71 | 26 | 6 | 6 | 30.200 | 30.200 | FLAGSTAFF, AZ | M |
Let’s grab a “young” subset of men who are at least 35 but under 40 years old and an “old” subset of men who are at least 40 but under 45.
men <- subset(df, Gender == 'M')
young <- subset(men, 35 <= Age & Age < 40)
old <- subset(men, 40 <= Age & Age < 45)
We’ll then select a random sample of size 100 from each age group and compute the sample means.
set.seed(1) # For reproducibility
young_times <- sample(young$Net.Time, 100)
old_times <- sample(old$Net.Time, 100)
mu_young <- mean(young_times)
mu_old <- mean(old_times)
c(mu_young,mu_old)
## [1] 73.41334 77.00703
Perhaps we’re not surprised to see that the sample means satisfy mu_old > mu_young, but is the result statistically significant, or is it likely just due to chance? Put another way, let \(\mu\) be the population mean of the old times. Our null and alternative hypotheses may be written symbolically:

\(H_0\): \(\mu =\) mu_young. The two age groups have the same mean time.

\(H_A\): \(\mu >\) mu_young. The older runners are slower on average.
Do we have sufficient evidence to reject \(H_0\)?
We use the \(p\)-value to explore this question. That is, we compute the probability that we could get the observed sample mean mu_old or higher under the assumption that the times are normally distributed with mean mu_young. Since we are investigating the distribution of the sample mean, we use the standard error as the standard deviation.
Before going to all this trouble, we should check that (1) we have genuine random samples and (2) the samples are of large enough size. While the data are a bit skewed, the skew is not so bad with a sample of size 100.
hist(old_times, 6)
Now, here’s the critical computation:
se <- sd(old_times)/sqrt(length(old_times))  # sqrt(100) = 10
se
## [1] 2.175528
1 - pnorm(mu_old, mu_young, se)
## [1] 0.04928052
Thus, we (barely) reject the null hypothesis at the 5% significance level.
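As a cross-check on this sort of computation, R’s built-in `t.test` compares two samples directly and accounts for the variability in both, so its p-value will differ somewhat from the one-sample normal computation above. A minimal sketch with simulated data (the means and standard deviation here are hypothetical stand-ins for the race samples):

```r
set.seed(2)                                  # for reproducibility
old_sim <- rnorm(100, mean = 77, sd = 21)    # hypothetical stand-in for old_times
young_sim <- rnorm(100, mean = 73, sd = 21)  # hypothetical stand-in for young_times
t.test(old_sim, young_sim, alternative = "greater")
```

With the actual samples, the call would be `t.test(old_times, young_times, alternative = "greater")`.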
Many kinds of data (often, categorical data) are more easily stated in terms of proportions rather than raw counts. For example, we might be interested in the proportion of people who respond to a medical treatment, the proportion of gun owners in a city, or the proportion of folks who are left-handed.
In order to answer such questions, we need to understand the mean and standard deviation associated with proportions. Recall first the mean and standard deviation associated with the binomial distribution.
Suppose the probability of a single trial being a success is \(p\). Then, the probability of observing exactly \(k\) successes in \(n\) independent trials is given by
\[{n\choose k}p^k(1-p)^{n-k} = \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}.\]
Additionally, the mean, variance, and standard deviation of the number of observed successes are \[\begin{align} \mu &= np &\sigma^2 &= np(1-p) &\sigma &= \sqrt{np(1-p)} \end{align}\]

Suppose that about 10% of people are left-handed. A random sample of 211 people found that 29 were left-handed. Does this data support the hypothesis that 10% of folks are left-handed? We might write either

\(H_0\): \(p = 0.1\) versus \(H_A\): \(p > 0.1\), or

\(H_0\): \(p = 0.1\) versus \(H_A\): \(p \neq 0.1\).
The first hypothesis test is one-sided; the second is two-sided.
The fundamental definition of a p-value is still the same: the probability of obtaining the observed data, or something more extreme, under the assumption of the null hypothesis. In this problem, our null mean and standard deviation are \(0.1\) and \[\sqrt{0.1\times0.9/211} = 0.02065285.\] Our observed data is \(\hat{p} = 29/211 \approx 0.1374\), which is larger than \(0.1\).
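These null quantities are easy to reproduce in R from the sample size and counts in the problem statement:

```r
# Null standard error and observed proportion for the left-handed example
n <- 211; p0 <- 0.1
se <- sqrt(p0 * (1 - p0) / n)  # 0.02065285
p_hat <- 29 / n                # about 0.1374
c(se, p_hat)
```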
For the first, one-sided test, the p-value is
1 - pnorm(29/211, 0.1, sqrt(0.1*0.9/211))
## [1] 0.0349266
As this is smaller than the typical significance level of 0.05, we reject the null hypothesis. For the second, two-sided test, the p-value is twice this, about 0.0699; since that exceeds 0.05, we don’t reject the null hypothesis.
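As a cross-check, R’s built-in `binom.test` computes exact binomial p-values, avoiding the normal approximation altogether; its results should be close to the values above:

```r
# Exact one-sided and two-sided tests of p = 0.1, given 29 successes in 211 trials
binom.test(29, 211, p = 0.1, alternative = "greater")$p.value
binom.test(29, 211, p = 0.1, alternative = "two.sided")$p.value
```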