An archived instance of Mark's Discourse site as of Tuesday July 18, 2017.

ANOVA for Peachtree times

mark

(5 points)

Recall the Peachtree road race data that we have for 2015. You can read it into an R dataframe like so:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Select a random age $n$ between 25 and 55 and run an ANOVA test to see if runners' speeds change over the ages from $n$ to $n+5$. Thus, in our last example on our ANOVA outline page, the ages variable might be generated by

set.seed(987654321)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
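To see why the seed matters, here is a minimal sketch: resetting the same seed before each call makes `sample` return the same start age, so each person's age window is reproducible. The names `first_draw`, `second_draw`, and `same_draw` are made up for illustration.

```r
# Re-seeding with the same value reproduces the same random draw
set.seed(987654321)
first_draw = sample(25:55, 1)

set.seed(987654321)
second_draw = sample(25:55, 1)

# Same seed, same draw
same_draw = identical(first_draw, second_draw)
print(same_draw)

# The age window always spans six consecutive ages, n through n+5
ages = first_draw:(first_draw + 5)
print(length(ages))
```

A different seed will generally give a different start age, which is why the replies work with different age windows.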
ejoy90

Input Data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

set.seed(*********)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages:

##Out:
ages
[1] 44 45 46 47 48 49

start_age
[1] 44

Table:

##Out:
Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5    2807  561.38  1.2234 0.2951
Residuals 7897 3623723  458.87

Conclusion:

Since the p-value is larger than 0.05 we cannot reject the null hypothesis.
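The comparison against 0.05 can also be done in code rather than by eye. The following is only a sketch on synthetic stand-in data (the `toy` data frame and its values are invented here, not taken from the race results); it assumes the usual column name `Pr(>F)` in the table that `anova` returns.

```r
# Synthetic stand-in data (hypothetical values, NOT the Peachtree results)
set.seed(1)
toy = data.frame(
  Age = factor(rep(44:49, each = 50)),
  Net.Time = rnorm(300, mean = 60, sd = 20)
)

toy_modl = lm(Net.Time ~ Age, data = toy)
toy_tab = anova(toy_modl)

# The p-value sits in the "Age" row of the "Pr(>F)" column
p_value = toy_tab["Age", "Pr(>F)"]

# Compare against the 0.05 significance level
alpha = 0.05
if (p_value < alpha) {
  print("Reject the null hypothesis")
} else {
  print("Cannot reject the null hypothesis")
}
```

With the real data, the same lookup on `anova(modl)` should return the value printed in the `Pr(>F)` column.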

wolfpack77

First, read in the data set and input code:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(121212123)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

List of ages:

ages
#out
[1] 27 28 29 30 31 32

Results:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5    3428  685.56  1.6844 0.1346
Residuals 6977 2839643  407.00

Conclusion:

Since the p-value is greater than 0.05, we cannot reject the null hypothesis that runners' speeds don't change across these ages.
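For reference, the hypotheses behind this test can be written out explicitly. The notation $\mu_a$ for the mean net time of runners of age $a$ is an assumption introduced here, not something from the original post:

```latex
H_0:\ \mu_{27} = \mu_{28} = \mu_{29} = \mu_{30} = \mu_{31} = \mu_{32}
\qquad\text{versus}\qquad
H_A:\ \mu_i \neq \mu_j \ \text{for at least one pair of ages } i \neq j
```

A large p-value means the data are consistent with $H_0$; it does not prove that the means are equal.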

Bellaj

First enter data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Select Random age n

set.seed(STUDENT ID)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)

Ages:

37,38,39,40,41,42

Start age:

37

ANOVA model:

df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Results

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)
Age          5    5244 1048.75  2.4319 0.03276 *
Residuals 7668 3306751  431.24

We can reject the null hypothesis because the p-value (0.03276) is less than 0.05.
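As a sanity check, the F value and p-value in a table like this can be recomputed from the other columns: F is the ratio of the two mean squares, and the p-value is the upper tail of the F distribution with the listed degrees of freedom. A minimal sketch using the numbers reported above (the variable names are illustrative only):

```r
# Values copied from the ANOVA table above
ms_age = 1048.75    # Mean Sq for Age
ms_resid = 431.24   # Mean Sq for Residuals
df_age = 5
df_resid = 7668

# F statistic: between-group mean square over within-group mean square
f_stat = ms_age / ms_resid
print(round(f_stat, 4))  # 2.4319, matching the table

# Upper-tail probability of the F distribution; this should land
# close to the reported Pr(>F), up to rounding of the inputs
p_value = pf(f_stat, df1 = df_age, df2 = df_resid, lower.tail = FALSE)
print(p_value)
```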

Jenna

Enter Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

set.seed(MY_ID)
start_age = sample(25:55, 1) 
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages

##Out:

ages
[1] 52 53 54 55 56 57

Results

##Out: 
Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value    Pr(>F)
Age          5   10503 2100.53  4.4858 0.0004406 ***
Residuals 6215 2910261  468.26
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Since the p-value (0.0004406) is less than 0.05, we can reject the null hypothesis.

Amelia

Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(ID)
start_age = sample(25:50, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages

ages
#out
[1] 27 28 29 30 31 32

start_age
#out
[1] 27

Table

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5    3428  685.56  1.6844 0.1346
Residuals 6977 2839643  407.00

The p-value is larger than 0.05, so we cannot reject the null hypothesis.

Sarcasticswimmer

Input

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

set.seed(#########)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)

ages
[1] 36 37 38 39 40 41

df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Out

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)
Age          5    5106 1021.29  2.3648 0.03738 *
Residuals 7552 3261540  431.88
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

We reject the null hypothesis as the p-value is under 0.05.

Alison

First, read in the data set and input code:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(ID #)
start_age = sample(30:35, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages:

ages
[1] 35 36 37 38 39 40

Results:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)
Age          5    6290 1257.98  2.9256 0.01213 *
Residuals 7577 3258059  429.99

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Therefore, we can reject the null hypothesis since the p value is less than .05

YOU_SHALL_NEVER_KNOW

Grab Some Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Let's Set An Age Range

set.seed(BLAH_BLAH_BLAH)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)

Our Ages Are

ages

##out
## [1] 34 35 36 37 38 39

Set up That ANOVA

df2=subset(df,Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)

anova(modl)

Which Turns Out as Such

## Analysis of Variance Table

## Response: Net.Time
##             Df  Sum Sq Mean Sq F value  Pr(>F)
## Age          5    4231  846.25  1.9653 0.08043 .
## Residuals 7541 3247099  430.59
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Dat P-Value

0.08043

We fail to reject the null; runners' speeds don't seem to change across these ages.

Prestonw

First, Input the data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(930335010)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages

ages
#out
[1] 43 44 45 46 47 48

start_age
#out
[1] 43

Results:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)
Age          5    4532  906.48  2.0081 0.07422 .
Residuals 7999 3610930  451.42
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Since the p-value (0.07422) is greater than 0.05, we cannot reject the null hypothesis at the 5% level.

Sierra

Read in data and input code for ages:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(141414141)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
ages

## Output:
## [1] 42 43 44 45 46 47

Set up and run an ANOVA:

df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

## Output:
## Analysis of Variance Table

## Response: Net.Time
##             Df  Sum Sq Mean Sq F value Pr(>F)
## Age          5    2853  570.66  1.2971 0.2619
## Residuals 8071 3550934  439.96

Thus, since the p-value is greater than 0.05, the null hypothesis cannot be rejected.

PaulWall

Enter Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(ID)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages

ages
##Out [1] 32 33 34 35 36 37

start_age
##Out [1] 32

Results

##Out:
Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5     843  168.51  0.3954 0.8523
Residuals 7581 3230426  426.12

The p-value is way bigger than 0.05, so we do not reject null hypothesis.

Kristian

Here's what I did. I entered this code:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(student ID)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
ages

Ages:

53, 54, 55, 56, 57, 58

Start Age:

53

Then I entered this code to run my ANOVA:

df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

And the output was:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value   Pr(>F)
Age          5    8708 1741.56   3.684 0.002489 **
Residuals 5866 2773068  472.74

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion:

My p-value is 0.002489, which is smaller than 0.05; therefore, we can reject the null hypothesis.

monehish

Input data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(930XXXXXX)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages

55 56 57 58 59 60

Starting Age

55

Results:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value    Pr(>F)    
Age          5   13361  2672.1  5.5129 4.565e-05 ***
Residuals 4941 2394900   484.7

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

p-value = 4.565e-05

The p-value is < .05 so we would reject the null hypothesis.

kd95

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(930111111)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages

[1] 54 55 56 57 58 59

Analysis

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value    Pr(>F)
Age          5   18669  3733.7  7.8169 2.426e-07 ***
Residuals 5389 2574033   477.6
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

Since the p-value is much lower than 0.05, we can reject the null hypothesis that runners' times do not change over the age range 54-59.
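A significant ANOVA only says that at least one age group's mean differs; it does not show which way the times move. One way to look at that, sketched here on synthetic stand-in data (the `toy` data frame and its built-in one-minute-per-year slowdown are invented for illustration, not taken from the race results), is to tabulate the mean net time for each age:

```r
# Synthetic stand-in data with a built-in slowdown of 1 minute per year of age
# (hypothetical numbers, NOT the Peachtree results)
set.seed(2017)
toy_ages = 54:59
toy = data.frame(
  Age = factor(rep(toy_ages, each = 200)),
  Net.Time = rnorm(1200, mean = 60 + rep(0:5, each = 200), sd = 20)
)

# Mean net time for each age group, one row per age
group_means = aggregate(Net.Time ~ Age, data = toy, FUN = mean)
print(group_means)
```

On the real data, the same `aggregate` call on `df2` would give one mean `Net.Time` per age in the selected window.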

FRD

Data-frame parameters

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(896745231)
start_age = sample(25:50, 1)
ages = start_age:(start_age+5)

Ages

ages
[1] 37 38 39 40 41 42
start_age
[1] 37


Generating Data-frame 2 and the Linear Model

df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
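One caveat about the `factor(..., labels = ages)` step: `labels` must have the same length as the number of distinct ages actually present, so the call would raise an error if some age in the window had no runners. A possible alternative, sketched here on synthetic stand-in data (the `toy` data frame is invented for illustration), is to pass `levels = ages` instead, which pins the level order without depending on which ages appear:

```r
# Synthetic stand-in for the race data (hypothetical values, not real results)
set.seed(42)
toy = data.frame(
  Age = sample(30:45, 2000, replace = TRUE),
  Net.Time = rnorm(2000, mean = 65, sd = 20)
)

ages = 37:42
toy2 = subset(toy, Age %in% ages)

# levels = ages sets the six levels explicitly; the factor() call does not
# error even if one of the ages were missing from the subset
toy2$Age = factor(toy2$Age, levels = ages)

toy_modl = lm(Net.Time ~ Age, data = toy2)
print(levels(toy2$Age))
```

With a race this large every age in the window is almost certainly represented, so both versions should give the same model; the difference only matters for sparse data.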

ANOVA Table

anova(modl)
Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)
Age          5    5244 1048.75  2.4319 0.03276 *
Residuals 7668 3306751  431.24
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

We reject the null hypothesis at the 5% significance level.

Andy

Our null hypothesis:

Runners' speeds do not change across these ages

Input data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(930357379)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Ages

ages
[1] 40 41 42 43 44 45

Results

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)  
Age          5    4097  819.48  1.8605 0.09773 .
Residuals 8088 3562558  440.47                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Our p-value (0.09773) is greater than our significance level (0.05), so we cannot reject the null. :worried:

Nonamaker

Get Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(*********)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age,labels=ages)
modl=lm(Net.Time~Age,data=df2)
anova(modl)

Ages

ages

[1] 49 50 51 52 53 54

start_age

[1] 49


Table

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5    3350  669.99  1.4283 0.2105
Residuals 7134 3346496  469.09

Results

p > .05, therefore the null hypothesis cannot be rejected.