An archive of the questions from Mark's Summer 2018 Stat 185.

ANOVA for Peachtree times

mark

(10 points)

Recall the Peachtree road race data that we have for 2010. You can read it into an R dataframe like so:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Select a random age n between 25 and 55 and run an ANOVA test to see if runners' speeds change over the ages from n to n+5. Following the last example on our ANOVA outline page, the ages variable might be generated by

set.seed(0)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
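
The replies below all convert Age to a factor before fitting the model. A minimal sketch of why that step matters, using simulated data rather than the race file (the data frame `sim` and the column `AgeF` are made up for illustration):

```r
# Simulated stand-in for the race data (hypothetical values, not the real file)
set.seed(1)
sim = data.frame(
  Age = sample(30:35, 600, replace = TRUE),
  Net.Time = rnorm(600, mean = 70, sd = 20)
)

# Age left numeric: lm fits a single slope, so Age uses 1 degree of freedom
num_tab = anova(lm(Net.Time ~ Age, data = sim))

# Age as a factor: one mean per age, so six ages use 6 - 1 = 5 degrees of freedom
sim$AgeF = factor(sim$Age)
fac_tab = anova(lm(Net.Time ~ AgeF, data = sim))

num_tab$Df[1]
fac_tab$Df[1]
```

With the factor version, the Df entry for Age should be one less than the number of ages in the window, which is a quick way to check that the subset and the factor conversion did what was intended.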

jthomps6

Here is my code for the ANOVA Test.

Code

df=read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
ages = 25:55
df2 = subset(df, Age %in% ages)

df2$Age = factor(df2$Age, labels = ages)
mod1 = lm(Net.Time ~ Age, data = df2)
anova(mod1)

Output

Analysis of Variance Table

Response: Net.Time
             Df   Sum Sq Mean Sq F value    Pr(>F)    
Age          30   204276  6809.2  15.483 < 2.2e-16 ***
Residuals 37836 16639296   439.8                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion
H_0: Runners' speed does not change with age.
H_A: Runners' speed does change with age.

With this p-value, the null hypothesis is rejected. The speed of the runners does change with age.

philycheesestk

Here is how I did things:

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(26)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

That code gave me this summary table:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value   Pr(>F)   
Age          5    6474 1294.85  3.1188 0.008154 **
Residuals 6558 2722702  415.17
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion:

Ho: μ1 = μ2 = ... = μ6 (the mean net time is the same at all six ages)
Ha: at least two μ’s are different

calculated p-value: 0.008154

At a 95% level of confidence, we are able to reject the null.
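
As a check on how the table fits together, the F value and p-value can be recomputed from the other columns. This is only a sketch: the numbers are copied from the rounded table above, and the variable names are made up for illustration.

```r
# Values copied from the ANOVA table above (Age row and Residuals row)
df_age   = 5
df_resid = 6558
ms_age   = 1294.85   # Mean Sq for Age
ms_resid = 415.17    # Mean Sq for Residuals

# F is the ratio of the between-age mean square to the residual mean square
f_value = ms_age / ms_resid

# The p-value is the upper tail of the F distribution at that ratio
p_value = pf(f_value, df_age, df_resid, lower.tail = FALSE)

# Number of runners in the subset: total degrees of freedom plus one
n_runners = df_age + df_resid + 1

round(f_value, 4)
signif(p_value, 4)
n_runners
```

Because the inputs are rounded, the results should agree with the F value and Pr(>F) columns only up to rounding.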

jgilfill

Code:

 df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
 set.seed(3)
 start_age = sample(25:55, 1)
 ages = start_age:(start_age+5)
 ages = 25:35
 df2 = subset(df, Age %in% ages)
 df2$Age = factor(df2$Age, labels = ages)
 modl = lm(Net.Time ~ Age, data = df2)
 anova(modl)

Response:

Analysis of Variance Table

Response: Net.Time

             Df  Sum Sq Mean Sq F value  Pr(>F)  
Age          10    8808  880.83  2.0892 0.02194 *
Residuals 12821 5405583  421.62
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion:

Since the calculated p-value is 0.02194, we can reject the null hypothesis at a 95% confidence level.

ktaylor4

Input

df=read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(21)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
runners=subset(df,Age %in%ages)
runners$Age=factor(runners$Age, labels = ages)
runners.mod1= lm(Net.Time~Age,data = runners)
anova(runners.mod1)

Output

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5    3350  669.99  1.4283 0.2105
Residuals 7134 3346496  469.09

Conclusion
H_0: μ_1 = μ_2 = ... = μ_6. Running speed stays the same over the age span.
H_a: At least two of the μ_i differ. Running speed does not stay the same over the age span.

P-value of 0.2105 is > 0.05. I fail to reject the null hypothesis at the 95% confidence level.

AlexisBrandt

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(1)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
ages = 25:35
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Analysis of Variance Table

Response: Net.Time
             Df  Sum Sq Mean Sq F value  Pr(>F)  
Age          10    8808  880.83  2.0892 0.02194 *
Residuals 12821 5405583  421.62
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

With a p-value of .02194 and a 95% confidence level, I reject the null.

robin

Code Used

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(14)
start_age = sample(27:48, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
mod1 = lm(Net.Time ~ Age, data = df2)
anova(mod1)

Output

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5     843  168.51  0.3954 0.8523
Residuals 7581 3230426  426.12

Interpretation

There is no difference in speed from n to n+5, p-value 0.8523. Therefore professor will not be any slower next year.

KBiehler1

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(9)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
mod1 = lm(Net.Time ~ Age, data = df2)
anova(mod1)

Output

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5     802  160.30  0.3759 0.8656
Residuals 7470 3185914  426.49

Interpretation
At a 95% confidence level with a p-value of 0.8656 we would fail to reject the null hypothesis.

albeatty

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(32)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 <- subset(df, Age %in% ages)
df2$Age <- factor(df2$Age, labels = ages)
df2.mod1 <- lm(Net.Time ~ Age, data = df2)
anova(df2.mod1)

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)  
Age          5    4097  819.48  1.8605 0.09773 .
Residuals 8088 3562558  440.47                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion

With a p-value of 0.09773, we can reject the null hypothesis and confirm that runners’ times do change from ages 32 to 37.

Henry

code:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(0)
start_age = 36
ages = start_age:(start_age+5)
set2 = subset(df, Age %in% ages)
set2$Age = factor(set2$Age, labels = ages)
mod = lm(Net.Time ~ Age, data = set2)
anova(mod)

Return:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value  Pr(>F)  
Age          5    5106 1021.29  2.3648 0.03738 *
Residuals 7552 3261540  431.88                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

H_0: μ_1 = μ_2 = ... = μ_6
H_A: At least two of the μ_i differ

At a 95% confidence level, we reject the null hypothesis due to a p-value of 0.03738

Lumpyhead00

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(12)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Output:

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5    3339  667.70   1.609 0.1539
Residuals 7195 2985803  414.98

My p-value of 0.1539 is small enough to accept the null.

jmahan

Code

 df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(13)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Output

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5    2087  417.32  0.8947 0.4835
Residuals 7415 3458485  466.42

Statement of Hypothesis

H_0: μ_1 = μ_2 = ... = μ_6
H_a: At least two of the μ's are different

The output of the code generated a P-value of 0.4835.

0.4835 > 0.05

So at a 95% confidence level we fail to reject the null hypothesis.

mdavis9

**Code**

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(11)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
mod1 = lm(Net.Time ~ Age, data = df2)
anova(mod1)

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value Pr(>F)
Age          5     650  129.95  0.3005 0.9128
Residuals 7618 3294407  432.45

**Hypothesis**

That over time their speed did not change.

**Conclusion**
With the given P-value the hypothesis is rejected.

mmealie

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
set.seed(28)
start_age = sample(25:55, 1)
ages = start_age:(start_age+5)
df2 = subset(df, Age %in% ages)
df2$Age = factor(df2$Age, labels = ages)
modl = lm(Net.Time ~ Age, data = df2)
anova(modl)

Output

Analysis of Variance Table

Response: Net.Time
            Df  Sum Sq Mean Sq F value   Pr(>F)   
Age          5    6474 1294.85  3.1188 0.008154 **
Residuals 6558 2722702  415.17
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion
H_0: μ_1 = μ_2 = ... = μ_6
H_a: At least two μ's are different.

p-value is less than .05, therefore we reject the null.

mark

With a 95% confidence level??

mark

Keep in mind - the smaller the p-value, the more likely you are to reject the null. :slight_smile:
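
To make that rule concrete, here is a small sketch that applies it to the p-values reported in this thread. It assumes a significance level of 0.05, matching the "95%" wording used in the replies above; the variable names are made up for illustration.

```r
# Assumed significance level (not stated in the assignment itself)
alpha = 0.05

# p-values reported in the replies above
p_values = c(0.008154, 0.02194, 0.2105, 0.8523, 0.8656,
             0.09773, 0.03738, 0.1539, 0.4835, 0.9128)

# Reject the null only when the p-value falls below alpha
decision = ifelse(p_values < alpha, "reject H_0", "fail to reject H_0")

data.frame(p_value = p_values, decision = decision)
```

A large p-value never leads to rejecting the null at this level, and failing to reject is not the same as showing that the null is true.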

mark

Which hypothesis? There’s more than one. :slight_smile: