An archive of the questions from Mark's Summer 2018 Stat 185.

Peachtree linear regression

mark

Use the following code to grab a sample from our Peachtree data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(0) # Your seed is set to your position in the Groovy Class Randomizer
dfs = df2[sample(length(df2$Age), 200),]

Then, run a linear regression on your sample and interpret. Specifically:

  • What is the linear model relating Net Time to Age?
  • Does your model look good? (i.e., should you reject the Null at 95%?)
  • How fast does your model predict that your spectacular, 54 year old statistics professor will be next year???
jthomps6

Here is how I interpreted the data.

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(6) # Your seed is set to your position in the Groovy Class Randomizer
dfs = df2[sample(length(df2$Age), 200),]

plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = plot.reg, col = "black")

Response

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
    Min      1Q  Median      3Q     Max 
-32.451 -17.052  -6.045  12.129  86.536 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  57.6166     6.7235   8.569 2.94e-15 ***
Age           0.3442     0.1379   2.496   0.0134 *  
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.99 on 198 degrees of freedom
Multiple R-squared:  0.03049,	Adjusted R-squared:  0.0256 
F-statistic: 6.228 on 1 and 198 DF,  p-value: 0.01339

Rplot

Interpretation

The model does look good for relating age to the net time of males who are older than 25.

The estimated time for a 54 year old statistics teacher would be 76.2 next year.

P-Value = 0.0134

jgilfill

Code:

          df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
          df2 = subset(df, Gender == "M" & Age > 25)
          set.seed(3) # Your seed is set to your position in the Groovy Class Randomizer
          dfs = df2[sample(length(df2$Age), 200),]
          plot.reg = lm(Net.Time~Age, data = dfs)
          summary(plot.reg)

Response:

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min 1Q Median 3Q Max
-31.617 -13.933 -3.666 11.684 62.310

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  57.6962     4.9084  11.755  < 2e-16 ***
Age           0.2819     0.1050   2.684  0.00788 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.24 on 198 degrees of freedom
Multiple R-squared: 0.03512, Adjusted R-squared: 0.03024
F-statistic: 7.206 on 1 and 198 DF, p-value: 0.00788

taken from y=mx + b
y= 0.2819x + 57.6962

Mark’s Time

t = 0.2819(54) + 57.6962

t = 72.9188, or about 72.92 minutes / he’s much slower

since

male times = 0.2819(age) + b

Code:

  plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
  abline(reg = plot.reg, col = "black")

image

p-value: 0.00788

A relatively small p-value suggests a genuine relationship: among males over 25 years of age, older runners tend to have slower times. We would reject the null, which states that there is no relationship between age and time, at a significance level of 0.05, since the p-value < 0.05.
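As a rough check, the p-value for the Age coefficient can be reproduced from the t value and the residual degrees of freedom printed in the summary. This is only a sketch: the variable names are made up for illustration, and the t value is the rounded figure from the output, so the result matches only approximately.

```r
# Two-sided p-value for the slope, from the rounded summary figures.
t_value  <- 2.684   # t value for Age (rounded in the output)
df_resid <- 198     # residual degrees of freedom

# Probability of a t statistic at least this extreme in either tail
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Decision at the 0.05 significance level
alpha <- 0.05
reject_null <- p_value < alpha

print(p_value)
print(reject_null)
```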

KBiehler1

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(9)
dfs = df2[sample(length(df2$Age), 200),]

plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = plot.reg, col = "black")

Output

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-33.322 -15.939  -5.021  10.201  66.245 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  52.1552     6.1718   8.451 6.22e-15 ***
Age           0.4235     0.1251   3.386 0.000854 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.59 on 198 degrees of freedom
Multiple R-squared:  0.05475,	Adjusted R-squared:  0.04997 
F-statistic: 11.47 on 1 and 198 DF,  p-value: 0.0008541

Rplot

Interpretation

The plot looks good, and the small p-value of 0.000854 suggests a genuine relationship between the two variables (net time and age, for runners over 25).

The estimated time for a 54 year old statistics runner would be 75.0242 minutes. I used the formula y = 0.4235(54) + 52.1552.

philycheesestk

Here is how I did things.

First I input my data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(10) # Your seed is set to your position in the Groovy Class Randomizer
dfs = df2[sample(length(df2$Age), 200),]

plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

That summary command gave me this response:

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
   Min      1Q  Median      3Q     Max 
-28.797 -13.917  -3.970   9.074  75.943 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  48.5130     5.2874   9.175  < 2e-16 ***
Age           0.4469     0.1084   4.124 5.47e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.2 on 198 degrees of freedom
Multiple R-squared:  0.07912,   Adjusted R-squared:  0.07447 
F-statistic: 17.01 on 1 and 198 DF,  p-value: 5.466e-05

I then used this code to generate a model “plot” for the data:

plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = plot.reg, col = "black")

Interpretation

The model does look strong for comparing age and the times of the males who are older than 25 years old.

p-value = 5.466e-05

We can reject the null at a 95% level of confidence.

The model predicts that our amazing 54 year old statistics professor will likely have a time of:

0.4469(54) + 48.5130 = 72.65
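The hand calculation can also be done with R's `predict()` function. The sketch below uses a simulated data frame as a stand-in (the `sim` data, its coefficients, and the noise level are assumptions for illustration, not the real Peachtree sample) so that it runs without downloading anything.

```r
# Simulated stand-in for the sample (NOT the real Peachtree data).
set.seed(1)
sim <- data.frame(Age = sample(26:80, 200, replace = TRUE))
sim$Net.Time <- 50 + 0.4 * sim$Age + rnorm(200, sd = 18)

# Fit the same kind of model used in the thread
fit <- lm(Net.Time ~ Age, data = sim)

# Prediction by hand: intercept + slope * age
by_hand <- unname(coef(fit)[1] + coef(fit)[2] * 54)

# Same prediction via predict(); newdata must use the column name Age
by_predict <- unname(predict(fit, newdata = data.frame(Age = 54)))

print(c(by_hand = by_hand, by_predict = by_predict))
```

With the real sample, `fit` would be replaced by the model fitted on `dfs`.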
AlexisBrandt
 df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
 df2 = subset(df, Gender == "M" & Age > 25)
set.seed(1)
dfs = df2[sample(length(df2$Age), 200),]
plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.628 -14.623  -4.204  10.167  68.195 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  59.2572     5.5759  10.627   <2e-16 ***
Age           0.2743     0.1170   2.343   0.0201 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.71 on 198 degrees of freedom
Multiple R-squared:  0.02699,	Adjusted R-squared:  0.02207 
F-statistic: 5.491 on 1 and 198 DF,  p-value: 0.0201

y=.2743x + 59.2572

Weak positive correlation (R-squared is only about 0.027)
p- value = .0201

Net Time at 54 would be 74.0694
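The strength of the correlation can be gauged from the R-squared value in the summary: in a one-predictor regression, the correlation coefficient is the square root of R-squared, with the sign of the slope. A minimal sketch using the figures printed above (variable names are illustrative):

```r
# Correlation coefficient recovered from the summary output.
r_squared <- 0.02699   # Multiple R-squared
slope     <- 0.2743    # Age coefficient (gives the sign of r)

r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))
```

This gives r of roughly 0.16, so age accounts for under 3% of the variation in net time in this sample.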

mdavis9
**Code**
df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(11) # Your seed is set to your position in the Groovy Class Randomizer
dfs = df2[sample(length(df2$Age), 200),]
plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

**Response**
Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
    Min      1Q  Median      3Q     Max 
-32.542 -12.360  -3.977   8.637  68.203 

Coefficients:
            Estimate Std. Error t value
(Intercept)  59.6804     5.2066  11.463
Age           0.1905     0.1105   1.725
            Pr(>|t|)    
(Intercept)   <2e-16 ***
Age           0.0861 .  
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.94 on 198 degrees of freedom
Multiple R-squared:  0.0148,  Adjusted R-squared:  0.009825
F-statistic: 2.975 on 1 and 198 DF,  p-value: 0.08614
Rplot

The P-Value is 0.08614. Since that is above 0.05, we would not reject the null at the 95% level. The model looks moderately linear.

Your time would be 0.1905(54) + 59.6804 = 69.97
robin

Code Used

 df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(14)
dfs = df2[sample(length(df2$Age), 200), ]
peach_tree.fit = lm(Net.Time~Age, data = dfs)
summary(peach_tree.fit)
plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = peach_tree.fit)

Output

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-31.849 -13.728  -4.374   9.851  78.425 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  56.9369     5.3310  10.680  < 2e-16 ***
Age           0.3081     0.1121   2.748  0.00656 ** 

Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.82 on 198 degrees of freedom
Multiple R-squared:  0.03673,	Adjusted R-squared:  0.03186 
F-statistic: 7.549 on 1 and 198 DF,  p-value: 0.006559

peachtree 2015

Interpretation

The linear model is Y = 0.3081X + 56.9369.
The p-value is 0.00656, indicating correlation between age and time.
The estimated time for a 54 year old is 73.5743, the estimated time for a 55 year old is 73.8824. Less than 1/2 of a minute difference.
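The two predictions, and the gap between them, can be computed in one step. This sketch plugs in the coefficients from the output above; the variable names are illustrative. The one-year difference is simply the slope of the model.

```r
# Predicted net times at ages 54 and 55 from the fitted line.
intercept <- 56.9369
slope     <- 0.3081
ages      <- c(54, 55)

predicted <- intercept + slope * ages
print(predicted)

# Difference between consecutive ages equals the slope
one_year_change <- diff(predicted)
print(one_year_change)
```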

albeatty
df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(2)
dfs = df2[sample(length(df2$Age), 200),]

plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-34.644 -15.289  -6.054  10.118  58.652 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  52.1785     5.6388   9.254  < 2e-16 ***
Age           0.4258     0.1193   3.569  0.00045 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.27 on 198 degrees of freedom
Multiple R-squared:  0.06043,  Adjusted R-squared:  0.05568 
F-statistic: 12.73 on 1 and 198 DF,  p-value: 0.0004503

plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = plot.reg, col = "black")

Rplot

  • It’s a nice positive linear model!
  • p-value = 0.00045 (would reject the null at 95%)
  • I don’t know how old you are, but you don’t need to put yourself down with such self-degrading language, man.
  • Now that I know you are 54, I can tell you your net time is probably 75.1717 this year and will be 75.5975 next year.
KBC2019

Code Used:
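df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')  # load the data first
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(7)
dfs = df2[sample(length(df2$Age), 200),]
plot.reg = lm(Net.Time~Age, data = dfs)  # fit the linear model
summary(plot.reg)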

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
    Min      1Q  Median      3Q     Max 
-32.542 -12.360  -3.977   8.637  68.203 

Coefficients:
            Estimate Std. Error t value
(Intercept)  59.6804     5.2066  11.463
Age           0.1905     0.1105   1.725
            Pr(>|t|)    
(Intercept)   <2e-16 ***
Age           0.0861 .  
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.94 on 198 degrees of freedom
Multiple R-squared: 0.0148, Adjusted R-squared: 0.009825
F-statistic: 2.975 on 1 and 198 DF, p-value: 0.08614

You would fail to reject the null hypothesis at the 95% level, since the p-value of 0.0861 is greater than 0.05.
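One way to check a decision at the 95% level is to build a 95% confidence interval for the slope and see whether it contains zero. The sketch below uses the estimate, standard error, and degrees of freedom printed in the output above; the variable names are illustrative.

```r
# 95% confidence interval for the Age coefficient.
estimate  <- 0.1905   # slope estimate
std_error <- 0.1105   # its standard error
df_resid  <- 198      # residual degrees of freedom

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

ci <- estimate + c(-1, 1) * t_crit * std_error
print(ci)

# If zero lies inside the interval, the null (slope = 0) is not rejected at 95%
contains_zero <- ci[1] < 0 && ci[2] > 0
print(contains_zero)
```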

ktaylor4

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(8)
dfs = df2[sample(length(df2$Age), 200),]
plot.reg= lm(Net.Time~Age, data = dfs)
summary(plot.reg)

Results

Call:
 lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-36.786 -16.841  -5.380   9.661  94.961 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  55.3646     6.4896   8.531 3.74e-15 ***
Age           0.3973     0.1398   2.842  0.00495 ** 

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.2 on 198 degrees of freedom
Multiple R-squared:  0.0392,	Adjusted R-squared:  0.03434 
F-statistic: 8.077 on 1 and 198 DF,  p-value: 0.004952

Code

plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = plot.reg, col = "black")

Interpretation
The linear model: y = 0.3973x + 55.3646
The p-value is 0.00495; therefore, there is evidence of a relationship between age and time.
You should complete the race at 76.82 minutes.

Henry

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(3) # Your seed is set to your position in the Groovy Class Randomizer
dfs = df2[sample(length(df2$Age), 200),]
plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)
plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "navy", pch = "red")
abline(reg = plot.reg, col = "black")

Return:

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-31.617 -13.933  -3.666  11.684  62.310 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  57.6962     4.9084  11.755  < 2e-16 ***
Age           0.2819     0.1050   2.684  0.00788 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.24 on 198 degrees of freedom
Multiple R-squared:  0.03512,	Adjusted R-squared:  0.03024 
F-statistic: 7.206 on 1 and 198 DF,  p-value: 0.00788

Rplot2
Data Interpretation:

Linear model – y = 0.2819x + 57.6962
P-val = 0.00788, thus we reject the null, supporting the positive correlation between age and time.
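With a single predictor, the F-statistic at the bottom of the summary is the square of the t value for Age, which is why the two p-values agree. A quick sketch using the rounded figures from the output (names are illustrative, and rounding makes the match approximate):

```r
# Relationship between the t value for Age and the overall F-statistic.
t_value <- 2.684   # t value for Age (rounded)
f_stat  <- 7.206   # F-statistic on 1 and 198 DF (rounded)

f_from_t <- t_value^2
print(f_from_t)

# Upper-tail p-value of the F-statistic
p_from_f <- pf(f_stat, df1 = 1, df2 = 198, lower.tail = FALSE)
print(p_from_f)
```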

mmealie

Code

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(0) # Your seed is set to your position in the Groovy Class Randomizer
dfs = df2[sample(length(df2$Age), 200),]
plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

Output

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-29.795 -13.953  -3.747   9.972  62.493 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  51.5361     5.3545   9.625  < 2e-16 ***
Age           0.4485     0.1137   3.944 0.000111 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.34 on 198 degrees of freedom
Multiple R-squared:  0.07284,	Adjusted R-squared:  0.06816 
F-statistic: 15.56 on 1 and 198 DF,  p-value: 0.0001112

The Linear Model
y = .4485x + 51.5361
p-value = 0.0001112; therefore we reject the null hypothesis and accept the alternative hypothesis that there is a correlation between age and time.

An old (54) statistician would run the race in 75.7551 minutes.

Lumpyhead00

Code:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(12)
dfs = df2[sample(length(df2$Age), 200),]

plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)


plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = plot.reg, col = "black")

Output:

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-34.066 -13.593  -3.643  11.173  77.135 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)   47.762      4.948   9.653  < 2e-16 ***
Age            0.539      0.106   5.083 8.57e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.98 on 198 degrees of freedom
Multiple R-squared:  0.1154,	Adjusted R-squared:  0.111 
F-statistic: 25.84 on 1 and 198 DF,  p-value: 8.567e-07

Rplot

The Linear Model
y=0.539x + 47.762
Based on the p-value and the chart I would reject the null and accept the alternate hypothesis because there is a correlation between age and time.

Mark’s Time:

y=0.539(54)+47.762

y=76.87 or 76 mins and 52 sec and much slower
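The conversion from decimal minutes to minutes and seconds can be sketched as follows, using the coefficients from the output above. The variable names are illustrative.

```r
# Convert a predicted time in decimal minutes to minutes and seconds.
minutes_decimal <- 0.539 * 54 + 47.762

whole_minutes <- floor(minutes_decimal)
# The fractional part of a minute, expressed in seconds
seconds <- round((minutes_decimal - whole_minutes) * 60)

cat(sprintf("%d min %d sec\n", as.integer(whole_minutes), as.integer(seconds)))
```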

jmahan

Code for Summary Table

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(14)
dfs = df2[sample(length(df2$Age), 200),]
plot.reg = lm(Net.Time~Age, data = dfs)
summary(plot.reg)

Summary table output

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-31.849 -13.728  -4.374   9.851  78.425 

Coefficients:
        Estimate Std. Error t value
(Intercept)  56.9369     5.3310  10.680
Age           0.3081     0.1121   2.748
        Pr(>|t|)    
(Intercept)  < 2e-16 ***
Age          0.00656 ** 
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.82 on 198 degrees of freedom
Multiple R-squared:  0.03673,	Adjusted R-squared:  0.03186 
F-statistic: 7.549 on 1 and 198 DF,  p-value: 0.006559

Code for Graphical Output

 plot(Net.Time~Age, data = dfs, xlab = "Age", ylab = "Net.Time", col = "red", pch = "red")
abline(reg = plot.reg, col = "black")

Graphical Output
TimevsAge

Interpretation

The model presents a moderate trend for comparing age and times of males older than 25.

p-value: 0.006559

Next year’s estimated time for a 54 year old statistics teacher would be 73.57.
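How much of the variation the trend explains can be read off the R-squared lines of the summary. As a sketch, the adjusted R-squared can be recomputed from the multiple R-squared, the sample size of 200, and the single predictor; the variable names are illustrative.

```r
# Adjusted R-squared from the multiple R-squared in the summary.
r_squared <- 0.03673   # Multiple R-squared
n <- 200               # sample size
p <- 1                 # number of predictors (Age)

adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(adj_r_squared)

# Share of the variation in net time explained by age, as a percentage
print(round(100 * r_squared, 1))
```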

mark

So, what’s my time gonna be??
