An archived instance of Mark's Discourse site as of Tuesday July 18, 2017.

Peachtree linear regression

mark

(10 points)

Use the following code to grab a sample from our Peachtree data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(987654321)
dfs = df2[sample(length(df2$Age), 200),]

Then, run a linear regression on your sample and interpret. Specifically:

  • What is the linear model relating Net Time to Age?
  • Does your model look good?
  • How slow will your poor old statistics professor be next year???
ejoy90

First, Grab the Data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Generate the Subset:

df2 = subset(df, Gender=="M" & Age>25)
set.seed(123456789)
dfs = df2[sample(length(df2$Age),200),]

Run a Linear Regression on Sample:

fit = lm(Net.Time~Age,data=df2)
summary(fit)

##Out:

Call:
lm(formula = Net.Time ~ Age, data = df2)

Residuals:
Min      1Q  Median      3Q     Max 
-40.480 -14.390  -4.341  10.274 157.099 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 51.84285    0.52527   98.70   <2e-16 ***
Age          0.42995    0.01107   38.84   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.99 on 22725 degrees of freedom
Multiple R-squared:  0.06224,   Adjusted R-squared:  0.0622 
F-statistic:  1508 on 1 and 22725 DF,  p-value: < 2.2e-16

Results/Interpretation:

cor(df2$Net.Time, df2$Age)
[1] 0.2494794
Sarcasticswimmer

Get Dat' Sweet Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(¯\_(ツ)_/¯)
dss = df2[sample(length(df2$Age), 200),]

Run a linear regression

fit = lm(Net.Time ~ Age, data=dss)
summary(fit)

Out

Call:
lm(formula = Net.Time ~ Age, data = dss)

Residuals:
Min      1Q  Median      3Q     Max 
-33.765 -15.175  -2.823  10.587  65.287 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  55.6391     5.4883  10.138  < 2e-16 ***
Age           0.3564     0.1134   3.142  0.00193 ** 
--- 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 20.32 on 198 degrees of freedom
Multiple R-squared:  0.0475,    Adjusted R-squared:  0.04269 
F-statistic: 9.875 on 1 and 198 DF,  p-value: 0.001933


plot(dss$Age, dss$Net.Time)
abline(fit)

Conclusions

The model for the professor is
Y = 55.6391 + 0.3564X
where X is age.

If X = 53, the predicted time is about 74.5 minutes.

cor(dss$Net.Time, dss$Age)
[1] 0.2179558

Due to the small correlation, this is a poor model.
There is a weak positive correlation between age and time: as you age, there is a slight increase in net time.
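
As a quick consistency check on the numbers above: in a regression with a single predictor, the Multiple R-squared is the square of the sample correlation. The sketch below only reuses the two values reported in this post; the variable names are made up for illustration.

```r
# Values copied from the post above
r <- 0.2179558                 # cor(dss$Net.Time, dss$Age)
r_squared_reported <- 0.0475   # "Multiple R-squared" from summary(fit)

# With one predictor, R-squared should equal r^2 (up to rounding)
r_squared_from_cor <- r^2
print(round(r_squared_from_cor, 4))
print(abs(r_squared_from_cor - r_squared_reported) < 1e-4)
```

Read this way, age accounts for only about 5% of the variation in net time in this sample.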

Prestonw

Grab the Data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Form a Subset and Sample:

df2 = subset(df, Gender == "M" & Age > 25)
set.seed(987654321)
dfs = df2[sample(length(df2$Age), 200),]

Run a Linear Regression:

fit = lm(Net.Time ~ Age, data=df)
summary(fit)



Call:
lm(formula = Net.Time ~ Age, data = df)

Residuals:
Min      1Q  Median      3Q     Max 
-46.213 -15.981  -4.271  12.696 151.863 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
(Intercept) 65.514898   0.269582  243.02   <2e-16 ***
Age          0.263138   0.006313   41.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.43 on 54794 degrees of freedom
Multiple R-squared:  0.03074,   Adjusted R-squared:  0.03072 
F-statistic:  1738 on 1 and 54794 DF,  p-value: < 2.2e-16

Plot for the Ages

plot(dfs$Age, dfs$Net.Time)
abline(fit)

Interpreted Results:

cor(dfs$Age, dfs$Net.Time)
#Out:
[1] 0.2716388

The linear model is:
Y = 65.514898 + 0.263138*X

This is a good model for the data. There is a positive relationship between Age and Time as an older age leads to a larger time. The older Mark becomes, the slower he will become.
Next year, Mark (53 years old) is predicted to run the race in 79.46 minutes, which is slower than the 79.20 minutes predicted at age 52.
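
The prediction can be reproduced straight from the coefficients in the summary above. This is just a sketch of the arithmetic; `predict_net_time` is a hypothetical helper, not something from the assignment.

```r
# Coefficients copied from the summary output above
intercept <- 65.514898
slope <- 0.263138

# Hypothetical helper: predicted net time (minutes) at a given age
predict_net_time <- function(age) intercept + slope * age

at_52 <- predict_net_time(52)
at_53 <- predict_net_time(53)

print(round(c(age_52 = at_52, age_53 = at_53), 2))
# One more year of age adds exactly one slope's worth of time
print(at_53 - at_52)
```

Under this model the year-to-year change is always the slope, roughly a quarter of a minute.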

Jenna

Grab the Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Form a Subset

df2 = subset(df, Gender == "M" & Age > 25)

Grab a Sample

set.seed(MY_ID)
sample(length(df2$Age), 200)
dfs = df2[sample(length(df2$Age), 200),]

Run a Linear Regression

fit = lm(Net.Time ~ Age, data = dfs)
summary(fit)

#Out: 
Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-33.511 -13.713  -3.114   9.331  63.200 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept) 43.50029    4.73064   9.195  < 2e-16 ***
Age          0.55979    0.09876   5.668 5.05e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.26 on 198 degrees of freedom
Multiple R-squared:  0.1396,    Adjusted R-squared:  0.1353 
F-statistic: 32.13 on 1 and 198 DF,  p-value: 5.049e-08

Plot

qqpic = qqnorm(fit$residuals)

Compute the Correlation:

cor(qqpic$x,qqpic$y)
#Out: 
[1] 0.9667474

Interpret

The correlation is close to 1. The model appears good and the residuals seem approximately normal.
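
A rough sketch of how this Q-Q correlation behaves, using simulated residuals rather than the Peachtree sample. The vectors `normal_resid` and `skewed_resid` and the helper `qq_cor` are made up for illustration. The point is that the value is most useful as a comparison, since clearly skewed data can still give a correlation that looks fairly high.

```r
set.seed(42)

# Simulated stand-ins for residuals (not the Peachtree data)
normal_resid <- rnorm(200)       # roughly normal
skewed_resid <- rexp(200) - 1    # right-skewed, centered near 0

# Hypothetical helper: correlation between theoretical and sample quantiles
qq_cor <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)  # returns list(x = theoretical, y = sample)
  cor(qq$x, qq$y)
}

cor_normal <- qq_cor(normal_resid)
cor_skewed <- qq_cor(skewed_resid)
print(c(normal = cor_normal, skewed = cor_skewed))
```

Comparing the value from `fit$residuals` against a simulated normal sample of the same size gives a more informative benchmark than "close to 1" alone.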

Linear Model

y = 0.55979x + 43.50029

A 53-year-old can expect to run the race in about 73.2 minutes, as opposed to about 72.6 minutes at age 52.

Alison

Grab the Data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Generate the Subset:

df2 = subset(df, Gender == "M" & Age > 25)
set.seed(ID#)  
dfs = df2[sample(length(df2$Age), 200),]

Run a Linear Regression:

fit = lm(Net.Time ~ Age, data = dfs)
summary(fit)

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-29.521 -15.098  -5.361   9.834 110.882 

 Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  56.6281     5.9610    9.50   <2e-16 ***
Age           0.3059     0.1269    2.41   0.0168 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.88 on 198 degrees of freedom
Multiple R-squared:  0.02851,   Adjusted R-squared:  0.0236 
F-statistic:  5.81 on 1 and 198 DF,  p-value: 0.01685

Conclusion:
For the professor:
Y = 56.6281 + 0.3059*X
where X is age.


A 53 year old man could expect to run the race in about 73 minutes.

Amelia

Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(ID)
dfs = df2[sample(length(df2$Age), 200),]

Linear Regression

fit = lm(Net.Time ~ Age, data = df)
summary(fit)

#output
Call:
lm(formula = Net.Time ~ Age, data = df)

Residuals:
Min      1Q  Median      3Q     Max 
-46.213 -15.981  -4.271  12.696 151.863 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
(Intercept) 65.514898   0.269582  243.02   <2e-16 ***
Age          0.263138   0.006313   41.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.43 on 54794 degrees of freedom
Multiple R-squared:  0.03074,   Adjusted R-squared:  0.03072 
F-statistic:  1738 on 1 and 54794 DF,  p-value: < 2.2e-16

Plot

plot(dfs$Age, dfs$Net.Time)

Conclusion

Statistics professor at 53 years old:

65.514898+ 0.263138*53= 79.46121


cor(df2$Net.Time, df2$Age)
#output
0.2494794
Sierra

Grab data, form a subset, and grab a sample:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(MYSTUDENTID)
dfs = df2[sample(length(df2$Age), 200),]

Run a linear regression:

fit = lm(Net.Time ~ Age, data = dfs)
summary(fit)

## Output:
# Call:
# lm(formula = Net.Time ~ Age, data = dfs)

# Residuals:
#     Min      1Q  Median      3Q     Max 
# -35.165 -16.030  -5.322  12.223  75.472 

# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  55.7200     6.0542   9.203  < 2e-16 ***
# Age           0.3796     0.1273   2.981  0.00323 ** 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 21.62 on 198 degrees of freedom
# Multiple R-squared:  0.04296,   Adjusted R-squared:  0.03812 
# F-statistic: 8.887 on 1 and 198 DF,  p-value: 0.003232

Linear model:
y = 0.3796x + 55.7200

Plot the data:

plot(dfs$Age, dfs$Net.Time)
abline(fit)

Run a correlation:

cor(dfs$Age, dfs$Net.Time)

## Output: 
# [1] 0.2072603

Create a Q-Q Plot:

qqnorm(fit$residuals)
qqline(fit$residuals)

Interpretation:

According to the linear model produced, there is a positive relationship between age and net time. As one ages, their net time increases. However, age and net time are weakly correlated in this sample data set, as the correlation value is 0.2072603, which is far from 1. In addition, the best fit line produced does not fit the data on the graph very well, as the data is rather scattered from the line. Also, the Q-Q plot produced shows that the residuals are not normal. These things indicate a relatively poor linear model for the sample data.

Andy

Grab data and create subset

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(930357379)
dfs = df2[sample(length(df2$Age), 200),]

Run Linear Regression

fit = lm(Net.Time ~ Age, data=df)
summary(fit)

Out:

Call:
lm(formula = Net.Time ~ Age, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-46.213 -15.981  -4.271  12.696 151.863 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 65.514898   0.269582  243.02   <2e-16 ***
Age          0.263138   0.006313   41.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.43 on 54794 degrees of freedom
Multiple R-squared:  0.03074,   Adjusted R-squared:  0.03072 
F-statistic:  1738 on 1 and 54794 DF,  p-value: < 2.2e-16

Plot

plot(dfs$Age, dfs$Net.Time)
abline(fit)

Correlation

cor(dfs$Age, dfs$Net.Time)

Out: 0.07298901

Interpretation of data

Y = 65.514898 + 0.263138x
If x = 53, Y = 65.514898 + 0.263138*53 = 79.46

PaulWall

First grab the Peachtree data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Then form a subset of rows corresponding to men over the age of 25

df2 = subset(df, Gender == "M" & Age > 25)

Next, grab a random sample of 200 of the men from the data subset

set.seed(STUDENT_ID)
dfs = df2[sample(length(df2$Age),200),]

Run a linear regression on Net.Time

fit = lm(Net.Time ~ Age, data = dfs)
summary(fit)
##Out:

Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
Min      1Q  Median      3Q     Max 
-34.309 -15.856  -3.522  10.245  83.285 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
(Intercept)  45.2366     6.0128   7.523 1.83e-12 ***
Age           0.6587     0.1302   5.058 9.63e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.19 on 198 degrees of freedom
Multiple R-squared:  0.1144,    Adjusted R-squared:   0.11 
F-statistic: 25.58 on 1 and 198 DF,  p-value: 9.629e-07

plot(dfs$Age, dfs$Net.Time)
abline(fit)

Interpret the results

The least squares fit is: y = 0.6587x + 45.2366
According to this model, Professor Mark McClure will finish the race next year in 80.15 minutes when he is 53 years old. (0.6587*53 + 45.2366)
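
The same kind of prediction can also come from `predict()` instead of typing the coefficients by hand. The sketch below uses a simulated stand-in for the sample (`sim` and its numbers are invented, not Peachtree data), only to show that the two routes agree.

```r
set.seed(1)

# Simulated stand-in for a 200-runner sample (hypothetical numbers)
sim <- data.frame(Age = sample(26:80, 200, replace = TRUE))
sim$Net.Time <- 50 + 0.4 * sim$Age + rnorm(200, sd = 20)

fit_sim <- lm(Net.Time ~ Age, data = sim)

# Prediction at age 53 from the fitted coefficients
by_hand <- unname(coef(fit_sim)[1] + coef(fit_sim)[2] * 53)

# The same prediction from predict(), which needs a data frame with an Age column
by_predict <- unname(predict(fit_sim, newdata = data.frame(Age = 53)))

print(c(by_hand = by_hand, by_predict = by_predict))
```

This relies on the model being fit with `data = ...` and bare column names; a formula written with `$` inside it would not pick up `newdata` this way.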

The small p-value is evidence that Net.Time and Age are not independent.

The data set doesn't have a very high correlation though:

cor(dfs$Age, dfs$Net.Time)
##Out: [1] 0.3382707

There is a lot of noise in the data, which is very evident in the QQ plot of the residuals.

qqpic = qqnorm(fit$residuals)

Overall, the linear regression model isn't good for this data. The correlation is far from 1.

FRD

1. Generating the subset and the sample

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
menOver45<-subset(df,Gender == "M" & Age>45)
set.seed(myID)
randomSample= menOver45[sample(length(menOver45$Age),200),]

2. Generating the linear regression

linearRegression=lm(randomSample$Net.Time~randomSample$Age)
linearRegression
Out:
Call:
lm(formula = randomSample$Net.Time ~ randomSample$Age)

Coefficients:
     (Intercept)  randomSample$Age  
         35.1097            0.7354

3. Linear model relating net time to age

$y(x)=0.7354(x) + 35.1097$

4. Evaluation of the linear model

Graph

plot(randomSample$Age,randomSample$Net.Time)
abline(linearRegression)


Residuals
qqnorm(linearRegression$residuals)
qqline(linearRegression$residuals)

Correlation




cor(randomSample$Net.Time,randomSample$Age)
[1] 0.2449124

Conclusion
According to the Q-Q plot and the correlation, the model does not seem appropriate for this data.

5. Prediction

The net time of a 54-year-old male runner is approximately 74.82 minutes (0.7354*54 + 35.1097).

wolfpack77

First, grab a sample from the Peachtree Data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(student_ID)
dfs = df2[sample(length(df2$Age),200),]

Then, run a linear regression on your sample and interpret:

fit = lm(Net.Time ~ Age, data=dfs)
summary(fit)

#Out
Call:
lm(formula = Net.Time ~ Age, data = dfs)

Residuals:
    Min      1Q  Median      3Q     Max 
-35.273 -14.478  -4.030   9.806  78.697 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  54.0835     5.6157   9.631  < 2e-16 ***
Age           0.4352     0.1184   3.677 0.000304 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 20.55 on 198 degrees of freedom
Multiple R-squared:  0.06392,   Adjusted R-squared:  0.05919 
F-statistic: 13.52 on 1 and 198 DF,  p-value: 0.000304

What is the linear model relating Net Time to Age?

Y = 0.4352*X + 54.0835

plot(dfs$Age, dfs$Net.Time)
abline(fit)

Does your model look good?

Check the residuals for normality by creating a QQ plot:

qqpic = qqnorm(fit$residuals)
qqline(fit$residuals)

Compute the linear correlation from the R correlation command:

cor(qqpic$x,qqpic$y)
#Out
[1] 0.9670869

Since the correlation is close to 1, the Q-Q plot is nearly linear, which suggests that the residuals are nearly normal.

How slow will your poor old statistics professor be next year???

According to this sample from the overall data, a 53 year old male would expect a time of 77.15 minutes.

YOU_SHALL_NEVER_KNOW

Getting Dat Peachy Trees

df =read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Sample of Random, Older Men

df2 = subset(df, Gender == "M" & Age > 25)
set.seed(LOOK_AWAY_PEASENTS)
dfs = df2[sample(length(df2$Age), 200),]

Let's Make Dat Linear Regression

fit = lm(Net.Time~Age, data = dfs)
summary(fit)

##
## Call:
## lm(formula = Net.Time ~ Age, data = dfs)

## Residuals:
## Min      1Q  Median      3Q     Max 
## -35.112 -15.805  -4.554  14.573  54.418 

## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  51.1958     5.5960   9.149  < 2e-16 ***
## Age           0.4693     0.1215   3.864 0.000151 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## Residual standard error: 20.07 on 198 degrees of freedom
## Multiple R-squared:  0.07012,   Adjusted R-squared:  0.06543 
## F-statistic: 14.93 on 1 and 198 DF,  p-value: 0.000151

A Plot for the Ages

plot(dfs$Age, dfs$Net.Time)
abline(fit)

What's That Linear Model?!

y = 0.4693x + 51.1958

What About Dem Residuals?

qqpic = qqnorm(fit$residuals)

So What About Correlations???

cor(dfs$Age, dfs$Net.Time)

## Out:
## [1] 0.2648088

Conclusions???????????

This is such a noisy plot, my ears hurt.
The correlation is so low; it's a terrible representation of the data.

For a 43 year old man, according to this horrible sample:

y = 0.4693(43) + 51.1958
y = 71.3757

Seems like a somewhat average time. I suppose this person won't slow down all that much, if at all.
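
To put a number on how noisy that prediction is, here is a back-of-the-envelope band of about two residual standard errors around it. It reuses only the figures from this post, and it is a rough sketch: it ignores the uncertainty in the coefficients and assumes roughly normal residuals, which may not hold here.

```r
# Figures copied from the post above
y_hat <- 0.4693 * 43 + 51.1958   # predicted net time at age 43
s <- 20.07                        # residual standard error from summary(fit)

# Rough band: prediction plus or minus two residual standard errors
lower <- y_hat - 2 * s
upper <- y_hat + 2 * s
print(round(c(lower = lower, fit = y_hat, upper = upper), 2))
```

A band from roughly half an hour to nearly two hours says the line tells you little about any one runner, even though the slope is clearly positive.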

kd95
 df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
 df2 = subset(df, Gender == "M" & Age > 25)
 set.seed(911111111)
 dfs = df2[sample(length(df2$Age), 200),]

Graph -

plot(dfs$Age, dfs$Clock.Time)
fit = lm(Clock.Time ~ Age, data=dfs)
abline(fit)

Summary -

 Call:
 lm(formula = Clock.Time ~ Age, data = dfs)

 Residuals:
    Min     1Q Median     3Q    Max 
 -93.54 -49.11 -14.03  52.52 124.46 

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
 (Intercept)  89.1776    15.7414   5.665 5.12e-08 ***
 Age           0.7308     0.3362   2.173   0.0309 *  
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 56.59 on 198 degrees of freedom
 Multiple R-squared:  0.0233,    Adjusted R-squared:  0.01837 
 F-statistic: 4.724 on 1 and 198 DF,  p-value: 0.03093

Correlation -

 cor(dfs$Clock.Time, dfs$Age)
 # [1] 0.1526531

Q-Q Residuals -

qqpic = qqnorm(fit$residuals)

The Q-Q graph of residuals shows that the residuals are not extremely normal (it lies loosely along a line, but has a significant wave to it), so I'm not sure that this is a great data set to apply a linear regression to. The correlation for this sample is very low.

Bellaj

First grabbing some data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Subset

df2 = subset(df, Gender == "M" & Age > 25)
set.seed(STUDENT ID)
 dfs = df2[sample(length(df2$Age), 200),]

Linear Regression

fit = lm(Net.Time~Age,data=df)
summary(fit)

##Out

Call:
lm(formula = Net.Time ~ Age, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-46.213 -15.981  -4.271  12.696 151.863 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
(Intercept) 65.514898   0.269582  243.02   <2e-16 ***
Age          0.263138   0.006313   41.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.43 on 54794 degrees of freedom
Multiple R-squared:  0.03074,   Adjusted R-squared:  0.03072 
F-statistic:  1738 on 1 and 54794 DF,  p-value: < 2.2e-16

Plot

plot(dfs$Age, dfs$Net.Time)
abline(fit)

QQ Plot

Correlation

cor(dfs$Net.Time, dfs$Age)
0.3240449

Conclusion

y = 65.514898 + 0.263138x
Not a good correlation since .3240449 is not that close to 1
Not a good plot.

Kristian

Data Entry:

Grabbing the data:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Subset of Men, >25:

df2 = subset(df, Gender == "M" & Age > 25)
set.seed(student_ID)
dfs = df2[sample(length(df2$Age), 200),]

Linear Regression:

fit = lm(Net.Time ~ Age, data=df)
summary(fit)

Call:
lm(formula = Net.Time ~ Age, data = df)

Residuals:
Min      1Q  Median      3Q     Max 
-46.213 -15.981  -4.271  12.696 151.863 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
 (Intercept) 65.514898   0.269582  243.02   <2e-16 ***
 Age          0.263138   0.006313   41.69   <2e-16 ***
 ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.43 on 54794 degrees of freedom
Multiple R-squared:  0.03074,   Adjusted R-squared:  0.03072 
F-statistic:  1738 on 1 and 54794 DF,  p-value: < 2.2e-16

Here's My Plot:

dfs= df2[sample(length(df2$Age), 200),]
plot(dfs$Age, dfs$Net.Time)
fit = lm(Net.Time ~ Age, data = dfs)
abline(fit)
cor(dfs$Net.Time, dfs$Age)

Interpret the Results:

Here's the formula for my line:

y = 0.263138x + 65.514898

My correlation:

cor(dfs$Net.Time, dfs$Age)

##Out

 0.183992

There is a positive relationship between age and net time within my data set: as the men in this data set age, their times get slower. The line is not a good model, though, as there is a lot of "noise" in the data. My correlation is close to 0, which tells me that although there is a positive relationship, it is weak. So yes, they're related, but not very strongly. A low correlation means there are many men whose running times are affected only slightly, or not at all, by age. So this is pretty good news for Professor McClure: according to my data, there is no strong evidence that he will get slower by next year.

monehish

Grab Data & Generate Subset:

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(930XXXXXX)
dfs = df2[sample(length(df2$Age), 200),]

Run a Linear Regression

fit = lm(Net.Time~Age,data=df)
summary(fit)

##Out
Call:
lm(formula = Net.Time ~ Age, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-46.213 -15.981  -4.271  12.696 151.863 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
(Intercept) 65.514898   0.269582  243.02   <2e-16 ***
Age          0.263138   0.006313   41.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.43 on 54794 degrees of freedom
Multiple R-squared:  0.03074,   Adjusted R-squared:  0.03072 
F-statistic:  1738 on 1 and 54794 DF,  p-value: < 2.2e-16

Generate a Scatterplot:

plot(dfs$Age, dfs$Net.Time)
fit = lm(Net.Time ~ Age, data = dfs)
abline(fit)

Interpretation:

cor(dfs$Age, dfs$Net.Time)

correlation = 0.2442404

y = 65.514898 + 0.263138X

Conclusion:

The linear model shows a positive relationship between age and finish time but the correlation is lower than expected. So we can conclude that the two are related, but the significance of it is questionable. This is likely due to all the outliers present in the data set. Even still, it seems likely that men generally will slow down with age.
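
One way to separate "weak" from "not significant" is the standard t-test for a correlation, computed here from the reported value and the sample size of 200. This is a sketch using only those two numbers; the variable names are made up.

```r
r <- 0.2442404   # sample correlation reported above
n <- 200         # sample size from the assignment

# t statistic and two-sided p-value for H0: true correlation is 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(t = t_stat, p = p_value, r_squared = r^2))
```

The small p-value suggests the relationship is unlikely to be chance, while the small r^2 says age explains little of the spread in times. Both can be true at once.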

not_sam

Get Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Form Subset

df2 = subset(df, Gender == "M" & Age > 25)

Get Sample

set.seed(930******)
sample(length(df2$Age), 200)
dfs = df2[sample(length(df2$Age), 200),]

Call

Call:
lm(formula = Net.Time ~ Age, data = df2)

Residuals:
    Min      1Q  Median      3Q     Max 
-40.480 -14.390  -4.341  10.274 157.099 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 51.84285    0.52527   98.70   <2e-16 ***
Age          0.42995    0.01107   38.84   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.99 on 22725 degrees of freedom
Multiple R-squared:  0.06224,   Adjusted R-squared:  0.0622 
F-statistic:  1508 on 1 and 22725 DF,  p-value: < 2.2e-16

y= 0.42995x + 51.84285

Residuals

Correlations

cor(dfs$Age, dfs$Net.Time)
[1] 0.2912386

Conclusion,
Alright, not great. Strangely vertical.

Nonamaker

Get Data

df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')

Generate Subset

df2 = subset(df, Gender == "M" & Age > 25)
set.seed(*********)
dfs = df2[sample(length(df2$Age), 200),]

Run Linear Regression

fit = lm(Net.Time~Age,data=df2)
summary(fit)

Call:
lm(formula = Net.Time ~ Age, data = df2)

Residuals:
    Min      1Q  Median      3Q     Max 
-40.480 -14.390  -4.341  10.274 157.099 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 51.84285    0.52527   98.70   <2e-16 ***
Age          0.42995    0.01107   38.84   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.99 on 22725 degrees of freedom
Multiple R-squared:  0.06224,   Adjusted R-squared:  0.0622 
F-statistic:  1508 on 1 and 22725 DF,  p-value: < 2.2e-16


Plot

plot(dfs$Age,dfs$Net.Time)
abline(fit)


Interpretation

cor(df2$Net.Time,df2$Age)

0.2494794 (weak positive correlation)

Y(X) = 51.84285 + 0.42995*X
Y(53) = 74.6302

qqpic = qqnorm(fit$residuals)


The residuals are not normally distributed, so the linear regression is probably unreliable for this data.