Data Entry:
Grabbing the data:
df = read.csv('https://www.marksmath.org/data/peach_tree2015.csv')
Subset of men over 25:
df2 = subset(df, Gender == "M" & Age > 25)
set.seed(student_ID)
dfs = df2[sample(length(df2$Age), 200),]
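The sampling step above is reproducible only because the seed is set right before `sample()` is called. Here is a minimal sketch of that idea, assuming a made-up data frame `toy` and a hypothetical seed `12345` (neither is from the assignment):

```r
# Toy stand-in for the filtered data frame; the values are made up.
toy <- data.frame(Age = 26:75, Net.Time = seq(60, 109, length.out = 50))

seed_value <- 12345  # hypothetical seed, standing in for a real student ID

# Draw 10 random rows, resetting the seed before each draw.
set.seed(seed_value)
first_draw <- toy[sample(nrow(toy), 10), ]

set.seed(seed_value)
second_draw <- toy[sample(nrow(toy), 10), ]

# Same seed before each draw, so the same rows come back.
identical(first_draw, second_draw)
```

Calling `sample()` again without resetting the seed would generally return a different set of rows, so the fit and the plot should both use one seeded sample.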
Linear Regression:
fit = lm(Net.Time ~ Age, data = dfs)
summary(fit)
Call:
lm(formula = Net.Time ~ Age, data = df)
Residuals:
    Min      1Q  Median      3Q     Max
-46.213 -15.981  -4.271  12.696 151.863
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 65.514898   0.269582  243.02   <2e-16 ***
Age          0.263138   0.006313   41.69   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 21.43 on 54794 degrees of freedom
Multiple R-squared: 0.03074, Adjusted R-squared: 0.03072
F-statistic: 1738 on 1 and 54794 DF, p-value: < 2.2e-16
Here's My Plot:
dfs = df2[sample(length(df2$Age), 200),]
plot(dfs$Age, dfs$Net.Time)
fit = lm(Net.Time ~ Age, data = dfs)
abline(fit)
cor(dfs$Net.Time, dfs$Age)
Interpret the Results:
Here's the formula for my line:
y = 0.263138x + 65.514898
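To see what this line predicts, here is a small sketch that plugs an age into the formula. The coefficients are copied from the line above. The helper `predict_net_time` and the age of 40 are only illustrative assumptions, and the result is in whatever units `Net.Time` uses (presumably minutes).

```r
# Coefficients copied from the fitted line above.
slope <- 0.263138
intercept <- 65.514898

# Hypothetical helper: predicted net time for a given age.
predict_net_time <- function(age) slope * age + intercept

age_now <- 40  # illustrative age, not taken from the data
pred_now <- predict_net_time(age_now)
pred_next <- predict_net_time(age_now + 1)

pred_now              # predicted net time at age 40
pred_next - pred_now  # one more year of age adds the slope
```

One extra year of age changes the prediction by exactly the slope, about 0.26, which is small compared with the residual standard error of 21.43 reported in the summary.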
My correlation:
cor(dfs$Net.Time, dfs$Age)
##Out
0.183992
There is a positive relationship between age and net time in my data set: older men tend to have slower (higher) net times. However, the line is not a good model because the data is very noisy. My correlation is close to 0, so although the relationship is positive, it is weak. Age and net time are related, but not strongly. A low correlation means that most of the variation in running times is not explained by age, so many men run much faster or slower than the line predicts for their age. This is pretty good news for Professor McClure: my data does not give strong evidence that he will get noticeably slower by next year.
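One way to put a number on "not very strongly" is to square the correlation. In a regression with a single predictor, the squared correlation equals the R-squared that `summary()` reports. The sketch below checks this on simulated data (the seed, sample size, and noise level are all made up for illustration) and then squares the correlation reported above.

```r
set.seed(1)  # arbitrary seed for the simulated data
age <- sample(26:80, 200, replace = TRUE)
net_time <- 65 + 0.26 * age + rnorm(200, sd = 20)  # made-up noisy relationship

fit_sim <- lm(net_time ~ age)
r <- cor(net_time, age)

r^2                         # squared correlation
summary(fit_sim)$r.squared  # same value for a one-predictor regression

0.183992^2  # the correlation reported above, squared
```

Squaring 0.183992 gives roughly 0.034, so under this reading age accounts for only about 3% of the variation in net time in the sample.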