An archive of the questions from Mark's Fall 2018 Stat 225.

Interpreting linear regression

Mark

I ran a regression to determine if there is a relation between age and race time using the following code:

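from scipy.stats import linregress
import pandas as pd

# Load the 2015 race results and keep only the men
df = pd.read_csv('https://www.marksmath.org/data/peach_tree2015.csv')
df_men = df[df.Gender == 'M']

# Random sample of 75 runners; random_state fixes the sample so it is reproducible
sam = df_men.sample(75, random_state=3)

# Regress net time on age
linregress(sam.Age, sam['Net Time'])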

I generated the following output:

LinregressResult(slope=0.35432206515177694, intercept=52.826891391060116, rvalue=0.2550867549750977, pvalue=0.02719625032021964, stderr=0.15719494672098736)
  1. At a 99% level of confidence, is there a genuine relationship between Age and Net Time?
  2. What net time does the data predict for a 50 year old?
vscala
  1. At a 99% confidence level, we fail to reject the null hypothesis that there is no correlation between Age and Net Time, since the p-value (about 0.027) is greater than 0.01.
  2. The predicted net time for a 50-year-old is about 70.54.
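A minimal sketch, assuming SciPy is available, that checks answer 1 using the numbers from the reported output instead of re-downloading the data. It applies the decision rule "reject only if p < α" with α = 0.01 and, as an equivalent check, builds a 99% confidence interval for the slope from `stderr` and a t distribution with n - 2 = 73 degrees of freedom (the degrees of freedom `linregress` uses for its two-sided p-value). The variable names are illustrative, not from the original post.

```python
from scipy import stats

# Values copied from the LinregressResult reported above (sample of n = 75 runners)
slope = 0.35432206515177694
stderr = 0.15719494672098736
pvalue = 0.02719625032021964
n = 75
alpha = 0.01  # 99% confidence level

# Decision rule: reject H0 (slope = 0) only if the p-value is below alpha
reject = pvalue < alpha
print("reject H0 at the 99% level:", reject)

# Equivalent check: 99% confidence interval for the slope,
# using the t distribution with n - 2 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
lower = slope - t_crit * stderr
upper = slope + t_crit * stderr
print(f"99% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The interval contains 0, which is the same conclusion as comparing the p-value with 0.01: at this confidence level the data do not rule out a slope of zero.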
Mark

This looks pretty good; just one comment:

On a test, I would definitely include the computation that gives the prediction in part 2. That is:

$$0.35432 \times 50 + 52.82689 \approx 70.54.$$
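The same arithmetic can be sketched in Python by plugging the full-precision slope and intercept from the output into the fitted line, slope × age + intercept. `predict_net_time` is a hypothetical helper name used only for illustration; it is not part of SciPy.

```python
# Coefficients copied from the LinregressResult reported above
slope = 0.35432206515177694
intercept = 52.826891391060116

def predict_net_time(age):
    """Predicted net time from the fitted line: slope * age + intercept."""
    return slope * age + intercept

print(round(predict_net_time(50), 2))  # 70.54

# Rounding both coefficients to three decimals first shifts the answer slightly
print(round(0.354 * 50 + 52.826, 2))  # 70.53
```

Carrying a few more digits of the coefficients keeps the hand computation in agreement with the 70.54 reported in the answer.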