You might recall that we talked about linear regression a couple of months ago. A simple example is given by this look at our CDC data relating height and weight:
As we mentioned before, the correlation of about \(0.42\) is a quantitative assessment of the relationship between the variables, and the formula \(W = 4.87H - 154.24\) yields an estimate of the weight \(W\) in terms of the height \(H\).
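To see a concrete number, we can plug a height into that formula (heights here are in inches and weights in pounds, as in the rest of our CDC data; the value of 70 inches is just an illustrative choice):

H = 70
4.87 * H - 154.24  # predicted weight in pounds for a 70 inch tall man
## [1] 186.66

That is, the model estimates a weight of about 187 pounds for a man who is 70 inches tall.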
Here’s the thing, though: if you look back at those previous notes, you’ll find slightly different numbers. The reason is that the data is a random sample of the \(20,000\) men in the study; if we take a different random sample, we’ll get different numbers. Thus, the coefficients in linear regression can be treated as sample statistics, so they come with standard errors and \(p\)-values.
Let’s discuss how we might interpret the following:
set.seed(1)  # for a reproducible random sample
cdc = read.csv("https://www.marksmath.org/data/cdc.csv")
men = subset(cdc, gender == 'm')
subset = men[sample(1:length(men$height), 50), ]  # random sample of 50 men
cdc_fit = lm(subset$weight ~ subset$height)       # regress weight on height
summary(cdc_fit)
##
## Call:
## lm(formula = subset$weight ~ subset$height)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -73.120 -23.716  -8.848  17.896  93.392
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -122.182    144.958  -0.843   0.4035
## subset$height    4.504      2.043   2.205   0.0323 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.3 on 48 degrees of freedom
## Multiple R-squared: 0.09198, Adjusted R-squared: 0.07307
## F-statistic: 4.862 on 1 and 48 DF, p-value: 0.03227
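The slope estimate of \(4.504\) comes with a standard error of about \(2.04\) and a \(p\)-value of about \(0.032\), which quantify exactly the sample-to-sample variability described above. As a supplementary sketch (not part of the original analysis), we can redraw a few random samples of 50 men and refit the line to watch the slope estimate bounce around, and then use R's confint to turn the standard error into an interval estimate for the slope of our original fit:

# Refit the regression on several fresh random samples of 50 men;
# the estimated slope changes from sample to sample.
slopes = replicate(5, {
  s = men[sample(1:nrow(men), 50), ]
  coef(lm(weight ~ height, data = s))[2]
})
slopes

# 95% confidence intervals for the coefficients of the original fit
confint(cdc_fit)

For the slope, the interval works out to roughly \(4.504 \pm 2.01 \times 2.043\) (using the \(t^*\) value for 48 degrees of freedom), or about \((0.4, 8.6)\). Since this interval excludes zero (equivalently, since the \(p\)-value is below \(0.05\)), the sample provides evidence of a positive relationship between height and weight.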