Here’s a fun little article: https://theathletic.com/89525/2017/08/29/the-four-stats-that-matter-to-college-football-coaches/
It’s behind a paywall, so most folks can probably see only the beginning. Here’s the main claim, though: among the myriad stats that college football coaches track to gauge their team’s strength, four stand out as highly correlated with wins and losses:
Moreover, these are more highly correlated with winning percentage than are other oft-cited stats, such as total offense, pass defense or total defense.
As usual, I’ve got a CSV file that stores all sorts of relevant data on my webpage. This particular data set was scraped off of http://www.cfbstats.com/. Let’s read it in and examine the column variables.
df = read.csv('~/CFB2016.csv')
names(df)
## [1] "X"
## [2] "Team"
## [3] "ID"
## [4] "Wins"
## [5] "Losses"
## [6] "TEAM.Scoring..Points.Game"
## [7] "OPP.Scoring..Points.Game"
## [8] "TEAM.Total.Points"
## [9] "OPP.Total.Points"
## [10] "TEAM.First.Downs..Total"
## [11] "OPP.First.Downs..Total"
## [12] "TEAM.First.Downs..Rushing"
## [13] "TEAM.First.Downs.Passing"
## [14] "TEAM.First.Downs.By.Penalty"
## [15] "OPP.First.Downs..Rushing"
## [16] "OPP.First.Downs.Passing"
## [17] "OPP.First.Downs.By.Penalty"
## [18] "TEAM.Rushing..Yards...Attempt"
## [19] "OPP.Rushing..Yards...Attempt"
## [20] "TEAM.Rushing..Attempts"
## [21] "TEAM.Rushing.Yards"
## [22] "TEAM.Rushing.TD"
## [23] "OPP.Rushing..Attempts"
## [24] "OPP.Rushing.Yards"
## [25] "OPP.Rushing.TD"
## [26] "TEAM.Passing..Rating"
## [27] "OPP.Passing..Rating"
## [28] "TEAM.Passing..Yards"
## [29] "OPP.Passing..Yards"
## [30] "TEAM.Passing..Attempts"
## [31] "TEAM.Passing.Completions"
## [32] "TEAM.Passing.Interceptions"
## [33] "TEAM.Passing.TD"
## [34] "OPP.Passing..Attempts"
## [35] "OPP.Passing.Completions"
## [36] "OPP.Passing.Interceptions"
## [37] "OPP.Passing.TD"
## [38] "TEAM.Total.Offense..Yards...Play"
## [39] "OPP.Total.Offense..Yards...Play"
## [40] "TEAM.Total.Offense..Plays"
## [41] "TEAM.Total.Offense.Yards"
## [42] "OPP.Total.Offense..Plays"
## [43] "OPP.Total.Offense.Yards"
## [44] "TEAM.Punt.Returns..Yards...Return"
## [45] "OPP.Punt.Returns..Yards...Return"
## [46] "TEAM.Punt.Returns..Returns"
## [47] "TEAM.Punt.Returns.Yards"
## [48] "TEAM.Punt.Returns.TD"
## [49] "OPP.Punt.Returns..Returns"
## [50] "OPP.Punt.Returns.Yards"
## [51] "OPP.Punt.Returns.TD"
## [52] "TEAM.Kickoff.Returns..Yards...Return"
## [53] "OPP.Kickoff.Returns..Yards...Return"
## [54] "TEAM.Kickoff.Returns..Returns"
## [55] "TEAM.Kickoff.Returns.Yards"
## [56] "TEAM.Kickoff.Returns.TD"
## [57] "OPP.Kickoff.Returns..Returns"
## [58] "OPP.Kickoff.Returns.Yards"
## [59] "OPP.Kickoff.Returns.TD"
## [60] "TEAM.Punting..Yards...Punt"
## [61] "OPP.Punting..Yards...Punt"
## [62] "TEAM.Punting..Punts"
## [63] "TEAM.Punting.Yards"
## [64] "OPP.Punting..Punts"
## [65] "OPP.Punting.Yards"
## [66] "TEAM.Interceptions..Returns"
## [67] "TEAM.Interceptions.Yards"
## [68] "TEAM.Interceptions.TD"
## [69] "OPP.Interceptions..Returns"
## [70] "OPP.Interceptions.Yards"
## [71] "OPP.Interceptions.TD"
## [72] "TEAM.Fumbles..Number"
## [73] "TEAM.Fumbles.Lost"
## [74] "OPP.Fumbles..Number"
## [75] "OPP.Fumbles.Lost"
## [76] "TEAM.Penalties..Number"
## [77] "TEAM.Penalties.Yards"
## [78] "OPP.Penalties..Number"
## [79] "OPP.Penalties.Yards"
## [80] "TEAM.Time.of.Possession"
## [81] "OPP.Time.of.Possession"
## [82] "TEAM.3rd.Down.Conversion.."
## [83] "OPP.3rd.Down.Conversion.."
## [84] "TEAM.3rd.Down.Conversions..Attempts"
## [85] "TEAM.3rd.Down.Conversions"
## [86] "OPP.3rd.Down.Conversions..Attempts"
## [87] "OPP.3rd.Down.Conversions"
## [88] "TEAM.4th.Down.Conversion.."
## [89] "OPP.4th.Down.Conversion.."
## [90] "TEAM.4th.Down.Conversions..Attempts"
## [91] "TEAM.4th.Down.Conversions"
## [92] "OPP.4th.Down.Conversions..Attempts"
## [93] "OPP.4th.Down.Conversions"
## [94] "TEAM.Red.Zone..Success.."
## [95] "OPP.Red.Zone..Success.."
## [96] "TEAM.Red.Zone..Attempts"
## [97] "TEAM.Red.Zone.Scores"
## [98] "OPP.Red.Zone..Attempts"
## [99] "OPP.Red.Zone.Scores"
## [100] "TEAM.Field.Goals..Success.."
## [101] "OPP.Field.Goals..Success.."
## [102] "TEAM.Field.Goals..Attempts"
## [103] "TEAM.Field.Goals.Made"
## [104] "OPP.Field.Goals..Attempts"
## [105] "OPP.Field.Goals.Made"
## [106] "TEAM.PAT.Kicking..Success.."
## [107] "OPP.PAT.Kicking..Success.."
## [108] "TEAM.PAT.Kicking..Attempts"
## [109] "TEAM.PAT.Kicking.Made"
## [110] "OPP.PAT.Kicking..Attempts"
## [111] "OPP.PAT.Kicking.Made"
## [112] "TEAM.2.Point.Conversions..Success.."
## [113] "OPP.2.Point.Conversions..Success.."
## [114] "TEAM.2.Point.Conversions..Attempts"
## [115] "TEAM.2.Point.Conversions.Made"
## [116] "OPP.2.Point.Conversions..Attempts"
## [117] "OPP.2.Point.Conversions.Made"
That’s 117 fields for all 128 FBS teams - quite a lot of data.
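A quick aside on those odd-looking names: the doubled dots and the leading "X" column are most likely artifacts of `read.csv`, which by default (`check.names = TRUE`) runs the headers through `make.names`, turning every character that isn't valid in an R name into a dot and naming a blank header "X". Here is a minimal sketch; the raw header strings are my guess at what the CSV contains, not copied from the file.

```r
# Hypothetical raw headers; the actual header text in the CSV is an assumption.
raw <- c("TEAM Scoring: Points/Game", "TEAM 3rd Down Conversion %")

# make.names() is what read.csv() applies to headers when check.names = TRUE.
clean <- make.names(raw)
print(clean)

# A tiny inline CSV with a blank first header (e.g. a saved row index).
# The row values are invented for illustration.
toy <- read.csv(text = ",Team,TEAM 3rd Down Conversion %\n1,A,48.1%\n")
print(names(toy))
```

If the mangled names get in the way, `read.csv(..., check.names = FALSE)` keeps the headers as written, at the cost of needing backticks to refer to them.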
Let’s take a look at the specific data mentioned in the article and how well it correlates with wins and losses.
wl = df$Wins/(df$Wins + df$Losses)
other = as.numeric(sub("%","",df$TEAM.3rd.Down.Conversion..))
cor(wl,other)
## [1] 0.5325356
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.3749 -2.6549 -0.0299 3.0927 12.8619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.938 1.137 28.977 < 2e-16 ***
## wl 14.376 2.036 7.062 9.8e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.72 on 126 degrees of freedom
## Multiple R-squared: 0.2836, Adjusted R-squared: 0.2779
## F-statistic: 49.88 on 1 and 126 DF, p-value: 9.803e-11
A correlation of 0.53 with a p-value of about \(10^{-10}\).
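Two things in that summary are worth connecting. With a single predictor, the Multiple R-squared is just the square of the correlation (0.5325356 squared is about 0.2836), and the p-value on the slope is the same one a plain correlation test would give. The sketch below checks both on synthetic data; the numbers are made up and only the variable names `wl` and `other` follow the code above.

```r
# Synthetic data only; these values are invented for illustration.
set.seed(1)
wl  <- runif(128)                                   # fake winning percentages
pct <- sprintf("%.1f%%", 33 + 14 * wl + rnorm(128, sd = 4.7))  # strings like "41.3%"

# Strip the percent sign before converting to numeric, as done above.
other <- as.numeric(sub("%", "", pct, fixed = TRUE))

r   <- cor(wl, other)
fit <- lm(other ~ wl)

# For a one-predictor regression, R-squared equals the squared correlation.
print(all.equal(r^2, summary(fit)$r.squared))

# cor.test() gives the same p-value as the slope's t-test in summary(fit).
p_cor <- cor.test(wl, other)$p.value
p_lm  <- summary(fit)$coefficients["wl", "Pr(>|t|)"]
print(all.equal(p_cor, p_lm))
```

Since correlation is symmetric, regressing the stat on winning percentage (as here) or the other way around gives the same correlation and p-value; only the slope and intercept change.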
other = df$TEAM.Passing..Yards/df$TEAM.Passing..Attempts
cor(wl,other)
## [1] 0.5759094
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7375 -0.6068 -0.0883 0.5163 3.5318
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.7423 0.2285 25.128 < 2e-16 ***
## wl 3.2360 0.4092 7.908 1.14e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.949 on 126 degrees of freedom
## Multiple R-squared: 0.3317, Adjusted R-squared: 0.3264
## F-statistic: 62.53 on 1 and 126 DF, p-value: 1.144e-12
other = (df$OPP.Fumbles.Lost-df$TEAM.Fumbles.Lost) + (df$OPP.Passing.Interceptions-df$TEAM.Passing.Interceptions)
cor(wl,other)
## [1] 0.5944906
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.7579 -4.0460 0.3465 3.8089 16.2858
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.524 1.410 -7.465 1.20e-11 ***
## wl 20.951 2.525 8.299 1.38e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.855 on 126 degrees of freedom
## Multiple R-squared: 0.3534, Adjusted R-squared: 0.3483
## F-statistic: 68.87 on 1 and 126 DF, p-value: 1.38e-13
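The stat in this last fit is turnover margin: takeaways (opponent fumbles lost plus opponent interceptions thrown) minus giveaways (our fumbles lost plus our interceptions thrown). A small worked example on a toy data frame, with the same column names as the real file but invented values, shows that this reading matches the expression used above.

```r
# Toy data with the real column names; the values are invented.
toy <- data.frame(
  Team = c("A", "B"),
  TEAM.Fumbles.Lost          = c(5, 10),
  OPP.Fumbles.Lost           = c(9, 6),
  TEAM.Passing.Interceptions = c(7, 14),
  OPP.Passing.Interceptions  = c(15, 8)
)

# Takeaways: turnovers committed by opponents. Giveaways: turnovers we committed.
takeaways <- toy$OPP.Fumbles.Lost + toy$OPP.Passing.Interceptions
giveaways <- toy$TEAM.Fumbles.Lost + toy$TEAM.Passing.Interceptions
margin    <- takeaways - giveaways

# Same quantity, written the way it is computed above.
margin_alt <- (toy$OPP.Fumbles.Lost - toy$TEAM.Fumbles.Lost) +
  (toy$OPP.Passing.Interceptions - toy$TEAM.Passing.Interceptions)

print(data.frame(Team = toy$Team, margin = margin))
```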
other = df$TEAM.Rushing..Attempts
cor(wl,other)
## [1] 0.5782156
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -169.72 -45.21 -5.13 33.83 258.28
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 381.76 17.11 22.317 < 2e-16 ***
## wl 243.69 30.63 7.955 8.86e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 71.04 on 126 degrees of freedom
## Multiple R-squared: 0.3343, Adjusted R-squared: 0.3291
## F-statistic: 63.28 on 1 and 126 DF, p-value: 8.861e-13
other = df$TEAM.Total.Offense..Yards...Play
cor(wl,other)
## [1] 0.5968873
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.43356 -0.42254 0.02848 0.36445 1.44780
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.7774 0.1336 35.747 < 2e-16 ***
## wl 1.9986 0.2393 8.351 1.04e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.555 on 126 degrees of freedom
## Multiple R-squared: 0.3563, Adjusted R-squared: 0.3512
## F-statistic: 69.74 on 1 and 126 DF, p-value: 1.04e-13
other = df$OPP.Total.Offense..Yards...Play
cor(wl,other)
## [1] -0.5384097
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21166 -0.42290 -0.02172 0.36407 1.30834
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.5416 0.1308 50.016 < 2e-16 ***
## wl -1.6798 0.2342 -7.172 5.57e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5431 on 126 degrees of freedom
## Multiple R-squared: 0.2899, Adjusted R-squared: 0.2842
## F-statistic: 51.44 on 1 and 126 DF, p-value: 5.566e-11
Here’s the deal - if you want to win games, then score points.
other = df$TEAM.Total.Points
cor(wl,other)
## [1] 0.7884184
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -138.857 -36.073 -3.224 39.257 189.212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 182.89 14.97 12.22 <2e-16 ***
## wl 385.63 26.80 14.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62.16 on 126 degrees of freedom
## Multiple R-squared: 0.6216, Adjusted R-squared: 0.6186
## F-statistic: 207 on 1 and 126 DF, p-value: < 2.2e-16
Keeping the other team from scoring points seems less important: the correlation with winning percentage is -0.57, versus 0.79 for points scored.
other = df$OPP.Total.Points
cor(wl,other)
## [1] -0.5697005
plot(wl,other)
fit = lm(other~wl)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = other ~ wl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -138.042 -45.520 -4.827 41.272 159.974
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 466.98 15.06 31.000 < 2e-16 ***
## wl -209.90 26.98 -7.781 2.25e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62.56 on 126 degrees of freedom
## Multiple R-squared: 0.3246, Adjusted R-squared: 0.3192
## F-statistic: 60.54 on 1 and 126 DF, p-value: 2.251e-12
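Since every comparison above repeats the same `cor` call, it can be handy to compute all the correlations in one pass and rank them by strength. This is only a sketch on synthetic data: the column names mirror the CSV, but the values are random, so the printed numbers mean nothing. With the real data you would pass the relevant columns of `df` instead of `toy`.

```r
# Synthetic stand-in for the real data; values are random, not real team stats.
set.seed(42)
n  <- 128
wl <- runif(n)                                   # fake winning percentages
toy <- data.frame(
  TEAM.Total.Points      = 180 + 385 * wl + rnorm(n, sd = 60),
  OPP.Total.Points       = 465 - 210 * wl + rnorm(n, sd = 60),
  TEAM.Rushing..Attempts = 380 + 245 * wl + rnorm(n, sd = 70)
)

# Correlate every column with winning percentage in one pass.
cors <- sapply(toy, function(x) cor(wl, x))

# Rank by strength of association, ignoring sign.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```

Ranking on the absolute value puts defensive stats, which correlate negatively with winning, on the same footing as offensive ones.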