CFB Stats compared to Ws and Ls

Here’s a fun little article: https://theathletic.com/89525/2017/08/29/the-four-stats-that-matter-to-college-football-coaches/

It’s behind a paywall, so most folks can probably see only the beginning. Here’s the main claim, though: among the myriad stats that college football coaches track to gauge their team’s strength, four stand out as highly correlated to wins and losses: third down conversion rate, average yards per pass attempt, turnover margin, and rushing attempts.

Moreover, these are said to be more highly correlated to winning percentage than other oft-cited stats, such as total offense, pass defense, or total defense.

As usual, I’ve got a CSV file on my webpage that stores all sorts of relevant data. This particular data set was scraped from http://www.cfbstats.com/. Let’s read it in and examine the column names.

df = read.csv('~/CFB2016.csv')
names(df)
##   [1] "X"                                   
##   [2] "Team"                                
##   [3] "ID"                                  
##   [4] "Wins"                                
##   [5] "Losses"                              
##   [6] "TEAM.Scoring..Points.Game"           
##   [7] "OPP.Scoring..Points.Game"            
##   [8] "TEAM.Total.Points"                   
##   [9] "OPP.Total.Points"                    
##  [10] "TEAM.First.Downs..Total"             
##  [11] "OPP.First.Downs..Total"              
##  [12] "TEAM.First.Downs..Rushing"           
##  [13] "TEAM.First.Downs.Passing"            
##  [14] "TEAM.First.Downs.By.Penalty"         
##  [15] "OPP.First.Downs..Rushing"            
##  [16] "OPP.First.Downs.Passing"             
##  [17] "OPP.First.Downs.By.Penalty"          
##  [18] "TEAM.Rushing..Yards...Attempt"       
##  [19] "OPP.Rushing..Yards...Attempt"        
##  [20] "TEAM.Rushing..Attempts"              
##  [21] "TEAM.Rushing.Yards"                  
##  [22] "TEAM.Rushing.TD"                     
##  [23] "OPP.Rushing..Attempts"               
##  [24] "OPP.Rushing.Yards"                   
##  [25] "OPP.Rushing.TD"                      
##  [26] "TEAM.Passing..Rating"                
##  [27] "OPP.Passing..Rating"                 
##  [28] "TEAM.Passing..Yards"                 
##  [29] "OPP.Passing..Yards"                  
##  [30] "TEAM.Passing..Attempts"              
##  [31] "TEAM.Passing.Completions"            
##  [32] "TEAM.Passing.Interceptions"          
##  [33] "TEAM.Passing.TD"                     
##  [34] "OPP.Passing..Attempts"               
##  [35] "OPP.Passing.Completions"             
##  [36] "OPP.Passing.Interceptions"           
##  [37] "OPP.Passing.TD"                      
##  [38] "TEAM.Total.Offense..Yards...Play"    
##  [39] "OPP.Total.Offense..Yards...Play"     
##  [40] "TEAM.Total.Offense..Plays"           
##  [41] "TEAM.Total.Offense.Yards"            
##  [42] "OPP.Total.Offense..Plays"            
##  [43] "OPP.Total.Offense.Yards"             
##  [44] "TEAM.Punt.Returns..Yards...Return"   
##  [45] "OPP.Punt.Returns..Yards...Return"    
##  [46] "TEAM.Punt.Returns..Returns"          
##  [47] "TEAM.Punt.Returns.Yards"             
##  [48] "TEAM.Punt.Returns.TD"                
##  [49] "OPP.Punt.Returns..Returns"           
##  [50] "OPP.Punt.Returns.Yards"              
##  [51] "OPP.Punt.Returns.TD"                 
##  [52] "TEAM.Kickoff.Returns..Yards...Return"
##  [53] "OPP.Kickoff.Returns..Yards...Return" 
##  [54] "TEAM.Kickoff.Returns..Returns"       
##  [55] "TEAM.Kickoff.Returns.Yards"          
##  [56] "TEAM.Kickoff.Returns.TD"             
##  [57] "OPP.Kickoff.Returns..Returns"        
##  [58] "OPP.Kickoff.Returns.Yards"           
##  [59] "OPP.Kickoff.Returns.TD"              
##  [60] "TEAM.Punting..Yards...Punt"          
##  [61] "OPP.Punting..Yards...Punt"           
##  [62] "TEAM.Punting..Punts"                 
##  [63] "TEAM.Punting.Yards"                  
##  [64] "OPP.Punting..Punts"                  
##  [65] "OPP.Punting.Yards"                   
##  [66] "TEAM.Interceptions..Returns"         
##  [67] "TEAM.Interceptions.Yards"            
##  [68] "TEAM.Interceptions.TD"               
##  [69] "OPP.Interceptions..Returns"          
##  [70] "OPP.Interceptions.Yards"             
##  [71] "OPP.Interceptions.TD"                
##  [72] "TEAM.Fumbles..Number"                
##  [73] "TEAM.Fumbles.Lost"                   
##  [74] "OPP.Fumbles..Number"                 
##  [75] "OPP.Fumbles.Lost"                    
##  [76] "TEAM.Penalties..Number"              
##  [77] "TEAM.Penalties.Yards"                
##  [78] "OPP.Penalties..Number"               
##  [79] "OPP.Penalties.Yards"                 
##  [80] "TEAM.Time.of.Possession"             
##  [81] "OPP.Time.of.Possession"              
##  [82] "TEAM.3rd.Down.Conversion.."          
##  [83] "OPP.3rd.Down.Conversion.."           
##  [84] "TEAM.3rd.Down.Conversions..Attempts" 
##  [85] "TEAM.3rd.Down.Conversions"           
##  [86] "OPP.3rd.Down.Conversions..Attempts"  
##  [87] "OPP.3rd.Down.Conversions"            
##  [88] "TEAM.4th.Down.Conversion.."          
##  [89] "OPP.4th.Down.Conversion.."           
##  [90] "TEAM.4th.Down.Conversions..Attempts" 
##  [91] "TEAM.4th.Down.Conversions"           
##  [92] "OPP.4th.Down.Conversions..Attempts"  
##  [93] "OPP.4th.Down.Conversions"            
##  [94] "TEAM.Red.Zone..Success.."            
##  [95] "OPP.Red.Zone..Success.."             
##  [96] "TEAM.Red.Zone..Attempts"             
##  [97] "TEAM.Red.Zone.Scores"                
##  [98] "OPP.Red.Zone..Attempts"              
##  [99] "OPP.Red.Zone.Scores"                 
## [100] "TEAM.Field.Goals..Success.."         
## [101] "OPP.Field.Goals..Success.."          
## [102] "TEAM.Field.Goals..Attempts"          
## [103] "TEAM.Field.Goals.Made"               
## [104] "OPP.Field.Goals..Attempts"           
## [105] "OPP.Field.Goals.Made"                
## [106] "TEAM.PAT.Kicking..Success.."         
## [107] "OPP.PAT.Kicking..Success.."          
## [108] "TEAM.PAT.Kicking..Attempts"          
## [109] "TEAM.PAT.Kicking.Made"               
## [110] "OPP.PAT.Kicking..Attempts"           
## [111] "OPP.PAT.Kicking.Made"                
## [112] "TEAM.2.Point.Conversions..Success.." 
## [113] "OPP.2.Point.Conversions..Success.."  
## [114] "TEAM.2.Point.Conversions..Attempts"  
## [115] "TEAM.2.Point.Conversions.Made"       
## [116] "OPP.2.Point.Conversions..Attempts"   
## [117] "OPP.2.Point.Conversions.Made"

That’s 117 fields for all 128 FBS teams - quite a lot of data.
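
The doubled dots in those names are probably a side effect of how read.csv sanitizes headers: by default it passes them through make.names, which turns spaces and punctuation into dots. Here is a small sketch of that behavior; the raw header strings are my guess at what the CSV contains, not something taken from the file.

```r
# Hypothetical raw headers (assumed for illustration, not read from the CSV).
raw <- c("TEAM Scoring: Points/Game", "TEAM 3rd Down Conversion %")

# make.names() replaces every character that is invalid in an R name with "."
cleaned <- make.names(raw)
print(cleaned)
```

If the original headers are preferred, read.csv accepts check.names = FALSE, at the cost of needing backticks or df[["..."]] to access the columns.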

Measuring their statistics

Let’s take a look at the specific data mentioned in the article and how well it correlates with wins and losses.

Third down conversion rate

wl = df$Wins/(df$Wins + df$Losses)
other = as.numeric(sub("%","",df$TEAM.3rd.Down.Conversion..))
cor(wl,other)
## [1] 0.5325356
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.3749  -2.6549  -0.0299   3.0927  12.8619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   32.938      1.137  28.977  < 2e-16 ***
## wl            14.376      2.036   7.062  9.8e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.72 on 126 degrees of freedom
## Multiple R-squared:  0.2836, Adjusted R-squared:  0.2779 
## F-statistic: 49.88 on 1 and 126 DF,  p-value: 9.803e-11

A correlation of 0.53 with an F-statistic of about 50 and a p-value of about \(10^{-10}\).
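
The same steps (correlate, fit, summarize) repeat for every statistic in this post, so they could be wrapped in a small helper. The sketch below is one way to do it; compare_stat and the demo data are hypothetical names and values introduced here, and the data are simulated rather than taken from CFB2016.csv. It returns the F-statistic and its p-value separately, and makes it easy to check that, for a one-predictor model, R-squared is just the squared correlation.

```r
# Hypothetical helper: correlate a statistic with winning percentage and
# fit the same simple linear model (other ~ wl) used throughout this post.
compare_stat <- function(wl, other, label = "stat") {
  fit <- lm(other ~ wl)
  s <- summary(fit)
  f <- s$fstatistic  # named vector: value, numdf, dendf
  list(
    label       = label,
    correlation = cor(wl, other),
    r_squared   = s$r.squared,
    f_statistic = unname(f["value"]),
    p_value     = unname(pf(f["value"], f["numdf"], f["dendf"],
                            lower.tail = FALSE))
  )
}

# Simulated data for illustration only (NOT the real team data).
set.seed(1)
wl_demo   <- runif(128)
stat_demo <- 33 + 14 * wl_demo + rnorm(128, sd = 5)

res <- compare_stat(wl_demo, stat_demo, "demo stat")
str(res)
```

With the real data, a call like compare_stat(wl, other, "Third down conversion rate") should reproduce the numbers shown above.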

Average yards per pass attempt

other = df$TEAM.Passing..Yards/df$TEAM.Passing..Attempts
cor(wl,other)
## [1] 0.5759094
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7375 -0.6068 -0.0883  0.5163  3.5318 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.7423     0.2285  25.128  < 2e-16 ***
## wl            3.2360     0.4092   7.908 1.14e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.949 on 126 degrees of freedom
## Multiple R-squared:  0.3317, Adjusted R-squared:  0.3264 
## F-statistic: 62.53 on 1 and 126 DF,  p-value: 1.144e-12

Turnover margin

other = (df$OPP.Fumbles.Lost-df$TEAM.Fumbles.Lost) + (df$OPP.Passing.Interceptions-df$TEAM.Passing.Interceptions)
cor(wl,other)
## [1] 0.5944906
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.7579  -4.0460   0.3465   3.8089  16.2858 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -10.524      1.410  -7.465 1.20e-11 ***
## wl            20.951      2.525   8.299 1.38e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.855 on 126 degrees of freedom
## Multiple R-squared:  0.3534, Adjusted R-squared:  0.3483 
## F-statistic: 68.87 on 1 and 126 DF,  p-value: 1.38e-13
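
To make the turnover margin formula concrete, here is a toy calculation using the same column names. The two teams and their totals are made up for illustration. Note that OPP.Passing.Interceptions counts passes thrown by opponents that were picked off, so it is a takeaway for the team in question.

```r
# Hypothetical season totals for two made-up teams (not from the data set).
toy <- data.frame(
  Team                       = c("Team A", "Team B"),
  TEAM.Fumbles.Lost          = c(6, 11),
  OPP.Fumbles.Lost           = c(10, 7),
  TEAM.Passing.Interceptions = c(8, 14),
  OPP.Passing.Interceptions  = c(12, 9)
)

# Turnover margin = takeaways - giveaways
takeaways <- toy$OPP.Fumbles.Lost + toy$OPP.Passing.Interceptions
giveaways <- toy$TEAM.Fumbles.Lost + toy$TEAM.Passing.Interceptions
toy$margin <- takeaways - giveaways

print(toy[, c("Team", "margin")])
```

Team A comes out at (10 + 12) - (6 + 8) = 8, while Team B lands at (7 + 9) - (11 + 14) = -9.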

Rushing attempts

other = df$TEAM.Rushing..Attempts
cor(wl,other)
## [1] 0.5782156
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -169.72  -45.21   -5.13   33.83  258.28 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   381.76      17.11  22.317  < 2e-16 ***
## wl            243.69      30.63   7.955 8.86e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 71.04 on 126 degrees of freedom
## Multiple R-squared:  0.3343, Adjusted R-squared:  0.3291 
## F-statistic: 63.28 on 1 and 126 DF,  p-value: 8.861e-13

Some of the things that the article indicates are less important

Total offense (by yards/play)

other = df$TEAM.Total.Offense..Yards...Play
cor(wl,other)
## [1] 0.5968873
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.43356 -0.42254  0.02848  0.36445  1.44780 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.7774     0.1336  35.747  < 2e-16 ***
## wl            1.9986     0.2393   8.351 1.04e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.555 on 126 degrees of freedom
## Multiple R-squared:  0.3563, Adjusted R-squared:  0.3512 
## F-statistic: 69.74 on 1 and 126 DF,  p-value: 1.04e-13

Total defense (by yards/play)

other = df$OPP.Total.Offense..Yards...Play
cor(wl,other)
## [1] -0.5384097
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.21166 -0.42290 -0.02172  0.36407  1.30834 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   6.5416     0.1308  50.016  < 2e-16 ***
## wl           -1.6798     0.2342  -7.172 5.57e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5431 on 126 degrees of freedom
## Multiple R-squared:  0.2899, Adjusted R-squared:  0.2842 
## F-statistic: 51.44 on 1 and 126 DF,  p-value: 5.566e-11

A couple more

Here’s the deal - if you want to win games, then score points.

other = df$TEAM.Total.Points
cor(wl,other)
## [1] 0.7884184
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -138.857  -36.073   -3.224   39.257  189.212 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   182.89      14.97   12.22   <2e-16 ***
## wl            385.63      26.80   14.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 62.16 on 126 degrees of freedom
## Multiple R-squared:  0.6216, Adjusted R-squared:  0.6186 
## F-statistic:   207 on 1 and 126 DF,  p-value: < 2.2e-16

Keeping the other team from scoring points seems less important; the correlation with winning percentage is noticeably weaker.

other = df$OPP.Total.Points
cor(wl,other)
## [1] -0.5697005
plot(wl,other)
fit = lm(other~wl)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = other ~ wl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -138.042  -45.520   -4.827   41.272  159.974 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   466.98      15.06  31.000  < 2e-16 ***
## wl           -209.90      26.98  -7.781 2.25e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 62.56 on 126 degrees of freedom
## Multiple R-squared:  0.3246, Adjusted R-squared:  0.3192 
## F-statistic: 60.54 on 1 and 126 DF,  p-value: 2.251e-12
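
To see everything side by side, the correlations reported above can be collected into one vector and ordered by absolute value. The numbers below are copied from the cor() output in this post; the short labels are my own shorthand.

```r
# Correlations with winning percentage, copied from the cor() output above.
cors <- c(
  third_down_rate     =  0.5325356,
  yards_per_pass      =  0.5759094,
  turnover_margin     =  0.5944906,
  rushing_attempts    =  0.5782156,
  off_yards_per_play  =  0.5968873,
  def_yards_per_play  = -0.5384097,
  points_for          =  0.7884184,
  points_against      = -0.5697005
)

# Order by strength of association, keeping the sign for reference.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```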