Estimating win probabilities

Since it's tournament time, we've got 63 (or 67 or 134) opportunities coming up to ask the question:

What's the probability that this team beats that team?

Ultimately, we'd love to be able to create submissions to the Kaggle NCAA contest. Of course, doing so is a multi-faceted project requiring

In this notebook, we focus on one particular aspect of that process that requires some knowledge of probability theory and the normal distribution. We specifically consider the following question:

Suppose we have a list of teams and we have a numerical rating associated with each team intended to indicate team strength. How can we use those ratings to answer our fundamental question: What's the probability that this team beats that team?

Example ratings

To illustrate what we mean by ratings, consider the following list of ACC teams:

Team Record Rating
Duke 16-4 0.80
UNC 15-5 0.75
Notre Dame 15-5 0.75
Miami FL 14-6 0.70
Wake Forest 13-7 0.65
Virginia 12-8 0.60
Virginia Tech 11-9 0.55
Florida State 10-10 0.50
Syracuse 9-11 0.45
Clemson 8-12 0.40
Louisville 6-14 0.30
Boston College 6-14 0.30
Pittsburgh 6-14 0.30
Georgia Tech 5-15 0.25
NC State 4-16 0.20

For each team, we see three items: the team name, its win-loss record, and its rating.

In this particular example, the rating is simply the team's winning percentage. While quite simple, this can work reasonably well when the teams in the list all play one another multiple times.
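As a minimal sketch of how such a rating could be computed, the snippet below turns a win-loss record like "16-4" into a winning percentage. The function name `winning_percentage` and the small `records` dictionary are hypothetical names chosen for illustration; the records themselves come from the table above.

```python
def winning_percentage(record: str) -> float:
    """Convert a record such as '16-4' into wins / (wins + losses)."""
    wins, losses = (int(n) for n in record.split("-"))
    return wins / (wins + losses)


# A few records copied from the ACC table above.
records = {"Duke": "16-4", "Florida State": "10-10", "NC State": "4-16"}

# Rating = winning percentage, as in this example.
ratings = {team: winning_percentage(rec) for team, rec in records.items()}

for team, rating in ratings.items():
    print(f"{team}: {rating:.2f}")
```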

Again, the question is: given a pair of teams, how might we assess the probability that one team beats the other based on winning percentage? For example, Duke has a winning percentage of 0.8 and NC State has a winning percentage of 0.2. Based on that, what should be our assessment of the probability that Duke would beat NC State if they were to play again?

FiveThirtyEight ratings

The win/loss rating above is just meant to be a simple illustration. While it can work fairly well in small, isolated examples, it's not likely to work well in larger, more complicated examples. If we take a close look at this year's tournament, we might notice that Winthrop is 23-10, while UNC is 18-10. Thus, Winthrop has a higher winning percentage and, therefore, a higher rating based on winning percentage alone. It's easy to find these kinds of examples since most games during the season are within conferences, rather than between conferences. Thus, Winthrop obtained most of its 23 wins by defeating Big South teams, rather than ACC teams.

Before working on win probabilities, let's build on someone else's work to find team ratings. Specifically, let's use FiveThirtyEight's NCAA Forecast.

Note: We are not going to simply copy probabilities from the interactive bracket; rather, we're going to use the Power Rating column we can find by switching to the table. The power rating is also contained in this downloadable CSV file. Here's the relevant part of the whole table, listed roughly in order of Power Rating:

team_name team_rating
Gonzaga 96.47
Kansas 91.72
Kentucky 91.23
Arizona 91.39
Auburn 89.60
Villanova 90.22
Purdue 89.44
Iowa 88.55
Tennessee 88.55
UCLA 89.84
Houston 88.15
Duke 89.34
Texas Tech 88.70
Baylor 87.92
Illinois 87.07
Louisiana State 85.72
Arkansas 86.78
Wisconsin 84.64
Texas 86.31
Connecticut 86.45
Alabama 85.15
Michigan 84.73
Virginia Tech 84.68
North Carolina 83.99
Ohio State 84.17
Memphis 85.47
Saint Mary's (CA) 84.32
Loyola (IL) 83.70
Providence 82.72
Indiana 83.01
Michigan State 83.49
San Diego State 83.49
Southern California 83.43
Texas Christian 81.92
Seton Hall 82.84
Marquette 81.92
Boise State 82.49
Creighton 81.49
Murray State 81.37
Davidson 81.92
Alabama-Birmingham 81.15
San Francisco 83.00
Miami (FL) 81.16
Iowa State 80.59
Colorado State 81.65
Notre Dame 81.64
Rutgers 81.12
Richmond 79.71
South Dakota State 79.68
Vermont 80.32
Chattanooga 78.68
Wyoming 78.49
New Mexico State 77.67
Colgate 76.39
Akron 76.67
Saint Peter's 74.19
Yale 74.44
Montana State 74.31
Jacksonville State 73.16
Wright State 73.50
Longwood 73.41
Delaware 73.58
Georgia State 73.48
Norfolk State 71.42
Texas Southern 70.37
Cal State Fullerton 71.79
Bryant 71.56
Texas A&M-Corpus Christi 67.32

Note: If you are curious about where these types of ratings might come from in the first place, you can read FiveThirtyEight's well-documented methodology. You can also read about my variation on the PageRank algorithm that I use to build my brackets.

Again, though, we are focused on turning the ratings into win probabilities at the moment.

A naive approach to probability

Here is, perhaps, the simplest way to translate ratings into win probabilities: Suppose that Team 1 has rating $R_1$ and that Team 2 has rating $R_2$. Let $P_{12}$ denote the probability that Team 1 defeats Team 2 and let $P_{21}$ denote the probability that Team 2 defeats Team 1. Then we might suppose that $$ P_{12} = \frac{R_{1}}{R_1 + R_2} \text{ and } P_{21} = \frac{R_2}{R_1 + R_2}. $$ This looks good in that it at least obeys the laws of probability. That is, both $P_{12}$ and $P_{21}$ lie between $0$ and $1$ (as long as the ratings are positive), and $P_{12} + P_{21} = 1$.

That all looks pretty good! If we examine some particular cases, though, we'll see that it's not quite strong enough. Let's take a look, for example, at the probability that the highest rated team (Gonzaga) defeats the lowest rated team (Texas A&M-Corpus Christi):

team_name team_rating
Gonzaga 96.47
Texas A&M CC 67.32

Using those ratings, we have $$ \frac{96.47}{96.47 + 67.32} \approx 0.589. $$ Well, it certainly seems like Gonzaga has a much better chance of winning that game than that!
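For readers who want to experiment, here is a small sketch of the naive formula $R_1/(R_1+R_2)$ in Python. The function name `naive_win_probability` is a hypothetical name for illustration; the two ratings are the ones quoted above.

```python
def naive_win_probability(r1: float, r2: float) -> float:
    """Naive estimate: Team 1's share of the combined rating R1 / (R1 + R2)."""
    return r1 / (r1 + r2)


gonzaga = 96.47
texas_am_cc = 67.32

p12 = naive_win_probability(gonzaga, texas_am_cc)  # Gonzaga beats Texas A&M-CC
p21 = naive_win_probability(texas_am_cc, gonzaga)  # Texas A&M-CC beats Gonzaga

print(f"P12 = {p12:.4f}, P21 = {p21:.4f}, sum = {p12 + p21:.4f}")
```

Because power ratings for tournament teams are all large and fairly close together, the ratio stays close to one half no matter which two teams are paired, which is why the naive estimate looks too weak here.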

A normal approach

Generally, we might expect $P_{12}$ to depend upon the difference $R_1-R_2$. In order to appropriately assess probabilities associated with this quantity, we should examine its distribution. Let's take a look at a histogram of the pairwise differences of the ratings:

Hey - this looks normally distributed! So, to compute $P_{12}$, let's first compute $R_1 - R_2$, and then assess $$ P(X < R_1-R_2), $$ where $X$ is normally distributed with mean and standard deviation determined by the pairwise difference data.

Not surprisingly, the mean of the pairwise differences is zero; it's really been constructed that way, since every difference $R_1 - R_2$ is matched by the opposite difference $R_2 - R_1$. The standard deviation (for this particular data set) is about $8.5549$.
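The computation of the pairwise differences might be sketched as follows. This is an illustration under assumptions, not the exact calculation behind the figure quoted above: it uses only a handful of ratings from the table, so its standard deviation will not match the full-data value, and the choice of ordered pairs with the population standard deviation (`pstdev`) is an assumption.

```python
from itertools import permutations
from statistics import mean, pstdev

# A small subset of the power ratings from the table (illustration only).
ratings = {
    "Gonzaga": 96.47,
    "Kansas": 91.72,
    "Duke": 89.34,
    "North Carolina": 83.99,
    "Davidson": 81.92,
    "Vermont": 80.32,
    "Georgia State": 73.48,
    "Texas A&M-Corpus Christi": 67.32,
}

# All ordered pairs (i, j) with i != j. Each difference a - b is matched
# by b - a, so the differences are symmetric about zero.
differences = [a - b for a, b in permutations(ratings.values(), 2)]

print(f"number of differences: {len(differences)}")
print(f"mean: {mean(differences):.4f}")
print(f"standard deviation (subset only): {pstdev(differences):.4f}")
```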

Now, let's reconsider the probability that Gonzaga beats Texas A&M-Corpus Christi. Recall that the ratings are $96.47$ for Gonzaga and $67.32$ for Texas A&M-Corpus Christi.

We can now compute a $Z$-score: $$ Z = \frac{96.47 - 67.32}{8.5549} \approx 3.4074. $$ If we plug that into our normal probability calculator, we find that we get a probability of over $0.999$, which certainly seems much more believable!
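In place of a normal probability calculator, the same computation can be sketched with Python's standard library. The function name `normal_win_probability` and the constant `SIGMA` are hypothetical names for illustration; the value $8.5549$ is the standard deviation quoted above, and the ratings are the ones listed earlier for Gonzaga ($96.47$) and Texas A&M-Corpus Christi ($67.32$).

```python
from statistics import NormalDist

# Standard deviation of the pairwise rating differences quoted in the text.
SIGMA = 8.5549


def normal_win_probability(r1: float, r2: float, sigma: float = SIGMA) -> float:
    """P(X < r1 - r2) where X is normal with mean 0 and the given sigma."""
    return NormalDist(mu=0.0, sigma=sigma).cdf(r1 - r2)


gonzaga = 96.47
texas_am_cc = 67.32

z = (gonzaga - texas_am_cc) / SIGMA
p = normal_win_probability(gonzaga, texas_am_cc)

print(f"Z = {z:.4f}")
print(f"P(Gonzaga beats Texas A&M-CC) = {p:.5f}")
```

One convenient feature of this approach is that two equally rated teams get a probability of exactly one half, and swapping the two teams gives the complementary probability.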