Estimating win probabilities

Since it's tournament time, we've got 63 (or 67 or 134) opportunities coming up to ask the question:

What's the probability that this team beats that team?

Ultimately, we'd love to be able to create submissions to the Kaggle NCAA contest. Of course, doing so is a multi-faceted project requiring

The collection, curation, and formatting of data,
The analysis of that data to assess team strengths and weaknesses,
The translation of those strengths, weaknesses, and potentially other factors to gametime probabilities.

In this notebook, we focus on one particular aspect of that process that requires some knowledge of probability theory and the normal distribution. We specifically consider the following question:

Suppose we have a list of teams and we have a numerical rating associated with each team intended to indicate team strength. How can we use those ratings to answer our fundamental question: What's the probability that this team beats that team?

Example ratings

To illustrate what we mean by ratings, consider the following list of ACC teams:

Team	Record	Rating
Duke	16-4	0.80
UNC	15-5	0.75
Notre Dame	15-5	0.75
Miami FL	14-6	0.70
Wake Forest	13-7	0.65
Virginia	12-8	0.60
Virginia Tech	11-9	0.55
Florida State	10-10	0.50
Syracuse	9-11	0.45
Clemson	8-12	0.40
Louisville	6-14	0.30
Boston College	6-14	0.30
Pittsburgh	6-14	0.30
Georgia Tech	5-15	0.25
NC State	4-16	0.20

For each team, we see three items:

The team name,
the team's ACC win/loss record,
and a numerical rating.

In this particular example, the rating is simply the team's winning percentage. While quite simple, this can work reasonably well when the teams in the list all play one another multiple times.
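As a quick illustration of how such a rating could be computed, here is a minimal sketch that turns a win/loss record into a winning percentage. The function name `winning_percentage` and the `records` dictionary are introduced here purely for illustration; the records themselves are copied from the table above.

```python
def winning_percentage(record: str) -> float:
    """Convert a 'wins-losses' string such as '16-4' into a winning percentage."""
    wins, losses = (int(part) for part in record.split("-"))
    return wins / (wins + losses)


# Three records copied from the ACC table above.
records = {"Duke": "16-4", "Florida State": "10-10", "NC State": "4-16"}
ratings = {team: winning_percentage(rec) for team, rec in records.items()}

for team, rating in ratings.items():
    print(f"{team}: {rating:.2f}")
```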

Again, the question is: Given a pair of teams, how might we assess the probability that one team beats the other based on this winning percentage? For example, Duke has a winning percentage of 0.8 and NC State has a winning percentage of 0.2. Based on that, what should be our assessment of the probability that Duke would beat NC State if they were to play again?

FiveThirtyEight ratings

The win/loss rating above is just meant to be a simple illustration. While it can work fairly well in small, isolated examples, it's not likely to work well in larger, more complicated examples. If we take a close look at this year's tournament, we might notice that Winthrop is 23-10, while UNC is 18-10. Thus, Winthrop has a higher winning percentage and, therefore, a higher rating based on winning percentage alone. It's easy to find these kinds of examples since most games during the season are within conferences, rather than between conferences. Thus, Winthrop obtained most of its 23 wins by defeating Big South teams, rather than ACC teams.

Before working on win probabilities, let's build on someone else's work to find team ratings. Specifically, let's use FiveThirtyEight's NCAA Forecast.

Note: We are not going to simply copy probabilities from the interactive bracket; rather, we're going to use the Power Rating column, which we can find by switching to the table view. The power rating is also contained in this downloadable CSV file. Here's the relevant part of the whole table, listed roughly in descending order of Power Rating:

team_name	team_rating
Gonzaga	96.47
Kansas	91.72
Kentucky	91.23
Arizona	91.39
Auburn	89.60
Villanova	90.22
Purdue	89.44
Iowa	88.55
Tennessee	88.55
UCLA	89.84
Houston	88.15
Duke	89.34
Texas Tech	88.70
Baylor	87.92
Illinois	87.07
Louisiana State	85.72
Arkansas	86.78
Wisconsin	84.64
Texas	86.31
Connecticut	86.45
Alabama	85.15
Michigan	84.73
Virginia Tech	84.68
North Carolina	83.99
Ohio State	84.17
Memphis	85.47
Saint Mary's (CA)	84.32
Loyola (IL)	83.70
Providence	82.72
Indiana	83.01
Michigan State	83.49
San Diego State	83.49
Southern California	83.43
Texas Christian	81.92
Seton Hall	82.84
Marquette	81.92
Boise State	82.49
Creighton	81.49
Murray State	81.37
Davidson	81.92
Alabama-Birmingham	81.15
San Francisco	83.00
Miami (FL)	81.16
Iowa State	80.59
Colorado State	81.65
Notre Dame	81.64
Rutgers	81.12
Richmond	79.71
South Dakota State	79.68
Vermont	80.32
Chattanooga	78.68
Wyoming	78.49
New Mexico State	77.67
Colgate	76.39
Akron	76.67
Saint Peter's	74.19
Yale	74.44
Montana State	74.31
Jacksonville State	73.16
Wright State	73.50
Longwood	73.41
Delaware	73.58
Georgia State	73.48
Norfolk State	71.42
Texas Southern	70.37
Cal State Fullerton	71.79
Bryant	71.56
Texas A&M-Corpus Christi	67.32
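For readers who want to work with these ratings programmatically, here is a minimal sketch of reading them into a dictionary with Python's standard `csv` module. It is only an illustration under stated assumptions: the inline `sample_csv` string stands in for the downloadable file, it contains just four rows copied from the table above, and it assumes a comma-separated layout with only the `team_name` and `team_rating` columns (the real file may well contain more).

```python
import csv
import io

# A few rows copied from the table above. This inline sample is an assumption
# standing in for the real downloadable file, which may have more columns.
sample_csv = """team_name,team_rating
Gonzaga,96.47
Kansas,91.72
Arizona,91.39
Texas A&M-Corpus Christi,67.32
"""

reader = csv.DictReader(io.StringIO(sample_csv))
ratings = {row["team_name"]: float(row["team_rating"]) for row in reader}

# Print the teams from highest to lowest rating.
for team, rating in sorted(ratings.items(), key=lambda item: item[1], reverse=True):
    print(f"{team:30s}{rating:6.2f}")
```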

Note: If you are curious about where these types of ratings might come from in the first place, you can read FiveThirtyEight's well-documented methodology. You can also read about my variation on the Page Rank algorithm that I use to build my brackets.

Again, though, we are focused on turning the ratings into win probabilities at the moment.

A naive approach to probability

Here is, perhaps, the simplest way to translate ratings into win probabilities: Suppose that Team 1 has rating $R_1$ and that Team 2 has rating $R_2$. Let $P_{12}$ denote the probability that Team 1 defeats Team 2 and let $P_{21}$ denote the probability that Team 2 defeats Team 1. Then we might suppose that $$ P_{12} = \frac{R_1}{R_1 + R_2} \text{ and } P_{21} = \frac{R_2}{R_1 + R_2}. $$ This looks good in that, for positive ratings, it at least obeys the laws of probability. That is,

$0 \leq P_{ij} \leq 1$
$R_1 < R_2 \implies P_{12} < \frac{1}{2} < P_{21}$
$P_{12} + P_{21} = 1$.

That all looks pretty good! If we examine some particular cases, though, we'll see that it doesn't separate strong teams from weak teams sharply enough. Let's take a look, for example, at the probability that the highest rated team (Gonzaga) defeats the lowest rated team (Texas A&M-Corpus Christi):

team_name	team_rating
Gonzaga	96.47
Texas A&M-Corpus Christi	67.32

Using those ratings, we have $$ \frac{96.47}{96.47 + 67.32} \approx 0.5890. $$ Well, it certainly seems like Gonzaga has a much better chance of winning that game than that!
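The naive formula is easy to try out directly. The following is a minimal sketch, assuming both ratings are positive; the function name `naive_win_probability` is introduced here for illustration only.

```python
def naive_win_probability(r1: float, r2: float) -> float:
    """Naive estimate of P(Team 1 beats Team 2): Team 1's share of the total rating.

    Assumes both ratings are positive.
    """
    return r1 / (r1 + r2)


gonzaga, tamu_cc = 96.47, 67.32  # ratings from the table above

p12 = naive_win_probability(gonzaga, tamu_cc)
p21 = naive_win_probability(tamu_cc, gonzaga)

print(f"P(Gonzaga wins) = {p12:.4f}")
print(f"P(Texas A&M-Corpus Christi wins) = {p21:.4f}")
print(f"Sum = {p12 + p21:.4f}")
```

Because the power ratings all sit in a fairly narrow band (roughly 67 to 96), this ratio can never stray very far from one half, which is why the estimate feels too timid.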

A normal approach

Generally, we might expect $P_{12}$ to depend upon the difference $R_1-R_2$. In order to appropriately assess probabilities associated with this quantity, we should examine its distribution. Let's take a look at a histogram of the pairwise differences of the ratings:

Hey - this looks normally distributed! So, to compute $P_{12}$, let's first compute $R_1 - R_2$, and then assess $$ P(X < R_1-R_2), $$ where $X$ is normally distributed with mean and standard deviation determined by the pairwise difference data.

Not surprisingly, the mean of the pairwise differences is zero; it's really been constructed that way, since every difference $R_i - R_j$ is matched by its negative $R_j - R_i$. The standard deviation (for this particular data set) is about $8.5549$.
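Here is a minimal sketch of how the pairwise differences and their summary statistics might be computed. It rests on a few assumptions that the text does not spell out: it uses all ordered pairs of distinct teams, it uses the population standard deviation, and it works on a small sample of seven ratings taken from the table above rather than the full list. For those reasons, the standard deviation it prints should not be expected to match the $8.5549$ quoted above.

```python
from itertools import permutations
from statistics import mean, pstdev

# A small sample of ratings from the table above (not the full data set).
sample_ratings = [96.47, 91.72, 89.34, 83.99, 81.64, 74.19, 67.32]

# All ordered pairs of distinct teams: each difference appears with its negative.
differences = [r1 - r2 for r1, r2 in permutations(sample_ratings, 2)]

mu = mean(differences)       # zero, up to floating point error
sigma = pstdev(differences)  # population standard deviation (an assumption)

print(f"number of differences: {len(differences)}")
print(f"mean: {mu:.4f}")
print(f"standard deviation: {sigma:.4f}")
```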

Now, let's reconsider the probability that Gonzaga beats Texas A&M-Corpus Christi. Recall that the ratings are

Gonzaga: $R_1 = 96.47$ and
Texas A&M-Corpus Christi: $R_2 = 67.32$.

We can now compute a $Z$-score: $$ Z = \frac{96.47 - 67.32}{8.5549} \approx 3.41. $$ If we plug that into our normal probability calculator, we find that we get a probability of over $0.999$ - which certainly seems much more believable!
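In symbols, the approach amounts to $P_{12} = \Phi\left(\frac{R_1 - R_2}{\sigma}\right)$, where $\Phi$ is the standard normal cumulative distribution function and $\sigma$ is the standard deviation of the pairwise differences. The following is a minimal sketch of that calculation using `statistics.NormalDist` from the Python standard library; the function name `normal_win_probability` and the constant `SIGMA` are introduced here for illustration, with `SIGMA` set to the value quoted above.

```python
from statistics import NormalDist

SIGMA = 8.5549  # standard deviation of the pairwise differences quoted above


def normal_win_probability(r1: float, r2: float, sigma: float = SIGMA) -> float:
    """Estimate P(Team 1 beats Team 2) as P(X < r1 - r2), with X ~ Normal(0, sigma)."""
    return NormalDist(mu=0.0, sigma=sigma).cdf(r1 - r2)


gonzaga, tamu_cc = 96.47, 67.32  # ratings from the table above

z = (gonzaga - tamu_cc) / SIGMA
p12 = normal_win_probability(gonzaga, tamu_cc)
p21 = normal_win_probability(tamu_cc, gonzaga)

print(f"Z = {z:.2f}")
print(f"P(Gonzaga wins) = {p12:.5f}")
print(f"P(Texas A&M-Corpus Christi wins) = {p21:.5f}")
```

By the symmetry of the normal distribution, the two probabilities still sum to one, and two teams with equal ratings each get a probability of one half.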