Estimating win probabilities

The basic question

Since it’s tournament time, we’ve got 63 (or 67 or 134) opportunities coming up to ask the question:

What’s the probability that this team beats that team?

Kaggle

Personally, I’d love to be able to create submissions to the Kaggle NCAA contest. Doing so, though, is a multi-faceted project requiring

The collection, curation, and formatting of data,
The analysis of that data to assess team strengths and weaknesses,
The translation of those strengths, weaknesses, and potentially other factors to gametime probabilities, and
Extension of those individual probabilities to the whole tournament, possibly via simulation.

Simulation too

Note that simulation itself is an interesting and important topic. CBS Sports just posted their upset predictions, which claim to be built on 10,000 “simulations” of the tournament. Here’s my Stat 185 level explanation of simulation.

On this webpage, though, let’s focus on one particular aspect of that process that requires some knowledge of probability theory and the normal distribution. We specifically consider the following question:

Suppose we have a list of teams and we have a numerical rating associated with each team intended to indicate team strength. How can we use those ratings to answer our fundamental question: What’s the probability that this team beats that team?

Example ratings

To illustrate what we mean by ratings, consider the list of ACC teams on the next page.

TeamName	massey_rating	eigen_rating
North Carolina	21.521838	10.199467
Duke	21.099904	9.202866
Clemson	15.345840	8.917379
Wake Forest	15.875161	8.233685
Virginia	11.425055	8.198499
Pittsburgh	14.821228	7.885828
Syracuse	8.640870	7.826167
Florida St	10.256914	7.738343
Virginia Tech	12.175809	7.629896
NC State	10.859435	7.550299
Boston College	9.982552	7.517799
Georgia Tech	4.970889	7.003241
Miami FL	9.968956	6.723686
Notre Dame	5.368068	6.464464
Louisville	0.778522	4.792415

The ratings data

For each team, we also see a couple of numerical ratings - the Massey rating and the so-called eigen-rating. There are only a couple of important points for us, though:

Both ratings are numeric measures of a team’s strength relative to other teams that play within the same league.
Roughly, the difference between two Massey ratings indicates the expected score difference if the teams were to meet again. The list above, for example, suggests that UNC would be expected to beat Louisville by 20+ points. You can find Ken Massey’s ratings for all kinds of sports on his website.
The eigen-rating is based on Google’s PageRank algorithm.
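To make the Massey point concrete, here is a minimal sketch that reads the difference of two Massey ratings as a rough expected margin. The ratings are copied from the table above; the function name `expected_margin` is just an illustrative choice, not part of any library.

```python
# Massey ratings copied from the ACC table above (a few teams only).
massey = {
    "North Carolina": 21.521838,
    "NC State": 10.859435,
    "Louisville": 0.778522,
}


def expected_margin(team_a, team_b, ratings):
    """Rough expected score margin of team_a over team_b.

    Assumes, as described above, that the difference of two Massey
    ratings approximates the expected score difference.
    """
    return ratings[team_a] - ratings[team_b]


margin = expected_margin("North Carolina", "Louisville", massey)
print(f"UNC over Louisville by about {margin:.1f} points")  # about 20.7
```

A negative result simply means the second team is the favorite.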

Just last night, I used Kaggle data to compute these ratings for all 362 men’s teams in Division 1. The ratings above are exactly those restricted to the ACC. How to compute these types of ratings is a fun and important topic in its own right. For now, though, we focus on one particular question:

Given reasonable team ratings, how might we compute the probability that this team beats that team?

Assessing win probabilities

Given two teams with ratings \(R_1\) and \(R_2\), we might expect the probability that team 1 beats team 2 to depend upon the difference \(R_1-R_2\). In order to appropriately assess probabilities associated with this quantity, we should examine its distribution. Let’s take a look at a histogram of the symmetric pairwise differences of the eigen-ratings of all 362 Division 1 teams:

Using it

Hey - that looks normal! In fact, the bell-shaped curve in the figure is exactly the normal curve with mean \(\mu=0\) and standard deviation \(\sigma=2.66758\), in agreement with the data. Note that the mean has to be zero because of the way that the data is formed: every difference \(R_i-R_j\) appears alongside its negative \(R_j-R_i\).
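Here is a small sketch of how such symmetric pairwise differences might be formed. It uses only the 15 ACC eigen-ratings from the table above as a stand-in for the full list of 362 teams, so the standard deviation it produces will not match \(\sigma=2.66758\); the point is only that the mean comes out to zero by construction.

```python
from itertools import permutations
from statistics import fmean, pstdev

# Eigen-ratings of the 15 ACC teams, copied from the table above.
# (The histogram in the text uses all 362 Division 1 teams instead.)
eigen = [
    10.199467, 9.202866, 8.917379, 8.233685, 8.198499,
    7.885828, 7.826167, 7.738343, 7.629896, 7.550299,
    7.517799, 7.003241, 6.723686, 6.464464, 4.792415,
]

# Every ordered pair (i, j) with i != j, so each difference d
# shows up together with its negative -d.
diffs = [a - b for a, b in permutations(eigen, 2)]

mean_diff = fmean(diffs)           # zero, up to floating point error
sigma_acc = pstdev(diffs, mu=0.0)  # spread of the ACC-only differences

print(len(diffs))                  # 15 * 14 = 210 ordered pairs
print(f"mean = {mean_diff:.6f}, sigma (ACC only) = {sigma_acc:.4f}")
```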

Now, suppose we want to use the eigen-ratings to compute the probability that UNC beats NC State. To do so, let \(R_1 = 10.199467\), let \(R_2 = 7.550299\), and suppose that \(X\) is normally distributed with mean \(\mu=0\) and standard deviation \(\sigma=2.66758\). We could then express the probability that we want as \[ P(X < R_1-R_2). \]

Finishing up the computation

Computing the \(Z\)-score for \(R_1-R_2\), we get \[ Z = \frac{10.199467 - 7.550299}{2.66758} = \frac{2.649168}{2.66758} \approx 0.9931. \] Looking this up in a standard normal table or using a normal calculator, we get \(0.839669\) or about \(84\%\).
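If you'd rather not use a table, the same computation can be sketched in a few lines of Python with the standard library's `statistics.NormalDist`. The function name `win_probability` is an illustrative choice, and the default \(\sigma\) is the value quoted above for the eigen-rating differences.

```python
from statistics import NormalDist

# Standard deviation of the eigen-rating differences, as quoted above.
SIGMA = 2.66758


def win_probability(r1, r2, sigma=SIGMA):
    """P(X < r1 - r2) where X is normal with mean 0 and the given sigma."""
    return NormalDist(mu=0.0, sigma=sigma).cdf(r1 - r2)


# UNC (R_1) versus NC State (R_2), using the eigen-ratings from the table.
p = win_probability(10.199467, 7.550299)
print(f"{p:.4f}")  # 0.8397
```

Swapping the arguments gives NC State's chances, and the two probabilities add up to 1, since the normal curve is symmetric about zero.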

Of course, that’s not what happened this past weekend, but that’s why we love sports!