Eigen-ratings
Intro
This web page outlines a technique, based on linear algebra, for rating teams that play games in a common league using a matrix that describes the game results. In that sense, it's somewhat like the Massey ratings that we discussed earlier.
Like Massey ratings, the eigenrating technique sets up a matrix based on game results between the teams. This matrix, though, represents paired comparisons between the teams, and the technique uses the dominant eigenvector of that matrix to obtain ratings. There's a fair amount of flexibility in the setup of that matrix, allowing for the possibility of optimization.
I learned of the technique from James Keener's 1993 paper in SIAM Review, "The Perron-Frobenius theorem and the ranking of football teams," and the presentation here is strongly influenced by that paper. I also consulted Google's PageRank and Beyond by Langville and Meyer, as the technique is closely related to Google's original PageRank algorithm.
Imports
There's a fair amount of code on this page and, in fact, you'll need to copy some of it, paste it into your own Colab notebook, and use it to complete the lab over this material. Let's go ahead and import the libraries we'll be using:

import numpy as np
import pandas as pd
import graphviz as gv
from scipy.linalg import eig
from scipy.sparse import dok_matrix
from scipy.sparse.linalg import eigs
from sklearn.linear_model import LogisticRegression
Description of the algorithm
Let’s start with a simple example to motivate the technique and then describe the general algorithm as an eigenvector problem.
A simple working example
Let’s suppose that three competitors or teams (numbered zero, one, and two) compete against each other twice each in some type of contest - football, basketball, tiddly winks, what have you. How might we rate them based on the results that we see? For concreteness, suppose the results are the following:
- Team zero beat team one twice and team two once,
- Team one beat team two once,
- Team two beat team zero and team one once each.
We might represent this diagrammatically as follows:
This configuration is called a directed graph or digraph. We construct it by placing an edge from team \(i\) to team \(j\) and labeling that edge with the number of times that team \(i\) beat team \(j\). We suppress zero edges. There is an obvious matrix associated with a digraph called the adjacency matrix. The rows and columns will be indexed by the teams. In row \(i\) and column \(j\), we place the label on the edge from team \(i\) to team \(j\). The adjacency matrix for this digraph is
\[\left(\begin{matrix} 0 & 2 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{matrix}\right)\]
It seems reasonably clear that team zero should be ranked the highest, having beaten both competitors a total of 3 times out of 4 possible tries. Team one, on the other hand, won only 1 game out of its four tries, while team two seems to be in the middle, having split its games with both competitors. Certainly, the teams listed from worst to first should be: \(1\), \(2\), \(0\).
Eigen-formulation
It turns out there's a lovely way to recover this exact ranking using the eigensystem of the adjacency matrix associated with the directed graph. In general, suppose we have \(N\) participants in our collection of contests. We also suppose there is a vector \(\vec{r}\) of ratings with positive entries \(r_j\). Conceptually, \(r_j\) represents the actual strength of the \(j^{\text{th}}\) competitor. We wish to assign a positive score \(s_i\) to each competitor. If competitor \(i\) beats competitor \(j\), we expect that to contribute positively to the score of competitor \(i\). Furthermore, the stronger competitor \(j\) was, the larger we expect that contribution to be. Symbolically, we might write:
\[s_i = \frac{1}{n_i}\sum_{j=1}^N a_{ij}r_j.\]
Thus, \(s_i\) is a linear combination of the strengths of its opponents. The normalization factor \(n_i\) is the number of games that team \(i\) played; we include it because we don't want a team's rating to be higher simply because it played more games.
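To make the formula concrete, here is a small sketch (not part of the original code) that applies it to the simple example above; the ratings vector below is just a hypothetical guess, and each team played four games.

# A sketch of the scoring formula s_i = (1/n_i) * sum_j a_ij * r_j,
# applied to the simple three-team example. The vector r is a
# hypothetical guess at the strengths; n counts games played.
import numpy as np

A = np.array([
    [0, 2, 1],
    [0, 0, 1],
    [1, 1, 0]
])
r = np.array([1.0, 1.0, 1.0])  # hypothetical strengths
n = np.array([4, 4, 4])        # each team played four games
s = (A @ r) / n
print(s)                       # roughly [0.75, 0.25, 0.5]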
A key issue, of course, is how should the matrix of coefficients \(A=(a_{ij})\) be chosen? Certainly, if team \(i\) defeated team \(j\) every time they played (there might be more than one contest between the two), then we expect \(a_{ij}>0\). Beyond that, there’s a lot of flexibility and the precise choice is one for experimentation and (hopefully) optimization. In the simple approach that follows, we’ll take \(a_{ij}\) to simply be the number of times that team \(i\) defeated team \(j\).
Finally, it seems reasonable to hope that our score \(s_i\) of the \(i^{\text{th}}\) team should be related to the actual strength \(r_i\) of that team. Let's assume that they are, in fact, proportional: \(\vec{s} = \lambda \vec{r}\). Put another way (absorbing the normalization factors \(1/n_i\) into the matrix \(A\)), we have \[A\vec{r} = \lambda\vec{r}.\] That is, the actual strengths vector \(\vec{r}\) is an eigenvector of \(A\)!
The Perron-Frobenius theorem
At this point, we turn to some mathematical theory to guide us in our choice of eigenvector.
Theorem: If the square matrix \(A\) has nonnegative entries, then it has an eigenvector \(\vec{r}\) with nonnegative entries corresponding to a positive eigenvalue \(\lambda\). If \(A\) is irreducible, then \(\vec{r}\) has strictly positive entries and is unique up to a positive multiple; the corresponding eigenvalue is simple and is the eigenvalue of largest absolute value.
This all seems good because we certainly want a vector of positive ratings and the theorem tells us which one to choose. This eigenvalue/eigenvector pair is sometimes called dominant.
To some readers, the notion of an irreducible matrix is quite possibly new. There are a number of equivalent characterizations including:
- For every pair of indices \(i\) and \(j\), there is some power \(k\) with \((A^k)_{ij} > 0\),
- There is no permutation of the rows and columns of \(A\) transforming the matrix into a block matrix of the form \[\left(\begin{matrix} A_{11} & A_{12} \\ 0 & A_{22} \end{matrix}\right),\] where \(A_{11}\) and \(A_{22}\) are square.
- The associated digraph is strongly connected, i.e. there is a path from any vertex to any other.
I find the last characterization the easiest to work with, and it's easy to believe that it's likely to be satisfied if the teams play one another enough. Even if the matrix is not irreducible, the eigen-rating approach can still work; it's often sufficient to restrict attention to an appropriate strongly connected component of the digraph.
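Since strong connectivity is the easiest characterization to check in practice, here's a short sketch (not from the original page) that tests it for the simple example using SciPy's connected_components; the connection='strong' option is what turns this into a check of irreducibility.

# Check irreducibility by testing whether the digraph is strongly connected.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

A = csr_matrix(np.array([
    [0, 2, 1],
    [0, 0, 1],
    [1, 1, 0]
]))
n_components, labels = connected_components(A, directed=True, connection='strong')
print(n_components == 1)  # True, so the matrix is irreducible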
The simple example revisited in code
Recall that the adjacency matrix associated with our simple example, written in code, is:
M = [
    [0,2,1],
    [0,0,1],
    [1,1,0]
]
According to the theory, this should have a unique positive eigenvalue whose magnitude is larger than the magnitude of any other eigenvalue. There should be an associated eigenvector with all positive entries. Of course, if \(\vec{v}\) is an eigenvector, then so is \(-\vec{v}\) (or any constant multiple). The theory tells us that we might as well just take the absolute value.
Here’s the eigensystem for our simple example:
values, vectors = eig(M)
print(values)
print(vectors)
[-0.88464618+0.58974281j -0.88464618-0.58974281j 1.76929235+0.j ]
[[-0.63083491+0.j -0.63083491-0.j -0.72356278+0.j ]
[ 0.25319498-0.46743038j 0.25319498+0.46743038j -0.33963778+0.j ]
[ 0.05167574+0.56283042j 0.05167574-0.56283042j -0.60091853+0.j ]]
The result is a pair: a list of the eigenvalues and a matrix whose columns are the eigenvectors. Note that the last displayed eigenvalue, about \(1.769\), has zero imaginary part and is clearly larger in absolute value than the other two, which are complex conjugates. The corresponding eigenvector has components all with the same sign. The absolute value of that eigenvector should be reasonable strengths associated with the teams, approximately: \[0.7235, \; 0.3396, \; 0.6009.\] As expected, the zeroth team is the strongest, while the middle team is the weakest.
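Rather than reading the dominant pair off by eye, one can pick it out programmatically; here's a small sketch (not part of the original code) that reuses the values and vectors computed above and selects the largest-magnitude eigenvalue with np.argmax.

# A sketch of selecting the dominant eigenvalue/eigenvector programmatically.
k = np.argmax(np.abs(values))           # index of the largest-magnitude eigenvalue
dominant_value = values[k].real         # about 1.769
dominant_vector = np.abs(vectors[:, k].real)
print(dominant_value)
print(dominant_vector)                  # about [0.7236, 0.3396, 0.6009]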
Working with real data
Let’s take a look now at some examples involving real data.
The Big South
I’ve got the results of every Big South regular season game played this year; we can grab that data like so:
games = pd.read_csv('https://marksmath.org/data/BigSouth2025RegularSeason.csv')
games
 | WTeamName | WScore | LTeamName | LScore |
---|---|---|---|---|
0 | Winthrop | 95 | SC_Upstate | 76 |
1 | Presbyterian | 68 | Longwood | 60 |
2 | High_Point | 76 | Radford | 58 |
3 | Charleston_So | 72 | Gardner_Webb | 63 |
4 | Radford | 87 | Winthrop | 67 |
... | ... | ... | ... | ... |
67 | Winthrop | 85 | Longwood | 59 |
68 | Winthrop | 103 | UNC_Asheville | 90 |
69 | Longwood | 83 | SC_Upstate | 66 |
70 | Presbyterian | 68 | Gardner_Webb | 57 |
71 | Radford | 76 | Charleston_So | 60 |
72 rows × 4 columns
Here are the teams. This code is slightly more complicated than necessary, since it accounts for the possibility that some teams didn’t win any games.
wteams = games.WTeamName
lteams = games.LTeamName
teams = pd.concat([wteams, lteams], names=['TeamName']).drop_duplicates().sort_values()
teams = list(teams.to_numpy().flat)
teams = pd.DataFrame({'team': teams})
teams
 | team |
---|---|
0 | Charleston_So |
1 | Gardner_Webb |
2 | High_Point |
3 | Longwood |
4 | Presbyterian |
5 | Radford |
6 | SC_Upstate |
7 | UNC_Asheville |
8 | Winthrop |
Now, let’s iterate through the games and construct the simple win matrix described above:
# Construct a dict to look up idx from team name
teams['idx'] = teams.index
team_to_idx_dict = teams.set_index('team')['idx'].to_dict()
teams = teams.drop('idx', axis=1)

# Set up the matrix
n = len(teams)
M = dok_matrix((n,n))
for game in games.iterrows():
    i = team_to_idx_dict[game[1].WTeamName]
    j = team_to_idx_dict[game[1].LTeamName]
    M[i,j] = M[i,j] + 1

# Display it
M = M.toarray()
M
array([[0., 2., 0., 1., 1., 1., 1., 0., 0.],
[0., 0., 0., 1., 1., 0., 2., 0., 1.],
[2., 2., 0., 1., 2., 2., 2., 1., 2.],
[1., 1., 1., 0., 0., 1., 2., 1., 0.],
[1., 1., 0., 2., 0., 1., 1., 1., 0.],
[1., 2., 0., 1., 1., 0., 2., 1., 1.],
[1., 0., 0., 0., 1., 0., 0., 0., 0.],
[2., 2., 1., 1., 1., 1., 2., 0., 1.],
[2., 1., 0., 2., 2., 1., 2., 1., 0.]])
For a league of moderate size, it might make sense to visualize this data with a directed graph:
dot = gv.Digraph()
dot.graph_attr.update({'size': '8,5!'})
for i in range(len(M)):
    wteam = teams.iloc[i].team
    for j in range(len(M)):
        if M[i,j] > 0:
            lteam = teams.iloc[j].team
            dot.edge(wteam, lteam, label=str(int(M[i,j])))
dot
Once we have the matrix, it’s pretty easy to compute the dominant eigenvector and display the teams ranked by rating:
value, vector = eigs(M, k=1, which='LM')
vector = abs(np.ndarray.flatten(vector.real))
teams['rating'] = vector
teams.sort_values('rating', ascending=False)
 | team | rating |
---|---|---|
2 | High_Point | 0.561979 |
7 | UNC_Asheville | 0.427111 |
8 | Winthrop | 0.403688 |
5 | Radford | 0.324317 |
3 | Longwood | 0.285367 |
4 | Presbyterian | 0.275199 |
0 | Charleston_So | 0.202649 |
1 | Gardner_Webb | 0.172820 |
6 | SC_Upstate | 0.074218 |
That looks like a reasonable ranking, at least!
Making predictions
OK, let's try to use those ratings to make actual predictions for the upcoming Big South Tournament. For example, UNCA plays Charleston Southern tonight! We can look up their ratings in our list of teams:
teams.loc[[7,0]]
 | team | rating |
---|---|---|
7 | UNC_Asheville | 0.427111 |
0 | Charleston_So | 0.202649 |
Our question is, how can we translate those ratings to a probabilistic prediction?
One answer, of course, is to get some data and apply logistic regression!
Data
Using Kaggle data, I've assembled fourteen years' worth of Big South games. You can grab it and examine it like so:
df = pd.read_csv('https://marksmath.org/data/big_south_games_with_ratings.csv')
The data is not quite ready for logistic regression. We should

- label each game with an `Outcome` of 1, indicating that the first team won the game,
- flip the data and label the resulting games with an `Outcome` of 0,
- concatenate those two data frames to obtain an expanded, symmetric data frame, and compute a `RatingDiff` for each game.
"Outcome"] = 1
df[= df.copy()
df_flipped "Outcome"] = 0
df_flipped["WTeamRating"], df_flipped["LTeamRating"] = df["LTeamRating"], df["WTeamRating"]
df_flipped[= pd.concat([df, df_flipped], ignore_index=True)
df_expanded
"RatingDiff"] = df_expanded["WTeamRating"] - df_expanded["LTeamRating"]
df_expanded[ df_expanded
 | Season | WTeamID | WTeamName | WScore | LTeamID | LTeamName | LScore | tourney_game | WTeamRating | LTeamRating | Outcome | RatingDiff |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2011 | 1219 | High Point | 66 | 1205 | Gardner Webb | 64 | False | 0.219615 | 0.190593 | 1 | 0.029022 |
1 | 2011 | 1251 | Liberty | 70 | 1342 | Presbyterian | 61 | False | 0.410214 | 0.218673 | 1 | 0.191541 |
2 | 2011 | 1421 | UNC Asheville | 70 | 1347 | Radford | 50 | False | 0.353151 | 0.070308 | 1 | 0.282843 |
3 | 2011 | 1457 | Winthrop | 88 | 1440 | VMI | 82 | False | 0.301116 | 0.303004 | 1 | -0.001888 |
4 | 2011 | 1157 | Coastal Car | 73 | 1149 | Charleston So | 71 | False | 0.558662 | 0.274054 | 1 | 0.284608 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3395 | 2024 | 1255 | Longwood | 80 | 1219 | High Point | 79 | True | 0.525855 | 0.274761 | 0 | 0.251094 |
3396 | 2024 | 1255 | Longwood | 80 | 1219 | High Point | 79 | True | 0.525855 | 0.274761 | 0 | 0.251094 |
3397 | 2024 | 1421 | UNC Asheville | 83 | 1205 | Gardner Webb | 72 | True | 0.439125 | 0.442608 | 0 | -0.003483 |
3398 | 2024 | 1421 | UNC Asheville | 83 | 1205 | Gardner Webb | 72 | True | 0.439125 | 0.442608 | 0 | -0.003483 |
3399 | 2024 | 1255 | Longwood | 85 | 1421 | UNC Asheville | 59 | True | 0.442608 | 0.274761 | 0 | 0.167847 |
3400 rows × 12 columns
Modeling the data
Ultimately, we want to use the regular season games to fit the logistic regression relating the `Outcome` to the `RatingDiff`. We'll then use the results of that model to make predictions about the upcoming tournament.
Here’s how we do that:
train_data = df_expanded[df_expanded["tourney_game"] == False]
test_data = df_expanded[df_expanded["tourney_game"] == True]
X_train = train_data[["RatingDiff"]].values
y_train = train_data["Outcome"].values
X_test = test_data[["RatingDiff"]].values
y_test = test_data["Outcome"].values

# Fit the model!
model = LogisticRegression()
model.fit(X_train, y_train)
LogisticRegression()
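Note that `X_test` and `y_test` were set up above but aren't used; here's a quick sketch (not part of the original code) of how one might check the fitted model against the tournament games using standard sklearn metrics.

# Evaluate the fitted model on the held-out tournament games.
from sklearn.metrics import accuracy_score, log_loss

p_test = model.predict_proba(X_test)[:, 1]   # predicted win probabilities
print(accuracy_score(y_test, p_test > 0.5))  # fraction of games called correctly
print(log_loss(y_test, p_test))              # probabilistic score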
Use the model
Finally, we can use the model to make a probabilistic prediction for tonight’s game:
model.predict_proba([[teams.loc[7].rating - teams.loc[0].rating]])
array([[0.15906252, 0.84093748]])
Well, that’s pretty confident!
Comments
You might recall that the prediction we produced last time was much less confident - only 0.52. The only substantial difference between this code and the code that I whipped up just before class is in the definition of the model. Last time, the model was defined by:
model = LogisticRegressionCV(
    cv=5,
    penalty="l2",
    scoring="accuracy",
    max_iter=1000
)
This approach uses five-fold cross-validation to minimize the loss function plus a constant times the sum of the squares of the model coefficients. That is, it should minimize \[
J(\mathbf{w}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right] + \alpha \sum_{j=1}^{p} w_j^2.
\] Unfortunately, specifying `scoring="accuracy"` changed the metric, which is why I was so unhappy with the prediction.
Another issue is the use of cross-validation. The nature of the data indicates that we can't just blindly use cross-validation. We should fit on the regular season data and then choose the regularization constant to minimize our score on the tournament data. That needs to be set up manually and uses a bit more code than I want to get into here.
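For completeness, here's a rough sketch of what that manual selection might look like; it is not the code used on this page, and it assumes we score with the log-loss on the tournament games and search a small, hypothetical grid of values for the regularization constant C.

# A rough sketch of manual selection of the regularization constant:
# fit on regular season games, score on tournament games.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

best_C, best_loss = None, np.inf
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:          # hypothetical grid of C values
    m = LogisticRegression(C=C).fit(X_train, y_train)
    loss = log_loss(y_test, m.predict_proba(X_test)[:, 1])
    if loss < best_loss:
        best_C, best_loss = C, loss
print(best_C, best_loss)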
Thus, this time, we defined it simply by:
model = LogisticRegression()
That gives us the 84% prediction which, honestly, is probably too high now. But, hey, I like it!
Comments
There’s a lot going on here and plenty of things worth pointing out.
The matrix \(M\)
The first thing to mention is the matrix \(M\) itself. To this point, we’ve used only the simplest possible formulation where \([M]_{ij}\) is simply the number of times that team \(i\) defeated team \(j\). We didn’t even need to normalize the rows by the number of games played, since each team played the same number of games.
It seems easy to believe that we could pick the matrix entries in a way that improves the ratings. Perhaps we could take the score into account, for example? Perhaps we could weight more recent games more heavily or take home court advantage into account? There are a ton of possibilities that, ultimately, should be chosen by optimization.
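As one illustration (a sketch in the spirit of Keener's paper, not the method actually used above), we might credit each team with a fraction of every game based on points scored rather than a simple win count; the constants here are just one of many possible choices.

# A hypothetical score-based alternative to the simple win matrix.
# Reuses games, team_to_idx_dict, and n from the code above.
from scipy.sparse import dok_matrix

A = dok_matrix((n, n))
for game in games.iterrows():
    i = team_to_idx_dict[game[1].WTeamName]
    j = team_to_idx_dict[game[1].LTeamName]
    w, l = game[1].WScore, game[1].LScore
    A[i, j] += (w + 1) / (w + l + 2)   # winner's share of the game
    A[j, i] += (l + 1) / (w + l + 2)   # loser still gets partial credit
A = A.toarray()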
The computer commands
The commands that construct the matrix and compute the eigenvalue are of interest as well. I'm referring specifically to the following lines of code:

M = dok_matrix((n,n))

value, vector = eigs(M, k=1, which='LM')
- The matrix `M` is stored as a sparse matrix, meaning that a high proportion of its entries are zero. For a relatively small problem like this, the sparseness is not really such a big deal. As the size of the matrix grows, though, utilizing the sparse structure of the matrix becomes more and more important. Truly large sparse matrices can be stored in much less memory than dense matrices. In addition, algorithms can be specialized for sparse matrices to avoid unnecessary computation and speed things up considerably.
- The eigenvalue and eigenvector are computed using the `eigs` command, which is specifically designed for sparse matrices. The arguments `k=1` and `which='LM'` indicate that we need only one eigenvector and that we specifically want the one whose eigenvalue has the Largest Magnitude - the so-called dominant eigenvector.
Power iteration
A common algorithm to compute just the dominant eigenvector is called power iteration.
Power iteration should work with any matrix \(A\) having a dominant eigenvector corresponding to a simple, positive eigenvalue. To apply it, simply pick a vector \(\mathbf{x}_0\) and iterate the function \[ \mathbf{x} \to \frac{A\mathbf{x}}{\|\mathbf{x}\|}. \] As long as the initial vector \(\mathbf{x}_0\) has a nonzero component in the direction of the dominant eigenvector, the process will converge to the dominant eigenvector and its magnitude will be the dominant eigenvalue. The convergence is exponential; at each step, the error shrinks roughly by a factor of the magnitude of the second-largest eigenvalue divided by the dominant eigenvalue.
This works because multiplication by the matrix \(A\) stretches space out the most in the direction of the dominant eigenvector.
Power iteration is super easy to implement and we can check that it works by comparing to the team ratings generated by SciPy:
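Here's a minimal sketch of power iteration, assuming the dense Big South matrix M and the teams data frame from above; the resulting column should match the eigs-based ratings up to small numerical differences.

# A minimal sketch of power iteration: iterate x -> A x / ||x||.
x = np.ones(len(M))                      # any positive starting vector works
for _ in range(100):                     # a fixed number of iterations for simplicity
    x = M @ x / np.linalg.norm(x)
print(np.linalg.norm(x))                 # approximates the dominant eigenvalue
teams['power_rating'] = x / np.linalg.norm(x)
teams.sort_values('power_rating', ascending=False)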