sharash

The schedule-adjusted league table

In most soccer leagues, the league table is the standard way to rank teams, and is used at the end of the season to crown champions and choose teams for promotion and relegation, as well as qualification for international competitions. League points are gained by winning or drawing games. However, the table does not take into account the difficulty of opponents; winning against the strongest team in the league is worth as many points as winning against the weakest. As a consequence, the schedule of when teams play each other can skew the ranking of teams in favor of those with easier schedules. This is more of a problem towards the beginning of the season. As a side note, the question of how to rank teams is more pertinent in systems where not everyone plays each other, for example the NBA.

The schedule-adjusted league table (SALT) was proposed by Constantinos Chappas as a better way to tell how well teams have been doing so far. Basically, for each pair of teams, SALT takes into account the difference in points they have gained from equivalent matchups at the equivalent venue. In doing so, it automatically considers schedule difficulty, and may therefore represent a better ranking of teams than the regular league table.

SALT uses a linear model: \(M P = R\), where \(M\) is a 190*20 matrix of every matchup times every team, \(R\) is a 190-vector of relative points, so the difference in points per game for the games that both teams have played, and \(P\) is a 20-vector of how many more points per game the team earns compared to average (will be scaled to points later), to be solved for with least-squares. For additional details, check the original post by Chappas.

SALT has some attractive properties:

  • The average of the SALT points is equal to the average of the actual points after each matchday.
  • At the end of the season, SALT and the actual league table are identical!

Let’s code this model up in R, and look at some results for the 2018-2019 Serie A season:

# load some packages
library(jsonlite)
library(dplyr)
library(ggplot2)
library(gganimate)
theme_set(theme_bw())

# get the data from https://github.com/openfootball/football.json
url.clubs = 'https://raw.githubusercontent.com/openfootball/football.json/master/2018-19/it.1.clubs.json'
url.results = 'https://raw.githubusercontent.com/openfootball/football.json/master/2018-19/it.1.json'
it.teams = fromJSON(url.clubs)$clubs
n.teams = nrow(it.teams)
# let's add team codes for Frosinone and SPAL
it.teams[19,2]='FRO'
it.teams[20,2]='SPA'
it.results = fromJSON(url.results)$rounds$matches
n.rounds = length(it.results)

# start by creating M matrix
create.matchup.index.matrix = function(n.teams){
  M = matrix(0, nrow=n.teams*(n.teams-1)/2, ncol=n.teams)
  i.row = 1
  for (i1 in 1:(n.teams-1)){
    for (i2 in (i1+1):n.teams){
      M[i.row, i1] = 1
      M[i.row, i2] = -1
      i.row = i.row + 1
    }
  }
  return(M)
}
M = create.matchup.index.matrix(n.teams)

To calculate \(R\), I will use two helper matrices: one 20*40 matrix of points gained per match, and another matrix with the same dimensions containing the matchday when the game took place. Each matrix contains one row per team, and one column per matchup, so 20 opponents at home plus 20 opponents away. Using these helper matrices, we can easily see which opponents two teams have played in common, which we’ll need to calculate SALT.

# create matrices to record round and points for each game
game.points = matrix(0, nrow=n.teams, ncol=2*n.teams,
  dimnames=list(it.teams$code, c(it.teams$code, it.teams$code)))
# use Inf so a team never plays itself
game.matchday = matrix(Inf, nrow=n.teams, ncol=2*n.teams,
  dimnames=list(it.teams$code, c(it.teams$code, it.teams$code)))

# create vector to translate team name to
# index, like a dictionary
name.to.index = 1:n.teams
names(name.to.index) = it.teams$name

# fill in matchday and points matrices
for (i.round in 1:n.rounds){
  current.round = it.results[[i.round]]
  n.games = nrow(current.round)

  for (i.game in 1:n.games){
    team1.index = name.to.index[current.round$team1[i.game]]
    team2.index = name.to.index[current.round$team2[i.game]]

    # record info on when game was played
    game.matchday[team1.index, team2.index] = i.round
    # add n.teams because game is away for team2
    game.matchday[team2.index, team1.index + n.teams] = i.round

    # translate scoreline to points
    # first extract score for each team
    score.team1 = current.round$score[i.game,][[1]][1]
    score.team2 = current.round$score[i.game,][[1]][2]

    # then translate to points for the game
    if (score.team1 > score.team2){
      points.team1=3
      points.team2=0
    } else if (score.team1 < score.team2){
      points.team1=0
      points.team2=3
    } else {
      points.team1=1
      points.team2=1
    }

    game.points[team1.index,
                team2.index] = points.team1
    # add n.teams because game is away for team2
    game.points[team2.index,
                team1.index + n.teams] = points.team2
  }
}

# create function to fill in R
create.relative.point.matrix = function(games.played,
                                        points.so.far){
  n.teams = nrow(points.so.far)
  R = rep(0, n.teams*(n.teams-1)/2)

  index = 1
  for (i1 in 1:(n.teams-1)){
    for (i2 in (i1+1):n.teams){
      # which games have both teams played
      games.played.in.common = (games.played[i1,] &
                                games.played[i2,])
      n.games.in.common = sum(games.played.in.common)

      # only calculate R[index] if they have matches in common
      if (n.games.in.common > 0){
        # how many points did they get from those games
        points1 = sum(points.so.far[i1, games.played.in.common])
        points2 = sum(points.so.far[i2, games.played.in.common])

        # the difference per game is R
        R[index] = (points1 - points2) / n.games.in.common
      }

      # add points for head to head matches
      if (games.played[i1, i2]){
        R[index] = R[index] *
          n.games.in.common/(n.games.in.common + 1) +
          (points.so.far[i1, i2] -
            points.so.far[i2, i1+n.teams])/(n.games.in.common + 1)
        n.games.in.common = n.games.in.common + 1
      }
      if (games.played[i1, i2 + n.teams]){
        R[index] = R[index] * n.games.in.common/
          (n.games.in.common + 1) +
          (points.so.far[i1, i2+n.teams] -
           points.so.far[i2, i1])/(n.games.in.common + 1)
        n.games.in.common = n.games.in.common + 1
      }

      index = index + 1
    }
  }
  return(R)
}

To obtain the least-squares solution, I could use lsfit, but in this case I use the Moore-Penrose pseudoinverse (via SVD) instead.

# solve linear system AX=Y with Moore-Penrose pseudoinverse
mp.solve = function(A, Y, tol=1e-8){
  svd.res = svd(A)

  # create diagonal matrix with inverse of
  # all singular values greater than tol
  sv.inv = rep(0, length(svd.res$d))
  sv.inv[svd.res$d>tol] = 1/svd.res$d[svd.res$d>tol]
  sv.mat = diag(sv.inv)

  # create pseudoinverse
  A.inv = svd.res$v %*% sv.mat %*% t(svd.res$u)

  # solve system
  X = A.inv %*% Y
  return(X)
}

# and a function to calculate P, then scale to points
calculate.adjusted.points = function(M,
                                     R,
                                     game.points,
                                     games.played,
                                     i.round){
  # use MP pseudoinverse to solve linear system
  P = mp.solve(M, R)

  # scale to points
  avg.points = mean(game.points[games.played])
  salt.points = i.round * (avg.points + P)

  return(salt.points)
}

Now, let’s calculate the actual and adjusted league points after each matchday

# create data frame with actual and schedule-adjusted points
# start with zeros
df.salt=NULL
for (i.team in 1:n.teams){
  team = it.teams$code[i.team]
  df.salt = rbind(df.salt, data.frame(team=team, matchday=0,
    actual.points=0, salt.points=0))
}

# then calculate SALT for all rounds
for (i.round in 1:n.rounds){
  games.played = game.matchday <= i.round

  points.so.far = matrix(0, nrow=n.teams, ncol=2*n.teams)
  points.so.far[games.played] = game.points[games.played]

  # fill in R matrix
  R = create.relative.point.matrix(games.played, points.so.far)

  # calculate adjusted points
  salt.points = calculate.adjusted.points(M, R, game.points,
    games.played, i.round)

  # fill in data frame
  for (i.team in 1:n.teams){
    team = it.teams$code[i.team]
    actual.points = sum(points.so.far[i.team,])
    df.salt = rbind(df.salt,
                    data.frame(team=team,
                               matchday=i.round,
                               actual.points=actual.points,
                               salt.points=salt.points[i.team]))
  }
}

Some notes on this:

  • avg.points changes with each matchday; this is so that SALT on an ongoing season will exactly match the SALT for the corresponding matchday once the full season results are in. In other words, SALT for a matchday won’t change based on how far along a season has come.
  • Unlike regular league points, SALT can decrease from one matchday to the next, depending on other teams’ results.

Okay, so now we have the SALT points for each matchday. Let’s start by seeing when each team had the biggest difference between actual points and adjusted points:

df.salt %>% group_by(team) %>%
  mutate(abs.diff = abs(salt.points-actual.points))
  %>% slice_max(n=1,order_by=abs.diff) %>% ungroup()
team matchday actual.points salt.points abs.diff
ATA 10 12 10.11 1.89
BOL 13 11 8.96 2.04
CAG 17 17 19.06 2.06
CHI 6 2 4.70 2.70
EMP 18 16 12.68 3.32
FIO 13 18 15.85 2.15
FRO 11 6 1.24 4.76
GEN 8 12 8.63 3.37
INT 11 25 27.92 2.92
JUV 3 9 6.10 2.90
LAZ 13 23 25.48 2.48
MIL 15 26 28.70 2.70
NAP 26 56 60.15 4.15
PAR 10 13 8.95 4.05
ROM 8 14 12.20 1.80
SAM 11 15 17.92 2.92
SAS 9 14 17.65 3.65
SPA 9 12 16.30 4.30
TOR 21 30 32.64 2.64
UDI 15 13 14.87 1.87

So the difference between regular and schedule-adjusted league points is always less than 5 points for the 2018-2019 Serie A season.

And how do the actual points compare with adjusted points for each matchday? Let’s plot it!

animate.salt = function(df,
                        fps=4,
                        seconds.per.round=1.5,
                        width=720,
                        height=720){
  points.min = min(min(df.salt$actual.points),
    min(df.salt$salt.points))
  points.max = max(max(df.salt$actual.points),
    max(df.salt$salt.points))
  p.anim = ggplot(data=df.salt,
    aes(x=actual.points,
      y=salt.points,
      label=team,
      color=team)) +
    geom_label(size=8) +
    geom_abline(slope=1, color='#a6a6a6') +
    labs(title='Schedule-adjusted league points',
         x='Actual league points',
         y='Schedule-adjusted points') +
    xlim(points.min, points.max) +
    ylim(points.min, points.max) +
    theme(legend.position="none",
          text = element_text(size=24)) +
    transition_states(matchday,
                      transition_length = 2,
                      state_length = 1) +
    ease_aes('cubic-in-out') +
    ggtitle('Matchday {closest_state}')

  animate(p.anim,
          nframes=(n.rounds+1)*seconds.per.round*fps,
          fps=fps,
          end_pause=6*fps,
          width=width,
          height=height)
}

animate.salt(df.salt, fps=10)

salt-anim

A team above the diagonal line is perhaps lower in the league table than performances merit due to a difficult schedule. Conversely, a team below the diagonal has many difficult matchups to come, and may thus drift down the league table in the future.

So that’s the schedule-adjusted league table: a more nuanced way of looking at the results so far. This is just a simple adjustment, but the model can certainly be extended by looking at recent team form, more advanced metrics like expected goals, and so on. For some other alternative league tables, take a look at: