
Here’s a fun exercise. Remember back in high school calculus when you would find the peak of a curve by setting its derivative to zero? That’s a basic example of mathematical optimization. Regression is another, more complicated form of optimization. In regression we want to fit a line or curve so that the difference between the predicted values and the observed values is minimized. Simply put, optimization is finding either the maximum or minimum of a function, subject to a list of constraints.

Optimization is the hidden science behind much of the world around us. It’s how airlines know how to schedule routes, how politicians gerrymander districts, and how companies plan advertising campaigns. It’s also how sports leagues manage to build schedules to accommodate a dizzying myriad of requirements. Optimization is why we had to suffer through the Titans-Jaguars game Thursday night.

Let’s say we want to find the best-fitting set of team ratings that would explain the game outcomes so far in the season. We can start with an assumed set of team ratings. Anything is fine, but let’s start with a rating of zero points for every team. Then we can list each game and its net score result (home score – visitor score – 3). We’ll subtract 3 because that’s the value of home-field advantage.

Next to each game we’ll look up both teams’ ratings and compute a projected net result (home rating – visitor rating); the value of home field was already removed from the actual result, so we don’t subtract it again. For each game there is an error between the projected net result and the actual result. We can add up all the games’ errors and get a total error.
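For readers who prefer code to spreadsheets, here’s a rough Python sketch of that bookkeeping. The games, scores, and team names below are made up purely for illustration; only the arithmetic follows the description above.

```python
# Toy setup: compute each game's error and the total error for a given
# set of team ratings. All game data here is hypothetical.
HFA = 3.0  # assumed value of home field, in points

# (home, visitor, home_score, visitor_score) -- made-up results
games = [("NE", "MIA", 27, 17), ("SEA", "SF", 20, 23), ("MIA", "SEA", 14, 24)]

ratings = {"NE": 0.0, "MIA": 0.0, "SEA": 0.0, "SF": 0.0}  # start everyone at zero

def game_errors(ratings, games):
    """Projected net result minus actual net result, one entry per game."""
    errors = []
    for home, visitor, hs, vs in games:
        actual_net = hs - vs - HFA                        # actual margin, net of home field
        projected_net = ratings[home] - ratings[visitor]  # projected margin, neutral field
        errors.append(projected_net - actual_net)
    return errors

print(sum(game_errors(ratings, games)))  # total error for the all-zero ratings
```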

Almost everyone has a relatively powerful optimization tool on their computer but probably doesn’t realize it. Excel’s Solver tool does a pretty good job for small optimization problems. We can tell Solver to select the combination of team ratings that minimizes the total error. This would tell us the team ratings that best explain the game outcomes we’ve seen to date.

There’s one problem. If we simply add up all the errors, we’re bound to get something close to zero no matter what the team ratings are. That’s because positive errors (the home team overperforms expectation) and negative errors (the home team underperforms expectation) will cancel out. One way around this problem is to square each error. This is known as L2 Norm regression, or least-squares regression. One big drawback of this approach is that it is very sensitive to outliers; in our example, that means blowouts.
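Here’s a sketch of the least-squares version in code, reusing the toy `games` list and `game_errors` helper from above and assuming scipy is available. Because adding the same constant to every rating leaves the errors unchanged, the fitted ratings are centered afterward so they average zero.

```python
import numpy as np
from scipy.optimize import minimize

teams = sorted({t for g in games for t in g[:2]})  # order maps parameter vector to teams

def sse(x):
    """Sum of squared errors for a candidate ratings vector."""
    ratings = dict(zip(teams, x))
    return sum(e ** 2 for e in game_errors(ratings, games))

result = minimize(sse, x0=np.zeros(len(teams)))  # generic nonlinear minimizer
fitted = result.x - result.x.mean()              # center so the average team is 0
print(dict(zip(teams, np.round(fitted, 1))))
```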

Another option is to minimize the sum of the absolute values of the errors. This is known as L1 Norm regression, or least absolute error. This approach is not hyper-sensitive to blowouts but comes with its own difficulties.
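In code, the only change from the least-squares sketch above is the objective, swapping the square for an absolute value. A generic gradient-based minimizer may stall or wander on this version, for the reason described next.

```python
def sae(x):
    """Sum of absolute errors for a candidate ratings vector."""
    ratings = dict(zip(teams, x))
    return sum(abs(e) for e in game_errors(ratings, games))

result_l1 = minimize(sae, x0=np.zeros(len(teams)))  # may struggle at the kinks
```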

Solvers have a tough time with the absolute value function. If you plot it on the x-y plane, it’s a “V” shape, with its apex at the origin (the 0,0 point). The function itself is continuous, but its slope changes abruptly at that kink, which makes absolute value non-differentiable there. Solvers like differentiable functions for the same reason derivatives were so handy in high school for finding the minimum and maximum of a curve.

Fortunately, there’s a way to trick solvers into seeing a problem like this as a completely linear one. We can split each error into a positive component and a negative component, both constrained to be greater than or equal to zero, so that the error equals the positive component minus the negative component. Telling the solver to minimize the sum of those two components makes the problem purely linear. Purely linear problems can be solved with a technique called Linear Programming, using an algorithm known as the Simplex method. (This formulation requires more variables and constraints than Excel’s Solver can handle, so I used another, more heavy-duty solver.)
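Here’s a sketch of that linear formulation using scipy’s linprog as a stand-in for the heavy-duty solver mentioned above (its HiGHS backend uses a simplex-type method). It reuses the toy `games` list and `teams` ordering from the earlier snippets: each game gets a pair of non-negative error components, and the objective is simply their sum.

```python
import numpy as np
from scipy.optimize import linprog

n_teams, n_games = len(teams), len(games)
idx = {t: j for j, t in enumerate(teams)}

# Variable vector: [ratings (n_teams), e_plus (n_games), e_minus (n_games)]
c = np.concatenate([np.zeros(n_teams), np.ones(2 * n_games)])  # minimize sum of error parts

# One equality per game: home rating - visitor rating - e_plus + e_minus = actual net result
A_eq = np.zeros((n_games + 1, n_teams + 2 * n_games))
b_eq = np.zeros(n_games + 1)
for i, (home, visitor, hs, vs) in enumerate(games):
    A_eq[i, idx[home]] = 1.0
    A_eq[i, idx[visitor]] = -1.0
    A_eq[i, n_teams + i] = -1.0            # -e_plus
    A_eq[i, n_teams + n_games + i] = 1.0   # +e_minus
    b_eq[i] = hs - vs - HFA                # actual net result
A_eq[n_games, :n_teams] = 1.0              # extra row: ratings sum to zero (pins the scale)

bounds = [(None, None)] * n_teams + [(0, None)] * (2 * n_games)  # ratings free, errors >= 0
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print(dict(zip(teams, np.round(res.x[:n_teams], 1))))
```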

Here are team ratings for the 2014 season through week 15, based on a variety of approaches. The L2 Norm column is the least-squares approach. The column labeled L1 Non-Lin. is the L1 approach using Excel’s non-linear solver. The column labeled L1 Evo is the L1 approach using Excel’s evolutionary algorithm, which uses a process I explained in my summer project from last year. The column labeled L1 Norm is the pure, direct solution using Simplex. The Average column doesn’t average all the columns, just the pure L1 and L2 approaches.

Rank Team L2 Norm L1 Non-Lin. L1 Evo L1 Norm Average
1 NE 13.2 19.0 18.0 18.7 15.9
2 DEN 8.8 12.0 10.2 13.7 11.2
3 SEA 6.6 8.7 8.1 8.7 7.7
4 KC 6.6 7.0 6.0 6.7 6.6
5 GB 6.1 5.0 4.2 6.7 6.4
6 IND 6.1 7.0 7.9 6.7 6.4
7 BLT 7.0 3.6 4.4 3.7 5.3
8 PHI 3.4 6.1 5.7 5.7 4.6
9 MIA 4.2 4.0 3.7 3.7 3.9
10 BUF 3.2 4.0 3.1 3.7 3.4
11 ARZ 3.0 4.0 3.9 3.7 3.3
12 SD 2.7 4.0 2.9 3.7 3.2
13 DAL 2.1 3.1 3.1 3.2 2.7
14 DET 2.1 1.9 1.8 1.7 1.9
15 HST 0.9 3.0 3.1 2.7 1.8
16 PIT 2.6 0.6 1.3 0.7 1.6
17 SL 1.2 0.1 -0.3 0.2 0.7
18 CIN 0.5 -0.9 -1.2 -1.3 -0.4
19 SF -1.3 -0.4 -0.2 -0.3 -0.8
20 MIN -0.6 -3.2 -2.9 -3.3 -1.9
21 NO -0.8 -3.4 -2.7 -3.3 -2.0
22 CLV -4.4 -4.4 -3.5 -4.3 -4.4
23 NYG -3.3 -6.4 -6.6 -6.3 -4.8
24 ATL -3.9 -6.4 -5.7 -6.3 -5.1
25 NYJ -5.8 -5.1 -5.3 -5.3 -5.5
26 WAS -7.0 -6.0 -5.9 -6.3 -6.6
27 CAR -5.6 -8.4 -7.4 -8.3 -7.0
28 CHI -5.6 -8.6 -6.7 -9.3 -7.5
29 TB -7.5 -9.3 -8.4 -9.3 -8.4
30 OAK -10.4 -8.2 -7.3 -7.3 -8.8
31 JAX -10.0 -10.9 -10.9 -11.3 -10.7
32 TEN -10.6 -11.1 -10.5 -11.3 -11.0

Notice that each approach yields slightly different ratings. Teams whose L2 ratings differ sharply from their L1 ratings likely had some big blowouts on their records. The overall order of the teams is relatively stable, though.

Also notice how the true L1 results tend to group teams into tiers. There are clusters of teams rated at 6.7, at 3.7, and at -6.3 points. This is a natural tendency of the approach, and it might be a useful way to think about the teams as well.

The only real surprise to me in terms of actual results is that KC is ranked so high. I suspect they are strongly buoyed by their big win over NE early in the season.

If you want true estimates of team strength to project future game outcomes, you’d want to regress these ratings toward the mean to some degree. There is an awful lot of sampling error and other random noise baked into every actual game outcome.
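As a very rough illustration of the idea (the shrinkage weight below is an arbitrary placeholder, not a value fitted from data), regressing a rating toward the league mean of zero can be as simple as:

```python
def regress_to_mean(rating, weight=0.5):
    # Shrink toward the league-average rating of zero. The 0.5 weight is an
    # arbitrary placeholder for illustration, not a fitted value.
    return weight * rating

print(regress_to_mean(13.7))  # a +13.7 rating shrinks to about +6.9
```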

We can use the same technique on just about any team statistic. Here’s an example of L2 and L1 optimization for team Expected Points Added (EPA). EPA gives us results similar to actual points scored, except it treats special-teams scores probabilistically.

Rank Team L2 Norm L1 Norm Average
1 NE 9.2 18.7 13.9
2 DEN 11.1 13.7 12.4
3 SEA 8.3 8.7 8.5
4 KC 5.8 6.7 6.3
5 GB 5.5 6.7 6.1
6 IND 3.1 6.7 4.9
7 BLT 5.7 3.7 4.7
8 ARZ 3.9 3.7 3.8
9 SD 2.8 3.7 3.3
10 MIA 2.8 3.7 3.2
11 DAL 2.9 3.2 3.0
12 DET 4.0 1.7 2.8
13 BUF 1.7 3.7 2.7
14 PHI -0.3 5.7 2.7
15 PIT 2.3 0.7 1.5
16 HST -0.5 2.7 1.1
17 SL 1.0 0.2 0.6
18 SF 1.1 -0.3 0.4
19 NO 0.8 -3.3 -1.3
20 MIN -0.4 -3.3 -1.8
21 CIN -2.4 -1.3 -1.9
22 CLV -4.5 -4.3 -4.4
23 NYG -3.6 -6.3 -5.0
24 NYJ -5.8 -5.3 -5.6
25 WAS -5.4 -6.3 -5.8
26 ATL -5.8 -6.3 -6.1
27 CAR -4.4 -8.3 -6.4
28 CHI -3.7 -9.3 -6.5
29 OAK -7.7 -7.3 -7.5
30 TB -7.8 -9.3 -8.6
31 JAX -9.7 -11.3 -10.5
32 TEN -9.9 -11.3 -10.6

We get mostly the same order with EPA as we did with net scores.

The overall lesson is that there is no one correct method. Different but equally valid approaches give different answers. There are a number of ranking systems like these, such as SRS and Jeff Sagarin's long-standing Pure Points ratings, and they will all give us a very good picture of which teams are best and worst.