What can Solvency II Internal Models learn from ICC Player Ratings?

I’m looking at the FIFA rankings at August 2018, as you do. They rank all the world’s international football teams in order of strength. Compared to the previous ranking list two months earlier, there have been some big movements. Croatia have jumped from 20th to 4th; Germany have fallen from 1st to 15th. Why the big movements?

Well, the way the rankings work is that each team has a rating. France are at the top of the list because their rating of 1726 is higher than that of all the other countries. Every time France play in an international, they are given a rating for that one match. The rating depends on the result and on the quality of the opposition. It might depend on the importance of the match too – I can’t remember and it’s not important to this article. What is important is that the 1726 rating is the average of the ratings that France scored for each match over the previous four years.

And that’s why weird things happen in the FIFA rankings. Germany fell 14 places because their rating fell sharply in just two months. And it fell sharply not just because they had a poor 2018 World Cup but also because they had a great 2014 World Cup, putting seven past Brazil in the semi-final and ending up as winners. All of those results in the 2014 World Cup dropped out of the calculation in 2018 as they were now more than four years old. As far as the FIFA rankings are concerned, everything in the last four years is 100% relevant whereas everything before it is 100% irrelevant.
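To see the cliff-edge effect in miniature, here’s a sketch of a FIFA-style rolling-window average, using entirely made-up match ratings (not real FIFA numbers). Everything inside the window counts in full; everything outside it counts for nothing, so a single year passing can move the average a long way.

```python
def rolling_rating(match_history, as_of_year, window_years=4):
    """Equally weighted average of all match ratings less than
    window_years old; anything older is ignored completely."""
    in_window = [rating for (year, rating) in match_history
                 if 0 <= as_of_year - year < window_years]
    return sum(in_window) / len(in_window)

# Hypothetical history: brilliant results in 2014, mediocre ones after.
history = [(2014, 2000), (2014, 2000), (2016, 1500), (2018, 1200)]

before = rolling_rating(history, as_of_year=2017)  # 2014 still in the window
after = rolling_rating(history, as_of_year=2019)   # 2014 has dropped out
```

Nothing about the team changed between the two calculations except the calendar, yet the rating falls sharply the moment the 2014 results age past the window.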

The ICC cricket rankings (I’m now talking about player rankings rather than team rankings) are much better behaved. They too are produced by ordering players by rating, with each player’s rating built up from (bowling or batting) ratings awarded on an innings-by-innings basis. But player ratings vary more smoothly over time, meaning that players tend to drift up or down within the rankings rather than making big jumps. How does the rating system achieve this?

Well, here’s the thing. Whereas FIFA give equal weightings to all match ratings within a four-year window and no weightings outside it, the ICC ratings are weighted averages. I can’t remember what x is but it’s between 0 and 1, and a player’s rating after an innings is equal to x times his rating for that innings plus (1-x) times his rating before that innings. The result is that his rating after the innings is a weighted average of all the innings ratings over his (or her, sorry) entire career, with the size of the weightings decaying exponentially as the data gets older.
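The update rule described above is just exponential smoothing, and can be sketched in a few lines. The value of x here is purely illustrative (I don’t know the actual ICC weight), and the starting rating of zero is a simplification for the sketch.

```python
def update_rating(old_rating, innings_rating, x=0.1):
    """ICC-style update: blend the latest innings into the running rating."""
    return x * innings_rating + (1 - x) * old_rating

# Unrolling the recursion shows the exponential decay: an innings
# that is k innings in the past carries weight x * (1 - x)**k.
rating = 0.0
innings_ratings = [50, 60, 55, 40, 120]
for innings in innings_ratings:
    rating = update_rating(rating, innings)
```

No single innings can move the rating by more than x times the gap between that innings and the current rating, which is why ICC ratings drift rather than jump.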

This not only smooths out ICC ratings (and therefore rankings) but also makes them more up-to-date by placing more weight on the most recent data. If FIFA used this methodology, Germany would gradually have drifted down the ratings over the last five years, with the memories of 2014 gradually fading rather than being wiped out overnight like Snowball’s heroics in Animal Farm.

I think it’s interesting. And that brings me on to Solvency II internal models.

In calibrating internal models, companies will tend to fit probability distributions to historic data. There will be a lot of thought put into how far back the calibration data should go. The more data you use the more accurate your calibration will be, but that’s only if the characteristics of the risk have stayed unchanged over time. There will be strong arguments for not using the oldest data because it’s starting to look out-of-date and irrelevant. It’s a tough call.

Companies have made those calls though, and calibrated their models using data from a carefully chosen time window. But what happens when they come to recalibrate? If they calibrated using x years of data and they come to recalibrate the model one year later, will they just add on one extra year’s data? Maybe, but I expect they’ll eventually find that the oldest data is no longer relevant. At some point they’ll start chopping off some of the earliest data, and that’s when they’re in danger of having their SCR move around like Germany’s FIFA rating when significant historic events drop out of the calibration data.

So the question I’m throwing out there is whether there are alternative calibration methodologies that can take a lead from the ICC, with calibration parameters estimated using exponentially decaying weighted averages of the underlying data. Would firms (and regulators) rather see SCRs that behave like FIFA ratings or like ICC ratings?
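One way to make that concrete: estimate a volatility parameter as an exponentially weighted average of squared returns (in the spirit of a RiskMetrics-style EWMA) rather than an equally weighted average over a fixed window. The decay factor lam below is illustrative, not a recommendation, and the data is invented.

```python
import math

def ewma_volatility(returns, lam=0.97):
    """Exponentially weighted estimate of the standard deviation of
    returns, with weight proportional to lam**age (newest data first).
    Old observations fade gradually instead of dropping out overnight."""
    weights = [(1 - lam) * lam ** age for age in range(len(returns))]
    total = sum(weights)
    var = sum(w * r * r for w, r in zip(weights, reversed(returns))) / total
    return math.sqrt(var)
```

Recalibrating a year later just means folding the new observations into the same decaying average: the SCR responds to new data, but nothing ever falls off a cliff.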