At EquiRatings headquarters, the initial idea to apply the esteemed Elo rating to eventing nearly brought tears to Diarm’s and Sam’s eyes. It is the first horse ranking system the sport has ever seen, based soundly on one of the most widely used ratings in today’s major sports (we’re talking the Premier League, NFL, NBA, and MLB).
Inspired by others who likewise adapted the Elo from a chess rating to a sports rating, we’ve developed the eventing Elo to objectively rate horses and offer the odds for our favourite head-to-head duels. Here’s how the rating works for Eventing.
Elo ranking systems were first introduced to the world of chess in the 1960s and have since been expanded and adapted to cover numerous types of competition, from online gaming to sports analysis and even online dating. They are used to rank competitors based on performance inferred from their win/loss record in a series of head-to-head matchups. Elo is most commonly used where a game or sport doesn’t have an inherent measure of players’ abilities (like, say, a handicap in golf), and where ability is best judged from a player’s record against other known players.
One of the main reasons people use the Elo system is its simplicity. Every competitor starts with an initial rating. When they beat another competitor, their rating goes up (relative to the gap in the opponents’ incoming ratings) and when they lose, their rating goes down. The formula is simple in that it has very few parameters, is easy to implement, and it’s easy to understand the evolution of players’ rankings.
While some might criticise the system for its simplicity (it doesn’t consider the margin of victory in a matchup, the individual phase scores, or whether a horse is only really competing as a warm-up for its next event), we believe it is a more informative number for all of that. It ranks horses based on their consistent, all-round performance, and only that. Looking at the rankings, either currently or over the last 10 years, it does a pretty good job of it.
At first look, it may seem that Elo isn’t a natural fit for eventing. For starters, eventing isn’t a comparative scoring sport like football, where players score points against each other. Rather, eventers score against judges for dressage marks and against courses in cross country and show jumping. Instead of breaking through the opposing team’s defense to score a point, eventers are going up against a test and they out-do each other based on their test performance.
But, this eventing test varies. Eventing has different levels of competition that have different dressage tests, jumping difficulties and cross country time challenges. A clear cross country round at CCI2*-S doesn’t equate to a clear round at CCI4*-L. On top of that, different tracks at the same level present different levels of challenges. Making the time at Burghley CCI5*-L is not the same achievement as making it at, say, Pau CCI5*-L.
Even dressage tests aren’t immune to scoring variance. While it may be contentious to brand some events as easier-scoring than others, the differences between judges’ marks for the same test show that some variance exists.
These realities required us to adapt Elo for use in eventing. These realities are also the exact reason why we believe our sport needs it. Just judging horses on their scoring achievements doesn’t show the full picture. Using Elo to summarize their scoring against each other is that extra layer of insight that helps us compare apples to apples.
So, having highlighted the case for introducing an Elo ranking system into eventing, how do we go about it? Eventing isn’t a natural fit alongside the classic Elo sports (think the Premier League, the NFL, NBA, and MLB), so we had to make some adjustments to the basic Elo formula to make it work.
First off, we have the fact that eventing isn’t a two-player sport. Elo ratings are calculated on the results of a two-player matchup. The usual way of adjusting Elo to fit a sport where we have a field competing against one another (a multiplayer game) is to break the field down into a round robin type format. Here each competitor in the field is paired off against every other competitor in what we will call a head-to-head matchup. If competitor A finished ahead of competitor B, then A is judged to have won that matchup, and receives a bump in Elo (while competitor B receives a corresponding drop in their Elo). We tally the Elo changes for every matchup for each competitor and use that as the total Elo adjustment (loss or gain) for that competitor from that competition.
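The round-robin breakdown described above can be sketched in a few lines of Python. This is only an illustration: the function name `head_to_head_results` and the horse names are made up for the example, and ties are not handled.

```python
from itertools import combinations


def head_to_head_results(finishing_order):
    """Break one competition into round-robin matchups.

    finishing_order: list of horse names, best placed first.
    Returns a list of (winner, loser) pairs, one per pair of horses.
    """
    # combinations() preserves input order, so the first horse in each
    # pair always finished ahead of the second.
    return list(combinations(finishing_order, 2))


field = ["Horse A", "Horse B", "Horse C", "Horse D"]
matchups = head_to_head_results(field)
print(len(matchups))  # 6 matchups for a field of 4
print(matchups[0])    # ('Horse A', 'Horse B')
```

Each horse in a field of n appears in n − 1 matchups, which is why the total Elo change later needs to be scaled by competition size.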
The classic Elo formula uses only 3 parameters. The main one is the K-factor, which is described alongside the formula further on.
Two other variables are the initial rating for a new competitor (set to 300), and the width of the Elo divisor (set to 400). You may notice that we deviate from the standard initial Elo rating (usually 1200–1500), but there is a strong reason for that. Performing in eventing isn’t as reliable as, say, performing in chess, where many match-ups can be predicted with 99% accuracy. In international eventing, every horse can have an off day and have a rail or a refusal. The large penalty points associated with those mistakes mean that horses will have rounds where they are beaten by much lower-ranked horses. As such, the Elo in eventing doesn’t evolve to have the same breadth of scores as, say, chess or tennis, where the strongest players can be over 2000 points above the weakest players. The starting score is completely arbitrary in an Elo algorithm: it has no effect on how competitors end up comparing to one another and solely affects the scale on which Elo scores are given. By starting at a score of 300, we are able to keep most Elo scores between 0 and 1000, which we believe is a simpler scale to use and comprehend.
The formula works as follows:
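Competitor A goes head to head against Competitor B. Competitor A has a rating of Ra, and B has a rating of Rb.
After competition, each competitor’s rating changes by the following amount. In the classic Elo form, with the divisor of 400 given above, A’s rating changes by K × (S − Ea), where Ea = 1 / (1 + 10^((Rb − Ra) / 400)) is the expected result for A. B’s rating changes by the same amount in the opposite direction.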
Where S is the result of the match-up (1 for a win, 0 for a loss, 0.5 for a draw), and K is the K-factor, which determines how much a single result can impact a horse’s overall rating. The higher the K-factor, the more the Elo is influenced by a horse’s most recent results. The lower the K-factor, the more Elo is representative of longer periods of form.
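As a minimal sketch, assuming the classic Elo update with the divisor of 400 and the base K-factor of 80 quoted in this article, a single matchup could be computed like this. The function names are illustrative and not taken from the EquiRatings implementation.

```python
def expected_score(r_a, r_b, divisor=400):
    """Expected result for A against B under the classic Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / divisor))


def elo_change(r_a, r_b, s_a, k=80, divisor=400):
    """Rating change for A given result s_a (1 win, 0 loss, 0.5 draw).

    B's change is the negative of this value.
    """
    return k * (s_a - expected_score(r_a, r_b, divisor))


# Two new horses, both on the starting rating of 300; A finishes ahead.
delta = elo_change(300, 300, 1.0)
print(delta)  # 40.0

# Only the rating gap matters, so the starting value just sets the scale.
print(expected_score(300, 500) == expected_score(1500, 1700))  # True
```

The second print illustrates why the choice of 300 as a starting rating has no effect on how horses compare to one another.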
In adapting the Elo for the sport of eventing, we only made adjustments to one of these factors, the K-factor. This is because there are a couple of scenarios that don’t fit the classic form of Elo competitions, and a couple of adjustments to this K-factor help smooth out the Elo performance.
In adapting the Elo, we were always trying to improve its performance. This performance can be measured in many ways, like accuracy in predicting matchup winners, or loss functions that try to minimise extreme upsets when making predictions using Elo rankings.
We measured performance of the eventing Elo system by considering the Brier Skill Score. This score measures the error in the matchup probability predictions made using the Elo function, relative to a baseline forecast, and is a very common method of measuring Elo performance. Since most of our reporting on Elo will be around the top levels of eventing (4* and 5* competitions), we have adapted the Elo to gain maximum Brier Skill Score at those levels of competition (at the expense of global accuracy, which would be more weighted towards the 2- and 3-star levels, since they have the majority of international competitors).
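For readers unfamiliar with the metric, here is a small sketch of how a Brier Skill Score can be computed for a set of matchup predictions. The reference forecast of 0.5 for every matchup is an assumption made for this example; the article does not say which baseline was used, and the sample numbers are invented.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted win probabilities and 0/1 results."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference_prob=0.5):
    """1 - BS / BS_ref. Positive values beat the reference forecast.

    reference_prob=0.5 (a coin flip for every matchup) is an assumed baseline.
    """
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([reference_prob] * len(outcomes), outcomes)
    return 1.0 - bs / bs_ref


# Invented example: predicted probability that the first horse wins,
# and whether it actually did (1) or did not (0).
probs = [0.8, 0.6, 0.3, 0.9]
outcomes = [1, 1, 0, 1]
print(round(brier_skill_score(probs, outcomes), 3))
```

A score of 1 would mean perfect predictions, 0 means no better than the baseline, and negative values mean worse than the baseline.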
When using the Elo for multiplayer sports (like eventing), it is common to divide the total Elo change, from all the head-to-head matchups, by the competition size. This is because all the head-to-head matchups arise from single performances by each of the competitors (each horse). Dividing them by competition size means the change in Elo is on a similar scale each time a horse competes, regardless of how many competitors they go against. If we didn’t divide by competition size, then winning a competition of, say, 50 competitors, would be equivalent to winning 5 competitions in a row with 10 competitors of similar calibre. We would argue that winning the 5 competitions is a stronger result than just winning one large competition.
However, it isn’t fair to say that competition size doesn’t matter. Winning against a large field in eventing is considered a bigger achievement than winning amongst a smaller field. To adjust for this, we soften the competition divisor with larger competitions, such that winning a competition against 50 horses has about a 40% greater impact compared to winning a competition against 10 horses.
From this, we settled on a K-factor of 80 and a competition power divisor of 0.8. Since these factors have the largest influence on the Elo Brier Skill Score, the following adjustments were made using 80 and 0.8 as the set, base factors. While this method of finding optimum parameters would not be suitable for more sensitive analyses, more rigorous optimisation protocols are computationally demanding operations. For our purpose of rating and ranking, this method was appropriate and achievable.
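One way to read the power divisor of 0.8 is sketched below. It assumes that the total Elo change grows roughly in proportion to the field size n and is then divided by n raised to the power 0.8; this interpretation is ours, chosen because it reproduces the "about 40%" figure quoted above. The function name `size_scale` is hypothetical.

```python
def size_scale(n_horses, power=0.8):
    """Relative size of the total Elo change for a field of n_horses.

    Assumes roughly n head-to-head matchups per horse, divided by n ** power.
    power=1.0 would remove the effect of field size entirely.
    """
    return n_horses / n_horses ** power


ratio = size_scale(50) / size_scale(10)
print(round(ratio, 2))  # 1.38, i.e. roughly 40% more impact for the larger field
```

With a power of 1.0 the ratio would be exactly 1, and with a power of 0 a win against 50 horses would count five times as much as a win against 10.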
Another major consideration in eventing is competition level. In order to prevent horses that consistently win at the lower levels from getting inflated Elo scores, we dampened the influence of runs at those levels compared to runs at 4* and 5*. We used the same level groupings that are used by the FEI world ranking system, that is: 5*L; 4*L; 4*S and 3*L; 3*S and 2*L; 2*S; 1*S. The relative influence of these groupings is roughly on the scale of 5:4:3:2:1:0.5, which is very similar to the scale used for world ranking points.
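The level groupings and their approximate 5:4:3:2:1:0.5 weighting can be written down as a simple lookup. How the weights feed into the K-factor is not spelled out here, so the scaling below (5*L keeps the full base K of 80, lower levels get proportionally less) is an assumption for illustration only, and the names `LEVEL_WEIGHTS` and `level_k` are hypothetical.

```python
# Approximate relative influence of each level grouping (from the 5:4:3:2:1:0.5 scale).
LEVEL_WEIGHTS = {
    "5*L": 5.0,
    "4*L": 4.0,
    "4*S": 3.0, "3*L": 3.0,
    "3*S": 2.0, "2*L": 2.0,
    "2*S": 1.0,
    "1*S": 0.5,
}


def level_k(level, base_k=80):
    """Scale the K-factor relative to the top level (assumed normalisation)."""
    return base_k * LEVEL_WEIGHTS[level] / LEVEL_WEIGHTS["5*L"]


print(level_k("5*L"))  # 80.0
print(level_k("4*L"))  # 64.0
print(level_k("1*S"))  # 8.0
```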
The following graph shows Brier Skill Score against a parameter which sets the gap in importance between the levels of competition (the larger the parameter, the larger the importance gap).
One of the assumptions of an Elo system is that a competitor’s performance is logistically (or normally) distributed. This means they will perform reasonably close to their average performance level most of the time and on rarer occasions have a very strong or very weak performance. In eventing, however, the penalties associated with cross country jumping (XCJ) faults (or eliminations) are so large that incurring a single penalty causes even the strongest horse to fall way down the leaderboard and lose matchups against many lower-ranked horses. In eventing, no horse is immune to these XCJ faults. Indeed, it is difficult to class any horse as having more than a 95% chance of jumping clear around any course. When a horse has an XCJ fault, the changes in Elo scores are too drastic, for both the horse itself and the low-ranked horses that happen to finish ahead of it. To soften the impact of XCJ faults in the Elo, we reduce the Elo change by 30% on any matchup that involves an XCJ fault.
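The 30% reduction can be sketched as a small wrapper around a matchup's Elo change. The function and argument names are invented for this example, and it assumes the reduction applies when either horse in the matchup had an XCJ fault.

```python
def xcj_adjusted_change(change, a_had_xcj_fault, b_had_xcj_fault, reduction=0.30):
    """Soften a matchup's Elo change when either horse had an XCJ fault."""
    if a_had_xcj_fault or b_had_xcj_fault:
        return change * (1.0 - reduction)
    return change


# A matchup that would normally move ratings by 40 points moves them by
# about 28 when one of the two horses had a cross country jumping fault.
print(round(xcj_adjusted_change(40.0, True, False), 2))
print(xcj_adjusted_change(40.0, False, False))
```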
The final adjustment we make is to reduce the K-factor for the stronger Elo horses, an adjustment that is quite common in Elo systems. The reasoning is that once a horse has achieved a level of performance, we are more confident about its ability. By reducing the K-factor for these horses we soften the importance a single competition has on their score, and make more gradual adjustments to their rankings. Also, since horses with these top Elo scores are more likely to be competing at 4* and 5* events, it allows them to have an off day or an easy round without having such a major impact on their score.
The Elo is purely data-driven and, in that, it doesn't try to mind-read and account for strategy. It takes cross country time penalties at face value. If a horse is rated low and we can all unanimously agree that his rider rarely pushes for time, the rating will appropriately reflect that. Who puts up the strongest results on a consistent basis? Period. That is what the Elo is measuring.
If a horse is withdrawn at any point before cross country, their Elo remains unchanged. If a horse is withdrawn after cross country (where cross country is the second phase), their Elo will go down. This is because withdrawing after cross country is very rarely voluntary; it is typically either injury related (most common) or performance related.
The Eventing Elo is the first of its kind in eventing. The EquiRatings High Performance Rating (HPR) is valuable too, rating horses based on their single best performance, but what the Elo does is rate horses over time, based on all their performances dating back to 2008. A horse’s Elo will respond after every run – up or down (except for the withdrawal consideration we mentioned above). It’s fascinating to see and compare upward trends over horses’ careers. We’re excited about the eventing Elo and what it can do to change the game.