These are frequently asked questions for the SquashLevels site. Please look through these before contacting us as there is a good chance your question is already covered. If not, then please get in touch and we can add your question to this list. Click on the question to reveal the answer.
We often get asked about SquashLevels’ calibration algorithm so we’ve put something together here in the form of an FAQ. If you have any calibration related questions that aren’t answered here then do please let us know and we can add them.
We have recently made a significant upgrade to the calibration engine and we also cover those changes here.
The ultimate goal is that the system automatically calculates a playing level value for every player after every result which is an accurate assessment of how well they were playing at that time.
The measure of playing level is (mathematically) relative such that if you are playing twice as well as your opponent then you will have twice the level and this applies all the way from a beginner's first competitive matches to the top pros. This should apply wherever you are, however good you are, whoever you played and whenever you played!
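That relative property can be illustrated with a toy sketch (this is the model's core assumption, not the actual SquashLevels predictor):

```python
def expected_points_ratio(level_a, level_b):
    """Expected points ratio between two players, assuming the
    model's core property: twice the level means playing twice
    as well. Illustrative only."""
    return level_a / level_b

# A 1200-level player against a 600-level player would be expected
# to take roughly two points for every one their opponent scores.
```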
This allows us to plot graphs, compare players, predict results, set goals and even compare players from different eras! Although it's all comparative, it's fixed to a specific level at a specific time so it can actually be treated as an absolute figure. That's really important so that when a player knows what level they are, they know what that means. These are the sorts of levels we find. Numbers very approximate as clubs, counties etc. do vary considerably!
With the vagaries of human behaviour and millions of matches over decades, it's a complex task!
The calibration engine is at the heart of SquashLevels, pounding away at all the new results every night generating levels for every player for every match across the system.
The following sections attempt to divide a complex engine into its constituent parts and give you an insight into how it all works. All the details are left out for obvious reasons but, if you read this FAQ, you will have a pretty good idea what goes on under the calibration hood. Anoraks on...
This is the most obvious part of the engine with the system assessing the level that the player is playing at and assigning a level value to them. For every match the system compares the actual result with the expected result against their opponent and they go up a bit if they play better than expected and down a bit if less well than expected. All ranking systems do this though SquashLevels makes a point of using points scores for accuracy. The algorithm itself is based on:
We can work with game scores only, making assumptions around the average 3-0 result (based on our analysis of real 3-0 match results) but we can only use averages so it takes a lot more results for the levels to become accurate. Not all 3-0 results are the same, obviously.
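To make the idea concrete, here is a minimal, hypothetical version of a ratio-based update. The damping factor and the formula itself are illustrative assumptions; the real engine layers behavioural modelling, weightings and further damping on top:

```python
def updated_level(level, opp_level, points_for, points_against,
                  damping=0.2):
    """Hypothetical sketch of a ratio-based level update.
    The expected points ratio is the level ratio; the player's
    level takes a damped step towards the level implied by the
    actual points ratio."""
    implied = opp_level * (points_for / points_against)
    return level + damping * (implied - level)
```

Playing exactly to expectation leaves the level unchanged; winning more points than expected nudges it up, fewer nudges it down.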
Applying player calibration to a set of players who play each other over a period of time naturally calibrates all those players over time. The more they play and the more opponents they play, the quicker the calibration. They are effectively a pool of players such as those from a club or a county league.
This is a natural effect and doesn’t require anything specific from the calibration engine. All calibration engines therefore provide intra-pool calibration.
A player’s level doesn’t mean much unless you can compare them with other players on the system, i.e. a 1000-level player in Surrey should be playing at the same level as a 1000-level player in Yorkshire, or Calgary for that matter.
This is ‘pool calibration’ where players in a pool are treated ‘as one’ and then compared with other pools such that their respective pool levels are equivalent. The comparisons are made by analysing the results of those players who play in more than one pool but, as ever, you have to be careful which results you use and how you use them. Behavioural modelling is really important for this.
There are different types of pools and they behave slightly differently: geographical pools like Yorkshire and Surrey, and then club boxes, tournaments, ladders, tours and so on. Club boxes are interesting as the top players are usually more associated with their county pools and the actual pool boundary is further down the boxes. Tricky!
Just to add to the challenge, some pools are a subset of other pools (e.g. a tournament series in a club) and others are made up of subsets from a number of other pools such as regional events and leagues. We refer to these as derivative pools. They might appear to need calibrating but they can’t be adjusted!
As long as there is a group of players who play each other at least a few times over the course of a season then they can be considered a pool. There’s a good deal of complexity around automatically identifying the pools and where the pool boundaries are.
Another goal is that a player’s level is also equivalent over time, i.e. if your playing level is now twice what it was two years ago then you should be able to see that in the charts and also make predictions and set goals for yourself in the future. It’s also good fun to compare ‘best levels’ from different players at different times and see who might have come out on top had they played each other in their prime.
Drift is the big problem for time calibration and, as mentioned before, the use of behavioural modelling causes drift. There are some interesting factors that affect and cause drift, all of which have to be taken into account over hundreds of thousands of results over, literally, decades. If we’re out by even a small amount, the drift really adds up and you just can’t compare more than a few years.
These are some of the most common causes of drift:
It would be nice if it all cancelled out but it doesn’t. The analysis we have done has allowed us to quantify all of these drift effects and we’ve been able to factor them into the algorithm. The results now look pretty good - even over a 20 year period.
The whole system is about player levels but one of its goals is to be fully inclusive so that beginners can compare with club players who can compare with league players and so on all the way up to the top pros. It’s great fun to run the predictor between ourselves, keen amateurs, and the top pros or anyone else on the system and actually get a predicted result.
Clearly, the pros don’t play the beginners so we rely on the results of beginners playing leisure players, leisure players playing box players etc., all the way up to the top pros. There are actually 8 distinct levels that we need to calibrate across (levels approximate!):
This doesn’t leave much margin for error as, with a comparison-based system, any errors will be exaggerated at the ends.
The idea is that every night, new results are imported from the many systems that connect to SquashLevels and then the calibration is let loose on them such that the new player levels are worked out for those matches while ensuring level equivalence between pools and over time.
The really key part though is that this needs to be done completely automatically. With hundreds of thousands of results coming in for tens of thousands of players it simply isn’t possible to do any of this manually. Scale is the big killer for these complex algorithms - unless they are fast, efficient and automated.
This turns out to be very difficult to achieve because of the ambiguities of player behaviour, player results and the models we apply to match them. The adjustments are made a little at a time with the goal being that they all match nicely and the system becomes stable but, in reality, humans are not that straightforward and unless you get it right, you really can’t achieve stability and the levels keep changing forever.
The calibration engine has code that works on all of the calibration types pretty much at the same time making small adjustments and moulding the data almost like clay to get it as close as possible to the minimum overall calibration error.
There is no right answer as we’re dealing with human behaviour and you only have to watch yourself and your teammates varying wildly from week to week to realise it’s not an exact science! The goal is to get as close as we can.
The engine separates its efforts into:
In both cases the engine attempts to set a starting level and then adjust dynamically over time. Note that, as a pool is made up of players, it is the player levels that we are adjusting; both for the starting level and also over time.
Fundamentally, there are three processes running as we work through the results:
Identifying the pools is not straightforward as they come and go and change over time. Some of the bigger pools like the county leagues are good examples of pools but they also vary quite a bit over time with new players coming in, others leaving or retiring or, simply, the county deciding to restructure their leagues! It is a complex task to identify all the pools, which players are core to those pools (as opposed to occasional players, visitors etc.) and where the pool boundaries are. As mentioned earlier, clubs are interesting because often the top players play in the county leagues so we have to decide if they are representative of the club or the county… This all has to be done automatically.
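A toy starting point for pool identification (the real engine layers time windows, core-player detection and boundary analysis on top of anything this simple) is to treat players who have played each other as connected, and take the connected components of the match graph as candidate pools:

```python
from collections import defaultdict

def find_pools(matches):
    """Toy sketch: build an undirected graph from (player, player)
    match pairs and return its connected components as candidate
    pools. Illustrative only."""
    graph = defaultdict(set)
    for a, b in matches:
        graph[a].add(b)
        graph[b].add(a)
    seen, pools = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, pool = [start], set()
        while stack:
            player = stack.pop()
            if player in pool:
                continue
            pool.add(player)
            stack.extend(graph[player] - pool)
        seen |= pool
        pools.append(pool)
    return pools
```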
Comparing the pools is also a challenge. One of the great things about squash is that there are many players who play in more than one pool such as the county players who play in multiple counties and also the tournament players who travel around the region or further. The trick is to use these players to compare their respective pools.
It turns out that these multi-pool players, being human, are somewhat variable in their playing level as they play from pool to pool but, through all the behavioural analysis we have done, we have worked out which players we can use and which matches are representative. We can even predict their expected effort, allowing us to use a lot more matches as, previously, we’d only been able to use the subset of matches where we felt full effort had been applied.
After all this analysis we end up with a sparsely populated matrix, several hundred pools across and down with their pool differences recorded where we have them. This is effectively a massive simultaneous equation with semi-random numbers to work from. The trick is not to attempt to solve it but to minimise the overall error - small adjustments at a time. Large adjustments can cause calibration shock which is to be avoided!
Once we have the pool differences we then need to adjust the pools as best we can. A pool’s starting level is made up of all the players in the pool but as those players come and go over time, we have to be careful which ones we adjust. We even need to assess whether a pool can be adjusted as it may be derived from other pools that have already been calibrated.
The starting levels of both players and pools are worked out and set but after that they can only be adjusted a very small amount each match. So, in order to be able to factor in these dynamic adjustments needed over time, we created a level pump. This effectively runs throughout the calibration process, adding to or reducing each player’s level after each match to make the overall adjustments needed; whether for the players themselves, the pool they are playing in or the system as a whole. The adjustments must be small enough to be virtually unnoticed but large enough to have an effect. A difficult balance. If you look at the match review page you will see, at the bottom, an allowance for this dynamic adjustment.
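The level-pump idea can be sketched like this (a hypothetical simplification: the real pump adjusts per match within a much more involved process):

```python
def apply_level_pump(levels_by_match, total_adjustment):
    """Sketch of a 'level pump': spread a required overall
    adjustment evenly across a player's matches, so each match
    carries only a small, barely noticeable nudge.
    Illustrative only."""
    nudge = total_adjustment / len(levels_by_match)
    return [lvl + nudge * (i + 1)
            for i, lvl in enumerate(levels_by_match)]
```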
For a full calibration, once the new results have been read in, the engine is fed all the results from day one, in order, all the way through to the very latest results. This is needed as new feeds connect and their results can go back 5-10 years or more. For a partial calibration, just the last few seasons are re-run ensuring any recent historic result changes are taken into account (e.g. merging player histories). Some form of calibration is run every night to keep up to date with the latest results and updates.
The original calibration algorithm was pretty good but, after nearly 20 years of results, it was showing some areas that clearly needed some investigation. In particular:
We had also received quite a bit of feedback on the better players being penalised for holding back on their lesser opponents and, although we’d allowed for that, it seemed we needed to do more.
This summer, we spent a great deal of time and effort updating the SquashLevels calibration engine to address these problems and also beef up our behavioural modelling for even greater accuracy. We’ve had 1.6 million results to learn from and the learnings have been fascinating, all of which have been dialled into the algorithm. There really isn’t a calibration engine like this for squash, or any other racket sport for that matter.
We have been busy:
Every player’s case will be different but it’s almost certain that your level will have changed!
In general we found:
At a pool level, the changes are mostly around the 5-10% mark though there are some higher than that. At a player level some of the changes are very significant but we’ve studied the biggest movers (both up and down) and believe that they are all more accurate than they were before. Please see the point below if you feel you’ve been hard done by.
There certainly are. Despite huge effort around the behavioural model we have to acknowledge we’re dealing with humans and volatile humans at that. By actively using the models we have been able to use many more results to calibrate which helps get the averages right but, we acknowledge that they could still be wrong.
Do let us know. Please don’t worry about 5-10% as we change by more than that each week anyway. If you think you’re 50% out then that’s another matter.
We can look at your history and make a manual assessment if your level appears wrong. In these cases we can usually identify the results that may be causing the issue or, it may just be that we feel it’s actually more accurate now. Each case is different and we strive for accuracy so we’ll take a look.
We do have some controls:
A question often asked, so here’s an overview. We work out what we can from the name of the match type and the source of the results. With nearly 6,000 match types we have to automate, though the weightings can be manually overridden.
Match type weightings are applied at the end, after all other damping so it’s an independent control over how much levels can change. In general:
Just to compare, three tournament matches in a row (such as from a single tournament) will have a compound weighting of 100% whereas three box matches in a row will be 12½ %. Quite a difference!
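The compounding works multiplicatively, as in this sketch. The 100% tournament weighting and the implied 50% per box match (which compounds to 12½% over three matches) are taken from the comparison above; treat the exact values as illustrative:

```python
def compound_weighting(per_match_weights):
    """Match-type weightings compound multiplicatively across
    consecutive matches. Weights here are illustrative."""
    total = 1.0
    for weight in per_match_weights:
        total *= weight
    return total

# Three tournament matches at 100% each: full effect (1.0).
# Three box matches at 50% each: 0.5 ** 3 = 12.5%.
```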
It’s a key part of the calibration engine so we’ve offered it as a separate FAQ.
The following cases all affect behaviour and have been modelled as best we can based on analysing the 1.6 million results in the system:
This question is asked a lot and we’ve made changes to improve the algorithm for this case so we thought we’d treat it as a separate FAQ.
The system has always allowed for the better player to play down but we included this case in our behavioural modelling over the summer (based on thousands and thousands of real cases) and we are now able to plot graphs showing the typical effort applied by the better player!
One of the interesting learnings from this is that the better player actually starts to hold back even when there’s not that much difference between the players. Probably unwise given how much we vary, but that's what happens!
Our analysis covers player level ratios right up to 10:1 and showed remarkable consistency in the effort that the better players put in at all levels. Even the pros. We found a distinctive range of effort that we consider the ‘acceptable range’ for the better player. This allows us to put the result into one of four bands:
In addition to all this there is more damping applied the further apart the players are so even the most generous of runarounds shouldn't have too much effect if the players are very unevenly matched. We're talking 3:1+ here, mind. Just being 30% better doesn't count as all that one-sided!
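A damping curve along those lines might look like the sketch below. The thresholds (full effect within 30%, heavy damping from 3:1) echo the figures above, but the shape and coefficients are invented for illustration:

```python
def ratio_damping(level_ratio):
    """Illustrative damping by level ratio (not the real
    coefficients): evenly matched players get full effect; from
    about 3:1 the match has little influence on either level."""
    if level_ratio < 1:
        level_ratio = 1 / level_ratio   # treat 1:4 the same as 4:1
    if level_ratio <= 1.3:              # within 30%: not one-sided
        return 1.0
    if level_ratio >= 3.0:              # 3:1 or more: heavily damped
        return 0.1
    # linear fade between 1.3x and 3x
    return 1.0 - 0.9 * (level_ratio - 1.3) / (3.0 - 1.3)
```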
What we’re hoping is that this will encourage the better players to give their opponent a good game but certainly not destroy them. They should also be aware that their level will go down if they just mess around but, even in those cases, it shouldn’t go down by much. These are verging on the exhibition matches but we didn’t want to exclude them altogether.
We have done our very best to make this a genuinely useful and fun resource and believe that most player levels are probably somewhere near right BUT please remember it's based on the vagaries of human behaviour - your opponent's as well as your own - and it's certainly not 100% accurate to two decimal places!
So don't take it TOO seriously! Try to improve, yes, enjoy the ebb and flow, have a laugh in the bar afterwards, but it's just a number at the end of the day - one of the less important things in your life!
And for the same reason, if you're a team captain making your selections, use these levels as a guide rather than a black and white selection cut-off. You know your players better than we do. You know if they're having a couple of difficult weeks and will bounce back. All SquashLevels does is try to measure it. YOU are the one selecting the team. Don't leave it to a computer!