SquashLevels Calibration FAQ

We often get asked about the SquashLevels calibration algorithm, so we’ve put together this FAQ. If you have any calibration-related questions that aren’t answered here, please let us know and we can add them.

We have recently made a significant upgrade to the calibration engine and we also cover those changes here.

What are we trying to achieve?

The ultimate goal is for the system to automatically calculate, after every result, a playing level for every player that accurately reflects how well they were playing at that time.

The measure of playing level is (mathematically) relative such that if you are playing twice as well as your opponent then you will have twice the level and this applies all the way from a beginner's first competitive matches to the top pros. This should apply wherever you are, however good you are, whoever you played and whenever you played!

This allows us to plot graphs, compare players, predict results, set goals and even compare players from different eras! Although it's all comparative, it's fixed to a specific level at a specific time so it can actually be treated as an absolute figure. That's really important so that when a player knows what level they are, they know what that means. These are the sorts of levels we find. Numbers very approximate as clubs, counties etc. do vary considerably!

  • Beginners (< 100)
  • Leisure players (50 - 300)
  • Club box players (200 - 2000)
  • County league players (500 - 3000)
  • Top county league players (3000 - 10,000)
  • PSL (10,000 - 30,000)
  • Satellite PSA (20,000 - 40,000)
  • Top PSA (30,000+)

With the vagaries of human behaviour and millions of matches over decades, it's a complex task!

What is calibration?

The calibration engine is at the heart of SquashLevels, pounding away at all the new results every night generating levels for every player for every match across the system.

The following sections attempt to divide a complex engine into its constituent parts and give you an insight into how it all works. All the details are left out for obvious reasons but, if you read this FAQ, you will have a pretty good idea what goes on under the calibration hood. Anoraks on...

Player calibration

This is the most obvious part of the engine with the system assessing the level that the player is playing at and assigning a level value to them. For every match the system compares the actual result with the expected result against their opponent and they go up a bit if they play better than expected and down a bit if less well than expected. All ranking systems do this though SquashLevels makes a point of using points scores for accuracy. The algorithm itself is based on:

  • Maths - PAR is easy (11-5 is about twice as good), English scoring less so. We use a combination of points scores and games scores to assess the result. The overall goal is that if you are twice as good as your opponent then your level will be double theirs. This works all the way up from beginner (<50) to top pro (>50,000).
  • Weighting - the more important the match (e.g. a tournament) the greater the weighting. This allows you to play a box match without having too much impact on your league standings. See the FAQ on match type weightings below.
  • Behavioural modelling - as it turns out, not everyone puts 100% effort in every match and that’s down to behaviour. There are many other cases too where player behaviour defies the maths and, based on the analysis of 1.6 million results on the system, we’ve built an extensive behavioural model that allows us to predict and make use of these behaviours. This is a critical part of the engine because it has a very significant effect on player levels. It also results in uneven level changes where one player can go up more than their opponent goes down and that causes ‘drift’! For more on our behavioural modelling and how we combat drift, see the relevant FAQ sections below.
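To make the maths part concrete, here is a minimal Python sketch of the "twice as good means twice the level" idea. It is not the SquashLevels algorithm, whose details are not published here: the function names, the plain points ratio and the `damping` value are all assumptions for illustration, and the sketch ignores games scores, match weighting and behavioural modelling entirely.

```python
def par_points_ratio(winner_points: int, loser_points: int) -> float:
    """Naive 'how much better' figure for one PAR game: points won / points lost."""
    if loser_points <= 0:
        raise ValueError("this naive ratio needs the loser to have scored")
    return winner_points / loser_points


def expected_ratio(level_a: float, level_b: float) -> float:
    """Levels are relative, so the expected ratio is simply level A / level B."""
    return level_a / level_b


def adjusted_level(level: float, expected: float, actual: float,
                   damping: float = 0.1) -> float:
    """Move a level up a bit if the player beat expectations, down a bit if not.

    `damping` is a hypothetical knob: 0 means no change, 1 means jump
    straight to the level implied by this single result.
    """
    return level * (actual / expected) ** damping
```

With these assumptions an 11-5 game gives `par_points_ratio(11, 5)` of 2.2, which is "about twice as good", and a player who was expected to be twice as good but managed 2.2 goes up slightly while leaving room for the next result to pull them back.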

We can work with game scores only, making assumptions around the average 3-0 result (based on our analysis of real 3-0 match results) but we can only use averages so it takes a lot more results for the levels to become accurate. Not all 3-0 results are the same, obviously.

Intra-pool calibration

Applying player calibration to a set of players who play each other over a period of time naturally calibrates all those players over time. The more they play and the more opponents they play, the quicker the calibration. They are effectively a pool of players such as those from a club or a county league.

This is a natural effect and doesn’t require anything specific from the calibration engine. All calibration engines therefore provide intra-pool calibration.

Pool calibration

A player’s level doesn’t mean much unless you can compare them with other players on the system. That is, a 1000-level player in Surrey should be playing at the same level as a 1000-level player in Yorkshire, or Calgary for that matter.

This is ‘pool calibration’ where players in a pool are treated ‘as one’ and then compared with other pools such that their respective pool levels are equivalent. The comparisons are made by analysing the results of those players who play in more than one pool but, as ever, you have to be careful which results you use and how you use them. Behavioural modelling is really important for this.

There are different types of pools and they behave slightly differently. There are geographical pools such as Yorkshire and Surrey, and then there are club boxes, tournaments, ladders, tours and so on. Club boxes are interesting as the top players are usually more associated with their county pools and the actual pool boundary is further down the boxes. Tricky!

Just to add to the challenge, some pools are a subset of other pools (e.g. a tournament series in a club) and others are made up of subsets from a number of other pools such as regional events and leagues. We refer to these as derivative pools. They might appear to need calibrating but they can’t be adjusted!

As long as there is a group of players who play each other at least a few times over the course of a season then they can be considered a pool. There’s a good deal of complexity around automatically identifying the pools and where the pool boundaries are.

Time calibration

Another goal is that a player’s level is also equivalent over time. I.e. If your playing level is twice now what it was two years ago then you should be able to see that in the charts and also make predictions and set goals for yourself in the future. It’s also good fun to compare ‘best levels’ from different players at different times and see who might have come out on top had they played each other in their prime.

Drift is the big problem for time calibration and, as mentioned before, the use of behavioural modelling causes drift. There are some interesting factors that affect and cause drift, all of which have to be taken into account over hundreds of thousands of results over, literally, decades. If we’re out by even a small amount, the drift really adds up and you just can’t compare more than a few years.

These are some of the most common causes of drift:

  • Those players who start low, end high and then stop playing. That’s level lost!
  • All the irregularities from behavioural modelling
  • Gaps in player history - especially juniors who get (a lot) better without apparently playing (missing results) and masters… who don’t…
  • Pros who suddenly stop at stratospheric levels and then reappear 5 years later in the leagues at a much lower level.
  • We all get better the more we play - but by how much? It’s hard to take this into account in a fundamentally comparative algorithm but if we don’t the whole system actually drifts down over time.

It would be nice if it all cancelled out but it doesn’t. The analysis we have done has allowed us to quantify all of these drift effects and we’ve been able to factor them into the algorithm. The results now look pretty good - even over a 20 year period.

Level calibration

The whole system is about player levels but one of its goals is to be fully inclusive so that beginners can compare with club players who can compare with league players and so on all the way up to the top pros. It’s great fun to run the predictor between ourselves, keen amateurs, and the top pros or anyone else on the system and actually get a predicted result.

Clearly, the pros don’t play the beginners so we rely on the results of beginners playing leisure players, leisure players playing box players etc., all the way up to the top pros. There are actually 8 distinct levels that we need to calibrate across (levels approximate!):

  • Beginners (< 100)
  • Leisure players (50 - 300)
  • Club box players (200 - 2000)
  • County league players (500 - 3000)
  • Top county league players (3000 - 10,000)
  • PSL (10,000 - 30,000)
  • Satellite PSA (20,000 - 40,000)
  • Top PSA (30,000+)

This doesn’t leave much margin for error as, with a comparative based system, any errors will be exaggerated at the ends.
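A quick worked example shows why the margin is so tight. With 8 levels there are 7 links in the chain from beginner to top pro, and in a relative system a small error at each link multiplies rather than adds. The 5% figure below is purely an illustrative assumption, not a measured error from the system, and `compounded_error` is a hypothetical helper.

```python
def compounded_error(per_link_error: float, links: int) -> float:
    """Total relative error after multiplying the same small error across each link."""
    return (1 + per_link_error) ** links - 1


# 8 levels means 7 links between neighbouring levels.
end_to_end = compounded_error(0.05, 7)
print(f"{end_to_end:.1%}")  # roughly 40% out by the time you reach the other end
```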

Automation and stability

The idea is that every night, new results are imported from the many systems that connect to SquashLevels and then the calibration is let loose on them such that the new player levels are worked out for those matches while ensuring level equivalence between pools and over time.

The really key part though is that this needs to be done completely automatically. With hundreds of thousands of results coming in for tens of thousands of players it simply isn’t possible to do any of this manually. Scale is the big killer for these complex algorithms - unless they are fast, efficient and automated.

This turns out to be very difficult to achieve because of the ambiguities of player behaviour, player results and the models we apply to match them. The adjustments are made a little at a time with the goal being that they all match nicely and the system becomes stable but, in reality, humans are not that straightforward and unless you get it right, you really can’t achieve stability and the levels keep changing forever.

How does it work?

The calibration engine has code that works on all of the calibration types pretty much at the same time making small adjustments and moulding the data almost like clay to get it as close as possible to the minimum overall calibration error.

There is no right answer as we’re dealing with human behaviour and you only have to watch yourself and your teammates varying wildly from week to week to realise it’s not an exact science! The goal is to get as close as we can.

The engine separates its efforts into:

  • Player calibration - ensuring that each player is correctly calibrated within their pool
  • Pool calibration - ensuring that each pool is correctly calibrated compared to the other pools

In both cases the engine attempts to set a starting level and then adjust dynamically over time. Note that, as a pool is made up of players, it is the player levels that we are adjusting; both for the starting level and also over time.

Fundamentally, there are three processes running as we work through the results:

  • Identify the pools - allowing for them to change season by season
  • Compare the pools with all other pools
  • Adjust the pools - i.e. set the starting level as soon as we can and then adjust dynamically

Identifying the pools is not straightforward as they come and go and change over time. Some of the bigger pools like the county leagues are good examples of pools but they also vary quite a bit over time with new players coming in, others leaving or retiring or, simply, the county deciding to restructure their leagues! It is a complex task to identify all the pools, which players are core to those pools (as opposed to occasional players, visitors etc.) and where the pool boundaries are. As mentioned earlier, clubs are interesting because often the top players play in the county leagues so we have to decide if they are representative of the club or the county… This all has to be done automatically.

Comparing the pools is also a challenge. One of the great things about squash is that there are many players who play in more than one pool such as the county players who play in multiple counties and also the tournament players who travel around the region or further. The trick is to use these players to compare their respective pools.

It turns out that these multi-pool players, being human, are somewhat variable in their playing level as they play from pool to pool but, through all the behavioural analysis we have done, we have worked out which players we can use and which matches are representative. We can even predict their expected effort, allowing us to use a lot more matches; previously, we’d only been able to use the subset of matches in which we felt full effort had been applied.

After all this analysis we end up with a sparsely populated matrix, several hundred pools across and down with their pool differences recorded where we have them. This is effectively a massive simultaneous equation with semi-random numbers to work from. The trick is not to attempt to solve it but to minimise the overall error - small adjustments at a time. Large adjustments can cause calibration shock which is to be avoided!
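The "minimise the error a little at a time" idea can be sketched in Python as below. This is only an illustration under stated assumptions, not the engine's actual code: pool differences are assumed to be recorded as log ratios, the error measure is a plain sum of squares, and `step` and `max_move` (the cap that stands in for avoiding calibration shock) are hypothetical parameters.

```python
from collections import defaultdict


def total_error(observations, corrections):
    """Sum of squared residuals over all recorded pool differences."""
    return sum(
        (diff + corrections.get(a, 0.0) - corrections.get(b, 0.0)) ** 2
        for a, b, diff in observations
    )


def calibrate_pools(observations, rounds=200, step=0.05, max_move=0.02):
    """Nudge per-pool corrections to shrink the overall error.

    observations: (pool_a, pool_b, diff) tuples, where diff is the measured
    log ratio by which pool_a looks too high relative to pool_b. The matrix
    is sparse, so only pairs with shared players appear.
    """
    corrections = defaultdict(float)
    for _ in range(rounds):
        moves = defaultdict(float)
        for a, b, diff in observations:
            residual = diff + corrections[a] - corrections[b]
            moves[a] -= step * residual  # pull the over-rated pool down
            moves[b] += step * residual  # and the under-rated pool up
        for pool, move in moves.items():
            # Cap each round's move so no pool gets a large sudden adjustment.
            corrections[pool] += max(-max_move, min(max_move, move))
    return dict(corrections)
```

Because real observations disagree with each other (Surrey versus Yorkshire plus Yorkshire versus Calgary rarely equals Surrey versus Calgary exactly), the error never reaches zero. The loop simply settles on the corrections that leave the least disagreement.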

Once we have the pool differences we then need to adjust the pools as best we can. A pool’s starting level is made up of all the players in the pool but as those players come and go over time, we have to be careful which ones we adjust. We even need to assess whether a pool can be adjusted as it may be derived from other pools that have already been calibrated.

The starting levels of both players and pools are worked out and set but after that they can only be adjusted a very small amount each match. So, to be able to factor in the dynamic adjustments needed over time, we created a level pump. This effectively runs throughout the calibration process adding to or reducing each player’s level after each match to make the overall adjustments needed, whether for the players themselves, the pool they are playing in or the system as a whole. The adjustments must be small enough to be virtually unnoticed but large enough to have an effect. A difficult balance. If you look at the match review page you will see, at the bottom, an allowance for this dynamic adjustment.
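As a rough picture of what a level pump does, the Python sketch below spreads one required adjustment over several matches, capping each step. The function name and the 0.5% per-match cap are assumptions chosen for illustration; the real pump's limits are not given here.

```python
def pump_levels(start_level: float, target_level: float,
                max_step_fraction: float = 0.005) -> list[float]:
    """Return the level after each match while pumping towards target_level.

    Each match moves the level by at most max_step_fraction of its current
    value, so the overall adjustment arrives gradually.
    """
    if start_level <= 0 or target_level <= 0:
        raise ValueError("levels must be positive")
    if max_step_fraction <= 0:
        raise ValueError("max_step_fraction must be positive")
    levels = []
    level = start_level
    while abs(target_level - level) > 1e-9:
        max_step = level * max_step_fraction
        step = max(-max_step, min(max_step, target_level - level))
        level += step
        levels.append(level)
    return levels
```

With these numbers, a player who needs lifting from 1000 to 1020 gets there over four matches, each contributing about five points, which is small next to the normal week-to-week movement.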

For a full calibration, once the new results have been read in, the engine is fed all the results from day one, in order, all the way through to the very latest results. This is needed as new feeds connect and their results can go back 5-10 years or more. For a partial calibration, just the last few seasons are re-run ensuring any recent historic result changes are taken into account (e.g. merging player histories). Some form of calibration is run every night to keep up to date with the latest results and updates.

What changes were made over the summer?

The original calibration algorithm was pretty good but, after nearly 20 years of results, it was showing some areas that clearly needed some investigation. In particular:

  • The pro men were too low
  • The pro women were too high
  • The historic levels were too high (those more than 10 years old)
  • Some counties were too low
  • Some clubs were too high
  • There were some oddities in the junior levels

We had also received quite a bit of feedback on the better players being penalised for holding back on their lesser opponents and, although we’d allowed for that, it seemed we needed to do more.

This summer, we spent a great deal of time and effort updating the SquashLevels calibration engine to address these problems and also beef up our behavioural modelling for even greater accuracy. We’ve had 1.6 million results to learn from and it’s been a fascinating learning experience, all of which has been dialled into the algorithm. There really isn’t a calibration engine like this for squash, or any other racket sport for that matter.

We have been busy:

  • Developed behavioural profiles for both effort and weighting for players of all levels and all match types. This allowed us to use many more cross-pool results and be better able to filter out rogue results.
  • Developed the calibration level pump that allows a more dynamic adjustment of players and pools over time
  • Using our behavioural modelling and level pump we have been able to develop a near ‘lossless’, drift-free algorithm that lets us compare levels over very long periods of time.
  • Improved the accuracy of calibrating results with no points scores though it’s still significantly better if you have them.
  • Specifically detect and allow for players on meteoric rises so that we can reduce the impact on their opponents as they work their way up.
  • More accurate (typical) assumptions around what happens to juniors if there are periods (sometimes multi-year) where the system has no results for them. Also for masters and retired pros.
  • Updated the amount of change any player can have per match and per tournament based on level and match history.
  • Created a provisional level for players with just a few results in their history, indicated with a (P) in their profile and rankings. The purpose is to allow players with a provisional level to play friendly matches against players with a non-provisional level in order to establish their own level but, very importantly, the match does *not* affect the level of their opponent. We’ve had feedback that these matches were being avoided by players wanting to protect their level! Not so now. Please support new players if they ask for a match to get them started. Note - you need to play ‘properly’ in order to give them an accurate level!
  • A more lenient approach to better players giving their lesser opponents a runaround. With the improved behavioural modelling and analysis from thousands of example matches we have split the outcomes into four bands:
    • The better player actually played better than expected - they go up as normal, their opponent goes down.
    • The better player played down as expected - no change for either player.
    • The better player played at a lower level than expected but within an acceptable range. No change for the better player but their opponent gets something for pushing the better player hard.
    • The better player played below the ‘acceptable range’. Their level goes down a little (it’s still damped) and their opponent is rewarded.
  • Related to the above point is the detection of exhibition matches. This is focused on PSA players playing league matches - even against other PSA players - as well as other cases. These matches are nearly always exciting 3-2 encounters to keep the crowds entertained but that doesn’t help the accuracy of the system! Levels are not changed for exhibition matches unless the opponent is the lower-level player (it’s all relative!) and does particularly well, in which case they are rewarded at least a little.

How will it have affected my level?

Every player’s case will be different but it’s almost certain that your level will have changed! 

In general we found:

  • The Pro men players are much higher - where they ought to be
  • The Pro women players are lower - and are now about right compared to the men
  • The better players have generally gone up, partly because the Pro men have gone up but also because the system is more lenient on them giving their lesser opponents ‘a good game’.
  • Many of the clubs were too high so they have been lowered accordingly
  • The counties were mostly OK because of their size and number of matches but there were some adjustments made.
  • Some of the ladies-only leagues were a bit high and have been lowered accordingly
  • Some of the crazy levels caused by rogue results (or actual results that were simply mad) have been corrected. Not all, alas.

At a pool level, the changes are mostly around the 5-10% mark though there are some higher than that. At a player level some of the changes are very significant but we’ve studied the biggest movers (both up and down) and believe that they are all more accurate than they were before. Please see the point below if you feel you’ve been hard done by.

Are there limitations?

There certainly are. Despite huge effort around the behavioural model we have to acknowledge we’re dealing with humans, and volatile humans at that. By actively using the models we have been able to use many more results to calibrate, which helps get the averages right, but we acknowledge that they could still be wrong.

We find:

  • Not enough players playing their first matches at the start of the pool so that, even though the pool appears a little high or low, we can’t set its starting level. We call these pools ‘derivative’ and do our best to adjust them dynamically using the level pump.
  • Not enough players playing multi-pool. In some cases, a pool of players play pretty much in isolation so they are impossible to calibrate automatically. Or there’s just one player who pops in for a bit of a knock and makes everyone look good.
  • Duff or unexpected results, if not averaged out by large numbers, can have an adverse effect. We only adjust if we feel there are enough results but, even so, there are cases where too many results are just not representative.

What if I feel my level is wrong?

Do let us know. Please don’t worry about 5-10% as we change by more than that each week anyway. If you think you’re 50% out then that’s another matter.

We can look at your history and make a manual assessment if your level appears wrong. In these cases we can usually identify the results that may be causing the issue or, it may just be that we feel it’s actually more accurate now. Each case is different and we strive for accuracy so we’ll take a look.

We do have some controls:

  • We can override your starting level
  • We can override your level for any match so, for instance, if you’re a junior who’s improved greatly but have no results for a long time (and our allowance for that was way out) then that can be corrected.
  • We can raise or lower entire pools though only those that are not ‘derivative’.

Match type weightings

A question often asked so here’s an overview. We work out what we can from the name of the match type and source of the results. With nearly 6,000 match types we have to automate though the weightings can be manually overridden.

Match type weightings are applied at the end, after all other damping, so they act as an independent control over how much levels can change. In general:

  • Multi-club or regional tournaments - 100% (i.e. no additional damping)
  • Club tournaments - 80%
  • County leagues - 75%
  • Summer leagues - 60%
  • Club boxes/internal leagues - 50%
  • Manually entered challenge matches - 40%

Just to compare, three tournament matches in a row (such as from a single tournament) will have a compound weighting of 100% whereas three box matches in a row will be 12½% (50% × 50% × 50%). Quite a difference!
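The compound figures come from multiplying the weighting once per match, which is easy to check in Python. The dictionary keys and the `compound_weighting` helper are names invented for this sketch; the percentages are the general ones listed above, which can be manually overridden for individual match types.

```python
# General weightings from the list above, as fractions.
WEIGHTINGS = {
    "regional tournament": 1.00,
    "club tournament": 0.80,
    "county league": 0.75,
    "summer league": 0.60,
    "club boxes": 0.50,
    "challenge match": 0.40,
}


def compound_weighting(match_type: str, matches: int) -> float:
    """Weighting multiplied together over a run of matches of one type."""
    return WEIGHTINGS[match_type] ** matches


print(compound_weighting("regional tournament", 3))  # 1.0, i.e. 100%
print(compound_weighting("club boxes", 3))           # 0.125, i.e. 12.5%
```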

Behavioural modelling

It’s a key part of the calibration engine so we’ve offered it as a separate FAQ.

The following cases all affect behaviour and have been modelled as best we can based on analysing the 1.6 million results in the system:

  • Good players playing lesser players. A big one!
  • Exhibition matches
  • Player level - the pro players are a lot more consistent than the rest of us
  • Tournament disasters
  • Big gaps in your history - particularly for juniors
  • Retiring pros who then reappear in the county leagues years later
  • Upcoming stars with meteoric rises. For these players we can apply ‘accelerated level adjustment’ and we can soften the blow for their opponents.
  • How much effort a player is likely to put into each match type. For example, county league players treat club boxes as a bit of fun whereas they’re deadly serious for the players further down the boxes. Same applies to PSA players playing PSL. Just two examples.
  • Limits on match and tournament movement based on level, time and match type. This is also protection against rogue results. Most players really don’t improve (or get worse) that quickly though, bear in mind we’re measuring playing level not ability so it’s all going to be quite dynamic - that’s half the fun!

How can I give my lesser opponent a runaround without getting penalised?

This question is asked a lot and we’ve made changes to improve the algorithm for this case so we thought we’d treat it as a separate FAQ.

The system has always allowed for the better player to play down but we included this case in our behavioural modelling over the summer (based on thousands and thousands of real cases) and we are now able to plot graphs showing the typical effort applied by the better player!

One of the interesting learnings from this is that the better player actually starts to hold back even when there’s not that much difference between the players. Probably unwise given how much we vary but that’s what happens!

Our analysis covers player level ratios right up to 10:1 and showed remarkable consistency in the effort that the better players put in at all levels. Even the pros. We found a distinctive range of effort that we consider the ‘acceptable range’ for the better player. This allows us to put the result into one of four bands:

  • The better player actually played better than expected. Clearly no leniency shown and normal results processing is applied. They go up as normal, their opponent goes down.
  • The better player played down as expected. Typical behaviour (as we have learned) and no change for either player.
  • The better player played at a lower level than expected but within their acceptable range. No change for the better player but their opponent gets something for pushing them hard.
  • The better player played below their ‘acceptable range’. Their level goes down a little (it’s still damped) and their opponent is rewarded.
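The four bands can be pictured as a simple classification in Python. This is a sketch under assumptions, not the real model: `performance` is taken to be the better player's actual result relative to the played-down result the model expected (1.0 means exactly as expected), and the thresholds 0.95, 1.05 and 0.80 are made-up placeholders for the expected band and the bottom of the ‘acceptable range’.

```python
from enum import Enum


class Band(Enum):
    BETTER_THAN_EXPECTED = 1
    AS_EXPECTED = 2
    WITHIN_ACCEPTABLE_RANGE = 3
    BELOW_ACCEPTABLE_RANGE = 4


# (effect on the better player, effect on their opponent) for each band.
OUTCOMES = {
    Band.BETTER_THAN_EXPECTED: ("up", "down"),
    Band.AS_EXPECTED: ("no change", "no change"),
    Band.WITHIN_ACCEPTABLE_RANGE: ("no change", "up a little"),
    Band.BELOW_ACCEPTABLE_RANGE: ("down a little", "up"),
}


def classify(performance: float, expected_low: float = 0.95,
             expected_high: float = 1.05,
             acceptable_floor: float = 0.80) -> Band:
    """Place the better player's result into one of the four bands."""
    if performance > expected_high:
        return Band.BETTER_THAN_EXPECTED
    if performance >= expected_low:
        return Band.AS_EXPECTED
    if performance >= acceptable_floor:
        return Band.WITHIN_ACCEPTABLE_RANGE
    return Band.BELOW_ACCEPTABLE_RANGE
```

For example, `OUTCOMES[classify(0.9)]` gives `("no change", "up a little")`: the better player eased off a bit more than usual, so they are left alone while their opponent gets some credit.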

In addition to all this there is more damping applied the further apart the players are so even the most generous of runarounds shouldn't have too much effect if the players are very unevenly matched. We're talking 3:1+ here, mind. Just being 30% better doesn't count as all that one-sided!

What we’re hoping is that this will encourage the better players to give their opponent a good game but certainly not destroy them. They should also be aware that their level will go down if they just mess around but, even in those cases, it shouldn’t go down by much. These are verging on the exhibition matches but we didn’t want to exclude them altogether.

And finally...

We have done our very best to make this a genuinely useful and fun resource and believe that most player levels are probably somewhere near right BUT please remember it's based on the vagaries of human behaviour - your opponent's as well as your own - and it's certainly not 100% accurate to two decimal places!

So don't take it TOO seriously! Try to improve, yes, enjoy the ebb and flow, have a laugh in the bar afterwards but it's just a number at the end of the day - one of the less important things in your life!

And for the same reason, if you're a team captain making your selections, use these levels as a guide rather than a black and white selection cut-off. You know your players better than we do. You know if they're having a couple of difficult weeks and will bounce back. All SquashLevels does is try to measure it. YOU are the one selecting the team. Don't leave it to a computer!