Squash Levels System

How the Squash Levels system works

This page is still under construction.

There are a few systems around that take results from a league and assign a value to each player but it requires a great deal of additional functionality to provide a system that can be used across multiple leagues, counties, nationally and beyond.

This page describes what it takes to construct such a system. It is focused on what is needed and how it works rather than what it does or how to use it. Please see the help pages for that.

An effective system has:


Access to results

The results of all players’ matches across the country are spread around a number of different systems. These are mostly league management systems but there are also boxes, tournaments and the PSA/WSA results on squashinfo.

For those systems where we have been able to connect ‘system to system’, the results can be pulled across automatically and quickly every night using agreed formats. This is by far the most effective way of accessing results, but for other systems it’s also possible to extract the results from their user web pages. This can only be done occasionally as it takes much longer and is deliberately slowed down to make sure the respective website is not impacted.

Click here to see the sort of format we use for system to system transfers. Just include any new or modified matches for the time period requested.

We work directly with the IT brains behind each respective league system to work out how best to automate the sending of the results. All systems are different but it usually turns out to be a straightforward exercise. Click here to see the output from the Avon system. If you're one of those IT guys and interested in hooking up then please get in touch. We'd love to work with you.

The system is totally dependent on the accuracy of the received data so a certain amount of intelligence has to be applied to the parsing to make sure the scores are sensible and the right way around! If there is any doubt, only the game scores are used or, worse still, the result has to be ignored.
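
As a rough illustration of the fallback described above, here is a minimal Python sketch. The function name, the three outcome labels and the rule that a drawn game score is untrustworthy are all assumptions made for this example; the real parsing checks are not documented here.

```python
from typing import List, Optional, Tuple


def check_result(games_a: int, games_b: int,
                 points: Optional[List[Tuple[int, int]]]) -> str:
    """Classify a received result as 'full', 'games_only' or 'ignore'.

    Hypothetical helper: the labels and rules are illustrative only.
    """
    # Assumption: a negative or drawn game score cannot be trusted at all.
    if games_a < 0 or games_b < 0 or games_a == games_b:
        return "ignore"
    # No points scores supplied: fall back to the game score.
    if not points:
        return "games_only"
    # Count the games each side won according to the points scores.
    won_a = sum(1 for a, b in points if a > b)
    won_b = sum(1 for a, b in points if b > a)
    if (won_a, won_b) == (games_a, games_b):
        return "full"
    # Points disagree with the game score (for example, entered the wrong
    # way around), so only the game score is kept.
    return "games_only"
```

For example, a 3-0 result whose points scores read 1-9, 3-9, 2-9 would be kept as a game score only, because the points look reversed.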

Accurate identification of players

With tens of thousands of players on the system, we can't just rely on player name to know whose results are whose. Besides, it's very common to find the same player appearing multiple times on a single league system as they change clubs or their names are just scribbled down wrongly at the end of the evening. We use a unique player ID for each player on the SquashLevels system but it's important that the source systems also have player IDs and we store those along with the player details.

The source system player ID allows us to differentiate players with the same name who are actually different players but we also need to be able to identify and merge duplicate players who have different source IDs but who are actually the same player. Merging duplicate players and remembering all the source IDs they have from the various league systems allows us to continue to receive results for apparently different players but still know they are for the same person!
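
The idea of remembering every source ID for a merged player can be sketched as a simple lookup table. This is only an illustration: the class name `PlayerRegistry`, its methods and the example source names are hypothetical and say nothing about how the real database is organised.

```python
from typing import Dict, Tuple

# (source system name, player ID on that system) - both hypothetical strings
SourceKey = Tuple[str, str]


class PlayerRegistry:
    """Maps source-system player IDs to a single internal player ID."""

    def __init__(self) -> None:
        self._by_source: Dict[SourceKey, int] = {}
        self._next_id = 1

    def resolve(self, source: str, source_id: str) -> int:
        """Return the internal ID for a source player, creating one if new."""
        key = (source, source_id)
        if key not in self._by_source:
            self._by_source[key] = self._next_id
            self._next_id += 1
        return self._by_source[key]

    def merge(self, keep: int, duplicate: int) -> None:
        """Point every source ID of `duplicate` at `keep` instead."""
        for key, player in list(self._by_source.items()):
            if player == duplicate:
                self._by_source[key] = keep
```

After a merge, results arriving under either source ID resolve to the same internal player, which is the behaviour described above.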

One of the tasks of the league admin is to identify duplicate players and merge them using the tools provided. This is mostly a one-off exercise when the league is initially added to the system but there will always be a continuous appearance of duplicate players that need to be identified and merged. If not merged then a duplicate player will appear multiple times in the player listings and have separate match histories. It's encouraging that when this happens, the duplicates usually have very similar levels - it's an accurate system!

The ability to store and process large numbers of players and results

A system covering the whole of the UK and beyond must expect to receive, store, process and cope with more than a million results received over a number of years from around 50,000 players. This level of scale requires special attention to make sure the system is responsive.

Most pages are constructed on-the-fly using information from the database, even though they quite often require a very large number of calculations, but some are just too complex, use enormous amounts of data or simply take too long. For these pages, we use pre-processed data which is generated the night before and can then be made available to users very quickly on demand. Although we keep this to a minimum, the amount of pre-processing being done overnight is increasing as the database gets larger.

To make it easy to find a player

With so many players on the system it needs to be easy to find who you're looking for. The system provides a dynamic name look-up as the user types, offering a continuously updated list of possible players and allowing the user to pick the one they want based on the county, club and level shown along with the player name.

The user can split the name up into fragments to give the system more flexibility for the look-up. For instance entering 'jo n sm th' will find 'John Smith', 'Jon Smith', 'Jonathon Smith' and even 'Jon Smythe' so it's very easy to find even misspelled players.
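
One simple way to get this behaviour is to require each fragment to appear in the name, in order, ignoring case. The sketch below does that with a regular expression. It is an assumption about the matching rule, chosen because it reproduces the example above, and `find_players` is a hypothetical name.

```python
import re
from typing import Iterable, List


def find_players(query: str, names: Iterable[str]) -> List[str]:
    """Return the names containing every query fragment, in order."""
    fragments = query.lower().split()
    if not fragments:
        return []
    # 'jo n sm th' becomes the pattern 'jo.*n.*sm.*th'
    pattern = re.compile(".*".join(re.escape(f) for f in fragments))
    return [name for name in names if pattern.search(name.lower())]
```

With this rule 'jo n sm th' matches 'John Smith', 'Jon Smith', 'Jonathon Smith' and 'Jon Smythe' but not 'Jane Smith' or 'Joan Smart'.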

An accurate player level algorithm

A player's level reflects their ability at any particular point in time and fluctuations in their level of play should be reflected in their numeric level as accurately as possible. The whole point of the system is to show their level, how it's changing over time and how it compares with other players anywhere in the country or beyond so it needs to be accurate.

The basic algorithm is fairly straightforward - a better than expected result causes a player's level to go up and a worse than expected result causes it to go down - but it also needs to cope with the many extremes and boundary cases that crop up in real life. The reason this system is so accurate is that it takes human behaviour into account. It's not the maths that's complex - it's the psychology!

  • Points scores are also used. There's a big difference between, say 9-1, 9-3, 9-2 and 9-7, 10-8, 9-7 so using the points scores keeps the algorithm accurate even when players are not well matched.
  • The system covers players from complete beginner (<50) to the very top pros (70,000+).
  • It has to cope with matches between extremes of ability (such as pros playing in local leagues or even just the randomness of the summer leagues). In these cases, the better players are usually taking it a bit easy and shouldn't be penalised for that. The better players are assumed to be playing within a range of levels, allowing them to take it easy.
  • It has to keep up with juniors who get better very quickly.
  • If the difference in player levels is > 3:1 and that is backed up by the result then the levels are not adjusted. It's just too extreme to be accurate.
  • If the difference in player levels is > 1.5:1 then it's likely that the better player isn't trying his/her absolute best but their opponent probably is. The system applies an ‘effective effort’ factor (it’s applied gradually) to the better player’s level before the calculations are done. This is especially useful for the summer league when there are quite often mismatches in abilities and we don’t want to penalise the better players for not hammering their weaker opponents.
  • Players with a lower level can adjust faster than those with a higher level. This is really aimed at the juniors who can cause havoc in these systems as they tend to improve very quickly and often only play a few ‘officially competitive’ matches each year (e.g. the local junior tournaments). We need a junior’s level to keep pace with their actual ability and, at the same time, we don’t want them to ‘take’ too much level from their opponents. We’ve put a lot of work into levels for juniors.
  • It has to cope with players who leave for a while and then come back much better or worse than they left. See point below.
  • If a player hasn't played for a while or they are new to the system, the confidence in that player's level is reduced. This allows these players' levels to move more quickly than a player with recent history and, hopefully, get them to the right level in just a few matches without impacting their opponents along the way.
  • An unexpected result also drops a player’s level confidence. This allows them to return to ‘normal’ more quickly in the same way.

The reality is that the algorithm has to be much more focused on understanding human behaviour than being a mathematically correct exercise!
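
To make the rules above a little more concrete, here is a deliberately simplified Python sketch of a single match update. Only the 3:1 and 1.5:1 thresholds come from the description above. Everything else is an assumption made for illustration: that the ratio of levels predicts the ratio of points, the shape and floor (0.8) of the ‘effective effort’ factor, the `+ 1` smoothing of the points, and the use of a per-player `damping` value to stand in for level confidence (a less certain level would get a larger value and so move faster). It is not the actual SquashLevels formula.

```python
from typing import Tuple


def effort_factor(ratio: float) -> float:
    """Scale applied to the better player's level (1.0 = full effort).

    Assumed shape: full effort up to 1.5:1, easing linearly to 0.8 at 3:1.
    """
    if ratio <= 1.5:
        return 1.0
    if ratio >= 3.0:
        return 0.8
    return 1.0 - 0.2 * (ratio - 1.5) / 1.5


def update_levels(level_a: float, level_b: float,
                  points_a: int, points_b: int,
                  damping_a: float = 0.1,
                  damping_b: float = 0.1) -> Tuple[float, float]:
    """Return the new (level_a, level_b) after one match."""
    a_is_better = level_a >= level_b
    better, worse = (level_a, level_b) if a_is_better else (level_b, level_a)
    ratio = better / worse
    better_pts, worse_pts = ((points_a, points_b) if a_is_better
                             else (points_b, points_a))
    # More than 3:1 and the better player won: too extreme, leave levels alone.
    if ratio > 3.0 and better_pts > worse_pts:
        return level_a, level_b
    # Reduce the better player's level gradually once the gap exceeds 1.5:1.
    effort = effort_factor(ratio)
    eff_a = level_a * effort if a_is_better else level_a
    eff_b = level_b if a_is_better else level_b * effort
    expected = eff_a / eff_b                   # assumed: levels predict points
    actual = (points_a + 1) / (points_b + 1)   # +1 avoids division by zero
    surprise = actual / expected               # > 1 means A did better than expected
    return level_a * surprise ** damping_a, level_b * surprise ** (-damping_b)
```

In this sketch two 1000 level players finishing 9-1, 9-3, 9-2 (27 points to 6) move apart, while a 4000 level player beating a 1000 level player leaves both levels untouched.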

Cut the data in different ways to generate listings

With tens of thousands of players on the system it's important to show the player listings that are interesting to each specific user. The most obvious listings are by club or county but the system provides a full list of ways to list players:

  • Club - all the players who play for a specific club. Note that this needs to include players who play for other clubs as well as the one requested.
  • County - all the players who play for a specific county - or are officially playing for that county. Again, players who play in more than one county need to be included.
  • Country - all the players of that nationality. We have a default listing for the English players.
  • By event - everyone who played in a specific event or a class of event such as the BSPA series.
  • Age group - all the age categories are covered as well as general 'juniors' and 'masters'.
  • Sex - by male or female.

To be able to update player details from the latest results

Player details change fairly frequently whether it be their club, county, age group or even name. It's not possible to do this manually with so many players on the system so the player details are updated using the information passed in along with the results from the source systems.

In some cases the information is pulled in with the results and in other cases, the information can be assumed just because it comes from a particular source. For instance, everyone who plays in the Kent leagues can have their county updated to Kent.

In contrast to that, it may be necessary to lock the player's county for those players who represent their county but actually play in different counties. This only affects the top England players but is necessary functionality for such an all-inclusive system.

It's also possible to manually override these details.

Be able to work back in time

When the results are first received for a particular league, the system has to find players who already have a level from which to work out the levels of all the other players in that league. This is a two step process:

  • The admin needs to identify players who are duplicates between the new league and the players already on the system. Once merged, these players will seed the new league.
  • The system will work both backwards and forwards from the first matches in which player levels are known. With the starting point of different counties sometimes being years apart, it's not uncommon to have to work back two or more years from the first known levels to the start of the league.

Going backwards can't be damped so there are controls in place to restrict the effect of unexpected results.

This process gives a good approximation for the initial level for all new players which is then improved over time using the automated initial level and league calibration processes described below.

Consolidate county and club names

With the many different leagues and sources of results, county and club names received can vary quite a bit even though they're referring to the same county or club. This is a similar issue to having duplicate players as we'll end up with duplicate counties and clubs on the system if they are not merged together.

The system needs to be able to identify each duplicate county or club name as it's received from the results input data and consolidate it to a single, unique name when the result is added to the system. This mapping needs to be agreed with the county admins.

Some systems actually return team names rather than club names so there's an extra level of filtering that needs to be done to derive the club name from the team name first. It isn't always possible to do this generically so the county admin may need to provide more specific mappings when needed.
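
The two mapping steps described above (team name to club name, then club name to a single agreed name) might look something like the following sketch. The naming convention for teams, the function names and the example club names are all hypothetical; real mappings are agreed with each county admin.

```python
import re
from typing import Dict


def club_from_team(team: str) -> str:
    """Derive a club name from a team name.

    Assumed convention: a team name is the club name followed by a team
    number or single capital letter, such as 'Anytown 2' or 'Anytown B'.
    """
    return re.sub(r"\s+(\d+|[A-Z])$", "", team.strip())


def canonical_club(raw: str, aliases: Dict[str, str]) -> str:
    """Map a received team or club name to its single agreed club name."""
    club = club_from_team(raw)
    # Aliases are keyed by lower-case name; unknown names pass through as-is.
    return aliases.get(club.lower(), club)
```

A generic rule like this will not suit every source, which is why more specific mappings from the county admin are sometimes needed.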

Finally, when all of this is done, there are quite a few clubs that appear in more than one county. With the name mapping described above, these clubs will at least have the same name but they will behave like different clubs on the system. E.g. listing the players for Shepton Mallet in Somerset will give a different listing compared to the players for Shepton Mallet in Avon. These shouldn't be merged because we need to keep the club to county associations but we do also need to be able to treat them as a single club for club specific listings. This requires yet another mapping to be applied dynamically at the time of the listing.

Automatically set the initial level for each player

TBC

Auto-calibration across leagues

It’s a fundamental requirement of a multi-league system that the levels assigned to players are equivalent whichever league they play in – that’s the whole point. So a 1000 level player in the Avon Mixed league is the same standard as a 1000 level player in the Yorkshire leagues is the same standard as a 1000 level player in the Kent NW league. And so on.

This is done by looking for players who play in more than one league and, for each transition between the leagues, looking at the impact of that transition on their level. If it goes up (on average) then it’s likely that the league they have transitioned to is a bit too high compared to the one they transitioned from. By combining all of the transitions between all of the leagues (like a large set of simultaneous equations) it’s possible to calibrate all the leagues compared to each other.
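
The ‘simultaneous equations’ idea can be sketched as follows. Each observed transition says, roughly, "moving from league X to league Y changed levels by this much on average", and we look for one offset per league that best explains all of them, with one reference league pinned at zero. This is a minimal sketch under assumptions of our own: the tuple layout, the weights, the name `calibrate` and the simple iterative averaging are illustrative and are not the calibration process the system actually runs.

```python
from typing import Dict, List, Tuple

# (from league, to league, mean level change on transition, weight)
Transition = Tuple[str, str, float, float]


def calibrate(transitions: List[Transition], reference: str,
              sweeps: int = 200) -> Dict[str, float]:
    """Estimate how high (+) or low (-) each league is versus `reference`."""
    leagues = {name for f, t, _, _ in transitions for name in (f, t)}
    leagues.add(reference)
    offset = {name: 0.0 for name in leagues}
    for _ in range(sweeps):
        for league in leagues:
            if league == reference:
                continue  # the reference league stays fixed at zero
            num = den = 0.0
            for f, t, change, weight in transitions:
                if f == t:
                    continue
                if t == league:
                    # players went up by `change` on arriving here
                    num += weight * (offset[f] + change)
                    den += weight
                elif f == league:
                    num += weight * (offset[t] - change)
                    den += weight
            if den:
                # weighted average of what each neighbour implies
                offset[league] = num / den
    return offset
```

The weight lets transitions backed by many consistent players count for more than those based on only a handful, which matters for several of the factors discussed in this section.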

We need one league to act as a gold standard so we pick the one that has a large set of the most consistent players with a good likelihood of transitions between it and as many other leagues as possible. For this reason we have picked the combined tournaments that make up the PSA. This works well but note the first point on the list below.

There are a few factors to take into account that make this an interesting exercise:

  • Analysis of the data shows us that players tend to up their game by about 10% when playing in tournaments
  • Some players drop out for a while as part of their transition so you can’t trust their level so much
  • The leagues (or at least the available data for the leagues) all start at different times and some leagues even come and go over time
  • Some leagues share large numbers of the same players so changing those players affects more than just the league being targeted. There are quite a few inter-league dependencies.
  • Very few players play consistently so small numbers of player transitions can add significant inaccuracy to the calibration if not weighted appropriately.
  • Juniors, in particular, change very quickly and if they do that at the same time as changing leagues it can throw the whole calibration out.
  • There are groups of players who just play each other whichever league they are in - you can't use these player transitions to calibrate the leagues.
  • Some players alternate between leagues so their level never stabilises enough to be usable.
  • Duplicate players can reduce the effectiveness of the calibration. Once a player is identified as playing in multiple leagues, their history needs to be combined and re-run otherwise their transitions will not be seen.
  • At what point do you start measuring? Leagues have a start-up period and it takes a while to get a decent number of player transitions.
  • At what point do you stop measuring? Leagues change over time so you can’t just take a 10 year average. You really want to calibrate them as soon as possible (i.e. give them a starting point) and then monitor them for change beyond that.
  • A league needs to be calibrated across a whole number of years because they change over the course of a year such as summer v winter sub-leagues. A calibration of a year and a half, say, gives different results.
  • The league names are mostly annual and seasonal so you need to combine them to identify a consistent set of long term leagues. You can't treat these as separate leagues because they are too short-lived and all have the same players.
  • After all of the above, a league may still be considered too high or too low in which case we need to provide a tool to the county admin to be able to raise or lower it. This tool has to be an integral part of the calibration process.

All of the above gets you to the point of being able to measure the calibration across the many leagues on the system but the next question is: how do you change the level of a league? This raises a few more questions:

  • What constitutes the level of a league?
  • Who are the players in that league? Is it everyone who’s ever played in it, anyone who’s played in it recently, anyone who has mostly played in it or even anyone who has only played in it? These are all possible definitions.
  • Do you change all the leagues at the same time or go through a sequence of leagues? Do they all need changing?
  • Do you make the change in one go or make many, small adjustments?

After considerable analysis, it turns out the most effective strategy is to make many small adjustments to all the leagues over a period of time. This allows the calibration process to adjust and keep up with the interactions of the leagues – some have quite a few players who play in other leagues. These adjustments are made every night after the normal match result processing.

The starting level of a league is changed by changing the starting levels of the players who play mostly in that league along with anyone else who has played in it more than a few times. The starting period turns out to be at least a year but there also needs to be a minimum number of transitions to maintain accuracy. Once you go beyond two years, the character of the league can change and you’re left with a calibration that is simply inaccurate at all times!

As the player starting levels are adjusted, this has an interaction with the player starting level process (previous section) so this is another reason to make many, small changes rather than one big one, to prevent these inter-dependent processes from tripping over each other.

How this will affect you

In a nutshell... your level is likely to change. How much it changes and whether it goes up or down will depend on how the league you play in compares with all the other leagues. Since the whole process is automatic you'll have to wait and see... Some leagues may have to change quite a bit whereas others may hardly change at all.

The process hasn't started yet as we're still testing on our development servers but it will be kicked off fairly soon. We'll update this page when that happens.

How you can help

This process is based on how the system interprets the changes in player level as they transition from league to league but, given the difficulties listed above, there may be some inaccuracies. If you have a good idea of the relative levels of some of these players and you feel that a league is too low or too high compared with any other league, please get in touch and we'll take a look at what's going on.

We don't believe this process has ever been done before - which is not all that surprising given the complexities - and it may take a little adjustment before we get it right.

Auto-calibrate over time

Players come and go over time, play in different leagues, get better, get worse, have periods off and generally spend most of their playing time being inconsistent. Yet it’s really important that a 1000 level player in 2014 is the same standard as a 1000 level player back in 2005 so we can compare over time. This means:

  • A player can tell if they’re actually getting better
  • You can compare players from different times
  • You can actually find out who the best player of all time is!

This is done using a few techniques and making a few assumptions:

  • Once the initial cross-league calibration is done, the same calibration process can be used over time to monitor and adjust the leagues. The adjustments are made using tiny (less than ½%) changes to everyone’s normal level adjustments after each match which allow whole leagues to be adjusted on-the-fly.
  • The complexities of allowing for human nature in the standard level calculations result in unequal adjustments being made to the two players and that causes an overall net loss or gain of level. This can have long term effects on the levels of the leagues and of the system itself. We therefore add a process for ‘catching’ the level loss or gain and feeding it back into the system.
  • It is very common for players to start low, get better over time and then leave. During their time of improvement, they ‘take’ level from many other players in the system but once they’ve left it can’t be ‘given back’. Some equivalent of giving back has to be included in the overall process or the overall level will reduce over time. Yes, the system is exothermic!

Tools

To support and maintain the system, there are a number of tools available to the admins.

  • Find duplicate players - this admin-only tool looks for potential duplicates for a specific county and offers a list. These are players who have played in the county but also have been found elsewhere in the system either in the same county or beyond. The most obvious comparison is by name (exact match, initials and surname or just surname only) but the system can also check that the levels of the duplicates are about the same and that they haven't played for two different clubs on the same date. The combination of all three checks limits those players identified to a likely set but we still perform a manual check before going ahead with the merge.
  • Suggest duplicate players - this tool is available to all users and suggests any possible duplicates at the top of each player's history page. The system makes a quick check when the page is requested and, if there are any, lists the likely candidates along with a form allowing the user to say yes or no to each one. This input is taken as a request which the admins can check and act on later. Non-admin users can't do the actual player merging themselves.
  • Merge duplicate players - once it has been confirmed that two players on the system are the same person, this tool can be used to perform the actual merge. This requires going through every reference in the database for one of the players and changing it to refer to the other player instead. At the same time, the player attributes (such as club, county, starting level etc.) are combined to provide the most useful and up-to-date information. E.g. the last club played for is the club used for the merged player whereas the initial level of the earliest match played is used. If both players have an age category defined then a prioritisation is applied to come up with the most likely one for the merged player. It's really important that source system details are retained for both players so that, as results continue to come in from all those sources - intended for the non-merged players - they are all associated with the merged player by the system.
  • Separate incorrectly merged players - this can happen and we need a way of unmerging them. In this case the admin is offered a number of ways to separate the players such as by the club or league and based on that, the system can create a new player and separate out the match results. It's equally important that the source information is separated too otherwise future results for both players will still go to the merged player.
  • Add/remove/edit players
  • Add/remove/edit match results
  • Manage events
  • Manage membership and logons
  • Database tools
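
The three checks used by the ‘Find duplicate players’ tool (name, similar level, no clash of clubs on the same date) can be sketched as below. The `Player` structure, the function names and the 1.2 level-ratio threshold are assumptions made for this example; the real tool's thresholds are not stated here, and a manual check is still performed before any merge.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Set, Tuple


@dataclass
class Player:
    name: str
    level: float
    # (match date, club played for) - hypothetical representation
    appearances: Set[Tuple[date, str]] = field(default_factory=set)


def names_match(a: str, b: str) -> bool:
    """Exact match, initial plus surname, or surname only."""
    pa, pb = a.lower().split(), b.lower().split()
    if not pa or not pb or pa[-1] != pb[-1]:
        return False  # surnames differ
    if len(pa) == 1 or len(pb) == 1:
        return True   # surname only on one side
    return pa[0][0] == pb[0][0]  # first initials agree


def likely_duplicates(a: Player, b: Player,
                      max_level_ratio: float = 1.2) -> bool:
    """True if a and b pass all three checks and deserve a manual look."""
    if not names_match(a.name, b.name):
        return False
    low, high = sorted((a.level, b.level))
    if low <= 0 or high / low > max_level_ratio:
        return False
    # The same person cannot play for two different clubs on the same date.
    clash = any(day_a == day_b and club_a != club_b
                for day_a, club_a in a.appearances
                for day_b, club_b in b.appearances)
    return not clash
```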

Roles

The system is only as good as the data it receives and there are a number of roles that the system needs in order to keep it accurate, current and useful.

  • System admin - keeps an eye on the system as a whole, checking the nightly processing, that the listings in general are sensible, that there aren't any rogue players or results messing things up. Also works with the various league system IT folks to integrate their results into the system.
  • County admin
  • Club admin
  • Member
  • Casual user
  • League system IT support