How the Squash Levels system works
This page is still under construction.
There are a few systems around that take results from a league and assign a value to each player but it requires a great deal of additional functionality to provide a system that can be used across multiple leagues, counties, nationally and beyond.
This page describes what it takes to construct such a system. It is focused on what is needed and how it works rather than what it does or how to use it. Please see the help pages for that.
An effective system has:
Access to results
The results of all players' matches across the country are spread around a number of different systems. These are mostly league management systems but there are also boxes, tournaments and the PSA/WSA results on squashinfo.
For those systems where we have been able to connect 'system to system', it is possible to pull the results across automatically and quickly every night using agreed formats. This is by far the most effective way of accessing results, but for other systems it's also possible to extract the results from their user web pages. This can only be done occasionally as it takes much longer and is deliberately slowed down to make sure the respective website is not impacted.
Click here to see the sort of format we use for system to system transfers. Each transfer should just include any new or modified matches for the time period requested.
We work directly with the IT brains behind each respective league system to work out how best to automate the sending of the results. All systems are different but it usually turns out to be a straightforward exercise. Click here to see the output from the Avon system. If you're one of those IT guys and interested in hooking up then please get in touch. We'd love to work with you.
The system is totally dependent on the accuracy of the received data so a certain amount of intelligence has to be applied to the parsing to make sure the scores are sensible and the right way around! If there is any doubt about the point scores then only the games scores are used or, worse still, the result has to be ignored.
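As a rough illustration of the kind of checking described above, here is a minimal sketch. It assumes each result arrives as a games score plus a list of per-game point scores. The function name `check_result` and the returned dictionary layout are hypothetical and are not taken from the real system.

```python
from typing import List, Optional, Tuple

Game = Tuple[int, int]  # (points for player A, points for player B)


def check_result(games_a: int, games_b: int,
                 points: List[Game]) -> Optional[dict]:
    """Return a cleaned-up result, or None if it has to be ignored.

    Hypothetical helper: the real parsing rules are not published.
    """
    # A match must have a winner; a drawn or negative games score is unusable.
    if games_a == games_b or min(games_a, games_b) < 0:
        return None

    # Count the games each player won according to the point scores.
    won_a = sum(1 for a, b in points if a > b)
    won_b = sum(1 for a, b in points if b > a)

    if (won_a, won_b) == (games_a, games_b):
        # Point scores agree with the games score: use everything.
        return {"games": (games_a, games_b), "points": points}
    if (won_b, won_a) == (games_a, games_b):
        # Point scores look like they were entered the wrong way around.
        flipped = [(b, a) for a, b in points]
        return {"games": (games_a, games_b), "points": flipped}
    # Point scores cannot be trusted: fall back to the games score only.
    return {"games": (games_a, games_b), "points": None}
```

A real parser would need more checks than this (valid winning scores, retirements, walkovers and so on), but the three outcomes match the ones described: use the full result, use the games score only, or ignore it.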
Accurate identification of players
With tens of thousands of players on the system, we can't just rely on player name to know whose results are whose. Besides, it's very common to find the same player appearing multiple times on a single league system as they change clubs or their names are just scribbled down wrongly at the end of the evening. We use a unique player ID for each player on the SquashLevels system but it's important that the source systems also have player IDs and we store those along with the player details.
The source system player ID allows us to differentiate players with the same name who are actually different players but we also need to be able to identify and merge duplicate players who have different source IDs but who are actually the same player. Merging duplicate players and remembering all the source IDs they have from the various league systems allows us to continue to receive results for apparently different players but still know they are for the same person!
One of the tasks of the league admin is to identify duplicate players and merge them using the tools provided. This is mostly a one-off exercise when the league is initially added to the system but there will always be a continuous appearance of duplicate players that need to be identified and merged. If not merged then a duplicate player will appear multiple times in the player listings and have separate match histories. It's encouraging that when this happens, the duplicates usually have very similar levels - it's an accurate system!
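The idea of remembering every source ID for a merged player can be sketched as a simple lookup table. This is only an illustration: the class `PlayerRegistry`, its method names and the example source names are assumptions, not the system's actual data model.

```python
from typing import Dict, Tuple

SourceKey = Tuple[str, str]  # (source system name, player ID on that system)


class PlayerRegistry:
    """Hypothetical map from source-system player IDs to one unique ID."""

    def __init__(self) -> None:
        self._by_source: Dict[SourceKey, int] = {}
        self._next_id = 1

    def resolve(self, source: str, source_id: str) -> int:
        """Return the unique player ID, creating one for an unseen source ID."""
        key = (source, source_id)
        if key not in self._by_source:
            self._by_source[key] = self._next_id
            self._next_id += 1
        return self._by_source[key]

    def merge(self, keep_id: int, duplicate_id: int) -> None:
        """Point every source ID of the duplicate at the player we keep."""
        for key, player_id in list(self._by_source.items()):
            if player_id == duplicate_id:
                self._by_source[key] = keep_id
```

Once two players have been merged, a later result carrying either source ID resolves to the same unique player, so the match history stays in one place.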
The ability to store and process large numbers of players and results
A system covering the whole of the UK and beyond must expect to receive, store, process and cope with more than a million results received over a number of years from around 50,000 players. This level of scale requires special attention to make sure the system is responsive.
Most pages are constructed on-the-fly using information from the database, even though they quite often require a very large number of calculations, but some are just too complex, use enormous amounts of data or simply take too long. For these pages, we use pre-processed data which is generated the night before and can then be made available to users very quickly on demand. Although we keep this to a minimum, the amount of pre-processing being done overnight is increasing as the database gets larger.
To make it easy to find a player
With so many players on the system it needs to be easy to find who you're looking for. The system provides a dynamic name look-up as the user types, offering a continuous list of possible players and allowing the user to pick the one they want based on the county, club and level shown along with the player name.
The user can split the name up into fragments to give the system more flexibility for the look-up. For instance, entering 'jo n sm th' will find 'John Smith', 'Jon Smith', 'Jonathon Smith' and even 'Jon Smythe', so it's very easy to find even players whose names have been misspelled.
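One way to get the behaviour in the example above is to require that every fragment appears in the name, in the order typed. The sketch below assumes exactly that rule. The real look-up may well work differently, and the function names are made up for illustration.

```python
from typing import Iterable, List


def matches_fragments(query: str, name: str) -> bool:
    """True if every fragment of the query occurs in the name, in order."""
    text = name.lower()
    position = 0
    for fragment in query.lower().split():
        found = text.find(fragment, position)
        if found == -1:
            return False
        # Carry on searching after the end of this fragment.
        position = found + len(fragment)
    return True


def find_players(query: str, names: Iterable[str]) -> List[str]:
    """Return the names that match all the fragments (assumed rule)."""
    return [name for name in names if matches_fragments(query, name)]
```

With this rule, 'jo n sm th' matches all four names from the example but not 'Jane Smith' or 'Tom Jones'.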
An accurate player level algorithm
A player's level reflects their ability at any particular point in time and fluctuations in their level of play should be reflected in their numeric level as accurately as possible. The whole point of the system is to show their level, how it's changing over time and how it compares with other players anywhere in the country or beyond so it needs to be accurate.
The basic algorithm is fairly straightforward - a better than expected result causes a player's level to go up and a worse than expected result causes it to go down - but it also needs to cope with the many extremes and boundary cases that crop up in real life. The reason this system is so accurate is that it takes human behaviour into account. It's not the maths that's complex - it's the psychology!
The reality is that the algorithm has to be much more focused on understanding human behaviour than being a mathematically correct exercise!
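To make the 'better than expected goes up, worse than expected goes down' idea concrete, here is a deliberately simple sketch. None of the formulas are taken from the actual algorithm: the expected share of points, the `damping` factor and both function names are assumptions chosen only to show the shape of the update.

```python
def expected_share(level: float, opponent_level: float) -> float:
    """Hypothetical expectation: share of points in proportion to level."""
    return level / (level + opponent_level)


def update_level(level: float, expected: float, actual: float,
                 damping: float = 0.1) -> float:
    """Nudge a level by how far the result beat (or missed) expectation.

    'expected' and 'actual' are shares of points won, between 0 and 1.
    'damping' is an assumed factor limiting how far one match can move it.
    """
    return level * (1.0 + damping * (actual - expected))
```

Everything that makes the real system accurate, such as handling the extremes, boundary cases and human behaviour, sits on top of a simple core like this and is not shown here.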
Cut the data in different ways to generate listings
With tens of thousands of players on the system it's important to show the player listings that are interesting to each specific user. The most obvious listings are by club or county but the system provides a full list of ways to list players:
To be able to update player details from the latest results
Player details change fairly frequently whether it be their club, county, age group or even name. It's not possible to do this manually with so many players on the system so the player details are updated using the information passed in along with the results from the source systems.
In some cases the information is pulled in with the results and in other cases, the information can be assumed just because it comes from a particular source. For instance, everyone who plays in the Kent leagues can have their county updated to Kent.
In contrast to that, it may be necessary to lock the player's county for those players who represent their county but actually play in different counties. This only affects the top England players but is necessary functionality for such an all-inclusive system.
It's also possible to manually override these details.
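A minimal sketch of the county update rule described above, including the lock. The `Player` fields, the `SOURCE_COUNTY` table and the source name `kent_leagues` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    county: str
    county_locked: bool = False  # set for players whose county must not change


# Hypothetical table: sources whose players can all be assumed to share a county.
SOURCE_COUNTY = {"kent_leagues": "Kent"}


def apply_source_county(player: Player, source: str) -> None:
    """Update the player's county from the source unless it is locked."""
    county = SOURCE_COUNTY.get(source)
    if county is not None and not player.county_locked:
        player.county = county
```

A locked player keeps their county however many results arrive from leagues in other counties, which is the behaviour needed for players who represent one county but play in another.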
Be able to work back in time
When the results are first received for a particular league, the system has to find players who already have a level from which to work out the levels of all the other players in that league. This is a two-step process:
Going backwards can't be damped so there are controls in place to restrict the effect of unexpected results.
This process gives a good approximation for the initial level for all new players which is then improved over time using the automated initial level and league calibration processes described below.
Consolidate county and club names
With the many different leagues and sources of results, county and club names received can vary quite a bit even though they're referring to the same county or club. This is a similar issue to having duplicate players as we'll end up with duplicate counties and clubs on the system if they are not merged together.
The system needs to be able to identify each duplicate county or club name as it's received from the results input data and consolidate it to a single, unique name when the result is added to the system. This mapping needs to be agreed with the county admins.
Some systems actually return team names rather than club names so there's an extra level of filtering that needs to be done to derive the club name from the team name first. It isn't always possible to do this generically so the county admin may need to provide more specific mappings when needed.
Finally, when all of this is done, there are quite a few clubs that appear in more than one county. With the name mapping described above, these clubs will at least have the same name but they will behave like different clubs on the system. E.g. listing the players for Shepton Mallet in Somerset will give a different listing compared to the players for Shepton Mallet in Avon. These shouldn't be merged because we need to keep the club to county associations but we do also need to be able to treat them as a single club for club specific listings. This requires yet another mapping to be applied dynamically at the time of the listing.
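The three layers of mapping described in this section can be sketched as follows. The alias table, the team-suffix rule and all function names are assumptions for illustration. In practice the mappings are agreed with the county admins and will not be this simple.

```python
import re
from typing import Dict, List, Optional

# Hypothetical alias table: received variant (lower case) -> agreed club name.
CLUB_ALIASES: Dict[str, str] = {
    "shepton mallet squash club": "Shepton Mallet",
    "shepton mallet sc": "Shepton Mallet",
}

# Assumed convention: a team name is the club name plus a trailing
# team number or single letter, e.g. "Shepton Mallet SC 2".
TEAM_SUFFIX = re.compile(r"\s+(\d+|[A-Z])$")


def club_from_team(team_name: str) -> str:
    """Strip an assumed team suffix to get back to the club name."""
    return TEAM_SUFFIX.sub("", team_name.strip())


def canonical_club(raw_name: str) -> str:
    """Map a received club or team name to the single agreed club name."""
    club = club_from_team(raw_name)
    return CLUB_ALIASES.get(club.lower(), club)


def players_for_club(players: List[dict], club: str,
                     county: Optional[str] = None) -> List[str]:
    """List a club's players, for one county or across all its counties."""
    return [p["name"] for p in players
            if p["club"] == club and (county is None or p["county"] == county)]
```

Leaving `county` unset treats the club as a single club across counties at listing time, while the stored club to county associations are untouched.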
Automatically set the initial level for each player
Auto-calibration across leagues
It's a fundamental requirement of a multi-league system that the levels assigned to players are equivalent whichever league they play in - that's the whole point. So a 1000 level player in the Avon Mixed league is the same standard as a 1000 level player in the Yorkshire leagues is the same standard as a 1000 level player in the Kent NW league. And so on.
This is done by looking for players who play in more than one league and, for each transition between leagues, looking at the impact of that transition on their level. If it goes up (on average) then it's likely that the league they have transitioned to is a bit too high compared to the one they transitioned from. By combining all of the transitions between all of the leagues (like a large set of simultaneous equations) it's possible to calibrate all the leagues compared to each other.
We need one league to act as a gold standard so we pick the one that has a large set of the most consistent players with a good likelihood of transitions between it and as many other leagues as possible. For this reason we have picked the combined tournaments that make up the PSA. This works well but note the first point on the list below.
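The 'large set of simultaneous equations' can be sketched as a least-squares problem. Each league gets an offset, the gold standard league is pinned at zero, and each observed transition says how much higher the destination league appears than the origin. This is only a sketch under those assumptions: the function name, the units of the 'average level change' and the simple iterative solver are illustrative and not the method actually used.

```python
from typing import Dict, List, Tuple

# (from league, to league, average level change seen on that transition)
Transition = Tuple[str, str, float]


def calibrate(transitions: List[Transition], reference: str,
              sweeps: int = 200) -> Dict[str, float]:
    """Estimate an offset per league so that offset[to] - offset[from]
    matches the observed changes as closely as possible."""
    leagues = {reference}
    for src, dst, _ in transitions:
        leagues.update((src, dst))
    offset = {league: 0.0 for league in leagues}

    # Repeatedly set each league to the average of what its neighbours imply.
    for _ in range(sweeps):
        for league in sorted(leagues):
            if league == reference:
                continue  # the gold standard stays fixed at zero
            estimates = []
            for src, dst, change in transitions:
                if src == dst:
                    continue  # a league tells us nothing about itself
                if dst == league:
                    estimates.append(offset[src] + change)
                elif src == league:
                    estimates.append(offset[dst] - change)
            if estimates:
                offset[league] = sum(estimates) / len(estimates)
    return offset
```

A positive offset means a league looks too high relative to the gold standard and a negative one too low. Leagues with no chain of transitions back to the reference cannot be calibrated this way and stay at zero.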
There are a few factors to take into account that make this an interesting exercise:
All of the above gets you to the point of being able to measure the calibration across the many leagues on the system but the next question is: how do you change the level of a league? This raises a few more questions:
After considerable analysis, it turns out the most effective strategy is to make many small adjustments to all the leagues over a period of time. This allows the calibration process to adjust and keep up with the interactions of the leagues - some have quite a few players who play in other leagues. These adjustments are made every night after the normal match result processing.
The starting level of a league is changed by changing the starting levels of the players who play mostly in that league along with anyone else who has played in it more than a few times. The starting period turns out to be at least a year but there also needs to be a minimum number of transitions to maintain accuracy. Once you go beyond two years, the character of the league can change and you're left with a calibration that is simply inaccurate at all times!
As the player starting levels are adjusted, this has an interaction with the player starting level process (previous section) so this is another reason to make many, small changes rather than one big one, to prevent these inter-dependent processes from tripping over each other.
How this will affect you
In a nutshell... your level is likely to change. How much it changes and whether it goes up or down will depend on how the league you play in compares with all the other leagues. Since the whole process is automatic you'll have to wait and see... Some leagues may have to change quite a bit whereas others may hardly change at all.
The process hasn't started yet as we're still testing on our development servers but it will be kicked off fairly soon. We'll update this page when that happens.
How you can help
This process is based on how the system interprets the changes in player level as they transition from league to league but, given the difficulties listed above, there may be some inaccuracies. If you have a good idea of the relative levels of some of these players and you feel that a league is too low or too high compared with any other league, please get in touch and we'll take a look at what's going on.
We don't believe this process has ever been done before - which is not all that surprising given the complexities - and it may take a little adjustment before we get it right.
Auto-calibrate over time
Players come and go over time, play in different leagues, get better, get worse, have periods off and generally spend most of their playing time being inconsistent. Yet it's really important that a 1000 level player in 2014 is the same standard as a 1000 level player back in 2005 so we can compare over time. This means:
This is done using a few techniques and making a few assumptions:
Allowance for the Junior circuit
Please scroll down to the update on the implementation.
The international junior circuit has some particular characteristics:
In general, the better juniors in each age group are prepared (actually, their parents...) to do more travelling so this tends to result in the levels being more accurate for these players and much less accurate for the lower level, more occasional players. As a result the level inaccuracies are not the same for all the juniors so we can't fix this with a one-off, wholesale adjustment.
To compound the issue, the country specific events are typically annual so those that only play in their local event only show up on the system once a year. They play 3-4 matches and then disappear completely until the following year. And being juniors, they are likely to be vastly improved in the space of 12 months!
The result of this is that the less frequent juniors mostly have a level that is too low - sometimes as much as 4x too low - and they drag down the level of their opponents every match. Those worst affected are the lower age group juniors who have an accurate level from playing their county leagues (and who have lots of results) and then have their level trashed by a series of players in a junior circuit event whose levels are all way too low.
The level damping is also reduced for juniors and for events, which only exacerbates the problem.
This will be addressed by:
Implementation - update
We have now implemented these changes and have loaded them on to the main system. The junior circuit looks a lot better though it does struggle to keep up with these juniors who seem to double their level every time they enter a new event! We'll continue to think about improvements here but the main goal was to not overly impact those juniors who play more regularly (and have higher levels) and enter these events, and we've done that.
A big thank you to those who helped us with these changes. We are still open to feedback so please use this email link if you have any other thoughts or suggestions.
To support and maintain the system, there are a number of tools available to the admins.
The system is only as good as the data it receives and there are a number of roles that the system needs in order to keep it accurate, current and useful.