I invented a modified Elo rating system while working for Intel Corp. in Hillsboro, Oregon around 1996. I had started a chess club that met at lunch time and after hours in the cafeteria to play blitz. We quickly grew to 60+ people and were playing 100+ blitz games per day. Eventually the group became competitive, and I added a rating system to calm some of the who's-better-than-whom debates. I used the standard Elo rating system, with a provisional system for new people; anybody with a USCF OTB (over the board) rating started at that rating.

After a while, the rating system started exhibiting deflationary tendencies: some players were improving too quickly for the rating system to keep up, and in doing so they lowered the ratings of everyone else. At first I solved the problem by putting some people back on the provisional system, but that became too manual, which motivated me to create a new rating system. After reviewing what was really happening, I decided to adjust the Elo system to deal with the problem. This worked very well, and the need for periodic manual intervention disappeared.

The fundamental problem with the standard USCF Elo system of the 1990s was that it was zero sum, and thus deflationary, for everyone rated below 2100 - the vast majority of chess players. Any zero-sum rating system will exhibit the same deflationary behavior and thus needs to be replaced.
Many of the new people joining the club had played chess to varying degrees before, but had never played blitz. Some had played with a clock while others had not. Many hadn't played in years, or even a decade. So these people had one or more learning curves to overcome, and not all of them were simple. It takes some time for a chess player to get used to a chess clock. It takes more time to get used to blitz time controls and to pacing yourself. It takes more time to regularly realize you can win a lost game by moving fast when your opponent is low on time. It takes more time to realize you need to know your openings more deeply and be able to play elementary endgames on autopilot. It takes even more time to develop an efficient, prioritized thinking process that finds good moves in a few seconds - and more beyond that.
So we see several learning obstacles. When each one is overcome, a player experiences a nonlinear increase in ability and performance. A few players overcome this series of obstacles in quick succession, but for most the progress is nonlinear: it happens each time a player has an epiphany about one of the obstacles. Experience lets you notice repeated issues; then you come to a realization that leads to a jump in ability and performance. Add the general obstacles of learning to play better chess to these blitz-specific obstacles and you see the age-old adage that practice makes perfect: the more you play and pay attention, the more likely you are to have an epiphany and improve.
From a ratings perspective, while we were using the standard Elo rating system the higher rated players were losing points even though some of them were improving. This occurred because of the relatively rapid improvement of the lower rated players and because the base Elo system is zero sum. The problems became noticeable quickly because we were playing and rating many games per day in a closed group.
Yes, I believe it does. I have seen rating charts for individuals that show periods of high oscillation, after which the rating gradually becomes more consistent. At that point they seem to experience an epiphany, because they get a sudden nonlinear rating jump. Just after the jump, their performance oscillates highly again. The process then repeats multiple times. Also, as the rating gets higher, the nonlinear jumps get smaller. Others may have a period of stability or small linear improvement, then a nonlinear jump. I reference the following USCF rating charts.
I'm not claiming that everyone experiences this cycle, but some do, and a good rating system must allow it to happen instead of forcing some predetermined, false model. When claiming that all people follow a model, you need only one counterexample to prove the model wrong. Also, notice the first graph relative to any of the others: its learning curve is rather different from the rest. The first graph is for one of the world's elite in chess, Hikaru Nakamura.
Arpad Elo experienced the issues of chess ratings in the 1950s under the Harkness rating system. He believed that he could improve on it and developed the Elo system, which was adopted by the USCF (United States Chess Federation) in 1960 and by FIDE (Fédération Internationale des Échecs) ten years later in 1970. Elo thought, quite correctly, that a good rating system needed to model human behavior and performance. So he gathered lots of tournament data and set out to find out what really happens. He came to the conclusion that ratings should be adjusted based on performance versus expectation. His model also made the system zero sum for competitors in the same rating group. His system suggests that humans perform within a 400-point standard deviation.
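Elo's expectation model is the standard logistic formula on a 400-point scale. A minimal sketch (the standard published formula, not code from the original club system):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B.

    Uses the logistic curve with Elo's 400-point scale, so a
    400-point favorite expects roughly 91% of the points."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Equal ratings give a 50% expectation; a 400-point favorite ~0.909.
print(expected_score(1500, 1500))           # 0.5
print(round(expected_score(1900, 1500), 3)) # 0.909
```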
Arpad Elo's system was a breakthrough in performance modeling and rating systems. Amazingly, he developed it in the late 1950s, before the invention of the PC. However, it lacks a certain concept: the Elo system didn't allow for nonlinear, rapid improvement at the lower end of the spectrum. The K value, which determines the maximum number of points you can gain for winning a game, had only 3 values: 32 below 2100, 24 from 2100 through 2399, and 16 at 2400 and above.
For matches where both players had ratings in the same rating range, the system was zero sum: if you gained X rating points, your opponent lost X rating points. Thus a person (or persons) improving at a faster-than-average pace for the group would take points from others, who would in turn take them back from still others, deflating the ratings of the higher rated players in that rating group. This was happening at Intel in our blitz chess club. The effect is more pronounced at blitz than at classical time controls due to the greater number of games that can be played per day or week.

Why does this happen when Arpad Elo created the system based on real data from the USCF? I have a theory: the data was uncontrollably biased. In the 1950s, all USCF chess ratings were for classical time controls only. On top of that, the vast majority of competitors were adult males. These two issues introduced a data bias which kept Dr. Elo from noticing the problem.
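The zero-sum behavior is visible directly in the fixed-K update rule. A minimal sketch (the ratings and the K of 32 are illustrative):

```python
def expected_score(ra, rb):
    # Standard Elo expectation on the 400-point scale.
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def elo_update(ra, rb, score_a, k=32):
    """One fixed-K Elo update for a game between A and B.

    score_a is 1 for an A win, 0.5 for a draw, 0 for a loss.
    Returns (new_ra, new_rb)."""
    delta = k * (score_a - expected_score(ra, rb))
    # B's change is the exact negative of A's: the point pool is fixed,
    # which is why sustained improvement by one group drains the others.
    return ra + delta, rb - delta

new_ra, new_rb = elo_update(1400, 1600, 1.0)
print(abs((new_ra + new_rb) - 3000) < 1e-9)  # True: total points conserved
```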
Based on what was happening at the Intel Oregon Blitz Chess Club, it was clear that the system shouldn't be zero sum. Lower rated players must be allowed to improve faster than higher rated players, and not ultimately at the expense of the higher rated players. The system should allow competitors of any rating to drop points and to gain points back.
At first, I put people experiencing rapid rating gains back on the provisional system and restored some of the rating points their higher rated opponents had lost. Of course, some of this was a manual process. I concluded that the easiest approach was to modify the Elo rating system to use a fully floating K value. The K value would be different for nearly every competitor, as a pure function of the competitor's rating. This results in a system that:
Now the only question was what the function K = f(R) should look like. It should decrease as the rating increases, but by exactly what function?
I thought it should be nonlinear. However, it was the late 1990s and I didn't have access to enough data to be sure. I decided to think through the corner cases. The one that bothered me was somebody having a lucky (or unlucky) day. In a nonlinear system, such a player moves up fast but takes longer to fall back down. That could lead to inflation, since other people gain points from him while he is on his way down. Likewise, a higher rated player may have a bad day, artificially elevating the rating of a lower rated opponent; the higher rated player will have an easier time climbing back up than the lower rated player will have falling back down. This becomes a problem if the K value is too large at lower ratings and tapers down too rapidly.
A linear system has the same problem, but at a much smaller magnitude, because the change in K value as the rating changes is smaller. Such a system is not zero sum and thus less likely to be deflationary, but too large a gap in K values per rating could still cause problems. I opted for a linear system where the K value starts large for a low rating and decays linearly as the rating increases. This worked quite well:
Once I decided on a linear system, it was just a matter of finding the correct value for C in the following equation.
K Value | Gain for E = 50% | Rating |
---|---|---|
124 | 62 | 100 |
114 | 57 | 300 |
104 | 52 | 500 |
94 | 47 | 700 |
84 | 42 | 900 |
74 | 37 | 1100 |
64 | 32 | 1300 |
54 | 27 | 1500 |
44 | 22 | 1700 |
34 | 17 | 1900 |
24 | 12 | 2100 |
24 | 12 | 2200 |
24 | 12 | 2300 |
16 | 8 | 2400 |
16 | 8 | 2500 |
16 | 8 | 2600+ |
The gold highlighted values exhibit the same K values for ratings >= 2100 as the normal Elo system. Thus the possibility of somebody climbing from 2100 to FM, IM or GM faster than was previously possible is not an issue. The K values for ratings between each of these 200-point gaps are linearly interpolated; for example, the K value for a rating of 1760 is 41. An exception exists for ratings >= 2100: above that point, the K value is a step function.
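The table above is consistent with a simple closed form below 2100, K = 129 - R/20, which reproduces the stated interpolation (41 at a rating of 1760). A sketch of that reading (the function name and clamping choices are mine, not from the original implementation):

```python
def floating_k(rating: float) -> float:
    """K value matching the table above.

    Below 2100 the tabulated values are linear in the rating,
    K = 129 - rating / 20 (124 at 100, 34 at 1900, 41 at 1760).
    At 2100 and above, K becomes a step function as in standard Elo."""
    if rating >= 2400:
        return 16.0
    if rating >= 2100:
        return 24.0
    return 129.0 - rating / 20.0

print(floating_k(1760))  # 41.0 (the interpolated example from the text)
print(floating_k(2250))  # 24.0 (step region)
```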
By itself, the Elo system is deflationary (not inflationary) for all people in the same K group because it is zero sum. Adjusting the system so that it can gain points as a whole is the way to stop the deflation. Allowing it to gain too many points makes it inflationary, and all ratings could go up. Too little gain in total system points relative to improving players leaves it deflationary, though not as much as before.
In practice the system didn't exhibit any signs of inflation or deflation:
None of the issues described above for either problem appeared. Participation satisfaction went up for the higher rated members: they were no longer guaranteed to lose points to a slow drain. Because individual ratings oscillated less, individual rankings oscillated less. Individual rating gains were more gradual, and plateaus were less harsh.
The USCF used a bonus point system to add points to a member's rating, and thus to the system as a whole, when a member outplayed his or her expectation in a tournament of 3 or more games. This was insufficient during the 1990s and 2000s. An updated system was implemented using a floating K value, recognizing the potential for rapid improvement in lower rated players. The following table shows K values for various rating levels.
K Value | Gain for E = 50% | Rating |
---|---|---|
80 | 40 | 700 |
72.727 | 36.36 | 900 |
61.538 | 30.769 | 1100 |
53.333 | 26.67 | 1300 |
47.058 | 23.529 | 1500 |
38.095 | 19.047 | 1700 |
30.769 | 15.3846 | 1900 |
23.5294 | 11.764 | 2100 |
20.5128 | 10.256 | 2200 |
17.39 | 8.6956 | 2300 |
15.68 | 7.843 | 2400 |
15.68 | 7.843 | 2500 |
15.68 | 7.843 | 2600+ |
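One way to realize a floating K from the table above is straight linear interpolation between the tabulated points. This sketch uses my own interpolation and clamping choices to illustrate the idea; it is not USCF's published formula:

```python
# (rating, K) pairs taken from the table above.
USCF_K_TABLE = [
    (700, 80.0), (900, 72.727), (1100, 61.538), (1300, 53.333),
    (1500, 47.058), (1700, 38.095), (1900, 30.769), (2100, 23.5294),
    (2200, 20.5128), (2300, 17.39), (2400, 15.68),
]

def uscf_k(rating: float) -> float:
    """Linearly interpolate a K value from the table; clamp outside it."""
    if rating <= USCF_K_TABLE[0][0]:
        return USCF_K_TABLE[0][1]
    if rating >= USCF_K_TABLE[-1][0]:
        return USCF_K_TABLE[-1][1]
    for (r0, k0), (r1, k1) in zip(USCF_K_TABLE, USCF_K_TABLE[1:]):
        if r0 <= rating <= r1:
            frac = (rating - r0) / (r1 - r0)
            return k0 + frac * (k1 - k0)

print(uscf_k(700))   # 80.0 (bottom of the table)
print(uscf_k(2600))  # 15.68 (clamped above 2400)
print(uscf_k(1000))  # halfway between the 900 and 1100 rows
```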
The primary problem with Glicko is the fundamental concept behind its creation: "the more you play the more consistent you become". Another way of saying this is that the more you practice, the less likely you are to improve, and that your ability to improve has nothing to do with your current level of play. This flies directly in the face of points 2, 3, 4 and 5. Most importantly, it defies the age-old concept of "practice makes perfect".
The "95% confidence level" concept is untrue. My personal experience on chess.com illustrates this: I recently improved my bullet rating by 400+ points in less than a month. Of course, I hit the lowest RD value I could by the time I had made a 200-point gain (I had been playing for a while to start with). So by the time I was at start+200, the system was 95% sure that my real rating was between start+150 and start+250. It continued to be 95% sure that I should not improve for the next 25 or more games, during which I gained another 200 rating points.
The variables c and t don't have anything to do with real performance variance. The variable t is the time since a player's last game: the longer the gap, the larger t, and hence the larger RD becomes. The constant c encodes an assumed growth in the uncertainty of a player's skill over time. Glicko-1's pre-period update is RD' = min( sqrt(RD^2 + c^2 * t), 350 ).

In this equation, the RD value is in no way a function of the variance of your actual rating or performance. In statistics, a standard deviation is a function of the variance of the quantity being measured, so the RD value is not a real standard deviation.
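Glicko-1's pre-period RD update can be sketched as below. The constant c = 34.6 is an illustrative calibration (roughly the value in Glickman's published example, where RD returns from 50 to 350 over 100 idle periods); note that no term measures how much the player's actual results varied:

```python
import math

def pre_period_rd(rd: float, t: float, c: float = 34.6) -> float:
    """Glicko-1 step 1: inflate RD with inactivity, capped at 350.

    t is the number of rating periods since the last game; c is the
    assumed growth of skill uncertainty per period.  Note what is
    absent: nothing here measures the variance of actual results."""
    return min(math.sqrt(rd * rd + c * c * t), 350.0)

print(round(pre_period_rd(50.0, 1), 1))  # ~60.8 after one idle period
print(pre_period_rd(50.0, 400))          # 350.0: a long layoff hits the cap
```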
The points you can gain in a match against somebody rated 100 points higher than you are the same no matter what your rating is, whether you are 1100 or 2000. This goes against a fundamental concept of competitive sports performance: ease of improvement is inversely related to playing strength. In other words, the system believes your ability to improve is the same whether you are rated 1100 or 2100, instead of improvement being easier at the lower end of the spectrum than at the upper end.
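This point is easy to verify with the fixed-K Elo update: the gain for beating an opponent 100 points higher depends only on the rating gap, never on the absolute level. A minimal sketch with an illustrative K of 32:

```python
def gain_for_win(ra: float, rb: float, k: float = 32) -> float:
    """Fixed-K Elo points A gains for beating B."""
    expected = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    return k * (1.0 - expected)

# Identical payoff for the same 100-point gap at any absolute level:
print(gain_for_win(1100, 1200) == gain_for_win(2000, 2100))  # True
```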
[Table comparing per-game point gains under the Old USCF Elo, Elo-Roberson, New USCF, and Glicko systems]
In the USCF columns above, P.E. indicates the use of a special rating formula for those who are effectively provisionally rated. In the table, E stands for expectation: E = 50% occurs when two people of equal rating are paired with each other, and E = 0% gives the points won when the lower rated player has essentially zero expectation of winning because he is severely outrated. Here is a list of noteworthy observations from the above table.
Online systems are readily accessible, allowing far more games per day or even per hour, let alone per week. Any system that degrades your chances of improvement as your number of games rises will become an issue faster online than in OTB play. Also, there are external factors influencing performance level:
All of these and more produce variability beyond what the Glicko system assumes. To say that they all affect everybody the same is ludicrous. For example, some people use laptops and have a UPS connected to their modems and routers while others don't. Some people have a wife and two+ kids while others don't.
Suppose you have been playing a lot, your RD value has you at a low improvement rate, and you have an old, bad mouse. When you buy a new gaming mouse that is much more responsive, the system will not notice or care. You will play better, but your number of games is high enough that you are not expected to improve.
This happened to me on chess.com. I had an old mouse with which I often had to make mouse movements twice to move from a1 to h8... My son bought himself a high end gaming mouse and waterfalled me his low end gaming mouse. Low end or not, it is amazingly better than what I was using. I tested the concept by playing several G/1 bullet games with the old mouse. There were several games where I lost with my opponent having at least 15 seconds left, sometimes 30 or more. After about 7 games, I switched to the low end gaming mouse and started winning games. When I lost on time, the most my opponents had on me was around 10 seconds, and many times less than 5.
Obviously, my online playing ability just improved, but the Glicko rating system didn't know that and thought that the more I played the less I should improve.
In another example, our chess team on chess.com has an NM, but team matches use rapid ratings and he had not played rapid games on that server, so his rating was 1200. He tried for a week to improve it and managed to get it into the 1700s. It took many games because the system reduced the number of points he could gain per game with every game played. Sometimes (due to fatigue) he'd make a mistake and lose a game, but the number of possible points for a win still kept shrinking. It has been a month, and his rapid rating is in the 1800s.
The Elo-Roberson and the new USCF rating systems are valid improvements over the standard Elo system. They are neither inflationary nor deflationary, and they adhere to the behavior characteristics that model human competitive sport performance, especially in chess. On the other hand, Glicko systems #1 and #2 violate some of those behavior characteristics. The Glicko systems erroneously assume that the more you play, the less likely you are to improve. They also erroneously assume that consistency for all can exist on the nonlevel playing field of internet chess.
The Elo-Roberson and new USCF systems are more capable of handling large improvements in ability and more capable of dealing with improvements from epiphanies during regular play than the Glicko systems.
I suggest that any server using the Glicko system replace it with a system that more properly adheres to the behavior characteristics that model human competitive sport performance, such as the Elo-Roberson system or the new USCF system.