The author of the post gets some of the probability concepts confused in the article.

ELO is based on calculating the winning odds for each player using a logistic function (a sigmoid, if you come from machine learning). The curve is related to an exponential in that its formula contains an exponentiation, but it is not an exponential function in the strict statistical sense. These odds are then used in the score calculations.

The author, however, has a good intuition in the post: the logistic curve used to calculate the odds of winning a match might not work in their specific case. My intuition is that ELO was developed for chess, and chess players seldom play against opponents that are far away from their own rating level. This amounts to “cutting” the logistic curve into narrow vertical sections, and each section approximates a line.

The other thing I appreciated in the post is the intuition that winning odds can be estimated from the actual data instead of relying on an a priori formula. It is a good application of data science to a problem that was usually solved “parametrically” with statistical models. One could even set up a fully Bayesian estimator for the winning odds.
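As a sketch of that idea (not the author's actual method), one could bin finished matches by rating difference and estimate a win probability per bin; the Beta(1, 1) prior and the bin width below are my own assumptions:

```python
from collections import defaultdict

def empirical_win_probs(matches, bin_width=100):
    """Estimate P(win) per rating-difference bin from observed results.

    matches: list of (rating_a, rating_b, a_won) tuples.
    Uses a Beta(1, 1) prior, i.e. (wins + 1) / (games + 2),
    so empty or tiny bins fall back toward 0.5.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for ra, rb, a_won in matches:
        b = int((ra - rb) // bin_width)  # bin by rating gap
        games[b] += 1
        wins[b] += 1 if a_won else 0
    return {b * bin_width: (wins[b] + 1) / (games[b] + 2) for b in games}

# Toy data: the higher-rated player wins 3 of 4 games at a ~150-point gap.
probs = empirical_win_probs([
    (1600, 1450, True),
    (1610, 1460, True),
    (1590, 1440, True),
    (1600, 1450, False),
])
```

With enough games per bin, the estimated probabilities converge to the actual win rates instead of whatever the logistic formula assumes.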

Finally, the fact that the author finds that the game in question has a win-probability curve resembling a line instead of a more pronounced logistic curve makes me wonder how much skill is involved in the game versus other factors. A logistic curve with low discrimination stretches out and leaves a mostly straight line in its place. I wonder how much of the complaints about the game's rating system are actually due to game mechanics instead of how scores are calculated. I haven't played the game so I don't know, but the statistics might indicate something beyond issues with calculating the winning odds.

Edit: Here is a picture of what I was thinking about the difference in discrimination for two logits. You can notice that the red line has lower discrimination than the yellow one because it takes longer for it to transition between 0 and 1. The middle section of the curve will approximate a line.
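For anyone without the picture, the same idea in code: the slope parameter (discrimination) controls how fast the sigmoid transitions, and a low-discrimination curve stays close to a straight line over a wide middle section. The slope values below are arbitrary:

```python
import math

def sigmoid(x, a):
    """Logistic curve with discrimination (slope) a."""
    return 1 / (1 + math.exp(-a * x))

def midpoint_bow(a, x0=0.0, x1=1.0):
    """How far the curve bows away from the straight line joining its
    endpoints, measured at the midpoint. 0 means the section is a line."""
    xm = (x0 + x1) / 2
    chord = (sigmoid(x0, a) + sigmoid(x1, a)) / 2
    return abs(sigmoid(xm, a) - chord)

low = midpoint_bow(a=0.5)   # low discrimination: nearly linear section
high = midpoint_bow(a=4.0)  # high discrimination: clearly curved
```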

I’m not familiar with ranking systems, so this might be naive: the ELO formula has some constants baked in which, as I understand it, could have been learned instead. But does that matter? After a while, isn’t the algorithm predicting the outcome correctly regardless (based on the points, which represent the past), making this just a matter of scaling?

I’m not familiar with ELO either. I was just reading the wiki page on it; this is the section where it reports the formula.

Assuming that Player 1 has an ELO of 1600 and Player 2 has an ELO of 1430, the expected win probability for Player 1 will be:

$$ \frac{1}{1 + 10^{-170/400}} = 0.7268 $$

and for Player 2

$$ \frac{1}{1 + 10^{170/400}} = 0.2732 $$

I think that the 400 in the formula acts as a scale parameter for the ratings (a 400-point gap corresponds to 10:1 odds). So, that parameter should be adjusted to reflect the actual rating distribution.

Those calculations lead to winning odds of 2.66:1 in favor of Player 1. If Player 1 wins, they will add (1-0.7268) * K points to their score and Player 2's score will change by (0-0.2732) * K points, where K is a scaling factor. If Player 2 wins, their score will change by (1-0.2732) * K points and Player 1's score by (0-0.7268) * K points. So it works out that a player's score changes a lot when the match outcome is unexpected.
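Putting those numbers into code (the K = 32 below is just a common example value, not from the article):

```python
def expected_score(r_a, r_b):
    """Elo expected score for the player rated r_a against r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, a_score, k=32):
    """Return new ratings after one game; a_score is 1, 0.5, or 0."""
    e_a = expected_score(r_a, r_b)
    e_b = expected_score(r_b, r_a)
    return r_a + k * (a_score - e_a), r_b + k * ((1 - a_score) - e_b)

e1 = expected_score(1600, 1430)  # expected score for Player 1
e2 = expected_score(1430, 1600)  # expected score for Player 2
new1, new2 = update(1600, 1430, 1)  # favorite wins: small point transfer
```

When the favorite wins, only about 8.7 points move at K = 32; an upset moves about 23.3.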

These calculations require you to put a lot of trust in the calculated odds of winning a match. If the actual odds do not follow the assumptions baked into the formula (i.e., the base of 10, the division by 400), the estimated probabilities will be off, leading to score changes that are not what players would expect.
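To see how much those constants matter, here is the same 170-point gap evaluated with the standard scale of 400 and with a hypothetical scale of 200 (a value I picked only for illustration):

```python
def expected_score(r_a, r_b, scale=400):
    """Elo expected score; `scale` is the constant usually fixed at 400."""
    return 1 / (1 + 10 ** ((r_b - r_a) / scale))

p400 = expected_score(1600, 1430)             # standard Elo assumption
p200 = expected_score(1600, 1430, scale=200)  # hypothetical tighter scale
```

Halving the scale pushes the implied win probability from roughly 0.73 to roughly 0.88 for the same gap, so every post-game score change shifts with it.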

Replacing the a priori probability calculation with observed win rates is a possible way to address these concerns. This change might not work for chess ELO, because tournament organizers need to calculate tournament results quickly, so a formula works better there. In OP's case, using prior games' results might be better than using a formula, because it would reflect what actually happens instead of an a priori calculation.

Do you have an idea how sensitive ELO is to the order of games? Would the ranking look different if the games were played in a different order? I would assume that the ranking approaches some limit when no new players join but players keep playing. Given a fixed set of games and players, what is the impact of the order on the final ranking?

I am not sure. In most chess tournaments, the ratings get updated at the end of all the matches, so the order of the games doesn't matter.

If the ratings get updated after each game, I would imagine there is a way to maximize the rating gain given your winning beliefs for each game. I am not sure what the optimal way to sequence them would be, though: whether it is better to play the games with a lower chance of winning first and then the others (I think that would be similar to smurfing), or to win first and lose after. Maybe a simulation would help figure that out.
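A minimal simulation of the order effect (the ratings, K, and results are made up): a 1500-rated player loses to a 1600 and beats a 1400, processed in both orders.

```python
def expected_score(r_a, r_b):
    """Elo expected score for the player rated r_a against r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def play_sequence(rating, games, k=32):
    """Apply Elo updates one game at a time.
    games: list of (opponent_rating, score) with score 1 or 0.
    Only our player's rating is tracked; opponents stay fixed."""
    for opp, score in games:
        rating += k * (score - expected_score(rating, opp))
    return rating

lose_then_win = play_sequence(1500, [(1600, 0), (1400, 1)])
win_then_lose = play_sequence(1500, [(1400, 1), (1600, 0)])
# The two orders end at slightly different final ratings.
```

With sequential updates the final ratings differ by about a point in this toy case, so order does matter, though the effect is small relative to K.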

That’s why CSGO (and I think Dota 2 as well) doesn’t use Elo, but instead uses a modified Glicko-2 rating system. Daily decay sounds like a terrible idea though, especially if you apparently have to win 5 games a day?

Reminds me of the old PvP rank system in vanilla WoW, with decay. It was horrible. I basically failed my next milestone because I was moving and didn’t have internet for 2 weeks. Catching up would’ve cost me hours upon hours for the next 2-4 weeks.

Very content-light, and although it has some interesting points about ranking systems, the title of the article definitely implied they’d go over their approach, not just plug their game a bunch.

It says each game cancels out 1 point of decay, so you only have to play 5 games, not win them.

They literally never say what the actual formula for Aco is, wtf

This article just seems like an ad for the game to me. It’s very content light.

Also there is decay in his rating system, to encourage people to play more. I guess fun isn’t motivating enough.