Hi guys, there was previously a discussion noting that the test set rating distribution is different from the training set. Could you explain how exactly the test set ratings were acquired? I know you recruit chess players to try to solve the puzzles and calculate ratings with some algorithm (Glicko?). But beyond that, where do you recruit these people from? How many people solve each puzzle? Do you know the Elo ratings of these people? Do players get matched with some random puzzle, or are they matched with puzzles near their own rating? I'm curious to understand the distribution shift!
> Could you explain how exactly the test set ratings were acquired?
See my detailed answer here: https://knowledgepit.ml/post/747/
> But beyond that, where do you recruit these people from?
Chess forums, Discord, friends, etc.
> How many people solve each puzzle?
This varies between puzzles, and I can't give out this information precisely, but I will say that in the validation set all puzzles had RD (rating deviation) < 130.
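For intuition about what an RD threshold like this implies, here is a minimal sketch of how RD shrinks as attempts accumulate. It uses the original Glicko-1 formulas, not the Glicko-2 variant that lichess runs, so treat it as an approximation only. The starting RD of 350, the solver RD of 50, and the function names are illustrative assumptions, not values from our setup.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Glicko-1 attenuation factor for an opponent with rating deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def rd_after_attempt(rd: float, solver_rd: float, rating_diff: float = 0.0) -> float:
    """Puzzle RD after one attempt (Glicko-1, single-game rating period).

    rating_diff is puzzle rating minus solver rating. The new RD does not
    depend on whether the attempt succeeded, only on how informative it was.
    """
    expected = 1.0 / (1.0 + 10 ** (-g(solver_rd) * rating_diff / 400.0))
    d_squared = 1.0 / (Q**2 * g(solver_rd) ** 2 * expected * (1.0 - expected))
    return math.sqrt(1.0 / (1.0 / rd**2 + 1.0 / d_squared))


def attempts_until_rd_below(threshold: float,
                            start_rd: float = 350.0,   # assumed initial RD
                            solver_rd: float = 50.0) -> int:  # assumed solver RD
    """Count evenly matched attempts needed to push RD under the threshold."""
    rd, attempts = start_rd, 0
    while rd >= threshold:
        rd = rd_after_attempt(rd, solver_rd)
        attempts += 1
    return attempts


if __name__ == "__main__":
    print(attempts_until_rd_below(130.0))
```

Under these assumptions only a handful of evenly matched attempts are needed. Mismatched solvers (a large `rating_diff`) carry less information per attempt, so the count grows when puzzles are assigned more or less randomly.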
> Do you know the Elo ratings of these people?
If you mean lichess puzzle rating (often colloquially referred to as "Puzzle Elo"), then yes, we do, as we use their lichess ratings :) If you mean FIDE Elo, then no.
> Do players get matched with some random puzzle, or are they matched with puzzles near their own rating?
They are matched with puzzles more or less randomly. We still have the lichess algorithm running in the background, but we don't use their lichess ratings to assign puzzles.
I hope this helps!
Thank you very much! Would it also be possible to say how much of the test set is used to calculate the leaderboard score?