Is there any additional dataset information (e.g., number of plays, rating deviation, etc.) available, like what was posted on the forums during the previous competition?
For example, is the held out test set roughly the same set of puzzles as last time? And has the puzzle server been running this entire time (so that each of those puzzles will have more plays)?
Thanks!
The additional data fields are the same as in the previous competition and are included only in the training dataset.
As for the test set, it is a subset of the previous one (we excluded 47 puzzles). This time the whole test set is provided (without the ratings, of course), but which instances count toward the public leaderboard and which toward the final score is hidden. And yes, the puzzle server has kept running, so the target ratings in the test set will have changed and should be more accurate.
Hope that answers your questions!
Regarding the new fields:
For every puzzle in both datasets we will now provide 22 success probability predictions. These are precomputed using chess engines and represent the predicted chance that a player of a given rating solves the puzzle, broken down by 11 rating levels and two rating types (rapid and blitz), so 11 × 2 = 22 values per puzzle. This change is meant to lower the entry bar for contestants without access to specialized hardware.
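To make the 11 × 2 layout concrete, here is a minimal sketch of how such a per-puzzle feature row could be assembled. The thread does not say which 11 ratings are used or how the columns are named, so `RATING_LEVELS`, the `success_prob_<type>_<rating>` naming, and the `puzzle_success_prob` callable are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rating levels: the thread only says there are 11 of them,
# not which ratings they are. These are placeholders (1000, 1100, ..., 2000).
RATING_LEVELS: List[int] = list(range(1000, 2100, 100))
GAME_TYPES: Tuple[str, str] = ("rapid", "blitz")


def success_features(
    puzzle_success_prob: Callable[[str, int], float],
) -> Dict[str, float]:
    """Build one feature per (game type, rating level): 2 x 11 = 22 values.

    `puzzle_success_prob(game_type, rating)` is a hypothetical callable that
    returns the predicted chance that a player of that rating solves the puzzle.
    """
    features: Dict[str, float] = {}
    for game_type in GAME_TYPES:
        for rating in RATING_LEVELS:
            # Hypothetical column naming scheme, e.g. "success_prob_rapid_1500".
            name = f"success_prob_{game_type}_{rating}"
            features[name] = puzzle_success_prob(game_type, rating)
    return features
```

With a stub such as `success_features(lambda g, r: 0.5)` the result has exactly 22 entries, one per (type, rating) pair.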
Is there any more information available about which chess engines were used to compute these predictions?
I had a hypothesis about how they did this, so I ran a test to confirm it. They are simply using the rapid and blitz Maia2 models and computing the probability of choosing the correct moves at the specified rating.
So, for example, if the puzzle had 2 moves that the player had to make, they would just compute `P(move_1, maia2_model, FEN_1, ELO, ELO) * P(move_2, maia2_model, FEN_2, ELO, ELO)`.
Sorry, it's more like this product of conditional probabilities, not the joint one above:
`P(move_1 | maia2_model, FEN_1, ELO, ELO) * P(move_2 | maia2_model, FEN_2, ELO, ELO)`
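The product above generalizes to any number of solver moves: multiply the model's probability of the correct move at each position where the solver is to move, with both Elo arguments set to the same rating. A minimal sketch follows, assuming a hypothetical `move_prob(fen, move, elo_self, elo_oppo)` callable (for example, a wrapper around a Maia2 model loaded for rapid or blitz; the wrapper itself is not shown) and a list of (FEN, correct move) pairs for the solver's turns only. The sum of logs is just a numerically safer way to multiply many small probabilities; this reflects the poster's reconstruction, not a confirmed copy of the organizers' code.

```python
import math
from typing import Callable, Sequence, Tuple

# One solver turn: (FEN of the position before the move, correct move, e.g. UCI).
SolverStep = Tuple[str, str]


def puzzle_success_probability(
    steps: Sequence[SolverStep],
    move_prob: Callable[[str, str, int, int], float],
    elo: int,
) -> float:
    """Product of P(move_i | FEN_i, elo_self=elo, elo_oppo=elo) over solver moves.

    `move_prob` is a hypothetical callable returning the model's probability
    of playing `move` in position `fen` for the given player/opponent Elo.
    """
    log_p = 0.0
    for fen, move in steps:
        p = move_prob(fen, move, elo, elo)
        if p <= 0.0:
            # The model never plays this move, so the puzzle is never solved.
            return 0.0
        # Accumulate in log space to avoid underflow on long puzzles.
        log_p += math.log(p)
    return math.exp(log_p)
```

With a stub that returns 0.5 for the first move and 0.8 for the second, the result is 0.5 × 0.8 = 0.4, matching the two-move formula.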
Hi!
The fields contain probabilities computed using Maia2 models configured for different ratings and game types (rapid/blitz).
Here is the code that can be used to reproduce what we did:
https://github.com/CSSLab/maia2
This was done mostly for your convenience, as computing these for the whole dataset takes a lot of time and not everyone is aware that such tools exist.
Best,
Jan