It would be very useful for participants to know exactly how the evaluation is carried out on the preliminary test set for which we get leaderboard feedback. I can see at least two possibilities:
1. Our submissions are tested against only a subset of the 800 precalculated test ratings, i.e. we submit 800 values but only, say, 100 randomly selected ones are used to calculate the RMSE against the true ratings.
2. Our submissions are tested against all 800 test ratings, which are, however, computed from only a fraction of the full set of test games. For example, instead of, say, 300k test games, only 50k games are used to compute the preliminary 800 test ratings, and all of them are used for the leaderboard RMSE feedback.
3. Or is some other method used?
The preliminary evaluation is carried out on a random subset of the test set (10%, i.e. 80 test records).
I hope it helps :-)
Thanks for that very useful information. Is this 10% subset of test records fixed for every evaluation throughout the competition?
Yes, it is :-)
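To make the confirmed scheme concrete, here is a minimal sketch of how such a preliminary evaluation could work: an RMSE computed on a random 10% subset of the 800 test records, drawn once with a fixed seed so the same subset is reused for every leaderboard evaluation. The function name, seed, and example data are my own assumptions for illustration, not the organizers' actual code.

```python
import random
import math

def preliminary_rmse(submitted, true_ratings, subset_frac=0.1, seed=42):
    """Compute RMSE on a fixed random subset of the test records.

    `submitted` and `true_ratings` are equal-length lists; the subset
    indices are drawn with a fixed seed, so every evaluation scores the
    same 10% of records (80 out of 800 here).
    """
    n = len(true_ratings)
    rng = random.Random(seed)  # fixed seed -> identical subset each run
    subset = rng.sample(range(n), int(n * subset_frac))
    sq_err = sum((submitted[i] - true_ratings[i]) ** 2 for i in subset)
    return math.sqrt(sq_err / len(subset))

# Hypothetical example with 800 test records:
true = [1500.0 + i for i in range(800)]
pred = [r + 10.0 for r in true]  # every prediction off by exactly 10
print(preliminary_rmse(pred, true))  # constant error of 10 -> RMSE 10.0
```

Because the seed is fixed, resubmitting never changes which 80 records are scored, matching the confirmed behaviour above.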