Dear Participants,
In recent days we found a serious issue in the evaluation data set. Due to a bug in our data preprocessing code, some of the game states in the test data come from games that were also included in the training set. This makes it possible to exploit the competition rules (even unintentionally) and generate solutions that rely on simple game matching.
We regard this as a serious problem that threatens the integrity of the competition and makes its results useless. For this reason, we have decided to take decisive action, which will be implemented over the next few days:
- We are temporarily closing the submission system. It will reopen on Friday, April 14, at the latest.
- The current test data set will become an additional training data set. All labels for this data will become publicly available.
- A new test data set will be uploaded to the Data files folder. It will have the same format as the current test data but will consist of game state descriptions obtained from a completely new set of games.
- The Leaderboard will be reset. All currently submitted solutions will become deprecated.
We realize that this may be a very inconvenient situation for many of you, and we sincerely apologize. However, we do believe that it is the only reasonable solution.
Best regards,
Andrzej Janusz
Dear Andrzej, while the stated motivation is reasonable, this action writes off many hours of work and wastes many GFLOPs of computing power, which is not a good outcome. To minimize the damage, I encourage you to publish the validation labels immediately so that lengthy model retraining can start as soon as possible. Dymitr
Dear Dymitr,
I do realize that for many participants our decision means a significant loss of time and resources. I hope that at least some of the models that you have already tested can be reused to make predictions for the new data.
Regarding the validation labels, I uploaded a file named deprecated_testLabels.7z to the Data files folder.
Andrzej
The game_id for the training set data would also be very helpful. With that extra information, contestants could simulate the realistic scenario of validating on data from separate games, rather than risk overfitting by training and validating on states from the same game. Cheers. Dymitr
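For illustration, the game-level validation split requested above could look roughly like the sketch below. It assumes a reliable game identifier is available for every training row, which is not the case for the current data. The function name `group_split` and the sample ids are hypothetical and are not part of the competition files.

```python
import random
from collections import defaultdict

def group_split(game_ids, valid_fraction=0.2, seed=0):
    """Split row indices so that no game contributes rows to both parts.

    game_ids: one (hypothetical) game identifier per training row.
    Returns (train_rows, valid_rows) as sorted lists of row indices.
    """
    # Collect the row indices that belong to each game.
    rows_by_game = defaultdict(list)
    for row, gid in enumerate(game_ids):
        rows_by_game[gid].append(row)

    # Shuffle whole games, not individual rows, so that all states of a
    # game end up on the same side of the split.
    games = sorted(rows_by_game)
    random.Random(seed).shuffle(games)
    n_valid = max(1, round(len(games) * valid_fraction))

    valid_rows = sorted(r for g in games[:n_valid] for r in rows_by_game[g])
    train_rows = sorted(r for g in games[n_valid:] for r in rows_by_game[g])
    return train_rows, valid_rows

# Made-up example: 8 game states drawn from 5 games.
game_ids = ["g1", "g1", "g2", "g3", "g3", "g3", "g4", "g5"]
train_rows, valid_rows = group_split(game_ids, valid_fraction=0.4)
```

Splitting by game rather than by row keeps validation scores closer to what a model would achieve on states from unseen games. A plain random row split would let states from the same game appear on both sides.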
Dear Dymitr,
Unfortunately, due to the bug that caused this situation, the game_ids that we currently have are not reliable (if they were reliable, we would not have this problem).
Andrzej
Dear Participants,
We have run into an unexpected delay in processing the new test data, and for this reason the submission system will not be reopened until tomorrow, April 15.
I am very sorry for this delay and I would like to thank you for your patience.
Andrzej Janusz
Dear Participants,
The new test data set is available in the Data files folder and the evaluation system is finally online. However, we still need some more time for maintenance of the Leaderboard. Until we finish, the scores of your new solutions will be visible only in your Submissions folder.
I would like to sincerely thank you for your patience. Good luck during the last month of the challenge :-)
I would also like to use this occasion to wish you all a joyful Easter!
Andrzej Janusz