
FedCSIS 2025 Challenge: Predicting Chess Puzzle Difficulty - Second Edition
This is the second edition of the chess puzzle competition, this time with much bigger datasets and new data fields. The goal is to build a model that predicts the difficulty (measured as Lichess rating) of given chess puzzles. The top 3 solutions will be awarded prizes.
Main changes from the first edition:
- For every puzzle in both datasets, we now provide 22 success probability predictions. These are precomputed using chess engines and represent the predicted success chance of a player at each of 11 rating levels and two rating types (rapid and blitz). This change is meant to lower the entry bar for contestants without access to specialized hardware (see the sketch after this list).
- Different data and bigger datasets: the training dataset now has over 4.5 million instances, compared to 3.7 million in the first edition.
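As an illustration of how the precomputed probabilities can serve as off-the-shelf features, here is a minimal baseline sketch in Python. The file name and the probability column names are assumptions for illustration only; substitute the actual headers from the provided .csv files.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical file name; adjust to the real data.
    train = pd.read_csv("train.csv")

    # Assume the 22 probability columns share a common prefix such as
    # "success_prob"; replace with the actual column names.
    prob_cols = [c for c in train.columns if c.startswith("success_prob")]

    # Fit a simple linear baseline mapping probabilities to puzzle rating.
    model = LinearRegression()
    model.fit(train[prob_cols], train["Rating"])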
In a chess puzzle, the player assumes the role of White or Black in a particular configuration of pieces on a chessboard. The goal for the puzzle taker is to find the best sequence of moves, either outright checkmating the opponent or obtaining a winning material advantage.
On the Internet, chess puzzles are commonly found on chess websites such as Lichess. The moves of the opposing side are made automatically, and the puzzle taker receives immediate feedback.
Solving puzzles is considered one of the primary ways to hone chess skills. However, currently the only way to reliably estimate puzzle difficulty is to present it to a wide variety of chess players and see if they manage to solve it.
The goal of the contest is to predict how difficult a chess puzzle is from the initial position of the pieces and moves in the solution. Puzzle difficulty is measured by its Glicko-2 rating calibrated by Lichess. In simplified terms, it means that Lichess treats each attempt at solving a puzzle like a match between the user and the puzzle. If the user solves the puzzle correctly, that counts as a win for the user and they gain puzzle rating while the puzzle loses rating. When the user fails to solve the puzzle, that counts as a loss and the opposite happens. Both user and puzzle ratings are initialized at 1500.
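To make the mechanism concrete, here is a simplified, Elo-style sketch of a single rating update. The actual Glicko-2 system used by Lichess additionally tracks rating deviation and volatility, and the update factor k below is a hypothetical constant.

    def update(user_rating, puzzle_rating, solved, k=32.0):
        # Expected score of the user in a "match" against the puzzle.
        expected = 1.0 / (1.0 + 10 ** ((puzzle_rating - user_rating) / 400))
        score = 1.0 if solved else 0.0
        delta = k * (score - expected)
        # A solve shifts rating from the puzzle to the user; a fail does the opposite.
        return user_rating + delta, puzzle_rating - delta

    # Both ratings start at 1500; a solve at equal ratings moves each by k/2.
    user, puzzle = update(1500.0, 1500.0, solved=True)  # -> (1516.0, 1484.0)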
Each chess puzzle is described by the initial position (in Forsyth–Edwards Notation, or FEN) and the moves included in the puzzle solution (in Portable Game Notation, or PGN). The solution starts with one move leading to the puzzle position and includes both moves that the puzzle taker has to find and moves by the simulated opponent.
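For example, a puzzle can be replayed from its FEN with the python-chess library. The position and moves below are illustrative, not taken from the dataset; if the moves turn out to be encoded as UCI strings rather than SAN, board.push_uci should be used instead of board.push_san.

    import chess

    # Illustrative position (Italian Game) and a made-up move sequence;
    # the first move leads to the puzzle position, as described above.
    fen = "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3"
    moves = ["Nf6", "Ng5", "d5", "exd5"]

    board = chess.Board(fen)
    for san in moves:
        board.push_san(san)  # raises an error on an illegal move
    print(board.fen())       # final position after the solution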
The training and testing datasets are provided in two .csv files.
The test dataset consists of the following fields:
- FEN (string): the initial position of the pieces in Forsyth–Edwards Notation.
- Moves (string): the solution moves, starting with the move leading to the puzzle position.
- Success probability predictions (22 float fields): the precomputed success-chance estimates for 11 rating levels and two rating types (rapid and blitz) described above.
Based on the above data, the challenge contestants are expected to predict the Rating field (which will be kept secret).
The training dataset contains all of the above fields, and also a few additional ones listed below.
- RatingDeviation (int): a measure of uncertainty in the Glicko-2 rating system. It decreases as more players attempt to solve the puzzle.
- Popularity (int): users can “upvote” or “downvote” a puzzle. This value is the difference between the number of upvotes and downvotes.
- NbPlays (int): the number of attempts at solving the puzzle.
- Themes (string): Lichess allows choosing puzzles to solve based on different themes, such as tactical concepts, solution length, or puzzle type (e.g., mate in x moves); see the sketch after this list.
- GameUrl (string): the URL of the game from which the puzzle was generated. Lichess puzzles are generated from the games played on the site.
- OpeningTags (string): information about the opening from which the puzzle originated. This field has missing values.
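As a sketch of turning the Themes field into model features, the space-separated theme string can be expanded into binary indicator columns; the separator and file name are assumptions based on the public Lichess puzzle dump.

    import pandas as pd

    train = pd.read_csv("train.csv")  # hypothetical file name

    # Expand e.g. "mateIn2 short fork" into one 0/1 column per theme.
    theme_dummies = train["Themes"].str.get_dummies(sep=" ")
    train = train.join(theme_dummies)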
Solution format
Solutions in this competition should be submitted to the online evaluation system as a text file with exactly 2235 lines containing predictions for test instances. Each line in the submission should contain a single integer that indicates the predicted rating of the chess puzzle. The ordering of predictions should be the same as the ordering of the test set.
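A minimal sketch of producing a correctly formatted submission file; the file name and the constant placeholder predictions are arbitrary.

    # One integer per line, 2235 lines, in the same order as the test set.
    predictions = [1500] * 2235  # placeholder: everything at the initial rating

    with open("submission.txt", "w") as f:
        for p in predictions:
            f.write(f"{int(round(p))}\n")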
Evaluation
The quality of submissions will be evaluated using the mean squared error metric.
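For reference, the metric averages the squared differences between the true and predicted ratings:

    def mse(y_true, y_pred):
        # Mean squared error over paired true/predicted ratings.
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

    print(mse([1500, 1800], [1550, 1700]))  # (50**2 + 100**2) / 2 = 6250.0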
Solutions will be evaluated online, and the preliminary results will be published on the public leaderboard. The public leaderboard will be available starting April 25th. The preliminary score will be computed on a subset of the test records, fixed for all participants. The final evaluation will be performed after the completion of the competition using the remaining part of the test records. Those results will also be published online. It is important to note that only teams that submit a report describing their approach before the end of the challenge will qualify for the final evaluation.
Schedule
- April 18, 2025: start of the competition
- April 25, 2025: submitting solutions and public leaderboard become available
- June 27, 2025 (23:59 GMT): deadline for submitting the predictions
- June 30, 2025 (23:59 GMT): deadline for sending the reports, end of the competition
- July 7, 2025: online publication of the final results, sending invitations for submitting short papers for the special session at FedCSIS 2025
- July 27, 2025: deadline for submitting invited papers
- August 7, 2025: notification of paper acceptance
- August 17, 2025: camera-ready versions of accepted papers and conference registration are due
Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by the Sponsors:
- 1000 USD for the winning solution + one FedCSIS 2025 registration
- 500 USD for the 2nd place solution + one FedCSIS 2025 registration
- 250 USD for the 3rd place solution + one FedCSIS 2025 registration
Organizers
- Jan Zyśko
- Michał Ślęzak
- Maciej Świechowski
- Dominik Ślęzak