Hello!
I tried to enlarge the training set by adding the deprecated test data. I assumed that I need to concatenate chunks 5, 6, and 7 in this order and fill decision with the "deprecated test labels".
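For anyone following along, the concatenation step can be sketched roughly as below. This is only an illustration in plain Python, not the competition's loader: the function name attach_labels and the toy chunk contents are hypothetical, and the real chunks would be read from the data files. The length check is there because a row/label count mismatch is the easiest misalignment to catch.

```python
from itertools import chain


def attach_labels(chunks, labels):
    """Concatenate chunks in the given order and pair each row with its label.

    `chunks` is a list of row lists (e.g. chunk 5, 6, 7 in that order) and
    `labels` is the flat list of deprecated test labels. Both the function
    name and the data layout are assumptions made for this sketch.
    """
    rows = list(chain.from_iterable(chunks))
    if len(rows) != len(labels):
        raise ValueError(
            f"row/label count mismatch: {len(rows)} rows vs {len(labels)} labels"
        )
    return list(zip(rows, labels))


# Toy stand-ins for the real chunks and label file (hypothetical values).
chunk5 = ["state_a", "state_b"]
chunk6 = ["state_c"]
chunk7 = ["state_d", "state_e"]
deprecated_labels = [1, 0, 0, 1, 1]

labeled = attach_labels([chunk5, chunk6, chunk7], deprecated_labels)
```

Note that a matching row count does not prove the order is right; it only rules out the most obvious kind of misalignment.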
Then I ran cross-validation separately on the old training set only and on the new (deprecated) set only.
In the first case I got ~0.79 ROC AUC; in the second, ~0.59, which means the labels are completely random.
So my question is: in which order should one match the deprecated labels? The best solution would be a file with gamestate_id - decision pairs for the deprecated data.
Thanks in advance!
Dear Vasily,
the way in which you tried to add the labels is correct, so it is difficult for me to say what went wrong. Maybe it is a matter of some bug in your code?
Best,
Andrzej
Thank you for the answer!
I doubt it: the only thing I changed during this experiment is the data on which I perform CV (from test to deprecated). The model and its parameters remained the same.
A ROC AUC of ~0.59 means the labels are completely random (for example, I tried loading the deprecated chunks in the order 6, 5, 7 and got the same score, ~0.59).
So I think that the labeling of the deprecated data isn't correct. I may be mistaken, though, so I'd like to hear other opinions from participants who have tried to use the deprecated data.
Thank you for your answer!
Hi Vasily,
Before the new test set was released, I ran an evaluation on the deprecated test set and got a score similar to the one I had on the previous public board. That means the released test set labels are correct. I don't see anything wrong in your processing step. As Andrzej suggested, you may want to double-check the code.
Cheers,
Henry
Dear Vasily,
the labeling for the deprecated test data set is correct.
The score that you received (~0.59 AUC) is far from a random result. For a data set with 1,250,000 records and a balanced distribution of labels, the probability of getting an AUC that high purely by chance is extremely low. In fact, for such a set, the probability of getting an AUC higher than 0.501 by assigning random scores is lower than 5%. Thus, I would recommend double-checking your code :-) It is also possible that such a huge difference in results is related to some properties of the model that you are using...
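The figures above can be checked with a rough calculation. The sketch below assumes the scores are independent of the labels, continuous (no ties), and that the classes are split exactly 50/50; under those assumptions the AUC is a rescaled Mann-Whitney U statistic with mean 0.5 and variance (n_pos + n_neg + 1) / (12 * n_pos * n_neg), and a normal approximation gives the tail probability. The function names are made up for this example.

```python
import math


def auc_null_std(n_pos, n_neg):
    """Standard deviation of AUC when scores are independent of the labels.

    Uses the Mann-Whitney U null variance (assumes continuous scores, no ties).
    """
    return math.sqrt((n_pos + n_neg + 1) / (12.0 * n_pos * n_neg))


def prob_auc_exceeds(threshold, n_pos, n_neg):
    """Normal-approximation probability that a random scorer beats `threshold`."""
    z = (threshold - 0.5) / auc_null_std(n_pos, n_neg)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n_records = 1_250_000          # size quoted in the post above
n_pos = n_neg = n_records // 2  # assumed exactly balanced labels

sigma = auc_null_std(n_pos, n_neg)
p_above_0501 = prob_auc_exceeds(0.501, n_pos, n_neg)
p_above_059 = prob_auc_exceeds(0.59, n_pos, n_neg)

print(f"null std of AUC:   {sigma:.6f}")
print(f"P(AUC > 0.501):    {p_above_0501:.4f}")
print(f"P(AUC > 0.59):     {p_above_059:.3e}")
```

Under these assumptions the null standard deviation is about 0.0005, so 0.501 is already roughly two standard deviations above 0.5 and 0.59 is far outside anything random scoring would produce. If the cross-validation was run on a much smaller subset, the spread would of course be wider.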
Best,
Andrzej