description of evaluation metric
by piotr - Monday, March 31, 2014, 12:12:46

Dear Organizers,

Could you provide an example of how the evaluation metric will be computed, based on the baseline solution from the file exemplary_solution.txt? I don't understand why there must be 10 lines in the submission file. Does each line correspond to a separate Naive Bayes classifier constructed with the features indicated in that line?

thanks,
Piotr

Re: description of evaluation metric
by janek - Tuesday, April 01, 2014, 12:45:57

Hello,

You are correct. From the attributes in each line we construct three NB prediction models (one for each of the decision attributes). Every model assigns scores to the test cases, i.e. probabilities that a case belongs to the positive decision class. In this way, for every decision attribute and every test case we obtain ten scores. We create an ensemble of predictions by taking their sum.

As the evaluation metric we use the AUC of the prediction ensemble, averaged over the decision attributes and decreased by a penalty for using a large number of attributes.
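To make this more concrete, here is a minimal sketch in R of the scoring scheme (illustrative only: the variable names and the penalty form are assumed, and this is not our actual evaluation code; it uses the e1071 package):

library(e1071)

# 'train' and 'test' are data frames of attributes, 'labels' is a binary
# factor for one decision attribute, and 'subsets' is a list of the ten
# character vectors of attribute names taken from the submission file
ensemble_scores <- function(train, labels, test, subsets) {
  scores <- rep(0, nrow(test))
  for (s in subsets) {
    model <- naiveBayes(train[, s, drop = FALSE], labels)
    # type = "raw" returns class probabilities; we assume the positive
    # class is the second factor level, so we take column 2
    p <- predict(model, test[, s, drop = FALSE], type = "raw")[, 2]
    scores <- scores + p  # the ensemble is the sum of the ten scores
  }
  scores
}

# The final quality is the AUC of these ensemble scores, averaged over
# the decision attributes, minus a penalty that grows with the number
# of distinct attributes used (the exact penalty form is omitted here):
# quality <- mean(auc_per_decision) - penalty(n_distinct_attributes)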

I hope that this explanation is clear :-)

Best regards,

Andrzej Janusz


Re: description of evaluation metric
by piotr - Wednesday, April 02, 2014, 20:24:31

Thanks! Yes, it's clear.

Piotr

Re: description of evaluation metric
by dymitrruta - Monday, April 14, 2014, 10:59:07

Dear Andrzej,

I do not see the logic of submitting 10 different solutions that are aggregated by the average AUC. You could just take your best solution and replicate it, adding dummy, non-contributing variables, to achieve pretty much the same score with "different" subsets. An ensemble makes sense when the outcomes are combined a bit more intelligently than by a simple average. Could you please explain, or correct me if I am wrong?

Cheers,
Dymitr

Re: description of evaluation metric
by janek - Monday, April 14, 2014, 20:31:30

Dear Dymitr, 

You can find plenty of examples of Naive Bayes ensembles in the machine learning and text mining literature. Most of them are based on simple averaging. By giving ten different subsets of attributes you are able to model differences in the importance of individual attributes.
 
Moreover, our research group is much more interested in using many compact sets of attributes in our future research than in using just one big set. That is another reason why we want to encourage contestants to indicate many smaller attribute sets.
 
Best regards,
Andrzej Janusz

Re: description of evaluation metric
by piotr - Saturday, April 19, 2014, 12:47:39

Dear Organizers,

Could you provide details of the Naive Bayes implementation that you use to compute the score? I found that the e1071 and klaR packages in R differ in their Naive Bayes implementations, which causes different prediction outputs.
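For example (a minimal sketch of what I mean; 'train' and 'test' are assumed data frames with a binary factor label 'y', not the actual competition data):

library(e1071)
library(klaR)

m1 <- e1071::naiveBayes(y ~ ., data = train)
p1 <- predict(m1, test, type = "raw")    # matrix of class probabilities

m2 <- klaR::NaiveBayes(y ~ ., data = train)
p2 <- predict(m2, test)$posterior        # klaR returns class + posterior

# p1 and p2 can differ, e.g. due to different handling of numeric
# attributes and of zero or near-zero probabilities
all.equal(p1, p2)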

best regards,

Piotr