It took us longer than I expected, but we have finally published the first of the additional data tables. It describes individual localized alerts that correspond to the investigated alerts in the training and test sets. You can match records from the new table with records from the training and test sets using the identifiers in the 'alert_ids' column.
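A minimal sketch of that matching step, using only the standard library. It assumes the localized alerts file is pipe-delimited with alert_ids in the first field and no header row, as in the sample rows quoted further down this thread; if the real file has a header or a different delimiter, adjust the reader accordingly. The inline sample and the id "XYZ" are illustrative stand-ins, not part of the published data.

```python
import csv
import io
from collections import defaultdict

# Assumption: pipe-delimited rows, alert_ids in the first field, no header.
# This inline text stands in for the real localized alerts file.
LOCALIZED_SAMPLE = """\
AAL|ThreatWatch Outbound|FW|xQn|SX|172.AT.TL.37|YT.LB.34.21|PRIV-172|INTERNET|63496|80|4|2|3|0|5|1|0||0|1
AAL|ThreatWatch Inbound|FW|xQn|SX|YT.LB.34.21|YT.EK.108.146|INTERNET|INTERNET|443|60012|2|4|2|1311|5|1|0||0|1
AAL|ThreatWatch Outbound|FW|xQn|SX|172.AT.TL.31|YT.LB.34.21|PRIV-172|INTERNET|60012|443|4|2|3|1319|5|2|0||0|1
"""


def index_localized_alerts(text):
    """Group localized alert rows by their alert_ids value (first field)."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="|"):
        if row:  # skip blank lines
            groups[row[0]].append(row)
    return groups


def match(training_ids, groups):
    """Attach localized rows to each investigated alert; unknown ids get []."""
    return {alert_id: groups.get(alert_id, []) for alert_id in training_ids}


groups = index_localized_alerts(LOCALIZED_SAMPLE)
matched = match(["AAL", "XYZ"], groups)  # "XYZ" is a hypothetical id
print(len(matched["AAL"]), len(matched["XYZ"]))  # 3 0
```

Building the index once and looking ids up in a dictionary keeps the join linear in the size of both tables.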
We also uploaded a new 'extended' baseline to demonstrate how useful the new data set is. It was obtained with the same method (XGBoost) and parameter settings as the previous baseline ('baseline solution' on the public Leaderboard), but with additional features extracted from the new data. The AUC improvement is substantial: it rose from 0.8899 to 0.9224.
I hope you will find the new data set useful. We are very interested in how much you can improve your scores, so feel free to share your thoughts and achievements on this forum :-)
Can you help us understand the localized_alerts.csv dataset?
For example, in the training set we found a row with alert_ids equal to "AAL". In the localized_alerts.csv dataset there are three rows with the same alert_ids value "AAL":
- AAL|ThreatWatch Outbound|FW|xQn|SX|172.AT.TL.37|YT.LB.34.21|PRIV-172|INTERNET|63496|80|4|2|3|0|5|1|0||0|1
- AAL|ThreatWatch Inbound|FW|xQn|SX|YT.LB.34.21|YT.EK.108.146|INTERNET|INTERNET|443|60012|2|4|2|1311|5|1|0||0|1
- AAL|ThreatWatch Outbound|FW|xQn|SX|172.AT.TL.31|YT.LB.34.21|PRIV-172|INTERNET|60012|443|4|2|3|1319|5|2|0||0|1
How are these rows linked to the alert in the training set? What do these localized alerts mean? Are these rows ordered by time?
The localised alerts table describes alert events that correspond to the investigated alerts from the training/test sets. In this particular case, the investigated alert AAL is connected to three localised events; you can think of an investigated alert as the aggregation of a group of events from the localised alerts table.
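Since each investigated alert aggregates a group of localised events, one natural way to use the table is to summarise each group into per-alert features. The sketch below is illustrative only: it is not the feature set of the extended baseline, and it keeps just the second field of the three AAL rows quoted above (values such as "ThreatWatch Outbound"), whose official column name is not given here.

```python
from collections import Counter

# Illustrative (alert_ids, second field) pairs taken from the three AAL rows
# quoted in the question; the real column name of the second field is unknown.
EVENTS = [
    ("AAL", "ThreatWatch Outbound"),
    ("AAL", "ThreatWatch Inbound"),
    ("AAL", "ThreatWatch Outbound"),
]


def aggregate(events):
    """Summarise each investigated alert's group of localised events."""
    groups = {}
    for alert_id, kind in events:
        groups.setdefault(alert_id, []).append(kind)
    return {
        alert_id: {
            "n_events": len(kinds),
            "n_distinct_kinds": len(set(kinds)),
            "most_common_kind": Counter(kinds).most_common(1)[0][0],
        }
        for alert_id, kinds in groups.items()
    }


features = aggregate(EVENTS)
print(features)
# {'AAL': {'n_events': 3, 'n_distinct_kinds': 2, 'most_common_kind': 'ThreatWatch Outbound'}}
```

Features like these can be joined back to the training and test rows by alert_ids before fitting a model.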
In general, localised alerts are not ordered by time. You can order the alerts that share a particular alert_ids value using the alert time column, which gives the number of seconds elapsed since the first alert in that investigated alert's group.
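A minimal sketch of that ordering. The exact name and position of the alert time column are not stated in this thread, so "alert_time" below is a hypothetical key, and the records and their values are made up purely for illustration. Values are converted with float() because fields read from a CSV arrive as strings.

```python
def order_group(events, alert_id, time_key="alert_time"):
    """Return one investigated alert's events sorted by seconds since its first alert.

    time_key is a placeholder for the real alert time column name.
    """
    group = [e for e in events if e["alert_ids"] == alert_id]
    return sorted(group, key=lambda e: float(e[time_key]))


# Hypothetical records; only alert_ids comes from the real data format.
events = [
    {"alert_ids": "AAL", "event": "a", "alert_time": "120"},
    {"alert_ids": "AAL", "event": "b", "alert_time": "0"},
    {"alert_ids": "XYZ", "event": "c", "alert_time": "0"},
    {"alert_ids": "AAL", "event": "d", "alert_time": "35"},
]

ordered = order_group(events, "AAL")
print([e["event"] for e in ordered])  # ['b', 'd', 'a']
```

Sorting numerically rather than on the raw strings avoids "120" sorting before "35".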