Recruitment Challenge @ QED Software
This is an internal data mining challenge at QED Software, intended to assess the ML skills of new employees.
At QED Software, we use the KnowledgePit platform to challenge members of the data science community to solve real business problems and to test their skills, knowledge, and, most importantly, creative thinking. This challenge is not a competition but, first and foremost, a place for self-evaluation. If you have been invited to take part in it, you are likely a motivated data scientist. But what can you do with what we have prepared? Enter the challenge and check your skills!
Want to know more about what is ahead of you? The task is to detect truly suspicious events and false alarms within network traffic alert data that the Security Operations Center (SOC) Team members have to analyse daily. An efficient classification model should help the SOC Team to optimise their operations significantly. Technical details are in the Task Description section.
We wish you good luck and satisfaction with your solution(s)!
This challenge is based on a data mining competition (Suspicious Network Event Recognition) organized in association with the IEEE BigData 2019 conference. Check the original competition here. You may find some reference results there.
In this challenge, the task is to detect truly suspicious events and false alarms within the set of so-called network traffic alerts that the Security Operations Center (SOC) Team members have to analyze daily. An efficient classification model could help the SOC Team to optimize their operations significantly. The data set comes from the IEEE BigData 2019 Cup: Suspicious Network Event Recognition challenge.
The data set available in the challenge consists of alerts investigated by a SOC team at the Security on Demand company (SoD). We call such alerts 'investigated'. Each record is described by various statistics selected based on experts' knowledge and by a hierarchy of associated (anonymized) IP addresses, called assets. For each alert in the 'investigated alerts' data tables, there is a history of related log events (a detailed set of network operations acquired by SoD, anonymized to ensure the safety of SoD clients).
The data sets cover half a year, from October 1, 2018, to March 31, 2019. You can find the description of columns from the 'investigated alerts' data in a separate file called column_descriptions.txt. We divided the main data into a training set and a test set based on alert timestamps. The training set (the file cybersecurity_training.csv) covers approximately four months, and the remaining part constitutes the test set (the file cybersecurity_test.csv). The format of those two files is the same: columns are separated by the vertical bar character '|'. However, the target column, called 'notified', is missing from the test data.
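As a minimal sketch of reading such a file, the snippet below parses '|'-separated records with Python's standard csv module. It runs on a tiny in-memory stand-in rather than the real file; the columns 'example_id' and 'example_feature' are hypothetical, and only 'notified' comes from the description above.

```python
import csv
import io


def read_alerts(handle):
    """Read '|'-separated alert records into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="|"))


# In-memory stand-in for cybersecurity_training.csv.
# 'example_id' and 'example_feature' are hypothetical column names.
sample_train = io.StringIO(
    "example_id|example_feature|notified\n"
    "a1|3|0\n"
    "a2|5|1\n"
)

train_rows = read_alerts(sample_train)

# The target is present only in the training data.
labels = [int(row["notified"]) for row in train_rows]
feature_columns = [name for name in train_rows[0] if name != "notified"]
```

For the real data, the same function could be called on a file opened with open("cybersecurity_training.csv", newline=""); all values are read as strings, so numeric columns would still need converting.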
The task and the format of submissions: the job is to predict which of the investigated alerts were considered truly suspicious by the SOC team and led to issuing a notification to SoD's clients. In the training data, this information is indicated by the column 'notified'. A submission should take the form of scores assigned to every record from the test data, with each score on a separate line of a text file. You can find an example of a correctly formatted submission file in the Data files section.
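A submission file of this kind could be written as in the sketch below. The function name, the file name, and the plain decimal formatting are assumptions; the example file in the Data files section remains the authoritative reference for the exact format.

```python
import os
import tempfile


def write_submission(scores, path):
    """Write one score per line, in the same order as the test records."""
    with open(path, "w", encoding="utf-8") as out:
        for score in scores:
            out.write(f"{float(score)}\n")


# Hypothetical scores for three test records.
example_scores = [0.12, 0.87, 0.5]
submission_path = os.path.join(tempfile.mkdtemp(), "submission.txt")
write_submission(example_scores, submission_path)

# Read the file back to confirm there is one line per record.
with open(submission_path, encoding="utf-8") as check:
    written_lines = check.read().splitlines()
```

The scores only need to rank alerts from least to most suspicious, so they do not have to be calibrated probabilities.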
Evaluation: we evaluate the quality of submissions using the AUC measure. The assessment is automatic. The preliminary results are published in the Submission section. They are computed on a representatively selected subset (10%) of the test data; however, only the score obtained on the remaining 90% of the test data is taken into account in the final assessment.
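To make the measure concrete, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive record gets the higher score, counting ties as half. It also shows one possible 10%/90% split of record indices. How the platform actually selects its preliminary subset is not stated, so the random split here is only an assumption, and both function names are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half). Quadratic time; fine for small checks."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def split_preliminary_final(n_records, preliminary_fraction=0.1, seed=0):
    """Randomly split record indices into preliminary and final subsets (assumed scheme)."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_records * preliminary_fraction))
    return sorted(indices[:cut]), sorted(indices[cut:])


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of the 4 positive/negative pairs are ordered correctly

preliminary_idx, final_idx = split_preliminary_final(20)
```

Because the preliminary score rests on only a tenth of the test records, it can differ noticeably from the final score, so a validation split of the training data is a useful additional check.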
___________
You may find it useful to explore the publications linked to the original competition:
A. Janusz, D. Kałuza, A. Chadzynska-Krasowska, B. Konarski, J. Holland, D. Slezak: IEEE BigData 2019 Cup: Suspicious Network Event Recognition. BigData 2019.
Q. H. Vu, D. Ruta, and L. Cen, “Gradient Boosting Decision Trees for Cyber Security Threats Detection Based on Network Events Logs,” in 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, December 9-12, 2019.
C. Dongy, Y. Chen, Y. Zhang, B. Jiang, S. Liu, D. Han, and B. Liu, “An Approach For Scale Suspicious Network Events Detection,” in 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, December 9-12, 2019.
T. Wang, C. Zhang, Z. Lu, D. Du, and Y. Han, “Identifying Truly Suspicious Events and False Alarms Based on Alert Graph,” in 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, December 9-12, 2019.