
IEEE BigData 2020 Cup: Predicting Escalations in Customer Support

Predicting Escalations in Customer Support is a data mining challenge organized in association with the IEEE BigData 2020 conference. The task is to predict which cases in the technical support ticketing system of Information Builders, Inc. (ibi) will be escalated by customers in the near future. The competition is organized jointly by ibi (https://www.ibi.com) and QED Software (http://www.qed.pl/).

Technical Support Representatives of Information Builders, Inc. (ibi) strive to provide the highest quality of support to their customers. At times, we may encounter situations where our support process and the needs of our customers conflict. When this occurs, an escalation will undoubtedly arise. Every escalation is very disruptive to the support process. It changes the day-to-day activities of Technical Support Representatives and, more importantly, it means we have an upset customer. The ability to predict when an escalation may arise will allow us to react and do what’s possible to prevent it and defuse a potential problem, thus maintaining customer satisfaction. Besides predicting “when” an escalation will occur, it is equally important to predict why it is going to arise: is it due to a production outage, duration, technical proficiency, project deadlines, or other issues? Depending on the type of escalation, we will be able to build different support processes best suited to preventing it.

This competition aims at building models that predict, from the history of a customer success case up to a given moment, whether that case is going to escalate in the future. It is an important step for ibi towards providing their customers with better services that rely on modern machine learning solutions.

More details regarding the task and the description of the challenge data set can be found in the Task description section.

Special track at IEEE BigData 2020: A special session devoted to the challenge will be held at the IEEE BigData 2020 conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be indexed in the same way as regular conference papers. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report. 


Data sets for this competition were provided by ibi. Data is divided into four main tables which correspond to information stored by a ticketing system of the Customer Service department. Since the data contains sensitive information, it was carefully preprocessed and anonymized to guarantee the safety of ibi's customers and employees.

  • IBI_case_metadata_anonymized.csv file contains basic information regarding each case from the available data (training and test examples), such as the name of a person issuing the ticket, his/her company, an ID of a group responsible for handling the case, etc. Most of this information is typically available when a new case is opened in the system. Each case is associated with a unique REFERENCEID which can be used to join records from all available data tables.
  • IBI_case_milestones_anonymized.csv file contains data regarding all important events in the history of a case. It can be used to track all activities related to each REFERENCEID in the data. For test cases, however, this activity log is cut at the decision timestamp. A typical case has a one-to-many relation with entries from this table.
  • IBI_case_comments_anonymized.csv contains all messages exchanged between a customer and the customer service staff. The texts in natural language were encoded to protect the privacy of ibi's customers and employees. However, to facilitate the use of NLP techniques, we provide an additional file challenge_dictionary_info.csv which stores basic information about the encoded words, such as a POS tag, the result of a NER model, counts in the entire data, and whether a term was present in a standard English dictionary or was a non-standard term (e.g. a link, some special name, a filename, etc.). The dictionary stores encoded terms from all available data tables. As in the milestone data table, the comments have a many-to-one relation with the considered cases. For the test cases, the history of comments was cut at the decision timestamp.
  • IBI_case_status_history.csv is an automatically generated status log of each case in the data. It stores information regarding the case severity status and whether the case is marked as escalated at a given timestamp. For the convenience of participants of the challenge, we added an auxiliary column to this table, which expresses the inverted time to the nearest escalation for the corresponding case. This value is the prediction target for test cases (and for those cases, it is missing in the data).

Additionally, there is a file IBI_test_cases_no_target.csv, which indicates REFERENCEIDs of test cases, along with the corresponding decision timestamps (i.e. time in seconds since the opening of a case, at which a model needs to make a prediction regarding the time to the nearest escalation of the case).
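The relation between the tables can be illustrated with a minimal sketch that groups milestone-like events by REFERENCEID and drops everything recorded after a per-case decision time, which is one way to make training histories resemble the truncated test histories. Only REFERENCEID is taken from the description above; the column names SECONDS_SINCE_CASE_START, EVENT and DECISION_TIME, as well as the toy rows, are assumptions made for illustration and may not match the real files.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for the real CSV files. Only REFERENCEID is a column name taken
# from the task description; SECONDS_SINCE_CASE_START, EVENT and DECISION_TIME
# are hypothetical names chosen for this sketch.
MILESTONES_CSV = """REFERENCEID,SECONDS_SINCE_CASE_START,EVENT
A,10,opened
A,500,assigned
A,9000,customer_reply
B,20,opened
"""

DECISIONS_CSV = """REFERENCEID,DECISION_TIME
A,1000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def history_up_to_decision(events, decisions):
    """Group events by REFERENCEID, dropping events after the decision time."""
    decision_time = {row["REFERENCEID"]: float(row["DECISION_TIME"]) for row in decisions}
    history = defaultdict(list)
    for event in events:
        case_id = event["REFERENCEID"]
        seconds = float(event["SECONDS_SINCE_CASE_START"])
        # Cases without a decision time keep their full log.
        if case_id not in decision_time or seconds <= decision_time[case_id]:
            history[case_id].append(event)
    return dict(history)


history = history_up_to_decision(read_rows(MILESTONES_CSV), read_rows(DECISIONS_CSV))
print({case_id: [e["EVENT"] for e in rows] for case_id, rows in history.items()})
# prints {'A': ['opened', 'assigned'], 'B': ['opened']}
```

With the real files, the same grouping would be applied to the comments and status tables as well, since all of them share the REFERENCEID key.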

The task and the format of submissions: the task for participants is to create an efficient model for predicting inverted time to escalation, which is computed as: $$ y = \left\{ \begin{array}{ll} 0 & \textrm{if a case was never escalated}\\ \frac{86400}{SECONDS\_TO\_NEXT\_ESCALATE + 86400} & \textrm{otherwise} \end{array} \right. $$ This transformation of the prediction target is required to keep the consistency of the predictions and facilitate the training of models (the term 86400 in the formula corresponds to the number of seconds during 24 hours and is used to scale the target values). In this way, the predicted values should always be in the $[0, 1]$ interval.
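As a small worked example of the formula above, the following sketch computes the target from a number of seconds to the next escalation and also inverts it. The function names are hypothetical, and representing "never escalated" as None is an assumption of this sketch, not something prescribed by the data.

```python
SECONDS_PER_DAY = 86400  # scaling constant from the target definition


def inverted_time_to_escalation(seconds_to_next_escalation):
    """Return the target y; None stands for a case that was never escalated."""
    if seconds_to_next_escalation is None:
        return 0.0
    return SECONDS_PER_DAY / (seconds_to_next_escalation + SECONDS_PER_DAY)


def seconds_from_target(y):
    """Invert the transformation; y == 0 maps back to None (no escalation)."""
    if y == 0:
        return None
    return SECONDS_PER_DAY * (1.0 / y - 1.0)


print(inverted_time_to_escalation(0))                    # 1.0   (escalation right now)
print(inverted_time_to_escalation(SECONDS_PER_DAY))      # 0.5   (escalation in one day)
print(inverted_time_to_escalation(7 * SECONDS_PER_DAY))  # 0.125 (escalation in a week)
print(inverted_time_to_escalation(None))                 # 0.0   (never escalated)
```

The target therefore decays from 1 towards 0 as the escalation moves further into the future, and a case escalating exactly one day after the decision timestamp sits at 0.5.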

The predictions for test instances from the IBI_test_cases_no_target.csv table should be submitted to the online evaluation system as a textual file. The file should have exactly 12724 lines, and each line should contain exactly one number from the $[0, 1]$ interval. The ordering of predictions should be the same as the ordering of instances in the IBI_test_cases_no_target.csv table.
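The format requirements above can be checked before uploading with a sketch like the one below. The helper name is hypothetical, and ending the file with a newline after the last prediction is an assumption; the description only fixes the number of lines and the value range.

```python
EXPECTED_LINES = 12724  # number of test cases stated in the task description


def format_submission(predictions, expected_lines=EXPECTED_LINES):
    """Validate predictions and return the submission text, one number per line."""
    if len(predictions) != expected_lines:
        raise ValueError(f"expected {expected_lines} predictions, got {len(predictions)}")
    lines = []
    for index, value in enumerate(predictions):
        value = float(value)
        # The chained comparison is also False for NaN, so NaN is rejected here.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"prediction {index} is outside [0, 1]: {value}")
        lines.append(repr(value))
    return "\n".join(lines) + "\n"


# Usage: an all-zero submission, i.e. predicting that no case escalates.
text = format_submission([0.0] * EXPECTED_LINES)
print(len(text.splitlines()))  # 12724
```

Raw model outputs that fall slightly outside the interval could be clipped with min(1.0, max(0.0, value)) before calling the helper; the predictions must stay in the same order as the rows of IBI_test_cases_no_target.csv.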

Evaluation: the quality of submissions will be evaluated using the $R^2$ measure, i.e., for each test instance $i$, the prediction will be compared to the ground truth value, and the overall model performance will be evaluated using the formula:

$$R^2 = 1 - \frac{RSS}{TSS},$$ where RSS is the residual sum of squares: $$RSS = \sum_i (y_i - \hat{y}_i)^2,$$ and TSS is the total sum of squares: $$TSS = \sum_i (y_i - \bar{y})^2.$$ Here $y_i$ are the ground truth values, $\hat{y}_i$ are the predictions of the model for $i \in \{1, \ldots, N\}$, where $N$ is the size of the test set, and $\bar{y}$ is the mean value of the target variable.
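The measure can be reproduced locally with a few lines of code. This is a sketch of the formula as written above, not the organizers' evaluation script; in particular, taking the mean over the evaluated instances is an assumption.

```python
def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - RSS / TSS for two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    if tss == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - rss / tss


y_true = [0.0, 0.5, 1.0]
print(r_squared(y_true, [0.0, 0.5, 1.0]))  # 1.0  (perfect predictions)
print(r_squared(y_true, [0.5, 0.5, 0.5]))  # 0.0  (always predicting the mean)
print(r_squared(y_true, [0.0, 0.0, 0.0]))  # -1.5 (worse than predicting the mean)
```

A score of 0 thus corresponds to a constant prediction equal to the mean of the target, and any model that does worse than that constant receives a negative score.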

Solutions will be evaluated online and the preliminary results will be published on the public leaderboard. The preliminary score will be computed on a small subset of the test instances (10%), fixed for all participants. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published online. It is important to note that only teams that submit a report describing their approach before the end of the challenge will qualify for the final evaluation. Moreover, to be eligible for the awards in this challenge, the winning teams must exceed the score of the baseline solution by at least 10%.

In case of any questions, please post on the competition forum or write an email to contact {at} knowledgepit.ml 

Rank Team Name Score Submission Date
0.0811 2020-09-3 14:59:22
competition baseline
0.0415 2020-09-9 12:18:32
Debojit Mandal
0.0382 2020-09-15 10:52:46
0.0368 2020-09-7 05:46:31
0.0318 2020-09-1 09:02:28
0.0318 2020-09-8 08:13:14
0.0310 2020-09-14 20:29:49
0.0308 2020-09-7 21:40:31
0.0308 2020-09-9 14:45:07
0.0306 2020-09-20 17:35:42
0.0279 2020-09-20 12:11:16
0.0263 2020-09-6 13:16:09
0.0239 2020-09-17 04:25:36
0.0225 2020-09-20 13:14:20
0.0210 2020-08-30 13:00:51
0.0113 2020-08-18 16:04:09
0.0113 2020-09-4 23:03:30
0.0100 2020-09-20 18:28:40
0.0040 2020-09-21 11:15:55
0.0031 2020-08-5 04:31:46
0.0028 2020-07-22 13:40:55
0.0004 2020-07-20 14:18:40
Turing Insight
0.0000 2020-05-30 08:15:47
0.0000 2020-05-31 07:19:25
0.0000 2020-06-16 23:34:03
-0.0014 2020-09-20 13:47:56
-0.0043 2020-07-5 21:15:41
Kirov reporting
-0.0095 2020-07-1 20:52:28
-0.0107 2020-08-13 17:40:29
-0.0146 2020-05-27 07:38:08
-0.0146 2020-07-2 15:40:47
-0.0146 2020-08-5 08:38:14
-0.0146 2020-08-12 07:41:37
-0.0146 2020-08-17 11:08:33
-0.0186 2020-07-26 16:24:05
-0.0338 2020-08-11 12:43:19
-0.3748 2020-09-18 15:24:18
-1.0399 2020-08-25 04:50:47
  • April 8, 2020: web site of the challenge opens, the task is revealed,
  • May 15, 2020 (originally April 30): start of the competition, data become available,
  • September 28, 2020, 23:59 GMT (originally September 14): deadline for submitting the solutions,
  • September 30, 2020, 23:59 GMT (originally September 18): deadline for sending the reports, end of the competition,
  • October 3, 2020 (originally September 21): online publication of the final results, sending invitations for submitting papers for the special track at the IEEE BigData 2020 conference,
  • October 24, 2020: deadline for submitting invited papers,
  • November 1, 2020: notification of paper acceptance,
  • November 15, 2020: camera-ready of accepted papers due,
  • December 10-13, 2020: the IEEE BigData 2020 conference (special track date TBA).

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by the sponsor:

  • First Prize: 2000 USD + one free IEEE BigData 2020 conference registration,
  • Second Prize: 750 USD + one free IEEE BigData 2020 conference registration,
  • Third Prize: 250 USD + one free IEEE BigData 2020 conference registration.

The award ceremony will take place during the special track at the IEEE BigData 2020 conference.

  • Guohua Hao, ibi
  • Andrzej Janusz, QED Software & University of Warsaw
  • Tony Li, ibi
  • Mateusz Przyborowski, QED Software & University of Warsaw
  • Eric Raab, ibi
  • Dominik Ślęzak, QED Software & University of Warsaw


This forum is for all users to discuss matters related to the competition. Good manners apply!
  Discussion Author Replies Last post
"Duplicate file in Your Team. " Man Hing 1 by Andrzej
Monday, September 21, 2020, 15:03:42
The meaning of IBI_case_milestones_anonymized. CSV 1 by Andrzej
Wednesday, September 16, 2020, 10:40:24
new baseline and the extension of the competition deadline! Andrzej 4 by Andrzej
Saturday, September 12, 2020, 19:56:27
Submission format clarification Timothy 3 by Timothy
Friday, September 04, 2020, 20:54:39
Submission format clarification Timothy 0 by Timothy
Thursday, September 03, 2020, 16:43:11
submission cols 3 by
Wednesday, September 02, 2020, 18:59:10
Early baseline source code published Daniel 2 by Anuj
Saturday, August 29, 2020, 20:48:36
New dictionary with translations of some ids Daniel 0 by Daniel
Monday, August 17, 2020, 11:36:59
Submission file status results in 'test cases with no predictions' Malsha 1 by Daniel
Monday, August 17, 2020, 11:20:40
who can participate anish 1 by Mateusz
Thursday, August 13, 2020, 15:45:25
The baseline problem FANG JYUN 1 by Mateusz
Thursday, August 13, 2020, 15:08:56
Unable to reset the password using forgot password link. raviteja 1 by Andrzej
Thursday, July 30, 2020, 09:27:22
Time period Luka 1 by Andrzej
Sunday, July 12, 2020, 22:12:23
Conflict between ISESCALATE and INV_TIME_TO_NEXT_ESCALATION in the status table Dymitr 1 by Andrzej
Wednesday, June 03, 2020, 10:41:53
I can't download some data files. 지선 1 by Andrzej
Monday, June 01, 2020, 12:28:30
The submission system is online! Andrzej 1 by Andrzej
Monday, May 25, 2020, 20:05:35
Evaluation metric Henry 4 by Andrzej
Saturday, May 23, 2020, 13:37:03
baseline problem Man Hing 1 by Andrzej
Thursday, May 21, 2020, 11:52:24
cannot download some data files Man Hing 1 by Andrzej
Monday, May 18, 2020, 15:56:26
The competition has started! Andrzej 2 by Andrzej
Monday, May 18, 2020, 15:55:22
Start of the competition is postponed Andrzej 0 by Andrzej
Thursday, April 30, 2020, 21:13:22