
IEEE BigData 2020 Cup: Predicting Escalations in Customer Support

Predicting Escalations in Customer Support is a data mining challenge organized in association with the IEEE BigData 2020 conference. The task is to predict which cases in Information Builders' technical support ticketing system will be escalated by customers in the near future. The competition is organized jointly by Information Builders (https://www.informationbuilders.com/) and QED Software (http://www.qed.pl/).

Technical Support Representatives of Information Builders strive to provide the highest quality of support to their customers. At times, we may encounter situations where our support process and the needs of our customers conflict. When this occurs, an escalation will undoubtedly arise. Every escalation is very disruptive to the support process. It changes the day-to-day activities of Technical Support Representatives and, more importantly, leaves us with an upset customer. The ability to predict when an escalation may arise will allow us to react and do what is possible to prevent it and defuse a potential problem, thus maintaining customer satisfaction. Besides predicting “when” an escalation will occur, it is equally important to predict why it is going to arise: is it due to a production outage, duration, technical proficiency, project deadlines, or other issues? Depending on the type of escalation, we will be able to build different support processes best suited to preventing it.

This competition, which aims at building models that predict whether particular customer success cases are going to escalate in the future based on their history to date, is an important step for Information Builders toward providing their customers with better services that rely on modern machine learning solutions.

More details regarding the task and the description of the challenge data set can be found in the Task description section.

Special track at IEEE BigData 2020: A special session devoted to the challenge will be held at the IEEE BigData 2020 conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be indexed in the same way as regular conference papers. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report. 


Data sets for this competition were provided by Information Builders. Data is divided into four main tables which correspond to information stored by a ticketing system of the Customer Service department. Since the data contains sensitive information, it was carefully preprocessed and anonymized to guarantee the safety of IBI's customers and employees.

  • IBI_case_metadata_anonymized.csv file contains basic information regarding each case from the available data (training and test examples), such as the name of the person issuing the ticket, their company, an ID of the group responsible for handling the case, etc. Most of this information is typically available when a new case is opened in the system. Each case is associated with a unique REFERENCEID which can be used to join records from all available data tables.
  • IBI_case_milestones_anonymized.csv file contains data regarding all important events in the history of a case. It can be used to track all activities related to each REFERENCEID in the data. For test cases, however, this activity log is cut at the decision timestamp. A typical case has a one-to-many relation with entries in this table.
  • IBI_case_comments_anonymized.csv contains all messages exchanged between a customer and the customer service staff. The natural-language texts were encoded to protect the privacy of IBI's customers and employees. However, to facilitate the use of NLP techniques, we provide an additional file, challenge_dictionary_info.csv, which stores basic information about the encoded words, such as a POS tag, the result of a NER model, counts in the entire data, and whether a term was present in a standard English dictionary or was a non-standard term (e.g. a link, a special name, a filename, etc.). The dictionary stores encoded terms from all available data tables. As with the milestone table, many comments can belong to a single case. For the test cases, the history of comments was cut at the decision timestamp.
  • IBI_case_status_history.csv is an automatically generated status log of each case in the data. It stores the severity status of a case and whether the case is marked as escalated at a given timestamp. For the convenience of participants, we added an auxiliary column to this table, which expresses the inverted time to the nearest escalation for the corresponding case. This value is the prediction target for test cases (and for those cases, it is missing in the data).

Additionally, there is a file IBI_test_cases_no_target.csv, which indicates REFERENCEIDs of test cases, along with the corresponding decision timestamps (i.e. time in seconds since the opening of a case, at which a model needs to make a prediction regarding the time to the nearest escalation of the case).
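The one-to-many relation between cases and their log entries can be illustrated with a minimal sketch that groups event rows under each case by REFERENCEID. Only the REFERENCEID column name comes from the description above; the COMPANY and EVENT columns, the sample values, and the helper names are hypothetical placeholders, and the real files would be read from disk rather than from in-memory strings.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-ins for two of the CSV files. Only REFERENCEID is a
# column name taken from the task description; COMPANY and EVENT are
# hypothetical placeholders.
METADATA_CSV = """REFERENCEID,COMPANY
c1,acme
c2,globex
"""
MILESTONES_CSV = """REFERENCEID,EVENT
c1,opened
c1,assigned
c2,opened
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_one_to_many(cases, events, key="REFERENCEID"):
    """Attach to each case the list of event rows that share its key."""
    events_by_case = defaultdict(list)
    for row in events:
        events_by_case[row[key]].append(row)
    return {
        case[key]: {"case": case, "events": events_by_case.get(case[key], [])}
        for case in cases
    }


joined = join_one_to_many(read_rows(METADATA_CSV), read_rows(MILESTONES_CSV))
print(len(joined["c1"]["events"]))  # number of milestone rows for case c1
```

The same grouping would apply to the comments and status history tables, since they are also keyed by REFERENCEID.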

The task and the format of submissions: the task for participants is to create an efficient model for predicting inverted time to escalation, which is computed as: $$ y = \left\{ \begin{array}{ll} 0 & \textrm{if a case was never escalated}\\ \frac{86400}{SECONDS\_TO\_NEXT\_ESCALATE + 86400} & \textrm{otherwise} \end{array} \right. $$ This transformation of the prediction target is required to keep the consistency of the predictions and facilitate the training of models (the term 86400 in the formula corresponds to the number of seconds during 24 hours and is used to scale the target values). In this way, the predicted values should always be in the $[0, 1]$ interval.
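As a small illustration of the formula above, the following sketch computes the target from the number of seconds to the next escalation. It assumes that a never-escalated case is represented by None and that the number of seconds is non-negative; how the challenge data actually encodes these cases is not stated here, and the function name is made up for the example.

```python
SECONDS_PER_DAY = 86400  # scaling constant from the formula


def inverted_time_to_escalation(seconds_to_next_escalate):
    """Return the target y for one case.

    None stands for a case that was never escalated (an assumption of
    this sketch); otherwise the argument is a non-negative number of seconds.
    """
    if seconds_to_next_escalate is None:
        return 0.0
    return SECONDS_PER_DAY / (seconds_to_next_escalate + SECONDS_PER_DAY)


print(inverted_time_to_escalation(0))          # escalation happening right now
print(inverted_time_to_escalation(86400))      # escalation in exactly one day
print(inverted_time_to_escalation(3 * 86400))  # escalation in three days
print(inverted_time_to_escalation(None))       # never escalated
```

An imminent escalation maps to 1, an escalation one day ahead maps to 0.5, and the value decays toward 0 as the escalation moves further into the future.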

The predictions for test instances from the IBI_test_cases_no_target.csv table should be submitted to the online evaluation system as a textual file. The file should have exactly 12724 lines, and each line should contain exactly one number from the $[0, 1]$ interval. The ordering of predictions should be the same as the ordering of instances in the IBI_test_cases_no_target.csv table.
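A submission in this format could be produced along the following lines. This is a sketch only: the function name is hypothetical, clipping out-of-range predictions to $[0, 1]$ is a design choice of the example rather than a rule of the challenge, and whether the evaluation system expects a trailing newline is not specified.

```python
EXPECTED_LINES = 12724  # number of test instances stated in the task


def format_submission(predictions, expected_lines=EXPECTED_LINES):
    """Return the submission text: one number from [0, 1] per line.

    Predictions must already be in the same order as the rows of
    IBI_test_cases_no_target.csv.
    """
    if len(predictions) != expected_lines:
        raise ValueError(
            f"expected {expected_lines} predictions, got {len(predictions)}"
        )
    lines = []
    for p in predictions:
        # Clip to the allowed interval (a choice made for this sketch).
        clipped = min(1.0, max(0.0, float(p)))
        lines.append(repr(clipped))
    return "\n".join(lines) + "\n"


# Small demonstration with three predictions instead of 12724.
text = format_submission([0.5, 1.7, -0.2], expected_lines=3)
print(text, end="")
```

Writing the returned text to a file, for example with `open("submission.txt", "w")`, then gives the file to upload.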

Evaluation: the quality of submissions will be evaluated using the $R^2$ measure, i.e., for each test instance $i$, the prediction will be compared to the ground truth value, and the overall model performance will be evaluated using the formula:

$$R^2 = 1 - \frac{RSS}{TSS},$$ where RSS is the residual sum of squares: $$RSS = \sum_i (y_i - \hat{y_i})^2,$$ and TSS is the total sum of squares: $$TSS =  \sum_i (y_i - \bar{y})^2,$$ $\hat{y_i}$ for $i \in \{1, \ldots, \|test\ size\|\}$ are the predictions of the model, and $\bar{y}$ is the mean value of the target variable.
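The measure can be sketched in a few lines of plain Python. The function name is illustrative, the sample values are invented, and the official evaluation code may differ in details such as numerical handling.

```python
def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - RSS / TSS for two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    # TSS is zero when all targets are equal; R^2 is undefined in that case
    # and this sketch lets the ZeroDivisionError propagate.
    return 1.0 - rss / tss


y = [0.0, 0.5, 1.0]                  # invented ground-truth values
print(r_squared(y, y))               # perfect predictions
print(r_squared(y, [0.5, 0.5, 0.5])) # always predicting the mean
print(r_squared(y, [1.0, 1.0, 1.0])) # worse than predicting the mean
```

Note that $R^2$ equals 1 for perfect predictions, 0 for a constant prediction equal to the mean of the target, and becomes negative when the predictions are worse than that constant.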

Solutions will be evaluated online and the preliminary results will be published on the public leaderboard. The preliminary score will be computed on a small subset of the test instances (10%), fixed for all participants. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published online. It is important to note that only teams that submit a report describing their approach before the end of the challenge will qualify for the final evaluation. Moreover, to be eligible for the awards in this challenge, the winning teams must exceed the score of the baseline solution by at least 10%.

In case of any questions, please post on the competition forum or write an email to contact {at} knowledgepit.ml 

Rank | Team Name | Score | Submission Date
1 | early baseline | 0.0307 | 2020-05-25 19:55:52
2 | hieuvq | 0.0014 | 2020-06-12 15:30:06
3 | Turing Insight | 0.0000 | 2020-05-30 08:15:47
4 | victorkras2008 | 0.0000 | 2020-05-31 07:19:25
5 | Mathurin | 0.0000 | 2020-06-16 23:34:03
6 | Kirov reporting | -0.0095 | 2020-07-01 20:52:28
7 | amy | -0.0144 | 2020-06-15 18:10:12
8 | Dymitr | -0.0146 | 2020-05-27 07:38:08
9 | Lukazambuca | -0.0146 | 2020-07-02 15:40:47

  • April 8, 2020: web site of the challenge opens, the task is revealed,
  • May 15, 2020 (postponed from April 30): start of the competition, data become available,
  • September 14, 2020 (23:59 GMT): deadline for submitting the solutions,
  • September 18, 2020 (23:59 GMT): deadline for sending the reports, end of the competition,
  • September 21, 2020: online publication of the final results, sending invitations for submitting papers for the special track at the IEEE BigData 2020 conference,
  • October 12, 2020: deadline for submitting invited papers,
  • November 1, 2020: notification of paper acceptance,
  • November 15, 2020: camera-ready of accepted papers due,
  • December 10-13, 2020: the IEEE BigData 2020 conference (special track date TBA).

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by the sponsor:

  • First Prize: 2000 USD + one free IEEE BigData 2020 conference registration,
  • Second Prize: 750 USD + one free IEEE BigData 2020 conference registration,
  • Third Prize: 250 USD + one free IEEE BigData 2020 conference registration.

The award ceremony will take place during the special track at the IEEE BigData 2020 conference.

  • Dominik Ślęzak, QED Software & University of Warsaw
  • Eric Raab, Information Builders
  • Andrzej Janusz, QED Software & University of Warsaw
  • Guohua Hao, Information Builders
  • Mateusz Przyborowski, QED Software & University of Warsaw
  • Tony Li, Information Builders


This forum is for all users to discuss matters related to the competition. Good manners apply!
Discussion | Author | Replies | Last post
Conflict between ISESCALATE and INV_TIME_TO_NEXT_ESCALATION in the status table | Dymitr | 1 | by Andrzej, Wednesday, June 03, 2020, 10:41:53
I can't download some data files. | 지선 | 1 | by Andrzej, Monday, June 01, 2020, 12:28:30
The submission system is online! | Andrzej | 1 | by Andrzej, Monday, May 25, 2020, 20:05:35
Evaluation metric | Henry | 4 | by Andrzej, Saturday, May 23, 2020, 13:37:03
baseline problem | Man Hing | 1 | by Andrzej, Thursday, May 21, 2020, 11:52:24
cannot download some data files | Man Hing | 1 | by Andrzej, Monday, May 18, 2020, 15:56:26
The competition has started! | Andrzej | 2 | by Andrzej, Monday, May 18, 2020, 15:55:22
Start of the competition is postponed | Andrzej | 0 | by Andrzej, Thursday, April 30, 2020, 21:13:22