IEEE BigData 2020 Cup: Predicting Escalations in Customer Support
Predicting Escalations in Customer Support is a data mining challenge organized in association with the IEEE BigData 2020 conference. The task is to predict which cases in the technical support ticketing system of Information Builders, Inc. (ibi) will be escalated by customers in the near future. The competition is organized jointly by ibi (https://www.ibi.com) and QED Software (http://www.qed.pl/).
Technical Support Representatives of Information Builders, Inc. (ibi) strive to provide the highest quality level of support to their customers. At times, we may encounter situations where our support process and the needs of our customers conflict. When this occurs, an escalation will undoubtedly arise. Every escalation is very disruptive to the support process. It changes the day-to-day activities of Technical Support Representatives and, more importantly, leaves us with an upset customer. The ability to predict when an escalation may arise will allow us to react and do what is possible to prevent the escalation and defuse a potential problem, thus maintaining customer satisfaction. Besides predicting “when” an escalation will occur, it is equally important to predict why it is going to arise: is it due to a production outage, duration, technical proficiency, project deadlines, or other issues? Depending on the type of escalation, we will be able to build different support processes best suited to preventing it.
This competition aims at building models that predict whether particular customer success cases are going to escalate in the future, based on information about their history up to the present. It is an important step for ibi towards providing their customers with better services that rely on modern machine learning solutions.
More details regarding the task and the description of the challenge data set can be found in the Task description section.
Special track at IEEE BigData 2020: A special session devoted to the challenge will be held at the IEEE BigData 2020 conference. We will invite authors of selected challenge reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be indexed in the same way as regular conference papers. The invited teams will be chosen based on their final rank, innovativeness of their approach, and quality of the submitted report.
IEEE BigData 2020 Cup: Predicting Escalations in Customer Support has finished. We are happy to announce that the winners of the competition are Peter Klimov and Vladimir Funtikov from the team Team!
This year, our challenge attracted 254 teams, which together submitted over 1050 solutions. Thanks for your contribution!
We would like to thank everyone for participating. In particular, we want to express our gratitude to all teams who decided to send us descriptions of their solutions. Shortly, we will be sending invitations to selected teams to extend their reports and submit conference papers for a special session at the IEEE BigData 2020 conference.
Rank | Team Name | Is Report | | Preliminary Score | Final Score | Submissions |
---|---|---|---|---|---|---|
1 | Team | True | True | 0.0710 | 0.046300 | 65 |
2 | competition baseline | True | True | 0.0415 | 0.039400 | 13 |
3 | Debojit Mandal | True | True | 0.0417 | 0.035300 | 100 |
4 | shubham | True | True | 0.0330 | 0.029500 | 54 |
5 | sunsoul | True | True | 0.0239 | 0.028600 | 100 |
6 | Emi | True | True | 0.0487 | 0.028100 | 68 |
7 | Chopin | True | True | 0.0428 | 0.011300 | 63 |
8 | AMC_JTJ | True | True | 0.0318 | 0.004700 | 47 |
9 | victorkras2008 | True | True | 0.0000 | 0.000000 | 1 |
10 | hieuvq | True | True | 0.0811 | -0.019900 | 83 |
11 | Turing Insight | True | True | 0.0000 | -0.103800 | 36 |
12 | paranoid_android | False | True | 0.0396 | No report file found or report rejected. | 7 |
13 | PP | False | True | 0.0318 | No report file found or report rejected. | 63 |
14 | sna | False | True | 0.0308 | No report file found or report rejected. | 3 |
15 | BlackStar | False | True | 0.0263 | No report file found or report rejected. | 19 |
16 | chashanliu | False | True | 0.0225 | No report file found or report rejected. | 25 |
17 | BGU | False | True | 0.0113 | No report file found or report rejected. | 43 |
18 | tdobson | False | True | 0.0100 | No report file found or report rejected. | 13 |
19 | SoloDance | False | True | 0.0040 | No report file found or report rejected. | 16 |
20 | РЫЫЫЫЫЫЫЫЫА | False | True | 0.0031 | No report file found or report rejected. | 32 |
21 | random | False | True | 0.0028 | No report file found or report rejected. | 21 |
22 | WastedTimes | False | True | 0.0210 | No report file found or report rejected. | 35 |
23 | testJK | False | False | -999.0000 | No report file found or report rejected. | 9 |
24 | VLADISLAV | False | True | 0.0004 | No report file found or report rejected. | 16 |
25 | Mathurin | False | True | 0.0000 | No report file found or report rejected. | 2 |
26 | Lalka | False | True | -0.0014 | No report file found or report rejected. | 2 |
27 | Kirov reporting | False | True | -0.0095 | No report file found or report rejected. | 14 |
28 | sourabhjha | False | True | -0.0107 | No report file found or report rejected. | 4 |
29 | Lukazambuca | False | True | -0.0146 | No report file found or report rejected. | 10 |
30 | mr_doppelpack | False | True | -0.0146 | No report file found or report rejected. | 1 |
31 | cbuxe | False | True | -0.0146 | No report file found or report rejected. | 5 |
32 | daniel_kaluza | False | True | -0.0146 | No report file found or report rejected. | 1 |
33 | 8 | False | True | -0.0186 | No report file found or report rejected. | 2 |
34 | Mahmoud Trigui | False | True | -0.0295 | No report file found or report rejected. | 15 |
35 | admin | False | True | -0.0338 | No report file found or report rejected. | 4 |
36 | riccardo1350 | False | True | 0.0113 | No report file found or report rejected. | 14 |
37 | One_n_Only | False | True | -0.0395 | No report file found or report rejected. | 6 |
38 | sssssssssssss | False | True | 0.0279 | No report file found or report rejected. | 18 |
39 | SupportHelper | False | True | -0.3748 | No report file found or report rejected. | 6 |
40 | Fluer | False | True | -0.3807 | No report file found or report rejected. | 9 |
41 | Prachi 12 | False | True | -127.4103 | No report file found or report rejected. | 2 |
Data sets for this competition were provided by ibi. Data is divided into four main tables which correspond to information stored by a ticketing system of the Customer Service department. Since the data contains sensitive information, it was carefully preprocessed and anonymized to guarantee the safety of ibi's customers and employees.
- IBI_case_metadata_anonymized.csv file contains basic information regarding each case from the available data (training and test examples), such as the name of a person issuing the ticket, his/her company, an ID of a group responsible for handling the case, etc. Most of this information is typically available when a new case is opened in the system. Each case is associated with a unique REFERENCEID which can be used to join records from all available data tables.
- IBI_case_milestones_anonymized.csv file contains data regarding all important events in the history of a case. It can be used to track all activities related to each REFERENCEID in the data. For test cases, however, this activity log is cut at the decision timestamp. A typical case has a one-to-many relation with entries from this table.
- IBI_case_comments_anonymized.csv contains all messages exchanged between a customer and the customer service staff. The natural language texts were encoded to protect the privacy of ibi's customers and employees. However, to facilitate the use of NLP techniques, we provide an additional file, challenge_dictionary_info.csv, which stores basic information about the encoded words, such as a POS tag, the result of a NER model, counts in the entire data, and an indication of whether a term was present in a standard English dictionary or was a non-standard term (e.g. a link, a special name, a filename, etc.). The dictionary stores encoded terms from all available data tables. As with the milestone data table, the comments have a many-to-one relation with the considered cases. For the test cases, the history of comments was cut at the decision timestamp.
- IBI_case_status_history.csv is an automatically generated status log of each case in the data. It stores information regarding case severity status and an indication of whether the case is marked as escalated at a given timestamp. For the convenience of participants of the challenge, we added an auxiliary column to this table, which expresses the inverted time to the nearest escalation for the corresponding case. This value is the prediction target for test cases (and for those cases, it is missing in the data).
Additionally, there is a file IBI_test_cases_no_target.csv, which indicates REFERENCEIDs of test cases, along with the corresponding decision timestamps (i.e. time in seconds since the opening of a case, at which a model needs to make a prediction regarding the time to the nearest escalation of the case).
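To illustrate how the tables relate, the following is a minimal sketch (not the organizers' code) that groups event rows by REFERENCEID and keeps only the events up to a given decision timestamp. Such a cut could be used, for instance, to make training histories resemble the truncated test histories. The column name `SECONDS_SINCE_CASE_START` and the function name are hypothetical; only REFERENCEID is taken from the description above.

```python
from collections import defaultdict
from typing import Dict, List


def history_up_to_decision(
    events: List[dict],
    decision_times: Dict[str, float],
    time_key: str = "SECONDS_SINCE_CASE_START",  # hypothetical column name
) -> Dict[str, List[dict]]:
    """Group event rows by REFERENCEID, dropping events after the decision timestamp.

    Cases without an entry in `decision_times` keep their full history.
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in events:
        case_id = row["REFERENCEID"]
        cutoff = decision_times.get(case_id)
        if cutoff is None or row[time_key] <= cutoff:
            grouped[case_id].append(row)
    # Order each case's events chronologically.
    for rows in grouped.values():
        rows.sort(key=lambda r: r[time_key])
    return dict(grouped)
```

The same grouping applies to the comments table, since both tables have many rows per case.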
The task and the format of submissions: the task for participants is to create an efficient model for predicting inverted time to escalation, which is computed as: $$ y = \left\{ \begin{array}{ll} 0 & \textrm{if a case was never escalated}\\ \frac{86400}{SECONDS\_TO\_NEXT\_ESCALATE + 86400} & \textrm{otherwise} \end{array} \right. $$ This transformation of the prediction target is required to keep the consistency of the predictions and facilitate the training of models (the term 86400 in the formula corresponds to the number of seconds during 24 hours and is used to scale the target values). In this way, the predicted values should always be in the $[0, 1]$ interval.
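The transformation can be written as a small function. This is an illustrative sketch: the function names are made up here, and representing "never escalated" as `None` is an assumption about how the raw time would be stored.

```python
from typing import Optional

SECONDS_PER_DAY = 86400  # scaling constant from the formula above


def inverted_time_to_escalation(seconds_to_next_escalate: Optional[float]) -> float:
    """Return the target y; None stands for a case that was never escalated."""
    if seconds_to_next_escalate is None:
        return 0.0
    if seconds_to_next_escalate < 0:
        raise ValueError("time to the next escalation must be non-negative")
    return SECONDS_PER_DAY / (seconds_to_next_escalate + SECONDS_PER_DAY)


def seconds_from_target(y: float) -> Optional[float]:
    """Invert the transformation; y == 0 maps back to 'never escalated' (None)."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    if y == 0.0:
        return None
    return SECONDS_PER_DAY / y - SECONDS_PER_DAY
```

For example, an escalation exactly 24 hours after the decision timestamp gives $y = 0.5$, an immediate escalation gives $y = 1$, and the value approaches 0 as the escalation moves further into the future.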
The predictions for test instances from the IBI_test_cases_no_target.csv table should be submitted to the online evaluation system as a text file. The file should have exactly 12724 lines, and each line should contain exactly one number from the $[0, 1]$ interval. The ordering of predictions should be the same as the ordering of instances in the IBI_test_cases_no_target.csv table.
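A submission file in this format could be produced along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, and clipping out-of-range values to $[0, 1]$ and printing six decimal places are choices made here, not requirements stated by the organizers.

```python
import math
from typing import Iterable, List

EXPECTED_LINES = 12724  # number of lines required by the task description


def format_submission(
    predictions: Iterable[float], expected_lines: int = EXPECTED_LINES
) -> str:
    """Return the submission text: one prediction per line, in the given order."""
    values: List[float] = []
    for p in predictions:
        v = float(p)
        if math.isnan(v):
            raise ValueError("predictions must not contain NaN")
        # Assumption: clip instead of rejecting values outside [0, 1].
        values.append(min(1.0, max(0.0, v)))
    if len(values) != expected_lines:
        raise ValueError(f"expected {expected_lines} predictions, got {len(values)}")
    return "".join(f"{v:.6f}\n" for v in values)
```

The returned string can then be written to a file of any name and uploaded, provided the predictions were generated in the same order as the rows of IBI_test_cases_no_target.csv.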
Evaluation: the quality of submissions will be evaluated using the $R^2$ measure, i.e., for each test instance $i$, the prediction will be compared to the ground truth value, and the overall model performance will be evaluated using the formula:
$$R^2 = 1 - \frac{RSS}{TSS},$$ where RSS is the residual sum of squares: $$RSS = \sum_i (y_i - \hat{y_i})^2,$$ and TSS is the total sum of squares: $$TSS = \sum_i (y_i - \bar{y})^2,$$ $\hat{y_i}$ for $i \in \{1, \ldots, \|test\ size\|\}$ are the predictions of the model, and $\bar{y}$ is the mean value of the target variable.
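For local validation, the measure can be computed directly from the definitions above. The sketch below is not the evaluation system's code, and it assumes that $\bar{y}$ is the mean of the true values over the instances being scored.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Compute R^2 = 1 - RSS / TSS for paired true values and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if tss == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - rss / tss
```

A perfect prediction scores 1, a constant prediction equal to $\bar{y}$ scores 0, and predictions with a larger squared error than that constant score below 0, so negative values of the measure are possible.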
Solutions will be evaluated online and the preliminary results will be published on the public leaderboard. The preliminary score will be computed on a small subset of the test instances (10%), fixed for all participants. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published online. It is important to note that only teams that submit a report describing their approach before the end of the challenge will qualify for the final evaluation. Moreover, to be eligible for the awards in this challenge, the winning teams must exceed the score of the baseline solution by at least 10%.
In case of any questions, please post on the competition forum or write an email to contact {at} knowledgepit.ml
- April 8, 2020: web site of the challenge opens, the task is revealed,
- May 15, 2020 (originally April 30): start of the competition, data become available,
- September 28, 2020 (23:59 GMT; originally September 14): deadline for submitting the solutions,
- September 30, 2020 (23:59 GMT; originally September 18): deadline for sending the reports, end of the competition,
- October 3, 2020 (originally September 21): online publication of the final results, sending invitations for submitting papers for the special track at the IEEE BigData 2020 conference,
- October 24, 2020: deadline for submitting invited papers,
- November 1, 2020: notification of paper acceptance,
- November 15, 2020: camera-ready of accepted papers due,
- December 10-13, 2020: the IEEE BigData 2020 conference (special track date TBA).
Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by the sponsor:
- First Prize: 2000 USD + one free IEEE BigData 2020 conference registration,
- Second Prize: 750 USD + one free IEEE BigData 2020 conference registration,
- Third Prize: 250 USD + one free IEEE BigData 2020 conference registration.
The award ceremony will take place during the special track at the IEEE BigData 2020 conference.
- Guohua Hao, ibi
- Andrzej Janusz, QED Software & University of Warsaw
- Tony Li, ibi
- Mateusz Przyborowski, QED Software & University of Warsaw
- Eric Raab, ibi
- Dominik Ślęzak, QED Software & University of Warsaw
Discussion | Author | Replies | Last post | |
---|---|---|---|---|
announcement of the competition results | Andrzej | 0 | by Andrzej Saturday, October 03, 2020, 10:55:22 | |
the end of the competition | Andrzej | 2 | by Andrzej Wednesday, September 30, 2020, 09:22:30 | |
new baseline and the extension of the competition deadline! | Andrzej | 6 | by Andrzej Monday, September 28, 2020, 11:39:26 | |
"Duplicate file in Your Team. " | Man Hing | 1 | by Andrzej Monday, September 21, 2020, 13:03:42 | |
The meaning of IBI_case_milestones_anonymized. CSV | | 1 | by Andrzej Wednesday, September 16, 2020, 08:40:24 | |
Submission format clarification | Timothy | 3 | by Timothy Friday, September 04, 2020, 18:54:39 | |
Submission format clarification | Timothy | 0 | by Timothy Thursday, September 03, 2020, 14:43:11 | |
submission cols | Debojit | 3 | by Debojit Wednesday, September 02, 2020, 16:59:10 | |
Early baseline source code published | Daniel | 2 | by Anuj Saturday, August 29, 2020, 18:48:36 | |
New dictionary with translations of some ids | Daniel | 0 | by Daniel Monday, August 17, 2020, 09:36:59 | |
Submission file status results in 'test cases with no predictions' | Malsha | 1 | by Daniel Monday, August 17, 2020, 09:20:40 | |
who can participate | anish | 1 | by Mateusz Thursday, August 13, 2020, 13:45:25 | |
The baseline problem | FANG JYUN | 1 | by Mateusz Thursday, August 13, 2020, 13:08:56 | |
Unable to reset the password using forgot password link. | raviteja | 1 | by Andrzej Thursday, July 30, 2020, 07:27:22 | |
Time period | Luka | 1 | by Andrzej Sunday, July 12, 2020, 20:12:23 | |
Conflict between ISESCALATE and INV_TIME_TO_NEXT_ESCALATION in the status table | Dymitr | 1 | by Andrzej Wednesday, June 03, 2020, 08:41:53 | |
I can't download some data files. | 지선 | 1 | by Andrzej Monday, June 01, 2020, 10:28:30 | |
The submission system is online! | Andrzej | 1 | by Andrzej Monday, May 25, 2020, 18:05:35 | |
Evaluation metric | Henry | 4 | by Andrzej Saturday, May 23, 2020, 11:37:03 | |
baseline problem | Man Hing | 1 | by Andrzej Thursday, May 21, 2020, 09:52:24 | |
cannot download some data files | Man Hing | 1 | by Andrzej Monday, May 18, 2020, 13:56:26 | |
The competition has started! | Andrzej | 2 | by Andrzej Monday, May 18, 2020, 13:55:22 | |
Start of the competition is postponed | Andrzej | 0 | by Andrzej Thursday, April 30, 2020, 19:13:22 | |