
## ISMIS'17 Data Mining Competition: Trading Based on Recommendations

### ISMIS 2017 Data Mining Competition is a challenge organized using the KnowledgePit platform at the 23rd International Symposium on Methodologies and Intelligent Systems, held at Warsaw University of Technology, Poland, on June 26-29, 2017. The task is to come up with a strategy for investing in a stock market based on recommendations provided by different experts. The competition is kindly sponsored by mBank S.A. and Tipranks, with support from the ISMIS 2017 organizers.

Overview

Topic outline: Predicting financial markets is not an easy task. Many researchers and practitioners have devoted a great deal of time and effort to finding a method that would consistently generate profits for investors. Many of them claim to have succeeded and publish their recommendations for different types of assets. The main goal of this competition is to determine whether such recommendations have predictive power. We narrow the problem to a selected set of stocks and analysts. The task is to devise an algorithm that most accurately predicts the class of return from an investment in a stock over the next quarter, based on historical recommendations related to that stock. Classification seems adequate here, as we believe that predicting whether the return will be positive or negative is a much more appropriate (and perhaps easier) task than trying to guess its exact value.

More details regarding the task and a description of the competition data can be found in the Data description section.

Special Session at ISMIS 2017: A special session devoted to the competition will be held at the conference. We will invite authors of selected reports to extend them and submit them for further review, conducted in the same way as for other special sessions at ISMIS 2017. The invited teams will be chosen based on their final rank, the innovativeness of their approach and the quality of the submitted reports. The accepted papers - depending on their scope and the reviewing results - will be included in the main ISMIS 2017 proceedings or in the Industrial Session proceedings.

In case of any questions please post on the competition forum or write us an email at ismis2017-competition@ii.pw.edu.pl

ISMIS'17 Data Mining Competition: Trading Based on Recommendations is over. We would like to sincerely thank all participants for their contribution and support!

We are happy to announce that the competition attracted a total of 159 teams, of which 73 were active and submitted at least one solution to the leaderboard. The total number of submissions was 2570. Unfortunately, none of the solutions marked as final exceeded our baseline (which actually tells us a lot about the problem). Thank you for your effort!

The official Winners:

1. Mathurin Ache (team mathurin), France
2. Łukasz Siemaszko (team bongod), Poland

We have already sent invitations to the 10 teams that submitted the most interesting descriptions of their approach, asking them to extend their reports into papers for a special session at ISMIS 2017. Those submissions will go through a special peer reviewing track, and all accepted papers will be published in the conference proceedings. All other teams are also welcome to submit extended descriptions of their approach to ISMIS 2017; however, their papers will undergo the regular reviewing process by members of the conference Program Committee and might be shifted to a different session (e.g. the industrial track). The regular paper submission system is available here: https://easychair.org/conferences/?conf=ismis2017

For all of you who would like to continue research related to the competition, we plan to reveal all of the used data (including the labels for the test set). You will be able to find it here within a few weeks.

Data description and format: The data sets for this competition are provided in a tabular format. The training data set, ismis17_trainingData.csv, contains 12,234 records, one per line, that correspond to recommendations for stock symbols at different points in time. These time points will be referred to as decision dates. Each data record is composed of three columns, separated by semicolons. The first column gives an internal identifier of a stock symbol (true symbols are hidden). The second column stores an ordered list of recommendations issued by experts for a given stock during the two months before the decision date. The third column gives the true return class of the stock, computed over the period of three months after the decision date. It may take one of three values: ‘Buy’, ‘Hold’, ‘Sell’, which correspond to considerably positive, close to zero, and considerably negative returns, respectively.

In each record, the list of recommendations consists of one or more tips from financial experts. Each recommendation is expressed using four values and enclosed in ‘{}’ brackets. The first value is an identifier of an expert. The second value gives the class of the stock predicted by the expert (‘Buy’, ‘Hold’ or ‘Sell’), and the third value expresses the expert’s expectations regarding the future return rate of the stock. It needs to be stressed that information regarding the expected return rates may sometimes be inconsistent and is generally less reliable than the predicted rating, due to different interpretations of stock quotes by experts (e.g. not considering splits and/or dividends). Moreover, some experts do not share their expectations about the returns. Such situations are denoted by NA values in the data. The fourth value in each recommendation gives the time distance to the decision date (in days), e.g. a value of 5 means that the recommendation was published five days before the decision date. The list of recommendations in each record is sorted by the time distances, so it can be regarded as a time series.
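The record layout described above can be read with a few lines of Python. This is only a minimal sketch: the source states that columns are separated by semicolons and that each recommendation sits between ‘{}’ brackets, but it does not say how the four values inside the brackets are delimited, so the comma delimiter, the sample line, and the names `Recommendation`, `parse_recommendations` and `parse_record` are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Recommendation(NamedTuple):
    expert_id: str
    rating: str                        # 'Buy', 'Hold' or 'Sell'
    expected_return: Optional[float]   # None when the data holds NA
    days_before: int                   # distance to the decision date, in days


# Matches the content of one '{...}' group.
_TIP = re.compile(r"\{([^{}]*)\}")


def parse_recommendations(field: str) -> List[Recommendation]:
    """Parse the second column into a list of recommendations.

    ASSUMPTION: the four values inside the brackets are comma-separated.
    A group with a different number of values raises ValueError.
    """
    tips = []
    for body in _TIP.findall(field):
        expert_id, rating, expected, days = (v.strip() for v in body.split(","))
        tips.append(Recommendation(
            expert_id=expert_id,
            rating=rating,
            expected_return=None if expected == "NA" else float(expected),
            days_before=int(days),
        ))
    return tips


def parse_record(line: str) -> Tuple[str, List[Recommendation], Optional[str]]:
    """Split one semicolon-separated record; the label is absent in test data."""
    parts = line.rstrip("\n").split(";")
    symbol, rec_field = parts[0], parts[1]
    label = parts[2] if len(parts) > 2 and parts[2] else None
    return symbol, parse_recommendations(rec_field), label


# Hypothetical record, invented for illustration (not taken from the real files).
sample = "S1;{101,Buy,0.12,40}{205,Hold,NA,5};Buy"
symbol, tips, label = parse_record(sample)
```

Keeping NA as `None` rather than 0 preserves the distinction between an expert who expects no change and one who gave no expectation at all.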

In order to additionally enrich the competition data, we provide a table that groups experts by companies for which they work (the file named company_expert.csv). In total, the data consist of recommendations from 2,832 experts who are employed in 228 different financial institutions.

The test data file, ismis17_testData.csv, consists of 7,555 records. Its format is similar to that of the training data, except that it does not contain the third column with true return classes. The task for participants is to predict the labels for the test cases. It is important to note that the training and test data sets correspond to different time periods and that the records in both sets are given in a random order.

The format of submissions: The participants of the competition are asked to predict return classes of the records from the test set and send us their predictions using the submission system. Each solution should be sent in a single text file containing exactly 7,555 lines (files with an additional empty last line will also be accepted). Each line of this file should contain exactly one class label from the set {‘Buy’, ‘Hold’, ‘Sell’}. Solutions containing any other labels or a different number of lines will result in an evaluation error.
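Before uploading, a solution file can be checked locally against the rules stated above. The sketch below is not the organizers' checker; the function name `validate_submission` and its error messages are hypothetical, and it only encodes the three stated constraints: the line count, the allowed labels, and the tolerated empty last line.

```python
from typing import List

VALID_LABELS = {"Buy", "Hold", "Sell"}
EXPECTED_LINES = 7555  # number of test records stated in the task description


def validate_submission(text: str, expected_lines: int = EXPECTED_LINES) -> List[str]:
    """Return a list of problems found in a submission; empty means it looks valid."""
    lines = text.split("\n")
    # A single additional empty last line is accepted.
    if lines and lines[-1] == "":
        lines.pop()

    problems = []
    if len(lines) != expected_lines:
        problems.append(f"expected {expected_lines} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        # Tolerate Windows line endings, but nothing else around the label.
        if line.rstrip("\r") not in VALID_LABELS:
            problems.append(f"line {number}: unexpected label {line!r}")
    return problems


# Tiny illustration with a shortened expected length.
ok = validate_submission("Buy\nHold\nSell\n", expected_lines=3)
bad = validate_submission("Buy\nMaybe\n", expected_lines=2)
```

For a real file, the text would be read with `open(path, encoding="utf-8").read()` and checked with the default `expected_lines`.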

Evaluation of results: The submitted solutions will be evaluated on-line and preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a subset of the test set consisting of 1000 records, fixed for all participants. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. Moreover, in order to claim the awards, winners will have to provide source code that allows their final solution to be reproduced (in any programming language). All the winners will be officially announced during a special session devoted to this competition, which will be organized at the ISMIS’17 conference (http://ismis2017.ii.pw.edu.pl/).

The assessment of solutions will be done using the accuracy (ACC) measure with an additional cost/reward matrix. For a confusion matrix X, obtained from a vector of predictions preds, and the cost matrix C displayed below, the accuracy is computed as: $ACC(preds) = \frac{\sum_{i = 1..3} \left( X_{i,i} \cdot C_{i,i} \right) }{\sum_{i = 1..3}\sum_{j = 1..3} \left( X_{i,j} \cdot C_{i,j} \right) }.$

| preds\truth | Buy | Hold | Sell |
|---|---|---|---|
| Buy | 8 | 4 | 8 |
| Hold | 1 | 1 | 1 |
| Sell | 8 | 4 | 8 |
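The cost-weighted accuracy defined above can be computed directly from two label vectors. The following is a minimal sketch, not the official evaluation code: the function name `cost_weighted_accuracy` is hypothetical, the cost values are copied from the matrix as it appears on this page, and rows are assumed to index predictions and columns the true classes, as the `preds\truth` header suggests.

```python
from collections import Counter
from typing import Sequence

LABELS = ("Buy", "Hold", "Sell")

# COST[predicted][true], copied from the cost matrix shown above.
COST = {
    "Buy":  {"Buy": 8, "Hold": 4, "Sell": 8},
    "Hold": {"Buy": 1, "Hold": 1, "Sell": 1},
    "Sell": {"Buy": 8, "Hold": 4, "Sell": 8},
}


def cost_weighted_accuracy(preds: Sequence[str], truth: Sequence[str]) -> float:
    """ACC = sum_i X[i,i]*C[i,i] / sum_{i,j} X[i,j]*C[i,j]."""
    if len(preds) != len(truth):
        raise ValueError("preds and truth must have the same length")
    # Confusion matrix X stored sparsely: (predicted, true) -> count.
    confusion = Counter(zip(preds, truth))
    correct = sum(confusion[(label, label)] * COST[label][label] for label in LABELS)
    total = sum(count * COST[p][t] for (p, t), count in confusion.items())
    return correct / total if total else 0.0


# Small invented example: two hits (Buy, Hold) and two Buy/Sell confusions.
preds = ["Buy", "Hold", "Sell", "Buy"]
truth = ["Buy", "Hold", "Buy", "Sell"]
score = cost_weighted_accuracy(preds, truth)  # (8 + 1) / (8 + 1 + 8 + 8) = 0.36
```

Note that under this matrix a ‘Hold’ prediction carries weight 1 whether it is right or wrong, so the score is dominated by how ‘Buy’ and ‘Sell’ predictions turn out.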

For the convenience of participants, we provide an example solution file, exemplary_solution.csv, as a reference.

| Rank | Team Name | Is Report | Preliminary Score | Final Score | Submissions |
|---|---|---|---|---|---|
| 1 | mathurin | True | 0.4202 | 0.437507 | 2 |
| 2 | bongod | True | 0.4282 | 0.437461 | 2 |
| 3 | pwbluehorizon1 | True | 0.4196 | 0.429882 | 2 |
| 4 | boocheck | True | 0.4515 | 0.427162 | 2 |
| 5 | a.ruta | True | 0.5348 | 0.424225 | 2 |
| 6 | mkoz | True | 0.4517 | 0.423419 | 2 |
| 7 | amy | True | 0.4614 | 0.417240 | 2 |
| 8 | grzegorzkozlowski | True | 0.4289 | 0.416458 | 2 |
| 9 | vinh | True | 0.4130 | 0.415950 | 2 |
| 10 | ptrsbck | True | 0.4341 | 0.412528 | 2 |
| 11 | duxuhao | True | 0.4568 | 0.409194 | 2 |
| 12 | koziejka | True | 0.4286 | 0.408538 | 2 |
| 13 | zdravevski | True | 0.4829 | 0.407256 | 2 |
| 14 | podcheck | True | 0.4463 | 0.407203 | 2 |
| 15 | optymista | True | 0.5178 | 0.405360 | 2 |
| 16 | dotjabber | True | 0.4266 | 0.402121 | 2 |
| 17 | podludek | True | 0.4385 | 0.399513 | 2 |
| 18 | katarzyna | True | 0.4451 | 0.397411 | 2 |
| 19 |  | True | 0.4289 | 0.395662 | 2 |
| 20 | dk | True | 0.4452 | 0.391991 | 2 |
| 21 | 0bartek | True | 0.4164 | 0.388849 | 2 |
| 22 |  | True | 0.4114 | 0.388371 | 2 |
| 23 | dymitrruta | True | 0.6015 | 0.386997 | 2 |
| 24 | lusiek | True | 0.4266 | 0.381812 | 2 |
| 25 | guppi | True | 0.4003 | 0.000000 | 2 |
| 26 | hieuvq | False | 0.4684 | No report file found or report rejected. | 2 |
| 27 | parcel | False | 0.4639 | No report file found or report rejected. | 2 |
| 28 | ps319383 | False | 0.4824 | No report file found or report rejected. | 2 |
| 29 | butter | False | 0.4501 | No report file found or report rejected. | 2 |
| 30 |  | False | 0.4546 | No report file found or report rejected. | 2 |
| 31 | mg320637 | False | 0.4397 | No report file found or report rejected. | 2 |
| 32 | ckarats | False | 0.4376 | No report file found or report rejected. | 2 |
| 33 | nitekna | False | 0.4611 | No report file found or report rejected. | 2 |
| 34 | juggler | False | 0.4301 | No report file found or report rejected. | 2 |
| 35 | kneefer1 | False | 0.4300 | No report file found or report rejected. | 2 |
| 36 | michalm | False | 0.4282 | No report file found or report rejected. | 2 |
| 37 | baseline_solution | False | 0.4266 | No report file found or report rejected. | 2 |
| 38 | mkal | False | 0.4266 | No report file found or report rejected. | 2 |
| 39 | wniemkowski | False | 0.4241 | No report file found or report rejected. | 2 |
| 40 | btw | False | 0.4310 | No report file found or report rejected. | 2 |
| 41 | hj | False | 0.4173 | No report file found or report rejected. | 2 |
| 42 | kneefer | False | 0.4157 | No report file found or report rejected. | 2 |
| 43 | contestant1 | False | 0.4141 | No report file found or report rejected. | 2 |
| 44 | minio7 | False | 0.4359 | No report file found or report rejected. | 2 |
| 45 | saikatroy | False | 0.4131 | No report file found or report rejected. | 2 |
| 46 | krishnateja614 | False | 0.4126 | No report file found or report rejected. | 2 |
| 47 | marek1991 | False | 0.4107 | No report file found or report rejected. | 2 |
| 48 | sslim | False | 0.4155 | No report file found or report rejected. | 2 |
| 49 | mateuszk | False | 0.4085 | No report file found or report rejected. | 2 |
| 50 | akar | False | 0.4371 | No report file found or report rejected. | 2 |
| 51 | obus | False | 0.4266 | No report file found or report rejected. | 2 |
| 52 | mkozlow | False | 0.4068 | No report file found or report rejected. | 2 |
| 53 | mifdal84 | False | 0.4065 | No report file found or report rejected. | 2 |
| 54 | fajri91 | False | 0.4060 | No report file found or report rejected. | 2 |
| 55 | ternaus | False | 0.4368 | No report file found or report rejected. | 2 |
| 56 | lameski | False | 0.4019 | No report file found or report rejected. | 2 |
| 57 | kp | False | 0.4095 | No report file found or report rejected. | 2 |
| 58 | marugari | False | 0.3984 | No report file found or report rejected. | 2 |
| 59 | lupus | False | 0.3984 | No report file found or report rejected. | 2 |
| 60 |  | False | 0.4085 | No report file found or report rejected. | 2 |
| 61 | basakesin | False | 0.3972 | No report file found or report rejected. | 2 |
| 62 | arcane27 | False | 0.3950 | No report file found or report rejected. | 2 |
| 63 |  | False | 0.4071 | No report file found or report rejected. | 2 |
| 64 | nxgtr | False | 0.3937 | No report file found or report rejected. | 2 |
| 65 | flac | False | 0.3936 | No report file found or report rejected. | 2 |
| 66 | sebastianmusial | False | 0.3936 | No report file found or report rejected. | 2 |
| 67 | bearstrikesback | False | 0.3923 | No report file found or report rejected. | 2 |
| 68 | zagorecki | False | 0.3981 | No report file found or report rejected. | 2 |
| 69 | vnu_jaist2015 | False | 0.0000 | No report file found or report rejected. | 2 |
| 70 | vinayakumar | False | 0.3956 | No report file found or report rejected. | 2 |
| 71 | datageek | False | 0.3984 | No report file found or report rejected. | 2 |
| 72 | saschapojot | False | 0.0000 | No report file found or report rejected. | 2 |
| 73 | jakub | False | 0.3959 | No report file found or report rejected. | 2 |

• November 22, 2016: start of the competition; data sets become available,
• January 22, 2017 (23:59 GMT): deadline for submitting the predictions and the reports (the deadline for submitting reports has been extended until January 27),
• February 1, 2017 (originally January 29): end of the challenge, on-line publication of final results, sending invitations for submitting papers for the special session at ISMIS 2017,
• late February 2017: deadline for submitting papers describing the selected solutions to the special session at ISMIS 2017.

The teams of the first two top-ranked solutions (based on the final evaluation scores, taking into account solutions satisfying terms and conditions of ISMIS 2017 Data Mining Competition) will be awarded prizes funded by our sponsors:

• First Prize: 1000 USD + one free ISMIS 2017 conference registration,
• Second Prize: 500 USD + one free ISMIS 2017 conference registration.

The award ceremony will take place during the ISMIS 2017 conference (June 26-29, 2017, Warsaw, Poland).

Andrzej Janusz, University of Warsaw - Chair

Kamil Żbikowski, Warsaw University of Technology & mBank S.A. - Chair

Piotr Gawrysiak, Warsaw University of Technology & mBank S.A.

Marzena Kryszkiewicz, Warsaw University of Technology

Henryk Rybiński, Warsaw University of Technology

Dominik Ślęzak, University of Warsaw & Infobright

| Discussion | Author | Replies | Last post |
|---|---|---|---|
| publication test set labels | Andrzej | 0 | by Andrzej, Sunday, February 26, 2017, 12:39:18 |
| For the future ( continued ) | Marek | 0 | by Marek, Thursday, February 02, 2017, 15:03:54 |
| The final results were published | Andrzej | 0 | by Andrzej, Wednesday, February 01, 2017, 21:49:31 |
| The final results were published | Andrzej | 1 | by Dymitr, Wednesday, February 01, 2017, 22:22:08 |
| For the future | Marek | 1 | by Andrzej, Wednesday, February 01, 2017, 21:45:06 |
| Extended deadline | Łukasz | 0 | by Łukasz, Wednesday, January 25, 2017, 10:07:42 |
| Report sending deadline extended | Andrzej | 0 | by Andrzej, Tuesday, January 24, 2017, 17:13:47 |
| problems with the evaluation system are fixed | Andrzej | 0 | by Andrzej, Friday, January 20, 2017, 09:39:46 |
| Did not see score appears in the submission board | Marek | 2 | by Andrzej, Friday, January 20, 2017, 09:33:47 |
| start of the last week | Andrzej | 0 | by Andrzej, Monday, January 16, 2017, 16:40:37 |
| the final week | Andrzej | 0 | by Andrzej, Monday, January 16, 2017, 15:54:32 |
| Confuxion Matrix and Cost Matrix | Raji | 1 | by Andrzej, Monday, January 16, 2017, 13:06:40 |
| ACC reference implementation | Alexey | 0 | by Alexey, Friday, January 13, 2017, 15:39:01 |
| Where is my submission | Quang-Vinh | 1 | by Quang-Vinh, Friday, January 13, 2017, 12:35:11 |
| terms and schedule | Przemek | 1 | by Przemek, Thursday, January 12, 2017, 09:15:18 |
| Happy New Year 2017! | Marek | 1 | by Andrzej, Sunday, January 01, 2017, 15:43:32 |
| Third column of the training set | Christos | 1 | by Andrzej, Thursday, December 29, 2016, 09:47:27 |
| How to form a team | janpreet | 3 | by Andrzej, Wednesday, December 21, 2016, 09:31:19 |
| How to delete the submitted results | vinayakumar | 1 | by vinayakumar, Saturday, December 03, 2016, 06:06:04 |
| Third value of a recommendation | Christos | 1 | by Andrzej, Thursday, December 01, 2016, 10:19:11 |
| Timing of Training and Test Sets | Dymitr | 1 | by Dymitr, Sunday, November 27, 2016, 06:25:02 |
| Welcome to ISMIS 2017 Data Mining Competition! | Andrzej | 0 | by Andrzej, Tuesday, November 22, 2016, 18:47:16 |
| ISMIS 2017 Data Mining Competition | Andrzej | 0 | by Andrzej, Tuesday, November 22, 2016, 15:46:46 |