
ISMIS'17 Data Mining Competition: Trading Based on Recommendations

ISMIS 2017 Data Mining Competition is a challenge organized on the KnowledgePit platform in association with the 23rd International Symposium on Methodologies for Intelligent Systems, held at Warsaw University of Technology, Poland, on June 26-29, 2017. The task is to devise a strategy for investing in the stock market based on recommendations provided by different experts. The competition is kindly sponsored by mBank S.A. and Tipranks, with support from the ISMIS 2017 organizers.

Overview

Topic outline: Predicting financial markets is not an easy task. Many researchers and practitioners have devoted considerable time and effort to finding a method that would consistently generate profits for investors. Many of them claim to have succeeded and publish their recommendations for different types of assets. The main goal of this competition is to determine whether such recommendations have predictive power. We narrow the problem to a selected set of stocks and analysts. The task is to devise an algorithm that most accurately predicts the class of return from an investment in a stock over the next quarter, based on historical recommendations related to that stock. Classification seems adequate here because predicting whether the return will be positive or negative is a more appropriate (and perhaps easier) task than trying to guess its exact value.

More details regarding the task and a description of the competition data can be found in the Data description section.

Special Session at ISMIS 2017: A special session devoted to the competition will be held at the conference. We will invite the authors of selected reports to extend them and submit them for further review, conducted in the same way as for other special sessions at ISMIS 2017. The invited teams will be chosen based on their final rank, the innovativeness of their approach and the quality of the submitted reports. The accepted papers - depending on their scope and the reviewing results - will be included in the main ISMIS 2017 proceedings or in the Industrial Session proceedings.

If you have any questions, please post them on the competition forum or email us at ismis2017-competition@ii.pw.edu.pl

Terms & Conditions
 
 

ISMIS'17 Data Mining Competition: Trading Based on Recommendations is over. We would like to sincerely thank all participants for their contribution and support!

We are happy to announce that the competition attracted a total of 159 teams, of which 73 were active and submitted at least one solution to the leaderboard. The total number of submissions was 2,570. Unfortunately, none of the solutions marked as final exceeded our baseline (which actually tells us a lot about the problem). Thank you for your effort!

The official Winners:

  1. Mathurin Ache (team mathurin), France
  2. Łukasz Siemaszko (team bongod), Poland

We have already invited the 10 teams that submitted the most interesting descriptions of their approach to extend their reports into papers for a special session at ISMIS 2017. Those submissions will go through a special peer-review track, and all accepted papers will be published in the conference proceedings. All other teams are also welcome to submit extended descriptions of their approach to ISMIS 2017; however, their papers will undergo the regular review process conducted by members of the conference Program Committee and might be moved to a different session (e.g. the industrial track). The regular paper submission system is available here: https://easychair.org/conferences/?conf=ismis2017

For all of you who would like to continue research related to the competition, we plan to release all of the competition data (including the labels for the test set). You will be able to find it here within a few weeks.

Data description and format: The data sets for this competition are provided in a tabular format. The training data set, ismis17_trainingData.csv, contains 12,234 records, one per line, that correspond to recommendations for stock symbols at different points in time. These time points will be referred to as decision dates. Each data record is composed of three columns, separated by semicolons. The first column gives an internal identifier of a stock symbol (true symbols are hidden). The second column stores an ordered list of recommendations issued by experts for the given stock during the two months before the decision date. The third column gives the true return class of the stock, computed over the period of three months after the decision date. It may take one of three values: ‘Buy’, ‘Hold’, ‘Sell’, which correspond to considerably positive, close to zero, and considerably negative returns, respectively.

In each record, the list of recommendations consists of one or more tips from financial experts. Each recommendation is expressed using four values enclosed in ‘{}’ brackets. The first value is an identifier of an expert. The second value gives the class of the stock predicted by the expert (‘Buy’, ‘Hold’ or ‘Sell’), and the third value expresses the expert’s expectation regarding the future return rate of the stock. It needs to be stressed that the expected return rates may sometimes be inconsistent and are generally less reliable than the predicted rating, because experts interpret stock quotes differently (e.g. some do not account for splits and/or dividends). Moreover, some experts do not share their expectations about the returns. Such cases are denoted by NA values in the data. The fourth value in each recommendation gives the time distance to the decision date (in days), e.g. a value of 5 means that the recommendation was published five days before the decision date. The list of recommendations in each record is sorted by time distance, so it can be regarded as a time series.
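The record layout described above can be read with a few lines of Python. This is only a minimal sketch: the description states that columns are separated by semicolons and that each recommendation sits between ‘{}’ brackets, but it does not say how the four values inside the brackets are delimited, so the comma used here is an assumption, as are the names `Recommendation`, `Record` and `parse_record`. The sample line in the usage note is made up for illustration and is not taken from the competition files.

```python
import re
from typing import List, NamedTuple, Optional


class Recommendation(NamedTuple):
    expert_id: str
    rating: str                       # 'Buy', 'Hold' or 'Sell'
    expected_return: Optional[float]  # None when the expert gave NA
    days_before: int                  # time distance to the decision date


class Record(NamedTuple):
    symbol_id: str
    recommendations: List[Recommendation]
    label: Optional[str]              # None for test records (no third column)


# One recommendation = everything between a pair of curly brackets.
_TIP = re.compile(r"\{([^{}]*)\}")


def parse_record(line: str) -> Record:
    """Parse one semicolon-separated record (2 or 3 columns)."""
    parts = line.strip().split(";")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 columns, got {len(parts)}")
    label = parts[2].strip() if len(parts) == 3 else None
    tips = []
    for body in _TIP.findall(parts[1]):
        # ASSUMPTION: the four values inside the brackets are comma-separated.
        expert, rating, exp_ret, days = [v.strip() for v in body.split(",")]
        tips.append(Recommendation(
            expert_id=expert,
            rating=rating,
            expected_return=None if exp_ret == "NA" else float(exp_ret),
            days_before=int(days),
        ))
    return Record(parts[0].strip(), tips, label)
```

For example, `parse_record("S1;{101,Buy,0.12,5}{202,Hold,NA,17};Buy")` (a hypothetical line) yields a record with two recommendations, the second of which has no expected return. If the real files use a different inner delimiter, only the `split(",")` call needs to change.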

In order to additionally enrich the competition data, we provide a table that groups experts by companies for which they work (the file named company_expert.csv). In total, the data consist of recommendations from 2,832 experts who are employed in 228 different financial institutions.

The test data file, ismis17_testData.csv, consists of 7,555 records. Its format is similar to that of the training data, except that it does not contain the third column with true return classes. The task for participants is to predict the labels for the test cases. It is important to note that the training and test data sets correspond to different time periods and that the records in both sets are given in random order.

The format of submissions: The participants of the competition are asked to predict return classes of the records from the test set and send us their predictions using the submission system. Each solution should be sent as a single text file containing exactly 7,555 lines (files with an additional empty last line will also be accepted). Each line of this file should contain exactly one class label from the set {‘Buy’, ‘Hold’, ‘Sell’}. Solutions containing any other labels or a different number of lines will result in an evaluation error.
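Before uploading, a file can be checked locally against the rules above. The following is a minimal sketch, not the checker used by the submission system: the function name `validate_submission` is hypothetical, and the handling of whitespace and line endings (surrounding spaces are stripped, one trailing empty line is tolerated) is an assumption about how strict the real evaluator is.

```python
VALID_LABELS = {"Buy", "Hold", "Sell"}
EXPECTED_LINES = 7555  # number of records in the test set


def validate_submission(text, expected_lines=EXPECTED_LINES):
    """Return a list of human-readable problems; an empty list means OK."""
    lines = text.splitlines()
    # Tolerate one additional empty last line, as the rules allow.
    if lines and lines[-1] == "":
        lines.pop()
    problems = []
    if len(lines) != expected_lines:
        problems.append(
            f"expected {expected_lines} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        # ASSUMPTION: labels are case-sensitive; surrounding blanks ignored.
        if line.strip() not in VALID_LABELS:
            problems.append(f"line {number}: unexpected label {line!r}")
    return problems
```

Calling `validate_submission(open("my_solution.txt").read())` (with a file name of your choice) returns an empty list when the file has the right number of lines and only the three allowed labels.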

Evaluation of results: The submitted solutions will be evaluated on-line and preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a subset of the test set consisting of 1,000 records, fixed for all participants. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. Moreover, in order to claim the awards, winners will have to provide source code (in any programming language) that reproduces their final solution. All the winners will be officially announced during a special session devoted to this competition, which will be organized at the ISMIS’17 conference (http://ismis2017.ii.pw.edu.pl/).

The assessment of solutions will be done using the accuracy (ACC) measure with an additional cost/reward matrix. For a confusion matrix X, obtained from a vector of predictions preds, and the cost matrix C displayed below, the accuracy is computed as: \[ACC(preds) = \frac{\sum_{i = 1..3} \left( X_{i,i} \cdot C_{i,i} \right) }{\sum_{i = 1..3}\sum_{j = 1..3} \left( X_{i,j} \cdot C_{i,j} \right) }.\]

 

The cost matrix C used for evaluation of submissions:
preds\truth  Buy  Hold  Sell
Buy          8    4     8
Hold         1    1     1
Sell         8    4     8
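The cost-weighted accuracy defined above can be sketched in a few lines of Python. This is an illustrative implementation written from the formula and the cost matrix C in this section, not the organizers' evaluation code; the names `COST` and `weighted_accuracy` are hypothetical. Rows of the matrix are indexed by the predicted label and columns by the true label, following the preds\truth header.

```python
# COST[predicted][true], copied from the cost matrix C above.
COST = {
    "Buy":  {"Buy": 8, "Hold": 4, "Sell": 8},
    "Hold": {"Buy": 1, "Hold": 1, "Sell": 1},
    "Sell": {"Buy": 8, "Hold": 4, "Sell": 8},
}


def weighted_accuracy(preds, truth):
    """ACC = sum_i X[i][i]*C[i][i] / sum_{i,j} X[i][j]*C[i][j].

    Instead of building the confusion matrix X explicitly, each
    (prediction, truth) pair adds its cost to the denominator and,
    when the prediction is correct, also to the numerator.
    """
    if len(preds) != len(truth):
        raise ValueError("preds and truth must have the same length")
    numerator = 0
    denominator = 0
    for p, t in zip(preds, truth):
        weight = COST[p][t]
        denominator += weight
        if p == t:
            numerator += weight
    return numerator / denominator if denominator else 0.0
```

As a small worked example, predictions `["Buy", "Hold", "Sell", "Buy"]` against true labels `["Buy", "Hold", "Buy", "Hold"]` give a numerator of 8 + 1 = 9 and a denominator of 8 + 1 + 8 + 4 = 21, so ACC = 9/21 ≈ 0.4286. Note that every ‘Hold’ prediction carries weight 1 regardless of the truth, so ‘Buy’ and ‘Sell’ predictions dominate the score.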

 

For the convenience of participants, we provide an example solution file, exemplary_solution.csv, as a reference.

Rank  Team Name          Is Report        Preliminary Score  Final Score                               Submissions
1     mathurin           True       True  0.4202             0.437507                                  2
2     bongod             True       True  0.4282             0.437461                                  2
3     pwbluehorizon1     True       True  0.4196             0.429882                                  2
4     boocheck           True       True  0.4515             0.427162                                  2
5     a.ruta             True       True  0.5348             0.424225                                  2
6     mkoz               True       True  0.4517             0.423419                                  2
7     amy                True       True  0.4614             0.417240                                  2
8     grzegorzkozlowski  True       True  0.4289             0.416458                                  2
9     vinh               True       True  0.4130             0.415950                                  2
10    ptrsbck            True       True  0.4341             0.412528                                  2
11    duxuhao            True       True  0.4568             0.409194                                  2
12    koziejka           True       True  0.4286             0.408538                                  2
13    zdravevski         True       True  0.4829             0.407256                                  2
14    podcheck           True       True  0.4463             0.407203                                  2
15    optymista          True       True  0.5178             0.405360                                  2
16    dotjabber          True       True  0.4266             0.402121                                  2
17    podludek           True       True  0.4385             0.399513                                  2
18    katarzyna          True       True  0.4451             0.397411                                  2
19    atanask            True       True  0.4289             0.395662                                  2
20    dk                 True       True  0.4452             0.391991                                  2
21    0bartek            True       True  0.4164             0.388849                                  2
22    mkadlof            True       True  0.4114             0.388371                                  2
23    dymitrruta         True       True  0.6015             0.386997                                  2
24    lusiek             True       True  0.4266             0.381812                                  2
25    guppi              True       True  0.4003             0.000000                                  2
26    hieuvq             False      True  0.4684             No report file found or report rejected.  2
27    parcel             False      True  0.4639             No report file found or report rejected.  2
28    ps319383           False      True  0.4824             No report file found or report rejected.  2
29    butter             False      True  0.4501             No report file found or report rejected.  2
30    adampap            False      True  0.4546             No report file found or report rejected.  2
31    mg320637           False      True  0.4397             No report file found or report rejected.  2
32    ckarats            False      True  0.4376             No report file found or report rejected.  2
33    nitekna            False      True  0.4611             No report file found or report rejected.  2
34    juggler            False      True  0.4301             No report file found or report rejected.  2
35    kneefer1           False      True  0.4300             No report file found or report rejected.  2
36    michalm            False      True  0.4282             No report file found or report rejected.  2
37    baseline_solution  False      True  0.4266             No report file found or report rejected.  2
38    mkal               False      True  0.4266             No report file found or report rejected.  2
39    wniemkowski        False      True  0.4241             No report file found or report rejected.  2
40    btw                False      True  0.4310             No report file found or report rejected.  2
41    hj                 False      True  0.4173             No report file found or report rejected.  2
42    kneefer            False      True  0.4157             No report file found or report rejected.  2
43    contestant1        False      True  0.4141             No report file found or report rejected.  2
44    minio7             False      True  0.4359             No report file found or report rejected.  2
45    saikatroy          False      True  0.4131             No report file found or report rejected.  2
46    krishnateja614     False      True  0.4126             No report file found or report rejected.  2
47    marek1991          False      True  0.4107             No report file found or report rejected.  2
48    sslim              False      True  0.4155             No report file found or report rejected.  2
49    mateuszk           False      True  0.4085             No report file found or report rejected.  2
50    akar               False      True  0.4371             No report file found or report rejected.  2
51    obus               False      True  0.4266             No report file found or report rejected.  2
52    mkozlow            False      True  0.4068             No report file found or report rejected.  2
53    mifdal84           False      True  0.4065             No report file found or report rejected.  2
54    fajri91            False      True  0.4060             No report file found or report rejected.  2
55    ternaus            False      True  0.4368             No report file found or report rejected.  2
56    lameski            False      True  0.4019             No report file found or report rejected.  2
57    kp                 False      True  0.4095             No report file found or report rejected.  2
58    marugari           False      True  0.3984             No report file found or report rejected.  2
59    lupus              False      True  0.3984             No report file found or report rejected.  2
60    janismdhanbad      False      True  0.4085             No report file found or report rejected.  2
61    basakesin          False      True  0.3972             No report file found or report rejected.  2
62    arcane27           False      True  0.3950             No report file found or report rejected.  2
63    pameladt           False      True  0.4071             No report file found or report rejected.  2
64    nxgtr              False      True  0.3937             No report file found or report rejected.  2
65    flac               False      True  0.3936             No report file found or report rejected.  2
66    sebastianmusial    False      True  0.3936             No report file found or report rejected.  2
67    bearstrikesback    False      True  0.3923             No report file found or report rejected.  2
68    zagorecki          False      True  0.3981             No report file found or report rejected.  2
69    vnu_jaist2015      False      True  0.0000             No report file found or report rejected.  2
70    vinayakumar        False      True  0.3956             No report file found or report rejected.  2
71    datageek           False      True  0.3984             No report file found or report rejected.  2
72    saschapojot        False      True  0.0000             No report file found or report rejected.  2
73    jakub              False      True  0.3959             No report file found or report rejected.  2

  • November 22, 2016: start of the competition; data sets become available,
  • January 22, 2017 (23:59 GMT): deadline for submitting the predictions and the reports (the deadline for submitting reports has been extended until January 27),
  • February 1, 2017 (originally planned for January 29): end of the challenge, on-line publication of final results, sending invitations for submitting papers for the special session at ISMIS 2017,
  • late February 2017: deadline for submitting papers describing the selected solutions to the special session at ISMIS 2017.

The teams behind the two top-ranked solutions (based on the final evaluation scores, and considering only solutions that satisfy the terms and conditions of ISMIS 2017 Data Mining Competition) will be awarded prizes funded by our sponsors:

  • First Prize: 1000 USD + one free ISMIS 2017 conference registration,
  • Second Prize: 500 USD + one free ISMIS 2017 conference registration.

The award ceremony will take place during the ISMIS 2017 conference (June 26-29, 2017, Warsaw, Poland).

Andrzej Janusz, University of Warsaw - Chair

Kamil Żbikowski, Warsaw University of Technology & mBank S.A. - Chair

Piotr Gawrysiak, Warsaw University of Technology & mBank S.A.

Marzena Kryszkiewicz, Warsaw University of Technology

Henryk Rybiński, Warsaw University of Technology

Dominik Ślęzak, University of Warsaw & Infobright

Discussion                                         Author       Replies  Last post
publication test set labels                        Andrzej      0        by Andrzej, Sunday, February 26, 2017, 11:39:18
For the future (continued)                         Marek        0        by Marek, Thursday, February 02, 2017, 14:03:54
The final results were published                   Andrzej      0        by Andrzej, Wednesday, February 01, 2017, 20:49:31
The final results were published                   Andrzej      1        by Dymitr, Wednesday, February 01, 2017, 21:22:08
For the future                                     Marek        1        by Andrzej, Wednesday, February 01, 2017, 20:45:06
Extended deadline                                  Łukasz       0        by Łukasz, Wednesday, January 25, 2017, 09:07:42
Report sending deadline extended                   Andrzej      0        by Andrzej, Tuesday, January 24, 2017, 16:13:47
problems with the evaluation system are fixed      Andrzej      0        by Andrzej, Friday, January 20, 2017, 08:39:46
Did not see score appears in the submission board  Marek        2        by Andrzej, Friday, January 20, 2017, 08:33:47
start of the last week                             Andrzej      0        by Andrzej, Monday, January 16, 2017, 15:40:37
the final week                                     Andrzej      0        by Andrzej, Monday, January 16, 2017, 14:54:32
Confuxion Matrix and Cost Matrix                   Raji         1        by Andrzej, Monday, January 16, 2017, 12:06:40
ACC reference implementation                       Alexey       0        by Alexey, Friday, January 13, 2017, 14:39:01
Where is my submission                             Quang-Vinh   1        by Quang-Vinh, Friday, January 13, 2017, 11:35:11
terms and schedule                                 Przemek      1        by Przemek, Thursday, January 12, 2017, 08:15:18
Happy New Year 2017!                               Marek        1        by Andrzej, Sunday, January 01, 2017, 14:43:32
Third column of the training set                   Christos     1        by Andrzej, Thursday, December 29, 2016, 08:47:27
How to form a team                                 janpreet     3        by Andrzej, Wednesday, December 21, 2016, 08:31:19
How to delete the submitted results                vinayakumar  1        by vinayakumar, Saturday, December 03, 2016, 05:06:04
Third value of a recommendation                    Christos     1        by Andrzej, Thursday, December 01, 2016, 09:19:11
Timing of Training and Test Sets                   Dymitr       1        by Dymitr, Sunday, November 27, 2016, 05:25:02
Welcome to ISMIS 2017 Data Mining Competition!     Andrzej      0        by Andrzej, Tuesday, November 22, 2016, 17:47:16
ISMIS 2017 Data Mining Competition                 Andrzej      0        by Andrzej, Tuesday, November 22, 2016, 14:46:46