
ISMIS'17 Data Mining Competition: Trading Based on Recommendations

ISMIS 2017 Data Mining Competition is a challenge organized on the KnowledgePit platform in conjunction with the 23rd International Symposium on Methodologies for Intelligent Systems, held at Warsaw University of Technology, Poland, on June 26-29, 2017. The task is to devise a strategy for investing in the stock market based on recommendations provided by different experts. The competition is kindly sponsored by mBank S.A. and Tipranks, with support from the ISMIS 2017 organizers.

Overview

Topic outline: Predicting financial markets is not an easy task. Many researchers and practitioners have devoted considerable time and effort to devising a method that would consistently deliver profits for investors. Many of them claim to have succeeded and publish their recommendations for different types of assets. The main goal of this competition is to determine whether such recommendations actually have predictive power. We narrow the problem to a selected set of stocks and analysts. The task is to devise an algorithm that most accurately predicts the class of return from an investment in a stock over the next quarter, based on historical recommendations related to that stock. We treat this as a classification problem, because predicting whether the return will be positive or negative is a more appropriate (and perhaps easier) task than trying to guess its exact value.

More details regarding the task and a description of the competition data can be found in the Data description section.

Special Session at ISMIS 2017: A special session devoted to the competition will be held at the conference. We will invite the authors of selected reports to extend them and submit them for a further review process, conducted in the same way as for other special sessions at ISMIS 2017. The invited teams will be chosen based on their final rank, the innovativeness of their approach and the quality of their reports. Depending on their scope and the review results, accepted papers will be included in the main ISMIS 2017 proceedings or in the Industrial Session proceedings.

In case of any questions, please post on the competition forum or email us at ismis2017-competition@ii.pw.edu.pl

Terms & Conditions
 
 

Contest Participation Rules:

  • The competition is open to all interested researchers, specialists and students. Only members of the Contest Organizing Committee may not participate.
  • Participants may submit solutions as teams made up of one or more persons.
  • Each team needs to designate a leader responsible for communication with the Organizers. A single person can be a leader of only one team.
  • One person may be a member of at most three teams.
  • Each team needs to be composed of a different set of persons.
  • Each team is obliged to provide a short report describing its final solution. The report must contain information such as the team name, the names of all team members and a brief overview of the approach used. The description should explain all data preprocessing and model construction steps. It should be submitted in PDF format using our submission system by January 22, 2017 (23:59 GMT). Only submissions made by teams that provided reports will qualify for the final evaluation.
  • The final ranking of the competing teams will be based on the final evaluation results. In case of ties in the evaluation scores, the time of submission will be taken into account.
  • After the final evaluation, selected top-ranked teams will be asked to provide the source code that can be used to reproduce their final solutions, together with documentation that allows running the code. If the code has to be run within a complex environment (e.g. a distributed Hadoop cluster), a detailed setup explanation should be provided as well. The source code will be used to verify the legitimacy of solutions. The winners of the ISMIS’17 Data Mining Competition are chosen from the top-ranked teams that provide reports and the source code of their solutions for verification.
  • Each report, paper or any other type of publication based on research that uses data from this competition should acknowledge both mBank S.A. and Tipranks as the institutions that sponsored that research.
  • The organizers may reject, without providing any additional explanation, any submission that they suspect was produced unfairly or was submitted by a team that has broken the competition rules.
  • By enrolling in this competition, you grant the organizers the right to process your submissions for the purposes of evaluation and post-competition research.

In case of questions related to the competition, please contact us via email at ismis2017-competition@ii.pw.edu.pl or through the competition forum.


Data description and format: The data sets for this competition are provided in a tabular format. The training data set, ismis17_trainingData.csv, contains 12,234 records, one per line, that correspond to recommendations for stock symbols at different points in time. These time points will be referred to as decision dates. Each data record is composed of three columns, separated by semicolons. The first column gives an internal identifier of a stock symbol (true symbols are hidden). The second column stores an ordered list of recommendations issued by experts for a given stock during the two months before the decision date. The third column gives the true return class of the stock, computed over the three months after the decision date. It may take one of three values: ‘Buy’, ‘Hold’ or ‘Sell’, which correspond to considerably positive, close to zero, and considerably negative returns, respectively.
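As a minimal sketch of reading this format, the Python snippet below splits each line into its three columns. It assumes the file has no header line and that the recommendation list never contains a semicolon; `Record` and `read_records` are hypothetical helper names, not part of the competition materials. The same function also accepts test records, which lack the third column.

```python
import csv
import io
from typing import List, NamedTuple, Optional

VALID_LABELS = {"Buy", "Hold", "Sell"}


class Record(NamedTuple):
    symbol: str            # internal (hidden) stock identifier
    recommendations: str   # raw, still unparsed list of expert tips
    label: Optional[str]   # true return class; None for test records


def read_records(text: str) -> List[Record]:
    """Split semicolon-separated lines into Record tuples.

    Assumptions (not confirmed by the data description): no header line,
    and no semicolons inside the recommendation list.
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) == 3:  # training record
            symbol, recs, label = row
            if label not in VALID_LABELS:
                raise ValueError(f"unexpected return class: {label!r}")
        elif len(row) == 2:  # test record: no return class column
            symbol, recs = row
            label = None
        else:
            raise ValueError(f"expected 2 or 3 columns, got {len(row)}")
        records.append(Record(symbol, recs, label))
    return records
```

In practice one would pass the whole contents of ismis17_trainingData.csv (or ismis17_testData.csv) as `text`; validating the label set while reading catches format surprises early.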

In each record, the list of recommendations consists of one or more tips from financial experts. Each recommendation is expressed using four values enclosed in ‘{}’ brackets. The first value is an identifier of an expert. The second value gives the class of the stock predicted by the expert (‘Buy’, ‘Hold’ or ‘Sell’), and the third value expresses the expert’s expectation regarding the future return rate of the stock. It should be stressed that the expected return rates may sometimes be inconsistent and are generally less reliable than the predicted rating, because experts interpret stock quotes differently (e.g. some do not account for splits and/or dividends). Moreover, some experts do not share their expectations about the returns; such cases are denoted by NA values in the data. The fourth value quantifies the time distance to the decision date in days; e.g. a value of 5 means that the recommendation was published five days before the decision date. The list of recommendations in each record is sorted by these time distances, so it can be regarded as a time series.
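A minimal parsing sketch for one such list follows. The data description fixes the four values and the ‘{}’ brackets but not the separator inside the brackets, so the comma used here is an assumption, as are the hypothetical names `Recommendation` and `parse_recommendations`. NA expectations are mapped to `math.nan` so that they can be filtered or imputed later.

```python
import math
import re
from typing import List, NamedTuple

# Assumption: values inside each {...} are comma-separated, e.g. {expert,rating,return,days}.
_TIP_PATTERN = re.compile(r"\{([^{}]*)\}")


class Recommendation(NamedTuple):
    expert_id: str
    rating: str             # 'Buy', 'Hold' or 'Sell'
    expected_return: float  # math.nan when the expert gave no expectation (NA)
    days_before: int        # time distance to the decision date, in days


def parse_recommendations(raw: str) -> List[Recommendation]:
    """Turn a raw '{...},{...}' string into Recommendation tuples, in input order."""
    tips = []
    for body in _TIP_PATTERN.findall(raw):
        # Unpacking raises ValueError if a tip does not have exactly four values.
        expert_id, rating, ret, days = (v.strip() for v in body.split(","))
        expected = math.nan if ret == "NA" else float(ret)
        tips.append(Recommendation(expert_id, rating, expected, int(days)))
    return tips
```

Because the input order is preserved, the returned list keeps the time-series ordering of the original record.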

To further enrich the competition data, we provide a table that groups experts by the companies they work for (the file named company_expert.csv). In total, the data consist of recommendations from 2,832 experts employed at 228 different financial institutions.
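One way to use this table is to map every expert to an employer and aggregate tips per institution. The sketch below assumes the file has already been read into (company, expert) pairs; that column order and the helper names `build_expert_index` and `tips_per_company` are hypothetical. It also assumes each expert works for exactly one company and raises an error if the table contradicts that, rather than silently overwriting an entry.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_expert_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map expert id -> company id from (company_id, expert_id) pairs."""
    index: Dict[str, str] = {}
    for company, expert in pairs:
        # setdefault keeps the first company seen; a different one signals a conflict.
        if index.setdefault(expert, company) != company:
            raise ValueError(f"expert {expert!r} is listed under several companies")
    return index


def tips_per_company(expert_ids: Iterable[str], index: Dict[str, str]) -> Counter:
    """Count recommendations per company; experts missing from the index go to 'UNKNOWN'."""
    return Counter(index.get(expert, "UNKNOWN") for expert in expert_ids)
```

If the table covers exactly the experts present in the data, `len(index)` and `len(set(index.values()))` would be expected to give 2,832 and 228, which makes a cheap sanity check on the parsing.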

The test data file, ismis17_testData.csv, consists of 7,555 records. It has the same format as the training data, except that it lacks the third column with the true return classes. The task for participants is to predict the labels for the test cases. Note that the training and test data sets correspond to different time periods and that the records in both sets are given in random order.

The format of submissions: Participants are asked to predict the return classes of the records from the test set and send their predictions using the submission system. Each solution should be sent as a single text file containing exactly 7,555 lines (files with an additional empty last line will also be accepted). Consecutive lines of this file should each contain exactly one class label from the set {‘Buy’, ‘Hold’, ‘Sell’}. Solutions containing any other labels or a different number of lines will produce an evaluation error.
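A small pre-submission check can catch both error conditions before uploading. This is a sketch: `format_submission` and `write_submission` are hypothetical helpers, and the code assumes the predictions are already in the same order as the records of ismis17_testData.csv. It writes no trailing newline, so the file has exactly 7,555 lines under any reading of the rule.

```python
from pathlib import Path
from typing import Sequence

N_TEST_RECORDS = 7555
LABELS = {"Buy", "Hold", "Sell"}


def format_submission(preds: Sequence[str], n_expected: int = N_TEST_RECORDS) -> str:
    """Validate the predictions and return the file contents, one label per line."""
    if len(preds) != n_expected:
        raise ValueError(f"expected {n_expected} predictions, got {len(preds)}")
    invalid = sorted(set(preds) - LABELS)
    if invalid:
        raise ValueError(f"labels outside {{Buy, Hold, Sell}}: {invalid}")
    # No trailing newline: the file then has exactly n_expected lines.
    return "\n".join(preds)


def write_submission(preds: Sequence[str], path: str, n_expected: int = N_TEST_RECORDS) -> None:
    """Validate and write the submission file as UTF-8 text."""
    Path(path).write_text(format_submission(preds, n_expected), encoding="utf-8")
```

Keeping validation in a pure function makes it easy to test without touching the file system.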

Evaluation of results: The submitted solutions will be evaluated online, and preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a subset of the test set consisting of 1,000 records, fixed for all participants. The final evaluation will be performed after the competition ends, using the remaining part of the test data. Those results will also be published online. Note that only teams that submit a short report describing their approach before the end of the contest will qualify for the final evaluation. Moreover, to claim the awards, winners will have to provide source code (in any programming language) that allows reproducing their final solution. All the winners will be officially announced during a special session devoted to this competition at the ISMIS’17 conference (http://ismis2017.ii.pw.edu.pl/).

The assessment of solutions will be done using the accuracy (ACC) measure with an additional cost/reward matrix. Let X be the confusion matrix obtained from a vector of predictions preds, where \(X_{i,j}\) is the number of test records predicted as class i whose true class is j, and let C be the cost matrix displayed below, indexed in the same way. The accuracy is then computed as: \[ACC(preds) = \frac{\sum_{i=1}^{3} X_{i,i} \cdot C_{i,i}}{\sum_{i=1}^{3}\sum_{j=1}^{3} X_{i,j} \cdot C_{i,j}}.\]

 

The cost matrix C used for evaluation of submissions (rows: predicted class, columns: true class):
preds\truth   Buy   Hold   Sell
Buy             8      4      8
Hold            1      1      1
Sell            8      4      8
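The scoring rule can be sketched in a few lines of Python. Following the table's header, rows of both X and C are indexed by the predicted class and columns by the true class. The function names are hypothetical, and this is an illustration rather than the organizers' reference implementation.

```python
from typing import List, Sequence

CLASSES = ("Buy", "Hold", "Sell")

# Cost/reward matrix C: rows = predicted class, columns = true class.
COST = (
    (8, 4, 8),  # predicted Buy
    (1, 1, 1),  # predicted Hold
    (8, 4, 8),  # predicted Sell
)


def confusion_matrix(preds: Sequence[str], truth: Sequence[str]) -> List[List[int]]:
    """X[i][j] = number of records predicted as CLASSES[i] whose true class is CLASSES[j]."""
    if len(preds) != len(truth):
        raise ValueError("preds and truth must have the same length")
    pos = {c: i for i, c in enumerate(CLASSES)}
    X = [[0] * len(CLASSES) for _ in CLASSES]
    for p, t in zip(preds, truth):
        X[pos[p]][pos[t]] += 1
    return X


def cost_weighted_acc(preds: Sequence[str], truth: Sequence[str]) -> float:
    """ACC = sum_i X[i][i]*C[i][i] / sum_{i,j} X[i][j]*C[i][j]."""
    if not preds:
        raise ValueError("need at least one prediction")
    X = confusion_matrix(preds, truth)
    n = len(CLASSES)
    hits = sum(X[i][i] * COST[i][i] for i in range(n))
    total = sum(X[i][j] * COST[i][j] for i in range(n) for j in range(n))
    return hits / total


# Example: one correct Buy (8), one Buy on a true Sell (8), one correct Hold (1),
# one Sell on a true Hold (4) -> (8 + 1) / (8 + 8 + 1 + 4) = 9 / 21.
print(cost_weighted_acc(["Buy", "Buy", "Hold", "Sell"], ["Buy", "Sell", "Hold", "Hold"]))
```

Because every cell in the Hold row equals 1, a Hold prediction adds only 1 to the denominator (and 1 to the numerator when correct), whereas Buy and Sell predictions add 4 or 8. Predicting Hold for every record therefore scores exactly the fraction of records whose true class is Hold.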

 

For the participants’ convenience, we provide an exemplary solution file, exemplary_solution.csv, as a reference.

ISMIS'17 Data Mining Competition: Trading Based on Recommendations is over. We would like to sincerely thank all participants for their contribution and support!

We are happy to announce that the competition attracted a total of 159 teams, of which 73 were active and submitted at least one solution to the leaderboard. The total number of submissions was 2,570, but unfortunately none of the solutions marked as final exceeded our baseline (which in itself tells us a lot about the problem). Thank you for your effort!

The official Winners:

  1. Mathurin Ache (team mathurin), France
  2. Łukasz Siemaszko (team bongod), Poland

We have already sent invitations to the 10 teams that submitted the most interesting descriptions of their approaches, asking them to extend their reports into papers for a special session at ISMIS 2017. Those submissions will go through a special peer-review track, and all accepted papers will be published in the conference proceedings. All other teams are also welcome to submit extended descriptions of their approaches to ISMIS 2017; however, their papers will undergo the regular review process by members of the conference’s Program Committee and might be moved to a different session (e.g. the industrial track). The regular paper submission system is available here: https://easychair.org/conferences/?conf=ismis2017

For all of you who would like to continue research related to the competition, we plan to release all of the data used (including the labels for the test set). You will be able to find it here within a few weeks.

  • November 22, 2016: start of the competition; data sets become available,
  • January 22, 2017 (23:59 GMT): deadline for submitting the predictions and the reports (the deadline for submitting reports has been extended until January 27),
  • February 1, 2017 (originally January 29): end of the challenge, online publication of final results, sending invitations for submitting papers for the special session at ISMIS 2017,
  • late February 2017: deadline for submitting papers describing the selected solutions to the special session at ISMIS 2017.

The teams with the two top-ranked solutions (based on the final evaluation scores, counting only solutions that satisfy the terms and conditions of the ISMIS 2017 Data Mining Competition) will be awarded prizes funded by our sponsors:

  • First Prize: 1000 USD + one free ISMIS 2017 conference registration,
  • Second Prize: 500 USD + one free ISMIS 2017 conference registration.

The award ceremony will take place during the ISMIS 2017 conference (June 26-29, 2017, Warsaw, Poland).

Andrzej Janusz, University of Warsaw - Chair

Kamil Żbikowski, Warsaw University of Technology & mBank S.A. - Chair

Piotr Gawrysiak, Warsaw University of Technology & mBank S.A.

Marzena Kryszkiewicz, Warsaw University of Technology

Henryk Rybiński, Warsaw University of Technology

Dominik Ślęzak, University of Warsaw & Infobright

Discussion | Author | Replies | Last post
publication test set labels | Andrzej | 0 | by Andrzej, Sunday, February 26, 2017, 11:39:18
For the future ( continued ) | Marek | 0 | by Marek, Thursday, February 02, 2017, 14:03:54
The final results were published | Andrzej | 1 | by Dymitr, Wednesday, February 01, 2017, 21:22:08
The final results were published | Andrzej | 0 | by Andrzej, Wednesday, February 01, 2017, 20:49:31
For the future | Marek | 1 | by Andrzej, Wednesday, February 01, 2017, 20:45:06
Extended deadline | Łukasz | 0 | by Łukasz, Wednesday, January 25, 2017, 09:07:42
Report sending deadline extended | Andrzej | 0 | by Andrzej, Tuesday, January 24, 2017, 16:13:47
problems with the evaluation system are fixed | Andrzej | 0 | by Andrzej, Friday, January 20, 2017, 08:39:46
Did not see score appears in the submission board | Marek | 2 | by Andrzej, Friday, January 20, 2017, 08:33:47
start of the last week | Andrzej | 0 | by Andrzej, Monday, January 16, 2017, 15:40:37
the final week | Andrzej | 0 | by Andrzej, Monday, January 16, 2017, 14:54:32
Confuxion Matrix and Cost Matrix | Raji | 1 | by Andrzej, Monday, January 16, 2017, 12:06:40
Where is my submission | Quang-Vinh | 1 | by Quang-Vinh, Friday, January 13, 2017, 11:35:11
terms and schedule | Przemek | 1 | by Przemek, Thursday, January 12, 2017, 08:15:18
ACC reference implementation | Alexey | 0 | by Alexey, Friday, January 13, 2017, 14:39:01
Happy New Year 2017! | Marek | 1 | by Andrzej, Sunday, January 01, 2017, 14:43:32
Third column of the training set | Christos | 1 | by Andrzej, Thursday, December 29, 2016, 08:47:27
How to form a team | janpreet | 3 | by Andrzej, Wednesday, December 21, 2016, 08:31:19
How to delete the submitted results | vinayakumar | 1 | by vinayakumar, Saturday, December 03, 2016, 05:06:04
Third value of a recommendation | Christos | 1 | by Andrzej, Thursday, December 01, 2016, 09:19:11
Timing of Training and Test Sets | Dymitr | 1 | by Dymitr, Sunday, November 27, 2016, 05:25:02
Welcome to ISMIS 2017 Data Mining Competition! | Andrzej | 0 | by Andrzej, Tuesday, November 22, 2016, 17:47:16
ISMIS 2017 Data Mining Competition | Andrzej | 0 | by Andrzej, Tuesday, November 22, 2016, 14:46:46