
AAIA'16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines

AAIA'16 Data Mining Challenge is the third data mining competition associated with the International Symposium on Advances in Artificial Intelligence and Applications (AAIA'16, https://fedcsis.org/2016/aaia), which is a part of the FedCSIS conference series. This time, the task is to predict periods of increased seismic activity which may cause life-threatening accidents in underground coal mines. Prizes worth over 3,000 USD will be awarded to the most successful teams. The contest is sponsored by Research and Development Centre EMAG (http://ibemag.pl) with support from Polish Information Processing Society (http://www.pti.org.pl/) and Dituel Sp. z o.o. (http://www.dituel.pl/).

Overview

Ensuring the safety of miners working underground is a fundamental requirement for the coal mining industry in Poland. Coal mining companies are obligated by law to introduce many safety measures to secure proper working conditions for their underground personnel. However, safety monitoring systems based on expert knowledge sometimes fail to foresee dangerous seismic events which have disastrous consequences. In this data mining competition, we would like to address this challenging problem. In particular, we ask participants to come up with reliable methods for predicting periods of increased seismic activity perceived in longwalls of a coal mine.

More details regarding the task and a description of the competition data can be found in the Task Description section.

Special session at AAIA'16: A special session devoted to the competition will be held at the conference. We will invite authors of selected reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be treated as short papers and will be indexed by IEEE Digital Library and Web of Science. The invited teams will be chosen based on their final rank, innovativeness of their approach and quality of the submitted report.

If you have any questions, please post on the competition forum or write us an email: contact@knowledgepit.ml

References:

  1. A. Janusz, M. Grzegorowski, M. Michalak, Ł. Wróbel, M. Sikora, D. Ślęzak: "Predicting seismic events in coal mines based on underground sensor measurements", Eng. Appl. of AI, Vol. 64, Elsevier, pp. 83-94, 2017
  2. A. Janusz, M. Sikora, Ł. Wróbel, S. Stawicki, M. Grzegorowski, P. Wojtas, D. Ślęzak: "Mining Data from Coal Mines: IJCRS’15 Data Challenge", in Proceedings of RSFDGrC 2015, LNAI 9437, Springer, 2015
  3. M. Kozielski, A. Skowron, Ł. Wróbel, M. Sikora: "Regression Rule Learning for Methane Forecasting in Coal Mines", Beyond Databases, Architectures, and Structures, CCIS, Vol. 521, Springer International Publishing, pp. 495-504, 2015

AAIA'16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines is over. We would like to thank all participants for their contribution!

We are happy to announce that the competition attracted a total of 203 teams, of which 106 were active and submitted at least one solution to the leaderboard. The total number of submissions was 3135, which gives us great material for research. Thanks!

The official winners:

  1. Michał Tadeusiak (team tadeusz) from Deepsense.io, Poland
  2. Robert Bogucki, Jan Lasek, Jan Kanty Milczek, Michał Tadeusiak (team deepsense.io) from Deepsense.io, Poland
  3. Yasser Tabandeh (team yata) from Golgohar Mining & Industrial Company, Iran

At the beginning of next week, we will start sending invitations to extend the reports into short papers to be submitted to the AAIA’16 conference. Those submissions will go through a special peer-review track, and all accepted papers will be assigned to a special session devoted to the competition. All other teams are also welcome to submit extended descriptions of their approaches to AAIA’16 (the submission system is available here: https://www.fedcsis.org/hotcrp/); however, their papers will undergo the regular reviewing process by members of the event’s Program Committee.

All competition data were made available in the Data files folder (including the labels for the test set, as well as the indexes of objects from the preliminary evaluation set). The data set is free to use for non-commercial purposes; however, if you decide to use it in your post-competition research, please add references to related papers describing the scope of the challenge and a link to the Knowledge Pit platform. Below is a sample list of related papers (it will be extended):

  • Janusz, A., Grzegorowski, M., Michalak, M., Wróbel, Ł., Sikora, M., Ślęzak, D.: Predicting seismic events in coal mines based on underground sensor measurements, Eng. Appl. of AI, Vol. 64, Elsevier, pp. 83-94, 2017
  • Janusz, A., Sikora, M., Wróbel, Ł., Stawicki, S., Grzegorowski, M., Wojtas, P., Ślęzak, D.: Mining Data from Coal Mines: IJCRS’15 Data Challenge. In: Proceedings of RSFDGrC 2015: 429-438, LNAI, vol. 9437. Springer (2015)
  • Janusz, A., Sikora, M., Wróbel, Ł., Ślęzak, D.: Predicting Dangerous Seismic Events: AAIA16 Data Mining Challenge. In: Proceedings of FedCSIS 2016, IEEE

To access the competition data you need to be logged in and enrolled in the challenge.

Contest Participation Rules:

  • The competition is open to all interested researchers, specialists and students. Only members of the Contest Organizing Committee cannot participate.
  • Participants may submit solutions as teams made up of one or more persons.
  • Each team needs to designate a leader responsible for communication with the Organizers. A single person can be a leader of only one team. Only the leader needs to be registered at Knowledge Pit and be enrolled for the competition.
  • One person may be a member of at most 3 teams.
  • Each team needs to be composed of a different set of persons.
  • The total number of submissions stored at Knowledge Pit for any single team is not limited.
  • During the competition, teams that fulfill the requirements stated in the Task Description section may obtain access to additional training data sets. It is forbidden to share the obtained data with other teams participating in the competition or to provide access to this data to any third parties without written consent from the organizers.
  • The winner of the competition is chosen on the basis of the final evaluation results. In case of ties in the evaluation scores, the time of the submission will be taken into account.
  • Each team is obliged to provide a short report describing their final solution. Reports must contain information such as the name of the team, the names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Their length should not exceed 2000 words and they should be submitted in PDF format using our submission system by February 29, 2016. Only submissions made by teams that provided good quality reports will qualify for the final evaluation.
  • Organizers may reject any submission if they suspect that it was produced in an unfair way or was submitted by a team which has broken the competition rules.
  • By enrolling in this competition you grant the organizers the right to process your submissions and reports for the purpose of evaluation and post-competition research.
Rank  Team Name           Is Report  Preliminary Score  Final Score                               Submissions
1     tadeusz             True       0.9304             0.939321                                  2
2     deepsense.io        True       0.9452             0.938438                                  2
3     yata                True       0.9444             0.934233                                  2
4     podludek            True       0.9304             0.933630                                  2
5     jellyfish           True       0.9320             0.933578                                  2
6     millcheck           True       0.9233             0.932928                                  2
7     kkurach             True       0.9316             0.931173                                  2
8     gabd                True       0.9274             0.929920                                  2
9     basakesin           True       0.9310             0.929715                                  2
10    marcb               True       0.9248             0.924586                                  2
11    nitekna             True       0.9276             0.924578                                  2
12    0bartek             True       0.9199             0.923782                                  2
13    tpn                 True       0.9349             0.923655                                  2
14    doxus               True       0.9270             0.922539                                  2
15    mgh                 True       0.9316             0.922077                                  2
16    unnamed             True       0.9326             0.921515                                  2
17    kd                  True       0.9158             0.918470                                  2
18    claygirl            True       0.9337             0.915997                                  2
19    andrii_babii        True       0.9335             0.915559                                  2
20    nttg                True       0.9323             0.915507                                  2
21    filip               True       0.9154             0.915477                                  2
22    quzar               True       0.9214             0.914302                                  2
23    dymitrruta          True       0.9342             0.913221                                  2
24    minio7              True       0.9187             0.913043                                  2
25    boocheck            True       0.9320             0.912792                                  2
26    jan                 True       0.9452             0.907209                                  2
27    dotjabber           True       0.9082             0.905441                                  2
28    kp                  True       0.9254             0.903444                                  2
29    raven0093           True       0.9291             0.899754                                  2
30    psilocybe           True       0.9177             0.898173                                  2
31    kp7                 True       0.9270             0.893623                                  2
32    tomakur669          True       0.9092             0.890945                                  2
33    kneefer             True       0.9217             0.889064                                  2
34    ketjow4             True       0.9151             0.883797                                  2
35    vabi                True       0.9283             0.883599                                  2
36    mathurin            True       0.9283             0.875353                                  2
37    zagorecki           True       0.9354             0.872301                                  2
38    flac                True       0.9207             0.869977                                  2
39    pearman             True       0.9208             0.865146                                  2
40    fox                 True       0.9160             0.856220                                  2
41    louco               True       0.9253             0.855794                                  2
42    palciu              True       0.9240             0.826096                                  2
43    camilius            True       0.7743             0.769713                                  2
44    grzegorzkozlowski   True       0.9243             0.765469                                  2
45    dumpling            True       0.9330             0.758928                                  2
46    researchlabs        True       0.7296             0.699813                                  2
47    dubskrzak           False      0.9328             No report file found or report rejected.  2
48    opi                 False      0.9299             No report file found or report rejected.  2
49    rough               False      0.9278             No report file found or report rejected.  2
50    jackmark            False      0.9268             No report file found or report rejected.  2
51    shader93            False      0.9261             No report file found or report rejected.  2
52    cpp11               False      0.9220             No report file found or report rejected.  2
53    eltrollado          False      0.9182             No report file found or report rejected.  2
54    kelexu              False      0.9173             No report file found or report rejected.  2
55    420                 False      0.9177             No report file found or report rejected.  2
56    quariante2          False      0.9136             No report file found or report rejected.  2
57    tekbar              False      0.9134             No report file found or report rejected.  2
58    kaser               False      0.9129             No report file found or report rejected.  2
59    zgorzal             False      0.9127             No report file found or report rejected.  2
60    asd.idpl@gmail.com  False      0.9125             No report file found or report rejected.  2
61    snm                 False      0.9092             No report file found or report rejected.  2
62    asdfasdf            False      0.9051             No report file found or report rejected.  2
63    quariante           False      0.9238             No report file found or report rejected.  2
64    gibek               False      0.9007             No report file found or report rejected.  2
65    ssiedits1           False      0.9007             No report file found or report rejected.  2
66    mucha               False      0.9075             No report file found or report rejected.  2
67    krzaq               False      0.8899             No report file found or report rejected.  2
68    mix2ra              False      0.8774             No report file found or report rejected.  2
69    belegus             False      0.9054             No report file found or report rejected.  2
70    barcelonczyk        False      0.8322             No report file found or report rejected.  2
71    tomaszrzepka        False      0.8292             No report file found or report rejected.  2
72    pg7799              False      0.8153             No report file found or report rejected.  2
73    robson92            False      0.8248             No report file found or report rejected.  2
74    tmacelko            False      0.7934             No report file found or report rejected.  2
75    dawid.poloczek      False      0.7987             No report file found or report rejected.  2
76    rs                  False      0.7862             No report file found or report rejected.  2
77    domin               False      0.7840             No report file found or report rejected.  2
78    lab                 False      0.7724             No report file found or report rejected.  2
79    mr.nimelo           False      0.7724             No report file found or report rejected.  2
80    0000                False      0.7624             No report file found or report rejected.  2
81    ashish              False      0.6081             No report file found or report rejected.  2
82    kelog               False      0.6405             No report file found or report rejected.  2
83    hieuvq              False      0.5812             No report file found or report rejected.  2
84    pvanhth             False      0.7502             No report file found or report rejected.  2
85    chlam               False      0.5501             No report file found or report rejected.  2
86    nbelacel            False      0.5395             No report file found or report rejected.  2
87    piotrmaciag         False      0.5552             No report file found or report rejected.  2
88    nguye231            False      0.5331             No report file found or report rejected.  2
89    henryteo            False      0.5314             No report file found or report rejected.  2
90    reksio              False      0.5030             No report file found or report rejected.  2
91    ogencoglu           False      0.5015             No report file found or report rejected.  2
92    sohrab              False      0.6541             No report file found or report rejected.  2
93    mg320637            False      0.0000             No report file found or report rejected.  2
94    schredis            False      0.8796             No report file found or report rejected.  2
95    ens                 False      0.5805             No report file found or report rejected.  2
96    mikes               False      0.0000             No report file found or report rejected.  2
97    shad                False      0.9007             No report file found or report rejected.  2
98    besin               False      0.0000             No report file found or report rejected.  2

The data sets for this competition are provided in a tabular format. The training data set, trainingData.csv, and the corresponding labels, trainingLabels.csv, were compressed into a single archive, trainingData.7z, which can be downloaded from the Data files section after successful enrollment in the competition. In total, the training file contains 79,893 records, each corresponding to 24 hours of measurements. Values stored in a single record can be divided into two parts. The first part consists of an identifier of the main working site and 12 other characteristics related to the whole 24-hour period described by the record. The second part is composed of hourly aggregated measurements, so for each characteristic it includes 24 consecutive values. The data contain 541 columns in total (including the main working site id). A separate file, working_site_metadata.csv, is also available, with additional information about all main working sites included in the data (in both the training and test parts). We also provide a separate file with the names of columns (attributes) in the training and test data sets.
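The column count above can be checked with a little arithmetic: 1 site id, 12 per-record characteristics, and 24 hourly values for each remaining characteristic add up to 541 only if there are 22 hourly characteristics. The sketch below shows that calculation and one possible way to split a record. The column order it uses (site id, then the 12 per-record values, then 24 consecutive values per hourly characteristic) and the name `split_record` are assumptions for illustration; the column-name file shipped with the data is the authoritative source.

```python
# Column layout implied by the description:
# 1 site id + 12 per-record characteristics + K hourly characteristics x 24 hours.
N_COLUMNS = 541
N_STATIC = 12
HOURS = 24

# Solve for the number of hourly characteristics K.
N_HOURLY = (N_COLUMNS - 1 - N_STATIC) // HOURS


def split_record(row):
    """Split one 541-value record into (site_id, static, hourly).

    ASSUMPTION: the site id comes first, then the 12 per-record values,
    then 24 consecutive values for each hourly characteristic.
    """
    if len(row) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} values, got {len(row)}")
    site_id = row[0]
    static = row[1:1 + N_STATIC]
    flat = row[1 + N_STATIC:]
    # One list of 24 values per hourly characteristic.
    hourly = [flat[i * HOURS:(i + 1) * HOURS] for i in range(N_HOURLY)]
    return site_id, static, hourly
```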

Labels in the data indicate whether the total seismic energy perceived within 8 hours after the period covered by a data record exceeds the warning threshold (i.e. 5*10^4 joules). The labels for the training data are provided in separate files. The test data file, testData.7z, is in the same format as the training data set; however, the labels for the test series are hidden from participants. It is important to note that time periods in the test data do not overlap and they are given in a random order.
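The labeling rule can be written down as a minimal sketch. The input `future_hourly_energy` (hourly energy totals in joules for the hours right after a record) and the label name 'normal' are hypothetical; the description only names the 'warning' label and does not say how the organizers computed the totals.

```python
WARNING_THRESHOLD_J = 5 * 10**4  # warning threshold from the task description
HORIZON_HOURS = 8                # window that follows the 24-hour record


def label_record(future_hourly_energy):
    """Return 'warning' if the total energy in the next 8 hours exceeds the threshold.

    ASSUMPTION: `future_hourly_energy` is a list of hourly seismic energy
    totals in joules, and 'normal' is the name of the non-warning label.
    """
    total = sum(future_hourly_energy[:HORIZON_HOURS])
    return "warning" if total > WARNING_THRESHOLD_J else "normal"
```

Note that the comparison is strict: a total of exactly 5*10^4 joules does not exceed the threshold.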

Additional training data: Apart from the initial training data set, there are also 4 smaller data sets which can be obtained by teams actively participating in the challenge. These sets can be used as additional training data. All the new data records come from the period between the base training set and the test set and they are arranged chronologically, i.e. the first of the additional sets comes from the period right after the training data, the second set comes from the period right after the first set, and so on. The last of the additional sets corresponds to the period right before the test data period.

We will provide access to these sets to participating teams based on their number of submitted solutions. Each team is required to make at least 10 correctly formatted submissions to get access to a single data file, so to get access to all additional data a team needs to submit at least 40 solutions. Any solutions that turn out to be unrelated to the task in the challenge (those that produce evaluation errors or are essentially random) will not be taken into account when deciding eligibility to obtain the additional files. Moreover, each of the additional files will be released at a different time:

  • additional_training_data_1.csv on Oct. 26, 2015
  • additional_training_data_2.csv on Nov. 23, 2015
  • additional_training_data_3.csv on Dec. 21, 2015
  • additional_training_data_4.csv on Jan. 18, 2016

No team participating in the competition can access an additional data file earlier than on its release date. We will be checking the eligibility to obtain particular data files periodically – every second Monday, beginning from the first release date.
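As a rough illustration of the access rules, the sketch below lists which additional files a team could hold on a given day. It assumes that files are unlocked in order, one per 10 valid submissions, and it deliberately ignores the periodic eligibility checks and any later general release; the function name `accessible_files` is hypothetical.

```python
from datetime import date

# Release dates as listed in the task description.
RELEASE_DATES = [
    ("additional_training_data_1.csv", date(2015, 10, 26)),
    ("additional_training_data_2.csv", date(2015, 11, 23)),
    ("additional_training_data_3.csv", date(2015, 12, 21)),
    ("additional_training_data_4.csv", date(2016, 1, 18)),
]
SUBMISSIONS_PER_FILE = 10


def accessible_files(valid_submissions, today):
    """List the additional files a team could access on `today`.

    ASSUMPTION: files unlock in order, one per 10 valid submissions.
    Simplification: the every-second-Monday eligibility checks are ignored.
    """
    earned = valid_submissions // SUBMISSIONS_PER_FILE
    return [
        name
        for i, (name, released) in enumerate(RELEASE_DATES)
        if i < earned and today >= released
    ]
```

For instance, a team with 25 valid submissions on December 1, 2015 would hold the first two files under these assumptions.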

Furthermore, all the additional data will become available to all participants in the last few days of the competition.

Format of submissions: The participants of the competition are asked to predict the likelihood of the label 'warning' for the records from the test set and send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 3,860 lines (files with an additional empty last line will also be accepted). Each consecutive line of this file should contain exactly one real number corresponding to the predicted likelihood. The values do not need to be in a particular range; however, higher numerical values should indicate a higher chance of the label 'warning'.
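A minimal sketch of producing a file in this format is given below. The helper name `format_submission` is hypothetical, and the check for finite values is an extra precaution rather than a stated rule.

```python
import math

N_TEST_RECORDS = 3860  # required number of lines in a submission


def format_submission(scores):
    """Return the submission text: one real number per line, 3,860 lines."""
    if len(scores) != N_TEST_RECORDS:
        raise ValueError(f"expected {N_TEST_RECORDS} scores, got {len(scores)}")
    # Extra precaution (not a stated rule): reject NaN and infinite values.
    if not all(math.isfinite(s) for s in scores):
        raise ValueError("scores must be finite real numbers")
    return "\n".join(repr(float(s)) for s in scores) + "\n"
```

The returned text can then be written to any file, for example with `open("submission.txt", "w").write(format_submission(scores))`; the file name here is arbitrary.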

Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a subset of the test set, fixed for all participants. It will correspond to approximately 25% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session devoted to this competition, which will be organized at the AAIA'16 conference (https://fedcsis.org/2016/aaia). The assessment of solutions will be done using the Area Under the ROC Curve (AUC) measure.
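For local validation, AUC can be computed from ranks alone. The sketch below uses the standard rank-sum (Mann-Whitney U) identity with average ranks for tied scores; it is an illustration of the measure, not the organizers' evaluation code, and it assumes labels are encoded as 1 for 'warning' and 0 otherwise.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for 'warning', 0 otherwise (assumed encoding).
    scores: higher values mean a higher chance of 'warning'.
    Tied scores receive the average of the ranks they span.
    """
    n = len(labels)
    if n != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one record of each class")

    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Because only the ordering of the scores enters the computation, any monotonically increasing rescaling of a submission yields the same AUC, which is why submitted values do not need to lie in a particular range.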

  • October 5, 2015: start of the competition, data sets and description become available,
  • February 27, 2016: deadline for submitting the predictions,
  • March 4, 2016, 23:59 GMT (extended): deadline for sending the reports, end of the challenge,
  • March 2016: on-line publication of final results, sending invitations for submitting short papers for the special session at FedCSIS'16,
  • April 2016: deadline for submissions of papers describing the selected solutions,
  • April 2016: deadline for submissions of camera-ready papers selected for presentation at FedCSIS'16.

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by our sponsors:

  • First Prize: 1000 USD + one free FedCSIS'16 conference registration,
  • Second Prize: 500 USD + one free FedCSIS'16 conference registration,
  • Third Prize: one free FedCSIS'16 conference registration.

The award ceremony will take place during the FedCSIS'16 conference (Sep 11-14, 2016, Gdańsk, Poland).

Andrzej Janusz, University of Warsaw

Marek Sikora, Institute of Innovative Technologies EMAG

Łukasz Wróbel, Institute of Innovative Technologies EMAG

Sebastian Stawicki, University of Warsaw

Marek Grzegorowski, University of Warsaw

Sinh Hoa Nguyen, University of Warsaw

Dominik Ślęzak, University of Warsaw & Infobright Inc.

Discussion                                              Author    Replies  Last post
HOW CAN I DOWNLOAD THE DATA                             Yu        2        by Andrzej, Tuesday, August 27, 2019, 13:11:00
Labels for the final test set                           Jan       1        by Jan, Monday, March 07, 2016, 10:55:44
Extended deadline for sending the reports               Andrzej   0        by Andrzej, Sunday, February 28, 2016, 23:35:16
Submission sending interval                             Andrzej   0        by Andrzej, Thursday, February 25, 2016, 22:15:45
The last few days of AAIA'16 Data Mining Challenge      Andrzej   0        by Andrzej, Monday, February 22, 2016, 11:52:35
The last few days of the AAIA'16 Data Mining Challenge  Andrzej   0        by Andrzej, Monday, February 22, 2016, 11:51:23
Problem with uploading prediction file                  Kamil     2        by Kamil, Friday, February 19, 2016, 10:20:04
Scheduled maintenance of our service                    Andrzej   0        by Andrzej, Thursday, February 18, 2016, 13:50:19
Maintenance of our server                               Andrzej   0        by Andrzej, Thursday, February 18, 2016, 13:48:19
Availability of additional data                         Andrzej   0        by Andrzej, Monday, February 15, 2016, 19:49:12
Allowed to find a teammate here?                        Kele      3        by Kele, Saturday, January 23, 2016, 12:41:10
The last of additional training data sets               Andrzej   0        by Andrzej, Monday, January 18, 2016, 20:06:59
Coal mine structure and additional data                 Dymitr    2        by Dymitr, Thursday, January 07, 2016, 22:02:36
Frequency of warning signals                            Jan       6        by Jan, Tuesday, January 19, 2016, 13:28:26
The third subset of additional data                     Andrzej   0        by Andrzej, Monday, December 21, 2015, 15:31:10
The third subset of additional data                     Andrzej   0        by Andrzej, Monday, December 21, 2015, 15:11:31
The second subset of additional training data           Andrzej   0        by Andrzej, Tuesday, November 24, 2015, 15:15:20
Sample Submission                                       Kele      1        by Kele, Friday, November 06, 2015, 23:48:03
Additional training data                                Andrzej   0        by Andrzej, Monday, October 26, 2015, 19:58:08
Additional training data                                Andrzej   6        by Andrzej, Monday, October 26, 2015, 19:56:09
Availability of reference papers                        Himanshu  4        by Andrzej, Thursday, October 29, 2015, 17:11:22
Maintenance of Knowledge Pit’s server                   Andrzej   0        by Andrzej, Monday, October 26, 2015, 13:10:21
Maintenance of Knowledge Pit’s server                   Andrzej   0        by Andrzej, Monday, October 26, 2015, 13:07:28
Submission problem                                      Bartosz   7        by Andrzej, Tuesday, October 20, 2015, 11:34:23
Leaderboard is functional again                         Andrzej   1        by Andrzej, Friday, October 16, 2015, 19:06:55
Desired level of AUC                                    Jan       2        by Łukasz, Sunday, October 18, 2015, 00:14:00
Leaderboard evaluation                                  Eftim     1        by Eftim, Saturday, October 10, 2015, 21:15:38
Some explanation of data fields                         Jan       4        by Jan, Thursday, October 22, 2015, 12:55:29
Submissions and leaderboard sites are down              Eftim     17       by Adam, Friday, October 16, 2015, 18:59:52
Start of AAIA'16 Data Mining Challenge                  Andrzej   0        by Andrzej, Monday, October 05, 2015, 01:46:09