IJCRS'15 Data Challenge: Mining Data from Coal Mines

IJCRS'15 Data Challenge: Mining Data from Coal Mines is a competition organized within the framework of the 2015 International Joint Conference on Rough Sets (IJCRS'15). It continues the tradition of data mining challenges associated with rough set conferences. This time, the task concerns monitoring and predicting dangerous concentrations of methane in the longwalls of a Polish coal mine. The competition is sponsored by the Research and Development Centre EMAG (https://ibemag.pl) with support from the International Rough Set Society.

Overview

Coal mining requires working in hazardous conditions. Miners in an underground coal mine can face several threats, such as methane explosions or rock bursts. To protect people working underground, systems for active monitoring of production processes are typically used. One of their fundamental applications is screening for dangerous gas concentrations (methane in particular) in order to prevent spontaneous explosions [1]. For that purpose, the ability to predict dangerous gas concentrations in the near future can be even more important than monitoring the current sensor readings [2].

We would like to address this particular problem in IJCRS'15 Data Challenge: Mining Data from Coal Mines. More details regarding the task and a description of the competition data can be found in the Task Description section.

Special session at IJCRS'15: A special session devoted to the competition will be held at the conference. We will invite the authors of selected reports to extend them for publication in the conference proceedings (after review by members of the Organizing Committee) and for presentation at the conference. The publications will be treated as regular papers. The invited teams will be chosen based on their final rank, the innovativeness of their approach, and the quality of the submitted report.

If you have any questions, please post them on the competition forum or email us at webmaster@knowledgepit.fedcsis.org

References:

  1. M. Kozielski, A. Skowron, Ł. Wróbel, M. Sikora: “Regression Rule Learning for Methane Forecasting in Coal Mines”, Beyond Databases, Architectures, and Structures, CCIS, Vol. 521, Springer International Publishing, pp. 495–504, 2015
  2. A. Krasuski, A. Jankowski, A. Skowron, D. Ślęzak: “From sensory data to decision making: A perspective on supporting a fire commander”, 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Vol. 3, IEEE, pp. 229–236, 2013
  3. H.S. Nguyen: “On Efficient Handling of Continuous Attributes in Large Data Bases”, Fundamenta Informaticae, Vol. 48(1), pp. 61–81, 2001
  4. J.W. Grzymala-Busse: “A New Version of the Rule Induction System LERS”, Fundamenta Informaticae, Vol. 31, pp. 27–39, 1997
  5. L.S. Riza, A. Janusz, C. Bergmeir, C. Cornelis, F. Herrera, D. Ślęzak, J.M. Benítez: “Implementing Algorithms of Rough Set Theory and Fuzzy Rough Set Theory in the R Package ‘RoughSets’”, Information Sciences, Vol. 287, pp. 68–89, 2014

IJCRS'15 Data Challenge: Mining Data from Coal Mines has ended and we are proud to announce the winners:

  1. Adam Zagorecki (team zagorecki) from Cranfield University, United Kingdom
  2. Marc Boulle (team marcb) from Orange Labs, France
  3. Dymitr Ruta (team dymitrruta) from EBTIC, Khalifa University, United Arab Emirates

Congratulations!

All competition data have been made available in the Data files folder (including the labels for the test set and the indices of objects from the preliminary evaluation set). The data set is free to use for non-commercial purposes; however, if you use it in your post-competition research, please cite the related papers describing the scope of the challenge and include a link to the Knowledge Pit platform. Below is a list of related papers (it will be extended):

  • Janusz, A., Sikora, M., Wróbel, Ł., Stawicki, S., Grzegorowski, M., Wojtas, P., Ślęzak, D.: Mining Data from Coal Mines: IJCRS’15 Data Challenge. In: Proceedings of RSFDGrC 2015: 429-438, LNAI, vol. 9437. Springer (2015)
  • Janusz, A., Sikora, M., Wróbel, Ł., Ślęzak, D.: Predicting Dangerous Seismic Events: AAIA16 Data Mining Challenge. In: Proceedings of FedCSIS 2016, IEEE (in print, September 2016)

To access the competition data, you need to be logged in. If you have not registered at Knowledge Pit yet, you can create an account using this link: https://knowledgepit.fedcsis.org/login/signup.php?

Rank  Team Name          Report  Preliminary Score  Final Score                                Submissions
1     zagorecki          True    0.9666             0.959267                                   2
2     mgrzegorowski      True    0.9327             0.947334                                   2
3     marcb              True    0.9479             0.943929                                   2
4     dymitrruta         True    0.9487             0.943699                                   2
5     kkurach_kp7        True    0.9685             0.940024                                   2
6     max25              True    0.9603             0.937775                                   2
7     kkurach            True    0.9591             0.936714                                   2
8     nitekna            True    0.9255             0.935814                                   2
9     seba91             True    0.9495             0.934909                                   2
10    archie2            True    0.9350             0.931817                                   2
11    wds                True    0.9460             0.931256                                   2
12    tdziopa            True    0.9288             0.925846                                   2
13    katarzynki         True    0.9269             0.917891                                   2
14    fzero              True    0.6469             0.581214                                   2
15    ayerdi             True    0.5954             0.580494                                   2
16    toczacypaczek      False   0.9484             No report file found or report rejected.   2
17    trzewior           False   0.9469             No report file found or report rejected.   2
18    weczer             False   0.9460             No report file found or report rejected.   2
19    artyr              False   0.9430             No report file found or report rejected.   2
20    auroree            False   0.9417             No report file found or report rejected.   2
21    tmonq              False   0.9395             No report file found or report rejected.   2
22    krecik             False   0.9395             No report file found or report rejected.   2
23    lab                False   0.9386             No report file found or report rejected.   2
24    krecik1            False   0.9377             No report file found or report rejected.   2
25    tomabar728         False   0.9367             No report file found or report rejected.   2
26    mavax              False   0.9363             No report file found or report rejected.   2
27    zaggy              False   0.9361             No report file found or report rejected.   2
28    adrzeniek          False   0.9356             No report file found or report rejected.   2
29    grzywna            False   0.9351             No report file found or report rejected.   2
30    archie             False   0.9347             No report file found or report rejected.   2
31    kebab48            False   0.9327             No report file found or report rejected.   2
32    annaokon           False   0.9319             No report file found or report rejected.   2
33    pbombik            False   0.9318             No report file found or report rejected.   2
34    mateusz            False   0.9314             No report file found or report rejected.   2
35    krzysiek91         False   0.9313             No report file found or report rejected.   2
36    nikusiaczek        False   0.9306             No report file found or report rejected.   2
37    leo                False   0.9301             No report file found or report rejected.   2
38    buf                False   0.9299             No report file found or report rejected.   2
39    moomean            False   0.9286             No report file found or report rejected.   2
40    artukoz021         False   0.9316             No report file found or report rejected.   2
41    pat_sc             False   0.9233             No report file found or report rejected.   2
42    alesew8368         False   0.9192             No report file found or report rejected.   2
43    agabrys            False   0.9039             No report file found or report rejected.   2
44    mateo081           False   0.9007             No report file found or report rejected.   2
45    bfrackowiak        False   0.9001             No report file found or report rejected.   2
46    baseline_solution  False   0.8930             No report file found or report rejected.   2
47    kp7                False   0.8631             No report file found or report rejected.   2
48    seba92             False   0.8499             No report file found or report rejected.   2
49    sohrab             False   0.8119             No report file found or report rejected.   2
50    fzero2             False   0.6458             No report file found or report rejected.   2
51    reksio             False   0.5002             No report file found or report rejected.   2

Data format: The time series data sets for this competition are provided in a tabular format. For the convenience of participants, the training data set was divided into five smaller chunks, namely trainingData1.csv, ..., trainingData5.csv. Those files were compressed into a single archive, trainingData.7z, which can be downloaded from the Data files section after enrolling in the competition. In total, the files contain sensor readings for 51,700 time periods, each 10 minutes long, with measurements taken every second (600 values for every sensor in a single series). The values for each time period are stored in a separate row. The data include readings from 28 different sensors; thus, every row consists of 28 × 600 = 16,800 values stored in consecutive columns and separated by commas. The names of the data columns, which identify the sensors, are provided in a separate file, column_names.txt. Descriptions of the types of sensors used for monitoring the mining process are given in sensor_descriptions.txt, and their placement in the corridors of the mine is shown in the provided mining process scheme (mining_process_scheme.png). The time periods in the training data overlap and are given in chronological order.
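A minimal Python sketch of reading one training chunk row by row and splitting each 16,800-value row into 28 series of 600 readings. It assumes, which is not stated above, that the columns are grouped sensor by sensor in the order listed in column_names.txt; please verify that against the file. The helper names split_row and read_periods are hypothetical.

```python
import csv

N_SENSORS = 28
READINGS_PER_PERIOD = 600  # one reading per second over a 10-minute period


def split_row(values, n_sensors=N_SENSORS, length=READINGS_PER_PERIOD):
    """Split one flat row of 16,800 readings into per-sensor series.

    ASSUMPTION: columns 0..599 belong to the first sensor in
    column_names.txt, columns 600..1199 to the second one, and so on.
    """
    if len(values) != n_sensors * length:
        raise ValueError(f"expected {n_sensors * length} values, got {len(values)}")
    return [values[k * length:(k + 1) * length] for k in range(n_sensors)]


def read_periods(lines):
    """Yield each CSV row (one 10-minute period) as 28 lists of 600 floats.

    `lines` may be an open file or any iterable of CSV text lines, so a
    large chunk such as trainingData1.csv is processed one row at a time.
    """
    for row in csv.reader(lines):
        if row:  # skip blank lines
            yield split_row([float(v) for v in row])
```

For example, iterating over read_periods(open("trainingData1.csv", newline="")) yields one list of 28 series per time period without loading the whole chunk into memory.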

Labels in the data indicate whether a warning threshold has been reached by each of three methane meters, MM263, MM264 and MM256, in the interval between three and six minutes after the end of the recorded time period. In particular, if a given row corresponds to the period between \(t_{-599}\) and \(t_{0}\), then the label for a methane meter MM in this row is 'warning' if and only if \(\max(MM(t_{181}), \ldots, MM(t_{360})) \geq 1.0\). The labels for the training data are provided separately, in trainingLabels.7z. The test data file, testData.7z, is in the same format as the training data set; however, the labels for the test series are hidden from participants. It is important to note that the time periods in the test data do not overlap and are given in random order.
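The labelling rule can be sketched as follows, purely to illustrate the definition (the training labels themselves are already provided). The function is_warning is hypothetical, and it assumes the meter's readings after \(t_0\) are given as a list in which element k - 1 holds \(MM(t_k)\), so the window \(t_{181}, \ldots, t_{360}\) is the slice [180:360].

```python
WARNING_THRESHOLD = 1.0  # warning level used in the label definition


def is_warning(future):
    """Return True if the meter reaches the threshold 3 to 6 minutes ahead.

    ASSUMPTION: future[k - 1] holds the reading MM(t_k), taken k seconds
    after the end of the period (t_0).
    """
    if len(future) < 360:
        raise ValueError("need at least 360 seconds of readings after t_0")
    window = future[180:360]  # readings MM(t_181), ..., MM(t_360)
    return max(window) >= WARNING_THRESHOLD
```

A spike at \(t_{180}\) does not count, while a reading of exactly 1.0 at \(t_{181}\) does, because the comparison is \(\geq\).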

Format of submissions: Participants are asked to predict the likelihood of the label 'warning' for each time series in the test set and send their solutions using the submission system. Each solution should be sent as a single text file containing exactly 5,076 lines (files with an additional empty last line will also be accepted). Each line should contain exactly three real numbers, separated by commas, corresponding to the three target methane meters. The values do not need to be in any particular range; however, higher values should indicate a higher chance of the label 'warning'.
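A small sketch of producing a file in this format and checking it before upload. The column order MM263, MM264, MM256 is an assumption (it is the order in which the task lists the meters), and format_submission is a hypothetical helper.

```python
N_TEST_PERIODS = 5076
# ASSUMPTION: values on each line follow the order in which the task lists the meters.
TARGETS = ("MM263", "MM264", "MM256")


def format_submission(predictions):
    """Turn a list of 5,076 three-value rows into the submission file text."""
    if len(predictions) != N_TEST_PERIODS:
        raise ValueError(f"expected {N_TEST_PERIODS} rows, got {len(predictions)}")
    lines = []
    for i, row in enumerate(predictions):
        if len(row) != len(TARGETS):
            raise ValueError(f"row {i}: expected {len(TARGETS)} values, got {len(row)}")
        # repr keeps full float precision, so close scores do not collapse into ties
        lines.append(",".join(repr(float(v)) for v in row))
    return "\n".join(lines) + "\n"
```

Writing the returned string with open("submission.txt", "w").write(...) produces one line per test series; the file name is only an example.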

Evaluation of results: The submitted solutions will be evaluated online, and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a random subset of the test set, fixed for all participants, corresponding to approximately 20% of the test data. The final evaluation will be performed after the competition ends, using the remaining part of the test data. Those results will also be published online. Only teams that submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session devoted to this competition at the IJCRS'15 conference.
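How the organizers drew the preliminary subset is not described; the sketch below only illustrates how a seeded random generator yields a roughly 20% subset that is identical for every participant, with the remaining rows left for the final evaluation. The seed value and the function name split_test_indices are hypothetical.

```python
import random

N_TEST_PERIODS = 5076
PRELIMINARY_FRACTION = 0.2  # "approximately 20%" of the test data


def split_test_indices(seed, n=N_TEST_PERIODS, fraction=PRELIMINARY_FRACTION):
    """Split test-row indices into a preliminary part and a final part.

    HYPOTHETICAL: a fixed seed makes the subset the same for everyone;
    the actual sampling procedure used in the challenge is not specified.
    """
    rng = random.Random(seed)
    preliminary = sorted(rng.sample(range(n), round(n * fraction)))
    chosen = set(preliminary)
    final = [i for i in range(n) if i not in chosen]
    return preliminary, final
```

With 5,076 test series this gives 1,015 preliminary rows and 4,061 final rows, and the two parts never overlap.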

Solutions will be assessed using the Area Under the ROC Curve (AUC), computed separately for each of the three target sensors. The final score in the competition is the average AUC over those three sets of predictions. Namely, if for a submitted solution \(s\) we denote by: $$ \begin{array}{ccl} AUC_{MM263}(s) & - & \textrm{AUC of predictions for the sensor MM263}, \\ AUC_{MM264}(s) & - & \textrm{AUC of predictions for the sensor MM264}, \\ AUC_{MM256}(s) & - & \textrm{AUC of predictions for the sensor MM256}, \end{array} $$ then the final score in the competition for a solution \(s\) will be computed as: \[score(s) = \frac{AUC_{MM263}(s) + AUC_{MM264}(s) + AUC_{MM256}(s)}{3}.\]
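A self-contained sketch of this score, useful for local validation. It computes AUC through the rank-based Mann-Whitney formulation (tied predictions receive average ranks, which counts a tie as half a correct ordering) and then averages over the three meters. It is not the organizers' evaluation code; the function names and the dictionary layout keyed by meter name are assumptions.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both 'warning' and non-warning examples")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


TARGETS = ("MM263", "MM264", "MM256")


def competition_score(labels, predictions):
    """Average AUC over the three target meters.

    labels[m] holds 1 for 'warning' and 0 otherwise; predictions[m] holds
    the submitted values for meter m, in the same row order.
    """
    return sum(auc(labels[m], predictions[m]) for m in TARGETS) / len(TARGETS)
```

For instance, labels [0, 0, 1, 1] with predictions [0.1, 0.4, 0.35, 0.8] give an AUC of 0.75: three of the four positive/negative pairs are ordered correctly.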

The baseline solution: We prepared an example solution as a reference for participants. Its score is displayed on the leaderboard as baseline_solution. This solution was obtained using two popular algorithms derived from the theory of rough sets: a discretization method based on the maximum discernibility heuristic [3], combined with the LEM2 algorithm [4] for decision rule induction. Both algorithms are implemented in the RoughSets package for R [5].
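To give an idea of the discretization step, here is a simplified, quadratic-time Python illustration of the greedy maximum discernibility idea from [3]: repeatedly choose the cut that discerns the largest number of still-undiscerned pairs of objects with different decisions. This is not the RoughSets implementation used for the baseline; taking candidate cuts at midpoints between consecutive distinct values is an assumption, and md_discretize is a hypothetical name. LEM2 rule induction is not shown.

```python
def md_discretize(X, y):
    """Greedy maximum-discernibility cut selection (simplified sketch).

    X is a list of numeric attribute vectors, y the decision for each row.
    Returns a list of (attribute_index, cut_value) pairs.
    """
    n, m = len(X), len(X[0])
    # pairs of objects with different decisions that still need to be discerned
    pairs = {(i, j) for i in range(n) for j in range(i + 1, n) if y[i] != y[j]}
    # ASSUMPTION: candidate cuts are midpoints between consecutive distinct values
    candidates = []
    for a in range(m):
        vals = sorted({row[a] for row in X})
        candidates += [(a, (lo + hi) / 2) for lo, hi in zip(vals, vals[1:])]
    cuts = []
    while pairs:
        best, best_cov = None, set()
        for a, c in candidates:
            # a cut discerns a pair when the two objects fall on opposite sides
            cov = {(i, j) for (i, j) in pairs if (X[i][a] < c) != (X[j][a] < c)}
            if len(cov) > len(best_cov):
                best, best_cov = (a, c), cov
        if best is None:  # remaining pairs cannot be discerned by any cut
            break
        cuts.append(best)
        candidates.remove(best)
        pairs -= best_cov
    return cuts
```

On a single attribute with values 1, 2, 3, 4 and decisions 0, 0, 1, 1, the cut 2.5 alone discerns all four conflicting pairs, so it is the only cut returned. The pairwise bookkeeping is fine for small examples but would need the more efficient counting described in [3] for 51,700 rows.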

Important dates:

  • April 13, 2015: start of the competition, data sets become available,
  • June 20, 2015: deadline for submitting the predictions,
  • June 25, 2015: deadline for sending the reports, end of the challenge,
  • June 29, 2015: online publication of the final results, invitations sent for submitting short papers to the special session at IJCRS'15,
  • July 12, 2015: deadline for submitting papers describing the selected solutions,
  • July 19, 2015: deadline for submitting camera-ready versions of the papers selected for presentation at IJCRS'15.

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by our sponsors:

  • First Prize: 1000 USD + one free IJCRS'15 conference registration,
  • Second Prize: 500 USD + one free IJCRS'15 conference registration,
  • Third Prize: one free IJCRS'15 conference registration.

The award ceremony will take place during the IJCRS'15 conference (Oct 20-23, 2015, Jeju Island, Korea).

Andrzej Janusz, University of Warsaw

Marek Sikora, Silesian University of Technology

Łukasz Wróbel, Institute of Innovative Technologies EMAG

Sebastian Stawicki, University of Warsaw

Marek Grzegorowski, University of Warsaw

Dominik Ślęzak, University of Warsaw & Infobright Inc.
