IJCRS'15 Data Challenge

IJCRS'15 Data Challenge: Mining Data from Coal Mines

IJCRS'15 Data Challenge: Mining Data from Coal Mines is a competition organized within the framework of the 2015 International Joint Conference on Rough Sets (IJCRS'15). It continues the tradition of data mining challenges associated with rough set conferences. This time, the task concerns monitoring and predicting dangerous concentrations of methane in longwalls of a Polish coal mine. The competition is sponsored by Research and Development Centre EMAG (https://ibemag.pl) with support from the International Rough Set Society.

Overview

Coal mining requires working in hazardous conditions. Miners in an underground coal mine can face several threats, such as methane explosions or rock bursts. To protect people working underground, systems for active monitoring of production processes are typically used. One of their fundamental applications is screening dangerous gas concentrations (methane in particular) in order to prevent spontaneous explosions [1]. For that purpose, the ability to predict dangerous concentrations of gases in the near future can be even more important than monitoring the current sensor readings [2].

We address this particular problem in IJCRS'15 Data Challenge: Mining Data from Coal Mines. More details regarding the task and a description of the competition data can be found in the Task description section.

Special session at IJCRS'15: A special session devoted to the competition will be held at the conference. We will invite authors of selected reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be treated as regular papers. The invited teams will be chosen based on their final rank, innovativeness of their approach and quality of the submitted report.

If you have any questions, please post on the competition forum or email us at: webmaster@knowledgepit.fedcsis.org

References:

[1] M. Kozielski, A. Skowron, Ł. Wróbel, M. Sikora: “Regression Rule Learning for Methane Forecasting in Coal Mines”, Beyond Databases, Architectures, and Structures, CCIS, Vol. 521, Springer International Publishing, pp. 495–504, 2015
[2] A. Krasuski, A. Jankowski, A. Skowron, and D. Ślęzak: “From sensory data to decision making: A perspective on supporting a fire commander”, in Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on, vol. 3. IEEE, 2013, pp. 229–236
[3] H.S. Nguyen: “On Efficient Handling of Continuous Attributes in Large Data Bases”, Fundamenta Informaticae, 48(1):61–81, 2001
[4] J.W. Grzymala-Busse: “A New Version of the Rule Induction System LERS”, Fundamenta Informaticae, 31, pp. 27–39, 1997
[5] L.S. Riza, A. Janusz, C. Bergmeir, C. Cornelis, F. Herrera, D. Ślęzak, and J.M. Benítez: “Implementing Algorithms of Rough Set Theory and Fuzzy Rough Set Theory in the R Package ‘RoughSets’”, Information Sciences, 287(0):68–89, 2014

Summary of the Challenge

IJCRS'15 Data Challenge: Mining Data from Coal Mines has ended and we are proud to announce the winners:

Adam Zagorecki (team zagorecki) from Cranfield University, United Kingdom
Marc Boulle (team marcb) from Orange Labs, France
Dymitr Ruta (team dymitrruta) from EBTIC, Khalifa University, United Arab Emirates

Congratulations!

All competition data were made available in the Data files folder (including the labels for the test set, as well as the indexes of objects from the preliminary evaluation set). The data set is free to use for non-commercial purposes. However, if you decide to use it in your post-competition research, please add references to related papers describing the scope of the challenge and a link to the Knowledge Pit platform. Below is a sample list of related papers (it will be extended):

Janusz, A., Sikora, M., Wróbel, Ł., Stawicki, S., Grzegorowski, M., Wojtas, P., Ślęzak, D.: Mining Data from Coal Mines: IJCRS’15 Data Challenge. In: Proceedings of RSFDGrC 2015: 429-438, LNAI, vol. 9437. Springer (2015)
Janusz, A., Sikora, M., Wróbel, Ł., Ślęzak, D.: Predicting Dangerous Seismic Events: AAIA16 Data Mining Challenge. In: Proceedings of FedCSIS 2016, IEEE (In print September 2016)

To access the competition data you need to be logged in. If you still haven't registered at Knowledge Pit, you may create an account using this link: https://knowledgepit.fedcsis.org/login/signup.php?

Final results

Rank	Team Name	Is Report	Preliminary Score	Final Score	Submissions
1	zagorecki	True	0.9666	0.959267	2
2	mgrzegorowski	True	0.9327	0.947334	2
3	marcb	True	0.9479	0.943929	2
4	dymitrruta	True	0.9487	0.943699	2
5	kkurach_kp7	True	0.9685	0.940024	2
6	max25	True	0.9603	0.937775	2
7	kkurach	True	0.9591	0.936714	2
8	nitekna	True	0.9255	0.935814	2
9	seba91	True	0.9495	0.934909	2
10	archie2	True	0.9350	0.931817	2
11	wds	True	0.9460	0.931256	2
12	tdziopa	True	0.9288	0.925846	2
13	katarzynki	True	0.9269	0.917891	2
14	fzero	True	0.6469	0.581214	2
15	ayerdi	True	0.5954	0.580494	2
16	toczacypaczek	False	0.9484	No report file found or report rejected.	2
17	trzewior	False	0.9469	No report file found or report rejected.	2
18	weczer	False	0.9460	No report file found or report rejected.	2
19	artyr	False	0.9430	No report file found or report rejected.	2
20	auroree	False	0.9417	No report file found or report rejected.	2
21	tmonq	False	0.9395	No report file found or report rejected.	2
22	krecik	False	0.9395	No report file found or report rejected.	2
23	lab	False	0.9386	No report file found or report rejected.	2
24	krecik1	False	0.9377	No report file found or report rejected.	2
25	tomabar728	False	0.9367	No report file found or report rejected.	2
26	mavax	False	0.9363	No report file found or report rejected.	2
27	zaggy	False	0.9361	No report file found or report rejected.	2
28	adrzeniek	False	0.9356	No report file found or report rejected.	2
29	grzywna	False	0.9351	No report file found or report rejected.	2
30	archie	False	0.9347	No report file found or report rejected.	2
31	kebab48	False	0.9327	No report file found or report rejected.	2
32	annaokon	False	0.9319	No report file found or report rejected.	2
33	pbombik	False	0.9318	No report file found or report rejected.	2
34	mateusz	False	0.9314	No report file found or report rejected.	2
35	krzysiek91	False	0.9313	No report file found or report rejected.	2
36	nikusiaczek	False	0.9306	No report file found or report rejected.	2
37	leo	False	0.9301	No report file found or report rejected.	2
38	buf	False	0.9299	No report file found or report rejected.	2
39	moomean	False	0.9286	No report file found or report rejected.	2
40	artukoz021	False	0.9316	No report file found or report rejected.	2
41	pat_sc	False	0.9233	No report file found or report rejected.	2
42	alesew8368	False	0.9192	No report file found or report rejected.	2
43	agabrys	False	0.9039	No report file found or report rejected.	2
44	mateo081	False	0.9007	No report file found or report rejected.	2
45	bfrackowiak	False	0.9001	No report file found or report rejected.	2
46	baseline_solution	False	0.8930	No report file found or report rejected.	2
47	kp7	False	0.8631	No report file found or report rejected.	2
48	seba92	False	0.8499	No report file found or report rejected.	2
49	sohrab	False	0.8119	No report file found or report rejected.	2
50	fzero2	False	0.6458	No report file found or report rejected.	2
51	reksio	False	0.5002	No report file found or report rejected.	2

Task description

Data format: The time series data sets for this competition are provided in a tabular format. For the convenience of participants, the training data set was divided into five smaller chunks, namely trainingData1.csv, ..., trainingData5.csv. Those files were compressed into a single archive, trainingData.7z, which can be downloaded from the Data files section after successful enrollment in the competition. In total, the files contain sensor readings for 51,700 time periods, each 10 minutes long, with measurements taken every second (600 values for every sensor in a single series). Values for each time period are stored in a separate row of the data. The data include readings from 28 different sensors, so every row consists of 28 × 600 = 16,800 values stored in consecutive columns and separated by commas. Names of the data columns, which make it possible to identify the sensors, are provided in a separate file, column_names.txt. Descriptions of the types of sensors used for monitoring the mining process are given in sensor_descriptions.txt, and their placement in the corridors of the mine is indicated on the provided mining process scheme (mining_process_scheme.png). The time periods in the training data overlap and are given in chronological order.
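The row layout described above can be illustrated with a minimal Python sketch that splits one flat row of 16,800 values into 28 series of 600 readings. The function names are hypothetical, and two things are assumptions rather than facts from the task description: that the CSV files have no header row, and that the 600 readings of each sensor occupy consecutive columns. Both should be checked against column_names.txt before use.

```python
import csv
import io

N_SENSORS = 28        # number of sensors stated in the task description
SERIES_LENGTH = 600   # one reading per second over a 10-minute period


def split_row(values, n_sensors=N_SENSORS, length=SERIES_LENGTH):
    """Split one flat row into a list of per-sensor series.

    ASSUMPTION: the readings of each sensor occupy consecutive columns
    (sensor-major order). Verify this against column_names.txt.
    """
    expected = n_sensors * length
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return [values[i * length:(i + 1) * length] for i in range(n_sensors)]


def read_rows(text_stream):
    """Yield every CSV row as 28 lists of 600 floats.

    ASSUMPTION: the file has no header row, since the column names are
    shipped in a separate file.
    """
    for record in csv.reader(text_stream):
        if record:  # skip empty lines, e.g. a trailing blank line
            yield split_row([float(v) for v in record])


# Synthetic demonstration: one row in which sensor k always reads k.
demo_row = ",".join(
    str(float(k)) for k in range(N_SENSORS) for _ in range(SERIES_LENGTH)
)
demo_series = next(read_rows(io.StringIO(demo_row + "\n")))
```

With the real data, `read_rows` would be called on an opened trainingData file instead of the synthetic `io.StringIO` stream.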

Labels in the data indicate whether a warning threshold has been reached in a period between three and six minutes after the end of the training period, for three methane meters: MM263, MM264 and MM256. In particular, if a given row corresponds to a period between $t_{-599}$ and $t_{0}$, then the label for a methane meter MM in this row is 'warning' if and only if $\max(MM(t_{181}), ..., MM(t_{360})) \geq 1.0$. The labels for the training data are provided separately, in trainingLabels.7z. The test data file, testData.7z, is in the same format as the training data set; however, the labels for the test series are hidden from participants. It is important to note that time periods in the test data do not overlap and are given in a randomized order.
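The labeling rule can be written down as a short Python sketch. It assumes that the readings of one methane meter following the end of the period are available as a list in which index 0 holds $MM(t_{1})$. The function name and the name 'normal' for the non-warning class are hypothetical; the task description only names the 'warning' label.

```python
WARNING_THRESHOLD = 1.0  # threshold from the task description


def warning_label(future_readings):
    """Return the label for one methane meter.

    future_readings[i] is assumed to hold MM(t_{i+1}), i.e. the readings
    taken after the end of the observed period at t_0. At least 360
    values are needed to cover t_181 .. t_360.
    """
    if len(future_readings) < 360:
        raise ValueError("need readings up to t_360")
    window = future_readings[180:360]  # indices 180..359 are t_181..t_360
    # 'normal' is a placeholder name for the non-warning class.
    return "warning" if max(window) >= WARNING_THRESHOLD else "normal"
```

Readings before $t_{181}$ or after $t_{360}$ do not influence the label, even if they exceed the threshold.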

Format of submissions: The participants of the competition are asked to predict the likelihood of the label 'warning' for each time series from the test set and to send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 5,076 lines (files with an additional empty last line will also be accepted). Each line of this file should contain exactly three real numbers, separated by commas, corresponding to the three target methane meters. The values do not need to be in a particular range; however, higher numerical values should indicate a higher chance of the label 'warning'.
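A submission file in this format could be produced and checked along the following lines. This is a sketch only: the function names are hypothetical, and the column order (MM263, MM264, MM256) is assumed from the order in which the sensors are listed in the task description.

```python
N_TEST_SERIES = 5076  # number of lines required by the task description


def format_submission(scores, n_expected=N_TEST_SERIES):
    """Render predictions as submission text.

    scores is a sequence of 3-tuples, one per test series.
    ASSUMPTION: the column order is MM263, MM264, MM256.
    """
    if len(scores) != n_expected:
        raise ValueError(f"expected {n_expected} rows, got {len(scores)}")
    lines = []
    for row in scores:
        if len(row) != 3:
            raise ValueError("each row needs exactly three values")
        lines.append(",".join(repr(float(v)) for v in row))
    return "\n".join(lines) + "\n"


def validate_submission(text, n_expected=N_TEST_SERIES):
    """Check the line count and that each line holds three real numbers."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # tolerate a trailing newline / one empty last line
    if len(lines) != n_expected:
        return False
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3:
            return False
        try:
            for part in parts:
                float(part)
        except ValueError:
            return False
    return True
```

Because only the ordering of the values matters for the evaluation, the scores can be raw classifier outputs and need not be calibrated probabilities.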

Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a random subset of the test set, fixed for all participants, which corresponds to approximately 20% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams that submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session devoted to this competition, which will be organized at the IJCRS'15 conference.

The assessment of solutions will be done using the Area Under the ROC Curve (AUC) measure. It will be computed separately for each of the three target sensors. The final score in the competition will correspond to the average AUC for those three sets of predictions. Namely, if for a submitted solution $s$ we denote by: $$ \begin{array}{ccl} AUC_{MM263}(s) & - & \textrm{AUC of predictions for the sensor MM263}, \\ AUC_{MM264}(s) & - & \textrm{AUC of predictions for the sensor MM264}, \\ AUC_{MM256}(s) & - & \textrm{AUC of predictions for the sensor MM256}, \end{array} $$ then the final score in the competition for a solution s will be computed as: \[score(s) = \left(AUC_{MM263}(s) + AUC_{MM264}(s) + AUC_{MM256}(s)\right)/3\hspace{0.2cm}.\]
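For local validation, the score defined above can be approximated with a short pure-Python sketch. It computes AUC with the rank-based (Mann-Whitney) formulation, giving tied scores their average rank, and then averages the three per-sensor values. The function names and the dictionary-based interface are hypothetical, and this is not necessarily the exact code used by the organizers.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation.

    labels: booleans, True meaning 'warning'.
    scores: numbers, higher meaning 'warning' is more likely.
    Tied scores receive the average of the ranks they span.
    """
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both classes to be present")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def challenge_score(labels_by_sensor, scores_by_sensor):
    """Average the AUC over the three target methane meters."""
    sensors = ("MM263", "MM264", "MM256")
    total = sum(auc(labels_by_sensor[s], scores_by_sensor[s]) for s in sensors)
    return total / 3
```

Because AUC depends only on how the predictions are ordered, any monotone rescaling of the submitted values for one sensor leaves the score unchanged.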

The baseline solution: We prepared an example solution as a reference for participants. It is displayed on the leaderboard as the baseline_solution score. This solution was obtained using two popular algorithms that derive from the theory of rough sets. Namely, a discretization method based on the maximum discernibility heuristic [3] was used in combination with the LEM2 algorithm [4] for decision rule induction. Both algorithms are implemented in the RoughSets package for the R system [5].

Schedule

April 13, 2015: start of the competition, data sets become available,
June 20, 2015: deadline for submitting the predictions,
June 25, 2015: deadline for sending the reports, end of the challenge,
June 29, 2015: on-line publication of final results, sending invitations for submitting short papers for the special session at IJCRS'15,
July 12, 2015: deadline for submissions of papers describing the selected solutions,
July 19, 2015: deadline for submissions of camera-ready papers selected for presentation at IJCRS'15.

Awards

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes funded by our sponsors:

First Prize: 1000 USD + one free IJCRS'15 conference registration,
Second Prize: 500 USD + one free IJCRS'15 conference registration,
Third Prize: one free IJCRS'15 conference registration.

The award ceremony will take place during the IJCRS'15 conference (Oct 20-23, 2015, Jeju Island, Korea).

Contest organizing committee

Andrzej Janusz, University of Warsaw

Marek Sikora, Silesian University of Technology

Łukasz Wróbel, Institute of Innovative Technologies EMAG

Sebastian Stawicki, University of Warsaw

Marek Grzegorowski, University of Warsaw

Dominik Ślęzak, University of Warsaw & Infobright Inc.

Forum

Discussion	Author	Replies	Last post
datasets	芳雪	0	by 芳雪 Tuesday, November 10, 2020, 07:02:42
datasets	Guitong	0	by Guitong Wednesday, September 02, 2020, 07:00:03
The deadline for submitting competition reports has been postponed!	Andrzej	0	by Andrzej Tuesday, June 23, 2015, 23:02:56
The last few days of IJCRS’15 Data Challenge	Andrzej	0	by Andrzej Monday, June 15, 2015, 14:49:41
Limit of 100 submissions.	Adam	2	by Andrzej Monday, May 18, 2015, 12:55:55
Welcome to IJCRS'15 Data Challenge	Andrzej	0	by Andrzej Monday, April 13, 2015, 13:00:20