
## IJCRS'15 Data Challenge: Mining Data from Coal Mines

### IJCRS'15 Data Challenge: Mining Data from Coal Mines is a competition organized within the framework of the 2015 International Joint Conference on Rough Sets (IJCRS'15). It continues the tradition of data mining challenges associated with rough set conferences. This time, the task concerns monitoring and predicting dangerous methane concentrations in longwalls of a Polish coal mine. The competition is sponsored by the Research and Development Centre EMAG (http://www.emag.pl/) with support from the International Rough Set Society.

Overview

Coal mining requires working in hazardous conditions. Miners in an underground coal mine can face several threats, such as methane explosions or rock bursts. To protect people working underground, systems for active monitoring of production processes are typically used. One of their fundamental applications is screening for dangerous gas concentrations (methane in particular) in order to prevent spontaneous explosions [1]. For that purpose, the ability to predict dangerous gas concentrations in the near future can be even more important than monitoring the current sensor readings [2].

We address this particular problem in IJCRS'15 Data Challenge: Mining Data from Coal Mines. More details regarding the task and a description of the competition data can be found in the Task Description section.

Special session at IJCRS'15: A special session devoted to the competition will be held at the conference. We will invite authors of selected reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be treated as regular papers. The invited teams will be chosen based on their final rank, innovativeness of their approach and quality of the submitted report.

If you have any questions, please post on the competition forum or email us at: webmaster@knowledgepit.fedcsis.org

• The competition is open for all interested researchers, specialists and students. Only members of the Contest Organizing Committee cannot participate.
• Participants may submit solutions as teams made up of one or more persons.
• Each team needs to designate a leader responsible for communication with the Organizers. A single person can be the leader of only one team. Only the leader needs to be registered at Knowledge Pit and enrolled in the competition.
• One person may be a member of at most 3 teams.
• Each team needs to be composed of a different set of persons.
• The total number of submissions stored at Knowledge Pit for any single team is limited to 100 solutions.
• The winner of the competition is chosen on the basis of the final evaluation results. In case of ties in the evaluation scores, the time of submission will be taken into account.
• Each team is obliged to provide a short report describing their final solution. Reports must contain information such as the name of the team, names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Their length should not exceed 2000 words and they should be submitted in PDF format using our submission system by June 22, 2015. Only submissions made by teams that provided the reports will qualify for the final evaluation.
• By enrolling in this competition you grant the organizers rights to process your submissions for the purpose of evaluation and post-competition research.

Data format: The time series data sets for this competition are provided in a tabular format. For the convenience of participants, the training data set was divided into five smaller chunks, namely trainingData1.csv, ..., trainingData5.csv. Those files were compressed into a single archive, trainingData.7z, which can be downloaded from the Data files section after successful enrollment in the competition.

In total, the files contain sensor readings for 51,700 time periods, each 10 minutes long, with measurements taken every second (600 values for every sensor in a single series). Values for each time period are stored in a separate row of the data. The data include readings from 28 different sensors; thus, every row consists of 16,800 values stored in consecutive columns and separated by commas. Names of the data columns, which make it possible to identify the sensors, are provided in a separate file, column_names.txt. Descriptions of the types of sensors used for monitoring the mining process are given in sensor_descriptions.txt, and their placement in the corridors of the mine is indicated on the provided mining process scheme (mining_process_scheme.png). The time periods in the training data overlap and are given in chronological order.
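The row layout described above can be unpacked along the lines of the following minimal sketch. It assumes that the 16,800 columns are grouped sensor by sensor (600 consecutive values per sensor) and that the CSV files carry no header row; neither is stated in this description, so column_names.txt should be checked first. The names `split_row` and `read_periods` are illustrative only.

```python
import csv

N_SENSORS = 28    # number of sensors, from the data description
SERIES_LEN = 600  # one reading per second over a 10-minute period


def split_row(row, n_sensors=N_SENSORS, series_len=SERIES_LEN):
    """Split one flat row into n_sensors lists of series_len floats.

    ASSUMPTION: columns are grouped sensor by sensor (all values of the
    first sensor, then all values of the second, and so on).
    """
    expected = n_sensors * series_len
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    values = [float(v) for v in row]
    return [values[i * series_len:(i + 1) * series_len]
            for i in range(n_sensors)]


def read_periods(lines, n_sensors=N_SENSORS, series_len=SERIES_LEN):
    """Yield the per-sensor series for every non-empty CSV row.

    `lines` can be an open file or any iterable of text lines.
    ASSUMPTION: the file has no header row.
    """
    for row in csv.reader(lines):
        if row:  # skip blank lines
            yield split_row(row, n_sensors, series_len)
```

Under these assumptions, passing an open handle to trainingData1.csv (opened with `newline=""`) to `read_periods` would yield one list of 28 series per time period, without loading a whole chunk into memory.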

Labels in the data indicate whether a warning threshold has been reached in the period between three and six minutes after the end of the training period, for three methane meters: MM263, MM264 and MM256. In particular, if a given row corresponds to a period between $$t_{-599}$$ and $$t_{0}$$, then the label for a methane meter MM in this row is 'warning' if and only if $$\max(MM(t_{181}), \ldots, MM(t_{360})) \geq 1.0$$. The labels for the training data are provided separately, in trainingLabels.7z. The test data file, testData.7z, is in the same format as the training data set; however, the labels for the test series are hidden from participants. It is important to note that time periods in the test data do not overlap and are given in a randomized order.
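The labeling rule can be illustrated with a short sketch. It assumes that the readings following the end of a period are available as a list in which index 0 holds $$MM(t_{1})$$; the function name and the label 'normal' for the non-warning case are assumptions made for the example, since the description above names only 'warning'.

```python
WARNING_THRESHOLD = 1.0  # threshold from the label definition


def warning_label(future_readings, threshold=WARNING_THRESHOLD):
    """Label one methane meter for one period.

    future_readings[k] is assumed to hold MM(t_{k+1}), i.e. the reading
    taken k+1 seconds after the end of the 10-minute period.
    Returns 'warning' if max(MM(t_181), ..., MM(t_360)) >= threshold,
    otherwise 'normal' (the second label name is an assumption).
    """
    window = future_readings[180:360]  # readings at t_181 .. t_360
    if len(window) != 180:
        raise ValueError("need at least 360 readings after the period")
    return "warning" if max(window) >= threshold else "normal"
```

Note that a spike at $$t_{180}$$ or $$t_{361}$$ falls outside the window and does not change the label.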

Format of submissions: The participants of the competition are asked to predict the likelihood of the label 'warning' for particular time series from the test set and send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 5,076 lines (files with an additional empty last line will also be accepted). Each line should contain exactly three real numbers, one for each target methane meter, separated by commas. The values do not need to be in a particular range; however, higher numerical values should indicate a higher chance of the label 'warning'.
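A submission file in this format could be produced roughly as follows. This is a sketch: the column order MM263, MM264, MM256 is assumed from the order in which the meters are listed in this description, and the fixed six-decimal formatting is just one choice that avoids scientific notation.

```python
import math

N_TEST_SERIES = 5076  # required number of lines in a submission


def format_submission(scores, n_rows=N_TEST_SERIES):
    """Return the submission text for a sequence of 3-value rows.

    ASSUMPTION: each row holds scores in the order MM263, MM264, MM256.
    """
    if len(scores) != n_rows:
        raise ValueError(f"expected {n_rows} rows, got {len(scores)}")
    lines = []
    for i, row in enumerate(scores):
        if len(row) != 3:
            raise ValueError(f"row {i} must contain exactly 3 values")
        if not all(math.isfinite(v) for v in row):
            raise ValueError(f"row {i} contains a non-finite value")
        lines.append(",".join(f"{v:.6f}" for v in row))
    return "\n".join(lines) + "\n"
```

The returned text can then be written to a file and uploaded. Because only the ordering of the values matters for the evaluation, rounding to six decimals is harmless unless it creates ties between predictions.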

Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on a random subset of the test set, fixed for all participants. It will correspond to approximately 20% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session devoted to this competition, which will be organized at the IJCRS'15 conference (http://kbigdata.or.kr/IJCRS2015/).

The assessment of solutions will be done using the Area Under the ROC Curve (AUC) measure. It will be computed separately for each of the three target sensors. The final score in the competition will correspond to the average AUC for those three sets of predictions. Namely, if for a submitted solution $$s$$ we denote by: $$\begin{array}{ccl} AUC_{MM263}(s) & - & \textrm{AUC of predictions for the sensor MM263}, \\ AUC_{MM264}(s) & - & \textrm{AUC of predictions for the sensor MM264}, \\ AUC_{MM256}(s) & - & \textrm{AUC of predictions for the sensor MM256}, \end{array}$$ then the final score in the competition for a solution $$s$$ will be computed as: $$score(s) = \left(AUC_{MM263}(s) + AUC_{MM264}(s) + AUC_{MM256}(s)\right)/3.$$
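For local validation, the score can be approximated with a sketch like the one below. It computes AUC through the rank-based (Mann-Whitney) formulation, with average ranks for tied predictions, and then averages over the three sensors. This is not the organizers' evaluation code; the function names and the dictionary-based interface are assumptions made for the example.

```python
TARGETS = ("MM263", "MM264", "MM256")


def auc(labels, scores):
    """AUC via the Mann-Whitney statistic.

    labels: sequence of booleans, True meaning 'warning'.
    scores: predicted values, higher meaning more likely 'warning'.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def competition_score(labels_by_sensor, scores_by_sensor):
    """Average the per-sensor AUC over the three target methane meters."""
    return sum(auc(labels_by_sensor[s], scores_by_sensor[s])
               for s in TARGETS) / len(TARGETS)
```

Since AUC depends only on how the predictions are ranked, any monotone rescaling of the submitted values leaves the score unchanged, which is why the submission format does not restrict their range.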

The baseline solution: We prepared an example solution as a reference for participants. It is displayed on the leaderboard as the baseline_solution score. This solution was obtained using two popular algorithms derived from the theory of rough sets. Namely, a discretization method based on the maximum discernibility heuristic [3] was used in combination with the LEM2 algorithm [4] for decision rule induction. Both algorithms are implemented in the RoughSets package for R [5].

IJCRS'15 Data Challenge: Mining Data from Coal Mines has ended and we are proud to announce the winners:

1. Adam Zagorecki (team zagorecki) from Cranfield University, United Kingdom
2. Marc Boulle (team marcb) from Orange Labs, France
3. Dymitr Ruta (team dymitrruta) from EBTIC, Khalifa University, United Arab Emirates

Congratulations!

All competition data were made available in the Data files folder (including the labels for the test set, as well as indexes of objects from the preliminary evaluation set). The data set is free to use for non-commercial purposes; however, if you decide to use it in your post-competition research, please add references to related papers describing the scope of the challenge and a link to the Knowledge Pit platform. Below is a sample list of related papers (it will be extended):

• Janusz, A., Sikora, M., Wróbel, Ł., Stawicki, S., Grzegorowski, M., Wojtas, P., Ślęzak, D.: Mining Data from Coal Mines: IJCRS’15 Data Challenge. In: Proceedings of RSFDGrC 2015: 429-438, LNAI, vol. 9437. Springer (2015)
• Janusz, A., Sikora, M., Wróbel, Ł., Ślęzak, D.: Predicting Dangerous Seismic Events: AAIA16 Data Mining Challenge. In: Proceedings of FedCSIS 2016, IEEE (In print September 2016)

To access the competition data you need to be logged in. If you still haven't registered at Knowledge Pit, you may create an account using this link: https://knowledgepit.fedcsis.org/login/signup.php?

• April 13, 2015: start of the competition, data sets become available,
• June 20, 2015: deadline for submitting the predictions,
• June 25, 2015: deadline for sending the reports, end of the challenge,
• June 29, 2015: on-line publication of final results, sending invitations for submitting short papers for the special session at IJCRS'15,
• July 12, 2015: deadline for submissions of papers describing the selected solutions,
• July 19, 2015: deadline for submissions of camera-ready papers selected for presentation at IJCRS'15.

Authors of the top ranked solutions (based on the final evaluation scores) will be awarded prizes funded by our sponsors:

• First Prize: 1000 USD + one free IJCRS'15 conference registration,
• Second Prize: 500 USD + one free IJCRS'15 conference registration,
• Third Prize: one free IJCRS'15 conference registration.

The award ceremony will take place during the IJCRS'15 conference (Oct 20-23, 2015, Jeju Island, Korea).

Andrzej Janusz, University of Warsaw

Marek Sikora, Silesian University of Technology

Łukasz Wróbel, Institute of Innovative Technologies EMAG

Sebastian Stawicki, University of Warsaw

Marek Grzegorowski, University of Warsaw

Dominik Ślęzak, University of Warsaw & Infobright Inc.

