AAIA'16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines
AAIA'16 Data Mining Challenge is the third data mining competition associated with the International Symposium on Advances in Artificial Intelligence and Applications (AAIA'16, https://fedcsis.org/2016/aaia), which is part of the FedCSIS conference series. This time, the task is related to the problem of predicting periods of increased seismic activity which may cause life-threatening accidents in underground coal mines. Prizes worth over 3,000 USD will be awarded to the most successful teams. The contest is sponsored by Research and Development Centre EMAG (http://ibemag.pl) with support from the Polish Information Processing Society (http://www.pti.org.pl/) and Dituel Sp. z o.o. (http://www.dituel.pl/).
Ensuring the safety of miners working underground is a fundamental requirement for the coal mining industry in Poland. Coal mining companies are obligated by law to introduce many safety measures to secure proper working conditions for their underground personnel. However, expert knowledge-based safety monitoring systems sometimes fail to foresee dangerous seismic events, which can have disastrous consequences. In this data mining competition we would like to address this challenging problem. In particular, we ask participants to come up with reliable methods for predicting periods of increased seismic activity perceived in longwalls of a coal mine.
More details regarding the task and a description of the competition data can be found in the Task Description section.
Special session at AAIA'16: A special session devoted to the competition will be held at the conference. We will invite authors of selected reports to extend them for publication in the conference proceedings (after reviews by Organizing Committee members) and presentation at the conference. The publications will be treated as short papers and will be indexed by IEEE Digital Library and Web of Science. The invited teams will be chosen based on their final rank, innovativeness of their approach and quality of the submitted report.
In case of any questions please post on the competition forum or write us an email: firstname.lastname@example.org
- A. Janusz, M. Sikora, Ł. Wróbel, S. Stawicki, M. Grzegorowski, P. Wojtas, D. Ślęzak: "Mining Data from Coal Mines: IJCRS’15 Data Challenge", in Proceedings of RSFDGrC 2015, LNAI 9437, Springer, 2015
- M. Kozielski, A. Skowron, Ł. Wróbel, M. Sikora: "Regression Rule Learning for Methane Forecasting in Coal Mines", Beyond Databases, Architectures, and Structures, CCIS, Vol. 521, Springer International Publishing, pp. 495-504, 2015
Contest Participation Rules:
- The competition is open for all interested researchers, specialists and students. Only members of the Contest Organizing Committee cannot participate.
- Participants may submit solutions as teams made up of one or more persons.
- Each team needs to designate a leader responsible for communication with the Organizers. A single person can be a leader of only one team. Only the leader needs to be registered at Knowledge Pit and be enrolled for the competition.
- One person may belong to at most 3 teams.
- Each team needs to be composed of a different set of persons.
- The total number of submissions stored at Knowledge Pit for any single team is not limited.
- During the competition, teams that fulfill the requirements stated in the Task Description section may obtain access to additional training data sets. It is forbidden to share the obtained data with other teams participating in the competition or to provide access to this data to any third parties without written consent from the organizers.
- The winner of the competition is chosen on the basis of the final evaluation results. In case of ties in the evaluation scores, the time of the submission will be taken into account.
- Each team is obliged to provide a short report describing their final solution. Reports must contain information such as the name of the team, names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Their length should not exceed 2000 words and they should be submitted in PDF format using our submission system by February 29, 2016. Only submissions made by teams that provided good quality reports will qualify for the final evaluation.
- Organizers may reject any submission if they suspect that it was produced in an unfair way or was submitted by a team which has broken the competition rules.
- By enrolling in this competition you grant the organizers rights to process your submissions and reports for the purpose of evaluation and post-competition research.
The data sets for this competition are provided in a tabular format. The training data set, trainingData.csv, and the corresponding data labels, trainingLabels.csv, were compressed into a single archive, trainingData.7z, which can be downloaded from the Data files section after successful enrollment in the competition. In total, the training file contains 79,893 records, each corresponding to 24 hours of measurements. Values stored in a single record can be divided into two parts. The first part consists of an identifier of the main working site and 12 other characteristics related to the whole 24-hour period described by the record. The second part is composed of hourly aggregated measurements, so for each characteristic it includes 24 consecutive values. There are 541 columns in the data in total (including the main working site id), which implies 22 hourly characteristics (1 + 12 + 22 × 24 = 541). A separate file, working_site_metadata.csv, contains additional information about all main working sites included in the data (in both the training and test parts). We also provide a separate file with the names of columns (attributes) in the training and test data sets.
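The record layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_record` is hypothetical, and the assumption that the 24 hourly values of each characteristic are stored contiguously, after the site id and the 12 daily characteristics, should be checked against the file with column names.

```python
HOURS_PER_RECORD = 24
TOTAL_COLUMNS = 541
DAILY_COLUMNS = 1 + 12  # main working site id + 12 per-record characteristics

# Number of hourly characteristics implied by the column count: 528 / 24 = 22.
hourly_columns = TOTAL_COLUMNS - DAILY_COLUMNS
assert hourly_columns % HOURS_PER_RECORD == 0
N_HOURLY_SERIES = hourly_columns // HOURS_PER_RECORD


def split_record(row):
    """Split one 541-value record into (site_id, daily, hourly).

    `hourly` is a list of N_HOURLY_SERIES lists with 24 values each.
    Assumption: the values of one characteristic are stored contiguously.
    """
    if len(row) != TOTAL_COLUMNS:
        raise ValueError(f"expected {TOTAL_COLUMNS} values, got {len(row)}")
    site_id = row[0]
    daily = row[1:DAILY_COLUMNS]
    rest = row[DAILY_COLUMNS:]
    hourly = [
        rest[i * HOURS_PER_RECORD:(i + 1) * HOURS_PER_RECORD]
        for i in range(N_HOURLY_SERIES)
    ]
    return site_id, daily, hourly
```

A row read with `csv.reader` from trainingData.csv could be passed to `split_record` directly, after converting the fields to the desired types.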
Labels in the data indicate whether the total seismic energy perceived within 8 hours after the period covered by a data record exceeds the warning threshold (i.e. 5*10^4 Joules). The labels for the training data are provided in separate files. The test data file, testData.7z, is in the same format as the training data set; however, the labels for the test series are hidden from participants. It is important to note that time periods in the test data do not overlap and that they are given in a random order.
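The labeling rule can be illustrated with a minimal sketch. The input `future_hourly_energy` (hourly energy totals in joules for the hours right after a record) and the label name 'normal' are assumptions made for this example; the description only names the 'warning' label, and the labels shipped with the data were prepared by the organizers.

```python
WARNING_THRESHOLD_J = 5 * 10**4  # warning threshold from the task description
HORIZON_HOURS = 8                # look-ahead window after the 24-hour record


def warning_label(future_hourly_energy):
    """Return 'warning' if the total energy in the next 8 hours exceeds the threshold.

    `future_hourly_energy` is a hypothetical list of hourly energy totals (J)
    for the hours immediately following the period covered by a record.
    'normal' is an assumed name for the non-warning class.
    """
    total = sum(future_hourly_energy[:HORIZON_HOURS])
    return "warning" if total > WARNING_THRESHOLD_J else "normal"
```

Note that the comparison is strict: a total of exactly 5*10^4 J does not exceed the threshold.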
Additional training data: Apart from the initial training data set, there are also 4 smaller data sets which can be obtained by teams actively participating in the challenge. These sets can be used as additional training data in this challenge. All the new data records come from a period of time between the base training set and the test set and they are arranged chronologically, i.e. the first of the additional sets comes from a period right after the training data, the second set comes from a period right after the first set and so on. The last of the additional sets corresponds to a period right before the test data period.
We will provide access to these sets to participating teams based on the number of solutions they have submitted. Each team is required to make at least 10 correctly formatted submissions to get access to a single data file; thus, to get access to all additional data, a team needs to submit at least 40 solutions. Any solutions which turn out to be unrelated to the task in the challenge (they produce evaluation errors or are essentially random) will not be taken into account when deciding eligibility to obtain the additional files. Moreover, each of the additional files will be released at a different time:
- additional_training_data_1.csv on Oct. 26, 2015
- additional_training_data_2.csv on Nov. 23, 2015
- additional_training_data_3.csv on Dec. 21, 2015
- additional_training_data_4.csv on Jan. 18, 2016
No team participating in the competition can access an additional data file before its release date. We will be checking the eligibility to obtain particular data files periodically, every second Monday, beginning from the first release date.
Furthermore, all the additional data will become available to all participants in the last few days of the competition.
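The access rules for the additional files can be summarized in a short sketch. It assumes that the k-th file requires 10·k valid submissions (one reading of "10 submissions per file, 40 for all"), and it ignores both the biweekly eligibility checks and the general release in the last days of the competition. The function name `accessible_files` is hypothetical.

```python
from datetime import date

SUBMISSIONS_PER_FILE = 10

# Release dates as listed in the task description, in chronological order.
RELEASE_DATES = {
    "additional_training_data_1.csv": date(2015, 10, 26),
    "additional_training_data_2.csv": date(2015, 11, 23),
    "additional_training_data_3.csv": date(2015, 12, 21),
    "additional_training_data_4.csv": date(2016, 1, 18),
}


def accessible_files(valid_submissions, today):
    """List the additional files a team could access on a given day.

    Assumption: files are unlocked in order, one per 10 valid submissions,
    and never before their release date.
    """
    unlocked = valid_submissions // SUBMISSIONS_PER_FILE
    names = list(RELEASE_DATES)  # dicts keep insertion order
    return [name for name in names[:unlocked] if RELEASE_DATES[name] <= today]
```

For example, a team with 25 valid submissions on December 1, 2015 would have access to the first two files under these assumptions.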
Format of submissions: The participants of the competition are asked to predict the likelihood of the label 'warning' for the records from the test set and send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 3,860 lines (files with an additional empty last line will also be accepted). Each consecutive line of this file should contain exactly one real number corresponding to the predicted likelihood. The values do not need to be in a particular range; however, higher numerical values should indicate a higher chance of the label 'warning'.
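A solution file in this format can be produced and checked locally before uploading. The sketch below is not the organizers' validator; the function names are hypothetical, and the rejection of NaN and infinite values is an extra assumption that the description does not state.

```python
import math

EXPECTED_LINES = 3860  # number of test records stated in the description


def format_submission(scores):
    """Render one score per line, in the order of the test records."""
    if len(scores) != EXPECTED_LINES:
        raise ValueError(f"expected {EXPECTED_LINES} scores, got {len(scores)}")
    return "\n".join(repr(float(s)) for s in scores) + "\n"


def validate_submission(text):
    """Parse a submission and return its values, or raise ValueError."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single empty last line is accepted
    if len(lines) != EXPECTED_LINES:
        raise ValueError(f"expected {EXPECTED_LINES} lines, got {len(lines)}")
    values = [float(line) for line in lines]  # raises ValueError on bad input
    if not all(math.isfinite(v) for v in values):
        # Assumption: non-finite scores are treated as invalid.
        raise ValueError("scores must be finite real numbers")
    return values
```

The output of `format_submission` can be written to a text file and uploaded through the submission system.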
Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition Leaderboard. The assessment of solutions will be done using the Area Under the ROC Curve (AUC) measure. The preliminary score will be computed on a subset of the test set, fixed for all participants. It will correspond to approximately 25% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session devoted to this competition, which will be organized at the AAIA'16 conference (https://fedcsis.org/2016/aaia).
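For local validation, AUC can be computed with the rank-sum (Mann-Whitney) formulation, as in the sketch below. This is an illustration, not the organizers' evaluation code; the function name `roc_auc` is hypothetical, and tied scores are given average ranks.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation.

    `labels` are truthy for 'warning' records and falsy otherwise;
    `scores` are the predicted values (higher means more likely 'warning').
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores pairs[i:j].
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1])
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Because the measure depends only on the ordering of the scores, any monotone rescaling of the predictions gives the same result, which is consistent with the submitted values not being restricted to a particular range.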
AAIA'16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines is over. We would like to thank all participants for their contribution!
We are happy to announce that the competition attracted a total of 203 teams, of which 106 were active and submitted at least one solution to the leaderboard. The total number of submissions was 3135, which gives us great material for research. Thanks!
The official Winners:
- Michal Tadeusiak (team tadeusz) from Deepsense.io, Poland
- Robert Bogucki, Jan Lasek, Jan Kanty Milczek, Michał Tadeusiak (team deepsense.io) from Deepsense.io, Poland
- Yasser Tabandeh (team yata) from Golgohar Mining & Industrial Company, Iran
At the beginning of next week we will start sending invitations to extend the reports into short papers submitted to the AAIA’16 conference. Those submissions will go through a special peer reviewing track and all accepted papers will be assigned to a special session devoted to the competition. All other teams are also welcome to submit extended descriptions of their approach to AAIA’16 (the submission system is available here: https://www.fedcsis.org/hotcrp/); however, their papers will undergo the regular reviewing process by members of the event’s Program Committee.
All competition data were made available in the Data files folder (including the labels for the test set, as well as indexes of objects from the preliminary evaluation set). The data set is free to use for non-commercial purposes; however, if you decide to use it in your post-competition research, please add references to related papers describing the scope of the challenge and a link to the Knowledge Pit platform. Below is an example list of related papers (it will be extended):
- Janusz, A., Sikora, M., Wróbel, Ł., Stawicki, S., Grzegorowski, M., Wojtas, P., Ślęzak, D.: Mining Data from Coal Mines: IJCRS’15 Data Challenge. In: Proceedings of RSFDGrC 2015: 429-438, LNAI, vol. 9437. Springer (2015)
- Janusz, A., Sikora, M., Wróbel, Ł., Ślęzak, D.: Predicting Dangerous Seismic Events: AAIA16 Data Mining Challenge. In: Proceedings of FedCSIS 2016, IEEE (In print September 2016)
To access the competition data you need to be logged in. If you still haven't registered at Knowledge Pit, you may create an account using this link: https://knowledgepit.fedcsis.org/login/signup.php?
Acknowledgements: we would like to sincerely thank Dituel Sp. z o.o. (http://www.dituel.pl/) for refactoring and optimization of our platform during the final days of the challenge.
- October 5, 2015: start of the competition, data sets and description become available,
- February 27, 2016: deadline for submitting the predictions,
- March 4, 2016, 23:59 GMT (extended): deadline for sending the reports, end of the challenge,
- March, 2016: on-line publication of final results, sending invitations for submitting short papers for the special session at FedCSIS'16,
- April, 2016: deadline for submissions of papers describing the selected solutions,
- April, 2016: deadline for submissions of camera-ready papers selected for presentation at FedCSIS'16.
Authors of the top ranked solutions (based on the final evaluation scores) will be awarded with prizes funded by our sponsors:
- First Prize: 1000 USD + one free FedCSIS'16 conference registration,
- Second Prize: 500 USD + one free FedCSIS'16 conference registration,
- Third Prize: one free FedCSIS'16 conference registration.
The award ceremony will take place during the FedCSIS'16 conference (Sep 11-14, 2016, Gdańsk, Poland).
Andrzej Janusz, University of Warsaw
Marek Sikora, Institute of Innovative Technologies EMAG
Łukasz Wróbel, Institute of Innovative Technologies EMAG
Sebastian Stawicki, University of Warsaw
Marek Grzegorowski, University of Warsaw
Sinh Hoa Nguyen, University of Warsaw
Dominik Ślęzak, University of Warsaw & Infobright Inc.