IML2023project1

First semester project for the Interactive Machine Learning 2022/2023 course at MIM UW

The task is to choose optimal batches of queries for training a model that predicts frags in the video game Tactical Troops: Anthracite Shift.

Overview

The goal of this competition is to choose two subsets of samples from the data pool such that they bring the largest improvement to a prediction model.

Competition rules are given in the Terms & Conditions section.

The description of the task, data, and evaluation metric is in the Task description section.

The deadline for submitting solutions is April 30, 2023.

Terms & Conditions

Participants of the challenge are obliged to follow the competition rules:

This challenge is organized by Andrzej Janusz and Daniel Kałuża (the Organizers) for students enrolled in the Interactive Machine Learning 2022/2023 course at the Faculty of Mathematics, Informatics, and Mechanics at the University of Warsaw.
The provided data sets are the property of the Organizers and the KnowledgePit platform. It is forbidden to share or redistribute provided data sets to any third party without explicit consent from the Organizers.
Each team in the competition may consist of only one person. Working in larger groups or sharing solutions with other teams is strictly forbidden.
Each team is limited to 100 submissions in total.
The number of submissions per day is limited to 10.
Participants can only use data made available in the challenge - using any external resources is forbidden. Queries regarding external resources need to be issued through the competition forum.
It is strictly forbidden to hack the provided data or to exploit any unfair data leak that can improve the solution score. All attempts at making predictions for any test instance using information extracted from other test instances will result in disqualification.
The deadline for submitting the solutions is April 30, 2023 (23:59 GMT). Late submissions will not be accepted.
Each team is obliged to provide a short report describing their final solution. The report must contain the name of the team, the names of all team members, the source code of the final solution, and a brief overview of the approach used. It should be submitted through the KnowledgePit submission system by April 30, 2023 (23:59 GMT).
By enrolling in this competition, you grant the Organizers the right to process your submissions and reports for the purpose of evaluation and post-competition research.
The final project score will depend on the quality of the solution (the score obtained in the final evaluation), and on the quality of the submitted report and code.

Final results

Rank	Team Name	Is Report	Preliminary Score	Final Score	Submissions
1	mgrotkowski	True	3.6270	3.601400	35
2	Mateusz Błajda	True	3.6182	3.594500	17
3	jandziuba	True	3.6377	3.593700	25
4	lastmanstanding	True	3.6287	3.590700	44
5	MJ	True	3.6444	3.590000	45
6	Krzysztof Jankowski	True	3.6209	3.588400	20
7	Karol	True	3.6174	3.581500	28
8	baseline	True	3.6040	3.580000	7
9	basiekjusz	True	3.5994	3.575700	10
10	ggruza	True	3.6100	3.573100	25
11	kuba	True	3.6109	3.570600	10

Task description

The task in this project is to choose two subsets of samples from the data pool such that they bring the largest improvement to a prediction model. The subsets should contain 50 and 200 samples, respectively.

The initial batch of samples and the data pool are given as CSV files. Keep in mind that rows are indexed from 1 (not 0).

Each row in the data corresponds to a preprocessed representation of a game state. The task is to predict whether in this game turn the moving player will score a frag.

Format of submissions: solutions should be submitted as text files with two lines. The first line should contain exactly 50 integers - the indices of samples from the data pool (samples are indexed starting from 1), separated by commas. The second line should contain analogous indices for the second set of samples, with 200 integers.
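The submission format above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function names and the `pool_size` parameter are hypothetical, and the duplicate check assumes that each "subset" must consist of distinct indices, which the rules do not state explicitly.

```python
from typing import Iterable, List

# Batch sizes taken from the task description.
BATCH_SIZES = (50, 200)


def to_one_based(zero_based: Iterable[int]) -> List[int]:
    """Convert 0-based row positions (e.g. from an array) to 1-based indices."""
    return [i + 1 for i in zero_based]


def format_submission(first: List[int], second: List[int], pool_size: int) -> str:
    """Return the two-line submission text for two batches of 1-based indices.

    pool_size is a hypothetical parameter: the number of rows in the data pool.
    """
    for batch, size in zip((first, second), BATCH_SIZES):
        if len(batch) != size:
            raise ValueError(f"expected {size} indices, got {len(batch)}")
        if len(set(batch)) != len(batch):
            # Assumption: a subset may not contain the same sample twice.
            raise ValueError("duplicate indices in a batch")
        if any(i < 1 or i > pool_size for i in batch):
            raise ValueError("indices must lie in 1..pool_size")
    return "\n".join(",".join(str(i) for i in batch) for batch in (first, second))
```

For example, `format_submission(to_one_based(range(50)), to_one_based(range(50, 250)), pool_size=1000)` returns a string whose first line lists the indices 1 to 50 and whose second line lists 51 to 250; the result can then be written to a text file.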

Evaluation: submitted solutions will be evaluated using XGBoost models, each trained independently on one of the two sets of samples added to the initial data batch that was made available to participants. Each model will be evaluated on a separate test set (hidden from participants). The quality metric used for the evaluation will be the average AUC. Results for the two sample sets will be averaged with weights 4 and 1 for the subsets of size 50 and 200, respectively.
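The weighting of the two results can be sketched as below. The description does not say whether the weighted combination is divided by the sum of the weights; the leaderboard scores are above 1, which a normalized average of AUC values could not be, so the sketch defaults to a plain weighted sum and offers normalization as an option. That default is an inference, not a documented rule, and the function name and the AUC values in the example are purely illustrative.

```python
def combined_score(auc_50: float, auc_200: float,
                   w_50: float = 4.0, w_200: float = 1.0,
                   normalize: bool = False) -> float:
    """Combine the AUCs for the 50- and 200-sample batches with weights 4 and 1.

    normalize=False gives the weighted sum (assumed here to match the leaderboard
    scale); normalize=True divides by the total weight to give a weighted mean.
    """
    total = w_50 * auc_50 + w_200 * auc_200
    return total / (w_50 + w_200) if normalize else total


# Illustrative AUC values only; they are not taken from the competition.
print(combined_score(0.72, 0.73))
print(combined_score(0.72, 0.73, normalize=True))
```

Either way, the batch of 50 samples carries four times the weight of the batch of 200, so the quality of the smaller selection dominates the score.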

During the challenge, your solutions will be evaluated on a fraction of the test set, and your best preliminary score will be displayed on the public Leaderboard. After the end of the competition, the selected solutions will be evaluated on the remaining part of the test data and this result will be used for the evaluation of the project.

Data files

To download the competition files, you need to be enrolled.

Forum

This forum is for all users to discuss matters related to the competition. Good manners apply!

	Discussion	Author	Replies	Last post
	Indexing start	Mateusz	0	by Mateusz Friday, May 05, 2023, 21:24:49