
Second semester project for Decision Systems Course 2022/2023

This is the second project for students enrolled in the Decision Systems 2022/2023 course at the Faculty of Mathematics, Informatics, and Mechanics of the University of Warsaw.

Overview

The goal of this competition is to find informative subsets of genes that make it possible to efficiently solve classification problems defined for a number of microarray data sets.

More detailed competition rules are given in the Terms and Conditions.

The description of the data and evaluation metric is in the Task description section. 

The deadline for sending submissions and reports is January 27, 2023.

Terms & Conditions

The Terms & Conditions are available after logging in to the system.

Task description

The provided data consist of ten microarray sets with varying numbers of instances and attributes. Microarray data is a typical example of the "few-samples-many-attributes" problem.

The data tables are provided as CSV files with the ',' (comma) separator. In each set, the last column is called "target" and contains the class labels of the samples. The data sets can be downloaded after registering for the competition. You only have access to the training parts of the data sets.

Your task is to identify, for each set, the optimal subset of attributes for an SVM classifier with a linear kernel and the cost parameter set to 1. No additional regularization will be used for the model. For each data set, you may indicate between 2 and 102 attributes, but a small penalty will be subtracted from your score for each attribute used beyond the first two.

The evaluation metric is balanced accuracy (BAC) adjusted by a penalty for using many attributes. Specifically, your score is the average, over the data sets, of BAC - (k-2)/5000, where k is the number of attributes used for a given data set.
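
As a rough illustration of the metric described above, the R sketch below computes balanced accuracy as the mean of per-class recalls and then applies the attribute penalty. The helper names `balanced_accuracy` and `penalized_score` are hypothetical, and the organizers' exact BAC implementation is not shown on this page, so this is only an approximation for local experiments.

```r
# Balanced accuracy, assumed here to be the mean of per-class recalls.
balanced_accuracy <- function(truth, pred) {
  truth <- as.character(truth)
  pred <- as.character(pred)
  recalls <- vapply(unique(truth),
                    function(cl) mean(pred[truth == cl] == cl),
                    numeric(1))
  mean(recalls)
}

# Average over data sets of BAC - (k - 2) / 5000,
# where k is the number of attributes used for each data set.
penalized_score <- function(bac, k) {
  stopifnot(length(bac) == length(k), all(k >= 2), all(k <= 102))
  mean(bac - (k - 2) / 5000)
}

# Made-up inputs, for illustration only.
bac_example <- balanced_accuracy(c("a", "a", "a", "b"), c("a", "a", "b", "b"))
score_example <- penalized_score(bac = c(0.80, 0.70), k = c(2, 102))
```

With these made-up inputs, the data set that uses 102 attributes loses 0.02 of its BAC, which is the largest penalty the formula allows.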

The SVM model used for the evaluation is fitted with the code below:

model <- e1071::svm(
  dt_tr[, feats, with = FALSE],
  dt_tr[, factor(target)],
  type = "C-classification",
  kernel = "linear",
  cost = 1,
  gamma = 1/length(feats),
  scale = TRUE
)
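
Since only the training parts are available, one way to compare candidate attribute subsets locally is to hold out some training rows and score the same model on them. The sketch below is an assumption about how that could be done, not the organizers' evaluation procedure: the synthetic table, the hold-out indices, and the helper `holdout_bac` are all hypothetical, and it needs the `data.table` and `e1071` packages. As far as I know, `gamma` has no effect with a linear kernel, but it is kept so the call matches the one above.

```r
library(data.table)
library(e1071)

# Synthetic stand-in for one training set: 60 samples, 20 attributes (V1..V20).
set.seed(1)
n <- 60
p <- 20
dt <- as.data.table(matrix(rnorm(n * p), nrow = n))
dt[, target := rep(c("A", "B"), each = n / 2)]
dt[target == "B", V1 := V1 + 3]  # make V1 informative on purpose

# Fit the evaluation model on the non-held-out rows and return BAC
# (mean of per-class recalls) on the held-out rows.
holdout_bac <- function(dt, feats, test_idx) {
  dt_tr <- dt[-test_idx]
  dt_te <- dt[test_idx]
  model <- e1071::svm(dt_tr[, feats, with = FALSE], dt_tr[, factor(target)],
                      type = "C-classification", kernel = "linear", cost = 1,
                      gamma = 1 / length(feats), scale = TRUE)
  pred <- as.character(predict(model, as.matrix(dt_te[, feats, with = FALSE])))
  truth <- dt_te$target
  mean(vapply(unique(truth),
              function(cl) mean(pred[truth == cl] == cl),
              numeric(1)))
}

test_idx <- c(1:10, 31:40)  # ten held-out rows from each class
bac_informative <- holdout_bac(dt, c("V1", "V2"), test_idx)
bac_noise <- holdout_bac(dt, c("V3", "V4"), test_idx)
```

A single hold-out split is noisy with so few samples, so repeating the split or using cross-validation would likely give a steadier estimate.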

During the competition, your solutions will be evaluated on five of the data sets, and your best preliminary score will be displayed on the public Leaderboard. The final score of each team will be computed on the remaining data sets.

The submission format: solutions need to be submitted as text files listing the selected attribute sets. The file should have exactly 10 rows, one per data set. Each row should contain integers between 1 and the number of columns in the corresponding data set; these integers are the column numbers of the attributes that the evaluation model should use. The order of the rows should correspond to the numbers in the names of the provided data sets.
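
A minimal R sketch of writing such a file is shown below. The helper name `write_submission` is hypothetical, and the separator between integers within a row is not stated on this page, so the comma used here is an assumption to check against the competition rules. The checks mirror the constraints above: 10 rows, between 2 and 102 attributes per row, and column numbers within range.

```r
# feature_sets: list of 10 integer vectors (column numbers), one per data set.
# n_cols: number of columns in each of the 10 data sets.
# sep: separator within a row (assumed; not specified in the task description).
write_submission <- function(feature_sets, n_cols, path, sep = ",") {
  stopifnot(length(feature_sets) == 10, length(n_cols) == 10)
  lines <- vapply(seq_along(feature_sets), function(i) {
    f <- feature_sets[[i]]
    stopifnot(is.numeric(f), all(f == round(f)),
              length(f) >= 2, length(f) <= 102,
              all(f >= 1), all(f <= n_cols[i]))
    paste(as.integer(f), collapse = sep)
  }, character(1))
  writeLines(lines, path)
  invisible(lines)
}

# Made-up example: the same three columns for every data set.
example_sets <- rep(list(c(3, 17, 42)), 10)
example_cols <- rep(1000, 10)
example_path <- tempfile(fileext = ".txt")
written <- write_submission(example_sets, example_cols, example_path)
```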

The deadline for sending submissions and reports is January 27, 2023.

Rank  Team Name                    Is Report  Preliminary Score  Final Score  Submissions
1     Drużyna bez fajnego skrótu   True       0.7063             0.670000     33
2     CakeTeam                     True       0.7087             0.667300     10
3     team                         True       0.6750             0.665400     14
4     baseline                     True       0.6743             0.658300     2
5     Team name                    True       0.7297             0.654600     44
6     NWJ                          True       0.6692             0.649400     14
7     na pewno nie rynek w Zabrzu  True       0.7144             0.642900     32
8     Kulka błota                  True       0.6919             0.638700     4
9     Łukasz                       True       0.6843             0.638200     10
10    cotopaxi                     True       0.6694             0.636800     7
11    OS                           True       0.7220             0.635300     26
12    ff                           True       0.6940             0.634200     28
13    T                            True       0.6568             0.632800     23
14    quozz                        True       0.7193             0.632000     20
15    419328                       True       0.5928             0.614300     11
16    Kl                           True       0.6822             0.611500     30
17    drużyna1                     True       0.6500             0.603900     17
18    teamteam                     True       0.7011             0.574300     20
19    spinach                      False      0.5228             -            7

Team spinach has no final score: no report file found or report rejected.

This forum is for all users to discuss matters related to the competition. Good manners apply!
Discussion                    Author  Replies  Last post
Column order in the solution  -       1        by Andrzej, Thursday, January 05, 2023, 09:34:24
Submission format             Tomasz  1        by Andrzej, Sunday, January 01, 2023, 18:41:34