
PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data

PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data is our first competition organized within the framework of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). We would like to challenge participants with the task of devising effective algorithms for recognizing the gender of e-store clients. The data set for the competition was provided by FPT Group, which is also the main sponsor of the awards.

Overview

FPT has always been a leading information and communication technology enterprise in Vietnam. By 2014, its revenue was about 1.65 billion US dollars, creating more than 22 thousand full-time jobs for the society. FPT has operations in 17 countries including Vietnam, Laos, Cambodia, America, Japan, Singapore, Germany, Myanmar, France, Malaysia, Australia, Thailand, United Kingdom, Philippines, Kuwait, Bangladesh and Indonesia. The main businesses are Software Development, System Integration, Information Technology Services, Distribution and Manufacturing of Information Technology products and Retails, Internet Services Providing and Data Center Services, Online News and Advertising, e-Commerce, Educational Services, Financing Services.

In e-Commerce, FPT runs several B2B2C (business-to-business-to-customer) services that provide online shopping sites and mobile applications for small and medium sellers. Transaction data from buyers, such as product browsing and purchasing activities, and product portfolios from sellers can be aggregated to provide more efficient buying and selling experiences. For example, statistical machine learning techniques can be applied to predict the organization and display of products that maximizes the chance of bringing useful information to the user and facilitates online purchases. Perhaps one of the vital insights, especially for fashion-related products, is understanding the relevance of a product to the gender of the user. In the PAKDD'15 Data Mining Competition we would like to address this particular problem. More details regarding the task and a description of the competition data can be found in the Task Description section.

If you have any questions, please post on the forum or write us an email: son@mimuw.edu.pl

Terms & Conditions
 
 
  • The competition is open to all interested researchers, specialists and students. Members of the Contest Organizing Committee may not participate.
  • Participants may submit solutions as teams made up of one or more persons.
  • Each team needs to designate a leader responsible for communication with the Organizers. A single person can be a leader of only one team.
  • One person may belong to at most 3 teams.
  • Each team needs to be composed of a different set of persons.
  • The total number of submissions for any single team is limited to 100 solutions.
  • The winner of the competition is chosen on the basis of the final evaluation results. In case of ties in the evaluation scores, the time of submission will be taken into account.
  • Each team is obliged to provide a short report describing their final solution. Reports must contain information such as the name of the team, the names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Their length should not exceed 2000 words and they should be submitted in PDF format using our submission system by May 4, 2015. Only submissions made by teams that provided reports will qualify for the final evaluation.
  • By enrolling in this competition you grant the organizers the right to process your submissions for the purpose of evaluation and post-competition research.

In case of questions related to the competition please contact us via email: webmaster@knowledgepit.fedcsis.org or through the competition forum.


Data format: The data for participants were divided into separate training and test sets, trainingData.csv and testData.csv, respectively. Each of these files contains 15,000 records which correspond to product viewing logs. A single log is composed of four columns, separated by commas. The first one is a session ID. The second and the third columns correspond to the session start time and session end time, respectively. The last column contains a list of the product IDs viewed during the session, in the order in which they were viewed. Consecutive product IDs are separated by semicolons. A trainingLabels.csv file is also available; it contains labels identifying the true gender of the users whose sessions are described in the training data set.

Since the distribution of unique product IDs in the data is very sparse, the IDs contain additional information regarding the product category hierarchy. Each product ID can be decomposed into four different IDs which are separated by slashes. The IDs starting with the letter ‘A’ are the most general categories and those starting with ‘D’ correspond to individual products. The IDs which start with ‘B’ and ‘C’ are associated with subcategories and sub-subcategories, respectively.
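The record layout described above can be illustrated with a short parsing sketch in Python. It is only a sketch under stated assumptions: the timestamp layout in TIME_FORMAT, the helper names (ProductView, parse_product_id, parse_session) and the sample record are hypothetical and do not come from the competition files.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProductView:
    category: str        # most general level, ID starts with 'A'
    subcategory: str     # ID starts with 'B'
    subsubcategory: str  # ID starts with 'C'
    product: str         # individual product, ID starts with 'D'


def parse_product_id(product_id: str) -> ProductView:
    """Split one slash-separated product ID into its four levels."""
    # Empty pieces are dropped so a leading or trailing slash is tolerated.
    parts = [p for p in product_id.split("/") if p]
    if len(parts) != 4:
        raise ValueError(f"expected 4 levels, got {len(parts)}: {product_id!r}")
    for part, letter in zip(parts, "ABCD"):
        if not part.startswith(letter):
            raise ValueError(f"level {part!r} should start with {letter!r}")
    return ProductView(*parts)


# Assumed timestamp layout; the task description does not specify it.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_session(line: str, time_format: str = TIME_FORMAT) -> dict:
    """Parse one log: session ID, start time, end time, viewed products."""
    session_id, start, end, products = line.strip().split(",", 3)
    # Semicolons separate consecutive product IDs; viewing order is kept.
    views = [parse_product_id(p) for p in products.split(";") if p]
    return {
        "session_id": session_id,
        "start": datetime.strptime(start, time_format),
        "end": datetime.strptime(end, time_format),
        "views": views,
    }


# Hypothetical record, made up for illustration only.
sample = (
    "u00001,2014-11-14 00:02:14,2014-11-14 00:02:20,"
    "A00001/B00001/C00001/D00001;A00002/B00003/C00007/D00002"
)
session = parse_session(sample)
duration_seconds = (session["end"] - session["start"]).total_seconds()
```

Keeping the four hierarchy levels as separate fields makes it easy to count views per category or subcategory, which is one way to work around the sparsity of individual product IDs.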

Format of submissions: The participants of the competition are asked to predict the gender of users from the test data and send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 15,000 lines. The format of submitted files should follow the format of trainingLabels.csv. Each consecutive line of this file should contain a single label which identifies the gender of the user who generated the corresponding session log in the test set.
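A submission file of this shape could be prepared as in the following Python sketch. The label strings "male" and "female" are an assumption (the exact strings should be taken from trainingLabels.csv), and format_submission is a hypothetical helper name.

```python
def format_submission(labels, expected_lines=15000, allowed=("male", "female")):
    """Return the submission text: one label per line, in test-set order.

    The allowed label strings are an assumption; they should match the
    labels that appear in trainingLabels.csv.
    """
    labels = list(labels)
    if len(labels) != expected_lines:
        raise ValueError(f"expected {expected_lines} labels, got {len(labels)}")
    for line_number, label in enumerate(labels, start=1):
        if label not in allowed:
            raise ValueError(f"line {line_number}: unexpected label {label!r}")
    return "\n".join(labels) + "\n"


# Tiny illustration with three predictions instead of 15,000.
text = format_submission(["female", "male", "female"], expected_lines=3)
```

The returned text can then be written to a plain text file and uploaded; checking the line count before uploading avoids spending one of the limited submissions on a malformed file.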

Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score is computed on approximately 20% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session at the PAKDD'15 conference (http://www.pakdd2015.jvn.edu.vn/).

Since the distribution of labels in the data is not balanced, the assessment of solutions will be done using the balanced accuracy measure, which is defined as the average accuracy within the decision classes. Namely, for a vector of predictions preds and a vector of true gender labels genders we define the balanced accuracy as:
\[ACC_{m}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = male\}|}{|\{j : genders_{j} = male\}|}\]
\[ACC_{f}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = female\}|}{|\{j : genders_{j} = female\}|}\]
\[BAC(preds, genders) = \left(ACC_{f}(preds, genders) + ACC_{m}(preds, genders)\right)/2\]
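The measure can be sketched in a few lines of Python. This is an illustration of the definition above, not the organizers' evaluation code; the function name balanced_accuracy and the label strings "male" and "female" are assumptions.

```python
def balanced_accuracy(preds, genders, classes=("male", "female")):
    """Average of the per-class accuracies ACC_m and ACC_f."""
    if len(preds) != len(genders):
        raise ValueError("preds and genders must have the same length")
    per_class = []
    for cls in classes:
        # Denominator: number of sessions whose true label is cls.
        total = sum(1 for g in genders if g == cls)
        if total == 0:
            raise ValueError(f"no true examples of class {cls!r}")
        # Numerator: sessions of class cls that were predicted correctly.
        correct = sum(1 for p, g in zip(preds, genders) if p == g == cls)
        per_class.append(correct / total)
    return sum(per_class) / len(per_class)


# Imbalanced toy data: 8 female sessions and 2 male sessions.
genders = ["female"] * 8 + ["male"] * 2

# Predicting the majority class everywhere gives ACC_f = 1 and ACC_m = 0.
majority = ["female"] * 10
bac_majority = balanced_accuracy(majority, genders)  # 0.5

# A mixed prediction: 6 of 8 females and 1 of 2 males are correct.
mixed = ["female"] * 6 + ["male"] * 2 + ["male", "female"]
bac_mixed = balanced_accuracy(mixed, genders)  # (0.75 + 0.5) / 2 = 0.625
```

In the toy data, always predicting the majority class reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, which is why this measure suits unbalanced labels.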

Rank | Team Name | Score | Submission Date
1 | ngocan211 | 0.890432 | 2015-05-1 10:26:59
2 | frdc | 0.878928 | 2015-05-1 04:33:07
3 | newolfy | 0.877889 | 2015-05-1 00:43:48
4 | ws | 0.851119 | 2015-05-2 01:16:04
5 | sohrab | 0.851032 | 2015-05-1 19:10:39
6 | kimiyoung | 0.849046 | 2015-05-1 20:36:30
7 | dymitrruta | 0.847979 | 2015-04-21 22:59:55
8 | ibayer | 0.840673 | 2015-05-1 22:16:34
9 | songshuangyong | 0.830603 | 2015-05-1 11:55:40
10 | stderr | 0.811541 | 2015-04-30 03:41:36
11 | amy | 0.810673 | 2015-04-28 14:03:01
12 | gambi | 0.810181 | 2015-05-1 15:15:00
13 | siyu | 0.805283 | 2015-04-30 14:28:01
14 | ahwangyuwei | 0.801450 | 2015-04-23 11:54:52
15 | kkurach | 0.798821 | 2015-05-1 12:33:04
16 | duongtranduc | 0.797838 | 2015-05-1 19:52:29
17 | hnt | 0.797127 | 2015-04-30 05:25:46
18 | vinhn | 0.795656 | 2015-03-26 12:55:35
19 | zhu_ark | 0.791138 | 2015-04-28 04:17:46
20 | mrw19138 | 0.790483 | 2015-04-27 17:28:49
21 | xiexingbo | 0.783441 | 2015-04-29 10:20:22
22 | pdoviet | 0.783204 | 2015-03-28 19:11:48
23 | jankralj | 0.780010 | 2015-04-29 14:13:48
24 | zhijin | 0.772296 | 2015-03-29 19:42:54
25 | ikttan | 0.771237 | 2015-04-29 12:59:53
26 | ds_tcy | 0.770413 | 2015-04-3 10:53:59
27 | thetuxedo | 0.766886 | 2015-04-5 08:39:56
28 | langochai | 0.765252 | 2015-04-27 18:32:08
29 | neozhangthe1 | No report file found or report rejected. | 2015-05-1 19:19:26
30 | linegroup | No report file found or report rejected. | 2015-04-18 09:00:11
31 | antry | No report file found or report rejected. | 2015-04-11 12:04:42
32 | ziom | No report file found or report rejected. | 2015-04-27 21:17:31
33 | zm | No report file found or report rejected. | 2015-04-30 10:22:29
34 | rjo2909 | No report file found or report rejected. | 2015-04-30 17:53:39
35 | khuongnd | No report file found or report rejected. | 2015-03-27 17:16:25
36 | thaidt | No report file found or report rejected. | 2015-04-10 08:10:02
37 | ggvspp | No report file found or report rejected. | 2015-04-3 10:06:36
38 | xspring | No report file found or report rejected. | 2015-03-27 04:25:53
39 | thaidang | No report file found or report rejected. | 2015-04-10 06:53:55
40 | tund | No report file found or report rejected. | 2015-04-5 08:41:31
41 | wyw | No report file found or report rejected. | 2015-04-23 10:19:47
42 | yanghaisong | No report file found or report rejected. | 2015-04-14 12:45:47
43 | mamy | No report file found or report rejected. | 2015-04-16 07:16:10
44 | orange | No report file found or report rejected. | 2015-03-30 19:17:58
45 | derivedbydata | No report file found or report rejected. | 2015-04-11 19:44:43
46 | dirichlet.process | No report file found or report rejected. | 2015-04-9 15:26:38
47 | nhatuan | No report file found or report rejected. | 2015-04-18 15:55:58
48 | yzan | No report file found or report rejected. | 2015-04-28 15:10:39
49 | ramesh.krnt | No report file found or report rejected. | 2015-05-1 15:32:30
50 | ntienvu | No report file found or report rejected. | 2015-03-26 11:59:44
51 | ttd | No report file found or report rejected. | 2015-04-9 11:42:13
52 | cwhuang | No report file found or report rejected. | 2015-05-1 10:54:48
53 | zhangzhongxia | No report file found or report rejected. | 2015-04-18 08:13:11
54 | hnguyen | No report file found or report rejected. | 2015-03-26 07:06:42
55 | wttool | No report file found or report rejected. | 2015-03-25 17:14:17
56 | lihang00 | No report file found or report rejected. | 2015-04-29 10:01:45
57 | fancyspeed | No report file found or report rejected. | 2015-04-4 16:34:51
58 | dat_phuoc | No report file found or report rejected. | 2015-05-1 17:26:00
59 | huuphuc2609 | No report file found or report rejected. | 2015-03-29 15:39:30
60 | 0000 | No report file found or report rejected. | 2015-04-28 02:10:09
61 | lab213 | No report file found or report rejected. | 2015-04-3 04:57:50
62 | etw | No report file found or report rejected. | 2015-04-27 09:16:36
63 | sasnzy | No report file found or report rejected. | 2015-03-30 06:25:37
64 | ketanatnmims | No report file found or report rejected. | 2015-04-2 13:19:26
65 | pikachust8811 | No report file found or report rejected. | 2015-04-29 19:33:42
66 | muye5 | No report file found or report rejected. | 2015-04-22 12:54:20
67 | iran-amin | No report file found or report rejected. | 2015-04-27 10:38:44
68 | eric | No report file found or report rejected. | 2015-03-29 09:15:03
69 | huong2 | No report file found or report rejected. | 2015-04-30 06:33:58
70 | abhgh | No report file found or report rejected. | 2015-04-7 20:44:30
71 | wst_casd | No report file found or report rejected. | 2015-04-19 09:21:10
72 | binh | No report file found or report rejected. | 2015-04-27 19:25:47
73 | p2trieu | No report file found or report rejected. | 2015-04-10 04:56:44
74 | pth1993 | No report file found or report rejected. | 2015-04-18 20:12:16
75 | whatif | No report file found or report rejected. | 2015-03-26 09:47:19
76 | yin520liang | No report file found or report rejected. | 2015-04-8 15:36:09
77 | vutm | No report file found or report rejected. | 2015-04-11 17:48:18
78 | tchang | No report file found or report rejected. | 2015-04-2 05:38:25
79 | bati | No report file found or report rejected. | 2015-04-20 09:23:37
80 | nightfury | No report file found or report rejected. | 2015-03-30 19:00:21
81 | sslim | No report file found or report rejected. | 2015-04-8 15:16:23
82 | h3p | No report file found or report rejected. | 2015-03-30 14:23:48
83 | billguess | No report file found or report rejected. | 2015-04-28 17:16:17
84 | marcb | No report file found or report rejected. | 2015-04-1 18:03:38
85 | violet_zct | No report file found or report rejected. | 2015-03-28 16:15:37
86 | zhili | No report file found or report rejected. | 2015-04-9 13:50:33
87 | fajri91 | No report file found or report rejected. | 2015-05-2 01:26:47
88 | jyclin | No report file found or report rejected. | 2015-04-28 15:44:36
89 | d10207305 | No report file found or report rejected. | 2015-04-27 14:09:52
90 | vpodpecan | No report file found or report rejected. | 2015-04-4 21:05:39
91 | neurons | No report file found or report rejected. | 2015-04-3 19:25:07
92 | tuandinh | No report file found or report rejected. | 2015-04-27 11:05:36
93 | saj | No report file found or report rejected. | 2015-04-19 10:48:16
94 | qjt | No report file found or report rejected. | 2015-05-1 20:52:14
95 | xuxiaofeng | No report file found or report rejected. | 2015-03-25 15:08:43
96 | tidom | No report file found or report rejected. | 2015-03-24 18:00:47
97 | spiritoflinz | No report file found or report rejected. | 2015-03-25 16:21:31
98 | bruincui | No report file found or report rejected. | 2015-04-17 03:45:15
99 | mautoan11 | No report file found or report rejected. | 2015-04-22 06:03:25
100 | hamylinh | No report file found or report rejected. | 2015-04-27 21:04:32
101 | jackson13 | No report file found or report rejected. | 2015-04-12 18:18:49
102 | starryc | No report file found or report rejected. | 2015-03-24 15:42:49
103 | little_number | No report file found or report rejected. | 2015-04-14 14:33:12
104 | nampham | No report file found or report rejected. | 2015-04-18 17:30:17
105 | hoangphan | No report file found or report rejected. | 2015-04-22 19:36:12
106 | gentaiscool | No report file found or report rejected. | 2015-04-25 15:59:48
107 | helen | No report file found or report rejected. | 2015-04-9 13:03:53
108 | kuanhoong | No report file found or report rejected. | 2015-04-4 09:07:49
109 | pin | No report file found or report rejected. | 2015-04-29 15:57:02
110 | gmustafa | No report file found or report rejected. | 2015-03-27 14:49:34
111 | seeyouhere | No report file found or report rejected. | 2015-04-2 12:09:37
112 | mayankkejriwal | No report file found or report rejected. | 2015-04-30 20:47:58
113 | mrboring | No report file found or report rejected. | 2015-03-30 00:34:24
114 | lingdian618 | No report file found or report rejected. | 2015-04-1 16:58:16
115 | strnam | No report file found or report rejected. | 2015-04-30 06:19:19
116 | zagorecki | No report file found or report rejected. | 2015-03-27 01:04:07
117 | rembern | No report file found or report rejected. | 2015-04-21 10:02:00
118 | sudarsun | No report file found or report rejected. | 2015-03-26 19:35:36
119 | zgbdsg | No report file found or report rejected. | 2015-04-5 06:50:06
120 | janezkranjc | No report file found or report rejected. | 2015-03-24 16:04:38
121 | sg.qq | No report file found or report rejected. | 2015-04-8 05:08:19
122 | exploit | No report file found or report rejected. | 2015-03-25 14:18:39
123 | blah | No report file found or report rejected. | 2015-04-22 15:14:28
124 | test | No report file found or report rejected. | 2015-03-29 09:58:55
125 | sdx0112 | No report file found or report rejected. | 2015-03-25 09:17:33
126 | wilson891226 | No report file found or report rejected. | 2015-04-6 16:08:30
127 | fengqi | No report file found or report rejected. | 2015-04-29 11:24:51
128 | deagle9413 | No report file found or report rejected. | 2015-03-24 11:35:05
129 | chaitu516 | No report file found or report rejected. | 2015-04-4 00:01:04
130 | khanhlh | No report file found or report rejected. | 2015-04-16 17:43:23
131 | wolfinlove | No report file found or report rejected. | 2015-04-26 21:19:30
132 | huongtt | No report file found or report rejected. | 2015-04-27 09:41:33
133 | f-ken1010 | No report file found or report rejected. | 2015-04-10 06:47:15
134 | sathik | No report file found or report rejected. | 2015-04-22 19:19:22
135 | hnt1 | No report file found or report rejected. | 2015-04-26 19:46:35
136 | nghiemduc | No report file found or report rejected. | 2015-04-12 17:27:54
137 | statgeek | No report file found or report rejected. | 2015-04-15 20:37:32
138 | franksnail | No report file found or report rejected. | 2015-04-19 11:27:54
139 | meerkat | No report file found or report rejected. | 2015-03-23 16:12:43
140 | oahcil | No report file found or report rejected. | 2015-03-30 11:16:10
141 | huwenp | No report file found or report rejected. | 2015-04-17 13:16:04
142 | linjie_zhu | No report file found or report rejected. | 2015-04-22 11:02:53
143 | abhijit | No report file found or report rejected. | 2015-03-23 19:42:48
144 | kloud | No report file found or report rejected. | 2015-03-26 19:16:05
145 | thnhu | No report file found or report rejected. | 2015-04-15 05:17:54
146 | yupbank | No report file found or report rejected. | 2015-03-25 16:53:37
147 | ssssqd | No report file found or report rejected. | 2015-05-1 10:25:55
148 | customs | No report file found or report rejected. | 2015-04-11 17:56:10
149 | cllab | No report file found or report rejected. | 2015-04-2 14:12:27
150 | baseline_solution | No report file found or report rejected. | 2015-03-23 07:02:47
151 | sink | No report file found or report rejected. | 2015-03-26 19:11:19
152 | mathimohanraj | No report file found or report rejected. | 2015-04-9 15:27:30
153 | clapika2010 | No report file found or report rejected. | 2015-04-23 02:02:59
154 | pg7799 | No report file found or report rejected. | 2015-04-30 20:46:19

PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data has ended after over a month of intense rivalry!

The competition attracted 330 teams, of which 149 participated actively by submitting at least one solution to the leaderboard. The total number of submissions was nearly 3000. Of the active teams, 28 provided us with a brief report describing their approach.

Awards:

  1. The winner: Team FRDC
    Members: Ruiyu Fang (the leader), Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen
    Affiliation: Fujitsu Research and Development Center, Beijing, China
  2. The runner-up: Team newolfy
    Members: Yingju Xia (the leader), Shuangyong Song, Qingliang Miao, Zhongquang Zheng
    Affiliation: Fujitsu Research and Development Center, Beijing, China

Invitations to give an oral presentation at the PAKDD-2015 Contest Workshop:

  • Team FRDC: Ruiyu Fang, Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen (PAKDD'15 presentation)
  • Team newolfy: Yingju Xia, Shuangyong Song, Qingliang Miao, Zhongquang Zheng (PAKDD'15 presentation)
  • Team ngocan211: Pham Ngoc An, FPT University, Hanoi, Viet Nam (PAKDD'15 presentation)
  • Team ws: Wojciech Świeboda, University of Warsaw, Poland (PAKDD'15 presentation)
  • Team sohrab: Mohammad Golam Sohrab, Makoto Miwa, Yutaka Sasaki, Toyota Technological Institute, Japan (PAKDD'15 presentation)
  • Team kimiyoung: Zhilin Yang, Yutao Zhang, Jie Tang, Tsinghua University, Beijing, China
  • Team ibayer: Immanuel Bayer, University of Konstanz, Germany (PAKDD'15 presentation)
  • Team gambi: Maria Brbic, Dragan Gamberger, Matej Mihelcic, Matija Piskorec, Tomislav Smuc, Rudjer Boskovic Institute, Zagreb, Croatia (PAKDD'15 presentation)

I would like to thank all participants for their hard work. Congratulations on your excellent results!

In order to stimulate future research on the topic of the competition, we have released all competition data (including the labels for test cases) in the data files folder.

  • March 23, 2015: start of the competition, data sets become available,
  • May 1, 2015: deadline for submitting the predictions,
  • May 3, 2015: deadline for sending the reports, end of the challenge,
  • May 19, 2015: beginning of the PAKDD'15 conference, official announcement of the winners.

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes sponsored by FPT Group:

  • First Prize: Apple MacBook Air + one free PAKDD'15 conference registration,
  • Second Prize: new FPT smartphone (to be defined) + one free PAKDD'15 conference registration.

The award ceremony will take place during the PAKDD'15 conference (May 19 - 22, Ho Chi Minh City, Vietnam).

Hung Son Nguyen, University of Warsaw

Tran The Trung, FPT University

Tu Bao Ho, Japan Advanced Institute of Science and Technology

Duc Dung Nguyen, Vietnam Academy of Science and Technology

Andrzej Janusz, University of Warsaw

Discussion | Author | Replies | Last post
whether top10 reports are available sometime? | muye5 | 2 | by Andrzej, Thursday, May 21, 2015, 11:42:59
Complete test data: Gender Prediction Based on E-commerce Data | Ramesh | 1 | by Andrzej, Thursday, May 21, 2015, 11:52:19
Postpone the deadline | Sajjad | 1 | by Sajjad, Saturday, May 02, 2015, 02:53:18
The last week of PAKDD'15 Data Mining Competition | Andrzej | 0 | by Andrzej, Monday, April 27, 2015, 15:06:02
Training data VS Test data | Amin | 1 | by Amin, Sunday, April 26, 2015, 03:05:08
New competition at Knowledge Pit: IJCRS’15 Data Challenge | Andrzej | 0 | by Andrzej, Monday, April 13, 2015, 13:22:57
A problem with KnowledgePit server | Andrzej | 0 | by Andrzej, Sunday, April 12, 2015, 16:58:27
Final submission | Tu | 2 | by Andrzej, Wednesday, April 01, 2015, 11:17:09
small inconsistency in test data | Vid | 2 | by Andrzej, Monday, March 30, 2015, 12:50:10
Conference participation | Eftim | 1 | by Andrzej, Monday, March 30, 2015, 16:03:01
AAIA'15 Data Mining Competition: Activity Recognition Based on Body Sensor Networks | Andrzej | 0 | by Andrzej, Friday, March 27, 2015, 18:18:35
Multiple Participants in single team? | Abhay | 2 | by Abhay, Monday, March 30, 2015, 21:36:20