
PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data

PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data is our first competition organized within the framework of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). We challenge participants to devise effective algorithms for recognizing the gender of e-store clients. The data set for the competition was provided by FPT Group, which is also the main sponsor of the awards.

Overview

FPT has always been a leading information and communication technology enterprise in Vietnam. By 2014, its revenue had reached about 1.65 billion US dollars, and it had created more than 22 thousand full-time jobs. FPT has operations in 17 countries: Vietnam, Laos, Cambodia, America, Japan, Singapore, Germany, Myanmar, France, Malaysia, Australia, Thailand, United Kingdom, Philippines, Kuwait, Bangladesh and Indonesia. Its main businesses are Software Development, System Integration, Information Technology Services, Distribution and Manufacturing of Information Technology Products, Retail, Internet Services and Data Center Services, Online News and Advertising, e-Commerce, Educational Services, and Financial Services.

In e-Commerce, FPT runs several B2B2C (business-to-business-to-customer) services that provide online shopping sites and mobile applications for small and medium sellers. Transaction data from buyers (such as product browsing and purchasing activities) and product portfolios from sellers can be aggregated to provide more efficient buying and selling experiences. For example, statistical machine learning techniques can be applied to predict the organization and display of products that maximizes the chance of bringing useful information to the user and facilitates online purchases. One of the vital insights, especially for fashion-related products, is understanding how relevant a product is to the gender of the user. The PAKDD'15 Data Mining Competition addresses this particular problem. More details regarding the task and a description of the competition data can be found in the Task Description section.

If you have any questions, please post on the forum or email us at son@mimuw.edu.pl

Terms & Conditions
 
 

PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data has ended after over a month of intense rivalry!

The competition attracted 330 teams, of which 149 participated actively by submitting at least one solution to the leaderboard. The total number of submissions was nearly 3000. Of the active teams, 28 provided a brief report describing their approach.

Awards:

  1. The winner: Team FRDC
    Members: Ruiyu Fang (the leader), Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen
    Affiliation: Fujitsu Research and Development Center, Beijing, China
  2. The runner-up: Team newolfy
    Members: Yingju Xia (the leader), Shuangyong Song, Qingliang Miao, Zhongquang Zheng
    Affiliation: Fujitsu Research and Development Center, Beijing, China

Invitations to oral presentation at PAKDD-2015 Contest Workshop:

  • Team FRDC: Ruiyu Fang, Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen (PAKDD'15 presentation)
  • Team newolfy: Yingju Xia, Shuangyong Song, Qingliang Miao, Zhongquang Zheng (PAKDD'15 presentation)
  • Team ngocan211: Pham Ngoc An, FPT University, Hanoi, Viet Nam (PAKDD'15 presentation)
  • Team ws: Wojciech Świeboda, University of Warsaw, Poland (PAKDD'15 presentation)
  • Team sohrab: Mohammad Golam Sohrab, Makoto Miwa, Yutaka Sasaki, Toyota Technological Institute, Japan (PAKDD'15 presentation)
  • Team kimiyoung: Zhilin Yang, Yutao Zhang, Jie Tang, Tsinghua University, Beijing, China
  • Team ibayer: Immanuel Bayer, University of Konstanz, Germany (PAKDD'15 presentation)
  • Team gambi: Maria Brbic, Dragan Gamberger, Matej Mihelcic, Matija Piskorec, Tomislav Smuc, Rudjer Boskovic Institute, Zagreb, Croatia (PAKDD'15 presentation)

I would like to thank all participants for their hard work. Congratulations on your excellent results!

To stimulate future research on the topic of the competition, we have released all competition data (including the labels for test cases) in the data files folder.

  • The competition is open for all interested researchers, specialists and students. Only members of the Contest Organizing Committee cannot participate.
  • Participants may submit solutions as teams made up of one or more persons.
  • Each team needs to designate a leader responsible for communication with the Organizers. A single person can be a leader of only one team.
  • One person may belong to at most 3 teams.
  • Each team needs to be composed of a different set of persons.
  • The total number of submissions for any single team is limited to 100 solutions.
  • The winner of the competition is chosen on the basis of the final evaluation results. In case of a tie in the evaluation scores, the time of submission will be taken into account.
  • Each team is obliged to provide a short report describing their final solution. Reports must contain the name of the team, the names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Reports should not exceed 2000 words and should be submitted in PDF format using our submission system by May 4, 2015. Only submissions made by teams that provided a report will qualify for the final evaluation.
  • By enrolling in this competition you grant the organizers the right to process your submissions for the purpose of evaluation and post-competition research.

In case of questions related to the competition please contact us via email: webmaster@knowledgepit.fedcsis.org or through the competition forum.


Data format: The data for participants were divided into separate training and test sets, trainingData.csv and testData.csv, respectively. Each of these files contains 15,000 records which correspond to product viewing logs. A single log is composed of four columns, separated by commas. The first column is a session ID. The second and third columns are the session start time and session end time, respectively. The last column contains a list of the product IDs viewed during the session (the order of viewing is preserved). Consecutive product IDs are separated by semicolons. A trainingLabels.csv file is also available; it contains labels identifying the true gender of the users whose sessions are described in the training data set.
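The four-column layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name parse_sessions and the sample row (including its timestamp format and IDs) are made up for illustration, and the timestamps are kept as plain strings because their exact format is not specified here.

```python
import csv
import io


def parse_sessions(lines):
    """Parse session logs: session ID, start time, end time, product list."""
    sessions = []
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        session_id, start_time, end_time, product_field = row[:4]
        # Product IDs are separated by semicolons; viewing order is preserved.
        products = [p for p in product_field.split(";") if p]
        sessions.append({
            "session_id": session_id,
            "start_time": start_time,  # kept as a string (format assumed unknown)
            "end_time": end_time,
            "products": products,
        })
    return sessions


# Hypothetical sample row, not taken from the real competition files.
sample = io.StringIO(
    "s1,2014-11-14 00:02:14,2014-11-14 00:02:20,A1/B1/C1/D1/;A1/B2/C3/D4/\n"
)
sessions = parse_sessions(sample)
print(sessions[0]["products"])
```

For the real files, an open file handle for trainingData.csv or testData.csv could be passed in place of the in-memory sample.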

Since the distribution of unique product IDs in the data is very sparse, the IDs contain additional information about the product category hierarchy. Each product ID can be decomposed into four different IDs separated by slashes. The IDs starting with the letter ‘A’ are the most general categories and those starting with ‘D’ correspond to individual products. The IDs starting with ‘B’ and ‘C’ are associated with subcategories and sub-subcategories, respectively.
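As an illustration of this decomposition, the sketch below splits one product ID into its four hierarchy levels. The helper name split_product_id and the example ID are hypothetical; the only facts taken from the description are the slash separator and the leading letters A to D.

```python
LEVELS = ("A", "B", "C", "D")  # category, subcategory, sub-subcategory, product


def split_product_id(product_id):
    """Map each hierarchy level letter to the matching part of a product ID."""
    parts = [p for p in product_id.strip().split("/") if p]
    by_level = {}
    for part in parts:
        if part[0] not in LEVELS:
            raise ValueError(f"unexpected ID component: {part!r}")
        by_level[part[0]] = part
    return by_level


# Hypothetical product ID, used only to show the shape of the result.
example = split_product_id("A00002/B00003/C00007/D00012/")
print(example["A"], example["D"])
```

Counting the coarser A, B and C components instead of the individual D components is one possible way to work around the sparsity mentioned above.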

Format of submissions: The participants of the competition are asked to predict the gender of users from the test data and send us their solutions using the submission system. Each solution should be sent in a single text file containing exactly 15,000 lines. The format of submitted files should follow the format of trainingLabels.csv. Each consecutive line should contain a single label identifying the gender of the user who generated the corresponding session log in the test set.
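A submission file of this shape could be produced as sketched below. The function name write_submission, the file name and the label strings "male" and "female" are assumptions for illustration; the actual label values should be copied from trainingLabels.csv.

```python
import os
import tempfile


def write_submission(labels, path, expected_lines=15000):
    """Write one label per line, refusing to write a file of the wrong length."""
    labels = list(labels)
    if len(labels) != expected_lines:
        raise ValueError(
            f"expected {expected_lines} labels, got {len(labels)}"
        )
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for label in labels:
            handle.write(f"{label}\n")


# Tiny demonstration with three hypothetical predictions instead of 15,000.
demo_path = os.path.join(tempfile.mkdtemp(), "submission.txt")
write_submission(["female", "male", "female"], demo_path, expected_lines=3)
with open(demo_path, encoding="utf-8") as handle:
    written = handle.read().splitlines()
print(written)
```

Checking the line count before writing guards against silently submitting a file that is misaligned with the test sessions.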

Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on approximately 20% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session at the PAKDD'15 conference (http://www.pakdd2015.jvn.edu.vn/).

Since the distribution of labels in the data is not balanced, solutions will be assessed using the balanced accuracy measure, which is defined as the average accuracy within the decision classes. Namely, for a vector of predictions preds and a vector of true gender labels genders, we define the balanced accuracy as:
\[ACC_{m}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = male\}|}{|\{j : genders_{j} = male\}|}\]
\[ACC_{f}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = female\}|}{|\{j : genders_{j} = female\}|}\]
\[BAC(preds, genders) = \left(ACC_{f}(preds, genders) + ACC_{m}(preds, genders)\right)/2\]
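The measure can be computed directly from its definition. The sketch below is an illustration rather than the organizers' evaluation code; the function name and the label strings "male" and "female" are assumptions.

```python
def balanced_accuracy(preds, genders, classes=("male", "female")):
    """Average of the per-class accuracies ACC_m and ACC_f."""
    if len(preds) != len(genders):
        raise ValueError("preds and genders must have the same length")
    per_class = []
    for cls in classes:
        total = sum(1 for g in genders if g == cls)
        if total == 0:
            raise ValueError(f"no true examples of class {cls!r}")
        correct = sum(1 for p, g in zip(preds, genders) if p == g == cls)
        per_class.append(correct / total)
    return sum(per_class) / len(per_class)


# Imbalanced toy example: four female sessions and one male session.
genders = ["female", "female", "female", "female", "male"]
preds = ["female"] * 5  # always predict the majority class
print(balanced_accuracy(preds, genders))  # 0.5
```

In this toy example plain accuracy would be 0.8, while balanced accuracy is 0.5 because ACC_f = 1 and ACC_m = 0. Always predicting the majority class therefore earns no advantage under this measure.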

Rank | Team Name | Is Report | | Preliminary Score | Final Score | Submissions
1 | ngocan211 | True | True | 0.8878 | 0.890432 | 2
2 | frdc | True | True | 0.8875 | 0.878928 | 2
3 | newolfy | True | True | 0.8874 | 0.877889 | 2
4 | ws | True | True | 0.8614 | 0.851119 | 2
5 | sohrab | True | True | 0.8546 | 0.851032 | 2
6 | kimiyoung | True | True | 0.8508 | 0.849046 | 2
7 | dymitrruta | True | True | 0.8474 | 0.847979 | 2
8 | ibayer | True | True | 0.8415 | 0.840673 | 2
9 | songshuangyong | True | True | 0.8534 | 0.830603 | 2
10 | stderr | True | True | 0.8170 | 0.811541 | 2
11 | amy | True | True | 0.8183 | 0.810673 | 2
12 | gambi | True | True | 0.8235 | 0.810181 | 2
13 | siyu | True | True | 0.8217 | 0.805283 | 2
14 | ahwangyuwei | True | True | 0.8111 | 0.801450 | 2
15 | kkurach | True | True | 0.8053 | 0.798821 | 2
16 | duongtranduc | True | True | 0.8100 | 0.797838 | 2
17 | hnt | True | True | 0.7949 | 0.797127 | 2
18 | vinhn | True | True | 0.7975 | 0.795656 | 2
19 | zhu_ark | True | True | 0.7948 | 0.791138 | 2
20 | mrw19138 | True | True | 0.7947 | 0.790483 | 2
21 | xiexingbo | True | True | 0.7894 | 0.783441 | 2
22 | pdoviet | True | True | 0.7894 | 0.783204 | 2
23 | jankralj | True | True | 0.7849 | 0.780010 | 2
24 | zhijin | True | True | 0.7787 | 0.772296 | 2
25 | ikttan | True | True | 0.7824 | 0.771237 | 2
26 | ds_tcy | True | True | 0.7688 | 0.770413 | 2
27 | thetuxedo | True | True | 0.7764 | 0.766886 | 2
28 | langochai | True | True | 0.7624 | 0.765252 | 2
29 | neozhangthe1 | False | True | 0.8530 | No report file found or report rejected. | 2
30 | linegroup | False | True | 0.8358 | No report file found or report rejected. | 2
31 | zm | False | True | 0.8216 | No report file found or report rejected. | 2
32 | antry | False | True | 0.8163 | No report file found or report rejected. | 2
33 | mamy | False | True | 0.8137 | No report file found or report rejected. | 2
34 | tund | False | True | 0.8136 | No report file found or report rejected. | 2
35 | ggvspp | False | True | 0.8135 | No report file found or report rejected. | 2
36 | khuongnd | False | True | 0.8130 | No report file found or report rejected. | 2
37 | rjo2909 | False | True | 0.8115 | No report file found or report rejected. | 2
38 | wyw | False | True | 0.8114 | No report file found or report rejected. | 2
39 | yanghaisong | False | True | 0.8104 | No report file found or report rejected. | 2
40 | thaidang | False | True | 0.8091 | No report file found or report rejected. | 2
41 | thaidt | False | True | 0.8085 | No report file found or report rejected. | 2
42 | ttd | False | True | 0.8084 | No report file found or report rejected. | 2
43 | orange | False | True | 0.8046 | No report file found or report rejected. | 2
44 | ziom | False | True | 0.8040 | No report file found or report rejected. | 2
45 | xspring | False | True | 0.8066 | No report file found or report rejected. | 2
46 | dirichlet.process | False | True | 0.8020 | No report file found or report rejected. | 2
47 | nhatuan | False | True | 0.8008 | No report file found or report rejected. | 2
48 | ramesh.krnt | False | True | 0.7995 | No report file found or report rejected. | 2
49 | ntienvu | False | True | 0.8013 | No report file found or report rejected. | 2
50 | cwhuang | False | True | 0.7980 | No report file found or report rejected. | 2
51 | fancyspeed | False | True | 0.7966 | No report file found or report rejected. | 2
52 | zhangzhongxia | False | True | 0.7992 | No report file found or report rejected. | 2
53 | derivedbydata | False | True | 0.7959 | No report file found or report rejected. | 2
54 | hnguyen | False | True | 0.7957 | No report file found or report rejected. | 2
55 | dat_phuoc | False | True | 0.7952 | No report file found or report rejected. | 2
56 | lihang00 | False | True | 0.7933 | No report file found or report rejected. | 2
57 | 0000 | False | True | 0.7930 | No report file found or report rejected. | 2
58 | yzan | False | True | 0.7920 | No report file found or report rejected. | 2
59 | wttool | False | True | 0.7919 | No report file found or report rejected. | 2
60 | sasnzy | False | True | 0.7901 | No report file found or report rejected. | 2
61 | eric | False | True | 0.7887 | No report file found or report rejected. | 2
62 | lab213 | False | True | 0.7868 | No report file found or report rejected. | 2
63 | zhili | False | True | 0.7858 | No report file found or report rejected. | 2
64 | huuphuc2609 | False | True | 0.7856 | No report file found or report rejected. | 2
65 | muye5 | False | True | 0.7837 | No report file found or report rejected. | 2
66 | whatif | False | True | 0.7879 | No report file found or report rejected. | 2
67 | pth1993 | False | True | 0.7816 | No report file found or report rejected. | 2
68 | sslim | False | True | 0.7813 | No report file found or report rejected. | 2
69 | iran-amin | False | True | 0.7801 | No report file found or report rejected. | 2
70 | pikachust8811 | False | True | 0.7801 | No report file found or report rejected. | 2
71 | binh | False | True | 0.7794 | No report file found or report rejected. | 2
72 | huong2 | False | True | 0.7790 | No report file found or report rejected. | 2
73 | saj | False | True | 0.7790 | No report file found or report rejected. | 2
74 | etw | False | True | 0.7788 | No report file found or report rejected. | 2
75 | bati | False | True | 0.7787 | No report file found or report rejected. | 2
76 | ketanatnmims | False | True | 0.7824 | No report file found or report rejected. | 2
77 | p2trieu | False | True | 0.7778 | No report file found or report rejected. | 2
78 | yin520liang | False | True | 0.7773 | No report file found or report rejected. | 2
79 | tchang | False | True | 0.7773 | No report file found or report rejected. | 2
80 | baseline_solution | False | True | 0.7755 | No report file found or report rejected. | 2
81 | neurons | False | True | 0.7751 | No report file found or report rejected. | 2
82 | vutm | False | True | 0.7743 | No report file found or report rejected. | 2
83 | spiritoflinz | False | True | 0.8094 | No report file found or report rejected. | 2
84 | qjt | False | True | 0.7741 | No report file found or report rejected. | 2
85 | abhgh | False | True | 0.7732 | No report file found or report rejected. | 2
86 | wst_casd | False | True | 0.7725 | No report file found or report rejected. | 2
87 | h3p | False | True | 0.7757 | No report file found or report rejected. | 2
88 | billguess | False | True | 0.7951 | No report file found or report rejected. | 2
89 | xuxiaofeng | False | True | 0.7710 | No report file found or report rejected. | 2
90 | violet_zct | False | True | 0.7765 | No report file found or report rejected. | 2
91 | marcb | False | True | 0.7693 | No report file found or report rejected. | 2
92 | nightfury | False | True | 0.7853 | No report file found or report rejected. | 2
93 | d10207305 | False | True | 0.7688 | No report file found or report rejected. | 2
94 | fajri91 | False | True | 0.7683 | No report file found or report rejected. | 2
95 | mautoan11 | False | True | 0.7710 | No report file found or report rejected. | 2
96 | jyclin | False | True | 0.7672 | No report file found or report rejected. | 2
97 | tidom | False | True | 0.7657 | No report file found or report rejected. | 2
98 | little_number | False | True | 0.7707 | No report file found or report rejected. | 2
99 | vpodpecan | False | True | 0.7644 | No report file found or report rejected. | 2
100 | tuandinh | False | True | 0.8049 | No report file found or report rejected. | 2
101 | gentaiscool | False | True | 0.7774 | No report file found or report rejected. | 2
102 | bruincui | False | True | 0.7629 | No report file found or report rejected. | 2
103 | starryc | False | True | 0.7603 | No report file found or report rejected. | 2
104 | jackson13 | False | True | 0.7602 | No report file found or report rejected. | 2
105 | hamylinh | False | True | 0.7591 | No report file found or report rejected. | 2
106 | nampham | False | True | 0.7573 | No report file found or report rejected. | 2
107 | hoangphan | False | True | 0.7653 | No report file found or report rejected. | 2
108 | helen | False | True | 0.7662 | No report file found or report rejected. | 2
109 | kuanhoong | False | True | 0.7524 | No report file found or report rejected. | 2
110 | seeyouhere | False | True | 0.7487 | No report file found or report rejected. | 2
111 | pin | False | True | 0.7455 | No report file found or report rejected. | 2
112 | gmustafa | False | True | 0.7545 | No report file found or report rejected. | 2
113 | mayankkejriwal | False | True | 0.7370 | No report file found or report rejected. | 2
114 | mrboring | False | True | 0.7365 | No report file found or report rejected. | 2
115 | lingdian618 | False | True | 0.7328 | No report file found or report rejected. | 2
116 | rembern | False | True | 0.7614 | No report file found or report rejected. | 2
117 | strnam | False | True | 0.7313 | No report file found or report rejected. | 2
118 | zagorecki | False | True | 0.7844 | No report file found or report rejected. | 2
119 | sudarsun | False | True | 0.7175 | No report file found or report rejected. | 2
120 | zgbdsg | False | True | 0.7846 | No report file found or report rejected. | 2
121 | sg.qq | False | True | 0.7090 | No report file found or report rejected. | 2
122 | sdx0112 | False | True | 0.7089 | No report file found or report rejected. | 2
123 | test | False | True | 0.7044 | No report file found or report rejected. | 2
124 | blah | False | True | 0.7716 | No report file found or report rejected. | 2
125 | wilson891226 | False | True | 0.7897 | No report file found or report rejected. | 2
126 | exploit | False | True | 0.6976 | No report file found or report rejected. | 2
127 | janezkranjc | False | True | 0.6948 | No report file found or report rejected. | 2
128 | chaitu516 | False | True | 0.6868 | No report file found or report rejected. | 2
129 | fengqi | False | True | 0.7112 | No report file found or report rejected. | 2
130 | deagle9413 | False | True | 0.6796 | No report file found or report rejected. | 2
131 | khanhlh | False | True | 0.6660 | No report file found or report rejected. | 2
132 | sathik | False | True | 0.6620 | No report file found or report rejected. | 2
133 | wolfinlove | False | True | 0.6615 | No report file found or report rejected. | 2
134 | huongtt | False | True | 0.6691 | No report file found or report rejected. | 2
135 | f-ken1010 | False | True | 0.6555 | No report file found or report rejected. | 2
136 | hnt1 | False | True | 0.6796 | No report file found or report rejected. | 2
137 | nghiemduc | False | True | 0.7085 | No report file found or report rejected. | 2
138 | statgeek | False | True | 0.7500 | No report file found or report rejected. | 2
139 | franksnail | False | True | 0.6159 | No report file found or report rejected. | 2
140 | meerkat | False | True | 0.5641 | No report file found or report rejected. | 2
141 | oahcil | False | True | 0.5633 | No report file found or report rejected. | 2
142 | huwenp | False | True | 0.5577 | No report file found or report rejected. | 2
143 | linjie_zhu | False | True | 0.5565 | No report file found or report rejected. | 2
144 | abhijit | False | True | 0.5473 | No report file found or report rejected. | 2
145 | ssssqd | False | True | 0.5085 | No report file found or report rejected. | 2
146 | kloud | False | True | 0.5058 | No report file found or report rejected. | 2
147 | yupbank | False | True | 0.5006 | No report file found or report rejected. | 2
148 | thnhu | False | True | 0.7996 | No report file found or report rejected. | 2
149 | customs | False | True | 0.6881 | No report file found or report rejected. | 2
150 | cllab | False | True | 0.7930 | No report file found or report rejected. | 2
151 | sink | False | True | 0.5969 | No report file found or report rejected. | 2
152 | mathimohanraj | False | True | 0.0000 | No report file found or report rejected. | 2
153 | clapika2010 | False | True | 0.7870 | No report file found or report rejected. | 2
154 | pg7799 | False | True | 0.7766 | No report file found or report rejected. | 2

  • March 23, 2015: start of the competition, data sets become available,
  • May 1, 2015: deadline for submitting the predictions,
  • May 3, 2015: deadline for sending the reports, end of the challenge,
  • May 19, 2015: beginning of the PAKDD'15 conference, official announcement of the winners.

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes sponsored by FPT Group:

  • First Prize: Apple MacBook Air + one free PAKDD'15 conference registration,
  • Second Prize: new FPT smartphone (to be defined) + one free PAKDD'15 conference registration.

The award ceremony will take place during the PAKDD'15 conference (May 19 - 22, Ho Chi Minh City, Vietnam).

Hung Son Nguyen, University of Warsaw

Tran The Trung, FPT University

Tu Bao Ho, Japan Advanced Institute of Science and Technology

Duc Dung Nguyen, Vietnam Academy of Science and Technology

Andrzej Janusz, University of Warsaw

Discussion | Author | Replies | Last post
whether top10 reports are available sometime? | muye5 | 2 | by Andrzej, Thursday, May 21, 2015, 09:42:59
Complete test data: Gender Prediction Based on E-commerce Data | Ramesh | 1 | by Andrzej, Thursday, May 21, 2015, 09:52:19
Postpone the deadline | Sajjad | 1 | by Sajjad, Saturday, May 02, 2015, 00:53:18
The last week of PAKDD'15 Data Mining Competition | Andrzej | 0 | by Andrzej, Monday, April 27, 2015, 13:06:02
Training data VS Test data | Amin | 1 | by Amin, Sunday, April 26, 2015, 01:05:08
New competition at Knowledge Pit: IJCRS’15 Data Challenge | Andrzej | 0 | by Andrzej, Monday, April 13, 2015, 11:22:57
A problem with KnowledgePit server | Andrzej | 0 | by Andrzej, Sunday, April 12, 2015, 14:58:27
Final submission | Tu | 2 | by Andrzej, Wednesday, April 01, 2015, 09:17:09
small inconsistency in test data | Vid | 2 | by Andrzej, Monday, March 30, 2015, 10:50:10
Conference participation | Eftim | 1 | by Andrzej, Monday, March 30, 2015, 14:03:01
AAIA'15 Data Mining Competition: Activity Recognition Based on Body Sensor Networks | Andrzej | 0 | by Andrzej, Friday, March 27, 2015, 17:18:35
Multiple Participants in single team? | Abhay | 2 | by Abhay, Monday, March 30, 2015, 19:36:20