
PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data

PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data is our first competition organized within the framework of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). We challenge participants to devise effective algorithms for recognizing the gender of e-store clients. The data set for the competition was provided by FPT Group, which is also the main sponsor of the awards.

Overview

FPT is a leading information and communication technology enterprise in Vietnam. As of 2014, its revenue was about 1.65 billion US dollars and it provided more than 22 thousand full-time jobs. FPT has operations in 17 countries: Vietnam, Laos, Cambodia, America, Japan, Singapore, Germany, Myanmar, France, Malaysia, Australia, Thailand, United Kingdom, Philippines, Kuwait, Bangladesh and Indonesia. Its main businesses are Software Development, System Integration, Information Technology Services, Distribution and Manufacturing of Information Technology Products and Retail, Internet Services Provision and Data Center Services, Online News and Advertising, e-Commerce, Educational Services, and Financing Services.

In e-Commerce, FPT runs several B2B2C (business-to-business-to-customer) services that provide online shopping sites and mobile applications for small and medium sellers. Transaction data from buyers, such as product browsing and purchasing activities, and product portfolios from sellers can be aggregated to provide more efficient buying and selling experiences. For example, statistical machine learning techniques can be applied to predict the organization and display of products that maximizes the chance of bringing useful information to the user and facilitates online purchases. One of the vital insights, especially for fashion-related products, is understanding the relevance of a product to the gender of the user. The PAKDD'15 Data Mining Competition addresses this particular problem. More details regarding the task and a description of the competition data can be found in the Task Description section.

If you have any questions, please post on the forum or email us: son@mimuw.edu.pl

Terms & Conditions
 
 

PAKDD'15 Data Mining Competition: Gender Prediction Based on E-commerce Data has ended after over a month of intense rivalry!

The competition attracted 330 teams, of which 149 participated actively by submitting at least one solution to the leaderboard. The total number of submissions was nearly 3000. Of the active teams, 28 provided us with a brief report describing their approach.

Awards:

  1. The winner: Team FRDC
    Members: Ruiyu Fang (the leader), Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen
    Affiliation: Fujitsu Research and Development Center, Beijing, China
  2. The runner up: Team newolfy
    Members: Yingju Xia (the leader), Shuangyong Song, Qingliang Miao, Zhongquang Zheng
    Affiliation: Fujitsu Research and Development Center, Beijing, China

Invitations to oral presentation at PAKDD-2015 Contest Workshop:

  • Team FRDC: Ruiyu Fang, Qingliang Miao, Cuiqin Hou, Yao Meng, Lu Fang, Dajun Chen (PAKDD'15 presentation)
  • Team newolfy: Yingju Xia, Shuangyong Song, Qingliang Miao, Zhongquang Zheng (PAKDD'15 presentation)
  • Team ngocan211: Pham Ngoc An, FPT University, Hanoi, Viet Nam (PAKDD'15 presentation)
  • Team ws: Wojciech Świeboda, University of Warsaw, Poland (PAKDD'15 presentation)
  • Team sohrab: Mohammad Golam Sohrab, Makoto Miwa, Yutaka Sasaki, Toyota Technological Institute, Japan (PAKDD'15 presentation)
  • Team kimiyoung: Zhilin Yang, Yutao Zhang, Jie Tang, Tsinghua University, Beijing, China
  • Team ibayer: Immanuel Bayer, University of Konstanz, Germany (PAKDD'15 presentation)
  • Team gambi: Maria Brbic, Dragan Gamberger, Matej Mihelcic, Matija Piskorec, Tomislav Smuc, Rudjer Boskovic Institute, Zagreb, Croatia (PAKDD'15 presentation)

I would like to thank all participants for their hard work. Congratulations on your excellent results!

To stimulate future research on the topic of the competition, we have released all competition data (including the labels for test cases) in the data files folder.

  • The competition is open to all interested researchers, specialists and students. Only members of the Contest Organizing Committee cannot participate.
  • Participants may submit solutions as teams made up of one or more persons.
  • Each team needs to designate a leader responsible for communication with the Organizers. A single person can be the leader of only one team.
  • One person may be a member of at most 3 teams.
  • Each team needs to be composed of a different set of persons.
  • The total number of submissions for any single team is limited to 100 solutions.
  • The winner of the competition is chosen on the basis of the final evaluation results. In case of ties in the evaluation scores, the time of submission will be taken into account.
  • Each team is obliged to provide a short report describing their final solution. Reports must contain information such as the name of the team, the names of all team members, the last preliminary evaluation score and a brief overview of the approach used. Their length should not exceed 2000 words and they should be submitted in PDF format using our submission system by May 4, 2015. Only submissions made by teams that provided a report will qualify for the final evaluation.
  • By enrolling in this competition you grant the organizers the right to process your submissions for the purpose of evaluation and post-competition research.

In case of questions related to the competition please contact us via email: webmaster@knowledgepit.fedcsis.org or through the competition forum.


Data format: The data for participants were divided into separate training and test sets, trainingData.csv and testData.csv, respectively. Each of these files contains 15,000 records, which correspond to product viewing logs. A single log is composed of four columns, separated by commas. The first one is a session ID. The second and the third columns correspond to the session start time and session end time, respectively. The last column contains a list of the product IDs that were viewed during the session (the order of viewing is preserved). Consecutive product IDs are separated by semicolons. A trainingLabels.csv file is also available; it contains labels identifying the true gender of the users whose sessions are described in the training data set.

Since the distribution of unique product IDs in the data is very sparse, the IDs contain additional information regarding the product category hierarchy. Each product ID can be decomposed into four different IDs, which are separated by slashes. The IDs starting with the letter ‘A’ are the most general categories and those starting with ‘D’ correspond to individual products. The IDs that start with ‘B’ and ‘C’ are associated with subcategories and sub-subcategories, respectively.
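The record layout described above can be read with a few lines of Python. This is only a minimal sketch: the sample row, its IDs, and the timestamp format are made-up assumptions for illustration, and the function names parse_product and parse_sessions are hypothetical. Timestamps are kept as strings because their exact format is not specified here.

```python
import csv
from io import StringIO


def parse_product(product_id):
    """Split one product ID into its hierarchy levels, keyed by the leading letter."""
    levels = {}
    for part in product_id.split("/"):
        part = part.strip()
        if part:  # tolerate an empty piece, e.g. after a trailing slash
            levels[part[0]] = part
    return levels


def parse_sessions(handle):
    """Read session logs: session ID, start time, end time, viewed product IDs."""
    sessions = []
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        session_id, start, end, viewed = row
        # Product IDs are separated by semicolons; viewing order is preserved.
        products = [parse_product(p) for p in viewed.split(";") if p.strip()]
        sessions.append(
            {"id": session_id, "start": start, "end": end, "products": products}
        )
    return sessions


# Hypothetical sample row; the real files may differ in detail.
sample = StringIO(
    "u1,2014-11-14 00:02:14,2014-11-14 00:02:20,"
    "A00001/B00001/C00001/D00001/;A00001/B00001/C00001/D00002/\n"
)
sessions = parse_sessions(sample)
print(sessions[0]["products"][1]["D"])  # D00002
```

Keeping the four levels separate makes it easy to build features from the denser 'A', 'B' and 'C' categories when individual 'D' product IDs are too rare to be informative.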

Format of submissions: The participants of the competition are asked to predict the gender of users from the test data and send us their solutions using the submission system. Each solution should be sent as a single text file containing exactly 15,000 lines. The format of submitted files should follow the format of trainingLabels.csv. Each consecutive line of this file should contain a single label identifying the gender of the user who generated the corresponding session log in the test set.
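A submission file of this shape could be produced as sketched below. The label strings "male" and "female" are an assumption taken from the evaluation formulas, so check them against trainingLabels.csv; write_submission is a hypothetical helper name, and the small demo uses 3 lines instead of 15,000 only to keep the example short.

```python
import os
import tempfile

# Assumed label values; verify against trainingLabels.csv before submitting.
ALLOWED_LABELS = {"male", "female"}


def write_submission(labels, path, expected_lines=15000):
    """Write one label per line, in the same order as the test sessions."""
    labels = list(labels)
    if len(labels) != expected_lines:
        raise ValueError(
            f"expected {expected_lines} labels, got {len(labels)}"
        )
    unknown = set(labels) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    with open(path, "w", newline="\n") as out:
        for label in labels:
            out.write(label + "\n")


# Tiny demonstration with 3 predictions instead of 15,000.
demo_path = os.path.join(tempfile.mkdtemp(), "submission.txt")
write_submission(["male", "female", "female"], demo_path, expected_lines=3)
with open(demo_path) as f:
    written = f.read().splitlines()
```

Validating the line count before writing catches the most likely formatting mistake, a prediction vector that is misaligned with the test set.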

Evaluation of results: The submitted solutions will be evaluated on-line and the preliminary results will be published on the competition leaderboard. The preliminary score will be computed on approximately 20% of the test data. The final evaluation will be performed after completion of the competition using the remaining part of the test data. Those results will also be published on-line. It is important to note that only teams which submit a short report describing their approach before the end of the contest will qualify for the final evaluation. The winning teams will be officially announced during a special session at the PAKDD'15 conference (http://www.pakdd2015.jvn.edu.vn/).

Since the distribution of labels in the data is not balanced, solutions will be assessed using the balanced accuracy measure, which is defined as the average accuracy within the decision classes. Namely, for a vector of predictions preds and a vector of true gender labels genders, we define the balanced accuracy as:
\[ACC_{m}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = male\}|}{|\{j : genders_{j} = male\}|}\]
\[ACC_{f}(preds, genders) = \frac{|\{j : preds_{j} = genders_{j} = female\}|}{|\{j : genders_{j} = female\}|}\]
\[BAC(preds, genders) = \left(ACC_{f}(preds, genders) + ACC_{m}(preds, genders)\right)/2\]
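The measure can be computed directly from its definition. The following is a minimal sketch, not the organizers' evaluation code; the function names and the toy label vectors are illustrative assumptions.

```python
def class_accuracy(preds, genders, label):
    """Fraction of sessions with true class `label` that were predicted as `label`."""
    total = sum(1 for g in genders if g == label)
    if total == 0:
        raise ValueError(f"no true examples of class {label!r}")
    hits = sum(1 for p, g in zip(preds, genders) if p == g == label)
    return hits / total


def balanced_accuracy(preds, genders):
    """Average of the per-class accuracies for the two gender classes."""
    if len(preds) != len(genders):
        raise ValueError("preds and genders must have the same length")
    acc_m = class_accuracy(preds, genders, "male")
    acc_f = class_accuracy(preds, genders, "female")
    return (acc_f + acc_m) / 2


# Toy example: always predicting the majority class.
genders = ["female", "female", "female", "female", "male"]
preds = ["female"] * 5
bac = balanced_accuracy(preds, genders)
print(bac)  # 0.5
```

In this toy example plain accuracy would be 0.8, while the balanced accuracy is only 0.5, which shows why the measure discourages always predicting the majority class.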

Rank | Team Name | Report Submitted | Preliminary Score | Final Score | Submissions
1 | ngocan211 | True | 0.8878 | 0.890432 | 2
2 | frdc | True | 0.8875 | 0.878928 | 2
3 | newolfy | True | 0.8874 | 0.877889 | 2
4 | ws | True | 0.8614 | 0.851119 | 2
5 | sohrab | True | 0.8546 | 0.851032 | 2
6 | kimiyoung | True | 0.8508 | 0.849046 | 2
7 | dymitrruta | True | 0.8474 | 0.847979 | 2
8 | ibayer | True | 0.8415 | 0.840673 | 2
9 | songshuangyong | True | 0.8534 | 0.830603 | 2
10 | stderr | True | 0.8170 | 0.811541 | 2
11 | amy | True | 0.8183 | 0.810673 | 2
12 | gambi | True | 0.8235 | 0.810181 | 2
13 | siyu | True | 0.8217 | 0.805283 | 2
14 | ahwangyuwei | True | 0.8111 | 0.801450 | 2
15 | kkurach | True | 0.8053 | 0.798821 | 2
16 | duongtranduc | True | 0.8100 | 0.797838 | 2
17 | hnt | True | 0.7949 | 0.797127 | 2
18 | vinhn | True | 0.7975 | 0.795656 | 2
19 | zhu_ark | True | 0.7948 | 0.791138 | 2
20 | mrw19138 | True | 0.7947 | 0.790483 | 2
21 | xiexingbo | True | 0.7894 | 0.783441 | 2
22 | pdoviet | True | 0.7894 | 0.783204 | 2
23 | jankralj | True | 0.7849 | 0.780010 | 2
24 | zhijin | True | 0.7787 | 0.772296 | 2
25 | ikttan | True | 0.7824 | 0.771237 | 2
26 | ds_tcy | True | 0.7688 | 0.770413 | 2
27 | thetuxedo | True | 0.7764 | 0.766886 | 2
28 | langochai | True | 0.7624 | 0.765252 | 2
29 | neozhangthe1 | False | 0.8530 | No report file found or report rejected. | 2
30 | linegroup | False | 0.8358 | No report file found or report rejected. | 2
31 | zm | False | 0.8216 | No report file found or report rejected. | 2
32 | antry | False | 0.8163 | No report file found or report rejected. | 2
33 | mamy | False | 0.8137 | No report file found or report rejected. | 2
34 | tund | False | 0.8136 | No report file found or report rejected. | 2
35 | ggvspp | False | 0.8135 | No report file found or report rejected. | 2
36 | khuongnd | False | 0.8130 | No report file found or report rejected. | 2
37 | rjo2909 | False | 0.8115 | No report file found or report rejected. | 2
38 | wyw | False | 0.8114 | No report file found or report rejected. | 2
39 | yanghaisong | False | 0.8104 | No report file found or report rejected. | 2
40 | thaidang | False | 0.8091 | No report file found or report rejected. | 2
41 | thaidt | False | 0.8085 | No report file found or report rejected. | 2
42 | ttd | False | 0.8084 | No report file found or report rejected. | 2
43 | orange | False | 0.8046 | No report file found or report rejected. | 2
44 | ziom | False | 0.8040 | No report file found or report rejected. | 2
45 | xspring | False | 0.8066 | No report file found or report rejected. | 2
46 | dirichlet.process | False | 0.8020 | No report file found or report rejected. | 2
47 | nhatuan | False | 0.8008 | No report file found or report rejected. | 2
48 | ramesh.krnt | False | 0.7995 | No report file found or report rejected. | 2
49 | ntienvu | False | 0.8013 | No report file found or report rejected. | 2
50 | cwhuang | False | 0.7980 | No report file found or report rejected. | 2
51 | fancyspeed | False | 0.7966 | No report file found or report rejected. | 2
52 | zhangzhongxia | False | 0.7992 | No report file found or report rejected. | 2
53 | derivedbydata | False | 0.7959 | No report file found or report rejected. | 2
54 | hnguyen | False | 0.7957 | No report file found or report rejected. | 2
55 | dat_phuoc | False | 0.7952 | No report file found or report rejected. | 2
56 | lihang00 | False | 0.7933 | No report file found or report rejected. | 2
57 | 0000 | False | 0.7930 | No report file found or report rejected. | 2
58 | yzan | False | 0.7920 | No report file found or report rejected. | 2
59 | wttool | False | 0.7919 | No report file found or report rejected. | 2
60 | sasnzy | False | 0.7901 | No report file found or report rejected. | 2
61 | eric | False | 0.7887 | No report file found or report rejected. | 2
62 | lab213 | False | 0.7868 | No report file found or report rejected. | 2
63 | zhili | False | 0.7858 | No report file found or report rejected. | 2
64 | huuphuc2609 | False | 0.7856 | No report file found or report rejected. | 2
65 | muye5 | False | 0.7837 | No report file found or report rejected. | 2
66 | whatif | False | 0.7879 | No report file found or report rejected. | 2
67 | pth1993 | False | 0.7816 | No report file found or report rejected. | 2
68 | sslim | False | 0.7813 | No report file found or report rejected. | 2
69 | iran-amin | False | 0.7801 | No report file found or report rejected. | 2
70 | pikachust8811 | False | 0.7801 | No report file found or report rejected. | 2
71 | binh | False | 0.7794 | No report file found or report rejected. | 2
72 | huong2 | False | 0.7790 | No report file found or report rejected. | 2
73 | saj | False | 0.7790 | No report file found or report rejected. | 2
74 | etw | False | 0.7788 | No report file found or report rejected. | 2
75 | bati | False | 0.7787 | No report file found or report rejected. | 2
76 | ketanatnmims | False | 0.7824 | No report file found or report rejected. | 2
77 | p2trieu | False | 0.7778 | No report file found or report rejected. | 2
78 | yin520liang | False | 0.7773 | No report file found or report rejected. | 2
79 | tchang | False | 0.7773 | No report file found or report rejected. | 2
80 | baseline_solution | False | 0.7755 | No report file found or report rejected. | 2
81 | neurons | False | 0.7751 | No report file found or report rejected. | 2
82 | vutm | False | 0.7743 | No report file found or report rejected. | 2
83 | spiritoflinz | False | 0.8094 | No report file found or report rejected. | 2
84 | qjt | False | 0.7741 | No report file found or report rejected. | 2
85 | abhgh | False | 0.7732 | No report file found or report rejected. | 2
86 | wst_casd | False | 0.7725 | No report file found or report rejected. | 2
87 | h3p | False | 0.7757 | No report file found or report rejected. | 2
88 | billguess | False | 0.7951 | No report file found or report rejected. | 2
89 | xuxiaofeng | False | 0.7710 | No report file found or report rejected. | 2
90 | violet_zct | False | 0.7765 | No report file found or report rejected. | 2
91 | marcb | False | 0.7693 | No report file found or report rejected. | 2
92 | nightfury | False | 0.7853 | No report file found or report rejected. | 2
93 | d10207305 | False | 0.7688 | No report file found or report rejected. | 2
94 | fajri91 | False | 0.7683 | No report file found or report rejected. | 2
95 | mautoan11 | False | 0.7710 | No report file found or report rejected. | 2
96 | jyclin | False | 0.7672 | No report file found or report rejected. | 2
97 | tidom | False | 0.7657 | No report file found or report rejected. | 2
98 | little_number | False | 0.7707 | No report file found or report rejected. | 2
99 | vpodpecan | False | 0.7644 | No report file found or report rejected. | 2
100 | tuandinh | False | 0.8049 | No report file found or report rejected. | 2
101 | gentaiscool | False | 0.7774 | No report file found or report rejected. | 2
102 | bruincui | False | 0.7629 | No report file found or report rejected. | 2
103 | starryc | False | 0.7603 | No report file found or report rejected. | 2
104 | jackson13 | False | 0.7602 | No report file found or report rejected. | 2
105 | hamylinh | False | 0.7591 | No report file found or report rejected. | 2
106 | nampham | False | 0.7573 | No report file found or report rejected. | 2
107 | hoangphan | False | 0.7653 | No report file found or report rejected. | 2
108 | helen | False | 0.7662 | No report file found or report rejected. | 2
109 | kuanhoong | False | 0.7524 | No report file found or report rejected. | 2
110 | seeyouhere | False | 0.7487 | No report file found or report rejected. | 2
111 | pin | False | 0.7455 | No report file found or report rejected. | 2
112 | gmustafa | False | 0.7545 | No report file found or report rejected. | 2
113 | mayankkejriwal | False | 0.7370 | No report file found or report rejected. | 2
114 | mrboring | False | 0.7365 | No report file found or report rejected. | 2
115 | lingdian618 | False | 0.7328 | No report file found or report rejected. | 2
116 | rembern | False | 0.7614 | No report file found or report rejected. | 2
117 | strnam | False | 0.7313 | No report file found or report rejected. | 2
118 | zagorecki | False | 0.7844 | No report file found or report rejected. | 2
119 | sudarsun | False | 0.7175 | No report file found or report rejected. | 2
120 | zgbdsg | False | 0.7846 | No report file found or report rejected. | 2
121 | sg.qq | False | 0.7090 | No report file found or report rejected. | 2
122 | sdx0112 | False | 0.7089 | No report file found or report rejected. | 2
123 | test | False | 0.7044 | No report file found or report rejected. | 2
124 | blah | False | 0.7716 | No report file found or report rejected. | 2
125 | wilson891226 | False | 0.7897 | No report file found or report rejected. | 2
126 | exploit | False | 0.6976 | No report file found or report rejected. | 2
127 | janezkranjc | False | 0.6948 | No report file found or report rejected. | 2
128 | chaitu516 | False | 0.6868 | No report file found or report rejected. | 2
129 | fengqi | False | 0.7112 | No report file found or report rejected. | 2
130 | deagle9413 | False | 0.6796 | No report file found or report rejected. | 2
131 | khanhlh | False | 0.6660 | No report file found or report rejected. | 2
132 | sathik | False | 0.6620 | No report file found or report rejected. | 2
133 | wolfinlove | False | 0.6615 | No report file found or report rejected. | 2
134 | huongtt | False | 0.6691 | No report file found or report rejected. | 2
135 | f-ken1010 | False | 0.6555 | No report file found or report rejected. | 2
136 | hnt1 | False | 0.6796 | No report file found or report rejected. | 2
137 | nghiemduc | False | 0.7085 | No report file found or report rejected. | 2
138 | statgeek | False | 0.7500 | No report file found or report rejected. | 2
139 | franksnail | False | 0.6159 | No report file found or report rejected. | 2
140 | meerkat | False | 0.5641 | No report file found or report rejected. | 2
141 | oahcil | False | 0.5633 | No report file found or report rejected. | 2
142 | huwenp | False | 0.5577 | No report file found or report rejected. | 2
143 | linjie_zhu | False | 0.5565 | No report file found or report rejected. | 2
144 | abhijit | False | 0.5473 | No report file found or report rejected. | 2
145 | ssssqd | False | 0.5085 | No report file found or report rejected. | 2
146 | kloud | False | 0.5058 | No report file found or report rejected. | 2
147 | yupbank | False | 0.5006 | No report file found or report rejected. | 2
148 | thnhu | False | 0.7996 | No report file found or report rejected. | 2
149 | customs | False | 0.6881 | No report file found or report rejected. | 2
150 | cllab | False | 0.7930 | No report file found or report rejected. | 2
151 | sink | False | 0.5969 | No report file found or report rejected. | 2
152 | mathimohanraj | False | 0.0000 | No report file found or report rejected. | 2
153 | clapika2010 | False | 0.7870 | No report file found or report rejected. | 2
154 | pg7799 | False | 0.7766 | No report file found or report rejected. | 2

  • March 23, 2015: start of the competition, data sets become available,
  • May 1, 2015: deadline for submitting the predictions,
  • May 3, 2015: deadline for sending the reports, end of the challenge,
  • May 19, 2015: beginning of the PAKDD'15 conference, official announcement of the winners.

Authors of the top-ranked solutions (based on the final evaluation scores) will be awarded prizes sponsored by FPT Group:

  • First Prize: Apple MacBook Air + one free PAKDD'15 conference registration,
  • Second Prize: new FPT smartphone (to be defined) + one free PAKDD'15 conference registration.

The award ceremony will take place during the PAKDD'15 conference (May 19 - 22, Ho Chi Minh City, Vietnam).

Hung Son Nguyen, University of Warsaw

Tran The Trung, FPT University

Tu Bao Ho, Japan Advanced Institute of Science and Technology

Duc Dung Nguyen, Vietnam Academy of Science and Technology

Andrzej Janusz, University of Warsaw

Discussion | Author | Replies | Last post
whether top10 reports are available sometime? | muye5 | 2 | by Andrzej, Thursday, May 21, 2015, 11:42:59
Complete test data: Gender Prediction Based on E-commerce Data | Ramesh | 1 | by Andrzej, Thursday, May 21, 2015, 11:52:19
Postpone the deadline | Sajjad | 1 | by Sajjad, Saturday, May 02, 2015, 02:53:18
The last week of PAKDD'15 Data Mining Competition | Andrzej | 0 | by Andrzej, Monday, April 27, 2015, 15:06:02
Training data VS Test data | Amin | 1 | by Amin, Sunday, April 26, 2015, 03:05:08
New competition at Knowledge Pit: IJCRS’15 Data Challenge | Andrzej | 0 | by Andrzej, Monday, April 13, 2015, 13:22:57
A problem with KnowledgePit server | Andrzej | 0 | by Andrzej, Sunday, April 12, 2015, 16:58:27
Final submission | Tu | 2 | by Andrzej, Wednesday, April 01, 2015, 11:17:09
small inconsistency in test data | Vid | 2 | by Andrzej, Monday, March 30, 2015, 12:50:10
Conference participation | Eftim | 1 | by Andrzej, Monday, March 30, 2015, 16:03:01
AAIA'15 Data Mining Competition: Activity Recognition Based on Body Sensor Networks | Andrzej | 0 | by Andrzej, Friday, March 27, 2015, 18:18:35
Multiple Participants in single team? | Abhay | 2 | by Abhay, Monday, March 30, 2015, 21:36:20