gdmc v11 presentation
Post on 24-Jan-2018
2017 IEEE CIG
Game Data Mining Competition (GDMC) (https://cilab.sejong.ac.kr/gdmc2017/)
KyungJoong Kim, Dumim Yoon and Jihoon Jeon
(Cognition & Intelligence Lab, Sejong University)
Sung-il Yang and SangKwang Lee
(Electronics and Telecommunications Research Institute)
EunJo Lee and Yoonjae Jang
(NCSOFT)
Game Data Mining
• Understanding game players’ behaviors from data
• In particular, predicting players’ churn/retention or purchase behavior from game log data
• Few public datasets are available to researchers, which limits the growth of the field
Game Data Mining Competition
• Access to large game log data (about 100 GB) from Blade & Soul, a commercially successful MMORPG by NCSOFT, one of the biggest game companies in South Korea
• Predict game players’ churn (a binary classification problem) and survival time (a regression problem) from the massive game log data
http://www.bladeandsoul.com/en/
Competition Tracks
Track 1: Churn Prediction
In this track, participants will predict players’ churn or retention on the test datasets. The winner will be determined based on the average F1-Measure.
Track 2: Survival Analysis
In this track, participants will predict the survival time (the number of days) of game players on the test datasets. The winner will be determined based on the average Root Mean Squared Logarithmic Error (RMSLE).
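The two evaluation measures named above can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not the organizers' scoring script. Treating churn as the positive class (label 1) and using log(1 + x) in RMSLE are assumptions.

```python
import math

def f1_score(y_true, y_pred):
    """F1 for binary labels; 1 marks the positive class (assumed here to be churn)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; log1p keeps zero-day values finite."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Because RMSLE compares logarithms, an error of a few days weighs more for players with short survival times than for players with long ones.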
GDMC 2017 Homepage
• Important Dates
• Problem Description
• Tutorial (with R)
• Data Description
• Rules
https://cilab.sejong.ac.kr/gdmc2017/
GDMC 2017 Google Groups
https://groups.google.com/d/forum/gdmc2017
• Announcement
• Sample Log
• Log Schema
• Log Data Download
  • Training Data
  • Test Data without Label
• Question/Answer
[Chart: number of Google Groups members by month, March to August: 0, 76, 106, 206, 255, 264]
Test Server
http://web_cilab.sejong.ac.kr/gdmcServer/
• Test your predictions before the deadline
• 10% of the test data is used for this test server (not used in final rankings)
• For security reasons, submissions are limited to a maximum of 48 per day (30-minute waiting time after the last submission)
Problem Description
Prediction Targets
[Figure: axes Expense and Loyalty; labeled region: Light Users or Malicious Users (Bots)]
Prediction Targets
Predictions about 3 Weeks from Now
[Timeline: User Data (Two Months), then Three Weeks, then Churn/Retention]
Churn/Retention
• A long-term inactive state is treated as churn
• How many weeks for the churn decision?
  • Five weeks
• Retention: logged in to the game more than once during the five weeks
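The labeling rule above can be sketched as follows. The three-week gap and five-week window come from the slides. The exact window boundaries, the function name `label_player`, and reading "more than once" literally as at least two logins (`min_logins=2`) are assumptions made for illustration.

```python
from datetime import date, timedelta

GAP_WEEKS = 3      # from the slides: predictions about 3 weeks from now
WINDOW_WEEKS = 5   # from the slides: five weeks for the churn decision

def label_player(login_dates, observation_end, min_logins=2):
    """Return "retention" or "churn" for one player.

    min_logins=2 reads "more than once" literally (an assumption);
    pass min_logins=1 if a single login should count as retention.
    """
    window_start = observation_end + timedelta(weeks=GAP_WEEKS)
    window_end = window_start + timedelta(weeks=WINDOW_WEEKS)
    # Count logins that fall inside the decision window (boundary choice is assumed).
    logins = sum(1 for d in login_dates if window_start < d <= window_end)
    return "retention" if logins >= min_logins else "churn"
```

For example, with an observation period ending on 2016-09-21, logins during the following three weeks do not affect the label; only logins in the five weeks after that gap do.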
Concept Drift
Subscription Model (Monthly Fixed Charge Payment) to Free-to-Play (Dec 2016~)
Data Description
Data Set     Time Period                  Weeks   Number of Gamers    Data Size*
Training     APR-1-2016 ~ MAY-11-2016     6       4000 (30% churn)    48G (175m Events)
Test Set 1   JULY-27-2016 ~ SEP-21-2016   8       3000 (30% churn)    30G
Test Set 2   DEC-14-2016 ~ FEB-08-2017    8       3000 (30% churn)    30G
* Uncompressed Size
Log Data Sample
Time                    Event Type    Details (up to 72 columns)
2016-05-04 6:38:32 PM   Enter World   Login Type, Actor Data …
2016-05-04 6:39:16 PM   Enter Zone    Enter Zone Reason, Zone Type …
2016-05-04 6:39:36 PM   Lose Item     Item Type, Item Count, …
2016-05-04 6:39:36 PM   Get Item      Item Type, Item Count, …
2016-05-04 6:39:40 PM   Get Item      Item Type, Item Count, …
⋮                       ⋮             ⋮
82 Event Types (World, Zone, Item, Party, Quest, Guild)
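As a rough sketch of how such rows might be turned into per-player features, the snippet below parses the timestamp format shown in the sample and counts events by type. The on-disk layout of the real log files is not shown in the slides, so the `(time_text, event_type)` pairs and the `summarize` helper are hypothetical.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %I:%M:%S %p"  # matches e.g. "2016-05-04 6:38:32 PM"

def parse_time(text):
    """Parse a 12-hour timestamp as it appears in the log sample."""
    return datetime.strptime(text, TIME_FORMAT)

def summarize(rows):
    """rows: iterable of (time_text, event_type) pairs for one player (hypothetical layout)."""
    rows = list(rows)
    times = [parse_time(t) for t, _ in rows]
    counts = Counter(event for _, event in rows)
    # Seconds between the first and last event in the given rows.
    span_seconds = (max(times) - min(times)).total_seconds()
    return counts, span_seconds

# The five rows from the log sample, reduced to time and event type.
sample = [
    ("2016-05-04 6:38:32 PM", "Enter World"),
    ("2016-05-04 6:39:16 PM", "Enter Zone"),
    ("2016-05-04 6:39:36 PM", "Lose Item"),
    ("2016-05-04 6:39:36 PM", "Get Item"),
    ("2016-05-04 6:39:40 PM", "Get Item"),
]
counts, span_seconds = summarize(sample)
```

Event counts and time spans of this kind are only one possible starting point; the slides do not say which features the participants derived.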
Competition Results
Track 1: Churn Prediction
Participants (13 Teams)
Team name Team members Affiliation Type Country
GoAlone 1 Yonsei University Academia South Korea
DTND 3 DTND ? South Korea
goedle.io 2 goedle.io GmbH Industry Germany
IISLABSKKU 3 Sungkyunkwan University Academia South Korea
leessang 2 Yonsei University Academia South Korea
TheCowKing 2 KAIST Academia South Korea
TripleS 3 - ? South Korea
UTU 4 University of Turku Academia Finland
YD 6 Silicon Studio Industry Japan
YK 1 Yonsei University Academia South Korea
suya 1 Yonsei University Academia South Korea
NoJam 3 Yonsei University Academia South Korea
MNDS 3 Yonsei University Academia South Korea
Rank Team Test1 score Test2 score Total score
1 YD (Japan) 0.61008 0.63326 0.62145
2 UTU (Finland) 0.60326 0.60370 0.60348
3 TripleS (Korea) 0.57968 0.62459 0.60130
4 TheCowKing 0.59370 0.60718 0.60036
5 goedleio 0.57717 0.60095 0.58882
6 MNDS 0.55920 0.56205 0.56062
7 DTND 0.49937 0.58776 0.53997
8 IISLABSKKU 0.56643 0.48733 0.52391
9 suya 0.44460 0.40967 0.42642
10 YK 0.49099 0.33181 0.39600
11 GoAlone 0.42697 0.31019 0.35933
12 NoJam 0.30741 0.30930 0.30835
13 Lessang 0.29760 0.29202 0.29479
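The slides do not say how the Total score column is computed. The listed values are consistent with the harmonic mean of the Test1 and Test2 scores rather than the arithmetic mean. This is an observation from the numbers, not a rule stated by the organizers. A quick check on three rows copied from the table:

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive scores."""
    return 2 * a * b / (a + b)

# (Test1 score, Test2 score, Total score) copied from the table above
rows = {
    "YD": (0.61008, 0.63326, 0.62145),
    "UTU": (0.60326, 0.60370, 0.60348),
    "TripleS": (0.57968, 0.62459, 0.60130),
}

# Recompute each total from the two test scores for comparison with the table.
recomputed = {team: harmonic_mean(t1, t2) for team, (t1, t2, _) in rows.items()}
```

The harmonic mean is pulled toward the lower of the two scores, so a model that does well on only one test set gains less than it would under a plain average.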
YD (Winner)
• Silicon Studio, Japan
• Team Members: Paul Bertens, Pei Pei Chen, Kexin Chen, Anna Guitart, Sovann Lay, Africa Perianez
• Find features that have similar distributions in the training set and the test sets.
• Test 1: LSTM + DNN (implemented with Keras)
• Test 2: Extra-Trees Classifier (# of trees = 50)
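The slides do not describe how the YD team compared feature distributions between the training and test sets. One common way to do it, shown here purely as an illustrative stand-in, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical distribution functions. The feature names and the `threshold` value below are hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

def stable_features(train, test, threshold=0.2):
    """Keep features whose train/test distributions look similar.

    train, test: dicts mapping feature name -> list of values.
    threshold is a hypothetical cut-off, not a value from the slides.
    """
    return [name for name in train
            if ks_statistic(train[name], test[name]) <= threshold]
```

A filter of this kind is one possible response to the concept drift noted earlier: features whose distributions shift after the business-model change are dropped before training.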
[Figure: LSTM+DNN, from the YD team's document]
Rank Team Techniques
1 YD LSTM+DNN, Extra-Trees Classifier
2 UTU Logistic Regression
3 TripleS Random Forest
4 TheCowKing LightGBM (Light Gradient Boosting Machine)
5 goedleio Feed Forward Neural Network
6 MNDS Deep Neural Network
7 DTND Generalized Linear Model
8 IISLABSKKU Tree Boosting
9 suya Deep Neural Network
10 YK Logistic Regression
11 GoAlone Logistic Regression
12 NoJam Decision Tree
13 Lessang Deep Neural Network
Technique categories (legend): Neural Net, Tree Approach, Linear Models
Competition Results
Track 2: Survival Analysis
Participants (5 Teams)
Team name Team members Affiliation Country
DTND 3 DTND South Korea
IISLABSKKU 3 Sungkyunkwan University South Korea
TripleS 3 - South Korea
UTU 4 University of Turku Finland
YD 6 Silicon Studio Japan
Rank Team Test1 score Test2 score Total score
1 YD (Japan) 0.883248 0.616499 0.726151
2 IISLABSKKU (Korea) 1.034321 0.679214 0.819972
3 UTU (Finland) 0.927712 0.898471 0.912857
4 TripleS 0.958308 0.891106 0.923486
5 DTND 1.032688 0.930417 0.978888
Rank Team Techniques
1 YD Ensemble of Conditional Inference Trees (# of Trees = 900)
2 IISLABSKKU Tree Boosting
3 UTU Linear Regression
4 TripleS Ensemble Tree Method
5 DTND Generalized Linear Model
Technique categories (legend): Neural Net, Tree Approach, Linear Models
Future Data Use
• Data Download Deadline
  • Active until the end of August; an extension of the deadline is under discussion
• Data Use for Academic Research
  • No restriction on data use for academic research (please include an acknowledgement of this competition and NCSOFT)
• Test Data Label
  • We’ll open the test data labels soon.
Q & A