A COMBINATION OF SIMPLE MODELS BY FORWARD PREDICTOR SELECTION FOR JOB RECOMMENDATION

Dávid Zibriczky, PhD (DaveXster)
Budapest University of Technology and Economics, Budapest, Hungary


Page 1: A Combination of Simple Models by Forward Predictor Selection for Job Recommendation


Page 2

The Dataset – Data preparation

• Events (interactions, impressions)
  › Target format: (time, user_id, item_id, type, value)
  › Interactions: format OK
  › Impressions:
    • Generate unique (time, user_id, item_id) triples
    • Value: count of their occurrences
    • Time: 12pm on Thursday of the week
    • Type: 5

• Catalog (items, users)
  › Target format: (id, key1, key2, …, keyN)
  › Items and users: format OK
  › Unknown "0" values → empty values
  › Inconsistency: geo-location vs. country/region → metadata enhancement based on geo-location
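The impression conversion described above can be sketched in a few lines. This is an illustration only: the input layout (a list of (user_id, item_id) pairs for one week) and the helper name `impressions_to_events` are assumptions, not the authors' code.

```python
from collections import Counter

def impressions_to_events(impressions, week_thursday_noon_ts, event_type=5):
    """Convert one week of raw impressions into the target event format
    (time, user_id, item_id, type, value), following the steps above:
    unique (user, item) pairs, value = occurrence count,
    time = 12pm on Thursday of the week, type = 5."""
    counts = Counter(impressions)  # occurrences of each (user_id, item_id) pair
    return [(week_thursday_noon_ts, user, item, event_type, n)
            for (user, item), n in sorted(counts.items())]

events = impressions_to_events([(1, 10), (1, 10), (2, 11)], 1456980000)
```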

Page 3

The Dataset – Basic statistics

• Size of training set
  › 211M events, 2.8M users, 1.3M items
  › Effect: huge and very sparse matrix

• Distribution
  › 95% of events are impressions
  › 72% of the users have impressions only
  › Item support for interactions is low (~9)
  › Effect: weak collaboration using interactions

• Target users
  › 150K users
  › 73% active, 16% inactive, 12% new
  › Effect: user cold-start and warm-up problem

Data source        #events      #users     #items
Interactions       8,826,678    784,687    1,029,480
Impressions        201,872,093  2,755,167  846,814
All events         210,698,777  2,792,405  1,257,422
Catalog            -            1,367,057  1,358,098
Catalog OR Events  -            2,829,563  1,362,890

Page 4

Methods – Concept

Terminology
• Method: a technique for estimating the relevance of an item for a user (p-Value)
• Predictor/model: an instance of a method with a specified parameter setting
• Combination: a linear combination of prediction values for a user-item pair

Approach
1. Exploring the properties of the data set
2. Defining "simple" methods with different functionality (time decay is commonly used)*
3. Finding a set of relevant predictors and an optimal combination of them
4. Top-N ranking of event-supported items with non-zero p-Values (~200K)

* Equations of the methods can be found in the paper.

Page 5

Methods – Item-kNN

• Observation: very sparse user-item matrix (0.005%), 211M events
• Goal: next best items to click, estimating the recommendations of Xing
• Method: standard item-based kNN with special features
  › Input/output event types
  › Controlling the popularity factor
  › Similarity of an item with itself is 0
  › Efficient implementation
• Notation: IKNN(I,O)
  › I: input event type
  › O: output event type
• Comment: no improvement from combining other CF algorithms (MF, FM, User-kNN)

Page 6

Methods – Recalling recommendations

• Chart: the distribution of impression events by the number of weeks in which the same item has already been shown
• Observation: 38% of recommendations are recurring items
• Goal: reverse engineering, recalling recommendations
• Method:
  › Recommendation of already shown items
  › Weighted by expected CTR
• Notation: RCTR
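A sketch of the RCTR idea under stated assumptions: the additive smoothing constant and the exact CTR estimate are choices made here for illustration, not taken from the paper.

```python
from collections import defaultdict

def rctr_scores(impressions, clicks, target_user, smoothing=10.0):
    """Re-recommend items already shown to target_user, weighted by a
    smoothed global expected click-through rate per item.
    impressions, clicks: lists of (user_id, item_id) pairs."""
    shown = defaultdict(int)    # item -> total impressions
    clicked = defaultdict(int)  # item -> total clicks
    shown_to_user = set()
    for u, i in impressions:
        shown[i] += 1
        if u == target_user:
            shown_to_user.add(i)
    for u, i in clicks:
        clicked[i] += 1
    # additive smoothing keeps rarely shown items from getting inflated CTRs
    return {i: clicked[i] / (shown[i] + smoothing) for i in shown_to_user}
```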

Page 7

Methods – Already seen items

• Chart: the probability of returning to an already seen item after interacting with other items
• Observation: significant probability of re-clicking an already clicked item
• Goal: capturing the re-clicking phenomenon
• Method: recommendation of already clicked items
• Notation: AS(I)

Page 8

Methods – User metadata-based popularity

• Observation:
  › Significant number of passive and new users
  › All target users have metadata
• Goal:
  › Semi-personalized recommendations for new users
  › Improving accuracy on inactive users
• Method:
  1. Item model: expected popularity of an item in each user group
  2. Prediction: average popularity of an item for a user
  › Applied keys: jobroles, edu_fieldofstudies
• Notation: UPOP
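The two UPOP steps above can be sketched as follows; the `user_meta` layout (user_id → {key: set of metadata values}) and the popularity estimate are assumptions made for illustration.

```python
from collections import defaultdict

def upop_scores(events, user_meta, target_user, key="jobroles"):
    """Item model: expected popularity of an item within each user group
    (a group is one metadata value, e.g. one job role).
    Prediction: average popularity over the target user's groups,
    so new users with metadata get semi-personalized scores."""
    group_events = defaultdict(int)                     # group -> #events
    group_item = defaultdict(lambda: defaultdict(int))  # group -> item -> #events
    for u, i in events:
        for g in user_meta.get(u, {}).get(key, ()):
            group_events[g] += 1
            group_item[g][i] += 1

    groups = user_meta.get(target_user, {}).get(key, ())
    scores = defaultdict(float)
    for g in groups:
        if group_events[g]:
            for i, n in group_item[g].items():
                # popularity of item i in group g, averaged over the user's groups
                scores[i] += (n / group_events[g]) / len(groups)
    return dict(scores)
```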

Page 9

Methods – MS: Meta cosine similarity

• Observation:
  › Item cold-start problem, many low-supported items
  › Almost all items have metadata
• Goal:
  › Model building for new items
  › Improving the model of low-supported items
• Method:
  1. Item model: metadata representation, tf-idf
  2. User model: meta-words of items seen by the user
  3. Prediction: average cosine similarity between user and item models
  › Keys: tags, title, industry_id, geo_country, geo_region, discipline_id
• Notation: MS
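A minimal sketch of the MS prediction step. Plain term counts stand in for the tf-idf weighting used in the paper, and averaging the cosine similarity over the user's seen items is one reading of "average cosine similarity between user and item models"; both are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ms_score(seen_item_words, candidate_words):
    """Score a candidate item by its average cosine similarity to the
    meta-word representations of the items the user has already seen."""
    cand = Counter(candidate_words)
    seen = [Counter(ws) for ws in seen_item_words]
    return sum(cosine(cand, s) for s in seen) / len(seen) if seen else 0.0
```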

Page 10

Methods – AP: Age-based popularity change

• Observation: significant drop in the popularity of items at ~30 and ~60 days of age
• Goal: underscoring these items
• Method: expected ratio of the item's popularity in the next week
• Notation: AP

Page 11

Methods – OM: The omit method

• Observation: unwanted items in recommendation lists
• Goal: omitting poorly modelled items of a predictor or combination
• Method:
  1. Sub-train/test split
  2. Retrain a new combination
  3. Generate top-N recommendations
  4. Measure how the total evaluation score would change by omitting each item
  5. Omit the worst K items from the original combination
• Notation: OM
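The final omit step can be sketched as below, assuming the per-item contribution to the evaluation score (`item_gain`) has already been measured on the sub-train/test split as in steps 1–4; the helper name and input shapes are hypothetical.

```python
def worst_items_to_omit(recommendations, item_gain, k):
    """Return up to K recommended items whose removal would raise the
    total evaluation score, i.e. those with the most negative measured
    contribution. recommendations: user -> list of recommended items."""
    recommended = {i for recs in recommendations.values() for i in recs}
    ranked = sorted(recommended, key=lambda i: item_gain.get(i, 0.0))
    # keep only items that actually hurt the score
    return [i for i in ranked[:k] if item_gain.get(i, 0.0) < 0]
```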

Page 12

Methods – Optimization

1. Time-based train-test split (test set: last week)
2. Coordinate gradient descent optimization of the various methods → candidate predictor set
3. Support-based distinct user groups (new users, inactive users, 10 equal-sized groups of active users)
4. Forward Predictor Selection
   • Initialization: the selected predictor set (predictors chosen from the candidate set for the final combination) is empty at the beginning
   • Loop:
     1. Calculate the accuracy of the selected predictor set
     2. For every remaining candidate predictor, calculate the gain in accuracy it would give if moved to the selected set
     3. Move the best one to the selected set and recalculate the combination weights
     4. Repeat the loop while there is improvement and there are remaining candidate predictors
   • Return: the set of selected predictors and their corresponding weights
5. Retrain the selected predictors on the full data set
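The Forward Predictor Selection loop above can be sketched as follows. The `evaluate` callback is assumed to hide the combination-weight recalculation (step 3 of the loop), so this shows only the greedy selection logic.

```python
def forward_predictor_selection(candidates, evaluate):
    """Greedy forward selection: repeatedly add the candidate predictor
    that most improves the combination's accuracy, stopping when no
    remaining candidate improves it.
    evaluate(selected): accuracy of the re-weighted combination."""
    selected, remaining = [], set(candidates)
    best_score = float("-inf")
    while remaining:
        gains = {p: evaluate(selected + [p]) for p in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= best_score:
            break  # no remaining candidate improves the combination
        selected.append(best)
        remaining.remove(best)
        best_score = gains[best]
    return selected, best_score
```

With a toy additive `evaluate`, the loop picks predictors in order of marginal gain and stops before a harmful one is added.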

Page 13


… let’s put it together and see how it performs!

Page 19

Evaluation – Forward Predictor Selection

• Best single model
  › Item-kNN trained on positive interactions
  › 2.5 min training time
  › 7 ms prediction time
• Sub-combinations
  › 4 models: 600K+ score (w/o item metadata)
  › 5 models: 3rd place
  › 6 models: 95% of the final score
  › 10 models: 650K+ score (<30 min training time)
• Final combination
  › 3rd place
  › ~666K leaderboard score
  › 11 instances
  › User-support-based weighting
  › 3h+ training time, 200 ms prediction time

#   Predictor     tTR(s)*  tPR(ms)*  Score    Rank
1   IKNN(C,C)     148      7         450,046  24
2   +RCTR         208      15        548,338  9
3   +AS(1)        237      17        590,526  6
4   +UPOP         247      50        614,674  5
5   +MS           364      122       623,909  3
6   +IKNN(R,R)    1,150    168       635,278  3
7   +AS(3)        1,205    178       636,498  3
8   +IKNN(R,C)    1,557    197       643,145  3
9   +AS(4)        1,582    202       644,710  3
10  +AP           1,621    207       652,802  3
    SUPP_C(1-10)  1,639    194       661,359  3
11  +OM           11,790   199       665,592  3

* Java-based framework, 8-core 3.4 GHz CPU, 32 GB memory

Page 20

Evaluation – Timeline

[Chart: weekly leaderboard score (thousands) and leaderboard rank by submission date, Apr-25 to Jun-27, split into three phases: initial setup, model design and implementation, and final sprint. The score climbs from ~115K to the final ~665.6K as the rank improves to 3rd place.]

Page 21

Lessons learnt

• Exploiting the specificity of the dataset
• Using Item-kNN over factorization on a very sparse dataset
• Paying attention to recurrence
• Forward Predictor Selection is effective
• Different optimization for different user groups
• Underscoring/omitting weak items
• Ranking 200K items is slow
• Keep it simple and transparent!

Page 22


Thank you for your attention!

Dávid Zibriczky, PhD

[email protected]