

Search Ranking Across Heterogeneous Information Sources

Viet Ha-Thuc and Dhruv Arya
Search Quality, LinkedIn


Heterogeneous Information Access at SIGIR 2016


• 200+ countries and territories

• 2+ new members per second


● Dual roles of search
○ Enable talent to discover opportunities
○ Help companies search for the right talent


FLAGSHIP SEARCH

RECRUITER SEARCH

SALES NAVIGATOR

Unique Nature of LinkedIn Search


▪Heterogeneous sources
–Different entity types: people, jobs, companies, SlideShare presentations
–Many use cases: hiring, sales, connecting, job seeking, content discovery
–Requires different features, training data and objectives

▪Scale
–400M+ members, 6M+ jobs, 18M+ slideshows

▪Federation across the sources

Overview


[Figure: overview of the search stack: query, spell correction, query tagging (name, title, skill), intent prediction, people, jobs and companies verticals, and federated search page construction.]


Agenda

▪Introduction

▪Vertical ranking
–Job search [KDD’16]
–People search by skills [BigData’15, SIGIR’16]

▪Federation [CIKM’15]

▪Lessons

Challenges of Job Search

▪“Hidden” structures

▪A query represents only a small fraction of the information need
–e.g., “San Francisco”, “software engineer”, “java”

▪Job attractiveness varies along many aspects
–“Hot” titles: “data scientist”
–Top companies: Google, Facebook, etc.
–Trending skills: machine learning, big data, etc.
–Location


Entity-Aware Matching


Expertise Homophily

▪“Classic” homophily in social networks
–People tend to interact with similar others

▪Expertise homophily in job search
–Searchers tend to apply for jobs requiring expertise similar to their own
–The apply rate of job results with overlapping skills is 2x higher

▪Expertise
–Jobs: skills extracted from the job description
–Searchers: explicit and implicit skills
–Similarity: Jaccard similarity between the two skill sets
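A minimal sketch of the expertise-homophily feature, assuming skills have already been extracted and standardized into sets of strings. The function name `expertise_homophily` and the example skills are hypothetical, not taken from the talk.

```python
from __future__ import annotations


def expertise_homophily(searcher_skills: set[str], job_skills: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two skill sets."""
    union = searcher_skills | job_skills
    if not union:
        # Neither side has skills: treat as no overlap.
        return 0.0
    return len(searcher_skills & job_skills) / len(union)


# Hypothetical searcher skills (explicit + implicit) vs. skills extracted from a job description.
searcher = {"java", "machine learning", "hadoop"}
job = {"java", "machine learning", "spark", "scala"}
score = expertise_homophily(searcher, job)
print(score)  # 2 shared skills / 5 distinct skills = 0.4
```

The score is symmetric and bounded in [0, 1], so it can be fed directly to the ranker as one feature alongside text-match scores.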


Entity-faceted CTRs

▪Job attractiveness
–Historical CTRs of individual jobs
–Challenge: job lifetimes are short, so per-job estimates are unreliable

▪Entity-faceted historical CTRs
–CTRs of jobs with the standardized title “data scientist”
–CTRs of jobs from IBM
–CTRs of jobs requiring trending skills: machine learning, big data, etc.

▪Advantages
–Alleviate data sparsity by grouping jobs by facets
–Resolve the cold-start problem
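The facet idea can be sketched by aggregating clicks and impressions per (facet, value) pair instead of per job. The log format, the facet keys `title`, `company` and `skill`, and the sample rows below are all hypothetical; the talk does not describe how the aggregation is implemented.

```python
from collections import defaultdict

# Hypothetical impression log: (job facets, clicked?) pairs.
LOG = [
    ({"title": "data scientist", "company": "IBM", "skill": "machine learning"}, True),
    ({"title": "data scientist", "company": "Acme", "skill": "big data"}, False),
    ({"title": "software engineer", "company": "IBM", "skill": "java"}, False),
    ({"title": "data scientist", "company": "IBM", "skill": "big data"}, True),
]


def faceted_ctrs(log):
    """Aggregate clicks and impressions per (facet, value) pair, not per job."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for facets, clicked in log:
        for key in facets.items():  # e.g. ("company", "IBM")
            impressions[key] += 1
            clicks[key] += int(clicked)
    return {key: clicks[key] / impressions[key] for key in impressions}


ctrs = faceted_ctrs(LOG)

# A brand-new job has no click history of its own, but its facets do.
new_job = {"title": "data scientist", "company": "IBM", "skill": "spark"}
features = {facet: ctrs.get((facet, value)) for facet, value in new_job.items()}
print(features)  # the unseen skill "spark" maps to None
```

Because many jobs share a title or company, each facet CTR is estimated from far more impressions than any single short-lived posting receives.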


Other features


Labeling Strategy

▪Job applies, views and skips are considered
–Applied: highest, label = 4
–Clicked: good, label = 1
–Skipped: label = 0
–Uncertain: removed
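A minimal sketch of the labeling scheme above. It assumes each impression has already been classified as `applied`, `clicked`, `skipped` or `uncertain`; the talk does not state the rule that separates skipped from uncertain results, so that step is left out, and the event names are hypothetical.

```python
from typing import Optional

# Graded labels from the slide; "uncertain" impressions get no label and are dropped.
LABELS = {"applied": 4, "clicked": 1, "skipped": 0}


def label_for(event: str) -> Optional[int]:
    """Map a job-search interaction to a graded relevance label, or None to drop it."""
    return LABELS.get(event)


def build_training_list(events):
    """Keep (position, label) pairs for labeled results; drop uncertain ones."""
    return [(pos, label_for(e)) for pos, e in enumerate(events) if label_for(e) is not None]


# Hypothetical result list for one search, already classified per impression.
session = ["clicked", "skipped", "applied", "uncertain"]
print(build_training_list(session))  # [(0, 1), (1, 0), (2, 4)]
```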

Learning to Rank

▪Listwise
–Considers relevance relative to each query
–Allows optimizing the quality metric directly

▪Objective function
–Normalized Discounted Cumulative Gain (NDCG@K)
–Graded relevance labels
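The objective can be made concrete with a small NDCG@K function over graded labels. This sketch uses the common exponential gain (2^rel - 1) and log2 position discount; the talk does not specify which gain and discount it uses, nor which listwise learner optimizes the metric, so both choices here are assumptions.

```python
import math


def dcg_at_k(labels, k):
    """DCG@k with exponential gain (2^rel - 1) and a log2 position discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(labels, k):
    """Normalize by the DCG of the ideal ordering (labels sorted descending)."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Graded labels in ranked order (4 = applied, 1 = clicked, 0 = skipped).
ranking = [1, 0, 4, 0]
print(round(ndcg_at_k(ranking, 3), 4))
```

Placing the applied job (label 4) at position 3 instead of position 1 costs roughly half of the achievable NDCG@3, which is why graded labels push the learner to rank applies above clicks.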


Experiment Results


▪Baseline
–All existing features except the entity-aware ones
–Machine learned
–Optimized for the same objective function

              CTR      Apply Rate
Improvement   +11.3%   +5.3%


Introduction

▪Skills
–Represent professional expertise
–35K+ standardized skills
–Members get endorsed on skills

▪Skill queries
–Contain skills and no personal name


Introduction

▪Unique challenges of LinkedIn expertise search

– Scale: 400M members x 35K standardized skills

– Sparsity of skills in profiles

– Personalization


…

Reputation

Information a decision maker uses to make a judgment on an entity with a record (*)


(*) “Building web reputation systems”, Glass and Farmer, 2010

Skill Reputation Scores [Ha-Thuc et al. BigData’15]


▪Decision Maker: searcher

▪Record: Professional career

▪Skill reputation: member expertise on a skill

▪Judgment: Hire?

Estimating Skill Reputation

[Figure: endorsement, profile and browsemap signals feed a supervised learning algorithm that estimates P(expert | member, skill).]

Resulting members × skills matrix (? = unknown):
? .85 .45
? ? .35
? .42 ?
? ? .05


Matrix Factorization

The sparse members × skills matrix is approximated by the product of two latent-factor matrices.

Member factors (each row is a representation of a member in latent space):
0.5 1
0.7 0
0 0.6
0.1 0

Skill factors (each column represents a skill in latent space):
0.2 0.3 0.5
0.5 0.7 0.2


Multiplying the factors fills in the unknown cells in the original matrix:
.6 .85 .45
.14 .21 .35
.3 .42 .12
.02 .03 .05
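The fill-in step can be sketched with plain stochastic gradient descent fitted on the observed cells only. The talk does not say how its factorization is trained, so SGD, the rank, learning rate, regularization and epoch count below are all assumptions. Many factorizations fit the same observed cells, so the filled-in values will generally differ from the slide's.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.001, epochs=2000, seed=0):
    """Fit R ≈ P Q^T using only the observed cells, via stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_rows)]  # member factors
    Q = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_cols)]  # skill factors
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), r in cells:
            err = r - sum(P[i][f] * Q[j][f] for f in range(rank))
            for f in range(rank):
                p, q = P[i][f], Q[j][f]
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def complete(P, Q):
    """Predict every cell, including the unknown ones, from the latent factors."""
    return [[sum(pf * qf for pf, qf in zip(p, q)) for q in Q] for p in P]


# Known cells of the slide's 4 x 3 members x skills example; every other cell is unknown.
observed = {(0, 1): 0.85, (0, 2): 0.45, (1, 2): 0.35, (2, 1): 0.42, (3, 2): 0.05}
P, Q = factorize(observed, n_rows=4, n_cols=3)
completed = complete(P, Q)
for row in completed:
    print([round(x, 2) for x in row])
```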

Features

▪Reputation feature

▪Social connection

▪Homophily
–Geo
–Industry

▪Textual features


Experiments

▪Query tagging
▪Target segment: skill queries with no name
▪Baseline
–No skill reputation feature
–Hand-tuned

            CTR@10   # Messages per Search
Flagship    +11%     +20%
Premium     +18%     +37%


Personalized Federated Search


▪Why do we need this?


Personalized Federated Search - Motivation

Personalized Federated Search - Overall


Personalized Federated Model [Arya, Ha-Thuc et al. CIKM’15]

▪Relevance scores from base rankers

▪Query intent: P(vertical | query)

▪Searcher intent
–Mine searcher profiles and past behavior to infer intent
  ▪Title “recruiter” -> recruiting intent
  ▪Searches for jobs -> job seeking intent
–Machine-learned models predict member intents:
  ▪Job seeking
  ▪Recruiting
  ▪Content consuming
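The talk lists the model's inputs but not its functional form. This sketch assumes a logistic combination of the three signals and a hypothetical vertical-to-intent mapping (people to recruiting, jobs to job seeking, groups to content consuming); the weights, bias and probabilities are made up for illustration and would be learned in practice.

```python
import math
from dataclasses import dataclass

# Hypothetical mapping from each vertical to the member intent it mainly serves.
VERTICAL_INTENT = {"people": "recruiting", "jobs": "job_seeking", "groups": "content_consuming"}


@dataclass
class Candidate:
    vertical: str
    relevance: float  # score from that vertical's base ranker


def federated_score(cand, p_vertical_given_query, member_intents, weights, bias):
    """Logistic combination of base-ranker relevance, query intent and searcher intent."""
    features = [
        cand.relevance,
        p_vertical_given_query.get(cand.vertical, 0.0),
        member_intents.get(VERTICAL_INTENT.get(cand.vertical, ""), 0.0),
    ]
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up inputs: an ambiguous query that could target people or jobs.
P_VQ = {"people": 0.5, "jobs": 0.45, "groups": 0.05}
WEIGHTS, BIAS = [2.0, 1.5, 3.0], -3.0  # illustrative only
CANDIDATES = [Candidate("people", 0.8), Candidate("jobs", 0.8), Candidate("groups", 0.9)]


def rank_verticals(member_intents):
    """Order the verticals' top results for one searcher."""
    scored = sorted(
        CANDIDATES,
        key=lambda c: federated_score(c, P_VQ, member_intents, WEIGHTS, BIAS),
        reverse=True,
    )
    return [c.vertical for c in scored]


recruiter = {"recruiting": 0.9, "job_seeking": 0.1, "content_consuming": 0.2}
job_seeker = {"recruiting": 0.05, "job_seeking": 0.8, "content_consuming": 0.2}
print(rank_verticals(recruiter))   # ['people', 'jobs', 'groups']
print(rank_verticals(job_seeker))  # ['jobs', 'people', 'groups']
```

The query and candidate results are identical for both searchers; only the inferred searcher intent changes, and it is enough to flip which vertical is shown first.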


Calibrate Signals across Verticals

▪Verticals are associated with different intents

[Figure: people, job and group results, matched respectively with recruiting, job seeking and content consuming intents.]

Take-Aways

▪Text match is still important but not enough

▪Advanced features based on semi-structured data
–People search: skill reputation scores
–Job search: expertise homophily

▪Personalized learning to rank is crucial


References

▪“Personalized Expertise Search at LinkedIn”, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
▪“Personalized Federated Search at LinkedIn”, Arya, Ha-Thuc and Sinha, CIKM, 2015
▪“Learning to Rank Personalized Search Results in Professional Networks”, Ha-Thuc and Sinha, SIGIR, 2016
▪“How to Get Them a Dream Job?”, Li, Arya, Ha-Thuc and Sinha, KDD, 2016
