Recruiting Solutions
Search Ranking Across Heterogeneous Information Sources
Viet Ha-Thuc and Dhruv Arya
Search Quality - LinkedIn
Heterogeneous Information Access at SIGIR 2016
• 200+ countries and territories
• 2+ new members per second
● Dual roles of search
○ Enable talent to discover opportunities
○ Help companies search for the right talent
FLAGSHIP SEARCH
RECRUITER SEARCH
SALES NAVIGATOR
Unique Nature of LinkedIn Search
▪Heterogeneous sources
–Different entity types: people, jobs, companies, SlideShare presentations
–Many use cases: hiring, sales, connecting, job seeking, content discovery
–Each requires different features, training data and objectives
▪Scale
–400+MM members, 6+MM jobs, 18+MM slideshows
▪Federation across the sources
Overview
[Figure: overview of the search stack: Query; Spell Correction; Query Tagging (Name, Title, Skill); Intent Prediction; People, Jobs, Companies; Federated Search Page Construction]
Agenda
▪Introduction
▪Vertical ranking
–Job search [KDD’16]
–People search by skills [BigData’15, SIGIR’16]
▪Federation [CIKM’15]
▪Lessons
Challenges of Job Search
▪“Hidden” structures
▪The query represents only a small fraction of the information need
–“San Francisco”, “software engineer”, “java”
▪Job attractiveness varies along many aspects
–“Hot” titles: “data scientist”
–Top companies: Google, Facebook, etc.
–Trending skills: machine learning, big data, etc.
–Location
Entity-Aware Matching
Expertise Homophily
▪“Classic” homophily in social networks
–People tend to interact with similar others
▪Expertise homophily in job search
–Searchers tend to apply for jobs requiring similar expertise
–The apply rate of job results with overlapping skills is 2x higher
▪Expertise
–Jobs: skills extracted from the job description
–Searcher: explicit and implicit skills
–Similarity: Jaccard similarity between the two skill sets
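A minimal sketch of the expertise-homophily feature as the Jaccard similarity between the searcher's skill set and the skills extracted from a job description. Skill extraction and the merging of explicit and implicit searcher skills are assumed to happen upstream; `expertise_homophily` and the example skill sets are hypothetical.

```python
def expertise_homophily(searcher_skills: set[str], job_skills: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two skill sets."""
    union = searcher_skills | job_skills
    if not union:
        return 0.0  # no skills on either side: treat as no overlap
    return len(searcher_skills & job_skills) / len(union)


# Hypothetical example: searcher skills (explicit + implicit) vs. skills extracted from a posting.
searcher = {"java", "machine learning", "hadoop"}
job = {"java", "machine learning", "spark", "scala"}
print(expertise_homophily(searcher, job))  # 2 shared skills / 5 distinct skills = 0.4
```

Jaccard normalizes by the union, so a job that lists many unrelated skills scores lower than one whose requirements closely match the searcher's profile.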
Entity-faceted CTRs
▪Job attractiveness
–Historical CTRs for individual jobs
–Challenge: job lifetimes are short, so per-job CTR estimates are unreliable
▪Entity-faceted historical CTRs
–CTRs of jobs with the standardized title “data scientist”
–CTRs of jobs from the company IBM
–CTRs of jobs requiring trending skills: machine learning, big data, etc.
▪Advantages
–Alleviate data sparseness by grouping jobs by facets
–Resolve the cold-start problem
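A minimal sketch of entity-faceted CTRs: impressions and clicks are aggregated per facet value (standardized title, company) instead of per job, so a new job inherits the CTR of its facets. The event format, `faceted_ctrs`, and the additive smoothing toward a prior CTR are assumptions for illustration, not details from the slides; multi-valued facets such as skills would need one aggregation per skill.

```python
from collections import defaultdict


def faceted_ctrs(events, facet, prior_ctr=0.05, prior_weight=20.0):
    """Historical CTR per value of a single-valued facet (e.g. "title" or "company").

    events: iterable of (job, clicked) pairs, one per impression; job is a dict of facet values.
    Smoothing assumption: (clicks + prior_ctr * prior_weight) / (impressions + prior_weight),
    which pulls sparsely observed facet values toward prior_ctr.
    """
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for job, clicked in events:
        value = job[facet]
        impressions[value] += 1
        clicks[value] += int(clicked)
    return {
        value: (clicks[value] + prior_ctr * prior_weight) / (impressions[value] + prior_weight)
        for value in impressions
    }


# Hypothetical impression log.
events = [
    ({"title": "data scientist", "company": "IBM"}, True),
    ({"title": "data scientist", "company": "Google"}, False),
    ({"title": "software engineer", "company": "IBM"}, False),
    ({"title": "software engineer", "company": "IBM"}, True),
]
print(faceted_ctrs(events, "company", prior_weight=0.0))  # raw CTRs per company
print(faceted_ctrs(events, "title"))                       # smoothed CTRs per title
```

A job posted today with title “data scientist” can be scored with the title-level CTR immediately, which is how faceting addresses cold start.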
Other features
Labeling Strategy
▪Job applies, views and skips are considered
–Applied: label = 4 (highest)
–Clicked: label = 1 (good)
–Skipped: label = 0
–Uncertain: removed
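A minimal sketch of the graded labeling above for one search session. The slides do not define which unclicked results count as skipped versus uncertain; this sketch assumes a common "skip-above" rule: an unclicked result is skipped only if it is ranked above the lowest-ranked result the searcher applied to or viewed, and unclicked results below that point are uncertain and dropped. `label_impressions` and the action strings are hypothetical.

```python
def label_impressions(results):
    """Assign graded labels to one search's results.

    results: list of (job_id, action) in rank order, action in {"apply", "view", None}.
    Assumption (not from the slides): unclicked results above the lowest interacted
    result are skips (label 0); unclicked results below it are uncertain and removed.
    """
    gains = {"apply": 4, "view": 1}
    interacted = [i for i, (_, action) in enumerate(results) if action in gains]
    if not interacted:
        return []  # no interaction at all: every result is uncertain
    last = interacted[-1]
    return [(job_id, gains.get(action, 0)) for job_id, action in results[: last + 1]]


session = [("j1", None), ("j2", "view"), ("j3", None), ("j4", "apply"), ("j5", None)]
print(label_impressions(session))  # j5 is below the last interaction, so it is dropped
```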
Learning to Rank
▪Listwise
–Treats relevance as relative within each query's result list
–Allows optimizing the quality metric directly
▪Objective function
–Normalized Discounted Cumulative Gain (NDCG@K)
–Graded relevance labels
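A minimal sketch of NDCG@K over the graded labels (0, 1, 4) used above. The gain 2^rel − 1 and the log2(rank + 1) discount are one common convention; the slides do not say which gain function was used. The ideal ordering here is computed from the labels of the returned list only.

```python
import math


def dcg_at_k(labels, k):
    # Gain 2^rel - 1, discounted by log2(rank + 1) with 1-based ranks.
    return sum((2 ** rel - 1) / math.log2(rank + 1) for rank, rel in enumerate(labels[:k], start=1))


def ndcg_at_k(labels, k):
    # Normalize by the DCG of the best possible ordering of the same labels.
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# A view (1) ranked above an apply (4), followed by a skip (0).
print(ndcg_at_k([1, 4, 0], k=3))
```

Because an apply carries far more gain than a view, placing the applied job first matters much more than the order of lower-graded results, which is why graded labels pair naturally with a listwise NDCG objective.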
Experiment Results
▪Baseline
–All of the existing features except the entity-aware ones
–Machine learned
–Optimized for the same objective function
             CTR      Apply Rate
Improvement  +11.3%   +5.3%
Introduction
▪Skills
–Represent professional expertise
–35K+ standardized skills
–Members get endorsed on skills
▪Skill queries
–Contain skills and no personal name
▪Unique challenges of LinkedIn expertise search
–Scale: 400M members x 35K standardized skills
–Sparsity of skills in profiles
–Personalization
…
Reputation
Information a decision maker uses to make a judgment on an entity with a record (*)
(*) “Building Web Reputation Systems”, Glass and Farmer, 2010
Skill Reputation Scores [Ha-Thuc et al. BigData’15]
▪Decision Maker: searcher
▪Record: Professional career
▪Skill reputation: member expertise on a skill
▪Judgment: Hire?
Estimating Skill Reputation
Signals: endorsements, profile, browsemap
A supervised learning algorithm estimates P(expert | member, skill); rows are members, columns are skills, and ? marks unknown cells:
?    .85  .45
?    ?    .35
?    .42  ?
?    ?    .05
Matrix Factorization
The partially observed members x skills matrix is factorized into a member matrix and a skill matrix:
Members (each row is a representation of a member in latent space):
0.5  1
0.7  0
0    0.6
0.1  0
Skills (each column represents a skill in latent space):
0.2  0.3  0.5
0.5  0.7  0.2
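A minimal sketch of fitting such a factorization to only the observed cells of the members x skills matrix. The slides do not say how the factors are learned; plain stochastic gradient descent on squared error with L2 regularization is swapped in here as one standard choice, and `factorize`, `predict` and all hyperparameters are hypothetical. The factorization is not unique, so the learned factors will generally differ from the ones shown on the slide even when they fit the observed cells equally well.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.001, epochs=5000, seed=0):
    """Fit U (n_rows x rank) and V (rank x n_cols) so that U @ V matches the observed cells.

    observed: dict mapping (row, col) -> value; unknown cells are simply absent.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0, 0.5) for _ in range(n_cols)] for _ in range(rank)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), value in cells:
            err = value - sum(U[i][f] * V[f][j] for f in range(rank))
            for f in range(rank):
                u, v = U[i][f], V[f][j]
                U[i][f] += lr * (err * v - reg * u)  # gradient step on the member factor
                V[f][j] += lr * (err * u - reg * v)  # gradient step on the skill factor
    return U, V


def predict(U, V):
    """Dense reconstruction U @ V, which fills every cell, including the unknown ones."""
    return [[sum(U[i][f] * V[f][j] for f in range(len(V))) for j in range(len(V[0]))]
            for i in range(len(U))]


# Observed cells from the supervised estimates above (row = member, column = skill).
observed = {(0, 1): 0.85, (0, 2): 0.45, (1, 2): 0.35, (2, 1): 0.42, (3, 2): 0.05}
U, V = factorize(observed, n_rows=4, n_cols=3)
for row in predict(U, V):
    print([round(x, 2) for x in row])
```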
Multiplying the member and skill matrices fills in the unknown cells of the original matrix:
Original matrix:
?    .85  .45
?    ?    .35
?    .42  ?
.02  ?    ?
Completed matrix:
.6   .85  .45
.14  .21  .35
.3   .42  .12
.02  .03  .05
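The completed matrix is exactly the product of the two factor matrices. The check below multiplies the factors shown on the slide in plain Python (`matmul` is a hypothetical helper) and rounds to two decimals.

```python
# Member factors (one row per member) and skill factors (one column per skill), as on the slide.
member_factors = [[0.5, 1.0], [0.7, 0.0], [0.0, 0.6], [0.1, 0.0]]
skill_factors = [[0.2, 0.3, 0.5], [0.5, 0.7, 0.2]]


def matmul(a, b):
    """Naive matrix product of nested lists: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


completed = [[round(x, 2) for x in row] for row in matmul(member_factors, skill_factors)]
for row in completed:
    print(row)
```

Each completed cell is the dot product of a member's latent vector with a skill's latent vector; for example, the first member and first skill give 0.5 * 0.2 + 1 * 0.5 = 0.6.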
Features
▪Reputation feature
▪Social connection
▪Homophily
–Geo
–Industry
▪Textual features
Experiments
▪Query tagging
▪Target segment: skill and no-name queries
▪Baseline
–No skill reputation feature
–Hand-tuned
Improvement over baseline:
          CTR@10   # Messages per Search
Flagship  +11%     +20%
Premium   +18%     +37%
Personalized Federated Search
▪Why do we need this?
Personalized Federated Search - Motivation
Personalized Federated Search - Overall
Personalized Federated Model [Arya, Ha-Thuc et al. CIKM’15]
▪Relevance scores from base rankers
▪Query intent: P(vertical | query)
▪Searcher intent
–Mine searcher profiles and past behavior to infer intent
  ▪Title “recruiter” -> recruiting intent
  ▪Searches for jobs -> job seeking intent
–Machine-learned models predict member intents:
  ▪Job seeking
  ▪Recruiting
  ▪Content consuming
Calibrate Signals across Verticals
▪Verticals are associated with different intents
[Figure: People Result, Job Result, Group Result alongside Recruiting Intent, Job Seeking Intent, Content Consuming Intent]
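A hypothetical sketch of how the federated signals above could be combined once searcher intents are calibrated across verticals: each vertical reads the intent it is associated with, so one shared "searcher intent" weight applies to every vertical. The vertical-to-intent pairing (people/recruiting, jobs/job seeking, groups/content consuming), the logistic form, and all weights are assumptions for illustration; the CIKM’15 model may differ.

```python
import math
from dataclasses import dataclass

# Assumed pairing of each vertical with the searcher intent it serves.
VERTICAL_INTENT = {"people": "recruiting", "jobs": "job_seeking", "groups": "content_consuming"}


@dataclass
class VerticalCandidate:
    vertical: str
    base_relevance: float  # score from the vertical's base ranker
    query_intent: float    # P(vertical | query)


def federated_score(c, searcher_intents, weights):
    """Logistic combination of base relevance, query intent and the vertical's matching searcher intent."""
    intent = searcher_intents.get(VERTICAL_INTENT[c.vertical], 0.0)
    z = (weights["bias"] + weights["relevance"] * c.base_relevance
         + weights["query_intent"] * c.query_intent + weights["searcher_intent"] * intent)
    return 1.0 / (1.0 + math.exp(-z))


def rank_verticals(candidates, searcher_intents, weights):
    return sorted(candidates, key=lambda c: federated_score(c, searcher_intents, weights), reverse=True)


weights = {"bias": -1.0, "relevance": 2.0, "query_intent": 1.5, "searcher_intent": 2.0}
candidates = [VerticalCandidate("people", 0.6, 0.4), VerticalCandidate("jobs", 0.6, 0.4),
              VerticalCandidate("groups", 0.5, 0.2)]
recruiter = {"recruiting": 0.9, "job_seeking": 0.1, "content_consuming": 0.2}
job_seeker = {"recruiting": 0.05, "job_seeking": 0.9, "content_consuming": 0.1}
print([c.vertical for c in rank_verticals(candidates, recruiter, weights)])
print([c.vertical for c in rank_verticals(candidates, job_seeker, weights)])
```

With identical base relevance and query intent for people and jobs, only the searcher's intent separates them, which is the personalization the federated model is meant to add.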
Take-Aways
▪Text match is still important but not enough
▪Advanced features based on semi-structured data
–People search: skill reputation scores
–Job search: expertise homophily
▪Personalized learning to rank is crucial
References
▪“Personalized Expertise Search at LinkedIn”, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
▪“Personalized Federated Search at LinkedIn”, Arya, Ha-Thuc and Sinha, CIKM, 2015
▪“Learning to Rank Personalized Search Results in Professional Networks”, Ha-Thuc and Sinha, SIGIR, 2016
▪“How to Get Them a Dream Job?”, Li, Arya, Ha-Thuc and Sinha, KDD, 2016