Search as Communication: Lessons from a Personal Journey


DESCRIPTION

Search as Communication: Lessons from a Personal Journey, by Daniel Tunkelang (Head of Query Understanding, LinkedIn). Presented at Etsy's Code as Craft series on May 21, 2013.

When I tell people I spent a decade studying computer science at MIT and CMU, most assume that I focused my studies on information retrieval; after all, I've spent most of my professional life working on search. But that's not how it happened. I learned about information extraction as a summer intern at IBM Research, where I worked on visual query reformulation. I learned how search engines work by building one at Endeca. It was only after I'd hacked my way through the problem for a few years that I started to catch up on the rich scholarly literature of the past few decades. As a result, I developed a point of view about search without the benefit of academic conventional wisdom. Specifically, I came to see search not so much as a ranking problem as a communication problem. In this talk, I'll explain my communication-centric view of search, offering examples, general techniques, and open problems.

Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has spent his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Prior to LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.

TRANSCRIPT

Search as Communication: Lessons from a Personal Journey

Daniel Tunkelang, Head of Query Understanding, LinkedIn

These are great textbooks on information retrieval.

Unfortunately, I never read them in school.

But I did study graphs and stuff.

I found myself developing a search engine.

And the next thing I knew, I was a search guy.

So what did I learn along the way?

Search isn't a ranking problem. It's a communication problem.

Outline

1. Lessons from Library Science
2. Adventures with Information Extraction
3. A Moment of Clarity

1. Lessons from Library Science

[Diagram: the USER turns an information need into a query and selects from results; the SYSTEM ranks using an IR model (tf-idf, PageRank).]

A bird's-eye view of how search engines work.

Old school search: ask a librarian.

Search lives in an information-seeking context.

[Pirolli and Card, 2005]


Recognize ambiguity and ask for clarification.

Clarify, then refine.

[Screenshot: facet categories such as Computers and Books.]

Faceted search. It's not just for e-commerce.

Give users transparency, guidance, and control.
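A minimal sketch of the facet counting behind that guidance (the documents and fields are invented for illustration):

from collections import Counter

def facet_counts(results, field):
    """Count how many matching documents fall under each value of a
    facet field, so the UI can offer refinement options with counts."""
    return Counter(doc[field] for doc in results if field in doc)

# Invented result set for the query "python".
results = [
    {"title": "Learning Python",  "category": "Books",     "year": 2013},
    {"title": "Python IDE",       "category": "Computers", "year": 2012},
    {"title": "Python Cookbook",  "category": "Books",     "year": 2011},
]
print(facet_counts(results, "category"))   # Counter({'Books': 2, 'Computers': 1})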

Take-away for search engine developers:

Act like a librarian. Communicate with your user.

2. Adventures with Information Extraction

String matching is great but has limits.


Query segmentation via beam search, reconstructed here as runnable Python from the slide's pseudocode (Pc(s) is the probability that the string s is a concept):

def segment(words, Pc, k=10):
    """Beam search over query segmentations. B[i] holds the k most
    probable segmentations of words[:i] as (segments, prob) pairs."""
    n = len(words)
    B = [[] for _ in range(n + 1)]
    for i in range(1, n + 1):
        s = " ".join(words[:i])
        if Pc(s) > 0:
            B[i].append(({s}, Pc(s)))            # whole prefix as one segment
        for j in range(1, i):
            for segs, prob in B[j]:
                s = " ".join(words[j:i])         # extend a segmentation of the
                if Pc(s) > 0:                    # first j words with words j+1..i
                    B[i].append((segs | {s}, prob * Pc(s)))
        B[i].sort(key=lambda entry: entry[1], reverse=True)
        del B[i][k:]                             # keep only the k best (the beam)
    return B[n]
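A hypothetical run, with a toy concept-probability table invented for illustration:

table = {"new york": 0.5, "times square": 0.4,
         "new": 0.1, "york": 0.1, "times": 0.1, "square": 0.1}
Pc = lambda s: table.get(s, 0.0)

for segs, prob in segment("new york times square".split(), Pc, k=3):
    print(segs, prob)
# Best segmentation: {'new york', 'times square'} with probability 0.2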

People search for entities. Recognize them!

Named entity recognition is free, as in free beer.

Problem: they process each document separately.

[Diagram: Entity Detection System.]

Why not take advantage of corpus features?

Give your documents the right to vote!

Use a high-recall method to collect candidates.
• e.g., all title-case spans of words, other than a single word beginning a sentence.

Process each document separately.
• Each candidate is assigned an entity type, or no type at all.

If a candidate is mostly assigned a single entity type, extrapolate to all its occurrences (sketched below).
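A minimal sketch of that corpus-level vote, assuming per-candidate label lists collected from any off-the-shelf per-document recognizer (the data structure and threshold are my own illustration):

from collections import Counter

def vote(per_doc_labels, threshold=0.7):
    """If a candidate is mostly assigned one entity type across the
    corpus, extrapolate that type to all of its occurrences.
    per_doc_labels maps candidate -> one label (or None) per mention."""
    resolved = {}
    for candidate, labels in per_doc_labels.items():
        votes = Counter(label for label in labels if label is not None)
        if not votes:
            continue
        best, count = votes.most_common(1)[0]
        if count / len(labels) >= threshold:
            resolved[candidate] = best
    return resolved

# Invented example: three of four mentions agree on PERSON.
per_doc_labels = {
    "Daniel Tunkelang": ["PERSON", "PERSON", "ORG", "PERSON"],
    "Code as Craft":    ["ORG", None, "ORG"],    # 2/3 < 0.7: left unresolved
}
print(vote(per_doc_labels))   # {'Daniel Tunkelang': 'PERSON'}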

Looking for topics? Use idf, and its cousin ridf.

Inverse document frequency (idf)
• Too low? Probably a stop word.
• Too high? Could be noise.

Residual inverse document frequency (ridf)
• Predict idf using a Poisson model.
• ridf is the difference between observed idf and predicted idf.

“a good keyword is far from Poisson” [Church and Gale, 1995]
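As a sketch, here is that recipe in Python (the formula follows Church and Gale's residual idf; the counts below are invented for illustration):

import math

def ridf(df, cf, N):
    """Residual idf: observed idf minus the idf a Poisson model
    predicts from the collection frequency.
    df: documents containing the term; cf: total occurrences of the
    term in the collection; N: total number of documents."""
    idf = -math.log2(df / N)
    lam = cf / N                                   # Poisson rate per document
    predicted_idf = -math.log2(1 - math.exp(-lam))
    return idf - predicted_idf

# A bursty topical term vs. an evenly spread term, same total count.
print(ridf(df=100, cf=1000, N=100_000))   # concentrated: ridf ≈ 3.3
print(ridf(df=900, cf=1000, N=100_000))   # near-Poisson: ridf ≈ 0.15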

Terminology extraction? Try data recycling.

Obtain entities by any means necessary.

Take-away for search engine developers:

Entity detection is crucial. And it isn't that hard.

3. A Moment of Clarity

[Diagram repeated from Section 1: the USER turns an information need into a query and selects from results; the SYSTEM ranks using an IR model (tf-idf, PageRank).]

Let's go back to our pigeons for a moment.

What does this process look like to the system?


And here's what it looks like to the user.

[Screenshots: a GOOD results page vs. a NOT SO GOOD one.]

But can the system tell the difference?

User experience should reflect system confidence.


Derived from [Jansen et al., 2007]. http://searchengineland.com/getting-organized-paid-search-user-intent-the-search-funnel-116312

Searches reflect a variety of information needs.


(Query segmentation pseudocode repeated; see the Python reconstruction in Section 2.)

We can segment information need from the query.

We can learn from analyzing user behavior.

And we can look at our relevance scores.

[Chart: Navigational vs. Exploratory queries, from Claudia Hauff, Query Difficulty for Digital Libraries (2009).]

There are many pre- and post-retrieval signals.
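For instance, a toy post-retrieval signal built from the relevance scores mentioned above (the measure and the example numbers are my own illustration, not from the talk):

import statistics

def score_confidence(scores, k=10):
    """A toy difficulty signal: the normalized spread of the top-k
    relevance scores. A dominant top result suggests an easy query;
    a flat score curve suggests a hard one."""
    top = sorted(scores, reverse=True)[:k]
    if len(top) < 2 or top[0] <= 0:
        return 0.0
    return statistics.pstdev(top) / top[0]

easy = [9.1, 4.2, 4.0, 3.8, 3.7]   # one result dominates
hard = [4.1, 4.0, 4.0, 3.9, 3.9]   # everything looks the same
print(score_confidence(easy))      # ≈ 0.23
print(score_confidence(hard))      # ≈ 0.02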

Take-away for search engine developers:

Queries vary in difficulty. Recognize and adapt.

Review  

1. Lessons from Library Science
• Act like a librarian. Communicate with users.

2. Adventures with Information Extraction
• Entity detection is crucial. And it isn't that hard.

3. A Moment of Clarity
• Queries vary in difficulty. Recognize and adapt.

Conclusion: Read the textbooks.

But treat search as a communication problem.

WE'RE HIRING! http://data.linkedin.com/search

   

Contact me: dtunkelang@linkedin.com

http://linkedin.com/in/dtunkelang
