Search at Twitter

Search @twitter
Michael Busch (@michibusch)

Upload: lucenerevolution

Post on 27-Jan-2015


Page 2: Search at Twitter

Agenda

‣ Introduction

- Search Architecture

- Inverted Index 101

- Realtime Posting Lists


Introduction

• Twitter has more than 230 million monthly active users.
• 500 million tweets are sent per day.
• More than 300 billion tweets have been sent since the company was founded in 2006.
• Tweets-per-second world record: 33,388 TPS.
• More than 2 billion search queries per day.

Timeline (2008 to 2014):

• Twitter acquires Summize (MySQL-based RT search engine)
• Modified Lucene (Earlybird) ships and replaces MySQL indexes
• New Earlybird features: image/video search; index compression; efficient relevance search in time-sorted index
• Tweet archive search on SSD with vanilla Lucene
• New RT posting list format that supports arbitrary document lengths, but keeps performance optimizations for tweets



Search Architecture

[Diagram: search architecture. Raw tweets from the RT stream go to the Analyzer/Partitioner, which writes analyzed tweets to the RT index (Earlybird). Raw tweets are also kept in the tweet archive on HDFS, where the Mapreduce Analyzer produces analyzed tweets for the Archive index. Search requests go to the Blender, which searches the RT index and the Archive index.]

Analyzer/Partitioner

• Pre-processes Tweets for indexing
• Analysis (tokenization/normalization) of text
• Geo-coding, URL expansion, etc.
• Hash partitioning


RT index (Earlybird)

• Modified Lucene index implementation optimized for realtime search
• IndexWriter buffer is searchable (no need to flush to allow searching)
• In-memory
• Hash-partitioned, static layout

Cluster layout

[Diagram: a grid of Earlybird servers. The index is split into n hash partitions (docId % n), each partition is served by several Earlybird replicas, and the data is further divided into timeslices. One timeslice is writable; the others are complete.]
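The partitioning rule on the cluster layout slide (docId % n) can be sketched in a few lines of Java. This is only an illustration: the class name `PartitionRouter` and its method are hypothetical and not taken from Earlybird, and replica and timeslice selection are left out.

```java
// Hypothetical sketch: route a document to one of n hash partitions (docId % n).
final class PartitionRouter {
    private final int numPartitions;

    PartitionRouter(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("need at least one partition");
        }
        this.numPartitions = numPartitions;
    }

    /** Partition that owns the document; floorMod keeps the result non-negative. */
    int partitionFor(long docId) {
        return (int) Math.floorMod(docId, (long) numPartitions);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        for (long docId : new long[] {5, 15, 9000, 9002}) {
            System.out.println(docId + " -> partition " + router.partitionFor(docId));
        }
    }
}
```

Because the layout is static, every writer and every searcher can compute the owning partition locally without a lookup service.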


Mapreduce Analyzer

• Daily jobs that process raw tweets
• Analyzes text
• Aggregates metadata and signals


Archive index

• Standard Lucene (4.4) indexes
• Reverse time-sorted (new to old)
• Cluster layout similar to realtime search cluster
• Two tiers: in-memory and on SSD
• In-memory index: contains a small number of the best tweets of all time
• SSD index: much bigger index with more tweets; lower max. QPS, limited by SSD IOPS. Only needs to be queried if the in-memory index did not yield enough results
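The two-tier lookup described above (ask the in-memory tier first, touch the SSD tier only when too few results came back) could look roughly like this. The `Tier` interface, the use of tweet IDs as results, and the duplicate check are assumptions made for the sketch, not details from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a two-tier archive lookup.
final class TwoTierSearch {
    /** A tier returns at most maxResults tweet IDs for a query (assumed interface). */
    interface Tier {
        List<Long> search(String query, int maxResults);
    }

    static List<Long> search(String query, int wanted, Tier inMemory, Tier ssd) {
        List<Long> results = new ArrayList<>(inMemory.search(query, wanted));
        if (results.size() < wanted) {
            // Only now spend SSD IOPS, and only to fill the remaining slots.
            for (Long hit : ssd.search(query, wanted)) {
                if (results.size() >= wanted) {
                    break;
                }
                if (!results.contains(hit)) { // assumes tiers may overlap
                    results.add(hit);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Tier memory = (q, n) -> Arrays.asList(1L, 2L);
        Tier ssd = (q, n) -> Arrays.asList(2L, 3L, 4L);
        System.out.println(search("keeper", 3, memory, ssd)); // [1, 2, 3]
    }
}
```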


Blender

• Blender is our Thrift service aggregator
• Queries multiple Earlybirds, merges results
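The "merges results" step can be illustrated with a k-way merge. The sketch assumes each partition returns tweet IDs sorted newest first (descending ID) and that the merged list should also be newest first; the real Blender is a Thrift service with relevance logic that this example does not attempt to model, and `ResultMerger` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merge per-partition hit lists into the k newest hits.
final class ResultMerger {
    static List<Long> mergeTopK(List<List<Long>> partitionHits, int k) {
        // Max-heap of {tweetId, partitionIndex, positionInThatPartition}.
        PriorityQueue<long[]> heap =
            new PriorityQueue<>((a, b) -> Long.compare(b[0], a[0]));
        for (int p = 0; p < partitionHits.size(); p++) {
            if (!partitionHits.get(p).isEmpty()) {
                heap.add(new long[] {partitionHits.get(p).get(0), p, 0});
            }
        }
        List<Long> merged = new ArrayList<>();
        while (merged.size() < k && !heap.isEmpty()) {
            long[] top = heap.poll();
            merged.add(top[0]);
            // Advance within the partition the winning hit came from.
            List<Long> source = partitionHits.get((int) top[1]);
            int next = (int) top[2] + 1;
            if (next < source.size()) {
                heap.add(new long[] {source.get(next), top[1], next});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Long>> hits = Arrays.asList(
            Arrays.asList(90L, 50L, 10L),
            Arrays.asList(80L, 70L),
            Collections.<Long>emptyList());
        System.out.println(mergeTopK(hits, 4)); // [90, 80, 70, 50]
    }
}
```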


[Diagram: the search architecture extended with an update path. Labels: Tweets, Analyzer/Partitioner, queue, RT index (Earlybird), Archive index, HDFS, Mapreduce Analyzer, Blender, search requests, searches, writes. Updates: deletes/engagement (e.g. retweets/favs).]


Inverted Index 101

Table with 6 documents:

1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.

Example from: Justin Zobel, Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR) v.38 n.2, p.6-es, 2006

Dictionary and posting lists:

term    freq  postings
and     1     <6>
big     2     <2> <3>
dark    1     <6>
did     1     <4>
gown    1     <2>
had     1     <3>
house   2     <2> <3>
in      5     <1> <2> <3> <5> <6>
keep    3     <1> <3> <5>
keeper  3     <1> <4> <5>
keeps   3     <1> <5> <6>
light   1     <6>
never   1     <4>
night   3     <1> <4> <5>
old     4     <1> <2> <3> <4>
sleep   1     <4>
sleeps  1     <6>
the     6     <1> <2> <3> <4> <5> <6>
town    2     <1> <3>
where   1     <4>
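A minimal sketch of how a dictionary with posting lists like the one on this slide can be built. The analysis step (lowercasing and splitting on non-letters) is a simplifying assumption, and `TinyInvertedIndex` is a hypothetical class, not Lucene or Earlybird code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a term -> posting list dictionary.
final class TinyInvertedIndex {
    // term -> ascending list of IDs of the documents that contain the term
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Documents must be added in ascending docId order. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document only once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
    }

    List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }

    static TinyInvertedIndex example() {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "The old night keeper keeps the keep in the town");
        index.add(2, "In the big old house in the big old gown.");
        index.add(3, "The house in the town had the big old keep");
        index.add(4, "Where the old night keeper never did sleep.");
        index.add(5, "The night keeper keeps the keep in the night");
        index.add(6, "And keeps in the dark and sleeps in the light.");
        return index;
    }

    public static void main(String[] args) {
        System.out.println(example().lookup("keeper")); // [1, 4, 5]
    }
}
```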

Query: keeper

The query term is looked up in the dictionary and its posting list is read: keeper 3 <1> <4> <5>

Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090

Delta encoding: 5 10 8985 2 90998 90
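The delta values on the slide are the differences between consecutive doc IDs. A small sketch (the class `DeltaCodec` is hypothetical) shows both directions; note that decoding has to start at the first value and accumulate.

```java
import java.util.Arrays;

// Hypothetical sketch of delta encoding for a sorted doc ID list.
final class DeltaCodec {
    /** Replaces each doc ID by its distance to the previous one (the first keeps its value). */
    static int[] encode(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int previous = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            deltas[i] = sortedDocIds[i] - previous;
            previous = sortedDocIds[i];
        }
        return deltas;
    }

    /** Rebuilds the doc IDs with a running sum; only works front to back. */
    static int[] decode(int[] deltas) {
        int[] docIds = new int[deltas.length];
        int current = 0;
        for (int i = 0; i < deltas.length; i++) {
            current += deltas[i];
            docIds[i] = current;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {5, 15, 9000, 9002, 100000, 100090};
        System.out.println(Arrays.toString(encode(docIds))); // [5, 10, 8985, 2, 90998, 90]
    }
}
```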

VInt compression:

• Values 0 <= delta <= 127 need one byte, e.g. 5 is encoded as 00000101
• Values 128 <= delta <= 16383 need two bytes, e.g. 8985 is encoded as 11000110 00011001
• First bit indicates whether next byte belongs to the same value
• Variable number of bytes - a VInt-encoded posting cannot be written as a primitive Java type; therefore it cannot be written atomically
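The byte patterns on the slides can be reproduced with a short sketch. It follows the byte order drawn on the slide (most significant 7-bit group first, continuation bit set on every byte except the last). Lucene's own `DataOutput.writeVInt` emits the low-order group first, so this is an illustration of the idea rather than Lucene's on-disk format; `SlideVInt` is a hypothetical name.

```java
// Hypothetical sketch of a variable-length int, laid out as on the slide.
final class SlideVInt {
    /** Encodes a non-negative int with 7 payload bits per byte. */
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value");
        }
        int groups = 1; // number of 7-bit groups needed, at least one
        for (int v = value >>> 7; v != 0; v >>>= 7) {
            groups++;
        }
        byte[] out = new byte[groups];
        for (int i = groups - 1; i >= 0; i--) {
            int bits = value & 0x7F;
            // Every byte except the last one carries the continuation bit.
            out[i] = (byte) (i == groups - 1 ? bits : bits | 0x80);
            value >>>= 7;
        }
        return out;
    }

    /** Decodes one value; stops at the first byte without continuation bit. */
    static int decode(byte[] bytes) {
        int value = 0;
        for (byte b : bytes) {
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return value;
    }

    /** Renders bytes as space-separated 8-bit strings. */
    static String toBits(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String s = Integer.toBinaryString(b & 0xFF);
            sb.append("00000000".substring(s.length())).append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBits(encode(5)));    // 00000101
        System.out.println(toBits(encode(8985))); // 11000110 00011001
    }
}
```

The encoded length depends on the value, which is why a posting in this format cannot be stored with a single primitive write.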

Read direction of a delta-encoded list:

• Each posting depends on the previous one; decoding is only possible in old-to-new direction
• With recency ranking (new-to-old) no early termination is possible

• By default Lucene uses a combination of delta encoding and VInt compression
• VInts are expensive to decode
• Problem 1: How to traverse posting lists backwards?
• Problem 2: How to write a posting atomically?

Realtime Posting Lists

Posting list encoding in Earlybird v1

int (32 bits):

• docID: 24 bits, max. 16.7M
• textPosition: 8 bits, max. 255
• Tweet text can only have 140 chars
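Packing a posting into one int can be sketched as below. The slide gives the field widths only; placing the docID in the upper 24 bits and the text position in the lower 8 bits is an assumption of this example, and `PostingV1` is a hypothetical name.

```java
// Hypothetical sketch of the 24 + 8 bit posting layout.
final class PostingV1 {
    static final int MAX_DOC_ID = (1 << 24) - 1; // 16,777,215
    static final int MAX_POSITION = 255;

    /** Packs docID (upper 24 bits, assumed) and text position (lower 8 bits). */
    static int pack(int docId, int position) {
        if (docId < 0 || docId > MAX_DOC_ID) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        if (position < 0 || position > MAX_POSITION) {
            throw new IllegalArgumentException("position out of range: " + position);
        }
        return (docId << 8) | position;
    }

    static int docId(int posting) {
        return posting >>> 8; // unsigned shift: large docIDs set the sign bit
    }

    static int position(int posting) {
        return posting & 0xFF;
    }

    public static void main(String[] args) {
        int posting = pack(100090, 139);
        System.out.println(docId(posting) + " @ " + position(posting)); // 100090 @ 139
    }
}
```

Since the whole posting is a single `int`, it can be stored with one array write.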

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090

Earlybird encoding: 5 15 9000 9002 100000 100090 (absolute docIDs, read from newest to oldest)

Early query termination

E.g. if 3 results are requested, we can terminate after reading 3 postings
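With absolute docIDs, a searcher can start at the newest posting and stop as soon as it has enough hits. A minimal sketch, assuming the postings sit in a plain array in append order (the real storage uses slices, see the memory management slides), with `ReverseScan` as a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of backwards traversal with early termination.
final class ReverseScan {
    /**
     * Reads postings[count-1], postings[count-2], ... and stops once
     * maxResults docIDs were collected.
     */
    static List<Integer> newestFirst(int[] postings, int count, int maxResults) {
        List<Integer> hits = new ArrayList<>();
        for (int i = count - 1; i >= 0 && hits.size() < maxResults; i--) {
            hits.add(postings[i]);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] postings = {5, 15, 9000, 9002, 100000, 100090};
        System.out.println(newestFirst(postings, postings.length, 3)); // [100090, 100000, 9002]
    }
}
```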

Inverted index components

• Dictionary (parallel arrays): holds a pointer to the most recently indexed posting for a term
• Posting list storage: ?

Posting lists storage - Objectives

• Store many single-linked lists of different lengths space-efficiently
• The number of Java objects should be independent of the number of lists or number of items in the lists
• Every item should be a possible entry point into the lists for iterators, i.e. items should not be dependent on other items (e.g. no delta encoding)
• Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads)
• Traversal in backwards order

Memory management

[Diagram: 4 int[] pools, each made of 32K int[] blocks]

• Each pool can be grown individually by adding 32K blocks
• For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays
• Small total number of Java objects (each 32K block is one object)
• Slices can be allocated in each pool
• Each pool has a different, but fixed slice size: 2^1, 2^4, 2^7, 2^11

Adding and appending to a list

[Diagram: four pools with slice sizes 2^1, 2^4, 2^7, 2^11; legend: available, allocated, current list]

• Store the first two postings in a slice of the first pool
• When the first slice is full, allocate another one in the second pool
• Allocate a slice on each level as the list grows
• On the uppermost level one list can own multiple slices
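The growth pattern above can be simulated with a few lines. The sketch counts every slot of a slice as room for a posting, which is a simplification: the slides that follow show that slices are linked, and the space used for links is ignored here. `SliceGrowth` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which pools a posting list draws slices from as it grows.
final class SliceGrowth {
    static final int[] SLICE_SIZES = {1 << 1, 1 << 4, 1 << 7, 1 << 11};

    /** Pool index of every slice the list owns after numPostings appends, in allocation order. */
    static List<Integer> slicesFor(int numPostings) {
        List<Integer> pools = new ArrayList<>();
        int capacity = 0;
        int level = 0;
        while (capacity < numPostings) {
            pools.add(level);
            capacity += SLICE_SIZES[level];
            // Move up one level per allocation; the top level is reused.
            if (level < SLICE_SIZES.length - 1) {
                level++;
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        System.out.println(slicesFor(2));    // [0]
        System.out.println(slicesFor(3));    // [0, 1]
        System.out.println(slicesFor(2195)); // [0, 1, 2, 3, 3]
    }
}
```

Short lists (the common case for rare terms) stay in the small pools, while frequent terms quickly reach the 2^11 slices, which keeps wasted space low at both ends.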

Posting list format v1

int (32 bits):

• docID: 24 bits, max. 16.7M
• textPosition: 8 bits, max. 255
• Tweet text can only have 140 chars

Addressing items

• Use 32 bit (int) pointers to address any item in any list unambiguously:

int (32 bits):

• poolIndex: 2 bits, 0-3
• sliceIndex: 19-29 bits, depends on pool
• offset in slice: 1-11 bits, depends on pool

• Nice symmetry: Postings and address pointers both fit into a 32 bit int
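A sketch of such a pointer, using the field widths from the slide: 2 bits for the pool, and per pool 1, 4, 7 or 11 offset bits (matching slice sizes 2^1 to 2^11), with the remaining 29 to 19 bits for the slice index. The order of the fields inside the int is an assumption, and `SlicePointer` is a hypothetical name.

```java
// Hypothetical sketch of a 32-bit pointer into the slice pools.
final class SlicePointer {
    // Offset width per pool = log2 of that pool's slice size.
    static final int[] OFFSET_BITS = {1, 4, 7, 11};

    /** Layout (assumed): [pool: 2 bits][sliceIndex: 30 - offsetBits][offset: offsetBits]. */
    static int encode(int pool, int sliceIndex, int offset) {
        if (pool < 0 || pool > 3) {
            throw new IllegalArgumentException("pool out of range: " + pool);
        }
        int offsetBits = OFFSET_BITS[pool];
        int sliceBits = 30 - offsetBits;
        if (offset < 0 || offset >= (1 << offsetBits)) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        if (sliceIndex < 0 || sliceIndex >= (1 << sliceBits)) {
            throw new IllegalArgumentException("sliceIndex out of range: " + sliceIndex);
        }
        return (pool << 30) | (sliceIndex << offsetBits) | offset;
    }

    static int pool(int pointer) {
        return pointer >>> 30;
    }

    static int sliceIndex(int pointer) {
        return (pointer & 0x3FFFFFFF) >>> OFFSET_BITS[pool(pointer)];
    }

    static int offset(int pointer) {
        return pointer & ((1 << OFFSET_BITS[pool(pointer)]) - 1);
    }

    public static void main(String[] args) {
        int pointer = encode(3, 12345, 2047);
        System.out.println(pool(pointer) + " / " + sliceIndex(pointer) + " / " + offset(pointer));
    }
}
```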

Linking the slices

[Diagram: the slices of the current list in the four pools (slice sizes 2^1, 2^4, 2^7, 2^11) are linked together; legend: available, allocated, current list]

• Dictionary (parallel arrays): pointer to the last posting indexed for a term


Posting list encoding - Summary

• ints can be written atomically in Java

• Backwards traversal easy on absolute docIDs (not deltas)

• Every posting is a possible entry point for a searcher

• Skipping can be done without additional data structures as binary search, though there are better approaches (skip lists)

• Repeating docIDs if a term occurs multiple times in the same document only works for small docs

• Max. segment size: 2^24 = 16.7M tweets


New posting list encoding

• Objectives:

• 32 bit positions and variable-length payloads

• Store term frequency (TF) instead of repeating docIDs

• Keep:

• Concurrency model

• Space-efficiency for short documents

• Performance

[Diagram: each posting has a fixed-length part (DocID, termFreq; 32 bits) and a variable-length part (Position, Payload). The two parts are stored in separate streams:

DocID, termFreq   | DocID, termFreq             | DocID, termFreq   | ...
Position, Payload | Position, Payload, Position | Position, Payload | ...]

• Store TF instead of repeating the same DocID
• Store DocID/TF pairs separately from position/payloads
• Find a way to synchronously decode the two streams without storing a pointer for each posting (expensive)

• Idea: Use an embedded skip list as periodical “synchronization points”
• Keeps memory overhead for pointers low and improves search performance

[Diagram: slices of the current list in the four pools (slice sizes 2^1, 2^4, 2^7, 2^11); legend: available, allocated, current list]

Slice header contains:

• Back-pointer to previous slice (as before)
• Skip list
• Slice id

Previous layout, int (32 bits):

• docID: 24 bits, max. 16.7M
• textPosition: 8 bits, max. 255

• Observation: Most tweets don’t need all 8 bits for text position
• Idea: Use the position “inlining” approach for short documents, but support Lucene’s 32-bit positions and variable length payloads

New layout, int (32 bits):

• docID: 24 bits, max. 16.7M
• flag: 1 bit, 0 = textPosition, 1 = termFreq
• textPosition or termFreq: 7 bits, max. 127

As a storage optimization, the text position is stored with the docID if:

o termFreq == 1 (term occurs once only in the doc) AND
o textPosition <= 127 AND
o Posting has no payload AND
o Posting is not at a skip point of the docID posting list (see later).
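The inlining rule can be sketched as follows. The bit order inside the int (docID in the upper 24 bits, then the flag bit, then the 7-bit value) is an assumption, as is rejecting term frequencies above 127; the slides do not say how larger values are handled. `PostingV2` is a hypothetical name.

```java
// Hypothetical sketch of the 24 + 1 + 7 bit posting layout.
final class PostingV2 {
    static final int TERM_FREQ_FLAG = 1 << 7; // 0 = textPosition, 1 = termFreq
    static final int MAX_DOC_ID = (1 << 24) - 1;
    static final int MAX_VALUE = 127;

    /** The four conditions from the slide for storing the position inline. */
    static boolean canInline(int termFreq, int position, boolean hasPayload, boolean atSkipPoint) {
        return termFreq == 1 && position <= MAX_VALUE && !hasPayload && !atSkipPoint;
    }

    static int packInlinePosition(int docId, int position) {
        check(docId, position);
        return (docId << 8) | position; // flag bit stays 0
    }

    static int packTermFreq(int docId, int termFreq) {
        check(docId, termFreq);
        return (docId << 8) | TERM_FREQ_FLAG | termFreq;
    }

    static int docId(int posting) {
        return posting >>> 8;
    }

    static boolean holdsTermFreq(int posting) {
        return (posting & TERM_FREQ_FLAG) != 0;
    }

    /** The 7-bit value: a text position or a term frequency, depending on the flag. */
    static int value(int posting) {
        return posting & 0x7F;
    }

    private static void check(int docId, int value) {
        if (docId < 0 || docId > MAX_DOC_ID || value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("field out of range");
        }
    }

    public static void main(String[] args) {
        int inline = packInlinePosition(9000, 42);
        int withTf = packTermFreq(9000, 3);
        System.out.println(holdsTermFreq(inline) + " " + value(inline)); // false 42
        System.out.println(holdsTermFreq(withTf) + " " + value(withTf)); // true 3
    }
}
```

When `canInline` is false, the position (and any payload) would go to the separate position/payload stream, which this sketch does not model.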


New posting list encoding - Summary

• Support for 32 bit positions and arbitrary length payloads stored in separate data structure

• Performance and space consumption very similar compared to previous encoding for tweet search

• Skip lists used for speed and synchronization points

• For short documents positions can still be inlined
