

Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search

1 Department of Computer Science and Engineering, POSTECH

2 Microsoft Research Redmond

Saehoon Kim 1, Yuxiong He 2, Seung-won Hwang 1, Sameh Elnikety 2, Seungjin Choi 1

Motivation

• Reduce tail latency (high-percentile latency) of user queries, e.g., 99th percentile

• Reduce extreme tail latency at each index server, e.g., 99.99th percentile

Contribution

• Delayed-Dynamic-Selective (DDS) prediction: identify long(-running) queries with high accuracy

• DDS Parallelization: use DDS to parallelize index servers for reducing extreme tail latency

Reducing Tail Latency by Parallelization

Challenges

1. Parallelizing all queries (inefficient)

2. Parallelizing short queries (no speed up)

Latency breakdown

Component   Time
Network     4.26 ms
Queueing    0.15 ms
I/O         4.70 ms
CPU         194.95 ms
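The breakdown suggests that CPU time dominates the response time, which is what makes parallelization attractive. A quick check of the shares, assuming the four listed components add up to the whole response time (the variable names are only illustrative):

```python
# Latency components from the breakdown, in milliseconds.
breakdown_ms = {"Network": 4.26, "Queueing": 0.15, "I/O": 4.70, "CPU": 194.95}

total_ms = sum(breakdown_ms.values())
shares = {name: ms / total_ms for name, ms in breakdown_ms.items()}

# Print each component's share of the total response time.
for name, share in shares.items():
    print(f"{name:9s} {share:6.1%}")
```

Under that assumption the CPU component accounts for roughly 95% of the total.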

Opportunity

1. Available idle cores

2. CPU-intensive workloads

DDS (Delayed-Dynamic-Selective) Prediction

PREDictive Parallelization [SIGIR’14]

Parallelize the predicted long queries only

[Figure: PRED pipeline. Query -> Feature extraction -> Regression function (prediction model) -> Long / Short]

[Figure: DDS pipeline. Query -> Delayed prediction (queries < 10ms: finished; queries > 10ms: continue) -> Dynamic prediction (predictor for execution time: Long / Short) -> Selective prediction (predictor for confidence level: Not confident)]

Dynamic features

Collected at query runtime

1. NumEstMatchDoc := $\dfrac{\#\,\text{current matched docs}}{\#\,\text{processed docs}}$

2. Statistics of the dynamic score distribution
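The two kinds of dynamic features listed above could be computed from a partially executed query along these lines. This is a minimal sketch, not the authors' implementation: the function name and its inputs are hypothetical, and the mean and variance are illustrative choices of "statistics" (only the minimum and maximum dynamic scores are named elsewhere on the poster).

```python
from statistics import mean, pvariance


def dynamic_features(processed_docs, matched_scores):
    """Return dynamic features observed so far for one query.

    processed_docs: number of documents examined so far.
    matched_scores: dynamic scores of the documents matched so far.
    """
    if processed_docs <= 0:
        raise ValueError("no documents processed yet")
    # Ratio of matched to processed documents at the current point of execution.
    features = {"NumEstMatchDoc": len(matched_scores) / processed_docs}
    if matched_scores:
        features.update(
            MinDynScore=min(matched_scores),
            MaxDynScore=max(matched_scores),
            MeanDynScore=mean(matched_scores),       # assumed statistic
            VarDynScore=pvariance(matched_scores),   # assumed statistic
        )
    return features


# Hypothetical query state: 200 documents processed, 3 matched so far.
example = dynamic_features(200, [0.5, 1.5, 1.0])
```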

Selective prediction

[Figure: axes are predicted $L_1$ error and predicted execution time]

• Parallelize the unpredictable queries

• Parallelize a query if predicted execution time > $\alpha$ or predicted $L_1$ error > $\beta$
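The selective rule can be sketched as a small predicate. This reads the rule as a disjunction (a query is parallelized when it is predicted long or when its prediction is not trusted); the function and parameter names are hypothetical, and the threshold values in the usage lines are made up for illustration.

```python
def should_parallelize(pred_time_ms, pred_l1_error_ms, alpha_ms, beta_ms):
    """Decide whether to parallelize a query under the selective rule.

    pred_time_ms: output of the execution-time predictor.
    pred_l1_error_ms: output of the confidence predictor (predicted L1 error).
    alpha_ms, beta_ms: thresholds on the two predictions.
    """
    predicted_long = pred_time_ms > alpha_ms
    unpredictable = pred_l1_error_ms > beta_ms
    return predicted_long or unpredictable


# Hypothetical thresholds, chosen only for the example.
ALPHA_MS, BETA_MS = 80.0, 30.0
confident_short = should_parallelize(12.0, 4.0, ALPHA_MS, BETA_MS)    # False
confident_long = should_parallelize(150.0, 5.0, ALPHA_MS, BETA_MS)    # True
unsure_short = should_parallelize(12.0, 45.0, ALPHA_MS, BETA_MS)      # True
```

The third case is the point of selective prediction: a query predicted to be short is still parallelized when the predictor expects a large error on it.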

Why Extreme Tail Latency?

DDS Parallelization

[Figure: an aggregator and its ISNs; one ISN is serving a long query]

• At the aggregator: the 99th-percentile response time < 120ms

• At each ISN: the 99.99th-percentile response time < 120ms
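One way to see why a 99th-percentile target at the aggregator translates into a much stricter target at each ISN is the following back-of-the-envelope calculation. It rests on assumptions the poster does not state: the aggregator waits for every ISN, ISN response times are independent, and the ISN counts used below are hypothetical.

```python
def prob_all_within(per_isn_prob, num_isns):
    """Probability that all of num_isns independent ISNs respond in time."""
    return per_isn_prob ** num_isns


# Compare a 99th-percentile and a 99.99th-percentile guarantee per ISN.
for num_isns in (10, 50, 100):
    loose = prob_all_within(0.99, num_isns)
    strict = prob_all_within(0.9999, num_isns)
    print(f"{num_isns:4d} ISNs: per-ISN 99% -> {loose:.3f}, per-ISN 99.99% -> {strict:.4f}")
```

Under these assumptions, a per-ISN guarantee at only the 99th percentile leaves the aggregator far below 99% once many ISNs are involved, while a 99.99th-percentile guarantee keeps it above 99% for up to about 100 ISNs.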

Requirements

1. 99th tail latency at aggregator <= 120ms

2. Reduce 99.99th tail latency at each ISN <= 120ms

               Recall                              Precision
Requirements   >= 98.9%                            Should be high
Reason         To optimize 99.99th tail latency    Fewer queries to be parallelized

Limitation of PRED

θ        Recall   Precision
100ms    0.601    0.789
20ms     0.905    0.098
10ms     0.952    0.037
2.3ms    0.989    0.011

PRED cannot effectively reduce 99.99th tail latency
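The recall/precision trade-off in the PRED table can be reproduced in miniature. The sketch below assumes that θ is a threshold on the predicted execution time and that a query counts as long when its actual time exceeds some cutoff; both the cutoff and the data are hypothetical, since the poster defines neither.

```python
def recall_precision(predicted_ms, actual_ms, theta_ms, long_ms):
    """Recall and precision of flagging queries whose prediction exceeds theta_ms."""
    flagged = [p > theta_ms for p in predicted_ms]
    is_long = [a > long_ms for a in actual_ms]
    true_pos = sum(f and l for f, l in zip(flagged, is_long))
    n_flagged = sum(flagged)
    n_long = sum(is_long)
    recall = true_pos / n_long if n_long else 0.0
    precision = true_pos / n_flagged if n_flagged else 0.0
    return recall, precision


# Hypothetical predictions and ground truth for five queries (ms).
predicted = [5.0, 30.0, 150.0, 8.0, 90.0]
actual = [4.0, 12.0, 180.0, 120.0, 95.0]

high_theta = recall_precision(predicted, actual, theta_ms=100.0, long_ms=100.0)
low_theta = recall_precision(predicted, actual, theta_ms=5.0, long_ms=100.0)
```

Lowering the threshold catches the long query that was predicted short, but at the cost of flagging short queries too. In the table, the precision of 0.011 at 98.9% recall means that only about 1 in 91 parallelized queries is actually long.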

Delayed prediction

• Complete many short queries sequentially

• Collect dynamic features
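The delayed stage can be sketched as a simple filter in front of the predictor. This is an illustration under stated assumptions rather than the authors' code: the function name, the return format, and the sample execution times are hypothetical, and treating exactly 10ms as "finished" is an arbitrary boundary choice.

```python
DELAY_MS = 10.0  # sequential run time before any prediction is made


def delayed_stage(sequential_time_ms, delay_ms=DELAY_MS):
    """Run a query sequentially for up to delay_ms.

    Returns ("finished", elapsed) when the query completes within the delay,
    otherwise ("needs_prediction", delay_ms): the query has run for delay_ms
    and its dynamic features are now available to the predictor.
    """
    if sequential_time_ms <= delay_ms:
        return ("finished", sequential_time_ms)
    return ("needs_prediction", delay_ms)


# Hypothetical sequential execution times (ms) for five queries.
times_ms = [1.2, 3.5, 9.9, 42.0, 180.0]
outcomes = [delayed_stage(t) for t in times_ms]
to_predict = [t for t, (state, _) in zip(times_ms, outcomes) if state == "needs_prediction"]
```

Only the queries left in `to_predict` ever reach the predictor, so short queries are never candidates for parallelization.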

Importance of dynamic features

• Top-5 feature importance by boosted regression tree

• NumEstMatchDoc helps to predict the total number of matched documents

• DynScore helps to predict early termination

Feature          Importance
NumEstMatchDoc   1
MinDynScore      0.7075
MinIDF           0.2767
VarIDF           0.2730
MaxDynScore      0.2662

Predictor accuracy

• Baseline: PRED

• 957% precision improvement at 98.9% recall over PRED
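The poster does not give the absolute precision of DDS. As a rough illustration only, the figure can be combined with PRED's precision of 0.011 at 98.9% recall from the Limitation of PRED table, assuming "957% improvement" means a relative increase over that baseline:

```python
pred_precision = 0.011  # PRED precision at recall 0.989, from the PRED table
relative_gain = 9.57    # "957% improvement", read as a relative increase (assumption)

# Implied precision under that reading; not a number reported on the poster.
implied_precision = pred_precision * (1 + relative_gain)

# Parallelized queries per truly long query, before and after.
flagged_per_long_pred = 1 / pred_precision
flagged_per_long_implied = 1 / implied_precision

print(f"implied precision: {implied_precision:.3f}")
print(f"parallelized per long query: {flagged_per_long_pred:.0f} -> {flagged_per_long_implied:.1f}")
```

If the figure were instead meant as a 9.57x ratio, the implied precision would be slightly lower (about 0.105). Either way, the number of queries parallelized per long query drops by roughly an order of magnitude.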

Simulation results on tail latency reduction

Baseline S: prediction before running a query; parallelize the long query

Proposed DDS: run a query for 10ms sequentially; parallelize the predicted long or unpredictable queries

[Figures: response time at index server; response time at aggregator]