
Your Cache is Overdue a Revolution: MRCs for Cache Performance and Isolation
Irfan Ahmad, CTO, CloudPhysics


TRANSCRIPT

Page 1

Irfan Ahmad, CTO, CloudPhysics

Your Cache is Overdue a Revolution: MRCs for Cache Performance and Isolation

Page 2

Your Cache is Overdue a Revolution: MRCs for Cache Performance and Isolation Approved SNIA Tutorial © 2016 CloudPhysics, Inc., Storage Networking Industry Association. All Rights Reserved.

SNIA Legal Notice

The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:

Any slide or slides used must be reproduced in their entirety without modification. The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.

This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.


Page 3


Abstract


Your Cache is Overdue a Revolution: MRCs for Cache Performance and Isolation

Recently, a new, revolutionary set of techniques has been discovered for online cache optimization. Based on work published at top academic venues (FAST '15 and OSDI '14), we will discuss how to 1) perform online selection of cache parameters, including cache block size and read-ahead strategies, to tune the cache to actual customer workloads, 2) dynamically partition the cache to improve cache hit ratios without adding hardware, and 3) size caches and troubleshoot field performance problems in a data-driven manner. With average performance improvements of 40% across a large number of real, multi-tenant workloads, the new analytical techniques are worth learning more about.

Page 4


Cache as a Layer of Control


[Diagram: caching as a layer of control in two settings. Cloud-scale data caching: apps, containers, DBs and KV stores running on servers over network storage, backed by a network elastic cache on a distributed platform. Single-system data caching: apps and OS over local storage on a server, mobile, or embedded device.]

Goal: customer self-serve on sizing, tuning, troubleshooting
Goal: 50%+ hardware efficiency gain
Goal: SLO guarantees, latency and throughput

Page 5


Cache Performance Questions Unanswered


Is this performance good? Can it be improved?
What happens if I add or remove some cache?
What if I add or remove workloads?
Is there cache thrashing or pollution?
What if I change cache algorithm parameters?

Cache performance as typically reported: Hit Ratio 43%; Cache Size 128 GB

Page 6


Modeling Cache Performance


[Figure: Miss Ratio Curve, miss ratio (0.0 to 0.8) vs. cache size (0 to 170 GB); lower is better]

Miss Ratio Curve (MRC): performance as f(size); working-set knees; informs allocation policy

Reuse distance: the number of unique intervening blocks between a use and its reuse; applies to LRU and other stack algorithms (a minimal sketch follows)
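To make the reuse-distance idea concrete, here is a minimal, self-contained sketch (not code from the tutorial) of Mattson-style single-pass MRC construction: each reference's LRU stack depth is its reuse distance, and a cache of C blocks hits exactly those references whose distance is less than C. The toy trace and the MAX_UNIQUE bound are illustrative.

/* Minimal sketch: exact LRU reuse distances via Mattson's single-pass
 * stack algorithm (O(N*M) time), then an MRC from the histogram. */
#include <stdio.h>

#define MAX_UNIQUE 1024            /* M: unique blocks we can track        */

static long stack[MAX_UNIQUE];     /* LRU stack: index 0 = most recent     */
static int  depth = 0;             /* current number of unique blocks      */
static long histo[MAX_UNIQUE + 1]; /* histo[d] = refs with reuse distance d;
                                      histo[MAX_UNIQUE] counts cold misses */

/* Process one reference: return its reuse distance, or -1 on first touch. */
static int process_ref(long block)
{
    for (int d = 0; d < depth; d++) {
        if (stack[d] == block) {
            /* Found at depth d: that is the reuse distance. Move to top.  */
            for (int j = d; j > 0; j--)
                stack[j] = stack[j - 1];
            stack[0] = block;
            histo[d]++;
            return d;
        }
    }
    /* Cold miss: count it and push on top (sketch assumes the trace
     * touches at most MAX_UNIQUE distinct blocks). */
    histo[MAX_UNIQUE]++;
    if (depth < MAX_UNIQUE) {
        for (int j = depth; j > 0; j--)
            stack[j] = stack[j - 1];
        stack[0] = block;
        depth++;
    }
    return -1;
}

int main(void)
{
    long trace[] = { 1, 2, 3, 1, 2, 3, 4, 1 };   /* toy trace              */
    long n = sizeof(trace) / sizeof(trace[0]);

    for (long i = 0; i < n; i++)
        process_ref(trace[i]);

    /* MRC: a cache of c blocks hits every ref whose reuse distance < c.   */
    long hits = 0;
    for (int c = 1; c <= depth; c++) {
        hits += histo[c - 1];
        printf("cache=%d blocks  miss ratio=%.2f\n",
               c, 1.0 - (double)hits / n);
    }
    return 0;
}

Mattson's insight is that the whole curve comes out of one pass over the trace; the later algorithms on the next slide attack the O(N*M) cost of that pass.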

Page 7



MRC Algorithm Research

[Timeline figure: MRC algorithm research, 1970 to 2016. Complexity given as (space, time); N = total refs, M = unique refs.]

Before 1970: separate simulation per cache size
1970: Mattson stack algorithm, single pass: O(M), O(NM)
Bennett & Kruskal, balanced tree: O(N), O(N log N)
Olken, tree of unique refs: O(M), O(N log M)
Kessler, Hill & Wood: set and time sampling
Bryan & Conte: cluster sampling
UMON-DSS: hardware set sampling
RapidMRC: on-off periods
PARDA: parallelism
Counter Stacks, probabilistic counters: O(log M), O(N log M)
SHARDS*, spatial hashing: O(1), O(N)
2016: AET, kinetic modeling: O(1), O(N)

* Spatially Hashed Approximate Reuse Distance Sampling

Presenter Notes

I'll use this timeline to highlight some key events in the 45-year history of MRC algorithm research (see the paper for a more comprehensive treatment of related work). Before the 1970s, cache modeling required running a separate experiment for each cache size. In 1970, Mattson's seminal paper introduced a single-pass algorithm that computes the whole curve in one shot for stack algorithms such as LRU. Unfortunately, the cost was still high: O(M) space, where M = # unique references in the trace, and O(N*M) time, where N = total number of references. Research in the 70s and 80s focused on data-structure improvements for exact offline analysis: Bennett & Kruskal's tree over all refs; Olken's tree over unique refs. From the 90s through the mid-2000s, approximate techniques were introduced to speed up offline analysis: Kessler with set sampling; Bryan & Conte with temporal cluster sampling. Starting in the mid-2000s, online techniques were proposed, using new hardware such as UMON-DSS for set-associative processor caches, and using coarse on/off sampling periods to reduce overhead, as in RapidMRC. Later, PARDA achieved speedups by leveraging multi-core parallelism. Finally, there has been a recent resurgence of interest in this topic that's very exciting. Over just the past few months, there have been several concurrent efforts using different approaches. The Counter Stacks approximation algorithm (OSDI '14) improved space complexity significantly to O(log M), ROUNDER (SOCC '14) reduced CPU overhead via grouping, and SHARDS approximates MRCs in constant space and linear time. The problem is old, but solutions are only now becoming practical for scalable online modeling.
Page 8


MRCs from Production Workloads


[Figure: miss ratio curves from production workloads; lower is better]

Page 9


Achieving Latency Targets


[Figure: latency vs. cache size; the latency target of 7 ms is met at a cache allocation of more than 16 GB]

• Convert to a Latency-Cache Curve

• Set client cache partition sizes to provide average latency SLOs

• Can set targets for %ile latency, not just avg (e.g. 75th %ile, 95th %ile)

• Drive per-client tier occupancy sizing for latency SLOs

Client target average IO latency is 7 ms

Autoset the cache partition size to 16 GB to guarantee the average-latency SLO

* Throughput targets can be implemented similarly
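A minimal sketch of the conversion, with made-up numbers rather than data from the tutorial: average latency at a given cache size is the hit fraction times the hit cost plus the miss fraction times the miss cost, and the smallest size whose average latency meets the SLO becomes the allocation.

/* Sketch: turn a miss ratio curve into an average-latency curve and pick
 * the smallest cache allocation that meets a latency SLO. All numbers are
 * illustrative. */
#include <stdio.h>

int main(void)
{
    /* MRC sampled at a few cache sizes (GB -> miss ratio), made up here.  */
    double size_gb[]    = {  4,    8,    16,   32,   64 };
    double miss_ratio[] = { 0.75, 0.62, 0.45, 0.30, 0.25 };
    int n = sizeof(size_gb) / sizeof(size_gb[0]);

    double hit_lat_ms  = 0.2;    /* e.g. flash hit                         */
    double miss_lat_ms = 12.0;   /* e.g. backing disk / network storage    */
    double slo_ms      = 7.0;    /* client's average-latency target        */

    for (int i = 0; i < n; i++) {
        /* Average latency = hit fraction * hit cost + miss fraction * miss cost. */
        double avg = (1.0 - miss_ratio[i]) * hit_lat_ms
                   + miss_ratio[i] * miss_lat_ms;
        printf("%5.0f GB -> %.2f ms avg\n", size_gb[i], avg);
        if (avg <= slo_ms) {
            printf("smallest allocation meeting %.1f ms SLO: %.0f GB\n",
                   slo_ms, size_gb[i]);
            break;
        }
    }
    return 0;
}

With these illustrative numbers the loop settles on 16 GB, matching the shape of the example on this slide; percentile targets work the same way with a latency distribution per cache size instead of a single average.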

Page 10


Per-Client Multi-Tier Sizing


• MRCs give strong guidance on sizing per-client tier allocations

• Latency guarantees can be achieved by computing hit/miss costs (latency) of the tiers against hit/miss ratios

• Each tier can be multi-tenant, with per-tenant sizing via MRCs

• Model network bandwidth as a function of cache misses from each tier

[Figure: per-client allocations in Tier 0, Tier 1, and Tier 2]
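The sketch below illustrates the per-tier cost computation under one simplifying assumption the slides do not spell out: that the tiers behave like a single exclusive LRU stack, so an MRC evaluated at the cumulative allocation gives the fraction of references served by the first k tiers. The latencies, sizes, and MRC shape are invented for illustration.

/* Sketch: expected latency from per-tier allocations and per-tier hit
 * costs, using one MRC evaluated at cumulative sizes. */
#include <stdio.h>

/* Piecewise-linear miss ratio as a function of cache size in GB (made up). */
static double miss_ratio(double gb)
{
    if (gb >= 60.0) return 0.10;
    return 0.70 - 0.01 * gb;            /* crude downward slope             */
}

int main(void)
{
    double tier_gb[]  = { 5.0, 15.0, 40.0 };   /* DRAM, local SSD, remote SSD */
    double tier_lat[] = { 0.05, 0.5, 2.0 };    /* hit latency per tier (ms)   */
    double miss_lat   = 12.0;                  /* backing store latency (ms)  */
    int ntiers = 3;

    double cum = 0.0, prev_miss = 1.0, expected = 0.0;
    for (int i = 0; i < ntiers; i++) {
        cum += tier_gb[i];
        double m = miss_ratio(cum);
        double served_here = prev_miss - m;    /* fraction hitting tier i     */
        expected += served_here * tier_lat[i];
        prev_miss = m;
    }
    expected += prev_miss * miss_lat;          /* residual misses             */
    printf("expected latency: %.2f ms\n", expected);
    return 0;
}

Searching over candidate per-tier allocations with this cost model is one way to drive the tier-occupancy sizing described above; network bandwidth can be modeled the same way from the misses leaving each tier.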

Page 11


Self-Service Monitor / Size / Troubleshoot


Tune and optimize production workloads
Show MRCs in the monitoring UI
Ops teams can troubleshoot or self-serve on sizing
Size caches in production, trigger alerts, etc.
Interactive what-if analysis of the effect on a workload of more or less cache

[Figure: hit ratio vs. cache size; higher is better]

Page 12


Performance Gain: Thrash Remediation


[Figure: workload MRC bending toward the ideal curve]

• New mechanism allows online, optimal MRC bending via a thrash remediation algorithm

• Optimize cache for cache-unfriendly workloads

• Lower miss rates even for single workloads

• Compatible with partitioning for additional benefit

Page 13


Performance Gain From Partitioning


Improve aggregate cache performance:
Allocate space based on client benefit
Prevent inefficient space utilization / thrashing

Mechanism: partition the cache across clients:
Compute per-client MRCs cheaply
Isolate and control competing clients, LUNs, VMs, tenants, DB tables, etc.
Optimize partition sizes using MRCs (see the sketch after the diagram below)
Adapt to changing workload behavior
Certain SDS platforms already support partitioning

[Diagram: a single global cache shared by Client 0 and Client 1 vs. a partitioned cache using cache reservations, with Client 0 -> Partition 0 and Client 1 -> Partition 1]
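One way to use per-client MRCs for partition sizing is a greedy allocator that repeatedly hands the next cache increment to the client with the largest marginal drop in miss ratio. This is an illustrative sketch, not the optimizer from the published work, and greedy allocation is only guaranteed optimal when the MRCs are convex; the MRC tables are made up.

/* Sketch: greedy MRC-guided partition sizing across two clients. */
#include <stdio.h>

#define NCLIENTS 2
#define NPOINTS  9              /* MRC sampled at 0,1,...,8 cache units     */

/* Miss ratio for each client at integer cache sizes 0..8 (illustrative).   */
static const double mrc[NCLIENTS][NPOINTS] = {
    { 1.0, 0.80, 0.62, 0.50, 0.44, 0.41, 0.40, 0.40, 0.40 },  /* client 0  */
    { 1.0, 0.70, 0.55, 0.45, 0.38, 0.33, 0.30, 0.28, 0.27 },  /* client 1  */
};

int main(void)
{
    int total_units = 8;
    int alloc[NCLIENTS] = { 0, 0 };

    for (int u = 0; u < total_units; u++) {
        /* Give this unit to the client whose miss ratio drops the most.    */
        int best = -1;
        double best_gain = -1.0;
        for (int c = 0; c < NCLIENTS; c++) {
            if (alloc[c] + 1 >= NPOINTS) continue;
            double gain = mrc[c][alloc[c]] - mrc[c][alloc[c] + 1];
            if (gain > best_gain) { best_gain = gain; best = c; }
        }
        if (best < 0) break;
        alloc[best]++;
    }

    for (int c = 0; c < NCLIENTS; c++)
        printf("client %d: %d units, miss ratio %.2f\n",
               c, alloc[c], mrc[c][alloc[c]]);
    return 0;
}

Weighted SLOs can be folded in by scaling each client's marginal gain by its importance before comparing, and the loop can be re-run periodically as the per-client MRCs change.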

Page 14


Performance Gain From Partitioning (Results)


[Figure: histogram of effective cache size increase (%) across experiments; y-axis: frequency (# experiments)]

MRC-guided partitions vs. global LRU: effective cache size 40% larger on average, 146% larger at maximum. Computed as the percentage by which the cache hardware would have to grow for its performance to match MRC-guided partitioning; for example, if partitioning with 50 GB matches the hit ratio global LRU only reaches at 70 GB, the effective cache size increase is 40%. The metric reflects the capital expenditure avoided by getting higher performance from a software-only technique.

Page 15


Performance Gain from Auto Cache Policy Tuning

Quantify the impact of parameter changes: cache block size, use of sub-blocks, write-through vs. write-back, size of read vs. write cache, replacement policy, ...

Explore without modifying actual production

Simulate multiple configurations concurrently: multiple MRCs, each with different parameters

Dynamic online optimization: determine the best configuration, then adjust the actual cache parameters

Tuning parameters impact performance dramatically (can be 100%+); parameter selection can be automated


Two example workloads show performance curves that are sensitive to both the workload and the cache block size.

The correct parameter settings can be picked automatically via online MRCs (a simplified simulation sketch follows).
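As a simplified illustration of concurrent what-if simulation (not the tutorial's implementation, which builds full MRCs per configuration), the sketch below replays one I/O offset stream against two LRU simulations that differ only in cache block size and compares their miss ratios. The capacity, block sizes, and trace are invented.

/* Sketch: evaluate two candidate cache block sizes against the same I/O
 * stream by running one small LRU simulation per configuration. */
#include <stdio.h>
#include <string.h>

#define CAPACITY_BYTES (64 * 1024)      /* simulated cache capacity         */
#define MAX_ENTRIES    1024

typedef struct {
    long block_size;                    /* candidate parameter              */
    long entries;                       /* CAPACITY_BYTES / block_size      */
    long lru[MAX_ENTRIES];              /* index 0 = most recently used     */
    long used, misses, refs;
} Sim;

static void sim_init(Sim *s, long bsize)
{
    memset(s, 0, sizeof(*s));
    s->block_size = bsize;
    s->entries = CAPACITY_BYTES / bsize;
}

static void sim_access(Sim *s, long byte_offset)
{
    long blk = byte_offset / s->block_size;
    s->refs++;
    for (long i = 0; i < s->used; i++) {
        if (s->lru[i] == blk) {                    /* hit: move to front     */
            memmove(&s->lru[1], &s->lru[0], i * sizeof(long));
            s->lru[0] = blk;
            return;
        }
    }
    s->misses++;                                   /* miss: insert at front, */
    long n = (s->used < s->entries) ? s->used : s->entries - 1;
    memmove(&s->lru[1], &s->lru[0], n * sizeof(long));  /* evicting LRU slot */
    s->lru[0] = blk;
    if (s->used < s->entries) s->used++;
}

int main(void)
{
    long trace[] = { 0, 4096, 8192, 0, 4096, 65536, 0, 4096, 8192, 131072 };
    long n = sizeof(trace) / sizeof(trace[0]);

    Sim a, b;
    sim_init(&a, 4096);                 /* 4 KB blocks                       */
    sim_init(&b, 16384);                /* 16 KB blocks                      */

    for (long i = 0; i < n; i++) {
        sim_access(&a, trace[i]);
        sim_access(&b, trace[i]);
    }
    printf("4 KB blocks:  miss ratio %.2f\n", (double)a.misses / a.refs);
    printf("16 KB blocks: miss ratio %.2f\n", (double)b.misses / b.refs);
    return 0;
}

A production tuner would compare full MRCs (and bytes moved per miss, for read-ahead and block-size choices) rather than a single size, but the structure of running several cheap emulations side by side against the live reference stream is the same.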

Page 16


Generalizing to Non-LRU Policies (Special to SHARDS [FAST 2015])


Sophisticated caching algorithms: ARC, LIRS, CAR, Clock-Pro, 2Q, ...; no known single-pass methods!

Scaled-down simulation: leverages the SHARDS hashed spatial sampling algorithm; each cache size is simulated separately (sketched below)

Still highly efficient: with a low sampling rate R = 0.001, a 1000x reduction in memory and processing, and still 100x for concurrent simulation of 10 cache sizes!

[Figures: Clock-Pro on trace t04; ARC on the MSR-Web trace; lower is better]
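A sketch of the scaled-down simulation idea: spatially hash-sample the reference stream at rate R and run the real (non-stack) eviction policy on a cache shrunk to R times the target size. CLOCK stands in here for the sophisticated policies named above; the sampling threshold, target size, and synthetic trace are illustrative.

/* Sketch: scaled-down simulation of a non-stack policy under spatial
 * (hash-based) sampling of the reference stream. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define P (1u << 24)
#define T (P / 10)                       /* sampling rate R = T/P = 0.1      */

/* Any decent 64-bit mixer works; this is SplitMix64's finalizer.           */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Tiny CLOCK (second-chance) cache standing in for ARC / LIRS / ...        */
#define MAX_SLOTS 256
typedef struct {
    long key[MAX_SLOTS];
    int  refbit[MAX_SLOTS], valid[MAX_SLOTS];
    int  slots, hand;
    long hits, refs;
} Clock;

static void clock_init(Clock *c, int slots)
{
    memset(c, 0, sizeof(*c));
    c->slots = slots;
}

static void clock_access(Clock *c, long key)
{
    c->refs++;
    for (int i = 0; i < c->slots; i++) {
        if (c->valid[i] && c->key[i] == key) {     /* hit                    */
            c->refbit[i] = 1;
            c->hits++;
            return;
        }
    }
    for (;;) {                                     /* miss: advance the hand */
        int i = c->hand;
        c->hand = (c->hand + 1) % c->slots;
        if (!c->valid[i] || !c->refbit[i]) {
            c->key[i] = key; c->valid[i] = 1; c->refbit[i] = 0;
            return;
        }
        c->refbit[i] = 0;                          /* second chance          */
    }
}

int main(void)
{
    double R = (double)T / P;
    int target_size = 100;                          /* blocks, full scale    */
    int scaled_size = (int)(R * target_size + 0.5); /* blocks, simulated     */

    Clock sim;
    clock_init(&sim, scaled_size > 0 ? scaled_size : 1);

    for (long i = 0; i < 200000; i++) {
        /* Synthetic trace: half the refs hit a 64-block hot set, half are
         * spread over 1000 blocks. */
        long block = (long)(mix64((uint64_t)i) % 1000);
        if (i % 2)
            block %= 64;
        if (mix64((uint64_t)block) % P < T)         /* spatial filter        */
            clock_access(&sim, block);
    }
    printf("estimated miss ratio at %d blocks: %.2f\n",
           target_size, 1.0 - (double)sim.hits / sim.refs);
    return 0;
}

Because all references to a sampled block are kept and all others dropped, the sampled stream preserves reuse behavior well enough that the shrunken cache's miss ratio estimates the full-size one; one such simulation is run per cache size of interest.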

Page 17


Systems Implementation


Easy integration with existing embedded systems; example C interface:

void mrc_process_ref(MRC *mrc, KEY opaque);

void mrc_get_histo(MRC *mrc, Histo *histo);

Extremely low resource usage (SHARDS example):
Accurate MRCs in a <1 MB footprint
Single-threaded throughput of ~20M blocks/sec
Average time of an mrc_process_ref() call: ~50 ns
No floating point, no dynamic memory allocation
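A sketch of how the example interface above might be wired into a cache. The MRC, KEY, and Histo definitions, the mrc_create() constructor, and the empty function bodies below are stand-ins invented for this sketch so it compiles on its own; only the two function signatures come from the slide.

/* Sketch: integration of the example interface into an I/O path plus a
 * periodic monitoring pull. Stand-in types and stub bodies, not the real
 * library. */
#include <stdio.h>
#include <stdint.h>

typedef uint64_t KEY;                    /* opaque block identifier          */
typedef struct { long dummy; } MRC;      /* placeholder for the real state   */
typedef struct { long bucket[16]; } Histo;

/* Signatures from the slide; bodies are empty stand-ins here.              */
void mrc_process_ref(MRC *mrc, KEY opaque) { (void)mrc; (void)opaque; }
void mrc_get_histo(MRC *mrc, Histo *histo) { (void)mrc; (void)histo; }

/* Assumed constructor, not part of the slide's interface.                  */
static MRC global_mrc;
static MRC *mrc_create(void) { return &global_mrc; }

static MRC *mrc;

/* --- cache I/O path: the single added call ------------------------------ */
static void cache_read(uint64_t block_addr)
{
    /* ... existing lookup / fill logic ... */
    mrc_process_ref(mrc, (KEY)block_addr);
}

/* --- periodic monitoring / tuning: pull the reuse histogram ------------- */
static void report(void)
{
    Histo h = { {0} };
    mrc_get_histo(mrc, &h);
    printf("first histogram bucket: %ld\n", h.bucket[0]);
}

int main(void)
{
    mrc = mrc_create();
    for (uint64_t i = 0; i < 1000; i++)
        cache_read(i % 100);
    report();
    return 0;
}

The design point the slide is making is that the hot path adds exactly one cheap call per reference; everything else (building the MRC from the histogram, tuning decisions) happens off the I/O path.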

Page 18


Cache Resource Management in Software Defined Storage


New features possible with Miss Ratio Curves (MRCs):

Distributed multi-tenant policies
Global latency guarantees
Global throughput guarantees
Flash cache auto-reservation
DRAM cache allocation
Cache time-series analysis
Cache thrashing remediation
Remote vs. local cache allocation
Write cache allocation
Self-tuning cache policies
Automatic write policy selection
Automated tiering

[Diagram: distributed platform with apps, containers, DBs and KV stores on servers over network storage, backed by a network elastic cache]

Page 19


Multi-Tenant Performance Challenges

Allocate resources to get the biggest bang for the buck:
Equalize the marginal value of an additional cache block across workloads
Support weighted value based on SLO goals, importance, price
Similar to existing resource-management techniques for CPU and memory
Cluster-wide cache resource management is extremely valuable

Today, global LRU or global ARC is commonly used:
Can exhibit gross unfairness, interference, and unpredictability
Vulnerable to thrashing and scan pollution under multi-tenancy

How to determine the marginal returns of additional cache?
MRCs solve this problem by providing the entire utility curve
But prior methods were too expensive to construct online (in both time and space)

Lightweight MRCs enable dramatic cache-management benefits


Page 20


Step-by-Step Recap


1. Calculate Miss Ratio Curve (MRC)

Trivial integration with the cache code base. In the case of SHARDS, a single new function call in the I/O path (~50 ns).

2. Display Cache Utility

Trivial integration with monitoring code base.

3. Auto-Tune Cache Parameters

Small modifications to the cache code base. Auto-select cache parameters using multiple online MRCs. Performance improvements (a 50%+ opportunity is often seen).

[Figure: Miss Ratio Curve, miss ratio (0.0 to 0.8) vs. cache size (0 to 170 GB); lower is better]

Page 21


Step-by-Step Recap


4. Thrashing Remediation

Thrashing: a workload fills the cache with entries that are not reused. Small modifications to the cache code base to selectively prefer some blocks during eviction (one illustrative approach is sketched below). Not uncommon to see a 50% performance improvement from this step.
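As one concrete (but substitute) example of "selectively preferring some blocks during eviction", the sketch below uses a bimodal insertion policy in the spirit of LIP/BIP: missed blocks normally enter at the LRU end and are only occasionally promoted to the MRU end, so a long scan recycles a single slot instead of flushing the reused working set. This illustrates the general idea; it is not the tutorial's remediation algorithm.

/* Sketch: thrash-resistant LRU variant with bimodal insertion. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 8
static long lru[SLOTS];        /* index 0 = MRU, SLOTS-1 = LRU              */
static int  used;
static long hits, refs;

static void access_block(long blk)
{
    refs++;
    for (int i = 0; i < used; i++) {
        if (lru[i] == blk) {                      /* hit: promote to MRU    */
            memmove(&lru[1], &lru[0], i * sizeof(long));
            lru[0] = blk;
            hits++;
            return;
        }
    }
    if (used < SLOTS) {                           /* fill an empty slot     */
        lru[used++] = blk;
        return;
    }
    if (rand() % 32 == 0) {                       /* rare MRU insertion     */
        memmove(&lru[1], &lru[0], (SLOTS - 1) * sizeof(long));
        lru[0] = blk;
    } else {
        lru[SLOTS - 1] = blk;                     /* usual: replace LRU slot */
    }
}

int main(void)
{
    /* Hot set of 6 blocks interleaved with a long scan. Under plain LRU
     * every hot reuse has stack distance 11 > 8 slots, so LRU would thrash;
     * bimodal insertion keeps the hot set resident. */
    for (long i = 0; i < 100000; i++) {
        access_block(i % 6);                      /* hot, reused blocks      */
        access_block(1000 + i);                   /* one-shot scan block     */
    }
    printf("hit ratio with bimodal insertion: %.2f\n", (double)hits / refs);
    return 0;
}

In an MRC-guided system the decision of when to apply such a policy, and to which partition, would itself be driven by the measured curves rather than hard-coded.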

5. Latency Guarantees

Trivial arithmetic to convert a miss ratio curve to a latency-vs-cache-size curve. Then a simple control loop sets the cache size to match the SLO. First predictive latency/throughput guarantee system.

6. Accurate Tiering Decisions

A new cache allocation policy manager in the caching code base. Cluster-wide allocation of cache resources to match SLOs. For example, to achieve a 500 µs target, allocate 5 GB local DRAM, 5 GB remote DRAM, 15 GB local SSD, and 20 GB remote SSD.

[Figures: per-client tier allocations (Tier 0, Tier 1); latency target of 7 ms met at a cache allocation of more than 16 GB; autoset the cache partition to 16 GB to guarantee the average-latency SLO]

Page 22


Step-by-Step Recap


7. Multi-Tenant Partitioning

Small modifications to cache code base to create virtual partitions. Our data shows an average cache efficiency improvement of ~50%.

[Diagram: global cache vs. partitioned cache, with Client 0 -> Partition 0 and Client 1 -> Partition 1]

8. Cloud Storage

Use the latency-vs-cache-size curve to size the cloud tier. Enables data-driven, predictive sizing of the cold data set in a remote tier (for ROBO, edge, and colo use cases).

Cloud Tier

9. Cache Time-Series MRCs

Analytics to predict caching cost-benefit over time. Enables automated and predictive cache sizing using time-series analysis.

Page 23


Attribution & Feedback


Please send any questions or comments regarding this SNIA Tutorial to [email protected]

The SNIA Education Committee thanks the following individuals for their contributions to this tutorial.

Authorship History: Irfan Ahmad / 2016-09-22

Additional Contributors: Carl Waldspurger