
Memory System Characterization of Commercial Workloads

L.A. Barroso, K. Gharachorloo and E. Bugnion
Western Research Laboratory, Digital Equipment Corporation

Presented by: Eric Carty-Fickes

Introduction

• commercial workloads > engineering workloads in the server market
  – but most studies still using scientific benchmarks (in 1998)

• difficult to create commercial benchmarks
  – large, expensive, proprietary, changing

• paper uses commercial workloads to study current trends

Database Workloads

• first two workloads (OLTP and DSS) run on Oracle DB server
• OLTP
  – small r/w queries on part of DB
  – models banking req’s in dedicated mode
  – more kernel time; hides I/O
• DSS (decision support systems)
  – long read-only queries on much of DB
  – models wholesaler’s SQL queries
  – fewer context switches

Database Workloads

• Web Index Search
  – doesn’t require DB server
  – multiple threads hide misses
  – read-only req’s and cached recent searches

Test Systems

• 4-processor AlphaServer 4100 and 8-processor 8400 for hardware testing
  – IPROBE tool for event counting
  – DCPI for profiling
  – ATOM for studying ORACLE

• SimOS for testing architectural changes
  – models Alpha 21164
  – simplified, but still with some detail

Aspects of Testing

• 3 issues: memory size, I/O bandwidth, runtime
  – scale down DB
  – change block buffer cache sizes

• OLTP and DSS: need to warm up SGA before testing; need to scale DB to be resident

• Web Index: no scaling – same system

Hardware Results

• OLTP – higher CPI, maybe due to TPC-B
  – long secondary cache latency
  – lots of primary cache misses, esp. Icache
  – dirty miss latency significant, lots of communication

• DSS – lower CPI means this config works
  – only suggestion is larger 1st level caches

• AltaVista – use 8400 just like original
  – good CPI, well-written code
  – 1st level caches important
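The CPI comparisons above come from hardware event counts. As a rough illustration of the arithmetic only, the sketch below derives CPI, misses per 1000 instructions, and a no-overlap stall estimate from raw counts. The `EventCounts` fields, the helper names, and every number are hypothetical placeholders, not IPROBE output or results from the paper.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Hypothetical counter readings; field names are illustrative only."""
    cycles: int
    instructions: int
    icache_misses: int
    dcache_misses: int
    bcache_misses: int


def cpi(counts: EventCounts) -> float:
    """Cycles per instruction over the measured interval."""
    return counts.cycles / counts.instructions


def misses_per_kilo_instr(misses: int, instructions: int) -> float:
    """Miss count normalized to 1000 retired instructions."""
    return 1000.0 * misses / instructions


def stall_cpi(misses: int, latency_cycles: float, instructions: int) -> float:
    """CPI contribution of one miss type, assuming every miss stalls
    for its full latency (no overlap), so this is an upper bound."""
    return misses * latency_cycles / instructions


# Made-up values chosen only to keep the arithmetic easy to follow.
sample = EventCounts(
    cycles=7_000_000,
    instructions=1_000_000,
    icache_misses=50_000,
    dcache_misses=40_000,
    bcache_misses=20_000,
)

print("CPI:", cpi(sample))
print("Icache MPKI:", misses_per_kilo_instr(sample.icache_misses, sample.instructions))
# Assumed 100-cycle latency for a miss that goes past the secondary cache.
print("Bcache stall CPI:", stall_cpi(sample.bcache_misses, 100, sample.instructions))
```

Comparing the stall estimate for each miss type against the total CPI is one simple way to see which level of the hierarchy dominates for a given workload.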

Simulator Results

• simulator like hardware, some cache and consistency differences = different timing
  – close cycle counts, miss rates

• OLTP – test associativity and Bcache size
  – idle time increases when servers can’t hide I/O
  – lots of cache intricacies…
  – bigger caches = fewer replacement and instruction misses; more important for OLTP than DSS
  – bigger lines = more true sharing, fewer cold misses
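To make the cache size, associativity, and line size knobs concrete, here is a toy set-associative cache with LRU replacement. It is a minimal sketch for illustration, not SimOS or the paper's cache model. The class, the `run` helper, and the synthetic address traces are all assumptions introduced here.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy single-level cache with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes: int, line_bytes: int, ways: int):
        assert size_bytes % (line_bytes * ways) == 0
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        self.accesses += 1
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict least recently used line
        cache_set[line] = True
        return False

    def miss_rate(self) -> float:
        return self.misses / self.accesses if self.accesses else 0.0


def run(trace, size_bytes: int, line_bytes: int, ways: int) -> float:
    """Replay an address trace and return the resulting miss rate."""
    cache = SetAssociativeCache(size_bytes, line_bytes, ways)
    for addr in trace:
        cache.access(addr)
    return cache.miss_rate()


# Synthetic traces, not taken from any real workload.
conflict_trace = [0, 1024] * 5            # two lines that collide when direct-mapped
scan_trace = list(range(0, 4096, 8))      # sequential 8-byte reads

print(run(conflict_trace, size_bytes=1024, line_bytes=32, ways=1))
print(run(conflict_trace, size_bytes=1024, line_bytes=32, ways=2))
print(run(scan_trace, size_bytes=1024, line_bytes=32, ways=1))
print(run(scan_trace, size_bytes=1024, line_bytes=64, ways=1))
```

In this toy model, adding a second way removes the conflict misses in the first trace, and doubling the line size halves the cold misses of the sequential scan. Sharing misses between processors are not modeled at all, so the sketch says nothing about the true sharing behavior noted above.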

Conclusions

• scaled OLTP and DSS give a decent estimate of real performance

• fairly narrow range of architectural issues explored

• more processes/processor = less I/O latency, fewer dirty misses

• simulators gloss over important details for ease of use (timing, OS, etc.)

Questions

• Can you get enough information by scaling down the DB and playing tricks with block buffer sizes?

