LCA14: Profiling server workload for ARM64

Upload: linaro

Post on 13-Jun-2015

DESCRIPTION

Resource: LCA14
Name: Profiling server workload for ARM64
Date: 05-03-2014
Speaker: Xinwei Hu
Video: https://www.youtube.com/watch?v=mxq6CI-uKu0

TRANSCRIPT

Page 1: LCA14: Profiling server workload for ARM64

www.huawei.com

HUAWEI TECHNOLOGIES CO., LTD.

Profiling Enterprise Workload for ARM64

2012Lab Huawei

02/24/2014

Page 2: LCA14: Profiling server workload for ARM64

Huawei at a glance

The 3rd largest smartphone vendor in the world

US $933 million in welfare spending in 2012

Annual sales revenue reached US $35.4 billion

The world's No. 2 telecom equipment provider

150,000+ dedicated employees globally, 45% of whom are engaged in R&D

Serving 45 of the world's top 50 telecom carriers

16 R&D centers worldwide

Page 3: LCA14: Profiling server workload for ARM64

Huawei's also growing strongly in IT

Geneva, Switzerland

• Operation and maintenance cost reduced by 45%

• Over 20 PB scientific data processed every year

• Excellent architecture meeting EB-level challenges

"CERN is hitting the technology limits for resource-intensive simulations and analysis. Our collaboration with Huawei shows an exciting new approach, where their novel architecture extends the capabilities in preparation for the Exascale data rates and volumes we expect in the future." - Bob Jones, head of CERN Openlab

The world's largest and highest-energy particle accelerator

Page 4: LCA14: Profiling server workload for ARM64

The performance of enterprise workload is subjective

Business Goal

Page 5: LCA14: Profiling server workload for ARM64

Needs optimization to achieve

Optimization

Business Goal

Page 6: LCA14: Profiling server workload for ARM64

Needs profiling to guide

Optimization

Business Goal

Profiling

Page 7: LCA14: Profiling server workload for ARM64

As a continuous loop

Optimization

Business Goal

Profiling

Page 8: LCA14: Profiling server workload for ARM64

Let’s talk about the dark side

Optimization Profiling

Page 9: LCA14: Profiling server workload for ARM64

Enterprise workloads are a new experience on ARM64

• New applications:
  • Web/Mail/File Server
  • Caching
  • Database
  • Storage

• New deployment methods:
  • OS
  • Virtualization
  • SaaS

Page 10: LCA14: Profiling server workload for ARM64

Challenges of optimizing enterprise workload

• business goal is different from benchmark of a single subsystem
  • it is not just about performance

• bottleneck is hard to identify
  • developers don't know ARM64 well enough
  • in the cluster, instead of in a single machine
  • multi-tenancy environment
  • applications written in Perl/Python/Ruby/JavaScript…

• boundary between development and operation is blurred

Page 11: LCA14: Profiling server workload for ARM64

The profiling toolbox for enterprise workload

• Hardware tools
  • software that can pin down saturation on the hardware side.

• System tools
  • analyze the kernel and the processes running inside the server.

• Cluster-integrated tools
  • measure, profile and report on distributed or cloud applications.

Page 12: LCA14: Profiling server workload for ARM64

Software has to understand hardware through PMU

CPU PMU events (from the slide diagram):

L1D_CACHE_WB_CLEAN, L1D_CACHE_WB_VICTIM, L1D_CACHE_REFILL_ST, L1D_CACHE_REFILL_LD, L1D_CACHE_ST, L1D_CACHE_LD, BUS_CYCLES, TTBR_WRITE_RETIRED, INST_SPEC, BUS_ACCESS, L2D_CACHE_WB, L2D_CACHE_REFILL, L2D_CACHE, L1D_CACHE_WB, L1I_CACHE, MEM_ACCESS, BR_PRED, CPU_CYCLES, BR_MIS_PRED, CID_WRITE_RETIRED, EXC_RETURN, EXC_TAKEN, INST_RETIRED, L1D_TLB_REFILL, L1D_CACHE, L1D_CACHE_REFILL, L1I_TLB_REFILL, L1I_CACHE_REFILL, …
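Raw PMU counts like the ones on this slide are rarely read directly; tools usually combine them into ratios. The sketch below is one possible way to do that in Python, not the speaker's tooling. The function name `derived_metrics`, the choice of ratios, and the sample counts are all assumptions made up for illustration; only the event names come from the slide.

```python
import math
from typing import Dict


def derived_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Combine raw PMU event counts into a few commonly used ratios.

    Missing or zero denominators yield NaN instead of raising.
    """
    def ratio(numerator: str, denominator: str) -> float:
        den = counts.get(denominator, 0)
        return counts.get(numerator, 0) / den if den else math.nan

    return {
        # Instructions retired per CPU cycle.
        "ipc": ratio("INST_RETIRED", "CPU_CYCLES"),
        # Share of L1 data cache accesses that caused a refill.
        "l1d_refill_ratio": ratio("L1D_CACHE_REFILL", "L1D_CACHE"),
        # Share of L2 data cache accesses that caused a refill.
        "l2d_refill_ratio": ratio("L2D_CACHE_REFILL", "L2D_CACHE"),
        # Approximate branch misprediction ratio.
        "branch_mispredict_ratio": ratio("BR_MIS_PRED", "BR_PRED"),
    }


# Hypothetical counts, invented for this example (not measured data).
SAMPLE_COUNTS = {
    "CPU_CYCLES": 2_000_000,
    "INST_RETIRED": 1_000_000,
    "L1D_CACHE": 1_000_000,
    "L1D_CACHE_REFILL": 50_000,
    "L2D_CACHE": 50_000,
    "L2D_CACHE_REFILL": 5_000,
    "BR_PRED": 200_000,
    "BR_MIS_PRED": 10_000,
}

if __name__ == "__main__":
    for name, value in derived_metrics(SAMPLE_COUNTS).items():
        print(f"{name}: {value:.3f}")
```

Ratios of this kind are closer to what developers and operators can act on than a list of raw event counts.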

Page 13: LCA14: Profiling server workload for ARM64

Better be understandable to developers and operators

A screenshot of https://gooda-visualizer.googlecode.com/git/index.html#report=reports%2FSample

Page 14: LCA14: Profiling server workload for ARM64

ARM64 is catching up with system profiling tools

Software stack (top to bottom):

• Applications, Middleware and Databases
• Libc and other core libs
• syscall
• Filesystem, Virtual Memory, Network, Scheduler
• Device driver
• Device

Tools shown around the stack: netstat, powertop, strace, iostat, blktrace, tcpdump, vmstat, top, sar, ps, slabtop, pmap, iotop, /proc, perf, stap, lttng, traceroute, ftrace

Page 15: LCA14: Profiling server workload for ARM64

But dynamic instrumentation for the whole stack is missing

(Same software stack and tool diagram as on the previous slide.)

• be able to instrument both kernel and user-space applications dynamically

• stable enough to run in production environments

• minimize the CPU/memory penalty
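To make "dynamic" concrete, the sketch below is a toy user-space analogue in Python: a probe is attached to a running program's function without editing its source, and detached again so the cost disappears. It is only an illustration of the attach/detach idea, not a kernel-level mechanism and not the tool planned in this talk. The `Probe` class and the `Service` class are hypothetical names invented for this example.

```python
import time


class Probe:
    """Attach a timing probe to a named attribute at runtime, then detach it."""

    def __init__(self, owner, name):
        self.owner = owner
        self.name = name
        self.original = None
        self.calls = 0
        self.total_seconds = 0.0

    def attach(self):
        if self.original is not None:
            return  # already attached
        original = getattr(self.owner, self.name)
        self.original = original

        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return original(*args, **kwargs)
            finally:
                # Record the call even if the target raised.
                self.calls += 1
                self.total_seconds += time.perf_counter() - start

        setattr(self.owner, self.name, wrapper)

    def detach(self):
        if self.original is not None:
            # Restore the untouched function: no overhead remains.
            setattr(self.owner, self.name, self.original)
            self.original = None


class Service:
    """Hypothetical workload standing in for a real application."""

    def handle(self, n):
        return sum(range(n))


if __name__ == "__main__":
    service = Service()
    probe = Probe(Service, "handle")
    probe.attach()
    service.handle(100_000)
    probe.detach()
    print(f"calls={probe.calls} total={probe.total_seconds:.6f}s")
```

The design point mirrors the requirements above: the target code is unchanged, the probe can be removed while the program keeps running, and the penalty is paid only while the probe is attached.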

Page 16: LCA14: Profiling server workload for ARM64

The high density of servers is a challenge to profiling

• Better to have a single point from which you can instrument the workload end-to-end across the whole cluster of machines

• Better to have a good visualizer which connects the dots and helps you understand the big picture
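One way to picture "connecting the dots" is to collect timing records from every machine in one place and group them by request. The sketch below assumes each host reports spans tagged with a shared request ID and that host clocks are synchronized; both are assumptions for illustration, as are the `Span` type, the `end_to_end` function, and the host and component names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Span(NamedTuple):
    """Time one component on one host spent on one request (hypothetical format)."""
    request_id: str
    host: str
    component: str
    start_ms: float
    end_ms: float


def end_to_end(spans: Iterable[Span]) -> Dict[str, dict]:
    """Group spans by request and summarize each request across hosts."""
    by_request: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        by_request[span.request_id].append(span)

    report = {}
    for request_id, group in by_request.items():
        group.sort(key=lambda s: s.start_ms)
        slowest = max(group, key=lambda s: s.end_ms - s.start_ms)
        report[request_id] = {
            # Wall-clock time from the first span start to the last span end.
            "total_ms": max(s.end_ms for s in group) - group[0].start_ms,
            # Order in which the request visited hosts and components.
            "path": [f"{s.host}/{s.component}" for s in group],
            # Where the request spent the most time.
            "slowest": f"{slowest.host}/{slowest.component}",
        }
    return report


# Invented records, deliberately out of order as they might arrive from hosts.
SAMPLE_SPANS = [
    Span("req-1", "db01", "mysql", 5.0, 27.0),
    Span("req-2", "web01", "nginx", 10.0, 14.0),
    Span("req-1", "web01", "nginx", 0.0, 3.0),
    Span("req-1", "cache01", "memcached", 3.0, 5.0),
]

if __name__ == "__main__":
    for request_id, summary in end_to_end(SAMPLE_SPANS).items():
        print(request_id, summary)
```

A visualizer would sit on top of a summary like this; the grouping step is what lets a single point of view cover the whole cluster.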

Page 17: LCA14: Profiling server workload for ARM64

or a rack

Page 18: LCA14: Profiling server workload for ARM64

Conclusion

Making real enterprise workloads perform well on ARM64 is a challenge.

Enterprise workloads need continuous optimization; profiling tools are used by both developers and operators.

The lack of profiling tools is a common problem for the community, and we’d like to have a community solution.

Page 19: LCA14: Profiling server workload for ARM64

TAO is not complete without the dark side

Page 20: LCA14: Profiling server workload for ARM64

Huawei’s plan for 2014

• A top-down CPU analyzer for ARM64

• A lightweight dynamic instrumentation tool for ARM64

• A prototype of an end-to-end profiling/monitoring framework for ARM64 servers

You are welcome to join the technical discussion at

Profiling tools, what we have & what we are missing

5:05 PM – 6:00 PM on Thursday, Mar 6, 2014

Loulan 4103-4104 - Session 1

Page 21: LCA14: Profiling server workload for ARM64

Thank you

www.huawei.com

Copyright © 2011 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.