
Post on 19-Dec-2015

Page 1

Technology for better business outcomes

© 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Tools: The big picture

Page 2

Overview

• High Level Review and Background
• HP Core Server Management Software
  − HP ProLiant Essentials
  − HP SmartStart Scripting Toolkit utilities
  − HP Smart Update Manager
• HP Systems Insight Manager
  − Insight Control Environment – Linux
  − HP SIM Insight Power Manager
• Items we could cover if there is interest and time
  − CMU
  − XC
  − Collectl
  − Xtools / HPcpi

Page 3

Server Console/BMC Positioning

Integrated Lights-Out and Integrated Lights-Out 2: industry-leading, comprehensive remote management for ProLiant 300/500/Blades
• Total control
• Advanced security
• Scalable
• Versatile
• Standards and innovation

Lights-Out 100: basic, affordable remote management for ProLiant 100 series
• Simple control
• Basic security
• Standards centric

Page 4

Linux Management Capabilities for HP Servers

[Diagram: management offerings plotted by solution scope (low to high) against solution scale (single system and few servers, multi-system with 10s of servers, multi-environment with 100s to 1000s of servers).
Tiers: distro tools and open source components; utilities and toolkits with DIY integration; vendor-managed solutions with standard integration on standard systems; standardized processes with vendor-managed solutions and custom integration.
Groupings: Open Source Software; Integrated Standard Technologies; Integrated Standard Processes.
Products shown: ProLiant Essentials, SSTK, RDP, CMU, XC, ICE-Linux, Opsware.]

Page 5

HPC Linux Management Continuum

• XC
  − Turnkey, supported environment with Linux, LSF, HP-MPI, management and monitoring
  − Scalable, production environment
• CMU
  − Real-time monitoring of clusters
  − Scalable cloning of customer-defined images to cluster or groups of nodes
• Insight Control Environment – Linux
  − Automated provisioning scripts for Linux distributions, suitable for department-scale clusters, with SIM integration
  − Lightweight cluster monitoring interface
  − Blades and iLO servers
• HP Core
  − HP SIM, ProLiant Essentials, PSP, HP SUM, etc.

Page 6

HP Tool Continuum – more details

• SmartStart CD/DVD: single server; interactive, assisted install; interview-based or replication
• Server Migration Pack – P2P Edition: single server; automated; target boot disk required; OS, app, data migrations
• SmartStart Scripting Toolkit: multiple servers; automated with PXE/USB/CD; customer-created scripts
• RDP: multiple servers; automated from remote console; imaging & scripting; pre-packaged deployment events
• OpenView Configuration Manager & Opsware: up to 10,000s of heterogeneous servers, clients, thin clients; desired-state management
• Insight Control Environment for Linux

Page 7

ProLiant Essentials Foundation Pack

What is it?

• The ProLiant Essentials Foundation Pack kit is delivered standard with ProLiant ML/DL/BL servers (not DL1xx).
• The Foundation Pack includes the SmartStart CDs and the Management CD.
• The Foundation Pack provides server setup and management tools that help customers perform hardware configuration and maintenance, OS installation, and server management.

Page 8

Core Remote Configuration

• DL100/LO100 – scripted IPMItool
• DL300/iLO2 – scripted XML – examples in Perl
• pdsh/ssh with SmartStart Scripting Toolkit (SSST) utilities
  − Not all nodes are fully supported today
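The scripted IPMItool approach for LO100 nodes can be sketched as follows. This is a minimal illustration, not HP's tooling: the node names and user are hypothetical, and the choice of the `lanplus` interface (IPMI v2.0) is an assumption about the BMC. The sketch only builds the command lines; running them is left to the caller.

```python
from typing import List

# Power actions accepted by "ipmitool chassis power".
ALLOWED_ACTIONS = {"status", "on", "off", "cycle", "reset"}

def ipmitool_power_cmd(host: str, user: str, action: str = "status") -> List[str]:
    """Build an ipmitool argv for a chassis power action on one BMC."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    # -I lanplus selects IPMI v2.0 over LAN (assumed here).
    # -E reads the password from the IPMI_PASSWORD environment variable,
    # which keeps it off the command line.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", action]

# Hypothetical BMC host names, for illustration only.
nodes = ["n1-lo100", "n2-lo100"]
commands = [ipmitool_power_cmd(node, "admin") for node in nodes]
for argv in commands:
    print(" ".join(argv))
```

Each argv list can be handed to `subprocess.run` or wrapped in a pdsh invocation; building the list first makes the commands easy to review before anything touches the hardware.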

Page 9

SmartStart Scripting Toolkit

Suite of tools and utilities designed to automate installations of ProLiant servers.

• Automates repetitive and predictable tasks
  − Customized installations using industry-standard scripting methods
  − Unattended configuration of server and option hardware
• Flexible deployment solution
  − Ready to integrate into existing data center deployment environments
  − Integrates into silent installations of Windows and Linux operating systems
  − Integrates with third-party imaging tools
  − Supports installation across the network or from a customized bootable CD
• Designed for the IT expert experienced with scripting Windows and Linux operating systems and ProLiant installations

Page 10

How does the Toolkit work?

1. Create a reference or base server using SmartStart
2. Capture the server configuration and create scripts into a bootable ISO/CD/USB
3. Configure the new server and install the OS
  − Bootable CD or over-the-network install
  − Update configurations
  − Install OS from CD or network share

Files involved: server data file, options data file, script file

Page 11

Key Product Features

• Hardware Discovery: collects system information for configuration
• Configuration and Replication Utilities: create and copy server hardware and array configuration files
• Lights-Out Technology: uses Virtual CD and Virtual Floppy
• Scripting for Windows and Linux: script server configuration files and link to the unattended installation tools of the operating system
• User's Guide and Best Practices Document

Page 12

Primary Configuration Utilities

• CONREP – Configuration Replication Utility
  − Generates a hardware configuration script file used to duplicate the hardware configuration of one ProLiant server onto another
• ACU – Array Configuration Utility
  − Configures the SMART-2, Smart Array, and RAID Array 4000 (RA4000) controllers on a target server. ACU reads the configuration information from a script file and applies the configuration to the controllers in the target server
• HPONLOCFG – Lights-Out Configuration Utility
  − Configures RILOE II and iLO settings
• Assorted utilities
  − Handle the details of keeping track of the system state between reboots, and of creating and formatting partitions
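The iLO configuration utilities are driven by XML scripts. The sketch below assembles one such script with the Python standard library. The element and attribute names (`RIBCL`, `LOGIN`, `SERVER_INFO`, `GET_HOST_POWER_STATUS`) are recalled from HP's iLO scripting samples and should be checked against the scripting guide for the firmware in use; the credentials are placeholders.

```python
import xml.etree.ElementTree as ET

def ribcl_power_status(user: str, password: str, mode: str = "read") -> str:
    """Return an XML script that asks an iLO for the host power state.

    Tag names are assumptions taken from HP's published samples,
    not verified against a specific iLO firmware release.
    """
    ribcl = ET.Element("RIBCL", VERSION="2.0")
    login = ET.SubElement(ribcl, "LOGIN", USER_LOGIN=user, PASSWORD=password)
    server = ET.SubElement(login, "SERVER_INFO", MODE=mode)
    ET.SubElement(server, "GET_HOST_POWER_STATUS")
    return ET.tostring(ribcl, encoding="unicode")

# Placeholder credentials, for illustration only.
xml_text = ribcl_power_status("Administrator", "example-password")
print(xml_text)
```

Generating the script from code rather than editing a template by hand keeps the document well-formed and makes it easy to produce one script per node.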

Page 13

Firmware

Page 14

Available Tools to Deploy Firmware

• ProLiant Essentials Firmware Maintenance CD
  − Provides the latest set of firmware for all supported ProLiant servers and many options
  − Supports offline GUI mode and online deployment on Windows and Linux
• ProLiant Support Pack (PSP)
  − Provides the latest set of software drivers, utilities and agents for all supported ProLiant servers and many options
  − Supports GUI and CLI mode for online Windows and Linux deployment
• Individual Smart Components
  − Separate components available for Windows and Linux
  − Most can be installed without any other tools
  − May require the Firmware Maintenance CD boot environment to work
  − Linux: Cp.scexe (shell, tarball .gz with flash engine, XML, payload)
  − Windows: stub wrapper, installer, XML, payload

Page 15

Available Tools to Deploy Firmware

• BladeSystem Firmware Deployment Tool
  − BladeSystem-specific bootable ISO image used for offline firmware deployment in silent, unattended, non-GUI mode
  − Can be deployed through iLO Virtual Media or a local DVD drive
• BladeSystem Bundles
  − BladeSystem-specific online firmware and software components that are tested and supported as a bundled set
  − A subset of the PSP and FW CD content. The support list grows as new BladeSystem components are created and released
  − Separate Windows-specific and Linux-specific versions

Page 16

It's HPSUM under the covers

• FW CD (~620 MB): HPSUM, FW SC
• PSP (~120 MB): HPSUM, BP.XML, SW SC
• Blade Bundle (~55 MB): HPSUM, FW SC, SW SC

Page 17

ProLiant Software Maintenance: HP Smart Update Manager

HP ProLiant Essentials Software Maintenance Pack

HP Smart Update Manager – the deployment engine for updating firmware and software on ProLiant servers, options and enclosures

Key features:

• Dependency checking – ensures the appropriate install order and dependencies between components
• Improved deployment performance – deploys to multiple systems simultaneously
• Intelligent deployment – deploys only the components that need updating
• Agent-less solution

Functionality:

• Local or remote online deployment
• Local offline deployments
• Remote offline deployment when used with the SmartStart Scripting Toolkit or iLO Virtual Media
• Remote command-line deployment
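Dependency checking and intelligent deployment can be illustrated with a small sketch. This is not HP SUM's algorithm: it is a generic topological sort (Python 3.9+ `graphlib`) combined with a version comparison, and the component names, versions, and dependency graph are all hypothetical.

```python
from graphlib import TopologicalSorter

def plan_updates(deps, installed, available):
    """Return the components that need an update, in dependency-safe order.

    deps maps each component to the set of components that must be
    installed before it. Versions are tuples so they compare numerically.
    """
    order = TopologicalSorter(deps).static_order()
    # Keep only components whose installed version is older than the
    # available one; unknown components are treated as version (0,).
    return [c for c in order if installed.get(c, (0,)) < available[c]]

# Hypothetical components and versions, for illustration only.
deps = {
    "smart-array-fw": {"system-rom"},
    "nic-fw": {"system-rom"},
    "ilo2-fw": set(),
    "system-rom": set(),
}
installed = {"system-rom": (2007, 1), "smart-array-fw": (5, 20),
             "ilo2-fw": (1, 30), "nic-fw": (2, 0)}
available = {"system-rom": (2007, 7), "smart-array-fw": (5, 26),
             "ilo2-fw": (1, 30), "nic-fw": (2, 0)}

plan = plan_updates(deps, installed, available)
print(plan)
```

Here only the system ROM and the array controller firmware are out of date, and the ROM is placed first because the controller firmware depends on it.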

[Diagram: HP Smart Update Manager as the deployment engine.
Firmware CD: Smart Components for hardware firmware updates (System ROM, RILOE II/iLO/iLO2, Smart Array controller, hard drive, NIC firmware, MSA 20/60/70/500/500 G2 enclosure, BladeSystem c-Class Onboard Administrator).
SmartStart CD: hardware configuration, assisted installation.
PSP Linux and PSP Windows: Smart Components for drivers, agents, utilities.
Also labeled: LDU, Future.]

Page 18

HP SUM Architecture for Remote Deployment

[Diagram: HP SUM reads Smart Components from a local data store (HDD/USB/CD/DVD/net share). Locally it uses a local discovery client and a local install client. Through a remote network layer API and the user network it drives a remote discovery client and a remote install client. Targets and integrations shown: online server support, bare-metal deployment through iLO/L100i, OA firmware flash support, VCA/VCRM, HP SIM.]

Page 19

BladeSystem Firmware Deployment Tool
(1:1 iLO Manual, 1:X iLO Scripted, 1:X OA to iLO, 1:X Insight Display)

Choices:

A. Customer uses the BladeSystem Firmware Deployment ISO mounted as iLO virtual media to boot the server. iLO Virtual Media support is via Internet Explorer and the iLO Advanced Pack (requires iLO license).
B. Customer can also use the OA GUI / Insight Display to instruct all or individual servers to boot from the local DVD (requires OA firmware version 2.04 or higher).

HP Smart Update Manager silently flashes all the server firmware.

[Diagram: BladeSystem FW Deployment ISO, either mounted through iLO virtual media or in the local DVD drive, reaching the iLO on each blade through the OA]

Page 20

Blade Bundle: Overview of the Firmware Update Process (Blade Bundles)

[Diagram: HPSUM runs on a Windows or Linux workstation and reads Smart Components from a local data store (HDD/USB/CD/DVD/net share). Over the user network it reaches the OA NIC and the blade server NICs. Online server support: Windows or Linux running on the remote target blades in a c-Class enclosure.]

Page 21

Systems Insight Manager (SIM)

Page 22

HP Insight Control management: Building the best run server infrastructure

[Diagram: HP Insight Control, unified infrastructure management tuned for ProLiant and BladeSystem.
Capabilities: Rapid Deployment, Performance Analysis, Remote Control, OS Scan & Patch, Power, Virtualization.
Built on: HP SIM (management foundation) and the embedded intelligent infrastructure.
Packaging labels: Core ProLiant Essentials, Software Management Options.]

Page 23

HP Systems Insight Manager: Architectural framework

HP SIM – simple distributed architecture comprising three types of systems:

• Central Management Server (CMS)
• Managed systems
• Network clients

Authorized users access the CMS via web browser.

[Diagram: a management domain in which a network client connects to the Central Management Server, where SIM and its database reside. Managed systems include managed servers, a managed storage system, and a managed router. A system group organizes managed systems.]

Page 24

HP Systems Insight Manager: Comprehensive fault, configuration, asset and secure multi-system management

[Diagram: a completely managed server system environment. The managed server reports on disk subsystem, power, environment, processor, I/O, and memory. HP SIM, backed by the SIM DB, provides security, reports, discovery filters, blade visualization, alert notifications, response, version control, and browser-based remote access.]

Page 25

HP SIM and Insight Control: Transparently manage virtual and physical environments

• Deploy ESX, Linux or Windows to host
• Monitor performance of server
• Vulnerability scan and patch VMs and host*
• Associate, monitor, control, measure, move, and migrate virtual machines
• Link into OpenView service level management
• Remote console access from HP SIM
• Migrate servers into VMs

* VMware GSX and Microsoft Virtual Server

[Diagram: a host server running an operating system or service console with several VMs on top]

Page 26

Systems Insight Manager: Enabling storage management in HP

• The System Page will be enhanced to show storage and basic capacity information
• Inventory reports will be extended to include SAN-based storage devices
• Support for filtering

[Screenshots: Reporting; Storage Capacity. Storage inventory, reports and basic array capacity]

Page 27

HP Systems Insight Manager: Version Control for maintenance of ProLiant systems

• Full integration with HP SIM
• Version Control Repository Manager (VCRM)
  − Catalogs system software packages downloaded from the HP website
  − Allows creation of custom system software baselines
• Version Control Agent (VCA)
  − Catalogs software on the end node
  − Displays software version status
• VCRM and VCA work together to create software status and update BIOS, drivers, and agents
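The way a node catalog and a repository baseline combine into a software status can be sketched as below. This illustrates the idea only and is not the VCRM or VCA implementation: the package names, version strings, and status labels are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '8.0.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))

def version_status(installed, baseline):
    """Compare a node's installed packages against a software baseline."""
    status = {}
    for name, version in installed.items():
        if name not in baseline:
            status[name] = "not in baseline"
        elif parse_version(version) < parse_version(baseline[name]):
            status[name] = "update available"
        else:
            status[name] = "current"
    return status

# Hypothetical baseline (repository side) and node catalog (agent side).
baseline = {"system-bios": "2007.7.16", "nic-driver": "8.0.1", "mgmt-agent": "8.0.0"}
node = {"system-bios": "2007.1.24", "nic-driver": "8.0.1", "local-tool": "1.2"}

report = version_status(node, baseline)
for name, state in report.items():
    print(f"{name}: {state}")
```

Parsing versions into integer tuples avoids the string-comparison trap where "8.10" would sort below "8.9".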

[Diagram: one VCRM serving several VCAs]

Page 28

HP SIM 5.1: Remote monitoring

• Blade fails
• HP SIM instantly records the event
• HP SIM sends a page with the event
• Over the Internet:
  − A new case is instantly opened
  − Response center starts troubleshooting the problem
  − Case status is automatically updated in HP SIM
• View the event details from anywhere through a secure browser connection

Possible outcomes depending on warranty or support contract terms:

• Problem resolution before the customer realizes there is a failure
• HP engineers come onsite to carry out repairs
• Message in HP SIM with links to self-repair details
• HP engineers resolve problems remotely

Page 29

Insight Control Environment – Linux (ICE-Linux)

Page 30

ICE-Linux Basics

Advanced system monitoring, management, and software deployment features inside of HP Systems Insight Manager (HP SIM V5.2)

• Software repository for storing deployable software images
  − Base operating system images
  − RH Kickstart & SuSE AutoYaST files
  − ProLiant Support Packs (PSPs)
  − Previously captured system images
• Bare metal system discovery
• Bare metal operating system deployment
  − Interactive install
  − Fully automatic
• Server power control
• System image capture & deployment
• Automatic installation of ICE-Linux monitoring agents and/or the entire ProLiant Support Pack (PSP)
• Advanced system monitoring and event handling

Page 31

Architectural Overview

HP's Insight Control Environment for Linux (ICE-Linux) provides comprehensive imaging, monitoring, & management for Linux-based platforms.

ICE-Linux is built upon several HP & open source technologies. Each provides unique capabilities:

• HP's Systems Insight Manager (SIM)
  − HP common user interface and discovery
• HP's ICLE
  − Flexible, highly productive deployment & imaging
• HP's XC Monitoring
  − Scalable, open source management & monitoring

Page 32

Next Generation Linux Management: ICE-Linux v2.0

[Diagram: two stacks.
Management Server: ProLiant server hardware (c-Class blades, DL300s, DL500s); Linux distros, 32-bit & 64-bit (RHEL 4, RHEL 5, SLES 9, SLES 10); SIM/Linux v5.2 (CMS & DB); Essentials on top: Deployment & Provisioning (NG installer tool), Monitoring & Performance, IPM, VMM (Linux), others.
Managed Server: ProLiant server hardware (c-Class blades, DL300s, DL500s); Linux distros, 32-bit & 64-bit (RHEL 4, RHEL 5, SLES 9, SLES 10); monitoring agents, PSP v.8 (SNMP), future WBEM; agents, etc.; discovery kernel (RAM disk).]

Page 33

ICE-Linux: System discovery

Integrate

• Set up the TFTP boot area
• Set up the DHCP server
• Power up the system
• System auto-discovers & registers with SIM
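When a bare-metal node network-boots from a TFTP area, the boot loader has to find a configuration file for it. The sketch below assumes the boot loader is PXELINUX (the slides do not name one) and lists the file names it looks for under `pxelinux.cfg/`: first one derived from the MAC address, then the IP address in upper-case hex with progressively fewer digits, then `default`. Newer PXELINUX versions also try the client UUID first, which is omitted here. The MAC and IP values are illustrative.

```python
import ipaddress

def pxelinux_config_names(mac: str, ip: str):
    """List the pxelinux.cfg/ file names tried for a client, in order."""
    # "01" is the ARP hardware type for Ethernet; the MAC is lower-case
    # hex with dash separators.
    mac_name = "01-" + mac.lower().replace(":", "-")
    # The IPv4 address as eight upper-case hex digits, e.g. C000025B.
    ip_hex = f"{int(ipaddress.IPv4Address(ip)):08X}"
    # One hex digit is dropped on each retry before falling back to "default".
    prefixes = [ip_hex[:n] for n in range(8, 0, -1)]
    return [mac_name] + prefixes + ["default"]

names = pxelinux_config_names("88:99:AA:BB:CC:DD", "192.0.2.91")
for name in names:
    print("pxelinux.cfg/" + name)
```

Placing a discovery-kernel entry in `default` is one way every unknown node could boot into discovery, while nodes that are already registered get a MAC-specific file.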

Page 34

ICE-Linux: Imaging task progress

Deploy

• Productivity – using a one-to-many deployment model
  − Interactive scripted or unattended image-based OS install
  − Re-use existing config files (e.g. AutoYaST, Kickstart)
• Flexibility – providing levels of capability-sensitive controls
  − No OS (bare metal device): agent-less
  − Custom Linux OS: agent-less, end-user configs
  − SNMP: standard SNMP configs
  − Commercial OS with HP ProLiant agents

Page 35

ICE-Linux: Custom Linux OS Deployment

Deploy

• Bare metal and agent-less
  − "If it can be PXE booted, it can be managed…"
• 'Custom Linux' distro flavors
  − Red Hat variants, e.g. Fedora, CentOS, Asianux
  − SLES family, e.g. openSUSE 10
  − Debian variants
  − Others, e.g. Slackware, Gentoo
• Non-commercial distros are 'enabled'*
  − HP Support available on a custom basis only
  − Functionality & test coverage may be limited
• White paper guidance for deploying a custom (non-commercial) Linux OS:
  − "Installing a customized Linux operating system using HP Insight Control Linux Edition"
  − http://h71028.www7.hp.com/ERC/downloads/4AA1-1252ENW.pdf

* Standard HP Support is available for RHEL & SLES distributions only.

Page 36

ICE-Linux: Monitoring Enhancements
Health & Performance Monitoring

Monitor

• SIM integration
  − Menus, Collections, Events
• Automatic installation of ICE-Linux monitoring agents and/or the entire ProLiant Support Pack (PSP)
• Advanced system monitoring and event handling
• Integrated syslog
• Serial console access and logging
• Management tools
  − Nagios, supermon, RRD, syslog-ng
  − Nagios UI
• Command line support
  − shownode [metrics|all|config|status|enclosures …]
  − pdsh with ssh single sign-on setup
  − nrg – Nagios report generator, Nagios status CLI
  − console – iLO console interface CLI

Page 37

ICE-Linux: Monitoring Enhancements
Integrated open source capabilities

• Open source technologies for health & performance monitoring
  − Nagios
    • A monitoring framework to gather info using numerous available plug-ins
    • Allows integration with other monitoring tools
  − Supermon
    • High-speed cluster monitoring system, intended to monitor both OS and hardware health and status data of cluster nodes
    • Used by Nagios, with a CLI for data collection ('shownode' metrics)
  − RRD
    • Graphs are generated by rrdtool, generally reporting data rates over time
    • Graphs highlight the important info, including indication of available capacity
  − Syslog-ng
    • All nodes configured hierarchically to efficiently forward syslog events of priority warning and higher to aggregators
  − Command line interface (CLI) support
    • E.g. Console Management Facility (CMF)
    • E.g. parallel distributed shell (pdsh)
    • E.g. configuration & metric/sensor data display (shownode)
    • E.g. Nagios report generator (nrg)
• An integrated solution from HP

Monitor
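Forwarding only syslog events of priority warning and higher comes down to a comparison on the severity field. The sketch below shows that rule in isolation, using the standard syslog numbering in which lower numbers are more severe; it does not reproduce the actual syslog-ng configuration, and the function names are hypothetical.

```python
# Standard syslog severities: 0 is the most severe, 7 the least.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
WARNING = SEVERITIES.index("warning")  # numeric value 4

def split_pri(pri: int):
    """Split a syslog PRI value into (facility, severity)."""
    return pri // 8, pri % 8

def should_forward(pri: int) -> bool:
    """Forward events whose severity is warning or more severe."""
    _, severity = split_pri(pri)
    return severity <= WARNING

# PRI 12 = facility 1 (user) with severity warning.
# PRI 14 = facility 1 (user) with severity info.
for pri in (12, 14, 3):
    facility, severity = split_pri(pri)
    print(pri, SEVERITIES[severity], should_forward(pri))
```

Filtering at each node before forwarding is what keeps the aggregators from being flooded with informational messages on large clusters.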

Page 38

ICE-Linux: Health status overview (Nagios)

Monitor

Typical Displays
• Health status overview (shown)
• System-network status map for all hosts

[Screenshot: Nagios overview with panels for Hosts, Services, Network Outages, Network Health, Monitoring Performance, and Monitoring Features]

Page 39

ICE-Linux: Per-node status

Monitor

Page 40

ICE-Linux: Monitoring (RRD plots)

Monitor

Typical Displays
• Multiple attributes per system (shown)
• Attributes across multiple systems
• Historical metrics

[Screenshot: RRD plot page with the Nagios menu, primary graphs, and detail graphs. Controls: select system, select a node, detail metric, time period, display order, number of detail columns, expand on the details]

Page 41

ICE-Linux: Alert summary

Monitor

Page 42

Building value on top of SIM Linux

[Diagram: layered stack.
Hardware: iLO-based hardware platforms, DL1xx platforms, IA64 platforms.
Operating system: RHEL 4/5, SUSE 9/10, 32- & 64-bit Linux.
Management: SIM CMS & DB, management agents, PSPs.
Added components: Deployment and Imaging (Diego), Management and Monitoring (Nikko), Discovery Kernel (RAM disk).
On top: user-defined stacks and HP-defined stacks.]

Page 43

HP Insight Control Power Management via Insight Power Manager

Page 44

Delivering Energy Efficient Solutions for the IT Power & Cooling Chain

Optimizing from chip to chiller

• Services: Thermal Zone Mapping, Data Center Assessments, Data Center Site Preparation
• Manageability Tools: Insight Power Manager and iLO 2, Virtualization, Dynamic Capacity Management, Thin Provisioning & Data De-duplication
• Servers: Energy Optimized Servers, Small Form Factor Drives, Efficient Power Supplies, Low Power Processors, Low Power Memory
• Enclosures: BladeSystem, Thermal Logic, PARSEC enclosure cooling, Active Cool Fans
• Data Center & Facilities: Dynamic Smart Cooling, Modular Cooling System, Power Distribution Rack, Three Phase UPS

Energy savings figures shown on the chart: 10%; 10% to 25%; 25% to 40%; 40% to 50%; up to 60%

Page 45

Power: Measure, Regulate, Cap

• Measure
  − Peak/average
  − Single or multiple servers
  − User-specified periods of time
• Regulate
  − Configure HP Power Regulator
  − Calculate estimated savings
  − View estimated cooling savings
• Cap
  − Configure Power Cap
  − Single or multiple servers

[Screenshot labels: Peak Power, Average Power, Power State, Power Cap, View savings]
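The "measure" step, peak and average power over a user-specified period, can be sketched in a few lines. The sample data, the sampling interval, and the cap value are made up for illustration, and the comparison of the average against a cap is a simplification rather than Insight Power Manager's own logic.

```python
def power_summary(samples, start, end):
    """Return (peak, average) watts for samples with start <= time <= end.

    samples is a list of (time_in_seconds, watts) pairs.
    """
    window = [watts for t, watts in samples if start <= t <= end]
    if not window:
        raise ValueError("no samples in the requested period")
    return max(window), sum(window) / len(window)

# Hypothetical one-minute power readings for a single server.
samples = [(0, 200.0), (60, 260.0), (120, 240.0), (180, 300.0)]
peak, average = power_summary(samples, 0, 120)

cap_watts = 250.0  # illustrative cap, not a recommended value
print(f"peak={peak} W, average={average:.1f} W, "
      f"average within cap: {average <= cap_watts}")
```

Running the same function over the readings of several servers and summing the results gives the multi-server view listed on the slide.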

Page 46

Power Regulator Overview

• ROM based – OS independent
• Supports CPU operation at two P-states – Pmin, Pmax
• Three modes selectable via RBSU and iLO
  − Full Performance Mode – Pmax always
  − Power Savings Mode – Pmin always
  − Dynamic Power Savings Mode (default)
  − Operating system controlled
• Spotlight on Dynamic Power Savings mode
  − New ROM algorithm monitors application load
    • Application load = ratio of application activity to total CPU activity
  − ROM automatically instructs the CPU to switch P-states to ensure maximum performance under all loads

P-states for Xeon 3.6 GHz/800 MHz CPU

P-state   Description        CPU frequency   Approx. CPU voltage
Pmax      Full performance   3.6 GHz         1.4 V
Pmin      Minimum power      2.8 GHz         1.2 V

[Diagram: the power mode is selected through the ROM-Based Setup Utility or iLO; the ProLiant ROM monitors application load and sets the CPU P-state]
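The P-state table gives enough to estimate how much CPU power the Pmin state saves. The sketch below uses the common approximation that dynamic CMOS power scales with voltage squared times frequency. That model is an assumption brought in for illustration, not something stated in the slides, and it ignores leakage and the rest of the server.

```python
def relative_dynamic_power(freq_ghz, volts, ref_freq_ghz, ref_volts):
    """Dynamic power relative to a reference state, using P ~ V^2 * f."""
    return (volts / ref_volts) ** 2 * (freq_ghz / ref_freq_ghz)

# Values from the P-state table: Pmin = 2.8 GHz at 1.2 V, Pmax = 3.6 GHz at 1.4 V.
ratio = relative_dynamic_power(2.8, 1.2, 3.6, 1.4)
print(f"Pmin dynamic power is about {ratio:.0%} of Pmax")  # prints 57%
```

Under this model the ratio works out to (1.2/1.4)² × (2.8/3.6) = 4/7, so a CPU held at Pmin would draw a little over half the dynamic power it draws at Pmax.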

Page 47

Power Capping in Action – Capping to Average

The peak average does not exceed the power cap

[Chart labels: Peak Power, Average Power, Power Cap]

Page 48

Cluster Management Utility

Page 49

Cluster Issues addressed by CMU

Deployment of images

Site has customized or multiple images tuned for their workload

Site wants to deploy image quickly across many nodes

System management

Need for simple central GUI for monitoring and issuing commands

Need for real time monitoring of node status and activity on cluster and subgroups

Cost

“Free software” tools don’t work across all platforms and applications, and lack support

More expensive, comprehensive options may include features not required/not desired

Page 50

HP Cluster Management Utility (CMU)

• Easy, customizable utility
• Features:
  − Scalable provisioning
  − Configurable monitoring
  − Remote cluster commands with GUI and command line interfaces
  − HP SIM Level 1 integration
• Well adapted for HPC customized clusters
• Proven: over 150 customers
• Broad HP hardware platform support
• Multiple Linux distributions

Page 51

HP CMU V3.1 major features

• Management (GUI and CLI)
  − Day-to-day administration of the cluster from one central point
  − Halt, boot, reboot or broadcast commands to a set of nodes
• Backup/Cloning (GUI and CLI)
  − Capture & deploy a golden image on all the nodes (or groups of nodes)
    • Fast and scalable
• Monitoring
  − View cluster activity in real time
  − Monitor many machines from one window
  − Receive alerts when something special happens on a compute node or on a set of compute nodes

Page 52

CMU Management Interfaces

• Command line interface (CLI)
  − The CMU CLI launches day-to-day administration commands such as:
    • Boot, reboot, halt, … cloning and backup
  − The CMU CLI can be run interactively or used from a script
• Graphical user interface
  − Java based
  − Supports HP SIM Level 1 integration

Page 53: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Hardware registration with CMU

Page 54: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

CMU node information

Page 55: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

CMU remote commands

Page 56: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

HP CMU Monitoring Tool

• The monitoring component of HP CMU shows the state of the cluster at a glance, with a GUI specially designed for cluster monitoring
− summarizes the states of all the nodes, with a summary per group
− displays customizable information and alerts on all the nodes of the cluster
• Default update time is 5 seconds
• On a 100-node cluster, monitoring uses about 0.01% of CPU on each compute node.

Page 57: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

CMU software monitoring

[Screenshot callouts: Alert raised; Group summary; CPU usage; Node state; Selected sensors]

Page 58: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

XC

Page 59: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

XC Software : More than Cluster Management

Management, Imaging, Monitoring plus …

Turnkey – preconfigured (single DVD), integrated, tested
• Security firewalls, network address translation, Linux virtual server
• Supported by HP as a single solution

Enhanced RAS and scalability

HP-MPI

Scalable Visualization Array

Job and resource management

Integrated HPC Linux compatible with Red Hat EL4
• Option to install XC as layered product, on top of ‘out of box’ Red Hat

Page 60: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

XC Key Software Technologies

Function | Technology | Features and Benefits
Distribution and Kernel | RHEL 4.0 compatible | Red Hat compatible with current shipping product, added enhancements for SFS, Quadrics
Inbound Network / Cluster Alias | LVS | Linux Virtual Server: high availability virtual server project for managing incoming requests, with load balancing
Batch | LSF HPC or LSF | Platform LSF HPC: premier scheduler, policy driven, allocation controls. Provides migration for AlphaServer SC customers
Resource Management | SLURM | Simple Linux Utility for Resource Management: fault tolerant, highly scalable, uses standard kernel
MPI | HP-MPI | HP’s Message Passing Interface: provides standard interface for multiple interconnects, MPICH compatible, support for MPI-2 functionality
System Files Management | SystemImager, configuration tools, cluster database | SystemImager: automates Linux installs, software distribution, and production deployment. Supports complete, bootable image; uses flamethrower multicast technology
Console | Telnet/ssh based console commands, power control, CMF (console mgt facility) | IPMI, iLO server interfaces to low level console controls. Adaptable for HP integrated management processors; no need for terminal servers, reduced wiring
Availability of critical services | XC Infrastructure and ServiceGuard or….. | Critical services such as resource management/job scheduling, XC configuration database, /hptc_cluster file system, etc. highly available. Designed to be failover mechanism independent.
Monitoring | Nagios, SuperMon, Syslog-ng | Nagios: browser based, robust host, service and network monitor from open source. SuperMon supports high speed, high sample rates, low perturbation monitoring for clusters.

Page 61: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

XC Histories with Nagios

Page 62: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

XC Alerts with Nagios

Page 63: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Nagios and SIM

Page 64: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Job Performance Analyzer

Top level node view
1. CPU utilization
2. Memory utilization
3. Interconnects
4. Disk I/O (in/out)

IMPACT: Knowledge of resource consumption for job. Optimize code and assignment of resources to increase performance.

Click on node view to drill down: CPU utilization, Lustre traffic, InfiniBand traffic, disk accesses, memory utilization, GigE traffic

Page 65: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Collectl

Page 66: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Collectl Features

• Multiple output formats
• Multi-second and sub-second monitoring intervals
• Interactive + Record + Playback
• Can be run continuously as a service
• Very lightweight (<0.1% overhead)
• Support for many types of subsystems/devices
− Cpu, Disk, Nfs, Inode, Lustre, Mem, Network, Socket, Tcp
− Quadrics/Infiniband, Slab, Process (threads optional)

Page 67: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Collectl Big Picture

[Diagram labels: Read Data (/proc); Interactive mode; Playback mode; Analyze Data; Raw data; Record mode; Display mode; Plot Format; sexpr; External Feeds; socket]

Page 68: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Summary Output Formats

Brief

#<-------CPU--------><-----------Disks-----------><-----------Network--------->
#cpu sys inter ctxsw KBRead Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
  10   9   206    94      0     0      0      0      0      1      0       0
  26  26   183    80      0     0   1279     27     18     78     12      37
  27  27   396    70      0     0  31597    275      0      6      0       5
   9   9   341    71      0     0  32629    274      4     43      0       2

Verbose

### RECORD 3 >>> cag-dl380-01 <<< (1176471932.010) (Fri Apr 13 09:45:32 2007) ###
# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER NICE SYS IDLE WAIT INTR CTXSW PROC RUNQ RUN AVG1 AVG5 AVG15
     0    0  32   67    0  272   138    0  135   0 3.63 1.63  0.92
# DISK SUMMARY (/sec)
#Reads R-Merged R-KBytes Writes W-Merged W-KBytes
     0        0        0    208     5249    21880
# NETWORK SUMMARY (/sec)
#InPck InErr OutPck OutErr Mult ICmp OCmp IKB OKB
     1     0      0      0    0    0    0   0   0
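Because the brief format is one header line plus whitespace-separated numbers, it is easy to post-process. The following is a minimal Python sketch, assuming the layout shown on this slide; the function name `parse_brief` and the embedded sample are illustrative only and not part of collectl.

```python
# Sample copied from the brief-format output on the slide.
BRIEF_SAMPLE = """\
#<-------CPU--------><-----------Disks-----------><-----------Network--------->
#cpu sys inter ctxsw KBRead Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
  10   9   206    94      0     0      0      0      0      1      0       0
  26  26   183    80      0     0   1279     27     18     78     12      37
  27  27   396    70      0     0  31597    275      0      6      0       5
   9   9   341    71      0     0  32629    274      4     43      0       2
"""


def parse_brief(text):
    """Return one dict per sample row, keyed by the column names."""
    columns = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#<"):
            continue  # banner line that groups the columns
        if line.startswith("#"):
            columns = line.lstrip("#").split()  # column-name line
            continue
        if columns is None:
            raise ValueError("data row seen before the column header")
        values = [int(v) for v in line.split()]
        rows.append(dict(zip(columns, values)))
    return rows


rows = parse_brief(BRIEF_SAMPLE)
# Pick the interval with the heaviest disk write traffic.
peak = max(rows, key=lambda r: r["KBWrit"])
print(peak["KBWrit"], peak["Writes"])
```

For anything beyond a quick look, the plot format described later in the deck is probably the better input, since it is meant for tools to consume.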

Page 69: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Detail Format

# SINGLE CPU STATISTICS
# CPU USER NICE SYS IDLE WAIT
    0    0    0   1   98    0
    1    0    0  11   88    0

# DISK STATISTICS (/sec)
#           <-------reads--------><-------writes------><----------averages----------> Percent
#Name       Ops Merged KBytes  Ops Merged KBytes Request QueLen Wait SvcTim Util
cciss/c0d0    0      0      0    0      0      0       0      0    0      0    0
cciss/c0d1    0      0      0  247   7983  33050     133     53  187      3  100
cciss/c0d2    0      0      0    0      0      0       0      0    0      0    0

# NETWORK STATISTICS (/sec)
#Num Name  InPck InErr OutPck OutErr Mult ICmp OCmp IKB OKB
   0 lo:      38     0     38      0    0    0    0   2   2
   1 eth0:     0     0      0      0    0    0    0   0   0
   2 eth1:     2     0      0      0    0    0    0   0   0
   3 eth2:     0     0      0      0    0    0    0   0   0
   4 eth3:     0     0      0      0    0    0    0   0   0

Page 70: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Monitoring Jobs on a Cluster

• Run collectl on each node the job is running on, and gather the data when done
• For real-time monitoring
− Write data into shared directory in plot format
− No data to copy when collectl finishes
• Helper utility – runjob
− Needs some more work, particularly docs
− Good starting point

Page 71: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

SFS Monitoring

• Already uses collectl/colplot for core performance monitoring components

• Aggregates plot data on admin node in /var/hpls/web/www/plotdata

• See /var/log/collectl/node on individual nodes for raw/plot data

Page 72: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Generating Plottable Files

• During collection
− Use -P and include -oz to skip compression
− Add to DaemonCommands in /etc/collectl.conf
• Post collection processing
− collectl -p filename -f dirname -P -oz
• On a cluster
− Consider writing logs to a shared directory with -f
− rsync…; collectl -p "dirname/*" switches
  • Consider getplotfiles.pl utility if doing with cron
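The post-collection step above can be scripted. This is a small Python sketch that only assembles the command line from the slide (`collectl -p … -f … -P -oz`); the directory names are hypothetical, and the helper `playback_command` is an illustration rather than anything shipped with collectl.

```python
import shlex


def playback_command(raw_files, out_dir):
    """Build the playback command shown on the slide:
    collectl -p <raw files> -f <output dir> -P -oz
    """
    return ["collectl", "-p", raw_files, "-f", out_dir, "-P", "-oz"]


# Hypothetical shared directories; substitute the real locations.
argv = playback_command("/shared/collectl/*", "/shared/plots")

# Show the command as it would be typed in a shell. shlex.quote keeps the
# wildcard quoted, matching the quoted "dirname/*" form used on the slide.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as an argument list makes it straightforward to hand to `subprocess.run(argv, check=True)` once the paths are known to be correct.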

Page 73: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

ColPlot

• Web-based UI to gnuplot
− Should run with any properly configured server
• Also supports command line interface
• User definable plots

Page 74: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

ColPlot

Page 75: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

My XC Login Node is Slow

[Plot callout: High CPU]

Page 76: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

1024 Node Cluster Head Node

Page 77: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Sources etc…

• http://collectl.sourceforge.net/

Page 78: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

HPCPI/Xtools Performance Analysis Toolset

Page 79: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

HPCPI/Xtools

• HPCPI
− Statistical sampling profiler
− From DCPI (Digital Continuous Profiling Infrastructure)
− Compare (vaguely) to:
  • OProfile: conceptually based on DCPI
  • Caliper: has many other modes/features
  • Vtune: from Intel, with GUI
• Xtools
− Performance visualization tools
− xclus: cluster-wide visualization tool
− xperf: node-specific visualization tool

Page 80: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

HPCPI – Standard sampling

• Set default database location
% setenv HPCPIDB ~/hpcpidb
• Start daemon:
% hpcpid
• Run programs:
% time ./mb_pi.O.exe -iters 100
3.1415926535897932384626433832795028
3.913u 0.000s 0:03.91 100.0% 0+0k 0+0io 0pf+0w
% time ./mb_pi.g.exe -iters 100
3.1415926535897932384626433832795028
37.752u 0.001s 0:37.76 99.9% 0+0k 0+0io 0pf+0w
• Flush database to disk
% hpcpictl flush
hpcpictl flush successful
• Analyze
% hpcpiprof
% hpcpiprof ./mb_pi.g.exe ./mb_pi.O.exe
% hpcpilist mb_fill_in_data ./mb_pi.O.exe
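The two `time` lines above already say how far apart the two builds are before any profile is read. As a small illustration, the Python sketch below pulls the user-CPU seconds out of the csh `time` output and compares them; the helper name `user_seconds` is made up for this example.

```python
import re

# csh `time` output lines copied from the slide.
TIME_O = "3.913u 0.000s 0:03.91 100.0% 0+0k 0+0io 0pf+0w"
TIME_G = "37.752u 0.001s 0:37.76 99.9% 0+0k 0+0io 0pf+0w"


def user_seconds(time_line):
    """Extract the user-CPU seconds (the field ending in 'u')."""
    match = re.match(r"\s*([\d.]+)u\s+([\d.]+)s", time_line)
    if match is None:
        raise ValueError(f"unrecognized time output: {time_line!r}")
    return float(match.group(1))


ratio = user_seconds(TIME_G) / user_seconds(TIME_O)
print(f"mb_pi.g.exe uses {ratio:.2f}x the user CPU time of mb_pi.O.exe")
```

The profiles on the following slides then show where those extra cycles are spent.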

Page 81: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

hpcpiprof (by image)

% hpcpiprof
Event Name       Events Period Samples
---------- ------------ ------ -------
CPU_CYCLES 202980300000  60000 3383005

CPU_CYCLES     %   cum% image
---------- ----- ------ --------------------------
 135815e06 66.9%  66.9% vmlinux-2.6.9-34.7hp.XCsmp
  60180e06 29.6%  96.6% mb_pi.g.exe
   6238e06  3.1%  99.6% mb_pi.O.exe
 569040e03  0.3%  99.9% ipmi_si.ko
  48660e03  0.0%  99.9% libperl.so
  38040e03  0.0% 100.0% emacs
  28260e03  0.0% 100.0% libc-2.3.4.so
  11640e03  0.0% 100.0% ld-2.3.4.so
   9420e03  0.0% 100.0% mdmpd
...

Page 82: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

hpcpiprof (by procedure)

% hpcpiprof ./mb_pi.O.exe ./mb_pi.g.exe

Event Name      Events Period Samples
---------- ----------- ------ -------
CPU_CYCLES 66419940000  60000 1106999

CPU_CYCLES     %   cum% procedure       image
---------- ----- ------ --------------- -----------
  59195e06 89.1%  89.1% mandel_val      mb_pi.g.exe
   6238e06  9.4%  98.5% mandel_val      mb_pi.O.exe
 985800e03  1.5% 100.0% mb_fill_in_data mb_pi.g.exe

Page 83: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

hpcpilist (by source/assembly)

• Unfortunately, mandel_val got inlined, so you have to know to look in mb_fill_in_data

% hpcpilist mb_fill_in_data ./mb_pi.O.exe

Event Name     Events Period
---------- ---------- ------
CPU_CYCLES 6239580000  60000

Could not find source file for routine mb_fill_in_data
try the -f option to specify the source file to use

CPU_CYCLES PC                     B ASM                                          Source
---------- ---------------------- - -------------------------------------------- -----------
...
   1714e06 mb_fill_in_data+0x0270 : nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x0271   fma.d.s0 f36=farg0,farg0,f0                  mb_pi.c:164
         0 mb_fill_in_data+0x0272   nop.b 0                                      mb_pi.c:164
   9660e03 mb_fill_in_data+0x0280   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x0281   fma.d.s0 f37=farg2,farg2,f0                  mb_pi.c:164
         0 mb_fill_in_data+0x0282   adds ret0=1,ret0                             mb_pi.c:164
 169800e03 mb_fill_in_data+0x0290   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x0291   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x0292   fma.d.s0 f35=farg0,farg2,f0;;                mb_pi.c:164
 223320e03 mb_fill_in_data+0x02a0   cmp4.lt p4,p5=ret0,r38                       mb_pi.c:164
         0 mb_fill_in_data+0x02a1   nop.m 0                                      unknown_src
         0 mb_fill_in_data+0x02a2   nop.f 0;;                                    unknown_src
   1451e06 mb_fill_in_data+0x02b0   (p4) addl r14=1,r0                           mb_pi.c:164
         0 mb_fill_in_data+0x02b1   fma.d.s0 f34=f36,f1,f37                      mb_pi.c:164
         0 mb_fill_in_data+0x02b2   nop.i 0                                      unknown_src
         0 mb_fill_in_data+0x02c0   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x02c1   fms.d.s0 f33=f36,f1,f37                      mb_pi.c:164
         0 mb_fill_in_data+0x02c2   (p5) adds r14=0,r0                           mb_pi.c:164
 155460e03 mb_fill_in_data+0x02d0   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x02d1   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x02d2   fma.d.s0 f32=f35,f1,f35;;                    mb_pi.c:164
   1623e06 mb_fill_in_data+0x02e0   cmp4.eq p8,p9=0,r14                          mb_pi.c:164
         0 mb_fill_in_data+0x02e1   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x02e2   fcmp.le.s0 p2,p3=f34,f3                      mb_pi.c:164
         0 mb_fill_in_data+0x02f0   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x02f1   fma.d.s0 farg0=f33,f1,farg3                  mb_pi.c:164
         0 mb_fill_in_data+0x02f2   nop.b 0                                      mb_pi.c:164
 253200e03 mb_fill_in_data+0x0300   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x0301   nop.m 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x0302   fma.d.s0 farg2=f32,f1,f2;;                   mb_pi.c:164
 158940e03 mb_fill_in_data+0x0310   nop.b 0                                      mb_pi.c:164
         0 mb_fill_in_data+0x0311   (p3) br.cond.dpnt.few mb_fill_in_data+0x0320 mb_pi.c:164
         0 mb_fill_in_data+0x0312   (p9) br.cond.dptk.few mb_fill_in_data+0x0270 mb_pi.c:164
...

Page 84: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Sample rate and overhead

• Default sample rate higher (interval lower), minimum sample rate high in comparison:

Tool | Default interval | Max interval
OProfile | 100K | n/a
HP Caliper | 500K |
HPCPI | 60K | 64K

• Low overhead:

Incremental overhead sampling CPU_CYCLES at various intervals

Tool | Cycles per sample interrupt | 60K | 20K | 10K | 5K
Standard:
OProfile | 1070 | 2.13% | 6.34% | 11.35% | 19.56%
HP Caliper | 1660 | 2.80% | 8.67% | 16.35% | 27.82%
HPCPI/C | 657 | 1.09% | 3.07% | 6.02% | 11.19%
asm:
HPCPI/asm | 208 | 0.35% | 0.99% | 2.22% | 3.70%
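One rough way to read the overhead table is to divide the cycles spent per sample interrupt by the sampling interval. The Python sketch below does that for the 60K column, assuming the interval is counted in CPU cycles (the event being sampled is CPU_CYCLES). This first-order ratio is an assumption made here for illustration, not a model stated in the slides, and it only roughly tracks the measured figures.

```python
# Cycles per sample interrupt, copied from the table.
CYCLES_PER_INTERRUPT = {
    "OProfile": 1070,
    "HP Caliper": 1660,
    "HPCPI/C": 657,
    "HPCPI/asm": 208,
}

# Measured incremental overhead (%) at the 60K interval, from the table.
MEASURED_AT_60K = {
    "OProfile": 2.13,
    "HP Caliper": 2.80,
    "HPCPI/C": 1.09,
    "HPCPI/asm": 0.35,
}


def estimated_overhead_pct(cycles_per_interrupt, interval_cycles):
    """First-order estimate: interrupt cost as a share of the interval."""
    return 100.0 * cycles_per_interrupt / interval_cycles


for tool, cycles in CYCLES_PER_INTERRUPT.items():
    estimate = estimated_overhead_pct(cycles, 60_000)
    measured = MEASURED_AT_60K[tool]
    print(f"{tool:<10} estimated {estimate:5.2f}%  measured {measured:5.2f}%")
```

The estimate is close for the HPCPI rows and HP Caliper at 60K and further off for OProfile, so it is best treated as a sanity check rather than a prediction.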

Page 85: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Feature: Can sample more than one event

• Useful for deriving metrics at image, routine or loop level
• So can OProfile and Vtune, but not yet Caliper
• IPC
− CPU_CYCLES
− NOPS_RETIRED
− PREDICATE_SQUASHED_RETIRED
− IA64_INST_RETIRED
• HPC
− CPU_CYCLES
− BACK_END_BUBBLE.ALL
− DATA_EAR_EVENTS.CACHE_MISS.GE64
− BUS_MEMORY.ALL.SELF
• Server
− CPU_CYCLES
− BACK_END_BUBBLE.ALL
− DATA_EAR_EVENTS.CACHE_MISS.GE64
− IA64_INST_RETIRED

Page 86: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Example: IPC for ‘mb_pi’

• Collect and report:

% hpcpid -events IPC
% ./mb_pi.O.exe -iters 100
% ./mb_pi.g.exe -iters 100
% hpcpictl flush
% hpcpiprof ./mb_pi.O.exe ./mb_pi.g.exe

Event Name                      Events Period Samples
-------------------------- ----------- ------ -------
CPU_CYCLES                 66372780000  60000 1106213
NOPS_RETIRED               25517160000  60000  425286
PREDICATE_SQUASHED_RETIRED   890844000   6000  148474
IA64_INST_RETIRED          61668360000  60000 1027806

                                  PREDICATE_
                        NOPS_     SQUASHED_  IA64_INST_
CPU_CYCLES     %   cum% RETIRED   RETIRED    RETIRED    procedure       image
---------- ----- ------ --------- ---------- ---------- --------------- -----------
  59160e06 89.1%  89.1%  17316e06  438846e03   45712e06 mandel_val      mb_pi.g.exe
   6236e06  9.4%  98.5%   7740e06  451974e03   14642e06 mandel_val      mb_pi.O.exe
 976140e03  1.5% 100.0% 460620e03      24000    1312e06 mb_fill_in_data mb_pi.g.exe
     60000  0.0% 100.0%         0          0          0 __divdi3        mb_pi.O.exe
     60000  0.0% 100.0%         0          0          0 main            mb_pi.g.exe
         0  0.0% 100.0%         0          0      60000 __divdi3        mb_pi.g.exe

• Compute IPC of mandel_val
− In .g.: 45712e06 / 59160e06 = 0.773
− In .O.: 14642e06 / 6236e06 = 2.348
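The IPC figures above are simply IA64_INST_RETIRED divided by CPU_CYCLES for the same procedure. A short Python sketch that reproduces the arithmetic from the mandel_val rows of the report (the dictionary layout is only an illustration):

```python
# Event counts for mandel_val, copied from the hpcpiprof report.
MANDEL_VAL = {
    "mb_pi.g.exe": {"CPU_CYCLES": 59160e6, "IA64_INST_RETIRED": 45712e6},
    "mb_pi.O.exe": {"CPU_CYCLES": 6236e6, "IA64_INST_RETIRED": 14642e6},
}


def ipc(counts):
    """Instructions per cycle: retired instructions over CPU cycles."""
    return counts["IA64_INST_RETIRED"] / counts["CPU_CYCLES"]


for image, counts in MANDEL_VAL.items():
    print(f"{image}: IPC = {ipc(counts):.3f}")
```

The same ratio can be taken at image or loop level, since hpcpiprof and hpcpilist report both events side by side.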

Page 87: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Multiplex arbitrary events

• Typical stall chase:
− CPU_CYCLES
− BACK_END_BUBBLE.ALL
  • BE_FLUSH_BUBBLE
  • BE_L1D_FPU_BUBBLE.ALL
    − BE_L1D_FPU_BUBBLE.FPU
    − BE_L1D_FPU_BUBBLE.L1D
      • … variety of L1D causes …
  • BE_EXE_BUBBLE.ALL
    − BE_EXE_BUBBLE.GRALL
      • … variety of cache events …
    − BE_EXE_BUBBLE.GRGR
    − BE_EXE_BUBBLE.FRALL
  • BE_RSE_BUBBLE
  • BACK_END_BUBBLE.FE
• Why not just do them all?
− And more!
• Unique to HPCPI

Page 88: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

HelpMe event set on bench12

• Setup:

% hpcpid -events HelpMe

• Then one run per table size, followed by post-processing hpcpiprof output:

All values are in things/update.
item ltabsize: 17 18 19 20 21 22 ... 26 27 28 29
usecs: 0.018 0.023 0.072 0.138 0.173 0.190 ... 0.209 0.216 0.254 0.322
CPU_CYCLES 23.832 29.923 92.389 177.427 221.993 245.242 ... 270.756 280.362 328.287 415.196
IA64_INST_RETIRED 17.498 17.543 17.766 18.074 18.353 18.272 ... 19.394 22.735 39.640 58.850
NOPS_RETIRED 4.009 4.007 4.039 4.115 4.161 4.103 ... 4.301 5.464 11.261 17.844
PREDICATE_SQUASHED_RETIRED 0.500 0.502 0.501 0.508 0.520 0.506 ... 0.538 1.008 3.266 5.827
LOADS_RETIRED 2.006 2.044 2.557 2.801 2.964 2.976 ... 3.106 3.349 4.614 6.052
STORES_RETIRED 1.005 1.001 1.000 1.023 1.093 1.116 ... 1.435 1.412 1.406 1.418
BACK_END_BUBBLE.ALL 16.813 22.878 84.729 169.341 214.881 237.803 ... 262.682 270.515 308.295 383.796
BE_FLUSH_BUBBLE.ALL 0.002 0.004 0.022 0.041 0.057 0.048 ... 0.106 0.996 5.264 10.109
BE_FLUSH_BUBBLE.BRU 0.001 0.002 0.008 0.013 0.022 0.014 ... 0.025 0.073 0.326 0.616
BE_FLUSH_BUBBLE.XPN 0.001 0.002 0.013 0.021 0.034 0.033 ... 0.079 0.926 4.955 9.518
BACK_END_BUBBLE.L1D_FPU_RSE 5.531 6.586 14.576 18.667 21.422 23.595 ... 29.689 33.685 47.813 61.156
BE_L1D_FPU_BUBBLE.ALL 5.523 6.580 14.611 18.751 21.479 23.495 ... 29.602 33.344 46.805 59.367
BE_L1D_FPU_BUBBLE.L1D 5.523 6.580 14.611 18.748 21.475 23.495 ... 29.602 33.344 46.806 59.367
BE_L1D_FPU_BUBBLE.L1D_HPW 0.000 0.972 10.814 16.211 19.590 21.876 ... 27.523 31.026 43.208 54.326
BE_L1D_FPU_BUBBLE.L1D_TLB 0.893 0.907 0.454 0.235 0.127 0.071 ... 0.021 0.072 0.324 0.613
BE_L1D_FPU_BUBBLE.L1D_DCURECIR 3.654 4.665 12.911 17.330 20.228 22.353 ... 28.119 31.986 45.634 58.432
BE_L1D_FPU_BUBBLE.FPU 0.000 0.000 0.000 0.001 0.000 0.000 ... 0.000 0.000 0.000 0.000
BE_EXE_BUBBLE.ALL 11.276 16.281 69.944 150.406 193.157 213.963 ... 232.511 235.165 252.980 308.444
BE_EXE_BUBBLE.FRALL 0.000 0.000 0.006 0.000 0.012 0.012 ... 0.012 0.015 0.017 0.019
BE_EXE_BUBBLE.GRALL 11.294 16.296 69.990 150.697 193.215 214.044 ... 232.575 235.128 252.504 307.724
BE_EXE_BUBBLE.GRGR 0.000 0.000 0.000 0.000 0.000 0.000 ... 0.000 0.001 0.002 0.003
BE_EXE_BUBBLE.ARCR_PR_CANCEL_BANK 0.003 0.005 0.017 0.029 0.037 0.036 ... 0.054 0.148 0.545 0.775
BE_RSE_BUBBLE.ALL 0.001 0.001 0.003 0.006 0.010 0.007 ... 0.011 0.011 0.012 0.015
BACK_END_BUBBLE.FE 0.004 0.007 0.084 0.113 0.129 0.114 ... 0.261 0.552 2.075 3.836
DATA_REFERENCES_SET0 3.003 3.005 3.069 3.138 3.150 3.150 ... 3.510 3.859 5.636 7.655
L1DTLB_TRANSFER 0.892 0.908 0.456 0.238 0.126 0.072 ... 0.020 0.071 0.323 0.610
L2DTLB_MISSES 0.000 0.042 0.528 0.761 0.875 0.935 ... 0.995 0.997 0.997 0.997
DTLB_INSERTS_HPW 0.000 0.042 0.524 0.762 0.874 0.935 ... 0.991 0.941 0.690 0.405
HPW_DATA_REFERENCES 0.000 0.042 0.525 0.764 0.875 0.935 ... 0.994 0.996 0.997 0.997
L1D_READS_SET0 2.002 2.001 2.046 2.057 2.092 2.046 ... 2.104 2.451 4.216 6.231
L1D_READ_MISSES.ALL 1.080 1.083 1.088 1.081 1.090 1.062 ... 1.089 1.395 2.917 4.484
L2_REFERENCES 2.082 2.130 2.646 2.935 3.057 3.119 ... 3.546 3.716 4.741 5.745
L2_MISSES 0.756 0.900 1.010 1.112 1.212 1.352 ... 1.988 1.991 1.984 2.004
L3_REFERENCES 1.506 1.773 1.946 2.070 2.184 2.556 ... 2.991 2.996 2.989 3.009
L3_MISSES 0.000 0.018 0.270 0.635 0.816 0.905 ... 1.029 1.051 1.170 1.451
BUS_MEMORY.ALL.SELF 0.000 0.036 0.527 1.245 1.597 1.800 ... 2.010 2.043 2.164 2.446
DATA_EAR_EVENTS.CACHE_MISS.GE4 0.135 0.135 0.136 0.136 0.136 0.132 ... 0.136 0.167 0.325 0.483
DATA_EAR_EVENTS.CACHE_MISS.GE64 0.000 0.002 0.033 0.078 0.101 0.112 ... 0.124 0.125 0.130 0.156
CPU_CPL_CHANGES 0.000 0.000 0.000 0.000 0.000 0.001 ... 0.005 0.112 0.614 1.184
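A table like this is easiest to read through a couple of derived ratios. The Python sketch below takes three rows from the table and computes, per table size, the IPC and the share of cycles that are back-end bubbles. Only the columns printed on the slide are used (sizes 23 to 25 are elided there), and the choice of these two ratios is this example's, not something prescribed by HPCPI.

```python
# Rows copied from the table; values are per update.
LTABSIZE = [17, 18, 19, 20, 21, 22, 26, 27, 28, 29]
CPU_CYCLES = [23.832, 29.923, 92.389, 177.427, 221.993, 245.242,
              270.756, 280.362, 328.287, 415.196]
IA64_INST_RETIRED = [17.498, 17.543, 17.766, 18.074, 18.353, 18.272,
                     19.394, 22.735, 39.640, 58.850]
BACK_END_BUBBLE_ALL = [16.813, 22.878, 84.729, 169.341, 214.881, 237.803,
                       262.682, 270.515, 308.295, 383.796]

# Instructions per cycle and stalled share of cycles for each table size.
ipc = [inst / cyc for inst, cyc in zip(IA64_INST_RETIRED, CPU_CYCLES)]
stall_share = [bub / cyc for bub, cyc in zip(BACK_END_BUBBLE_ALL, CPU_CYCLES)]

for size, i, s in zip(LTABSIZE, ipc, stall_share):
    print(f"ltabsize {size}: IPC {i:.3f}, back-end bubbles {100 * s:.1f}% of cycles")
```

The instruction count per update stays nearly flat through size 22 while cycles grow roughly tenfold, so the drop in IPC comes from stalls; the BE_EXE_BUBBLE and cache rows in the table indicate which stalls.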

Page 89: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

The ‘label’ feature

• Partitions samples, usually based on process(es)
• See the man page for hpcpilabel
• DCPI classic label:
% hpcpictl label run1 a.out one 1 uno
% hpcpictl label run2 a.out two 2 dos
• Restrict to a script and its children:
% hpcpictl label specs -pgid this runSpec
• Snapshot a system-wide interval:
% hpcpictl label oneMinute -pid -1 -not sleep 60
• “Attach” to a process
% hpcpictl label attached -pid desiredPID sleep 99999
• Monitor the idle process on CPU 0 for 5 minutes:
% hpcpictl label pid0cpu0 -pid 0 -cpu 0 -and sleep 300
• Can be initiated and managed by programs
− Use popen() of hpcpictl with ‘-pgid this’ or ‘-pid parent’
• Don’t forget to hpcpictl flush
• Use ‘-label labelName’ with the analysis tools
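For the "initiated and managed by programs" case, a program mostly needs to assemble the same command lines shown above. The Python sketch below builds two of them as argument lists; `label_command` is a hypothetical helper, and the option spellings are taken from the examples on this slide rather than from the hpcpilabel man page.

```python
def label_command(label, selector, workload):
    """Assemble: hpcpictl label <label> <selector options> <command to run>."""
    return ["hpcpictl", "label", label, *selector, *workload]


# System-wide one-minute snapshot, as in the slide's example.
snapshot = label_command("oneMinute", ["-pid", "-1", "-not"], ["sleep", "60"])

# Restrict sampling to a script and its children (runSpec is the slide's example).
specs = label_command("specs", ["-pgid", "this"], ["runSpec"])

print(" ".join(snapshot))
print(" ".join(specs))
```

A caller would pass such a list to something like `subprocess.run(...)` and then run `hpcpictl flush` before analyzing with `-label`, as the last two bullets point out.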

Page 90: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

The ‘label’ feature -- example

• HPCPI labelling its own ‘flush’ activity:

% hpcpictl label daemonFlush -pid $pidOfDaemon hpcpictl flush
hpcpictl flush successful

% hpcpictl flush
hpcpictl flush successful

% hpcpiprof -label daemonFlush
Event Name    Events Period Samples
---------- --------- ------ -------
CPU_CYCLES 222130000   5000   44426

CPU_CYCLES     %   cum% image
---------- ----- ------ ------------------------------
 169635e03 76.4%  76.4% hpcpid.exe
  38255e03 17.2%  93.6% vmlinux-2.4.21-15.14hp.XCsmp
  11530e03  5.2%  98.8% libc-2.3.2.so
   1880e03  0.8%  99.6% libstdc++.so.5.0.3
    400000  0.2%  99.8% ecount.2.4.21-15.14hp.XCsmp.ko
    200000  0.1%  99.9% libpthread-0.60.so
    120000  0.1% 100.0% ipmi_kcs_drv.o
     70000  0.0% 100.0% ld-2.3.2.so
     30000  0.0% 100.0% scsi_mod.o
      5000  0.0% 100.0% ipmi_msghandler.o
      5000  0.0% 100.0% libgcc_s-3.2.3-20040414.so.1

Page 91: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

The ‘label’ feature – example (cont)

• Inside the kernel:

% hpcpiprof -label daemonFlush /boot/vmlinux-2.4.21-15.14hp.XCsmp | head -20
Event Name   Events Period Samples
---------- -------- ------ -------
CPU_CYCLES 38255000   5000    7651

CPU_CYCLES    %  cum% procedure                   image
---------- ---- ----- --------------------------- ----------------------------
   1865e03 4.9%  4.9% ext3_find_entry             vmlinux-2.4.21-15.14hp.XCsmp
   1735e03 4.5%  9.4% ext3_check_dir_entry        vmlinux-2.4.21-15.14hp.XCsmp
   1530e03 4.0% 13.4% clear_page                  vmlinux-2.4.21-15.14hp.XCsmp
   1405e03 3.7% 17.1% link_path_walk_it           vmlinux-2.4.21-15.14hp.XCsmp
   1225e03 3.2% 20.3% unlock_buffer               vmlinux-2.4.21-15.14hp.XCsmp
   1210e03 3.2% 23.4% memset                      vmlinux-2.4.21-15.14hp.XCsmp
   1170e03 3.1% 26.5% do_get_write_access         vmlinux-2.4.21-15.14hp.XCsmp
   1095e03 2.9% 29.4% ext3_add_entry              vmlinux-2.4.21-15.14hp.XCsmp
    975000 2.5% 31.9% journal_cancel_revoke       vmlinux-2.4.21-15.14hp.XCsmp
    910000 2.4% 34.3% get_hash_table              vmlinux-2.4.21-15.14hp.XCsmp
    900000 2.4% 36.6% d_lookup                    vmlinux-2.4.21-15.14hp.XCsmp
    860000 2.2% 38.9% journal_dirty_metadata      vmlinux-2.4.21-15.14hp.XCsmp
    860000 2.2% 41.1% journal_add_journal_head    vmlinux-2.4.21-15.14hp.XCsmp
    825000 2.2% 43.3% ext3_do_update_inode        vmlinux-2.4.21-15.14hp.XCsmp

Page 92: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

The ‘label’ feature – example (cont)

• In libc:

% hpcpiprof -label daemonFlush /lib/tls/libc-2.3.2.so | head -20
Event Name   Events Period Samples
---------- -------- ------ -------
CPU_CYCLES 11530000   5000    2306

CPU_CYCLES     %   cum% procedure                       image
---------- ----- ------ ------------------------------- -------------
   2165e03 18.8%  18.8% __GI_memset                     libc-2.3.2.so
   1470e03 12.7%  31.5% _IO_vfprintf_internal           libc-2.3.2.so
    815000  7.1%  38.6% _IO_fwrite_internal             libc-2.3.2.so
    585000  5.1%  43.7% _IO_new_file_xsputn             libc-2.3.2.so
    380000  3.3%  47.0% _int_malloc                     libc-2.3.2.so
    365000  3.2%  50.1% __GI_getenv                     libc-2.3.2.so
    290000  2.5%  52.6% __GI_strftime                   libc-2.3.2.so
    285000  2.5%  55.1% __find_specmb                   libc-2.3.2.so
    285000  2.5%  57.6% __GC___libc_write               libc-2.3.2.so
    265000  2.3%  59.9% _wordcopy_fwd_aligned           libc-2.3.2.so
    260000  2.3%  62.1% __GI_strlen                     libc-2.3.2.so
    255000  2.2%  64.4% _IO_default_xsputn_internal     libc-2.3.2.so
    240000  2.1%  66.4% __tzfile_compute                libc-2.3.2.so
    235000  2.0%  68.5% _IO_str_overflow_internal       libc-2.3.2.so

Page 93: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Attention to accuracy

• Wrote micro-benchmarks with known behavior
• Eliminated post-unfreeze-pre-RFI event leaks
− Micro-benchmark has no NOPS nor any predicate-squashed instructions

Case | IPC (actual: 3) | NOPS per sample | pred-squashed per sample
problematic | 2.464 | 67.959 | 1.00
corrected | 2.917 | 0.1998 | 0.009

• Determined event-based multiplexing better than time-based
− Micro-benchmark has known (high) IPC

Interval | Actual IPC | Non-muxed | Time-muxed | Modeled error | Event-muxed
40K | 5.918 | 5.893(-0.37%) | 5.586(-5.61%) | -6.81% | 5.896(-0.37%)
20K | 5.918 | 5.883(-0.60%) | 5.295(-10.53%) | -12.50% | 5.883(-0.59%)
10K | 5.918 | 5.858(-1.02%) | 4.874(-17.65%) | -21.43% | 5.588(-1.03%)
5K | 5.918 | 5.807(-1.87%) | 4.247(-28.24%) | -33.33% | 5.509(-1.85%)

Page 94: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Xtools

• Pair of visualization tools
• Separable and cooperative with HPCPI
• Xclus
− Cluster-wide monitoring
− Utilizations: CPU, FSB and MID bus I/O
• Xperf
− Single-node monitoring
− Graphs of derived events based on hardware counters
  • CPU utilization, IPC, cycle accounting, cache penalties, I/O activity, etc

Page 95: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Basic structure of a system

For icon-design of xclus:
− Processors
− Front-Side Bus (FSB)
− (I/O Memory controller)
− I/O ropes
− MID bus
− Memory

[Diagram labels: memory; processor; I/O Memory controller; FSB; I/O ropes; MID]

Page 96: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Xtools

Page 97: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

xclus screenshot

Page 98: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

xperf screenshot

Page 99: Technology for better business outcomes © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without

Availability

• Available under an evaluation license
• Contact [email protected]