Technische Universität München
Recent Advances in Periscope for
Performance Analysis and Tuning
Isaias Compres, Michael Firbach, Michael Gerndt
Robert Mijakovic, Yury Oleynik, Ventsislav Petkov
Technische Universität München
Yury Oleynik, [email protected]
Outline
• Periscope overview
• Advances in Periscope Development
I. PAThWay
II. Performance Dynamics Analysis with Periscope
III. Periscope Tuning Framework
30.08.2013 Yury Oleynik, [email protected] 2
Projects
• LMAC – Leistungsdynamik massiv-paralleler Codes (Performance Dynamics of Massively Parallel Codes)
– BMBF project
• AutoTune – Automatic Online Tuning
– European Union FP7 project
Periscope overview
• Distributed Architecture
– Analysis performed by multiple distributed hierarchical agents
• Iterative Online Analysis
– Measurements are configured, obtained and evaluated on the fly
• Automatic Analysis
– Based on formalized knowledge of performance optimization experts
• Eclipse Integration
– Eclipse-based integrated development and performance analysis environment
• Measurement and Instrumentation
– Score-P or MRIMonitor
Advances in Periscope Development
• Performance Dynamics
– Cross-experiment performance dynamics: provide a tool for automating and organizing performance experiments during the optimization process
– Runtime performance dynamics: automatically search for runtime performance dynamics properties
• Performance Tuning
– Perform an automatic search for the application configuration that delivers the best performance according to a given objective
PATHWAY
I. Cross-experiment performance dynamics
Problem statement – Performance Engineering
• Performance engineering is an iterative cycle
– Requires in-depth knowledge of hardware and software
– Each step may involve many tools and different configurations
– Repetitive and manual
• Optimization spans months
– Hard to organize data and results
– No clear track of process evolution
• Examples
– Scalability analysis
– Cross-platform analysis
[Figure: performance engineering cycle: establish/update baseline → execute parallel application → monitor performance → analyze bottlenecks → optimize problematic code sections → verify]
PAThWay
• Eclipse plug-in for structured and methodical
performance engineering using workflows
• Goals:
– Manage individual tasks as part of one workflow
– Automate performance engineering tasks, where possible
– Keep track and organize the process
– Abstract complexity of the underlying software and hardware
Workflow Editor
[Figure: workflow editor screenshot; callouts: workflow editor, available workflow components]
Experiment Browser
[Figure: experiment browser screenshot; callouts: experiments view; experiment meta-data; the database also stores properties of the tools; standard output and environment configuration]
Project Documentation
• Accessible documentation is important
– Requirements
– Work progress
– Optimization ideas
• Documentation is commonly spread across multiple documents
• Wiki-based editor
– Completed experiments
– Links to other external resources
– Other wiki pages
Supportive Modules
• Parallel Tools Platform Module
– Starting interactive/batch jobs
– Monitoring execution and accessing data
• Code Management
– Keeps snapshots of the sources
– Based on Git
• Environment Detection
– Detects loaded modules
– Copies defined environment variables
– ...
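The environment detection step can be illustrated with a minimal sketch. It assumes the system uses Environment Modules, which export the loaded modules as a colon-separated list in the LOADEDMODULES variable. The function names are hypothetical and this is not PAThWay's own code.

```python
import os


def detect_loaded_modules(env=None):
    """Return the loaded modules listed in LOADEDMODULES (colon-separated)."""
    env = os.environ if env is None else env
    raw = env.get("LOADEDMODULES", "")
    # Drop empty entries, e.g. when the variable is unset or ends with ':'.
    return [name for name in raw.split(":") if name]


def snapshot_environment(names, env=None):
    """Copy the requested environment variables that are actually defined."""
    env = os.environ if env is None else env
    return {name: env[name] for name in names if name in env}


if __name__ == "__main__":
    print(detect_loaded_modules())
    print(snapshot_environment(["PATH", "OMP_NUM_THREADS"]))
```

Storing both results next to each experiment would make it possible to reproduce the run configuration later.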
PAThWay
• Available as an Eclipse plug-in from the update site: http://periscope.in.tum.de/pathway/eclipse/
• Installation guide: http://periscope.in.tum.de/pathway/
AUTOMATIC PERFORMANCE
DYNAMICS ANALYSIS WITH
PERISCOPE
II. Performance Dynamics: at runtime
Automatic Performance Dynamics Analysis with
Periscope
• Motivation for Performance Dynamics Analysis
– Location and severity of performance bottlenecks are time-dependent
– Performance changes manifest themselves at various time scales
– Dimensionality of performance measurements makes manual investigation by the user tedious
• Analysis goals:
– Automatically detect changes in temporal performance behavior
– Quantify the negative impact of performance changes
– Reduce complexity and size of time-dependent measurements
– Simplify comprehension (no graphical visualization required)
– Group entities with similar temporal performance behavior
Automatic Performance Dynamics Analysis with
Periscope
• Helps answer the following typical questions:
– Does the performance degrade over time?
– When is the degradation observed?
– What is the impact of the particular change?
– Which process/location is impacted by the performance degradation?
– Are there similar degradations found in other processes or functions?
• Approach
– Multi-scale analysis
– Qualitative abstraction of time series
• with quantitative information sufficient to characterize impact
– Representation mimics human “mental model” of temporal behavior
– Automatic search for performance dynamics properties
Automatic Performance Dynamics Analysis with
Periscope: Analysis Steps
1. Measurement
a) Collect dynamic profile time series using Score-P
2. Preprocessing
a) Perform scale-space filtering by smoothing with a Gaussian kernel at multiple scales
b) Extract extrema and inflection points
3. Qualitative Abstraction
a) Track extrema and inflection points from coarse to fine scales
b) Label intervals between extrema and inflection points
c) Extract the maximum “lifetime” level of the resulting tree of intervals
4. Search for performance dynamics properties
a) Search the maximum “lifetime” level for predefined patterns, both qualitatively and quantitatively
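The preprocessing step can be sketched in a few lines. This is a minimal illustration and not Periscope's implementation: the function names, the kernel truncation at three standard deviations, the border clamping, and the finite-difference detection of extrema and inflection points are all assumptions made for the example.

```python
import math


def gaussian_kernel(sigma):
    """Sampled, normalized Gaussian truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(series, sigma):
    """Convolve the series with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(series)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * series[j]
        out.append(acc)
    return out


def zero_crossings(values, eps=1e-9):
    """Indices where the sign flips; near-zero values are skipped."""
    crossings = []
    last_sign = 0
    for i, v in enumerate(values):
        sign = (v > eps) - (v < -eps)
        if sign == 0:
            continue
        if last_sign != 0 and sign != last_sign:
            crossings.append(i)
        last_sign = sign
    return crossings


def critical_points(series):
    """Return (extrema, inflections) as approximate sample indices."""
    first = [b - a for a, b in zip(series, series[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return zero_crossings(first), zero_crossings(second)


def scale_space(series, sigmas):
    """Critical points of the series smoothed at each scale."""
    return {s: critical_points(smooth(series, s)) for s in sigmas}


if __name__ == "__main__":
    # Synthetic bump standing in for a measured time series.
    bump = [math.exp(-((t - 50) / 10.0) ** 2) for t in range(101)]
    for sigma, (extrema, inflections) in scale_space(bump, [1.0, 4.0]).items():
        print(sigma, extrema, inflections)
```

Tracking how these points move and merge as the scale decreases is what yields the tree of intervals used in the qualitative abstraction step.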
[Figure: tree of intervals tracked across scales, each interval labeled with one of the shapes below; resulting label sequence: DABCBCDABCDABCDABCDABC]
A - concave increase
B - concave decrease
C - convex decrease
D - convex increase
E - linear increase
F - linear decrease
G - constant
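One way to read the label alphabet is as the combination of the signs of the first and second derivative on an interval. The sketch below assumes the usual convention that "concave" means negative curvature and "convex" means positive curvature; the function name and the tolerance are hypothetical.

```python
import math


def label_interval(slope, curvature, eps=1e-9):
    """Map the signs of slope and curvature to the A-G alphabet."""
    rising = slope > eps
    falling = slope < -eps
    if not rising and not falling:
        return "G"                       # constant
    if abs(curvature) <= eps:
        return "E" if rising else "F"    # linear increase / decrease
    if rising:
        return "A" if curvature < 0 else "D"   # concave / convex increase
    return "B" if curvature < 0 else "C"       # concave / convex decrease


if __name__ == "__main__":
    # One period of sin(x), sampled in the middle of each quarter:
    # slope is cos(x), curvature is -sin(x).
    quarters = [math.pi / 4 + k * math.pi / 2 for k in range(4)]
    print("".join(label_interval(math.cos(x), -math.sin(x)) for x in quarters))
```

Under this reading, one full oscillation starting at a rising inflection point is labeled ABCD, so a peak sits between A and B and a valley between C and D.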
Automatic Performance Dynamics Analysis with
Periscope: Search for dynamics properties
Search for dynamic properties:
• Find all peaks (AB): DABCBCDABCDABCDABCDABC
• Find the most “prominent” valley (CD): DABCBCDABCDABCDABCDABC
• Find the highest increase (DA): DABCBCDABCDABCDABCDABC
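The qualitative part of this search amounts to substring matching on the label sequence. The sketch below is an illustration only: the helper names are hypothetical, and the per-interval magnitudes used to rank the increases are made-up numbers, since the slide does not give the quantitative data.

```python
def find_pattern(labels, pattern):
    """Start indices of all occurrences of pattern in the label string."""
    hits = []
    start = labels.find(pattern)
    while start != -1:
        hits.append(start)
        start = labels.find(pattern, start + 1)
    return hits


def highest_increase(labels, deltas):
    """Among all 'DA' hits, pick the one with the largest total change.

    deltas[i] is the absolute change of the metric over interval i
    (hypothetical quantitative information attached to each label).
    """
    hits = find_pattern(labels, "DA")
    return max(hits, key=lambda i: deltas[i] + deltas[i + 1])


SEQUENCE = "DABCBCDABCDABCDABCDABC"

if __name__ == "__main__":
    print("peaks (AB):", find_pattern(SEQUENCE, "AB"))
    print("valleys (CD):", find_pattern(SEQUENCE, "CD"))
    print("increases (DA):", find_pattern(SEQUENCE, "DA"))
    deltas = [1.0] * len(SEQUENCE)
    deltas[10] = 3.0  # made-up: interval 10 rises more than the others
    print("highest increase starts at interval", highest_increase(SEQUENCE, deltas))
```

For the sequence on the slide this finds five peaks, four valleys and five increases. Ranking valleys by prominence would work the same way with a suitable magnitude per interval.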
PERISCOPE TUNING
FRAMEWORK
III. Performance tuning
Periscope Tuning Framework
• Goals:
– Tune codes to improve performance and energy efficiency
– Combine analysis and tuning to speed up the tuning process
– Support multicore and GPU-accelerated parallel systems
• Idea:
– Automatically evaluate the optimization space
– Produce a tuning recommendation
– Use it to improve production runs
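The idea of evaluating an optimization space and producing a recommendation can be sketched as an exhaustive search. This is a toy illustration, not PTF code: the tuning parameters, their values and the cost model are invented, and in a real setting the objective would come from measurements of the running application.

```python
import itertools


def exhaustive_search(tuning_space, objective):
    """Evaluate every configuration and return the one with the lowest objective."""
    names = sorted(tuning_space)
    best_config, best_value = None, float("inf")
    for values in itertools.product(*(tuning_space[n] for n in names)):
        config = dict(zip(names, values))
        value = objective(config)
        if value < best_value:
            best_config, best_value = config, value
    return best_config, best_value


def toy_runtime(config):
    """Made-up cost model standing in for a measured execution time."""
    threads, block = config["threads"], config["block"]
    return 100.0 / threads + 2.0 * threads + abs(block - 32) / 8.0


if __name__ == "__main__":
    space = {"threads": [1, 2, 4, 8], "block": [16, 32, 64]}
    recommendation, runtime = exhaustive_search(space, toy_runtime)
    print(recommendation, runtime)
```

Exhaustive search grows multiplicatively with the number of tuning points, which is one reason to use analysis results to narrow the space before searching.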
PTF: Approach
• Define tuning strategies combining performance analysis infrastructure and tuning plugins
• Measured performance and energy properties are used in plugins to navigate the search for the optimal configuration
• Available tuning plugins focus on:
– Tuning of High-Level Patterns for GPGPU
– Tuning of HMPP Codelets
– Tuning of Energy Consumption via CPU frequency
– Tuning of Master-Worker Pattern in MPI
– Tuning of MPI Runtime
– Tuning of Compiler Flag Selection
Tuning of High-Level Patterns for GPGPU
• Target applications
– Applications implemented in the pipeline patterns framework (developed in the PEPPHER project)
• Tuning objective
– Optimize throughput of the pipeline
• Tuning points and tuning actions
– Replication factors of individual stages
– Buffer sizes of input and output ports of individual stages
– Splitting and merging of the stages
Tuning of HMPP Codelets
• Target applications
– OpenHMPP-annotated applications
– To be run on heterogeneous many-core architectures
• Tuning objective
– Optimize HMPP codelet performance
• Tuning points and tuning actions
– Static codelet tuning points:
• operations, transformations and algorithms used to implement a codelet, e.g. unrolling factor, the HMPP grid size
– Dynamic codelet tuning points:
• variables or callbacks available at runtime
Tuning of Energy Consumption via CPU Frequency
• Target applications
– Any application running on the thin-node islands of SuperMUC
• Tuning objective
– Minimize energy consumption of an application
• Tuning points and tuning actions
– Available governors or direct frequency settings
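Why the energy-optimal frequency can differ from the fastest one is easiest to see with a small model. The sketch below is purely illustrative: the runtime and power formulas and all constants are assumptions chosen for the example, not SuperMUC measurements, and governors are not modeled.

```python
def runtime_s(freq_ghz, compute_s_at_1ghz=60.0, memory_s=40.0):
    """Toy runtime: the compute part scales with frequency, the memory part does not."""
    return compute_s_at_1ghz / freq_ghz + memory_s


def power_w(freq_ghz, static_w=100.0, dynamic_coeff=5.0):
    """Toy power: a static part plus a dynamic part growing with the cube of frequency."""
    return static_w + dynamic_coeff * freq_ghz ** 3


def energy_j(freq_ghz):
    """Energy to solution = average power * runtime."""
    return power_w(freq_ghz) * runtime_s(freq_ghz)


def best_frequency(frequencies):
    """Pick the frequency setting with the lowest modeled energy."""
    return min(frequencies, key=energy_j)


if __name__ == "__main__":
    candidates = [1.2, 1.6, 2.0, 2.4, 2.7]  # hypothetical settings in GHz
    for f in candidates:
        print(f, round(runtime_s(f), 1), round(energy_j(f), 1))
    print("energy-optimal:", best_frequency(candidates))
    print("time-optimal:", min(candidates, key=runtime_s))
```

In this model the highest frequency gives the shortest runtime, while a lower setting minimizes energy because the memory-bound part of the run does not speed up.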
Tuning of the Master-Worker Pattern in MPI
• Target applications
– Applications implemented with the Master-Worker pattern
• Tuning objective
– Improve load balancing
• Tuning points and tuning actions
– Partition factor
– Number of workers
Tuning of MPI Runtime
• Target application
– Currently, parallel applications built with IBM MPI
• Tuning objective
– Optimize performance
• Tuning points and tuning actions
– MPI environment parameters
• MPI application mapping
– adapting tasks per node/core, adapting the affinity of the processes
• MPI communication buffer/protocol
– adapting the sending/receiving buffer
– analyzing the size pattern of the messages
– adapting the communication protocol (eager/rendezvous)
– code variants for MPI communication
Tuning of Compiler Flag Selection
• Target applications
– Any application
• Tuning objective
– Reduce the execution time of the application’s phase region
• Tuning points and tuning actions
– Individual compiler flags
– Switching compiler switches ON or OFF during recompilation
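Switching individual flags ON or OFF can be sketched as a one-at-a-time search. This is an assumed strategy for illustration, not necessarily the one the PTF plugin uses: the flag list is an arbitrary selection of GCC options, and toy_measure with its per-flag effects is a made-up stand-in for recompiling and timing the phase region.

```python
def greedy_flag_selection(flags, measure):
    """Switch flags ON one at a time, keeping each only if the measured time improves."""
    selected = []
    best_time = measure(selected)
    for flag in flags:
        trial = selected + [flag]
        trial_time = measure(trial)
        if trial_time < best_time:
            selected, best_time = trial, trial_time
    return selected, best_time


def toy_measure(flags):
    """Made-up execution time in seconds; real use would rebuild and run the code."""
    effects = {
        "-funroll-loops": -0.5,
        "-ftree-vectorize": -1.5,
        "-fomit-frame-pointer": 0.2,
    }
    return 10.0 + sum(effects[f] for f in flags)


if __name__ == "__main__":
    candidates = ["-funroll-loops", "-ftree-vectorize", "-fomit-frame-pointer"]
    chosen, seconds = greedy_flag_selection(candidates, toy_measure)
    print(" ".join(chosen), seconds)
```

A greedy pass needs only one measurement per flag but ignores interactions between flags. An exhaustive search over all ON/OFF combinations avoids that at exponential cost.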
Thank you!
• Questions?