multidimensional performance modeling for advanced embedded signal processors michael stebnisky carl...
TRANSCRIPT
Multidimensional Performance Modeling Multidimensional Performance Modeling for Advanced Embedded Signal for Advanced Embedded Signal
ProcessorsProcessors
Michael StebniskyMichael StebniskyCarl HeinCarl Hein
Lockheed Martin Advanced Technology LaboratoriesLockheed Martin Advanced Technology Laboratories1 Federal Street • A&E Building 2W1 Federal Street • A&E Building 2W
Camden, New Jersey 08102Camden, New Jersey [email protected]@atl.lmco.com
Systems IntegrationSystems Integration
22
Multidimensional Performance ModelingMultidimensional Performance Modeling
• Problem: Problem: — Traditional performance modeling approaches \are unable to address Traditional performance modeling approaches \are unable to address
emerging requirements and component technologies. This is a result emerging requirements and component technologies. This is a result of an increased awareness and need for dynamically adaptive or of an increased awareness and need for dynamically adaptive or reconfigurable systems, particularly in the area of power reconfigurable systems, particularly in the area of power dissipation/performance.dissipation/performance.
• Goal(s)/Objectives(s)Goal(s)/Objectives(s)— Define methods/algorithms to acurately model and optimize Define methods/algorithms to acurately model and optimize
reconfigurable architectures and functions (services) required to reconfigurable architectures and functions (services) required to support multidimensional performance modeling.support multidimensional performance modeling.
— Apply ideas developed from InfoPad, ACS, PAC/C, DARES, PCA, and Apply ideas developed from InfoPad, ACS, PAC/C, DARES, PCA, and MSP to develop a uniqe new rapid prototyping/optimization capability.MSP to develop a uniqe new rapid prototyping/optimization capability.
• ApproachApproach— Define features required to support accurate performance and Define features required to support accurate performance and
multidimensional modeling and optimization of DRAs. multidimensional modeling and optimization of DRAs. — Evaluate algorithms/methods for performing intelligent, reactive Evaluate algorithms/methods for performing intelligent, reactive
dynamic scheduling.dynamic scheduling.— Evaluate algorithms/methods for performing offline analysis, data Evaluate algorithms/methods for performing offline analysis, data
reduction, pattern recognition, and execution planning. reduction, pattern recognition, and execution planning.
DoD missions/systems require new approaches/ tools DoD missions/systems require new approaches/ tools to exploit emerging to exploit emerging reconfigurable technologies to form polymorphous/power reconfigurable technologies to form polymorphous/power aware systems.aware systems.
33
A Methodology for Verifiable, Validatable Polymorphic Architectures
DRA Optimization Process Flow
RuntimeSystemModel
Parameterized Resource
Executables
Offline Optimization:
Speed
Latency Power Energy
QoS ...
DRAObjectLibrary Runtime
Optim-ization: Speed
Latency Power Energy
QoS ...
ApplicationFlowgraphs
PreprocessedSchedulingInformation
Schedule/Component
Analysis/Feedback to Optimization
Characteristics
Est
imat
es
Lo
cal
Des
ign
s
CO
TS
Testcase/ Scenario Testbench
Dynamically Reconfigurable ArchitecturesDynamically Reconfigurable Architectures
• Based upon a two phase Based upon a two phase optimization process - offline optimization process - offline and onlineand online
• Offline optimizationOffline optimization— Overall goal: perform Overall goal: perform
component selection, pruning, component selection, pruning, data extraction and sensitivity data extraction and sensitivity analysis that will maximize the analysis that will maximize the effectiveness of the online effectiveness of the online optimizationoptimization
— Selects an optimum set of Selects an optimum set of verified, validated components verified, validated components from existing librariesfrom existing libraries
— Facilitates analysis to identify Facilitates analysis to identify potential implementations more potential implementations more optimal to the applicationoptimal to the application
— Identifies dependencies/relationships Identifies dependencies/relationships in the data flow graph that will facilitate online schedulingin the data flow graph that will facilitate online scheduling
• Online optimizationOnline optimization— Components are selected for execution at runtime based upon dynamic figures of meritComponents are selected for execution at runtime based upon dynamic figures of merit— Figures of merit are complex functions of component characteristicsFigures of merit are complex functions of component characteristics
44
Key Aspects of ApproachKey Aspects of Approach• Dealing with overwhelming flexibilityDealing with overwhelming flexibility
— Management of complexity and Management of complexity and methods of abstraction/simplificationmethods of abstraction/simplification
• Two stage optimizationTwo stage optimization— Offline (component and subsystem) Offline (component and subsystem)
and online (scheduling)and online (scheduling)
• Methodology for generating/optimizing/ Methodology for generating/optimizing/ representing components and representing components and reconfigurable devices reconfigurable devices — Develop operating points, analyze, improve, and Develop operating points, analyze, improve, and
abstractabstract
• Dynamic scheduling with multidimensional Dynamic scheduling with multidimensional goals/constraintsgoals/constraints— ““Toolkit” approach using offline extracted informationToolkit” approach using offline extracted information
• Offline information extraction and optimization of schedulingOffline information extraction and optimization of scheduling— Analysis/characterization/abstraction of tasks/graphsAnalysis/characterization/abstraction of tasks/graphs
• Constraint/goal simplification using modesConstraint/goal simplification using modes— Abstraction method minimizing computation and enforcing interfaceAbstraction method minimizing computation and enforcing interfacess
• Reuse of existing capability to perform the required analysis and data visualizationReuse of existing capability to perform the required analysis and data visualization— Use the internally developed CSIM Use the internally developed CSIM
55
Graphical Hierarchical Hardware Architecture Definition
Graphical Hierarchical Software Architecture Definition
Produces Animated and Static Simulation Timeline Displays as well as processor, communication, and
memory statistical data
Modify
Modify
Model of Hardware
BehaviorsPE SE ME
BehaviorsPE SE ME
Behaviors
PE SE ME
Structure:Network Topology
Structure:Network Topology
SW Data Flow GraphDFG
FFT
FIR FFTVMULVMULVMUL
DFG
FFT
FIR FFTVMULVMULVMUL
Application SW for PEsBoard 1/ PE1 Recv 1024
Compute 3.2 us Send 512,2 Send 512,3 Recv 1024
Schedule
Map / Allocate
Partition
Analysis
HW / SWCo-SimulationVisualization &
Animation
Detailed information on CSIM can be found at: www.atl.lmco.com/proj/csim/Detailed information on CSIM can be found at: www.atl.lmco.com/proj/csim/
CSIM — CSIM — HW/SW Virtual Prototyping HW/SW Virtual Prototyping Capabilities and FeaturesCapabilities and Features
• CSIM has demonstrated the ability to CSIM has demonstrated the ability to model complex signal processor model complex signal processor network behavior and performancenetwork behavior and performance
• CSIM is a C-based, hierarchical CSIM is a C-based, hierarchical simulation environment for modeling simulation environment for modeling system, subsystem, and individual system, subsystem, and individual module/component HW/SW module/component HW/SW performanceperformance
• CSIM’s independent use of HW/SW CSIM’s independent use of HW/SW models provides a path for reducing models provides a path for reducing development costs as well as the life development costs as well as the life cycle costscycle costs
• CSIM provides a common environment CSIM provides a common environment for developing and verifying system, for developing and verifying system, subsystem, and module HW/SW subsystem, and module HW/SW interfaces and overall system interfaces and overall system performanceperformance
66
Design space resulting from evaluating a Design space resulting from evaluating a representative signal processing applicationrepresentative signal processing application
Automated Support to Offline Analysis/ Automated Support to Offline Analysis/ OptimizationOptimization
• Each task/component (i.e. Each task/component (i.e. fft, fir) is abstracted by fft, fir) is abstracted by several parameters (i.e. several parameters (i.e. throughput, latency, power,throughput, latency, power,size, etc.)size, etc.)
• Each task may have severalEach task may have severaldifferently optimized differently optimized implementationsimplementations
• An online library of An online library of implementations is selectedimplementations is selected
• The application(s) of interest The application(s) of interest are simulated using a subset are simulated using a subset of the available componentsof the available components
• The results of simulating The results of simulating each subset is characterized each subset is characterized with respect to the parameters of interestwith respect to the parameters of interest
• The characterizations are used to identify a subset of implementations that are most The characterizations are used to identify a subset of implementations that are most optimal across the parameters of interestoptimal across the parameters of interest
77
Processor Architecture and EnvironmentProcessor Architecture and Environment
This graphic illustrates a This graphic illustrates a representative processor representative processor architecture and an associated architecture and an associated CSIM visualization environment. CSIM visualization environment. The core CSIM capability has been The core CSIM capability has been augmented with a dynamic augmented with a dynamic scheduler that can apply complex scheduler that can apply complex selection criteria to the scheduling selection criteria to the scheduling process. In this example, these process. In this example, these complex selection criteria are complex selection criteria are encapsulated as several “modes”; encapsulated as several “modes”; the buttons on the lower right are the buttons on the lower right are used to interactively switch used to interactively switch between the modes. In addition, between the modes. In addition, any of the processors can be any of the processors can be “killed” or returned to the “killed” or returned to the simulation, facilitating the simulation, facilitating the evaluation of failover and similar evaluation of failover and similar analyses. Illustrative results are analyses. Illustrative results are shown in the following slides.shown in the following slides.
88
Power-driven scheduling; “LoPwr”
Complex criteria driven scheduling, “DDPSWC”
Assigned task scheduling, “Base”
Assigned speed scheduling, “Fast”
Max speed-scheduling, “Fastest”
Dynamically Reconfigurable ArchitecturesDynamically Reconfigurable Architectures• Real time on-demand Real time on-demand
dynamic allocation of dynamic allocation of computational resources computational resources
• Each task/component Each task/component (i.e. fft, fir) is abstracted (i.e. fft, fir) is abstracted by several parameters by several parameters (i.e. throughput, latency, (i.e. throughput, latency, power, size, etc.)power, size, etc.)
• Each task may have Each task may have several differently several differently optimized implementationsoptimized implementations
• An online library of An online library of implementations is selectedimplementations is selected
• At run time the At run time the implementation best implementation best matched to the current matched to the current selection criteria is executedselection criteria is executed
• For this example, the maximum For this example, the maximum throughput to minimum power throughput to minimum power dissipation ratio is improved >10X versus standard practicedissipation ratio is improved >10X versus standard practice
SWEPT improvement >10X vs standard practiceSWEPT improvement >10X vs standard practiceSWEPT improvement >10X vs standard practiceSWEPT improvement >10X vs standard practice
99
Processors faulted/
switched out
Processors returned to operation
Dynamically Reconfigurable ArchitecturesDynamically Reconfigurable Architectures
• Validation and verification of Validation and verification of next generation processor next generation processor architecturesarchitectures
• Any processor may be Any processor may be temporarily or permanently temporarily or permanently removed from the architectureremoved from the architecture
• Processors may be returned Processors may be returned to the architectureto the architecture
• Facilitates development and Facilitates development and analysis of failover/fault analysis of failover/fault adaptation methodologiesadaptation methodologies
• Extends multidimensional Extends multidimensional optimization to architectures optimization to architectures with intermittently inactive with intermittently inactive components (especially components (especially useful for throughput/ power useful for throughput/ power trades)trades)
1010
Other Applications of CSIMOther Applications of CSIM
ProgramProgram
VNSVNS
AMRFSAMRFS
Advanced Advanced Surface Surface Ship ModelShip Model
Avionics Avionics Mission Mission Computing Computing SystemSystem
Wireless Wireless Mobile Mobile NetworksNetworks
DescriptionDescription
Network attack Network attack simulator for training simulator for training IA operatorsIA operators
Agent-based resource Agent-based resource managementmanagement
Computing Computing infrastructure modelinfrastructure model
Model of JSF-Model of JSF-Integrated Core Integrated Core ProcessorProcessor
Dynamic ad hoc Dynamic ad hoc routing behaviorsrouting behaviors
CustomerCustomer
US Army US Army CECOMCECOM
ONRONR
LM NE&SS- LM NE&SS- MoorestownMoorestown
LM NE&SS- LM NE&SS- EganEgan
DARPA DARPA (XGComms) (XGComms) US Army US Army CECOM (WIN-T)CECOM (WIN-T)
Key TechnologyKey Technology
Token-based performance Token-based performance modeling, HLAmodeling, HLA
Intelligent agents, Intelligent agents, dynamic schedulerdynamic scheduler
Token-based performance Token-based performance modeling, pre-emptive modeling, pre-emptive multitasking CPU modelsmultitasking CPU models
Token-based performance Token-based performance modeling, hardware/ modeling, hardware/ software co-simulationsoftware co-simulation
Abstract propagation and Abstract propagation and arbitration models, arbitration models, dynamic routing dynamic routing algorithmsalgorithms
1111
Specification
Requirements
CSIM & MaC
Run-time Environment and Design Run-time Environment and Design Application for Polymorphous Technology Application for Polymorphous Technology Verification & Validation (RE-ADAPT V&V)Verification & Validation (RE-ADAPT V&V)
• ImpactImpact
— Design-Time Modeling and Design-Time Modeling and Simulation Environment for Simulation Environment for Verification and Validation (V&V) of PCA enabled applicationsVerification and Validation (V&V) of PCA enabled applications
— Run-Time Monitoring for PCA Enabled Application V&VRun-Time Monitoring for PCA Enabled Application V&V
— Approach Validation via application to PCA enabled avionics designApproach Validation via application to PCA enabled avionics design
• New IdeasNew Ideas
— Methodology for run-time Methodology for run-time detection of requirement detection of requirement violationsviolations
— Framework for automatic Framework for automatic generation of monitoring and generation of monitoring and checking componentschecking components
— Dynamic run-time corrector to Dynamic run-time corrector to force reconfiguration into a force reconfiguration into a safe statesafe state
Real-time Systems Group, University of Pennsylvania
1212
Stream Processors indicated by filled boxes
GP Processors indicated by outlined boxes
Dynamic bar chart indicating total active processors, active stream processor active GP processors and active threaded processors
Total active processor count display
MaCS messages and status
Mission status RADAR Tasks
Imaging
Route Planning
Self Test and MaCS
Flight Control
Threat Avoidance
Mission Assignment
Communi-cations
PCA Virtual Processor State and Activity
System State and Task Flow
Real-time Systems Group, University of Pennsylvania
DARPATech DemoDARPATech Demo