Simulation and Evaluation Framework for Manycore Architectures


Post on 23-Feb-2016

Page 1: Simulation and Evaluation Framework for Manycore Architectures

Simulation and Evaluation Framework for Manycore Architectures
Andreas Savva, UCY
Final Project Report

Republic of Cyprus · European Union

Page 2: Simulation and Evaluation Framework for Manycore Architectures

OUTLINE
• Introduction to many-core architectures.
• Main technical objectives of the project.
• Project breakdown.
• Work packages.
• Using the developed framework: case studies.
• Simulation and results.
• Project outcomes / deliverables.

Page 3: Simulation and Evaluation Framework for Manycore Architectures

Manycore Architectures
• Emerging dominant trend in general-purpose CPUs
• Expected to be interconnected using on-chip networks
• Tens to hundreds of cores
• Simple cores, large parallelism
• Several design parameters:
  • I/O system
  • Processor architecture
  • Interconnection network architecture
• This project aims to:
  • Develop a simulation and evaluation framework that lets researchers explore the design parameters listed above

Page 4: Simulation and Evaluation Framework for Manycore Architectures
Page 5: Simulation and Evaluation Framework for Manycore Architectures
Page 6: Simulation and Evaluation Framework for Manycore Architectures

Main Technical Objectives (Achieved)
1. Developed a simulation and evaluation framework for many-core architectures in Java.
2. Developed benchmarks to evaluate many-core architectures.
3. Developed an on-chip network simulator that supports different architectures, routing algorithms, and traffic patterns.
4. Developed a cross-compiler in C/C++ that translates programs into instructions executable by the architectures under evaluation.
5. Developed new architectures in order to evaluate the framework.

Page 7: Simulation and Evaluation Framework for Manycore Architectures

Project Breakdown
• Work packages:
  • Progress and result dissemination (WP1, WP2).
  • Develop a simulator to interconnect cores (WP3).
  • Develop models for the execution units and the cores (WP4).
  • Develop the cross-compiler (WP5).
  • Create benchmarks to measure performance (WP6).
  • Develop new architectures to evaluate the framework (WP7).

Page 8: Simulation and Evaluation Framework for Manycore Architectures

Implementation Strategy
[Diagram: WP1 + WP2 (progress and results dissemination); WP3 (develop many-core simulator); WP4 (develop execution units); WP5 (cross-compiler); WP6 (benchmarks); WP7 (evaluate framework); the work packages overlap.]

Page 9: Simulation and Evaluation Framework for Manycore Architectures

Project Management (WP1)
• Kick-off meeting, December 2008
  • Targeted application models developed
  • Application design trade-offs
  • Roles
• Six-month progress reports
• 18-month (interim) progress report
• Financial issues
• Final progress report
  • Final financial issues

Page 10: Simulation and Evaluation Framework for Manycore Architectures

Dissemination of Results (WP2)
• Project website
  • http://www.ece.ucy.ac.cy/labs/easoc/Research/SEFMA/home.html
• Publications
  • Publications in selected journals and conferences.

Page 11: Simulation and Evaluation Framework for Manycore Architectures

WP3: Simulator for Interconnecting Cores
• Determine specifications for the many-core network simulator.
• Evaluate existing simulation frameworks:
  • POPNET simulator (C++).
  • GPNOC simulator (Java).
• Adapt a simulation framework to simulate our many-core systems.
• Develop traffic models based on many-core applications for future evaluation:
  • Random traffic pattern.
  • Tornado traffic pattern.
  • Transpose traffic pattern.
  • Neighbor traffic pattern.
COMPLETED!
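The four traffic patterns named above can be illustrated with a short sketch. The slides do not give the exact definitions used in the framework, so the functions below assume one common textbook formulation on a k×k network with (x, y) node coordinates; all function names are hypothetical.

```python
import random

def random_dest(src, k, rng=random):
    """Uniform random traffic: any node other than the source."""
    while True:
        dst = (rng.randrange(k), rng.randrange(k))
        if dst != src:
            return dst

def tornado_dest(src, k):
    """Tornado traffic: shift each coordinate by ceil(k/2) - 1, with wrap-around."""
    shift = (k + 1) // 2 - 1
    x, y = src
    return ((x + shift) % k, (y + shift) % k)

def transpose_dest(src, k):
    """Transpose traffic: node (x, y) sends to node (y, x)."""
    x, y = src
    return (y, x)

def neighbor_dest(src, k):
    """Neighbor traffic: send to the next node in each dimension, with wrap-around."""
    x, y = src
    return ((x + 1) % k, (y + 1) % k)
```

A simulator would call one of these per injected packet to pick the destination; only the random pattern needs a random number generator.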

Page 12: Simulation and Evaluation Framework for Manycore Architectures

WP4: Core and Execution Unit Models

• Develop a communication protocol between the units and the network.
• Design and develop unit models:
  • Cores.
  • Memory.
  • Input/output data models.
• Framework to develop models based on the specifications.
COMPLETED!

Page 13: Simulation and Evaluation Framework for Manycore Architectures

WP5: Cross-Compiler
• Create the instruction set architecture.
• Study existing compilers for RISC processors.
• Adapt an existing compiler to translate programs into machine instructions.
• Integrate the compiler into the framework.
COMPLETED!

Page 14: Simulation and Evaluation Framework for Manycore Architectures

WP6: Benchmarks
• Define and evaluate all possible functions of the system based on:
  • Performance
  • Power consumption
  • Reliability
• Develop algorithms to measure performance, power consumption, and reliability.
• Develop benchmarks for many-core processors in assembly language.
COMPLETED!

Page 15: Simulation and Evaluation Framework for Manycore Architectures

WP7: Framework Evaluation
• WP goals:
  • Develop and evaluate novel many-core architectures.
  • Develop and evaluate algorithms for work distribution in many-core processors.
  • Cross-evaluate the developed framework using the new many-core architectures.
COMPLETED!

Page 16: Simulation and Evaluation Framework for Manycore Architectures

USING/EVALUATING THE FRAMEWORK
Case Studies

Page 17: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
• Power consumption: a major limitation in NoCs.
• Links and NoC routers: the most power-hungry components.
• Intel’s Teraflop NoC prototype suggests that link power consumption could be as high as 17%, with the rest consumed by the routers.
• Goal: reduce both static and dynamic power consumption.
• Previously proposed work focuses on simple static threshold mechanisms.
A new intelligent dynamic power management policy for NoCs is needed.

Page 18: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
Threshold-based algorithm for turning links off/on:
• Run a simulation and check link utilization.
• Choose a threshold.
• Run the simulation again.
• If a link's new utilization is smaller than the threshold, turn the link off for a period of time.
• After x cycles, turn the link back on.
NEXT: A new intelligent dynamic on/off link management scheme for NoCs based on ANNs.
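The static threshold steps above can be sketched as a single decision function. This is a minimal illustration, not the framework's code: the function name, the dictionary-based bookkeeping, and the parameter names (`threshold`, `off_period` for the "x cycles") are assumptions.

```python
def static_threshold_step(utilization, off_until, cycle, threshold, off_period):
    """Apply one on/off decision and return the set of links that are on.

    utilization: dict mapping link id -> measured utilization in [0, 1]
    off_until:   dict mapping link id -> cycle at which the link turns back on
                 (updated in place)
    """
    for link, util in utilization.items():
        if link in off_until:
            # Link is currently off: turn it back on once off_period has elapsed.
            if cycle >= off_until[link]:
                del off_until[link]
            continue
        if util < threshold:
            # Under-utilized link: turn it off for off_period cycles.
            off_until[link] = cycle + off_period
    return {link for link in utilization if link not in off_until}
```

The threshold here is fixed before the run, which is the limitation the ANN-based scheme on the following slides is meant to address.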

Page 19: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
Artificial Neural Networks
• Information-processing paradigm inspired by the way biological neurons process information.
• Composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.
• Used as prediction and forecasting mechanisms in several application areas.
• Able to determine hidden and strongly non-linear dependencies.
[Figure: ANN with an input layer, a hidden layer, and an output neuron]

Page 20: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
Intelligent ANN algorithm:
• Pre-training.
  • Choose the links with minimum link utilization.
  • This keeps the size of the network more manageable.
• Prediction scheme based on ANN.
  • Divide the network into smaller networks.
  • Pass the chosen links as inputs to the ANNs.
  • Output the links to turn off.
ANNs can be used for prediction since they can discover hidden dependencies.
[Figure: Power savings for 8x8 mesh and torus networks]

Page 21: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption

[Figure: ANN predictor within the NoC; an 8×8 network of processing elements (PEs) partitioned into four 4×4 networks, each with its own ANN (ANN 1 to ANN 4).]
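The partitioning in the figure amounts to mapping each router to the region, and therefore the ANN, that monitors it. A minimal sketch follows, assuming (x, y) router coordinates and zero-based, row-major region numbering; the slides label the ANNs 1 to 4 but do not say which quadrant each one covers, and the function name is hypothetical.

```python
def ann_region(x, y, mesh=8, region=4):
    """Return the zero-based index of the region whose ANN monitors router (x, y)."""
    regions_per_row = mesh // region
    return (y // region) * regions_per_row + (x // region)
```

With the defaults, each of the four regions covers 16 routers, which matches the 4×4 partition size used in the later slides.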

Page 22: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
• Experiments with several NoC region sizes.
• Compare hardware overheads and the corresponding power savings.
• A 4×4 NoC region offers satisfactory power savings and lower ANN overheads than a 5×5 NoC region.
• A 3×3 NoC region does not provide enough information for the ANN to make accurate predictions.
• We therefore designed the ANN-based system to monitor 4×4 NoC regions.
[Figure: Power savings and hardware overheads for 3×3, 4×4, and 5×5 NoC regions]

Page 23: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
Prediction scheme based on ANN
• The ANN mechanism receives the average link utilization of every link in the 4×4 NoC partition.
• The ANN uses the utilization values to find the optimal threshold.
• The threshold determines whether each link is turned off or on for the next n-cycle interval.

[Flowchart: ANN mechanism. Monitor link utilization → receive link utilization for a 4x4 NoC partition → received from ALL links? (No: keep receiving; Yes/timeout: continue) → neural network intelligently computes the threshold → choose links based on the threshold → output control packets to turn links on/off → next time interval.]
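One pass through the flowchart can be sketched as follows. The trained network is represented by a caller-supplied function, `predict_threshold`, because the slides do not describe its internals; that name, `control_step`, and the "on"/"off" string encoding of control packets are all assumptions made for illustration.

```python
def control_step(reports, expected_links, predict_threshold, timed_out=False):
    """Run one interval of the on/off control loop for a single NoC partition.

    reports:           dict mapping link id -> average utilization received so far
    expected_links:    iterable of all link ids in the partition
    predict_threshold: callable standing in for the trained ANN; it takes the
                       list of utilizations and returns a threshold
    Returns None while reports are still outstanding (and no timeout occurred),
    otherwise a dict mapping link id -> "on" or "off" for the next interval.
    """
    missing = [link for link in expected_links if link not in reports]
    if missing and not timed_out:
        return None  # keep waiting for the remaining links
    received = [reports[link] for link in expected_links if link in reports]
    threshold = predict_threshold(received)
    return {link: ("off" if util < threshold else "on")
            for link, util in reports.items()}
```

On a timeout the sketch decides using only the utilizations that arrived; how the actual mechanism treats missing reports is not stated in the slides.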

Page 24: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
ANN hardware optimization
• An ANN for a 4x4 region monitors 16 routers => at least 8 input neurons.
• Eight neurons at the input layer of the ANN => the hidden layer should have five neurons.
• This follows the rule of thumb that a satisfactory number of hidden-layer neurons equals half the number of input neurons plus one (8 / 2 + 1 = 5).
Try to minimize the size of the hidden layer…
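To make the sizing rule concrete, the sketch below computes the hidden-layer size from the rule of thumb and runs one forward pass through a network with a single output neuron. The sigmoid activation, the weight layout, and all names are assumptions; the slides specify only the layer sizes.

```python
import math

def hidden_size(n_inputs):
    """Rule of thumb from the slide: half the input neurons plus one."""
    return n_inputs // 2 + 1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> hidden layer -> single output neuron.

    w_hidden holds one row of input weights per hidden neuron.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

For 8 inputs, `hidden_size(8)` gives 5, so the network needs 8 × 5 hidden-layer weights plus 5 output weights; shrinking the hidden layer to 4 or 3 neurons reduces the multiply-accumulate work proportionally.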

Page 25: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
• Choose an appropriate size for the hidden layer of the ANN.
• Three different ANNs were developed, with five, four, and three neurons in the hidden layer.
• Using four neurons (instead of five) in the hidden layer gives the best power savings for all the traffic patterns.
[Figure: Power savings for different numbers of neurons in the hidden layer]

Page 26: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
• How does the bit representation of the training weights affect the threshold computation?
• 24-, 16-, 8-, 6-, and 4-bit representations were used.
• 24, 16, 8, and 6 bits show similar power savings, but the savings are significantly reduced when 4 bits are used, due to reduced training accuracy.
• => 6 bits were chosen, which made the multiply-accumulate hardware very small.
[Figure: Power savings for different training weight bit representations]
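The effect of weight bit width can be illustrated by rounding a weight to a fixed-point grid and looking at the rounding error. The slides do not state the number format used in hardware, so the symmetric signed fixed-point scheme below, the range `w_max`, and the function name are assumptions.

```python
def quantize(w, bits, w_max=1.0):
    """Round w to a signed fixed-point grid of `bits` bits spanning [-w_max, w_max]."""
    levels = 2 ** (bits - 1) - 1          # largest positive integer code
    clipped = max(-w_max, min(w_max, w))  # saturate out-of-range weights
    return round(clipped / w_max * levels) / levels * w_max
```

Under this scheme the worst-case rounding error is `w_max / (2 * levels)`: about 0.016 at 6 bits (31 levels) versus about 0.071 at 4 bits (7 levels), which is consistent with the slide's observation that accuracy degrades sharply at 4 bits.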

Page 27: Simulation and Evaluation Framework for Manycore Architectures

Simulation and Results...
• Power savings of the ANN-based mechanism are better than the savings in the other cases.
• The ANN-based mechanism can identify a significant amount of future behavior in the observed traffic patterns.
• It can intelligently select the threshold necessary for the next timing interval.
[Figure: Power savings for 8x8 mesh and torus networks]

Page 28: Simulation and Evaluation Framework for Manycore Architectures

Simulation and Results...
• Measure throughput for each mechanism.
• Having no on/off mechanism yields a higher throughput; however, the ANN-based technique shows better throughput than statically determined threshold techniques.
[Figure: Throughput for 8x8 mesh and torus networks]

Page 29: Simulation and Evaluation Framework for Manycore Architectures

Simulation and Results...
• Measure energy for each mechanism.
• Energy consumed using the ANN mechanism is less than in the other cases.
• The ANN exhibits a reduction in overall energy, because of its balanced ratio of performance to power savings, compared both to having no on/off links and to static threshold computation.
[Figure: Normalized energy for 8x8 torus networks]

Page 30: Simulation and Evaluation Framework for Manycore Architectures

Simulation and Results...
• Measure packet latency for each mechanism.
• The ANN-based mechanism incurs more delay, but we believe that the delay penalty is acceptable when compared to the associated power savings.
[Figure: Average packet latency]

Page 31: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
New intelligent ANN algorithm:
• Pre-training.
  • Choose the router ports with minimum port utilization.
  • This keeps the size of the network more manageable.
• Prediction scheme based on ANN.
  • Divide the network into smaller networks.
  • Pass the chosen ports as inputs to the ANNs.
  • Output the ports to turn off.

[Flowchart: ANN mechanism. Monitor port utilization → receive port utilization for a 4x4 NoC partition → received from ALL ports? (No: keep receiving; Yes/timeout: continue) → neural network intelligently computes the threshold → choose ports based on the threshold → output control packets to turn ports on/off → next time interval.]

Page 32: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
• When router ports become unavailable, temporarily or permanently, X-Y routing cannot guarantee a deadlock-free system.
• Since router ports are turned off in our work, a new routing algorithm must be developed to make sure that there are no deadlocks.
• Fully adaptive routing algorithms perform better in the presence of faults, but they are very difficult to implement because of their higher overhead in silicon area and energy consumption.
• For this reason, a partially adaptive routing algorithm was chosen to achieve a certain degree of fault tolerance in our system.

Page 33: Simulation and Evaluation Framework for Manycore Architectures

Reducing power consumption
• The Fault-Tolerant Negative-First algorithm is based on the turn model.
• It forbids certain turns so that deadlock can be avoided.
• A packet is first routed in the negative direction of each dimension and only then in the positive direction: the message first moves west or south until the corresponding offset is zero, and after that it moves north or east.
[Figure: Negative-first routing algorithm in an 8x8 mesh network]
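The routing rule described above can be sketched as a function that returns the output ports a packet may take at the current router. This is a minimal (shortest-path) negative-first sketch, not the fault-tolerant variant used in the project, whose details the slides do not give. It assumes x grows eastward and y grows northward; the `disabled` parameter is a hypothetical hook that merely filters out ports that are switched off.

```python
def negative_first_ports(cur, dst, disabled=frozenset()):
    """Return the allowed output ports ("W", "S", "E", "N") at router `cur`.

    Negative moves (west, south) must all be completed before any positive
    move (east, north), so a packet never turns from positive to negative.
    """
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    if dx < 0 or dy < 0:
        # Negative phase: choose adaptively among the remaining negative moves.
        candidates = (["W"] if dx < 0 else []) + (["S"] if dy < 0 else [])
    else:
        # Positive phase: choose adaptively among the remaining positive moves.
        candidates = (["E"] if dx > 0 else []) + (["N"] if dy > 0 else [])
    return [port for port in candidates if port not in disabled]
```

An empty list means either that the packet has arrived or that every minimal port is switched off; a fault-tolerant variant would need an additional rule for the latter case.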

Page 34: Simulation and Evaluation Framework for Manycore Architectures

Simulation Results
• The power savings of the ANN-based mechanism are better than those of the statically determined case and of the case without any on/off ports, for all the traffic models.
[Figure: Power savings for 8x8 mesh and torus networks]

Page 35: Simulation and Evaluation Framework for Manycore Architectures

Simulation Results...
• Having no on/off mechanism yields a higher throughput; however, the ANN-based technique yields better throughput than the statically determined threshold.
[Figure: Normalized throughput for 8x8 mesh and torus networks]

Page 36: Simulation and Evaluation Framework for Manycore Architectures

Results from the framework use
• The framework can be used by researchers to evaluate many-core architectures.
• It helps to compare how the number of cores affects the total power consumption of the network.
• Intel showed that power consumption may constrain the number of cores, because of the increased number of routers, interconnects, and data travelling through the network.
• Researchers can explore the parameters of many-core architectures.
• This new Network-on-Chip framework helps researchers to solve different NoC tasks through simulation.

Page 37: Simulation and Evaluation Framework for Manycore Architectures

Project Outcomes
• Smooth flow of work
  • Some simulator problems have been overcome.
  • Help from Dr. Soteriou and Drs. Michael and Hadjicostis.
• Results dissemination on target with project goals.
  • Publications in conferences/journals.
  • Participation in the ISVLSI Conference, July 2011, Chennai, India.
  • Publication in the Journal of Electrical and Computer Engineering, Hindawi Publishing Corporation, 2012.
  • Submission to ISVLSI 2012: paper on turning router ports on/off (under review).

Page 38: Simulation and Evaluation Framework for Manycore Architectures

Publications
ARTICLES:
• A. Savva, T. Theocharides, V. Soteriou, “Intelligent On/Off Link Management for On-Chip Networks”, in Proc. IEEE Annual Symposium on VLSI, pp. 343-344, July 2011.
• Under review: A. Savva, T. Theocharides, V. Soteriou, “Intelligent On/Off Router Ports Management for Networks on Chip”, ISVLSI Conference 2012.
JOURNALS:
• Andreas G. Savva, T. Theocharides, V. Soteriou, “Intelligent On/Off Dynamic Link Management for On-Chip Networks”, Journal of Electrical and Computer Engineering, vol. 2012, Article ID 107821, 2012.
POSTER:
• Poster at the HiPEAC Ph.D. Student Poster Presentation, Paphos, Cyprus, January 2009.
WORKSHOP:
• Results of this work were presented in a workshop at the KIOS Research Centre, 30 Nov. 2011.

Page 39: Simulation and Evaluation Framework for Manycore Architectures

Project Deliverables:
• D1: Six-month, interim, and final reports; financial reports.
• D2: Project website, publications.
• D3: Network communication simulator in Java; four traffic models for simulation and evaluation of the network (source code available).
• D4: RISC processor models, memory models, core models, input/output models (VHDL/C++ code).
• D5: Cross-compiler.
• D6: Benchmarks; algorithms for power consumption and performance measurements.
• D7: Many-core architectures; evaluation of the developed framework.

Page 40: Simulation and Evaluation Framework for Manycore Architectures

Acknowledgements to:
• Dr. Maria K. Michael, for the verification and automation algorithms feedback.
• Dr. Christoforos Hadjicostis, for the reliability aspects and the discrete event algorithms employed in building the simulator.
• Dr. Vassos Soteriou, for the feedback on the interconnect.
• Dr. Theocharis Theocharides, for the coordination of this project and all the help.

Page 41: Simulation and Evaluation Framework for Manycore Architectures

Republic of Cyprus · European Union

This work falls under the Cyprus Research Promotion Foundation’s Framework Programme for Research, Technological Development and Innovation 2008 (DESMI 2008), co-funded by the Republic of Cyprus and the European Regional Development Fund, and specifically under Grant PENEK/ENISX/0308.

Page 42: Simulation and Evaluation Framework for Manycore Architectures

THANK YOU!
Project Host Organization
University of Cyprus
Andreas Savva, Theocharis Theocharides, Maria K. Michael, Christoforos Hadjicostis
Collaborating Partners
Cyprus University of Technology
Vassos Soteriou