time-predictable multi-core architecture for embedded systemsnov 16, 2012  · app scheduler task...

49
Time-predictable Multi-Core Architecture for Embedded Systems Martin Schoeberl Technical University of Denmark

Upload: others

Post on 27-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Time-predictable Multi-Core Architecture for Embedded

Systems

Martin Schoeberl

Technical University of Denmark

Page 2: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Real-Time Systems

  Systems with timing constraints   In (hard) real-time systems

•  Function has to be correct •  Function has to deliver result in time

  Timing proof with schedulability analysis   Execution time of tasks need to be known   WCET analysis gives the input

Martin Schoeberl DREAM talk at UCB 2

Page 3: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Worst-Case Execution Time

  Measurement of execution time is not safe   Execution time is data dependent   Did we trigger the worst-case?

  Static WCET Analysis   High-level WCET analysis is mature

research   Considers control flow and flow facts (loop

bounds)

Martin Schoeberl DREAM talk at UCB 3

Page 4: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Static WCET Analysis Issue

  Low-level analysis is the main issue   Modern processors are too complex   Lot of (hidden) state information

•  Key for performance •  Issue for WCET analysis

  WCET analysis is about 10 years behind current processors

  Multiprocessors currently not analyzable

Martin Schoeberl DREAM talk at UCB 4

Page 5: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

New Architectures Needed

  Design a computer architecture for real-time systems   WCET is the main design constraint   Average-case performance not (so)

interesting

  Use and develop features that are   WCET analysis driven   Have a low WCET

Martin Schoeberl DREAM talk at UCB 5

Page 6: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Time-predictable Computer Architecture

  Common computer architecture wisdom

Make the common case fast and the uncommon case just correct

  Time-predictable computer architecture

Make the worst case fast and the whole system analyzable

Martin Schoeberl DREAM talk at UCB 6

Page 7: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Our WCET Target Architecture

  Trying to catch up with the analysis on the complexity of average case optimized architectures is not an option

  We need a sea change and take a constructive approach. Design processors, memory, and interconnect for real-time systems!

Martin Schoeberl DREAM talk at UCB

Execution timeACET WCET Bound

COTS Processor

Execution timeACET WCET Bound

TP Processor A

Execution timeACET WCET Bound

TP Processor B

BCET

BCET

BCET

Pessimism

7

Page 8: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

T-CREST Architecture

  Chip-multiprocessor for high performance   Target: 64 core in an FPGA

  Time-predictable   Processor   Network-on-Chip (NoC)   Local memory (SPM, $)   SDRAM controller

  Integration in WCET analysis

Martin Schoeberl 8 DREAM talk at UCB

Page 9: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

T-CREST Architecture

application 2

local memory

3. network on chip

application 1

software

tile

4. memory (controller)

! 2. processor

T1 T2 T3 T4 T5 T6

hardware

2. RTOS app scheduler

task scheduler task scheduler

NI NI NI

NI

3. NOC WCET analysis

4. mem. controller WCET analysis

6. code-level WCET analysis

R R

!

5. compiler

tools architecture

Martin Schoeberl DREAM talk at UCB 9

Page 10: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

T-CREST Outcome

  Provide a complete platform   Hardware in an FPGA   Supporting compiler and analysis tool

  Resulting designs in open source   BSD license   Cooperation welcome

  Up to compiler   No operating system research   No MoC research   No automatic parallelization research

Martin Schoeberl 10 DREAM talk at UCB

Page 11: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

T-CREST Funding

  3 Year STREP project   Started 09/2011

  Funded by the European Commission   Total EC contribution 2.65 M EUR   4 Universities, 4 industry partners

  Open Group project coordinator   DTU technical lead

Martin Schoeberl 11 DREAM talk at UCB

Page 12: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

T-CREST Partners

Martin Schoeberl DREAM talk at UCB 12

Page 13: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Work Packages

  WP 1 Requirements Analysis (GMV)   WP 2 Processor (DTU)   WP 3 Network on Chip (DTU)   WP 4 Memory Hierarchy (UoY)   WP 5 Compiler (TUV)   WP 6 Code-Level WCET Analysis (AbsInt)   WP 7 Integration and Evaluation (GMV)   WP 8 Dissemination (TOG)   WP 9 Management (TOG)

Martin Schoeberl DREAM talk at UCB 13

Page 14: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

WP Interaction

Martin Schoeberl DREAM talk at UCB

Dissemination

Management Requirements Evaluation

NoC

Memory

Processor

Compiler

WCET

14

Page 15: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

WP2: Processor

  Time-predictable processor   Called Patmos   Flexibility to define the instruction set

  A compiler is adapted for Patmos

  Co-design for low WCET of   Patmos   Compiler   WCET analysis

Martin Schoeberl 15 DREAM talk at UCB

Page 16: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Patmos

  RISC style microprocessor   Dual issue   Full predication – all instructions   Split caches   Split load   Intended as research platform for real-

time architecture (e.g. caches, SPM)

Martin Schoeberl 16 DREAM talk at UCB

Page 17: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Pipeline Overview

RF M$

IRPC

+

Dec

S$

SP

D$

RF

+

Fetch Decode Execute Memory Writeback

Martin Schoeberl 17 DREAM talk at UCB

Page 18: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Instruction Set

  3 register ALU instructions   Immediate ALU instructions

  Short and long ALU (2. issue slot)

  Load/store with register + offset   Word, half word, byte   Aligned access   Typed ld/st for different data caches   Split load

Martin Schoeberl 18 DREAM talk at UCB

Page 19: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Instruction Set cont.

  Branch with offset and register indirect   Call instruction to support method

cache   Compare and predicate operation   Stack cache support instructions   Dual issue

  ALU in both pipelines   Load/store and branch in 1. pipeline

Martin Schoeberl 19 DREAM talk at UCB

Page 20: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Simulator

  Faithful emulation of Patmos   Reasonable simulation speed   Easily adaptable and extensible   Detailed feedback and runtime statistics

  Intended as design aid   Compiler development   Runtime statistics   Support execution tracing

Martin Schoeberl 20 DREAM talk at UCB

Page 21: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Simulator Capabilities

  Pipeline model   by-passing, stalling, delay slots

  Stack cache   Method cache

  FIFO and LRU replacement   Data cache

  Set associative, LRU, write-through, no-allocation

  Main memory   Timing of block transfers

  IO Devices: UART, cycle counter

Martin Schoeberl 21 DREAM talk at UCB

Page 22: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Compiler

  Adaption of LLVM   Basic code generation works

  Only single pipeline

  Support for stack cache instructions   Reserve, free, ensure

  Method cache support   Shall optimize for WCET

  Integration with AbsInt tool aiT

Martin Schoeberl 22 DREAM talk at UCB

Page 23: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Technology Observation

  Prototype in FPGA   Logic cells, registers, on-chip memories

  About 200-400 LCs per on-chip memory block   Processor 3000 LCs => 10 memory blocks   Memory is the bottleneck

  On-chip memories are synchronous   Registers at the inputs   Influences the pipeline structure

Martin Schoeberl 23 DREAM talk at UCB

Page 24: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Patmos Status

  Simulator complete   Complete ISA coverage   Executes MiBench benchmark suite

  Hardware described in VHDL   FPGA implementation   On-chip instruction ROM and data memory   Test programs in assembler   Tiny C programs are ok

Martin Schoeberl 24 DREAM talk at UCB

Page 25: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Co-simulation

  Compare high-level simulation with VHDL description   Pasim and VHDL in ModelSim

  Compare most important processor state every cycle: register file   RF content dumped into files   Small tool compares

  Collection of assembler test cases   Nightly run on test server + email

Martin Schoeberl 25 DREAM talk at UCB

Page 26: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Patmos Next Steps

  Executing MiBench in the HW   Stack cache implementation and paper   Dual pipeline   More on caches, SPMs,…   Attach the Network-on-Chip   Integration with memory controller

  Is there a memory NoC?

Martin Schoeberl 26 DREAM talk at UCB

Page 27: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

WP3: Network-on-Chip

  A basic processor pipeline is predictable   Not too many research challenges

  Communication and memory hierarchy is where the action is in a CMP

  Multicore needs a time-predictable on-chip communication

Martin Schoeberl 27 DREAM talk at UCB

Page 28: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Real-Time CMP

  NoC for real-time systems   Include NoC latency in WCET analysis   Statically scheduled arbitration   Time-division multiplexing

  NOCS 2012: “A Statically Scheduled Time-Division-Multiplexed Network-on-Chip for Real-Time Systems”

  + RTNS, + NORCHIP, + DATE

Martin Schoeberl 28 DREAM talk at UCB

Page 29: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

NoC for Chip-Multiprocessing

  Homogenous CMP   Regular network to connect cores

  Mesh, bidirectional torus

  Serves two communication purposes   Message passing between cores   Cores access to shared memory

  This talk is about the message passing NoC

Martin Schoeberl 29 DREAM talk at UCB

Page 30: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

NoC Communication Cloud

IP IP

IP IPIP

IP

! Virtual circuits; all!to!all! Topologies: 2D!mesh, torous, tree

! TDM!basedNetwork!on!chip

Martin Schoeberl 30 DREAM talk at UCB

Page 31: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

S4NoC and T-CREST

  S4NoC is a first step to explore ideas   Real T-CREST NoC will be

  Asynchronous   Configurable TDM schedule   Might contain 2 (or more) NoCs   Fancier network adapter   …we will see during the next 2 years…

Martin Schoeberl 31 DREAM talk at UCB

Page 32: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Real-Time Guarantees

  NoC is a shared communication medium

  Needs arbitration   Time-division-multiplexing is predictable

  Message latency/bandwidth depends on   Schedule   Topology   Number of nodes

Martin Schoeberl 32 DREAM talk at UCB

Page 33: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

First Design Decisions

  All-to-all communication   Single word messages   Routing information in the

  Router   Network adapter

  Single cycle per hop   No buffering in the router

  No flow-control at NoC level   Done at higher level

Martin Schoeberl 33 DREAM talk at UCB

Page 34: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

The Router

  Just multiplexer and register

  Static schedule   Conflict free   No way to buffer   No flow control

  Low resource consumption

L

N

S

E

W

N

L

S

E

W

L N S E W

ST

ST

ST

ST

ST

Slot Cnt

Martin Schoeberl 34 DREAM talk at UCB

Page 35: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

TDM Schedule

  Static schedule   Generated off-line   ‘Before chip production’

  All-to-all communication   Has a period   Single word scheduling simplifies

schedule generation   No ‘pipeline’ effects to consider

Martin Schoeberl 35 DREAM talk at UCB

Page 36: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Period Bounds

  A TDM round includes all communication needs

  That round is the TDM period   Period determines maximum latency   Minimize schedule period

  We found optimal solutions •  Up to 5x5

  Heuristics for larger NoCs •  Nice solution for regular structures

Martin Schoeberl 36 DREAM talk at UCB

Page 37: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Period Bounds

  IO Bound (n-1)   Capacity bound (# links)   Bisection bound (half to half comm.)

Size Mesh Torus Bi-torus 3x3 8 9 8 4x4 16 24 15 5x5 32 50 24 6x6 90 35 7x7 48 8x8 64 9x9 92

Martin Schoeberl 37 DREAM talk at UCB

Page 38: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Router Implementation

  Build a many core NoC in a medium sized FPGA   Router is small   Use a tiny processor – Leros

  Router is simple   Double clock the NoC

  First experiment without a real application

Martin Schoeberl 38 DREAM talk at UCB

Page 39: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Size and Frequency

  Leros processor   ~220 LCs, ~125 MHz

  Router/NoC   50-160 LCs, 230—330 MHz

  9x9 fitted into the Altera DE2-70!   However, no real network adapter

  A simple RISC pipeline ca. 2000 LCs

Martin Schoeberl 39 DREAM talk at UCB

Page 40: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

A Simple Network Adapter

  Router/NoC is minimal   What is a minimal NA?

  Single rx and tx registers   But one pair for each channel

  Rx/tx register full/empty flags   Like a serial port on a PC

Martin Schoeberl 40 DREAM talk at UCB

Page 41: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

NA First Numbers

  4x4 bi-torus system   Network adapter:

  1 on-chip memory block   ~ 230 LCs (18 for schedule table)

  Router   98 LCs (19 for schedule table)

  Fmax: 90 MHz Leros, 170 MHz NoC

Martin Schoeberl 41 DREAM talk at UCB

Page 42: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Schedule Tables

  Fixed schedules   Generated VHDL code   Implemented in LUTs

Cores NA Table Router Table Schedule Length

16 18 LCs 19 LCs 20 25 26 LCs 22 LCs 28 36 52 LCs 37 LCs 43 49 73 LCs 50 LCs 59

Martin Schoeberl 42 DREAM talk at UCB

Page 43: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Discussion

  TDM wastes bandwidth   All-to-all schedule wastes even more!

  Does it matter?

  There is plenty of bandwidth on-chip   Wires are cheap   1024 wide busses in an FPGA possible

  TDM all-to-all routers are cheap   Bandwidth relative to cost matters

Martin Schoeberl 43 DREAM talk at UCB

Page 44: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

NoC Summary

  Static TDM schedule is   Time-predictable   Simple (low HW resources)

  Network adapter in development   TDM from end to end

•  Memory – NoC – Memory   No flow control needed   Also simple

  Usability needs to be explored

Martin Schoeberl 44 DREAM talk at UCB

Page 45: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

T-CREST: More Info

  T-CREST web site   http://www.t-crest.org/   Most deliverables are public   Some first papers

  Development   Most artifacts are open-source   Hosted at GitHub   https://github.com/t-crest

Martin Schoeberl 45 DREAM talk at UCB

Page 46: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Cooperation: PRET

  PRET – PTARM   Similar goal as Patmos

  Look at memory hierarchy   SPM – caches is a spectrum

•  SPM with address translation •  SPM with blocks •  Method cache

  Multicore   A PTARM CMP with the T-CREST NoC

Martin Schoeberl DREAM talk at UCB 46

Page 47: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Cooperation: Ptolemy

  T-CREST has no MoC and no OS   But we need one!

  Can we learn from Ptolemy?   Look into Ptolemy examples

  Actors fit nicely to many-cores   How can we map communication to NoC

message passing?   Can we generate code for many-cores?

Martin Schoeberl DREAM talk at UCB 47

Page 48: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Cooperation: Time-predictable

  What does time-predictable mean?   There are some definitions out there

•  Don’t like them   Can it be quantified?

  What is worst-case performance?   Difference between

  Repeatable timing   Time-predictable

Martin Schoeberl DREAM talk at UCB 48

Page 49: Time-predictable Multi-Core Architecture for Embedded SystemsNov 16, 2012  · app scheduler task scheduler task scheduler NI NI NI NI 3. NOC WCET analysis 4. mem. controller WCET

Summary

  New computer architecture for real-time systems

  WCET analyzability is of primary importance

  T-CREST is a platform   Processor, NoC, memory controller,

compiler, WCET analysis

  Technology will be mostly open source

Martin Schoeberl DREAM talk at UCB 49