Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing. Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg. Center for Embedded Systems Research (CESR), Department of Electrical & Computer Engineering
NC STATE UNIVERSITY
Center for Embedded Systems Research (CESR)
Department of Electrical & Computer Engineering
North Carolina State University
Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, and Eric Rotenberg
Virtual Multiprocessor: An Analyzable, High-Performance
Microarchitecture for Real-Time Computing
El-Haj-Mahmoud © 2005
CASES 2005
Embedded Processor Trends
Embedded processors are inheriting desktop high-performance features. Examples:
• ARM11: 8-stage pipeline, caches, dynamic branch prediction
• Ubicom IP3023: 8 hardware threads
• PowerPC 750: 2-way superscalar, out-of-order (OOO) execution
Real-Time Systems and Analyzability
Schedulability of a task-set is determined a priori
• Requires worst-case execution times (WCET) → requires a statically analyzable microarchitecture
Dynamic microarchitecture features complicate real-time design
A trade-off between performance and analyzability
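As a reference point for what an a priori schedulability check looks like, here is a minimal sketch of the classic utilization test for EDF on one processor with deadlines equal to periods (feasible if and only if the sum of WCET/period is at most 1). The deck does not say which test the authors use; the `Task` type and the sample numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Task:
    """Hypothetical periodic task: WCET and period in the same time unit."""
    name: str
    wcet: int
    period: int


def utilization(tasks):
    """Total utilization: sum of WCET / period, kept exact with Fraction."""
    return sum((Fraction(t.wcet, t.period) for t in tasks), Fraction(0))


def edf_schedulable(tasks):
    """Uniprocessor EDF with deadlines equal to periods: feasible iff U <= 1."""
    return utilization(tasks) <= 1


# Illustrative task-set (made-up numbers, not from the presentation).
task_set = [Task("A", 2, 10), Task("B", 3, 15), Task("C", 5, 20)]
print(utilization(task_set), edf_schedulable(task_set))
```

The test is only as trustworthy as the WCETs fed into it, which is why the slide ties schedulability to an analyzable microarchitecture.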
Multiple Simple Processors
+ Analyzable
+ Natural fit with real-time systems
− Rigid resource partitioning
− Higher cost/performance metric
[Figure: tasks A, B, and C mapped onto two simple processors, proc 1 and proc 2.]
Simultaneous Multithreading (SMT)
+ Flexible resource sharing
+ Better cost/performance metric
− Unanalyzable
[Figure: tasks A, B, and C sharing a single SMT processor.]
Unanalyzability of SMT
− Violates the single-task WCET assumption (tasks are analyzed separately)
− Arbitrary periods → arbitrary overlap of tasks
− Dynamic interference
Cannot derive WCETs → cannot perform schedulability analysis
Real-Time Virtual Multiprocessor (RVMP)
Combine MP analyzability and SMT flexibility
Key idea: interference-free multithreading
• SMT performance
• WCET of each task is independent of the task-set
RVMP substrate: two parts
• Highly reconfigurable multithreaded superscalar
  − Space: multiple arbitrary interference-free partitions
  − Time: rapidly reconfigure partitions
• Static schedule orchestrates partitioning
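One way to picture the "space" half of the substrate is as an assignment of superscalar ways to virtual processors in which no way is shared. The sketch below checks that property. The 4-way width matches the deck; the dictionary representation and the function name are assumptions made for illustration, not the hardware's encoding.

```python
NUM_WAYS = 4  # the deck's starting point is a 4-way superscalar


def is_interference_free(partition, num_ways=NUM_WAYS):
    """Return True if every way belongs to at most one virtual processor.

    `partition` maps a VP name to the set of way indices it owns
    (a hypothetical representation chosen for this sketch).
    """
    seen = set()
    for ways in partition.values():
        if any(w < 0 or w >= num_ways for w in ways):
            return False          # refers to a way that does not exist
        if seen & ways:
            return False          # two VPs would share a way
        seen |= ways
    return True


# Two VPs of different sizes carved out of the same 4 ways.
print(is_interference_free({"VP0": {0, 1, 2}, "VP1": {3}}))   # True
print(is_interference_free({"VP0": {0, 1}, "VP1": {1, 2}}))   # False
```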
Big Picture
Co-design the processor and real-time scheduling for analyzable high performance
RVMP Architecture
Superscalar “ways” are the natural partitioning granularity
Different-sized virtual processors (VPs) are carved out of a single superscalar
Processor Architecture
Starting point
• Alpha 21164: 4-way in-order superscalar
• Ubicom IP3023: 8 hardware threads (4 in RVMP → 4 VPs)
Simplifications for analyzability
• In-order issue within VPs
• Software-managed scratchpads
• Static branch prediction
Not limitations of RVMP!
Processor Architecture
[Figure: processor block diagram. Labeled blocks: fetch unit with PC, interleaved instruction scratchpad, fetch buffer, decode, slotter and scoreboard (issue logic), integer and FP register files with shadow buffers, data scratchpad, HRT, and five function units (FU0: INT, FU1: INT/MUL/DIV, FU2: INT/AGEN, FU3: INT/AGEN, FU4: FPU).]
Instruction Fetch
[Figure: instruction-fetch datapath. Labeled blocks: instruction scratchpad, fetch unit, fetch buffer, decode, issue, backend, shadow buffers, and the HRT, with FV and PV fields.]
Real-Time Scheduling
Too complicated
• Must schedule the entire hyper-period
• Overwhelming number of possible space/time schedules
• High dedicated-storage cost for the schedule
[Figure: tasks A, B, C, and D.]
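The hyper-period mentioned above is the least common multiple of the task periods, after which the whole arrival pattern repeats. A short sketch, with made-up periods, shows how quickly it can grow even when the individual periods are small:

```python
from functools import reduce
from math import gcd


def hyper_period(periods):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


# Illustrative periods (not from the presentation).
print(hyper_period([10, 15, 20]))   # 60
print(hyper_period([7, 11, 13]))    # 1001
```

A schedule that must be stored for every time slot of the hyper-period grows with this value, which is the storage concern the slide raises.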
[Figure: Task A over one period, with WCET1 to WCET4 marked; y-axis: # of ways (1 way to 4 ways); time is divided into rounds.]
[Figure: Task B over one period; y-axis: # of ways (1 way to 4 ways).]
[Figure: Task C over one period; y-axis: # of ways (1 way to 4 ways).]
[Figure: Task D over one period; y-axis: # of ways (1 way to 4 ways).]
Cyclic schedule
[Figure: cyclic schedule, with outcomes labeled FAILED and SUCCEEDED.]
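To make the idea of per-task WCETs for each partition size concrete, the sketch below brute-forces a purely spatial allocation: each task keeps a fixed number of ways, its WCET at that width must fit within its period, and the widths must not exceed four ways in total. This deliberately ignores time-multiplexing of partitions and is not the authors' scheduling algorithm; the function name, WCET tables, and periods are made up for illustration.

```python
from itertools import product

TOTAL_WAYS = 4


def find_static_allocation(tasks, total_ways=TOTAL_WAYS):
    """Return {task: ways} using the fewest total ways, or None if none fits.

    `tasks` maps a task name to (period, wcet_by_ways), where
    wcet_by_ways[w] is the task's WCET on a w-way virtual processor.
    """
    names = list(tasks)
    best = None
    for widths in product(range(1, total_ways + 1), repeat=len(names)):
        if sum(widths) > total_ways:
            continue  # would need more ways than the processor has
        fits = all(tasks[n][1][w] <= tasks[n][0] for n, w in zip(names, widths))
        if fits and (best is None or sum(widths) < sum(best)):
            best = widths
    return None if best is None else dict(zip(names, best))


# Hypothetical WCET tables: wider partitions give shorter WCETs.
example = {
    "A": (100, {1: 120, 2: 70, 3: 55, 4: 50}),
    "B": (200, {1: 90, 2: 60, 3: 50, 4: 45}),
    "C": (150, {1: 140, 2: 90, 3: 75, 4: 70}),
}
print(find_static_allocation(example))
```

When no static allocation fits, the function returns None; that is the kind of task-set where reconfiguring partitions over time could still help.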
Interaction: Scheduling and Architecture
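[Figure: HRT and LTC; function units FU0 to FU4 shown in two configurations, one for a 60-cycle interval and one for a 40-cycle interval of a 100-cycle span; fields FV, PV, and CVs; entries 0, 1, EOT, and INVALID.]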
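The figure suggests a repeating table of intervals, each with its own assignment of function units. As a software analogy only, the sketch below looks up which assignment is in effect at a given cycle. The table layout, the owner names, and the lookup function are all assumptions for illustration; the sketch does not model the HRT or LTC hardware.

```python
# Each entry: (duration in cycles, owner of FU0..FU4). Owners are made up.
SCHEDULE = [
    (60, ("A", "A", "A", "B", "B")),
    (40, ("A", "B", "B", "B", "B")),
]


def owners_at(cycle, schedule=SCHEDULE):
    """Return the FU owners in effect at `cycle`, wrapping around the table."""
    total = sum(duration for duration, _ in schedule)
    offset = cycle % total
    for duration, owners in schedule:
        if offset < duration:
            return owners
        offset -= duration
    raise AssertionError("unreachable: offset is always below total")


print(owners_at(0))     # first interval
print(owners_at(59))    # still the first interval
print(owners_at(60))    # second interval
print(owners_at(100))   # wraps back to the first interval
```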
Experiments
Tasks from the C-lab and MiBench benchmarks
100 task-sets
• 4 tasks per task-set (also 8 tasks in paper)
• Grouped according to scalar utilization (U_scalar)
Two experiments
• Worst-case schedulability analysis
• Run-time experiments
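The grouping by U_scalar can be sketched as follows, assuming U_scalar means the total utilization of a task-set on a scalar (1-way) processor, that is, the sum of scalar WCET over period. That definition, the function names, and the sample numbers are assumptions; the bin boundaries follow the deck's chart labels (0 < U_scalar <= 1 up to 3 < U_scalar <= 4).

```python
import math


def scalar_utilization(tasks):
    """Sum of (scalar WCET / period); `tasks` is a list of (wcet, period)."""
    return sum(wcet / period for wcet, period in tasks)


def utilization_bin(u):
    """Return integer bounds (low, high) such that low < u <= high."""
    high = max(1, math.ceil(u))
    return (high - 1, high)


# Illustrative 4-task set (made-up numbers).
task_set = [(30, 100), (50, 100), (40, 100), (20, 100)]
u = scalar_utilization(task_set)
print(round(u, 2), utilization_bin(u))
```

A task-set with U_scalar above 1 cannot be scheduled on a single scalar processor, so the higher bins are where extra ways are required.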
Worst-Case Schedulability Tests
[Figure: stacked bar chart. y-axis: success rate (%), 0% to 100%; x-axis: task-set bins (0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4); bars within each bin: Scalar, RVMP, 4x1, 2x2, 1x4; each bar is split into Success and Failure counts.]
RVMP Configurations
[Figure: bar chart. y-axis: # of task-sets (0 to 7); x-axis: RVMP configurations, grouped by task-set bin (1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4).]
Run-Time Experiments
[Figure: stacked bar chart. y-axis: success rate (%); x-axis: task-set bins (0 < U_scalar <= 1, 1 < U_scalar <= 2, 2 < U_scalar <= 3, 3 < U_scalar <= 4); bars within each bin: RVMP, 4x1, 2x2, 1x4, SMT-EDF, SMT-ICNT; each bar is split into Success and Failure counts.]
Unsafe Behavior of SMT
[Figure: pie charts for task-set bin 1 < U_scalar <= 2. Success 60%, Failure 40%; breakdown categories: only SMT-EDF success, only SMT-ICNT success, both success, both failure (data labels: 16, 12, 6).]
Summary
Novel contributions
• Virtualize a single processor
  − Space: variable-size interference-free partitions
  − Time: rapid reconfiguration
• Simple real-time scheduling approach
Analyzability of MP with flexibility of SMT
Co-design processor and real-time scheduling for analyzable high performance