zj hyperthreading
-
7/24/2019 ZJ Hyperthreading
Hyper-Threading, Chip Multiprocessors and Both
Zoran Jovanovic
-
To Be Tackled in Multithreading
  Review of Threading Algorithms
  Hyper-Threading Concepts
  Hyper-Threading Architecture
  Advantages/Disadvantages
-
Threading Algorithms
Time-slicing (fine grain)
  A processor switches between threads in fixed time intervals.
  High expenses, especially if one of the processes is in the wait state.
Switch-on-event (coarse grain)
  Task switching in case of long pauses.
  While a thread waits for data coming from a relatively slow source, CPU resources are given to other processes.
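
The difference between the two policies can be sketched with a toy simulation. This is only an illustrative model, not anything from the slides: the function names, the burst format (compute_cycles, stall_cycles), the quantum of 4 cycles, and the assumption that a context switch is free are all choices made for the example.

```python
def next_thread(cur, eligible):
    """Round-robin pick: first eligible thread after `cur`, else `cur`."""
    n = len(eligible)
    for step in range(1, n + 1):
        cand = (cur + step) % n
        if eligible[cand]:
            return cand
    return cur


def simulate(workloads, policy, quantum=4):
    """Return (total_cycles, busy_cycles) for one single-issue processor.

    workloads[i] is a list of (compute_cycles, stall_cycles) bursts for
    thread i, with compute_cycles >= 1. A stall models waiting for slow
    data; it elapses in wall-clock time whether or not the thread is
    scheduled. Context-switch cost is ignored (an assumption).
    """
    n = len(workloads)
    pos = [0] * n                          # index of each thread's current burst
    left = [w[0][0] for w in workloads]    # compute cycles left in that burst
    ready_at = [0] * n                     # cycle at which pending data arrives
    done = [False] * n
    cur, slice_left = 0, quantum
    t = busy = 0
    while not all(done):
        if policy == "time-slice":
            # Switch only when the fixed interval expires (or the thread ended).
            if slice_left == 0 or done[cur]:
                cur = next_thread(cur, [not d for d in done])
                slice_left = quantum
        elif done[cur] or ready_at[cur] > t:
            # Switch-on-event: leave a stalled thread for one that is ready.
            ready = [not done[i] and ready_at[i] <= t for i in range(n)]
            cur = next_thread(cur, ready)
        if not done[cur] and ready_at[cur] <= t:
            left[cur] -= 1
            busy += 1
            if left[cur] == 0:             # burst finished: start its stall
                ready_at[cur] = t + 1 + workloads[cur][pos[cur]][1]
                pos[cur] += 1
                if pos[cur] == len(workloads[cur]):
                    done[cur] = True
                else:
                    left[cur] = workloads[cur][pos[cur]][0]
        t += 1
        slice_left -= 1
    return t, busy


# Two identical threads: 2 compute cycles, a 6-cycle wait, 2 more compute cycles.
workloads = [[(2, 6), (2, 0)], [(2, 6), (2, 0)]]
for policy in ("time-slice", "switch-on-event"):
    total, busy = simulate(workloads, policy)
    print(f"{policy}: {total} cycles total, {busy} busy")
```

Both policies do the same amount of useful work; in this model the time-sliced run takes longer because the processor sits idle for the rest of a quantum whenever the scheduled thread is waiting.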
-
Threading Algorithms (cont.)
Multiprocessing
  Distribute the load over many processors.
  Adds extra cost.
Simultaneous multi-threading
  Multiple threads execute on a single processor without switching.
  Basis of Intel's Hyper-Threading technology.
-
Hyper-Threading Concept
  At each point of time only a part of the processor resources is used for execution of the program code of a thread.
  Unused resources can also be loaded, for example, with parallel execution of another thread/application.
  Extremely useful in desktop and server applications where many threads are used.
-
Quick Recall: Many Resources IDLE!
From: Tullsen, Eggers, and Levy, "Simultaneous Multithreading: Maximizing On-chip Parallelism," ISCA 1995.
For an 8-way superscalar.
Slide source: John Kubiatowicz
-
(a) A superscalar processor with no multithreading
(b) A superscalar processor with coarse-grain multithreading
(c) A superscalar processor with fine-grain multithreading
(d) A superscalar processor with simultaneous multithreading (SMT)
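
The four cases can be compared with a small issue-slot model. This is a hedged sketch under simplifying assumptions: `ready[t][c]` is a made-up table giving how many instructions thread t could issue in cycle c (0 means the thread is stalled), unissued instructions do not carry over, and the function name and sample numbers are illustrative only.

```python
def issued_per_cycle(ready, width, policy):
    """Instructions issued in each cycle on a `width`-wide superscalar.

    ready[t][c]: instructions thread t has ready in cycle c (snapshot model).
    policy: "single", "coarse", "fine" or "smt".
    """
    n, cycles = len(ready), len(ready[0])
    cur = 0                                   # thread owning the pipeline (coarse)
    issued = []
    for c in range(cycles):
        if policy == "single":                # (a) only thread 0 ever runs
            k = ready[0][c]
        elif policy == "coarse":              # (b) switch only when the thread stalls
            if ready[cur][c] == 0:
                cur = (cur + 1) % n
            k = ready[cur][c]
        elif policy == "fine":                # (c) rotate threads every cycle
            k = ready[c % n][c]
        elif policy == "smt":                 # (d) fill slots from all threads
            k = sum(r[c] for r in ready)
        else:
            raise ValueError(policy)
        issued.append(min(width, k))
    return issued


ready = [[2, 0, 0, 1],    # thread 0
         [1, 2, 0, 2]]    # thread 1
width = 4
for policy in ("single", "coarse", "fine", "smt"):
    slots = issued_per_cycle(ready, width, policy)
    used = sum(slots)
    print(f"{policy:6s} {slots} -> {used}/{width * len(slots)} slots used")
```

In cases (a) to (c) only one thread can issue in a given cycle, so slots it cannot fill are wasted; in case (d) slots left over by one thread can be taken by another in the same cycle.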
-
Simultaneous Multithreading (SMT)
Example: new Pentium with "Hyperthreading"
Key Idea: Exploit ILP across multiple threads!
  i.e., convert thread-level parallelism into more ILP
  exploit following features of modern processors:
    multiple functional units
      modern processors typically have more functional units available than a single thread can utilize
    register renaming and dynamic scheduling
      multiple instructions from independent threads can co-exist and co-execute!
-
Hyper-Threading Architecture
  First used in the Intel Xeon MP processor.
  Makes a single physical processor appear as multiple logical processors.
  Each logical processor has a copy of the architecture state.
  Logical processors share a single set of physical execution resources.
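
One way to see this "one physical core, several logical processors" view from software is sketched below. It assumes a Linux system that exposes the sysfs file `thread_siblings_list`; on other systems the file is absent and the helper returns None. The function names are illustrative, not part of any library.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,8-9" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu=0):
    """Logical processors sharing a physical core with `cpu`, or None.

    Reads the Linux sysfs topology file; returns None when it is
    missing or unreadable (non-Linux systems, restricted containers).
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return None


siblings = thread_siblings(0)
if siblings is None:
    print("topology information not available on this system")
else:
    print(f"logical processors sharing a core with cpu0: {siblings}")
```

With Hyper-Threading enabled, the list for a core typically contains two logical processor numbers; with it disabled or unsupported, it contains one.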
-
Hyper-Threading Architecture
-
Power 5 dataflow ...
Why only two threads?
  With 4, one of the shared resources (physical registers, cache, memory bandwidth) would be prone to bottleneck.
Cost:
  The Power5 core is about 24% larger than the Power4 core because of the addition of SMT support.
-
Advantages
  Extra architecture only adds about 5% to the total die area.
  No performance loss if only one thread is active.
  Increased performance with multiple threads.
  Better resource utilization.
-
Disadvantages
  To take advantage of hyper-threading performance, serial execution cannot be used.
  Threads are non-deterministic and involve extra design.
  Threads have increased overhead.
  Shared resource conflicts.
-
Multicore
Multiprocessors on a single chip
-
CS267 Lecture 6
Basic Shared Memory Architecture
  Processors all connected to a large shared memory
    Where are caches?
  • Now take a closer look at structure, costs, limits, programming
[Figure: processors P1 ... Pn connected through an interconnect to memory]
Slide source: John Kubiatowicz
What About Caching???
Want high performance for shared memory: Use Caches!
  Each processor has its own cache (or multiple caches)
  Place data from memory into cache
  Writeback cache: don't send all writes over bus to memory
Caches reduce average latency
  Automatic replication closer to processor
  More important to multiprocessor than uniprocessor: latencies longer
Normal uniprocessor mechanisms to access data
  Loads and stores form very low-overhead communication primitive
Problem: Cache Coherence!
[Figure: processors P1 ... Pn, each with a cache ($), on a shared bus with memory and I/O devices]
-
Example Cache Coherence Problem
[Figure: processors P1, P2, P3, each with a cache ($), on a bus with memory and I/O devices. Memory and two of the caches hold u:5; event 3 writes a new value of u in one cache; events 4 and 5 are later reads marked "u = ?"]
Things to note:
  Processors could see different values for u after event 3
  With write back caches, value written back to memory depends on happenstance of which cache flushes or writes back value when
How to fix with a bus: Coherence Protocol
  Use bus to broadcast writes or invalidations
  Simple protocols rely on presence of broadcast medium
Bus not scalable beyond about 64 processors (max)
  Capacity, bandwidth limitations
Slide source: John Kubiatowicz
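
The problem and the bus-based fix can be reproduced in a few lines. This is a minimal sketch, not a real protocol implementation: the class and method names are invented, the written value 7 and the assignment of events to processors are assumptions made for the example, and the "coherent" mode is a simplified write-invalidate scheme in which a dirty copy is written back when another processor misses.

```python
class SharedMemorySystem:
    """Toy model: per-processor write-back caches over one shared memory."""

    def __init__(self, n_procs, coherent):
        self.memory = {"u": 5}
        self.caches = [dict() for _ in range(n_procs)]
        self.dirty = [set() for _ in range(n_procs)]
        self.coherent = coherent

    def _write_back_dirty_copy(self, addr):
        # Snooping: a cache holding a modified copy updates memory.
        for q, cache in enumerate(self.caches):
            if addr in self.dirty[q]:
                self.memory[addr] = cache[addr]
                self.dirty[q].discard(addr)

    def read(self, p, addr):
        cache = self.caches[p]
        if addr not in cache:                      # miss: fetch from memory
            if self.coherent:
                self._write_back_dirty_copy(addr)
            cache[addr] = self.memory[addr]
        return cache[addr]                         # hit: may be stale if incoherent

    def write(self, p, addr, value):
        if self.coherent:
            # Broadcast an invalidation on the bus to every other cache.
            for q, cache in enumerate(self.caches):
                if q != p:
                    cache.pop(addr, None)
                    self.dirty[q].discard(addr)
        self.caches[p][addr] = value
        self.dirty[p].add(addr)                    # write-back: memory not updated yet


def run(coherent):
    s = SharedMemorySystem(3, coherent)
    s.read(0, "u")                 # event 1: P1 reads u
    s.read(2, "u")                 # event 2: P3 reads u
    s.write(2, "u", 7)             # event 3: P3 writes u (value 7 is an assumption)
    p1_sees = s.read(0, "u")       # event 4: P1 reads u again
    p2_sees = s.read(1, "u")       # event 5: P2 reads u for the first time
    return p1_sees, p2_sees


print("without coherence (P1, P2):", run(coherent=False))
print("with invalidation (P1, P2):", run(coherent=True))
```

Without the protocol, P1 keeps reading its old cached copy and P2 reads the old value from memory, because the new value still sits only in the writer's cache. With invalidations broadcast on the bus, both later reads miss and pick up the written value.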
-
CS267 Lecture 6
Limits of Bus-Based Shared Memory
[Figure: a shared bus connecting I/O and memory (MEM) modules to the processors]
-
Cache
-
A Reminder: SMT (Simultaneous Multi Threading)
SMT vs. CMP
-
A Single Chip Multiprocessor
L. Hammond et al. (Stanford), IEEE Computer, 1997
• For same area (a billion tr. DRAM area)
• Superscalar and SMT: Very Complex
  • Wide
  • Advanced branch prediction
  • Register renaming
-
SS and SMT vs. CMP
CPU Cores.
Three main hardware design problems (of SS and SMT):
• Area increases quadratically with core complexity
• Number of registers
-
SS and SMT vs. CMP
Memory.
• …-issue SS or SMT require a multiport data cache (4-6 ports)
• … X … Kbyte (… cycle latency)
CMP: …6 X …6 Kbyte (single cycle latency), but secondary cache is slower (multiport)
Shared memory: write-through caches
[Figure labels: SMT, CMP]
-
Performance comparison
• Compress: (Integer apps) Low ILP and no TLP
• Mpeg-…: (MMedia apps) High ILP and TLP and moderate memory requirement (parallelized by hand)
  SMT utilizes core resources better
  But CMP has …6 issue slots instead of …
• Tomcatv: (FP applications) Large loop-level parallelism and large memory bandwidth (TLP by compiler)
  CMP has large memory bandwidth on primary cache
  SMT fundamental problem: unified and slow cache
• Multiprogram: Integer multiprogramming workload, all computation-intensive (Low ILP, High PLP)
-
CMP Motivation
How to utilize available silicon?
  Speculation (aggressive superscalar)
  Simultaneous Multithreading (SMT, Hyperthreading)
  Several processors on a single chip
What is a CMP (Chip MultiProcessor)?
  Several processors (several masters)
  Both shared and distributed memory architectures
  Both homogeneous and heterogeneous processor types
Why?
  Wire delays
  Diminishing of uniprocessors
  Very long design and verification times for modern processors
-
A Single Chip Multiprocessor
L. Hammond et al. (Stanford), IEEE Computer, 1997
• TLP and PLP become widespread in future applications
• Various multimedia applications
• Compilers and
-
A Reminder: SMT (Simultaneous Multi Threading)
SMT
• Pool of execution units (wide machine)
• Several logical processors
• Copy of state for each
• Mul. threads are running concurrently
• Better utilization and latency tolerance
CMP
• Simple cores
• Moderate amount of parallelism
• Threads are running concurrently on different cores
-
SMT Dual-core: all four threads can run concurrently
[Figure: pipeline blocks of each core: BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM]