zj hyperthreading

7/24/2019 ZJ Hyperthreading

1/30

Hyper-Threading, Chip Multiprocessors, and Both

Zoran Jovanovic


2/30

To Be Tackled in Multithreading

Review of Threading Algorithms
Hyper-Threading Concepts
Hyper-Threading Architecture
Advantages/Disadvantages


3/30

Threading Algorithms

Time-slicing
  A processor switches between threads in fixed time intervals.
  High expenses, especially if one of the processes is in the wait state.
  Fine grain

Switch-on-event
  Task switching in case of long pauses.
  While waiting for data coming from a relatively slow source, CPU resources are given to other processes.
  Coarse grain
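The difference between the two policies can be sketched with a toy single-processor simulation. This is only an illustration under stated assumptions: each thread is a hypothetical list of (compute cycles, wait cycles) pairs, the wait of the final pair is ignored, switching costs nothing, and the names `simulate`, `workloads` and `quantum` are invented for this sketch.

```python
def simulate(workloads, policy, quantum=4):
    """Toy model: return (total_cycles, busy_cycles) on one processor.

    workloads: one list per thread of (compute, wait) pairs, compute > 0.
    policy: "time-slice" (fixed intervals) or "switch-on-event".
    """
    if policy not in ("time-slice", "switch-on-event"):
        raise ValueError(policy)
    n = len(workloads)
    idx = [0] * n                         # current (compute, wait) pair
    left = [w[0][0] for w in workloads]   # compute cycles left in the burst
    ready_at = [0] * n                    # cycle at which the wait is over
    done = [False] * n
    cur = 0        # thread that currently owns the processor
    used = 0       # cycles consumed in the current time slice
    t = busy = 0
    while not all(done):
        runnable = (not done[cur]) and ready_at[cur] <= t
        if policy == "switch-on-event" and not runnable:
            # The owner is waiting: hand the processor to a ready thread.
            for k in range(1, n + 1):
                cand = (cur + k) % n
                if not done[cand] and ready_at[cand] <= t:
                    cur, runnable = cand, True
                    break
        if runnable:
            busy += 1
            left[cur] -= 1
            if left[cur] == 0:            # burst finished, thread now waits
                wait = workloads[cur][idx[cur]][1]
                idx[cur] += 1
                if idx[cur] == len(workloads[cur]):
                    done[cur] = True
                else:
                    ready_at[cur] = t + 1 + wait
                    left[cur] = workloads[cur][idx[cur]][0]
        t += 1
        if policy == "time-slice":
            used += 1
            if used == quantum or done[cur]:
                # The slice is over even if the owner spent it waiting.
                used = 0
                for k in range(1, n + 1):
                    cand = (cur + k) % n
                    if not done[cand]:
                        cur = cand
                        break
    return t, busy


# Two identical threads: 2 cycles of work, a 6-cycle wait, 2 more cycles.
WORK = [[(2, 6), (2, 0)], [(2, 6), (2, 0)]]

if __name__ == "__main__":
    for policy in ("time-slice", "switch-on-event"):
        total, busy = simulate(WORK, policy)
        print(policy, "total cycles:", total, "busy cycles:", busy)
```

In this model both policies do the same amount of useful work. Time-slicing spends part of each interval on a thread that is in the wait state, while switch-on-event overlaps one thread's wait with the other's computation.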


4/30

Threading Algorithms (cont.)

Multiprocessing
  Distribute the load over many processors
  Adds extra cost

Simultaneous multi-threading
  Multiple threads execute on a single processor without switching.
  Basis of Intel's Hyper-Threading technology.


5/30

Hyper-Threading Concept

At each point of time only a part of processor resources is used for execution of the program code of a thread.
Unused resources can also be loaded, for example, with parallel execution of another thread/application.
Extremely useful in desktop and server applications where many threads are used.


6/30

Quick Recall: Many Resources IDLE!

From: Tullsen, Eggers, and Levy, "Simultaneous Multithreading: Maximizing On-chip Parallelism", ISCA 1995.

For an 8-way superscalar.

Slide source: John Kubiatowicz


7/30



8/30

(a) A superscalar processor with no multithreading
(b) A superscalar processor with coarse-grain multithreading
(c) A superscalar processor with fine-grain multithreading
(d) A superscalar processor with simultaneous multithreading (SMT)

[Figure: panels (a), (b), (c), (d)]
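The four organizations (a) to (d) can be compared with a small issue-slot model. This is a sketch under simplifying assumptions, not a processor simulator: `ready[t][c]` is a hypothetical table giving how many independent instructions thread `t` could issue in cycle `c` (0 means stalled), the table is fixed input that does not depend on what was issued earlier, and coarse-grain switching is assumed to cost no cycles.

```python
def utilization(ready, width, policy):
    """Fraction of issue slots filled under a given multithreading policy.

    ready[t][c]: instructions thread t could issue in cycle c (assumed input).
    width: issue width of the superscalar processor.
    policy: "single", "coarse", "fine" or "smt".
    """
    threads = len(ready)
    cycles = len(ready[0])
    issued = 0
    cur = 0  # thread currently running under the coarse-grain policy
    for c in range(cycles):
        if policy == "single":
            # (a) only thread 0 ever issues
            issued += min(width, ready[0][c])
        elif policy == "coarse":
            # (b) stay on one thread until it stalls, then switch
            if ready[cur][c] == 0:
                for k in range(1, threads):
                    cand = (cur + k) % threads
                    if ready[cand][c] > 0:
                        cur = cand
                        break
            issued += min(width, ready[cur][c])
        elif policy == "fine":
            # (c) rotate threads every cycle, skipping stalled ones;
            # still only one thread issues per cycle
            for k in range(threads):
                cand = (c + k) % threads
                if ready[cand][c] > 0:
                    issued += min(width, ready[cand][c])
                    break
        elif policy == "smt":
            # (d) slots in one cycle may be filled from several threads
            issued += min(width, sum(r[c] for r in ready))
        else:
            raise ValueError(policy)
    return issued / (width * cycles)


# Hypothetical 4-wide machine, two threads, four cycles.
READY = [[2, 0, 1, 3],
         [1, 2, 0, 1]]

if __name__ == "__main__":
    for policy in ("single", "coarse", "fine", "smt"):
        print(policy, utilization(READY, 4, policy))
```

Within this model SMT can never fill fewer slots than the other three policies, because it is the only one allowed to combine instructions from different threads in the same cycle.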


9/30

Simultaneous Multithreading (SMT)

Example: new Pentium with "Hyperthreading"
Key Idea: Exploit ILP across multiple threads!
  i.e., convert thread-level parallelism into more ILP
  exploit following features of modern processors:
    multiple functional units
      modern processors typically have more functional units available than a single thread can utilize
    register renaming and dynamic scheduling
      multiple instructions from independent threads can co-exist and co-execute!


10/30

Hyper-Threading Architecture

First used in Intel Xeon MP processor
Makes a single physical processor appear as multiple logical processors.
Each logical processor has a copy of architecture state.
Logical processors share a single set of physical execution resources
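From software, the logical processors are what the operating system reports. A minimal sketch, assuming Python 3 and, for the second part, a Linux system that exposes the sysfs topology file named below; the helper names `parse_cpu_list` and `siblings_of_cpu0` are invented for this example.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-1,8' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def siblings_of_cpu0():
    """Logical processors sharing a core with CPU 0, or None if unavailable.

    Assumes the Linux sysfs topology interface; on other systems the file
    does not exist and None is returned.
    """
    path = Path("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list")
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # os.cpu_count() counts logical processors, not physical cores.
    print("logical processors:", os.cpu_count())
    print("logical processors on the core of CPU 0:", siblings_of_cpu0())
```

On a machine with Hyper-Threading enabled, the sibling list for CPU 0 would typically contain two entries, one per logical processor of that core.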


11/30

Hyper-Threading Architecture


12/30

Power5 dataflow ...

Why only two threads?
  With 4, one of the shared resources (physical registers, cache, memory bandwidth) would be prone to bottleneck

Cost:
  The Power5 core is about 24% larger than the Power4 core because of the addition of SMT support


13/30

Advantages

Extra architecture only adds about 5% to the total die area.
No performance loss if only one thread is active. Increased performance with multiple threads.
Better resource utilization.


14/30

Disadvantages

To take advantage of hyper-threading performance, serial execution cannot be used.
Threads are non-deterministic and involve extra design
Threads have increased overhead
Shared resource conflicts
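The "extra design" that threads require can be illustrated at the software level. A minimal sketch, assuming Python's standard `threading` module; the function name `count_with_lock` and its parameters are invented for this example. Several threads update one shared counter, and a lock makes the result independent of how the threads happen to interleave, at the price of extra overhead on every update.

```python
import threading


def count_with_lock(n_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # Without the lock, the read-modify-write below could interleave
            # with another thread's update and the final count would depend
            # on scheduling.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(count_with_lock())
```

The lock is the design cost: the program must be split into threads to benefit from the hardware, and every shared update then needs synchronization of this kind.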


15/30

Multicore

Multiprocessors on a single chip



16/30

CS267 Lecture 6

Basic Shared Memory Architecture

Processors all connected to a large shared memory
  Where are caches?

• Now take a closer look at structure, costs, limits, programming

[Figure: processors P1 ... Pn connected through an interconnect to memory]


17/30


What About Caching???

Want high performance for shared memory: Use Caches!
  Each processor has its own cache (or multiple caches)
  Place data from memory into cache
  Writeback cache: don't send all writes over bus to memory

Caches reduce average latency
  Automatic replication closer to processor
  More important to multiprocessor than uniprocessor: latencies longer

Normal uniprocessor mechanisms to access data
  Loads and Stores form very low-overhead communication primitive

Problem: Cache Coherence!

[Figure: processors P1 ... Pn, each with a cache ($), on a shared bus with memory (Mem) and I/O devices]


18/30

Example Cache Coherence Problem

[Figure: processors P1, P2, P3, each with a cache ($), on a bus with memory and I/O devices. Memory and two of the caches hold u:5; event 3 writes u = 7; events 4 and 5 read u = ?]

Things to note:
  Processors could see different values for u after event 3
  With write back caches, value written back to memory depends on happenstance of which cache flushes or writes back value when

How to fix with a bus: Coherence Protocol
  Use bus to broadcast writes or invalidations
  Simple protocols rely on presence of broadcast medium

Bus not scalable beyond about 64 processors (max)
  Capacity, bandwidth limitations
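The problem and the invalidation-based fix can be reproduced with a toy model. This is a sketch under stated assumptions, not a real protocol implementation: the assignment of events to processors follows the usual textbook version of this example (P1 reads u, P3 reads u, P3 writes 7, then P1 and P2 read u), the caches are write-back, and the "bus" is just a loop over all caches. The class and function names are invented.

```python
class Machine:
    """Toy shared-memory machine with one write-back cache per processor."""

    def __init__(self, n_procs, coherent):
        self.memory = {"u": 5}
        # One dict per processor: address -> [value, dirty flag]
        self.caches = [dict() for _ in range(n_procs)]
        self.coherent = coherent

    def _flush_dirty_copies(self, addr):
        # Bus snoop: a cache holding a modified copy writes it back.
        for cache in self.caches:
            line = cache.get(addr)
            if line and line[1]:
                self.memory[addr] = line[0]
                line[1] = False

    def read(self, p, addr):
        cache = self.caches[p]
        if addr not in cache:                 # miss: fetch from memory
            if self.coherent:
                self._flush_dirty_copies(addr)
            cache[addr] = [self.memory[addr], False]
        return cache[addr][0]                 # hit: may be stale

    def write(self, p, addr, value):
        if self.coherent:
            # Broadcast an invalidation: other copies are dropped.
            for q, other in enumerate(self.caches):
                if q != p:
                    other.pop(addr, None)
        self.caches[p][addr] = [value, True]  # write-back: memory untouched


def run(coherent):
    """Replay the five events; return (P1's view, P2's view, memory)."""
    P1, P2, P3 = 0, 1, 2
    m = Machine(3, coherent)
    m.read(P1, "u")             # event 1
    m.read(P3, "u")             # event 2
    m.write(P3, "u", 7)         # event 3
    seen_p1 = m.read(P1, "u")   # event 4
    seen_p2 = m.read(P2, "u")   # event 5
    return seen_p1, seen_p2, m.memory["u"]


if __name__ == "__main__":
    print("no coherence:    ", run(False))
    print("write-invalidate:", run(True))
```

Without coherence, P1 keeps hitting its old cached copy and P2 fetches the old value from memory, since the write-back cache of P3 has not flushed. With broadcast invalidations, both later reads observe the value written in event 3.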



19/30

Limits of Bus-Based Shared Memory

[Figure: I/O, MEM, MEM, PROC]


20/30



21/30

Cache


22/30

A Reminder: SMT (Simultaneous Multi Threading)

SMT vs. CMP


23/30

A Single Chip Multiprocessor
L. Hammond et al. (Stanford), IEEE Computer 97

• For same area (a billion tr. DRAM area)

Superscalar and SMT: Very Complex
• Wide
• Advanced Branch prediction
• Register Renaming


24/30

SS and SMT vs. CMP

CPU Cores.
Three main hardware design problems (of SS and SMT):
• Area increases quadratically with core complexity
• Number of Registers


25/30

SS and SMT vs. CMP

Memory.
• 12 issue SS or SMT require multiport data cache (4-6 ports)
• 2 x 128 Kbyte (2 cycle latency)
CMP: 16 x 16 Kbyte (single cycle latency), but secondary cache is slower (multiport)

Shared memory: write through caches

[Figure labels: SMT, CMP]


26/30

Performance comparison

• Compress: (Integer apps)
  Low ILP and no TLP
• Mpeg-2: (MMedia apps)
  High ILP and TLP and moderate memory requirement (parallelized by hand)
  SMT utilizes core resources better
  But CMP has 16 issue slots instead of 12
• Tomcatv: (FP applications)
  Large loop-level parallelism and large memory bandwidth (TLP by compiler)
  CMP has large memory bandwidth on primary cache
  SMT fundamental problem: unified and slow cache
• Multiprogram: Integer multiprogramming workload, all computation-intensive (Low ILP, High PLP)


27/30

CMP Motivation

How to utilize available silicon?
  Speculation (aggressive superscalar)
  Simultaneous Multithreading (SMT, Hyperthreading)
  Several processors on a single chip

What is a CMP (Chip MultiProcessor)?
  Several processors (several masters)
  Both shared and distributed memory architectures
  Both homogeneous and heterogeneous processor types

Why?
  Wire Delays
  Diminishing of Uniprocessors
  Very long design and verification times for modern processors


28/30

A Single Chip Multiprocessor
L. Hammond et al. (Stanford), IEEE Computer 97

• TLP and PLP become widespread in future applications
• Various Multimedia applications
• Compilers and


29/30

A Reminder: SMT (Simultaneous Multi Threading)

SMT
• Pool of execution units (Wide machine)
• Several Logical processors
• Copy of State for each
• Multiple threads are running concurrently
• Better utilization and Latency Tolerance

CMP
• Simple Cores
• Moderate amount of parallelism
• Threads are running concurrently on different cores


30/30

SMT Dual-core: all four threads can run concurrently

[Figure labels: BTB and I-TLB; Decoder; Trace Cache; Rename/Alloc; Uop queues; Schedulers; Integer; Floating Point; L1 D-Cache, D-TLB; uCode ROM]
