l21 directory 2

  • 8/3/2019 L21 Directory 2

    1/26

    CS 152 Computer Architecture and Engineering

    Lecture 21: Directory-Based Cache Protocols

    Scott Beamer (substituting for Krste Asanovic)

    Electrical Engineering and Computer Sciences

    University of California, Berkeley

    http://www.eecs.berkeley.edu/~sbeamer

    http://inst.cs.berkeley.edu/~cs152

    4/28/2009 2CS152-Spring09

    Recap: Snoopy Cache Protocols

    Use snoopy mechanism to keep all processors' view of memory coherent

    [Figure: block diagram with M1, M2, M3, three snoopy caches, a memory bus, physical memory, DMA, and disks.]

    Recap: MESI: An Enhanced MSI protocol

    Increased performance for private data

    Each cache line has a tag (address tag, state bits)

    M: Modified Exclusive
    E: Exclusive, unmodified
    S: Shared
    I: Invalid

    [Figure: MESI state transition diagram (states M, E, S, I) for the cache state in processor P1. Arc labels: write miss; read miss, shared; read miss, not shared; P1 read; P1 write; P1 write or read; read by any processor; other processor reads; other processor intent to write; P1 writes back.]
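
    The MESI recap can be made concrete with a small next-state table. This is a sketch only: the event names are hypothetical labels modeled on the arc labels of the slide, and the pairing of labels to arcs follows the usual MESI formulation rather than being read off the original figure.

```python
# Sketch of MESI next-state logic for one cache line in processor P1.
# Event names are hypothetical; arcs follow the usual MESI formulation.
MESI = {
    "I": {"read_miss_shared": "S", "read_miss_not_shared": "E", "write_miss": "M"},
    "S": {"p1_read": "S", "other_read": "S", "p1_write": "M",
          "other_intent_to_write": "I"},
    "E": {"p1_read": "E", "p1_write": "M", "other_read": "S",
          "other_intent_to_write": "I"},
    "M": {"p1_read": "M", "p1_write": "M", "other_read": "S",
          "other_intent_to_write": "I"},
}


def step(state, event):
    """Return (next_state, writeback) for one event on the line."""
    nxt = MESI[state][event]
    # Assumption: dirty data leaves the cache on every exit from M.
    # The slide labels the write-back only on the M -> S arc.
    writeback = state == "M" and nxt != "M"
    return nxt, writeback


def run(events, state="I"):
    """Apply a list of events and return [(event, state, writeback), ...]."""
    trace = []
    for event in events:
        state, writeback = step(state, event)
        trace.append((event, state, writeback))
    return trace
```

    For private data, `run(["read_miss_not_shared", "p1_write"])` goes I, E, M without any event from another processor. The E to M step needs no bus transaction, which is where MESI gains over MSI.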

    Performance of Symmetric Shared-Memory Multiprocessors

    Cache performance is combination of:
    1. Uniprocessor cache miss traffic
    2. Traffic caused by communication
       - Results in invalidations and subsequent cache misses

    Adds 4th C: coherence miss
       - Joins Compulsory, Capacity, Conflict
       - (Sometimes called a Communication miss)

    Coherency Misses

    1. True sharing misses arise from the communication of data through the cache coherence mechanism
       - Invalidates due to 1st write to shared block
       - Reads by another CPU of modified block in different cache
       - Miss would still occur if block size were 1 word

    2. False sharing misses when a block is invalidated because some word in the block, other than the one being read, is written into
       - Invalidation does not cause a new value to be communicated, but only causes an extra cache miss
       - Block is shared, but no word in block is actually shared; miss would not occur if block size were 1 word

    Example: True v. False Sharing v. Hit?

    Assume x1 and x2 in same cache block. P1 and P2 both read x1 and x2 before.

    Time  P1        P2        True, False, Hit? Why?
    1     Write x1            True miss; invalidate x1 in P2
    2               Read x2   False miss; x1 irrelevant to P2
    3     Write x1            False miss; x1 irrelevant to P2
    4               Write x2  False miss; x1 irrelevant to P2
    5     Read x2             True miss; invalidate x2 in P1
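
    The classification in this example can be reproduced with a short bookkeeping sketch. It is a toy model under stated assumptions, not a general classifier: it tracks a single block, uses an invalidate-on-write protocol, and the `Copy` fields and the `classify` function are hypothetical names. The assignment of each access to P1 or P2 in `trace` is inferred from the explanations in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    valid: bool = True
    # Words this CPU has used since it last (re)acquired the block.
    touched: set = field(default_factory=set)
    # Words other CPUs wrote since this CPU lost the block.
    stale: set = field(default_factory=set)


def classify(trace, cpus=("P1", "P2"), words=("x1", "x2")):
    """Label each (cpu, op, word) access as 'true', 'false', or 'hit'."""
    # Assumption from the example: every CPU has already read every word.
    copies = {c: Copy(touched=set(words)) for c in cpus}
    results = []
    for cpu, op, word in trace:
        me = copies[cpu]
        others = [copies[c] for c in cpus if c != cpu]
        if not me.valid:
            # Miss on an invalidated block: true sharing only if the word
            # we want is one that somebody else actually wrote.
            kind = "true" if word in me.stale else "false"
            me.valid, me.touched, me.stale = True, set(), set()
        elif op == "write" and any(o.valid for o in others):
            # Upgrade of a shared block: true sharing only if some other
            # holder has used the word being written.
            used = any(o.valid and word in o.touched for o in others)
            kind = "true" if used else "false"
        else:
            kind = "hit"
        me.touched.add(word)
        if op == "write":
            for o in others:
                if o.valid:
                    o.valid, o.stale = False, set()
                o.stale.add(word)
        results.append(kind)
    return results


trace = [
    ("P1", "write", "x1"),
    ("P2", "read", "x2"),
    ("P1", "write", "x1"),
    ("P2", "write", "x2"),
    ("P1", "read", "x2"),
]
print(classify(trace))
```

    With a block size of one word, x1 and x2 would sit in separate blocks and the three false sharing misses would disappear, which is the point of the example.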

    MP Performance, 4 Processor Commercial Workload: OLTP, Decision Support (Database), Search Engine

    [Figure: chart over cache size (1 MB, 2 MB, 4 MB, 8 MB) with components Instruction, Capacity/Conflict, Cold, False Sharing, and True Sharing.]

    True sharing and false sharing unchanged going from 1 MB to 8 MB (L3 cache)

    Uniprocessor cache misses improve with cache size increase (Instruction, Capacity/Conflict, Compulsory)

    MP Performance, 2 MB Cache Commercial Workload: OLTP, Decision Support (Database), Search Engine

    [Figure: chart over processor count (1, 2, 4, 6, 8) with components Instruction, Conflict/Capacity, Cold, False Sharing, and True Sharing.]

    True sharing, false sharing increase going from 1 to 8 CPUs

    A Cache Coherent System Must:

    Provide set of states, state transition diagram, and actions

    Manage coherence protocol
      (0) Determine when to invoke coherence protocol
      (a) Find info about state of block in other caches to determine action: whether need to communicate with other cached copies
      (b) Locate the other copies
      (c) Communicate with those copies (invalidate/update)

    (0) is done the same way on all systems
      - state of the line is maintained in the cache
      - protocol is invoked if an access fault occurs on the line

    Different approaches distinguished by (a) to (c)

    Bus-based Coherence

    All of (a), (b), (c) done through broadcast on bus
      - faulting processor sends out a search
      - others respond to the search probe and take necessary action

    Could do it in scalable network too
      - broadcast to all processors, and let them respond

    Conceptually simple, but broadcast doesn't scale with number of processors, P
      - on bus, bus bandwidth doesn't scale
      - on scalable network, every fault leads to at least P network transactions

    Scalable coherence:
      - can have same cache states and state transition diagram
      - different mechanisms to manage protocol

    Scalable Approach: Directories

    Every memory block has associated directory information
      - keeps track of copies of cached blocks and their states
      - on a miss, find directory entry, look it up, and communicate only with the nodes that have copies if necessary
      - in scalable networks, communication with directory and copies is through network transactions

    Many alternatives for organizing directory information

    Basic Operation of Directory

    k processors.

    With each cache-block in memory: k presence-bits, 1 dirty-bit

    With each cache-block in cache: 1 valid bit, and 1 dirty (owner) bit

    [Figure: processors (P), each with a cache, connected through an interconnection network to memory and a directory holding the presence bits and dirty bit.]

    Read from main memory by processor i:
      If dirty-bit OFF then { read from main memory; turn p[i] ON; }
      If dirty-bit ON then { recall line from dirty proc (cache state to shared); update memory; turn dirty-bit OFF; turn p[i] ON; supply recalled data to i; }

    Write to main memory by processor i:
      If dirty-bit OFF then { send invalidations to all caches that have the block; turn dirty-bit ON; supply data to i; turn p[i] ON; ... }
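
    The read and write rules above can be turned into a small executable sketch for a single memory block. The class name, the `cache` and `log` fields, and clearing the presence bit of an invalidated cache are assumptions made for illustration. The write case with the dirty bit ON is elided on the slide, so the sketch leaves it unimplemented rather than guess.

```python
class Directory:
    """Directory entry for one memory block, plus toy caches for k processors."""

    def __init__(self, k, memory_value=0):
        self.p = [False] * k       # presence bits p[0..k-1]
        self.dirty = False         # dirty bit
        self.memory = memory_value
        self.cache = [None] * k    # hypothetical: copy per processor, None = invalid
        self.log = []              # hypothetical: record of messages sent

    def read(self, i):
        if self.dirty:
            owner = self.p.index(True)   # only the owner's bit is set while dirty
            if owner == i:
                return self.cache[i]     # the owner hits in its own cache
            # Recall the line from the dirty processor; it keeps a shared copy.
            self.log.append(("recall", owner))
            self.memory = self.cache[owner]   # update memory
            self.dirty = False
        self.cache[i] = self.memory      # supply data to i
        self.p[i] = True
        return self.cache[i]

    def write(self, i, value):
        if self.dirty:
            raise NotImplementedError("dirty-bit ON write case is elided on the slide")
        for j in range(len(self.p)):
            if self.p[j] and j != i:
                self.log.append(("invalidate", j))
                self.cache[j] = None
                self.p[j] = False        # assumption: invalidated copies drop their bit
        self.dirty = True
        self.p[i] = True
        self.cache[i] = value
```

    A short run: with `d = Directory(3, memory_value=7)`, two reads by processors 0 and 1 set two presence bits; `d.write(2, 9)` then sends two invalidations and leaves processor 2 as the sole owner; a later `d.read(0)` recalls the line from processor 2 and updates memory.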

    CS152 Administrivia

    Informal Survey, Thursday April 30

    Last lecture, Tuesday May 5

    Summary of the course

    Putting it all together: Nehalem

    Course Survey at End (we want your feedback)

    Final quiz, Thursday May 7

    Directory Cache Protocol (Handout 6)

    Assumptions: Reliable network, FIFO message delivery between any given source-destination pair

    [Figure: CPUs, each with a cache, and directory controllers, each with a DRAM bank, attached to an interconnection network.]

    Cache States

    For each cache line, there are 4 possible states:

    C-invalid (= Nothing): The accessed data is not resident in the cache.

    C-shared (= Sh): The accessed data is resident in the cache, and possibly also cached at other sites. The data in memory is valid.

    C-modified (= Ex): The accessed data is exclusively resident in this cache, and has been modified. Memory does not have the most up-to-date data.

    C-transient (= Pending): The accessed data is in a transient state (for example, the site has just issued a protocol request, but has not received the corresponding protocol reply).

    Home directory states

    For each memory block, there are 4 possible states:

    R(dir): The memory block is shared by the sites specified in dir (dir is a set of sites). The data in memory is valid in this state. If dir is empty (i.e., dir is the empty set), the memory block is not cached by any site.

    W(id): The memory block is exclusively cached at site id, and has been modified at that site. Memory does not have the most up-to-date data.

    TR(dir): The memory block is in a transient state waiting for the acknowledgements to the invalidation requests that the home site has issued.

    TW(id): The memory block is in a transient state waiting for a block exclusively cached at site id (i.e., in C-modified state) to make the memory block at the home site up-to-date.
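
    One way to write down the cache states and home directory states as data types is sketched below. The Python names are hypothetical: the field `sites` stands for dir and `site` stands for id, and the helper `sites_named` is only an illustration. Nothing here is taken from Handout 6 beyond the state definitions listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class CacheState(Enum):
    C_INVALID = "Nothing"
    C_SHARED = "Sh"
    C_MODIFIED = "Ex"
    C_TRANSIENT = "Pending"


@dataclass(frozen=True)
class R:
    """Shared by the sites in `sites`; memory is valid."""
    sites: FrozenSet[int] = frozenset()

    @property
    def uncached(self):
        # An empty set means no site caches the block.
        return not self.sites


@dataclass(frozen=True)
class W:
    """Exclusively cached and modified at `site`; memory is out of date."""
    site: int


@dataclass(frozen=True)
class TR:
    """Transient: waiting for invalidation acknowledgements from `sites`."""
    sites: FrozenSet[int]


@dataclass(frozen=True)
class TW:
    """Transient: waiting for `site` to bring the home copy up to date."""
    site: int


def sites_named(state):
    """Return the set of sites a directory state refers to."""
    if isinstance(state, (R, TR)):
        return set(state.sites)
    return {state.site}
```

    Keeping the transient states as separate types makes it explicit that the home site must remember which sites it is still waiting on.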

    Protocol Messages

    There are 10 different protocol messages:

    Category                     Messages
    Cache to Memory Requests     ShReq, ExReq
    Memory to Cache Requests     WbReq, InvReq, FlushReq
    Cache to Memory Responses    WbRep(v), InvRep, FlushRep(v)
    Memory to Cache Responses    ShRep(v), ExRep(v)
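
    The ten messages can be tabulated in code, which makes the four categories and the data-carrying messages (those written with a (v) argument) easy to query. This is a sketch: the `Direction` and `Message` types and the `by_category` helper are hypothetical names, and only the message names and categories come from the slide.

```python
from enum import Enum
from typing import NamedTuple


class Direction(Enum):
    CACHE_TO_MEMORY = "cache to memory"
    MEMORY_TO_CACHE = "memory to cache"


class Message(NamedTuple):
    name: str
    direction: Direction
    is_request: bool
    carries_value: bool   # True for messages written with (v)


C2M, M2C = Direction.CACHE_TO_MEMORY, Direction.MEMORY_TO_CACHE

MESSAGES = [
    Message("ShReq", C2M, True, False),
    Message("ExReq", C2M, True, False),
    Message("WbReq", M2C, True, False),
    Message("InvReq", M2C, True, False),
    Message("FlushReq", M2C, True, False),
    Message("WbRep", C2M, False, True),
    Message("InvRep", C2M, False, False),
    Message("FlushRep", C2M, False, True),
    Message("ShRep", M2C, False, True),
    Message("ExRep", M2C, False, True),
]


def by_category(direction, is_request):
    """Names of the messages in one of the four categories."""
    return [m.name for m in MESSAGES
            if m.direction is direction and m.is_request == is_request]
```

    For example, `by_category(M2C, True)` lists the three requests the home site can send to a cache.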

    Cache State Transitions (from invalid state)

    Cache State Transitions (from shared state)

    Cache State Transitions (from exclusive state)

    Cache Transitions (from pending)

    Home Directory State Transitions

    Messages sent from site id

    Acknowledgements

    These slides contain material developed and copyright by:

    Arvind (MIT)

    Krste Asanovic (MIT/UCB)

    Joel Emer (Intel/MIT)

    James Hoe (CMU)

    John Kubiatowicz (UCB)

    David Patterson (UCB)

    MIT material derived from course 6.823

    UCB material derived from course CS252