
Implementation and Verification of a Cache Coherence Protocol using Spin

Steven Farago

Goal

• To use Spin to design a “plausible” cache coherence protocol
  – Introduce nothing in the Spin model that would not be realistic in hardware (e.g. instant global knowledge between unrelated state machines)

• To verify the correctness of the protocol

Background

• Definition: Cache = Small, high-speed memory that is used by a single processor. All processor memory accesses are via the cache.

• Problem:
  – In a multiprocessor system, each processor could have a cache.
  – Each cache could contain (potentially different) data for the same addresses.
  – Given this, how to ensure that processors see a consistent picture of memory?

Coherence protocol

• A Coherence protocol specifies how caches communicate with processors and each other so that processors will have a predictable view of memory.

• Caches that always provide this “predictable view of memory” are said to be coherent.

A Definition of Coherence

• A “view of memory” is coherent if the following property holds:
  – Given cacheline A, two processors may not see storage accesses to A in a conflicting order.

– Example:

      Processor 0   Processor 1   Processor 2   Processor 3
      Store A, 0    Load A, 0     Load A, 0     Load A, 1
      Store A, 1    Load A, 1     Load A, 0     Load A, 0
                    Coherent      Coherent      NOT Coherent

• Informally, a processor may not see “old” data after seeing “new” data.

Standard Coherence Protocol

• MESI (Modified, Exclusive, Shared, Invalid) – Standard protocol that is supposed to guarantee cache coherence

• Each cacheline (block) in the cache is marked with one of these states.

• Cacheline accesses are only allowed if the cache states are “correct” w.r.t. the coherence protocol

• Examples:
  – A cache that is marked “invalid” may not provide data to a processor.
  – Cacheline data may not be updated unless the line is in the Exclusive or Modified state.
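In a Spin model, the four MESI states map naturally onto an mtype. A minimal sketch of the state and the two example rules above (all names here are illustrative, not the original model):

```promela
/* MESI states for a single modeled cacheline */
mtype = { MODIFIED, EXCLUSIVE, SHARED, INVALID };

mtype cacheState = INVALID;   /* every line starts out invalid */

/* Rule: an invalid line may not supply data to the processor */
inline may_supply_data(ok) {
    ok = (cacheState != INVALID)
}

/* Rule: data may only be updated in Exclusive or Modified */
inline may_update_data(ok) {
    ok = (cacheState == EXCLUSIVE || cacheState == MODIFIED)
}
```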

System Model

• Initial version

• Three state machines

– ProcessorModel: Non-deterministically issues Loads and Stores to cache forever

– CacheModel: Two parts - initially combined into a single process

• MainCache - Services processor requests.

• Snooper - Responds to messages from memory controller

– MemoryController - Services requests from each cache and maintains coherency among all
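The overall structure can be sketched as a set of Promela processes wired together by depth-1 channels. This skeleton shows only the wiring (channel names, message fields, and the two-cache configuration are assumptions; the proctype bodies are developed later):

```promela
mtype = { LOAD, STORE, DATA, SNOOP, ACK };

/* one bus per processor-cache pair, plus busses to/from the MC */
chan procBus[2]  = [1] of { mtype, bit };   /* Processor -> MainCache       */
chan cacheBus    = [1] of { mtype, byte };  /* MainCache -> MemoryController */
chan snoopBus[2] = [1] of { mtype, bit };   /* MemoryController -> Snooper   */

init {
    atomic {
        run MemoryController();
        run ProcessorModel(0); run MainCache(0); run Snooper(0);
        run ProcessorModel(1); run MainCache(1); run Snooper(1)
    }
}
```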

System Model

[Block diagram: the MemoryController connects to two cache units, each consisting of a MainCache and a Snooper, with a Processor attached to each MainCache]

ProcessorModel

• Simple

• Continually issues Load/Store requests to associated Cache.
  – Communication done via Bus Model.
  – Read requests are blocking

• Coherence verification done when Load receives data (via Spin assert statement)
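A ProcessorModel along these lines can be written in a few lines of Promela. The sketch below is a plausible reconstruction, not the original source; channel and message names are assumptions:

```promela
#define NEW 1   /* data modeled as a freshness bit; see "Modeling of Data" */

mtype = { LOAD, STORE, DATA };

chan toCache   = [1] of { mtype, bit };   /* depth-1 request bus  */
chan fromCache = [1] of { mtype, bit };   /* depth-1 response bus */

active proctype ProcessorModel() {
    bit d;
    do
    :: toCache ! STORE, NEW      /* nondeterministically store ... */
    :: toCache ! LOAD, 0;        /* ... or issue a blocking read:  */
       fromCache ? DATA, d;      /* wait here for the reply        */
       assert(d == NEW)          /* coherence check on loaded data */
    od
}
```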

CacheModel

• Two parts: MainCache and Snooper
  – MainCache services ProcessorModel Load and Store requests and initiates contact with the MemoryController when an “invalid” cache state is encountered
  – Snooper services independent requests from the MemoryController; these requests are necessary for the MemoryController to coordinate coherence responses.

MemoryControllerModel

• Responsible for servicing Cache requests

• 3 types of requests
  – Data request: Cache requires up-to-date data to supply to the processor
  – Permission-to-store: A Cache may not transition to the Modified state w/o the MC’s permission
  – A combination of these two

• All types of requests may require MC to communicate with all system caches (via Snooper processes) to ensure coherence
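The three request types suggest a simple dispatch loop for the MC. A hedged skeleton (message names and the id field are illustrative; the snooping logic is reduced to comments):

```promela
mtype = { REQ_DATA, REQ_STORE_PERM, REQ_BOTH, REPLY };

chan toMC   = [1] of { mtype, byte };   /* byte = requesting cache id */
chan fromMC = [1] of { mtype, byte };

active proctype MemoryController() {
    mtype req; byte id;
    do
    :: toMC ? req, id ->
       if
       :: req == REQ_DATA       -> skip  /* snoop other caches for data */
       :: req == REQ_STORE_PERM -> skip  /* invalidate other sharers    */
       :: req == REQ_BOTH       -> skip  /* both of the above           */
       fi;
       fromMC ! REPLY, id               /* answer the requesting cache */
    od
}
```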

Implementation of Busses

• All processes represent independent state machines. Need communication mechanism

• Use Spin depth 1 queues to simulate communication.

• Destructive/blocking read of queues requires a global bool to indicate bus activity (required for polling).
  – A global shared between processes is valid here, to make up for differences between Spin queues and real busses
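One way to pair a depth-1 channel with such an activity flag (a sketch under assumed names; real hardware would expose the bus state directly):

```promela
mtype = { MSG };

chan bus = [1] of { mtype, bit };   /* depth-1 queue models the bus  */
bool busActive = false;             /* lets other processes poll the */
                                    /* bus without a destructive read */
inline bus_send(m, d) {
    atomic { bus ! m, d; busActive = true }
}

inline bus_recv(m, d) {
    atomic { bus ? m, d; busActive = false }   /* destructive read */
}
```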

Problems - Part 1

• MainCache and Snooper initially implemented as a single process.

• Process nondeterministically determines which to execute at each iteration

• Communication between Processor/Cache and Cache/Memory done with blocking queues

• Blocked receive in MainCache --> Snooper cannot execute

• Leads to deadlock in certain situations
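The deadlock is visible in a skeleton of the combined process (names illustrative): once the MainCache arm commits and blocks waiting on the MC, the snooping arm can never be reached.

```promela
mtype = { LOAD, DATA, REQ_DATA, SNOOP, ACK };

chan procBus  = [1] of { mtype, bit };
chan toMC     = [1] of { mtype, bit };
chan fromMC   = [1] of { mtype, bit };
chan snoopBus = [1] of { mtype, bit };
chan snoopAck = [1] of { mtype, bit };

proctype CacheModel() {
    mtype m; bit d;
    do
    :: procBus ? m, d ->       /* MainCache arm                        */
       toMC ! REQ_DATA, 0;
       fromMC ? DATA, d        /* blocks here: while waiting, the      */
                               /* Snooper arm below can never run ...  */
    :: snoopBus ? m, d ->      /* ... so an MC query to this cache     */
       snoopAck ! ACK, d       /* is never answered -> deadlock        */
    od
}
```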

Solution 1

• Split MainCache and Snooper into separate processes.

• Both can access “global” cacheData and cacheState variables independently

--> Problems - Part 2

• As separate processes, Snooper and MainCache could change cache state unpredictably.

• Race conditions: Snooper changes cache state/data while MainCache is in mid-transaction --> returns invalidated data to processor.

Solution 2

• Add locking mechanism to cache.
  – MainCache or Snooper may only access the cache if they first lock it.

• Locking mechanism: For simplicity, cheated by using Spin’s atomic keyword to implement test-and-set on a shared variable.

• Assumption: Real hardware would have some similar mechanism available to lock caches.

• Question: Is the revised model still equivalent to the original?
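The test-and-set described above can be built directly from atomic; a minimal sketch (names are assumptions):

```promela
bool cacheLock = false;

/* Test-and-set via Spin's atomic keyword: the test and the set
   execute indivisibly, standing in for a hardware lock. */
inline lock_cache() {
    atomic { !cacheLock -> cacheLock = true }   /* block until free */
}

inline unlock_cache() {
    cacheLock = false
}

/* Usage in MainCache or Snooper:
   lock_cache(); ... access cacheState / cacheData ... unlock_cache(); */
```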

--> Problem 3

• Memory controller allows multiple outstanding requests from caches.

• The Snooper of a cache with an outstanding MainCache request cannot respond to MC queries for other outstanding requests (due to the locked cacheline).

• Deadlock.

Solution 3

• Disallow multiple outstanding Cache/MC transactions.

• Introduce global bool variable shared across all caches: outstandingBusOp.

• A cache may only issue requests to the memory controller if no requests from other caches outstanding.

• Global knowledge across all caches unrealistic.
• Equivalent to “retries” from MC?
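The global serializer mirrors the cache lock, but shared across every cache; a sketch under assumed names:

```promela
bool outstandingBusOp = false;   /* shared by ALL caches: this is the
                                    unrealistic global knowledge noted
                                    on this slide */

/* A cache claims the right to talk to the MC only when no other
   cache has a transaction in flight. */
inline begin_bus_op() {
    atomic { !outstandingBusOp -> outstandingBusOp = true }
}

inline end_bus_op() {
    outstandingBusOp = false
}
```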

--> Problem 4

• Previous problems failed in Spin simulation within 1000 steps.

• Given the last solution, random simulations no longer fail within the first 3000 steps.

• Verification fails after ~20000 steps

• Cause of problem as yet unresolved

Verification

• How to verify coherence in general?

• Verify something stronger: A processor will never see conflicting ordering of data if it always sees the newest data available in the system.

• For all loads, assert that data is “new”

Modeling of Data

• Concern that modeling data as random integer would cause Spin to run out of memory

• Model data as a bit with values OLD and NEW.

• All processor Stores store NEW data.

• When transitioning to the Modified state, a cache will change all other copies of the data in memory and other caches to OLD
  – Global access to data here is strictly part of the verification effort, not the algorithm. Thus allowed.
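This scheme fits in a few lines of Promela. A two-cache sketch (variable and inline names are illustrative):

```promela
#define OLD 0
#define NEW 1

bit memData      = NEW;
bit cacheData[2] = NEW;   /* one copy per cache (two caches assumed) */

/* On a transition to Modified by cache c, every other copy goes
   stale. This global sweep belongs to the verification harness,
   not to the protocol itself. */
inline mark_others_old(c) {
    d_step {
        memData = OLD;
        cacheData[1 - c] = OLD
    }
}

/* At every Load, the processor then checks: assert(loadedData == NEW) */
```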

Debugging

• Found debugging parallel processes difficult.

• Made much easier by Spin’s message sequence diagrams
  – Graphically shows sends and receives of all messages.
  – Requires use of Spin queues rather than globals for interprocess communication

Future work

• Make existing protocol completely bug free
• Activate additional “features” disabled for debugging purposes (e.g. bus transaction types)
• Verify protocol-specific rules
  – No two caches may be simultaneously Modified
  – Cache Modified or Exclusive --> no other cache is Shared
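These protocol-specific rules could be checked by a monitor process that asserts the invariants at every step. A sketch assuming a global two-entry cacheState array (names are illustrative):

```promela
mtype = { MODIFIED, EXCLUSIVE, SHARED, INVALID };
mtype cacheState[2] = INVALID;   /* one state per cache (two assumed) */

/* Monitor: runs alongside the model and checks the MESI invariants */
active proctype CoherenceMonitor() {
    do
    :: d_step {
         /* rule 1: no two caches simultaneously Modified */
         assert(!(cacheState[0] == MODIFIED && cacheState[1] == MODIFIED));
         /* rule 2: M or E in one cache -> the other is not Shared */
         assert(!((cacheState[0] == MODIFIED || cacheState[0] == EXCLUSIVE)
                   && cacheState[1] == SHARED));
         assert(!((cacheState[1] == MODIFIED || cacheState[1] == EXCLUSIVE)
                   && cacheState[0] == SHARED))
       }
    od
}
```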
