computer architecture j. daniel garcía sánchez...

35
Synchronization Synchronization Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University Carlos III of Madrid cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 1/35

Upload: voanh

Post on 12-Mar-2018

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

SynchronizationComputer Architecture

J. Daniel García Sánchez (coordinator)David Expósito Singh

Francisco Javier García Blas

ARCOS GroupComputer Science and Engineering Department

University Carlos III of Madrid

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 1/35

Page 2: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Introduction

1 Introduction

2 Hardware primitives

3 Locks

4 Barriers

5 Conclusion

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 2/35

Page 3: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Introduction

Synchronization in shared memory

Communication performed through shared memory.It is necessary to synchronize multiple accesses to sharedvariables.

Alternatives:Communication 1-1.Collective communication (1-N).

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 3/35

Page 4: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Introduction

Communication 1 to 1

Ensure that reading (receive) is performed after writing(send).

In case of reuse (loops):Ensure that writing (send) is performed after formerreading (receive).

Need to access with mutual exclusion.Only one of the processes accesses a variable at the sametime.

Critical section:Sequence of instructions accessing one or more variableswith mutual exclusion.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 4/35

Page 5: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Introduction

Collective communication

Needs coordination of multiple accesses to variables.Writes without interferences.Reads must wait for data to be available.

Must guarantee accesses to variable in mutualexclusion.

Must guarantee that result is not read until allprocesses/threads have executed their critical section.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 5/35

Page 6: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Introduction

Adding a vector

Critical section in loop

void f ( int max) {vector<double> v = get_vector(max);double sum = 0;

auto do_sum = [&](int start , int n) {for ( int i=start ; i<n; ++i) {

sum += v[i ];}

}

thread t1{do_sum,0,max/2};thread t2{do_sum,max/2+1,max};t1 . join () ;t2 . join () ;

}

Critical section out of loop

void f ( int max) {vector<double> v = get_vector(max);double sum = 0;

auto do_sum = [&](int start , int n) {double local_sum = 0;for ( int i=start ; i<n; ++i) {

local_sum += v[i ];}sum += local_sum;

}

thread t1{do_sum,0,max/2};thread t2{do_sum,max/2+1,max};t1 . join () ;t2 . join () ;

}

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 6/35

Page 7: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Hardware primitives

1 Introduction

2 Hardware primitives

3 Locks

4 Barriers

5 Conclusion

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 7/35

Page 8: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Hardware primitives

Hardware support

Need to fix a global order in operations.

Consistency model can be insufficient and complex.

Usually complemented with read-modify-writeoperations.

Example in IA-32:Instructions with prefix LOCK.Access to bus in exclusive mode if location is not in cache.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 8/35

Page 9: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Hardware primitives

Primitives: Test and set

Instruction Test and Set:Atomic sequence:

1 Read memory location into register (will be returned asresult).

2 Write value 1 in memory location.

Uses: IBM 370, Sparc V9

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 9/35

Page 10: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Hardware primitives

Primitives: Exchange

Instruction for exchange (swap):Atomic sequence:

1 Exchanges contents in a memory location and a register.

2 Includes a memory read and a memory write.

More general that test-and-set.

Instruction IA-32:XCHG reg, mem

Uses: Sparc V9, IA-32, Itanium

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 10/35

Page 11: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Hardware primitives

Primitives: Fetch and operation

Instruction for fetching and applying operation(fetch-and-op):

Several operations: fetch-add, fetch-or, fetch-inc, . . .

Atomic sequence:

1 Read memory location into a register (return that value).

2 Write to memory location the result of applying an operationto the original value.

Instruction IA-32:LOCK XADD reg, mem

Uses: IBM SP3, Origin 2000, IA-32, Itanium.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 11/35

Page 12: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Hardware primitives

Primitives: Compare and exchange

Instruction to compare and exchange(compare-and-swap o compare-and-exchange):

Operation on two local variables (registers a and b) and amemory location (variable x).

Atomic sequence:

1 Read value from x.

2 If x is equal to register a → exchange x and register b.

Instruction IA-32:LOCK CMPXCHG mem, regImplicitly uses additional register eax.

Uses: IBM 370, Sparc V9, IA-32, Itanium.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 12/35

Page 13: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Hardware primitives

Primitives: Conditional store

Pair of instructions LL/SC (Load Linked/Store Conditional).

Operation:If content of read variable through LL is modified before aSC storage is not performed.When a context switch happens between LL and SC, SCis not performed.SC returns a success/failure code.

Example in Power-PC:LWARXSTWCX

Uses: Origin 2000, Sparc V9, Power PC

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 13/35

Page 14: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

1 Introduction

2 Hardware primitives

3 Locks

4 Barriers

5 Conclusion

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 14/35

Page 15: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Locks

A lock is a mechanism to ensure mutual exclusion.

Two synchronization functions:

Lock(k):Acquires the lock.If several processes try to acquire the lock, n-1 are keptwaiting.If more processes arrive, they are kept to waiting.

Unlock(k):Releases the lock.Allow that a waiting process acquires the lock.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 15/35

Page 16: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Waiting mechanisms

Two alternatives: busy waiting and blocking.

Busy waiting:Process waits in a loop that constantly queries the waitcontrol variable value.Spin-lock.

Blocking:Process remains suspended and yields processor to otherprocess.If a process executes unlock and there are blockedprocesses, one of them is un-blocked.Requires support from a scheduler (usually OS or runtime).

Alternative selection depends on cost.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 16/35

Page 17: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Components

Three elements of design in a locking mechanism:acquisition, waiting y release.

Acquisition method:Used to try to acquire the lock.

Waiting method:Mechanism to wait until lock can be acquired.

Release method:Mechanism to release one or several waiting processes.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 17/35

Page 18: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Simple locks

Shared variable k with two values.0 → open.1 → closed.

Lock(k)If k=1 → Busy waiting while k=1.If k=0 → k=1.Do not allow that 2 processes acquire a locksimultaneously.

Use read-modify-write to close it.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 18/35

Page 19: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Simple implementations

Test and set

void lock(atomic_flag & k) {while (k.test_and_set())

{}}

void unlock(atomic_flag & k) {k.clear () ;

}

Fetch and operate

void lock(atomic<int> & k) {while (k.fetch_or(1) == 1)

{}}

void unlock(atomic<int> & k) {k.store(0) ;

}

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 19/35

Page 20: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Simple implementations

Exchange IA-32

do_lock: mov eax, 1repeat: xchg eax, _k

cmp eax, 1jz repeat

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 20/35

Page 21: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Exponential delay

Goal:Reduce number of memory accesses.Limit energy consumption.

Lock with exponential delay

void lock(atomic_flag & k) {while (k.test_and_set()){

perform_pause(delay);delay ∗= 2;

}}

Time betweeninvocations totest_and_set() isincrementedexponentially

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 21/35

Page 22: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Synchronization and modification

Performance can be improved if using the samevariable to synchronize and communicate.

Avoid using shared variables only to synchronize.

Add a vector

double partial = 0;for ( int i=iproc; i<max; i+=nproc) {

partial += v[ i ];}sum.fetch_add(partial);

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 22/35

Page 23: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Locks and arrival order

Problem:Simple implementations do not fix a lock acquisition order.Starvation might happen.

Solution:Make the lock is acquired by request age.Guarantees FIFO order.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 23/35

Page 24: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Tagged locks

Two counters:Acquire counter: Number of processes that haverequested the lock.Release counter: Number of times the lock has beenreleased.

Lock:Tag → Acquisition counter value.Acquisition counter is incremented.Process remains waiting until the release counter matchesthe tag.

Unlock:Increments release counter.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 24/35

Page 25: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Locks

Queue based locks

Keep a queue with processes waiting to enter into acritical section.

Lock:Check if queue is empty.If a process joins the queue it performs busy waiting in avariable.

Each process performs busy waiting in a different variable.

Unlock:Removes process from queue.Modifies process waiting control variable.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 25/35

Page 26: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Barriers

1 Introduction

2 Hardware primitives

3 Locks

4 Barriers

5 Conclusion

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 26/35

Page 27: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Barriers

Barrera

A barrier allows to synchronize several processes in somepoint.

Guarantees that no process passes the barrier until all havearrived.

Used to synchronize phases in a program.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 27/35

Page 28: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Barriers

Centralized barriers

Centralized counter associated to the barrier.Counts number of processes that have arrived the barrier.

Barrier function:Increments counterWaits the counter to reach the number of processes to besynchronized.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 28/35

Page 29: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Barriers

Simple barrier

Simple implementation

do_barrier( barrier , n) {lock( barrier . lock) ;if ( barrier .counter == 0) {

barrier . flag=0;}local_counter = barrier .counter++;unlock(barrier . lock) ;if (local_counter == NP) {

barrier .counter=0;barrier . flag=1;

}else {

while ( barrier . flag==0) {}}

}

Problem if barrier isreused in a loop.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 29/35

Page 30: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Barriers

Barrier with way inversion

Simple implementation

do_barrier( barrier , n) {local_flag = ! local_flag ;lock( barrier . lock) ;local_counter = barrier .counter++;unlock(barrier . lock) ;if (local_counter == NP) {

barrier .counter=0;barrier . flag=local_flag ;

}else {

while ( barrier . flag==local_flag) {}}

}

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 30/35

Page 31: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Barriers

Tree barriers

A simple implementation of barriers is not scalable.Contention in access to shared variables.

Tree structure for process arrival and release.Specially useful in distributed networks.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 31/35

Page 32: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Conclusion

1 Introduction

2 Hardware primitives

3 Locks

4 Barriers

5 Conclusion

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 32/35

Page 33: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Conclusion

Summary

Need for shared memory access synchronization:Individual (1-1) and collective (1-N) communication.

Diversity of hardware primitives for synchronization.Locks as a mechanism for mutual exclusion.

Busy waiting versus blocking.Three design elements: acquisition, waiting, and release.

Locks may lead to problems if order is not fixed(starvation).

Solutions based in tags or queues.

Barriers offer mechanisms to structure programs inphases.

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 33/35

Page 34: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Conclusion

References

Computer Architecture. A Quantitative Approach.5th Ed.Hennessy and Patterson.Section: 5.5

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 34/35

Page 35: Computer Architecture J. Daniel García Sánchez ...ocw.uc3m.es/ingenieria-informatica/computer-architecture/lecture... · invocations to test_and_set() is incremented exponentially

Synchronization

Conclusion

SynchronizationComputer Architecture

J. Daniel García Sánchez (coordinator)David Expósito Singh

Francisco Javier García Blas

ARCOS GroupComputer Science and Engineering Department

University Carlos III of Madrid

cbed – Computer Architecture – ARCOS Group – http://www.arcos.inf.uc3m.es 35/35