Multiprocessing and Operating systems from a Computer architecture perspective

Computer Architecture 1DT016 distance, Fall 2017
http://xyx.se/1DT016/index.php

Per Foyer
Mail: [email protected]
[email protected] 2017



In this session

Challenges in parallel computing

“Nine women cannot give birth to one child in one month, no matter how hard they try”

The fundamental challenges of parallel computing:

• Not all problems can be parallelized. Some tasks must be executed in sequence.

• Tasks that have parallelizable algorithms are not infinitely scalable

• There is little compiler support for parallel programming

• Some parallel algorithms are plagued with massive load imbalance due to non-uniform data distribution

• Parallel distributed algorithms are not always easy to synchronize and debug
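The point that parallelizable tasks are not infinitely scalable can be illustrated with Amdahl's law. The slide does not name this law, so it is brought in here as an assumption, and the function name and the 90 % parallel fraction are purely illustrative. A minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can be spread over `workers`.

    Assumption (not from the slides): Amdahl's law, speedup = 1 / (s + p / n),
    where s = 1 - p is the part that must run in sequence.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 10 % strictly sequential work the speedup can never reach 10x,
    # no matter how many workers are added.
    for workers in (1, 2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

Under this model the sequential part sets a hard ceiling of 1 / s on the speedup, which is the "nine women" quote in numbers.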

Embedded systems

• Can perform independent or distributed tasks

• Networking over CAN-bus, I2C and even TCP/IP

• May operate under real-time constraints

• If powerful enough, can be used as very low cost computing nodes in distributed systems such as grids or clusters

System on a Chip (SoC)

BCM 2835 Raspberry Pi SoC

The Raspberry Pi has an intricate boot sequence. Stages one to four are executed by the GPU (!)

Stage 1: Boot code in the GPU on-chip ROM. Loads Stage 2 into the L2 cache

Stage 2: bootcode.bin from the SD card. Enables SDRAM and loads Stage 3

Stage 3: loader.bin. Knows about the .elf format and loads start.elf

Stage 4: start.elf loads kernel.img for the ARM CPU

Stage 5: kernel.img runs on the ARM CPU, which loads the OS

GPU: Graphics Processing Unit
ELF: Executable and Linkable Format

FPGAs and Soft Cores

Field Programmable Gate Array
• LUTs - LookUp Tables (~Truth tables)

ARM Cortex-M0 processor now available free of charge from ARM Holdings
…several ARM clones available (OpenCores.org)

FPGA development workflow
• HDL (Verilog / VHDL)
• Compile
• Synthesize / Verify
• Bitstream
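The remark that a LUT is roughly a truth table can be sketched in software. The class below is a hypothetical model for illustration only, not any vendor's primitive: a k-input LUT is just 2^k stored bits, and the input pattern selects which bit comes out.

```python
class Lut:
    """Hypothetical model of a k-input lookup table: one stored output bit per input pattern."""

    def __init__(self, inputs: int, truth_table: list[int]):
        if len(truth_table) != 2 ** inputs:
            raise ValueError("truth table must have 2**inputs entries")
        self.inputs = inputs
        self.truth_table = truth_table

    def evaluate(self, *bits: int) -> int:
        if len(bits) != self.inputs:
            raise ValueError("wrong number of input bits")
        # Pack the input bits into a table index; the first argument is the most significant bit.
        index = 0
        for bit in bits:
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]


# The same cell becomes XOR or AND just by loading different table contents.
xor_lut = Lut(2, [0, 1, 1, 0])
and_lut = Lut(2, [0, 0, 0, 1])
```

In this picture the bitstream at the end of the workflow is, among other things, the set of table contents loaded into every LUT.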

Unique: The Propeller CPU

Round-Robin Scheduler between active Cogs

Boot sequence: x86 / x86_32 / x86_64

[1] 1 MB max. 640 kB DOS. 16-bit instructions are now probably microcoded
[2] 4 GB max. Supervisor/User modes, memory protection, Virtual x86 (16-bit) support
[3] 2^64 ≈ 1.845 x 10^19 B

AMD/Intel protection features

In order to provide safeguards in a multiprocessor environment, both AMD and Intel have some essential features in hardware:

Function                              Intel   AMD
Virtualization Technology Extensions  VT-x    AMD-V
Physical Address Extension [1]        PAE     PAE
Execution Protection (data) [2]       XD      NX
Streaming SIMD Extension              SSE     SSE

Acronyms: NX: No eXecute, XD: eXecution Disable

[1] Makes it possible to address more than 4 GB of physical memory in 32-bit mode. In 32-bit mode, NX/XD needs PAE to be active

[2] Prevents exploits like executing malicious code in the data area (buffer overflow attacks, malware, …)
Note: x86 is a von Neumann (vN) architecture. A Harvard machine doesn’t need this kind of protection.

Multicore processor boot sequence

[Figure: cores C0, A1, A2 and U3, and Memory]

Booting an operating system from cold up to fully running applications:

Intel model for x86_32 and x86_64:

• C0 performs initial loading from the low level hardware interface in 16-bit x86 real mode

• C0 switches to protected supervisor mode (x86_64) and loads the operating system

• C0 (the OS) allocates resources for the application cores and starts them

• One or more cores may be allocated for utility processing (U0)

Note: C0 is always the boot processor

Whether it is a Harvard or von Neumann configuration doesn’t matter. The principles are the same.

Windows task manager

BIOS / UEFI / U-boot

Intel AMT / ME / IE

AMT = Active Management Technology
ME = Management Engine
IE = Innovation Engine (whatever that is… undocumented)

Frankly a very scary technology (when looking at the potential security ramifications) included in all modern Intel CPUs

Intel AMT / ME / IE:

• Is independent of the main CPU
• Based on the MINIX operating system [1]
• Executes in Ring -3
• Can access host memory via DMA (with restrictions)
• Dedicated link to the NIC, and its filtering capabilities
• Can force the host OS to reboot at any time (and boot the system from the emulated CDROM)
• Active even in S3 (suspended mode) sleep!
• Exploited at the Black Hat Europe conference on December 6th, 2017

Some hypervisors (e.g. Xen) use Intel VT-d to protect themselves, so that, for example, malicious software is not able to access the memory of such hypervisors. Or so it’s believed…

[1] Professor Andrew S. Tanenbaum, the MINIX OS creator, is very angry about this

Tightly coupled distributed system

Multiprocessor (multicore or SMP)
Latency: ns

[Figure: CPU entities grouped around one shared memory]
C = CPU entity

Closely coupled distributed system

Multicomputer
Latency: µs

[Figure: CPU entities, each with its own local memory, joined by an interconnect]
C = CPU entity
M = Local memory

Loosely coupled distributed system

Multisystem
Latency: ms

[Figure: complete systems, each with its own memory configuration, connected over a Wide Area Network]
C+ = Complete system
M = Memory configuration

Grid computing

• Node availability and capacity are not known or guaranteed beforehand
• Nodes “phone home” to the grid controller
• Nodes may be homogeneous or heterogeneous

Good for tasks that are easy to parallelize or split

[Figure: a grid controller (C+) and nodes (C) connected over a local or wide area network]

Famous grid example: seti@home

Search for Extraterrestrial Intelligence
Active since 1999. Driven by UC Berkeley (https://setiathome.berkeley.edu)

Computer clusters

[Figure: a cluster controller and a load balancer in front of the computing nodes; each node is a complete system (C+) with its own memory (M)]

Cluster controller
• Uses cluster aware OS

Load balancer:
Passive: Round-robin task distribution
Active: Measures load on nodes before task distribution

The load balancer may be transparent to the cluster controller

A cluster can be homogeneous (same architecture) or heterogeneous (mixed architecture)
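The difference between a passive and an active load balancer can be sketched as follows. This is a toy model under stated assumptions: the class names, the `pick` method and the idea of measuring load as accumulated task cost are all hypothetical, chosen only to show the two policies side by side.

```python
from itertools import cycle


class PassiveBalancer:
    """Round-robin: hands tasks to nodes in a fixed rotation and ignores their load."""

    def __init__(self, nodes):
        self._rotation = cycle(nodes)

    def pick(self, load):
        return next(self._rotation)


class ActiveBalancer:
    """Measures load first, then picks the node with the lowest current load."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def pick(self, load):
        return min(self._nodes, key=lambda node: load[node])


def distribute(balancer, task_costs, nodes):
    """Assign each task (given as a cost) to a node and return the resulting load per node."""
    load = {node: 0 for node in nodes}
    for cost in task_costs:
        node = balancer.pick(load)
        load[node] += cost
    return load


if __name__ == "__main__":
    nodes = ["n1", "n2"]
    tasks = [5, 1, 5, 1]
    print(distribute(PassiveBalancer(nodes), tasks, nodes))
    print(distribute(ActiveBalancer(nodes), tasks, nodes))
```

With alternating heavy and light tasks the rotation piles all heavy tasks on one node, while the measuring policy evens the load out at the cost of having to collect load figures.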

Other connection schemes (1)

Traffic routing between independent nodes in parallel computing is normally not trivial. It may impose a burden on the operating system(s), causing scheduling overhead due to routing calculations.

Some configurations for sending data from one (independent) node to another:

Ring: Hops_max = n/2
Complete mesh: Hops_max = 1
Cube: Hops_max = log2 n

What happens if one node fails?

F = Frontend processor (FEP)

Other connection schemes (2)

Balanced binary tree: Hops_max = 2 * ⌊log2 n⌋
[Figure: tree with root 4, inner nodes 2 and 6, and leaves 1, 3, 5 and 7]

What happens if one node fails?

Hypercube: Hops_max = ⌊log2 n⌋

F = Frontend processor (FEP)
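The worst-case hop counts for the connection schemes on these two slides can be compared with a small helper. The function name and the topology strings are hypothetical, and the bars around log2 n are read here as a floor, which is an assumption that matches the 7-node tree in the figure (leaf to leaf via the root is 4 hops).

```python
def max_hops(topology: str, n: int) -> int:
    """Worst-case number of hops between two of n nodes, per the slide formulas."""
    floor_log2 = n.bit_length() - 1  # floor(log2 n) for n >= 1, without floating point
    if topology == "ring":
        return n // 2
    if topology == "complete mesh":
        return 1
    if topology == "hypercube":
        return floor_log2  # exact when n is a power of two
    if topology == "balanced binary tree":
        return 2 * floor_log2  # up to the root and down again
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for name in ("ring", "complete mesh", "hypercube", "balanced binary tree"):
        print(name, max_hops(name, 64))
```

The complete mesh wins on hops but needs a link between every pair of nodes, which is why the logarithmic schemes are attractive as n grows.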

Super computing by architecture

Multicomputing redundancy

[Figure: three nodes OL, HS1 and HS2, each with memory (M), sharing one database (DB); an intercommunication protocol runs between the nodes]

OL: On-line
HS: Hot standby

The system consists of one computing system and a database. There are two hot-standby systems ready to take over if the on-line system fails. How is failure determined?

• If OL fails, HS1 immediately takes over control and becomes OL

• In mission critical systems where a node doesn’t produce the same results as the others, the faulty node will be disconnected and another takes over.

Redundancy design mistake (1)

[Figure: application nodes OL and HS, each with memory (M), connected to the database server DB. A simple heartbeat protocol runs between the application nodes. Communication between the application nodes and the database server is based on TCP/IP]

OL: On-line node
HS: Hot-standby node
DB: Database server

Theory: If one of the application nodes fails, the heartbeat will cease and the other one takes over.

WRONG: There is no MUTEX guarantee here. If the heartbeat line fails but both application nodes are ok, BOTH think their neighbor has failed. The result is a “split brain” disaster where both application nodes access the database and almost certainly destroy data and cause inconsistencies.

MUTEX = MUTual EXclusion

Redundancy design mistake (2)

[Figure: application nodes OL and HS, each with memory (M), connected to the database server DB. A simple heartbeat protocol runs between the application nodes. Communication between the application nodes and the database server is based on TCP/IP]

OL: On-line node
HS: Hot-standby node
DB: Database server

How can the “split brain” problem on the previous slide be resolved?

Use the database disk control hardware AND heartbeat tests between OL, HS and DB to guarantee MUTEX at any one time.
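One way to read this fix is that the database side acts as the single arbiter of who is on-line. The sketch below is an assumption, not the design from the slide: `LeaseArbiter`, its time-limited lease and the explicit `now` parameter are hypothetical names and choices made to keep the example deterministic.

```python
class LeaseArbiter:
    """Hypothetical arbiter next to the database: at most one node holds the lease at a time."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = 0.0

    def request(self, node: str, now: float) -> bool:
        # A node may take or renew the lease only if it is free, already its own, or expired.
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.lease_seconds
            return True
        return False


if __name__ == "__main__":
    arbiter = LeaseArbiter(lease_seconds=5.0)
    print(arbiter.request("OL", now=0.0))   # OL becomes the on-line node
    print(arbiter.request("HS", now=1.0))   # heartbeat line is down, but OL still holds the lease
    print(arbiter.request("HS", now=10.0))  # OL stopped renewing, so HS may take over
```

Because both nodes must ask the same third party, a broken heartbeat line between OL and HS can no longer make both of them believe they are the on-line node.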

Virtual machines (1)

• VM technology allows virtual machines to run on a single physical machine
• VM is not about simulation. The guest OS must follow the underlying hardware architecture (e.g. Intel x86_64, SPARC, etc.)
• The guest OS has no knowledge that it is executing in a VM

[Figure: layers from bottom to top: Hardware; Virtual Machine Monitor (VMM) / Hypervisor; three VMs, each running a Guest OS with its own Apps]

Virtual machines (2)

[Figure: Hardware at the bottom, the Virtual Machine Monitor (VMM) / Hypervisor above it, and VMs with a Guest OS on top]

• VM supplies guest with complete virtual hardware
• VMM optimizes the utilization of the underlying physical hardware
• Guest OS uses device drivers that match the virtual hardware

With paravirtualization a VM can execute very close to physical hardware speed.
The VMM distributes load over physical hardware CPUs and/or CPU cores

VMM: XenServer

• Uses paravirtualization very close to the physical hardware
• Can pre-allocate resources such as memory and CPUs/cores
• Executes directly above the hardware level. XenServer VMM is an OS in itself
• Completely free at: xenserver.org

VMM: VirtualBox

Completely free at virtualbox.org
Executes within a host OS (Windows, macOS, Linux) with good performance

Operating systems

If there is no support in software for hardware with multiprocessing capabilities, that hardware will be useless!

Programs, Processes and Threads

Program: Binary containing executable code and data segments. Needs an OS to load and run.

Process: Executing entity having its own context (code and resources). Has been scheduled by the OS.

Thread (software): Lightweight process executing in a “host process context”, sharing the host's resources.

Thread (hardware, Hyper-threading): Presents a number of logical CPUs to the OS. E.g., a hyper-threaded single core appears as two virtual CPUs to the OS.

If one virtual CPU is waiting, the other can borrow its resources. The OS doesn’t know about this. It sees two cores (or more).
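The idea that software threads share the resources of their host process can be shown with a minimal sketch. The names `counter`, `worker` and `run` are hypothetical; the point is only that every thread sees the same variable, which is exactly why access to it has to be coordinated.

```python
import threading

# Lives in the process; every thread of the process sees the same object.
counter = {"value": 0}
lock = threading.Lock()


def worker(increments: int) -> None:
    for _ in range(increments):
        # The lock keeps the read-modify-write on the shared counter atomic.
        with lock:
            counter["value"] += 1


def run(thread_count: int = 4, increments: int = 1000) -> int:
    """Start `thread_count` threads that all update the shared counter, then return its value."""
    counter["value"] = 0
    threads = [threading.Thread(target=worker, args=(increments,)) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


if __name__ == "__main__":
    print(run())
```

Separate processes would each get their own copy of `counter` and would need an explicit mechanism such as a pipe or a message queue to exchange data.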

Operating system layers

[Figure: layer stack, listed here from the hardware upwards]

Layer               Role
Hardware
Device drivers      Low level SW to HW interface
Kernel              Process scheduler, low level resource management and protection
OS Core services    File systems, timed events, high level resource mgmt
APIs                Application to operating system SW interface
System libraries    Common application high level routines
Prog Prog Prog Prog A program may use several interconnected processes

Operating system execution rings

There’s more to this…

Ring -1 (minus one):
• (HW) Hypervisor mode
• Can pre-empt ring 0

Ring -2 (minus two):
• (HW) System Management Mode (SMM)
• Can pre-empt ring -1

Ring -3 (x86) (minus three):
• Separate processing unit inside Intel CPUs
• BIG controversy (MINIX)
• Very little is known about this mode
• Intel ME/IE
• THIS IS SCARY !!!

…depending on hardware

OS: Scheduler

The scheduler becomes more complex for each computing element added: CPU cores, multiple CPUs, distributed nodes

OS: The context switch

The Context switch is the single most time critical part of an operating system

It switches execution context between processes

It has to protect CPU-registers etc that are used by processes on a low level

The context switch is very often written in assembler for maximum speed

When switching:
1. Freeze execution of current process
2. Save state for current process (save registers, private stack pointer, …)
3. Load (frozen) state for next process (restore registers, …)
4. Resume execution of next process

In an overloaded system a situation called thrashing may occur:

The number of context switches per time unit is so high that the operating system spends more time on switching context than executing processes.
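The four switching steps can be modelled with a toy CPU. Everything here is a simplified assumption for illustration: a real context switch runs in kernel mode on real registers, and the register names `pc`, `sp` and `acc` as well as the classes `Process` and `Cpu` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    # Frozen register state, kept while the process is not running.
    saved: dict = field(default_factory=lambda: {"pc": 0, "sp": 0, "acc": 0})


class Cpu:
    def __init__(self):
        self.registers = {"pc": 0, "sp": 0, "acc": 0}
        self.current = None

    def context_switch(self, next_process: Process) -> None:
        if self.current is not None:
            # Steps 1 and 2: freeze the current process and save its register state.
            self.current.saved = dict(self.registers)
        # Step 3: load the frozen state of the next process.
        self.registers = dict(next_process.saved)
        # Step 4: the next process resumes from exactly where it was frozen.
        self.current = next_process
```

Every switch copies state out and in without doing any useful work for either process, which is why too many switches per time unit lead to the thrashing described above.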

OS process states

[Figure: process state diagram with the following transitions]

New → Ready: Admitted
Ready → Running: Scheduler dispatch
Running → Ready: Interrupt
Running → Waiting: I/O or event wait
Waiting → Ready: I/O or event completion
Running → Terminated: Exit
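The process state diagram can be written down as a transition table. The dictionary layout and the function names are hypothetical; the states and events are the labels from the slide.

```python
# (current state, event) -> next state, using the labels from the state diagram.
TRANSITIONS = {
    ("New", "admitted"): "Ready",
    ("Ready", "scheduler dispatch"): "Running",
    ("Running", "interrupt"): "Ready",
    ("Running", "I/O or event wait"): "Waiting",
    ("Waiting", "I/O or event completion"): "Ready",
    ("Running", "exit"): "Terminated",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached from `state` on `event`, or raise if the diagram has no such arrow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None


def run_events(events, state: str = "New") -> str:
    """Apply a sequence of events to a process starting in `state`."""
    for event in events:
        state = next_state(state, event)
    return state
```

Note that a waiting process never goes straight back to running: it first becomes ready and has to be dispatched by the scheduler again.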

Scheduling

Assume processes P1, P2 and P3 and one time-frame

Round Robin (with pre-emption)
[Figure: timeline from t1 to t2 with equal slots shared between P1, P2 and P3]
Execution time for each process is always 1/3 * t
Simple but wasteful if some processes are in wait and don’t need to be scheduled

Priority scheduling
[Figure: timeline from t1 to t2; P3 enters wait for resource, or exits]
If P2 has higher priority than P1, P2 can be given more execution time in the next time frame
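A round-robin scheduler with pre-emption can be sketched in a few lines. This is an illustrative variant, not the exact scheme from the figure: the function name, the `quantum` parameter and the burst times are hypothetical, and unlike the fixed 1/3 slots on the slide this version stops scheduling a process once it has finished.

```python
from collections import deque


def round_robin(burst_times: dict, quantum: int) -> list:
    """Return the execution timeline as (process, ticks) slices.

    `burst_times` maps a process name to the ticks of CPU time it still needs.
    """
    remaining = dict(burst_times)
    ready = deque(burst_times)  # ready queue in arrival order
    timeline = []
    while ready:
        name = ready.popleft()
        ran = min(quantum, remaining[name])
        timeline.append((name, ran))
        remaining[name] -= ran
        if remaining[name] > 0:
            # Pre-empted: back to the end of the ready queue.
            ready.append(name)
    return timeline


if __name__ == "__main__":
    print(round_robin({"P1": 2, "P2": 1, "P3": 2}, quantum=1))
```

A priority scheduler would differ only in how the next process is picked, for example by giving a higher-priority process a larger quantum in the next time frame.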

OS: pipes

A pipe is a mechanism that allows for bi-directional asynchronous communication between two processes

Pipe operation is controlled by the OS scheduler. Data can only flow when a process is in its running state.

Pipes are mainly used where latency is low, e.g. in tightly coupled systems
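Pipe communication can be tried out with Python's `os.pipe`. To keep the sketch self-contained, two threads stand in for the two processes, which is an assumption made for brevity; the function name `send_through_pipe` is hypothetical. Note that the pair returned by `os.pipe` carries data in one direction only, so traffic in both directions would need a second pair.

```python
import os
import threading


def send_through_pipe(messages):
    """Write `messages` into a pipe from one thread and read them back in another."""
    read_fd, write_fd = os.pipe()

    def producer():
        # Closing the write end (on leaving the `with` block) signals end-of-file to the reader.
        with os.fdopen(write_fd, "w") as writer:
            for message in messages:
                writer.write(message + "\n")

    thread = threading.Thread(target=producer)
    thread.start()
    with os.fdopen(read_fd) as reader:
        # The reader simply blocks until data or end-of-file arrives.
        received = [line.rstrip("\n") for line in reader]
    thread.join()
    return received


if __name__ == "__main__":
    print(send_through_pipe(["hello", "world"]))
```

The reader never polls: it is suspended by the OS until the writer has run and produced data, which is the scheduler involvement described above.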

OS: Semaphores

A semaphore is an operating system mechanism that is used to protect a shared resource.

The resource can be shared by two or more processes.

The OS guarantees that one, and only one, process can access the shared resource at any one time (MUTEX).

MUTEX stands for MUTual EXclusion

Single-core processor systems sometimes use spin-locks while waiting for MUTEX.
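The mutual exclusion guarantee can be observed with `threading.Semaphore`. Threads are used here in place of processes, and the function name and the bookkeeping dictionary are hypothetical. The sketch records how many workers were ever inside the protected section at the same time.

```python
import threading


def run_with_semaphore(worker_count: int = 5, permits: int = 1) -> int:
    """Return the highest number of workers that were inside the protected section at once."""
    semaphore = threading.Semaphore(permits)
    guard = threading.Lock()  # protects only the bookkeeping below
    state = {"inside": 0, "max_inside": 0}

    def worker():
        with semaphore:  # blocks while all permits are taken
            with guard:
                state["inside"] += 1
                state["max_inside"] = max(state["max_inside"], state["inside"])
            # ... the shared resource would be used here ...
            with guard:
                state["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["max_inside"]


if __name__ == "__main__":
    print(run_with_semaphore(permits=1))
```

With one permit the semaphore behaves as a MUTEX; with more permits it limits access to that many simultaneous users.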

OS: Deadlocks

A deadlock can occur if two processes are waiting for each other or if several processes are in a circular wait.

It may also happen if one process holding a shared resource stops or dies

[Figure: processes P1 and P2 and resources R1 and R2; each process has one resource and waits for the other]
P: Process
R: Resource

Ways for a kernel to break a deadlock:
• Forced process pre-emption and rescheduling
• Process termination
• Force resource release
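Before a kernel can break a deadlock it has to find one, and a circular wait can be detected by following "waits for" and "has" links until a process repeats. The function below is a sketch under assumptions: the name `find_deadlocked` and the two dictionaries are hypothetical, and the example assumes that P1 has R1 and waits for R2 while P2 has R2 and waits for R1.

```python
def find_deadlocked(holds: dict, waits_for: dict) -> set:
    """Return the processes that are part of a circular wait.

    holds:     resource -> process that currently has it
    waits_for: process  -> resource it is waiting for
    """
    deadlocked = set()
    for start in waits_for:
        path = []
        process = start
        # Follow: process -> resource it waits for -> process that has that resource.
        while process in waits_for and process not in path:
            path.append(process)
            process = holds.get(waits_for[process])
        if process in path:
            # We came back to a process already on the path: that tail is a cycle.
            deadlocked.update(path[path.index(process):])
    return deadlocked


if __name__ == "__main__":
    print(find_deadlocked(holds={"R1": "P1", "R2": "P2"},
                          waits_for={"P1": "R2", "P2": "R1"}))
```

Terminating or pre-empting any one process in the returned set removes a link from the cycle, which corresponds to the remedies listed above.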

OS: Message queues (Mailboxes)

Message queues are used for interprocess communication.

Processes can set message priorities, which are handled by the OS

The OS guarantees MUTEX on queues

[Figure: message queue between a Client and a Server]
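A prioritized mailbox can be sketched on top of Python's `queue.PriorityQueue`, which does its own internal locking and so plays the role of the MUTEX guarantee here. The `Mailbox` class, the convention that a lower number means higher priority, and the default priority of 10 are hypothetical choices for illustration.

```python
import itertools
import queue


class Mailbox:
    """Hypothetical mailbox: messages are delivered by priority, then in sending order."""

    def __init__(self):
        self._queue = queue.PriorityQueue()  # thread-safe: locking is handled internally
        self._sequence = itertools.count()

    def send(self, message: str, priority: int = 10) -> None:
        # Lower number = higher priority; the sequence number keeps FIFO order within a priority.
        self._queue.put((priority, next(self._sequence), message))

    def receive(self) -> str:
        _priority, _sequence, message = self._queue.get()
        return message


if __name__ == "__main__":
    mailbox = Mailbox()
    mailbox.send("log entry")
    mailbox.send("alarm", priority=1)
    print(mailbox.receive())
```

The client only calls `send` and the server only calls `receive`; neither has to know when the other one is scheduled to run.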

That's it! The music is over! ;-)