ds 2009: introduction - university of cambridge · 2009-04-23 · 2.independent failure modes...

40
DS 2009: introduction David Evans [email protected]

Upload: others

Post on 24-Jan-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

DS 2009: introduction

David [email protected]

Page 2: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Distributed systems, Easter 20091. Introduction system, legal, social context;

technology-driven evolution; fundamental characteristics;software structure; models, architecture, engineering;

2. Time event ordering; physical clock synchronisation;process groups; ordering message delivery;

3. Distributed algorithms and protocols strong and weakconsistency, replicas; concurrency control; atomiccommitment; election algorithms; distributed mutualexclusion;

4. Middleware RPC, OOM, MOM, event-based middleware;5. Naming6. Access control capabilities, ACLs, RBAC and access

control policy; OASIS RBAC case study;7. Event-driven systems8. Storage services distribution issues

Page 3: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Distributed systems: introduction

I some systems backgroundI some legal & social contextI development of technology—DS evolutionI fundamental characteristics of DSI software structure for a nodeI model, architecture, engineeringI architectures for doing it on a large scale

Page 4: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Costly failures

I UK stock exchange share trading systemI abandoned 1993, £400M

I Califormia automated childcare supportI suspended 1997, $300M

I US tax system modernisationI scrapped 1997, $4× 109

I UK ASSIST, statistics on welfare benefitsI terminated 1994, £3.5M

I London Ambulance Service computer-aided despatchingI scrapped 1992, £7.5M, 20 deaths in 2 days

Page 5: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

What makes things special?

I normal software failure⇒ errant behaviour not accommodated by other parts of the

system⇒ a cascade of the failure that is spectacular

Page 6: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Where do high expectations come from?

I Web experiencee.g. information services: trains, postcodes, . . .e.g. online bankinge.g. airline reservationse.g. conference managemente.g. online shopping

I Things mostly work, helped byI read mostlyI client-serverI closely coupledI synchronous interaction (request-reply)I single-purposeI (often) private sectorI (often) focussed

Page 7: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Public-sector systems especially. . .

I are bespoke and complexI are largeI have heterogeneous clients and rolesI need a web portal, but are not like “traditional” web sitesI must be around for a long timeI often have ubiquitous/mobile requirementsI are influenced by competition & independent procurement

vs. interoperationI are influenced by legislation and government policy

Page 8: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Some legal and policy requirements

I exclusions: “patients may specify who may see, and notsee, their electronic health records (EHRs)”

I relationships: “only the doctor with whom the patient isregistered (for treatment) may prescribe drugs, read thepatient’s EHR, etc.”

I “the existence of certain sensitive components of EHRsmust be invisible, except to explicitly authorised roles”

I “buses should run to time and bus operators will bepunished if published timetables are not met”, so busoperators can be reluctant to cooperate in trafficmonitoring, even though monitoring could show that delayis often not their fault

Page 9: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Some legal and policy requirements

I exclusions: “patients may specify who may see, and notsee, their electronic health records (EHRs)”

I relationships: “only the doctor with whom the patient isregistered (for treatment) may prescribe drugs, read thepatient’s EHR, etc.”

I “the existence of certain sensitive components of EHRsmust be invisible, except to explicitly authorised roles”

I “buses should run to time and bus operators will bepunished if published timetables are not met”, so busoperators can be reluctant to cooperate in trafficmonitoring, even though monitoring could show that delayis often not their fault

Page 10: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Some legal and policy requirements

I exclusions: “patients may specify who may see, and notsee, their electronic health records (EHRs)”

I relationships: “only the doctor with whom the patient isregistered (for treatment) may prescribe drugs, read thepatient’s EHR, etc.”

I “the existence of certain sensitive components of EHRsmust be invisible, except to explicitly authorised roles”

I “buses should run to time and bus operators will bepunished if published timetables are not met”, so busoperators can be reluctant to cooperate in trafficmonitoring, even though monitoring could show that delayis often not their fault

Page 11: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Some legal and policy requirements

I exclusions: “patients may specify who may see, and notsee, their electronic health records (EHRs)”

I relationships: “only the doctor with whom the patient isregistered (for treatment) may prescribe drugs, read thepatient’s EHR, etc.”

I “the existence of certain sensitive components of EHRsmust be invisible, except to explicitly authorised roles”

I “buses should run to time and bus operators will bepunished if published timetables are not met”, so busoperators can be reluctant to cooperate in trafficmonitoring, even though monitoring could show that delayis often not their fault

Page 12: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Data protection legislation

Gathered data that identify individuals must not be stored.

I CCTV camera software must not recognise people andstore identities with images

I vehicle number plates must not be recognised and thenlinked to and stored with identities

Very messy and changes constantly.

Page 13: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Rapid development and new technology

I don’t get to build a second systemI rapid obsolescence means that incremental growth isn’t

sustainable. . .I . . . but should design for incremental deploymentI may demand use by mobile workers (healthcare, police,

utilities) and include cameras/sensors

Page 14: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

DS history: technology-driven evolution

I fast, reliable LANs (e.g., Ethernet, Cambridge Ring) madeDS possible in the 1980s

I early research was on distribution of OS functionality1. terminals + multi-access system(s)2. terminals + pool of processors + dedicated servers

(Cambridge CDCS)3. diskless workstations + servers (Stanford)4. workstations + servers (Xerox PARC, MIT Athena)

I now WANs are fast and reliable, so. . .

Page 15: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

terminals + terminals +multi-access system(s) processor pool + servers

LAN

Mainframe Mainframe

Terminal concentrator

Terminal concentrator

LAN

Resource manager

Terminal concentrator

Terminal concentrator

S

S

S

SS

D

D

P P PP

Processor pool

diskless workstations + servers workstations + servers

LANS

S

S

SS

D

D

W W WW

LANS

S

S

SS

D

D

W W WW

D D D D

Page 16: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

How to think about distributed systems

I fundamental characteristicsI software structure for a nodeI model/architecture/engineering for a system

Page 17: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Fundamental characteristics

1. concurrent execution of components2. independent failure modes3. transmission delay4. no global time

Implications:

2, 3 can’t know why there’s no reply—node/comms. failureand/or node/comms. congestion

4 can’t use locally generated timestamps for orderingdistributed events

1, 3 inconsistent views of state/data when it’s distributed1 can’t wait for quiescence to resolve inconsistencies

Page 18: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Single node software structure

Support for distributed software may be1. directly by OS in a homogeneousish cluster (distributed

OS design)—not the focus of this course2. by a software layer (middleware) above one or more

potentially heterogeneous OSs

OSfunctions

communicationssubsystem

middleware

components of distributed software

network

OS interface

homogeneous interface atoppossibly heterogeneous OSs

Page 19: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Distributed application structure: email, ftp, . . .

user's emailinterface (MUA)

SMTP

OS comms. interface

standard comms. supporting all applications

specific application protocol

names required for clients and messages

Page 20: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Distributed application structure: the web

user's webinterface

(browser, etc.)

HTTP

OS comms. interface

standard comms. supporting all applications

specific application protocol

names required for documents

Page 21: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Distributed application structure: in general

component of distributed application

RPC(or whatever)

OS comms. interface

standard comms. supporting all applications

general application-level protocol

names required for interfaces, procedures, …

Page 22: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Open and proprietary middleware

I evolution controlled by standards bodies or consortiaI interoperability “bake-offs” are not uncommonI resulting system something of a compromise, which can be

good or bad

versus

I can be changed by the owner (users may need to buy a newrelease)

I consistency across versions is not guaranteedI good for technical extortion

Page 23: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Interoperability and languages

I can your system span multiple middlewares (includingdifferent implementations of the same MW)?

I can components be written in different languages andinteroperate?

Page 24: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Model of distributed computation

I what are the named entities? objects, components,services, . . .

I how is communication achieved?I synchronous/blocking (request-response) invocation, e.g.,

the client-server modelI asynchronous messages, e.g., the event notification modelI one-to-one, one-to-many?

I are the communicating entities closely or loosely coupled?I must they share a programming context?I must they be running at the same time?

Page 25: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Model of distributed computation

I what are the named entities? objects, components,services, . . .

I how is communication achieved?I synchronous/blocking (request-response) invocation, e.g.,

the client-server modelI asynchronous messages, e.g., the event notification modelI one-to-one, one-to-many?

I are the communicating entities closely or loosely coupled?I must they share a programming context?I must they be running at the same time?

Page 26: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Model of distributed computation

I what are the named entities? objects, components,services, . . .

I how is communication achieved?I synchronous/blocking (request-response) invocation, e.g.,

the client-server modelI asynchronous messages, e.g., the event notification modelI one-to-one, one-to-many?

I are the communicating entities closely or loosely coupled?I must they share a programming context?I must they be running at the same time?

Page 27: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

System architecture

. . . the framework within which the entities in the modelinteroperate

I namingI location of named objectsI security of communicationI authentication of participantsI protection/access control/authorisationI replication to meet requirements for reliability, availability

Entities may be defined within administration domains; need toconsider multi-domain systems and interoperation within andbetween domains

Page 28: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

System engineering implementation decisions

I placement of functionality: client libraries, user agents,servers, wrappers/interception

I replication for failure tolerance, performance, loadbalancing⇒ consistency issues

I optimisations, e.g., caching, batchingI selection of standards, e.g., XML, X.509I what “transparencies”1 to provide at what level

I distribution transparency: location? failure? migration?⇒ may not be achievable or may be too costly

1hidden from application developer; needn’t be programmed for, can’t bedetected when running

Page 29: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Architectures for large-scale, networked systems

1. federated administration domainsI integration of databasesI integration of sensor networksI small dynamic domains with members grounded in various

static administration domains

2. independent, external services to be integrated3. detached, ad hoc, anonymous groups; anonymous

principals, issues of risk and trust

Page 30: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Examples of federated administration domains

I national healthcare services: many hospitals, clinics,primary care practices

I national police services: county police forcesI global company: branches in London, Tokyo, New York,

Berlin, Paris, . . .I transport: County Councils responsible for cities, some

roadsI active city: fire, police, ambulance, healthcare services;

mobile workers; sensor networks, e.g., for traffic/pollutionmonitoring

Page 31: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Federated domains—characteristics

I names administered per domain (users, roles, services,data-types, messages, sensors, . . . )

I authentication users administered within a domainI communication needed within and between domainsI security per-domain firewall-protectionI policies specified per domain, e.g., for access control;

intra- and inter-domain, plus some external policies tosatisfy government, legal and institutional requirements

I high trust high accountability

Page 32: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Small dynamic domains with members groundedin static administration domains

I e.g.assisted home-living (sheltered housing) “patient” +various carers + technology

I carers have roles in primary care practices, hospitals,social services, out-sourced services

I care programme is specified by contractI rights of patient to defined careI obligations of carers and patientsI privacy of patient dataI need to audit people and technology

Page 33: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Dynamic domains—characteristics

I names principals (users, roles): from home domains;services, data types, messages, sensors set up for smalldynamic domains

I authentication users administered within home domain;need for credential check back to home domain (as infederated domains)

I communication needed across domainsI policies indicate contractual obligations and privileges

(access control)I audit of people, technologyI trust based on observation of audit (and reputation?)

Page 34: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Examples of independent, external services

I commercial web-based services: online banking, airlinebooking

I national services used by police and others: DVLA,court-case workflow

I national health services: national Electronic HealthRecord (EHR) service

I e-science (grid) databases and generic services:astronomical, transport, medical databases for computationor storage

I virtual organisations: collaborating groups across severaldomains

Page 35: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Independent, external services—characteristics

I naming and authentication may be client-domain-related,and/or of individuals via certification authorities (CAs)

I policies related to client roles in domains and/or individualprincipals may provide support for “virtual organisations”

I need accounting, charging, auditI trust based on evidence of behaviour; clients exchange

experiences, services monitor and record; often assumefull connectivity

Page 36: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Detached, ad hoc, anonymous groups

I may be connected by wirelessI can’t assume trusted third parties (CAs) accessibleI can’t assume knowledge of names and roles; identity likely

to be by key/pseudonymI new identities can be generated (by detected villains)I parties need to decide whether to interactI each has a trust policy and a trust engineI each computes whether to proceed—policy is based on:

I accumulated trust information (from recommendations andevidence from monitoring)

I risk (resource-cost) and likelihood of possible outcomes

Page 37: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Examples of detached, ad hoc, anonymous groups

I Commuters regularly play cards on the trainI E-purse purchasesI Recommendations of people and, e.g., restaurants in a

tourist scenarioI Wireless routing via peersI Routing of messages P2P rather than by dedicated

brokers—reliability, confidentiality, altruismI Trust has a context

Page 38: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

Promising approaches for large-scale systems

I Roles for scalabilityI Parametrised roles for expressivenessI RBAC for services, service-managed objects, including the

communication serviceI Policy specification and change managementI Policy-driven system managementI Asynchronous, loosely-coupled communication:

publish/subscribe for scalability; event-driven paradigmfor ubiquitous computing

I Database integration—how best to achieve it?

Page 39: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

The OPERA group—research themesSome are specific projects (from the past or present); some areprinciples.

I Access Control: OASIS RBAC (Open Architecture forSecurely Interworking Services)

I Policy expression and managementI Event-driven systems: CEA, Hermes, EDSAC21I Trust and risk in global computing (EU SECURE): secure

collaboration among ubiquitous roaming entitiesI TIME: a Traffic Information Monitoring Environment

I TIME-EACM Event Architecture and ContextManagement

I CareGrid: dynamic trust domains for healthcareapplications

I SmartFlow: extensible event-based middleware

Page 40: DS 2009: introduction - University of Cambridge · 2009-04-23 · 2.independent failure modes 3.transmission delay 4.no global time Implications: 2,3can’t know why there’s no

The OPERA group—research themes

seehttp://www.cl.cam.ac.uk/Research/SRG/operafor people, projects, publications for download