reliable distributed systems web services

40
1 Reliable Distributed Systems Web Services Notes Provided with Ken Birman’s Text Modified by B. Ramamurthy

Upload: datacenters

Post on 11-Feb-2017

353 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Reliable Distributed Systems Web Services

1

Reliable Distributed Systems

Web ServicesNotes Provided with Ken Birman’s

TextModified by B. Ramamurthy

Page 2: Reliable Distributed Systems Web Services

2

Topics Web Services – Introduction “Remote Procedure Call” in WS

Binding, Marshalling…

Page 3: Reliable Distributed Systems Web Services

3

What are Web Services? Today, we normally use Web browsers

to talk to Web sites Browser names document via URL (lots of

fun and games can happen here) Request and reply encoded in HTML, using

HTTP to issue request to the site Web Services generalize this model so

that applications can talk to applications Informational to transactional transition

Page 4: Reliable Distributed Systems Web Services

4

What are Web Services?

Web Service

SOAP Router

Backend Processes

Client System

Page 5: Reliable Distributed Systems Web Services

5

What are Web Services? “Web Services are software

components described via WSDL which are capable of being accessed via standard network protocols such as SOAP over HTTP.”

Web Service

SOAP Router

Backend Processes

Page 6: Reliable Distributed Systems Web Services

6

What are Web Services? “Web Services are software

components described via WSDL which are capable of being accessed via standard network protocols such as SOAP over HTTP.”

Web Service

SOAP Router

Backend ProcessesToday, SOAP is the primary standard.

SOAP provides rules for encoding the request and its arguments.

Page 7: Reliable Distributed Systems Web Services

7

What are Web Services? “Web Services are software

components described via WSDL which are capable of being accessed via standard network protocols such as SOAP over HTTP.”

Web Service

SOAP Router

Backend ProcessesSimilarly, the architecture doesn’t assume

that all access will employ HTTP over TCP. In fact, .NET uses Web Services “internally” even on a single machine. But in that case,

communication is over COM

Page 8: Reliable Distributed Systems Web Services

8

What are Web Services? “Web Services are software

components described via WSDL which are capable of being accessed via standard network protocols such as SOAP over HTTP.”

Web Service

SOAP Router

WSDLdocument+

Backend Processes

WSDL documents are used to drive object assembly,

code generation, and other

development tools.

Page 9: Reliable Distributed Systems Web Services

9

Web Services are often Front Ends

WebServer

(e.g., IBM WebSphere,

BEAWebLogic)

DB2server

SAP

WSDL-described

Web Service

WebApp

Server

Web Service invoker

COMApp

CORBAApp

C#App

Server PlatformClient Platform

SOAPmessaging

Page 10: Reliable Distributed Systems Web Services

10

The Web Services “stack”BusinessProcesses

Qualityof

Service

Description

Messaging

Transport

CoordinationReliable

Messaging Security

BPEL4WS (IBM only, for now)

XML, EncodingOther

Protocols

TCP/IP or other network transport protocols

SOAPWSDL, UDDI, Inspection

Transactions

Page 11: Reliable Distributed Systems Web Services

11

What are Web Services? Amazon would hand out

“serverlets” for 3rd party developers to use

This connects their applications directly to Amazon’s system

Web Service

SOAP Router

Backend Processesserverlet

Page 12: Reliable Distributed Systems Web Services

12

Advantages of web services?*

Web services provide interoperability between various software applications running on various platforms.

“vendor, platform, and language agnostic” Web services leverage open standards and protocols.

Protocols and data formats are text based where possible

Easy for developers to understand what is going on. By piggybacking on HTTP, web services can work

through many common firewall security measures without requiring changes to their filtering rules.*: From Wikipedia

Page 13: Reliable Distributed Systems Web Services

13

How Web Services work First the client discovers the

service. Typically, client then binds to the

server. By setting up TCP connection to the

discovered address . But binding not always needed.

Page 14: Reliable Distributed Systems Web Services

14

How it works… Next build the SOAP request: (Marshaling)

Fill in what service is needed, and the arguments. Send it to server side.

XML is the standard for encoding the data (but is very verbose and results in HUGE overheads)

SOAP router routes the request to the appropriate server(assuming more than one available server) Can do load balancing here.

Page 15: Reliable Distributed Systems Web Services

15

How it works… Server unpacks the request,

(Demarshaling) handles it, computes result.

Result sent back in the reverse direction: from the server to the SOAP router back to the client.

Page 16: Reliable Distributed Systems Web Services

16

Marshalling Issues Data exchanged between client and

server needs to be in a platform independent format. “Endian”ness differ between machines. Data alignment issue (16/32/64 bits) Multiple floating point representations. Pointers (Have to support legacy systems too)

Page 17: Reliable Distributed Systems Web Services

17

Discovery This is the problem of finding the

“right” service one way to do it – with a URL Web Services community favors what

they call a URN: Uniform Resource Name But the more general approach is to

use an intermediary: a discovery service

Page 18: Reliable Distributed Systems Web Services

18

Name Type Publisher Toolkit Language OS

Web Services Performance and Load Tester

Application LisaWu N/A Cross-Platform

Temperature Service Client Application vinuk Glue Java Cross-Platform

Weather Buddy Application rdmgh724890 MS .NET C# Windows

DreamFactory Client Application billappleton DreamFactory Javascript Cross-Platform

Temperature Perl Client Example Source gfinke13 Perl Cross-Platform

Apache SOAP sample source Example Source xmethods.net Apache SOAP Java Cross-Platform

ASS 4 Example Source TVG SOAPLite N/A Cross-Platform

PocketSOAP demo Example Source simonfell PocketSOAP C++ Windows

easysoap temperature Example Source a00 EasySoap++ C++ Windows

Weather Service Client with MS- Visual Basic

Example Source oglimmer MS SOAP Visual Basic Windows

TemperatureClient Example Source jgalyan MS .NET C# Windows

Example of a repository

Page 19: Reliable Distributed Systems Web Services

19

Repository summary A database listing servers Each is described using the UDDI

language, which is defined over XML Hence can be searched with XML queries

An extensible standard Defines some required information about

interfaces available and argument types, etc But services can provide extra information

too.

Page 20: Reliable Distributed Systems Web Services

20

Roles? UDDI is used to write down the

information that became a “row” in the repository (“I have a temperature service…”)

WSDL documents the interfaces and data types used by the service

Page 21: Reliable Distributed Systems Web Services

21

Discovery and naming The topic raises some tough questions

Many settings, like the big data centers run by large corporations, have rather standard structure. Can we automate discovery?

How to debug if applications might sometimes bind to the wrong service?

Delegation and migration are very tricky Should a system automatically launch

services on demand?

Page 22: Reliable Distributed Systems Web Services

22

Example: Why discovery is tricky Client has opinions

“I want current map data for Disneyland showing line-lengths for the rides right now”

Service has opinions Amazon.com would like requests from Ithaca to

go to the NJ-3 datacenter, and if possible, to the same server instance within each clustered service

DNS has opinions Many systems play with name -> IP bindings

Internet has opinions (routing)

Page 23: Reliable Distributed Systems Web Services

23

So, what’s tricky? Web Services doesn’t standardize

these four steps, it just assumes that people will hack solutions

Hence some are hard to implement, we lack standards, and in some cases, solutions are poor ones

UDDI and WSDL are just a corner of the overall picture!

Page 24: Reliable Distributed Systems Web Services

24

Network address translation… Another issue: Often, the internal address

is not addressable from outside! A tiny bit of security. But if RPC server is behind a NAT, trouble!

NAT needs the host behind it to start the connection process.

Need to configure NAT to let specified traffic through.

Generally: (WS traffic)HTTP is let through. Tough to have a connection in between two

hosts behind NATs. There are some tricks to bypass this though.

Page 25: Reliable Distributed Systems Web Services

25

Firewalls These allow/disallow traffic, depending on source,

destination, protocol used, etc. Often only allow connection from the inside to the

outside! Stateful: remember active flows, and disallow

unexpected packets (NAT) Again, need to configure to ensure server traffic gets

through. (General RPC) Again, (WS)HTTP does not face as much of a restriction.

Get traffic statistics. Spam/virus checking, etc. NAT and firewall typically in the same box.

Page 26: Reliable Distributed Systems Web Services

26

Demilitarized Zone (DMZ) DMZ: used to host publicly

accessible services like company webpages, ftp, dns.

Good place to host the Web Service!

DMZ situated outside the private network.

No outgoing connections from DMZ.

If DMZ attacked, damage limited to DMZ.

Page 27: Reliable Distributed Systems Web Services

27

WebServiceWeb

ServiceWebServices

Client talks to eStuff.com Moving on… let’s oversimplify and

just assume the client manages to find the data center

We think of remote method invocation and Web Services as a simple chain:

Clientsystem

Soap RPCSOAProuter

Page 28: Reliable Distributed Systems Web Services

28

So… suppose we get in Assuming we can connect to the

data center (to its Web Services router), then what?

If you just use Visual Studio out of the box, you end up with a single-machine Web Server

But massive datacenters are common!

Page 29: Reliable Distributed Systems Web Services

29

A glimpse inside eStuff.com

Pub-sub combined with point-to-pointcommunication technologies like TCP

LB

service

LB

service

LB

service

LB

service

LB

service

LB

service

“front-end applications”

Page 30: Reliable Distributed Systems Web Services

30

Clusters and load balancing Idea here is that some form of load

balancer spreads work over a cluster

And cluster replicates data for availability and load management

Page 31: Reliable Distributed Systems Web Services

31

What about “legacy” applications? Some of these Web services are really just

front-ends to older legacy applications So to talk to an old IBM database, we might

Run the database on some sort of machine, or virtual machine

Build one of these translator front-ends And then register it with the Web Services router

This may sound expensive (it is) but it works!

Obviously, our fancy clustering and load-balancing won’t apply to a legacy application, so those fancy tricks are only for “new” code

Page 32: Reliable Distributed Systems Web Services

32

Discovery in eStuff.com Data centers are increasingly common And they raise hard questions!

How can a data center in California control decisions a client is making in Ithaca?

Services are clustered. How should client request be “routed” to the right member

Once you start talking to a server it may cache data for you. How can you be sure to get the right one next time?

Page 33: Reliable Distributed Systems Web Services

33

These are modern challenges Web Services can be seen as evolving

from prior work Most often cited: CORBA, which also

was used in many big data centers But CORBA didn’t assume that clients

came in over the public Internet More often, CORBA was used between a

hand-built client and the service it talks to

Page 34: Reliable Distributed Systems Web Services

34

CORBA approach CORBA had what are called

Ways to export specialized client stubs The client stub could include server

provided decision logic, like “which data center to connect with”

Gives data center a form of remote control Factory services: manufacture certain

kinds of objects as needed Effect was that “discovery” can also be a

“service creation” activity

Page 35: Reliable Distributed Systems Web Services

35

CORBA is object oriented Seems obvious… and it is. CORBA is centered

around the notion of an object Objects can be passive (data) … active (programs) … persistent (data that gets saved) … volatile (state only while running)

In CORBA the application that manages the object is inseparable from the object

And the stub on the client side is part of the application The request per-se is an action by the object on itself

and could even exploit various special protocols We can’t do this in Web Services

Page 36: Reliable Distributed Systems Web Services

36

Web Services are document-centric That is, communication is by sending

documents (like pages) from client to server and back

And most guarantees or properties are associated with the document itself, not the service

For example, WS_RELIABILITY isn’t about making services reliable, it defines rules for writing reliability requests down and attaching them to documents

In contrast, CORBA fault-tolerance standard tells how to make a CORBA service into a highly available clustered service

Page 37: Reliable Distributed Systems Web Services

37

Will Web Services “help” with naming and discovery? Web Services tells us how

One client can… … find one server and … bind to that server and … send a request that will make

sense … and make sense of the response

Page 38: Reliable Distributed Systems Web Services

38

But Web Services won’t… Allow the data center to control decisions

the client makes Assist us in implementing naming and

discovery in scalable cluster-style services How to load balance? How to replicate data?

What precisely happens if a node crashes or one is launched while the service is up?

Help with dynamics. For example, best server for a given client can be a function of load but also affinity, recent tasks, etc

Page 39: Reliable Distributed Systems Web Services

39

How we do it now Client queries directory to find the service Server has several options:

Web pages with dynamically created URLs Server can point to different places, by changing host names Content hosting companies remap URLs on the fly. E.g.

http://www.akamai.com/www.cs.cornell.edu (reroutes requests for www.cs.cornell.edu to Akamai)

Server can control mapping from host to IP addr. Must use short-lived DNS records; overheads are very high! Can also intercept incoming requests and redirect on the fly

Page 40: Reliable Distributed Systems Web Services

40

Why this isn’t good enough The mechanisms aren’t standard and are

hard to implement Akamai, for example, does content hosting

using all sorts of proprietary tricks And they are costly

The DNS control mechanisms force DNS cache misses and hence many requests do RPC to the data center

We lack a standard, well supported, solution!

How content is managed in even larger systems, that have multiple data centers