CS6320 – Systems, Networking and Intro to Performance
L. Grewe
Systems and Issues
Common ingredients of the Web (review)
  • URL, HTML, and HTTP
  • HTTP: the protocol and its stateless property
Web Systems Components (review)
  • Clients
  • Servers
  • DNS (Domain Name System)
Interaction with the underlying network protocol: TCP
Scalability and performance enhancement
  • Server farms
  • Web proxy
  • Content Distribution Network (CDN)
Web History
Before the 1970s-1980s
  • Internet used mainly by researchers and academics
  • Log in to remote machines, transfer files, exchange e-mail
Internet growth and commercialization
  • 1988: ARPANET gradually replaced by the NSFNET
  • Early 1990s: NSFNET begins to allow commercial traffic
Initial proposal for the Web by Berners-Lee in 1989
Enablers for the success of the Web
  • 1980s: Home computers with graphical user interfaces
  • 1990s: Power of PCs increases, and cost decreases
Common ingredients of the Web
URL
  • Denotes the globally unique location of the Web resource
  • Formatted string, e.g., http://www.princeton.edu/index.html
    - Protocol for communicating with the server (e.g., http)
    - Name of the server (e.g., www.princeton.edu)
    - Name of the resource (e.g., index.html)
HTML
  • Actual content of the Web resource, represented in ASCII
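As a quick illustration, Python's standard `urllib.parse.urlsplit` separates the three parts of the example URL. This is a minimal sketch; `url_parts` is a hypothetical helper name. Note that the resource name comes back as the path `/index.html`, with its leading slash.

```python
from urllib.parse import urlsplit


def url_parts(url: str):
    """Split a URL into (protocol, server name, resource path)."""
    parts = urlsplit(url)
    # scheme -> protocol, hostname -> server name, path -> resource
    return parts.scheme, parts.hostname or "", parts.path


print(url_parts("http://www.princeton.edu/index.html"))
# ('http', 'www.princeton.edu', '/index.html')
```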
Common ingredients of the Web: HTML
HyperText Markup Language (HTML)
  • Format text, reference images, embed hyperlinks
  • Representation of hypertext documents in ASCII format
  • Interpreted by Web browsers when rendering a page
Web page
  • Base HTML file
  • Referenced objects (e.g., images), each with its own URL
Straightforward and easy to learn
  • Simplest HTML document is a plain text file
  • Can also be generated automatically by authoring programs
Main ingredients of the Web: HTTP
Client program
  • E.g., Web browser
  • Running on an end host
  • Requests service
Server program
  • E.g., Web server
  • Provides service
[Figure: the client sends “GET /index.html”; the server replies “Site under construction”]
HTTP Example: Request and Response Message
Request:
GET /courses/archive/spring06/cos461/ HTTP/1.1
Host: www.cs.princeton.edu
User-Agent: Mozilla/4.03
<CRLF>
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Feb 2006 13:09:03 GMT
Server: Netscape-Enterprise/3.5.1
Last-Modified: Mon, 6 Feb 2006 11:12:23 GMT
Content-Length: 23
<CRLF>
Site under construction
HTTP Request Message
Request message sent by a client
  • Request line: method, resource, and protocol version
  • Request headers: provide information or modify the request
  • Body: optional data (e.g., to “POST” data to the server)
Example (annotated):
  GET /somedir/page.html HTTP/1.1      <- request line (GET, POST, HEAD commands)
  Host: www.someschool.edu             <- header lines
  User-agent: Mozilla/4.0
  Connection: close
  Accept-language: fr
  (extra carriage return, line feed)   <- marks the end of the header lines (and, with no body, the end of the message)
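The message layout above is simple enough to parse by hand. The sketch below is illustrative only (`parse_request` is a hypothetical name): it splits off the header section at the first blank line, reads the request line, and lower-cases header names because HTTP header names are case-insensitive. It does not handle repeated headers or validate its input.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into (method, target, version, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, resource, protocol version
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (
    b"GET /somedir/page.html HTTP/1.1\r\n"
    b"Host: www.someschool.edu\r\n"
    b"User-agent: Mozilla/4.0\r\n"
    b"Connection: close\r\n"
    b"Accept-language: fr\r\n"
    b"\r\n"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, headers["accept-language"])  # GET /somedir/page.html fr
```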
HTTP Response Message
Response message sent by a server
  • Status line: protocol version, status code, status phrase
  • Response headers: provide information
  • Body: optional data
Example (annotated):
  HTTP/1.1 200 OK                      <- status line (protocol, status code, status phrase)
  Connection: close                    <- header lines
  Date: Thu, 06 Aug 1998 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 1998 …
  Content-Length: 6821
  Content-Type: text/html
  (extra carriage return, line feed)
  data data data data data ...         <- data, e.g., the requested HTML file
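A response is parsed the same way, except that the first line is a status line. In this sketch (hypothetical `parse_response`, no error handling), Content-Length, when present, marks where the body ends; the sample message reuses the “Site under construction” body, which is 23 bytes long.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, status code, phrase, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, status phrase
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # If Content-Length is present, it delimits the body.
    if "content-length" in headers:
        body = body[: int(headers["content-length"])]
    return version, int(code), phrase, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 6 Feb 2006 13:09:03 GMT\r\n"
    b"Content-Length: 23\r\n"
    b"\r\n"
    b"Site under construction"
)
version, code, phrase, headers, body = parse_response(raw)
print(code, body.decode())  # 200 Site under construction
```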
HTTP: Request Methods and Response Codes
Request methods include
  • GET: return current value of resource, …
  • HEAD: return the meta-data associated with a resource
  • POST: update a resource, provide input to a program, …
  • Etc.
Response code classes
  • 1xx: informational (e.g., “100 Continue”)
  • 2xx: success (e.g., “200 OK”)
  • 3xx: redirection (e.g., “304 Not Modified”)
  • 4xx: client error (e.g., “404 Not Found”)
  • 5xx: server error (e.g., “503 Service Unavailable”)
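Because the class is encoded in the first digit, a client can react to an unfamiliar code by its class alone. A minimal sketch (the function name `status_class` is hypothetical):

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to its class, keyed on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return classes[code // 100]


for code in (100, 200, 304, 404, 503):
    print(code, status_class(code))
```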
HTTP is a Stateless Protocol
Stateless
  • Each request-response exchange is treated independently
  • Clients and servers are not required to retain state
Statelessness to improve scalability
  • Avoids the need for the server to retain information across requests
  • Enables the server to handle a higher rate of requests
Web Systems Components
Clients
  • Send requests and receive responses
  • Browsers, spiders, and agents
Servers
  • Receive requests and send responses
  • Store or generate the responses
DNS (Domain Name System)
  • Distributed network infrastructure
  • Transforms site name -> IP address
  • Directs clients to servers
Web Browser
Generating HTTP requests
  • User types a URL, clicks a hyperlink, or selects a bookmark
  • User clicks “reload” or “submit” on a Web page
  • Automatic downloading of embedded images
Layout of response
  • Parsing HTML and rendering the Web page
  • Invoking helper applications (e.g., Acrobat, PowerPoint)
Maintaining a cache
  • Storing recently viewed objects
  • Checking that cached objects are fresh
Web Transaction
User clicks on a hyperlink
  • http://www.cnn.com/index.html
Browser learns the IP address of the server
  • Invokes gethostbyname("www.cnn.com")
  • … and gets a return value of 64.236.16.20
Browser establishes a TCP connection
  • Selects an ephemeral port for its end of the connection
  • Contacts 64.236.16.20 on port 80
Browser sends the HTTP request
  • “GET /index.html HTTP/1.1
     Host: www.cnn.com”
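The four steps above map directly onto Python's standard `socket` module. This is a minimal sketch, not how a real browser is built: `build_get` and `fetch` are hypothetical helper names, the `Connection: close` header is an added assumption so that the read loop ends when the server closes the connection, and the address returned today will not be the slide's 64.236.16.20.

```python
import socket


def build_get(host: str, path: str) -> bytes:
    """Build the request line and Host header the browser sends."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # assumption: ask the server to close when done
        "\r\n"
    ).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Resolve the name, connect over TCP, send the request, read the reply."""
    ip = socket.gethostbyname(host)  # step 1: DNS lookup
    # Step 2: connect; the OS picks an ephemeral local port for our end.
    with socket.create_connection((ip, port), timeout=10) as sock:
        sock.sendall(build_get(host, path))  # step 3: send the HTTP request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

With network access, `fetch("www.cnn.com", "/index.html")` returns the raw response bytes (status line, headers, and body), which the browser would then parse.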
Web Transaction (Continued)
Browser parses the HTTP response message
  • Extracts the URL for each embedded image
  • Creates new TCP connections and sends new requests
  • Renders the Web page, including the images
Opportunities for caching in the browser
  • HTML file
  • Each embedded image
  • IP address of the Web site
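Extracting the embedded image URLs can be sketched with the standard-library `html.parser.HTMLParser`; relative `src` values are resolved against the page URL with `urljoin`. The class name and the sample page (including `logo.gif`) are hypothetical, and a real browser also handles scripts, stylesheets, and other object types.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute URLs of <img src=...> tags in an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative references are resolved against the page URL.
                self.images.append(urljoin(self.base_url, src))


page = (
    '<html><body><img src="logo.gif">'
    '<img src="http://www.myimages.com/image1.jpg"></body></html>'
)
collector = ImageCollector("http://www.cnn.com/index.html")
collector.feed(page)
print(collector.images)
```

Each URL in `collector.images` then becomes a separate request, possibly to a different server.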
Web Server
Web site vs. Web server
  • Web site: collection of Web pages associated with a particular host name
  • Web server: program that satisfies client requests for Web resources
Handling a client request
  • Accept the TCP connection
  • Read and parse the HTTP request message
  • Translate the URL to a filename
  • Determine whether the request is authorized
  • Generate and transmit the response
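The “translate the URL to a filename” step is also where a server must make sure a request cannot reach files outside its document tree. A minimal sketch, assuming a document root of `/www` (as in the `/www/index.html` example) and assuming `index.html` as the default file for directory URLs; `url_to_filename` is a hypothetical name, and the authorization step is not covered.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOC_ROOT = "/www"  # assumed document root


def url_to_filename(target: str, doc_root: str = DOC_ROOT) -> str:
    """Translate a request target such as '/index.html' to a file path."""
    # Drop any query string and decode %-escapes (e.g., %2e%2e -> '..').
    path = unquote(urlsplit(target).path) or "/"
    if path.endswith("/"):
        path += "index.html"  # assumed default document for directories
    # normpath on an absolute path discards leading '..' segments,
    # so the result can never climb above doc_root.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    return doc_root + clean


print(url_to_filename("/index.html"))          # /www/index.html
print(url_to_filename("/a/../../etc/passwd"))  # /www/etc/passwd
```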
Web Server: Generating a Response
Returning a file
  • URL corresponds to a file (e.g., /www/index.html)
  • … and the server returns the file as the response
  • … along with the HTTP response header
Returning meta-data with no body
  • Example: client requests the object “if-modified-since” a given date (If-Modified-Since header)
  • Server checks whether the object has been modified since then
  • … and, if not, simply returns “HTTP/1.1 304 Not Modified”
Dynamically generated responses
  • URL corresponds to a program the server needs to run
  • Server runs the program and sends the output to the client
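The “if-modified-since” case can be sketched as follows, using the standard `email.utils` helpers to parse and format HTTP dates. This is an illustration under stated assumptions, not a complete server: `respond` is a hypothetical name, and a real 304 response usually carries headers such as Date as well.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

NOT_MODIFIED = b"HTTP/1.1 304 Not Modified\r\n\r\n"


def respond(body: bytes, last_modified: datetime,
            if_modified_since: Optional[str] = None) -> bytes:
    """Return 304 (no body) if the client's copy is current, else 200 with the file."""
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # malformed date: treat as absent
            since = None
        if since is not None:
            if since.tzinfo is None:  # assume GMT when no zone is given
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:  # unchanged since the client's copy
                return NOT_MODIFIED
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Last-Modified: {format_datetime(last_modified, usegmt=True)}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Last-Modified time taken from the earlier response example.
lm = datetime(2006, 2, 6, 11, 12, 23, tzinfo=timezone.utc)
print(respond(b"Site under construction", lm, "Mon, 06 Feb 2006 13:09:03 GMT"))
# b'HTTP/1.1 304 Not Modified\r\n\r\n'
```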
Hosting: Multiple Sites Per Machine
Multiple Web sites on a single machine
  • Hosting company runs the Web server on behalf of multiple sites (e.g., www.foo.com and www.bar.com)
Problem: returning the correct content
  • www.foo.com/index.html vs. www.bar.com/index.html
  • How to differentiate when both are on the same machine?
Solution: multiple servers on the same machine
  • Run multiple Web servers on the machine
  • Have a separate IP address for each server
Hosting: Multiple Machines Per Site (performance improvement)
Replicating a popular Web site
  • Running on multiple machines to handle the load
  • … and to place content closer to the clients
Problem: directing clients to a particular replica
  • To balance load across the server replicas
  • To pair clients with nearby servers
Solution
  • Takes advantage of the Domain Name System (DNS)
DNS Query Steps
User types or clicks on a URL
  • E.g., http://www.cnn.com/2006/leadstory.html
Browser extracts the site name
  • E.g., www.cnn.com
Browser calls gethostbyname() to learn the IP address
  • Triggers resolver code to query the local DNS server
Eventually, the resolver gets a reply
  • Resolver returns the IP address to the browser
Then, the browser contacts the Web server
  • Creates and connects a socket, and sends the HTTP request
Multiple DNS Queries
Often a Web page has embedded objects
  • E.g., HTML file with embedded images
Each embedded object has its own URL
  • … and potentially lives on a different Web server
  • E.g., http://www.myimages.com/image1.jpg
Browser downloads embedded objects
  • Usually done automatically, unless configured otherwise
  • Requires learning the address for www.myimages.com
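Each distinct server name among the page and its embedded objects costs one DNS lookup (before any caching). A sketch with a hypothetical helper `hosts_to_resolve`, assuming the object URLs have already been made absolute; `logo.gif` is an invented example object.

```python
from urllib.parse import urlsplit


def hosts_to_resolve(page_url, object_urls):
    """Return the server names the browser must look up, in first-seen order."""
    seen = []
    for url in [page_url, *object_urls]:
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen


print(hosts_to_resolve(
    "http://www.cnn.com/index.html",
    ["http://www.cnn.com/logo.gif", "http://www.myimages.com/image1.jpg"],
))
# ['www.cnn.com', 'www.myimages.com']
```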
When Are DNS Queries Unnecessary?
Browser is configured to use a proxy
  • E.g., browser sends all HTTP requests through a proxy
  • Then, the proxy takes care of issuing the DNS request
Requested Web resource is locally cached
  • E.g., cache has http://www.cnn.com/2006/leadstory.html
  • No need to fetch the resource, so no need to query
Resulting IP address is locally cached
  • Browser recently visited http://www.cnn.com
  • So, the browser already called gethostbyname()
  • … and may be locally caching the resulting IP address
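The last case, a locally cached IP address, amounts to a small cache keyed by name. This sketch is hypothetical throughout: real DNS records carry their own TTL, which `socket.gethostbyname` does not expose, so a fixed `ttl_seconds` is assumed instead. The resolver and clock are parameters so the behaviour can be checked without a network.

```python
import socket
import time


class DnsCache:
    """Cache name -> address lookups for a fixed, assumed time-to-live."""

    def __init__(self, ttl_seconds=60.0, resolver=socket.gethostbyname,
                 clock=time.monotonic):
        self.ttl = ttl_seconds
        self.resolver = resolver
        self.clock = clock
        self.entries = {}  # name -> (address, expiry time)
        self.lookups = 0   # queries that actually went to the resolver

    def resolve(self, name):
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # fresh entry: no DNS query needed
        address = self.resolver(name)  # miss or expired: query again
        self.lookups += 1
        self.entries[name] = (address, now + self.ttl)
        return address


fake_time = [0.0]
cache = DnsCache(ttl_seconds=30, resolver=lambda name: "64.236.16.20",
                 clock=lambda: fake_time[0])
cache.resolve("www.cnn.com")
cache.resolve("www.cnn.com")  # served from the cache
print(cache.lookups)          # 1
fake_time[0] = 31.0           # entry has expired
cache.resolve("www.cnn.com")
print(cache.lookups)          # 2
```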
Directing Web Clients to Replicas
Simple approach: different names
  • www1.cnn.com, www2.cnn.com, www3.cnn.com
  • But, this requires users to select specific replicas
More elegant approach: different IP addresses
  • Single name (e.g., www.cnn.com), multiple addresses
  • E.g., 64.236.16.20, 64.236.16.52, 64.236.16.84, …
Authoritative DNS server returns many addresses
  • And the local DNS server selects one address
  • Authoritative server may vary the order of addresses
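Varying the order of addresses is often called round-robin DNS. The sketch below (hypothetical class and function names) rotates the address list on every answer and assumes the local DNS server simply takes the first address; the three addresses are the ones listed above.

```python
from collections import deque


class RoundRobinRecords:
    """Authoritative-server side: rotate the address list on each query."""

    def __init__(self, addresses):
        self.addresses = deque(addresses)

    def answer(self):
        reply = list(self.addresses)
        self.addresses.rotate(-1)  # next query sees a different first address
        return reply


def local_server_pick(reply):
    # Assumed local-server policy: take the first address in the reply.
    return reply[0]


records = RoundRobinRecords(["64.236.16.20", "64.236.16.52", "64.236.16.84"])
picks = [local_server_pick(records.answer()) for _ in range(4)]
print(picks)
# ['64.236.16.20', '64.236.16.52', '64.236.16.84', '64.236.16.20']
```

Successive clients are spread across the replicas even though every answer still lists all three addresses.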
Clever Load Balancing Schemes
The problem: selecting the “best” IP address to return
  • Based on server performance
  • Based on geographic proximity
  • Based on network load
  • …
Example policies
  • Round-robin scheduling to balance server load
  • U.S. queries get one address, Europe another
  • Tracking the current load on each of the replicas
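Two of the example policies can be combined in a few lines. Everything in this sketch is assumed for illustration: the region and load attached to each address, and the rule “prefer the client's region, then the least-loaded replica there; with no replica in that region, fall back to the least-loaded replica anywhere”.

```python
# Hypothetical replica table: region and current load (0.0-1.0) per address.
REPLICAS = {
    "64.236.16.20": {"region": "us", "load": 0.70},
    "64.236.16.52": {"region": "us", "load": 0.20},
    "64.236.16.84": {"region": "eu", "load": 0.50},
}


def pick_replica(client_region, replicas=REPLICAS):
    """Prefer replicas in the client's region, then the least-loaded one."""
    local = {ip: r for ip, r in replicas.items() if r["region"] == client_region}
    candidates = local or replicas  # no local replica: consider all of them
    return min(candidates, key=lambda ip: candidates[ip]["load"])


print(pick_replica("us"))    # 64.236.16.52 (least-loaded U.S. replica)
print(pick_replica("eu"))    # 64.236.16.84
print(pick_replica("asia"))  # 64.236.16.52 (global least-loaded)
```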
TCP Interaction: Multiple Transfers
Most Web pages have multiple objects
  • E.g., HTML file and multiple embedded images
Serializing the transfers is not efficient
  • Sending the images one at a time introduces delay
  • Cannot start retrieving the second image until the first arrives
A solution: parallel connections
  • Browser opens multiple TCP connections (e.g., 4)
  • … and retrieves a single image on each connection
Performance trade-offs
  • Multiple downloads share the same network links
  • Unfairness to other traffic traversing the links
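The parallel-connection idea can be sketched with a thread pool whose size caps the number of simultaneous connections (4, as in the slide). `fetch_parallel` and `fetch_one` are hypothetical names; `urllib.request.urlopen` opens a fresh connection per call, so each in-flight download uses its own TCP connection. The `fetch` parameter lets the pool logic be exercised without a network.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_one(url):
    """Download one object on its own connection."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_parallel(urls, fetch=fetch_one, max_connections=4):
    """Retrieve objects over at most `max_connections` concurrent connections."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # map() returns results in input order even though downloads overlap
        return list(pool.map(fetch, urls))
```

For example, `fetch_parallel(image_urls)` would download a page's images four at a time. Raising `max_connections` shortens the page load but sharpens the fairness trade-off listed above.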
TCP Interaction: Short Transfers
Most HTTP transfers are short
  • Very small request message (e.g., a few hundred bytes)
  • Small response message (e.g., a few kilobytes)
TCP overhead may be big
  • Three-way handshake to establish the connection
  • Four-way handshake to tear down the connection
[Figure: client/server timeline: one RTT to initiate the TCP connection, one RTT to request the file, then the time to transmit the file until the file is received]
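The timeline can be read as a simple cost formula. This is a simplified model (it ignores tear-down, TCP slow start, and server processing time), and the numbers in the second line are assumed for illustration: a 100 ms RTT and a 2 KB (2000-byte) response on a 1 Mbit/s link.

```latex
\begin{aligned}
T_{\text{fetch}} &\approx \underbrace{\mathrm{RTT}}_{\text{initiate TCP connection}}
  + \underbrace{\mathrm{RTT}}_{\text{request file}} + t_{\text{transmit}}
  = 2\,\mathrm{RTT} + t_{\text{transmit}} \\
t_{\text{transmit}} &= \frac{2000 \times 8\ \text{bits}}{10^{6}\ \text{bits/s}} = 16\ \text{ms},
  \qquad T_{\text{fetch}} \approx 2(100\ \text{ms}) + 16\ \text{ms} = 216\ \text{ms}
\end{aligned}
```

In this example about 200 of the 216 ms is round-trip overhead rather than data transfer, which is why connection set-up dominates short transfers.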
A Solution: TCP Interaction with Persistent Connections
Handle multiple transfers per connection
  • Maintain the TCP connection across multiple requests
  • Either the client or server can tear down the connection
  • Added to HTTP after the Web became very popular
Performance advantages
  • Avoid overhead of connection set-up and tear-down
  • Allow TCP to learn a more accurate RTT estimate
  • Allow the TCP congestion window to increase
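To compare the two styles, a simplified latency model helps. It is an assumption, not from the slides: objects are fetched one after another, there is no pipelining, and slow-start effects are ignored. Non-persistent fetching pays 2 RTT per object, while a persistent connection pays the handshake once and then 1 RTT per object. `page_load_time` is a hypothetical name.

```python
def page_load_time(n_objects, rtt, t_transmit, persistent):
    """Estimate serial fetch time (seconds) for n objects under a simplified model.

    Non-persistent: every object pays 2 RTT (handshake + request/response).
    Persistent, no pipelining: one handshake RTT, then 1 RTT per object.
    Transmission time is added once per object in both cases.
    """
    if persistent:
        return rtt + n_objects * (rtt + t_transmit)
    return n_objects * (2 * rtt + t_transmit)


# Assumed numbers: base HTML + 9 images, RTT 100 ms, 16 ms to transmit each.
for persistent in (False, True):
    print(persistent, round(page_load_time(10, 0.100, 0.016, persistent), 3))
```

With these assumed numbers, 10 objects take about 2.16 s without persistence and about 1.26 s with it. The model does not capture the other two advantages listed (a better RTT estimate and a larger congestion window), which would widen the gap further.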
Web Content Delivery
Scalability Limitation
Server Farms (motivated for scalability)
Server Farms
Definition
  • A collection of computer servers that together meet server needs far beyond the capacity of one machine
  • Often has both a primary and a backup server allocated to a single task (for fault tolerance)
Web farms
  • A common use of server farms is Web hosting
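The primary/backup arrangement reduces to a simple failover rule. In this sketch the server names and the health-check callable are hypothetical; how health is actually detected (heartbeats, probes) is outside the slide's scope.

```python
def choose_server(primary, backup, is_healthy):
    """Send work to the primary while it is healthy, else fail over to the backup."""
    if is_healthy(primary):
        return primary
    if is_healthy(backup):
        return backup
    raise RuntimeError("both primary and backup are down")


status = {"web1": False, "web1-backup": True}  # hypothetical health table
print(choose_server("web1", "web1-backup", status.get))  # web1-backup
```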
Web Proxies
Web Proxies are Intermediaries
Proxies play both roles
  • A server to the client
  • A client to the server
[Figure: a Proxy between the clients and servers such as www.cnn.com and www.google.com]
How Can an Intermediary Help? Proxy Caching
Client #1 requests http://www.foo.com/fun.jpg
  • Client sends “GET fun.jpg” to the proxy
  • Proxy sends “GET fun.jpg” to the server
  • Server sends the response to the proxy
  • Proxy stores the response, and forwards it to the client
Client #2 requests http://www.foo.com/fun.jpg (cached case)
  • Client sends “GET fun.jpg” to the proxy
  • Proxy sends the response to the client from the cache
Benefits
  • Faster response time to the clients
  • Lower load on the Web server
  • Reduced bandwidth consumption inside the network
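The two request flows can be captured in a few lines. This sketch (hypothetical `CachingProxy`; the origin fetch is a stand-in function) counts how often the origin server is contacted. It deliberately ignores freshness checks, which a real proxy cache must perform before reusing a stored response.

```python
class CachingProxy:
    """Serve repeated GETs from a local cache; contact the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> bytes
        self.cache = {}
        self.origin_requests = 0

    def get(self, url):
        if url in self.cache:  # hit: no traffic to the server
            return self.cache[url]
        self.origin_requests += 1  # miss: forward the request to the server
        response = self.fetch_from_origin(url)
        self.cache[url] = response  # store, then forward to the client
        return response


proxy = CachingProxy(lambda url: b"<jpeg bytes for " + url.encode() + b">")
proxy.get("http://www.foo.com/fun.jpg")  # client #1: miss, goes to the server
proxy.get("http://www.foo.com/fun.jpg")  # client #2: served from the cache
print(proxy.origin_requests)             # 1
```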
Getting Requests to the Proxy
Explicit configuration
  • Browser configured to use a proxy
  • Directs all requests through the proxy
  • Problem: requires user action
Transparent proxy (or “interception proxy”)
  • Proxy lies in the path from the client to the servers
  • Proxy intercepts packets en route to the server
  • … and interposes itself in the data transfer
  • Benefit: does not require user action
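For explicit configuration, Python's standard library offers `urllib.request.ProxyHandler`, which plays the role of the browser setting. The proxy address below is hypothetical.

```python
import urllib.request

# Hypothetical proxy address and port.
PROXY = "http://proxy.example.com:3128"

# Explicit configuration: every http/https request made through this opener
# goes to the proxy, which then resolves and contacts the origin server.
handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(handler)

# opener.open("http://www.cnn.com/index.html")  # would be sent via the proxy
```

A transparent proxy, by contrast, needs no such client code, because the interception happens in the network path.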
Other Functions of Web Proxies
Anonymization
  • Server sees requests coming from the proxy address
  • … rather than the individual user IP addresses
Transcoding
  • Converting data from one form to another
  • E.g., reducing the size of images for cell-phone browsers
Prefetching
  • Requesting content before the user asks for it
Filtering
  • Blocking access to sites, based on URL or content
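URL-based filtering can be sketched as a host blocklist plus a keyword check on the path. Both rules and the names `BLOCKED_HOSTS`, `BLOCKED_PATH_WORDS`, and `is_blocked` are hypothetical; content-based filtering would need to inspect the response body instead.

```python
from urllib.parse import urlsplit

BLOCKED_HOSTS = {"blocked.example.com"}  # hypothetical policy
BLOCKED_PATH_WORDS = ("casino",)         # hypothetical keyword rule


def is_blocked(url):
    """URL-based filtering: block by host (including subdomains) or path keyword."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS):
        return True
    return any(word in parts.path.lower() for word in BLOCKED_PATH_WORDS)


print(is_blocked("http://www.blocked.example.com/a"))  # True
print(is_blocked("http://www.foo.com/fun.jpg"))        # False
```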
Why CDN?
Providers want to offer content to consumers
  • Efficiently
  • Reliably
  • Securely
  • Inexpensively
The server and its link can be overloaded
Peering points between ISPs can be congested
Alternative solution: Content Distribution Networks
  • Geographically diverse servers serving content from many sources
Content Delivery Networks
CDN Architecture
Proactively replicate data by caching static pages
Architecture
  • Backend servers
  • Geographically distributed surrogate servers
  • Redirectors (according to network proximity, load balancing)
  • Clients
Redirector mechanisms
  • Augment DNS to return different server addresses
  • Server-based redirection: based on the HTTP redirect feature
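Server-based redirection can be sketched as a server that answers with an HTTP 302 whose Location header names a nearby surrogate. The surrogate host names, the region labels, and how a client's region is determined are all assumptions here. One design consequence: unlike DNS-based redirection, the client pays an extra request-response round trip before reaching the surrogate.

```python
# Hypothetical surrogate servers keyed by client region.
SURROGATES = {
    "us": "http://us.cdn.example.net",
    "eu": "http://eu.cdn.example.net",
}
DEFAULT_REGION = "us"


def redirect_to_surrogate(path, client_region):
    """Server-based redirection: a 302 response pointing at a nearby surrogate."""
    base = SURROGATES.get(client_region, SURROGATES[DEFAULT_REGION])
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {base}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


print(redirect_to_surrogate("/images/fun.jpg", "eu"))
```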