How to live with low/intermittent bandwidth/connectivity

Posted on 23-Feb-2016



Krithi Ramamritham, IIT Bombay

krithi@cse.iitb.ernet.in

Web Content

• Web sites have traditionally served static content
• But dynamic content generation has come into vogue:
  – content is generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), JavaServer Pages (JSP), servlets
  – it allows different content to be generated for the same request

Dynamic Web Pages…

[Figure: A news content site. The web page is composed of an ad component, four headline components, a personalized component, and a navigation component.]

Generic Architecture

[Figure: Data sources (servers, sensors) connected over the network to end-hosts (wired hosts, mobile hosts).]

Coherency of Dynamic Data

• Strong coherency
  – The client and the source are always in sync with each other
  – Strong coherency is expensive!
• Relax strong coherency:
  – Time domain: t-coherency
    • The client is never out of sync with the source by more than t time units
    • e.g., traffic data that is never more than a minute stale
  – Value domain: v-coherency
    • The difference between the data values at the client and at the source is bounded by v at all times
    • e.g., a user interested only in temperature changes larger than 1 degree
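The two relaxed coherency conditions can be written as simple checks. This is a minimal sketch, assuming a cached copy records the time it was last synchronized with the source; `CachedItem`, `violates_t_coherency`, and `violates_v_coherency` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedItem:
    value: float       # copy of the data value held at the client
    synced_at: float   # time (seconds) at which the copy was taken from the source


def violates_t_coherency(item: CachedItem, now: float, t: float) -> bool:
    """True if the client copy may be out of sync by more than t time units."""
    return now - item.synced_at > t


def violates_v_coherency(item: CachedItem, source_value: float, v: float) -> bool:
    """True if the client and source values differ by more than v."""
    return abs(source_value - item.value) > v


# Traffic data synced at time 0 and read at 45 s, with t = 60 s: still coherent.
traffic = CachedItem(value=35.0, synced_at=0.0)
print(violates_t_coherency(traffic, now=45.0, t=60.0))               # False
# Temperature cached as 20.0 while the source reads 21.5, with v = 1 degree.
temperature = CachedItem(value=20.0, synced_at=0.0)
print(violates_v_coherency(temperature, source_value=21.5, v=1.0))  # True
```

Note that the v-coherency check needs the current source value, which the client does not have by itself; this is where the push and pull approaches differ in who does the checking.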

Generic Architecture

[Figure: Data sources (servers, sensors) connected over the network to proxies/caches, which serve end-hosts (wired host, mobile host) over the network.]

The Push Approach

• The proxy registers the data item of interest and the coherency requirement with the server
• The server pushes interesting changes to the proxy
+ Achieves strong consistency
+ Keeps network overhead to a minimum
– Poor scalability (the server has to maintain state and keep connections open)
– Low resiliency

[Figure: Server → Proxy → User; the server pushes to the proxy, and the proxy pushes to the user.]
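A minimal sketch of server-side push, assuming value-domain (v-) coherency requirements: the server keeps one registration per proxy and item, and pushes a new value only when it differs from the last value pushed to that proxy by more than the proxy's v. `PushServer`, `Registration`, and the callback-style proxy are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model: a proxy is a callback that receives (item, value).
Proxy = Callable[[str, float], None]


@dataclass
class Registration:
    proxy: Proxy
    v: float            # value-domain coherency requirement registered by the proxy
    last_pushed: float  # value the proxy currently holds


class PushServer:
    def __init__(self, initial: Dict[str, float]) -> None:
        self.values = dict(initial)
        self.registrations: Dict[str, List[Registration]] = {}

    def register(self, item: str, proxy: Proxy, v: float) -> None:
        """Proxy registers an item and its coherency requirement; the server keeps this state."""
        value = self.values[item]
        self.registrations.setdefault(item, []).append(Registration(proxy, v, value))
        proxy(item, value)  # send the initial copy

    def update(self, item: str, value: float) -> None:
        """Source changed: push only to proxies whose requirement would otherwise be violated."""
        self.values[item] = value
        for reg in self.registrations.get(item, []):
            if abs(value - reg.last_pushed) > reg.v:
                reg.proxy(item, value)
                reg.last_pushed = value


received: List[float] = []
server = PushServer({"temperature": 20.0})
server.register("temperature", lambda item, value: received.append(value), v=1.0)
for reading in [20.4, 20.9, 21.2, 21.5]:
    server.update("temperature", reading)
print(received)  # [20.0, 21.2]
```

The `registrations` table is the per-client state the slide names as the scalability cost of push: it grows with the number of proxies and items, and every update must be checked against each registered proxy.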

The Pull Approach

• The proxy pulls after the Time to Live (TTL), or Time To next Refresh (TTR / TNR), expires
+ Can be implemented using the HTTP protocol
+ Stateless, and hence generally scalable with respect to state space and computation
– Weak cache consistency
– Heavy polling for stringent coherency requirements or highly dynamic data
– Network overheads higher than for push

[Figure: Server → Proxy → User; the proxy pulls from the server and pushes to the user.]
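How the proxy picks the next TTR decides the trade-off between polling load and coherency. One simple possibility, offered only as an illustrative assumption and not as the method from the talk, estimates the data's rate of change from the last two polls and schedules the next poll just before the value could drift by v, clamped to a configured range. `next_ttr`, `ttr_min`, and `ttr_max` are hypothetical, and `elapsed` is assumed to be positive.

```python
def next_ttr(old: float, new: float, elapsed: float, v: float,
             ttr_min: float = 1.0, ttr_max: float = 60.0) -> float:
    """Rate-based estimate of the Time To next Refresh for a v-coherency requirement.

    Assumes the value keeps changing at the rate observed between the last two
    polls (|new - old| over `elapsed` seconds) and polls again before it can
    drift by v. The result is clamped to [ttr_min, ttr_max].
    """
    rate = abs(new - old) / elapsed           # observed change per second
    if rate == 0.0:
        return ttr_max                        # nothing changed: poll as rarely as allowed
    return max(ttr_min, min(ttr_max, v / rate))


print(next_ttr(20.0, 22.0, elapsed=10.0, v=1.0))   # fast-moving data: 5.0
print(next_ttr(20.0, 20.1, elapsed=10.0, v=1.0))   # slow data: capped at 60.0
```

A stringent requirement (small v) or highly dynamic data (large rate) both push the TTR toward `ttr_min`, which is the heavy polling listed as a drawback of pull.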

Typical End-to-end Web Site Architecture

[Figure: Users → Web Server Cluster → Application Server Cluster → Data.]

WS vs. AS

• Web servers
  – Do well-defined and quantifiable local work, e.g., processing HTTP headers, serving static content
• Application servers
  – Run multi-layer programs, e.g., scripts involving calls to back ends

[Figure: Web switch, web server cluster, and application server cluster.]

Inside the Application Layer: 3-tier model

• PRESENTATION (JSP, ASP): HTML
• BUSINESS LOGIC (Servlets, COM+, EJB): objects
• DATA CONNECTOR (JDBC, ODBC): row sets
• Back ends: legacy systems, databases
• ADDT’L SERVICES: commerce, content management, personalization

Inside the Application Layer…

[Figure: The 3-tier model (presentation, business logic, data connector via JDBC/ODBC) with code blocks in the presentation and business logic tiers, backed by legacy systems, databases, and additional services (commerce, content management, personalization).]

1. JSP invokes a servlet
2. The servlet contacts the content management system (CMS)
3. The CMS requests data
4. The DBMS calls the storage system

Performance and Scalability Issues

• Computationally intensive logic executed at multiple tiers
• Cross-tier communication
• Object instantiation and cleanup processing
• External I/O calls
• Database connection pool latencies
• Content conversion and formatting

Optimizing the Application Layer: Traditional Means

• Optimize each tier independently:
  – Presentation-level caches built inside application server processes
  – Main-memory databases employed over a persistent DBMS
  – Persistent object storage techniques employed inside content management systems
  – … and so on

[Figure: The 3-tier model (presentation: JSP, ASP; business logic: servlets, COM+, EJB; data connector: JDBC, ODBC; additional services), each part with its own local cache and optimization code.]

Query result caching

• Many application server products offer this feature
– Mitigates only local database access latency
– Only a subset of query results may be reused in page generation
– Page fragments may not all come from databases
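For comparison with the approach proposed later, here is a minimal sketch of query result caching, assuming results are keyed on the exact SQL text and bound parameters. It uses Python's built-in `sqlite3` in place of the Oracle database from the talk; the `QueryResultCache` class and the `product` table are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


class QueryResultCache:
    """Caches rows keyed on (SQL text, parameters); a hit skips only the database call."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.results: Dict[Tuple[str, Tuple[Any, ...]], List[tuple]] = {}
        self.db_calls = 0

    def query(self, sql: str, params: Tuple[Any, ...] = ()) -> List[tuple]:
        key = (sql, params)
        if key not in self.results:
            self.db_calls += 1
            self.results[key] = self.conn.execute(sql, params).fetchall()
        return self.results[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, category TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("kettle", "kitchen"), ("toaster", "kitchen"), ("lamp", "lighting")])
cache = QueryResultCache(conn)
sql = "SELECT name FROM product WHERE category = ? ORDER BY name"
rows = cache.query(sql, ("kitchen",))
rows_again = cache.query(sql, ("kitchen",))   # served from the cache
# The rows still have to be turned into HTML on every request.
html = "<ul>" + "".join(f"<li>{name}</li>" for (name,) in rows_again) + "</ul>"
print(cache.db_calls, html)   # 1 <ul><li>kettle</li><li>toaster</li></ul>
```

The cache saves only the database round trip: the HTML formatting still runs on every request, and nothing in this sketch invalidates `results` when `product` changes.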

Middle tier database caching

• Caching database tables in main memory, e.g., Oracle 9i Cache, or main-memory databases such as TimesTen
– Mitigates only database access latency
– Caching at table granularity results in poor cache utilization
– Main-memory databases are difficult to integrate and maintain, and can be expensive

Page Level Caching

• Dynamically generated HTML pages are cached
+ Can completely offload work from the web/app server
– Low reusability for highly personalized web pages
– The URL may not uniquely identify a page, increasing the risk of delivering incorrect pages
– Often introduces excessive invalidations, e.g., even when only a single element on the page changes
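The first two drawbacks pull against each other. Assuming a whole-page cache whose key is either the URL alone or the URL plus the user, the first key can serve one user's personalized page to another, while the second stores a separate copy per user. `PageCache` and `render` are hypothetical.

```python
from typing import Callable, Dict, Tuple


class PageCache:
    """Whole-page cache; `vary_on_user` controls whether the user is part of the key."""

    def __init__(self, render: Callable[[str, str], str], vary_on_user: bool) -> None:
        self.render = render
        self.vary_on_user = vary_on_user
        self.pages: Dict[Tuple[str, str], str] = {}

    def get(self, url: str, user: str) -> str:
        key = (url, user if self.vary_on_user else "")
        if key not in self.pages:
            self.pages[key] = self.render(url, user)   # full page generation
        return self.pages[key]

    def invalidate_url(self, url: str) -> None:
        """Any change to any element drops every cached copy of the page."""
        self.pages = {k: v for k, v in self.pages.items() if k[0] != url}


def render(url: str, user: str) -> str:
    return f"<h1>Headlines</h1><p>Welcome back, {user}</p>"


by_url = PageCache(render, vary_on_user=False)
by_url.get("/news", "alice")
print(by_url.get("/news", "bob"))   # wrong page: greets alice

by_user = PageCache(render, vary_on_user=True)
by_user.get("/news", "alice")
by_user.get("/news", "bob")
print(len(by_user.pages))           # 2: one copy per user, little reuse
```

`invalidate_url` also shows the third drawback: a change to one headline discards every user's copy of the whole page.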

Optimizing the Application Layer: Issues

• Traditional techniques impact specific components within the application, but not the entire application
  – No mitigation of component-to-component interaction latencies
  – Different synchronization and invalidation policies risk data integrity
  – Each optimization scheme consumes programmer time for development and maintenance

Key ideas

• Reuse program results to eliminate redundant work
• Facilitate single-point, architecture-wide optimization
• Apply to both programmatic objects and result fragments

Optimizing the Application Layer

[Figure: The 3-tier model (presentation: JSP, ASP; business logic: servlets, COM+, EJB; data connector: JDBC, ODBC), backed by legacy systems, databases, and additional services (commerce, content management, personalization), with a cache added.]

The cache enables the results of programs to be reused.

Usually…

[Figure: The request path through the presentation, business logic, and data connector tiers to legacy systems, databases, and additional services.]

1. JSP invokes a servlet
2. The servlet contacts the CMS
3. The CMS requests data
4. The DBMS calls the storage system

Plus, at each step there are communication delays and logic processing delays.

Novel Solution…

[Figure: Code blocks in the presentation and business logic tiers carry Chutney tags. Through an application programming interface, the tags call a real-time storage engine that holds entries of the form Function, Parameter(s), Result.]

• Tags trigger calls to the storage engine.
• The engine can store any program output, most commonly an HTML fragment or a programmatic object.
• When the Result of a Function with a specific Parameter set is already known (and up to date), the work normally necessary to produce that Result is bypassed.
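The Function/Parameter(s)/Result idea is a form of memoization with invalidation. A minimal sketch, assuming a Python decorator stands in for a tag and a dictionary stands in for the real-time storage engine; `StorageEngine`, `cached`, and `headline_fragment` are hypothetical and do not reflect Chutney's actual API.

```python
import functools
from typing import Any, Callable, Dict, List, Tuple


class StorageEngine:
    """Stand-in for the result store: (function, parameters) -> result."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}

    def invalidate(self, func_name: str) -> None:
        """Drop every stored result of one function, e.g., after its data changes."""
        self.entries = {k: v for k, v in self.entries.items() if k[0] != func_name}


def cached(engine: StorageEngine) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator standing in for a tag around a code block (positional, hashable args only)."""
    def wrap(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def inner(*args: Any) -> Any:
            key = (func.__qualname__, args)
            if key not in engine.entries:          # unknown or invalidated: do the work
                engine.entries[key] = func(*args)
            return engine.entries[key]             # known and up to date: work bypassed
        return inner
    return wrap


engine = StorageEngine()
calls: List[str] = []


@cached(engine)
def headline_fragment(section: str) -> str:
    calls.append(section)                          # stands in for logic and database calls
    return f"<ul><li>Top story in {section}</li></ul>"


headline_fragment("sports")
headline_fragment("sports")                        # result already known: bypassed
engine.invalidate("headline_fragment")             # e.g., a new story is published
headline_fragment("sports")                        # recomputed
print(calls)                                       # ['sports', 'sports']
```

Because the stored result can be any Python object, the same mechanism covers both HTML fragments and programmatic objects.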

Code Blocks Perform Work

[Figure: A page generation script is a sequence of code blocks, each followed by a write to the output; the code blocks perform application logic, database calls, and HTML formatting.]

Code Blocks <-> Components

[Figure: The code blocks of a page generation script correspond to the components of a web page (example: news content site with ad, headline, personalized, and navigation components).]

Certain components can be cached.

DCA: Our Solution

[Figure: In the page generation script, a code block (application logic, database calls, HTML formatting) is enclosed between a start tag and an end tag. On a request, the Dynamic Content Accelerator supplies the stored code block output, and the work in the block is bypassed.]
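The start and end tags capture what a code block writes to the output stream, rather than a function's return value. A minimal sketch, assuming the page script writes to a text stream; `FragmentCache.block` plays the role of the start and end tags, and all names are hypothetical.

```python
import io
from typing import Callable, Dict, List, TextIO


class FragmentCache:
    """Stores the output of tagged code blocks, keyed by a fragment name."""

    def __init__(self) -> None:
        self.fragments: Dict[str, str] = {}

    def block(self, key: str, out: TextIO, code_block: Callable[[TextIO], None]) -> None:
        # Start tag: if this block's output is already stored, emit it and skip the work.
        if key in self.fragments:
            out.write(self.fragments[key])
            return
        # Otherwise run the block against a buffer instead of the real output...
        buffer = io.StringIO()
        code_block(buffer)
        # ...and at the end tag, store the captured output and pass it through.
        self.fragments[key] = buffer.getvalue()
        out.write(self.fragments[key])


work: List[str] = []


def headlines(out: TextIO) -> None:
    work.append("database call")               # the work a cache hit bypasses
    out.write("<ul><li>Headline</li></ul>")


cache = FragmentCache()


def render_page() -> str:
    out = io.StringIO()
    out.write("<html>")
    cache.block("headlines:sports", out, headlines)
    out.write("</html>")
    return out.getvalue()


print(render_page())   # <html><ul><li>Headline</li></ul></html>
print(render_page())   # same page, served without rerunning headlines
print(len(work))       # 1
```

Only the tagged block is reused; the untagged parts of the script (here the `<html>` wrapper) still run on every request, so personalized components can sit next to cached ones.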

DCA in a Typical End-to-end Web Site Architecture

• A single instance of the DCA serves a rack of application servers
• Application servers communicate with the DCA through a lightweight API

[Figure: Users, web server cluster, application server cluster, and data, with the Dynamic Content Accelerator alongside the application server cluster.]

Cache Management

• A critical aspect of any caching solution
• DCA supports novel cache management strategies:
  – Prediction-based cache replacement
  – Observation-based cache invalidation

Cache Replacement

• Prediction-based replacement
  – Fragments having the lowest probability of access are replaced: Least-Likely-to-be-Used (LLU)
  – Access probabilities are based on:
    • Current user navigational patterns over the site graph (in the form of clickstreams)
    • Historical user navigational patterns over the site graph (in the form of association rules)

[Figure: Site graph News → Sports → Hockey, with Hockey linking to Schedules, Scores, Players, and Teams.]

Example association rules used by LLU:
(News, Sports, Hockey) ⇒ Schedules = 20%
(News, Sports, Hockey) ⇒ Players = 15%
(News, Sports, Hockey) ⇒ Teams = 10%
(News, Sports, Hockey) ⇒ Scores = 55%
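Applying LLU to the example rules amounts to ranking cached fragments by predicted access probability. A minimal sketch, assuming only the rule whose antecedent matches the user's current clickstream is consulted (the slides do not detail how the DCA combines current and historical patterns); `RULES` and `llu_victim` are hypothetical.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]

# Association rules from the slide: P(next fragment | clickstream so far).
RULES: Dict[Path, Dict[str, float]] = {
    ("News", "Sports", "Hockey"): {
        "Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55,
    },
}


def llu_victim(cached: List[str], clickstream: Path) -> str:
    """Return the cached fragment Least Likely to be Used next.

    Fragments that no matching rule mentions get probability 0, so they are
    evicted first; ties go to the fragment listed first.
    """
    probs = RULES.get(clickstream, {})
    return min(cached, key=lambda fragment: probs.get(fragment, 0.0))


cached = ["Schedules", "Players", "Teams", "Scores"]
print(llu_victim(cached, ("News", "Sports", "Hockey")))   # Teams
```

With the slide's figures, Teams (10%) is evicted first and Scores (55%) is kept longest.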

Cache Invalidation

• DCA supports common cache invalidation techniques:
  – Time-based: each cache element is assigned a TTL
  – Event-based: updates to the database send an invalidation message to the cache
  – On demand: manual invalidation of selected elements
• DCA supports additional invalidation techniques…
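The three common techniques can share one cache interface. A minimal sketch, assuming an injectable clock for TTLs and a single `invalidate` entry point used both by database update messages and by manual requests; `InvalidatingCache` and the keys are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InvalidatingCache:
    """Cache supporting time-based, event-based, and on-demand invalidation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.entries: Dict[str, Tuple[Any, float]] = {}   # key -> (value, expiry time)

    def put(self, key: str, value: Any, ttl: float) -> None:
        self.entries[key] = (value, self.clock() + ttl)    # time-based: every element gets a TTL

    def get(self, key: str) -> Optional[Any]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:                     # TTL elapsed
            del self.entries[key]
            return None
        return value

    def invalidate(self, key: str) -> None:
        """Entry point for a database update message (event-based) or an operator (on demand)."""
        self.entries.pop(key, None)


now = [0.0]                                  # fake clock so the example is deterministic
cache = InvalidatingCache(clock=lambda: now[0])
cache.put("price:42", "$10", ttl=60.0)
cache.put("stock:42", "5 left", ttl=60.0)
now[0] = 30.0
cache.invalidate("stock:42")                 # e.g., the database reports a sale
print(cache.get("price:42"), cache.get("stock:42"))   # $10 None
now[0] = 61.0
print(cache.get("price:42"))                 # None: TTL expired
```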

Cache Invalidation…

• Other invalidation techniques supported:
  – Observation-based
    • User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
    • Most appropriate for auction sites and online trading sites
    • Invalidation does not require communication with the databases
  – Keyword-based
    • Elements can be associated with keywords; e.g., a retailer may wish to invalidate all “seasonal” items
  – Regular expression-based
    • Elements can be invalidated based on regular expression matching
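Keyword and regular-expression invalidation need only a little extra bookkeeping, and observation-based invalidation amounts to the update-handling script calling the cache itself. A minimal sketch, assuming regular expressions are matched against cache keys; `TaggedCache`, `place_bid`, and the key layout are hypothetical.

```python
import re
from typing import Dict, Iterable, Set


class TaggedCache:
    """Cache whose elements can be invalidated by keyword or by a regex over keys."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}
        self.keywords: Dict[str, Set[str]] = {}    # key -> keywords attached to it

    def put(self, key: str, value: str, keywords: Iterable[str] = ()) -> None:
        self.values[key] = value
        self.keywords[key] = set(keywords)

    def _drop(self, keys: Iterable[str]) -> None:
        for key in list(keys):                     # materialize before mutating the dicts
            self.values.pop(key, None)
            self.keywords.pop(key, None)

    def invalidate_keyword(self, keyword: str) -> None:
        self._drop(k for k, kws in self.keywords.items() if keyword in kws)

    def invalidate_regex(self, pattern: str) -> None:
        regex = re.compile(pattern)
        self._drop(k for k in self.values if regex.fullmatch(k))


def place_bid(cache: TaggedCache, item_id: int, amount: float) -> None:
    """Observation-based: the script handling a user's bid invalidates the
    affected pages itself, without any message from the database."""
    # ... record the bid in the database (omitted) ...
    cache.invalidate_regex(rf"auction/{item_id}(/.*)?")


cache = TaggedCache()
cache.put("catalog/scarf", "<li>Wool scarf</li>", keywords=["seasonal"])
cache.put("catalog/kettle", "<li>Kettle</li>")
cache.put("auction/7", "<p>Current bid: 10</p>")
cache.put("auction/7/history", "<ol>...</ol>")
cache.put("auction/70", "<p>Current bid: 3</p>")
cache.invalidate_keyword("seasonal")
place_bid(cache, 7, 12.5)
print(sorted(cache.values))   # ['auction/70', 'catalog/kettle']
```

Using `fullmatch` keeps the pattern for auction 7 from also removing auction 70.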

Performance Study…

Test Site
– Fictitious online retail site that allows browsing of a product catalog
– Pages generated using JSP scripts
– Site content stored in an Oracle database
– Database schema based on the Dublin Core Metadata open standard
– Contains 200,000 products and 44,000 categories
– Each page consists of 3 components, each involving a database call

Performance Study…

Test Setup
– Content database server: Oracle 8.1.6
– Web/application server: WebLogic 6.0 running on a cluster of 2 machines
– Server machines: 1 GB RAM, dual Pentium III 933 MHz processors, running Windows 2000 Advanced Server

Testing Methodology

• Baseline parameters:
  – Cache size, i.e., percentage of fragments that fit into the cache: 75%
  – Cache replacement policy: LLU
• User load is varied by sending requests from client machines running RadView’s WebLoad
• Simulated users navigate the site according to a Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
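An 80-20 navigation pattern can be generated from Zipf weights by choosing the exponent so that the most popular 20% of links receive 80% of the clicks. A minimal sketch of that calibration, assuming this reading of "80-20" and a fixed number of links; it is not WebLoad's configuration, and all names are hypothetical.

```python
import random
from typing import List


def zipf_weights(n: int, s: float) -> List[float]:
    """Normalized Zipf weights 1/k**s for link ranks k = 1..n."""
    raw = [1.0 / k ** s for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def top_share(n: int, s: float, top: float = 0.2) -> float:
    """Probability mass that falls on the most popular `top` fraction of links."""
    return sum(zipf_weights(n, s)[: max(1, int(n * top))])


def exponent_for_80_20(n: int) -> float:
    """Bisect for the exponent s at which the top 20% of links draw 80% of clicks."""
    lo, hi = 0.0, 5.0                  # top_share increases with s over this range
    for _ in range(60):
        mid = (lo + hi) / 2
        if top_share(n, mid) < 0.8:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_clicks(n_links: int, n_clicks: int, seed: int = 0) -> List[int]:
    """Sample link ranks (0 = most popular) for simulated user requests."""
    weights = zipf_weights(n_links, exponent_for_80_20(n_links))
    return random.Random(seed).choices(range(n_links), weights=weights, k=n_clicks)


clicks = simulate_clicks(n_links=100, n_clicks=10_000)
print(sum(1 for c in clicks if c < 20) / len(clicks))   # close to 0.8
```

The fitted exponent depends on the number of links, which is why it is computed per site graph rather than fixed.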

Performance Impact: 80% faster response times through existing application infrastructure

Source: Fortune 100 client results

[Figure: Average response time (seconds) vs. number of users, non-Chutney vs. Chutney.]

Chutney Throughput Impact: 250% increase in transaction rates

Source: Fortune 100 client results

[Figure: Transactions per second vs. number of users, non-Chutney vs. Chutney.]

Alternative: CDNs

[Figure: Sources, repositories, and clients in a content distribution network (e.g., Akamai), with a push-based core infrastructure.]

Conclusion

• Increased use of dynamic page generation technologies
  => increased load on application servers
  => serious performance and scalability problems for e-business sites
• DCA (Dynamic Content Acceleration)
  => significantly reduces the load on the server-side infrastructure and allows e-business sites to scale
  => significantly outperforms existing middle tier caching solutions

IIT Bombay’s aAQUA Community Forum

• Farmers get information and get their questions answered
  – in the local context
  – in their local language
• www.aAQUA.org
• Capitalizes on existing human and infrastructural resources:
  – Agri-extension center: KVK, Baramati
  – NGO: Vigyan Ashram, Pabal
  – Government: MCIT

Access over low bandwidth: Resource Optimization

• Resource constraints
  – Low/unpredictable bandwidth => disconnected operation/access
• Exploit
  – caching
  – prefetching (through prediction of future needs)
  – profiling by user type and location => offline aAQUA
• Data characteristics
  – Static data (text, images, e.g., land records, photos): can be cached/hoarded
  – Dynamic data (weather/price information): cached information needs to be refreshed carefully
  – Continuous media (VoIP, video data): QoS considerations
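Hoarding for disconnected access can be framed as choosing which static items to prefetch within a storage budget. A minimal sketch, assuming per-item access probabilities come from profiling by user type and location, and using a greedy probability-per-KB heuristic (an approximation that is not always optimal, and not a method described in the talk); `Item`, `plan_hoard`, and the example items are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    name: str
    size_kb: int


def plan_hoard(items: List[Item], p_access: Dict[str, float], budget_kb: int) -> List[str]:
    """Pick static items to prefetch before going offline.

    Greedy: rank by predicted access probability per KB, then take items in
    that order while they still fit in the budget.
    """
    ranked = sorted(items, key=lambda it: p_access.get(it.name, 0.0) / it.size_kb,
                    reverse=True)
    chosen: List[str] = []
    used = 0
    for it in ranked:
        if used + it.size_kb <= budget_kb:
            chosen.append(it.name)
            used += it.size_kb
    return chosen


items = [Item("land_record", 50), Item("crop_photo", 400),
         Item("pest_guide", 10), Item("training_video", 5000)]
profile = {"land_record": 0.6, "crop_photo": 0.5, "pest_guide": 0.3, "training_video": 0.2}
print(plan_hoard(items, profile, budget_kb=500))   # ['pest_guide', 'land_record', 'crop_photo']
```

Dynamic data such as weather or prices would be left out of such a hoard, or refreshed on reconnection, in keeping with the caution on the slide.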
