
Caching Dynamic Documents Vipul Goyal Department of Computer Science & Engg Institute of Technology, Banaras Hindu University Sugata Sanyal School of Technology & Computer Science Tata Institute of Fundamental Research Dharma P. Agrawal Center for Distributed and Mobile Computing, ECECS University of Cincinnati

Post on 19-Dec-2015




Caching Dynamic Documents

Vipul Goyal
Department of Computer Science & Engg
Institute of Technology, Banaras Hindu University

Sugata Sanyal
School of Technology & Computer Science
Tata Institute of Fundamental Research

Dharma P. Agrawal
Center for Distributed and Mobile Computing, ECECS
University of Cincinnati


Caching the Web Documents

• The browser stores a document locally so that it can be supplied quickly if it is requested again

• The advantages are threefold:

- Improved response time

- Reduced network traffic

- Reduced server load


Caching Dynamically Generated Pages

Overview

• One central aspect of the development of the WWW has been the increasing use of dynamically generated documents (i.e., pages generated by a server-side script)

• Traditional caching fails because every instance of a dynamically generated page is different


Question: Why would we want to cache a dynamic page if every instance of it is different?

Answer:

• The instances differ only “slightly”

• They usually contain a number of sections of duplicate HTML code
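To make "slightly" concrete, here is a minimal sketch that compares two instances of the same dynamic page. The two HTML strings are made up for illustration, and Python's standard `difflib` is used only as a convenient similarity measure; it is not part of the scheme described in these slides.

```python
from difflib import SequenceMatcher

# Two hypothetical instances of the same dynamically generated page.
page_a = "<html><body><h1>Web mail</h1><p>You have 3 new mails</p></body></html>"
page_b = "<html><body><h1>Web mail</h1><p>You have 7 new mails</p></body></html>"

# ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
ratio = SequenceMatcher(None, page_a, page_b).ratio()
print(f"similarity: {ratio:.2f}")
```

Almost all of the markup is shared between the two instances; only the mail count differs, and that is the part a cache cannot reuse.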


Examples


The Solution

Requirements

• Changes to the web server to accommodate a new software component called the Fragmentor

• Changes to the client to accommodate a plug-in

• No changes to the HTTP protocol


Central Ideas of the Scheme

• The Fragmentor parses the scripts at the server to produce a hierarchy of cacheable templates (representing static HTML code) and non-cacheable bindings (representing dynamic HTML code)

• On any request, only bindings will be supplied to the client

• Templates, if not cached, will be downloaded separately by the client


Overview of the steps

1. The fragmentor parses all the scripts at the server to produce a number of templates and binding generators

2. The client sends the request for the required dynamic page

3. The server generates and sends only the binding in response

4. The client downloads the required templates if not already cached

5. The client plugs the templates into the bindings to get the full HTML page


Parsing the Scripts

The fragmentor parses the script source code line by line and puts the static, hard-coded output into the template

Source Code                        Template Generated

Print “Welcome to web mail”;       Welcome to web mail

…

Print “You have $n new mails”;     You have <gap> new mails
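The extraction step in this example can be sketched as follows. This is only an illustration under simplifying assumptions: the script is assumed to consist of `Print "...";` statements with `$name` variables, straight quotes are used instead of the typographic quotes on the slide, and `extract_template` and `count_new_mails` are hypothetical names that do not come from the authors' Fragmentor.

```python
import re

# Assumed statement form:  Print "text";
PRINT_RE = re.compile(r'^\s*Print\s+"(.*)";\s*$')
# A variable reference such as $n marks dynamic output.
VAR_RE = re.compile(r"\$\w+")

def extract_template(script_lines):
    """Collect hard-coded output, replacing each variable with <gap>."""
    template = []
    for line in script_lines:
        match = PRINT_RE.match(line)
        if match:
            template.append(VAR_RE.sub("<gap>", match.group(1)))
    return "\n".join(template)

script = [
    'Print "Welcome to web mail";',
    "$n = count_new_mails();",  # hypothetical dynamic computation
    'Print "You have $n new mails";',
]
print(extract_template(script))
```

For the three-line script above, the sketch prints the two template lines shown in the example: the static greeting, and the mail-count line with `<gap>` in place of `$n`.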


Parsing the Scripts

• The fragmentor deletes the hard-coded output from the script source code to produce a binding generator

• When a request is received, this binding generator runs to produce a binding which is then supplied to the client


Parsing the Scripts

Question

Can we simply collect the static text of the script in a file?

Answer

The script may contain branches (if-else), so we don’t know which part of the static text will appear in the output


Definitions

Definition 1 (Template)

A cacheable regular HTML file having gaps or discontinuities in it.

It may also contain the following new tags:

1) <gap>
2) <loop> and </loop>


Definitions

Definition 2 (Binding)

A binding is a non-cacheable section of code enclosed between <temp ref="<absolute template url>"> and </temp> tags. The <temp ref="..."> tag specifies the template to which the binding belongs.

Apart from regular HTML code, the enclosed code may contain the following:

1) <gap> and </gap> tags
2) <loop> and </loop> tags
3) <n> and </n> tags, where n is a positive integer
4) Another binding (reason: presence of branches)
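As a rough illustration of this definition, the sketch below builds a flat binding (no loops, no nested bindings) from a template URL and a list of dynamic values. The function name `make_binding`, the example URL, and the choice to emit one `<gap>...</gap>` pair per value are assumptions made for this sketch; the slides do not specify the exact serialization.

```python
from html import escape

def make_binding(template_url, gap_values):
    """Wrap the dynamic values for one template in <temp ref="..."> tags."""
    # One <gap>...</gap> pair per dynamic value, in template order (assumed).
    gaps = "".join(f"<gap>{escape(str(v))}</gap>" for v in gap_values)
    return f'<temp ref="{escape(template_url, quote=True)}">{gaps}</temp>'

# Hypothetical absolute template URL.
binding = make_binding("http://example.com/templates/inbox.tpl", [3])
print(binding)
```

Only this short binding would travel over the network on each request; the static text stays in the cacheable template it refers to.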


Parsing the Scripts

• The fragmentor will generate a separate template for every branch in the script source code

• However, only one (??) binding will be sent to the client
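One way to read these two points together: each branch gets its own template, and the single binding that is sent names the branch actually taken by nesting a second binding inside the outer one. The sketch below is a guess at what such a binding generator might look like; the template file names, the base URL, and the function `generate_binding` are all hypothetical.

```python
def generate_binding(n_new, base="http://example.com/templates"):
    """Binding generator for a script with one if-else branch.

    The outer template is assumed to hold the static text common to both
    paths; each branch has its own template, referenced by a nested binding.
    """
    if n_new > 0:
        inner = f'<temp ref="{base}/has_mail.tpl"><gap>{n_new}</gap></temp>'
    else:
        inner = f'<temp ref="{base}/no_mail.tpl"></temp>'
    return f'<temp ref="{base}/inbox.tpl">{inner}</temp>'

print(generate_binding(2))
print(generate_binding(0))
```

The client never sees the branch condition itself, only the reference to the template for the path that was executed.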


Example of template generation


Example of binding generation


Client Side Generation of Full Document

• The client only receives a binding in response to the request

• The binding contains full URLs of all the required templates

• The client downloads the templates which are not already cached

• The templates are plugged in and their gaps are ‘filled’ using the binding to obtain the full HTML instance of the required web page
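The client-side steps above can be sketched for the simplest case: a single binding with no loops and no nested bindings. Everything named here is an assumption for illustration: the `assemble` function, the dictionary used as a cache, and the `fetch` callback that stands in for an HTTP download. A real plug-in would need a proper parser rather than regular expressions.

```python
import re

TEMP_RE = re.compile(r'<temp ref="([^"]+)">(.*)</temp>', re.DOTALL)
GAP_RE = re.compile(r"<gap>(.*?)</gap>", re.DOTALL)

def assemble(binding, cache, fetch):
    """Rebuild the full page from a single, non-nested binding."""
    match = TEMP_RE.fullmatch(binding)
    if match is None:
        raise ValueError("not a binding")
    url, body = match.groups()
    if url not in cache:  # download the template only on a cache miss
        cache[url] = fetch(url)
    values = iter(GAP_RE.findall(body))
    # Fill each <gap> in the template with the next value from the binding.
    return re.sub(r"<gap>", lambda _: next(values), cache[url])

downloads = []

def fake_fetch(url):
    """Stand-in for an HTTP GET; records which templates were downloaded."""
    downloads.append(url)
    return "Welcome to web mail\nYou have <gap> new mails"

cache = {}
url = "http://example.com/templates/inbox.tpl"  # hypothetical template URL
page1 = assemble(f'<temp ref="{url}"><gap>3</gap></temp>', cache, fake_fetch)
page2 = assemble(f'<temp ref="{url}"><gap>5</gap></temp>', cache, fake_fetch)
print(page2)
print(len(downloads))
```

The second request reuses the cached template, so only the binding is transferred; in this sketch `fake_fetch` is called once for the two page loads.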


This approach needs to be optimized

• Number of Templates generated = Number of branches in the program

• For moderate and large programs, a large number of templates and bindings will be generated

• This places an unnecessary burden on the system, as some templates will rarely be used (e.g., those corresponding to branches that handle errors/exceptions)


Improving the approach (Still in progress)

• The Fragmentor will first gather the branch flow statistics of the program

• Templates will be generated only for the frequently taken branches

• Templates for popular branch sequences will be merged into a single template
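Since this part of the work is described as still in progress, the following is only a speculative sketch of the selection step: count how often each branch is taken in a logged trace and keep the branches above a frequency threshold. The trace format, the branch labels, the `min_share` threshold, and the function name are all assumptions; merging of branch sequences is not shown.

```python
from collections import Counter

def frequent_branches(trace, min_share=0.05):
    """Return the branch ids taken in at least min_share of logged requests."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {b for b, c in counts.items() if c / total >= min_share}

# Hypothetical trace: one branch id per logged request.
trace = ["inbox"] * 90 + ["empty"] * 9 + ["error"] * 1
print(sorted(frequent_branches(trace)))
```

With this trace, the rarely taken error branch falls below the threshold, so no template would be generated for it.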


Thank you