jszap: compressing javascript code martin burtscher, ut austin ben livshits & ben zorn,...

27
JSZap: Compressing JavaScript Code Martin Burtscher, UT Austin Ben Livshits & Ben Zorn, Microsoft Research Gaurav Sinha, IIT Kanpur

Upload: joseph-randall

Post on 23-Dec-2015

222 views

Category:

Documents


4 download

TRANSCRIPT

JSZap: Compressing JavaScript Code

Martin Burtscher, UT AustinBen Livshits & Ben Zorn, Microsoft Research

Gaurav Sinha, IIT Kanpur

2

A Web 2.0 Application Dissected

70,000+ lines of JavaScript code

downloaded2,855 Functions

1+ MB code

Talks to 14 backend services

(traffic, images,directions, ads, …)

3

Lots of JavaScript being Transmitted

www.live.com

spreadsheets.google

maps.live

chi.lexigame

hotmail

gmail

dropthings

maps.google

pageflakes

bunny hunt

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Fraction of download that is JavaScript

Up to 85% of a Web 2.0

app is JavaScript code!

AJAX: Tension Headaches

4

Execution can’t start without

the code

Move code to client for

responsiveness

5

JavaScript on the Wire

JavaScript crunch

gzip -d parser AST

JSZap

gzip

6

JSZap Approach

• Represent JavaScript as AST instead of source

• Serialize the compressed AST

• Decompress directly into AST on client

• Use gzip as 2nd-level (de-)compressor

7

Benefits of AST-based Compression

• Compression: less to transmit• ASTs are blasted directly into the browser

Reduced Latency

• Reduces mobile charges• Reduces operator network costs: better for servers

Reduced Network Bandwidth

• Ensures well-formedness of code• Can use to check language subsets easily (AdSafe)• Caching incremental updates• Unblocking HTML parser

Correctness, Security, and other Benefits

8

JSZap Compression

JavaScript JSZap gzip

9

JSZap Compression

JavaScript identifiers gzip

literals

productions1

2

3

10

GZIP is a formidable

opponent

11

JSZap vs. GZIP

JSZapgzip0

5

10

15

20

25

30

35

40

5.45.4

18.419.0

8.411.5

Literals Identifiers Productions

Size

in K

B

12

Talk Outline

identifiers

literals

productions1

2

3

evaluation on real code

13

Background: ASTs

a * b + c 1) E E + T

2) E T3) T T * F

4) T F5) F id

+

*

a b

c5

5

1

3

5

Expression Grammar Tree

14

A Simple Javascript Examplevar y = 2;function foo () {

var x = "jscrunch";var z = 3;z = y + y;

}x = "jszap";

Identifier Stream

y foo x z z y y x

Literal Stream

"jscrunch" 2 3 "jszap"

Production Stream

1 3 4 ... 1 3 4 ...

15

Benchmarking JSZap

Benchmark name Source lines

Source bytes

gmonkey 922 17,382

getDOMHash 1,136 25,467

bing1 3,758 77,891

bingmap1 3,473 80,066

livemsg1 5,307 93,982

bingmap2 9,726 113,393

facebook1 5,886 141,469

livemsg2 7,139 156,282

officelive1 22,016 668,051

• JavaScript files up to 22K LOC

• Variety of app types

• Both hand-generated, and machine-generated

• gzipped everything

16

Components of JavaScript Sourcegm

onke

y

getD

OM

Has

h

bing

1

bing

map

1

livem

sg1

bing

map

2

face

book

1

livem

sg2

office

live1

0%10%20%30%40%50%60%70%80%90%

100%

productions identifiers literals

• None of the categories can be ignored

• Identifiers become more prominent with code growth

17

Compressing the Production Stream

• Frequency-based production renaming

• Differential encoding: 26 and 57 => 2 and 3

• Chain rule: eliminate predictable productions

• Tree-based prediction-by-partial-match

18

PPMC

• Consider compressing – if (P) then X else X

• Should be very compressible• if (P) then ...abc... else ...abc...

P

XX

• Tree context used to build a predictor

• Provides the next likely child node given context C and child position p

• Arithmetic coding: more likely=shorter IDs

• See paper for details

19

Production Compression with PPMC

gmon

key

getD

OM

Has

h

bing

1

bing

map

1

livem

sg1

bing

map

2

face

book

1

livem

sg2

office

live1

50%55%60%65%70%75%80%85%90%95%

100%

0.6772

Prod

uctio

n Co

mpr

essi

on (g

zip

= 1)

20

Compressing the Identifier Stream

• Symbol tables instead of identifier stream:– Compress redundancy: offset into table– Global or local symbol tables– Use variable-length encoding

• Other techniques:– Sort symbols by frequency– Rename local variables

21

Variable-length Encoding for Identifiers

is global?

is renamed local

00…

01…

fits in 1 byte?

11…

10…

22

Variable-Length Identifier Encodinggm

onke

y

getD

OM

Has

h

bing

1

bing

map

1

livem

sg1

bing

map

2

face

book

1

livem

sg2

office

live1

0%10%20%30%40%50%60%70%80%90%

100%

parent local 2byte local 1byte local builtin global 2byte global 1byte

23

Symbol Tables: Effectiveness

gmon

key

getD

OM

Has

h

bing

1

bing

map

1

livem

sg1

bing

map

2

face

book

1

livem

sg2

office

live1

80%

85%

90%

95%

100%

0.943

89%

Global ST VarEnc

Iden

tifier

s (N

oST

= 1)

24

Compressing Literals

• Symbol tables• Grouping literals by type• Pre-fixes and post-fixes• These techniques result in 5-10% savings

compared to gzip

25

Average JSZap Compression: 10%

gmon

key

getD

OM

Has

h

bing

1

bing

map

1

livem

sg1

bing

map

2

face

book

1

livem

sg2

office

live1

80%82%84%86%88%90%92%94%96%98%

100%

0.8792

JSZa

p Co

mpr

essi

on (g

zip

= 1)

Productions; 26%

Identifiers; 57%

Literals; 17%

13% savings

26

Summary and Conclusions• JSZap: AST-based compression for JavaScript

• Propose a range of techniques for compressing– Productions– Identifiers– Literals

• Preliminary results are encouraging: 10% savings over gzip

• Future focus– Latency measurements – Browser integration

27

Well-formedness

Security (AdSafe)

AST representation

Unblocking HTML parser

Caching and incremental

updates

Compression with JSZap

?

Questions?