efficient context-sensitive output escaping for javascript template engines
Efficient Context-aware Output Escaping for JavaScript Template Engines
PRESENTED BY Nera Liu, Adonis Fung, and Albert Yu
Paranoids Labs, Yahoo!
SEPT 24, 2015
How to defend against XSS in JavaScript Template Engines using contextual analysis?
Background, Related Work & Implementation > Design > Evaluation > Conclusion
Problem Statement
2
Background, Related Work & Implementation
What is Cross Site Scripting (XSS)?
Given no proper output filtering:
<h1>Hello <?php echo $_GET['name']; ?></h1>
A typical attack vector arrives through XXX in the query string of victim.com/?name=XXX:
"'><script>alert(1)</script>
The HTML of victim.com ends up being:
<h1>Hello "'><script>alert(1)</script></h1>
Cross-Site Scripting (XSS) & OWASP Top 10
■ Ranked No. 3 / OWASP Top 10 WebApp Security Risks
■ Root Cause
● Untrusted inputs executed as scripts under a victim’s origin/domain.
■ Consequences
● Cookie stealing, user privacy leaking.
● Full control of the web content / defacing.
Screen-captured from https://www.owasp.org/index.php/Top_10_2013-A3-Cross-Site_Scripting_(XSS)
How to defend against XSS? - Filtering at the Front Gate
Image from Rob, On guard, 2007, flickr.com, License: Creative Commons
Image from 呉 松本, Pipes! Pipes! Pipes!, 2009, flickr.com, License: Creative Commons
How to defend against XSS? - Systems are getting more complicated
It is the internal data flow of your web application…
● with databases
● with APIs
● with browsers
● …
With all of these interconnecting with each other, how would you design filtering rules for both APIs and databases?
Input Filtering - Limitations
Fundamental Limitations
- NO universal filtering rule that is flexible yet secure
  e.g., filtering for <a href="..."> ≠ filtering for <div>...</div>
- Impossible at the front gate to settle
  - how data should be further mangled,
  - and to predict how it would be output in the resultant HTML
- As a result, subject to XSS attacks and over-filtering issues
How to defend against XSS? - Output Filtering in Template Engines
■ Template Engines
● Handlebars, DustJS
- Escape & < > " ' ` into &amp; &lt; &gt; &quot; &#x27; &#x60;
- {{untrustedData}} is escaped by default.
The industry is shifting from input filtering to output filtering
Image from Tom Page, CRW_1978, 2008, flickr.com, License: Creative Commons
Image from john, Secure, 2009, flickr.com, License: Creative Commons
Are your web applications safe now?
Not Yet!!!
Most Template Engines are still vulnerable! - Blindly escaping
A template is typically written like so:
<a href={{url}}>{{data}}</a>
Blindly escaping & < > " ' ` would not stop XSS
- {{url}} is an untrusted user input (assumed hereafter)
- {{url}} is javascript:alert(1), or
- {{url}} is # onclick=alert(1)
→ Solution: Context-Aware Output Escaping (aka contextual escaping)
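A minimal sketch of why blind escaping fails for the template above. The names blindEscape and render are hypothetical stand-ins and are not Handlebars' own code; the only assumption is an escaper that covers the six characters listed on the slide.

```javascript
// Hypothetical "blind" HTML escaper covering & < > " ' ` (not Handlebars' actual code).
function blindEscape(value) {
  var map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#x27;', '`': '&#x60;' };
  return String(value).replace(/[&<>"'`]/g, function (ch) { return map[ch]; });
}

// Render <a href={{url}}>{{data}}</a> by plain substitution of escaped values.
function render(url, data) {
  return '<a href=' + blindEscape(url) + '>' + blindEscape(data) + '</a>';
}

// Neither payload contains any of the six escaped characters, so both pass through untouched.
var viaScheme = render('javascript:alert(1)', 'link');
var viaAttribute = render('# onclick=alert(1)', 'link');
console.log(viaScheme);
console.log(viaAttribute);
```

Both payloads survive because the danger comes from the output context (a URI inside an unquoted attribute value), not from the characters the escaper knows about.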
Related Work - Template Engines vs Contextual Escaping
■ Automatic Contextual Escaping
● Google Closure, Google Go Template [4]
■ Partial Automatic Contextual Escaping
● Ember.js [1], Facebook React [2], Google Angular.js [3]
■ No Contextual Escaping (making use of the blindly-escaping filter)
● Handlebars, LinkedIn Dust.js
Notes:
[1] Ember.js does not apply contextual filtering rules in <style>, <script> and style attributes.
[2] Facebook React does not apply contextual filtering rules in <style>, <script>, style attributes and URI contexts.
[3] AngularJS does not apply contextual filtering rules in style attributes.
[4] Google Go Template is not a JavaScript Template Engine.
Secure Handlebars - Software Architecture
[Architecture diagram]
(1) offline: Handlebars Template → Handlebars Template Parser → Handlebars Template AST → AST Walker, driving the Context Parser / Contextual Analyzer (HTML5 Parser w/auto HTML canonicalization, CSS Parser) → Handlebars Template w/filter markups → Handlebars Pre-compiler → Template Spec.
(2) online: Template Spec. + Data (possibly untrusted) → Handlebars Runtime Compiler, with Contextual XSS Filters (registered as helpers/callbacks) → HTML
Our solution (comprised of the blue boxes) rewrites templates before Handlebars
Demonstration - Handlebars vs. Secure Handlebars
■ Handlebars with Default Escaping.
■ Secure-Handlebars with Contextual Escaping.
Demo videos: original handlebars, secure handlebars
Express Secure Handlebars
var express = require('express');
// Simply replace the original express-handlebars with express-secure-handlebars;
// our implementation preprocesses the template(s) before passing them to the original Handlebars compiler.
// var exphbs = require('express-handlebars');
var exphbs = require('express-secure-handlebars');
Design: Our Approach
Image from Andrea Goh, baking ingredients, 2012, flickr.com, License: Creative Commons
What are the ingredients?
● Template Parser & Walker
○ for extracting template markups
● Standard-compliant Context Parsers
○ for analyzing output contexts
○ for auto-correcting browser quirks
● Context-sensitive XSS Filters
○ for applying contextual filtering rules to defend against XSS!
Design: Template Parser and Walker
<div style="{{cssContext}}">{{htmlContext}}</div>
{{#if data}}
<a href="{{uriContext}}">link</a>
{{else}}
<div>Data not found</div>
{{/if}}
Template Parser & Walker
■ Extract template markups and build an AST for further contextual analysis
[AST diagram] R has the children H, T/C, H, T/H, H, T/B; T/B has two branches: (H, T/U, H) and (H).
Legend: R: Root, H: HTML context, T/C: template output in CSS context, T/H: template output in HTML context, T/U: template output in URI context, T/B: a branching node in template
Context Parsers (HTML, CSS etc.)
■ Template walker traverses the AST and triggers different parsers
We trigger an HTML5 context parser for analysis! (green & blue)
We trigger a CSS context parser for analysis! (orange)
We trigger a URI parser for analysis! (red)
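The walker's dispatch can be sketched as below. This is an illustration only: the AST is hand-written for the sample template, the node shape (type, branches) and the parserFor mapping are assumptions, and the real walker hands nodes to actual parsers rather than recording names.

```javascript
// Hand-written AST for the sample template; node types follow the legend (H, T/C, T/H, T/U, T/B).
var ast = {
  type: 'R',
  children: [
    { type: 'H' }, { type: 'T/C' }, { type: 'H' }, { type: 'T/H' }, { type: 'H' },
    { type: 'T/B', branches: [
      [{ type: 'H' }, { type: 'T/U' }, { type: 'H' }],
      [{ type: 'H' }]
    ] }
  ]
};

// Assumed mapping from node type to the parser that would be triggered.
var parserFor = { 'H': 'html5', 'T/H': 'html5', 'T/C': 'css', 'T/U': 'uri' };

// Depth-first walk; a branching node (T/B) is expanded into each of its branches.
function walk(nodes, visit) {
  nodes.forEach(function (node) {
    if (node.type === 'T/B') {
      node.branches.forEach(function (branch) { walk(branch, visit); });
    } else {
      visit(node.type, parserFor[node.type]);
    }
  });
}

var triggered = [];
walk(ast.children, function (type, parser) { triggered.push(type + ':' + parser); });
console.log(triggered.join(' '));
```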
Context-Sensitive XSS Filters
■ Based on the contextual analysis, precise filtering rules can be applied!
We apply the filtering rules (i.e. the most basic HTML escaping) for an HTML context
We apply the filtering rules for an HTML double-quoted attribute value context and CSS context
We apply the filtering rules for an HTML double-quoted attribute value context and URI context
Template Parser & Walker - Handling of branching logic
The parsing sequences of the AST
● Branch A: R → H → T/C → H → T/H → H → H → T/U → H
● Branch B: R → H → T/C → H → T/H → H → H
The end context of the HTML chunk before the branching node is copied to each branch as its start context for further contextual analysis
Template Parser & Walker - Ambiguous context after branching
...{{#if data}}<a href="{{else}}<a style="{{/if}}{{ambiguousContext}}
[AST diagram] ... → T/B with two branches, <a href=" and <a style=", followed by T/?
Ambiguous Context! CSS or URI?
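A sketch of the branch handling, under loud assumptions: endContext is a toy that only recognizes a chunk ending in href=" or style=", and throwing on disagreement is one possible reaction chosen for illustration, not necessarily what secure-handlebars does.

```javascript
// Toy context tracker (assumption): only looks at how a static HTML chunk ends.
function endContext(startContext, chunk) {
  if (/href="$/.test(chunk)) { return 'URI'; }
  if (/style="$/.test(chunk)) { return 'CSS'; }
  return startContext; // nothing recognized: the start context carries through
}

// Every branch starts from the same context; the branches must agree on where they end.
function analyzeBranches(startContext, branches) {
  var ends = branches.map(function (chunk) { return endContext(startContext, chunk); });
  var consistent = ends.every(function (ctx) { return ctx === ends[0]; });
  if (!consistent) {
    throw new Error('Ambiguous context after branching: ' + ends.join(' vs. '));
  }
  return ends[0];
}

var agreed = analyzeBranches('DATA', ['<a href="', '<link href="']);

var message = '';
try {
  analyzeBranches('DATA', ['<a href="', '<a style="']);
} catch (err) {
  message = err.message;
}
console.log(agreed, '|', message);
```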
Template Parser & Walker - Sub-template Expansion
parent template content: <input {{>sub-tmpl}}
sub-template content: style="{{output}}"
WITHOUT sub-template expansions, templates are analyzed separately: {{output}} looks like an HTML context (T/H)?
WITH sub-template expansions, templates are analyzed together: {{output}} is in a CSS context (T/C)!!
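Sub-template expansion can be sketched as inlining partials before analysis. The function expandPartials and the partials registry are hypothetical; the regular expression only handles the plain {{>name}} form and does not guard against cyclic partials.

```javascript
// Hypothetical partial registry holding the sub-template from the slide.
var partials = { 'sub-tmpl': 'style="{{output}}"' };

// Inline {{>name}} calls so that parent and sub-template are analyzed as one template.
function expandPartials(template, registry) {
  return template.replace(/\{\{>\s*([\w-]+)\s*\}\}/g, function (match, name) {
    if (!Object.prototype.hasOwnProperty.call(registry, name)) {
      throw new Error('Unknown partial: ' + name);
    }
    return expandPartials(registry[name], registry); // partials may include partials
  });
}

var expanded = expandPartials('<input {{>sub-tmpl}}', partials);
console.log(expanded);
```

Once expanded, {{output}} visibly sits inside a double-quoted style attribute, so a CSS-context filter can be chosen.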
Design: Contextual Parsing
Contextual Parsing - Problem Definition
Given a piece of HTML, find out which portions of it are in executable context. E.g.
■ <html> <script> ... </script> </html>
■ <a href=javascript:... >
■ <img src=x onerror=... >
Day 1: "Parsing Html The Cthulhu Way"
Quoted from Coding Horror, http://blog.codinghorror.com/parsing-html-the-cthulhu-way/
# pull out data between <script> tags
($script_data) = $html =~ /<script>(.*?)</script>/gis;
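To see how little such a pattern covers, here is a JavaScript port of the same regular expression (an assumption: the g flag is dropped so that test() is stateless) run against the executable contexts named in the problem definition. The sample strings are illustrative.

```javascript
// JavaScript port of the quoted pattern; the "s" flag needs an ES2018-capable engine.
var naive = /<script>(.*?)<\/script>/is;

var samples = [
  '<script>alert(1)</script>',
  '<script type="text/javascript">alert(2)</script>',
  '<a href=javascript:alert(3)>x</a>',
  '<img src=x onerror=alert(4)>'
];

// Keep only the samples in which the regex finds any script at all.
var caught = samples.filter(function (html) { return naive.test(html); });
console.log(caught.length + ' of ' + samples.length + ' executable contexts found');
```

Only the first sample matches; an attribute on the script tag, a javascript: URI, and an event handler all slip past.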
Day 2: Search npmjs, and pick the first one.
28
Less horrible, until you see it...
HTML 5 seems like fun
<a<b<c>
<! comment !> <? comment >
</d id=e/>
<f g = h > , <f g=<h>, or <f g=h> ?
<script> what
<!-- this
<script> actually
</script> means
--> ?
</script>
29
view-source:https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/sample1.html
30
Dilemma - Reuse or Build?
31
Day 20: I need a high speed racing car
■ Programming Language
■ Speed
■ Compliance with HTML5 Specification
■ Ease of maintenance
Day 20: Use a library that's compliant
■ Libraries that are compliant with the HTML5 specification:
● Google's CTemplate and Closure Template
■ Server side: binding C with Node.js
■ Client side?
■ Can't extend easily for our use case (templating)
Day 21: https://html.spec.whatwg.org/ and http://www.w3.org/TR/html5/
■ Get a coffee machine and tons of coffee beans
■ Read the section "The HTML syntax" (sect 8 and sect 12, resp.)
HTML := TOKEN | TEXT | TAG
TAG := TAGNAME + TAGATTR*
Day 50: HTML Grammar?
35
if tag name == "Script" { alert }
• Erroneous HTML will always be accepted.
Day 50: HTML := ANY*
36
Day 99: Flows can be visualized
37
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/everything.svg
Key Observation #1
Section 12.2.4 - HTML Tokenization
1. HTML is tokenized as DATA (TEXT), TAG, ATTR using the flow.
2. Special tokens are defined as RAWTEXT, RCDATA, SCRIPT.
Section 12.2.5 - HTML Tree Construction
1. Describes how DATA -> RAWTEXT / RCDATA / SCRIPT
Key Observation #2
Describing flows can be cumbersome. But there are patterns.
• The token state changes only when seeing
  • WHITESPACE
  • < , /, > (for tag)
  • & (for html entity)
  • ', " , = (for attribute)
  • !, - , ? (for comment)
  • A-Z, a-z (valid start characters for a tag name)
40
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/tag.svg
Finite State Machine
41
42
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/visual.html
Key Observation #3
One state transition table can cover the normal cases.
Special cases:
1. a<b< : the algorithm asks for reconsumption of b when b is followed by a non-tag-related element.
2. Tag matching is required in RAWTEXT (noframes, xmp, style, iframe, noembed, noscript), RCDATA (textarea, title), SCRIPT, PLAINTEXT.
- Thus no tag nesting is allowed in RAWTEXT / RCDATA / SCRIPT
- e.g., <textarea><script><textarea>
Quick and dirty solution: use 3 state transition tables altogether.
Formal solution: expand the state space to N^3.
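The single-table idea can be sketched with a heavily simplified state machine. The state names, character classes, and the "fallback" convention below are assumptions for illustration; the table covers only a fraction of the states in the HTML5 tokenizer (no comments, end tags, entities, RAWTEXT, RCDATA or SCRIPT) and merges some spec states.

```javascript
// Character classes: the state changes only on a handful of characters.
function classify(ch) {
  if (ch === '<' || ch === '>' || ch === '=' || ch === '"' || ch === "'") { return ch; }
  if (/\s/.test(ch)) { return 'WS'; }
  if (/[A-Za-z]/.test(ch)) { return 'ALPHA'; }
  return 'OTHER';
}

// transitions[state][class] -> next state; "fallback" applies to unlisted classes,
// and without a fallback the machine stays in the current state.
var transitions = {
  DATA:              { '<': 'TAG_OPEN' },
  TAG_OPEN:          { ALPHA: 'TAG_NAME', '<': 'TAG_OPEN', fallback: 'DATA' },
  TAG_NAME:          { WS: 'BEFORE_ATTR_NAME', '>': 'DATA' },
  BEFORE_ATTR_NAME:  { WS: 'BEFORE_ATTR_NAME', '>': 'DATA', fallback: 'ATTR_NAME' },
  ATTR_NAME:         { '=': 'BEFORE_ATTR_VALUE', WS: 'BEFORE_ATTR_NAME', '>': 'DATA' },
  BEFORE_ATTR_VALUE: { WS: 'BEFORE_ATTR_VALUE', '"': 'ATTR_VALUE_DQ', "'": 'ATTR_VALUE_SQ',
                       '>': 'DATA', fallback: 'ATTR_VALUE_UQ' },
  ATTR_VALUE_DQ:     { '"': 'AFTER_ATTR_VALUE' },
  ATTR_VALUE_SQ:     { "'": 'AFTER_ATTR_VALUE' },
  ATTR_VALUE_UQ:     { WS: 'BEFORE_ATTR_NAME', '>': 'DATA' },
  AFTER_ATTR_VALUE:  { WS: 'BEFORE_ATTR_NAME', '>': 'DATA', fallback: 'ATTR_NAME' }
};

// Report the state reached after consuming each character of the input.
function contextsOf(html) {
  var state = 'DATA';
  var states = [];
  for (var i = 0; i < html.length; i++) {
    var row = transitions[state];
    state = row[classify(html[i])] || row.fallback || state;
    states.push(state);
  }
  return states;
}

console.log(contextsOf('<a href="').pop());
console.log(contextsOf('<input value=x').pop());
```

The last reported state is the context in which an output expression placed right after the chunk would be evaluated.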
44
https://github.com/yahoo/context-parser/blob/master/src/html5-state-machine.js
Exercise: Parse the following.
45
<script> what
<!-- this
<script> actually
</script> means
--> ?
</script>
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/demo.html
46
Hint: use state diagram of <script> tag flow
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/script.svg
HTML5 Context Parser - design principles
- Standard compliance.
- Cross-browser compatibility.
- Speed and efficiency.
QR code points to the repo of the standalone CSS Parser
HTML5 Context Parser - standard compliance
- Our implementation is based on WHATWG
  - HTML5 compliance
  - Language
  - Context as output
- Gumbo: DOM tree as an output
- html5lib: Python implementation
Command line version of our context parser: reports the state (aka context) of each character
QR code points to the repo of the standalone Context Parser
Figure from Overview of HTML 5 Parsing Model: https://html.spec.whatwg.org/multipage/syntax.html#overview-of-the-parsing-model
HTML5 Context Parser - speed & efficiency
Lightweight & Efficient
- State transitions reduction
  - (e.g., omit 16 doctype transitions, i.e., 23% of all states)
- No tree/DOM construction
QR code points to standalone Context Parser github repo
Standard-compliant Parser is NOT Enough
■ Purpose: Add filters based on the determined context
■ Problem: Context inferred by browsers ≠ Context inferred by our parsers
■ Worst case: filters voided; XSS
Some Browser-specific Quirks
<a href="..." <script>{{inScriptInSafari5/Data}}</a>
<!--[if IE]><script>{{inScriptInIE/Comment}}</script><![endif]-->
<div id=`{{inGraveAccentQuotedinIE/UnquotedAttrVal}}`></div>
etc...
Compatibility issues by HTML 5 (e.g., in IE7)
<textarea><!-- </textarea>{{inRawText/DataHTML5}}--></textarea>
<div><!-- Comment1 --!> {{inComment/DataHTML5}} --></div>
etc...
Auto HTML Canonicalization
■ Comparisons
● Prior work: manual corrections, or no warning at all
● Our work: automatically rewrite HTML to clear parse errors
■ Goals
● Ensure the parsing experience is aligned across browsers/parsers
● Decisions: honor the HTML 5 standard; secure-by-default
● Hence, contextual filtering can work accordingly
Design: XSS Filters
Security Model
Assumption
● Templates: Trusted; Data from output expressions: Untrusted
Security Goals for filters
● Context-aware: Filterings are specific to the determined context
● Data Self-contained: Untrusted data cannot break out from its context
● Non-executable Data: Untrusted data cannot be executed as script
● Preserve Trusted Code: Trusted code and logics should be preserved
p.s. Go Template and Closure have similar security models
Go Template Security Model: http://golang.org/pkg/html/template/#hdr-Security_Model
Closure Security Model: http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#problem_definition
55
Image from: https://www.flickr.com/photos/ravenshoegroup/5692831233/ (CC BY 2.0)
XSS in More Details
Differentiations stem from the DESIGN…
Prior Work Assumption
Untrusted variables are assumed non-empty. In reality, have you ever thought that an empty variable could break security?
#1: Data self-contained; Trusted Code Preserved - Sample: Output Markup in Unquoted Attribute Value
- <input value={{email}} name=email>
When data is empty, the resulting HTML after filtering and data binding:
+ <input value= name=email> by Closure/Go
+ <input value=� name=email> by our work
- With Closure/Go, the browser/HTML parser interprets “name=email” as the attribute value
- The trusted structure is broken: the reference to email’s value is lost, a surprise to developers
- A legit use of document.querySelector('[name=email]').value throws an error
- To mitigate, our filter inserts U+FFFD (the replacement character, which HTML5 parsers also substitute for NULL) when the data is empty
- Good faith: developers still have a chance to validate the value (e.g., email)
- Developers’ logics are preserved (short of quoting the value for them)
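The empty-value mitigation can be sketched as follows. The function filterUnquotedAttrValue is hypothetical, and the set of characters it encodes is an assumption made for illustration; only the U+FFFD substitution for empty data is taken from the slide.

```javascript
// Hypothetical filter for output placed in an unquoted attribute value.
function filterUnquotedAttrValue(value) {
  var text = String(value);
  if (text === '') {
    return '\uFFFD'; // keep the attribute value non-empty so the next attribute survives
  }
  // Assumed set: characters that could end or confuse an unquoted value.
  return text.replace(/[\t\n\f\r "'`=<>]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

// Bind data into <input value={{email}} name=email>.
function render(email) {
  return '<input value=' + filterUnquotedAttrValue(email) + ' name=email>';
}

console.log(render(''));
console.log(render('a b'));
```

With empty data the name=email attribute stays a separate attribute instead of being swallowed as the value.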
#2: More Context-sensitive and Efficient
Are existing filters really designed for the era of contextual escaping?
State transition in DATA state (e.g., <div>↑</div>)
<input type="hidden" value="{{email}}" name="email">{{email}}

Output Markup                        | Prior Work                                         | Our Work
in Data Context                      | Apply the same HTML Filter (i.e., encode &<>"'`)   | Apply yd Filter (i.e., encode < only)
in Double-quoted Attr Value Context  | Apply the same HTML Filter (i.e., encode &<>"'`)   | Apply yavd Filter (i.e., encode " only)

■ Insight: the Context Parser accurately determines the contexts
● Why over-encode? Let’s embrace just-sufficient encoding
● Runtime performance is > 5x faster as a result
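Just-sufficient encoding for the two contexts above can be sketched in a few lines. The function names are hypothetical stand-ins for the yd and yavd filters and are not the xss-filters source; only the "encode < only" and "encode \" only" rules come from the slide.

```javascript
// Data context: only "<" can start a tag, so only "<" is encoded.
function encodeDataContext(value) {
  return String(value).replace(/</g, '&lt;');
}

// Double-quoted attribute value: only '"' can close the value, so only '"' is encoded.
function encodeDoubleQuotedAttr(value) {
  return String(value).replace(/"/g, '&quot;');
}

// Bind data into <input type="hidden" value="{{email}}" name="email">{{email}}.
function render(email) {
  return '<input type="hidden" value="' + encodeDoubleQuotedAttr(email) +
         '" name="email">' + encodeDataContext(email);
}

var out = render('"><script>alert(1)</script>');
console.log(out);
```

In the output, the payload can neither close the quoted value nor open a tag in the data context, even though most of its characters are left as they are.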
#3: Life is worthless without love <3
Double-encoding Issue
● Input filtering is likely applied in an existing website: <3 becomes &lt;3
● Output filtering encodes it again: it finally becomes &amp;lt;3, rendered as &lt;3 as in the diagram
Graceful Output and Input Filtering - A brave attempt not to encode &
Can we omit encoding the & character?
■ & cannot lead to script executions, as defined by HTML 5
● “JS includes” such as <br size="&{alert(1)}"> became history (IE5)
■ Requires HTML decoding in case of blacklisting, e.g., to filter javascript:alert(1)
● prior work tests the value; &colon; becomes &amp;colon;
● we HTML-decode first, then test the resulting value
● Our decoder: correct (&gta → >a), fastest (using FSM, trie)
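The decode-then-test order can be sketched as below. Everything here is an assumption for illustration: decodeEntities handles only numeric references and a tiny table of named ones (and, unlike a spec-compliant decoder, requires the semicolon on named references), and the protocol list in isDangerousUri is an assumed blacklist rather than the one shipped in xss-filters.

```javascript
// Tiny assumed table of named character references.
var NAMED = { colon: ':', Tab: '\t', NewLine: '\n', amp: '&', lt: '<', gt: '>', quot: '"' };

// Decode numeric references (semicolon optional) and the named ones listed above.
function decodeEntities(text) {
  return text.replace(/&(?:#x([0-9a-f]+);?|#([0-9]+);?|([a-z]+);)/gi, function (match, hex, dec, name) {
    if (hex !== undefined || dec !== undefined) {
      var code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
      return code > 0x10FFFF ? '\uFFFD' : String.fromCodePoint(code);
    }
    return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : match;
  });
}

// Decode first, then test the scheme, so encoded colons or letters cannot hide it.
function isDangerousUri(value) {
  var decoded = decodeEntities(String(value));
  var normalized = decoded.replace(/^[\u0000-\u0020]+/, '').replace(/[\t\n\r]/g, '');
  return /^(?:javascript|vbscript|data):/i.test(normalized);
}

console.log(isDangerousUri('javascript&colon;alert(1)'));
console.log(isDangerousUri('https://example.com/?a=1&b=2'));
```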
Evaluations & Comparisons
63
Evaluations - Contextual Analysis on a Yahoo! website
● 90.9% of output markups automatically secured with the contextual filtering

Location of output expressions (aka Context)   | No. of findings | Notes
Simple HTML Contexts (e.g., data, dqAttr ...)  | 52 (~10.5%)     | Secured by default handlebars.
ATTRIBUTE_VALUE_UNQUOTED state                 | 5 (~1.0%)       | Secured also by secure-handlebars.
ATTRIBUTE_VALUE states + URI                   | 280 (~56.8%)    | Secured also by secure-handlebars.
ATTRIBUTE_VALUE states + CSS                   | 111 (~22.5%)    | Secured also by secure-handlebars.
ATTRIBUTE_VALUE states + JS (e.g., onclick)    | 1 (~0.2%)       | Manual review is required. Dangerous contexts.
SCRIPT state (i.e., <script>)                  | 37 (~7.5%)      | Manual review is required. Dangerous contexts.
ATTRIBUTE_NAME state                           | 4 (~0.8%)       | Manual review is required. Dangerous contexts.
RAW_TEXT state (e.g., <style>)                 | 3 (~0.6%)       | Manual review is required. Dangerous contexts.
Evaluations - Performance
■ Offline/one-off overhead (for contextual analysis)
● It takes 63s to pre-process and canonicalize 512 templates (i.e. 0.35MB/sec).
■ Negligible runtime overhead (for filtering only)
● for contexts defendable by the default filter, ours is >5x faster
● to secure other contexts, by design, the least number of chars is encoded
Ref: https://github.com/yahoo/xss-filters/blob/benchmarks/tests/benchmarks/compare-default.js#L9-L22
Framework Comparisons

                               | Yahoo Secure Handlebars | Google Angular.js | Google Closure | Facebook React | Ember.js
Contextual Escaping Supported
HTML Contexts                  | ✓                       | ✓                 | ✓              | ✓              | ✓
URI Contexts                   | ✓                       | ✓                 | ✓              | no             | ✓
CSS Contexts                   | ✓                       | ✓ / no [1]        | ✓              | no             | no
JS Contexts                    | no                      | no                | ✓              | no             | no
Important Features
Auto HTML Canonicalization     | ✓                       | no                | no             | no             | no
Auto Sub-template Analysis     | ✓                       | ✓ [2]             | ✓ [3]          | ✓ [2]          | ✓ [2]
Secure Filters for > 90% of Browser Market Share (incl. IE 7+, Safari 5+, FF & Chrome)
                               | ✓                       | no                | ✓              | no             | no

[1] AngularJS does not apply contextual filtering rules on style attribute.
[2] AngularJS, React and EmberJS restrict the sub-template in HTML Data context only.
[3] Google Closure requires manual annotation for sub-template analysis.
Future Work / Rich HTML Sanitization
■ When developers don’t know how to sanitize...
● use of SafeString/dangerouslySetInnerHTML
safeHtml = Purifier.purify(untrustedRichHtml);
■ No need to sanitize individual fields
■ Usable on client side or server side
■ Whitelist-based approach
QR code points to html-purify github repo
Image from https://www.flickr.com/photos/tnarik/3416160916 (CC BY 2.0)
Conclusion: Building A Safer Internet for All
Automatic contextual escaping made easy
■ Efficient HTML5-compliant parser w/auto corrections
■ Automatically applies contextual, just-sufficient, and faster escaping
■ Effortless adoption w/express-secure-handlebars
■ Open-sourced at github.com/yahoo and npmjs.com
Portal: https://yahoo.github.io/secure-handlebars
Thank you!
Nera, Adon, Albert
{neraliu, adon, albertyu}@yahoo-inc.com
Twitter: @neraliu, @adonatwork, @yukinying
68
We’d like to acknowledge the support and help from:
- Stuart Larsen
- Alaa Mubaied
- Aditya Mahendrakar
- Eric Ferraiuolo
- Christopher Harrell
- Christopher Rohlf
- Jeremy Ruppel
Bug Bounty Program Contributors
● https://github.com/yahoo/secure-handlebars/blob/master/CONTRIBUTORS.md
● https://github.com/yahoo/xss-filters/blob/master/CONTRIBUTORS.md
Appendix
Deployability of Secure-Handlebars - secure-handlebars & express-secure-handlebars
Hassle-free server-side adoption
- To switch from the express-handlebars npm package to express-secure-handlebars:
  - 2 LOC changes: (1) dependency in package.json, (2) require(...)
Besides, client-side use with secure-handlebars
- The contextual analyzer can preprocess templates during the build process (at server side)
- Handlebars pre-compiles the rewritten templates
- Filters registered at client side allow Handlebars to filter data at the data binding stage.
Secure Handlebars - Design Principles
■ Work as a Preprocessor
● Parse the template and build an Abstract Syntax Tree (AST)
● Walk through every branch, triggering different parsers for contextual analysis
● Insert filter markups into each {{outputExpression}} based on its context
● Produce a rewritten template, compatible w/handlebars (unlike ember.js)
■ Facilitate Seamless Upgrade
● Existing template logics must all be preserved
QR code points to the secure-handlebars github repo
Rewrite template before Handlebars
● Contextual Analyzer adds filter markups specific to output contexts
- <a href="{{url}}">{{url}}</a>
+ <a href="{{{yavd (yubl (yufull url))}}}">{{{yd url}}}</a>
● Handlebars applies the filters (aka helpers) during compilation.
○ {{{ }}} - disables the default blind escaping.
○ yufull - encodeURI() with IPvFuture support
○ yubl - disables dangerous protocols such as javascript:
○ yavd - HTML-escapes the double-quote character (" → &quot;)
○ yd - HTML-escapes the less-than character (< → &lt;)
CSS Context Parser - design principles
■ Same considerations as the HTML5 Context Parser
● Standard compliance.
● Cross-browser compatibility.
■ Design Goal
- All browsers MUST parse the CSS with the same contextual result.
QR code points to the standalone CSS Parser github repo
CSS Context Parser - strict mode
■ Approach:
● Rewrite the CSS grammar into a stricter grammar.
● The original grammar allows escape sequences (i.e. a backslash followed by up to 6 hex digits); the stricter grammar allows only a known set of chars (i.e. [a-zA-Z0-9]) and special chars (i.e. :, ;).
● It is unusual to use escape chars in a CSS template.
// this is valid syntax, but our parser would reject it!
<div style="\color:{{output}}">...</div>
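A sketch of what a strict-mode check might look like. The function isStrictCss is hypothetical, and its allow-list is an assumption: letters, digits, ':' and ';' come from the slide, while whitespace and '-' are added here so that ordinary declarations such as background-color pass.

```javascript
// Hypothetical strict-mode check for the static CSS text surrounding an output expression.
// Anything outside the allow-list, including the backslash of a CSS escape, is rejected.
function isStrictCss(css) {
  return /^[a-zA-Z0-9:;\s-]*$/.test(css);
}

console.log(isStrictCss('color:'));            // plain property name
console.log(isStrictCss('\\color:'));          // valid CSS escape, rejected in strict mode
console.log(isStrictCss('background-color: '));
```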
Why are we reluctant to support auto JS Context filtering? Static vs. Dynamic
<script>
var html = '<a href="{{untrustedUrl}}"><b>link</b></a>...';
document.write(html);
</script>
■ Which XSS filters should we apply? Single-quoted JS string? Double-quoted URI attribute?
● A static approach (incl. related work) can only apply the former
● Warn & check manually; avoid a false sense of security