RClassify: Classifying Race Conditions in Web Applications via Deterministic Replay

Authors: Lu Zhang and Chao Wang (Virginia Tech and University of Southern California)

Presenter: Masahiro Sakai (酒井 政裕), Preferred Networks, Inc. @ ICSE 2017 study meeting, 2017-08-24

Summary: False positives from JavaScript data-race detection are filtered out by replaying the application under a controlled schedule.

9-2

Upload: masahiro-sakai

Post on 29-Jan-2018



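Problem

• In JavaScript, each piece of code runs atomically, so data races in the usual sense do not exist.

• However, bugs can still occur in which the order of events leads to an unintended result.

– In the example page (Fig. 2 of the paper), what happens if the image finishes loading before the <script> element is parsed?

• Existing data-race detection tools (e.g., EventRacer) produce a very large number of false positives.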
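The question on this slide can be made concrete with a small simulation. The sketch below is a toy model of the page in Fig. 2 and not the paper's tooling: plain objects stand in for the DOM, and the names `runPage`, `parseScript1`, `fireImage1Onload`, and so on are hypothetical labels for the events. It runs the same events in the expected order and in an order where the image loads before the script is parsed.

```javascript
// Toy model of the Fig. 2 page. Plain objects stand in for the browser and DOM;
// all names here are illustrative assumptions, not part of RClassify.
function runPage(order) {
  const page = { defined: {}, listeners: {}, elements: {}, output: "", errors: [] };
  const events = {
    parseScript1() {
      // Parsing the script defines image1Loaded(), which attaches the click listener.
      page.defined.image1Loaded = () => {
        if (!page.elements.button1) {
          page.errors.push("button1 is not parsed yet");
          return;
        }
        page.listeners.button1 = () => {
          if (page.elements.outputField) page.output = "Well done!";
          else page.errors.push("outputField is not parsed yet");
        };
      };
    },
    parseButton1() { page.elements.button1 = true; },
    parseOutputField() { page.elements.outputField = true; },
    fireImage1Onload() {
      if (!page.defined.image1Loaded) {
        page.errors.push("image1Loaded is not defined");
        return;
      }
      page.defined.image1Loaded();
    },
    clickButton1() {
      const listener = page.listeners.button1;
      if (listener) listener(); // a click with no listener attached does nothing
    },
  };
  for (const name of order) events[name]();
  return page;
}

const expected = runPage([
  "parseScript1", "parseButton1", "fireImage1Onload", "parseOutputField", "clickButton1",
]);
const racy = runPage([
  "fireImage1Onload", "parseScript1", "parseButton1", "parseOutputField", "clickButton1",
]);

console.log(expected.output);          // "Well done!"
console.log(racy.output, racy.errors); // "" [ 'image1Loaded is not defined' ]
```

In the second order the listener is never attached, so the later click has no visible effect. The final states differ only because two events swapped places.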


bilities to the application itself (see Section V). This is better than existing approaches because technologies are changing rapidly and tools implemented using a particular version of the browser will quickly become obsolete. In contrast, our platform-agnostic approach will be more robust against these changes and updates.

Since we concretely execute the application using deterministic replay, as opposed to heuristically filtering the warnings [21], [22] or applying conservative static analysis [18], [32], we can robustly decide if a race condition is real (i.e., if both execution orders are feasible). The reason why existing tools report many bogus race conditions in the first place is because some hidden happens-before relations between events are not accounted for, and precisely capturing all happens-before relations would have been prohibitively expensive.

The second challenge is to decide, during state recording and comparison, which fields of the program state are important and thus should be compared. For a typical client-side web application, the number of fields can be extremely large, which means including all of them would result in large overhead at run time. Furthermore, many fields are actually designed to be sensitive to other sources of nondeterminism that are irrelevant to the race condition. For example, there are fields that need to have different values depending on the date or time of the day. In such cases, we should exclude them in order to avoid the false positives.

Therefore, our main contribution in this context is developing a flexible configuration interface to allow users to specify which fields should be excluded. We also propose a testing-based method (see Section VI) to automatically identify and exclude these irrelevant fields.
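As a rough illustration of comparing states while excluding irrelevant fields, here is a minimal sketch. It assumes program states are plain nested objects and that excluded fields are given as dotted paths. The helper names `flatten` and `diffStates` and the path syntax are assumptions made for this example; the excerpt does not describe the actual configuration interface.

```javascript
// Flatten a nested state object into { "a.b.c": value } pairs.
function flatten(state, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(state)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object") flatten(value, path, out);
    else out[path] = value;
  }
  return out;
}

// Return the sorted list of paths whose values differ, skipping excluded paths.
function diffStates(ps1, ps2, excluded = []) {
  const a = flatten(ps1);
  const b = flatten(ps2);
  const skip = new Set(excluded);
  const paths = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...paths].filter((p) => !skip.has(p) && a[p] !== b[p]).sort();
}

// Two hypothetical snapshots: one field depends on the race, one on the clock.
const ps1 = { dom: { outputField: "Well done!" }, vars: { clock: 1000 } };
const ps2 = { dom: { outputField: "" }, vars: { clock: 1003 } };

console.log(diffStates(ps1, ps2));                 // [ 'dom.outputField', 'vars.clock' ]
console.log(diffStates(ps1, ps2, ["vars.clock"])); // [ 'dom.outputField' ]
```

Without the exclusion, the time-dependent field would make every pair of runs look different, which is the kind of false positive the paragraph above warns about.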

We implemented and evaluated RCLASSIFY on standard benchmarks and real websites from Fortune-500 companies. Our experiments show that RCLASSIFY outperforms all other existing tools capable of handling the same benchmarks, including EVENTRACER [22], Mutlu et al. [18], and R4 [9]. For example, RCLASSIFY identified all 33 known-to-be-harmful races out of the 50 warnings in standard benchmarks, whereas R4 identified only 8 of them and Mutlu et al. [18] did not identify any. Furthermore, on the seventy randomly chosen websites from the portals of Fortune-500 companies, EVENTRACER [22] returned 1,903 warnings, among which RCLASSIFY identified 73 as bogus, 132 as harmful, 1,644 as harmless, and 54 as undecided. We manually reviewed the 132 harmful races and confirmed the correctness of our classification; in contrast, R4 [9] identified only 33 of the harmful races, indicating that it is significantly less effective.

To sum up, this paper makes the following contributions:

• We propose an evidence-based method for classifying race-condition warnings in web applications, by concretely executing the application to assess the actual impact of racing events.

• We develop a platform-agnostic deterministic replay framework for JavaScript-based web applications, which does not rely on modifying browsers or JavaScript engines, and thus is more widely applicable.

• We evaluate the new method on standard benchmarks as well as a large number of real websites to demonstrate its effectiveness in identifying the harmful race conditions.

    <html>
    <head> ... </head>
    <body>
      <img src="image1.jpg" onload="image1Loaded()" id="image1">
      <!-- omitted elements... -->
      <script id="script1">
        function image1Loaded() {
          document.getElementById("button1")
            .addEventListener("click", func);
        }
        function func() {
          document.getElementById("outputField").innerHTML = "Well done!";
        }
      </script>
      <!-- omitted elements... -->
      <button id="button1"> button1 </button>
      <!-- omitted elements... -->
      <div id="outputField"> </div>
    </body>
    </html>

Fig. 2. Example: A client-side web application with race conditions.

II. MOTIVATION

In this section, we use examples to illustrate the ideas behind our new method while highlighting the technical challenges.

A. Race Conditions

Consider the web page in Fig. 2, which contains an image, a button, and a JavaScript code block. The image.onload event, fired after the browser downloads image1, invokes the image1Loaded() function, which in turn attaches a listener function to the onclick event of button1. The button may be clicked by the user immediately after it is parsed, and its listener function func() changes text in outputField to 'Well done!'. Thus, the expected event sequence is ev0: parsing(image1) → ev1: parsing(script1) → ev2: parsing(button1) → ev3: firing(image1.onload) → ev4: parsing(outputField) → ev5: firing(button1.onclick).

However, depending on the network speed, load of the computer, and timing of the user click, there may be other execution sequences, some of which do not lead to the expected display of 'Well done!'. As shown in the partial order of events in Fig. 3, there are four race conditions:

1) RC1 is (ev1, ev3) over image1Loaded

   a) event ev1: parsing(script1)

   b) event ev3: firing(image1.onload)

2) RC2 is (ev2, ev3) over button1

   a) event ev2: parsing(button1)

   b) event ev3: firing(image1.onload)

3) RC3 is (ev3, ev5) over button1

   a) event ev3: firing(image1.onload)

   b) event ev5: firing(button1.onclick)

4) RC4 is (ev4, ev5) over outputField

   a) event ev4: parsing(outputField)

   b) event ev5: firing(button1.onclick)

The first race condition (RC1) is between the parsing of HTML element script1 and the firing of image1.onload. Typically, the parsing finishes first, but if image1 is downloaded before the parsing finishes, e.g., due to caching of the image or slow parsing of other HTML elements preceding script1,


Source: quoted from Fig. 2 of Zhang et al., "RClassify: Classifying Race Conditions in Web Applications via Deterministic Replay," in Proceedings of ICSE 2017.

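Proposed method: RCLASSIFY

1. The inputs are the website and the warnings reported by an existing detection tool.

2. Instrument the application and record the event sequence.

3. Replay under a schedule that forces the racing events (e1, e2) to run in the order e1→e2 and in the order e2→e1, keeping the order of all other events as close to the original as possible.

   – If one of the two orders cannot be realized, the warning is not a data race (bogus).

4. After execution, compare the states of the two runs.

   – If they differ in an essential way, the race is harmful.

   – If they are essentially the same, the race is harmless.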
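The decision logic of steps 3 and 4 can be sketched as follows. This is only an outline under stated assumptions: `replay` is a hypothetical callback that runs the application with the racing pair forced into the given order and returns `{ feasible, state }`, and states are compared with `JSON.stringify`, a simplification that is sensitive to key order and ignores the field-exclusion step. The paper also reports an "undecided" outcome, which this sketch does not model.

```javascript
// Classify one race warning from two replays with opposite orders.
// `replay` is a hypothetical stand-in for the deterministic replay framework.
function classifyRace(replay, e1, e2) {
  const run1 = replay([e1, e2]);
  const run2 = replay([e2, e1]);
  // If either order cannot actually happen, the warning is not a real race.
  if (!run1.feasible || !run2.feasible) return "bogus";
  // Simplified comparison of the two final states.
  const same = JSON.stringify(run1.state) === JSON.stringify(run2.state);
  return same ? "harmless" : "harmful";
}

// Three toy replay functions, one for each outcome.
const lastWriterWins = (order) => ({ feasible: true, state: { x: order[1] } });
const commutative = (order) => ({ feasible: true, state: { count: order.length } });
const alwaysOrdered = (order) => ({ feasible: order[0] === "a", state: {} });

console.log(classifyRace(lastWriterWins, "a", "b")); // "harmful"
console.log(classifyRace(commutative, "a", "b"));    // "harmless"
console.log(classifyRace(alwaysOrdered, "a", "b"));  // "bogus"
```

Passing the replay mechanism in as a function keeps the classification rule separate from how the two orders are actually enforced.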


RClassify: Classifying Race Conditions in Web Applications via Deterministic Replay

Lu Zhang, Virginia Tech, Blacksburg, VA, USA

Chao Wang, University of Southern California, Los Angeles, CA, USA

Abstract: Race conditions are common in web applications but are difficult to diagnose and repair. Although there exist tools for detecting races in web applications, they all report a large number of false positives. That is, the races they report are either bogus, meaning they can never occur in practice, or benign, meaning they do not lead to erroneous behaviors. Since manually diagnosing them is tedious and error prone, reporting these race warnings to developers would be counter-productive. We propose a platform-agnostic, deterministic replay-based method for identifying not only the real but also the truly harmful race conditions. It relies on executing each pair of racing events in two different orders and assessing their impact on the program state: we say a race is harmful only if (1) both of the two executions are feasible and (2) they lead to different program states. We have evaluated our evidence-based classification method on a large set of real websites from Fortune-500 companies and demonstrated that it significantly outperforms all state-of-the-art techniques.

I. INTRODUCTION

Modern web applications are complex due to their need to implement many features on the client side through asynchronous programming and the use of JavaScript code while maintaining quick response to users. Although web browsers typically guarantee that each JavaScript code block is executed atomically, meaning there is no data-race in the traditional sense, high-level race conditions can still occur due to deferred HTML parsing, interleaved execution of event handlers, timers, Ajax requests, and their callbacks.

Existing race detection tools for web applications [33], [21], [22], [7], [18], [9] often report many false positives. That is, warnings reported by these tools may be bogus, or real but harmless. For example, EVENTRACER [22] reported hundreds of warnings from the official websites of Fortune-500 companies; although some of these race conditions are indeed harmful, the vast majority are not, which means directly reporting them to developers would have been counter-productive. None of the existing tools, including R4 [9], can accurately assess the impact of racing events and robustly identify the real and truly harmful race conditions.

We propose RCLASSIFY, the first evidence-based method for classifying race-condition warnings in web applications. Toward this end, we develop a platform-agnostic deterministic replay framework for client-side JavaScript programs, and leverage it to assess the actual impact of race conditions. Given a race-condition warning denoted (ev_a, ev_b), where ev_a and ev_b are the racing events, we first execute the application while forcing ev_a to occur before ev_b, and then execute the application while forcing ev_b to occur before ev_a. We say that (ev_a, ev_b) is a harmful race only if (1) both executions are feasible and (2) the resulting program states, ps1 and ps2, differ in some important fields of the HTML DOM, JavaScript variables, and environment variables of the browser. The intuition is that, when the order of ev_a and ev_b is not fully controlled by the program logic, and yet affects the program behavior, it deserves a closer look by developers.

[Figure: flow diagram with boxes labeled URL of Web Application; Race-condition Warnings; Static Analysis of HTML files; Instrumented Web Application; Replay the Racing Event Pair; Execution 1; Execution 2; Compare the Program States; Harmful or Harmless]

Fig. 1. RCLASSIFY: Our evidence-based race-condition classification method.

The overall flow of RCLASSIFY is shown in Fig. 1, whose input is the URL of the web application and a set of race-condition warnings, and whose output is the set of harmful races. First, it statically analyzes the HTML files and then uses source-code transformation to add self-monitoring and control capabilities. Then, it analyzes the race-condition warnings (reported by the race detection tool) and generates the configuration files needed for deterministic replay. Next, for each pair (ev_a, ev_b) of racing events, it executes the application twice, once with ev_a preceding ev_b and the other time with ev_b preceding ev_a. Finally, it compares the two resulting program states ps1 and ps2.

There are two technical challenges. The first challenge is developing a robust method to deterministically replay a JavaScript-based client-side web application. This is difficult due to the myriad possible sources of nondeterminism. For example, the race condition may occur during the deferred parsing of HTML elements, the interleaved execution of JavaScript, the dispatch of multiple event handlers, the firing of timer events, the execution of asynchronous HTTP requests, as well as their callback routines. In this context, our main contribution is developing a unified framework for controlling the execution order of the various types of racing events.

Our replay framework differs from the mechanisms used by existing race detection tools such as EVENTRACER, WAVE, and R4, because it is implemented within the target web application itself and therefore is platform-agnostic. That is, we do not modify the web browsers or their underlying JavaScript execution engines (e.g., WEBKIT). Instead, we leverage source-code transformation to add self-monitoring and control capa-

2017 IEEE/ACM 39th International Conference on Software Engineering

DOI 10.1109/ICSE.2017.33


1558-1225/17 $31.00 © 2017 IEEE


Source: quoted from Fig. 1 of Zhang et al., "RClassify: Classifying Race Conditions in Web Applications via Deterministic Replay," in Proceedings of ICSE 2017.


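Instrumentation details

• Platform-agnostic

– By putting a lot of effort into JavaScript-level instrumentation, the method is platform-independent (no specially modified browser or JavaScript engine is needed).

– The implementation is fairly hands-on, gritty work.

• One interesting example: to force a particular event order,

– each event keeps a list of the events that must have occurred before it;

– when an event fires while some of those events have not yet fired, its handler is not run at that point but is deferred and re-executed later.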
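The deferral idea in the last two items might look roughly like the sketch below. It is an illustration only and not the paper's instrumentation: the class name `OrderController`, its methods, and the event labels are hypothetical, and deferred handlers are retried synchronously whenever another event fires, whereas a real in-browser version would need some other way to schedule the retry.

```javascript
// Enforce "must happen before" constraints by deferring handlers.
// All names are illustrative assumptions, not RClassify's actual API.
class OrderController {
  constructor(mustPrecede) {
    this.mustPrecede = mustPrecede; // event name -> names that must fire first
    this.fired = new Set();         // events whose handlers have already run
    this.deferred = [];             // handlers postponed until predecessors fire
  }

  // Try to run the handler for `name`; return false if it had to be deferred.
  fire(name, handler) {
    const required = this.mustPrecede[name] || [];
    const missing = required.filter((p) => !this.fired.has(p));
    if (missing.length > 0) {
      this.deferred.push({ name, handler });
      return false;
    }
    handler();
    this.fired.add(name);
    this.retryDeferred();
    return true;
  }

  // Re-attempt every postponed handler; still-blocked ones are deferred again.
  retryDeferred() {
    const pending = this.deferred;
    this.deferred = [];
    for (const { name, handler } of pending) this.fire(name, handler);
  }
}

const log = [];
const controller = new OrderController({ "image1.onload": ["parse(script1)"] });
controller.fire("image1.onload", () => log.push("image1.onload"));   // deferred
controller.fire("parse(script1)", () => log.push("parse(script1)")); // runs, then the retry runs
console.log(log); // [ 'parse(script1)', 'image1.onload' ]
```

Swapping the constraint (requiring `image1.onload` before `parse(script1)`) would force the opposite order, which is how the two replays of a racing pair could be produced from the same mechanism.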


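Experimental results and impressions

• Experimental results

– Accurate classification on the 50 warnings from standard benchmarks (existing tools such as R4 identified only some of the harmful races).

– Applied to 70 websites chosen at random from Fortune-500 companies; of EventRacer's 1,903 warnings, 132 were correctly classified as harmful.

• Impressions

– A textbook combination of static and dynamic analysis.

– The authors picked a good problem and then carried out the implementation and evaluation properly.

– Choosing the evaluation targets from the Fortune 500 is an interesting touch.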
