
Finding Bugs in Dynamic Web Applications

Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst

Proceedings: ISSTA '08 (International Symposium on Software Testing and Analysis)

Presented by Md. Monjurul Hasan

CSE 6329 Special Topics in Advanced Software Engineering

Dynamic Web Application

• Generates pages (HTML content) on the fly
• Content varies by user and user-specified criteria
• Produced by server-side programming
• Most large, well-known web applications are dynamic web applications

Source: Dynamic Web Application Development using PHP and MySQL – By Simon Stobart and David Parsons

Web Threats

• Web script crashes and malformed dynamically-generated Web pages impact usability of Web applications

• Current tools for Web-page validation cannot handle the dynamically-generated pages

Web Script Crash

• Missing included file
• Call to undefined method
• Wrong database query
• Uncaught exceptions

Malformed HTML

• HTML that does not conform to the WDG (Web Design Group) or W3C (World Wide Web Consortium) standards
  – Using tags other than those defined by the W3C (e.g. <html>, <table>, <div>, etc.)
  – Not maintaining the document structure (e.g. <html><head></head><body> .. </body></html>)
  – Not using a proper opening tag and its matching closing tag
  – etc.
• Web scripting languages can generate HTML
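The kinds of malformation listed above can be illustrated with a small checker. This is only a sketch in Python: it is a toy stand-in, not the WDG or W3C validator, and the names `TagChecker`, `check`, `KNOWN_TAGS` and `VOID_TAGS` (with their deliberately tiny tag sets) are assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumption: a deliberately tiny tag vocabulary, far smaller than the real standard.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}
KNOWN_TAGS = {"html", "head", "body", "title", "h1", "h2", "p", "div",
              "table", "tr", "td", "form"} | VOID_TAGS


class TagChecker(HTMLParser):
    """Collects unknown tags and unbalanced open/close tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open (non-void) tags
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.errors.append(f"unknown tag <{tag}>")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br />: nothing to push on the stack.
        if tag not in KNOWN_TAGS:
            self.errors.append(f"unknown tag <{tag}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check(html):
    """Return a list of problems found in the given HTML string."""
    checker = TagChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in reversed(checker.stack)]


print(check("<html><body><j2>x</j2></body></html>"))
print(check("<html><body><h2>Login</h2>"))
```

A real oracle would apply the full standard; the sketch only shows why an undefined tag and a missing closing tag are both detectable from the generated page alone.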

The Problem

• Bad scripts create syntactically malformed HTML
  – Partially displayable or non-displayable HTML
  – Browser’s attempt to correct the HTML may crash
  – Slower HTML rendering
  – Important information may be discarded
  – Search engines have trouble indexing the correct pages

• Example

More Problems

• Dynamic web page testing challenges
  – HTML validation tools only test static pages
  – Cannot fully capture behavior, since not all of the code's functionality appears in the HTML result
  – No automatic validator for scripting languages that dynamically generate HTML pages
  – HTML Kit validates every generated page but requires manual generation of the inputs that lead to displaying pages

What this paper presents…

• Presents an automated technique for finding faults that manifest as Web script crashes or malformed HTML
  – Extends dynamic test generation to scripting languages
• Identifies the minimal part of the input responsible for triggering failures
• Uses an oracle to determine whether the HTML is well formed
• Creates a tool, Apollo, that implements all of these in the context of PHP

Why PHP?

• Widely used in Web development
  – Network interactions
  – Database
  – HTTP processing
• Object oriented
• Scripting
• 21 million domains¹ (75%) are powered by PHP, including large websites like Wikipedia, WordPress, Facebook, Digg, etc.

¹ Source: Netcraft, April 2007

Example: program

• SchoolMate.php
  – Allows school administrators to manage classes and users, teachers to manage assignments and grades, and students to access their information
• Typical URL: schoolmate.php?page=1&page2=100&login=1&username=user&password=password

Faults annotated on the code listing:
• ‘printReportCards.php’ missing
• make_footer() not executed in certain situations: unclosed HTML tag
• Generates illegal <j2> tag

Failures in PHP programs

• Targets two types of failures
  – Execution failures
    • Web script crashes
  – HTML failures
    • Malformed HTML

Failure-Finding in PHP Applications

• Concolic testing – a dynamic test generation technique
  – Execute the application
    1. Initially on an empty input
    2. Then on additional inputs, obtained by solving constraints that are derived from control-flow paths
• Extensions
  – Validate the correctness of program output by using an oracle
  – Handle PHP constructs such as isset, empty, and require, which require generating constraints absent in other object-oriented programming languages
  – Use a pre-specified set of values for database authentication
  – Simulate each user input by transforming source code

Transformation of Code

• Interactive HTML pages with buttons and menus

• For each page (h) that contains N buttons
  – Add an additional input parameter p to the PHP program
    • Values range from 1 to N
  – Insert a switch statement that includes the appropriate PHP source file, depending on p

An example

<?php
/* Simulated user input: the _btn parameter selects which button was pressed */
switch ($_GET["_btn"]) {
    case 1:
        require_once("mainmenu.php");
        break;
    case 2:
        require_once("newuser.php");
        break;
}
?>

<?php
echo "<h2>Webchess " . $Version . " login</h2>";
?>
<form method="post" action="mainmenu.php">
  <p>
    Nick: <input name="txtNick" type="text" size="15" /><br />
    Password: <input name="pwdPassword" type="password" size="15" />
  </p>
  <p>
    <input name="login" value="login" type="submit" />
    <input name="newAccount" value="New Account" type="button" onClick="window.open('newuser.php', '_self')" />
  </p>
</form>
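The transformation in this example is mechanical enough to sketch as a generator. The Python below is an illustration only: the function name `simulated_input_prefix` is hypothetical, the list of button targets is supplied by hand, and the slides do not say how Apollo itself discovers the targets or emits the code.

```python
def simulated_input_prefix(targets, param="_btn"):
    """Return PHP source with one case per simulated button.

    targets: PHP files reached by buttons 1..N of the page (assumed known).
    param:   name of the extra input parameter that selects the button.
    """
    lines = [
        "<?php",
        "/* Simulated user input */",
        f'switch ($_GET["{param}"]) {{',
    ]
    for number, target in enumerate(targets, start=1):
        lines += [
            f"    case {number}:",
            f'        require_once("{target}");',
            "        break;",
        ]
    lines += ["}", "?>"]
    return "\n".join(lines)


# The two buttons of the login form: "login" and "New Account".
print(simulated_input_prefix(["mainmenu.php", "newuser.php"]))
```

Because the button choice becomes an ordinary input parameter, the test generator can explore it like any other input value.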

The Failure Detection Algorithm

parameters: Program P, oracle O
result    : Bug reports B
B : setOf(<failure, setOf(pathConstraint), setOf(input)>)

P′ ≔ simulateUserInput(P);
B ≔ empty;
pcQueue ≔ emptyQueue();
enqueue(pcQueue, emptyPathConstraint());
while not empty(pcQueue) and not timeExpired() do
    pathConstraint ≔ dequeue(pcQueue);
    input ≔ solve(pathConstraint);
    if input ≠ ⊥ then
        output ≔ executeConcrete(P′, input);
        failures ≔ getFailures(O, output);
        foreach f in failures do
            merge <f, pathConstraint, input> into B;
        c1 ∧ . . . ∧ cn ≔ executeSymbolic(P′, input);
        foreach i = 1, . . . , n do
            newPC ≔ c1 ∧ . . . ∧ ci−1 ∧ ¬ci;
            enqueue(pcQueue, newPC);
return B;
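The loop can be tried out on a toy program. The Python sketch below is not Apollo: `run_program` is a made-up three-branch program that merges concrete and symbolic execution into one call, `solve` is a brute-force search over a handful of candidate inputs rather than a constraint solver, `oracle` checks only for a closing tag, and the `seen` set and `max_runs` bound are assumptions added so the toy terminates without a timer.

```python
from collections import deque
from itertools import product

# Named branch conditions of a toy program (hypothetical, not SchoolMate's code).
CONDITIONS = {
    "page_set": lambda inp: "page" in inp,
    "page2_is_1337": lambda inp: inp.get("page2") == 1337,
    "login_is_1": lambda inp: inp.get("login") == 1,
}


def run_program(inp):
    """Run the toy program; return (html, path) with every branch outcome taken."""
    path = []

    def branch(name):
        outcome = CONDITIONS[name](inp)
        path.append((name, outcome))
        return outcome

    html = "<html><body>"
    if not branch("page_set"):
        html += "<p>default page</p>"
    if branch("page2_is_1337"):
        html += "<p>report</p>"
    elif branch("login_is_1"):
        return html + "<h2>Login</h2>", path  # seeded fault: footer is skipped
    return html + "</body></html>", path


def solve(path_constraint):
    """Brute-force stand-in for a constraint solver; None plays the role of ⊥."""
    for page, page2, login in product((None, 0), (None, 0, 1337), (None, 0, 1)):
        pairs = (("page", page), ("page2", page2), ("login", login))
        candidate = {k: v for k, v in pairs if v is not None}
        if all(CONDITIONS[n](candidate) == want for n, want in path_constraint):
            return candidate
    return None


def oracle(html):
    """Toy oracle: the only check is that the page is closed."""
    return [] if html.endswith("</html>") else ["unclosed <html>"]


def find_failures(max_runs=50):
    bugs = {}              # failure -> list of (path constraint, input)
    queue = deque([()])    # start from the empty path constraint
    seen = {()}            # assumption: never queue the same constraint twice
    runs = 0
    while queue and runs < max_runs:  # max_runs stands in for timeExpired()
        pc = queue.popleft()
        inp = solve(pc)
        if inp is None:
            continue
        runs += 1
        html, path = run_program(inp)
        for failure in oracle(html):
            bugs.setdefault(failure, []).append((pc, inp))
        # Negate each conjunct in turn, keeping the prefix before it.
        for i, (name, outcome) in enumerate(path):
            new_pc = tuple(path[:i]) + ((name, not outcome),)
            if new_pc not in seen:
                seen.add(new_pc)
                queue.append(new_pc)
    return bugs


for failure, reports in find_failures().items():
    print(failure, [inp for _, inp in reports])
```

The first run uses the empty input and finds nothing; the fault only appears once the negated `login_is_1` conjunct is solved, which is the point of deriving new inputs from the recorded path.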

Example: Execution 1 (Expose Third Fault)

• Branch annotations on the code listing: true (sets page = 0), false, GoTo(20)
• HTML validation tool determines output is legal
  – Path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login ≠ 1
• New path constraints added to the queue
  – NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
  – NotSet(page) ∧ page2 = 1337
  – Set(page)

Example: Execution 2 (The Opposite Path)

• NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
  – Constraint solver may produce page2 ← 0; login ← 1
• Branch annotations on the code listing: true, true
• HTML validation tool discovers the failure and generates a bug report, which is added to the output set of bug reports

Minimization on Path Constraints

• Find shorter path constraint for a given bug report

• Eliminates irrelevant constraints, which better assists the programmer in locating the fault

• Solution for a shorter path constraint is often a smaller input

• Does not guarantee returned path constraint is shortest that exposes failure

Minimization Example

• HTML malformation from the previous example could have been reached through different execution paths
  – NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
  – Set(page) ∧ page = 0 ∧ page2 ≠ 1337 ∧ login = 1
• Conjuncts common to both path constraints: page2 ≠ 1337 ∧ login = 1
• Candidates with one conjunct removed: page2 ≠ 1337, and login = 1 (login ← 1)

Path Constraint Minimization Algorithm

parameters: Program P, oracle O, bug report b
result    : Short path constraint that exposes b.failure

c1 ∧ . . . ∧ cn ≔ intersect(b.pathConstraints);
pc ≔ true;
foreach i = 1, . . . , n do
    pci ≔ c1 ∧ . . . ∧ ci−1 ∧ ci+1 ∧ . . . ∧ cn;
    input ≔ solve(pci);
    if input ≠ ⊥ then
        output ≔ executeConcrete(P, input);
        failures ≔ getFailures(O, output);
        if b.failure ∉ failures then
            pc ≔ pc ∧ ci;
inputpc ≔ solve(pc);
if inputpc ≠ ⊥ then
    outputpc ≔ executeConcrete(P, inputpc);
    failurespc ≔ getFailures(O, outputpc);
    if b.failure ∈ failurespc then
        return pc;
return shortest(b.pathConstraints);
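A small Python sketch of the minimization steps, applied to the two path constraints from the minimization example. Everything here is an assumption made for illustration: conjuncts are encoded as `(condition name, expected outcome)` pairs, `solve` is a brute-force search instead of a real constraint solver, and `exposes_failure` is a hypothetical stand-in for running the program and asking the oracle.

```python
from itertools import product

# Hypothetical encoding of the conditions that appear in the example.
CONDITIONS = {
    "page_set": lambda inp: "page" in inp,
    "page_is_0": lambda inp: inp.get("page") == 0,
    "page2_is_1337": lambda inp: inp.get("page2") == 1337,
    "login_is_1": lambda inp: inp.get("login") == 1,
}


def solve(conjuncts):
    """Brute-force stand-in for a constraint solver; None plays the role of ⊥."""
    for page, page2, login in product((None, 0), (None, 0, 1337), (None, 0, 1)):
        pairs = (("page", page), ("page2", page2), ("login", login))
        candidate = {k: v for k, v in pairs if v is not None}
        if all(CONDITIONS[n](candidate) == want for n, want in conjuncts):
            return candidate
    return None


def exposes_failure(inp):
    """Stand-in for executeConcrete + getFailures on a toy program."""
    return inp.get("login") == 1 and inp.get("page2") != 1337


def minimize(path_constraints):
    first, rest = path_constraints[0], path_constraints[1:]
    # Keep only the conjuncts shared by every path constraint in the bug report.
    common = [c for c in first if all(c in pc for pc in rest)]
    pc = []
    for i, c in enumerate(common):
        without_c = common[:i] + common[i + 1:]
        inp = solve(without_c)
        # If the failure disappears once c is dropped, c is needed.
        if inp is not None and not exposes_failure(inp):
            pc.append(c)
    inp = solve(pc)
    if inp is not None and exposes_failure(inp):
        return pc
    return min(path_constraints, key=len)  # fall back to the shortest original


pc1 = [("page_set", False), ("page2_is_1337", False), ("login_is_1", True)]
pc2 = [("page_set", True), ("page_is_0", True),
       ("page2_is_1337", False), ("login_is_1", True)]
print(minimize([pc1, pc2]))
```

On this toy input the result is the single conjunct for login = 1, matching the example. As the slides note, the procedure does not guarantee the shortest possible constraint, since each conjunct is tested with only one solved input.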

Apollo

• User Input Simulator
• Executor
• Bug Finder
  – Oracle
  – Bug Report Repository
  – Input Minimizer
• Input Generator
  – Symbolic Driver
  – Constraint Solver
  – Value Generator

Executor: Shadow Interpreter

• Shadow Interpreter
  – Modified Zend PHP interpreter 5.2.2 to record path constraints and information associated with output
  – Performs symbolic execution along with concrete execution
  – Records conditions for PHP-specific comparison operations such as isset and empty

Executor: Database Manager

• Database Manager
  – (Re)initializes the database used by a PHP application and restores it before each execution
  – Supplies additional information about username/password pairs

Bug Finder

• Bug Report = Failure + Path constraint + Input inducing failure

• Failure = Type of Failure + Corresponding Message + PHP statement generating bad HTML

• Oracle – HTML validation tool (WDG and W3C)
• Input Minimizer – uses the path constraint minimization algorithm

Input Generator

• Symbolic Driver – generates new path constraints and selects the next path constraint
• Constraint Solver – computes an assignment of values to input parameters that satisfies a given path constraint
  – Choco constraint solver
• Value Generator – generates values for parameters
  – Combines random value generation and constant values mined from source code
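The Value Generator's mix of random values and mined constants can be sketched as follows. This Python is a guess at the general idea only: the regular expressions, the 50/50 mixing probability, the value range, and the names `mine_constants` and `make_value_generator` are all assumptions, not details taken from Apollo.

```python
import random
import re


def mine_constants(source):
    """Collect integer literals and single-quoted string literals from source text."""
    ints = [int(m) for m in re.findall(r"\b\d+\b", source)]
    strs = re.findall(r"'([^']*)'", source)
    return ints + strs


def make_value_generator(source, seed=0, p_constant=0.5):
    """Return a function that yields either a mined constant or a random integer."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    constants = mine_constants(source)

    def next_value():
        if constants and rng.random() < p_constant:
            return rng.choice(constants)
        return rng.randint(0, 9999)

    return next_value


php_source = "if ($_GET['page2'] == 1337) { $mode = 'login'; }"
print(mine_constants(php_source))
gen = make_value_generator(php_source)
print([gen() for _ in range(5)])
```

Mined constants matter because a comparison such as `page2 == 1337` is very unlikely to be hit by purely random values.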

Experimentation

Program      #files   LOC     PHP LOC   #downloads
faqforge     19       1712    734       14164
webchess     24       4718    2226      32352
schoolmate   63       8181    4263      4466
phpsysinfo   73       16634   7745      492217
total        179      31245   14968     543199

• faqforge = Tool for creating and managing documents
• webchess = Online chess game
• schoolmate = PHP/MySQL solution for administering schools
• phpsysinfo = Displays system info

Generation Strategies

• Compared to two other approaches
  – Halfond and Orso (Randomized)
    • Random values for the parameters
    • Proposed for JavaScript
  – Minamide’s static analysis
    • Approximates the string output of a program with a context-free grammar
    • Discovers malformed HTML faults
• Apollo’s test input generation, as previously discussed

Methodology

• 10-minute runs on each program
  – Generation of hundreds of inputs
• Ran both the Apollo and Randomized test input generation strategies
• WDG offline HTML validation tool

Results Classification

• Execution crash: PHP interpreter terminates with exception

• Execution error: PHP interpreter emits warning visible in generated HTML

• Execution warning: PHP interpreter emits warning invisible to HTML output

• HTML error: program generates HTML for which validation tool produces error report

• HTML warning: program generates HTML for which validation produces a warning report

Results Analysis

Apollo
• Average line coverage – 58.0%
• Faults found on subject apps – 214

Randomized
• Average line coverage – 15.0%
• Faults found on subject apps – 59

Callouts on the results table:
• Tries to load two missing files
• Database related
• Unset time-zone
• Resulted in malformed HTML

Line coverage = number of executed lines / total lines with executable PHP code in the application

Results Analysis

• Apollo vs. Randomized
  – 58% line coverage vs. 15.2% line coverage
  – 214 faults vs. 59 faults
• Apollo vs. Minamide’s tool
  – 2.7 times more HTML validation faults (120 vs. 45)
  – 83 additional execution faults
  – 104 faults (10 minutes) vs. 14 faults (126 minutes)
• Apollo is more effective and efficient than both

Results Analysis: Path Constraint Minimization

                               Path constraints          Inputs
Program      Success rate %    Orig. size   Reduction    Orig. size   Reduction
faqforge     64                22.3         0.22         9.3          0.31
webchess     91                23.4         0.19         10.9         0.40
schoolmate   51                22.9         0.38         11.5         0.58
phpsysinfo   82                24.3         0.18         17.5         0.26

Reduces size of inputs by up to factor of 0.18 for more than 50% of faults

• Success rate – Percentage of faults whose exposing input was minimized
• Orig. size – Average size of original path constraints (# of conjuncts) and inputs (# of key-value pairs)
• Reduction columns – Ratio of minimized to un-minimized size. The lower the ratio, the more successful the minimization
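To make the reduction ratio concrete, the Python sketch below multiplies each original path-constraint size by its ratio. Treat the results as rough estimates only: both columns are averages, so the product is an approximation, and the function name `estimated_minimized_size` is made up for this example.

```python
# Rows copied from the table: (program, avg. original #conjuncts, reduction ratio).
ROWS = [
    ("faqforge", 22.3, 0.22),
    ("webchess", 23.4, 0.19),
    ("schoolmate", 22.9, 0.38),
    ("phpsysinfo", 24.3, 0.18),
]


def estimated_minimized_size(orig_size, reduction):
    """Reduction is minimized / original, so minimized is roughly original * reduction."""
    return orig_size * reduction


for program, orig, ratio in ROWS:
    size = estimated_minimized_size(orig, ratio)
    print(f"{program}: about {size:.1f} conjuncts after minimization")
```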

Limitations

• Simulating user inputs statically
• JavaScript code in the generated HTML not tracked
• Limited line coverage for native C methods
• Limited sources of input parameters
  – Only inputs from global arrays (_POST, _GET and _REQUEST)

Thank you