robust and fast pattern matching for intrusion detection€¦ · thompson's on-the-fly...

16
Robust and Fast Pattern Matching For Intrusion Detection Kedar Namjoshi and Girija Narlikar Bell Labs INFOCOM 2010

Upload: others

Post on 26-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

Robust and Fast Pattern MatchingFor Intrusion Detection

Kedar Namjoshi and Girija Narlikar

Bell Labs

INFOCOM 2010

Page 2: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

2 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Intrusion Detection in a picture

PACKET NO MATCH (OK)

MATCH (BAD)

BAD PACKET SPECIFICATION

Page 3: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

3 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Intrusion Detection in more detail

PACKET

BAD PACKET SPECIFICATION

Pre-process Stage 0 Stage 1 Stage 2

Stage 0: IP address/flow-based filter Stage 1: Keyword match (Aho-Corasick algorithm) Stage 2: Regex match

NO MATCH NO MATCH NO MATCH

OK OK

BAD

OK

Page 4: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

4 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

An unexpected performance glitch

Snort slows down to ONE packet/second!

The slowdown is in the regex filter.

Why?

Page 5: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

5 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Regular Expressions and Non-Deterministic Finite Automata (NFAs)

Regular expression syntax

R := ² | a ( a 2 Σ) | R;R | R+R | R*

Thompson (1968) gives a linear-time translation to NFAs by induction on syntax. E.g., NFA for R* is

Examples

(a*)* ... a sequence of a's

(a+b)*

a100 (a+b)* ... 100 a's somewhere in an {a,b} string

(a?)nan ... optionally n a's followed by n a's

NFA for R ² ²

²

Page 6: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

6 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Pattern Matching Algorithms

On-the-fly determinization [Thompson 1968](grep)

Regular Expression

NFA

DFA

Backtracking algorithm(pcre)

Convert to

Table lookup (lex)

Page 7: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

7 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Properties

Regular expression → NFA: linear

NFA → DFA: exponential space in worst case

Backtracking: exponential time in worst case

Thompson's algorithm (on-the-fly): |NFA| * |packet| worst-case,

|NFA|+|packet| in practice if memoized (Dragon Book)

Why use backtracking at all?

1. When packet matches expression, may do less work than Thompson's algorithm

2. Easy to adapt to back-references

Page 8: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

8 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Back-References

Express non-regular properties: (.*)\1 matches words of the form ww

Matches “abab” and “aa” but not “aaa” or “abba”

\1 is a back-reference whose match must be identical to the substring captured by the bracketed expression (.*)

Increasingly important for IDS. Data for Snort:

Page 9: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

9 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

The Problem in a Nutshell

Thompson's on-the-fly algorithm has linear complexity for regular expressions

Back-reference matching uses backtracking

Problem: a specification with back-references mixed up with bad regular expressions

> potentially exponential matching complexity

> IDS open to a low-frequency DOS attack

Q: Can one detect the potential for exponential behavior off-line?

> NP-hard

Our solution: Extend Thompson's on-the-fly determinization algorithm to back-references

Page 10: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

10 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

On-The-Fly matching for Regex's with Back-references

Thompson's algorithm records the set of active NFA states which may be reached after reading a prefix of the input packet

Our algorithm is similar, except that it records a set of configurations. A configuration is a triple (q,map,state) where

“q” is an NFA state

“map” is a (possibly partial) mapping of brackets to a substring of the packet

“state” indicates whether it is necessary to match a back-reference at q

The set of configurations is updated when

A new bracket is “opened” or “closed” – e.g., at “(“ and “)” in (.*)\1

A new input byte is read from the packet

Worst-case space complexity = #of possible configurations = |NFA| * |packet|2K , where K = #back-references

Worst-case time complexity = |space| * |packet|

Page 11: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

11 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Refining worst-case bounds

The worst-case bounds are rather loose. For instance

(ab)\1 + (cd)\2 has two backrefs, but only one is used on any run

In (ab)(cd)\1(ef)\2 , the first backref is not used anywhere after the first occurrence

In (ab)(R)\1, the backref can only be bound to “ab”

We detect such cases with “liveness analysis” of the backref-NFA (a technique originally developed in compiler optimization)

On Snort rules

120 distinct backref expressions; backref-NFA's have between 70 and 30,000 states

111 expressions have a single backreference

8 expressions have 6 backreferences but at most 2 live at any state

1 expression has 4 live backreferences but each has a fixed match, so the true space complexity is linear and not |NFA| * |packet|2*4

So the worst-case bounds are usually not reached in practice

Liveness information can be further used to optimize the matching algorithm by restricting configurations to live references

Page 12: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

12 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Experiments with Snort (I)

Page 13: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

13 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Experiments with Snort (II)

Backtracking does less work because the traces are “bad” traces (i.e., they match the specification)

For “good” traces, the work done by the deterministic algorithm is at most that required for backtracking

Page 14: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

14 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

Closely Related Work

K. Thompson, “Regular Expression Search Algorithm” [C.ACM 1968]

S. Crosby, “Denial of service through regular expressions” [USENIX 2003]

S. A. Crosby and D. S. Wallach, “Denial of Service via Algorithmic Complexity Attacks” [USENIX Security Symp. 2003]

R. Smith, C. Estan, S. Jha, “Backtracking algorithmic complexity attacks against a NIDS” [ACSAC, 2006]

M. Becchi and P. Crowley, “Extending finite automata to efficiently match Perl-compatible regular expressions” [CONEXT 2008]

Page 15: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

15 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

DPI and Model Checking

MATCH (BAD) / Program Error

PACKET /Program

BAD PACKET SPECIFICATION /Negated Correctness Property

NO MATCH (OK) / Program Verified

DPI is verification on a finite path! Expressiveness/compactness vs. model checking complexity trade-off for specification languages Temporal logic can be checked in linear time on a finite path (folklore) DPI could exploit such tradeoffs, both for flexibility and for robustness

Page 16: Robust and Fast Pattern Matching For Intrusion Detection€¦ · Thompson's on-the-fly algorithm has linear complexity for regular expressions Back-reference matching uses backtracking

16 | Presentation Title | Month 2009 Copyright © 2009 Alcatel-Lucent. All rights reserved. XXXXX

To conclude...

Unrestricted use of regular expressions (with or without back-references) can result in exponential matching complexity

This is a serious and real robustness problem

Deterministic algorithms provide robust guarantees

Automaton analysis can help detect bad cases at compile time

There is much more that could be done to analyze NFA structure

There is much more that could be done with specification languages, to exploit the trade-off between expressiveness/compactness vs. matching complexity