shock: a worst-case ensured sub-linear time pattern matching algorithm for inline anti-virus...

15
Ensured Sub-linear Time Pattern Matching Algorithm for Inline Anti-Virus Scanning Author: Nen-Fu Huang, Wen-Yen Tsai Publisher: IEEE ICC ,2010 Presenter: Kai-Yang, Liu Date: 2012/1/4

Upload: claud-skinner

Post on 25-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

SHOCK: A Worst-Case EnsuredSub-linear Time Pattern Matching Algorithm for Inline Anti-Virus Scanning

Author:

Nen-Fu Huang, Wen-Yen Tsai

Publisher: IEEE ICC ,2010

Presenter: Kai-Yang, Liu

Date: 2012/1/4

INTRODUCTION•Challenges of an inline multi-pattern

matching algorithm:

Must be fast enough to scan millions of packets in the gigabit environment.

It is desirable for small memory footprint of the algorithm to scale well for the ever-growing virus patterns.

must perform well under a high volume of virus-infected traffic to avoid becoming the bottleneck.

2

ClamAV•ClamAV provides an anti-virus engine and

a regularly updated virus database.•ClamAV virus signatures can be classified

as one of the four categories: basic, regular expression (regex), MD5, and others.

3

Basic Patterns

•long minimum (>= 10 bytes )•average pattern length >= 25 bytes

4

The Proposed SHOCK Algorithm

•SHOCK(Shift/Hash with Overlap Check) algorithm consists of an offline preprocessing phase and an online pattern matching phase.

•The shift table is constructed using the same approach as in the WM algorithm with block size two and we calculate the hash value of the 2-byte prefix of each pattern.

5

Example

m = 4 B = 2

totorose

6

ot 0

to 0

se 0

oo 1

os 1

ro 2

•When a matched pattern is found, there may be another consecutive pattern in the text with prefix overlapping suffix of the currently matched one.

7

Example

totorose

8

ot 0

to 0

se 0

oo 1

os 1

ro 2

•For a pattern to be stored in the nextPat list of the current pattern, the number of its prefix characters which overlap suffix of the current pattern must be greater than or equal to PSP_TH.

9

•Although only a quite small number of patterns has long nextPat list when PSP_TH = 8 ,they must be specially handled to avoid the worst-case scenario.

10

Bitmap-offset-indexing structure

•only for those patterns with nextPat list length greater than the parameter, PBMAP_TH

11

Example

PSP_TH = 3

back_sh = PSP_TH -1 = 2

totorose

12

EXPERIMENTAL RESULTS

13

EXPERIMENTAL RESULTS

14

EXPERIMENTAL RESULTS

15