tfa: a tunable finite automaton for regular expression matching
DESCRIPTION
TFA: A Tunable Finite Automaton for Regular Expression Matching. Author: Yang Xu, Junchen Jiang, Rihua Wei, Yang Song and H . Jonathan Chao Publisher: ACM/IEEE ANCS, 2007 Presenter : Ching-Hsuan Shih Date: 2014/05/28. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/1.jpg)
TFA: A Tunable Finite Automaton for Regular Expression Matching
Author: Yang Xu, Junchen Jiang, Rihua Wei, Yang Song and H. Jonathan ChaoPublisher: ACM/IEEE ANCS, 2007 Presenter: Ching-Hsuan ShihDate: 2014/05/28
Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
![Page 2: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/2.jpg)
Outline Introduction Motivation Tunable Finite Automaton(TFA) Splitting NFA Active State Combinations State Encoding Performance Evaluation
2National Cheng Kung University CSIE Computer & Internet Architecture Lab
![Page 3: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/3.jpg)
Introduction (1/3)
Network Intrusion Detection System (NIDS)• Is a device or software to monitor the network whether
there are malicious activities.• Most IDS is to observe the network packet ,system log
or network flow. Regular Expression
• Current rule-sets like Snort, Bro, and many others are replacing strings with the more powerful and expressive regular expressions.
National Cheng Kung University CSIE Computer & Internet Architecture Lab
3
![Page 4: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/4.jpg)
Introduction (2/3)
Deterministic Finite Automatons (DFAs) and Non-deterministic Finite Automatons (NFAs) are two typical representations of regular expressions.
The main problem with DFAs is prohibitive memory usage:• The number of states in a DFA scale poorly with the size and number
of wildcards in the regular expressions they represent. An NFA represents regular expressions with much less
memory storage. However, this memory reduction comes with the price of a high and unpredictable memory bandwith requirement.
National Cheng Kung University CSIE Computer & Internet Architecture Lab
4
![Page 5: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/5.jpg)
Introduction (3/3)
In this paper, we propose Tunable Finite Automaton (TFA) with a small (larger than one) but bounded number of active states.
The main idea of TFA is to use a few TFA states to remember the matching status traditionally tracked by a single DFA state.
National Cheng Kung University CSIE Computer & Internet Architecture Lab
5
![Page 6: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/6.jpg)
Motivation (1/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
6
Regex :1. .*a.*b[ˆa]*c2. .*d.*e[ˆd]*f3. .*g.*h[ˆg]*i
Alphaset Σ ={a, b, ..., i}Number of states in DFA :54Number of states in NFA :10
Although the NFA requires much less memory, its memory bandwidth requirement is four times that of the DFA
![Page 7: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/7.jpg)
Motivation (2/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
7
![Page 8: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/8.jpg)
Motivation (3/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
8
![Page 9: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/9.jpg)
Motivation (4/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
9
We have seen the main reason for the DFA having far more states than the corresponding NFA is that the DFA needs one state for each NFA active state combination
One possible solution is to allow multiple automaton states (bounded by a given bound factor b) to represent each combination of NFA active states. We name it Tunable Finite Automaton (TFA).
![Page 10: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/10.jpg)
Tunable Finite Automaton (1/5)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
10
A. Constructing A TFAThe implementation of a TFA logically consists of two components : A TFA structure. Set Split Table (SST) : Each entry of the SST table corresponds to one
combination of NFA active states (i.e., a DFA state) recording how to split the combination into multiple TFA states.
![Page 11: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/11.jpg)
Tunable Finite Automaton (2/5)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
11
1. Generate the DFA states using the subset construction scheme [13]. The obtained DFA states provide us with all valid NFA active state combinations.
2. Split each NFA active state combination into up to b subsets, with the objective of minimizing the number of distinct subsets, and generate one TFA state for each distinct subset. After this step, we obtain the TFA state set QT and the set split table SST.
3. Decide the transition function δT . Different from traditional automatons, outgoing transitions of TFA states do not point to other TFA states. Instead, they point to a data structure called state label, which contains a set of NFA state IDs. Given a TFA state s, its state label associated with character “c” includes all NFA states that can be reached via character “c” from the NFA states associated with TFA state s.
4. Decide the set of initial states (I) and the set of accept states (FT ).
![Page 12: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/12.jpg)
Tunable Finite Automaton (3/5)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
12
![Page 13: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/13.jpg)
Tunable Finite Automaton (4/5)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
13
![Page 14: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/14.jpg)
Tunable Finite Automaton (5/5)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
14
B. Operating A TFA Assume the input string is “adegf ”. Initial active state : O1. a: return label {A,O}, next active states: OA2. d: return label {A,D,O}, next active states: O , AD3. e: return label {A,E,O}, next active states: O , AE4. g: return label {A,E,G,O}, next active states: OG , AE5. f return label {A ,F,G,O}, next active states: OG , AF6. AF is an accept state => match!
![Page 15: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/15.jpg)
Splitting NFA Active State Combinations (1/3)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
15
A. Set Split Problem (SSP) To find a minimal number of subsets from the NFA state set, so that for
any valid NFA active state combination, we can always find up to b subsets to exactly cover it.
b-SSP problem is an NP-hard problem for any b > 1. We present here a heuristic algorithm to solve the b-SSP problem.
![Page 16: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/16.jpg)
Splitting NFA Active State Combinations (2/3)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
16
B. A Heuristic Algorithm for 2-SSP Problem Given an NFA active state combination with v states, we consider only
two kinds of special splits:1. No split at all (i.e., one subset is empty).2. Splits that divide the combination into two subsets whose sizes are 1 and
v-1, respectively. The reason to use the second special split is that, after analyzing the NFA
active state combinations of many rule sets, we find many combinations of NFA active states differ from each other in only one NFA state.
![Page 17: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/17.jpg)
Splitting NFA Active State Combinations (3/3)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
17
![Page 18: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/18.jpg)
State Encoding
National Cheng Kung University CSIE Computer & Internet Architecture Lab
18
A simple scheme is to implement each state label as an array, including all associated NFA state IDs.• High storage cose.• TFA operation overhead.
Bit vector:• Find a way to assign each NFA state a bit vector, so that the bit vector
associated with each valid combination of NFA active states (i.e., each DFA state) must be unique.
• And the number of bits used in the bit vector is minimized.
![Page 19: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/19.jpg)
Performance Evaluation (1/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
19
![Page 20: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/20.jpg)
Performance Evaluation (2/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
20
![Page 21: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/21.jpg)
Performance Evaluation (3/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
21
![Page 22: TFA: A Tunable Finite Automaton for Regular Expression Matching](https://reader036.vdocuments.site/reader036/viewer/2022062315/568163a7550346895dd4b57e/html5/thumbnails/22.jpg)
Performance Evaluation (4/4)
National Cheng Kung University CSIE Computer & Internet Architecture Lab
22