A P2P Flow Identification Model Based on Bayesian Network


102062626 黃柏勛, first-year master's student, Computer Science

A P2P Flow Identification Model Based on Bayesian Network

Published in: Wireless Communications, Networking and Mobile Computing (WiCOM), 2011 7th International Conference on

Date of Conference: 23-25 Sept. 2011

Abstract

❖ 1. Construct a uniform P2P flow identification model, UFIM (Uniform Flow Identification Model).

❖ 2. Propose a way to describe UFIM abstractly using a Bayesian network model.

❖ 3. Define 6 measurements of identification performance.

❖ 4. Comparing theoretical analysis with experimental results shows that UFIM can abstractly describe various types of P2P flow identification methods.

❖ 5. Together, this work lays the foundation for designing new identification methods.

Introduction

❖ P2P flow identification methods can be sorted into 4 classes:

1. Port identification
2. Application-layer signature (characteristic word) identification
3. Transport-layer heuristic identification
4. Machine-learning identification

❖ Erman et al. used two datasets to compare 3 unsupervised clustering algorithms: K-means, DBSCAN, and AutoClass.

They compared accuracy and time cost, but not processing rate, real-time performance, or CPU and memory consumption.

❖ Although many P2P flow identification methods exist, detailed comparison and analysis of the different methods is still lacking.

Introduction

❖ This paper presents UFIM (Uniform Flow Identification Model) to describe different P2P flow identification methods, and gives a theory for describing UFIM abstractly with a Bayesian network.

❖ It groups current flow identification characteristics into two categories, “basic characteristics” and “statistical characteristics”, to reduce implementation complexity. => This gives a Bayesian-network-based method for constructing specific identification approaches.

❖ It defines 6 measurements for analyzing identification methods.

II. P2P FLOW IDENTIFICATION MODEL

❖ Current P2P flow identification methods differ in implementation approach but share the same essential characteristic: set mapping.

❖ Suppose $flows$ denotes the set of flows to be identified and classified, and $Y$ denotes the set of known application protocols. Then any identification method can be written as $F: flows \to Y$, namely a mapping from the flow set $flows$ to the application protocol set $Y$.
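To make the mapping view concrete, here is a minimal Python sketch of the simplest class from the introduction, port identification, written as a function $F: flows \to Y$. The `FlowRecord` type, the port table, and the `"unknown"` label are illustrative assumptions, not taken from the paper.

```python
from typing import Callable, Dict, NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical minimal flow record: 5-tuple fields only."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str  # "TCP" or "UDP"


# In UFIM terms, any identification method is a mapping F: flows -> Y.
IdentificationMethod = Callable[[FlowRecord], str]

# Illustrative port table (an assumption, not from the paper). Many P2P
# clients use random ports, which is why port matching alone is weak.
WELL_KNOWN_P2P_PORTS: Dict[int, str] = {
    6881: "BitTorrent",
    4662: "eDonkey",
    6346: "Gnutella",
}


def port_identify(flow: FlowRecord) -> str:
    """Port identification: map a flow to a protocol label in Y.

    Flows whose ports match nothing are labelled "unknown" (outside Y).
    """
    for port in (flow.dst_port, flow.src_port):
        if port in WELL_KNOWN_P2P_PORTS:
            return WELL_KNOWN_P2P_PORTS[port]
    return "unknown"


# Port identification is one concrete instance of F.
F: IdentificationMethod = port_identify
```

Other method classes (signatures, heuristics, machine learning) would plug in as different functions with the same `IdentificationMethod` type.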


II. P2P FLOW IDENTIFICATION MODEL

UFIM consists of 3 main parts:

(1) Characteristic set $X = \{A_1, A_2, \ldots, A_m\}$, where each $A_i$ is a random variable denoting a flow identification characteristic;

(2) Application protocol set $Y = \{y_1, y_2, \ldots, y_n\}$, where each $y_i$ is a vector of $m$ values corresponding to the $m$ random variables in $X$, and the vectors identify different application protocols;

(3) Mapping function $F$: for a given flow $flow_i$, $F$ determines the application protocol $y_k$ to which it belongs.


II. P2P FLOW IDENTIFICATION MODEL

Identification then proceeds as follows: (1) take a flow record $flow_i$ from the set $flows$; (2) construct a value vector $a = \{a_1^0, a_2^0, \ldots, a_m^0\}$ corresponding to the $m$ characteristics in $X$; (3) compare $a$ with the $n$ vectors in $Y$; and (4) output the application protocol $y_k$ with the highest similarity as the result.
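A minimal sketch of steps (3) and (4), assuming the characteristic vector $a$ from step (2) is already numeric. The transcript does not say which similarity measure is used; cosine similarity here is an assumption, as are the function names and the example protocol vectors.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def ufim_identify(a: Sequence[float], Y: Dict[str, Sequence[float]]) -> str:
    """Steps (3)-(4): compare a with every vector y_k in Y and return the
    protocol whose vector is most similar to a."""
    if any(len(y) != len(a) for y in Y.values()):
        raise ValueError("every y_k must have the same length m as a")
    return max(Y, key=lambda name: cosine_similarity(a, Y[name]))
```

For example, with hypothetical vectors `Y = {"BitTorrent": [1.0, 0.0, 0.2], "HTTP": [0.0, 1.0, 0.1]}`, the flow vector `[0.9, 0.1, 0.3]` is assigned to `"BitTorrent"`. Swapping in another similarity (e.g. negative Euclidean distance) only changes `cosine_similarity`.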

II. P2P FLOW IDENTIFICATION MODEL

The accuracy $A_{UFIM}$ of UFIM is determined by 3 parts:

① the set $Y$: its accuracy depends on how application protocols are classified and how accurate their vectors are, and is denoted $A_y$;

② $A_{flow}$: the accuracy of the characteristic value vector $a$ constructed for the unknown flow $flow_i$;

③ $A_f$: the accuracy of the mapping function $F$.



III. Bayesian network description

Background

1. Bayesian network

2. Basic and statistical characteristics

(Characteristic selection is important for identification.)

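Bayesian Network

- A Bayesian network is a probabilistic graphical model whose structure is a directed acyclic graph (DAG): each node is a random variable, and each edge encodes a direct conditional dependency.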
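As a hedged illustration of how a Bayesian network can play the role of $F$, the sketch below uses the simplest possible structure (an assumption; the paper's actual network is not recoverable from the transcript): the protocol node $Y$ is the only parent of every characteristic node $A_i$, so $P(Y, A_1, \ldots, A_m) = P(Y) \prod_i P(A_i \mid Y)$, which is the naive Bayes classifier. Characteristics are assumed to be discretized; the class name and smoothing parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


class NaiveBayesNetwork:
    """Bayesian network Y -> A_1, ..., Y -> A_m (assumed structure)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing strength
        self.class_counts: Counter = Counter()
        # counts[i][y][v] = number of training flows of class y with A_i == v
        self.counts: List[Dict[str, Counter]] = []
        self.values: List[set] = []  # observed values of each A_i

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[str]) -> "NaiveBayesNetwork":
        m = len(X[0])
        self.counts = [defaultdict(Counter) for _ in range(m)]
        self.values = [set() for _ in range(m)]
        for row, label in zip(X, y):
            self.class_counts[label] += 1
            for i, v in enumerate(row):
                self.counts[i][label][v] += 1
                self.values[i].add(v)
        return self

    def log_joint(self, row: Sequence[Hashable], label: str) -> float:
        """log P(Y=label) + sum_i log P(A_i=row[i] | Y=label)."""
        total = sum(self.class_counts.values())
        n_y = self.class_counts[label]
        lp = math.log(n_y / total)
        for i, v in enumerate(row):
            k = len(self.values[i]) + 1  # one extra slot for unseen values
            lp += math.log((self.counts[i][label][v] + self.alpha) / (n_y + self.alpha * k))
        return lp

    def predict(self, row: Sequence[Hashable]) -> str:
        """Return the protocol with the highest posterior (argmax of the joint)."""
        return max(self.class_counts, key=lambda label: self.log_joint(row, label))
```

A richer network would add edges between characteristics (for example, from a basic characteristic to the statistics derived from it); the argmax decision rule stays the same.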


III. Bayesian network description

--Basic characteristics are characteristics that can be extracted directly from a single message; each is denoted $A_i^0$, and together they form the basic characteristic set.

--Statistical characteristics are characteristics computed from the basic characteristics of multiple messages; each is denoted $A_i^j$, where $i$ indicates the underlying basic characteristic $A_i^0$.
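The basic/statistical split can be sketched as two functions: one reads $A_i^0$ values off a single packet, the other derives $A_i^j$ values from one basic characteristic over several packets of the same flow. The `Packet` fields and the particular statistics (mean, max, min, population standard deviation of packet length) are illustrative assumptions, not the 7 characteristics of Table Ⅰ.

```python
import statistics
from typing import Dict, List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record."""
    timestamp: float  # seconds
    length: int       # bytes
    dst_port: int


def basic_characteristics(pkt: Packet) -> Dict[str, float]:
    """Basic characteristics A_i^0: read directly from one packet."""
    return {"length": float(pkt.length), "dst_port": float(pkt.dst_port)}


def statistical_characteristics(pkts: List[Packet]) -> Dict[str, float]:
    """Statistical characteristics A_i^j (j >= 1): computed from the basic
    characteristic "length" across several packets of one flow."""
    lengths = [basic_characteristics(p)["length"] for p in pkts]
    return {
        "length_mean": statistics.mean(lengths),
        "length_max": max(lengths),
        "length_min": min(lengths),
        "length_pstdev": statistics.pstdev(lengths),
    }
```

Because statistical characteristics need several packets, they directly raise the packet count a method must wait for before it can decide.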

III. Bayesian network description

After studying existing identification methods and the 248 characteristics described in [10], we selected 7 basic characteristics, listed in Table Ⅰ.



IV. Performance measurements

Def. 1: flow identification rate T: the maximum number of packets needed to identify a flow, since f must construct every identification characteristic from the captured packets before deciding:

$T = \max_i n_i$,

where $n_i$ denotes the number of packets needed to construct characteristic $X_i$.
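Read this way, T is a single max over per-characteristic packet counts. A minimal sketch, with a hypothetical function name and hypothetical characteristic names:

```python
from typing import Dict


def flow_identification_rate(packets_needed: Dict[str, int]) -> int:
    """T = max_i n_i: f can only decide once every characteristic X_i
    has been constructed, so it must wait for the slowest one."""
    if not packets_needed:
        raise ValueError("need at least one characteristic")
    return max(packets_needed.values())
```

For example, `{"dst_port": 1, "payload_signature": 1, "mean_length": 5}` gives T = 5: one statistical characteristic over 5 packets dominates.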

IV. Performance measurements

Def. 2: protocol distinguishing rate I: suppose f determines the application protocol of a flow with a certain conditional probability; then I denotes the probability that a flow is misidentified, that is, the proportion of misidentified flows among all flows.

IV. Performance measurements

Def. 3: characteristic offset W: packets belonging to χ are regarded as unknown flows, and W denotes the proportion of unknown flows among all flows.
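Given a labelled test set, I and W can be estimated directly as proportions. In this sketch (an assumption about bookkeeping, not the paper's code) a flow that f leaves as `"unknown"` counts only toward W, and a flow assigned to a wrong known protocol counts only toward I; the function name is hypothetical.

```python
from typing import Sequence, Tuple

UNKNOWN = "unknown"  # hypothetical label for flows that match no y_k


def misjudgment_and_offset(
    truth: Sequence[str], predicted: Sequence[str]
) -> Tuple[float, float]:
    """Return (I, W) over a labelled test set.

    I: fraction of all flows assigned a wrong, known protocol.
    W: fraction of all flows left as unknown.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("need equally long, non-empty label lists")
    n = len(truth)
    unknown = sum(1 for p in predicted if p == UNKNOWN)
    wrong = sum(1 for t, p in zip(truth, predicted) if p != UNKNOWN and p != t)
    return wrong / n, unknown / n
```

With truth `["BT", "BT", "HTTP", "eD"]` and predictions `["BT", "HTTP", "unknown", "eD"]`, one flow is misjudged and one is unknown, so I = W = 0.25.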

Def. 4: identification robustness H: whether the correctness of f depends on the order in which packets arrive.
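H is stated qualitatively. One way to probe it empirically, assumed here rather than taken from the paper, is to shuffle a flow's packets and check whether f's verdict changes. A passing check is evidence, not proof, of order robustness; `is_order_robust` and its parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def is_order_robust(
    f: Callable[[Sequence[P]], str],
    packets: Sequence[P],
    trials: int = 20,
    seed: int = 0,
) -> bool:
    """Empirical check for H: does f's verdict stay the same when the
    packets of one flow arrive in a different order?"""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    baseline = f(list(packets))
    for _ in range(trials):
        shuffled: List[P] = list(packets)
        rng.shuffle(shuffled)
        if f(shuffled) != baseline:
            return False
    return True
```

A classifier that uses only order-free statistics (such as the maximum packet length) always passes; one that inspects only the first packet would typically fail as soon as a shuffle moves a different packet to the front.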

IV. Performance measurements

Def. 5: flow identification time cost L: the time needed to identify a flow, equal to the time complexity of f.

Def. 6: flow identification space S: the memory space f needs to identify a flow, equal to the space complexity of f.
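L and S are defined as complexities, which are analytic properties of f. As a rough empirical companion (an assumption, not the paper's procedure), one call of f can be timed with `time.perf_counter` and its peak Python-heap allocation read from `tracemalloc`; note that `tracemalloc` only sees memory allocated through Python's allocator. The function name is hypothetical.

```python
import time
import tracemalloc
from typing import Callable, Sequence, Tuple, TypeVar

P = TypeVar("P")


def measure_cost(f: Callable[[Sequence[P]], str], packets: Sequence[P]) -> Tuple[float, int]:
    """Return (seconds, peak bytes) for one call of f: empirical proxies
    for L (time) and S (space), not the asymptotic complexities themselves."""
    tracemalloc.start()
    try:
        t0 = time.perf_counter()
        f(packets)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return elapsed, peak
```

Repeating the measurement over flows of increasing length gives a rough picture of how the measured cost grows, which can then be compared with the stated complexity.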

IV. Performance measurements

T reflects the real-time performance of f;

I and W reflect the correctness of f;

H reflects the robustness of f;

L and S reflect the complexity of f.


V. Experiment Analysis

- F denotes the number of misidentified P2P flows, including both false negatives and false positives.

- U denotes the number of unknown flows.

[A,B] denotes a set of testing data.

V. Experiment Analysis

- I and W denote the protocol misjudgment rate and the characteristic offset, respectively.

[A,B] denotes a set of testing data.

V. Experiment Analysis

From the identification results, the proportions of F and U in the P2P traffic match I and W, respectively, so I and W can be used to represent identification accuracy.


VI. Summary and My report

- With a uniform model, we can analyze and compare the performance of different identification methods.

- Math is important.

Questions?

By Bee