Classification of Applications in HTTP Tunnels By Gajen Piraisoody, Changcheng Huang ,Biswajit Nandy, Nabil Seddigh Electrical and Computer Engineering Carleton University. Ottawa, ON. Canada. 12 November 2013

Classification of Applications in HTTP Tunnels

By

Gajen Piraisoody, Changcheng Huang, Biswajit Nandy, Nabil Seddigh

Electrical and Computer Engineering, Carleton University, Ottawa, ON, Canada

12 November 2013

Slide 2

Outline

• Overview

• Motivation

• Problem Statement

• Contribution

• Approach to classification

• Evaluation

• Conclusion

Slide 3

Overview – HTTP Tunnel

What is HTTP Tunnelled Traffic?

• The HTTP port is used to carry web traffic

• Non-HTTP applications are wrapped in the HTTP protocol

• The HTTP port now also tunnels email, chat, video, image, audio, file-transfer and peer-to-peer traffic

Why tunnel non-HTTP applications over HTTP?

• HTTP clients (browsers) are readily available and deployable

• Tunneling lets applications bypass restricted network connectivity imposed by firewalls, proxies and NAT

Slide 4

Motivation

HTTP Traffic Classification

• HTTP accounts for about 80% of the traffic in an entire network

• HTTP tunneled traffic is not identifiable by ports alone

• Tunneled traffic such as YouTube and Netflix is increasing in cloud networks

• Information on tunneled traffic helps cloud-centre management with planning, provisioning and ensuring quality of service

Why flow-based rather than DPI-based classification?

• Provides a scalable software solution (less CPU consumption)

• Can classify encrypted data

Slide 5

Problem Statement

Given network traffic measured with NetFlow

Find a way to classify HTTP tunnelled traffic

• Audio (Radio & Music), Video and File-transfer

No training dataset needed for the proposed algorithm

Use information available from NetFlow only

Slide 6

Contribution

Proposed scheme classifies HTTP tunneled traffic: audio (radio & music), video and file-transfer

Proposed scheme helps audio classification by using the ‘occupancy’ feature

Proposed scheme enhances classification performance by including the flow-group found using flows from Content Servers (subnet-masked IP of the long-flow)

Slide 7

Approach in detail

Identify long-flow HTTP traffic (Parameter: BPF)

Classify radio traffic (Parameters: BPF, BPP, BPS, Occupancy)

Classify music traffic (Parameters: BPF, BPP, BPS, Occupancy)

Classify video traffic (Parameters: BPF, BPP, BPS, Flow-group)

Classify file-transfer traffic (Parameters: BPF, BPP, BPS, Flow-group)

BPS = bytes-per-second, BPF = bytes-per-flow, BPP = bytes-per-packet
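
The three flow attributes can be derived from the byte count, packet count and timestamps of a flow record. The following is a minimal Python sketch, assuming a hypothetical `FlowRecord` type whose field names are illustrative and are not taken from the slides or from any NetFlow library. Counting flows shorter than one second as one second is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical flow record; field names are illustrative assumptions."""
    start_s: float   # flow start time in seconds
    end_s: float     # flow end time in seconds
    n_bytes: int     # total bytes carried by the flow
    n_packets: int   # total packets carried by the flow


def bytes_per_flow(flow: FlowRecord) -> int:
    """BPF: total size of the flow in bytes."""
    return flow.n_bytes


def bytes_per_packet(flow: FlowRecord) -> float:
    """BPP: average packet size; 0.0 for a flow with no packets."""
    return flow.n_bytes / flow.n_packets if flow.n_packets else 0.0


def bytes_per_second(flow: FlowRecord) -> float:
    """BPS: average rate over the flow duration."""
    # Assumption: flows shorter than one second count as one second.
    duration = max(flow.end_s - flow.start_s, 1.0)
    return flow.n_bytes / duration


example = FlowRecord(start_s=0.0, end_s=60.0, n_bytes=960_000, n_packets=700)
print(bytes_per_flow(example), bytes_per_packet(example), bytes_per_second(example))
```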

Slide 8

Approach to Classification

Identify Long-flow HTTP Traffic

Classify Audio Traffic

Classify Video & File-transfer Traffic

Slide 9

Identify Long-flow HTTP Traffic

Identifying HTTP Traffic

A long-flow has a byte size larger than a threshold

Audio, video and file-transfer flows are generally long-flows

HTTP_PORTS: 80, 443, 1935, 8008, 8080, 8088, 8090
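
The filtering step can be sketched as follows. The port list is the one given on the slide. The threshold value, the dictionary keys and the decision to accept a match on either the source or the destination port are assumptions made for illustration, since this slide does not state them.

```python
# Ports treated as HTTP, as listed on the slide.
HTTP_PORTS = {80, 443, 1935, 8008, 8080, 8088, 8090}

# Assumption: placeholder long-flow threshold; the slide only says "a threshold".
LONG_FLOW_MIN_BYTES = 500_000


def is_http(src_port: int, dst_port: int) -> bool:
    """A flow counts as HTTP if either endpoint uses one of the HTTP ports."""
    return src_port in HTTP_PORTS or dst_port in HTTP_PORTS


def is_long_flow(n_bytes: int, threshold: int = LONG_FLOW_MIN_BYTES) -> bool:
    """A long-flow has a byte size larger than the threshold."""
    return n_bytes > threshold


def long_http_flows(flows):
    """Keep only the long-flow HTTP records from a list of dict records."""
    return [
        f for f in flows
        if is_http(f["src_port"], f["dst_port"]) and is_long_flow(f["bytes"])
    ]


flows = [
    {"src_port": 80, "dst_port": 51234, "bytes": 4_000_000},   # long HTTP flow
    {"src_port": 443, "dst_port": 51235, "bytes": 2_000},      # short HTTP flow
    {"src_port": 22, "dst_port": 51236, "bytes": 9_000_000},   # long, not HTTP
]
print(long_http_flows(flows))
```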

Slide 10

Approach

Identify Long-flow HTTP Traffic

Classify Audio Traffic

Classify Video & File-transfer Traffic

Slide 11

Classify Audio Traffic

99.4% of radio rates are between 20 and 320 Kbps (statistics from 3683 online radio web sites)

98% of online music rates are between 64 and 320 Kbps (statistics from more than 20 online music sites)

The 95% confidence interval of radio bytes-per-packet is 900 to 1470 (Samruay et al. [1])

The 95% confidence interval of music bytes-per-packet is 1260 to 1500 (Samruay et al. [1])
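
These ranges can be read as a first-pass filter on a flow's rate and packet size. The sketch below uses the ranges quoted above. The function names, the profile layout and the conversion of "Kbps" as 1000 bits per second are assumptions. Because the radio and music ranges overlap, a flow can match both profiles, so rate and packet size alone do not separate the two audio types.

```python
# Ranges quoted on the slide: rate in Kbps, packet size in bytes-per-packet.
RADIO = {"kbps": (20, 320), "bpp": (900, 1470)}
MUSIC = {"kbps": (64, 320), "bpp": (1260, 1500)}


def to_kbps(bytes_per_second: float) -> float:
    """Convert bytes per second to kilobits per second (1 Kbps = 1000 bit/s)."""
    return bytes_per_second * 8 / 1000


def matches(profile: dict, bytes_per_second: float, bytes_per_packet: float) -> bool:
    """True if both the rate and the packet size fall inside the profile."""
    rate_lo, rate_hi = profile["kbps"]
    bpp_lo, bpp_hi = profile["bpp"]
    return (rate_lo <= to_kbps(bytes_per_second) <= rate_hi
            and bpp_lo <= bytes_per_packet <= bpp_hi)


def audio_candidates(bytes_per_second: float, bytes_per_packet: float) -> list:
    """Return the audio profiles that the flow attributes are consistent with."""
    names = []
    if matches(RADIO, bytes_per_second, bytes_per_packet):
        names.append("radio")
    if matches(MUSIC, bytes_per_second, bytes_per_packet):
        names.append("music")
    return names


print(audio_candidates(16_000, 1400))   # 128 Kbps, 1400 BPP
print(audio_candidates(4_000, 1000))    # 32 Kbps, 1000 BPP
print(audio_candidates(100_000, 1400))  # 800 Kbps, 1400 BPP
```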

  

Slide 12

Classify Audio Traffic

Behavioral analysis: an online audio listener typically listens to audio for more than 5 minutes

There are two distinct audio types: Radio & Music (songs)

New concept: occupancy helps classify audio. Occupancy is the ratio of the flow duration to the entire duration of a chunk of time.
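
One way to compute occupancy is sketched below: the time covered by the flows inside an observation window is divided by the window length. Clipping flows to the window and merging overlapping flows so that no second is counted twice are assumptions. The slides define only the ratio itself.

```python
def occupancy(flows, window_start: float, window_end: float) -> float:
    """Fraction of the window [window_start, window_end] covered by flows.

    flows is an iterable of (start, end) times in seconds.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have a positive length")

    # Clip every flow to the window and drop flows that lie outside it.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in flows
        if end > window_start and start < window_end
    )

    covered = 0.0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Gap before this flow: close the previous covered interval.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent flow: extend the covered interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return covered / window


# Back-to-back flows fill the window; short bursts leave most of it idle.
print(occupancy([(0, 120), (120, 240), (240, 300)], 0, 300))
print(occupancy([(0, 30), (200, 230)], 0, 300))
```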

  

[Figure: Average download rate (Mbps), y-axis 0 to 6, for music (Grooveshark), radio (Hdradio) and video (CTV)]

Slide 13

Classify Audio Traffic

Difference between Radio & Music

Continuous - Radio content appears to download during every second of the flow

Dirac - Songs in a playlist are downloaded & played one at a time

The max/min size of a radio flow depends on the maximum flow-period configuration and the offered radio rates

The max/min size of a music flow depends on the max/min song duration and the offered online music rates

The 95% confidence interval of radio occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [82%, 100%]

The 95% confidence interval of music occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [0%, 55%]

Assumption: the minimum number of radio-flows is two (5 minutes at least)

Assumption: the minimum number of music-flows is two (5 minutes at least)

Assumption: the maximum radio-phase timeout is based on the flow-period (120 seconds)

Assumption: the maximum music-phase timeout is based on the maximum song duration (382 seconds)
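
The occupancy intervals and timeouts above suggest a simple decision rule, sketched below. The constants are the values quoted on the slide. Grouping flows into a phase by the gap between them, measuring occupancy over the span of the phase, and returning "unknown" for occupancy between 55% and 82% are assumptions about details the slides leave open.

```python
RADIO_MIN_OCCUPANCY = 0.82     # lower bound of the radio 95% interval
MUSIC_MAX_OCCUPANCY = 0.55     # upper bound of the music 95% interval
RADIO_PHASE_TIMEOUT_S = 120    # flow-period
MUSIC_PHASE_TIMEOUT_S = 382    # maximum song duration
MIN_FLOWS_PER_PHASE = 2


def split_into_phases(flows, timeout_s: float):
    """Group (start, end) flows into phases separated by gaps over timeout_s."""
    phases = []
    for start, end in sorted(flows):
        if phases and start - max(e for _, e in phases[-1]) <= timeout_s:
            phases[-1].append((start, end))
        else:
            phases.append([(start, end)])
    return phases


def label_phase(phase) -> str:
    """Label one phase as radio, music or unknown from its occupancy."""
    if len(phase) < MIN_FLOWS_PER_PHASE:
        return "unknown"
    span = max(e for _, e in phase) - min(s for s, _ in phase)
    if span <= 0:
        return "unknown"
    # Assumption: flows inside a phase do not overlap each other.
    busy = sum(e - s for s, e in phase)
    occ = min(busy / span, 1.0)
    if occ >= RADIO_MIN_OCCUPANCY:
        return "radio"
    if occ <= MUSIC_MAX_OCCUPANCY:
        return "music"
    return "unknown"


radio_like = [(0, 120), (120, 240), (240, 360)]
music_like = [(0, 30), (200, 230), (400, 430)]
print(label_phase(radio_like), label_phase(music_like))
```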

Slide 14

Approach

Identify Long-flow HTTP Traffic

Classify Audio Traffic

Classify Video & File-transfer Traffic

Slide 15

Background

• Multimedia Distribution (3 types)

[Figure: multimedia delivery through a CDN. Entities shown: Client, Server, HTTP Server (listening), Web Browser, Media Player, CDN’s Authoritative DNS Server, CDN_1 to CDN_n. Message sequence:]

1) Client clicks on audio/video hyperlink

2) Metafile sent to client

3) Metafile

4) Request multimedia content

5) Responds with CDN site

6) From DNS lookup, request sent to CDN admin

7) Responds with address of all contents on all CDNs

8) Request multimedia content 1

9) Request multimedia content 2

10) Content1

11) Content2

Slide 16

Classify Video & File-transfer Traffic

Video flow-attributes (bytes-per-packet, bytes-per-flow, download rates) & the flow-group technique (FG) are used to classify video & file-transfers

Flow-group (FG)

• A video flow is associated with meta-data, style sheets and advertisements

• Kei et al. [3] defined FG as the number of flows that occur within a few seconds of the video-flow with the same destination-IP address

• Our expanded flow-group also includes flows that occur within a longer duration and have the same subnet-masked source-IP address and the same destination-IP address

  

Slide 17

An Example

[Figure: Flow Size. y-axis: Log10(Bytes), 0 to 8; x-axis: flow-index, 1 to 36]

[Figure: Flow Duration. y-axis: Time (Seconds), 0 to 90; x-axis: flow-index, 1 to 36]

Slide 18

Example cont'd

[Figure: Bytes-per-packet. y-axis: 0 to 1600; x-axis: Flow Index, 1 to 36]

[Figure: Type of Flow (video-flow, flow-group, signal-flow). x-axis: flow-index, 0 to 36]

Slide 19

Classify Video & File-transfer Traffic

[Figure: Flow-group range (seconds) relative to the start of the video-flow at 0, marked at -60, -4, 0, 1 and 10]

Kei et al.'s flow-group: 98% of the flow-group is within 4 seconds before the video-flow and 97.8% is within 1 second after the video-flow

Improved flow-group: 94.4% of the flow-group is within 60 seconds before the video-flow and 94.1% is within 10 seconds after the video-flow

All flow-group statistics are estimated from datasets DS-4 and DS-5

• 92.6% of flow-group bytes-per-flow values are above 1000 and below 500000

• Almost 100% of flow-group bytes-per-packet values are above 200

Slide 20

Classify Video & File-transfer Traffic

Start

1) Gather potential V/F (video or file-transfer) flows:

• flow > 0.5 MB

• & > 1260 bytes-per-pkt

• & > 128 Kbps

• & order by destination-IP and flow start time

2) Original flow-group (green in the slide): for every potential V/F flow, gather potential flow-group (FG) flows when:

• FG flow start-time > V/F start-time – 4

• & FG flow start-time < V/F start-time + 1

• & FG flow and V/F have the same dest-IP

• & FG flow between 1000 B and 0.5 MB

• & FG flow between 200 and 1500 BPP

If FG == true: inc FG counter

3) Improved flow-group (yellow in the slide): for the V/F-phase, gather potential FG flows:

• Same source IP address-subnet

• Same destination IP address

• & FG flow start-time > V/F start-time – 60

• & FG flow start-time < V/F start-time + 10

• & FG flow between 1000 B and 0.5 MB

• & FG flow between 200 and 1500 BPP

If FG == true: inc FG counter

4) If FG > 0: label video; else: label file-transfer

End

Both flow-group tests are run.
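
The flowchart can be sketched in Python as shown below. The thresholds and time windows are the ones on the slide. The `Flow` type and its field names, the /24 subnet mask, reading 0.5 MB as 500000 bytes, and counting a flow once when it passes either flow-group test are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass

SUBNET_PREFIX = 24   # assumption: the slides do not give the mask length
HALF_MB = 500_000    # assumption: 0.5 MB read as 500000 bytes


@dataclass
class Flow:
    """Hypothetical flow record; field names are illustrative assumptions."""
    src_ip: str
    dst_ip: str
    start_s: float
    duration_s: float
    n_bytes: int
    n_packets: int

    @property
    def bpp(self) -> float:
        return self.n_bytes / self.n_packets if self.n_packets else 0.0

    @property
    def kbps(self) -> float:
        # Assumption: flows shorter than one second count as one second.
        return self.n_bytes * 8 / 1000 / max(self.duration_s, 1.0)


def subnet(ip: str):
    """Network that contains ip after applying the subnet mask."""
    return ipaddress.ip_network(f"{ip}/{SUBNET_PREFIX}", strict=False)


def is_vf_candidate(f: Flow) -> bool:
    return f.n_bytes > HALF_MB and f.bpp > 1260 and f.kbps > 128


def is_fg_sized(f: Flow) -> bool:
    return 1000 <= f.n_bytes <= HALF_MB and 200 <= f.bpp <= 1500


def in_original_fg(fg: Flow, vf: Flow) -> bool:
    return (is_fg_sized(fg) and fg.dst_ip == vf.dst_ip
            and vf.start_s - 4 < fg.start_s < vf.start_s + 1)


def in_improved_fg(fg: Flow, vf: Flow) -> bool:
    return (is_fg_sized(fg) and fg.dst_ip == vf.dst_ip
            and subnet(fg.src_ip) == subnet(vf.src_ip)
            and vf.start_s - 60 < fg.start_s < vf.start_s + 10)


def classify(flows):
    """Return (flow, label) pairs for every potential V/F flow."""
    candidates = sorted((f for f in flows if is_vf_candidate(f)),
                        key=lambda f: (f.dst_ip, f.start_s))
    results = []
    for vf in candidates:
        fg_count = sum(1 for f in flows if f is not vf
                       and (in_original_fg(f, vf) or in_improved_fg(f, vf)))
        results.append((vf, "video" if fg_count > 0 else "file-transfer"))
    return results


flows = [
    Flow("203.0.113.10", "192.0.2.1", 100, 60, 6_000_000, 4200),     # video-like
    Flow("203.0.113.20", "192.0.2.1", 70, 1, 20_000, 40),            # companion flow
    Flow("198.51.100.7", "192.0.2.2", 500, 100, 50_000_000, 35000),  # lone download
]
for flow, label in classify(flows):
    print(flow.src_ip, label)
```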

Slide 21

Evaluation

• Datasets used to test algorithms

• Accuracy measurement assessment

  • Precision is the system's correct predictions against all predicted values. That is, precision = TP / (TP + FP)

  • Recall is the system's correct predictions against all actual correct values. That is, recall = TP / (TP + FN)

  • F-Measure is the harmonic mean of recall and precision. That is, F-measure = 2 * Precision * Recall / (Precision + Recall)

  • Accuracy is the proportion of true results. That is, accuracy = (TP + TN) / (TP + FP + FN + TN)

• Compare against other algorithms

  • NaïveBayes

  • SVM (Support Vector Machine)
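
The four measures can be computed from confusion-matrix counts as in the short sketch below. The counts in the example are made up for illustration and are not results from the evaluation. Returning 0.0 when a denominator is zero is an assumption.

```python
def precision(tp: int, fp: int) -> float:
    """Correct positive predictions over all positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Correct positive predictions over all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions (positive and negative) over all predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn, tn = 80, 20, 10, 90
p, r = precision(tp, fp), recall(tp, fn)
print(f"precision={p:.3f} recall={r:.3f} "
      f"f_measure={f_measure(p, r):.3f} accuracy={accuracy(tp, tn, fp, fn):.3f}")
```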

Slide 22

Evaluation – Datasets

                     SME-6         SME-7         SME-8
Date                 1/7/2013      1/22/2013     1/23/2013
Duration (s)         24723         28207         13628
Start-time (GMT-5)   10:18:04      10:29:04      10:56:20
Flows                249822        287616        198409
Packets              13376109      15351639      10170693
Bytes                11158181285   13589511746   8728052938
HTTP Flows           75485         87181         63951
HTTP Packets         7346663       8814438       5628558
HTTP Bytes           10456335955   12545720613   7982629610
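
As a quick arithmetic check on the table, the share of HTTP traffic in each dataset can be computed from the totals. In this sketch the numbers are copied from the table and the dictionary layout is an assumption. On these figures HTTP carries roughly 91% to 94% of the bytes but only about 30% to 32% of the flows.

```python
# Totals copied from the dataset table above.
datasets = {
    "SME-6": {"flows": 249822, "http_flows": 75485,
              "bytes": 11158181285, "http_bytes": 10456335955},
    "SME-7": {"flows": 287616, "http_flows": 87181,
              "bytes": 13589511746, "http_bytes": 12545720613},
    "SME-8": {"flows": 198409, "http_flows": 63951,
              "bytes": 8728052938, "http_bytes": 7982629610},
}


def http_share(dataset: dict, unit: str) -> float:
    """Fraction of the dataset's traffic that is HTTP, by 'flows' or 'bytes'."""
    return dataset[f"http_{unit}"] / dataset[unit]


for name, dataset in datasets.items():
    print(f"{name}: HTTP is {http_share(dataset, 'bytes'):.1%} of bytes "
          f"and {http_share(dataset, 'flows'):.1%} of flows")
```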

Slide 23

Evaluation – Results

[Figure: F-Measure bar chart comparing NaiveBayes, SVM and the Proposed Algorithm for SME6-Audio, SME6-File, SME6-Video, SME7-Audio, SME7-File, SME7-Video, SME8-Audio, SME8-File and SME8-Video]

Data labels, in the order in which they appear on the chart:

27.5%, 59.5%, 39.4%, 56.1%, 79.7%, 70.8%, 66.5%, 64.0%, 86.6%

16.8%, 23.2%, 42.6%, 21.6%, 12.5%, 40.4%, 60.4%, 49.1%, 43.1%

84.9%, 60.8%, 72.9%, 93.0%, 93.6%, 82.5%, 85.1%, 89.7%, 94.2%

Slide 24

Evaluation – Results

Accuracy

                     SME-6    SME-7    SME-8
NaiveBayes           39.1%    73.5%    71.4%
SVM                  17.8%    16.3%    42.0%
Proposed Algorithm   70.5%    89.9%    90.9%

[Figure: Accuracy bar chart of the values in the table above]

Slide 25

Conclusion

• Proposed algorithm uses a flow-based approach and classifies a high percentage of tunneled traffic: audio, video and file-transfer

• Proposed audio algorithm:

  • Used a concept called occupancy to classify radio & music traffic

• Proposed video & file-transfer algorithm:

  • Used the improved flow-group method to help increase classification accuracy of video and file-transfer traffic

• Proposed scheme’s F-measure is at least 10% more than NaiveBayes and SVM

Slide 26

References

[1] Samruay Kaoprakhon and Vasaka Visoottiviseth, "Classification of Audio and Video Traffic over HTTP Protocol," in 9th International Symposium on Communications and Information Technology (ISCIT 2009), Sept. 2009.

[2] M. Twardos, "The Information Diet," 2011. [Online]. Available: http://theinformationdiet.blogspot.ca/2011/11/probability-distribution-of-song-length.html. [Accessed 2013].

[3] K. Takeshita, T. Kurosawa, M. Tsujino and M. Iwashita, "Evaluation of HTTP Video Classification Method Using Flow Group Information," in 14th International Telecommunications Network Strategy and Planning Symposium (NETWORKS), Sept. 2010.

[4] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos and K. Lee, "Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices," in ACM CoNEXT, 2008.

[5] D. M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, 2011, pp. 37-63.