tor packet analysis - locating identifying markers

TOR Packet Analysis 2010


TOR PACKET ANALYSIS:

LOCATING IDENTIFYING MARKERS

BRENT MUIR


ABSTRACT

This paper examines the traffic analysis of “The Onion Router” (TOR) network in

order to identify any markers of TOR usage on the network packets. A historical

overview of anonymity systems on the internet is provided. A detailed examination

of the TOR system is also conducted discussing its development, its features, its

limitations and its weaknesses. The methodology utilised to locate any TOR

identifying markers is a packet comparison of otherwise identical TOR and non-TOR network

packets. A high-level and a low-level traffic analysis are conducted, resulting in some

TOR markers being identified. These results are put into a law enforcement context

in order for a forensic analysis of TOR network packets to take place.

Recommendations are given regarding the usage of TOR to mitigate the behavioural

actions of users that have inadvertently violated their anonymity.



TABLE OF CONTENTS

ABSTRACT

INTRODUCTION

ANONYMITY ON THE INTERNET

TOR (THE ONION ROUTER)

METHODOLOGY

    TOR CLIENT

    PACKET COMPARISON

    PACKET CAPTURE & ANALYSIS

RESULTS

CONCLUSION/RECOMMENDATIONS

REFERENCE LIST



INTRODUCTION

Various technologies exist that assist internet users in maintaining their anonymity

while online. One of the most common technologies that allows for anonymity is

“TOR” (The Onion Router). This paper will examine the extent of anonymity that TOR

provides when the network traffic is subject to traffic analysis techniques.

Specifically, through the analysis of network packets, it is the hypothesis that TOR

traffic will be distinguishable from regular internet traffic. Through traffic analysis

and social engineering it is theorised that the originating IP address can still be learnt

from the remnants of the TOR network traffic.

Before discussing the analysis of TOR traffic, firstly anonymity on the internet will be

explained providing a brief background into the different techniques that have been

used since the internet was invented. Secondly, TOR itself will be discussed

explaining the technology behind the onion router and how it provides anonymity.

Next, the methodology utilised to test TOR’s ability to provide anonymity will be

explained, including traffic capture and analysis techniques. The results of the traffic

analysis will be detailed next providing an insight into TOR’s use as an anonymity tool

on the internet. Lastly, some recommendations will be given regarding the use of

TOR as an anonymity tool, and the analysis of TOR traffic in a law enforcement

context.



ANONYMITY ON THE INTERNET

The development of inter-networking that led to the “internet” was not designed for

the mass-usage that it currently facilitates1. Many features that users take for

granted were not specified, for example encryption, and were subsequently “tacked-

on” to a system that couldn’t initially support it, rather than building it from the

ground-up with these features included2. One feature that was never envisaged was

anonymity for its users3. With the exclusion of anonymity by default on the internet

many systems have been designed to fill this gap, for example proxy servers and MIX

networks, yet the network traffic these systems attempt to anonymise still relies on

the same network infrastructure to send and receive packets4, thus leading to a

misconception that these systems provide full-anonymity on the internet.

Before discussing these anonymity systems any further it is important to define the

term “anonymity”. Danezis and Diaz define anonymity as “the state of being not

identifiable within a set of subjects, the anonymity set”5. This definition implies that

a user on the internet should not be identifiable through their network traffic any

more than any other user of the internet, that is, the network traffic should have no

identifying characteristics. This is not the case with network traffic as it is necessary

for network traffic to contain identification characteristics such as IP headers and

port numbers so that the computers receiving this traffic are able to correctly

interpret the data contained within. A more suitable term for these systems might

be “unlinkability”, which is defined in the ISO 15408 standard as follows:

[Unlinkability] ensures that a user may make multiple uses of resources or services

without others being able to link these uses together. [...] Unlinkability requires that

users and/or subjects are unable to determine whether the same user caused certain

specific operations in the system6.

1 (Hafner & Lyon, 2000)



4 (Danezis & Diaz, 2008)


6 In (Danezis & Diaz, 2008)



The term “unlinkability” describes a scenario where it is impossible to pin certain

network traffic to a particular user (or computer) which is more accurate for

describing the level of service that the anonymity systems provide on the internet.

There are numerous types of anonymity systems (anonymisers) designed to function

over the internet. These can be broken down into three main categories: proxy

servers; anonymous email clients; and MIX or Crowd systems.

The idea of proxy servers is to route the network traffic through a proxy to hide the

original IP address of the internet connection. Examples of these simple proxy

servers are “Anonymizer” and “SafeWeb”:

The Anonymizer product acts as a web proxy through which all web requests and

replies are relayed. The web servers accessed, should therefore not be able to extract

any information about the address of the requesting user7.

The purpose of these systems is to provide basic “unlinkability” and to allow users to

access IP location-specific content, such as videos hosted on

http://www.hulu.com/. These types of systems are known as “one-hop” proxy

servers as the network traffic is only routed through one proxy server at a time. This

makes them distinct from MIX networks which route their traffic through multiple

nodes (or “hops”) before reaching their destination. Due to the nature of the “one-

hop” system, back-tracing can be conducted on the internet traffic to determine the

originating IP address of the packets.

Anonymous email clients were originally devised in the early 1980s by Chaum, who

designed an email communication system that used Public Key Cryptography (PKC)

to not only hide the contents of the email message, but also the sender and receiver

of the email message8. The purpose of anonymous email clients is to provide a

communication channel where the author of an email message can communicate

with another without revealing their identity. An example of an anonymous email


8 (Chaum, 1981)


client is the “Anon.penet.fi” relay system. This system utilised pseudonyms, or

fictitious names, to facilitate an anonymous communication system:

The technical principle behind the service was a table of correspondences between

real email addresses and pseudonymous addresses, kept by the server. Email to a

pseudonym would be forwarded to the real user. Email from a pseudonym was

stripped of all identifying information and forwarded to the recipient9.

These types of systems were designed with email communication in mind; however,

email is only one form of communication over the internet, and so these systems are

not suitable for providing anonymity (or unlinkability for that matter) for the

majority of internet traffic.

MIX or Crowd systems are similar in design to proxy servers; however, they relay the

network traffic through multiple MIXes (or nodes) rather than through only one

proxy server.

Each user contacts a central server and receives the list of participants, the crowd. A

user then relays her web requests by passing it to another randomly selected node in

the crowd. Upon receiving a request each node tosses a biased coin and decides if it

should relay it further through the crowd or send it to the final recipient 10.
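The biased-coin forwarding described in the quotation above can be illustrated with a short simulation. This is a minimal sketch only: the function name `crowd_path`, the forwarding probability of 0.75 and the member names are assumptions made for illustration and are not taken from the Crowds specification or from this paper.

```python
import random


def crowd_path(members, initiator, p_forward=0.75, rng=None):
    """Return the crowd members a request visits before it is submitted.

    p_forward is the (assumed) bias of the coin each member tosses.
    """
    rng = rng or random.Random()
    # The initiator always passes the request to one randomly selected member.
    path = [initiator, rng.choice(members)]
    # Every member that receives the request tosses the biased coin:
    # heads -> relay to another random member, tails -> send to the recipient.
    while rng.random() < p_forward:
        path.append(rng.choice(members))
    return path


if __name__ == "__main__":
    crowd = ["alice", "bob", "carol", "dave"]
    print(crowd_path(crowd, "alice", rng=random.Random(1)))
```

Because every member behaves identically, a member that receives a request cannot tell whether its predecessor originated the request or merely relayed it.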

The principal idea is that messages to be anonymized are relayed through a node,

called a mix. The mix has a well-known RSA public key, and messages are divided into

blocks and encrypted using this key. The first few blocks are conceptually the

“header” of the message, and contain the address of the next mix. Upon receiving a

message, a mix decrypts all the blocks, strips out the first block that contains the

address of the recipient, and appends a block of random bits (the junk) at the end of

the message. 11






Another purpose of MIX systems is to “actually mix together many messages, to

make it difficult for an adversary to follow messages through it, on a first-in, first-out

basis”12.

Like the previous systems, Crowds and MIXes are also susceptible to various attacks

that undermine the level of anonymity that they provide. One such attack is outlined

by Reiter and Rubin who explain that these systems “can be undermined by

executable web content that, if downloaded into the user's browser, can open

network connections directly from the browser to web servers, thus bypassing

Crowds altogether and exposing the user to the end server”13. By utilising end-to-end

traffic analysis techniques other inadequacies of these systems can also be

highlighted:

Another attack tries to correlate events at the endpoints of the system: if a user

makes an HTTP request, it is reasonable to assume that this request leaves the last

MIX towards a web server shortly later. Similarly, the response sent from the web

server to the last MIX will appear on the link between first MIX and user within some

seconds14.

Onion routers, such as TOR, work similarly to Crowds in providing anonymity; in fact

their goal has been described as “to protect communication so that the recipients

and the sender cannot be linked by an adversary analyzing the network traffic”15.

Onion Routing is similar to Crowds in that an initial message forms a path of proxies

through which the initiator sends its future messages. The protocol gets its name

from its method of encrypting the initial packet and the address of the proxies at

each hop on the path with the public key of the previous step. This scheme results in

layers of encryption that are peeled off at each step in order to determine the next

12 (Danezis & Diaz, 2008)

13 (Reiter & Rubin, 1998)

14 (Rennhard & Plattner, 2002)

15 (Gomułkiewicz, Klonowski, & Kutylowski, 2004)



address to send to on the path. This requires the initiator to predetermine the entire

path16.
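The layering and peeling described above can be sketched as follows. This is an illustration under loud assumptions: the hash-derived XOR keystream in `toy_crypt` is a stand-in for real public-key encryption and offers no security, and the relay names, keys and the `|` separator are hypothetical choices made only so that the example is self-contained.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Derive a keystream by hashing the key with a counter (toy only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def build_onion(message: bytes, destination: str, hops):
    """Wrap message in one layer per hop; hops is [(address, key), ...]."""
    next_addr = destination
    onion = message
    # Work from the last hop inwards so the first hop's layer is outermost.
    for address, key in reversed(hops):
        onion = toy_crypt(key, next_addr.encode() + b"|" + onion)
        next_addr = address
    return onion


def peel(onion: bytes, key: bytes):
    """Remove one layer and return (next address, remaining onion)."""
    next_addr, _, rest = toy_crypt(key, onion).partition(b"|")
    return next_addr.decode(), rest


if __name__ == "__main__":
    hops = [("relay1", b"k1"), ("relay2", b"k2"), ("relay3", b"k3")]
    onion = build_onion(b"GET /", "example.org", hops)
    for _, key in hops:
        addr, onion = peel(onion, key)
        print("forward to", addr)
```

Each relay learns only the address of the next hop, which is why the initiator must choose the whole path before the first layer is built.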

One issue with onion routing is highlighted by Danezis and Diaz who state that

“onion routing aims at providing anonymous web browsing, and therefore would

become too slow if proper mixing was to be implemented”17. This means that

network traffic utilising onion routing does not mix various messages together unlike

MIX networks. Another weakness with onion routing is described by Wright et al.:

Onion Routing has generally been implemented with the onion routers being placed

in the network outside of the control of the individual users. While it can be argued

that this reduces the possibility of corruption of any particular onion router, it

requires that the users trust the operators of the onion router to maintain their

anonymity 18.

The level of trust required in the people or organisations which host these nodes is

possibly the biggest weakness in onion routing. Any corrupt node along the route

can compromise the entire anonymity of the network packet.

…strong anonymity against traffic analysis requires cooperation by and implicit trust

in many different parties. Any single entity, no matter how trustworthy it appears,

can be subverted, whether by technical means, corrupt personnel, or so-called

“subpoena attacks”19.

16 (Wright, Adler, Levine, & Shields, 2002)

18 (Wright et al., 2002)

19 (Androulaki, Raykova, Srivatsan, Stavrou, & Bellovin, 2008)



TOR (THE ONION ROUTER)

One of the most widely deployed onion-routing anonymising systems is TOR (The

Onion Router). TOR is known as a "second-generation onion router"20 and was

originally funded by the United States Naval Research Laboratory and the United States

Defense Advanced Research Projects Agency (DARPA)21. The history of the TOR

project will be briefly outlined before a detailed examination of the TOR design is

discussed. After this the design limitations and weaknesses of TOR will be scrutinised

providing an overview of the attacks that have been proposed to target the TOR

system.

HISTORY

The beginnings of the TOR project date back to the mid-1990s when the US Office of

Naval Research (ONR) began developing onion routing techniques22. The

development of the second generation of onion routers did not begin until 2002

when funding was provided by DARPA and ONR23. In 2003 the TOR network was

publicly deployed with nodes spanning across 2 continents and the following year

“hidden services” went online24. The funding from DARPA and ONR ceased in 2004

and the Electronic Frontier Foundation stepped up to continue funding the TOR

project25. One of the main purposes of TOR has been stated as being to “defend

against a form of network surveillance that threatens personal freedom and

privacy”26. It is widely acknowledged that TOR is often used by journalists and people

who wish to remain anonymous while browsing online, as well as people who have

21 (The Tor Project Inc, 2009b)

22 (Naval Research Laboratory)

26 (The Tor Project Inc, 2009a)



restricted internet access, for example people living in China27. The US military also

utilise TOR to host hidden services for intelligence gathering purposes28.

DESIGN

The TOR system follows on from traditional onion routing services; that is, it utilises

proxy servers in order to spoof an IP address so that the originating IP address

remains unknown. TOR can be seen as a mix between onion routing and crowd

systems. The TOR system “tunnels everything over TCP Port 80” 29 “over a network of

relays, and is particularly well tuned to work for web traffic, with the help of the

‘Privoxy’ content sanitizer”30. Privoxy is a web proxy service which modifies “web

page data and HTTP headers”31 and is commonly used for “removing ads and other

obnoxious Internet junk”32. In the case of TOR this web proxy assists in removing

web traffic that could reveal the true IP address of the user, such as JavaScript or

Flash content. Unlike traditional onion routing services, TOR does not send the traffic

through in its original packet format; instead TOR uses fixed-length “Cells” to

transfer data. Each Cell consists of a header and a payload (see Diagram 1). As stated

by Fraser et al., “TOR operates using fixed 512 byte cells (or packets) for stronger

anonymity and the Transport Layer Security (TLS) protocol for authentication and

privacy”33. Coupled with this Cell-based design, TOR utilises “Circuits” to choose the

path that the data will take as well as which protocol layer to anonymise: “they may

intercept IP packets directly, and relay them whole (stripping the source address)

along the circuit”34.

27 (The Tor Project Inc, 2009c)

28 (The Tor Project Inc, 2009c)

31 (Privoxy Developers 2010)

33 (Fraser, Raines, & Baldwin, 2005)

34 (Dingledine, Mathewson, & Syverson, 2005)



Diagram 1 – TOR Cells (Packets)35
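The fixed-length cell layout can be sketched in a few lines. The 512-byte cell size and the 2-byte CircID are stated in this paper; the single command byte that follows the CircID is taken from the published TOR design and should be treated as an assumption of this sketch, as should the function names and the example command value.

```python
import struct

CELL_LEN = 512                        # fixed cell size stated in the text
HEADER = struct.Struct(">HB")         # CircID (2 bytes) + command (1 byte, assumed)
PAYLOAD_LEN = CELL_LEN - HEADER.size  # 509 bytes remain for the payload


def pack_cell(circ_id: int, command: int, payload: bytes = b"") -> bytes:
    """Build one fixed-length cell, zero-padding the payload."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload does not fit in a single cell")
    return HEADER.pack(circ_id, command) + payload.ljust(PAYLOAD_LEN, b"\x00")


def unpack_cell(cell: bytes):
    """Split a cell into (circ_id, command, padded payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("a cell must be exactly %d bytes" % CELL_LEN)
    circ_id, command = HEADER.unpack_from(cell)
    return circ_id, command, cell[HEADER.size:]


if __name__ == "__main__":
    cell = pack_cell(circ_id=7, command=3, payload=b"data")
    print(len(cell), unpack_cell(cell)[:2])
```

Because every cell is padded to the same length, an observer cannot infer the size of the carried data from the size of an individual cell.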

TOR uses a traditional network architecture: a list of volunteer servers is downloaded

from a directory service. Then, clients can create paths by choosing three random

nodes, over which their communication is relayed. Instead of an `onion' being sent to

distribute the cryptographic material, Tor uses an iterative mechanism. The client

connects to the first node, then it request this node to connect to the next one. The

bi-directional channel is used at each stage to perform an authenticated Diffie-

Hellman key exchange. This guarantees forward secrecy and compulsion resistance:

only short term encryption keys are ever needed36.
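The path selection and per-hop key negotiation described above can be sketched as follows. This is a toy illustration under stated assumptions: the modulus `P` and generator `G` are small illustrative values and not the parameters TOR uses, the authentication step of the exchange is omitted, and both ends of each exchange are simulated in one process. All function names are hypothetical.

```python
import random
import secrets

P = 2**127 - 1  # toy prime modulus, chosen for brevity only (assumption)
G = 3           # toy generator (assumption)


def pick_path(directory, hops=3, rng=None):
    """Choose distinct random relays from the downloaded directory list."""
    rng = rng or random.SystemRandom()
    return rng.sample(directory, hops)


def dh_keypair():
    """Return a fresh (private, public) Diffie-Hellman pair."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def dh_shared(private, peer_public):
    """Derive the shared secret from our private and their public value."""
    return pow(peer_public, private, P)


def build_circuit(path):
    """Extend the circuit one relay at a time, one short-term key per hop."""
    keys = {}
    for relay in path:
        client_priv, client_pub = dh_keypair()
        relay_priv, relay_pub = dh_keypair()  # would run on the relay itself
        client_key = dh_shared(client_priv, relay_pub)
        if client_key != dh_shared(relay_priv, client_pub):
            raise RuntimeError("key agreement failed")
        keys[relay] = client_key
    return keys


if __name__ == "__main__":
    directory = ["relay%d" % i for i in range(10)]
    print(list(build_circuit(pick_path(directory))))
```

Since each hop key comes from a fresh exchange and is discarded with the circuit, compromising a relay later does not reveal keys for traffic that has already passed, which is the forward secrecy property the quotation refers to.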

Diagram 2 – The TOR Network37

35 (Dingledine, Mathewson, & Syverson, 2004)

37 (Bauer, McCoy, Grunwald, Kohno, & Sicker, 2007)



HIDDEN SERVICES

Another benefit of the TOR system over traditional onion routers is that it allows

users to host content on the internet that can only be accessed via the use of the

TOR system; these are known as “Hidden Services”. These hidden services are

denoted by the use of the virtual Top Level Domain (TLD) “.onion” which is the

address entered by the user to connect to this type of service. When connecting to a

hidden service a user creates a new circuit to the hidden service’s rendezvous point

which adds an extra layer of protection38 (see Diagram 3). As claimed by Dingledine

et al. “this type of anonymity protects against Distributed-Denial-of-Service attacks:

attackers are forced to attack the onion routing network because they do not know

the host’s IP address”39.

Diagram 3 - Normal use of hidden services and rendezvous servers 40

38 (Dingledine et al., 2004)

39 (Dingledine et al., 2004)

40 (Øverlier & Syverson, 2006)



LIMITATIONS and WEAKNESSES

The TOR system is not without its share of limitations; Danezis and Diaz raise the

point that “one notable difference between TOR and previous attempts at

anonymizing streams of traffic, is that it does not claim to offer security against even

passive global observers”41. In fact Lemos states that “the problem is known to both

the Tor Project, which advises everyone to use end-to-end encryption, and to

security researchers”42. This limitation amounts to the following point: “an

adversary, who can observe a stream at two different points, can trivially realize it is

the same traffic”43.

This limitation leads to weaknesses that can be exploited to undermine the

anonymity of the TOR system. As outlined in Table 1 there are two types of attacks

against the TOR network: active attacks and passive attacks.

PASSIVE ATTACKS | ACTIVE ATTACKS

– Packet and connection timing correlation | – Lying about bandwidth to get more traffic

– Fingerprinting of traffic/usage patterns | – Failing circuits to bias node selection

– “Intersection Attacks” of multiple attributes of users | – Modifying application layer traffic at exit

Table 1 – Attacks Against TOR44

Passive attacks involve collecting network packets for later analysis and are

often hard to detect45. Fu et al. state that “passive traffic analysis attacks may, at first

sight, appear innocuous since those attacks do not actively alter the traffic (e.g.,

42 (Lemos, 2007)

44 (Perry, 2007)

45 (Fu, Graham, Bettati, & Zhao, 2003)



drop, insert, and modify packets during a communication session)”46. Active attacks,

by contrast, use probing methods to collect packet information, which may alter the

traffic on the network. The various types of attacks against TOR, and their position

in the TOR network, are detailed in Diagram 4. As stated by Sun et al.:

Even when multiple proxies are used, however, the first link between the user and

the first proxy is the most vulnerable to attack, since the attacker (whether the first

proxy itself, the user's ISP, or perhaps an eavesdropper (say, on a wireless link) can

immediately determine the user's network address47.

Diagram 4 - TOR Attack Points48

One common attack against the TOR system is known as a “Timing Correlation

Attack”. This type of attack uses timing analysis methods to determine the network

latency of the TOR system. As observed by Murdoch:

…the load on the Tor node affects the latency of all connection streams that are

routed through this node. A similar increase in latency is introduced at all layers. As

expected, the higher the load on the node, the higher the latency49.

46 (Fu et al., 2003)

47 (Sun et al., 2002)

48 (Perry, 2007)



An attacker relays traffic over all routers, and measures their latency: this latency is

affected by the other streams transported over the router. Long term correlations

between known signals injected by a malicious server and the measurements are

possible. This allows an adversary to trace a connection up to the first router used to

anonymize it50.
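The correlation step of such an attack can be sketched with a plain Pearson coefficient. This is a minimal sketch, not the method used in the cited work: the on/off probe pattern and the latency figures below are invented for illustration, and a real attack would need far longer measurement series.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no timing signal
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # 1 = malicious server is sending a burst, 0 = idle (hypothetical pattern)
    injected = [1, 0, 1, 0, 1, 1, 0, 0]
    # Probe latencies in ms measured through two candidate relays (made up)
    relay_a = [120, 80, 125, 82, 118, 121, 79, 81]
    relay_b = [90, 95, 88, 93, 94, 89, 92, 91]
    for name, series in (("relay_a", relay_a), ("relay_b", relay_b)):
        print(name, round(pearson(injected, series), 3))
```

A relay whose probe latency rises and falls with the injected pattern is a likely member of the victim's circuit; a relay with a weak or negative coefficient is not.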

Diagram 5 - How Much Anonymity Does Network Latency Leak?51

(Measuring TOR circuit time without application-layer ACKs: the estimate for $T_{AX}$

is $t_3 - t_1$. We abuse notation and write $T_{XY}$ for the one-way delay from X to Y 52)

Website fingerprinting is another passive attack against the TOR system. In this

type of attack an adversary “fingerprints” commonly visited websites to determine

their file sizes; these file sizes are then compared to the network packets to determine if

there are any matches. As stated by Hintz:

49 (Murdoch & Danezis, 2005)

51 (Hopper, Vasserman, & Chan-Tin, 2007)

52 (Hopper et al., 2007)



When a user visits a typical webpage, they download several files. A user downloads

the HTML file for the webpage, images included in the page, and the referenced

stylesheets. Each of these... files has a specific file size which is for the most part

constant.53
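The comparison of known file sizes against observed transfers can be sketched as follows. This is an illustration only: the site profiles, observed sizes and the 5% tolerance are hypothetical values, and the greedy matching used here is one simple way to compare the two sets rather than the procedure from the cited work.

```python
def match_score(profile_sizes, observed_sizes, tolerance=0.05):
    """Fraction of a site's known file sizes found among observed transfers.

    Each observed size can satisfy at most one profile entry.
    """
    if not profile_sizes:
        return 0.0
    remaining = sorted(observed_sizes)
    hits = 0
    for size in sorted(profile_sizes):
        for i, seen in enumerate(remaining):
            if abs(seen - size) <= tolerance * size:
                hits += 1
                del remaining[i]  # consume the matched observation
                break
    return hits / len(profile_sizes)


def best_match(profiles, observed_sizes, tolerance=0.05):
    """Return the profile name with the highest match score."""
    return max(
        profiles,
        key=lambda name: match_score(profiles[name], observed_sizes, tolerance),
    )


if __name__ == "__main__":
    # Hypothetical fingerprints: site name -> sizes in bytes of its files
    profiles = {
        "news-site": [15000, 42000, 3100],
        "shop-site": [8000, 8200, 60000],
    }
    observed = [3050, 41500, 15200]
    print(best_match(profiles, observed))
```

In practice the fixed 512-byte cells would round every observed size up to a cell boundary, so a tolerance of at least one cell would be needed.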

Attacks against TOR Hidden Services have also been devised. Øverlier and Syverson

discuss an attack which is used to locate the address of the Hidden Service. To carry

out this attack a compromised TOR node and a malicious client machine are used to

make repeated connections to the Hidden Service (see Diagram 6).

The main idea is to make many connections to the hidden server, so that it eventually

builds a circuit to the rendezvous point using the malicious Tor node as an entry

point. The malicious Tor node uses a simple timing analysis (packet counting) to

discover when this has happened54.

Diagram 6 - Vulnerable location of Attacker in communication channel to the

Hidden Server 55

Although the design specifications of the TOR system negate traditional DDoS

attacks, Fraser et al. have proposed a mutated DDoS attack on TOR based on TOR’s

use of TLS.

53 (Hintz, 2003)

54 In (Hopper et al., 2007)

55 (Øverlier & Syverson, 2006)



DDoS attacks targeting an Onion Router’s CPU are possible due to TOR’s dependence

on TLS. Such attacks force an Onion Router to execute so many public key

decryptions that it can no longer route messages56.

Another weakness in the TOR system is due to the fact that any user may host a TOR

server (node) which means that any person wishing to host a compromised node is

able to do so without any major hurdles. TOR designers have developed a formula

for determining the probability of using a compromised node:

…the probability of choosing a compromised entrance node is m/N and the

probability of choosing a compromised exit node is the same, thus, the combinatorial

model is expressed as $(m/N)^2$, where m > 1 is the number of malicious nodes and N is

the network size…57
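The formula quoted above can be evaluated directly. The sketch below uses exact fractions to avoid rounding; the figures of 10 malicious nodes in a network of 1,000 are hypothetical and chosen only to show the scale of the result.

```python
from fractions import Fraction


def compromise_probability(m: int, n: int) -> Fraction:
    """Probability that both the entry and the exit node are malicious.

    Follows the quoted model (m/N)^2, which treats the two choices as
    independent and equally likely to land on a malicious node.
    """
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("require 0 <= m <= N and N > 0")
    return Fraction(m, n) ** 2


if __name__ == "__main__":
    p = compromise_probability(10, 1000)
    print(p, float(p))
```

With these example figures the chance of a fully compromised circuit is 1 in 10,000, which shows why the attacks listed in Table 1 that inflate a node's selection likelihood matter.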

Another passive attack can be achieved by hosting a compromised TOR node and

collecting the unencrypted packets exiting this node. In this type of attack high-level

information about the network traffic can be learnt. Egerstad conducted an attack

against TOR using this method described and was able to intercept email messages

“discussing military and national-security issues between embassies and sensitive

corporate e-mail messages”58. This highlights another limitation of the TOR system,

or of any anonymity system: if users enter their real logins and email addresses while

using TOR then their perceived anonymity is compromised. Because TOR does not provide

end-to-end encryption, it is not suited to use with “real” identities; instead it is

recommended that people utilise anonymous email clients and logins59.

56 (Fraser et al., 2005)

57 In (Bauer et al., 2007)

58 In (Lemos, 2007)

59 (The Tor Project Inc, 2009a)



METHODOLOGY

A gap exists in the research regarding how the weaknesses in TOR can be utilised

from a law enforcement perspective. In order to establish what information can be

gathered from the analysis of TOR packets, a packet comparison is necessary. This

comparison will examine the TOR packets as well as identical non-TOR (or standard)

internet packets. There are three stages to the methodology: Setting up the TOR

system; Packet selection, and Analysis. The setup of the TOR system will be discussed

to detail how the packets will be intercepted. This will be followed by an explanation

of the types of internet traffic examined. Finally the analysis stage will be outlined

discussing the various tools utilised to examine the TOR packets.



TOR CLIENT

To ensure that the network traffic was generated from identical machines, virtual

machines (VMs) were utilised: one with TOR installed and the other without TOR.

Originally it had been planned to run a TOR exit node on a local server in order to

capture the unencrypted network traffic as it left the exit node; however, it was

determined that to propagate realistic network traffic locally would produce

undesired results. Instead a standard TOR client was installed on the TOR-VM.

The full specifications for the two VMs were as follows:

 | TOR VM | Non-TOR VM

CPU | Intel Dual core E6550 @ 2.33GHz | Intel Dual core E6550 @ 2.33GHz

RAM | 1 GB RAM | 1 GB RAM

Operating System | Microsoft Windows XP SP3 | Microsoft Windows XP SP3

Web Browser | Mozilla Firefox version 3.5.6 | Mozilla Firefox version 3.5.6

TOR/Vidalia | TOR version 0.2.1.21, Vidalia version 0.2.6 | N/A

WireShark (traffic capture) | WireShark version 1.2.5 (SVN Rev 31296) | WireShark version 1.2.5 (SVN Rev 31296)

Eeye IRIS (traffic analysis) | Eeye IRIS version 5 | Eeye IRIS version 5

Table 2 – VM Comparison

Running a fully functioning TOR end-user client allows the packets to be

generated on-the-fly over the internet rather than propagating traffic to simulate

the internet. Rather than capture the TOR packets on the local machine, where they

would be unencrypted, the TOR traffic was captured by observing the traffic

entering the LAN (as depicted in Diagram 7). The TOR installation provides a

Mozilla Firefox plug-in that can be switched on and off. It is for this reason that

Mozilla Firefox was utilised for the web browsing aspect of this research. Windows

XP was chosen as the operating system for the VMs due to the full

compatibility of the TOR system with Windows XP.

Diagram 7 – TOR and Non-TOR Network Setup



PACKET COMPARISON

The types of internet traffic chosen for this analysis were based on the

highest internet hits of December 2009, as compiled by Nielsen60 (see Table 3). By

examining these statistics the following information about web usage can be

gathered: the internet is used as a source of information (for example Google or

News Corp); the internet is used as a communication medium (for example Facebook

or Yahoo); the internet is used as a source for shopping (for example eBay or

Amazon).

Using this knowledge the following web browsing usage was established:

1. Yahoo was selected as the user’s homepage. The user would log on to their

Yahoo webmail account.

2. The user would read their emails as well as write an email.

3. Following-on from reading their email, the user would click a link inside a

www.news.com.au email and read a few news articles, including one

regarding the 2010 Winter Olympics.

4. The user would then visit www.google.com and search for “winter Olympics”.

5. This search would result in a www.wikipedia.com link which the user would

click on.

6. From the original “winter Olympics” Wikipedia entry the user would click on

the 2010 winter Olympics link.

7. The user would then enter www.ebay.com into the web browser and search

for “winter Olympics tickets”.

8. Following this search the user would then browse a few of the resulting links.

9. The user would then enter www.amazon.com into the web browser and

search for “winter Olympics tickets”.

10. The user would then search for “ice hockey” under the “movies” category and

click on the first link.

11. The user would then enter www.facebook.com into the web browser and

login to their account.

60 (Nielsen, 2010)


12. On the Facebook site the user would search for “winter Olympics” under the

“groups” category.

13. The user would then join a “winter Olympics” group and add a message to the

group’s Facebook “wall”.

14. Then the user would enter www.bing.com into the web browser and search

for “what is my ip”.

15. Following the above search the user would click on the link

www.whatismyip.com and recover their IP address (note that when TOR is

installed the initial homepage is always a link that reports the user’s IP address).

RANK | PARENT | UNIQUE AUDIENCE (000) | ACTIVE REACH % | TIME PER PERSON (HH:MM:SS)

1 | GOOGLE | 353,851 | 83.91 | 2:38:50

2 | MICROSOFT | 315,490 | 74.81 | 3:01:38

3 | YAHOO! | 228,711 | 54.23 | 2:12:36

4 | FACEBOOK | 206,878 | 49.06 | 5:57:17

5 | EBAY | 163,844 | 38.85 | 1:41:31

6 | WIKIMEDIA FOUNDATION | 141,239 | 33.49 | 0:16:01

7 | AMAZON | 137,364 | 32.57 | 0:32:11

8 | AOL LLC | 129,360 | 30.67 | 2:21:03

9 | NEWS CORP. ONLINE | 120,316 | 28.53 | 0:59:17

10 | INTERACTIVECORP | 115,131 | 27.30 | 0:11:36

Table 3 – Top 10 Global Web Parent Companies, Home & Work December 200961

61 (Nielsen, 2010)


PACKET CAPTURE & ANALYSIS

The analysis phase of the methodology has two stages: the packet capture stage;

and the packet analysis stage. To capture the network packets WireShark was

selected as it is a robust network capture tool based on the “pcap” library. The first

stage of the analysis involves capturing the identical network traffic from the two

VMs via WireShark. WireShark can then be utilised to conduct the first form of traffic

analysis to examine the low-level protocol information of the network traffic. This

initial analysis will focus on IP header information as well as connection types and

port information.

For the next stage of analysis Eeye’s IRIS will be utilised to rebuild the HTML traffic.

IRIS is a commercial network traffic monitoring and analysis tool that works on all

IPv4 internet traffic. It is able to rebuild HTML traffic as well as provide statistical

information about the network traffic.

By utilising WireShark and IRIS it will be possible to drill-down into the network

packets in order to exploit social engineering strategies to locate personally

identifiable information from the network packets. The social engineering analysis

will attempt to discover personally identifiable information from various sources,

including email and social networking sites. As stated by Cohen, “the forensic

examiner is more interested in high level information obtained from the traffic

rather than low level protocol information”62.

As the TOR traffic will be captured on the LAN, the most important question to

answer will be whether there is any way to tell from packet information alone that

network packets are utilising TOR. That is, can traffic analysis be used to “fingerprint”

the network packets in order to identify the usage of TOR?

62 (Cohen, 2008)



RESULTS

As the network traffic is already known, the purpose of the analysis was not to distinguish the websites visited by the user. Instead, the analysis sought to determine what, if any, TOR-specific traffic fragments can be identified in order to violate the anonymity properties of the TOR system.

LOW-LEVEL ANALYSIS

By observing the network packets through WireShark the low-level packet properties were examined and compared. This analysis showed that TOR packets do not contain any property that can be utilised to “fingerprint” the header of the packet; that is, there is no recurring hex header that can be associated with TOR traffic. This is because the first part of the TOR “cell” is the 2-byte CircID (Circuit ID), which is unlikely to repeat from cell to cell as numerous circuits can be multiplexed over a single TLS connection.
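The reason a fixed leading byte pattern is absent can be illustrated with a minimal sketch. It assumes the cell layout described in the TOR design paper (Dingledine et al., 2004): a fixed 512-byte cell beginning with a 2-byte CircID followed by a 1-byte command. The helper names and the command value are illustrative only. On the wire these bytes travel inside the TLS connection, so an observer on the LAN would not see them in the clear in any case.

```python
import struct
from typing import Iterable, Set, Tuple

CELL_LEN = 512                 # fixed cell size assumed from the 2004 design paper
HEADER = struct.Struct(">HB")  # CircID (2 bytes, big-endian) + command (1 byte)


def build_cell(circ_id: int, command: int, payload: bytes = b"") -> bytes:
    """Build a zero-padded cell for illustration (hypothetical helper)."""
    if len(payload) > CELL_LEN - HEADER.size:
        raise ValueError("payload too large for one cell")
    return (HEADER.pack(circ_id, command) + payload).ljust(CELL_LEN, b"\x00")


def parse_cell_header(cell: bytes) -> Tuple[int, int]:
    """Return (circ_id, command) from the first three bytes of a cell."""
    if len(cell) != CELL_LEN:
        raise ValueError("expected a 512-byte cell")
    return HEADER.unpack_from(cell)


def leading_bytes(cells: Iterable[bytes]) -> Set[str]:
    """Collect the distinct 2-byte prefixes seen across a set of cells."""
    return {cell[:2].hex() for cell in cells}


# Two cells on different circuits start with different bytes,
# so no single hex prefix identifies them.
cells = [build_cell(0x0001, 3), build_cell(0xBEEF, 3, b"data")]
print(leading_bytes(cells))
```

Because the prefix is simply whichever circuit identifier was negotiated, it varies between circuits and cannot serve as a signature.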

This is not to say that TOR traffic cannot be recognised on-the-fly, only that a hex header cannot be used for packet fingerprinting. One way that TOR traffic can be distinguished from standard internet traffic is through the default port number that TOR utilises, port 9001. By applying a TCP port filter in WireShark the TOR traffic can be easily monitored (see Screenshot 1). Officially port 9001 is reserved for traffic related to the “Microsoft Sharepoint Authoring Environment”; however, TOR is set up by default to use this port number as both a source port and a destination port. TOR can be re-configured to use other TCP port numbers, but a default installation of TOR will utilise port 9001.
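In WireShark the equivalent display filter is `tcp.port == 9001`, which matches the port in either direction. The same selection can be sketched outside WireShark as follows. This is a minimal illustration only: the `TcpPacket` record and the sample addresses are hypothetical and stand in for whatever fields a capture tool exports.

```python
from dataclasses import dataclass
from typing import Iterable, List

TOR_DEFAULT_PORT = 9001  # default port discussed in the text; TOR can be re-configured


@dataclass(frozen=True)
class TcpPacket:
    """Hypothetical, simplified view of one captured TCP packet."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def filter_by_port(packets: Iterable[TcpPacket],
                   port: int = TOR_DEFAULT_PORT) -> List[TcpPacket]:
    """Keep packets whose source OR destination port equals the given port."""
    return [p for p in packets if port in (p.src_port, p.dst_port)]


capture = [
    TcpPacket("192.168.1.5", 49152, "198.51.100.20", 9001),  # outbound to port 9001
    TcpPacket("198.51.100.20", 9001, "192.168.1.5", 49152),  # reply from port 9001
    TcpPacket("192.168.1.5", 49200, "198.51.100.30", 80),    # ordinary web traffic
]
candidates = filter_by_port(capture)
print(len(candidates))
```

A match on port 9001 is only an indicator, since other services may use the same port and TOR may be configured to use a different one.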

Screenshot 1 – WireShark TOR Port Filter



Screenshot 2 – TOR Traffic on Port 9001

By filtering for port 9001 on the LAN the TOR traffic could be observed. Once the TOR traffic had been identified, the IP source and destination addresses could be read from the packets in WireShark. As seen in Screenshot 3, the destination address for this packet is 192.168.1.5. Knowing this IP address allows for future analysis and capturing of the unencrypted packets from the local machine hosting the TOR client. In this way the identification of TOR packets could be used to determine which user has TOR installed on their machine.
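The step from flagged packets to a local host can be sketched as follows. This is an illustration under stated assumptions: flows are represented as plain tuples, the LAN is assumed to be 192.168.1.0/24 (inferred from the address 192.168.1.5 above), and the remote addresses are placeholders.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def lan_hosts_on_port(flows: Iterable[Flow],
                      lan: str = "192.168.1.0/24",
                      port: int = 9001) -> Counter:
    """Count packets per LAN address that take part in traffic on the given port."""
    network = ipaddress.ip_network(lan)
    hits: Counter = Counter()
    for src_ip, src_port, dst_ip, dst_port in flows:
        if port not in (src_port, dst_port):
            continue  # not on the port of interest
        for ip in (src_ip, dst_ip):
            if ipaddress.ip_address(ip) in network:
                hits[ip] += 1  # this end of the flow is a local machine
    return hits


flows = [
    ("198.51.100.20", 9001, "192.168.1.5", 49152),
    ("192.168.1.5", 49152, "198.51.100.20", 9001),
    ("192.168.1.9", 50000, "198.51.100.30", 80),
]
print(lan_hosts_on_port(flows))
```

In this sample only 192.168.1.5 is reported, as the host at 192.168.1.9 never appears in traffic on port 9001.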



Screenshot 3 - IP Address Identification

HIGH-LEVEL ANALYSIS

The TOR packets, when encrypted, do not allow for the HTML data to be rebuilt. Although this obstructs a high-level traffic analysis of these TOR packets, there are workarounds which allow the SSL-encrypted TOR packets to be rebuilt. There is a WireShark plug-in called “TOR Dissector” which, when run on a local machine running a TOR client, captures the user’s TOR SSL keys and decrypts the TOR packets on the fly (see Screenshot 4). This raises the issue of whether someone would be able to access a user’s local machine and run WireShark in conjunction with TOR Dissector without the user’s knowledge. An alternative method of decrypting the TOR traffic without the user suspecting anything is to conduct a Man-In-The-Middle (MITM) attack. A MITM attack positioned between the user’s computer and the TOR server will allow an attacker to decrypt the user’s TOR packets in real-time, and either rebuild the HTML or filter for plain text.



Screenshot 4 – TOR Dissector

Private LAN/Corporations

By using a network capture tool such as WireShark and filtering the internet

connection of the LAN it is easy to recognise when TOR is used. By observing the TOR

packets on the LAN a corporation would be able to pinpoint the local host computer

utilising the TOR network. The use of TOR in many organisations is in itself likely to

breach their internet usage policies, and once the local host is determined any future

TOR packets could be captured and rebuilt using the WireShark plug-in “TOR

Dissector”, or by conducting a TOR MITM attack.

Government/ISP

If a corporation, or a Government, does not have access to the local machine running

TOR then the TOR MITM attack can still be performed to decrypt the TOR traffic.

Similarly it is possible to establish a compromised TOR exit node to capture

unencrypted TOR traffic. It must be stated that in order to conduct a targeted TOR



MITM attack the adversary must have prior knowledge that the user has utilised TOR

and be aware of their IP address.

In 2007 Egerstad hosted a TOR exit node in an effort to capture unencrypted TOR

packets to investigate the types of internet traffic people were accessing through the

TOR service63. Among the captured packets were highly confidential emails regarding

foreign military issues sent by embassy staff members64. This highlights one of the

biggest fallacies with TOR, or any anonymity service, in that many users assume that

these services will provide complete anonymity even when sending emails from their

own accounts. Similarly in 2009 Vea conducted research into the anonymity-

breaching properties of hosting a TOR service and stated:

...no matter how many anonymizing tools a user employs, or how well they are put

into play, that same user lets the cat out of the bag when their web posts, emails or

chats leave traces back to themselves...65

This research broke down the TOR traffic into categories of usage as depicted in the

following graph:

Graph 1 – TOR Packet Distribution66

63 In (Lemos, 2007)

64 In (Lemos, 2007)

65 (Vea, 2009)

66 (Vea, 2009)



The fact that anyone may host a TOR server is another concern and a major security risk, which may be mitigated by educating TOR users about which aspects of their internet usage are really anonymous. This leads to an important question: how many compromised TOR nodes are there? It only takes one compromised node along the TOR relay to violate the entire relay’s traffic. This issue has prompted some to investigate whether certain TOR nodes are in fact compromised and acting maliciously67.

Since the TOR exit-nodes can decide what traffic (or rather, what ports) it wants to

relay it’s easy to set up a rogue exit-node that relays only cleartext traffic (and of

course sniffs it on the fly)...68
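The port-based selection described in the quotation above can be sketched as a simple check. This is a deliberately simplified model: a real TOR exit policy is a list of accept/reject rules over addresses and port ranges, whereas here a policy is reduced to a set of accepted ports. The table of cleartext ports uses conventional default port numbers and is an illustrative assumption, not a complete list.

```python
from typing import Dict, Set

# Conventional default ports for protocols that are unencrypted by default
# (illustrative selection only).
CLEARTEXT_PORTS: Dict[int, str] = {
    21: "FTP",
    23: "Telnet",
    80: "HTTP",
    110: "POP3",
    143: "IMAP",
    1863: "MSN Messenger",
    5050: "Yahoo Messenger",
    5190: "AOL Instant Messenger",
}


def accepts_only_cleartext(accepted_ports: Set[int]) -> bool:
    """True if the node accepts at least one port and every accepted port is cleartext."""
    return bool(accepted_ports) and accepted_ports <= set(CLEARTEXT_PORTS)


# A policy restricted to IMAP and three instant messaging protocols.
suspicious = {143, 1863, 5050, 5190}
# A policy that also relays HTTPS is not restricted to cleartext traffic.
mixed = {80, 443}
print(accepts_only_cleartext(suspicious), accepts_only_cleartext(mixed))
```

A positive result does not prove that a node is malicious; it only flags a policy that would be convenient for sniffing.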

The research quoted above resulted in the identification of numerous TOR exit nodes restricting traffic based upon port numbers. For example, a node was identified as accepting only unencrypted IMAP, AOL Instant Messenger, MSN Messenger and Yahoo Messenger traffic and rejecting all other forms of internet traffic69. It is possible that the person hosting this server is doing so to assist people to communicate over TOR, yet it is equally possible that the node is compromised and capturing unencrypted packets. Even if this node is not compromised it could become compromised as easily as turning on WireShark. As well as being selective with internet traffic, TOR nodes can be compromised using the MITM attack methodology. By running an SSL-enabled server the same researcher connected to his own website through TOR to check whether any exit nodes were modifying the website’s SSL certificate70. One TOR exit node was found to have modified the SSL certificate, indicating that a MITM attack was being carried out through this particular exit node71. It was unclear what the MITM attack was being used for, but it is important to be aware of the potential dangers when using the TOR service.
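The certificate comparison described above can be sketched as a fingerprint check. The source does not state how the researcher compared certificates, so the use of SHA-256 fingerprints here is an assumption, as are the exit node names. Retrieving the certificate through each exit node is outside the scope of the sketch, and the byte strings are placeholders for DER-encoded certificates.

```python
import hashlib
from typing import Dict, List


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def exits_with_altered_cert(expected_der: bytes,
                            observed: Dict[str, bytes]) -> List[str]:
    """Return the exit nodes through which a different certificate was presented."""
    expected = fingerprint(expected_der)
    return sorted(name for name, der in observed.items()
                  if fingerprint(der) != expected)


# Placeholder bytes standing in for real certificates.
real_cert = b"certificate-issued-to-the-researchers-own-site"
seen_via_exit = {
    "exit-a": real_cert,                       # certificate arrived unmodified
    "exit-b": b"substituted-certificate",      # certificate was replaced in transit
}
print(exits_with_altered_cert(real_cert, seen_via_exit))
```

Because the operator of the website knows the genuine certificate in advance, any mismatch points to interception somewhere between the exit node and the site.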

67 (Team Furry, 2007a)

68

70 (Team Furry, 2007b)

71 (Team Furry, 2007b)



Man-In-The-Middle attacks against TOR are not new. In fact there is a tool designed

to facilitate these types of attacks against SSL traffic called “SSLStrip”72. This tool has

been designed to work on a proxy server, such as TOR, between the user and the

internet. Whenever a user attempts to access an SSL website, “a program on the

proxy server sends the request to the website, handles any redirect to an SSL-

encrypted page and returns an exact duplicate to the user, without the

encryption”73. To the end user the website looks legitimate; even the ubiquitous SSL

“padlock” symbol can be spoofed with the use of this tool74. When run on a

TOR node the tool’s creator was able to capture and decrypt packets relating to

account logins, including 114 Yahoo credentials, and 50 Gmail credentials, as well as

packets containing credit-card numbers75. This research indicates that TOR users are

sending traffic relating to login details and credit card numbers and assuming that

the TOR system will ensure that these packets are secure and anonymous.

By analysing TOR traffic captured from the TOR exit nodes it is evident that users are

misguided in their understanding of the abilities of TOR, specifically users who utilise

TOR to “anonymously” log into their own email clients or other websites using their

own personally identifiable information (for example social networking/blogging

sites). In fact the TOR developers clearly state that for security users should

incorporate end-to-end encryption76.

The TOR developers also state that TOR does not protect against global adversaries, for example corrupt or compromised nodes77. Currently TOR users are able to manually select which exit nodes they wish to utilise. Although good in theory, this raises the further issue of how users should choose trustworthy nodes. One possibility is that a system similar to a Certificate Authority (CA) system could be put into place for users to verify the integrity of the exit nodes which they are using. This would, however, result in a violation of the anonymity of the people or organisations hosting these exit nodes. This violation would most likely lead to a reduction in the number of privately operated exit nodes, which in turn would result in fewer onion layers and slower connections.

72 (Marlinspike, 2009)

73 (Security Focus, 2009)

74 (Marlinspike, 2009)

75 (Security Focus, 2009)

76 (Lemos, 2007)

77 (Dingledine et al., 2004)



CONCLUSION/RECOMMENDATIONS

Traffic analysis has shown that the originating IP addresses cannot be recovered from TOR packets, except where the TOR packets are captured over a local area network. On the other hand, social engineering has had success in identifying users of the TOR network through insecure and non-anonymous logins. Although this method does not always recover the originating IP address, recovering the real identity of the user is much more important from a law enforcement perspective.

This paper has shown that through traffic analysis techniques TOR traffic can be

distinguished from regular internet traffic. Specifically, the port numbers that TOR

utilises, along with the frequent usage of SSL traffic, assist in locating packets

belonging to the TOR network. Having this knowledge greatly assists network

observers, either law enforcement or corporations, in recognising TOR and then

subsequently implementing suitable measures to further conduct traffic analysis on

these types of packets.
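The two indicators named above, the port number and the presence of SSL traffic, can be combined into a single heuristic. The sketch below checks whether a TCP payload begins with an SSL/TLS record header (a content type byte followed by a protocol version whose major number is 3). The function names are hypothetical, and the check only applies to payloads that start at a record boundary.

```python
# SSL/TLS record content types: change_cipher_spec, alert, handshake, application_data
TLS_CONTENT_TYPES = {20, 21, 22, 23}


def looks_like_tls_record(payload: bytes) -> bool:
    """Heuristic: does the payload begin with an SSL 3.0 / TLS record header?"""
    if len(payload) < 5:  # header is type (1) + version (2) + length (2)
        return False
    content_type, major, minor = payload[0], payload[1], payload[2]
    return content_type in TLS_CONTENT_TYPES and major == 3 and minor <= 4


def is_candidate_tor_packet(src_port: int, dst_port: int, payload: bytes,
                            tor_port: int = 9001) -> bool:
    """Flag a packet that uses the default TOR port AND carries an SSL/TLS record."""
    return tor_port in (src_port, dst_port) and looks_like_tls_record(payload)


handshake = bytes([22, 3, 1, 0, 5]) + b"\x00" * 5   # handshake record, version 3.1
http_request = b"GET / HTTP/1.1\r\n"

print(is_candidate_tor_packet(49152, 9001, handshake))
print(is_candidate_tor_packet(49152, 9001, http_request))
print(is_candidate_tor_packet(49152, 443, handshake))
```

Requiring both conditions reduces false positives from other services on port 9001, but it remains a heuristic and will miss a TOR installation configured to use a different port.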

Although at first glance it may appear that the TOR system provides adequate

protection of the users’ anonymity, and to a certain degree their security, the

weaknesses exhibited by the TOR system can be easily exploited. From a law

enforcement perspective these weaknesses can be exploited in order to capture

these packets and conduct a forensic analysis of their content.

There are a few processes that are recommended in order to minimise the loss of

anonymity while using TOR. Firstly it is fundamentally flawed to use TOR with a

user’s real email address or account logins. This undermines any anonymity provided

by the TOR service. Instead it is highly recommended that only anonymous, or

temporary, email addresses and logins are used within the TOR network. Secondly

TOR should not be utilised to make any purchases over the internet. Using a credit

card number or a user’s physical shipping address will also undermine any anonymity

provided by TOR. No reference to a user’s physical location or to any personally

identifiable information should be made whilst utilising the TOR service, in order to

preserve the anonymity of TOR users.



If, for example, a user or computer had been identified as utilising TOR and a law

enforcement agency wanted to know what TOR was being used for then the law

enforcement agency could instigate a MITM attack using a tool such as “SSLStrip”.

The point of attack could either be running SSLStrip while acting as a compromised

TOR node, or running SSLStrip in between the user’s internet connection and the

TOR system itself. Using either of these attack points a law enforcement agency

would be able to “tap” the user’s network packets and view the content in clear text

(see Diagram 8).

Diagram 8 – TOR MITM Attack

By utilising open-source tools, such as WireShark and SSLStrip, a law enforcement

agency would be able to effectively capture and analyse a user’s TOR packets. In

order for this type of capture and analysis to be successful the law enforcement

agency would need to have prior knowledge of the person who is utilising the TOR



service. Without knowledge of the person’s IP address the MITM attack would not

be feasible due to the requirement of positioning the attack in between the user’s

computer and the TOR system. If a law enforcement agency were to run a MITM

attack on a compromised TOR node they would not be able to determine which TOR

users were connected to their compromised TOR node, therefore in a law

enforcement context knowing the target is a necessity.

This paper has demonstrated that the TOR system is not immune to traffic analysis

techniques. Indeed traffic analysis plays an important part in locating TOR packets

and subsequently implementing attacks that compromise the anonymity of the TOR

network. The attacks presented in this paper allow law enforcement agencies to

implement systems that will decrypt TOR packets to gain high-level access to the

original HTML of the packets. When used in a network forensic context these attacks

change TOR from an anonymity system into nothing more than a slight

inconvenience.



REFERENCE LIST

Androulaki, E., Raykova, M., Srivatsan, S., Stavrou, A., & Bellovin, S. M. (2008). Par: Payment for anonymous routing. Lecture notes in computer science, 5134, 219-236.

Bauer, K., McCoy, D., Grunwald, D., Kohno, T., & Sicker, D. (2007). Low-resource routing attacks against anonymous systems. Paper presented at the Proceedings of the 2007 ACM workshop on Privacy in electronic society.

Chaum, D. L. (1981). Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM.

Cohen, M. I. (2008). PyFlag–An advanced network forensic framework. Digital Investigation, 5, 112-120.

Danezis, G., & Diaz, C. (2008). A survey of anonymous communication channels. Journal of Privacy Technology.

Dingledine, R., Mathewson, N., & Syverson, P. (2004). Tor: The second-generation onion router. Paper presented at the Proceedings of the 13th USENIX Security Symposium.

Dingledine, R., Mathewson, N., & Syverson, P. (2005). Challenges in deploying low-latency anonymity. NRL CHACS Report, 5540-5265.

Fraser, N. A., Raines, R. A., & Baldwin, R. O. (2005). Tor: An Anonymous Routing Network for Covert On-line Operations. IOSphere: the Professional Journal of Joint Information Operations, 44–47.

Fu, X., Graham, B., Bettati, R., & Zhao, W. (2003). Active traffic analysis attacks and countermeasures.

Gomułkiewicz, M., Klonowski, M., & Kutyłowski, M. (2004). Onions Based on Universal Re-Encryption: Anonymous Communication Immune Against Repetitive Attack.

Hafner, K., & Lyon, M. (2000). Where wizards stay up late: The origins of the Internet: Touchstone Books.

Hintz, A. (2003). Fingerprinting websites using traffic analysis. Lecture notes in computer science, 171-178.

Hopper, N., Vasserman, E. Y., & Chan-Tin, E. (2007). How much anonymity does network latency leak?

Lemos, R. (2007). Embassy leaks highlight pitfalls of Tor [Electronic Version]. SecurityFocus. Retrieved 09/10/2009, from http://www.securityfocus.com/news/11486?ref=rss

Marlinspike, M. (2009). SSLSTRIP [Electronic Version]. Retrieved 05/02/2010, from http://www.thoughtcrime.org/software/sslstrip/

Murdoch, S. J., & Danezis, G. (2005). Low-cost traffic analysis of tor. Paper presented at the IEEE Symposium on Security and Privacy.

Naval Research Laboratory. Onion Routing - Brief Selected History. Retrieved 09/10/2009, from http://www.onion-router.net/History.html

Nielsen. (2010). Top 10 Global Web Parent Companies. Retrieved 22/01/2010, from http://en-us.nielsen.com/rankings/insights/rankings/internet

Øverlier, L., & Syverson, P. (2006). Locating hidden servers. Paper presented at the IEEE Symposium on Security and Privacy.

Perry, M. (2007). Securing the Tor Network: Defcon.

Privoxy Developers. (2010). Privoxy 3.0.16 User Manual. Retrieved 04/01/2010, from http://www.privoxy.org/user-manual/index.html

Reiter, M. K., & Rubin, A. D. (1998). Crowds: Anonymity for web transactions. ACM Transactions on Information and System Security (TISSEC), 1(1), 66-92.

Rennhard, M., & Plattner, B. (2002). Introducing morphmix: Peer-to-peer based anonymous internet usage with collusion detection.

Security Focus. (2009). Man-in-the-middle attack sidesteps SSL [Electronic Version]. Retrieved 05/02/2010, from http://www.securityfocus.com/brief/910


Sun, Q., Simon, D. R., Wang, Y. M., Russell, W., Padmanabhan, V. N., & Qiu, L. (2002). Statistical Identification of Encrypted Web Browsing Traffic. Paper presented at the Proceedings of IEEE Symposium on Security and Privacy.

Team Furry. (2007a). On TOR. MW-Blog Retrieved 05/02/2010, from http://www.teamfurry.com/wordpress/2007/11/19/on-tor/#more-177

Team Furry. (2007b). TOR Exit Nodes Doing MITM Attacks. MW-Blog Retrieved 05/02/2010, from http://www.teamfurry.com/wordpress/2007/11/20/tor-exit-node-doing-mitm-attacks

The Tor Project Inc. (2009a). Tor: anonymity online. Retrieved 09/10/2009, from https://www.torproject.org/index.html.en

The Tor Project Inc. (2009b). Tor: Sponsors. Retrieved 09/10/2009, from https://www.torproject.org/sponsors

The Tor Project Inc. (2009c). Tor: Users. Retrieved 09/10/2009, from https://www.torproject.org/torusers.html.en

Vea, M. (2009). What Traffic is on a TOR Relay? Retrieved 04/01/2010, from http://www.omninerd.com/articles/What_Traffic_is_on_a_TOR_Relay

Wright, M., Adler, M., Levine, B. N., & Shields, C. (2002). An analysis of the degradation of anonymous protocols.
