


John Geddes*, Mike Schliep, and Nicholas Hopper

Anarchy in Tor: Performance Cost of Decentralization

Abstract: Like many routing protocols, the Tor anonymity network has decentralized path selection, in which clients locally and independently choose paths. As a result, network resources may be left idle, leaving the system in a suboptimal state. This is referred to as the price of anarchy, where agents acting in their own self-interest can make poor decisions when viewed in a global context. In this paper we explore the cost of anarchy in Tor by examining the potential performance increases that can be gained by centrally optimizing circuit and relay selection using global knowledge. In experiments with both offline and online algorithms, we show that centrally coordinated clients can achieve up to 75% higher bandwidth compared to traditional Tor. Drawing on these findings, we design and evaluate a decentralized version of our online algorithm, in which relays locally distribute information enabling clients to make smarter decisions locally and perform downloads 10-60% faster. Finally, we perform a privacy analysis of the decentralized algorithm against a passive and active adversary trying to reduce the anonymity of clients and increase their view of the Tor network. We conclude that this decentralized algorithm does not enable new attacks, while providing significantly higher performance.

1 Introduction

Tor [7] is one of the most popular and widely used low-latency anonymity systems, with 2 million daily clients supported by a network of over 6000 volunteer relays. Tor clients tunnel traffic through a sequence of 3 relays called a circuit. Traffic sent over the circuit is encrypted so that no single relay can learn the identity of both the client and destination. With a limited amount of resources available from the volunteer relays, it is important to distribute traffic to utilize these resources as efficiently as possible. Doing so ensures high performance for latency-sensitive applications and attracts potential clients to use the network, increasing anonymity for all users.

*Corresponding Author: John Geddes: University of Minnesota, E-mail: [email protected]
Mike Schliep: University of Minnesota, E-mail: [email protected]
Nicholas Hopper: University of Minnesota, E-mail: [email protected]

Similar to ordinary routing, clients make decisions locally based on their view of the network. In Tor, directory authorities are centralized servers that keep track of all relays and their information. Clients download this information, stored in consensus files and server descriptors, every few hours from the directory authorities. With this information clients create 10 circuits to have available, selecting relays for the circuits at random, weighted by the relays' bandwidth. When the client starts using Tor to send traffic, it then selects the circuit in best standing to use for the next 10 minutes. Since it can take the directory authority up to 24 hours to properly update relay information and distribute it to clients, much research has focused on how clients can make better decisions, such as considering latency [2] or relay congestion [37] when selecting relays and circuits.

The main issue with leaving network decisions to clients is that local incentives can lead clients to make decisions that they believe will maximize their performance, when in fact these decisions result in a globally sub-optimal state with overall client performance suffering. In these situations it is common to compare against what happens when decisions are made with global knowledge. This can be achieved by centrally processing requests as they are made, or by making decisions offline when the entire set of requests is known a priori. The performance gap between decentralized and centralized decision making is called the price of anarchy.

While Tor clients are not necessarily selfish, they are making local decisions without the context of the state of the global network. In this paper we analyze the price of anarchy in the Tor network. We develop both offline and online centralized algorithms that optimize the relay and circuit selection decisions made by clients. Using the global view of all active circuits in the Tor network, the algorithms are able to more intelligently select which circuits clients should use for each download they perform, resulting in significant performance improvements for clients. While these are not necessarily protocols that should be run in practice, they allow us to bound the actual price of anarchy in Tor, demonstrating the potential improvements that better resource allocation in Tor could achieve.

arXiv:1606.02385v2 [cs.CR] 13 Jun 2016


In this paper we make the following major contributions:

– Using a modified client model, we create a genetic algorithm to compute the optimal offline circuit selection for a given sequence of download requests. Since the algorithm is given access to the full sequence of requests before processing, this allows us to establish the optimal performance for a given network and load. This serves as a baseline for the competitive analysis of other algorithms.

– In addition, we develop an online centralized algorithm that processes download requests serially. Based on a delay-weighted capacity routing algorithm, the centralized circuit selection tries to avoid low-bandwidth bottlenecks when assigning circuits to requests.

– Using techniques from the centralized algorithm, we develop a decentralized algorithm that results in similar resource allocations, with relays releasing a small amount of information to allow clients to make smarter circuit selection choices.

– We perform a privacy analysis of the decentralized circuit selection algorithm that considers both the information that can be learned by a passive adversary and how an active adversary could attempt to abuse the algorithm to increase their view of the network.

– All algorithms are analyzed in a large-scale simulated environment using the Shadow simulator [15]. This allows us to precisely compare the effects and performance gains achieved across the different circuit selection algorithms.

2 Background

In this section we discuss some of the Tor architecture, basics on how clients operate and send data through the network, and related work on increasing performance in Tor.

2.1 Tor Architecture

When a client first joins the Tor network, it downloads a list of relays with their relevant information from a directory authority. The client then creates 10 circuits through a guard, middle, and exit relay. The relays used for each circuit are selected at random, weighted by bandwidth. For each TCP connection the client wishes to make, Tor creates a stream which is then assigned to a circuit; each circuit will typically handle multiple streams concurrently. The Tor client will find a viable circuit that can be used for the stream, which will then be used for the next 10 minutes or until the circuit becomes unusable.
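As an illustrative sketch of this selection step, the following Python snippet (our own simplification, not Tor's code) picks relays at random weighted by bandwidth. Real Tor additionally applies consensus position weights, relay flags, and family/subnet exclusion rules; the relay pools and `(name, bandwidth)` layout here are hypothetical:

```python
import random

def pick_relay(relays):
    """Pick one relay at random, weighted by bandwidth.

    `relays` is a list of (name, bandwidth) pairs -- an illustrative
    layout, not Tor's consensus format.
    """
    total = sum(bw for _, bw in relays)
    x = random.uniform(0, total)
    acc = 0.0
    for name, bw in relays:
        acc += bw
        if x <= acc:
            return name
    return relays[-1][0]  # guard against floating-point rounding

def build_circuit(guards, middles, exits):
    """Select a guard, middle, and exit; retry until all three are distinct."""
    while True:
        circuit = (pick_relay(guards), pick_relay(middles), pick_relay(exits))
        if len(set(circuit)) == 3:
            return circuit
```

In expectation, a relay with three times the bandwidth of another is chosen three times as often, which is the property this selection strategy relies on to spread load.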

Internal to the overlay network, relay pairs establish a single TLS connection for all communication between them. To send traffic through a circuit, data is packed into 512-byte onion-encrypted cells using secret keys negotiated with each relay during the circuit building process. Once a relay has received a cell, it peels off its layer of encryption, finds the next relay on the circuit to send to, and places the cell on a circuit queue where it waits to be sent. After a cell has traversed the entire circuit, the exit recovers the initial data sent by the client and forwards it to the end destination.
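The layered cell encryption can be illustrated with the following toy sketch. The hash-based keystream is a stand-in purely for illustration; real Tor uses AES-CTR with keys negotiated during circuit construction, and only the 512-byte cell size is taken directly from the text:

```python
import hashlib

CELL_SIZE = 512  # Tor packs data into fixed-size 512-byte cells

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream via repeated hashing -- a stand-in for Tor's AES-CTR.
    out, block = b"", key
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def onion_wrap(payload: bytes, hop_keys):
    """Client side: pad to a full cell, then add one encryption layer per
    hop (exit innermost) so each relay can peel exactly one layer."""
    cell = payload.ljust(CELL_SIZE, b"\x00")
    for key in reversed(hop_keys):
        cell = xor(cell, key)
    return cell

def onion_peel(cell: bytes, key: bytes):
    # Each relay removes its own layer and forwards the result.
    return xor(cell, key)
```

Peeling with the guard, middle, and exit keys in order recovers the original payload; no single relay sees both the plaintext and the client.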

2.2 Related Work

One of the major keys to increasing anonymity for Tor users is ensuring a large anonymity set, that is, a large user base. To do so, Tor needs to offer low latency to clients; bad performance in the form of slow web browsing can lead to fewer users using the system overall. To this end, there has been a plethora of research looking at ways to increase performance in Tor. These roughly fall into 4 areas: scheduling, selection, transports, and incentives.

Scheduling: Internally, Tor is constantly making scheduling decisions on what to send and how much should be processed. These decisions happen at all levels: between streams, circuits, and connections. On the circuit level, much work has been done in an attempt to classify latency-sensitive circuits, either prioritizing them [3, 35] or outright throttling noisy ones [18]. With regard to deciding how much should be sent on a circuit, there has been work comparing Tor's end-to-end window algorithm with an ATM-style link-based algorithm [4]. Additional work [14] has been done on scheduling across circuits from all connections, limiting how much is written to a connection at a time so that Tor, as opposed to the kernel, has more control over what gets sent.

Selection: When creating circuits, Tor selects from relays at random, weighted by their bandwidth. Determining the bandwidth is a non-trivial issue, and much work [33, 34] has been done looking at a range of methods, from using self-reported values, to central nodes making


direct measurements, and peer-to-peer methods relying on existing relays. There has also been a lot of research on improving relay selection [2, 6, 12, 31, 32, 36, 37] for circuit creation. These approaches range from incorporating latency and congestion measurements, to using a virtual coordinate system, to adjusting the weighting strategy, all in an attempt to improve overall client performance.

Transport: One of the noted performance issues in Tor is the fact that the single TLS connection between relays can cause unnecessary blocking, where circuits could keep sending data but TCP mechanisms prevent it [8]. Reardon [26] attempted to address this by implementing TCP-over-DTLS, allowing a connection to be dedicated to each circuit. In a similar vein, there have been numerous works [5, 9, 10] looking into increasing the number of TCP connections between relays and scheduling circuits between them in an attempt to avoid unnecessary blocking. Nowlan et al. introduce uTCP and uTLS [23, 24], which allow for out-of-order delivery in Tor so it can process cells from circuits that are not blocking.

Incentives: While the previous lines of research involved improving efficiency in how Tor handles traffic, another set looked at potential ways to incentivize clients to also contribute bandwidth as a relay, increasing the overall network resources available to clients. This was first explored by Ngan, Dingledine, and Wallach [22], who prioritized traffic from relays providing high-quality service in the Tor network. Jansen, Hopper, and Kim [16] extend this idea, allowing relays to earn credits which can be redeemed for higher-priority traffic. Building on this, Jansen, Johnson, and Syverson [17] introduce a more lightweight solution that allows for the same kind of traffic prioritization without as much overhead.

3 Experimental Setup

To maximize our understanding of performance in the Tor network, we want to run large-scale experiments that accurately represent the actual operation of Tor clients and relays. To this end we use Shadow [1, 15], a discrete-event network simulator with the capability to run actual Tor code in a simulated network environment. Shadow allows us to set up a large-scale network configuration of clients, relays, and servers, which can all be simulated on a single machine. This lets us run experiments privately without operating on the actual Tor network, avoiding potential privacy concerns of dealing with real users. Additionally, it gives us a global view and precise control over every aspect of the network, providing insights and capabilities into Tor that would be near impossible to achieve in a live network. Most importantly, Shadow performs deterministic runs, allowing for reproducible results and letting us isolate exactly what we want to test for performance effects.

Our experiments are configured to use Shadow v1.9.2 and Tor v0.2.5.10. We use the large network configuration deployed with Shadow, which consists of 500 relays, 1350 web clients, 150 bulk clients, 300 performance clients, and 500 servers. In the default client model in Shadow [13-15], clients download a file of a specific size, and after the download is finished each client chooses how long to pause before it starts the next download. Web clients download 320 KB files and randomly pause between 1 and 60,000 milliseconds before starting the next download. Bulk clients download 5 MB files with no break in between downloads. The performance clients are split into three groups, downloading 50 KB, 1 MB, and 5 MB files, pausing 1 minute between each download. Along with the default client model we also consider a new type of client model called the fixed download model. Instead of clients having a single start time and inter-download pause times, they have a list of downloads, with each download having a unique start time and end time. To create a fixed download experiment we extract the start and stop times for all client downloads from a default client run and use those as the base for the fixed download experiment.
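A minimal sketch of how a fixed download schedule could be derived from a default-model run is shown below; the (client, start, end) record layout is our assumption, not Shadow's actual log format:

```python
def to_fixed_schedule(download_log):
    """Group observed downloads by client, ordered by start time.

    `download_log` is an assumed list of (client_id, start, end)
    records extracted from a default client-model run.  The result
    maps each client to the ordered list of (start, end) intervals
    its downloads must reproduce in the fixed download experiment.
    """
    schedule = {}
    for client, start, end in sorted(download_log, key=lambda r: r[1]):
        schedule.setdefault(client, []).append((start, end))
    return schedule
```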

For measuring performance between experiments we examine different metrics depending on which client model is being used. For the default client model we look at the time to first byte and download times of web and bulk clients, along with the total read bandwidth across all clients in the network. When using the new fixed download model, download times lose meaning as downloads have fixed start and end times. The major metric that will change between experiments is the amount of data being pushed through the Tor network, so total client read bandwidth is the only metric we consider when using the fixed download model.

4 Price of Anarchy

The price of anarchy [29] refers to situations where decentralized decision making can lead a system into a sub-optimal configuration due to the selfish nature of agents participating in the system. In networking


Fig. 1. Example of the circuit bandwidth estimate algorithm. At each step the relay with the lowest bandwidth is selected and its bandwidth is evenly distributed among its remaining circuits, with each relay on those circuits having its bandwidth decremented by the circuit bandwidth.

this problem is specifically referred to as selfish routing [28, 30], where users select routes based on local optimization criteria (latency, bandwidth, etc.), resulting in a sub-optimal global solution due to potentially conflicting interests. While clients in Tor are not necessarily selfish, as they select relays at random weighted by their bandwidth, they are oblivious with respect to relay and network conditions. In that sense, localized decision making with respect to relays and circuits has the potential to lead to a sub-optimal result, where network resources that could be used to increase client performance are left idle.

To fully explore the price of anarchy in Tor we want to examine just how much of a performance increase can be achieved if some centralized authority makes decisions rather than the clients themselves. For this we consider both an offline algorithm, which is able to see the entire set of download requests when operating, and an online algorithm, which needs to process inputs serially and is unaware of future requests. In this section we detail how each of these algorithms works and look at the potential performance benefits.

4.1 Offline Algorithm

An offline algorithm needs access to the entire input set while operating; in this case that means knowing the times when each client is active and performing a download. So instead of using the default client model in Shadow, we use the fixed download model, which gives us a list of all download start and stop times in the experiment. From the experimental setup we extract all downloads di with their corresponding start and end times (si, ei), along with the list of relays rj and their bandwidths bwj. These parameters are then passed into the offline algorithm, which returns a mapping di → ci = (ri1, ri2, ri3) from each download to the three-relay circuit that should be used by the client performing the download. Note that we still use the constraints from Tor that all relays in the circuit must be unique and that ri3 must be an exit relay, but we do not force the algorithm to select ri1 to be a guard. This is because it is impossible to use a non-exit as an exit relay, whereas there is no mechanism to enforce that the guard in a circuit actually has the guard flag set.

When using the fixed download client model, the offline circuit selection problem starts to strongly resemble job scheduling [11]. For example, we have a set of downloads (jobs) that have to be processed on three relays (machines). Each download has a start time (release date), and can either have an end time (due date) or a file of specific size to download (number of operations). The relays can process downloads simultaneously (jobs can be preempted) and the goal is to download each file as fast as possible (minimize completion time or lateness). There are some complications in how exactly to define the machine environment for job scheduling, since the relays are heterogeneous and the amount of processing done on a relay is determined by the bottleneck in the circuit, not necessarily by the bandwidth capacity of the relay. We would also need an algorithm that handles job splitting across machines, as we need 3 relays processing each download, and these problems are often NP-hard.

Due to these complications we instead develop a genetic algorithm in order to compute a lower bound on an optimal offline solution. Our genetic algorithm has a population of solutions, with a single solution consisting of a complete mapping of downloads to circuits. We can breed two solutions by iterating through each download and randomly choosing a circuit from either parent. However, the most important part of a genetic algorithm is a fitness function allowing us to score each solution. Since the metric we are most concerned with in the fixed download client model is the amount of bandwidth being pushed through the network, we want a method for estimating the total amount of network bandwidth used given a mapping of downloads to circuits.

In order to calculate the total bandwidth across an entire set of downloads, we first need a method for calculating the circuit bandwidth across a set of active circuits at a single point in time. To do this we make two


Algorithm 1 Estimate bandwidth of active circuits
 1: function CalcCircuitBW(activeCircuits)
 2:     activeRelays ← GetRelays(activeCircuits)
 3:     while not activeCircuits.empty() do
 4:         r ← GetBottleneckRelay(activeRelays)
 5:         circuits ← GetCircuits(activeCircuits, r)
 6:         circuitBW ← r.bw / circuits.len()
 7:         for c ∈ circuits do
 8:             c.bw ← circuitBW
 9:             for circRelay ∈ c.relays do
10:                 circRelay.bw −= c.bw
11:                 if circRelay.bw = 0 then
12:                     activeRelays.remove(circRelay)
13:                 end if
14:             end for
15:             activeCircuits.remove(c)
16:         end for
17:     end while
18: end function

observations: (1) bottleneck relays will determine the bandwidth of a circuit, and (2) a relay will split its bandwidth equally among all circuits it is the bottleneck on. The first observation comes from the end-to-end window-based congestion control algorithm used in Tor, which ensures clients and exit relays only send as much as a circuit can handle. The second observation isn't trivially true, since with the priority circuit scheduler some circuits will be sending more than others on a small enough time interval. In aggregate, though, all circuits will be given roughly the same amount of bandwidth.

The pseudocode for calculating the circuit bandwidth across a set of active circuits is shown in Algorithm 1 and works as follows. First the algorithm identifies the relay r with the lowest bandwidth per circuit (line 4). We know this relay is the bottleneck on all circuits that r appears on, as by definition every other relay on those circuits will have a higher available per-circuit bandwidth. Next the algorithm iterates through each circuit ci that r appears on and assigns the circuit bandwidth as r's per-circuit bandwidth (lines 5-8). In addition, for each ci the algorithm also iterates over each relay rij on circuit ci, decrementing the bandwidth of rij by the circuit bandwidth assigned to ci (line 10). While iterating over the relays, if any of them runs out of available bandwidth, the relay is removed from the list of active relays (lines 11-13). After we have iterated over the relays in circuit ci, it is removed from the list of active circuits (line 15). Note that during this process

Algorithm 2 Compute total bandwidth for downloads
 1: function CalcTotalBW(downloads)
 2:     start, end ← GetTimeInterval(downloads)
 3:     bandwidth ← 0
 4:     time ← start
 5:     while time ≤ end do
 6:         circuits ← GetActiveCircs(downloads, time)
 7:         CalcCircuitBW(circuits)
 8:         for circuit ∈ circuits do
 9:             bandwidth += circuit.bw
10:         end for
11:         time ← time + tick
12:     end while
13:     return bandwidth
14: end function
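A runnable sketch of Algorithm 2 is given below, with a compact inline version of the per-tick circuit bandwidth estimate from Algorithm 1. The (start, end, circuit) tuples and the relay-bandwidth dictionary are our own data layout, not the paper's implementation:

```python
def calc_total_bw(downloads, relay_bw, tick=1):
    """Total network bandwidth for a download-to-circuit mapping.

    `downloads` is a list of (start, end, circuit) tuples, where a
    circuit is a triple of relay names; `relay_bw` maps relay name to
    bandwidth.  At each tick, every active circuit's bandwidth is
    estimated by repeatedly peeling off the bottleneck relay, and the
    per-circuit bandwidths are summed into the fitness score.
    """
    def circuit_bws(circuits):
        bw, active, out = dict(relay_bw), list(circuits), []
        while active:
            load = {}
            for c in active:
                for r in c:
                    load[r] = load.get(r, 0) + 1
            b = min(load, key=lambda r: bw[r] / load[r])  # bottleneck relay
            share = bw[b] / load[b]
            for c in [c for c in active if b in c]:
                out.append(share)
                for r in c:
                    bw[r] -= share
                active.remove(c)
        return out

    start = min(s for s, _, _ in downloads)
    end = max(e for _, e, _ in downloads)
    total, t = 0.0, start
    while t <= end:
        active = [c for s, e, c in downloads if s <= t <= e]
        total += sum(circuit_bws(active))
        t += tick
    return total
```

For example, a single download on a circuit whose slowest relay has bandwidth 10 contributes 10 units per tick it is active.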

relay r will always end up with a bandwidth of 0, and no circuits remaining in the list of active circuits will contain r. Furthermore, any other relay that reaches a bandwidth of 0 will also not be contained in any circuit remaining in the active circuit list.1 This means that we can never have a situation where a circuit has a relay that is no longer active with a bandwidth of 0, resulting in a circuit bandwidth of 0. The algorithm repeats this process, extracting the bottleneck relay and updating relay and circuit bandwidths, until no active circuits remain. An example of this algorithm is shown in Figure 1, highlighting which relay is selected, how the circuit bandwidth is assigned, and how the relay bandwidth is updated.
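The bottleneck-peeling procedure described above can be sketched in Python as follows; circuits are relay-name triples and `relay_bw` maps relay name to bandwidth, a data layout we chose for illustration:

```python
def calc_circuit_bw(circuits, relay_bw):
    """Estimate per-circuit bandwidth, following Algorithm 1.

    Repeatedly pick the relay with the lowest per-circuit bandwidth
    (the bottleneck), split its bandwidth evenly over the circuits it
    carries, and decrement every relay on those circuits.  Returns a
    dict mapping each circuit (a relay-name triple) to its bandwidth.
    """
    bw = dict(relay_bw)
    active = [tuple(c) for c in circuits]
    result = {}
    while active:
        # Count how many active circuits each relay carries.
        load = {}
        for c in active:
            for r in c:
                load[r] = load.get(r, 0) + 1
        bottleneck = min(load, key=lambda r: bw[r] / load[r])
        circuit_bw = bw[bottleneck] / load[bottleneck]
        for c in [c for c in active if bottleneck in c]:
            result[c] = circuit_bw
            for r in c:
                bw[r] -= circuit_bw
            active.remove(c)
    return result
```

With relays A=10, B=20, C=30, D=40 and circuits (A,B,C) and (B,C,D), relay A bottlenecks the first circuit at 10, after which B (20 − 10 = 10 remaining) bottlenecks the second circuit at 10 as well.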

With an algorithm to compute the bandwidth across all active circuits at a single point in time, we can now compute the overall bandwidth consumption across an entire set of downloads with assigned circuits, as outlined in Algorithm 2. First the algorithm retrieves the earliest start time and latest end time across all downloads (line 2). We then iterate over every "tick" between the start and end time (lines 5-12), calculating the bandwidth across all active circuits during that tick (lines 6-7). Then for each active circuit we simply add the circuit bandwidth to a total bandwidth counter (lines 8-10). This gives us a total bandwidth value for a circuit-to-download mapping that can be used as the fitness score for solutions in the genetic algorithm. With this fitness score we can now run the genetic algorithm across populations of circuit-to-download mappings. The main parameters for the genetic

1 The proof for this is shown in Appendix A.


algorithm are the breed percentile b, indicating the top b% of solutions to pick from when breeding, the elite percentile e, the top e% of solutions that get copied over into the next generation, and the mutation probability m. For our purposes a mutation can occur when we are selecting a circuit for a single download: we simply select a relay in the circuit to be replaced with another randomly chosen relay.2
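The breeding and mutation steps can be sketched as below. The dictionary-based solution encoding and helper names are ours, and the selection of the top-b% breeding pool and top-e% elite is omitted for brevity:

```python
import random

def breed(parent_a, parent_b, relays, exits, m=0.01):
    """Breed two solutions, each a dict mapping download id -> 3-relay circuit.

    Each download's circuit is copied from a randomly chosen parent;
    with probability m, one relay in the circuit is replaced by another
    randomly chosen relay (the third position must remain an exit, and
    the circuit's relays must stay unique).
    """
    child = {}
    for d in parent_a:
        circuit = list(random.choice((parent_a, parent_b))[d])
        if random.random() < m:  # mutation
            pos = random.randrange(3)
            pool = exits if pos == 2 else relays
            replacement = random.choice(pool)
            if replacement not in circuit:  # keep relays unique
                circuit[pos] = replacement
        child[d] = tuple(circuit)
    return child
```

With m = 0 the child is a pure per-download mix of its parents, which is the crossover behavior the fitness function then scores.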

4.2 Online Algorithm

The offline genetic algorithm has access to the entire set of downloads when making circuit selection decisions. We want to see how it compares to an online algorithm, one that has to make circuit selection decisions serially. Such an algorithm has global knowledge of which active downloads are using which circuits, but has no idea how long downloads will last and does not know when future downloads will start. While at first glance the online problem might seem similar to traditional routing, there are some problems with using routing algorithms for circuit selection in Tor. Typically routing algorithms depict the network as a graph in order to run efficient algorithms to extract which paths to select. Since relays, not links, are the resource-constrained entities, we can invert the graph, with each relay consisting of an in and an out vertex and the edge between them representing the bandwidth of the relay. Additionally, since in Tor there is a link between every pair of relays, for each relay we add an edge with infinite bandwidth connecting its out vertex with every other relay's in vertex. But now if we want to run any traditional algorithms (e.g. max flow), there is no way to impose the requirement that all circuits must contain exactly three hops. In order to do this we would have to create a graph of all potential circuits, where each relay would have a separate vertex for when it can be a guard, middle, and exit relay. But not only does this explode our state space, there is now no way to make sure that when a relay is used as a guard, the available bandwidth on its middle and exit vertices decreases correspondingly.

So while we cannot use traditional routing algorithms for online circuit selection, we can borrow some techniques from routing algorithms that share similar goals. Specifically, for our online algorithm we borrow

2 Our experiments used parameters b = 0.2 and e = 0.1. Mutations were only allowed during some of the experiments, with the probability set at m = 0.01.

techniques used in the delay-weighted capacity (DWC) routing algorithm [38]. The goal of the DWC algorithm is similar to what we wish to accomplish for circuit selection: to find the path with the highest potential capacity. There are quite a few differences between the original DWC algorithm and what we need to accomplish, mainly that we are given a set of circuits to choose from and do not need to extract paths from a graph. The key insight we can use from the DWC algorithm is bottleneck identification and weight assignment. The algorithm first extracts a set of least-delay paths, then for each path it identifies the bottleneck link, incrementing the link's weight by the inverse bandwidth of the link. The DWC algorithm then simply selects the path with the lowest sum of weights across all links in the path. By doing this it avoids the most critical links, those that are the bottlenecks providing the lowest amount of bandwidth.

To adopt the DWC routing algorithm for an online circuit selection algorithm, we need a method for identifying which relays are bottlenecks on active circuits. Algorithm 1, which estimates the bandwidth for all active circuits, can easily be adapted for this purpose. Note that when we get the relay with the lowest per circuit bandwidth (line 4), we know that this relay is the bottleneck on each active circuit it appears on. So when we iterate through those circuits (line 7) and assign them the per circuit bandwidth value (line 8), we can also update the weight for the relay by simply adding

r.weight ← r.weight + 1/circuitBW

after the circuit bandwidth assignment (between lines 8 and 9). When the function is done, each circuit will have a bandwidth and each relay will have a corresponding DWC weight. To run the DWC circuit selection algorithm, we first iterate through the downloads in order based on their start times. For each new download we get the set of currently active downloads and their assigned circuits. We then run the updated algorithm that calculates active circuit bandwidths and relay DWC weights. Then, for each circuit we are considering for the new download, we calculate the circuit weight as the sum of relay weights in the circuit. The new download is assigned the circuit with the lowest calculated weight. If there are multiple circuits with the lowest weight, we select the circuit with the highest available bandwidth, defined as the minimum across all relay available bandwidth values.
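As a rough illustration, the following sketch combines the bandwidth-estimation loop described above (repeatedly taking the relay with the lowest per-circuit bandwidth) with the DWC weight update. The data structures and function name are our own assumptions, not the paper's implementation of Algorithm 1:

```python
def assign_bandwidth_and_weights(relay_bw, circuits):
    # relay_bw: relay name -> available bandwidth
    # circuits: tuples of relay names for the active circuits
    weights = {r: 0.0 for r in relay_bw}
    bw = dict(relay_bw)                     # remaining bandwidth per relay
    circuit_bw = {}
    unassigned = set(circuits)
    while unassigned:
        # number of still-unassigned circuits crossing each relay
        load = {r: sum(1 for c in unassigned if r in c) for r in relay_bw}
        used = [r for r in relay_bw if load[r] > 0]
        # "line 4": the relay offering the lowest per-circuit bandwidth
        bottleneck = min(used, key=lambda r: bw[r] / load[r])
        per_circuit = bw[bottleneck] / load[bottleneck]
        for c in [c for c in unassigned if bottleneck in c]:
            circuit_bw[c] = per_circuit               # "line 8"
            weights[bottleneck] += 1.0 / per_circuit  # the DWC weight update
            unassigned.remove(c)
            for r in c:
                if r != bottleneck:
                    bw[r] -= per_circuit
        bw[bottleneck] = 0.0
    return circuit_bw, weights
```

A new download would then be assigned the candidate circuit whose relays have the lowest summed weight.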


Anarchy in Tor: Performance Cost of Decentralization 7

Fig. 2. On the left, each circuit can achieve 10 MBps throughput, resulting in a total of 20 MBps being delivered to the clients. On the right, both original circuits must share bandwidth at the exit nodes with a third circuit, reducing the throughput for each circuit to 5 MBps and the total network throughput to 15 MBps, a drop in network throughput even though we added a circuit to the network.

4.3 Circuit Sets

In both the offline and online circuit selection algorithms, each download has a set of potential circuits that can be selected from. For our experiments we consider three potential circuit sets. The first is the original 10 circuits that were available during the initial vanilla run. Along with extracting the fixed download start and end times from a completed vanilla experiment, we also record which circuits were available on the client when each download started. This allows us to specifically focus on circuit selection, keeping relay selection identical across all experiments. The second circuit set we consider is the full set of all potential valid circuits. We consider a circuit r1, r2, r3 valid if r3 is an actual exit relay, and if r1 ≠ r2, r2 ≠ r3, and r1 ≠ r3. Note we do not require that r1 actually has the guard flag set. This is because while there are mechanisms internal to Tor that prevent any use of a non-exit relay as an exit relay, there is nothing stopping a client from using a non-guard relay as the guard in a circuit. We also remove any “duplicate” circuits that contain the same relays in a different order. This is done since neither algorithm considers inter-relay latency, only relay bandwidth, so there is no need to consider circuits with identical relays.
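The validity and de-duplication rules above can be stated compactly; the relay-name and set-based representation is our own illustrative choice:

```python
def is_valid_circuit(r1, r2, r3, exits):
    # exits: set of relays with the exit flag; r1 need not have the guard flag
    return r3 in exits and len({r1, r2, r3}) == 3

def dedup_circuits(circuits):
    # drop "duplicate" circuits that contain the same relays in a different
    # order, since only relay bandwidth (not inter-relay latency) matters here
    seen, out = set(), []
    for c in circuits:
        key = frozenset(c)
        if key not in seen:
            seen.add(key)
            out.append(c)
    return out
```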

The final circuit set we consider is the pruned circuit set. The motivation behind the pruned set can be seen in Figure 2. It shows that we can add an active circuit and actually reduce the amount of bandwidth being pushed through the Tor network. In the example shown, it will always be better to use one of the already active circuits shown on the left instead of the new circuit seen on the right side of the graphic. There are two main ideas behind building the pruned circuit set: (1) when building circuits, always use non-exit relays for the guard and middle if possible, and (2) select relays with the highest

bandwidth. To build a circuit to add to the pruned set, the algorithm first finds the exit relay with the highest bandwidth. If none exists, the algorithm can stop as no new circuits can be built. Once it has an exit, the algorithm then searches for the non-exit relay with the highest bandwidth, and this relay is used as the middle relay in the circuit. If one is not found, it searches the remaining exit relays for the highest bandwidth relay to use as the middle. If it still cannot find a relay, the algorithm stops as there are not enough relays left to build another circuit. The search process for the middle is replicated for the guard, again stopping if it cannot find a suitable relay. Now that the circuit has a guard, middle, and exit relay, the circuit is added to the pruned circuit set and the algorithm calculates the circuit bandwidth as the minimum bandwidth across all relays in the circuit. Each relay then decrements its relay bandwidth by the circuit bandwidth. Any relay that now has a bandwidth of 0 is permanently removed from the relay list. This is repeated until the algorithm can no longer build valid circuits. 3
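The greedy construction described above might be sketched as follows; the (name, bandwidth, is_exit) representation is an assumption, and the paper's actual pseudocode is in its Appendix B:

```python
def build_pruned_set(relay_list):
    # relay_list: (name, bandwidth, is_exit) tuples, an assumed representation
    relays = {name: [bw, is_exit] for name, bw, is_exit in relay_list}

    def pick(exclude):
        # highest-bandwidth non-exit if one exists, otherwise fall back to exits
        for want_exit in (False, True):
            cands = [n for n, (bw, ex) in relays.items()
                     if ex == want_exit and n not in exclude]
            if cands:
                return max(cands, key=lambda n: relays[n][0])
        return None

    pruned = []
    while True:
        exits = [n for n, (bw, ex) in relays.items() if ex]
        if not exits:
            break                                  # no exit left: stop
        exit_r = max(exits, key=lambda n: relays[n][0])
        middle = pick({exit_r})
        guard = pick({exit_r, middle}) if middle else None
        if middle is None or guard is None:
            break                                  # not enough relays remain
        # circuit bandwidth = minimum bandwidth across its three relays
        circ_bw = min(relays[n][0] for n in (guard, middle, exit_r))
        pruned.append(((guard, middle, exit_r), circ_bw))
        for n in (guard, middle, exit_r):          # decrement, drop drained relays
            relays[n][0] -= circ_bw
            if relays[n][0] <= 0:
                del relays[n]
    return pruned
```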

4.4 Results

In this section we explore the performance of our offline genetic and online DWC circuit selection algorithms. To start, we configured and ran a large scale experiment running vanilla Tor with no modifications, while using the default client model. The results of this run were then used to generate an experiment using the fixed download model, where each download had the same start and end times that were seen in the vanilla experiment. We also extracted the original circuit sets for each download. These download times are then fed into both algorithms, along with which circuit sets to use, which then output a circuit selection for each download.

We ran the offline genetic and online DWC algorithms once for each circuit set, resulting in mappings of fixed downloads to circuits. Large scale experiments were run using each of the outputted circuit selections, and the results were compared to the original vanilla Tor experimental run. Recall that since we are using the fixed download client model, the only metric we are concerned with is network usage, in this case the total client download bandwidth during the experiment. Figures 3a and 3b show the total client bandwidth achieved with the genetic and DWC algorithms respectively.

3 The pseudocode for this algorithm is shown in Appendix B.

[Figure 3: (a) Offline Genetic Algorithm and (b) Online DWC Algorithm, CDFs of client bandwidth (MiB/s) for the vanilla, orig10, pruned, and full circuit sets; (c) Relay Capacity, CDF of relay bandwidth consumption (%) for vanilla, dwc, and genetic.]

Fig. 3. Total client bandwidth when using the offline and online circuit selection while changing the set of available circuits, along with the relay bandwidth capacity of the best results from both algorithms.

The first thing to note is that both algorithms were able to produce relay and circuit selections that improved bandwidth by 72%, with median client bandwidth jumping from 86 MBps up to 147 MBps. To see why, we can look at the relay capacity shown in Figure 3c. This looks at the percent of bandwidth being used on a relay compared to its configured BandwidthRate. Note that this can go higher than 100% because Tor also has a BandwidthBurstRate that allows the relay to temporarily send more than the BandwidthRate over short periods of time. This shows us that in the best performing DWC and genetic algorithm runs, half of the time relays are using 90% or more of their configured bandwidth, which was only happening about 20% of the time in vanilla Tor. This shows that the algorithms were able to take advantage of resources that were otherwise left idle in vanilla Tor.

Interestingly, while both algorithms were able to achieve near identical maximum results, these came from different circuit sets. The genetic algorithm achieved its best results when using the pruned set, while the DWC algorithm performed best with the full set of valid circuits. Furthermore, when the genetic algorithm used the full set and the DWC algorithm used the pruned set, both actually performed worse compared to vanilla Tor. We suspect that when using the full circuit set, the genetic algorithm gets stuck in a local maximum that it is unable to escape, and using the pruned circuit set prevents the algorithm from entering one of these suboptimal local maxima. When the algorithms were restricted to the original circuit sets, both were able to improve performance compared to vanilla Tor. The offline genetic algorithm was able to increase client bandwidth to 111 MBps, while the online DWC algorithm saw performance results close to what was achieved with the full circuit set, with median client bandwidth at 138 MBps compared to 147 MBps with the full circuit set.

While the genetic algorithm serves as a lower bound on an optimal offline solution, given the near identical results seen when using the pruned circuit set in the genetic algorithm compared with the full circuit set in the DWC algorithm, we believe the online DWC algorithm is operating fairly close to an optimal solution, resulting in a much better allocation of resources. To see even greater performance gains, we would most likely need to either increase the bandwidth resources available (e.g. add more relays or increase the bandwidth of existing relays) or add more clients to the network that can consume any remaining idle resources.

5 Decentralized Circuit Selection

In the previous section we saw that when making circuit selections in a centralized fashion, with a global view of the network, we were able to see large performance increases, with resources being used much more efficiently. In this section we look at a decentralized solution that borrows techniques from the online DWC algorithm, allowing clients to make smarter circuit selections locally. The main insight from the online DWC algorithm outlined in Section 4.2 is that we want to avoid bottleneck relays, particularly if they are low-bandwidth bottlenecks. The obvious advantage of using a centralized approach is that we have a global view of active circuits and can accurately estimate where the bottlenecks are and how much bandwidth they are providing per circuit. In a decentralized setting no one entity has a global view that can determine who is and is not a bottleneck



on each circuit. So instead we want a method for relays themselves to estimate which circuits they are a bottleneck on. With estimates of bottleneck circuits, they can locally compute their DWC weight and leak it to clients, allowing clients to make more intelligent circuit selection decisions.

5.1 Local Weight Computation

The main challenge in having relays locally compute their DWC weight is having the relays estimate which circuits they are the bottleneck on. To calculate bottleneck circuits, we make three observations: (1) to be a bottleneck on any circuit, the relay's bandwidth should be fully consumed, (2) all bottleneck circuits should be sending more cells than non-bottleneck circuits, and (3) the number of cells being sent on bottleneck circuits should be fairly uniform. The last point is due to the congestion control built into Tor, where the sending rate on a circuit should converge to the capacity of the bottleneck. So the two main things we need are a method for calculating the bandwidth of a circuit and a way to classify circuits as bottleneck or non-bottleneck based on their bandwidth value.

While calculating circuit bandwidth might seem easy, we need to be careful to treat bulk and web circuits equally. Bulk clients continuously send data through the circuit, so those should always be at or close to capacity. Web clients are more bursty, sending traffic in short periods and resting for an unknown amount of time. In order to assign a circuit a bandwidth we consider two parameters. First is the bandwidth window, in which we keep track of all traffic sent on a circuit for the past w seconds. The other parameter is the bandwidth granularity, a sliding window of g that we scan over the w second bandwidth window to find the maximum number of cells transferred during any g period. That maximum value is then assigned as the circuit bandwidth. With circuits assigned a bandwidth value, we need a way to cluster the circuits into bottleneck and non-bottleneck groups. For this we look at three different clustering methods.

Jenks Natural Breaks: The Jenks natural breaks [19] clustering algorithm attempts to cluster one-dimensional data into classes that minimize the in-class variance. The function takes in a one-dimensional array of data and the n classes that the data should be clustered into. It returns the breaks [b1, b2), [b2, b3), ..., [bn, bn+1] and a goodness of variance fit (GVF) var ∈ [0, 1], where higher values indicate

Algorithm 3 Head/Tail clustering algorithm
1: function HeadTail(data, threshold)
2:     m ← sum(data)/len(data)
3:     head ← {d ∈ data | d ≥ m}
4:     if len(head)/len(data) < threshold then
5:         head ← HeadTail(head, threshold)
6:     end if
7:     return head
8: end function

better fits. For clustering circuits we can use the Jenks algorithm in two different ways. First is to cluster the data into 2 classes, with circuits in the [b1, b2) range classified as non-bottlenecks and those in [b2, b3] classified as bottlenecks. Second is to keep incrementing the number of classes we cluster the circuits into until the GVF value passes some threshold τ. Once the threshold is passed, we classify all circuits in the [bn, bn+1] range as bottlenecks and all others as non-bottlenecks.

Head/Tail: The head/tail clustering algorithm [20] is useful when the underlying data has a long tail, which could be useful for bottleneck identification, as we expect a tight clustering around the bottlenecks with other circuits randomly distributed amongst the lower bandwidth values. The algorithm first splits the data into two sets: the tail set contains all values less than the arithmetic mean, while everything greater than or equal to the mean is put in the head set. If the percent of values that ended up in the head set is less than some threshold, the process is repeated using the head set as the new data set. Once the threshold is passed, the function returns the very last head set as the head cluster of the data. The pseudocode for this algorithm is shown in Algorithm 3. For bottleneck identification we simply pass in the circuit bandwidth data, and the head cluster returned contains all the bottleneck circuits.

Kernel Density Estimator: The idea behind the kernel density estimator [25, 27] is to fit a multimodal distribution based on a Gaussian kernel to the circuits. Instead of using the bandwidth estimate for each circuit, the estimator takes as input the entire bandwidth history seen across all circuits, giving the estimator more data points to build a more accurate density estimate. For the kernel bandwidth we initially use the square root of the mean of all values.
Once we have a density, we compute the set of local minima {m1, ..., mn} and classify every circuit with bandwidth above mn as a bottleneck. If the resulting density estimate is unimodal and we do not have any local minima, we repeat



this process, halving the kernel bandwidth until we get a multimodal distribution.
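Two of the clustering methods above can be sketched in a few lines. The first function transcribes Algorithm 3 (with an added guard against a non-shrinking head set, our own safety assumption). The second is a from-scratch Gaussian kernel density estimate on a fixed grid, an illustrative stand-in for whatever estimator implementation [25, 27] the authors used:

```python
import numpy as np

def head_tail(data, threshold):
    # arithmetic mean splits data into head (>= m) and tail (< m)
    m = sum(data) / len(data)
    head = [d for d in data if d >= m]
    # recurse while the head still shrinks and is too large a fraction
    if 0 < len(head) < len(data) and len(head) / len(data) < threshold:
        head = head_tail(head, threshold)
    return head

def kde_bottlenecks(bandwidth_history, circuit_bw):
    # bandwidth_history: pooled bandwidth samples across all circuits
    # circuit_bw: circuit id -> current bandwidth estimate
    data = np.asarray(bandwidth_history, dtype=float)
    h = np.sqrt(data.mean())              # initial kernel bandwidth (per the text)
    xs = np.linspace(data.min(), data.max(), 512)
    while True:
        # unnormalized Gaussian kernel density estimate on the grid
        ys = np.exp(-0.5 * ((xs[:, None] - data[None, :]) / h) ** 2).sum(axis=1)
        is_min = (ys[1:-1] < ys[:-2]) & (ys[1:-1] < ys[2:])
        minima = xs[1:-1][is_min]
        if len(minima) > 0:
            cutoff = minima[-1]           # m_n: the last local minimum
            return {c: b for c, b in circuit_bw.items() if b > cutoff}
        h /= 2.0                          # unimodal density: halve and refit
```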

With a method to calculate circuit bandwidth and identify bottleneck circuits, the relay can calculate its weight in the same way described in Section 4.2: iterate over each bottleneck circuit and add the inverse of its bandwidth to the local weight. This information can be leaked to clients, allowing them to select the circuit with the lowest weight summed across all relays in the circuit.
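The per-circuit bandwidth estimate this weight relies on, described in Section 5.1 as the maximum number of cells seen in any g-length window over the past w seconds, could be computed as in this sketch; the interval-based bookkeeping is an assumption:

```python
def circuit_bandwidth(cell_counts, g_intervals):
    # cell_counts: cells observed per 100 ms interval over the past w seconds
    # g_intervals: the granularity g expressed in 100 ms intervals
    best = window = 0
    for i, c in enumerate(cell_counts):
        window += c
        if i >= g_intervals:
            window -= cell_counts[i - g_intervals]  # slide the g-length window
        best = max(best, window)
    return best
```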

5.2 Implementation

Since the decentralized algorithm needs real time bandwidth information, we implement the algorithm directly in Tor, with clients making circuit selection decisions as downloads start. So while we had to use the fixed download client model in Section 4 in order to precompute circuit selection, for the decentralized algorithm we can use the default client model, where clients have a single start time and web clients pause randomly between the downloads they perform. While this means the algorithms need to run in real time, we can now also include the time to first byte and file download time in our evaluation metrics.

To incorporate the decentralized algorithm into Tor, we first implemented the local weight calculation method described in Section 5.1. As discussed previously, the method has three parameters: bandwidth granularity g, bandwidth window w, and which clustering method to use. For each circuit, relays keep a count of how many cells were read and written in each 100 millisecond interval over the past w seconds. To determine the circuit bandwidth, a relay iterates through the circuit history to find the maximum number of cells sent over any g millisecond period. The circuit bandwidth values are then clustered into bottleneck circuits, with the weight calculated as the sum of inverse circuit bandwidths across all bottlenecks. Now that a relay can compute its own DWC weight, it needs a way to leak this information to clients. For this we implement a new cell called a gossip cell. Each relay records its weight in a gossip cell and appends it to the downstream queue to the client for each circuit it has. Since the cell is encrypted, no other relay will be able to modify the weight. To prevent gossip cells from causing congestion, relays send the gossip cells on circuits every 5 seconds based on the circuit ID; specifically when now() ≡ circID mod 5. For clients, we modified the circuit_get_best function to select circuits with the lowest weight, using circuit bandwidth as the tie breaker. While compiling the list of “acceptable” circuits that Tor normally selects from, we additionally keep track of the minimal circuit weight across the acceptable circuits and filter out circuits with larger weight values. If there are multiple circuits remaining after this, we iterate through them selecting the circuit with the highest circuit bandwidth, defined as the minimum advertised relay bandwidth across all relays in the circuit. If after this step we still have more than one circuit available, we default to Tor's circuit selection among the remaining circuits.
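A condensed sketch of the client-side selection order just described (lowest summed weight first, then highest circuit bandwidth); the circuit representation is hypothetical and this is not the actual circuit_get_best code:

```python
def select_circuit(acceptable):
    # acceptable: (relay_weights, advertised_relay_bandwidths) pairs for the
    # circuits Tor would normally consider acceptable
    min_weight = min(sum(weights) for weights, _ in acceptable)
    lowest = [c for c in acceptable if sum(c[0]) == min_weight]
    # tie-break on circuit bandwidth: the minimum advertised relay bandwidth
    return max(lowest, key=lambda c: min(c[1]))
```

A full implementation would additionally fall back to Tor's default selection when several circuits remain tied after both filters.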

For the centralized DWC algorithm we can no longer precompute circuit selection since we are using the default client model, so we need a way to run the centralized algorithm during the experiment as clients start downloads. To accomplish this we introduce a new central authority node which clients query for circuit selection decisions. When a client starts, the central authority connects to the client using the Tor control protocol. It disables the client from making circuit selection decisions locally, and then listens on the CIRC and STREAM events. With the CIRC events the central authority knows what circuits are available on every client in the network. The STREAM events then notify the central authority when a client begins a download, at which time the central authority can select which of the client's circuits should be used for the new download. Through the STREAM events the central authority is also notified when the stream has completed, so it can remove the circuit assigned to the stream from the active circuit list. With this the central authority obtains a global view of the network with knowledge of every current active circuit, so it can use Algorithm 1 discussed in Section 4.2 to compute which relays are bottlenecks on each circuit, update the weights of relays accordingly, and select the lowest weighted circuit that clients have available.

5.3 Results

With the various parameters and clustering methods available to the decentralized algorithm, the first thing we are interested in is which parameters result in the most accurate bottleneck estimation. Since we are comparing the decentralized algorithm against the centralized DWC algorithm, we will use the central authority's bottleneck estimation as ground truth. To make the comparison we configured an experiment to run with the central authority making circuit selections. Every



                     Granularity 100 ms         Granularity 1 second
Window               1s    2s    5s    10s      1s    2s    5s    10s
Jenks Two            1.67  1.79  3.56  8.69     1.80  2.39  5.28  13.25
Jenks Best           1.11  1.14  1.91  3.87     1.00  1.18  2.25  4.70
Head/Tail            0.74  0.92  1.66  4.36     0.90  1.13  2.45  6.14
Kernel Density       0.81  0.82  0.85  0.93     0.79  0.79  0.95  1.62

Table 1. The bottleneck clustering methods' mean-squared error across varying bandwidth granularity and window parameters, with red values indicating scores less than the weighted random estimator.

[Figure 4: cumulative-fraction plots comparing vanilla, local (decentralized), and central circuit selection: (a) Time to First Byte (s), (b) Web Download Times (s), (c) Bulk Download Times (s), (d) Total Client Bandwidth (MiB/s).]

Fig. 4. Download times and client bandwidth compared across circuit selection in vanilla Tor, decentralized, and centralized algorithms.

time the central authority selects a circuit for a client, it outputs the bottleneck estimation for every relay. Additionally, during the run all relays periodically output the entire 10 second bandwidth history for every circuit they are on. With this information we can compute which circuits every relay would have classified as a bottleneck depending on the parameters used. For comparing estimates, every time the central authority outputs its bottleneck figures, every relay that outputs its circuit history within 10 simulated milliseconds will have its local bottleneck estimate compared to the estimate produced by the central authority.

We are interested in the combination of parameters and clustering methods that minimizes the mean-squared error between the central authority and local relay bottleneck estimations. Table 1 shows the results of the different clustering methods with bandwidth granularity set at either 100 milliseconds or 1 second, and with the bandwidth window varying between 1, 2, 5, and 10 seconds. For comparison we created a weighted random estimator that simply randomly selects from the entire set of estimates produced by the central authority. The weighted random estimator produced a mean-squared error of 1.33, which was better than almost half of all configurations. For example, the Jenks two class clustering method consistently produced results worse than the weighted random estimator no matter what parameters were used. Across all clustering methods, larger bandwidth windows resulted in higher mean-squared errors. This indicates that circuits that were inactive (and thus

excluded from the central authority's calculation) were still being assigned a high bandwidth value and clustered into the bottleneck group. Interestingly, while the kernel density estimator produced the best results on average, the overall best parameter configuration was the head/tail estimator with bandwidth granularity g = 100 ms and bandwidth window w = 1 s.

Using these parameters we ran an experiment with the decentralized DWC algorithm, along with vanilla Tor and centralized DWC. Note that every experiment was configured to have relays send gossip cells every 5 seconds, in addition to including a central authority connected to every client. In non-centralized experiments, when a download starts the central authority tells the client to select a circuit itself instead of making the selection for it, and we had vanilla Tor ignore relay weights when making circuit selections. This is done to ensure that any latency and bandwidth overhead is identical across experiments and the only difference is the circuit selection algorithm being used. The results of all three experiments are shown in Figure 4. The centralized experiments show across the board improvements in every metric compared to vanilla Tor, with clients experiencing almost 20% higher bandwidth. While the decentralized experiment produced results slightly under the centralized, we still see improvements compared to vanilla Tor. Figure 4d shows client bandwidth still increasing 8% with the decentralized algorithm. Bulk clients in both the centralized and decentralized experiments achieved between 5% and 30%



[Figure 5: cumulative-fraction plots comparing vanilla with local-ncirc10, local-ncirc25, and local-ncirc50: (a) Time to First Byte (s), (b) Web Download Times (s), (c) Bulk Download Times (s), (d) Total Client Bandwidth (MiB/s).]

Fig. 5. Download times and client bandwidth when using local decentralized circuit selection, varying the number of available circuits to select from.

[Figure 6: cumulative-fraction plots comparing vanilla with central-ncirc10, central-ncirc25, and central-ncirc50: (a) Time to First Byte (s), (b) Web Download Times (s), (c) Bulk Download Times (s), (d) Total Client Bandwidth (MiB/s).]

Fig. 6. Download times and client bandwidth when using centralized circuit selection, varying the number of available circuits to select from.

faster downloads. While all web clients saw faster download times in the centralized experiment, the results from the decentralized experiment are slightly more mixed. 70% of web client downloads performed just as fast as they did in the centralized experiment, with 15% falling in between centralized and vanilla Tor. The final 15% of web client downloads experienced just slightly longer download times, with 1-2% increases compared to vanilla Tor. Time to first byte, shown in Figure 4a, shows the worst performance in the decentralized experiments, with roughly 25% of downloads experiencing an extra 1-2 seconds to receive their first byte. This could be caused by the delay between a relay updating its weight and all clients receiving it, causing clients to select relays that are temporarily congested, resulting in poor times to first byte.

Along with the discussed parameters for the local weight computation, both the centralized and decentralized algorithms have an implicit parameter: the number of active circuits clients should attempt to maintain. By default, Tor clients keep around 10 active circuits they can select from. This is to make sure that there is always some circuit that can be used for downloads, so the client does not have to incur the 5-10 second overhead of circuit creation when a download starts. For both the centralized and decentralized algorithms, having more circuits available to select from could increase performance. As

was seen in Section 4.2, when we had the full set of circuits to select from, the centralized DWC algorithm produced the best results. To test this we configured the experiments to have clients maintain a set of 10, 25, and 50 available circuits. Results for these experiments are shown in Figures 5 and 6. Across almost every metric the decentralized algorithm saw worse results when using more circuits. This is particularly evident looking at client bandwidth in Figure 5d. When selecting from 50 circuits, clients achieved higher bandwidth only about 40% of the time, and using 25 circuits always resulted in worse performing clients. This is most likely because increasing the number of circuits that clients create interferes with the bottleneck classification algorithm, resulting in worse performance. For the centralized algorithm, all download times remained fairly identical no matter how many circuits were available. The only metric that saw any difference was client bandwidth, shown in Figure 6d. Here we see that increasing the number of available circuits to 25 and 50 achieves an increase of 8% and 19% in client bandwidth compared to using only 10 circuits. Considering that no other metrics improved when increasing the number of circuits, this is most likely due to overhead from maintaining circuits that is not being controlled for.



[Figure 7: (a) scatter of weight difference vs. client difference; (b) CDF of the change in percent of circuits seen, vanilla to local DWC and vanilla to lying; (c) circuits seen (%) per relay index for the top 20 relays.]

Fig. 7. (a) Weight difference on relays compared to the difference in the number of bottleneck circuits on the relay; (b) CDF of the difference in percent of circuits a relay sees when a relay lies about its weight; (c) percent of circuits a relay is on for the top 20 relays going from vanilla Tor to decentralized DWC (blue), along with the gains achieved when relays lie about their weight (green).

5.4 Privacy Analysis

With the addition of gossip cells and the new decentralized circuit selection protocol, there are some avenues that could be abused in an attempt to reduce client anonymity. The first major concern is that since the gossip cells purposefully leak information about the internal state of the relays, this could open up a new side channel where an adversary correlates client activity with changes in the leaked information to determine which relays are used by a client. The second issue is the increase in the percent of circuits that relays end up being selected on, both from the change in selection algorithm and from the fact that an adversarial relay could abuse the protocol. This can be done by lying about its own weight or artificially inflating other relays' weights, thereby increasing the chance it will be selected and allowing it to observe a larger portion of the Tor network. In this section we examine the impact of some of these privacy issues.

5.4.1 Information Leakage

The first issue is that relays advertising their local weight calculation could leak some information to an adversary. Mittal et al. [21] showed how an adversary could use throughput measurements to identify bottleneck relays in circuits. Relay weight values could be used similarly, where an adversary attempts to correlate start and stop times of connections with the weight values of potential bottleneck relays. To examine how much information is leaked by the weight values, every time a relay sent a GOSSIP cell we recorded the weight of the relay and the number of active circuits using the relay.

Then for every two consecutive GOSSIP cells sent by a relay we recorded the difference in weight and in the number of active circuits, which should give a good idea of how well changes in these two values are correlated over a short period of time. Figure 7a shows the distribution of weight differences across various changes in the number of clients actively using the relay. Since we are interested in times when the relay is a bottleneck on new circuits, we excluded times when the weight difference was 0, as this indicates that the relay was not a bottleneck on any of the new circuits. This shows an almost nonexistent correlation between the two metrics: we are just as likely to see a rise as a drop in weight after more bottleneck circuits start using a relay. In a situation similar to the one outlined in [21], where an adversary is attempting to identify the bottleneck relays used in a circuit, we are particularly interested in the case where the number of active circuits using the relay as a bottleneck increases by 1. If there were large (possibly temporary) changes noticeable to an adversary, they could identify the bottleneck relay. But as we can see in Figure 7a, the distribution of weight changes when the client difference is 1 is very wide, meaning it would be almost impossible to identify the bottleneck relay by just monitoring relay weights.
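The differencing step described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' analysis code; the (weight, active_circuits) record format and the example values are hypothetical.

```python
# Sketch of the measurement above: for each pair of consecutive GOSSIP
# observations from one relay, record the change in weight and in active
# circuits. Pairs with zero weight change are dropped, since an unchanged
# weight indicates the relay was not a bottleneck on any new circuit.

def gossip_deltas(observations):
    """observations: list of (weight, active_circuits) tuples, one per
    GOSSIP cell in the order sent by a single relay. Returns a list of
    (weight_delta, circuit_delta) pairs for consecutive cells."""
    deltas = []
    for (w0, c0), (w1, c1) in zip(observations, observations[1:]):
        dw, dc = w1 - w0, c1 - c0
        if dw != 0:  # weight unchanged => relay was not a bottleneck
            deltas.append((dw, dc))
    return deltas

# Example with illustrative integer weights: three GOSSIP cells.
obs = [(1, 3), (1, 4), (3, 5)]
print(gossip_deltas(obs))  # → [(2, 1)] -- the flat first step is excluded
```

Correlating the two columns of the resulting pairs (e.g. with a rank correlation) is then a direct way to quantify the leakage Figure 7a visualizes.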

5.4.2 Relay Usage Changes

Since relays are still selected at random, weighted by their bandwidth, over the long term relay usage should remain the same in vanilla Tor and when using the decentralized DWC circuit selection algorithm. Over the short term this might change, though, since circuits are selected based on network information that could cause


Fig. 8. (a) attacker's bandwidth and target's weight when an adversary is running the denial of service attack (b) number of clients using the target for a circuit (c) number of downloads completed by clients using the target relay before and after the attack is started, shown both for vanilla Tor and using decentralized circuit selection

peaks in relay usage. In addition to short-term fluctuations, an active adversarial relay could lie about their DWC weight, consistently telling clients they have a weight of 0, increasing the chances that clients select circuits that the adversary's relay lies on. These changes could allow a relay to increase their view of the network without having to actually provide more bandwidth, potentially reducing the anonymity of clients using Tor.
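To make the incentive concrete, here is a toy sketch of why advertising a weight of 0 helps. It assumes clients prefer the candidate circuit whose largest advertised relay weight is smallest; this selection rule, the relay names, and the weights are all illustrative stand-ins, not the paper's exact protocol.

```python
# Toy model: a relay that lies about its weight (advertising 0) can make
# its circuit look like the least-loaded candidate. The selection rule
# (minimize the circuit's maximum advertised weight) and all values here
# are illustrative assumptions, not the paper's exact DWC protocol.

def pick_circuit(circuits):
    """circuits: list of candidate circuits, each a tuple of
    (relay_name, advertised_weight) pairs. Picks the circuit whose
    worst (largest) advertised weight is smallest."""
    return min(circuits, key=lambda circ: max(w for _, w in circ))

honest = [(("g1", 0.2), ("liar", 0.3), ("e1", 0.1)),
          (("g2", 0.2), ("m2", 0.25), ("e2", 0.1))]
# With honest weights, the liar's circuit loses (max 0.3 > 0.25)...
assert pick_circuit(honest)[1][0] == "m2"

lying = [(("g1", 0.2), ("liar", 0.0), ("e1", 0.1)),
         (("g2", 0.2), ("m2", 0.25), ("e2", 0.1))]
# ...but advertising 0 makes the liar's circuit the apparent best choice.
assert pick_circuit(lying)[1][0] == "liar"
```

As the static analysis below shows, in practice the gain from this bias is small because relays are still chosen with bandwidth-weighted randomness.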

To determine the effect that the decentralized algorithm has on relay usage, we extracted all circuits selected by clients during each experiment. In addition, for the experiment running the decentralized circuit selection algorithm we had clients output every circuit they considered, along with each relay's weight. This allows us to perform a static analysis of which circuits would have been selected if a single relay were lying about their weight. In each situation we are interested in the change in the percentage of selected circuits containing a specific relay, showing how much more of the network an adversary might be able to see. Figure 7b shows the CDF of the change in percentage of circuits experienced by each relay. It shows the difference when going from vanilla Tor to regular decentralized DWC, in addition to going from vanilla Tor to decentralized DWC when relays lie about their weight. While the increases are generally small, the largest changes come from simply using the new decentralized algorithm, with 80% of relays seeing a small drop in the percentage of selected circuits they end up on. Even the relays that see an increase almost never end up seeing more than 1% additional circuits. Furthermore, there is very little gained when relays start lying about their weight, with most increases being an extra 0.1-0.2%. Figure 7c shows the 20 relays with the largest gains. These are the highest bandwidth relays in the network, providing 39% of the total available bandwidth. Here we see that in the most extreme case, the very highest bandwidth relay could view 9% more circuits than they would have seen in vanilla Tor over the short term. Over the long term, however, the only real gain coming from the relay lying about their weight would be capped at about 3-4%.

5.4.3 Denial of Service

While adversarial relays are limited in the number of extra circuits they can be placed on by lying about their weight, they still have the ability to reduce the chances that other relays are selected. To achieve this they would need to artificially inflate the weight of other relays in the network, preventing clients from selecting circuits that the target relays appear on. Recall that the local weight calculation is based on how many bottleneck circuits the relay estimates it is on. This means that an adversary cannot simply create inactive circuits through the relay to inflate its weight; those circuits would never be labeled as bottlenecks. So to actually cause the weight to increase, the adversary needs to actually send data through the circuits. To test the effectiveness, we configured an experiment to create 250 one-hop circuits through the target relay. After 15 minutes the one-hop circuits were activated, downloading as much data as they could. Note that we want as many circuits through the relay as possible to make its weight as large as possible. The relay weight is summed across all bottleneck circuits, ∑ bw(c_i)^{-1}. If we have n one-hop circuits through a relay of bandwidth bw, each circuit will have a bandwidth of roughly bw/n, so the weight on the relay will be ∑_{i=1}^{n} (bw/n)^{-1} = n²/bw.
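The inflation arithmetic can be checked numerically. This is a minimal sketch assuming, as the text states, that a relay's weight is the sum of inverse bandwidths over its bottleneck circuits; the function name is illustrative.

```python
# Sketch: weight inflation from n one-hop flooding circuits.
# Assumes the relay's weight is the sum of 1/bandwidth over its
# bottleneck circuits, as described in the text.

def relay_weight(circuit_bandwidths):
    """Sum of inverse bandwidths across the relay's bottleneck circuits."""
    return sum(1.0 / bw for bw in circuit_bandwidths)

bw = 35e6   # target relay bandwidth, ~35 MB/s as in the experiment
n = 250     # number of attacker one-hop circuits

# Each flooding circuit saturates the relay, so each gets roughly bw/n.
inflated = relay_weight([bw / n] * n)

# Matches the closed form from the text: sum_{i=1}^{n} (bw/n)^{-1} = n^2/bw.
assert abs(inflated - n**2 / bw) < 1e-9
```

The quadratic dependence on n is why 250 circuits spike the weight to roughly 100 times its normal value even though the relay's total bandwidth is unchanged.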


Figure 8a shows the weight of the target relay along with how much bandwidth the attacker is using, with the shaded region noting when the attack is active. We see that the attacker is able to push through close to the maximum bandwidth that the relay can handle, around 35 MB/s. When the circuits are active, the weight spikes to almost 100 times what it was previously. Figure 8b shows the number of circuits the target is actively on. After the attack is started the number plummets to almost 0, down from the 30-40 it was previously on. But while the attack does succeed in inflating the weight of the target, note that the adversary has to fully saturate the bandwidth of the target. Doing this in vanilla Tor has almost the same effect, essentially running a denial of service by consuming all the available bandwidth. Figure 8c looks at the number of completed downloads using the target relay in both vanilla Tor and when using decentralized circuit selection. Both experiments see the number of successful downloads drop to 0 while the attack is running. So even though the addition of the relay weight adds another mechanism that can be used to run a denial of service attack, the avenue (saturating bandwidth) is the same.

6 Conclusion

In this paper we explore the price of anarchy in Tor by examining how using offline and online algorithms for circuit selection can significantly increase performance for end clients. These algorithms are able to more efficiently utilize network resources, resulting in almost twice as much bandwidth being consumed. Furthermore, since the offline and online algorithms produced almost identical results, this suggests that network utilization was near or at capacity. In addition, we adapt the online DWC algorithm to create a decentralized version, where relays themselves are able to estimate the calculations made by a central authority. While the results are not quite as dramatic, the decentralized algorithm was still able to achieve 20-25% higher bandwidth consumption compared to vanilla Tor. Finally, we analyze the potential loss in anonymity when using the decentralized algorithm, against both a passive and an active adversary.

Acknowledgments

This research was supported by NSF grant 1314637.

References

[1] Shadow Homepage and Code Repositories. https://shadow.github.io/, https://github.com/shadow/.
[2] M. Akhoondi, C. Yu, and H. V. Madhyastha. LASTor: A Low-Latency AS-Aware Tor Client. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, May 2012.
[3] M. AlSabah, K. Bauer, and I. Goldberg. Enhancing Tor's performance using real-time traffic classification. In Proceedings of the 19th ACM Conference on Computer and Communications Security (CCS 2012), October 2012.
[4] M. AlSabah, K. Bauer, I. Goldberg, D. Grunwald, D. McCoy, S. Savage, and G. Voelker. DefenestraTor: Throwing out windows in Tor. In Proceedings of the 11th Privacy Enhancing Technologies Symposium (PETS 2011), July 2011.
[5] M. AlSabah and I. Goldberg. PCTCP: Per-Circuit TCP-over-IPsec Transport for Anonymous Communication Overlay Networks. In Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS 2013), November 2013.
[6] M. Backes, A. Kate, S. Meiser, and E. Mohammadi. (Nothing else) MATor(s): Monitoring the anonymity of Tor's path selection. In Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS 2014), November 2014.
[7] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second-generation onion router. In Proceedings of the 13th USENIX Security Symposium, August 2004.
[8] R. Dingledine and S. J. Murdoch. Performance improvements on Tor or, why Tor is slow and what we're going to do about it. Online: https://www.torproject.org/press/presskit/2009-03-11-performance.pdf, 2009.
[9] J. Geddes, R. Jansen, and N. Hopper. IMUX: Managing Tor connections from two to infinity, and beyond. In Proceedings of the 12th Workshop on Privacy in the Electronic Society (WPES), November 2014.
[10] D. Gopal and N. Heninger. Torchestra: Reducing interactive traffic delays over Tor. In Proceedings of the Workshop on Privacy in the Electronic Society (WPES 2012). ACM, October 2012.
[11] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. R. Kan. Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics, 5:287–326, 1979.
[12] S. Herbert, S. J. Murdoch, and E. Punskaya. Optimising node selection probabilities in multi-hop M/D/1 queuing networks to reduce latency of Tor. Electronics Letters, 50(17):1205–1207, 2014.
[13] R. Jansen, K. S. Bauer, N. Hopper, and R. Dingledine. Methodically modeling the Tor network. In CSET, 2012.
[14] R. Jansen, J. Geddes, C. Wacek, M. Sherr, and P. Syverson. Never been KIST: Tor's congestion management blossoms with kernel-informed socket transport. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, August 2014. USENIX Association.
[15] R. Jansen and N. Hopper. Shadow: Running Tor in a Box for Accurate and Efficient Experimentation. In Proceedings of the 19th Network and Distributed System Security Symposium, 2012.
[16] R. Jansen, N. Hopper, and Y. Kim. Recruiting new Tor relays with BRAIDS. In A. D. Keromytis and V. Shmatikov, editors, Proceedings of the 2010 ACM Conference on Computer and Communications Security (CCS 2010). ACM, October 2010.
[17] R. Jansen, A. Johnson, and P. Syverson. LIRA: Lightweight Incentivized Routing for Anonymity. In Proceedings of the Network and Distributed System Security Symposium - NDSS'13. Internet Society, February 2013.
[18] R. Jansen, P. Syverson, and N. Hopper. Throttling Tor Bandwidth Parasites. In Proceedings of the 21st USENIX Security Symposium, August 2012.
[19] G. F. Jenks. The data model concept in statistical mapping. International Yearbook of Cartography, 7(1):186–190, 1967.
[20] B. Jiang. Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. The Professional Geographer, 65(3):482–494, 2013.
[21] P. Mittal, A. Khurshid, J. Juen, M. Caesar, and N. Borisov. Stealthy traffic analysis of low-latency anonymous communication using throughput fingerprinting. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 215–226. ACM, 2011.
[22] T.-W. J. Ngan, R. Dingledine, and D. S. Wallach. Building Incentives into Tor. In R. Sion, editor, Proceedings of Financial Cryptography (FC '10), January 2010.
[23] M. F. Nowlan, N. Tiwari, J. Iyengar, S. O. Amin, and B. Ford. Fitting square pegs through round pipes: Unordered delivery wire-compatible with TCP and TLS. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012.
[24] M. F. Nowlan, D. Wolinsky, and B. Ford. Reducing latency in Tor circuits with unordered delivery. In USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2013.
[25] E. Parzen. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.
[26] J. Reardon. Improving Tor using a TCP-over-DTLS tunnel. Master's thesis, University of Waterloo, September 2008.
[27] M. Rosenblatt. Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27(3):832–837, 1956.
[28] T. Roughgarden. Selfish Routing. PhD thesis, Cornell University, 2002.
[29] T. Roughgarden. Selfish Routing and the Price of Anarchy, volume 174. MIT Press, Cambridge, 2005.
[30] T. Roughgarden and É. Tardos. How bad is selfish routing? Journal of the ACM (JACM), 49(2):236–259, 2002.
[31] M. Sherr, M. Blaze, and B. T. Loo. Scalable link-based relay selection for anonymous routing. In Privacy Enhancing Technologies, pages 73–93. Springer, 2009.
[32] M. Sherr, A. Mao, W. R. Marczak, W. Zhou, B. T. Loo, and M. A. Blaze. A3: An extensible platform for application-aware anonymity. 2010.
[33] R. Snader and N. Borisov. EigenSpeed: Secure peer-to-peer bandwidth evaluation. In IPTPS, 2009.
[34] R. Snader and N. Borisov. Improving security and performance in the Tor network through tunable path selection. IEEE Transactions on Dependable and Secure Computing, 8(5):728–741, 2011.
[35] C. Tang and I. Goldberg. An improved algorithm for Tor circuit scheduling. In A. D. Keromytis and V. Shmatikov, editors, Proceedings of the 2010 ACM Conference on Computer and Communications Security (CCS 2010). ACM, October 2010.
[36] C. Wacek, H. Tan, K. Bauer, and M. Sherr. An Empirical Evaluation of Relay Selection in Tor. In Proceedings of the Network and Distributed System Security Symposium - NDSS'13. Internet Society, February 2013.
[37] T. Wang, K. Bauer, C. Forero, and I. Goldberg. Congestion-aware Path Selection for Tor. In Proceedings of Financial Cryptography and Data Security (FC'12), February 2012.
[38] Y. Yang, J. K. Muppala, and S. T. Chanson. Quality of service routing algorithms for bandwidth-delay constrained applications. In Ninth International Conference on Network Protocols, pages 62–70. IEEE, 2001.

A Bandwidth Algorithm Proof

Let R be the relay selected with B bandwidth and C circuits. Let R′ be a different relay with B′ bandwidth and C′ circuits. By definition R is selected such that B/C ≤ B′/C′. When iterating through the C circuits, let n be the number that R′ is on. Note that this means n ≤ C′. After the circuits have been iterated through and the operations performed, R′ will have B′ − (B/C)·n bandwidth left with C′ − n circuits.

Assume that R′ has 0 bandwidth afterwards, so B′ − (B/C)·n = 0. We want to show this means that R′ is on no more circuits, so that C′ − n = 0. We have

B′ − (B/C)·n = 0 ⇒ B′ = (B/C)·n ⇒ n = (B′·C)/B

So the number of circuits R′ is left on is

C′ − n = C′ − (B′·C)/B = (B·C′)/B − (B′·C)/B = (B·C′ − B′·C)/B

However, R was picked such that

B/C ≤ B′/C′ ⇒ B·C′ ≤ B′·C ⇒ B·C′ − B′·C ≤ 0

This gives us

C′ − n = (B·C′ − B′·C)/B ≤ 0

since we know B > 0, and

n ≤ C′ ⇒ 0 ≤ C′ − n

which implies that 0 ≤ C′ − n ≤ 0 ⇒ C′ − n = 0.
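The argument above can be checked numerically on example values. This is an illustrative sanity check with arbitrary numbers, using exact rational arithmetic so the drained-bandwidth case is hit precisely; the helper name is hypothetical.

```python
# Numeric sanity check of the Appendix A claim: if R minimizes B/C and
# R' ends with zero bandwidth after the C iterations, R' is also left
# with zero circuits. All values below are arbitrary illustrative
# examples; Fraction is used so equality with zero is exact.
from fractions import Fraction

def remaining(B, C, Bp, Cp, n):
    """R has bandwidth B and C circuits; R' has Bp and Cp, and appears
    on n <= Cp of R's circuits. Returns R's-iteration leftovers for R':
    (bandwidth_left, circuits_left)."""
    assert Fraction(B, C) <= Fraction(Bp, Cp), "R must minimize B/C"
    assert n <= Cp
    return Bp - Fraction(B, C) * n, Cp - n

# Construct the drained case: B' = (B/C)*n exactly, with n = C'.
B, C = 10, 5                 # B/C = 2
Cp = 3
n = Cp                       # R' appears on all of its circuits
Bp = Fraction(B, C) * n      # = 6, so R' is drained exactly
bw_left, circ_left = remaining(B, C, Bp, Cp, n)
assert bw_left == 0 and circ_left == 0  # zero bandwidth => zero circuits
```

This mirrors the proof: B′ − (B/C)·n = 0 forces n = (B′·C)/B, and the minimality of B/C then pins C′ − n to exactly 0.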


B Circuit Pruning Algorithm

Algorithm 4 Generate pruned circuit set
 1: function BuildPrunedSet(relays)
 2:   circuits ← List()
 3:   while TRUE do
 4:     if relays.len() < 3 then
 5:       break
 6:     end if
 7:     if relays.numExits() == 0 then
 8:       break
 9:     end if
10:     relays.sortByBW()
11:     exit ← relays.getFirstExit()
12:     middle ← relays.getFirstNonExit()
13:     guard ← relays.getFirstNonExit()
14:     if middle == Null then
15:       middle ← relays.getFirstExit()
16:     end if
17:     if guard == Null then
18:       guard ← relays.getFirstExit()
19:     end if
20:     circuits.append(guard, middle, exit)
21:     bw ← min(guard.bw, middle.bw, exit.bw)
22:     guard.bw ← guard.bw − bw
23:     middle.bw ← middle.bw − bw
24:     exit.bw ← exit.bw − bw
25:     if guard.bw == 0 then
26:       relays.remove(guard)
27:     end if
28:     if middle.bw == 0 then
29:       relays.remove(middle)
30:     end if
31:     if exit.bw == 0 then
32:       relays.remove(exit)
33:     end if
34:   end while
35:   return circuits
36: end function
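Algorithm 4 can be rendered as runnable Python. This is a sketch under a few assumptions: the Relay class is a stand-in for however relays are represented, relays are assumed to be sorted highest bandwidth first, and repeated getFirstNonExit calls are read as "take the next distinct non-exit relay".

```python
# Runnable sketch of Algorithm 4 (circuit pruning). The Relay class and
# the "highest bandwidth first" sort order are illustrative assumptions;
# the pseudocode above is the authoritative description.

class Relay:
    def __init__(self, name, bw, is_exit):
        self.name, self.bw, self.is_exit = name, bw, is_exit

def build_pruned_set(relays):
    circuits = []
    while True:
        # Stop once a full 3-hop circuit can no longer be formed.
        if len(relays) < 3 or not any(r.is_exit for r in relays):
            break
        relays.sort(key=lambda r: r.bw, reverse=True)
        exit_ = next(r for r in relays if r.is_exit)
        # Prefer non-exits for middle/guard; fall back to spare exits.
        non_exits = [r for r in relays if not r.is_exit]
        spare_exits = [r for r in relays if r.is_exit and r is not exit_]
        pool = non_exits + spare_exits
        if len(pool) < 2:
            break
        middle, guard = pool[0], pool[1]
        circuits.append((guard, middle, exit_))
        # Water-filling step: drain the bottleneck bandwidth from all hops.
        bw = min(guard.bw, middle.bw, exit_.bw)
        for r in (guard, middle, exit_):
            r.bw -= bw
            if r.bw == 0:
                relays.remove(r)
    return circuits
```

For example, three relays of equal bandwidth (one exit) yield exactly one circuit that consumes all three, after which the loop terminates.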