flow capacity of the london underground

Flow Capacity of the London Underground: Mind the Gap

By

John Joseph Dougherty X

Abstract:

The London Underground, often referred to as ‘the Tube’, is one of the primary transportation

systems for the Greater London Area. As one might expect, the demand for ridership jumps

during the morning and evening rush hours which, in turn, congests the system. The goal of this

paper is to construct an appropriate model for the London Underground to better analyze the

system’s transportation capacity. Specifically, we will construct a directed graph of the rail system and assign a carrying capacity to each edge of the network. Once the graph is set up, we will use a maximum flow algorithm to calculate the maximal number of people the system can move from the residential to the business districts in a given hour. We will then compare this figure to actual ridership values to determine whether the system is operating efficiently.

Introduction:

The London Underground is a primary transportation artery of the Greater London area.

Locally it is known as the Tube and is an integral part of daily life, particularly in the city proper.

The system itself is the oldest underground rail system in the world, having opened its first line,

the Metropolitan Railway, in 1863 [3]. It is also the 12th largest transit system in the world, with

the rail network covering over 402 km and serving 270 stations. This extensive rail system,

which lies both above and below ground, is responsible for about 4 million passenger journeys

a day and accounts for about 17% of the total public transportation ridership [1].

As far as transportation demand goes, the hourly demand for ridership peaks at 8 am, with about 400 thousand passenger journeys starting in this hour, and a total of about 1 million journeys starting between 7 and 10 am as of 2007. There is a similar rush hour period from 4 to 7 pm, but this demand is more spread out [2].

The system itself is composed of 11 individual lines, all of which have their own unique

trains and carrying capacities. The lines are: Bakerloo, Central, Circle, District, Hammersmith &

City, Jubilee, Metropolitan, Northern, Piccadilly, Victoria, and Waterloo & City. Now, though each line tends to follow a distinct route, at times some lines run in parallel with others. For example, from Baker Street station to Liverpool Street station, the Metropolitan, Circle, and Hammersmith & City lines travel along the same rail. That is, these three lines make the same stops and even share the station platform at times. This is something we will have to keep in mind when assigning the carrying capacities to the edges within our graph model.

So as we can see, this is a fairly extensive network which raises many interesting questions. In particular, we want to know the maximum number of people the London Underground can move from the Residential to the Business Districts in an hour’s time. With a solution to this problem, we can then ask ourselves if the Tube is running efficiently. That is, does the London Underground meet the transportation demands of the city? Is it supplying less or more capacity than necessary? In the case of an emergency, how long will it take to empty or fill the city proper? Does London need to spend more or less money on meeting the demand? These are the kinds of questions we can begin to answer upon understanding the maximum flow through the network. Furthermore, enhancing the efficiency of such systems leads to more profit for the City as well as an increase in quality of life for the consumer, in this case the traveler. Thus this question is not only interesting to millions of commuters, it is also of interest to the city.

Background:

There has been ample work done to study traffic systems in general. As mentioned

earlier, it is a problem that both the city and the consumer care greatly about. In fact, systems like

the London Underground are a common topic of study for companies interested in

transportation networks. Such systems of interest include: rail, road, foot, and even flight

networks. Moreover, while these systems may seem distinct, there are fundamental similarities

between them. Essentially, understanding and modeling these transportation networks falls

into the category of logistics and hence extensive work has been done in studying them.

It should be noted that there are two ‘views’ commonly used to analyze traffic systems.

On the one hand we have the view of the consumer, and on the other we have the perspective

of the supplier. In the first, the view of the consumer, average wait times and trip durations are

of paramount importance to ensure rider satisfaction; however, in the case of the supplier,

which is a more macro view of the system, overall utility and net performance take the upper

hand. Now although these perspectives are intimately connected and in an ideal model of the

system all such factors would be taken into account, the difficulty in doing so reduces the

practicality of such an inclusion.

For our model we will take the latter view, which is that of the macro performance. In this case, general variables of interest include: approach, dwell, depart, deadhead, transit, and wait. Of these variables, some are rather self-explanatory, like approach, dwell, depart and transit, which correspond respectively to the vehicle’s braking, loading, accelerating, and traveling times. Other variables, like wait and deadhead, are less obvious. Wait may seem clear, but it is more subtle than one might expect because of the range of situations it covers. For instance, if a vehicle arrives early at a destination but is on an explicit schedule, it then has to wait before departing for its next location of interest. Another example of wait time is when a vehicle cannot progress due to some ‘traffic jam’ further down the line. So while it is clear what the variable measures, its subtlety makes it difficult to quantify. Finally, the other non-intuitive variable is deadhead. This factor corresponds to the ‘dead’ travel that may occur for logistic reasons. That is, deadhead occurs when vehicles have to be moved to a new location due to necessity, yet the vehicle is not transporting any cargo, or in the case of the Tube, passengers.

Now while all six variables are major contenders used to model the overall performance

of transportation networks, our model will not include all of these factors. Due to the size of the

system we are considering and the lack of available data, we simply cannot use these variables

to the appropriate level of detail. Hence for our model, while we would love to include all

possible factors and generate a model of relevant sophistication, we will instead have to make

some simplifying assumptions that will allow us to generate a rough model of the network.

Data:

Due to its importance, there is a lot of valuable data on the Tube transportation

network; however, while there is ample data collected, we do not necessarily have access to all

the information of interest. That said, we do have access to some of the more fundamental

data.

Firstly, we have the official map of the London Underground provided by Transport

for London [1]. This seemingly trivial piece of data is actually one of the more useful sources of

information we have. Due to the way it is set up, it allows us to determine not only what

stations are connected to each other, but it also provides information about what lines connect

these stations. This information is immensely important in generating a flow graph of the Tube

that will later be used to run various calculations.

Next we have, again from Transport for London [1], information about the train

stock of individual lines. So for the 11 lines we know what the individual carrying capacities of

the trains are. They are, in people per train, as follows:

Bakerloo: 730

Central: 892

Circle: 865

District: 827

Hammersmith & City: 865

Jubilee: 817

Metropolitan: 865

Northern: 665

Piccadilly: 684

Victoria: 864

Waterloo & City: 892

This data we will later use to calculate the carrying capacity of the network.

Along with the above information, we will also need information about the speed and distance between stations. Unfortunately, information about the transit, approach, dwell and depart variables was difficult to come by, and as a result they are all folded into the average speed of a train over the entire network, which we know to be 33 km per hour [1]. This is a detrimental assumption, as we will later find out, but a necessary one.

Now information about edge length, that is the distance between any two adjacent

stations, will actually be a derived value based on the latitude and longitude of the stations.

This data, along with a list of stations that are contained in zones 1 and 2, was provided by

Doogal Co.UK [4]. It should be noted that for the distances, we will simply assume that the

edges tend to follow geodesics and so we will use the latitude/longitude of the stations to make

this calculation.
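As an illustration of the geodesic assumption above, the following is a minimal sketch of one common way to turn two latitude/longitude pairs into an edge length, using the haversine formula on a spherical Earth. The function name `geodesic_km` and the radius constant are assumptions made for this sketch; the paper does not state which formula was actually used.

```python
import math

# Mean Earth radius in km. An assumed constant for this sketch, not a value from the paper.
EARTH_RADIUS_KM = 6371.0


def geodesic_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian, as a quick plausibility check.
one_degree_km = geodesic_km(0.0, 0.0, 1.0, 0.0)
```

Because real track curves between stations, a straight geodesic can only underestimate the true edge length, which is worth keeping in mind when reading the capacities derived from it.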

The next piece of pertinent data comes from Samuel Hickey [2] and Transport for London (TFL) [1], which is the acceptable wait time for a train in the London Underground. Now, as mentioned earlier, with a macro view of the system this data may seem unnecessary. However, while the actual wait time is not used in our system, it is used in determining the minimum train frequency for each line, which then determines the actual number of trains in the system on a line-by-line basis. So, according to Samuel Hickey [2], the maximum ‘acceptable’ wait time for a train is less than 10 minutes. This information, along with TFL’s claim that the average wait time for a train is between 2 and 7 minutes, allows us to estimate realistic train frequencies.

Finally, we require an understanding of the two districts in question. That is, what stations are in the Residential District, and what stations lie in the Business District? To gather this data, we make some educated assumptions about zoning, which we then use to find the appropriate stations contained in the respective districts. Information about the two districts comes from Business 2 Community [5]. Here it is stated that the 5 primary business districts are: Square Mile, Canary Wharf, Southwark and London Bridge, West End, and Shoreditch. These 5 areas account for over half a million jobs for commuters, and as a result they are considered as London’s Business District. For the residential district, we note that many commuters live outside of Central London. Because of this, we will assume that the commuters for the London Underground come from zone 3 and out. Thus, with this notion of residential and business districts, we can now use our map to find the stations that lie in the respective districts. Via the map we find that the stations in the business district include: Bank, Barbican, Blackfriars, Cannon Street, Covent Garden, Elephant and Castle, Lambeth North, Leicester Square, London Bridge, Mansion House, Moorgate, Oxford Circus, Piccadilly Circus, Southwark, St Paul’s, Waterloo, and Canary Wharf; while the stations included in the residential district are: Archway, Brixton, Bromley-by-Bow, Clapham South, East Putney, Hampstead, Kensal Green, Manor House, Mile End, North Acton, North Greenwich, Turnham Green, and Willesden Green. Thus, we now have all the pertinent data to construct our model.

Model:

Using the above information, we now construct a model of London’s subway network in the form of a directed graph with edge capacities, $G = \{V, E, C\}$, where $V = \{v_n : 1 \le n \le 123\}$ is the set of vertices, in our case the 123 distinct stations contained in zones 1 and 2; $E = \{e_i : 1 \le i \le 163\}$ is the set of 163 directed edges connecting the stations; and $C = \{c_k : 1 \le k \le 163\}$ is the set of respective carrying capacities of the edges. Now, as mentioned above, $V$ and $E$ are found from our map of the Underground. The edge capacities $C$, on the other hand, take further work to calculate and will be discussed in the next section. It should also be noted that much of the above data will go into the calculation of the $c_k$.

Now, while the graph $G$ is a realistic model, it is incomplete for the calculations we wish to run. As it is, $G$ does represent zones 1 and 2 of the London Underground, but it currently contains no information about the residential and business districts. To remedy this we consider our source set $S \subset V$ and target set $T \subset V$, where $S = \{\text{residential nodes}\}$ and $T = \{\text{business nodes}\}$. We want to embed our sources and targets into the graph, since the idea is to calculate the maximum flow from $S$ to $T$. However, the maximum flow algorithm finds the maximum flow through the network from a single source node to a single target node, and we currently have a multi-source/target system. To handle this, we create two ‘dummy’ vertices $s$ and $t$, thought of as the source and target respectively. Next, we connect each of the 13 stations in $S$ to our source node $s$, and each of the 17 stations in $T$ to our target node $t$. Let this set of edges be denoted $D$. Now we can use $D$, $s$ and $t$ to generate the necessary directed graph $G^* = \{V \cup \{s, t\},\ E \cup D,\ C^*\}$. In doing so we add a total of 30 new edges to $G$, and it should be noted that

$$C^* = C \cup \{s_m, t_p : 1 \le m \le 13,\ 1 \le p \le 17\},$$

where the $s_m$ and $t_p$ are the respective new edges connecting the source node to the residential nodes and the business nodes to the target node. Next we stipulate that while every edge in $E$ is bidirectional, the constructed edges are not. That is, each residential edge $s_m$ is directed strictly from $s$ to a station in $S$, and each business edge $t_p$ is directed strictly from a station in $T$ to $t$. Furthermore, these edges are assumed to have infinite capacity. All of this is done so that our synthetic nodes do not restrict the overall flow of the system.
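The construction of the augmented graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dict-of-dicts representation, the helper names `bidirectional` and `add_super_nodes`, and the toy stations `R1`, `R2`, `M`, `B1` with their capacities are all hypothetical and are not the paper's data or its Mathematica code.

```python
import math


def bidirectional(edges):
    """Build a capacity map u -> v -> capacity in which every track runs both ways."""
    capacity = {}
    for u, v, c in edges:
        capacity.setdefault(u, {})[v] = c
        capacity.setdefault(v, {})[u] = c
    return capacity


def add_super_nodes(capacity, residential, business, s="s", t="t"):
    """Return a copy of the graph with a dummy source s and dummy target t attached.

    Edges s -> residential station and business station -> t are one-way and
    have infinite capacity, so the dummy nodes never limit the flow.
    """
    g = {u: dict(nbrs) for u, nbrs in capacity.items()}  # copy, do not mutate input
    g[s] = {u: math.inf for u in residential}
    g[t] = {}
    for u in residential:
        g.setdefault(u, {})
    for v in business:
        g.setdefault(v, {})[t] = math.inf
    return g


# Hypothetical toy network: two residential stations, one interchange, one business station.
toy = bidirectional([("R1", "M", 500), ("R2", "M", 300), ("M", "B1", 600)])
g_star = add_super_nodes(toy, residential=["R1", "R2"], business=["B1"])
```

In the toy network three new edges are added (two from `s`, one into `t`), mirroring the 13 + 17 = 30 new edges of the full model.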

So now that we have our directed graph $G^*$, we can use a maximum flow algorithm for directed graphs to find the maximum number of people our system can move from $S$ to $T$ in an hour, subject to the various constraints considered in the next section. It should be noted that we use Mathematica to calculate the max flow of our network due to its size, and that its algorithm differs from the one we know, which is given in the proof of the Max-Flow Min-Cut Theorem as presented by Jacques Verstraete [6]. And with this, we are now ready to run some calculations.
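To make the single-source, single-target computation concrete, here is a minimal sketch of a textbook maximum flow routine (Edmonds-Karp, which augments along shortest paths found by breadth-first search). It is swapped in purely for illustration: it is not Mathematica's algorithm and not necessarily the one in [6]. The function name `max_flow` and the toy network with its capacities are assumptions made for this example.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map u -> v -> capacity.

    Assumes every s-t path crosses at least one finite-capacity edge, as in the
    model where only the dummy edges are infinite.
    """
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left, so the flow is maximal

        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical toy network in people per hour; "s" and "t" are the dummy nodes.
INF = float("inf")
toy_network = {
    "s": {"A": INF, "B": INF},
    "A": {"C": 3, "D": 1},
    "B": {"C": 2},
    "C": {"D": 4},
    "D": {"t": INF},
}
toy_flow = max_flow(toy_network, "s", "t")  # limited by the cut {C->D, A->D} = 4 + 1
```

The value returned for the toy network equals the capacity of its smallest cut, which is exactly the statement of the Max-Flow Min-Cut Theorem.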

Calculations and Derivations:

First and foremost, we have to calculate the edge capacities of our graph $G^*$. Thus we must calculate the set $C$, which first requires a sub-calculation. Our first order of business then is to find the line capacities of our system. The 11 distinct line capacities of our network are given by the equation

$$l_j = \frac{\left(\sum_i x_i\right) F_j \, T_j}{v_j}, \qquad 1 \le j \le 11.$$

Here the $x_i$ correspond to the lengths of the edges $e_i$, which are determined by the geographical distance between the stations that make up the edges, and we sum over all such edges that line $j$ lies on in order to calculate the overall length of line $j$ within our system. Next, we multiply by $F_j$, which represents the desired train frequency of line $j$. We then multiply by the carrying capacity of the type of train for line $j$, denoted $T_j$. And finally, we divide by $v_j$, which is the average velocity of line $j$, to obtain the line capacities as above. Now it should be made clear that here the units are people: the line length divided by the velocity is the time needed to traverse the line, multiplying by the train frequency gives the number of trains on the line, and multiplying by $T_j$ gives people. Thus, this value gives the maximum number of people each line can hold at any instant of time.
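The line capacity formula can be written as a short function. This is a sketch under assumptions: the name `line_capacity`, the choice to pass the train spacing as a headway in minutes and convert it to a frequency, and the three 5.5 km edge lengths are all hypothetical; only the train capacity of 730 (Bakerloo) and the 33 km per hour average speed come from the Data section.

```python
def line_capacity(edge_lengths_km, headway_minutes, train_capacity, avg_speed_kmh=33.0):
    """People a line can hold at one instant: (sum of x_i) * F_j * T_j / v_j.

    F_j is read here as a frequency in trains per hour, obtained from the
    headway (minutes between trains); this reading is an assumption.
    """
    trains_per_hour = 60.0 / headway_minutes   # F_j
    line_length_km = sum(edge_lengths_km)      # sum of x_i over the line's edges
    return line_length_km * trains_per_hour * train_capacity / avg_speed_kmh


# Hypothetical 16.5 km line with a train every 10 minutes and 730 people per train:
# 16.5 km / 33 km/h = 0.5 h to traverse, times 6 trains/h = 3 trains, times 730 people.
example_capacity = line_capacity([5.5, 5.5, 5.5], headway_minutes=10, train_capacity=730)
```

Halving the headway doubles the number of trains on the line and therefore doubles the line capacity, which is why the assumed train spacing has such a strong effect on the results.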

Two additional things should be noted here. The first is that while we know that the average velocity of the network is 33 km per hour, the average velocity of each line is not necessarily 33 km per hour. To represent some of this variation, we allow edges of length 5 km or greater to be traveled at an average speed of 66 km per hour, while edges of length less than 300 meters are restricted to only 22 km per hour. In doing so, we are able to retain our 33 km per hour average speed while yielding a more accurate model. The second note is that we will take the ceiling of the $l_j$ in our final calculations in order to have an integer number of trains for each line.
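The two notes above can be illustrated as follows. The piecewise speed rule is taken directly from the text; everything else is an assumption of this sketch, in particular the function names, the idea of deriving a line's average velocity from its per-edge travel times, and the choice to apply the ceiling to the train count rather than to the head count, which is one possible reading of the rounding step.

```python
import math


def edge_speed_kmh(length_km):
    """Average speed assigned to an edge by its length (rule stated in the text)."""
    if length_km >= 5.0:
        return 66.0
    if length_km < 0.3:
        return 22.0
    return 33.0


def line_average_speed_kmh(edge_lengths_km):
    """Average velocity of a line: total length over total travel time (an assumed definition)."""
    total_time_h = sum(x / edge_speed_kmh(x) for x in edge_lengths_km)
    return sum(edge_lengths_km) / total_time_h


def trains_on_line(edge_lengths_km, headway_minutes):
    """Integer number of trains needed to keep the headway, rounded up."""
    total_time_h = sum(x / edge_speed_kmh(x) for x in edge_lengths_km)
    return math.ceil(total_time_h * 60.0 / headway_minutes)


# Hypothetical line with one long, one typical and one very short edge.
example_trains = trains_on_line([6.6, 3.3, 0.22], headway_minutes=10)
```

For the hypothetical line above the traversal time is about 12.6 minutes, so a 10 minute headway needs 1.26 trains, which rounds up to 2.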

Now that we have the line capacities $l_j$ for $1 \le j \le 11$, we can calculate the edge capacities of $G^*$. The equation for the edge capacity of edge $e_k$ is given by

$$c_k = \frac{r_k \sum_j l_j}{x_k}.$$

We sum over all lines $j$ that connect the pair of stations that define our edge $e_k$, and then multiply by the average speed of that edge, denoted $r_k$. We then divide by the length $x_k$ of that edge to obtain the carrying capacity of each edge in people per hour.
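As a small worked instance of the edge capacity formula, the sketch below evaluates it for a single edge shared by two lines. The function name `edge_capacity` and all numbers in the example (a 1.1 km edge and line capacities of 2,190 and 2,595 people) are hypothetical values chosen for illustration.

```python
def edge_capacity(length_km, speed_kmh, line_capacities):
    """People per hour across an edge: r_k * (sum of l_j over lines on the edge) / x_k."""
    return speed_kmh * sum(line_capacities) / length_km


# Hypothetical edge of 1.1 km at 33 km/h carrying two lines.
shared = edge_capacity(1.1, 33.0, [2190, 2595])

# The same edge if only the first line ran on it.
single = edge_capacity(1.1, 33.0, [2190])
```

Summing the line capacities is how shared track, such as the stretch between Baker Street and Liverpool Street mentioned in the Introduction, ends up with a larger capacity than an edge served by one line.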

So now we have the derivation of the carrying capacities for the individual edges of our

graph 𝐺∗ which we will use to run a few different max-flow calculations.

Calculation 1 (F_j = 1 hour per train):

    T_j = 80% capacity     Max Flow = 281,834
    T_j = 100% capacity    Max Flow = 353,707

Calculation 2 (F_j = 10 minutes per train):

    T_j = 80% capacity     Max Flow = 1,221,300
    T_j = 100% capacity    Max Flow = 1,532,980

Calculation 3:

    T_j = 80% capacity     Max Flow = 1,747,770
    T_j = 100% capacity    Max Flow = 2,193,500

Calculation 4:

    T_j = 80% capacity     Max Flow = 2,285,740
    T_j = 100% capacity    Max Flow = 2,868,850

Conclusions:

We can immediately see that, given our assumptions, the London Underground can

move an extraordinary number of people from outside zone 2 to the business district in an

hour. In fact, our calculation shows that in a matter of two hours, the Tube can exceed the daily

demand for ridership. So what went wrong?

Well, that is a difficult question to answer. We did make some averaging assumptions that may have shifted our results. Furthermore, we also used rounding in our calculations to yield integer values. Yet these factors combined should not account for the gap we see between our model and reality. In fact, it is likely that the assumption most responsible for this discrepancy comes from the maximum flow algorithm itself. The maximum flow algorithm assumes that all aspects of the system are focused on moving the maximal number of units from A to B, and completely ignores interactions within the system outside of transportation from source to sink. That is, if we were dealing with freight and wanted to find the maximal flow of freight we can move from source to target, the maximum flow algorithm would be much more accurate. Yet because we are looking at a system where the ‘freight’ interacts within the system and the macro demand isn’t strictly getting from A to B, the algorithm is less powerful than initially anticipated. Finally, the maximum flow assumes rational actors who have foreknowledge of the system and will take considerably longer routes despite the inconveniences. This of course is not true but was a necessary assumption for our model. Yet, while these assumptions have been made, our results are not rendered utterly useless.

While it is true that the question of efficiency becomes difficult to answer given our

result, we can say a few interesting things about the network. For example, we have found that

if required, the London Underground can evacuate the inhabitants of the City Proper, via the

business district stations, to the Greater London Area in as little as 4 hours. Furthermore, if

London restricts transportation on the Underground during rush hour to only transport

commuters from the residential district to the business district, and if all commuters were to

leave in the same hour, the Tube could transport all of its Underground commuters in an hour’s

time. So while we are unable to answer our initial question, we still yielded an interesting result

that does tell us valuable information about the system.
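The comparisons made in this section can be checked with simple arithmetic on figures already quoted in the paper: about 4 million journeys a day, about 1 million journeys starting between 7 and 10 am, about 400 thousand in the peak hour, and the eight max flow values from the Calculations section. The sketch below only restates those numbers; the constant and function names are assumptions made for illustration.

```python
# Demand figures quoted in the Introduction (approximate values).
DAILY_JOURNEYS = 4_000_000
MORNING_RUSH_JOURNEYS = 1_000_000   # journeys starting between 7 and 10 am
PEAK_HOUR_JOURNEYS = 400_000        # journeys starting in the 8 am hour

# Max flow in people per hour: calculation -> (80% capacity, 100% capacity).
MAX_FLOW = {
    1: (281_834, 353_707),
    2: (1_221_300, 1_532_980),
    3: (1_747_770, 2_193_500),
    4: (2_285_740, 2_868_850),
}


def hours_needed(people, flow_per_hour):
    """Hours required to move a number of people at a constant hourly flow."""
    return people / flow_per_hour


for calc, (at_80, at_100) in MAX_FLOW.items():
    print(
        f"Calculation {calc}: daily demand in "
        f"{hours_needed(DAILY_JOURNEYS, at_80):.2f} h at 80%, "
        f"{hours_needed(DAILY_JOURNEYS, at_100):.2f} h at 100%"
    )
```

Under Calculation 4 the daily demand is moved in under two hours at either loading, and under Calculation 2 the whole morning rush fits within a single hour, whereas Calculation 1 falls short of the peak-hour demand even at full capacity.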

Now, to further improve the model a few things should be done. First and foremost is

finding access to more data. Instead of assuming an average speed for the system, it would be

ideal to know the travel, approach, dwell and depart variables for each edge of the network.

And even without that information, the model would be more viable with more accurate

knowledge of the average speed of each line. Furthermore, more data on the number of trains

and the spacing between them for each individual line would yield a more precise model.

Secondly, this model would improve by looking at different sources and targets. That is, find the starting and ending destinations with the most demand and find the network’s maximum flow between those stations, given some constraints. This would allow us to make localized improvements to the system, which in the long run would improve the system as a whole.

Finally, this model could be further improved by taking a different approach to the

question of efficiency. As stated earlier, a single max-flow algorithm is not ideal for such

systems. Thus in addition to finding more sources/targets, it may be more efficient to abandon

the idea of maximum flow entirely for a more commuter-specific model.

Thus, while we were unable to answer our initial question and our model could stand

improvements, we were able to build a network of the London Underground that yielded

interesting and pertinent information about the system.

References

[1] “Transport for London”, http://www.tfl.gov.uk/corporate/about-tfl/what-we-do/london-underground

[2] Hickey, Samuel Warren (2011). “Improving the Estimation of Platform Wait Times of the London Underground”, Massachusetts Institute of Technology Library.

[3] Wolmar, Christian (2004). The Subterranean Railway: how the London Underground was built and how it changed the city forever. Atlantic.

[4] Doogal Co.UK, http://www.doogal.co.uk/london_stations.php

[5] Business 2 Community, http://www.business2community.com/travel-leisure/5-business-districts-to-work-in-london-0261885

[6] Verstraete, Jacques. http://www.math.ucsd.edu/~jverstra/154-part5-2014.pdf
