Precise estimation of connections of metro passengers from Smart Card data
Sung-Pil Hong1 • Yun-Hong Min1,3 • Myoung-Ju Park1,4 •
Kyung Min Kim1,2 • Suk Mun Oh2
Published online: 1 April 2015 © Springer Science+Business Media New York 2015
Abstract The aim of this study is to estimate both the physical and schedule-based
connections of metro passengers from their entry and exit times at the gates and the
stations, a data set available from Smart Card transactions in a majority of train networks.
By examining the Smart Card data, we observe a set of transit behaviors of metro
passengers, manifested by the time intervals that identify the boarding, transferring,
or alighting train at a station. The authenticity of the time intervals is ensured by
separating out a set of passengers whose trip has a unique connection that is predominantly
better in all respects than any alternative connection. The connections of such passengers,
known as reference passengers, can be readily determined, and hence their gate
times and stations can be used to derive reliable time intervals. To detect an unknown path
of a passenger, the proposed method checks, for each alternative connection, if it admits a
sequence of boarding, middle train(s), and alighting trains, whose time intervals are all
consistent with the gate times and stations of the passenger, a necessary condition of a true
connection. Tested on 32 million trips recorded over one week, the proposed method detected unique
connections satisfying the necessary condition, which are, therefore, most likely the true
physical and schedule-based connections, in 92.6 and 83.4 % of the cases, respectively.
Keywords Physical and schedule-based connection estimation · Smart Card data · Metro
network · Passenger’s behaviors
& Kyung Min [email protected]
1 Department of Industrial Engineering, Seoul National University, San 56-1 Shilim-dong, Kwanahk-gu, Seoul 151-742, South Korea
2 Present Address: Policy-Technology Convergence Research Division, Korea Railroad Research Institute, 360-1 Woulam-dong, Uiwang-city, Geonggi-do 437-757, South Korea
3 Present Address: Intelligence Computing Laboratory, Samsung Electronics Co. Ltd., Suwon City, South Korea
4 Present Address: Department of Industrial Engineering and Management Systems Engineering, Kyung Hee University, Yongin City, South Korea
Transportation (2016) 43:749–769. DOI 10.1007/s11116-015-9617-y
Introduction
Since the advent of the Smart Card Automated Fare Collection System, or Smart Card for
short, a major issue for transit planners has been how to fully realize the potential of the
massive transaction data (Pelletier et al. 2011). One possibility is that a Smart Card data
analysis culminates in providing an exact estimation of the physical and schedule-based
connections of each and every card holder’s trip. Archived information on the complete
train choices of passengers has numerous applications in planning and operation of a public
transit network (Seaborn et al. 2009; Trepanier et al. 2007). Typical examples include:
1. Empirical evaluation of a transit assignment model (Lam and Lo 2004; Kato et al.
2010; Raveau et al. 2011). Most transit assignment models in the literature have
verified their validity based on the train choice data from the surveyed passengers. It
normally requires a costly procedure to secure a large enough sample to guarantee a
level of accuracy.
2. Clarification of revenue collected by different train operating companies (Rinks 1986;
Tsamboulas and Antoniou 2006). A public transit network may evolve into an
integration of sub-networks of different operating companies. The lines are then
intertwined so that a card holder may complete a trip with minimal card transactions
with the system. This makes it difficult to allocate the fare revenue collected from
passengers among the operating companies. There appears to be agreement that real
data on the ridership (e.g. in person-kilometers) of each sub-network is essential in
any clarification method.
3. Estimation of connection cost (Nour et al. 2010; Guo and Wilson 2009). In principle, a
transit assignment contemplates an equilibrium attained by user-optimal behavior
pursuing a minimum-cost connection for a trip. Extensive real path-choice data from
Smart Card records are expected to provide a firm empirical basis for studies of
user-optimal behavior.
Among the data fields of a Smart Card, the quadruple (Departure station O, Entry time at
gate, Arrival station D, Exit time at gate) is available in a majority of train networks. It
is evident that this is also a minimal set of data required for a precise estimation of
passengers’ connections. Conversely, the quadruple appears to be a maximal data set that may
reasonably be expected from the Smart Card data. For a modern-day metro there is a demand to
accommodate a passenger’s trip with minimal transactions with the fare collection system.
Privacy concerns also make it difficult to implement a system that monitors more traces
of individual trips. The quadruple, therefore, seems to be the most reasonable data available from a
Smart Card system.
To estimate a schedule-based connection of a passenger, we simply need to identify the
sequence of trains chosen for his/her O–D trip. Yet, it appears to be a nontrivial task to
trace the sequence even if the quadruples are known. A natural way to use the quadruples
might be to compute the inter-gate time of a passenger, that is, the interval between the entry
time at the origin O and the exit time at the destination D, and to assign him/her to the connection with
the closest mean inter-gate time.
The inter-gate time of a passenger, however, has a large variance because its
components include the off-train movement times, namely the gate-to-platform, transfer, and
platform-to-gate times. In the metro network in Seoul, for instance, it is not rare for an O–D pair to have
alternative connections with similar inter-gate times. The method may thus err seriously.
For instance, the Shillim–Garak Market station pair has two alternative physical
connections, one via SeoulNat’lUnivEducation station and the other via Jamsil. The mean
and the standard deviation of the inter-gate time on the former connection are, respectively,
2936.7 and 189.1 s, while on the latter they are 2923.5 and 239.2 s. If we assume, for simplicity, that
the inter-gate times have a Gaussian distribution, 43.7 % of the passengers of the first physical
connection have an inter-gate time closer to the mean inter-gate time of the second physical
connection. However, as will be shown in this paper, the quadruples, as a whole, provide
crucial information on the connection choice of passengers.
This paper is organized as follows. The section continues with a summary of the
previous estimation methods relying on the Smart Card data. In ‘‘Principles of proposed
method’’ section, we identify a set of transit behaviors and a special class of metro
passengers that enable us to develop a consistency condition that a true connection nec-
essarily satisfies. In ‘‘The algorithm’’ section, we develop an algorithm to detect a con-
nection of a passenger based on the consistency condition and illustrate some possible
cases. ‘‘Connection estimation results’’ section reports on an empirical evaluation of the
method applied to 32 million weekly O–D trips in the Seoul metropolitan area, from
Sunday, November 20 to Saturday, November 26, 2011. Finally, some concluding remarks
are provided in ‘‘Conclusion’’ section.
Literature review
The potential of a substantial and detailed collection of Smart Card data in the public
transit system has attracted the attention of researchers since the turn of the millennium; see,
for example, Lehtonen et al. (2002) and Bagchi and White (2005). Consequently, various
behavioral analyses of Smart Card data have emerged, mostly in countries that adopted the system
early (Asakura et al. 2008; Guo and Wilson 2009; Jang 2010;
Kusakabe et al. 2010; Morency et al. 2007; Park et al. 2008; Seaborn 2008; Seaborn et al.
2009; Trepanier et al. 2007; Utsunomiya et al. 2006). [For a more comprehensive literature
review on Smart Card data used in public transit systems, readers are referred to Pelletier
et al. (2011)].
Most of these studies, with a few exceptions, are based on statistical analyses of Smart
Card transaction data. However, there has been much emphasis on the importance of
deterministic and detailed information on travel behavior, such as the length of
individual trips (Bagchi and White 2005).
In reality, exact and complete estimates of the connections of individual passengers
would have led to a more beneficial analysis in many studies (Utsunomiya et al. 2006;
Morency et al. 2007; Park et al. 2008; Seaborn 2008; Asakura et al. 2008; Guo and Wilson
2009; Jang 2010; Raveau et al. 2011). However, such estimates could not be achieved
solely from a statistical behavioral study. In particular, recent studies (Trepanier et al.
2007; Seaborn et al. 2009; Kusakabe et al. 2010) for the purpose of marketing or transit
planning have aimed at more specific estimates of passengers’ connections.
Trepanier et al. (2007) proposed a method of estimating, from Smart Card data and bus
routes, the alighting stops of individual passengers in Gatineau, Quebec, Canada. For
buses, the locations were recorded in Smart Card data at boarding, but not necessarily on
alighting. Suppose a passenger travelled on two buses A and B consecutively in a day. They
reasoned that the alighting stop from bus A was the stop closest to the boarding location of bus
B. When a passenger rides only a single bus A in a day, they identify the bus, say A′, that
the passenger will ride in the near future and apply the same reasoning to A and A′ for
estimating the passenger’s alighting stop from A.
In a multi-modal public transit network, it is nontrivial to decide if two consecutive legs
of a single passenger journey are actually the connections of a single trip. In Seaborn et al.
(2009), they considered the transfers between buses and between bus and train (but not
between trains, because a record on connecting trains was not available from Smart Card
data). They proposed to consider them as the connections of a single trip if the time interval
between the alighting from the first vehicle and the boarding of the second vehicle was less
than a specific threshold value. To determine the threshold value, they applied the sta-
tistical method proposed in Seaborn (2008) to the time intervals whose alighting and
boarding locations were adequately close. The same method was also used in Jang (2010)
to determine a threshold value for the transit network in Seoul, Korea.
The work of Kusakabe et al. (2010) is closely related to ours in that it aimed at
estimating passengers’ choices among express, rapid and local trains for the same O–D
trip. The proposed method uses Smart Card data, railway topology, and the train timetable
on the premise that trains operate exactly to the timetable. The time-expanded network was
then derived from the timetable. A key assumption was that passengers always choose
the shortest connection in transit time and number of transfers. Given the entry and exit
times at gates for a passenger from Smart Card data, their algorithm determines the most
probable boarding and alighting trains. It then returns a shortest connection consistent
with the two trains as the passenger’s connection. A tie is broken, if possible, by choosing
the connection with the minimum number of transfers.
Zhou and Xu (2012) performed a case study to estimate the schedule-based connections of
passengers based on their entry and exit times and the train log, namely the real data on arrival
and departure times of trains. It is based on the assumption that the passengers minimize a
surplus time, the time interval between the earliest arrival of a passenger at the platform and the
departure of the first train since then. Table 1 summarizes these studies and ours.
Smart Card data
Since its introduction in 2000, the Smart Card has quickly become the predominant method
of payment for the metro network in the Seoul metropolitan area. By 2005, 72 % of metro
passengers used a Smart Card, with approximately 20 million transactions per day (Park
et al. 2008). In 2011, Smart Card became the only payment method.
Table 1 Literature summary
| Study | Mode | Object of estimation | Location | Required information other than Smart Card data | Assumptions |
|---|---|---|---|---|---|
| Trepanier et al. (2007) | Bus | Alighting locations | Gatineau, Quebec, Canada | Bus route | People ride buses at stops closest to preceding alighting locations |
| Kusakabe et al. (2010) | Railway | Train choices | Osaka, Japan | Train log, Railway network topology | People choose shortest connections |
| Zhou and Xu (2012) | Metro | Train choices | Beijing, China | Train log, Railway network topology | People choose train to minimize a surplus time |
| Fu et al. (2014) | Metro | Train choices | London, UK | Railway network topology | Transit time distribution of each route follows Gaussian or lognormal |
| Seaborn et al. (2009) | Bus + Train | Linking trips | London, UK | Bus route, GPS for stations and stops | Threshold value on transfer time |
| Ours | Metro | Train choices | Seoul, Korea | Train log, Railway network topology | People do not route longer by 30 min. or more |
Our connection estimating method in ‘‘The algorithm’’ section assumes the quadruple
(Departure station O, Entry time at gate, Arrival station D, Exit time at gate) for each O–D
trip of a metro passenger. For trains, this appears to be the case in most Smart Card systems
as shown in Table 2, although in some systems, for instance in Chicago, the pair (Arrival
station D, Exit time at gate) was not retained in the data.
Principles of proposed method
The schedule-based connection estimating problem can be posed formally as follows.
Definition 2.1 Section, physical and schedule-based connections By a section, we mean
the physical metro line between two adjacent stations inclusively. By a physical connec-
tion, in turn, we mean the concatenation of sections that a passenger passes during her/his
O–D trip. Then a schedule-based connection is defined to be a sequence of trains that a
passenger can take on a physical connection.
For example, the arc between Isu and Dongjak stations of the metro network illustrated
in Fig. 1, is a section. We can see that the physical connection from Bongcheon to Dongjak
consists of five sections. And a train from Bongcheon to Sadang and its connecting train at
Sadang to Dongjak constitutes a schedule-based connection of Bongcheon-Dongjak pair.
Note that once the schedule-based connection has been estimated, the physical connection
is immediate. Also, by a train log, we mean the set of records indicating the arrival and
departure time of each train at each station.
Table 2 Comparison of data fields for the Smart Card in cities
| Fields | Seoul, Korea | London, UK (Seaborn et al. 2009) | Quebec*, Canada (Trepanier et al. 2007) | Osaka**, Japan (Kusakabe et al. 2010) | Chicago, US (Utsunomiya et al. 2006) | San Francisco, US (Utsunomiya et al. 2006) | Washington D.C., US (Utsunomiya et al. 2006) |
|---|---|---|---|---|---|---|---|
| Route ID | Buses only | Buses only | Buses | – | Buses and trains | Buses only | Buses only |
| Vehicle ID | Buses only | Buses only | Buses | – | – | Buses only | Buses only |
| Boarding time | Buses and trains | Buses and trains | Buses | Trains | Trains only | Buses and trains | Buses and trains |
| Boarding location | Buses and trains | Buses and trains | Buses | Trains | Trains only | Buses and trains | Buses and trains |
| Alighting time | Buses and trains | Trains only | – | Trains | – | Buses and trains | Buses and trains |
| Alighting location | Buses and trains | Trains only | – | Trains | – | Buses and trains | Buses and trains |
* No metro in Quebec, ** Smart Card in Osaka is for trains only
Problem 2.2 Schedule-Based Connection Estimation Problem.
Input: The topology of the physical metro network, the set of passengers over a
predetermined time horizon along with their quadruples, q = (Departure station O, Entry time at
gate, Arrival station D, Exit time at gate), and the real arrival and departure times of trains
from a train log.
Output: The schedule-based connection of each passenger for his/her O–D trip, namely,
the complete sequence of boarding, transfer, and alighting trains, that was chosen for his/
her O–D trip in the metro network.
Generating tentative physical connections for O–D trips
The proposed method constructs a prior set of physical connections for each O–D trip by
excluding irrational connections. A passenger travelling from Station O to Station D
chooses a connection of the least cost. If other conditions are equivalent, the cost is an
increasing function of each of the number of transfers, travel times and the level of
congestion [e.g. Bureau of Public Roads (1964), De Cea and Fernandez (1993), Nielsen
(2000)]. According to Shin et al. (2007), trips involving three or more transfers between
lines account for 1.5 % of total trips in the Seoul metropolitan area. We first exclude such
trips from consideration.
For each O–D pair, we enumerate all possible physical connections and group them
by the number of transfers, n = 0, 1, and 2. From each group, we discard the connections
that are longer than a shortest one by k sections or more. Thus we assume that people do
not take a route that is longer than the shortest one by some threshold value. We set k = 10, which is equivalent to 31 min of
in-transit time. This is motivated by the fact that the mean in-transit time of a passenger is 30 min
in the Seoul metropolitan area. Indeed, a case analysis shows that the number of trips with an
inter-gate time longer than the minimum by more than 30 min is insignificantly small.
Fig. 1 O–D pairs with a unique tentative physical connection
Then we perform an inter-group comparison; every physical connection is removed if it
has 10 more sections than an alternative physical connection with fewer transfers. The
connections left are defined to be the tentative physical connections of the O–D pair. In the
case of the Seoul metropolitan area, the process generated 3.9 tentative connections on
average for an O–D pair.
Reference passengers
In the above process, we found that a large share of passengers have a unique
tentative physical connection. Consider, for instance, a Bongcheon-Isu trip on the metro
network in Fig. 1. It requires at least one transfer since the stations are on different lines.
Two physical connections are possible: one via Shindorim and Seoul Station, counter-clockwise,
and the other via Sadang, clockwise. The former requires two transfers and 17
sections, while the latter requires one transfer and 3 sections. The process therefore returns the latter as the
unique tentative physical connection for a Bongcheon-Isu trip. A similar argument
applies to the pairs Bongcheon-Dongjak, SeoulNat’lUniv-Isu, SeoulNat’lUniv-Dongjak,
and so on.
About 47 % of the daily passengers on the metro-network in the Seoul metropolitan
area turned out to have a unique tentative physical connection.
Definition 2.3 Reference passengers By a reference passenger, we mean a passenger
whose O–D trip has a unique tentative physical connection.
Reference passengers, as their connections are guaranteed, play a crucial role in the
proposed connection estimation method.
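The filtering rules that generate tentative physical connections, and the resulting test for a reference passenger, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the `PhysicalConnection` record, the function names, and the reading of "10 more sections" as "at least k more sections" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalConnection:
    name: str       # hypothetical label for the route
    transfers: int  # number of transfers between lines
    sections: int   # number of sections traversed


def tentative_connections(candidates, k=10, max_transfers=2):
    """Filter the candidate physical connections of one O-D pair."""
    # Exclude connections with three or more transfers.
    kept = [c for c in candidates if c.transfers <= max_transfers]

    # Intra-group rule: within each transfer count, drop connections that
    # are k or more sections longer than the shortest one in the group.
    shortest = {}
    for c in kept:
        shortest[c.transfers] = min(shortest.get(c.transfers, c.sections), c.sections)
    kept = [c for c in kept if c.sections < shortest[c.transfers] + k]

    # Inter-group rule (assumed reading): drop a connection that is at least
    # k sections longer than a surviving alternative with fewer transfers.
    return [
        c for c in kept
        if not any(o.transfers < c.transfers and c.sections >= o.sections + k
                   for o in kept)
    ]


def is_reference_trip(candidates, k=10):
    """A trip is a reference trip if exactly one tentative connection remains."""
    return len(tentative_connections(candidates, k)) == 1


# Bongcheon-Isu figures quoted in the text above.
bongcheon_isu = [
    PhysicalConnection("via Shindorim and Seoul Station", transfers=2, sections=17),
    PhysicalConnection("via Sadang", transfers=1, sections=3),
]
print([c.name for c in tentative_connections(bongcheon_isu)])
print(is_reference_trip(bongcheon_isu))
```

With the Bongcheon-Isu figures, only the connection via Sadang survives, so a passenger on that O–D pair would count as a reference passenger under these assumptions.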
Alighting and boarding time intervals
The idea is best illustrated by an example. There were 203 trips of passengers from Shillim
to Gangnam station, initiated between 7 and 9 A.M. on November 21, 2011. The first
plot in Fig. 2 shows the entry times of the passengers at the origin, Shillim. They
appear uniformly distributed, as expected. However, the exit times of the
same passengers at the destination, Gangnam, show a spiky pattern, concentrated in brief periods of time.
Fig. 2 Entry and exit times for the same set of Shillim–Gangnam passengers
It is typical behavior for alighting passengers to rush to a gate and exit
as soon as possible. The platform-to-gate time of each passenger thus typically reflects the
maximal speed of the passenger and hence has the characteristic of an extreme value
(Einmahl and Smeets 2011). In fact, according to Ko et al., the platform-to-gate time of an
alighting passenger is best fitted by the Frechet distribution, which is mostly used for fitting extreme
values. Figure 3 shows the relative frequency of the platform-to-gate time of the alighting
passengers at Gangnam station from 5:30 to 11:00 A.M., November 21, 2011, which has
been fitted with Gamma, Inverse Gaussian and Frechet distributions. The Frechet
distribution is the best fit.
Definition 2.4 Alighting groups and time intervals By an alighting group, $AG(X, N)$, we
mean the set of passengers that alight from the train X at their common destination N
(regardless of their origins). An alighting time interval is the time interval between the first
and last exit times in an alighting group.
The extreme value characteristic of the platform-to-gate times renders an alighting time
interval substantially smaller than the inter-arrival time of trains at a station, and hence the
alighting time intervals of successive trains are disjoint. In the Seoul metropolitan area, the smallest headway in peak hours was 3.5 min,
while the mean and standard deviation of the platform-to-gate times were 1.9 and 1.0, respectively.
Definition 2.5 Boarding groups By a boarding group, $BG(X, N)$, we mean the set of
passengers that board the same train X at their common origin N (regardless of their
destinations).
Unlike the alighting case, the boarding behaviors of metro passengers do not present
disjoint time intervals. However, the first-come-first-served queue discipline is well
observed in boarding, and the order of entry is maintained. It turns out in the metro
network of the Seoul metropolitan area that at most two consecutive time intervals of boarding
groups may overlap. To see this, consider Fig. 4, a 2-dimensional plot of the entry and exit
times at the gates of Shillim–Gangnam passenger groups, called an entry-exit map and originating
from Kusakabe et al. (2010), where the x-axis represents the entry time and the y-axis the exit
time of a passenger. In the figure, the passenger group of each Shillim–Gangnam train
is identified by the rectangle of boarding and alighting time intervals in the entry-exit map.
Fig. 3 Platform-to-gate-time distribution at Gangnam station
Furthermore, the disjoint alighting time intervals make the rectangles also disjoint, a
source of the precision of the proposed method.
Suppose the alighting and boarding time intervals of $AG(X, N)$ and $BG(X, N)$,
respectively, are known to us for each train X and station N. Then, we can determine if X can
be the alighting or boarding train of a passenger at station N by checking if the exit or entry
time at N falls in the time interval of $AG(X, N)$ or $BG(X, N)$. To develop the consistency
check into a connection estimation method, we first need to estimate the time intervals of
trains at each station as in Fig. 4.
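The consistency check described above amounts to an interval-membership test. The sketch below illustrates it under stated assumptions: the function names are hypothetical, times are 'HH:MM:SS' strings, and the interval values are invented for illustration rather than taken from the data.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def candidate_trains(gate_time, intervals):
    """Trains whose time interval at a station contains gate_time.

    intervals maps a train id to (start, end), both 'HH:MM:SS' strings.
    """
    t = to_seconds(gate_time)
    return [
        train for train, (start, end) in intervals.items()
        if to_seconds(start) <= t <= to_seconds(end)
    ]


# Hypothetical intervals at one station (values invented for illustration).
alighting = {"X": ("08:19:32", "08:20:34"), "Y": ("08:23:10", "08:24:05")}
boarding = {"X": ("07:30:00", "07:34:10"), "Y": ("07:33:50", "07:38:00")}

print(candidate_trains("08:19:51", alighting))  # disjoint intervals: one train
print(candidate_trains("07:34:00", boarding))   # overlapping intervals: two trains
```

Because alighting intervals are disjoint, an exit time matches at most one train, whereas an entry time may match two consecutive trains when boarding intervals overlap.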
It is not, however, a trivial task to derive the time intervals solely by plotting the
quadruples of passengers. The passengers from in- and out-bound trains, for instance, may
happen to exit at the same gate close together in time. Or, in a transfer station, the
alighting passengers from different lines may merge at a gate. The second key idea is to use
the reference passengers to derive the time intervals.
Estimation of alighting and boarding time intervals
Choose only the reference passengers whose physical connections involve no transfer.
Suppose the quadruple of such a passenger from the Smart Card is q = (O, Entry time = t1, D, Exit time = t2).
We then consider the set P of trains that departed from O after t1
and the set Q of trains that arrived at D prior to t2. If the two sets have only one common
train, say X, it should be the alighting train of the reference passenger at D. In other
words, the passenger belongs to the alighting group $AG(X, D)$.
Also notice that if either the gate-to-platform or platform-to-gate time is less than the
inter-arrival time of the trains, as in most of the real cases, P ∩ Q should be a singleton
(whose element is, of course, the train choice of q). Thus, we can identify the alighting
train of the reference passengers (if their trips are not delayed abnormally from gate to
platform or from platform to gate).
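The identification of the alighting train through P ∩ Q can be sketched as follows. The dictionaries standing in for the train log, the train ids, and the times (in seconds) are hypothetical and serve only to illustrate the set intersection.

```python
def alighting_train(t1, t2, departures_at_o, arrivals_at_d):
    """Return the unique train in P ∩ Q, or None if it is not a singleton.

    departures_at_o: train id -> departure time at O (seconds)
    arrivals_at_d:   train id -> arrival time at D (seconds)
    """
    p = {x for x, dep in departures_at_o.items() if dep > t1}  # departed O after t1
    q = {x for x, arr in arrivals_at_d.items() if arr < t2}    # arrived at D before t2
    common = p & q
    return next(iter(common)) if len(common) == 1 else None


# Hypothetical train log for one direct O-D pair (times in seconds).
dep_o = {"A": 100, "B": 300, "C": 500}
arr_d = {"A": 1000, "B": 1200, "C": 1400}

print(alighting_train(250, 1250, dep_o, arr_d))  # only B departs after 250 and arrives before 1250
print(alighting_train(50, 1250, dep_o, arr_d))   # A and B both qualify, so no unique train
```

In this sketch a passenger with more than one common train is simply left unassigned, which corresponds to the abnormally delayed trips mentioned above.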
Thus, by repeating the procedure for each of the chosen reference passengers, we can capture
a large subset $\widetilde{AG}(X, N)$ of $AG(X, N)$ for each train X and station N. Hence the time
interval of $\widetilde{AG}(X, N)$ offers a good estimate of the alighting time interval of $AG(X, N)$.
Fig. 4 Entry-exit map of the Shillim–Gangnam passengers
Once $\widetilde{AG}(X, N)$ has been constructed for each X and N, we derive an estimate $\widetilde{BG}(X, N)$
of the boarding group $BG(X, N)$ in the following manner. Check every reference passenger
who departed from N via X. Put the passenger into $\widetilde{BG}(X, N)$ if his/her exit time at D, the
destination, falls in the alighting time interval of $\widetilde{AG}(X, D)$.
Similarly, the boarding time interval of $BG(X, N)$ is then estimated using the time
interval of $\widetilde{BG}(X, N)$. As discussed earlier, unlike the alighting case, the time intervals may
overlap.
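The two estimation steps can be sketched as below. The data layout is an assumption made for the example: times are in seconds, `assigned` holds reference passengers whose alighting train has already been identified, and `departures` is a hypothetical slice of the train log. The extra check that a train departs the origin after the entry time is also an assumption added to keep the example self-consistent.

```python
from collections import defaultdict


def alighting_intervals(assigned):
    """Estimate alighting time intervals from identified reference passengers.

    assigned: iterable of (train, station, exit_time).
    Returns {(train, station): (first_exit, last_exit)}.
    """
    groups = defaultdict(list)
    for train, station, exit_time in assigned:
        groups[(train, station)].append(exit_time)
    return {key: (min(v), max(v)) for key, v in groups.items()}


def boarding_intervals(ref_passengers, alight_iv, departures):
    """Estimate boarding time intervals from the alighting intervals.

    ref_passengers: iterable of (origin, entry_time, dest, exit_time).
    departures: {(train, station): departure_time} from the train log.
    """
    groups = defaultdict(list)
    for origin, t1, dest, t2 in ref_passengers:
        for (train, station), (lo, hi) in alight_iv.items():
            if station != dest or (train, origin) not in departures:
                continue
            # Assumed extra check: the train leaves the origin after entry.
            if departures[(train, origin)] > t1 and lo <= t2 <= hi:
                groups[(train, origin)].append(t1)
    return {key: (min(v), max(v)) for key, v in groups.items()}


# Hypothetical data (seconds since some reference time).
assigned = [("X", "D", 1000), ("X", "D", 1030), ("Y", "D", 1200)]
refs = [("O", 100, "D", 1010), ("O", 160, "D", 1025), ("O", 170, "D", 1200)]
departures = {("X", "O"): 200, ("Y", "O"): 400}

alight_iv = alighting_intervals(assigned)
print(alight_iv)
print(boarding_intervals(refs, alight_iv, departures))
```

Each interval is simply the span between the earliest and latest gate time in the estimated group, mirroring the definition of the alighting time interval.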
Another simple but important observation is that a reference passenger who made a
transfer in his/her trip is a verifier that a transfer has actually been made between the two
connecting trains he/she rode. For instance, in Fig. 1 a reference passenger from Bong-
cheon to Isu station certifies that there has been a transfer between the connecting trains he/
she used at the transfer station Sadang. This is very useful when estimating connection of a
passenger whose trip involves a transfer. From a possible list of connecting trains at a
transfer station, we can remove ones that have no verifier of an actual transfer.
Definition 2.6 Transfer reference passengers By the transfer reference passengers
$RP(X, Y, A)$, we mean the set of reference passengers who transferred from Train X to Y at
Station A.
We now discuss how to find the transfer reference passengers $RP(X, Y, A)$. Suppose X
and Y are, respectively, from Lines 1 and 2. We look up the list of the reference passengers
who transferred from Line 1 to Line 2 at A. Suppose the quadruple of an O–D passenger is
consistent with X and Y, namely, his/her entry time at O on Line 1 falls into the time
interval of $\widetilde{BG}(X, O)$ and his/her exit time at D on Line 2 falls into the time interval of $\widetilde{AG}(Y, D)$.
He/she is then proof that a transfer has been made from X to Y at A and, thus, is added to
$\widetilde{RP}(X, Y, A)$.
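A minimal sketch of collecting the transfer reference passengers for a pair of trains is given below. It assumes the caller has already restricted `passengers` to reference passengers whose unique physical connection transfers between the two lines at station A; the function names and the interval values are hypothetical.

```python
def within(t, interval):
    """True if interval is known and contains t."""
    return interval is not None and interval[0] <= t <= interval[1]


def transfer_reference_passengers(passengers, x, y, board_iv, alight_iv):
    """Reference passengers whose quadruple is consistent with trains x and y.

    passengers: iterable of (origin, entry_time, dest, exit_time).
    board_iv / alight_iv: {(train, station): (start, end)}.
    """
    return [
        (o, t1, d, t2)
        for (o, t1, d, t2) in passengers
        if within(t1, board_iv.get((x, o))) and within(t2, alight_iv.get((y, d)))
    ]


# Hypothetical intervals and passengers (times in seconds).
board_iv = {("X", "O"): (100, 160)}
alight_iv = {("Y", "D"): (900, 950)}
passengers = [("O", 120, "D", 930), ("O", 120, "D", 1000), ("O", 300, "D", 930)]

verifiers = transfer_reference_passengers(passengers, "X", "Y", board_iv, alight_iv)
print(verifiers)
```

A nonempty result is what certifies that a transfer from X to Y actually took place at A; an empty list means the pair has no verifier.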
Estimation of time intervals from insufficient passengers
Obviously, the accuracy of the time intervals depends on the size of $\widetilde{AG}$ or $\widetilde{BG}$. At
stations in suburban areas in non-peak hours, the reference passengers may not be
sufficient to provide reliable time intervals. The issue can be resolved by aggregating the
alighting passengers of trains at each station. Under the assumption that the alighting
behavior of passengers is independent of the time of day, this provides a sufficient
collection of platform-to-gate times for a reliable alighting time interval at each station.
The Garak Market station, a transfer station located at the intersection of Lines 3 and 8,
has sparse passenger traffic. The number of reference passengers per train at the station
varies from 1 to 30. We aggregate the reference passengers of the 142 inbound trains at the
station and fit their platform-to-gate times to a Frechet distribution. We then discard the
lowest 2.5 % and the highest 2.5 % as outliers with an excessive length of boarding or
alighting time. In our case this accounts for on average 1.59 passengers per train.
The range [s, s + L] of platform-to-gate times of the remaining passengers is then
defined as the standard alighting time interval at each station. The alighting time interval
of each train can be obtained simply by translating [s, s + L] to begin at the arrival time of
the train.
Figure 5 shows the resulting standard alighting time interval at Garak Market station.
From the 142 inbound trains, there were initially 672 reference passengers, from
which 19 passengers were excluded, 0.23 trips per train. The range is [s, s + L] = [28 s, 90 s]
with the length L = 62 s. The figure also shows the translation of the standard alighting
time interval to the arrival time, 08:19:04, of Train X. The resulting standard alighting time
interval [08:19:32, 08:20:34] of X is significantly larger than the time interval
[08:19:35, 08:20:01] estimated from the reference passengers of X alone.
The standard boarding time intervals can be constructed analogously.
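The construction of a standard alighting time interval and its translation to a train's arrival time can be sketched as follows. Note the substitution: this sketch trims the lowest and highest 2.5 % of the empirical platform-to-gate times directly, instead of fitting a Frechet distribution as the text describes. The function names are hypothetical.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def to_hms(seconds):
    """Convert seconds since midnight to 'HH:MM:SS'."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def standard_interval(platform_to_gate, trim=0.025):
    """Range (s, s + L) of the times left after trimming both tails.

    Empirical trimming stands in for the Frechet fit used in the paper.
    """
    xs = sorted(platform_to_gate)
    cut = int(len(xs) * trim)  # observations dropped at each end
    kept = xs[cut:len(xs) - cut]
    return kept[0], kept[-1]


def translate(interval, arrival_hms):
    """Shift a standard interval (in seconds) to start from a train's arrival."""
    s, end = interval
    arrival = to_seconds(arrival_hms)
    return to_hms(arrival + s), to_hms(arrival + end)


# Garak Market figures from the text: [s, s + L] = [28 s, 90 s], arrival 08:19:04.
print(translate((28, 90), "08:19:04"))  # ('08:19:32', '08:20:34')
```

The translated interval reproduces the [08:19:32, 08:20:34] interval quoted for Train X.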
The algorithm
Given the quadruple q = (O, Entry time = t1, D, Exit time = t2) of a passenger, we
carry out the following steps for every tentative physical connection for the O–D trip.
Suppose the physical connection, say P, requires no transfer. Then, we look up a train X
on P whose boarding and alighting time intervals contain t1 and t2, respectively. If none,
we reject P. Otherwise, we put P in the list of consistent physical connections of q along
with the train X, a single-train schedule-based connection on P.
Suppose P entails two transfers at stations, say, M and N. (We discuss this case only
since the single-transfer case then becomes obvious.) We first construct the list of
tentative schedule-based connections for q on P, that is, the list of sequences of trains S = X1–X2–X3 on P such that
1. The boarding interval of X1 at O contains t1, the alighting interval of X3 at D contains
t2, and
2. The arrival times of X1 and X2 are no later than the departure times of the following
trains, X2 and X3, respectively, at the transfer stations M and N.
Fig. 5 Standard alighting time interval at the Garak Market station on Line 8 inbound
Note that this is a necessary condition for X1–X2–X3 to be a schedule-based
connection of the trip q on P. Then we look up the transfer reference passengers $RP(X_1, X_2, M)$
and $RP(X_2, X_3, N)$. If both sets are nonempty, we return S as a consistent schedule-based
connection on P. Otherwise, we reject S.
The algorithm returns P as the physical connection of q only if P is the only physical
connection that admits a consistent schedule-based connection. Otherwise, namely, if there
is no such physical connection or more than one, the algorithm declares a failure for the
input quadruple q.
Initially, we apply the algorithm based on the standard time intervals in ‘‘Estimation of
time intervals from insufficient passengers’’ section. The passengers successfully returned
with a unique physical connection are added to the reference passenger set. Once we have
acquired sufficient reference passengers, we replace the standard time intervals with the
time intervals derived from the reference passengers of individual trains and repeat. The
algorithm can be summarized as in Fig. 6.
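One pass of the consistency check can be sketched as below. This is an illustrative reading of the steps above, not the authors' code: the data structures, the function names, and all train ids and times (in seconds) are hypothetical, and the iterative replacement of standard intervals by train-specific ones is not shown.

```python
from itertools import product


def within(t, interval):
    return interval is not None and interval[0] <= t <= interval[1]


def consistent_sequences(q, legs, board_iv, alight_iv, arrival, departure, verified):
    """Train sequences on one physical connection that are consistent with q.

    legs: list of (board_station, alight_station, trains), one per train ridden.
    verified: set of (X, Y, station) with a nonempty transfer reference set.
    """
    o, t1, d, t2 = q
    found = []
    for seq in product(*(trains for _, _, trains in legs)):
        # Condition 1: gate times lie in the first boarding and last alighting intervals.
        if not within(t1, board_iv.get((seq[0], o))):
            continue
        if not within(t2, alight_iv.get((seq[-1], d))):
            continue
        feasible = True
        for i in range(len(seq) - 1):
            station = legs[i][1]  # transfer station between leg i and leg i + 1
            arr = arrival.get((seq[i], station))
            dep = departure.get((seq[i + 1], station))
            # Condition 2: arrival no later than the next departure, plus a verifier.
            bad_timing = arr is None or dep is None or arr > dep
            if bad_timing or (seq[i], seq[i + 1], station) not in verified:
                feasible = False
                break
        if feasible:
            found.append(seq)
    return found


def estimate_connection(q, connections, **log):
    """Return (name, sequences) if exactly one physical connection is consistent."""
    hits = {}
    for name, legs in connections.items():
        seqs = consistent_sequences(q, legs, **log)
        if seqs:
            hits[name] = seqs
    return next(iter(hits.items())) if len(hits) == 1 else None


# Hypothetical single-transfer example with two tentative physical connections.
connections = {
    "via M": [("O", "M", ["X1", "X2"]), ("M", "D", ["Y1", "Y2"])],
    "via N": [("O", "N", ["X1", "X2"]), ("N", "D", ["Z1", "Z2"])],
}
log = dict(
    board_iv={("X1", "O"): (50, 150), ("X2", "O"): (151, 300)},
    alight_iv={("Y1", "D"): (2680, 2740), ("Y2", "D"): (2900, 2960),
               ("Z1", "D"): (2500, 2560), ("Z2", "D"): (2800, 2860)},
    arrival={("X1", "M"): 1000, ("X2", "M"): 1200, ("X1", "N"): 1600, ("X2", "N"): 1800},
    departure={("Y1", "M"): 1100, ("Y2", "M"): 1300, ("Z1", "N"): 1700, ("Z2", "N"): 1900},
    verified={("X1", "Y1", "M"), ("X1", "Y2", "M"), ("X2", "Y2", "M"),
              ("X1", "Z1", "N"), ("X1", "Z2", "N"), ("X2", "Z2", "N")},
)
print(estimate_connection(("O", 100, "D", 2700), connections, **log))
print(estimate_connection(("O", 100, "D", 5000), connections, **log))
```

The first quadruple is consistent only with X1 followed by Y1 via M, so that connection is returned; the second matches no alighting interval and the sketch reports a failure by returning `None`. Enumerating all train combinations with `product` is chosen for brevity and would need pruning on a real train log.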
Note that there may be multiple consistent schedule-based connections even when a
unique physical connection is returned.
In our case, around 9 % of the trips were returned with more than one schedule-based
connection due to, e.g., overlapping boarding time intervals and/or multiple connecting
trains. However, we can estimate the probability that each of the schedule-based
connections is the choice of the passenger. The details are given in the Appendix.
Illustration of actual estimation
The performance of the method is probably best understood by some actual cases of
estimation.
Unique physical and schedule-based connections
Figure 7 shows the trips of two passengers, say a and b, who departed from Shillim station
at 07:33:47 and 07:34:55, and arrived at Garak Market station at 08:16:53 and 08:19:51,
respectively, on November 21, 2011: a = (Shillim, 07:33:47, Garak Market, 08:16:53)
and b = (Shillim, 07:34:55, Garak Market, 08:19:51).
Fig. 6 The flow of algorithm
There are two alternative physical connections: beginning at the origin, Shillim station,
both follow the Line 2 outer circle. However, one transfers at SeoulNat’lUnivEducation
station to Line 3, the other at Jamsil station to Line 8. The algorithm checks, for each passenger,
which physical connection has a logical connection, that is, a sequence of trains all
consistent with his/her quadruple.
Consider a. On the physical connection Shillim-SeoulNat’lUnivEducation-GarakMarket,
there is a unique train X1 whose boarding time interval contains the entry time of a. Of
the two trains, Y1 and Y2, that have been verified by transfer reference passengers to connect
X1 to Line 3 at SeoulNat’lUnivEducation, Y1 has an alighting time interval containing a’s
exit time at Garak Market station. Thus, Shillim-SeoulNat’lUnivEducation-GarakMarket is
added to the list of consistent physical connections of a along with the consistent
schedule-based connection X1 - Y1.
On the alternative physical connection, Shillim-Jamsil-GarakMarket, a should be
assigned to the same tentative boarding train X1. However, neither of the two trains Z1 and Z2
that connect X1 at Jamsil to Line 8 has an alighting time interval consistent with a’s exit
time at Garak Market station. Thus, Shillim-Jamsil-GarakMarket is rejected. Therefore,
the algorithm returns Shillim-SeoulNat’lUnivEducation-GarakMarket as the unique physical
connection of a along with the unique schedule-based connection X1 - Y1.
Consider b. On the physical connection Shillim-SeoulNat’lUnivEducation-GarakMarket,
X2 is the only train whose boarding time interval is consistent with his/her entry time. However, the
only verified connecting train Y2 from X2 to Line 3 has an alighting time interval inconsistent with b’s exit
time at the destination, Garak Market. Thus, the physical connection is rejected for b.
Fig. 7 Schedule-based connection estimation of 2 Shillim–Garak Market trips
On the physical connection Shillim-Jamsil-GarakMarket, on the other hand, of the two
connecting trains Z2 and Z3 at Jamsil station, Z2 has an alighting time interval consistent
with b’s exit time, as indicated in the figure. Thus, Shillim-Jamsil-GarakMarket is returned
as the physical connection for b, and X2 - Z2 is confirmed as the schedule-based
connection.
Analysis of failed cases
The algorithm fails when no physical connection or more than one physical connection is
consistent with the quadruple of an input trip. Figure 8 illustrates the latter case. Consider a trip a =
(Janghanpyeong, 08:24:23, Sangsu, 09:01:28). There are two alternative physical
connections, I and II, that comprise the same line combination, Lines 5 and 6, but
different transfer stations, Cheonggu and Gongdeok, respectively. The entry and exit
times match a unique schedule-based connection X1 - Y.
However, the transfer from X1 to Y is verified by a transfer reference passenger at both
transfer stations, Cheonggu and Gongdeok. Both physical connections I and II are therefore
consistent with the quadruple a, and the method fails. A failure due to multiple
consistent physical connections occurs more often when two or more physical connections
differ only in the transfer station.
Fig. 8 A case of failure: Indeterminate physical connection

Connection estimation results
The metro network
The Seoul metropolitan area had 15 metro lines, 412 stations and 33,548 trains in operation
as of November 20, 2011. During the week of November 20 to 26, 2011, there were
47,618,710 metro O–D trips. Of the possible O–D pairs, 904,897 pairs have nonzero traffic
and each carried 50 trips on average. In our study, we first excluded the trips involving 3
private lines, Metro 9, AREX (airport line), and DXLine, and one public line, the Incheon
City Line, because the train logs were entirely unavailable.
When the time between the entry and the exit at the gates deviated from the mean by two or
more standard deviations, the trip was most likely voluntarily delayed. The number
of trips with such excessive inter-gate times was 1,571,417, which is 3.3 % of the total
trips. In addition, we found that the actual record can be delayed after card tagging at
a gate because of a disruption in the communication network. Those abnormal trips,
delayed voluntarily or in tagging, were excluded from our data set. Finally, simply for
accuracy, we ruled out the senior and handicapped citizens, whose inter-gate times are 6.7 %
longer than those of others. Overall, the estimation algorithm was applied to 32,419,106 O–D trips
as summarized in Table 3.
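The two-standard-deviation rule for excessive inter-gate times can be sketched as follows. This is a hedged illustration only: the paper does not specify whether the rule is applied per O–D pair or whether it is one-sided, so the function below assumes a symmetric rule over one list of inter-gate times, and the sample times (in seconds) are invented.

```python
from statistics import mean, stdev


def split_abnormal(inter_gate_times, k=2.0):
    """Split times into (normal, abnormal) using a k-standard-deviation rule.

    Assumption: a time is abnormal when it lies k or more sample standard
    deviations away from the mean of the given list.
    """
    mu = mean(inter_gate_times)
    sigma = stdev(inter_gate_times)
    if sigma == 0:
        # All times are identical, so nothing can be an outlier.
        return list(inter_gate_times), []
    normal = [t for t in inter_gate_times if abs(t - mu) < k * sigma]
    abnormal = [t for t in inter_gate_times if abs(t - mu) >= k * sigma]
    return normal, abnormal


# Invented inter-gate times in seconds for one hypothetical O-D pair.
times = [2500, 2550, 2600, 2580, 2620, 2560, 2590, 2610, 2570, 5400]
normal, abnormal = split_abnormal(times)
print(len(normal), abnormal)
```

In this invented sample only the 5400-second trip is flagged, which corresponds to a passenger who lingered inside the gates far longer than the rest.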
The success rates
Table 4 summarizes the rate at which the method returns a unique physical connection, broken
down by the numbers of transfers required by the tentative physical connections of an O–D
trip. As indicated in the first column of Table 4, 51.3 % of the trips have only tentative
physical connections with no transfer, 26.7 % have only tentative physical connections
requiring one transfer, etc.
The table shows that the success rate is lower when there is an alternative physical connection
requiring two transfers. Overall, the success rates were 92.6 and 83.4 %, respectively, for
the physical and schedule-based connections.
Table 3 Summary of trips selected for our experiment

                                    Num. of trips   Ratio (%)
Estimated trips                     32,419,106      68.1
Excluded trips from 4 metro lines    5,176,440      10.9
Abnormal trips                       5,625,075      11.8
Senior and handicapped citizens      4,398,089       9.2
Total                               47,618,710      –
Table 4 Success rate for each combination of the numbers of transfers in physical connections

Num. transfers     Unique physical   Unique schedule-based   0 Transfer   1 Transfer   2 Transfers
                   connection (%)    connection (%)          (%)          (%)          (%)
0 (51.3 %)         99.9              95.0                    100          –            –
1 (26.7 %)         94.5              80.6                    –            100          –
2 (1.4 %)          72.6              51.9                    –            –            100
1, 2 (14.4 %)      67.6              54.0                    –            56.6         43.4
0, 2 (4.2 %)       82.7              75.4                    87.9         –            12.1
0, 1 (1.6 %)       84.1              78.0                    70.2         29.8         –
0, 1, 2 (0.5 %)    75.3              68.0                    69.4         20.1         10.5
Overall            92.6              83.4                    60.8         32.6         6.6
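As a rough consistency check on Table 4, the overall success rates can be compared with the share-weighted average of the per-category rates. The sketch below assumes that the last row of the table is such a weighted average, which the paper does not state explicitly. The agreement can only be approximate because the published shares are rounded (they sum to 100.1 %).

```python
# (share of trips in %, unique physical %, unique schedule-based %) per row of Table 4
rows = [
    (51.3, 99.9, 95.0),  # 0 transfers
    (26.7, 94.5, 80.6),  # 1 transfer
    (1.4, 72.6, 51.9),   # 2 transfers
    (14.4, 67.6, 54.0),  # 1 or 2 transfers
    (4.2, 82.7, 75.4),   # 0 or 2 transfers
    (1.6, 84.1, 78.0),   # 0 or 1 transfers
    (0.5, 75.3, 68.0),   # 0, 1 or 2 transfers
]

total_share = sum(share for share, _, _ in rows)
physical = sum(share * rate for share, rate, _ in rows) / total_share
schedule = sum(share * rate for share, _, rate in rows) / total_share

print(round(physical, 1), round(schedule, 1))
```

Both weighted averages fall within a few tenths of a percentage point of the reported 92.6 and 83.4 %, which is what one would expect given the rounding of the inputs.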
Consistency of train choice of metro passengers
We first probe the central assumption of transit behavior studies: do metro passengers
make a rational train choice? To do so, we rely on the analysis of Cronbach (1951) to check
the consistency of the connection choice of the passengers of an O–D.
We performed connection estimation for an additional day, Monday, March 19, 2012, to
be compared with Monday, November 21, 2011. We selected the 1513 O–D’s whose daily
traffic is no less than 100 trips on both days and which have more than one alternative
physical connection. The horizontal axis in Fig. 9 indicates the 3897 physical connections,
while the vertical axis indicates the proportion of the O–D’s passengers who chose each
connection on November 21, 2011. The same plot is done for March 19, 2012, maintaining the
order of physical connections, but exchanging the axes about the diagonal.
If the train choice of passengers is consistent, the plot should exhibit a
concentration of dots around the 45° diagonal, which is the case in the figure. In fact,
Pearson’s correlation coefficient was very high, namely, 0.94. A paired-comparison t-test
did not reject the null hypothesis that the train choices of passengers for their O–D trips are not
different on the 2 days. Cronbach’s α was also high, 0.974. By all three measures,
passengers make essentially the same choices on the two Mondays. We extended the test over
the 5 weekdays of the week of November 20 to 26, 2011, and obtained a similar result.
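The two consistency statistics quoted above can be computed as in the following sketch. It uses the textbook definitions of Pearson's correlation coefficient and Cronbach's alpha, treating each day as one "item" and each physical connection as one observation. That mapping is an assumption about how the authors set up the test, and the choice proportions below are invented for illustration.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list over the same observations.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(obs) for obs in zip(*items)]
    item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variances / variance(totals))


# Invented proportions of passengers choosing six physical connections.
day1 = [0.70, 0.30, 0.55, 0.45, 0.90, 0.10]
day2 = [0.68, 0.32, 0.60, 0.40, 0.88, 0.12]

r = pearson(day1, day2)
alpha = cronbach_alpha([day1, day2])
print(r, alpha)
```

Both statistics approach 1 when the proportions on the two days coincide, which is the sense in which values such as 0.94 and 0.974 indicate consistent choices.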
Fig. 9 Consistency of physical connection estimation

Passenger flow on the time-expanded network
As the algorithm returns a schedule-based connection for each trip, we can derive
the complete passenger flow on the time-expanded network. Figure 10 shows, as an example, the
passenger flow on the logical network time-expanded around Daerim station, a transfer station
of Lines 2 and 7, from 07:45 to 07:55 A.M. on November 21, 2011. In this time interval, there
were 4 trains arriving from the Line 2 inner circle, denoted by X1, X2, X3, and X4, and 3 trains
from Line 7 inbound, Y1, Y2, and Y3. The train logs and the passenger traffic are
summarized in Table 5.
In the figure, the passenger flows associated with each train are indicated. Of the 421
passengers on Train X4 arriving at Daerim station on Line 2 from Shindorim station, 3
exited and 2 transferred to Line 7 outbound. The remaining 416 passengers were joined by 75
entering passengers. In addition, 37 transfer passengers from Line 7 outbound and 3, 38, and 6
transfer passengers from Trains Y1, Y2, and Y3 on Line 7 inbound, respectively, were added.
The resulting 575 passengers departed for GuroDigitalComplex station along the Line 2
inner circle.
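The bookkeeping in this paragraph is a flow-conservation identity at the station: departing load = arriving load - exiting - transferring out + entering + transferring in. The sketch below checks that identity against the figures reported in Table 5. The dictionary layout and field names are illustrative choices, not part of the paper.

```python
# Passenger counts per train at Daerim station, as reported in Table 5.
TABLE5 = {
    "X1": dict(arriving=1399, exiting=54, transfer_to=91, entering=63, transfer_from=55, departing=1372),
    "X2": dict(arriving=470, exiting=1, transfer_to=0, entering=88, transfer_from=61, departing=618),
    "X3": dict(arriving=1277, exiting=55, transfer_to=55, entering=91, transfer_from=91, departing=1349),
    "X4": dict(arriving=421, exiting=3, transfer_to=2, entering=75, transfer_from=84, departing=575),
    "Y1": dict(arriving=529, exiting=13, transfer_to=115, entering=51, transfer_from=21, departing=473),
    "Y2": dict(arriving=765, exiting=20, transfer_to=132, entering=52, transfer_from=30, departing=695),
    "Y3": dict(arriving=620, exiting=17, transfer_to=125, entering=45, transfer_from=24, departing=547),
}


def departing_load(row):
    """Load leaving the station implied by flow conservation."""
    return (row["arriving"] - row["exiting"] - row["transfer_to"]
            + row["entering"] + row["transfer_from"])


for train, row in TABLE5.items():
    print(train, departing_load(row), row["departing"])
```

For Train X4, the 84 transfer-in passengers in the table are the 37 from Line 7 outbound plus the 3, 38, and 6 from Y1, Y2, and Y3 mentioned in the text.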
Fig. 10 Passenger flow on the time-expanded network at the Daerim intersection of Lines 2 and 7 from 07:45 to 07:55 A.M. on November 21, 2011

Table 5 The trains and their associated passenger flows at the Daerim intersection from 07:45 to 07:55 A.M. on November 21, 2011

Line                    Train  Arrival   Departure  Arriving    Alighting passengers    Boarding passengers       Departing
                               time      time       passengers  Exiting   Transfer-to   Entering   Transfer-from  passengers
Line 2 (Inner circle)   X1     07:44:12  07:44:50   1399        54        91            63         55             1372
                        X2     07:45:59  07:47:13   470         1         0             88         61             618
                        X3     07:49:35  07:50:27   1277        55        55            91         91             1349
                        X4     07:51:33  07:52:42   421         3         2             75         84             575
Line 7 (Inbound)        Y1     07:45:16  07:45:48   529         13        115           51         21             473
                        Y2     07:46:48  07:48:44   765         20        132           52         30             695
                        Y3     07:50:39  07:51:17   620         17        125           45         24             547

We can also derive the transfer times between connecting trains. For instance, the
transfer time for the 3 passengers from Y1 to X4 was 446 seconds, the difference between the
departure time of X4 and the arrival time of Y1. Crowdedness in public transport is an
important factor for the level of service (Weidmann et al. 2012; Cox et al. 2006). The
passenger flows on the time-expanded network provide us with the exact load on each train,
which is, we believe, the most important data in a study on how crowdedness affects the
train choice of passengers.
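The 446-second transfer time quoted for the Y1 to X4 transfer follows directly from the train logs in Table 5 (arrival of Y1 at 07:45:16, departure of X4 at 07:52:42). A minimal sketch of the computation, with a hypothetical helper name:

```python
from datetime import datetime


def transfer_seconds(arrival, departure, fmt="%H:%M:%S"):
    """Seconds between the arrival of one train and the departure of another."""
    delta = datetime.strptime(departure, fmt) - datetime.strptime(arrival, fmt)
    return int(delta.total_seconds())


y1_to_x4 = transfer_seconds("07:45:16", "07:52:42")
print(y1_to_x4)  # 446
```

Applying the same rule to the other Line 7 inbound trains in Table 5 would give 354 seconds for Y2 to X4 and 123 seconds for Y3 to X4. These two values are derived here from the table and are not reported in the paper.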
Conclusion
First, we studied a set of behaviors of metro passengers by examining the gate times from
the Smart Card data. These behaviors produce time intervals precise enough to identify the
trains that a passenger boards, transfers between, and alights from, based on the passenger's
entry and exit times and stations.
1. The platform-to-gate time of an alighting passenger has the spiky characteristic of an
extremal value; the exit times at a gate of the passengers from the same train are
distributed over a very brief period of time. The time intervals of trains are disjoint.
2. The boarding behavior of metro passengers is devoid of such disjoint time
intervals. However, the first-come-first-served queue discipline is observed well
enough to allow us to derive useful time intervals of boarding groups.
Second, we recognized and separated the class of passengers who have a unique
predominant connection for a trip. Such passengers, more prevalent than expected, not only
provide us with reliable estimates of the time intervals but also bear witness to an actual transfer
between trains from lines intersecting at a transfer station.
Third, we proposed a connection estimation algorithm that checks the consistency of the time
intervals of the trains in a tentative connection with the gate times and stations of a passenger,
a condition that necessarily holds when the connection is the actual choice for a trip.
The proposed algorithm was applied to 32 million trips from Smart Card data collected in
the Seoul metropolitan area during the week from Sunday, November 20 to Saturday,
November 26, 2011. As a result, our method could determine a unique physical
connection in 92.6 % of the trips. The result shows a consistent physical connection choice over
the 5 weekdays.
Acknowledgments This research was supported in part by Basic Science Research Program (2014R1A2A1A11049663) through the National Research Foundation of Korea (NRF), and by the BK21 Plus Program (Center for Sustainable and Innovative Industrial Systems) funded by the Ministry of Education, Korea.
Appendix
Probability estimation of schedule-based connections
Suppose the current physical connection requires a single transfer, say, at Station A. The
schedule-based connections on a physical connection can be represented by a
time-expanded network as in Fig. 11.
The consistency check is initiated by finding consistent trains at both O and D. By this
assumption, there can be at most two trains, say X1 and X2, at O, whose time intervals
contain the entry time, while at most one train, say Y, can be consistent with the exit time at
D. If there are no such trains at either O or D, the passenger did not use the physical
connection.
If neither X1 nor X2 can be connected to Y, in the sense that there is no relevant transfer
reference passenger, we conclude that the passenger did not use the physical connection.
If there is only one such train, say X1, whose connection to Y can be verified by transfer
reference passengers, then the schedule-based connection X1 - Y is confirmed as the
unique connection of the passenger.
Finally, if there are two trains, say X1 and X2, from both of which we can find transfer
reference passengers to Y as in Fig. 11, we need to return both X1 - Y and X2 - Y. This is the
worst case in that the maximum number of schedule-based connections is confirmed as
consistent.
The estimation, however, can be refined by a probability distribution over the two
connections. In Fig. 11, we introduce the following notation:
• p: The fraction of the boarding reference passengers from the overlap of the two time
intervals that boarded train X1
• 1 - p: The fraction of the boarding reference passengers from the overlap of the two
time intervals that boarded train X2 rather than X1
• 1 - q1: The fraction of the transfer reference passengers of X1 that transferred to Y
• q2: The fraction of the transfer reference passengers of X2 that transferred to Y
It is then not difficult to show that
\[
\begin{aligned}
\Pr\{\text{Passenger chose } X_1 - Y\} &= \frac{p(1-q_1)}{p(1-q_1) + (1-p)\,q_2},\\
\Pr\{\text{Passenger chose } X_2 - Y\} &= \frac{(1-p)\,q_2}{p(1-q_1) + (1-p)\,q_2}.
\end{aligned}
\tag{1}
\]
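Equation (1) can be read as a normalization of two path weights: p(1 - q1) for boarding X1 and then transferring to Y, and (1 - p)q2 for boarding X2 and then transferring to Y. Under that reading, which treats the boarding and transfer fractions as independent, the computation can be sketched as below. The function name and the handling of the degenerate case are illustrative assumptions, and the input values are invented.

```python
def connection_probabilities(p, q1, q2):
    """Probabilities of X1 - Y and X2 - Y according to Eq. (1).

    Returns None when both path weights vanish, i.e. when no schedule-based
    connection is consistent and the physical connection would be rejected.
    """
    w1 = p * (1 - q1)   # weight of boarding X1, then transferring to Y
    w2 = (1 - p) * q2   # weight of boarding X2, then transferring to Y
    total = w1 + w2
    if total == 0:
        return None
    return w1 / total, w2 / total


# Invented fractions, for illustration only.
probs = connection_probabilities(p=0.6, q1=0.25, q2=0.5)
print(probs)
```

The boundary cases reduce to a single connection with probability 1: for example, q2 = 0 leaves only X1 - Y, and p = 0 leaves only X2 - Y.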
Table 6 summarizes the numbers and lists of consistent schedule-based connection(s),
the corresponding conditions, and the probability distributions. If none of the conditions
from Table 6 is satisfied, no schedule-based connection can be consistent with the
quadruple of our passenger and hence the physical connection is rejected.

Fig. 11 Two schedule-based connections can be consistent
For a physical connection that requires two transfers, there may be up to 3 schedule-
based connections consistent with a quadruple if the trip is not abnormally delayed. The
previous arguments can be easily extended to such a case.
Table 6 Numbers and lists of consistent schedule-based connection(s), the corresponding conditions, and the probability distributions for a single-transfer physical connection

No.  Consistent connection(s)   Conditions                               Probability
                                p            q1            q2
1    X1 - Y                     0 < p < 1    q1 < 1        q2 = 0        1
                                p = 1        q1 < 1        –
     X2 - Y                     p = 0        –             q2 > 0        1
                                0 < p < 1    q1 = 1        q2 > 0
2    X1 - Y and X2 - Y          0 < p < 1    0 < q1 < 1    q2 > 0        p(1 - q1) / [p(1 - q1) + (1 - p)q2] and
                                                                         (1 - p)q2 / [p(1 - q1) + (1 - p)q2]

References
Asakura, Y., Iryo, T., Nakajima, Y., Kusakabe, T., Takagi, Y., Kashiwadani, M.: Behavioural analysis of railway passengers using smart card data. In: Proceedings of the Urban Transport, pp. 599–608. Malta (2008)
Bagchi, M., White, P.R.: The potential of public transport smart card data. Transp. Policy 12(5), 464–474 (2005)
Bureau of Public Roads: Traffic Assignment Manual. U.S. Department of Commerce (1964)
Cox, T., Houdmont, J., Griffiths, A.: Rail passenger crowding, stress, health and safety in Britain. Transp. Res. Part A 40, 244–258 (2006)
Cronbach, L.J.: Coefficient alpha and the internal structure of tests. Psychometrika 16(3), 297–334 (1951)
De Cea, J., Fernandez, J.E.: Transit assignment for congested public transport system: an equilibrium model. Transp. Sci. 27(2), 133–147 (1993)
Einmahl, J.H.J., Smeets, S.G.W.R.: Ultimate 100 m world records through extreme-value theory. Stat. Neerl. 65(1), 32–42 (2011)
Fu, Q., Liu, R., Hess, S.: A Bayesian modelling framework for individual passenger’s probabilistic route choices: a case study on the London underground. In: 93rd Transportation Research Board (TRB) Annual Meeting (2014)
Guo, Z., Wilson, N.: Transfer behavior and transfer planning in public transport systems: a case of the London underground. In: Proceedings of the 11th International Conference on Advanced Systems for Public Transport, Hong Kong (2009)
Jang, W.: Travel time and transfer analysis using transit smart card data. Transp. Res. Rec. 2144, 142–149 (2010)
Kato, H., Kaneko, Y., Inoue, M.: Comparative analysis of transit assignment: evidence from urban railway system in the Tokyo metropolitan area. Transportation 37, 775–799 (2010)
Ko, S.-J., Kim, K.M., Hong, S.-P.: Estimation of transfer times and alighting times of the metro passengers in Seoul metropolitan area. Working paper
Kusakabe, T., Iryo, T., Asakura, Y.: Estimation method for railway passengers’ train choice behaviour with smart card transaction data. Transportation 37, 731–749 (2010)
Lam, W.H.K., Lo, H.K.: Traffic assignment methods. In: Hensher, D.A., Button, K.J., Haynes, K.E., Stopher, P.R. (eds.) Handbook of Transport Geography and Spatial Systems, pp. 609–625 (2004)
Lehtonen, M., Rosenberg, M., Rasanen, J., Sirkia, A.: Utilization of the smart card payment system (scps) data in public transport planning and statistics. In: Proceedings of the 9th World Congress on Intelligent Transport Systems, Chicago, Illinois, 14–17 October 2002
Morency, C., Trepanier, M., Agard, B.: Measuring transit use variability with smart-card data. Transp. Policy 14(3), 193–203 (2007)
Nielsen, O.A.: A stochastic transit assignment model considering differences in passengers utility functions. Transp. Res. Part B 34(5), 377–402 (2000)
Nour, A., Casello, J.M., Hellinga, B.: Anxiety-based formulation to estimate generalized cost of transit travel time. Transp. Res. Rec. 2143, 108–116 (2010)
Park, J.Y., Kim, D.-J., Lim, Y.: Use of smart card data to define public transit use in Seoul, South Korea. Transp. Res. Rec. 2063, 3–9 (2008)
Pelletier, M.-P., Trepanier, M., Morency, C.: Smart card data use in public transit: a literature review. Transp. Res. Part C 19, 557–568 (2011)
Raveau, S., Munoz, J.C., de Grange, L.: A topological route choice model for metro. Transp. Res. Part A 45, 138–147 (2011)
Rinks, D.B.: Revenue allocation methods for integrated transit systems. Transp. Res. Part A 20(1), 39–50 (1986)
Seaborn, C.: Application of smart card fare payment data to bus network planning in London, UK. MS thesis, Massachusetts Institute of Technology, Cambridge (2008)
Seaborn, C., Attanucci, J., Wilson, N.: Analyzing multimodal public transport journeys in London with smart card fare payment data. Transp. Res. Rec. 2121, 55–62 (2009)
Shin, S.G., Cho, Y., Lee, C.: Integrated transit service evaluation methodologies using transportation card data (In Korean). Technical Report 2007-R-09, Seoul Development Institute (2007)
Trepanier, M., Tranchant, N., Chapleau, R.: Individual trip destination estimation in a transit smart card automated fare collection system. J. Intell. Transp. Syst. 11(1), 1–14 (2007)
Tsamboulas, D.A., Antoniou, C.: Allocating revenues to public transit operators under an integrated fare system. Transp. Res. Rec. 1986, 29–37 (2006)
Utsunomiya, M., Attanucci, J., Wilson, N.H.: Potential uses of transit smart card registration and transaction data to improve transit planning. Transp. Res. Rec. 1971, 119–126 (2006)
Weidmann, U., Orth, H., Dorbritz, R.: Development of measurement system for public transport performance. Transp. Res. Rec. 2274, 135–143 (2012)
Zhou, F., Xu, R.-H.: Model of passenger flow assignment for urban rail transit based on entry and exit time constraints. J. Transp. Res. Board 2284, 57–61 (2012)
Sung-Pil Hong is a professor at the Department of Industrial Engineering, Seoul National University. One of his research interests is computing discrete choice equilibria via optimization. Since 2010 he has performed various studies on modeling and analyzing metro transit.
Yun-Hong Min is a research staff member at Samsung Advanced Institute for Technology (SAIT). He received his Ph.D. degree in Industrial Engineering from Seoul National University in 2012, and he has been working at SAIT since 2012. His main research concerns are equilibrium analysis, convex optimization, and machine learning.
Myoung-Ju Park is an assistant professor at the Department of Industrial and Management Systems Engineering, Kyung Hee University. He received his Ph.D. in Industrial Engineering from Seoul National University in 2012. His current research is about combinatorial optimization, approximation algorithms, and scheduling.
Kyung Min Kim is a Ph.D. candidate at the Department of Industrial Engineering, Seoul National University and a senior researcher at the Korea Railroad Research Institute. His main research concerns are railway planning, transit assignment, and travel behavioral analysis.
Suk Mun Oh is a principal researcher at the Korea Railroad Research Institute. He received his Ph.D. in Industrial Engineering from Korea University in 2010. He has been involved in a number of studies on railway operation and policy since 1995.