Model-based clustering for BSS usage mining, a case study with the Velib’ system of Paris

Etienne Côme

15/10/2012

Bike Sharing Systems (BSS)

What is fun about BSS?

- Relatively new systems
- Spreading rapidly (EU and US nowadays, Hangzhou, ...)
- Important successes
- Abundant usage data
- In interesting and original forms:
  - Origins / destinations + timestamps
  - Real-time station balances
- Interesting and new research problems

Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 2 / 75

Outline

1 Introduction
  - Problematics
  - Usage data : trips records
  - Velib’ in few numbers and pictures
  - Tools and approach

2 Stations clustering using temporal usage profiles
  - Data representation : count time series
  - Generative model : naive Poisson mixture
  - Analysis of the results on the Velib’ dataset

3 Latent Dirichlet Allocation (LDA) for trips activity recognition
  - Data representation : dynamical O/D matrices
  - Generative model under LDA
  - Analysis of the results on the Velib’ dataset

Introduction Problematics

Problematics

Operational objectives

- Planning new systems: position and size of the stations
- Quality of service: bike redistribution, ...
- ...

Mining objectives

- Building predictive models of usage
- Finding spatio-temporal patterns
- Better understanding of the usage
- ...

Introduction Usage data : trips records

Raw data

Trips data

- departure timestamp
- departure station
- arrival timestamp
- arrival station
- type of subscription

→ Will be converted into contingency tables (i.e. tensors of counts)

Data sources

→ Velib’, 2 months
- Open data: Barclays (London), Boston, ...

Introduction Velib’ in few numbers and pictures

Velib’ in a few numbers

BSS size:

- 1200 stations
- ≈ 40 000 slots
- ≈ 16 000 bikes
- ≈ 100 000 trips/day
- 27% of trips = day subscription
- 73% of trips = year subscription

Global behavior

FIG. 1: Histograms of trip lengths (distance, km) and durations (min), with the free-use limits of the day and year subscriptions marked

Temporal effects

FIG. 2: Number of trips per hour over a week, Monday to Sunday (short / long subscriptions)

FIG. 3: Average number of trips per hour of the day, week days / week-end

Spatial effects

FIG. 4: Incoming trips map [6h,7h] for week days

FIG. 5: Mean station activity per hour against the distance from the center ("Les Halles"), in km

Introduction Tools and approach

Approach, exploratory data analysis

General methodology

- Use clustering algorithms to find interesting patterns in the data
- Compare the clusters found with the city's geography and sociology
⇒ Extract the important factors that influence BSS behavior.

Two developments:

1 Find clusters of stations with similar temporal usage patterns
2 Find latent activities that govern the BSS dynamics

Tools, model based clustering

General methodology

- Imagine a data generation process
⇒ which includes non-observed, or latent, variables
- Latent variables can be discrete or continuous

Examples of latent variables

- Species for flowers
- Topics for texts
- Communities for graph vertices
- ...

Generative approach

Clustering

Model-based clustering :

1 Draw the cluster of sample (i)
2 Depending on the cluster, draw the observed values of (i)

FIG. 6: Example of a 1D Gaussian mixture model (density f(x) against x)

Data generation process

Graphical model representation

1. Draw the cluster of sample (i)

$$Z_i \sim \mathcal{M}(1, \pi)$$

⇒ π: prior proportions of the clusters.

2. Depending on the cluster draw the observed values of (i)

$$p(x \mid Z_{ik} = 1) = f(x; \theta_k), \quad \forall k \in \{1, \ldots, K\}.$$

⇒ f can be tuned to exploit specificities of the problem.
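
The two-step generation process can be sketched in a few lines. This is only an illustration, assuming f is a 1D Gaussian density as in FIG. 6; the function name `sample_mixture` and the parameter values are hypothetical and not taken from the talk.

```python
import numpy as np

def sample_mixture(n, pi, means, stds, seed=0):
    """Draw n points from a 1D Gaussian mixture with the two-step process."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, dtype=float)
    # Step 1: Z_i ~ M(1, pi), stored as the index of the drawn cluster.
    z = rng.choice(len(pi), size=n, p=pi)
    # Step 2: x_i ~ f(x; theta_k) with k = z_i; here f is a Gaussian density.
    x = rng.normal(np.asarray(means)[z], np.asarray(stds)[z])
    return z, x

# Hypothetical parameters: two clusters with proportions 0.3 and 0.7.
z, x = sample_mixture(1000, pi=[0.3, 0.7], means=[-40.0, 10.0], stds=[8.0, 12.0])
```

Swapping the Gaussian draw for another distribution is all it takes to adapt f to the problem, which is what the Poisson mixture of the next section does for counts.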

Model based clustering framework

Task and tools

- Inferring the parameters
  ⇒ EM algorithm, or variational EM for complex models
- Finding the clustering
  ⇒ By-product of EM
- Fixing the number of clusters
  ⇒ Model selection criterion: BIC, AIC, ICL, perplexity.

Stations clustering using temporal usage profiles

Objectives :

- Find groups of stations with similar temporal usage profiles
- Temporal usage profiles = incoming and outgoing activity per hour
- Taking into account the week-day / week-end discrepancy
- With a model for count data
- Cross the results with possible explanatory variables: population, employment, amenities, ...

Stations clustering using temporal usage profiles Data representation : count time series

Data representation : count time series

Observed data :

- $X^{\mathrm{out}}_{sdt}$: number of bikes taken at station s during day d at hour t
- $X^{\mathrm{in}}_{sdt}$: number of bikes returned at station s during day d at hour t

$$X_{sd} = (X^{\mathrm{in}}_{sd1}, \ldots, X^{\mathrm{in}}_{sd24}, X^{\mathrm{out}}_{sd1}, \ldots, X^{\mathrm{out}}_{sd24})$$

⇒ X is a tensor of size N × D × T.
⇒ temporal behavior / stations.

Variables

- $X_{sd}$ (observed): number of bikes leaving / arriving
- $Z_s$ (latent): cluster of station s
- $W_d$ (observed): cluster of day d (week / week-end)
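
As a rough sketch of how trip records could be turned into this count tensor: the record layout below (station, day and hour indices as a 6-tuple) and the function name `count_tensor` are assumptions made for illustration, and T = 48 follows from concatenating the 24 arrival counts and the 24 departure counts as in the vector above.

```python
import numpy as np

def count_tensor(trips, n_stations, n_days):
    """Build X of shape (N, D, 48): 24 hourly arrival counts, then 24 departure counts."""
    X = np.zeros((n_stations, n_days, 48), dtype=np.int64)
    for dep_station, dep_day, dep_hour, arr_station, arr_day, arr_hour in trips:
        X[dep_station, dep_day, 24 + dep_hour] += 1      # X^out_{sdt}
        if arr_day < n_days:                             # arrival may fall outside the window
            X[arr_station, arr_day, arr_hour] += 1       # X^in_{sdt}
    return X

# Three hypothetical trips: (dep. station, dep. day, dep. hour, arr. station, arr. day, arr. hour)
trips = [(0, 0, 8, 1, 0, 8), (0, 0, 8, 2, 0, 9), (2, 1, 18, 0, 1, 18)]
X = count_tensor(trips, n_stations=3, n_days=2)
```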

Stations clustering using temporal usage profiles Generative model : naive Poisson mixture

Generative model : naive Poisson mixture

FIG. 7: Graphical model representation

Parameters, Θ

- $\alpha_s$: station attractivity effects
- $\pi = (\pi_1, \ldots, \pi_K)$: cluster proportions
- $\lambda = (\lambda_{klt})$: temporal profiles of the clusters

Generative model

Naive Poisson mixture

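$$Z_s \sim \mathcal{M}(1, \pi)$$

$$X_{sd1} \perp\!\!\!\perp \ldots \perp\!\!\!\perp X_{sdT} \mid \{Z_{sk} = 1, W_{dl} = 1\}$$

$$X_{sdt} \mid \{Z_{sk} = 1, W_{dl} = 1\} \sim \mathcal{P}(\alpha_s \lambda_{klt})$$

Constraints

$$\sum_{l,t} D_l \lambda_{klt} = DT, \quad \forall k \in \{1, \ldots, K\},$$

with $D_l$ the number of days in day cluster l.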
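
To make the generative model concrete, here is a minimal simulation sketch. It assumes W is stored as a one-hot (D, L) array and λ as a (K, L, T) array; the helper names `normalize_profiles` and `sample_counts` and all numerical values are hypothetical.

```python
import numpy as np

def normalize_profiles(lam, W):
    """Rescale lam (K, L, T) so that sum_{l,t} D_l * lam[k, l, t] = D * T for every k."""
    D_l = W.sum(axis=0)                        # number of days in each day cluster
    D, T = W.shape[0], lam.shape[2]
    total = np.einsum("l,klt->k", D_l, lam)    # sum_{l,t} D_l * lam_klt
    return lam * (D * T / total)[:, None, None]

def sample_counts(alpha, pi, lam, W, seed=0):
    """Draw station clusters, then Poisson counts with rate alpha_s * lam_klt."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=len(alpha), p=pi)   # Z_s ~ M(1, pi)
    l = W.argmax(axis=1)                             # day cluster of each day d
    rate = alpha[:, None, None] * lam[z][:, l, :]    # shape (N, D, T)
    return z, rng.poisson(rate)

W = np.array([[1, 0]] * 5 + [[0, 1]] * 2)            # 5 week days, 2 week-end days
lam = normalize_profiles(np.random.default_rng(1).random((3, 2, 48)) + 0.1, W)
alpha = np.array([2.0, 5.0, 1.0, 8.0])               # 4 stations
z, X = sample_counts(alpha, np.array([0.5, 0.3, 0.2]), lam, W)
```

With the constraint enforced, α_s can be read as the mean hourly activity of station s, and λ only carries the shape of the temporal profile.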

Parameters estimation, likelihood

Marginal likelihood

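$$L(\Theta; X) = \sum_s \log\Big(\sum_k \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \quad (1)$$

Completed likelihood

$$L_c(\Theta; X, Z) = \sum_{s,k} Z_{sk} \log\Big(\pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \quad (2)$$

where Z is unknown.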

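EM algorithm

⇒ Straightforward solution for parameter estimation: EM

E step

Conditional expectation of $L_c$ given the current parameters:

$$E[L_c(\Theta; x, Z) \mid x, \Theta^{(q)}] = \sum_{s,k} t_{sk} \log\Big(\pi_k \prod_{d,t,l} p(x_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}}\Big) \quad (3)$$

with $t_{sk}$ the posterior probabilities:

$$t_{sk} = \frac{\pi_k^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{klt}^{(q)})^{W_{dl}}}{\sum_{k'} \pi_{k'}^{(q)} \prod_{d,t,l} p(x_{sdt}; \alpha_s^{(q)} \lambda_{k'lt}^{(q)})^{W_{dl}}} \quad (4)$$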
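
A possible NumPy sketch of the E step, assuming X has shape (N, D, T), W is a one-hot (D, L) array and λ has shape (K, L, T). The function name `e_step` is hypothetical. Working in log space is a choice made here for numerical stability, not something stated on the slides.

```python
import numpy as np

def e_step(X, W, alpha, pi, lam):
    """Posterior probabilities t[s, k] of equation (4), computed in log space."""
    l = W.argmax(axis=1)                                  # day cluster of each day d
    # rate[s, k, d, t] = alpha_s * lambda_{k, l(d), t}
    rate = alpha[:, None, None, None] * lam[:, l, :][None]
    # Poisson log-pmf without the -log(x!) term: it does not depend on k,
    # so it cancels in the ratio of equation (4).
    loglik = (X[:, None] * np.log(rate) - rate).sum(axis=(2, 3))   # shape (N, K)
    logp = np.log(pi)[None, :] + loglik
    logp -= logp.max(axis=1, keepdims=True)               # stabilise before exponentiating
    t = np.exp(logp)
    return t / t.sum(axis=1, keepdims=True)
```

The intermediate array has N × K × D × T entries, so for a system the size of Velib’ a loop over clusters may be preferable to this fully vectorised form.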

M step

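Maximization of the lower bound with respect to the parameters:

- $\alpha_s$: mean station activity, $\hat{\alpha}_s = \frac{1}{DT} \sum_{d,t} X_{sdt}$
- $\pi_k$: proportion of cluster k, $\hat{\pi}_k = \frac{1}{N} \sum_s t_{sk}$
- $\lambda_{klt}$: activity of time frame t for cluster k, on week days or during the week-end (day cluster l)

$$\hat{\lambda}_{klt} = \frac{1}{\sum_s t_{sk} \alpha_s \sum_d W_{dl}} \sum_{s,d} t_{sk} W_{dl} X_{sdt} \quad (5)$$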
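
The three updates translate almost directly into array operations. The sketch below assumes the same array shapes as before (X: (N, D, T), one-hot W: (D, L), responsibilities t: (N, K)); the function name `m_step` is hypothetical.

```python
import numpy as np

def m_step(X, W, t):
    """Closed-form updates of alpha, pi and lambda from the responsibilities t."""
    N, D, T = X.shape
    alpha = X.sum(axis=(1, 2)) / (D * T)            # alpha_s: mean station activity
    pi = t.sum(axis=0) / N                          # pi_k: cluster proportions
    # num[k, l, t] = sum_{s,d} t_sk * W_dl * X_sdt
    num = np.einsum("sk,dl,sdt->klt", t, W, X)
    # den[k, l] = (sum_s t_sk * alpha_s) * (sum_d W_dl)
    den = (t.T @ alpha)[:, None] * W.sum(axis=0)[None, :]
    lam = num / den[:, :, None]                     # equation (5)
    return alpha, pi, lam
```

One property worth checking is that the λ returned by equation (5) satisfies the normalisation constraint of the model automatically. The sketch does not guard against an empty station cluster or an empty day cluster, which would make the denominator zero.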

Stations clustering using temporal usage profiles Analysis of the results on the Velib’ dataset

Results

Setting

- One month of data (September)
- Number of clusters (K = 8) set manually
⇒ good trade-off between interpretability and fit of the clustering

Outputs

- $Z_s$: cluster of station s
- $\lambda_k$: temporal profile of cluster k
- $\alpha_s$: attractivity of station s

Railway stations

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

Parks

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

Spare time, night

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

Spare time, night and week-end

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

Housing

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

[Figure: map for the Housing cluster; legend: inhabitants / ha, 0 to 1 200]

Employment (1)

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

Employment (2)

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

[Figure: map for the Employment (1 and 2) clusters; legend: jobs / ha, 0 to 2 000]

Mixed usage

[Figure: hourly activity profile of the cluster (activity against hours), departures and arrivals, week and week-end panels]

Crossing with population/employments/services rates

                    inhabitants/ha   jobs/ha   services/ha   shops/ha
                               162       237           4.2        3.7
Spare time (1)                 367       189           6.3        4.4
Spare time (2)                 261       322           7.7        6.9
Parks                          172        90           2          1.7
Railway stations               209       206           2.4        1.8
Housing                        375       108           3.8        2.7
Employment (1)                 138       409           4.5        2.8
Employment (2)                 157       456           5.7        5.6
Mixed usage                    301       163           3.8        2.8

TAB. 1: Mean of each cluster with respect to population, employment, service and shop densities. Sources: "Recensement 2008", "Base permanente des équipements", Insee.

Conclusion on stations clustering

Discussion on the model

- Model adapted to counts
- Scaling factors for stations are important
- Stations described by incoming and outgoing flow dynamics
- Taking into account week-day / week-end differences

Discussion on the results

- Clusters are interpretable
- Population, employment and amenity densities are highly explanatory for the clusters
- Temporal profiles are also interpretable and informative

Latent Dirichlet Allocation (LDA) for trips activity recognition

Objectives

- Decompose the trips into interpretable clusters
⇒ look for stationarities and change points in the O/D dynamics
- LDA with documents = small bags of successive trips

Analyse the clusters found with respect to their:

- Temporal positions, cycles
- Spatial distribution of flows
- Spatial distribution of incoming / outgoing flows per station

Latent Dirichlet Allocation (LDA) for trips activity recognition Data representation : dynamical O/D matrices

Data representation : dynamical O/D matrices

Observed data :

$X_{ijt}$: number of bikes that were

1 taken at station i
2 returned at station j
3 at time t

$t \in \{1, \ldots, N_t\}$: time index

$i, j \in \{1, \ldots, N_s\}$: set of stations

⇒ $X_{ijt}$ is a tensor of dimension $N_s \times N_s \times N_t$.
⇒ taking into account the spatial and temporal behavior of the BSS

Latent Dirichlet Allocation (LDA) for trips activity recognition Generative model under LDA

LDA, background

LDA = Latent Dirichlet Allocation

- Bayesian mixture for discrete data
⇒ originally designed to find topics in text corpora
- Each document (bag of words) is a mixture of topics
- Each topic has its own vector of word probabilities

FIG. 8: Graphical model representation of LDA.

LDA for dynamical O/D matrices analysis

Hypotheses:

- Local stationarity of the BSS behaviour / O/D
- Cyclostationarity: week, day

Small bags of successive trips ≈ stationarity of the O/D

⇒ Documents (bags of words) = bags of successive trips (5000), with:

- Words = origin/destination couples
- Topics = latent activities
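
The mapping from trips to documents can be sketched as follows. It assumes the trips are already sorted by departure time and that stations are indexed from 0; the function name `bags_of_trips` and the encoding of an O/D couple as a single integer are illustrative choices, not details given in the talk.

```python
import numpy as np

def bags_of_trips(origins, destinations, n_stations, bag_size=5000):
    """Return a (n_bags, n_stations**2) matrix of O/D couple counts, one row per bag."""
    # Encode each O/D couple as one "word" index: origin * n_stations + destination.
    words = np.asarray(origins) * n_stations + np.asarray(destinations)
    n_bags = len(words) // bag_size              # an incomplete last bag is dropped
    counts = np.zeros((n_bags, n_stations ** 2), dtype=np.int64)
    for b in range(n_bags):
        bag = words[b * bag_size:(b + 1) * bag_size]
        counts[b] = np.bincount(bag, minlength=n_stations ** 2)
    return counts

# Toy example: 6 trips between 3 stations, grouped in bags of 3 successive trips.
counts = bags_of_trips([0, 1, 1, 2, 0, 1], [1, 0, 0, 2, 1, 2], n_stations=3, bag_size=3)
```

With about 1200 stations the vocabulary holds more than a million O/D couples, so a sparse count matrix would be needed in practice; the dense array here only keeps the sketch short.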

For each activity a, draw an O/D matrix generator:

$$\Lambda_a \sim \mathcal{D}(\beta)$$

For each "bag of trips" $t \in \{1, \ldots, N_t\}$:

1 Draw the activity proportions: $\pi_t \sim \mathcal{D}(\alpha)$
2 For each trip of bag t:
  - Draw its activity A: $A \sim \mathcal{M}(1, \pi_t)$
  - Draw an O/D couple D using the generator of activity A: $D \sim \mathcal{M}(1, \Lambda_A)$
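
The generative process above can be simulated directly. The sketch below assumes symmetric Dirichlet priors with scalar concentrations `alpha` and `beta` (the slides only write D(α) and D(β)) and flattens each O/D matrix into a vector of length N_s²; the function name `sample_lda_trips` and the default values are hypothetical.

```python
import numpy as np

def sample_lda_trips(n_bags, bag_size, n_activities, n_stations,
                     alpha=0.5, beta=0.5, seed=0):
    """Simulate bags of (origin, destination) trips from the LDA generative model."""
    rng = np.random.default_rng(seed)
    n_words = n_stations ** 2
    # Lambda_a ~ D(beta): one distribution over O/D couples per activity.
    Lam = rng.dirichlet(np.full(n_words, beta), size=n_activities)
    pis, bags = [], []
    for _ in range(n_bags):
        pi_t = rng.dirichlet(np.full(n_activities, alpha))       # pi_t ~ D(alpha)
        acts = rng.choice(n_activities, size=bag_size, p=pi_t)   # A ~ M(1, pi_t)
        # D ~ M(1, Lambda_A): draw one O/D couple per trip from its activity.
        ods = np.array([rng.choice(n_words, p=Lam[a]) for a in acts])
        bags.append(np.column_stack(divmod(ods, n_stations)))    # columns: origin, destination
        pis.append(pi_t)
    return Lam, np.array(pis), bags

Lam, pis, bags = sample_lda_trips(n_bags=3, bag_size=20, n_activities=2, n_stations=4)
```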

Latent Dirichlet Allocation (LDA) for trips activity recognition Analysis of the results on the Velib’ dataset

Fixing the number of activities

perplexity analysis

- Perplexity = f(likelihood of test data)
- Clear drop-off at K = 5

FIG. 9: Perplexity on the September dataset with respect to the number of latent activities K.
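
The slides do not spell out the function f. A common definition is the exponential of minus the average log-likelihood per word (here, per trip). The sketch below uses that definition with plug-in estimates of the bag proportions, which is an assumption made here and not necessarily the exact procedure used for FIG. 9; the function name `perplexity` is hypothetical.

```python
import numpy as np

def perplexity(counts, pi, Lam):
    """exp(-log-likelihood per trip) for O/D counts of shape (n_bags, n_words)."""
    word_prob = pi @ Lam                  # p(w | bag t) = sum_a pi[t, a] * Lam[a, w]
    mask = counts > 0                     # skip unseen couples to avoid 0 * log(0)
    loglik = np.sum(counts[mask] * np.log(word_prob[mask]))
    return float(np.exp(-loglik / counts.sum()))
```

As a sanity check, a single activity that is uniform over W couples gives a perplexity of exactly W, so lower values mean the model concentrates probability on the couples actually observed.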

Temporal results : πt

FIG. 10: Temporal evolution of $\pi_t$ (trips/hour, weeks of April 11, April 18 and April 25)

Remarks:

- Cyclostationarity clearly visible (even during holidays)
- Little mixing between the latent activities
- Interpretable temporal clusters: Home↔Work, Lunch, ...

Spatial results : Λa as flows

FIG. 11: Latent activity "House→Work commute", flows (blue for f=10/10 000)

FIG. 12: Latent activity "Lunch", flows (blue for f=10/10 000)

FIG. 13: Latent activity "Work→House commute", flows (blue for f=10/10 000)

FIG. 14: Latent activity "Evening", flows (blue for f=10/10 000)

FIG. 15: Latent activity "Spare time", flows (blue for f=10/10 000)

Incoming / Outgoing specificities, question :

Which stations have an increased in/out-degree for a latent activity a ?

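Introduce station incoming specificities $IS^a_s$ and outgoing specificities $OS^a_s$:

$$IS^a_s = \log(p^{\mathrm{in},a}_s / p^{\mathrm{in},g}_s), \quad OS^a_s = \log(p^{\mathrm{out},a}_s / p^{\mathrm{out},g}_s), \quad (6)$$

with $p^{\mathrm{in},a}_s$, $p^{\mathrm{out},a}_s$ the probabilities that a trip ends / starts in station s for activity a:

$$p^{\mathrm{in},a}_s = \sum_j (\Lambda_a)_{js}, \quad p^{\mathrm{out},a}_s = \sum_j (\Lambda_a)_{sj},$$

and $p^{\mathrm{in},g}_s$, $p^{\mathrm{out},g}_s$ the global probabilities that a trip ends / starts in station s:

$$p^{\mathrm{in},g}_s = \frac{\sum_{j,t} X_{jst}}{\sum_{i,j,t} X_{ijt}}, \quad p^{\mathrm{out},g}_s = \frac{\sum_{j,t} X_{sjt}}{\sum_{i,j,t} X_{ijt}}.$$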
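
These quantities are row and column sums, so they are easy to compute once the O/D probabilities of an activity are stored as an (N_s, N_s) matrix with origins on the rows. A minimal sketch under that assumption (the function name `specificities` is hypothetical):

```python
import numpy as np

def specificities(Lam_a, X):
    """Incoming and outgoing specificities of equation (6).

    Lam_a: (Ns, Ns) O/D probabilities of activity a, rows = origins.
    X:     (Ns, Ns, Nt) trip counts X[i, j, t].
    """
    p_in_a = Lam_a.sum(axis=0)                # sum_j Lam_a[j, s]: trips ending in s
    p_out_a = Lam_a.sum(axis=1)               # sum_j Lam_a[s, j]: trips starting in s
    total = X.sum()
    p_in_g = X.sum(axis=(0, 2)) / total       # sum_{j,t} X[j, s, t] / total
    p_out_g = X.sum(axis=(1, 2)) / total      # sum_{j,t} X[s, j, t] / total
    return np.log(p_in_a / p_in_g), np.log(p_out_a / p_out_g)
```

A specificity of 0 means the station receives (or emits) the same share of trips under activity a as it does globally; stations with no trips at all would need special handling because of the logarithm.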

Spatial results : incoming specificities

FIG. 16: Latent activity "House→Work commute", stations incoming specificity

Spatial results : outgoing specificities

FIG. 17: Latent activity "House→Work commute", stations outgoing specificity

Expected bike balance, question :

Positive/negative bike balance of stations for a latent activity a ?

The O/D matrix D follows a multinomial law with parameters $N_{dep}$ (number of trips) and $\Lambda_a$:

$$D \sim \mathcal{M}(N_{dep}, \Lambda_a),$$

The bike balance $B_s$ of a station s is thus given by:

$$B_s = \overbrace{\sum_j D_{js}}^{\text{Incoming bikes}} - \overbrace{\sum_j D_{sj}}^{\text{Outgoing bikes}}$$

And the expectation of the balance vector B is thus equal to:

$$E[B] = N_{dep} \, (\Lambda_a^{T} - \Lambda_a) \, v, \quad (7)$$

with $v = (1, \ldots, 1)^{T}$.
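
Equation (7) is a one-liner once the O/D probabilities of an activity are available as a matrix with origins on the rows. A small sketch with a toy two-station example (the function name `expected_balance` and the numbers are hypothetical):

```python
import numpy as np

def expected_balance(Lam_a, n_dep=10_000):
    """E[B] = N_dep * (Lam_a^T - Lam_a) v, with v a vector of ones."""
    v = np.ones(Lam_a.shape[0])
    return n_dep * (Lam_a.T - Lam_a) @ v

# Toy activity: 60% of trips go from station 0 to station 1, 40% the other way.
Lam_a = np.array([[0.0, 0.6],
                  [0.4, 0.0]])
balance = expected_balance(Lam_a)
```

In this toy case station 0 loses bikes and station 1 gains the same number, and the balances always sum to zero because every trip leaves one station and arrives at another.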

Spatial results : expected bike balance

FIG. 18: Latent activity "House→Work commute", stations expected balances with $N_{dep}$ = 10 000 (balance scale from -30 to 30)

"Lunch", incoming specificity

FIG. 19: Stations incoming specificity

"Lunch", outgoing specificity

FIG. 20: Stations outgoing specificity

"Lunch", balance

FIG. 21: Stations expected balances with $N_{dep}$ = 10 000 (balance scale from -30 to 30)

"Work→House commute", incoming specificity

FIG. 22: Stations incoming specificity

"Work→House commute", outgoing specificity

FIG. 23: Stations outgoing specificity

"Work→House commute", balance

FIG. 24: Stations expected balances with $N_{dep}$ = 10 000 (balance scale from -30 to 30)

"Evening" incoming specificity

FIG. 25: Stations incoming specificity

"Evening", outgoing specificity

FIG. 26: Stations outgoing specificity

"Evening", balance

FIG. 27: Stations expected balances with $N_{dep}$ = 10 000 (balance scale from -30 to 30)

"Spare time", incoming specificity

FIG. 28: Stations incoming specificity

"Spare time", outgoing specificity

FIG. 29: Stations outgoing specificity

"Spare time", balance

FIG. 30: Stations expected balances with $N_{dep}$ = 10 000 (balance scale from -30 to 30)

Conclusion on LDA for activities recognition

- Interpretable latent activities
- Give a good picture of the city's "pulse" and geography
- Better understanding of the system behaviour
- Strong evidence of cyclostationarity
- Week-day / week-end pattern

Thanks for your attention

@comeetie, [email protected]

Ifsttar
Centre de Marne-la-Vallée
Batiment le “Descartes 2”
2, rue de la Butte Verte
F-93166 Noisy le Grand cedex

Email: [email protected]
Tel.: +33 (0)1 45 92 56 57

Site : www.ifsttar.fr
