mta analysis · 2013. 3. 7. · analysis for union square line4 figure:plot of the time intervals...


MTA ANALYSIS


HOW MANY SUBWAY STATIONS ARE IN NEW YORK CITY?

493



HOW ARE THESE DISTRIBUTED ACROSS THE FIVE BOROUGHS?



HOW DOES IT APPEAR AFTER NORMALIZING WITH RESPECT TO SURFACE AREA?



ANY BETTER WITH POPULATION FACTORED IN INSTEAD?



MTA DATA

Subway: StationEntrances

Main: calendar, calendar-dates, routes, stops, stop-times, transfers, trips

Turnstile: turnstile, RBS


Extracting details from data files

IMAGINE YOU WANTED TO FIND OUT THE ARRIVAL TIMES OF LINE B AT COLUMBUS CIRCLE.

YOU FOLLOW THESE SIMPLE STEPS:

1. GO TO THE FILE stops AND LOOK FOR THE stop-id OF COLUMBUS CIRCLE.
2. GO TO THE FILE trips AND LOOK UNDER COLUMN route-id FOR B.
3. EACH ROW WITH B CONTAINS A UNIQUE trip-id.
4. GO TO FILE stop-times AND LOCATE THE trip-id FROM 3. EACH MATCHING ROW CONTAINS A stop-id AND A time.
5. LOCATE THE stop-id FROM 1 AND THE CORRESPONDING time.
6. REPEAT 3-5 FOR EVERY ROW WITH B.
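The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses the underscore column names of the standard GTFS feed (stop_id, stop_name, route_id, trip_id, arrival_time) rather than the hyphenated names on the slide, and the sample rows and the function name arrival_times are invented for the example.

```python
import csv
import io

# Tiny invented sample files; real feeds are far larger.
STOPS_CSV = """stop_id,stop_name
S1,Columbus Circle
S2,Another Stop
"""
TRIPS_CSV = """route_id,service_id,trip_id
B,WKD,trip_b_1
B,WKD,trip_b_2
D,WKD,trip_d_1
"""
STOP_TIMES_CSV = """trip_id,arrival_time,stop_id
trip_b_1,08:05:00,S1
trip_b_1,08:09:00,S2
trip_b_2,08:15:00,S1
trip_d_1,08:10:00,S1
"""


def arrival_times(stops_f, trips_f, stop_times_f, stop_name, route_id):
    """Return the sorted arrival times of one route at the named stop."""
    # Step 1: stop ids whose name contains the requested station name.
    stop_ids = {row["stop_id"] for row in csv.DictReader(stops_f)
                if stop_name.lower() in row["stop_name"].lower()}
    # Steps 2-3: every trip id that belongs to the route.
    trip_ids = {row["trip_id"] for row in csv.DictReader(trips_f)
                if row["route_id"] == route_id}
    # Steps 4-6: keep stop-times rows matching both a trip and a stop.
    times = [row["arrival_time"] for row in csv.DictReader(stop_times_f)
             if row["trip_id"] in trip_ids and row["stop_id"] in stop_ids]
    return sorted(times)
```

Building the two id sets first turns step 6 (the repetition over every B row) into a single pass over stop-times instead of one search per trip.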


ANALYSIS FOR UNION SQUARE LINE 4

Figure: Plot of the time intervals for arrivals of line 4 at Union Square on weekdays and weekends.

- Observe the oscillations in the time interval during the daytime on weekends.

- Observe also that the longest interval over the cycle is about 20 minutes and does not depend on whether it is a weekday or a weekend.
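The intervals plotted here are the gaps between consecutive arrivals. A minimal sketch of that computation, assuming the arrival times are "HH:MM:SS" strings as in a stop-times file (the function names are illustrative, not the author's):

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def headways_minutes(arrival_times):
    """Return the gaps, in minutes, between consecutive sorted arrivals."""
    secs = sorted(to_seconds(t) for t in arrival_times)
    return [(later - earlier) / 60 for earlier, later in zip(secs, secs[1:])]
```

Taking max() of the returned list gives the longest interval of the cycle. Splitting the input by weekday and weekend service beforehand would reproduce the two curves being compared.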


ANALYSIS: Turnstile Data At Station Stops.

WE WANT TO KNOW HOW THE USAGE OF THE SUBWAY VARIES OVER THE DAY

Figure: 28th Street - Line 1 (weekdays). Observe the remarkable consistency in the pattern across all weekdays.



Figure: 28th Street - Line 1 (weekends)


Figure: 34th Street Penn Station (weekdays). Again, notice how nearly identical the variation over 24 hours is for all weekdays.


Figure: 34th Street Penn Station (weekends).


Neighborhoods Geographic Data

- I WANTED TO UNDERSTAND THE DISTRIBUTION OF STOPS ACROSS NEIGHBORHOODS.

- I OBTAINED GIS DATA FOR THE NEIGHBORHOODS OF NYC FROM nycopendata.

- EACH NEIGHBORHOOD IS REPRESENTED BY A SET OF POINTS THAT DESCRIBES ITS BOUNDARY POLYGON.

Figure: DUMBO Polygon


Point in Polygon

- TO DETERMINE IF A STATION IS WITHIN A NEIGHBORHOOD, I WROTE A SMALL CODE TO FIND OUT IF A POINT LIES WITHIN THE BOUNDARY OF A POLYGON.

- THE MAIN IDEA IS TO COUNT THE CHANGES IN THE QUADRANT OF THE VECTOR FROM THE POINT IN QUESTION TO A VERTEX OF THE POLYGON AS WE GO AROUND IT.

- IF THE TOTAL CHANGE IS +4 OR -4 (THE SIGN DETERMINES THE DIRECTION) THEN THE POINT IS INSIDE THE POLYGON.


from random import random as rand

def quad_det(dx, dy):
    # A simple function that determines which quadrant (0-3, counterclockwise)
    # the vector lies in. Each axis belongs to exactly one quadrant.
    # The slide calls this helper without showing it; this body is an assumption.
    if dx > 0 and dy >= 0: return 0
    if dx <= 0 and dy > 0: return 1
    if dx < 0 and dy <= 0: return 2
    return 3

def PiP(pol, x, y):
    ring = list(pol)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # Close the ring so that we go all the way around.
    q_count, quad_p = 0, -1
    for ps in ring:
        dx = ps[0] - x
        dy = ps[1] - y
        if dx == 0 and dy == 0:
            return True  # The point coincides with a vertex.
        quad = quad_det(dx, dy)
        if quad_p >= 0:
            if (quad - quad_p) % 4 == 1:
                # Quadrant has shifted by +1 (counterclockwise)
                q_count += 1
            elif (quad - quad_p) % 4 == 3:
                # Quadrant has shifted by -1 (clockwise shift of 1)
                q_count -= 1
            elif (quad - quad_p) % 4 == 2:
                # Here we have to determine if the quadrant changed
                # in a clockwise or anti-clockwise direction.
                if dx_p * dy - dy_p * dx == 0:
                    return True  # The point lies on this edge; the probe below would never finish.
                det = False
                while not det:  # Choose a point on the line joining the two vectors
                    r = rand()
                    p_x = dx_p + r * (dx - dx_p)
                    p_y = dy_p + r * (dy - dy_p)
                    quad_m = quad_det(p_x, p_y)
                    if quad_m != quad and quad_m != quad_p:
                        if (quad_m - quad_p) % 4 == 1:
                            q_count += 2
                        else:
                            q_count -= 2
                        det = True
        dx_p, dy_p, quad_p = dx, dy, quad
    return abs(q_count) == 4
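As a cross-check on the quadrant-counting routine, a different and widely used test is ray casting (the even-odd rule): shoot a horizontal ray from the point and count how many polygon edges it crosses. The sketch below is not the author's code. The function names are illustrative, and it also shows how per-neighborhood stop counts could be tallied once a point-in-polygon test is available.

```python
from collections import Counter


def point_in_polygon(polygon, x, y):
    """Even-odd ray casting: True if (x, y) is inside the polygon.

    polygon is a list of (x, y) vertices; it may or may not repeat the
    first vertex at the end.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def stops_per_neighborhood(neighborhoods, stops):
    """Count stops falling inside each named polygon.

    neighborhoods maps a name to a vertex list; stops is a list of (x, y).
    """
    counts = Counter()
    for sx, sy in stops:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(polygon, sx, sy):
                counts[name] += 1
                break  # Each stop is attributed to at most one polygon.
    return counts
```

Points lying exactly on a boundary are a known grey area for ray casting, so a stop sitting on a shared neighborhood border may be attributed to either side.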


Figure: Heat Map for distribution of Stops across neighborhoods


TOP 10 NEIGHBORHOODS RANKED BY STOPS:

22 Midtown - Midtown South
17 SoHo-Tribeca-CivcCentr-LittleItaly
13 BatteryParkCity-LowerManhattan
12 HudsnYds-Chelsea-Flatirn-UnionSq
11 HuntersPt-Sunnyside-WstMaspeth
10 East New York (part A)
10 West Village
9 Bensonhurst West
9 Park Slope - Gowanus


Figure: Heat Map for distribution of stops across neighborhoods adjusted to area


TOP 10 NEIGHBORHOODS RANKED BY STOPS (ADJUSTED TO AREA)

13 BatteryParkCity-LowerManhattan
22 Midtown - Midtown South
17 SoHo-Tribeca-CivcCentr-LittleItaly
10 West Village
6 Fort Greene
12 HudsnYds-Chelsea-Flatirn-UnionSq
7 UpperEastSide - CarnegieHill
5 Central Harlem South
5 Chinatown
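An area-adjusted ranking like this one divides each neighborhood's stop count by the area of its boundary polygon. The slides do not say how the areas were obtained, so the sketch below is an assumption: it uses the shoelace formula on the polygon vertices, with illustrative function names and made-up inputs.

```python
def polygon_area(polygon):
    """Shoelace formula: area enclosed by a list of (x, y) vertices."""
    n = len(polygon)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def rank_by_density(stop_counts, polygons):
    """Return (name, stops per unit area) pairs, densest first.

    stop_counts maps a name to its number of stops; polygons maps the
    same names to vertex lists.
    """
    densities = [(name, stop_counts[name] / polygon_area(polygons[name]))
                 for name in stop_counts]
    return sorted(densities, key=lambda item: item[1], reverse=True)
```

With raw longitude and latitude vertices the result is in squared degrees. That is adequate for ranking neighborhoods within one city, but a projected coordinate system would be needed for areas in physical units.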


The Bigger Picture For Stops Distribution


Neighborhood stops and income inequality:

Figure: Compare this with the distribution for income inequality in New York City


Stop Clusters

k-MEANS ALGORITHM:

STEP 1: CHOOSE k CENTERS RANDOMLY ON THE 2D SURFACE
STEP 2: ASSIGN EVERY STOP TO ONE OF THE CENTERS BASED ON CRITERION OF MINIMUM DISTANCE
STEP 3: RELOCATE EVERY CENTER TO THE CENTROID OF THE STOPS ASSIGNED TO IT.
STEP 4: REPEAT 2-3 UNTIL CONVERGENCE


A short code for k-means algorithm

import random

def kmeans(lons_x, lats_y, k):
    # Step 1: k random centers inside the bounding box of the data.
    x = [random.uniform(min(lons_x), max(lons_x)) for _ in range(k)]
    y = [random.uniform(min(lats_y), max(lats_y)) for _ in range(k)]
    stop_clus = [None] * len(lons_x)    # Cluster index of each stop
    clus_stop = [[] for _ in range(k)]  # Stop indices of each cluster
    ct, conv = 0, False
    while not conv:  # Loop until convergence
        ct += 1
        conv = True  # Flag for convergence
        T_dst2 = 0   # Sum of squared distances, recomputed on every pass
        for j, pos in enumerate(zip(lons_x, lats_y)):  # Loop over each data point
            # A list with squared distances to the current cluster centroids.
            d_clus = [(pos[0] - x1) ** 2 + (pos[1] - y1) ** 2 for x1, y1 in zip(x, y)]
            T_dst2 += min(d_clus)
            re_clus = d_clus.index(min(d_clus))  # Index of the minimum-distance cluster
            if ct < 2:  # First pass: initial assignment
                stop_clus[j] = re_clus
                clus_stop[re_clus].append(j)
                conv = False  # The centroids have not been relocated yet.
            elif re_clus != stop_clus[j]:  # Re-assignment of nearest cluster.
                clus_stop[stop_clus[j]].remove(j)
                clus_stop[re_clus].append(j)
                stop_clus[j] = re_clus
                conv = False  # The minimal assignment has not been reached yet.
        for j in range(k):  # Loop over the cluster centroids
            d_x = [lons_x[i] for i in clus_stop[j]]
            d_y = [lats_y[i] for i in clus_stop[j]]
            if len(d_x) == 0:
                continue  # An empty cluster keeps its previous center.
            x[j] = sum(d_x) / len(d_x)  # Setting the new locations.
            y[j] = sum(d_y) / len(d_y)
    return x, y, stop_clus, T_dst2


Representation of k-means clustering

Figure: Clusters = 5. The centroids are marked by a black star and the figure next to each is the number of stops attached to it.


Figure: Clusters = 6. The centroids are marked by a black star and the figure next to each is the number of stops attached to it. Notice that one of the cluster centroids is at the edge of Bay Ridge (Brooklyn).


Figure: Clusters = 7. The centroids are marked by a black star and the figure next to each is the number of stops attached to it. Notice that Staten Island and Far Rockaway are now independent clusters.


Significance of Clusters

- HOW DO WE DECIDE WHETHER THE NUMBER OF CLUSTERS IS OPTIMUM?

- IT IS A MATTER OF INDIVIDUAL DISCRETION. MY APPROACH HAS BEEN TO CONSIDER THE VARIABLE THIS ALGORITHM MINIMIZES, J: THE SUM OF SQUARES OF THE DISTANCES OF ALL POINTS TO THEIR ASSIGNED CENTERS.

- IF J DECREASES CONSIDERABLY WITH THE ADDITION OF AN EXTRA CLUSTER, THEN I TAKE IT THAT MORE CLUSTERS ARE NEEDED TO REPRESENT THE DATA CORRECTLY.

- WHEN J DOES NOT VARY MUCH ON ADDITION OF CLUSTERS, I STOP. (Of course, it will decrease monotonically until we reach as many clusters as there are points, when it is zero!)
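The stopping rule above amounts to tabulating J against k and looking for the point where the drop levels off. A minimal sketch follows, with two assumptions that differ from the slides: the centers are initialised deterministically from evenly spaced data points (instead of randomly) so that the run is repeatable, and all function names are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_j(points, k, max_iter=100):
    """Run k-means and return J, the sum of squared distances to assigned centers."""
    n = len(points)
    # Assumption: deterministic start from evenly spaced data points.
    centers = [points[i * n // k] for i in range(k)]
    labels = [None] * n
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # Assignments are stable: converged.
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # An empty cluster keeps its previous center.
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, ks):
    """Return {k: J} so the drop in J per extra cluster can be inspected."""
    return {k: kmeans_j(points, k) for k in ks}
```

For three well-separated groups of points, J falls steeply up to k = 3 and only marginally afterwards, which is the signal to stop adding clusters.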


PROBLEMS WITH THE DATA:

(A) INCONSISTENCIES: (E.g.: 34TH ST PENN STATION IS LISTED 3 TIMES IN THE RBS FILE; PATH TRAIN STOPS ARE INCLUDED IN ONE SET AND NOT ANOTHER)

(B) UNRELIABLE DATA: E.g.: IN FILE turnstile-data, THERE ARE SEVERAL RECORDS UNDER THE CATEGORY OF 'DOOR OPEN', WHICH IS WHEN COMMUTERS CAN SKIP THE TURNSTILE ALTOGETHER.

ALSO, SOMETIMES THE TURNSTILES ARE SUDDENLY RESET (AT RANDOM TIMES!).

(C) SPECIAL AWARD FOR WORST DATA EVER: NEW YORK CITY SHAPEFILES: THIS IS GROTESQUE!
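For the turnstile resets mentioned under (B), one possible guard is sketched below. It assumes that each turnstile reports a running (cumulative) counter, so usage in an interval is the difference between consecutive readings, and that a negative difference signals a reset. The function name and the choice to mark such intervals as None are illustrative, not the author's method.

```python
def interval_counts(readings):
    """Turn cumulative counter readings into per-interval counts.

    A drop between two readings is treated as a counter reset and the
    interval is reported as None instead of a negative count.
    """
    counts = []
    for previous, current in zip(readings, readings[1:]):
        delta = current - previous
        counts.append(delta if delta >= 0 else None)
    return counts
```

Marking the interval rather than guessing a value keeps the gap visible, so it can be excluded when averaging usage across days.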


THANK YOU