find my ride (wide)
TRANSCRIPT
FIND MY RIDE
▸ Bike shares are awesome
▸ Flexible and convenient
▸ But only if a bike is available when you need it
FIND MY RIDE
▸ Can get current status from Hubway website or 3rd party apps.
▸ Often bikes will be gone by the time you get to a station.
▸ Can we help people find the best station to go to?
FIND MY RIDE
Bike Data
▸ 2 years of station data
▸ 96 bike share stations
▸ Number of bikes available every minute at each station
Weather Data
▸ Hourly historical weather data from NOAA
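The slides do not say how the minute-level bike counts were combined with the hourly weather data. One plausible approach, sketched here, is to truncate each bike timestamp to the hour and look up the weather for that hour. The function name `attach_weather` and the tuple layouts are hypothetical, chosen only for illustration.

```python
from datetime import datetime

def attach_weather(bike_rows, hourly_weather):
    """Attach hourly weather to minute-level bike observations.

    bike_rows: list of (timestamp, station_id, bikes) tuples (assumed layout).
    hourly_weather: dict mapping an on-the-hour datetime to
        (precipitation, temperature) (assumed layout).
    """
    joined = []
    for when, station_id, bikes in bike_rows:
        # Truncate the minute-level timestamp to the start of its hour.
        hour = when.replace(minute=0, second=0, microsecond=0)
        weather = hourly_weather.get(hour)
        if weather is None:
            continue  # skip rows with no weather observation for that hour
        joined.append((when, station_id, bikes, *weather))
    return joined
```

Rows without a matching weather hour are dropped in this sketch; filling them from the nearest available hour would be an equally reasonable choice.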
[Figure: number of bikes (0 to 10) by time of day (12am, 8am, 5pm), weekday vs. weekend. Two panels: Residential Station and Business Station.]
FIND MY RIDE
INPUT: start location and time
▸ Google Maps API: calculate walk time to nearby station
▸ Forecast.io: get weather forecast
MODEL: random forest classifier at each station to predict # bikes
FEATURES: month, day of week, time of day (minute), precipitation, temperature, holidays
FORECAST: 0, 1-2, 2+ bike availability
OUTPUT: rank stations based on walking distance and # of bikes
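The pipeline above can be sketched in a few small functions. This is a minimal illustration, not the code behind the slides: the holiday set, the class boundaries in `bin_bikes`, and the ranking rule (more predicted bikes first, then shorter walk) are all assumptions, since the slides only name the features, the three forecast classes, and the two ranking criteria.

```python
from datetime import datetime

# Hypothetical holiday dates as (month, day); the slides do not name a calendar.
HOLIDAYS = {(1, 1), (7, 4), (12, 25)}

def make_features(when, precipitation, temperature):
    """Build one feature row in the order listed on the slide."""
    return [
        when.month,                                # month
        when.weekday(),                            # day of week, Monday = 0
        when.hour * 60 + when.minute,              # time of day as minute of day
        precipitation,
        temperature,
        int((when.month, when.day) in HOLIDAYS),   # holiday flag
    ]

def bin_bikes(count):
    """Map a raw bike count to a forecast class (assumed boundaries)."""
    if count == 0:
        return "0"
    if count <= 2:
        return "1-2"
    return "2+"

# Assumed preference order: higher predicted availability ranks first.
CLASS_RANK = {"2+": 0, "1-2": 1, "0": 2}

def rank_stations(candidates):
    """Sort (station_name, walk_minutes, predicted_class) tuples.

    Stations are ordered by predicted class, then by walking time.
    """
    return sorted(candidates, key=lambda c: (CLASS_RANK[c[2]], c[1]))
```

In a full system, the predicted class for each candidate station would come from that station's classifier, and the walking time from a routing service.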
FIND MY RIDE
[Figure: pie chart of the data split. Training 54.8%, Testing 22.6%, Validation 22.6%.]
▸ Randomly selected days to train, test, and validate on
▸ Tuned random forest classifier on 55% of the data (17/31 days per month) with different hyperparameters
▸ Selected best model based on score of predicting test data (7/31 days)
▸ Final results based on validation data (7/31 days)
▸ Scored the model on recall for the 0 bikes class
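The split and the scoring metric can be sketched as follows. Note that 17/31 is about 54.8% and 7/31 is about 22.6%, matching the pie chart. The function names and the fixed seed are hypothetical, and the slides do not say how months shorter than 31 days were handled; in this sketch the last group simply gets fewer days.

```python
import random

def split_days(days_in_month, seed=0):
    """Randomly assign the days of one month to train, test, and validate.

    Follows the 17 / 7 / 7 day counts from the slide for a 31-day month.
    """
    rng = random.Random(seed)
    days = list(range(1, days_in_month + 1))
    rng.shuffle(days)
    train, test, validate = days[:17], days[17:24], days[24:]
    return sorted(train), sorted(test), sorted(validate)

def recall(actual, predicted, label):
    """Fraction of samples whose actual class is `label` that were predicted as `label`."""
    relevant = [p for a, p in zip(actual, predicted) if a == label]
    if not relevant:
        return 0.0  # no samples of this class, so recall is undefined; report 0.0
    return sum(p == label for p in relevant) / len(relevant)
```

Splitting by whole days rather than by individual minutes keeps highly correlated neighboring minutes from landing in different sets. Recall on the 0 bikes class measures how often an empty station is correctly flagged as empty.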
Confusion matrix (rows: Predicted, columns: Actual; color scale 0.0 to 1.0)
            0 Bikes   1-2 Bikes   2+ Bikes
0 Bikes     0.83      0.05        0.12
1-2 Bikes   0.11      0.30        0.59
2+ Bikes    0.05      0.11        0.84
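Each row of the values above sums to 1.0, which suggests the counts were normalized per row. A minimal sketch of that normalization follows; the function name and argument names are hypothetical, and the function is deliberately neutral about which axis holds actual and which holds predicted labels.

```python
def normalized_confusion(row_values, col_values, labels):
    """Build a confusion matrix and normalize each row to sum to 1.0.

    row_values and col_values are parallel sequences of class labels;
    the first selects the row of each sample, the second the column.
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for r, c in zip(row_values, col_values):
        counts[index[r]][index[c]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Leave all-zero rows as zeros instead of dividing by zero.
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix
```

When the rows are indexed by the actual class, the diagonal of the normalized matrix is the per-class recall.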
FIND MY RIDE
[Figure: confusion matrix for exact bike counts, Predicted (0 to 15) vs. Actual (0 to 15); color scale 0.0 to 1.0.]