Predicting the Availability of Parking Spaces in Ljubljana
DESCRIPTION
Presentation for my Jožef Stefan International Postgraduate School data mining course assignment. Predicts the availability of parking spaces in Ljubljana car parks at 30 min, 1 h, 2 h and 3 h intervals.
TRANSCRIPT
Knowledge Discovery And Data Mining
Predicting The Availability of Parking Spaces in Ljubljana
Luis Rei
http://luisrei.com
Report, slides and code available online.
Parking Spaces
• City of Ljubljana (http://www.lpt.si)
• Available via http://opendata.si/
• 11 Car Parks
  • Park Name (and id)
  • Number of Free Spaces
  • Total Spaces Available
  • Price
  • Coordinates
  • Timestamp
• Updated Every 5min
• From 2011-09-12 to 2013-11-18
• Test starts: 2013-08-19
The Parks
Park               Total Spaces*
PH Kozolec         248
Tivoli I           360
Mirje              110
Trg MDB            40
Gospodarsko raz.   550
Bežigrad           62
Trg preko. brigad  98
Kranjčeva          118
Žale II            80
Petkovskovo II     85
PH Kongresni trg   720
Buyer Beware: Cleanup
• Missing data
  • Collection failed: entire months, weeks, days missing
    • All parks
  • Sensor/communication failed: missing entries
    • Some parks
• Invalid data
  • Negative free spaces
  • (A lot) more free spaces than the total
  • Null values
• Strategies
  • Interpolating
  • Replacing with the mean (window variables)
  • Removing (target variable)
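A minimal sketch of the cleanup rules listed above, using only the Python standard library. The function names, the sample readings, and the choice to take the mean over the valid readings of the same sequence are assumptions made for illustration; the slides do not say which mean was used or how much excess over the total was tolerated.

```python
from statistics import mean


def mark_invalid(free, total):
    """Return the reading, or None when it is missing or impossible.

    Assumption: any value above the park's capacity counts as invalid.
    """
    if free is None or free < 0 or free > total:
        return None
    return free


def fill_with_mean(values):
    """Replace None entries with the mean of the valid entries."""
    valid = [v for v in values if v is not None]
    if not valid:
        return list(values)
    fill = mean(valid)
    return [fill if v is None else v for v in values]


def drop_missing_targets(rows):
    """Keep only (window, target) rows whose target is known."""
    return [(window, target) for window, target in rows if target is not None]


TOTAL = 248  # capacity of PH Kozolec, taken from the parks table
raw = [120, -3, 130, None, 900, 110]  # made-up readings for illustration

cleaned = [mark_invalid(v, TOTAL) for v in raw]
filled = fill_with_mean(cleaned)
rows = drop_missing_targets([([120, 130], 110), ([130, 110], None)])
```

Here the negative reading, the null, and the reading above capacity all become missing values, which are then filled for use as window variables. Rows with a missing target are dropped instead of filled, matching the "Removing (target variable)" strategy.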
Time Series Resampling
2011-01-01 00:00:00 1
2011-01-01 00:45:00 2
2011-01-02 01:30:00 2
2011-01-02 02:15:00 4
2011-01-03 03:00:00 3
2011-01-03 06:00:00 1
Interval: Daily
How: Mean
2011-01-01 1.5
2011-01-02 3.0
2011-01-03 2.0
How: Min
2011-01-01 1
2011-01-02 2
2011-01-03 1
How: Last
2011-01-01 2
2011-01-02 4
2011-01-03 1
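The resampling example on this slide can be reproduced with a short standard-library sketch. The helper name `resample_daily` is hypothetical; in pandas the same operation would typically be written with `Series.resample`. "Last" assumes the readings arrive in chronological order.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

# The six readings shown on the slide.
readings = [
    ("2011-01-01 00:00:00", 1),
    ("2011-01-01 00:45:00", 2),
    ("2011-01-02 01:30:00", 2),
    ("2011-01-02 02:15:00", 4),
    ("2011-01-03 03:00:00", 3),
    ("2011-01-03 06:00:00", 1),
]


def resample_daily(readings, how):
    """Group readings by calendar day and reduce each group with `how`."""
    buckets = defaultdict(list)
    for stamp, value in readings:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
        buckets[day].append(value)
    return {day.isoformat(): how(values) for day, values in sorted(buckets.items())}


daily_mean = resample_daily(readings, fmean)
daily_min = resample_daily(readings, min)
# Last value of each day; relies on readings being sorted by time.
daily_last = resample_daily(readings, lambda values: values[-1])
```

The three results match the Mean, Min, and Last columns on the slide. The prediction task itself uses "Last", since the question is how many spaces are free at the end of each interval.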
Question (Goal)
At the end of the next time period, how many free spaces will be available in this park?
How: Last
Intervals: {30, 60, 120, 180} min
Sliding Windows
              w-2           w-1           w                Target
              past state    past state    current state    future state
Interval t-2  170           180           190              200
Interval t-1  180           190           200              210
Interval t    190           200           210              220
window size = 4
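As a rough sketch, the sliding-window table can be generated from a resampled series as follows. The function name is hypothetical, and it assumes that "window size = 4" counts the three state columns plus the target, which is how the table reads.

```python
def sliding_windows(series, window_size=4):
    """Split a series into (features, target) rows.

    Each row covers `window_size` consecutive values: the first
    `window_size - 1` are the past/current states, the last is the target.
    """
    rows = []
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        rows.append((window[:-1], window[-1]))
    return rows


# Free-space counts from the slide, one value per interval.
free_spaces = [170, 180, 190, 200, 210, 220]
rows = sliding_windows(free_spaces)
```

Each element of `rows` corresponds to one line of the table: the features are the w-2, w-1 and w columns, and the target is the value one interval later.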
Baselines & Models
• Baselines
  • Mean
  • Previous Value
• Models
  • Linear Regression
  • Regression Tree
  • Random Forest
• Bonus Models
  • Global Random Forest
  • Incremental Linear Regression
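The two baselines are simple enough to sketch directly. The function names are hypothetical, and the code assumes the Mean baseline uses the mean of the training values for the park; the slides do not specify which mean was taken.

```python
from statistics import fmean


def mean_baseline(train_values, n_predictions):
    """Predict the training mean for every future interval."""
    average = fmean(train_values)
    return [average] * n_predictions


def previous_value_baseline(current_values):
    """Predict that the next interval will equal the current one."""
    return list(current_values)


train = [170, 180, 190, 200]   # illustrative training values
current = [190, 200, 210]      # current state (column w) of each test row

mean_predictions = mean_baseline(train, len(current))
previous_predictions = previous_value_baseline(current)
```

Neither baseline looks at the shape of the window, so any learned model should at least beat the Previous Value baseline to justify its cost.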
Results
Average Root Mean Squared Error
Method             30Min  60Min  120Min  180Min
Mean               41.2   41.4   41.6    41.3
Previous Value     10.1   16.3   26.6    33.9
Linear Regression  3.5    4.2    4.8     4.7
Regression Tree    0.5    0.8    0.4     0.5
Random Forest      0.4    0.5    0.6     0.5
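For reference, a minimal sketch of the metric in the table. It assumes "average" means the unweighted mean of the per-park RMSE values, which the slides do not state explicitly; the park names and numbers in the example are made up.

```python
from math import sqrt
from statistics import fmean


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def average_rmse(per_park):
    """Unweighted mean of per-park RMSE; `per_park` maps name -> (actual, predicted)."""
    return fmean(rmse(actual, predicted) for actual, predicted in per_park.values())


# Previous Value baseline on a steadily rising series: always 10 spaces off.
example = rmse([200, 210, 220], [190, 200, 210])

parks = {
    "park A": ([200, 210, 220], [190, 200, 210]),
    "park B": ([50, 60], [50, 60]),
}
overall = average_rmse(parks)
```

Because errors are squared before averaging, a handful of large misses can dominate the score, which matters for the missing-value discussion later in the deck.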
One Week At the Car Park & The trouble with missing values
[Figure: PH Kongresni trg, resampled 120 min intervals]
The Effect of Missing Values
The Sliding Window Revisited
              w-2           w-1           w                               Target
              past state    past state    current state                   future state
Interval t-2  170           160           140                             100
Interval t-1  160           140           100                             Missing (not predicted)
Interval t    140           100           Missing (replaced with mean)    ?? = 10
E.g. Mean = 150, very different from the missing value (e.g. 60)
Missing Values
Percentage of test set: 0.7%
Percentage of error (RMSE): 71%
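To see how so few rows can account for most of the error, here is an illustrative calculation. The error values are entirely hypothetical, and it assumes one possible reading of "percentage of error": the share of the total squared error contributed by the rows affected by missing values.

```python
def squared_error_share(errors, flagged):
    """Fraction of the total squared error that comes from flagged rows."""
    total = sum(e * e for e in errors)
    part = sum(e * e for e, is_flagged in zip(errors, flagged) if is_flagged)
    return part / total


# Hypothetical test set: 993 ordinary rows that are off by 1 space,
# and 7 rows (0.7%) with a mean-imputed window that are off by 20 spaces.
errors = [1.0] * 993 + [20.0] * 7
flagged = [False] * 993 + [True] * 7

fraction_of_rows = sum(flagged) / len(flagged)
share = squared_error_share(errors, flagged)
```

With these made-up numbers, 0.7% of the rows carry roughly three quarters of the squared error. The actual figures on the slide come from the author's test set, not from this sketch.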
RF Average RMSE vs Window Size
Note: For window_size = 1, RMSE = 21 (not represented for the sake of clarity)
Future Work
• Better handling of missing values
  • Time-based interpolation of some of the missing data within a limited max time interval
  • Use model to predict the missing data
• Crawl more data
  • Test with a full year
• Evaluate “classical” autoregressive models
  • with smoothing
• Predict further into the future
• Additional data: weather, holidays, soccer, social…
• Get the average error down to zero, keep maximum error small
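The first future-work item, time-based interpolation limited to short gaps, could look roughly like the sketch below. It is not the author's implementation: the function name, the linear interpolation, and the 30-minute limit are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def interpolate_gaps(samples, max_gap):
    """Linearly interpolate missing values in time, only across short gaps.

    `samples` is a time-sorted list of (datetime, value or None).
    Gaps whose known neighbors are more than `max_gap` apart stay missing.
    """
    filled = list(samples)
    known = [i for i, (_, value) in enumerate(samples) if value is not None]
    for left, right in zip(known, known[1:]):
        if right - left < 2:
            continue  # nothing missing between these two readings
        t0, v0 = samples[left]
        t1, v1 = samples[right]
        if t1 - t0 > max_gap:
            continue  # gap too long to interpolate safely
        span = (t1 - t0).total_seconds()
        for i in range(left + 1, right):
            t = samples[i][0]
            fraction = (t - t0).total_seconds() / span
            filled[i] = (t, v0 + fraction * (v1 - v0))
    return filled


start = datetime(2013, 8, 19, 8, 0)
samples = [
    (start, 100),
    (start + timedelta(minutes=5), None),
    (start + timedelta(minutes=10), None),
    (start + timedelta(minutes=15), 70),
    (start + timedelta(minutes=20), None),   # start of a long outage
    (start + timedelta(minutes=120), 40),
]
result = interpolate_gaps(samples, max_gap=timedelta(minutes=30))
```

The two readings missing inside the 15-minute gap are filled along the line from 100 to 70, while the entry inside the 105-minute outage is left as missing so it can be handled by another strategy.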