Talk presented at GIS in Action conference, Portland, OR, April 2013
Spread and Page Rank for Interactive Maps
Wm Leler - FlightStats
openstreetmap.org
zoom level 6
openstreetmap.org
Missing: San Francisco, San Jose, Los Angeles, Las Vegas, Phoenix, Seattle, Vancouver, Detroit, Dallas, NYC, Miami
Stamen Terrain
Beaverton? Hillsboro? Forest Grove? Tigard?
Google Maps
Seattle? Denver? Salt Lake City? Las Vegas?
MapQuest
56
where did they go?
The Big Problem
• A map is a spatial display of a bunch of objects: cities, highways, parks, airports, etc.
• At most zoom levels, there are far too many objects to display.
• What is the best way to pick which objects to display (per zoom level) on a map?
Our Immediate Problem
• At FlightStats we need to decide which airports to draw on a map
Every airport that supplies us with flight data (4180)
FAA Categories
Based on % of passenger-enplanements
1. Primary large hub (>1% of p-e)
2. Primary medium hub (0.25 - 1%)
3. Primary small hub (0.05 - 0.25%)
4. Primary non-hub (<0.05%)
5. Secondary (< 10,000 p-e / year)
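The category thresholds above can be written as a small lookup. This is a minimal sketch: the function name `faa_category` and its parameters are hypothetical, and the handling of values that fall exactly on a boundary is an assumption, since the slide does not specify it.

```python
def faa_category(share_percent, enplanements_per_year):
    """Classify an airport using the thresholds listed on the slide.

    share_percent: the airport's share of passenger enplanements, in percent.
    enplanements_per_year: absolute passenger enplanements per year.
    Boundary handling (>= vs >) is an assumption, not from the talk.
    """
    if enplanements_per_year < 10_000:
        return "Secondary"
    if share_percent > 1.0:
        return "Primary large hub"
    if share_percent >= 0.25:
        return "Primary medium hub"
    if share_percent >= 0.05:
        return "Primary small hub"
    return "Primary non-hub"


# Made-up inputs, for illustration only.
examples = [
    faa_category(2.5, 40_000_000),
    faa_category(0.5, 8_000_000),
    faa_category(0.1, 1_500_000),
    faa_category(0.01, 150_000),
    faa_category(0.0005, 4_000),
]
```

Note that the result depends only on passenger volume, not on where the airport is or what it connects to.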
Bad Solution
• Not so good for maps
• Airports bunch together and leave big empty spaces
Bunching
Empty Spaces
Bunching
Delay Map
Even Worse Internationally
• South America has only 3 large primary airports
• Two serve the same city (Rio)
• Africa has one (JNB)
More Problems
• Should be dependent on zoom level
• as you zoom in, you want to see more
• Number of passengers is a bad measure
• short commuter flights to little airports should count less
• flights to other major airports should count more
Goals
• Show “important” airports
• major airports
• plus less major airports if they are the primary airport for an area
• Avoid airports “bunching up”
Solution
Importance of an airport is based on:
1. How connected it is to other airports, weighted by the importance of each of those airports (recursive)
2. The area for which it is the primary airport
3. The current map view
Connectivity
• Calculate connectivity using the PageRank algorithm (by Larry Page at Google, adapted by Steve Wilson at FlightStats)
[Figure: airports SEA, PDX, EUG, PDT, RDM]
PR(PDX) = sum over each airport x of: flights(PDX, x) * PR(x)
[Figure: the same airports annotated with PageRank values 0.0025, 0.0036, 0.00014, 0.0002, 0.0003]
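The recursive definition on the Connectivity slide can be approximated by repeated iteration. The sketch below is not the FlightStats adaptation, which the slides do not detail. It is a standard weighted PageRank with two assumptions added: a damping factor of 0.85, and each airport's contribution divided by its total outgoing flights so that the ranks sum to 1. The function name `flight_rank` and the flight counts are hypothetical.

```python
def flight_rank(flights, damping=0.85, iterations=50):
    """Weighted PageRank over a flight graph (illustrative sketch).

    flights maps (origin, destination) -> number of flights.
    The damping factor and per-origin normalization are standard PageRank
    features assumed here; they are not taken from the talk.
    """
    airports = sorted({a for pair in flights for a in pair})
    n = len(airports)

    # Total outgoing flights per airport, used to normalize contributions.
    out_total = {a: 0.0 for a in airports}
    for (src, _dst), count in flights.items():
        out_total[src] += count

    rank = {a: 1.0 / n for a in airports}
    for _ in range(iterations):
        new_rank = {a: (1.0 - damping) / n for a in airports}
        # Rank held by airports with no outgoing flights is shared evenly.
        dangling = sum(rank[a] for a in airports if out_total[a] == 0)
        for a in airports:
            new_rank[a] += damping * dangling / n
        # Each airport passes rank to its destinations, weighted by flights.
        for (src, dst), count in flights.items():
            new_rank[dst] += damping * rank[src] * count / out_total[src]
        rank = new_rank
    return rank


# Made-up flight counts for the five airports shown on the slide.
flights = {
    ("PDX", "SEA"): 30, ("SEA", "PDX"): 30,
    ("PDX", "EUG"): 8, ("EUG", "PDX"): 8,
    ("PDX", "RDM"): 6, ("RDM", "PDX"): 6,
    ("PDX", "PDT"): 3, ("PDT", "PDX"): 3,
    ("SEA", "EUG"): 4, ("EUG", "SEA"): 4,
}
ranks = flight_rank(flights)
for code in sorted(ranks, key=ranks.get, reverse=True):
    print(code, round(ranks[code], 4))
```

In this toy graph PDX comes out on top only because the five airports are treated as the whole network. With the full route network, flights to other well-connected airports would raise an airport's rank more than the same number of flights to small ones.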
Spread
• (Page) Rank still suffers from bunching and empty spaces
• Add Spread – the distance to the closest airport of higher rank
• a reasonable proxy for the area
• The Spread for PDX is 111.98 nautical miles (the distance to SEA)
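A minimal sketch of the Spread calculation, assuming great-circle (haversine) distance and treating the top-ranked airport as having infinite Spread, since no airport outranks it. The talk does not say how either case is handled. The coordinates are approximate, the rank values are illustrative, and the names `distance_nm` and `compute_spread` are hypothetical.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(a, b):
    """Great-circle distance in nautical miles between (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def compute_spread(airports):
    """airports maps code -> (rank, (lat, lon)); returns code -> Spread in nm.

    Spread is the distance to the closest airport of higher rank.
    The top-ranked airport gets math.inf (an assumption).
    """
    spread = {}
    for code, (rank, pos) in airports.items():
        higher = [distance_nm(pos, p) for r, p in airports.values() if r > rank]
        spread[code] = min(higher) if higher else math.inf
    return spread


# Approximate coordinates; rank values are illustrative only.
airports = {
    "SEA": (0.0036, (47.4489, -122.3093)),
    "PDX": (0.0025, (45.5887, -122.5975)),
    "EUG": (0.0003, (44.1246, -123.2119)),
}
spread = compute_spread(airports)
print(round(spread["PDX"], 1))  # roughly 112 nm with these coordinates
```

This brute-force version compares every pair of airports, which is acceptable for a few thousand airports because the values are computed once and stored.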
Combining Rank and Spread
Airport  Rank    Spread (nm)  Rank × Spread
SFO      0.0040  293.5        1.19
OAK      0.0015  10.26        0.0158
SJC      0.0012  24.8         0.0287
SMF      0.0011  65.6         0.0702
The four biggest airports in Northern CA
Algorithm
1. Pre-calculate Rank and Spread for all airports (they don’t change very often)
2. For each map view, display the N airports with the largest product of Rank and Spread
3. (optionally) set minimum Rank and Spread
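The steps above can be sketched as a filter followed by a sort. The Rank and Spread values are the Northern California figures from the talk. The coordinates are approximate, and the `Airport` type, the `airports_to_display` function, and the bounding-box form of the map view are assumptions for illustration. The box test ignores views that cross the antimeridian.

```python
from typing import NamedTuple


class Airport(NamedTuple):
    code: str
    lat: float
    lon: float
    rank: float    # pre-calculated
    spread: float  # pre-calculated, nautical miles


def airports_to_display(airports, view, n, min_rank=0.0, min_spread=0.0):
    """Return the n airports in view with the largest Rank * Spread.

    view is (south, west, north, east) in degrees (an assumed representation).
    min_rank and min_spread are the optional limits from step 3.
    """
    south, west, north, east = view
    visible = [
        a for a in airports
        if south <= a.lat <= north and west <= a.lon <= east
        and a.rank >= min_rank and a.spread >= min_spread
    ]
    visible.sort(key=lambda a: a.rank * a.spread, reverse=True)
    return visible[:n]


airports = [
    Airport("SFO", 37.62, -122.38, 0.0040, 293.5),
    Airport("OAK", 37.72, -122.22, 0.0015, 10.26),
    Airport("SJC", 37.36, -121.93, 0.0012, 24.8),
    Airport("SMF", 38.70, -121.59, 0.0011, 65.6),
]
bay_area_view = (36.0, -124.0, 40.0, -120.0)

top_two = [a.code for a in airports_to_display(airports, bay_area_view, n=2)]
debunched = [a.code for a in
             airports_to_display(airports, bay_area_view, n=4, min_spread=20)]
print(top_two)    # SFO and SMF: SMF outranks OAK and SJC once Spread is included
print(debunched)  # OAK drops out because its Spread is under 20 nm
```

Products computed from the rounded Rank values printed in the table differ slightly from the last column of the table, presumably because the slide used unrounded ranks. The ordering is the same either way.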
Flexibility
• Applications can pick their weighting of Rank and Spread, and value of N
• N can depend on zoom level
• Can also use them as limits
• minimum Spread to debunch airports
• minimum Rank to hide small airports
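One possible way to expose these knobs is shown below. The talk does not specify how the weighting is applied, so the exponent form in `weighted_score` and the linear zoom rule in `airports_for_zoom` are both assumptions, as are the function names and default values.

```python
def weighted_score(rank, spread, rank_weight=1.0, spread_weight=1.0):
    """Combine Rank and Spread with adjustable exponents (assumed form).

    With both weights at 1.0 this is the plain product of Rank and Spread.
    A spread_weight of 0 ignores Spread entirely and sorts by Rank alone.
    """
    return (rank ** rank_weight) * (spread ** spread_weight)


def airports_for_zoom(zoom, base=10, per_level=5, maximum=200):
    """Pick N from the zoom level: show more airports as the user zooms in."""
    return min(maximum, base + per_level * max(0, zoom))


# Rank and Spread values for the Northern California airports from the talk.
table = {
    "SFO": (0.0040, 293.5),
    "OAK": (0.0015, 10.26),
    "SJC": (0.0012, 24.8),
    "SMF": (0.0011, 65.6),
}

by_product = sorted(table, key=lambda c: weighted_score(*table[c]), reverse=True)
by_rank_only = sorted(
    table,
    key=lambda c: weighted_score(*table[c], spread_weight=0.0),
    reverse=True,
)
print(by_product)    # Spread pushes SMF ahead of OAK and SJC
print(by_rank_only)  # without Spread, the bunched Bay Area airports come first
```

Comparing the two orderings shows what Spread contributes: ranking by Rank alone puts the three Bay Area airports first, which is the bunching problem described earlier.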
Beyond Airports
• Can use Spread to space out almost anything on a map
• cities, neighborhoods, roads, parks, mountains, rivers, lakes, ...
• your data!
• Rank emphasizes connectivity over raw size or category
Connectivity
gaps
noise
DEMO
http://demo.flightstats-ops.com/spread
http://www.slideshare.net/wmleler/spread-20277955