Visualising large spatial databases and Building bespoke geodemographics
Muhammad Adnan
University College London
About Me
• 2007 – 2009
  • Worldnames (http://worldnames.publicprofiler.org)
  • Onomap (http://www.onomap.org)
• Nov. 2009 – Oct. 2011 (a Knowledge Transfer Partnership (KTP) between UCL and Local Futures Group)
• Local Futures Group (LFG) is a research and strategy consultancy
• The aim of the KTP was to devise a better visualisation of the data
Data
• A database of 1,600 indicators drawn from around 130 data sources
• Data sources cover social, economic, and environmental change in the UK
• The data is held at 8 spatial levels
  • Region, Sub-region, District 2009, NUTS 3, District (pre-2009), Ward, LSOA, OA
Visualisation of the data
• A ‘total place maps’ solution using different technologies (Video)
Base Layer Data
On the fly rendering of tiles
Programming in C# and ASP.NET
Data retrieval from database
Building Bespoke Geodemographics
Geodemographics
• “Analysis of people by where they live” or “locality marketing”
(Sleight, 1993:3)
[Diagram: Home, Address, Person, Area]
How is a classification created?
Data – Census + Other
Experian: Mosaic
• Census data: 54%
• Non-Census data: 46%
CACI: Acorn
• Census data: 30%
• Non-Census data: 70%
ONS Output Area Classification
• Census data: 100%
How is a classification created?
Segmentations are created by cluster analysis
Area V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ...
Area1
Area2
Area3
Area4
Area5
Area6
Area7
Area8
...
Inputs…
How is a classification created?
[Figure: Cluster Analysis. Scatter plot of Variable 1 against Variable 2 showing Cluster 1, Cluster 2, and Cluster 3]
K-means is used for clustering
How is a classification created?
Output of Cluster Analysis
Area Cluster
Area1 1
Area2 1
Area3 2
Area4 1
Area5 3
Area6 3
Area7 3
Area8 2
...
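The path from the input table (areas by variables) to the output table (area, cluster) can be sketched as below. This is a minimal, standard-library illustration of Lloyd's k-means, not the implementation used for any of the classifications named above. The area values, the choice of k = 3, and the function names are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(rows, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids) with 0-based labels."""
    rng = random.Random(seed)
    # Initialise centroids with k randomly chosen input rows.
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each area joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:  # no change, so the run has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical standardised values (V1, V2) for eight areas.
areas = {
    "Area1": [0.1, 0.2], "Area2": [0.2, 0.1], "Area3": [5.0, 5.1],
    "Area4": [0.0, 0.3], "Area5": [9.0, 0.5], "Area6": [9.2, 0.4],
    "Area7": [8.9, 0.6], "Area8": [5.2, 4.9],
}
names = list(areas)
rows = [areas[name] for name in names]
labels, centroids = kmeans(rows, k=3)

for name, label in zip(names, labels):
    print(name, label + 1)  # 1-based cluster numbers, as in the output table
```

The result depends on the random starting centroids, so a single run may settle on a poor partition.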
Research Issues
• Optimisation of clustering algorithms
  • K-means
  • PAM (Partitioning Around Medoids)
• Open tools?
  • OACoder
  • GeodemCreator
• Bespoke local area classifications
  • UK’s open data initiative
  • ONS Neighbourhood Statistics API
  • UK police API
  • Barclays Cycle Hire API
Optimisation of Clustering Algorithms (K-Means)
[Figure: K-means optimisation. R-squared (RSQ, axis range 0.46 to 0.54) plotted against k-means run number]
$$V = \sum_{x=1}^{n} \sum_{y=1}^{n} (z_x - \mu_y)^2$$
K-means (100 runs of k-means on OAC data set for k=4)
Run k-means multiple times (10,000 times) (Singleton & Longley, 2009)
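Repeating k-means from different random starts and keeping the best run can be sketched as follows. This is an illustration only: selecting the run with the lowest within-cluster sum of squares is an assumption about the selection criterion, the data are randomly generated placeholders, and 20 runs are used here purely to keep the example fast.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(rows, k, seed, max_iter=100):
    """One k-means run from a seeded random start."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def within_cluster_ss(rows, labels, centroids):
    """Sum of squared distances from each row to its assigned centroid."""
    return sum(squared_distance(r, centroids[l]) for r, l in zip(rows, labels))

def best_of_runs(rows, k, runs):
    """Run k-means `runs` times and keep the lowest-scoring partition."""
    results = []
    for seed in range(runs):
        labels, centroids = kmeans_once(rows, k, seed)
        results.append((within_cluster_ss(rows, labels, centroids), labels))
    scores = [score for score, _ in results]
    best_score, best_labels = min(results, key=lambda r: r[0])
    return best_score, best_labels, scores

# Hypothetical data: 60 areas, 3 variables.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(60)]

best_score, best_labels, scores = best_of_runs(rows, k=4, runs=20)
print("best:", round(best_score, 4), "worst:", round(max(scores), 4))
```

The spread between the best and worst scores shows how much a single run can depend on its starting centroids.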
CUDA & GPUs (Graphics Processing Units)
• Nvidia graphics cards contain GPUs (Graphics Processing Units)
  • Can be used for parallel processing
  • Nvidia GeForce GT 420M (96 CUDA cores)
  • Latest Tesla cards have 1000 cores
• CUDA (Compute Unified Device Architecture)
  • Parallel computing architecture
  • C and C++ can be used for programming
• A parallel implementation of k-means (Adnan & Longley, 2011)
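The part of k-means that parallelises most naturally is the assignment step, because each area's nearest centroid can be found independently of every other area. The sketch below shows that decomposition on the CPU with Python threads. It is not CUDA code and not the cited implementation; the function names and the chunking scheme are assumptions, and Python threads will not deliver a GPU-style speed-up. On a GPU, each chunk would typically shrink to a single data point handled by one thread.

```python
from concurrent.futures import ThreadPoolExecutor

def nearest(row, centroids):
    """Index of the centroid closest to `row` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )

def assign_chunk(chunk, centroids):
    """Label every row in one chunk; chunks do not depend on each other."""
    return [nearest(row, centroids) for row in chunk]

def parallel_assign(rows, centroids, workers=4):
    """Split the rows into chunks and label the chunks concurrently."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
    # Flatten the per-chunk results back into input order.
    return [label for part in parts for label in part]

# Hypothetical rows and centroids.
rows = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [9.0, 9.0], [1.0, 0.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels = parallel_assign(rows, centroids, workers=2)
print(labels)
```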
K-means vs Parallel K-means
Could be useful for building geodemographics quickly in online environments
Open Tools for Geodemographics
Open Tools - OACoder
• Developed with Alex Singleton
• Assigns UK postcodes to their corresponding OAC groups
• Download from http://areaclassification.org.uk/
Open Tools – ‘GeodemCreator’
• Allows users to create their local area Geodemographic Classifications
• Provides data available in the public domain (but users can use ancillary data sources)
Will be available to download from http://publicprofiler.org
Spatially Weighted Geodemographics
• Geodemographic classifications do not account for spatial weights in the results
• A spatially weighted Geodemographic classification introduces spatial weights in addition to the socio-economic characteristics
• Tobler’s first law of geography
  • “Everything is related to everything else, but near things are more related than distant things”
Spatially weighted Geodemographics
Step - 1: Construct a Neighbours Graph
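The slides do not state how neighbours are defined (shared boundaries, a distance threshold, or nearest neighbours). As one possible reading, the sketch below builds a k-nearest-neighbours graph from area centroids. The coordinates, the value of k, and the function name are hypothetical.

```python
import math

def knn_graph(centroids, k):
    """Map each area to the ids of its k nearest areas by centroid distance."""
    graph = {}
    for area, (ax, ay) in centroids.items():
        others = sorted(
            (b for b in centroids if b != area),
            key=lambda b: math.hypot(centroids[b][0] - ax, centroids[b][1] - ay),
        )
        graph[area] = others[:k]
    return graph

# Hypothetical centroid coordinates (x, y) for four areas.
centroids = {
    "Area1": (0.0, 0.0),
    "Area2": (1.0, 0.0),
    "Area3": (0.0, 1.0),
    "Area4": (5.0, 5.0),
}
graph = knn_graph(centroids, k=2)
for area, neighbours in graph.items():
    print(area, neighbours)
```

A k-nearest-neighbours graph is not necessarily symmetric: Area4 lists Area2 as a neighbour here, but Area2 does not list Area4.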
Spatially weighted Geodemographics
Step - 2: Apply Moran’s I to the data set
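• Moran’s I is a measure of spatial autocorrelation
• Values range from -1 to 1
• A negative value indicates negative spatial autocorrelation (neighbouring areas tend to have dissimilar values); a positive value indicates that neighbouring areas tend to have similar values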
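A global Moran's I for one variable can be computed from a neighbours graph as sketched below. The sketch assumes binary weights (1 for each listed neighbour, 0 otherwise); the area names, values, and chain-shaped neighbour structure are hypothetical.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where w_ij = 1 if j is listed as a neighbour of i, and W = sum_ij w_ij.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {area: v - mean for area, v in values.items()}
    total_weight = sum(len(neighbours[area]) for area in values)
    cross = sum(dev[a] * dev[b] for a in values for b in neighbours[a])
    sum_sq = sum(d * d for d in dev.values())
    return (n / total_weight) * (cross / sum_sq)

# Four hypothetical areas arranged in a chain: A - B - C - D.
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

trend = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}        # similar neighbours
alternating = {"A": 1.0, "B": 4.0, "C": 1.0, "D": 4.0}  # dissimilar neighbours

i_trend = morans_i(trend, neighbours)
i_alternating = morans_i(alternating, neighbours)
print(i_trend, i_alternating)
```

The smoothly increasing values give a positive result and the alternating values give a negative one, matching the interpretation of the sign given above.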
Spatially weighted Geodemographics
Area V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 VM
Area1
Area2
Area3
Area4
Area5
Area6
Area7
Area8
...
Moran’s I Result
Step - 3: Apply K-means
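The slides do not say how the VM column is derived. One possible reading is that it holds a per-area (local) Moran's I value, appended to the socio-economic variables before k-means is applied. The sketch below follows that assumption, using row-standardised weights and assuming every area has at least one neighbour; the variable values and names are hypothetical.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I per area, with row-standardised weights.

    I_i = (z_i / m2) * mean of z_j over the neighbours j of i,
    where z is the deviation from the mean and m2 = sum(z^2) / n.
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {area: v - mean for area, v in values.items()}
    m2 = sum(d * d for d in z.values()) / n
    result = {}
    for area in values:
        lag = sum(z[b] for b in neighbours[area]) / len(neighbours[area])
        result[area] = (z[area] / m2) * lag
    return result

# Hypothetical chain of four areas and one variable of interest.
neighbours = {
    "Area1": ["Area2"],
    "Area2": ["Area1", "Area3"],
    "Area3": ["Area2", "Area4"],
    "Area4": ["Area3"],
}
values = {"Area1": 1.0, "Area2": 2.0, "Area3": 3.0, "Area4": 4.0}

# Hypothetical socio-economic table (V1, V2) for the same areas.
table = {
    "Area1": [0.2, 0.9], "Area2": [0.3, 0.8],
    "Area3": [0.7, 0.4], "Area4": [0.9, 0.1],
}

vm = local_morans_i(values, neighbours)
# Append the spatial value as an extra column (VM) for each area.
augmented = {area: row + [vm[area]] for area, row in table.items()}
for area, row in augmented.items():
    print(area, row)
```

The augmented rows can then be passed to k-means in the same way as the original variables.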
Spatially weighted Geodemographics
Result
Conclusion and future work
• Open methods and tools for building geodemographics are important
• Testing of the spatially weighted geodemographics technique
  • At lower spatial levels
• I will be working on Paul Longley’s new research grant, “Uncertainty of Identity”
  • How could people’s behaviours in the real world be mapped to their behaviours in the virtual world?
• Could marketing strategies be devised for targeting online social networks and communities?
A quick illustration
http://worldnames.publicprofiler.org
• We have a record of 100,000 ‘IP Address’ entries for the last 6 months
A quick illustration
http://quova.com
An API to convert “IP addresses” to their corresponding latitude / longitude values
Any questions?
References
Adnan, M., Longley, P.A., Singleton, A.D., Brunsdon, C. (2010). Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases. Transactions in GIS, 14(3), 283–297.
Hall, J.D., Hart, J.C. (2004). GPU acceleration of iterative clustering. In: ACM Workshop on General-Purpose Computing on Graphics Processors, p. C-6.
Harris, R., Sleight, P., Webber, R. (2005). Geodemographics, GIS and Neighbourhood Targeting. Wiley, London.
Reynolds, A.P., Richards, G., Rayward-Smith, V.J. (2004). The Application of K-Medoids and PAM to the Clustering of Rules. Lecture Notes in Computer Science, 3177, 173–178.
Singleton, A.D., Longley, P.A. (2009). Creating open source geodemographic classifications for Higher Education applications. Papers in Regional Science, 88(3), 643–666.
Takizawa, H., Kobayashi, H. (2006). Hierarchical parallel processing of large scale data clustering on a pc cluster with GPU co-processing. J. Supercomput., 36(3), 219–234.
Vickers, D.W., Rees, P.H. (2007). Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A, 170(2), 379–403.