Towards a Higher Education Profiler
CASA, London - 11/2/09
Alex Singleton, Paul Longley, Alan Wilson
Introduction
• Views on the transition from PhD to Post Doctorate
  – ESRC First Grant Scheme
• Some new data sources
• Some new insights
• Beta Educational Profiler
• From description to prediction
My transition from an unconventional PhD to an unconventional Post Doc
• A Spatio-Temporal Analysis of Access to Higher Education (approx. 1 year ago)
  – Three Themes
    • Momentum
    • Unfinished Business
    • Future Directions
  – Unconventional
    • PhD – Spent 2 years in Cheltenham at UCAS – KTP
    • Post-Doc – It isn’t really one
Momentum
• Keep it up – you are used to writing lots
  – Use this to write papers, grant / book proposals
• Disillusion with PhD topic
  – Good to put it down for a while and do something else
• 2 Papers on E-Society
  – Online validation
  – Digital deprivation v. material deprivation
• 2 Papers on Neo-Geography
Unfinished Business
• There will be things in your thesis which you wanted to cover but didn’t have time / room
  – Methodological
    • Alternative algorithms to k-means in the creation of geodemographics
    • Geographic representations of cluster instability – related to initial seed locations
    • Alternative optimisation procedures – measures of spatial rather than social similarity
  – Domain Specific
    • Course Clusters
  – Overarching Themes
    • Future of area classification
      – Real Time Geodemographics
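The cluster instability point can be illustrated with a small sketch. It is not the thesis method: it assumes plain Lloyd's k-means on two hypothetical variables per area, and the function names `kmeans` and `instability` are made up for this example. Each area gets a score showing how often its cluster companions change when only the initial seeds change; a score like this could then be mapped.

```python
import random

def kmeans(points, k, seed, iterations=50):
    """Plain Lloyd's k-means on (x, y) tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial seed locations drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (p[0] - centres[c][0]) ** 2
                                        + (p[1] - centres[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centres.append(centres[c])  # leave an empty cluster where it is
        if new_centres == centres:
            break
        centres = new_centres
    return labels

def instability(points, k, seeds):
    """Per-point score in [0, 1]: the share of other points whose 'same cluster
    as me?' status differs between two runs, averaged over all pairs of runs."""
    runs = [kmeans(points, k, s) for s in seeds]
    n = len(points)
    scores = [0.0] * n
    pairs = 0
    for a in range(len(runs)):
        for b in range(a + 1, len(runs)):
            pairs += 1
            for i in range(n):
                changed = sum(
                    (runs[a][i] == runs[a][j]) != (runs[b][i] == runs[b][j])
                    for j in range(n) if j != i
                )
                scores[i] += changed / (n - 1)
    return [s / pairs for s in scores]

# Hypothetical areas described by two standardised variables.
areas = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(instability(areas, k=2, seeds=[1, 2, 3]))
```

Comparing co-membership rather than raw labels avoids the problem that cluster numbers are arbitrary from one run to the next.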
Future Directions
• Like it or not, your PhD is what you are known for!
  – Chart hits matter:
    • Brunsdon = GWR
    • Dorling = Cartograms
  – Unless you start again, you will always return to your PhD themes
    • It is what you know most about!
  – Research is driven by funding
    • Funding for PhD students
    • Funding for research grants
    • Building a research team enables
      – You to do more
      – Efforts to shift from “doing” to “guiding” / “organising”
        » Don’t drown in this – you still need to keep “doing”
[Diagram labels: Doctoral; Post-Doctoral; RA; Academics]
ESRC First Grants
• Scheme: Enables early career researchers to apply for a small grant – essentially 1 researcher @ FEC funding + expenses
• Very competitive
  – Over 200 applications last year – ~13% success
• Spatial interaction modelling, geodemographics and widening participation in the Higher Education sector?
Time Plan: Approximately ¼ Way Through
• Start: Oct. 2008
• Feb. 2009
• End: Sept. 2009
Stage 1: Data Acquisition and Insight
Stage 2: Higher Education Profiler
Stage 3: Higher Education Modeller
Pet Project... (Facebook)
The HE Problem: Acceptances 1962 - 2003
Occupational Group: 1968-1978
Social Class: 1980 - 2001
Socio-Economic Group: 2002 - 2007
End point of my thesis
Investigated a variety of different aspects of HE participation, from a geodemographics / geographer’s perspective
Distance travelled to HE by Mosaic
Mosaic Profile – UCAS Acceptances – 2004 (Base: All Adults)
HE Acceptances
2006 PLASC Data - DCSF
School Profiles – Selective Schools
2006 PLASC Data - DCSF
GCSE Grades
School Catchment Areas
Stage 1: Data Acquisition and Insight
• Data Sources
  – Universities and Colleges Admissions Service (UCAS)
  – Higher Education Statistics Agency (HESA)
  – Department for Children, Schools and Families (DCSF)
    • A-Level & Equiv. (Key Stage 5)
    • GCSE & Equiv. (Key Stage 4)
  – DCSF & HESA now link at individual level
    • Map a student through time!
      – Previously – had to consider each key stage separately
Caveat – These data only arrived last week!
Insight 1: Entry Rates (DCSF & HESA)
DCSF Key Stage 5, 2004, linked to HESA:
• HESA (0): ~50% – Direct Entry
• HESA (+1): ~20% – Gap Year
• HESA (+2): ~5% – Gap Years
National Targets = 18-30 Age Range
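A minimal sketch of how entry rates like these could be computed once Key Stage 5 and HESA records are linked at individual level. The record layout, the pupil identifiers and the function name `entry_rates` are assumptions made for illustration; the real DCSF and HESA extracts are not described on the slides.

```python
from collections import Counter

# Hypothetical linked records:
# (pupil id, year Key Stage 5 was completed, year of HE entry or None)
linked = [
    ("p1", 2004, 2004),
    ("p2", 2004, 2005),
    ("p3", 2004, None),
    ("p4", 2004, 2004),
    ("p5", 2004, 2006),
    ("p6", 2004, None),
]

def entry_rates(records, cohort_year, max_gap=2):
    """Share of one Key Stage 5 cohort entering HE after 0..max_gap years."""
    cohort = [r for r in records if r[1] == cohort_year]
    gaps = Counter(he - ks5 for _, ks5, he in cohort if he is not None)
    return {gap: gaps.get(gap, 0) / len(cohort) for gap in range(max_gap + 1)}

rates = entry_rates(linked, 2004)
for gap, share in rates.items():
    print(f"HESA (+{gap}): {share:.0%}")
```

Pupils who never appear in HESA stay in the denominator, which is what makes the rates comparable across gap lengths.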
Insight 2: Course Choice Behaviour (UCAS)
Applicant
C1 C2 C3 C4 C5 C6
Percentages – rows sum to 100%
Thus, for applicants with at least one choice in “A1 - Pre-Clinical Medicine”, 76.9% of applications from other applicants are within the same JACS Line.
A1 is quite homogeneous!
Extract of the full table
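One possible reading of the table, sketched with made-up data: each row is a JACS line, and the row shows how all choices made by applicants holding at least one choice in that line are spread across JACS lines. The applicant records and the function name `choice_profile` are hypothetical, and the UCAS table itself may have been built differently.

```python
from collections import defaultdict

# Hypothetical applicants, each with up to six choices coded by JACS line.
applicants = {
    "a1": ["A1", "A1", "A1", "A1", "B1"],
    "a2": ["A1", "A1", "C1"],
    "a3": ["C1", "C1", "B1"],
}

def choice_profile(applicants):
    """Row-percentage table: row = JACS line held by the applicant,
    column = JACS line of each of that applicant's choices."""
    counts = defaultdict(lambda: defaultdict(int))
    for choices in applicants.values():
        for line in set(choices):      # the applicant contributes to this row once
            for other in choices:      # every choice they made is counted
                counts[line][other] += 1
    table = {}
    for line, row in counts.items():
        total = sum(row.values())
        table[line] = {col: 100.0 * n / total for col, n in row.items()}
    return table

table = choice_profile(applicants)
for line in sorted(table):
    print(line, {col: round(pct, 1) for col, pct in sorted(table[line].items())})
```

A high diagonal value, as reported for A1, would indicate that applicants to that line rarely apply elsewhere.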
Insight 3: Model of Private Characteristics
[Diagram labels: State / KS5; FE Colleges; Private; Demographics – inc. spatial reference x; Higher Education (HESA)]
Insight 4: Participation Flows (based on HESA data)
Stage 2: Higher Education Profiler
• Integrate insights from my thesis
• UK HE Atlas
• Platform for decision support for a range of stakeholders in HE
Map Generation – Dependent Solution
Issues – Dependent on Google API, Limited to Google Cartography, potential issues with data ownership
Original Site (Google)
Map Generation – Independent Solution
Shuttle Radar Topography Mission (SRTM), ~100m resolution
Architecture Diagram
[Diagram components]
• OpenStreetMap (via Cloudmade); great_britain.osm (.xml); osm2pgsql; PostgreSQL DB with PostGIS
• NASA’s SRTM DEMs; GDAL Tools; PerryGeo Hillshading
• UKBorders English MSOAs and Postcodes; DCSF.gov.uk National Pupil Database; OAC; ArcGIS; ColorBrewer
• mySQL DB; DCSF.gov.uk EduBase and IDACI; HEFCE.ac.uk POLAR; OAC
• PVCs (.kml); Hawths Tools; ArcGIS
• Mapnik; Shapefiles; OSM Tiling Script (.py); Stylesheets (.xml); Tiles
• OpenLayers (.js); OSM Tiles; AJAX Requests; Google Chart API; Chart Cache
• PublicProfiler Schools Atlas
The diagram is built up over the following slides:
• Network Layer: OpenStreetMap (via Cloudmade); great_britain.osm (.xml); osm2pgsql; PostgreSQL DB with PostGIS; Mapnik; OSM Tiling Script (.py); Stylesheets (.xml); Shapefiles; Tiles; PublicProfiler Schools Atlas
• Hillshading Layer: adds NASA’s SRTM DEMs; GDAL Tools; PerryGeo Hillshading
• Choropleth Layers: adds UKBorders English MSOAs and Postcodes; DCSF.gov.uk National Pupil Database; OAC; ArcGIS; ColorBrewer
• School Statistics: adds mySQL DB; DCSF.gov.uk EduBase and IDACI; HEFCE.ac.uk POLAR; OAC
• School Catchment Areas: adds PVCs (.kml); Hawths Tools; ArcGIS
• Serving Tiles: adds OpenLayers (.js)
• Other Tile Sources: adds OSM Tiles
• Showing the Statistics: adds AJAX Requests; Google Chart API; Chart Cache
• Showing Catchments: no new components
• Production Systems: no new components
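The tiling script itself is not shown on the slides. As a rough sketch of what such a script has to work out before asking a renderer for each image, the code below uses the standard OpenStreetMap "slippy map" tile numbering to list the tiles covering a bounding box. The function names and the approximate Great Britain bounding box are assumptions for illustration, and the rendering call is deliberately left out.

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Tile (x, y) containing a WGS84 point, using OSM slippy-map numbering."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    return min(x, n - 1), min(y, n - 1)

def tiles_for_bbox(min_lat, min_lon, max_lat, max_lon, zoom):
    """All (zoom, x, y) tiles touching a bounding box; y grows southwards."""
    x0, y1 = latlon_to_tile(min_lat, min_lon, zoom)  # south-west corner
    x1, y0 = latlon_to_tile(max_lat, max_lon, zoom)  # north-east corner
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

# Approximate bounding box for Great Britain (an assumption, not from the slides).
GB_BBOX = (49.8, -8.7, 60.9, 1.8)

for zoom in range(5, 8):
    tiles = tiles_for_bbox(*GB_BBOX, zoom)
    print(f"zoom {zoom}: {len(tiles)} tiles to render")
```

Each (zoom, x, y) triple corresponds to one 256-pixel image, so the tile count roughly quadruples with every extra zoom level.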
Stage 3: Higher Education Modeller
Potential student groupings
• m: geodemographic group
• i: area of residence
• j: university (or university location)
• a: attainment level
• n: school type
• x: subject group
• h: university type
The flow array
• We write the array of interest as
$$S_{ij}(m, a, n, x, h)$$
• on the basis that we are always going to want to model $S_{ij}$ with some subset of $(m, a, n, x, h)$.
• The challenge arises from the number of cells in this 7-dimensional array.
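A sketch of one way to hold such an array without allocating every cell: keep only the non-zero cells in a dictionary keyed by (i, j, m, a, n, x, h), and sum over the dimensions that are not needed. The category labels, the `DIMS` lookup and the function name `aggregate` are hypothetical.

```python
from collections import Counter

# Position of each dimension in a student's key, following the notation above.
DIMS = {"i": 0, "j": 1, "m": 2, "a": 3, "n": 4, "x": 5, "h": 6}

# Hypothetical records, one tuple (i, j, m, a, n, x, h) per accepted student.
students = [
    ("area_01", "uni_A", "oac_2", "high", "independent", "subj_1", "type_1"),
    ("area_01", "uni_A", "oac_2", "high", "independent", "subj_1", "type_1"),
    ("area_01", "uni_B", "oac_5", "mid", "state", "subj_3", "type_4"),
    ("area_02", "uni_A", "oac_2", "low", "fe_college", "subj_3", "type_1"),
]

# Only non-zero cells are stored, so memory grows with students, not with cells.
flows = Counter(students)

def aggregate(flows, keep):
    """Sum the flow array over every dimension whose letter is not in `keep`."""
    idx = [DIMS[d] for d in keep]
    out = Counter()
    for key, count in flows.items():
        out[tuple(key[p] for p in idx)] += count
    return out

s_ij = aggregate(flows, "ij")     # plain origin-destination flows
s_ijm = aggregate(flows, "ijm")   # flows split by geodemographic group
print(s_ij)
print(s_ijm)
```

Any subset of (m, a, n, x, h) can be requested the same way, which matches the idea of modelling the flows with whichever subset is of interest.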
Numbers in each category
• m: 7 (Output Area Classification)
• i: 30 (NUTS2 areas)
• j: 171 HEIs; may be reduced to 30 locations
• a: 3 attainment levels; or a continuum of UCAS points
• n: ideally 5 school/college types: independent, state selective, state non-selective, Sixth Form College, FE College; reduce to 3?
• x: 8, or 4?
• h: 5 – Oxbridge/UCL/Imperial, major civic, other research, large other, small other; or reduce to 3?
Flow array cells
• If we take the largest suggested numbers, the number of cells in the array would be:
7 × 30 × 171 × 3 × 5 × 8 × 5 = 21,546,000
• which is a ludicrously large number given that we are handling roughly 500,000 students in a year. Most of the cells would have zero entries.
Revised flow array cells
• If we take the lower category numbers, we get
7 × 30 × 30 × 3 × 3 × 4 × 3 = 680,400
• which is still too large. It is useful to look at this as 30 × 30 = 900 origin-destination pairs and 7 × 3 × 3 × 4 × 3 = 756 combinations of the other categories (900 × 756 = 680,400).
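The two cell counts can be checked directly, along with how thinly roughly 500,000 students would be spread across them. The dictionary names below are only labels for this sketch.

```python
from math import prod

# Category counts from the slides: largest and reduced options.
largest = {"m": 7, "i": 30, "j": 171, "a": 3, "n": 5, "x": 8, "h": 5}
reduced = {"m": 7, "i": 30, "j": 30, "a": 3, "n": 3, "x": 4, "h": 3}
students_per_year = 500_000  # "roughly 500,000 students in a year"

for label, sizes in (("largest", largest), ("reduced", reduced)):
    cells = prod(sizes.values())
    # Even with one student per cell, at most this share of cells is non-zero.
    max_filled = min(1.0, students_per_year / cells)
    print(f"{label}: {cells:,} cells, "
          f"{students_per_year / cells:.3f} students per cell on average, "
          f"at most {max_filled:.1%} of cells non-zero")

# The reduced array split into geographic and non-geographic parts.
geographic = reduced["i"] * reduced["j"]
other = prod(v for k, v in reduced.items() if k not in ("i", "j"))
print(geographic, other, geographic * other)
```

Even the reduced array averages under one student per cell, which is why further aggregation is needed before calibration.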
Visualising the real data
• The next step is to visualise the data to guide us towards further aggregation.
[Figure labels: School type; Output Area Classification; Attainment; University NUTS2 Area; Flow of student(s), height = flow size]
Comparing London Universities
UCL (Flow size > 2 students)
London Metropolitan University (Flow size > 2 students)
Comparing Leeds Universities
University of Leeds (Flow size > 4 students)
Leeds Metropolitan University (Flow size > 4 students)
OAC: Constrained By Circumstances (Flow size > 5 students)
OAC: City Living (Flow size > 5 students)
London to University of Oxford
All flows out of Cornwall (Flow size > 5 students)
Model Equation
• The model would take the form:
$$S_{ij}(m, a, n, x, h) = A_i(m, a, n, x, h)\, B_j(a, x, h)\, e_i(m, a, n, x, h)\, O_i(m)\, D_j(a, n, x, h)\, \exp[-\beta(m)\, c_{ij}(m)]$$
• This conceptual model would suffer from having too many cells, but we will use the experience of examining the data to find ways of aggregating.
• Initially, the model would be run on a doubly constrained basis for calibration purposes.
• It would then be possible to replace $D_j(a, n, x, h)$ by a set of attractiveness factors, $W_j(a, n, x, h)$. This would provide a ‘What if?’ capability. The model could then be used to test various future policy options.
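A much-simplified sketch of the two modes described above, with the (m, a, n, x, h) dimensions dropped and the participation term folded into a single origin total O_i. The balancing factors are found by the usual alternating iteration. All numbers, and the function names `doubly_constrained` and `production_constrained`, are made up for illustration; this is not the calibrated model.

```python
import math

def doubly_constrained(O, D, cost, beta, max_iter=1000, tol=1e-12):
    """S[i][j] = A[i] * B[j] * O[i] * D[j] * exp(-beta * cost[i][j]), with
    balancing factors A and B chosen so rows sum to O and columns to D.
    Assumes sum(O) == sum(D)."""
    n, m = len(O), len(D)
    f = [[math.exp(-beta * cost[i][j]) for j in range(m)] for i in range(n)]
    A, B = [1.0] * n, [1.0] * m
    for _ in range(max_iter):
        A_new = [1.0 / sum(B[j] * D[j] * f[i][j] for j in range(m)) for i in range(n)]
        B = [1.0 / sum(A_new[i] * O[i] * f[i][j] for i in range(n)) for j in range(m)]
        done = all(abs(a - b) <= tol * abs(a) for a, b in zip(A_new, A))
        A = A_new
        if done:
            break
    return [[A[i] * B[j] * O[i] * D[j] * f[i][j] for j in range(m)] for i in range(n)]

def production_constrained(O, W, cost, beta):
    """'What if?' variant: D[j] is replaced by attractiveness W[j], so only the
    origin totals are fixed and destination totals become a model output."""
    flows = []
    for i, o in enumerate(O):
        weights = [w * math.exp(-beta * cost[i][j]) for j, w in enumerate(W)]
        total = sum(weights)
        flows.append([o * w / total for w in weights])
    return flows

# Hypothetical example: 2 areas of residence, 3 university locations.
O = [300.0, 200.0]                 # students leaving each area
D = [250.0, 150.0, 100.0]          # places taken up at each location
cost = [[1.0, 2.0, 3.0], [2.0, 1.0, 2.0]]

S = doubly_constrained(O, D, cost, beta=0.5)
print([round(sum(row), 3) for row in S])            # matches O
print([round(sum(col), 3) for col in zip(*S)])      # matches D

W = [1.0, 2.0, 1.0]                # policy scenario: location 2 made more attractive
T = production_constrained(O, W, cost, beta=0.5)
print([round(sum(col), 1) for col in zip(*T)])      # predicted intake per location
```

In the doubly constrained run the destination totals are inputs, which is what makes it suitable for calibrating β; in the second run they are predictions, which is the 'What if?' use.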