GLOBE: Power Analysis for Geographic Representativeness
Matthew D Schmill and Tim Oates
University of Maryland, Baltimore County
GLOBE: Enhancing Scientific Workflows
The goal: accelerate and improve scientific workflows for land change science
Joint work with Wayne Lutters, Erle Ellis, Tim Oates, and Penny Rheingans at the University of Maryland, Baltimore County (IS, CSEE, GES)
Supported by NSF’s Cyber-Enabled Discovery & Innovation program
Centerpiece is the GLOBE application
Enabling better science by:
Linking local studies to global data
Real-time analytics, interactive geovisualization tools
Fostering scientific collaboration
The GLOBE Program
4 months left in a 4-year program
Ask me how it’s going!
Land Change Science
Study of the interaction between human systems, ecosystems, the atmosphere, and other Earth systems as mediated through human use of land
Cross-cuts many disciplines of social and natural science
Typified by this challenge: how to integrate and synthesize local studies into “globalized” results
Though GLOBE is targeted at land change scientists, the concept of representativeness is a very general concern
The GLOBE system is appropriate to any discipline engaged in synthesizing local studies into global results
GLOBE: Technical Overview
Postgres/PostGIS
Glassfish, GLOBE API
Java/Servlet, Apache Commons
GeoTools, Gson
GeoServer (Foreign Geom Layer)
WDPA, GADM
UI: JS/HTML5/CSS3, AngularJS
D3.js, Leaflet
Google Maps
JDBC, AJAX
GLOBE: How We Can Help
Cases
Social
Analytics
Connecting with Data: GLOBE’s DGG
Discrete Global Grid (Kevin Sahr)
GLOBE Land Unit (GLU): ISEA3H, resolution 12
~96 km² hexagonal units
~1.5M terrestrial GLUs
Multi-resolution grid
Faster rendering at global scales
Connecting with Data: Global Variables
Static global variable layers
100 variables and growing
Remote sensing, biological, environmental, human …
Reprocessed to our grid
Data stored in-memory
Quad tree / lookup table
Fast tile rendering and histogramming
Connecting with Data: GLOBE Cases
An instance of a case study
A geography describing the location studied
Attached to primary source data (e.g., a journal article)
Annotated with metadata
GLOBE’s case database
3,166 visible case studies
~1,000 case studies georeferenced by the GLOBE Cases Team
Taken from iconic LCS meta-studies
Global data is aggregated over the case’s geographic entity
Collections and Metastudies
Global inferences can be made by synthesizing the results from collections of cases
44 GLOBE and community-submitted collections
GLOBE Analytic: Representativeness
The degree to which a sample represents a global pattern
Typical criticism anywhere that samples are used to make inferences
Biased case selection undermines your result
Schmill et al. (2014), COM.Geo. GLOBE: Analytics for Assessing Global Representativeness. Available from IEEE Xplore
GLOBE’s Bias Analytics
Representativeness
A converse to bias
A well-represented sample is not biased; a biased sample is not representative
Goodness-of-fit: collection versus population
Multinomial
Representedness
The degree to which a location or member of the population is represented by the collection
Useful for visualization and analysis
Heat maps that show geographically where gaps lie
Can be used as a basis for case study search to fill study gaps
Series of binomials
Representativeness Analysis – χ²
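The goodness-of-fit test compares the category counts of a collection against the category proportions of the global population. Below is a minimal sketch, assuming cases and population are already binned into the same categories. The function names are hypothetical, and the Monte Carlo p-value is a dependency-free stand-in for a χ² table lookup, not necessarily how GLOBE computes it.

```python
import random


def chi_square_statistic(observed, expected_props):
    """Pearson chi-square statistic: collection counts vs. population proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))


def monte_carlo_p_value(observed, expected_props, trials=2000, seed=0):
    """Estimate the p-value by resampling collections under H0 (no bias).

    Assumption: a simulation replaces the chi-square distribution lookup.
    """
    rng = random.Random(seed)
    n = sum(observed)
    k = len(expected_props)
    stat = chi_square_statistic(observed, expected_props)
    hits = 0
    for _ in range(trials):
        counts = [0] * k
        # Draw n cases whose categories follow the population proportions.
        for category in rng.choices(range(k), weights=expected_props, k=n):
            counts[category] += 1
        if chi_square_statistic(counts, expected_props) >= stat:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)
```

A low p-value here supports a conclusion of bias. A high one, on its own, does not establish representativeness.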
Bias vs. Representativeness
A low p-value lets the researcher conclude bias
The next step is to address the bias
But the goal is to conclude representativeness
And one cannot accept H0 based on a high p-value
Cohen, P.R. Getting what you deserve from data. 1996.
You have not ruled out other possible explanations for failing to reject H0:
Small sample size
High sample variance
Statistical Power
The probability that a test will reject H0 if it is false
α: the probability of falsely rejecting H0 (type I error)
β: the probability of failing to reject H0 when it is false (type II error)
(1 − β) is power
A collection is representative if H0 is not rejected and the underlying test is powerful
Obligatory Math Slide
Binomial χ² (representedness): power
Can solve for n to compute sample size for a fixed β
Multinomial case: much less pleasant
Monte Carlo methods: how to generate the p_i?
Treat the problem as a collection of binomials, use min power
The binomial decomposition also lets us draw maps
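The equation from this slide did not survive in the transcript. As a stand-in, here is the standard normal-approximation power of a two-sided one-sample proportion test, which may differ in detail from the slide's formula. Here p is the true proportion, p_b is the biased proportion (our notation, not the slide's), Φ is the standard normal cumulative probability function, and z_q = Φ⁻¹(q) is the standard normal quantile function.

```latex
1 - \beta \;\approx\;
  1 - \Phi\!\left(\frac{p - p_b + z_{1-\alpha/2}\sqrt{p(1-p)/n}}{\sqrt{p_b(1-p_b)/n}}\right)
  + \Phi\!\left(\frac{p - p_b - z_{1-\alpha/2}\sqrt{p(1-p)/n}}{\sqrt{p_b(1-p_b)/n}}\right)

% Solving the dominant tail for n at a fixed beta:
n \;\approx\;
  \left(\frac{z_{1-\alpha/2}\sqrt{p(1-p)} + z_{1-\beta}\sqrt{p_b(1-p_b)}}{p_b - p}\right)^{2}
```

When p_b = p the first expression reduces to α, as expected for a test with no effect to detect.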
Components of Binomial Power Functions
Standard normal quantile function
Standard normal cumulative probability function
Parameters
The biased proportion (its difference from p is the “amount of effect”)
α is the significance level
Values
n is the sample size
p is the “true proportion” (the proportion of the category being assessed in the global range)
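The components listed above can be combined into a small calculator. This is a sketch using the textbook normal approximation for a two-sided proportion test, not necessarily the exact expression on the slide. The names `binomial_power`, `sample_size`, `min_power`, and `p_biased` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides cdf (cumulative probability) and inv_cdf (quantile)


def binomial_power(p, p_biased, n, alpha=0.05):
    """Approximate power to detect a shift from true proportion p to p_biased."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    sd_null = sqrt(p * (1 - p) / n)                # spread of the estimate under H0
    sd_alt = sqrt(p_biased * (1 - p_biased) / n)   # spread under the biased proportion
    upper = 1 - STD_NORMAL.cdf((p + z * sd_null - p_biased) / sd_alt)
    lower = STD_NORMAL.cdf((p - z * sd_null - p_biased) / sd_alt)
    return upper + lower


def sample_size(p, p_biased, alpha=0.05, beta=0.2):
    """Smallest n whose dominant-tail power reaches 1 - beta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    numerator = z_alpha * sqrt(p * (1 - p)) + z_beta * sqrt(p_biased * (1 - p_biased))
    return ceil((numerator / (p_biased - p)) ** 2)


def min_power(category_pairs, n, alpha=0.05):
    """Treat a multinomial as a collection of binomials and report the weakest one.

    category_pairs: iterable of (p, p_biased) tuples, one per category.
    """
    return min(binomial_power(p, pb, n, alpha) for p, pb in category_pairs)
```

For example, `sample_size(0.1, 0.2)` gives the collection size needed to detect a category that makes up 10% of the global range but 20% of the sample, at α = 0.05 and β = 0.2.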
Mapping Statistical Power
1. Tile request (x, y, z, A)
   1. Range query, returning all intersecting GLUs (QT lookup)
   2. Iterate over intersecting GLUs
      1. Dimensionality reduction (optional)
      2. Map to condition
      3. Map to binomial
      4. Compute power
         1. Map to color key
         2. Render GLU (scanline renderer)
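The tile pipeline above can be sketched end to end. Everything here is illustrative: the `GLU` record, the linear scan standing in for the quadtree lookup, the absolute `effect` added to each category's proportion, and the color key are all assumptions, not GLOBE's implementation. Dimensionality reduction and scanline rendering are omitted.

```python
from bisect import bisect_right
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


@dataclass(frozen=True)
class GLU:
    x: float      # centroid longitude (hypothetical representation)
    y: float      # centroid latitude
    value: float  # one global variable, already sampled onto the grid


def intersecting_glus(glus, bounds):
    """Range query; a linear scan stands in for the quadtree lookup."""
    min_x, min_y, max_x, max_y = bounds
    return [g for g in glus if min_x <= g.x <= max_x and min_y <= g.y <= max_y]


def to_condition(value, bin_edges):
    """Map a variable value to a category index using sorted bin edges."""
    return bisect_right(bin_edges, value)


def binomial_power(p, p_biased, n, alpha):
    """Normal-approximation power of a two-sided proportion test."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    sd_null = sqrt(p * (1 - p) / n)
    sd_alt = sqrt(p_biased * (1 - p_biased) / n)
    return (1 - STD_NORMAL.cdf((p + z * sd_null - p_biased) / sd_alt)
            + STD_NORMAL.cdf((p - z * sd_null - p_biased) / sd_alt))


def to_color(power, color_key):
    """color_key: ascending (upper_bound, color) pairs."""
    for upper, color in color_key:
        if power <= upper:
            return color
    return color_key[-1][1]


def render_tile(glus, bounds, bin_edges, global_props, n, effect, alpha=0.05,
                color_key=((0.5, "red"), (0.8, "yellow"), (1.0, "green"))):
    """Return (glu, power, color) for every GLU intersecting the tile bounds."""
    cells = []
    for glu in intersecting_glus(glus, bounds):
        condition = to_condition(glu.value, bin_edges)   # map to condition
        p = global_props[condition]                      # map to binomial
        # Assumed effect model: shift p by a fixed amount, kept inside (0, 1).
        p_biased = min(max(p + effect, 1e-6), 1 - 1e-6)
        power = binomial_power(p, p_biased, n, alpha)    # compute power
        cells.append((glu, power, to_color(power, color_key)))
    return cells
```

Each returned color would then be handed to whatever renderer draws the GLU's hexagon into the tile.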
Representativeness, the Next Generation
Multivariate analysis
Modular analytical workflows & geoviz
Save & share workflows
Includes two power analysis tools
Multinomial/Monte Carlo power
Binomial power/sample size
In Summary
GLOBE application: new scientific workflows for land change science
Broadly applicable where local studies are used to make global inferences
Collaboration, contextualizing, analytics
Representativeness is a goal of synthesis studies that is facilitated by GLOBE
But representativeness is not a directly testable hypothesis
Chi-square and other statistics are useful for identifying and diagnosing bias
Power analysis helps to complete the case for representativeness
In the Pipeline
Cases & community outreach
LTER network
World Bank / AidData integration (15,000+ cases)
GLOBE Temporal
Consider spatiotemporal patterns
Aimed at supporting the archaeology community
GLOBE BigData GLOBE Water
Working at higher resolutions
ISEA3H resolution 17 = 400M land-covered cells of 0.4 km² each
Distributed / MapReduce architecture
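The resolution 17 figures follow from the resolution 12 grid by simple arithmetic. ISEA3H is an aperture-3 grid, so each step up in resolution divides cell area by 3 and roughly triples the cell count. A quick check, with hypothetical helper names and the talk's own approximate figures as inputs:

```python
APERTURE = 3  # ISEA3H: each resolution step divides cell area by 3


def scale_factor(from_res, to_res):
    """How many finer cells fit in one coarser cell."""
    return APERTURE ** (to_res - from_res)


def cell_area_km2(area_at_from, from_res, to_res):
    return area_at_from / scale_factor(from_res, to_res)


def cell_count(count_at_from, from_res, to_res):
    return count_at_from * scale_factor(from_res, to_res)


print(round(cell_area_km2(96, 12, 17), 3))  # 0.395
print(cell_count(1_500_000, 12, 17))        # 364500000
```

Both results agree with the rounded figures quoted for resolution 17, given that the resolution 12 inputs are themselves approximate.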
Thanks!
Visit us at http://globe.umbc.edu
Get an account at http://globe.umbc.edu/app