4 months left in a 4 year program ask me how it’s going!

GLOBE Power Analysis for Geographic Representativeness Matthew D Schmill and Tim Oates University of Maryland, Baltimore County

Post on 24-Dec-2015

TRANSCRIPT

Page 1:

GLOBE Power Analysis for Geographic Representativeness

Matthew D Schmill and Tim Oates

University of Maryland, Baltimore County

Page 2:

GLOBE: Enhancing Scientific Workflows

The goal: accelerate and improve scientific workflows for land change science

Joint work with Wayne Lutters, Erle Ellis, Tim Oates, and Penny Rheingans at the University of Maryland, Baltimore County (IS, CSEE, GES)

Supported by NSF’s Cyber-Enabled Discovery & Innovation program

Centerpiece is the GLOBE application

Enabling better science by

Linking local studies to global data

Real-time analytics, interactive geovisualization tools

Fostering scientific collaboration

Page 3:

The GLOBE Program

4 months left in a 4-year program

Ask me how it’s going!

Page 4:

Land Change Science

Study of the interaction between human systems, ecosystems, the atmosphere, and other Earth systems as mediated through human use of land

Cross-cuts many disciplines of social and natural science

Typified by this challenge: how to integrate and synthesize local studies into “globalized” results

Though GLOBE is targeted at land change scientists, the concept of representativeness is a very general concern

The GLOBE system is appropriate to any discipline engaged in synthesizing local studies into global results

Page 5:

GLOBE: Technical Overview

Postgres/PostGIS

Glassfish, GLOBE API, Java/Servlet, Apache Commons, GeoTools, Gson

GeoServer (Foreign Geom Layer), WDPA, GADM

UI: JS/HTML5/CSS3, AngularJS, D3js, Leaflet, Google Maps

jdbc, ajax

Page 6:

GLOBE: How We Can Help

Cases

Social

Analytics

Page 7:

Connecting with Data: GLOBE’s DGG

Discrete Global Grid (Kevin Sahr)

Globe Land Unit (GLU)

ISEA3H, resolution 12

~96 km² hexagonal units

~1.5M terrestrial GLUs

Multi-resolution grid

Faster rendering at global scales
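
As a rough sanity check on the grid figures above, the sketch below estimates the cell count and mean cell area of an aperture-3 icosahedral hexagon grid at resolution 12. The cell-count formula 10·3^r + 2, the Earth surface area constant, and the 29% land fraction are assumptions brought in for illustration; none of them come from the slides.

```python
# Rough check of the grid figures, under stated assumptions.
EARTH_SURFACE_KM2 = 510_065_622   # assumed total surface area of the Earth
LAND_FRACTION = 0.29              # assumed share of the surface that is land


def isea3h_cell_count(resolution: int) -> int:
    """Number of cells in an aperture-3 icosahedral hexagon grid."""
    return 10 * 3 ** resolution + 2


def mean_cell_area_km2(resolution: int) -> float:
    """Average cell area if the surface is split evenly among all cells."""
    return EARTH_SURFACE_KM2 / isea3h_cell_count(resolution)


def approx_land_cells(resolution: int) -> float:
    """Very rough count of cells that fall on land."""
    return LAND_FRACTION * isea3h_cell_count(resolution)


resolution = 12
print("cells:", isea3h_cell_count(resolution))
print("mean cell area (km^2):", round(mean_cell_area_km2(resolution), 1))
print("land cells (millions):", round(approx_land_cells(resolution) / 1e6, 2))
```

Under these assumptions the mean cell area comes out close to 96 km² and the land cell count close to 1.5M, which is consistent with the numbers quoted for the GLU grid.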

Page 8:

Connecting with Data: Global Variables

Static global variable layers

100 variables and growing

Remote sensing, biological, environmental, human …

Reprocessed to our grid

Data stored in-memory

Quad tree / lookup table

Fast tile rendering and histogramming
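
One way to picture the quad tree lookup is a point quadtree over GLU centroids that answers bounding-box queries. This is a minimal sketch under assumptions: the `QuadTree` class, its capacity and depth limits, and the centroid-based indexing are all hypothetical and are not taken from GLOBE's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, int]  # (lon, lat, glu_id); hypothetical layout


@dataclass
class QuadTree:
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4      # points held by a leaf before it splits
    depth: int = 0
    max_depth: int = 16    # guards against endless splits on duplicate points
    points: List[Point] = field(default_factory=list)
    children: List["QuadTree"] = field(default_factory=list)

    def insert(self, p: Point) -> bool:
        x, y, _ = p
        if not (self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        # any() stops at the first child that accepts the point.
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(self.x0, self.y0, mx, my, *args),
            QuadTree(mx, self.y0, self.x1, my, *args),
            QuadTree(self.x0, my, mx, self.y1, *args),
            QuadTree(mx, my, self.x1, self.y1, *args),
        ]
        old, self.points = self.points, []
        for q in old:  # push the leaf's points down into the new children
            any(child.insert(q) for child in self.children)

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[int]:
        """Return ids of all points inside the query box."""
        if qx1 < self.x0 or qx0 > self.x1 or qy1 < self.y0 or qy0 > self.y1:
            return []  # the box does not touch this node
        hits = [i for x, y, i in self.points
                if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for child in self.children:
            hits.extend(child.query(qx0, qy0, qx1, qy1))
        return hits
```

A tile renderer would call `query` with the tile's bounding box and then look up variable values for the returned ids in a separate table.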

Page 9:

Connecting with Data: GLOBE Cases

An instance of a case study

A geography describing the location studied

Attached to primary source data (e.g. a journal article)

Annotated with metadata

GLOBE’s case database

3,166 visible case studies

~1,000 case studies georeferenced by the GLOBE Cases Team

Taken from iconic LCS metastudies

Global data is aggregated over the case’s geographic entity
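
Aggregating a global variable over a case's geographic entity can be pictured as a lookup followed by a summary. In this sketch, `variable_by_glu`, `case_glu_ids`, and the choice of the mean as the summary are assumptions for illustration; the slides do not say which aggregate GLOBE computes.

```python
from statistics import mean
from typing import Dict, Iterable, Optional


def aggregate_over_case(variable_by_glu: Dict[int, float],
                        case_glu_ids: Iterable[int]) -> Optional[float]:
    """Mean of a global variable over the GLUs covered by one case.

    GLUs with no value in the layer are skipped; returns None if none remain.
    """
    values = [variable_by_glu[g] for g in case_glu_ids if g in variable_by_glu]
    return mean(values) if values else None


# Hypothetical layer: GLU id -> variable value.
layer = {101: 12.0, 102: 18.0, 103: 30.0, 200: 4.5}
print(aggregate_over_case(layer, [101, 102, 103]))  # mean of 12, 18 and 30
```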

Page 10:
Page 11:

Collections and Metastudies

Global inferences can be made by synthesizing the results from collections of cases

44 GLOBE and community-submitted collections

Page 12:

GLOBE Analytic: Representativeness

The degree to which a sample represents a global pattern

Typical criticism anywhere that samples are used to make inferences

Biased case selection undermines your result

Schmill et al. (2014). GLOBE: Analytics for Assessing Global Representativeness. COM.Geo. Available from IEEE Xplore

Page 13:

GLOBE’s Bias Analytics

Representativeness

A converse to bias

A well-represented sample is not biased; a biased sample is not representative

Goodness-of-fit

Collection versus population

Representedness

The degree to which a location or member of the population is represented by the collection

Useful for visualization and analysis

Heat maps that show geographically where gaps lie

Can be used as a basis for case study search to fill study gaps

Page 14:

GLOBE’s Bias Analytics

Representativeness

A converse to bias

A well-represented sample is not biased; a biased sample is not representative

Goodness-of-fit

Collection versus population

Multinomial

Representedness

The degree to which a location or member of the population is represented by the collection

Useful for visualization and analysis

Heat maps that show geographically where gaps lie

Can be used as a basis for case study search to fill study gaps

Series of binomials

Page 15:

Representativeness Analysis – χ²
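
The χ² goodness-of-fit comparison of a collection against the global population can be sketched as follows. The bins, the counts, and the function names are hypothetical, and the critical values are standard χ² table entries for α = 0.05 rather than anything specific to GLOBE.

```python
from typing import Sequence

# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(observed: Sequence[int],
                         global_proportions: Sequence[float]) -> float:
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    n = sum(observed)
    expected = [n * p for p in global_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def shows_bias(observed: Sequence[int],
               global_proportions: Sequence[float]) -> bool:
    """True if H0 (collection matches the global pattern) is rejected at 0.05."""
    df = len(observed) - 1
    return chi_square_statistic(observed, global_proportions) > CHI2_CRIT_05[df]


# Hypothetical example: a global variable split into three bins.
global_props = [0.5, 0.3, 0.2]   # share of the global population in each bin
collection = [30, 40, 30]        # number of cases in each bin
print(chi_square_statistic(collection, global_props))
print(shows_bias(collection, global_props))
```

A `False` result here only means the test failed to reject H0; it does not by itself establish representativeness.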

Page 16:

Bias vs. Representativeness

A low p-value lets the researcher conclude bias

The next step is to address the bias

But the goal is to conclude representativeness

And one cannot accept H0 based on a high p-value

Cohen, P.R. Getting what you deserve from data. 1996.

You have not ruled out possible explanations for failing to reject H0

Small sample size

High sample variance

Page 17:

Statistical Power

The probability that a test will reject H0 if it is false

α - the probability of falsely rejecting H0 (type I error)

β - the probability of failing to reject H0 when it is false (type II error)

(1-β) is power

A collection is representative if H0 is not rejected and the underlying test is powerful

Page 18:

Obligatory Math Slide

Binomial χ² (representedness) - Power

Can solve for n to compute sample size for a fixed β

Multinomial case – much less pleasant

Monte Carlo methods – how to generate p_i?

Treat the problem as a collection of binomials, use min power

The binomial decomposition also lets us draw maps
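
For reference, a common textbook normal approximation to the power of a two-sided one-sample proportion test is given below, together with the same expression solved for n. It is offered as an assumption about the kind of formula this slide refers to, not as the slide's own expression. Here p is the true proportion, p' is the biased proportion, and z_q = Φ⁻¹(q) is the standard normal quantile.

```latex
\[
1 - \beta \;\approx\; \Phi\!\left(
  \frac{|p - p'|\sqrt{n} \;-\; z_{1-\alpha/2}\sqrt{p(1-p)}}
       {\sqrt{p'(1-p')}}
\right)
\]
\[
n \;\approx\; \left(
  \frac{z_{1-\alpha/2}\sqrt{p(1-p)} \;+\; z_{1-\beta}\sqrt{p'(1-p')}}
       {p - p'}
\right)^{2}
\]
```

The second line follows from the first by setting the argument of Φ equal to z_{1-β} and isolating √n.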

Page 19:

Components of Binomial Power

Functions

Φ⁻¹ is the standard normal quantile function

Φ is the standard normal cumulative probability function

Parameters

p′ is the biased proportion (p − p′ is the “amount of effect”)

α is the significance level

Values

n is sample size

p is the “true proportion” (the proportion of the category being assessed in the global range)
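
The components listed above can be combined into a small power calculator. This is a sketch using one common normal approximation for a two-sided proportion test, which may differ from the exact formula on the math slide. The function names and the `p_biased` argument are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def binomial_power(n: int, p: float, p_biased: float, alpha: float = 0.05) -> float:
    """Approximate power to detect a shift from proportion p to p_biased."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)      # quantile function
    effect = abs(p - p_biased)                       # "amount of effect"
    numerator = effect * sqrt(n) - z_alpha * sqrt(p * (1 - p))
    return STD_NORMAL.cdf(numerator / sqrt(p_biased * (1 - p_biased)))


def sample_size(p: float, p_biased: float,
                alpha: float = 0.05, beta: float = 0.2) -> int:
    """Smallest n whose approximate power is at least 1 - beta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    root_n = (z_alpha * sqrt(p * (1 - p))
              + z_beta * sqrt(p_biased * (1 - p_biased))) / abs(p - p_biased)
    return ceil(root_n ** 2)


def min_power(n: int, categories, alpha: float = 0.05) -> float:
    """Treat a multinomial as a series of binomials and keep the weakest one.

    categories is an iterable of (p, p_biased) pairs, one per category.
    """
    return min(binomial_power(n, p, pb, alpha) for p, pb in categories)


n_needed = sample_size(0.3, 0.4)
print(n_needed, binomial_power(n_needed, 0.3, 0.4))
```

Taking the minimum over categories is the conservative choice: the collection is only as well powered as its weakest category.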

Page 20:

Mapping Statistical Power

1. Tile request (x,y,z,A)

1. Range query, returning all intersecting GLUs (QT Lookup)

Page 21:

Mapping Statistical Power

1. Tile request (x,y,z,A)

1. Range query, returning all intersecting GLUs (QT Lookup)

2. Iterate over intersecting GLUs

1. Dimensionality reduction (optional)

2. Map to condition

3. Map to binomial

Page 22:

Mapping Statistical Power

1. Tile request (x,y,z,A)

1. Range query, returning all intersecting GLUs (QT Lookup)

2. Iterate over intersecting GLUs

1. Dimensionality reduction

2. Map to condition

3. Map to binomial

Page 23:

Mapping Statistical Power

1. Tile request (x,y,z,A)

1. Range query, returning all intersecting GLUs (QT Lookup)

2. Iterate over intersecting GLUs

1. Dimensionality reduction

2. Map to condition

3. Map to binomial

4. Compute power

1. Map to color key

2. Render GLU (scanline renderer)
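
The steps above can be sketched end to end as a single function that turns a tile's bounding box into a color per GLU. Everything named here is an assumption for illustration: the `GLUS` table, the bin edges in `to_condition`, the fixed `effect` size, the three-color key, and the linear scan that stands in for the quad tree lookup. Dimensionality reduction and the scanline renderer are left out.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple

STD_NORMAL = NormalDist()

# Hypothetical in-memory layer: GLU id -> (lon, lat, variable value).
GLUS: Dict[int, Tuple[float, float, float]] = {
    1: (10.0, 45.0, 0.2),
    2: (10.5, 45.2, 0.7),
    3: (80.0, 20.0, 0.9),
}
COLOR_KEY = ["#d73027", "#fee08b", "#1a9850"]  # low, medium, high power


def range_query(bbox: Tuple[float, float, float, float]) -> List[int]:
    """Stand-in for the quad tree lookup: linear scan over GLU centroids."""
    x0, y0, x1, y1 = bbox
    return [g for g, (x, y, _) in GLUS.items()
            if x0 <= x <= x1 and y0 <= y <= y1]


def to_condition(value: float) -> int:
    """Map a variable value to a condition (bin index); bin edges are assumed."""
    return 0 if value < 0.33 else 1 if value < 0.66 else 2


def binomial_power(n: int, p: float, p_biased: float, alpha: float = 0.05) -> float:
    """Normal-approximation power for a two-sided proportion test."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    numerator = abs(p - p_biased) * sqrt(n) - z * sqrt(p * (1 - p))
    return STD_NORMAL.cdf(numerator / sqrt(p_biased * (1 - p_biased)))


def power_tile(bbox, n: int, global_props: Sequence[float],
               effect: float = 0.1) -> Dict[int, str]:
    """Return {glu_id: color} for every GLU whose centroid falls in the tile."""
    colors = {}
    for glu in range_query(bbox):                   # range query
        condition = to_condition(GLUS[glu][2])      # map to condition
        p = global_props[condition]                 # map to binomial
        p_biased = min(p + effect, 0.99)
        power = binomial_power(n, p, p_biased)      # compute power
        idx = min(int(power * len(COLOR_KEY)), len(COLOR_KEY) - 1)
        colors[glu] = COLOR_KEY[idx]                # map to color key
    return colors


print(power_tile((0, 40, 20, 50), n=100, global_props=[0.5, 0.3, 0.2]))
```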

Page 24:

Representativeness, the Next Generation

Multivariate analysis

Modular analytical workflows & geoviz

Save & share workflows

Includes two Power Analysis tools

Multinomial/Monte Carlo power

Binomial Power/Sample Size
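
A Monte Carlo estimate of multinomial power can be sketched by repeatedly drawing collections from an assumed biased distribution and counting how often the χ² test rejects H0. The alternative proportions `alt_props`, the trial count, and the function names are hypothetical; choosing a sensible alternative is exactly the open question the slides raise.

```python
import random
from typing import Sequence

# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(observed: Sequence[int],
                         null_props: Sequence[float]) -> float:
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, null_props))


def monte_carlo_power(null_props: Sequence[float], alt_props: Sequence[float],
                      n: int, trials: int = 2000, seed: int = 0) -> float:
    """Share of simulated collections of size n for which H0 is rejected."""
    rng = random.Random(seed)          # seeded so the estimate is repeatable
    k = len(null_props)
    critical = CHI2_CRIT_05[k - 1]
    rejections = 0
    for _ in range(trials):
        # Draw n cases whose categories follow the assumed biased distribution.
        draws = rng.choices(range(k), weights=alt_props, k=n)
        observed = [draws.count(c) for c in range(k)]
        if chi_square_statistic(observed, null_props) > critical:
            rejections += 1
    return rejections / trials


global_props = [0.5, 0.3, 0.2]
print(monte_carlo_power(global_props, [0.3, 0.4, 0.3], n=100))
```

When `alt_props` equals `null_props`, the estimate should sit near the significance level, which is a quick check that the simulation is wired correctly.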

Page 25:
Page 26:
Page 27:

In Summary

GLOBE Application: new scientific workflows for Land Change Science

Broadly applicable where local studies are used to make global inferences

Collaboration, Contextualizing, Analytics

Representativeness is a goal of synthesis studies that is facilitated by GLOBE

But representativeness is not a directly testable hypothesis

Chi-square and other statistics are useful for identifying and diagnosing bias

Power analysis helps to complete the case for representativeness

Page 28:

In the Pipeline

Cases & Community Outreach

LTER network

World Bank / AidData integration (15,000+ cases)

GLOBE Temporal

Consider spatiotemporal patterns

Aimed at supporting the Archaeology community

GLOBE BigData

GLOBE Water

Working at higher resolutions

ISEA3H resolution 17 = 400M land-covered cells of 0.4 km²

Distributed / MapReduce architecture

Page 29:

Thanks!

Visit us at http://globe.umbc.edu

Get an account at http://globe.umbc.edu/app