alberta innovates pem_presentation_feb13_2012_ram_version1
DESCRIPTION
This is a first draft of slides for a presentation to a session on Predictive Ecosystem Mapping at Alberta Innovates.
TRANSCRIPT
Automated Predictive Mapping:
Lessons Learned about the Process
R. A. MacMillan
LandMapper Environmental Solutions Inc.
[Figure: weighted overlay on a 100 x 100 m grid. Individual salinity hazard ratings for each layer (landscape curvature, vegetation, rainfall, geology, soils) are combined using layer weightings (2x, 1x, 2x, 1x, 3x) into a total salinity hazard rating, which gives the salinity hazard map for the land surface.]
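The weighted overlay in the figure can be sketched as a cell-by-cell weighted sum. This is a minimal illustration only: the rating values are made up, and the pairing of each weighting with a particular layer is an assumption, since the slide text does not say which layer carries which weight.

```python
# Hypothetical ratings (1 = low hazard, 3 = high hazard) on a tiny 2 x 2 grid.
# In the figure each cell would be one 100 x 100 m grid square.
ratings = {
    "landscape_curvature": [[1, 2], [3, 1]],
    "vegetation":          [[2, 2], [1, 3]],
    "rainfall":            [[1, 3], [2, 2]],
    "geology":             [[3, 1], [1, 2]],
    "soils":               [[2, 3], [1, 1]],
}

# The five weightings shown on the slide. Which layer gets which weight
# is an assumption made here for illustration.
weights = {
    "landscape_curvature": 2,
    "vegetation": 1,
    "rainfall": 2,
    "geology": 1,
    "soils": 3,
}


def total_hazard(ratings, weights):
    """Sum weight * rating over all layers, cell by cell."""
    first = next(iter(ratings.values()))
    n_rows, n_cols = len(first), len(first[0])
    total = [[0] * n_cols for _ in range(n_rows)]
    for name, grid in ratings.items():
        for r in range(n_rows):
            for c in range(n_cols):
                total[r][c] += weights[name] * grid[r][c]
    return total


hazard_map = total_hazard(ratings, weights)
```

Changing a weight and re-running shows how sensitive the total rating is to the subjective weightings.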
What is a PEM?
• Definition of Predictive Ecosystem Mapping
  – Jones et al., 1999
Predictive Ecosystem Mapping (PEM): a computer, GIS and knowledge based method of stratifying landscapes into ecologically-oriented map units based on the overlaying of existing mapped themes and the processing of the resulting attributes by automated inferencing software using a formalized knowledge base containing ecological-landscape relationships.
Principles and Concepts
Fundamental Principles
Different Approaches to DSM
Fundamental Principles of DSM
Pedotransfer functions (PTF)
Bouma (1989): “translating data we have into what we need”
Credit: Minasny & McBratney
Fundamental Principles of DSM
Credit: Minasny & McBratney
Principle 1:
Do not predict something that is easier to measure or map than the predictor
Effort
Fundamental Principles of DSM
Principle 2:
Uncertainty
– Do not use PTFs unless you can evaluate the uncertainty.
– For a given problem, if a set of alternative PTFs is available, use the one with minimum variance (= optimized).
Credit: Minasny & McBratney
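Principle 2 can be illustrated with a small selection step. This is a sketch under stated assumptions: both PTFs and the validation pairs are hypothetical, and error variance is approximated here by mean squared error on held-out observations, which is one possible reading of "minimum variance".

```python
def ptf_linear(clay_pct):
    """Hypothetical PTF: predicted property rises linearly with clay content."""
    return 0.10 + 0.004 * clay_pct


def ptf_constant(clay_pct):
    """Hypothetical PTF: the same predicted value everywhere."""
    return 0.25


# Hypothetical validation pairs: (clay content in %, measured property value).
validation = [(10, 0.14), (20, 0.19), (30, 0.21), (40, 0.27)]


def mean_squared_error(ptf, data):
    """Average squared difference between predicted and measured values."""
    return sum((ptf(x) - y) ** 2 for x, y in data) / len(data)


candidates = {"linear": ptf_linear, "constant": ptf_constant}
errors = {name: mean_squared_error(f, validation) for name, f in candidates.items()}

# Keep the PTF whose evaluated error is smallest.
best_name = min(errors, key=errors.get)
```

The point is less the arithmetic than the requirement: without validation data there is nothing to evaluate, and the principle says not to use the PTF.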
Predictive Mapping Concepts
From: Dobos et al., 2006 JRC – EUR 22123
A Spatial Soil Inference System (Lagacherie & McBratney, 2005)
[Figure: diagram of a spatial soil inference system. Labels: DTM, RS image, existing soil map, scorpan layers, soil observations; Spatial Soil Information System; DSM function library (scorpan functions, pedotransfer functions, class content functions, allocation functions); function organiser; predictor; user interface; user data; output.]
DSM Methods
Translating Concepts into Results
Approaches to Producing Predictive Area-Class Maps
Unsupervised Classification
• If You Do Not Know (or Are Not Confident That You Know)
  – What spatial entities are optimal to map
  – What their defining attributes are
  – Under what conditions (of input variables) they occur
• Then You Are Best Served by Adopting
  – An unsupervised classification approach
    • ISODATA (Irwin et al., 1997)
    • Fuzzy k-means (Burrough et al., 2002, 2003; Irwin et al., 1997)
Concept of Fuzzy K-means Clustering
Source: J. Balkovič & G. Čemanová
Credit: Sobocká et al., 2003
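The fuzzy k-means idea can be sketched in a few lines. This is a generic textbook formulation, not the implementation used in the cited studies; the sample cells, the starting centres and the fuzziness exponent m = 2 are all assumptions chosen for illustration.

```python
import math


def memberships_for(points, centers, m):
    """Fuzzy membership of every point in every class (each row sums to 1)."""
    power = -2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Guard against a zero distance when a point sits exactly on a centre.
        inv = [max(math.dist(p, c), 1e-12) ** power for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_kmeans(points, centers, m=2.0, n_iter=50):
    """Alternate membership and centre updates for a fixed number of steps."""
    n_dims = len(points[0])
    for _ in range(n_iter):
        u = memberships_for(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            w_sum = sum(w)
            new_centers.append(tuple(
                sum(wk * p[d] for wk, p in zip(w, points)) / w_sum
                for d in range(n_dims)
            ))
        centers = new_centers
    return centers, memberships_for(points, centers, m)


# Hypothetical grid cells described by two terrain attributes.
cells = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centres, u = fuzzy_kmeans(cells, centers=[(0.0, 0.0), (6.0, 6.0)])

# "Hardening": assign each cell to the class with the largest membership.
hardened = [row.index(max(row)) for row in u]
```

No class definitions are supplied; the classes emerge from the attribute data, which is what makes the approach suitable when the entities to map are not known in advance.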
Example of Application of Fuzzy K-means Unsupervised Classification
From: Burrough et al., 2001, Landscape Ecology
Note similarity of unsupervised classes to conceptual classes
Supervised Classification
• If You ARE Confident That You Know
  – What spatial entities are optimal to map
  – That you can consistently identify “A” when you “see it”
• But You ARE NOT Confident That You Know
  – What the defining attributes of the entities are
  – That you can formally express under what conditions (of input variables) the entities to be predicted occur
• Then You Are Best Served by Adopting
  – A supervised (data mining) classification approach
    • Classification and Regression Trees (CART)
    • Bayesian Analysis of Evidence
    • Supervised Fuzzy-logic
Supervised Classification Using Regression Trees
From: Zhou et al., 2004 JZUS
Note similarity of supervised rules and classes to typical soil-landform conceptual classes
Note numeric estimate of likelihood of occurrence of classes
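A single tree split shows where both the rule and the numeric likelihoods come from. This is a one-level sketch using Gini impurity, not the procedure of Zhou et al.; the slope values and class labels are hypothetical training observations.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Threshold on one predictor that minimises the weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    return best


def class_likelihoods(labels):
    """Proportion of each class among the training cases in one leaf."""
    n = len(labels)
    return {label: count / n for label, count in Counter(labels).items()}


# Hypothetical training data: slope (%) at field sites and the observed class.
slope = [1, 2, 3, 5, 12, 15, 18, 2]
observed = ["level", "level", "level", "slope", "slope", "slope", "slope", "slope"]

score, threshold, left, right = best_split(slope, observed)
left_likelihoods = class_likelihoods(left)
right_likelihoods = class_likelihoods(right)
```

The rule reads "slope <= 4 %", and the leaf proportions (0.75 level, 0.25 slope on the left) are the kind of numeric likelihood estimate noted on the slide.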
Supervised Classification Using Bayesian Analysis of Evidence
From: Zhou et al., 2004 JZUS
Note: ultimately this is just a way of establishing numerical measures of the likelihood of occurrence of each class to be predicted given the presence of a predictor class
Note: the final, overall probability value is computed as a weighted average of the individual probabilities of each potential output class given each input class on n input maps
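The weighted average described in the note can be written out for a single grid cell. The probabilities, map names and weights below are hypothetical; in practice the conditional probabilities would be tabulated from training data.

```python
def combine(conditional_probs, weights):
    """Weighted average, per output class, of P(class | input class) over n maps."""
    weight_sum = sum(weights)
    classes = conditional_probs[0].keys()
    return {
        cls: sum(w * probs[cls] for w, probs in zip(weights, conditional_probs)) / weight_sum
        for cls in classes
    }


# For one grid cell: P(output class | the input class found on each map).
# Values and weights are made up for illustration.
at_cell = [
    {"A": 0.6, "B": 0.4},  # given the geology class at this cell
    {"A": 0.2, "B": 0.8},  # given the slope class at this cell
    {"A": 0.5, "B": 0.5},  # given the vegetation class at this cell
]
map_weights = [1, 2, 1]

overall = combine(at_cell, map_weights)
predicted = max(overall, key=overall.get)
```

With these numbers class B receives the higher overall value, mainly because the slope map carries twice the weight of the others.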
Supervised Classification Using Bayesian Analysis of Evidence/Classification Trees
From: Zhou et al., 2004 JZUS
Supervised Classification Using Fuzzy Logic
• Shi et al., 2004
  – Used multiple cases of reference sites
  – Each site was used to establish fuzzy similarity of unclassified locations to reference sites
  – Used Fuzzy-minimum function to compute fuzzy similarity
  – Harden class using largest (Fuzzy-maximum) value
  – Considered distance to each reference site in computing Fuzzy-similarity
Fuzzy likelihood of being a broad ridge
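The fuzzy-minimum and fuzzy-maximum steps can be sketched as follows. This is a simplified illustration, not the method of Shi et al.: the Gaussian-shaped similarity curve, the attribute names, the widths and the single reference site per class are assumptions, and the distance-to-site weighting mentioned above is left out.

```python
import math


def attribute_similarity(value, reference, width):
    """Similarity in [0, 1]; equals 1 when the value matches the reference site."""
    return math.exp(-((value - reference) / width) ** 2)


def site_similarity(cell, site, widths):
    """Fuzzy-minimum: the least similar attribute limits the overall similarity."""
    return min(attribute_similarity(cell[k], site[k], widths[k]) for k in site)


# Hypothetical reference sites, one per class, described by two terrain attributes.
reference_sites = {
    "broad_ridge": {"slope_pct": 2.0, "relief_position": 0.9},
    "toe_slope": {"slope_pct": 3.0, "relief_position": 0.1},
}
widths = {"slope_pct": 5.0, "relief_position": 0.3}

# An unclassified location.
cell = {"slope_pct": 4.0, "relief_position": 0.8}

similarities = {
    name: site_similarity(cell, site, widths)
    for name, site in reference_sites.items()
}

# Fuzzy-maximum: harden to the class with the largest similarity.
hardened_class = max(similarities, key=similarities.get)
```

The `similarities` value for "broad_ridge" plays the role of the "fuzzy likelihood of being a broad ridge" shown on the slide.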
Knowledge-Based Classification
• If You ARE Confident That You Know
  – What spatial entities are optimal and desirable to map
• AND You ARE Confident That You Know
  – What the defining attributes of the entities are
  – That you have a pretty good idea of the conditions (of input variables) that the entities develop under
• Then You May Be Well Served by Adopting
  – A Knowledge-based (heuristic) classification approach
    • Pragmatic, subjective, semantically expressed knowledge
      – SoLIM (Zhu), LandMapR (MacMillan), SIE (Xun Shi)
    • Formal, theoretical, quantitatively defined knowledge
      – Shary et al., 2005 (GeoFis)
From: Zhu, SoLIM Handbook
Knowledge-Based Classification In SoLIM
From: MacMillan, 2005
Knowledge-Based Classification In LandMapR
Source: Steen and Coupé, 1997
PEM DSS Classification Using LandMapR
Normal Mesic
Moist Foot Slope
Warm SW Slope
Shallow Crest
Organic Wetland
Wet Toe Slope
Cold Frosty Wet
Permanent Lake
From: MacMillan, 2005
PEM from a knowledge-based approach can look like a normal PEM
Predictive Mapping
10 Lessons Learned from my BC PEM Mapping Experience
Lesson 1: Define What Constitutes Success!
• Key to Everything Else
  – Can’t achieve success if you don’t know what it is (and how to measure it)
• Measure Success
  – Need a way to measure success objectively
• Establish Standards
  – Need to set targets that can be realistically met
  – Figure out what you need and not what you feel you want
Lesson 2: Organize for Success – Partition work
• Key is to split work up
  – Don’t try to do everything yourself
  – Do what you do best and let others do what they do best
  – Don’t give implementer control over time and budget
  – Check and verify
[Figure: project organization chart. Boxes:
Forest Industry Clients
Government Funding Programs
Dedicated Project Manager
Project Technical Monitor
External Compliance Auditors
GIS Input Data Preparation Specialists
Local Knowledge Expert
Knowledge Engineer & Mapper
Independent Field Accuracy Assessors
Government Published Knowledge
Government Digital Data Repository
Government Published Standards
Research and Development Environment: Theory, Methods, Data, Tools, Software]
Lesson 3: Test and Verify All of Your Assumptions – Objectively!
• PEM Pilot – 2002/03
  – Automated methods will be less costly than traditional manual ones
  – Intensive manual interpretation and field sampling will produce more accurate maps than those produced by automated modeling
• Canim Lake PEM Operational Scale-up – 2003/04
  – Automated predictive methods aren’t scalable for operational mapping
  – Finer resolution DEM data (5 & 10 vs. 25 m) will yield more accurate maps
• Quesnel Operational PEM – 2004/05
  – Unit costs can go down with efficiencies of scale as larger areas are mapped
  – Single sets of KB rules can apply to entire BEC subzones
• East Williams Lake Operational PEM – 2005/06
  – Local experts can agree on correct classification in the field at 100% of visited locations
  – Areas of elevated frost hazard can be predicted to occur in structural hollows
• East Quesnel and West Williams Lake Operational PEMs – 2006/07
  – Land Cover information from LandSat imagery is not useful for PEMs
Lesson 4: There’s More Than One Way to Skin this Cat!
• Use Expert Knowledge to Predict PEM Entities
• Use Data Mining – To Develop Statistical Classification Rules
Lesson 5: Select Appropriate Predictor Inputs!
• Predictors are More Important than Methods
  – Appropriate predictors co-vary with entities to be predicted (at that scale)
  – Multi-scale inputs are being used increasingly
    • Expanding windows
  – Measures of context and pattern are important
    • Replacing local measures of slope and shape
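The effect of expanding windows can be shown with a simple relative-position measure: elevation of a cell minus the mean elevation of its neighbourhood, computed at more than one window size. The tiny DEM and the choice of measure are assumptions for illustration, not the specific derivatives used in the PEM projects.

```python
def window_mean(grid, row, col, radius):
    """Mean of all cells within `radius` cells of (row, col), clipped at the edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [
        grid[r][c]
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
    ]
    return sum(values) / len(values)


def relative_position(grid, row, col, radii):
    """Elevation minus neighbourhood mean, for each window radius."""
    return {
        radius: grid[row][col] - window_mean(grid, row, col, radius)
        for radius in radii
    }


# Hypothetical 5 x 5 DEM (metres): a small rise sitting inside a broad hollow.
dem = [
    [30, 30, 30, 30, 30],
    [30, 10, 10, 10, 30],
    [30, 10, 14, 10, 30],
    [30, 10, 10, 10, 30],
    [30, 30, 30, 30, 30],
]

position = relative_position(dem, row=2, col=2, radii=[1, 2])
```

In the 3 x 3 window the centre cell is higher than its surroundings; in the 5 x 5 window it is lower. A purely local measure would miss that the cell lies in a hollow, which is the context the slide refers to.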
Lesson 6: DEMs Don’t Tell You Everything!
• You Need to Make Use of Ancillary Predictor Data Sets
  – DEMs can tell you:
    • SHAPE & SIZE (at a specific scale)
    • CONTEXT & PATTERN
    • ORIENTATION
  – DEMs can’t tell you:
    • SUB-SURFACE ATTRIBUTES
      – Texture or Mineralogy
      – Water table depth or seepage
    • SURFACE COVER ATTRIBUTES
      – Land use, land cover, vegetation
[Figure: 5 m DEM and 25 m DEM panels, each labeled 800 m and 900 m]
Ancillary data sets are important and needed!
• Radiometrics for Subsurface
• Imagery for Surface Cover
Lesson 7: Hierarchies Establish Context!
• One Set of Rules Can’t Fit Everywhere
  – Sub-divide map areas into successively smaller and more homogeneous “classification domains”
  – Develop and apply different KB rules in different map domains
  – Knowing “where” you are tells you what to expect = CONTEXT!
Source: Steen and Coupé, 1997
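One way to picture domain-specific rules is a lookup from classification domain to an ordered rule list. The sketch below is hypothetical throughout: the domain names, the wetness attribute and the thresholds are invented, and only the class names are borrowed from the LandMapR legend shown earlier.

```python
# Ordered rules per classification domain: the first matching rule wins.
# Domain names, the "wetness" attribute and all thresholds are hypothetical.
RULES_BY_DOMAIN = {
    "dry_domain": [
        (lambda cell: cell["wetness"] > 0.8, "Moist Foot Slope"),
        (lambda cell: True, "Normal Mesic"),
    ],
    "wet_domain": [
        (lambda cell: cell["wetness"] > 0.8, "Organic Wetland"),
        (lambda cell: cell["wetness"] > 0.5, "Wet Toe Slope"),
        (lambda cell: True, "Normal Mesic"),
    ],
}


def classify(cell):
    """Pick the rule set for the cell's domain, then apply the first matching rule."""
    for condition, label in RULES_BY_DOMAIN[cell["domain"]]:
        if condition(cell):
            return label
    return None


# The same terrain signal gives a different class in a different domain.
same_wetness_dry = classify({"domain": "dry_domain", "wetness": 0.9})
same_wetness_wet = classify({"domain": "wet_domain", "wetness": 0.9})
```

Nesting further levels (for example region, then subzone, then landform) would follow the same pattern, with each level narrowing which rules apply.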
Lesson 8: Don’t Model What You Can Directly Map More Efficiently!
Principle 1:
Do not predict something that is easier to measure or map than to predict!
So, if you can map it manually faster or better, do not hesitate to do so!
Lesson 9: Don’t Expect Perfection!
• These are concepts and NOT reality
  – Any number of maps may be equally “good”
  – N experts will never agree at all locations
  – Input data and models are not perfect
  – Need to stop when map is “good enough”
  – More time and effort often give poorer results
      T28B  T28K  T28O  T28R  T28M   B-K   B-O   B-R   K-O   K-R   O-R
00      12    10     5     7     0    10     5     7     5     7     5
01      28    31    35    50    40    28    28    28    31    31    34
02      19     7    23     4     6     7    19    19     7     4     4
03      20    12     5     2     9    12     5     2     5     2     2
04      10    19     1    15    26    10    10    10     1    15     1
05       1     9     0     1     7     1     1     1     0     1     0
06      10     3     4     0     2     3     3     0     3     0     0
07       0     6     5    21    10     0     0     0     5     6     5
08       0     3     3     0     0     0     0     0     3     0     0
09       0     0    21     0     0     0     0     0     0     0     0
Sum                                   71    71    67    60    66    51

Agreement between Ecologists: 64
Agreement between Map and Ecologists: 65
Overall Ecologist Agreement: 65
Overall Map Agreement: 67
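The pairwise agreement figures can be partly reproduced from the table. Taking the smaller of two values in each row and summing gives the listed totals for the K-O and K-R columns (60 and 66); whether every pairwise column was computed this way is an assumption, since a few entries in the other columns do not follow that rule. The average of the six listed pairwise totals rounds to the stated ecologist agreement of 64.

```python
# Columns copied from the table (rows 00 to 09).
t28k = [10, 31, 7, 12, 19, 9, 3, 6, 3, 0]
t28o = [5, 35, 23, 5, 1, 0, 4, 5, 3, 21]
t28r = [7, 50, 4, 2, 15, 1, 0, 21, 0, 0]


def overlap(a, b):
    """Sum over rows of the smaller of the two values (shared proportion)."""
    return sum(min(x, y) for x, y in zip(a, b))


k_o = overlap(t28k, t28o)
k_r = overlap(t28k, t28r)

# Pairwise totals as listed in the table: B-K, B-O, B-R, K-O, K-R, O-R.
pairwise_totals = [71, 71, 67, 60, 66, 51]
mean_pairwise = sum(pairwise_totals) / len(pairwise_totals)
```

Even between experts the overlap is well short of 100, which is the basis for the "good enough" argument in Lesson 9.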
Lesson 10: Discontinuities are Important!
From: Minar and Evans, 2008
Predictive Mapping
Some lessons learned from my recent global soil mapping experiences
Some Lessons Learned
• The process is more important than the product
  – The final predictive map output product is not the most important product.
  – The most important product is the process used to create the predictive maps.
  – The map is diminished in value if it cannot be easily updated, improved and replicated.
  – The process has to capture and retain all inputs, all procedures and all outputs.
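Capturing inputs, procedures and outputs can be as simple as writing a record for every run. The sketch below is one possible shape for such a record, using only the Python standard library; the function names, the record fields and the toy wetness classifier are all hypothetical.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable SHA-256 hash of any JSON-serialisable object."""
    text = json.dumps(obj, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_and_record(name, procedure, inputs, params):
    """Run one mapping step and keep everything needed to repeat it."""
    outputs = procedure(inputs, **params)
    return {
        "procedure": name,
        "params": params,
        "inputs": inputs,
        "input_hash": fingerprint(inputs),
        "outputs": outputs,
        "output_hash": fingerprint(outputs),
    }


def classify_wetness(inputs, threshold):
    """Toy mapping step: label each cell wet or dry."""
    return ["wet" if w > threshold else "dry" for w in inputs["wetness"]]


observations = {"wetness": [0.2, 0.9, 0.6]}
record = run_and_record("classify_wetness", classify_wetness, observations, {"threshold": 0.5})

# Re-running with the same inputs and parameters reproduces the same output hash.
rerun = run_and_record("classify_wetness", classify_wetness, observations, {"threshold": 0.5})
```

When new observations arrive, the stored procedure and parameters can be re-applied and the hashes compared to see what changed.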
A proposal for a centralized ecological information facility
• Rationale:
  – Link individual components together to ensure that ecological information products are:
    • Complete: information of similar content and appearance is provided everywhere and not just for a scattered patchwork.
    • Consistent: products avoid glaring inconsistencies, abrupt discontinuities and clearly recognizable differences between different project areas or mapping entities.
    • Correct: all products are as correct, or accurate, as possible, assessed relative to real data using objectively defined criteria.
    • Current: all data are as up to date and current as possible and new versions of outputs can be regularly and easily produced.
• Key objective
  – To provide an overarching methodological framework and operational platform for the production of consistent province-wide ecological information products.
A proposal for a centralized ecological information facility
• Basic concept
  – Provide an overarching methodological framework that links individual components (bits) into an integrated whole whose functions interact intelligently to produce consistent outputs.
    • An Open Database of field observations and classifications
    • A repository of consistent gridded covariate maps
    • A linked library of complementary functions and utilities (mostly but not exclusively produced using R) for manipulating and processing the preceding data sets to automatically produce models and maps of ecological entity spatial patterns (and uncertainties) according to agreed specifications.
    • A platform and utilities for discovering, displaying and retrieving grid maps of soil properties for any area of interest.
An Example of a Central Information Facility (GSIF)
• Build and maintain the cyber-infrastructure
  – Think of Facebook, Twitter or Google
    • You provide the platform and the functionality
    • You establish the templates, standards and tools
    • Users contribute content and effort to create content
• Make it easy to use and rapid to update
  – Maps become best models of current reality
    • Based on automated analysis of latest inputs and data
    • Let the maps constantly grow and improve, not static
    • Move towards real-time dynamic maps not one-time static maps
What do I recommend?
• Expert rule based models take time and effort
  – Slow and costly to redo or update
  – Subject to bias and differences in experience
• Statistical models from data mining
  – Will often produce results inferior to ones created using expert knowledge
  – But offer the ability to continuously and regularly improve the rules and update the maps automatically
  – Are optimized, in the sense that the output maps fit the available observations as closely as possible
Why do I recommend this?
Thank You
And Good Luck!