paradigm shifts in wildlife and biodiversity management through machine learning

45
e Learning in the Environmental Disciplin atistical Paradigm Shifts, Combining Data Mining ographic Information Systems (GIS), and Educating Next Generation of Environmental Researchers Falk Huettmann -EWHALE lab- 419 Irving 1 Inst. of Arctic Biology Biology & Wildlife Dept Fairbanks, Alaska 99775 USA [email protected]

Upload: salford-systems

Post on 20-Aug-2015

1.106 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: Paradigm shifts in wildlife and biodiversity management through machine learning

Machine Learning in the Environmental Discipline:Statistical Paradigm Shifts, Combining Data Mining

with Geographic Information Systems (GIS), and Educating theNext Generation of Environmental Researchers

Falk Huettmann-EWHALE lab-419 Irving 1Inst. of Arctic BiologyBiology & Wildlife DeptFairbanks, Alaska 99775 USA [email protected]

Page 2: Paradigm shifts in wildlife and biodiversity management through machine learning

Applying and Teaching Machine Learning at Public Institutions

What’s at stake ?

-a better pre-screening of data (data mining)-a better inference (better model fits)-a better hypothesis (more relevant)-a faster and more accurate analysis-a better prediction (pro-active decisions) => progress in science & society => a new culture

Page 3: Paradigm shifts in wildlife and biodiversity management through machine learning

This presentation

-Environmental Sciences (Sustainability)

-Paradigm Shift in the Sciences

-Geographic Information Systems (GIS)

-Teaching Machine Learning at Public Institutions

Page 4: Paradigm shifts in wildlife and biodiversity management through machine learning

The earth and its atmosphere ...we only have one ...

Page 5: Paradigm shifts in wildlife and biodiversity management through machine learning

The atmosphere

Page 6: Paradigm shifts in wildlife and biodiversity management through machine learning

Rio Convention 1992 and its goals

Page 7: Paradigm shifts in wildlife and biodiversity management through machine learning

…people are everywhere these days, mostly at the coast, affecting Wildlife,Habitats and Ecological Services

The Earth at night…

Page 8: Paradigm shifts in wildlife and biodiversity management through machine learning

An Earth Sampler: The Three Poles

1.

2.

3.

Page 9: Paradigm shifts in wildlife and biodiversity management through machine learning

Three poles: Ice, glaciers and snow

=> A cool (or a hot?) place to be...

1. 2. 3.

Page 10: Paradigm shifts in wildlife and biodiversity management through machine learning

No data …and “nothing is known” …?Climbing and Understanding Data Mountains with Data Mining:

A free copy of GBIF data… www.gbif.org, seamap.env.duke.edu/ , mountainbiodiversity.org

Megascience

Page 11: Paradigm shifts in wildlife and biodiversity management through machine learning

Examples of people using and publishing onMachine Learning in the Biodiversity, Wildlifeand Sustainability Sciences…

Page 12: Paradigm shifts in wildlife and biodiversity management through machine learning

Classify, pre-screen, explain and predict data

A heap of data, e.g. online and merged,free of research design

Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 +…. (e.g. clustering & multiple regression)

Page 13: Paradigm shifts in wildlife and biodiversity management through machine learning

Multiple Regressions: Are they any good ?!

Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 +….

The issues are for instance: -Many predictors, e.g. 30

-Correlations

-Interactions (2-way, 3-way and higher)

-Coefficients + Offset

-(Multi-) hypothesis testing

-Model Selection

-Parsimony

= > A single formula vs a software code as a result ?!

Page 14: Paradigm shifts in wildlife and biodiversity management through machine learning

What statistical philosophies and schools exist ?

-Frequency Statistics

-Bayesian Statistics

-Machine Learning

Occam(Parsimony)

R. A. Fisher

Bayes

L. Breiman

Page 15: Paradigm shifts in wildlife and biodiversity management through machine learning

What statistical philosophies and schools exist ?

e.g. Frequency Statistics-Normal distribution-Independence-Linear functions-Variance explained-Parsimony

Page 16: Paradigm shifts in wildlife and biodiversity management through machine learning

What statistical philosophies and schools exist ?

e.g. Bayesian Statistics-Priors-Fit a function-Variance explained-(alternative) confidence intervals-Linear functions (usually)

Page 17: Paradigm shifts in wildlife and biodiversity management through machine learning

What statistical philosophies and schools exist ?

e.g. Machine Learning-Inherently non-linear-No prior assumptions (non-parametric)-Fast-Computationally intense

Page 18: Paradigm shifts in wildlife and biodiversity management through machine learning

What linear models and frequency statistics cannot achieve

‘Mean’SD => potentially low r2

‘Mean’ ?SD ?Machine Learning, e.g.CART, TreeNet & RandomForest(there are many other algorithms !)

Linear(~unrealistic)

Non-Linear(driven by data)

Virtually not Pro-Active Pro-active Potential ~Pre-cautionary

Page 19: Paradigm shifts in wildlife and biodiversity management through machine learning

A numerical pattern The mimick’ed pattern

Machine Learning algorithm

Virtually NOT parsimonious (no AIC, no R2 etc etc)

What machine learning is

Page 20: Paradigm shifts in wildlife and biodiversity management through machine learning

What is a non-linear algorithm …?

Through the origin vs. Not through the origin

Single formula vs. Many formulas

Page 21: Paradigm shifts in wildlife and biodiversity management through machine learning

What is a non-linear algorithm …?

Through the origin vs. Not through the origin

Page 22: Paradigm shifts in wildlife and biodiversity management through machine learning

What is a non-linear algorithm …?

Single formula vs. Many formulas

Page 23: Paradigm shifts in wildlife and biodiversity management through machine learning

What is a non-linear algorithm …?

Through the origin vs. Not through the originSingle formula vs. Many formulas

A binary software algorithm that captures the mathematical relationship vs. A single formula with (a) coefficient

Page 24: Paradigm shifts in wildlife and biodiversity management through machine learning

Linear regression: a+ bx vs. Machine Learning?!

Linear regression as an artefact, aka statistical & institutional terror ?!

Page 25: Paradigm shifts in wildlife and biodiversity management through machine learning

Predictions as the prime goal, and linked with Adaptive Management(done in Random Forest)

Pelagic King Penguin prediction using RF (french telemetry data from SCAR-MARBIN)

Best dataused and available ?

Predictions

Page 26: Paradigm shifts in wildlife and biodiversity management through machine learning

Outliers and variance …

Root Mean Square Error (RMSE)

real

ity

predicted

Page 27: Paradigm shifts in wildlife and biodiversity management through machine learning

Outliers and variance …

Confusion Matrix (=> ROC curve)

PredictedYes No80% 20%

20% 80%

Rea

lity

No

Yes

Page 28: Paradigm shifts in wildlife and biodiversity management through machine learning

Your classic Frequentist Statisticianmight get very scared and horrifiedwith such data…

A heap of data, e.g. online and merged,free of research design

Moving into Data Mining (non-parametric, rel. high data error)

Page 29: Paradigm shifts in wildlife and biodiversity management through machine learning

Describe Patterns,Outliers

InformedTraditional

Analysis

Data Quantification (~prediction)Sc

reen

ing

A heap of data, e.g. online and merged

Craig, E., and F. Huettmann. (2008). Using “blackbox” algorithms such as TreeNet and RandomForests for data-mining and for finding meaningful patterns, relationships and outliers in complexecological data: an overview, an example using golden eagle satellite data and an outlook for apromising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies throughPattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.

Moving into Data Mining

Page 30: Paradigm shifts in wildlife and biodiversity management through machine learning

Craig, E., and F. Huettmann. (2008). Using “blackbox” algorithms such as TreeNet and RandomForests for data-mining and for finding meaningful patterns, relationships and outliers in complexecological data: an overview, an example using golden eagle satellite data and an outlook for apromising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies throughPattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.

Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predictedspecies distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of ComputationalIntelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence,Vol. 122, Springer-Verlag Berlin Heidelberg.

DataMining

Describe Patterns,Outliers

Informed(Traditional)

Analysis

Scre

enin

g

A heap of data, e.g. online and merged

Data Quantification (~prediction)

Page 31: Paradigm shifts in wildlife and biodiversity management through machine learning

Craig, E., and F. Huettmann. (2008). Using “blackbox” algorithms such as TreeNet and RandomForests for data-mining and for finding meaningful patterns, relationships and outliers in complexecological data: an overview, an example using golden eagle satellite data and an outlook for apromising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies throughPattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.

Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predictedspecies distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of ComputationalIntelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence,Vol. 122, Springer-Verlag Berlin Heidelberg.

DataMining

Describe Patterns,Outliers

Informed(Traditional)

Analysis

SOFTWARE This becomes a standard, e.g. BEST PROFESSIONAL PRACTICE

Scre

enin

g

A heap of data, e.g. online and merged

Data Quantification (~prediction)

Page 32: Paradigm shifts in wildlife and biodiversity management through machine learning

(Adaptive/Science-based) Wildlife Management:

How it works…-Identify a problem

-Do the science

-Provide a management suggestion

-Implement science-based management

-Assess and fine-tune

=>Pro-active (before trouble occurs)=> Predictive in Space and Time

Page 33: Paradigm shifts in wildlife and biodiversity management through machine learning

Paradigm Shifts-Data Mining as a means to inform an analysis

-Predictions as a goal in itself

-Linear regressions and algorithms as a poor performer for modern problems of relevance to mankind

-Metrics like r2, p-values and AIC as being suboptimal

-Statistics based on (very advanced) software

-Decision-support systems, Pro-Active Management and Operations Research

Page 34: Paradigm shifts in wildlife and biodiversity management through machine learning

RandomForest as a scientific discipline ?!

-a big and new research field per se (e.g. machine learning, ensemble models)

-comes naturally with computing power and the internet

-changes science (and fieldwork, e.g. design-based)

-changes education (e.g. universities & stats depts)

-changes pro-active/adaptive management, multi-species

-changes society

-affects land- and seascapes

Page 35: Paradigm shifts in wildlife and biodiversity management through machine learning

Adding Spatial Complexity:What is a GIS

(Geographic Information System) ?“Computerized Mapping”

Consists of Hardware & Software

(Monitor, PC and Database)

ArcGIS by ESRI as a major software suite

=>Can be linked with Machine Learning, e.g. for predictions, classifications and data mining

Page 36: Paradigm shifts in wildlife and biodiversity management through machine learning

What is a GIS(Geographic Information System) ?

Page 37: Paradigm shifts in wildlife and biodiversity management through machine learning

RandomForest & GIS: Mapping Supervised & Unsupervised Classification

Supervised Classification: -Multiple Regression (classification or continuous)

-Multiple Response e.g. YAIMPUTE

Unsupervised Classification: 1. Proximity Matrix via Bagging/Voting (RF)

2. Similarity Matrix

3. e.g. Regular Clustering (mclust, PAM)

3. Visualize Result

RandomForest

Page 38: Paradigm shifts in wildlife and biodiversity management through machine learning

11 Cliome Clusters (RF)Climate Cluster Data, Canada & Alaska

Credit: M. Lindgren et al.

Page 39: Paradigm shifts in wildlife and biodiversity management through machine learning

Teaching Machine Learning What’s the issue ?

-the current teaching concept re. math & stats

-wider introduction of machine learning

-parsimony vs. holistic analysis and approach

-better predictions

Page 40: Paradigm shifts in wildlife and biodiversity management through machine learning

Teaching math, statistics and machine learning…

Usually, machine learning starts wheremaximum-likelihood theory ends…

=>maximum-likelihood starts at college…

But it does not have to be that way!

Page 41: Paradigm shifts in wildlife and biodiversity management through machine learning

Teaching math, statistics and machine learning…

Student’s age Topic learned

16 Basic Math

20 Frequency Statistics (Maximum Likelihood)

23 - 30 Machine Learning

Page 42: Paradigm shifts in wildlife and biodiversity management through machine learning

Teaching math, statistics and machine learning…the better way

Student’s age Topic learned 15 Basic Math 16 Frequency Statistics 18 (Maximum Likelihood) 19 Machine Learning Why & How done ?

via Greybox Blackbox

Page 43: Paradigm shifts in wildlife and biodiversity management through machine learning

Teaching math, statistics and machine learning…

Parsimony (complexity vs. variance explained)

Model name AIC etcModel 1 1000Model 2 900Model 3 600Model 4 500Model 5 1200Model 6 1100

Breaking down a complexreal-world problem into a singlelinear term…and for a mechanistic understanding…??

Page 44: Paradigm shifts in wildlife and biodiversity management through machine learning

Teaching math, statistics and machine learning…

Predictions (knowing things ahead of time)

For spatial predictionsFor temporal predictionsFor a pro-active management

Page 45: Paradigm shifts in wildlife and biodiversity management through machine learning

To…Salford Systems, Dan Steinberg and his team, our students, L. Strecker,S. Linke, and many many project and research collaboratorsworld-wide over the years!!

Questions ?