Modeling Presence/Absence Data


Upload: jenny

Post on 25-Feb-2016



TRANSCRIPT

Page 1: Modeling Presence/Absence Data

Modeling Presence/Absence Data

Acknowledgements to WyomingFishing.net (electro-fishing pics) and Michael Houts (wolf data and article)

Page 2: Modeling Presence/Absence Data

Counting Fish

• How are fish numbers calculated?

• “There are approximately 3200 trout per mile that are greater than 6 inches on the Miracle Mile… 66% are Browns, 29% are Rainbows and 5% are Snake River Cutts”

• Where does this information come from?

• There are many ways to count fish and conduct fish surveys; we will discuss electrofishing

Page 3: Modeling Presence/Absence Data

The Miracle Mile

Page 4: Modeling Presence/Absence Data


Background – Electrofishing

• Electrofishing
– Portable generator
– DC current from generator is at ??? volts to immobilize fish
– Probes are electrodes which provide positive end of current
– Nets are called “dip nets”
– Back-up samplers catch missed fish
– Fish placed in a flooded net

Page 5: Modeling Presence/Absence Data
Page 6: Modeling Presence/Absence Data

How Much Voltage to Use??

• Determining the correct voltage is important: too little voltage will not allow sufficient capture; too much… well, we know what that means!

• Voltage studies involve setting up tanks with water chemistry similar to that of the stream of interest: place fish in the tank, apply a given voltage, and observe whether the fish are immobilized

• electricfish.csv contains such a dataset

Page 7: Modeling Presence/Absence Data


Since our interest is in determining what sort of relationship may exist between Y (whether a fish is immobilized) and X, we can take a regression approach. We believe that the probability of immobilization, $\pi$, is a function of X = voltage.

Page 8: Modeling Presence/Absence Data


If we use the following linear relationship between Y and X, with $\pi$ the probability of immobilization,

$$\pi = \beta_0 + \beta_1\,\text{Voltage},$$

Problems???
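One way to see the problem is to fit a straight line to 0/1 data and look at its predictions. The sketch below uses simulated data as a stand-in (it is not the electricfish.csv data; the generating curve and the variable names are assumptions made only for illustration).

```r
# Simulated stand-in for a voltage study; NOT the electricfish.csv data
set.seed(1)
voltage <- rep(seq(100, 180, by = 10), each = 10)

# Hypothetical curve used only to generate 0/1 responses
p_true   <- plogis(-14 + 0.1 * voltage)
response <- rbinom(length(voltage), size = 1, prob = p_true)

# Straight-line fit to the 0/1 responses
linear_fit <- lm(response ~ voltage)

# Predict inside and beyond the simulated voltage range
new_v       <- data.frame(voltage = c(60, 100, 140, 180, 220))
linear_pred <- predict(linear_fit, newdata = new_v)
print(cbind(new_v, linear_pred))

# Flag any "probabilities" that fall outside [0, 1]
any_illegal <- any(linear_pred < 0 | linear_pred > 1)
print(any_illegal)
```

A line with nonzero slope is unbounded, so for voltages far enough from the middle of the data it must return values below 0 or above 1.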

Page 9: Modeling Presence/Absence Data

Need “Legal” Estimates

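Need a function relating the mean to the predictor (voltage) that binds the mean to an appropriate scale. For example, if the mean cannot take negative values, use a log link (writing $\mu(x)$ for the mean response at $x$):

$$\ln \mu(x) = \beta_0 + \beta_1 x_1 \quad\Longleftrightarrow\quad \mu(x) = \exp(\beta_0 + \beta_1 x_1)$$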

Page 10: Modeling Presence/Absence Data

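“Legal” continued…

For Bernoulli data, we want the mean $\mu(x)$ to be between 0 and 1. A common mechanism for achieving this is the logit link:

$$\ln\left(\frac{\mu(x)}{1-\mu(x)}\right) = \beta_0 + \beta_1 x_1$$

$$\mu(x) = \frac{\exp(\beta_0 + \beta_1 x_1)}{1 + \exp(\beta_0 + \beta_1 x_1)}$$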
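The logit link and its inverse can be written directly in R. This is a minimal sketch: the helper names `logit` and `inv_logit` are made up here, and the `eta` values are arbitrary. Base R's `qlogis()` and `plogis()` compute the same two functions.

```r
# Logit link: maps a mean in (0, 1) onto the whole real line
logit <- function(mu) log(mu / (1 - mu))

# Inverse logit: maps any real linear predictor back into (0, 1)
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

eta <- c(-10, -2, 0, 2, 10)   # arbitrary linear-predictor values
mu  <- inv_logit(eta)
print(mu)

# Check against base R and check that the two functions undo each other
same_as_base <- isTRUE(all.equal(mu, plogis(eta))) &&
  isTRUE(all.equal(logit(mu), qlogis(mu)))
round_trip <- isTRUE(all.equal(logit(mu), eta))
```

Whatever value the linear predictor takes, the inverse logit returns a “legal” mean strictly between 0 and 1.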

Page 11: Modeling Presence/Absence Data

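Using R

# Read the voltage study data (electricfish.csv, described earlier)
electric <- read.csv("electricfish.csv")

# Logit fit using the glm function
model1 <- glm(Response ~ Voltage, family = binomial(logit), data = electric)
summary(model1)

# Linear regression fit using the lm function
model2 <- lm(Response ~ Voltage, data = electric)

# fitted() returns fitted probabilities for the glm and fitted means for the lm
logitfits  <- fitted(model1)
linearfits <- fitted(model2)

# Store both sets of fitted values alongside the data
electric$logits <- logitfits
electric$linear <- linearfits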

Page 12: Modeling Presence/Absence Data

Plotting Using ‘Lattice’

R Code:

library(lattice)
xyplot(electric$Response + electric$logits + electric$linear ~ electric$Voltage,
       aspect = 1,
       panel = function(x, y) {
         # Observed 0/1 responses
         panel.xyplot(electric$Voltage, electric$Response, col = "black")
         # Fitted values from the logit and linear models, drawn as smooth curves
         panel.xyplot(electric$Voltage, electric$logits, type = "smooth")
         panel.xyplot(electric$Voltage, electric$linear, type = "smooth")
       },
       xlab = "Voltage", ylab = "Immobilized", data = electric)

Page 13: Modeling Presence/Absence Data

The ‘best fit’ logit to our data compared to the ‘best fit’ line is given by

Page 14: Modeling Presence/Absence Data

Predicted Probabilities

The model is not in terms of $\pi_i$ though; it is instead in terms of the ‘log of the odds’, or the ‘logit’:

$$\ln\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1\,\text{Voltage}$$

Thus, in ‘logistic regression’ we model the log odds of an event. Even though we are likely more interested in $\pi_i$ than we are in the log odds, the log odds have ‘nicer mathematical properties’, so we do inference for the log odds and then back-transform to get the estimated value of $\pi_i$:

$$\hat{\pi}_i = \frac{\exp\left(\hat{\beta}_0 + \hat{\beta}_1\,\text{Voltage}\right)}{1 + \exp\left(\hat{\beta}_0 + \hat{\beta}_1\,\text{Voltage}\right)}$$
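In R, inference on the log-odds scale followed by back-transformation might look like the sketch below. It uses simulated data (not the electricfish.csv data); the data frame `sim`, the generating coefficients, and the 1.96 normal multiplier for an approximate 95% interval are assumptions for illustration.

```r
# Simulated stand-in data; NOT the electricfish.csv data
set.seed(2)
sim <- data.frame(Voltage = rep(seq(100, 180, by = 10), each = 10))
sim$Response <- rbinom(nrow(sim), size = 1, prob = plogis(-14 + 0.1 * sim$Voltage))

fit <- glm(Response ~ Voltage, family = binomial(link = "logit"), data = sim)

new_v <- data.frame(Voltage = c(120, 140, 160))

# Estimated log odds and their standard errors (the scale the model works on)
pred <- predict(fit, newdata = new_v, type = "link", se.fit = TRUE)

# Back-transform the estimate and an approximate 95% interval to probabilities
prob_bt <- plogis(pred$fit)
lower   <- plogis(pred$fit - 1.96 * pred$se.fit)
upper   <- plogis(pred$fit + 1.96 * pred$se.fit)

# predict(type = "response") performs the same back-transformation
prob_dir <- predict(fit, newdata = new_v, type = "response")
agree <- isTRUE(all.equal(unname(prob_bt), unname(prob_dir)))

print(data.frame(new_v, lower, estimate = prob_bt, upper))
```

Because the interval is built on the log-odds scale and then passed through the inverse logit, its endpoints cannot fall outside 0 and 1.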

Page 15: Modeling Presence/Absence Data

So our estimated model is given by:

$$\hat{\pi} = \frac{\exp\left(-13.84 + 0.1008\,\text{Voltage}\right)}{1 + \exp\left(-13.84 + 0.1008\,\text{Voltage}\right)}$$

[Figure annotations: “Small change in prob”, “Big change in prob”]
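The estimated model can be evaluated directly to see where a change in voltage matters most. This sketch uses only the reported coefficients (-13.84 and 0.1008); the function name `pi_hat` and the voltages compared are illustrative choices.

```r
# Coefficients as reported in the estimated model
b0 <- -13.84
b1 <- 0.1008

# Estimated probability of immobilization at a given voltage
pi_hat <- function(voltage) plogis(b0 + b1 * voltage)

# Voltage at which the estimated probability is 0.5 (log odds equal to zero)
v50 <- -b0 / b1

# Effect of the same 10-volt increase at two places on the curve;
# the 100-to-110 range is an arbitrary illustrative choice
change_tail   <- pi_hat(110) - pi_hat(100)
change_middle <- pi_hat(v50 + 5) - pi_hat(v50 - 5)

print(round(c(v50 = v50, change_tail = change_tail,
              change_middle = change_middle), 3))
```

Here `v50` works out to about 137.3 volts, and a 10-volt step centred there changes the estimated probability by much more than the same step from 100 to 110 volts.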

Page 16: Modeling Presence/Absence Data

Resource Selection

• Understanding habitat selection by animals, plants, and aquatic species is an important problem faced by ecologists and wildlife biologists worldwide

• If we know what sort of habitat critters select for, we can better manage these species

• We will consider a data set, wolves_geo.csv, which reports wolf occurrence in two years following wolf re-introduction in the Greater Yellowstone Area

Page 17: Modeling Presence/Absence Data

Data Description

• RD_DENSITY is a measure of, well, road density

• WOLVES_99 = 2 means the data came after 1999

• MAJOR_LC = codes for major land cover types (descriptions in landcover.txt)

• Paper: Houts03.pdf

Page 18: Modeling Presence/Absence Data

Project

Fit the logistic regression model to the 1999 data set and create a column of predicted “probabilities” of wolf occurrence.

Are the “predicted probabilities” really “probabilities”?

Can we use this model to predict likely wolf occurrence across the five-state region? Why/why not?

Can you think of a better ‘design’ for building the initial model?