Generalized Linear Model (GZLM): Overview
Dependent Variables
• Continuous
• Discrete
– Dichotomous
– Polytomous
– Ordinal
– Count
Continuous Variables
• Quantitative variables that can take on any value within the limits of the variable
Continuous Variables (cont’d)
• Distance, time, or length
• Infinite number of possible divisions between any two values, at least theoretically
• “Only love can be divided endlessly and still not diminish” (Anne Morrow Lindbergh)
• More than 11 ordered values
– Scores on standardized scales such as those that measure parenting attitudes, depression, family functioning, and children’s behavioral problems
Discrete Variables
• Finite number of indivisible values; cannot take on all possible values within the limits of the variable
– Dichotomous
– Polytomous
– Ordinal
– Count
Dichotomous Variables
• Two categories used to indicate whether an event has occurred or some characteristic is present
• Sometimes called binary or binomial variables
• “To be or not to be, that is the question” (William Shakespeare, “Hamlet”)
Dichotomous DVs
• Placed in foster care or not
• Diagnosed with a disease or not
• Abused or not
• Pregnant or not
• Service provided or not
Polytomous Variables
• Three or more unordered categories
• Categories mutually exclusive and exhaustive
• Sometimes called multicategorical or sometimes multinomial variables
• “Inanimate objects can be classified scientifically into three major categories; those that don't work, those that break down and those that get lost” (Russell Baker)
Polytomous DVs
• Reason for leaving welfare:
– marriage, stable employment, move to another state, incarceration, or death
• Status of foster home application:
– licensed to foster, discontinued application process prior to licensure, or rejected for licensure
• Changes in living arrangements of the elderly:
– newly co-residing with their children, no longer co-residing, or residing in institutions
Ordinal Variables
• Three or more ordered categories
• Sometimes called ordered categorical variables or ordered polytomous variables
• “Good, better, best; never let it rest till your good is better and your better is best” (Anonymous)
Ordinal DVs
• Job satisfaction:
– very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, or very satisfied
• Severity of child abuse injury:
– none, mild, moderate, or severe
• Willingness to foster children with emotional or behavioral problems:
– least acceptable, willing to discuss, or most acceptable
Count Variables
• Number of times a particular event occurs to each case, usually within a given:
– Time period (e.g., number of hospital visits per year)
– Population size (e.g., number of registered sex offenders per 100,000 population), or
– Geographical area (e.g., number of divorces per county or state)
• Whole numbers that can range from 0 through $+\infty$
Count Variables (cont’d)
• “Now I've got heartaches by the number, Troubles by the score, Every day you love me less, Each day I love you more” (Ray Price)
Count DVs
Number of hospital visits, outpatient visits, services used, divorces, arrests, criminal offenses, symptoms, placements, children fostered, children adopted
General Linear Model (GLM) (selected models)
• Continuous DV
– Linear regression
– ANOVA
– t-test
Generalized Linear Model (GZLM) (selected regression models)
• Continuous DV: linear regression
• Dichotomous DV: binary logistic regression
• Polytomous DV: multinomial logistic regression
• Ordinal DV: ordinal logistic regression
• Count DV: Poisson or negative binomial regression
Generalized How?
• DV continuous or discrete
• Normal or non-normal error distributions
• Constant or non-constant variance
• Provides a unifying framework for analyzing an entire class of regression models
GLM & GZLM Similarities
• IVs are combined in a linear fashion ($\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$);
• a slope is estimated for each IV;
• each slope has an accompanying test of statistical significance and confidence interval;
• each slope indicates the IV’s independent contribution to the explanation or prediction of the DV;
GLM & GZLM Similarities (cont’d)
• the sign of each slope indicates the direction of the relationship;
• IVs can be any level of measurement;
• the same methods are used for coding categorical IVs (e.g., dummy coding);
• IVs can be entered simultaneously, sequentially or using other methods;
• product terms can be used to test interactions;
GLM & GZLM Similarities (cont’d)
• powered terms (e.g., the square of an IV) can be used to test curvilinearity;
• overall model fit can be tested, as can incremental improvement in a model brought about by the addition or deletion of IVs (nested models); and
• residuals, leverage values, Cook’s D, and other indices are used to diagnose model problems.
Common Assumptions
• Correct model specification
• Variables measured without error
• Independent errors
• No perfect multicollinearity
Correct Model Specification
• Have you included relevant IVs?
• Have you excluded irrelevant IVs?
• Do the IVs that you have included have linear or non-linear relationships with your DV (or some function of your DV, as discussed below)?
• Are one or more of your IVs moderated by other IVs (i.e., are there interaction effects)?
Variables Measured without Error
• Limitation of regression models, given that most often our variables contain some measurement error
Independent Errors
• Non-independent errors can be a result of study design, e.g.:
– Clustered data, which occur when data are collected from groups
– Temporally linked data, which occur when data are collected repeatedly over time from the same people or groups
• Can lead to incorrect significance tests and confidence intervals
Independent Errors (cont’d)
• Examples of when this might not be true
– Effect of parenting practices on behavioral problems of children, with reports of parenting practices and behavioral problems collected from both parents in two-parent families
– Effect of parenting practices on behavioral problems of children, with information collected about behavioral problems for two or more children per family
– Effects of leader behaviors on group cohesion in small groups, with information collected about leader behaviors and group cohesion from all members of each group
No Perfect Multicollinearity
• Perfect multicollinearity exists when an IV is predicted perfectly by a linear combination of the remaining IVs
• Multicollinearity is typically quantified by “tolerance” or the “variance inflation factor” (VIF = 1/tolerance)
• Even high (not perfect) levels of multicollinearity may pose problems (e.g., tolerance < .20 or especially < .10)
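As a minimal sketch of how tolerance and VIF relate, assuming the usual definition of tolerance as 1 minus the R-squared obtained when one IV is regressed on the remaining IVs. The R-squared values and function names below are hypothetical and are not taken from the slides.

```python
def tolerance(r_squared):
    """Tolerance of an IV: share of its variance NOT explained by the other IVs."""
    return 1.0 - r_squared


def vif(r_squared):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1.0 / tolerance(r_squared)


# Hypothetical R-squared values from regressing each IV on the remaining IVs
for r2 in (0.50, 0.80, 0.90):
    print(f"R2 = {r2:.2f}  tolerance = {tolerance(r2):.2f}  VIF = {vif(r2):.1f}")
```

Under this definition, the tolerance cutoffs of .20 and .10 correspond to VIF values of about 5 and 10.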
Estimating Parameters (e.g., $\beta$)
• GLM: Ordinary Least Squares (OLS) estimation
– Estimates minimize sum of the squared differences between observed and estimated values of the DV
– http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
• GZLM: Maximum Likelihood (ML) estimation
– Estimates have greatest likelihood (i.e., the maximum likelihood) of generating observed sample data if model assumptions are true
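The idea of maximum likelihood can be sketched with a deliberately simple case: a Poisson model with no IVs, where the only parameter is a constant mean. The counts below are hypothetical (not data from the slides), and the grid search stands in for the iterative algorithms that statistical software actually uses.

```python
import math


def poisson_log_likelihood(mu, counts):
    """Log-likelihood of a constant mean `mu` for a list of observed counts."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


# Hypothetical counts (e.g., children adopted); not data from the slides
counts = [0, 1, 1, 2, 0, 3, 1, 0]

# Try a grid of candidate means and keep the one with the greatest likelihood
candidates = [i / 100 for i in range(1, 301)]
best_mu = max(candidates, key=lambda mu: poisson_log_likelihood(mu, counts))

print(best_mu)                    # candidate mean with the highest log-likelihood
print(sum(counts) / len(counts))  # sample mean, for comparison
```

For this intercept-only case the candidate with the greatest likelihood coincides with the sample mean; with IVs in the model there is no such shortcut, which is why ML estimates are found iteratively.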
Testing Hypotheses
• Overall and nested models ($\beta_1 = \beta_2 = \dots = \beta_k = 0$)
– GLM: $F$
– GZLM: likelihood ratio $\chi^2$
• Individual slopes ($\beta = 0$)
– GLM: $t$
– GZLM: Wald $\chi^2$ or likelihood ratio $\chi^2$
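A rough sketch of how the two GZLM test statistics are commonly computed, assuming the standard textbook forms: the Wald chi-square as the squared ratio of a slope to its standard error, and the likelihood ratio chi-square as twice the difference in log-likelihoods between nested models. The standard error and log-likelihoods below are hypothetical; only the slope of .185 echoes the adoption example.

```python
import math


def wald_chi2(b, se):
    """Wald chi-square for one slope: squared ratio of estimate to standard error."""
    return (b / se) ** 2


def likelihood_ratio_chi2(ll_reduced, ll_full):
    """Likelihood ratio chi-square: twice the gain in log-likelihood."""
    return 2.0 * (ll_full - ll_reduced)


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical standard error and log-likelihoods, for illustration only
w = wald_chi2(0.185, 0.060)
lr = likelihood_ratio_chi2(ll_reduced=-380.0, ll_full=-375.2)
print(round(w, 2), round(p_value_df1(w), 4))
print(round(lr, 2), round(p_value_df1(lr), 4))
```

The p-value helper covers only the 1-degree-of-freedom case (a single slope); tests of several slopes at once need the chi-square distribution with more degrees of freedom.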
Estimating DV with GLM
• Equivalent ways of expressing the same thing…
– $\mu = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
– $\mu = \eta$
• Assumed linear relationship
• $\mu$ = Greek letter mu
– Estimated mean value of DV
• $\eta$ = Greek letter eta
– Linear predictor
Estimating DV with Poisson Regression
• $\ln(\mu) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
• $\ln(\mu) = \eta$
• Assumed linear relationship
Single (Quantitative) IV Example
• DV = number of foster children adopted
• IV = Perceived responsibility for parenting (scale scores transformed to z-scores)
• N = 285 foster mothers
• Do foster mothers who feel a greater responsibility to parent foster children adopt more foster children?
Poisson Model
• $\ln(\mu) = \alpha + \beta X$
• Log of estimated mean count = .018 + (.185)(X)
– Log of mean number of children adopted
– Does not have intuitive or substantive meaning
Mathematical Functions
• Function
– $\sqrt{4} = 2$
• Inverse (reverse) function
– $2^2 = 4$
Mathematical Functions (cont’d)
• Function
– $\ln(\mu)$, natural logarithm of $\mu$
– “Link function”
• Inverse (reverse) function
– $\exp(\eta)$, exponential of $\eta$
– $e^x$ on calculator
– exp(x) in SPSS and Excel
– “Inverse link function”
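A small illustration of the function and inverse function idea, using Python's standard `math.log` and `math.exp` in place of the calculator, SPSS, or Excel functions named on the slide. The starting value 1.225 is simply a convenient example on the scale of a mean count.

```python
import math

mean_count = 1.225                 # a value on the scale of the DV
log_mean = math.log(mean_count)    # function: natural logarithm
back_again = math.exp(log_mean)    # inverse function: exponential

print(round(log_mean, 3))          # value on the log scale
print(round(back_again, 3))        # recovers the starting value
```

Applying the exponential to the logged value returns the original number, which is what lets results on the log scale be converted back to counts.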
Link Function
• $\ln(\mu)$, log of estimated mean count
• Connects (i.e., links) mean value of DV to linear combination of IVs
• Transforms relationship between $\mu$ and $\eta$ so relationship is linear
• Different GZLM models use different links
• Does not have intuitive or substantive meaning
Inverse (Reverse) Link Function
• Three ways of expressing the same thing…
– $\mu = \exp(\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)$
– $\mu = \exp(\eta)$
– $\mu = e^{\eta}$
• Represent values of the DV with intuitive and substantive meaning
– e.g., mean number of children adopted
Estimated Mean DV
• $\ln(\mu)$ = .018 + (.185)(X)
• X = 0
– .018 + (.185)(0) = .018
– $e^{.018} = 1.018$
– M = 1.02 children adopted
• X = 1
– .018 + (.185)(1) = .203
– $e^{.203} = 1.225$
– M = 1.23 children adopted
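The same arithmetic can be sketched in a few lines of Python, using the intercept (.018) and slope (.185) from the slide. The function names are illustrative only and do not come from any statistical package.

```python
import math

ALPHA = 0.018   # intercept reported on the slide
BETA = 0.185    # slope for standardized parenting responsibility


def linear_predictor(x):
    """Log of the estimated mean count for a given z-score x."""
    return ALPHA + BETA * x


def estimated_mean(x):
    """Estimated mean count: inverse link (exp) applied to the linear predictor."""
    return math.exp(linear_predictor(x))


# z-scores from -3 to 3: linear predictor and estimated mean, rounded to 2 places
for x in range(-3, 4):
    print(x, round(linear_predictor(x), 2), round(estimated_mean(x), 2))
```

For X = 0 and X = 1 this reproduces the means of 1.02 and 1.23 children adopted worked out above.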
Examples of Exponentiation
• $e^{0} = 1.00$
• $e^{.50} = 1.65$
• $e^{1.00} = 2.72$
Problem
• For discrete DVs the relationship between the DV ($\mu$) and the linear predictor ($\eta$) is non-linear
– $\eta = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
– Relationship between $\mu$ and $\eta$: non-linear
• One-unit increase in an IV may be associated with a different amount of change in the mean DV, depending on the initial value of the IV
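To make this concrete, here is a small sketch that reuses the coefficients from the adoption example (.018 and .185) and compares the effect of a one-unit increase in the IV at three different starting values. The helper name is illustrative.

```python
import math

ALPHA, BETA = 0.018, 0.185   # coefficients from the adoption example


def estimated_mean(x):
    """Estimated mean count for a given z-score x (Poisson model, log link)."""
    return math.exp(ALPHA + BETA * x)


for x in (-3, 0, 2):
    change = estimated_mean(x + 1) - estimated_mean(x)   # additive change in the mean
    ratio = estimated_mean(x + 1) / estimated_mean(x)    # multiplicative change
    print(x, round(change, 3), round(ratio, 3))
```

The additive change grows as the starting value of the IV increases, while the ratio stays fixed at exp(.185); this is one way to see why the relationship is non-linear on the scale of the mean yet linear on the log scale.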
Example Non-linear Relationship
[Figure: mean number of children adopted (y-axis) by standardized parenting responsibility (x-axis)]
Standardized Parenting Responsibility: -3, -2, -1, 0, 1, 2, 3
Mean Number of Children: 0.58, 0.70, 0.85, 1.02, 1.23, 1.47, 1.77
Solution
Linear relationship between a linear combination of one or more IVs and some function of the DV
Example Linear Relationship
[Figure: ln(mean number of children adopted) (y-axis) by standardized parenting responsibility (x-axis)]
Standardized Parenting Responsibility: -3, -2, -1, 0, 1, 2, 3
ln(Mean Number of Children): -0.54, -0.35, -0.17, 0.02, 0.20, 0.39, 0.57