annotated stata output_ multinomial logistic regression.pdf

Stata Annotated Output: Multinomial Logistic Regression

This page shows an example of a multinomial logistic regression analysis with footnotes explaining the output. The data were collected on 200 high school students and are scores on various tests, including science, math, reading and social studies. The outcome measure in this analysis is socioeconomic status (ses), with levels low, middle and high, and we want to see how it relates to science test scores (science), social science test scores (socst) and gender (female). Our response variable, ses, is treated as categorical under the assumption that the levels of ses have no natural ordering, and we allow Stata to choose the referent group, middle ses. The first half of this page interprets the coefficients in terms of multinomial log-odds (logits) and the second half interprets the coefficients in terms of relative risk ratios.

    use http://www.ats.ucla.edu/stat/data/hsb2, clear
    mlogit ses science socst female

    Iteration 0:   log likelihood = -210.58254
    Iteration 1:   log likelihood = -194.75041
    Iteration 2:   log likelihood = -194.03782
    Iteration 3:   log likelihood = -194.03485
    Iteration 4:   log likelihood = -194.03485

    Multinomial logistic regression                   Number of obs   =        200
                                                      LR chi2(6)      =      33.10
                                                      Prob > chi2     =     0.0000
    Log likelihood = -194.03485                       Pseudo R2       =     0.0786

    ------------------------------------------------------------------------------
             ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    low          |
         science |  -.0235647   .0209747    -1.12   0.261    -.0646744     .017545
           socst |  -.0389243   .0195165    -1.99   0.046    -.0771759   -.0006726
          female |   .8166202   .3909813     2.09   0.037      .050311    1.582929
           _cons |   1.912256   1.127256     1.70   0.090    -.2971258    4.121638
    -------------+----------------------------------------------------------------
    high         |
         science |    .022922   .0208718     1.10   0.272    -.0179861    .0638301
           socst |   .0430036   .0198894     2.16   0.031     .0040211     .081986
          female |   -.032862   .3500153    -0.09   0.925    -.7188793    .6531553
           _cons |  -4.057323   1.222939    -3.32   0.001     -6.45424   -1.660407
    ------------------------------------------------------------------------------
    (ses==middle is the base outcome)

Iteration Log (a)

    Iteration 0:   log likelihood = -210.58254
    Iteration 1:   log likelihood = -194.75041
    Iteration 2:   log likelihood = -194.03782
    Iteration 3:   log likelihood = -194.03485
    Iteration 4:   log likelihood = -194.03485

a. This is a listing of the log likelihoods at each iteration. Remember that multinomial logistic regression, like binary and ordered logistic regression, uses maximum likelihood estimation, which is an iterative procedure. The first iteration (called iteration 0) is the log likelihood of the "null" or "empty" model; that is, a model with no predictors. At the next iteration, the predictor(s) are included in the model. At each iteration, the log likelihood increases because the goal is to maximize the log likelihood. When the difference between successive iterations is very small, the model is said to have "converged", the iterating stops, and the results are displayed. For more information on this process for binary outcomes, see Regression Models for Categorical and Limited Dependent Variables by J. Scott Long (pages 52-61).

Model Summary

    Multinomial logistic regression                   Number of obs(c) =        200
                                                      LR chi2(6)(d)    =      33.10
                                                      Prob > chi2(e)   =     0.0000
    Log likelihood = -194.03485(b)                    Pseudo R2(f)     =     0.0786

b. Log Likelihood: This is the log likelihood of the fitted model. It is used in the Likelihood Ratio Chi-Square test of whether all predictors' regression coefficients in the model are simultaneously zero and in tests of nested models.

c. Number of obs: This is the number of observations used in the multinomial logistic regression. It may be less than the number of cases in the dataset if there are missing values for some variables in the equation. By default, Stata does a listwise deletion of incomplete cases.
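As a quick check on the iteration log in note a, the following sketch takes the five printed log likelihoods and looks at how they change from one iteration to the next. Python is used here only as a calculator; it is not part of the original Stata session, and the variable names are illustrative.

```python
# Log likelihoods copied from the iteration log (iterations 0 through 4).
log_likelihoods = [-210.58254, -194.75041, -194.03782, -194.03485, -194.03485]

# Change in log likelihood from each iteration to the next.
steps = [later - earlier
         for earlier, later in zip(log_likelihoods, log_likelihoods[1:])]

for i, step in enumerate(steps, start=1):
    print(f"iteration {i - 1} -> {i}: change = {step:.5f}")

# Maximum likelihood estimation climbs: no step lowers the log likelihood,
# and the steps shrink as the estimates settle down.
never_decreases = all(step >= 0 for step in steps)
steps_shrink = all(b <= a for a, b in zip(steps, steps[1:]))
print(never_decreases, steps_shrink)
```

The log likelihood moves from -210.58 up toward -194.03 (closer to zero), and the last step is zero to the printed precision, which is the point at which the model is said to have converged.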



5/17/2015 Annotated Stata Output: Multinomial Logistic Regression

http://www.ats.ucla.edu/stat/stata/output/stata_mlogit_output.htm



d. LR chi2(6): This is the Likelihood Ratio (LR) Chi-Square test that for both equations (low ses relative to middle ses and high ses relative to middle ses) at least one of the predictors' regression coefficients is not equal to zero. The number in the parentheses indicates the degrees of freedom of the Chi-Square distribution used to test the LR Chi-Square statistic and is defined by the number of models estimated (2) times the number of predictors in the model (3). The LR Chi-Square statistic can be calculated by -2*(L(null model) - L(fitted model)) = -2*((-210.583) - (-194.035)) = 33.096, where L(null model) is the log likelihood with just the response variable in the model (Iteration 0) and L(fitted model) is the log likelihood from the final iteration (assuming the model converged) with all the parameters.
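The arithmetic in note d, together with the tail probability that Stata reports as Prob > chi2, can be reproduced with a short sketch. This is Python used as a calculator rather than Stata code; the helper chi2_upper_tail is a hypothetical name, and it relies on a closed form for the chi-square upper tail that holds only for even degrees of freedom, which happens to cover the 6 degrees of freedom here.

```python
import math

ll_null = -210.58254    # Iteration 0: model with no predictors
ll_fitted = -194.03485  # final iteration: model with all predictors

# LR statistic: -2 * (L(null model) - L(fitted model)).
lr_chi2 = -2 * (ll_null - ll_fitted)

# Degrees of freedom: equations estimated (2) times predictors (3).
n_equations, n_predictors = 2, 3
df = n_equations * n_predictors


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


p_value = chi2_upper_tail(lr_chi2, df)
print(f"LR chi2({df}) = {lr_chi2:.2f}, Prob > chi2 = {p_value:.4f}")
```

Rounded the way Stata prints them, these reproduce the header values LR chi2(6) = 33.10 and Prob > chi2 = 0.0000.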

e. Prob > chi2: This is the probability of getting an LR test statistic as extreme as, or more extreme than, the observed statistic under the null hypothesis; the null hypothesis is that all of the regression coefficients across both models are simultaneously equal to zero. In other words, this is the probability of obtaining this chi-square statistic (33.10) if there is in fact no effect of the predictor variables. This p-value is compared to a specified alpha level, our willingness to accept a type I error, which is typically set at 0.05 or 0.01. The small p-value from the LR test, [...]

[...] |z|. The interpretation of the parameter estimates' significance is limited only to the first equation, low ses relative to middle ses. The interpretation for the second model, high ses relative to middle ses, naturally falls out of the first equation's interpretation.


For low ses relative to middle ses, the z test statistic for the predictor science (-0.024/0.021) is -1.12 with an associated p-value of 0.261. If we set our alpha level to 0.05, we would fail to reject the null hypothesis and conclude that for low ses relative to middle ses, the regression coefficient for science has not been found to be statistically different from zero given socst and female are in the model.

For low ses relative to middle ses, the z test statistic for the predictor socst (-0.039/0.020) is -1.99 with an associated p-value of 0.046. If we again set our alpha level to 0.05, we would reject the null hypothesis and conclude that the regression coefficient for socst has been found to be statistically different from zero for low ses relative to middle ses given that science and female are in the model.

For low ses relative to middle ses, the z test statistic for the predictor female (0.817/0.391) is 2.09 with an associated p-value of 0.037. If we again set our alpha level to 0.05, we would reject the null hypothesis and conclude that the difference between males and females has been found to be statistically significant for low ses relative to middle ses given that science and socst are in the model.

For low ses relative to middle ses, the z test statistic for the intercept, _cons (1.912/1.127) is 1.70 with an associated p-value of 0.090. With an alpha level of 0.05, we would fail to reject the null hypothesis and conclude either a) that the multinomial logit for males (the variable female evaluated at zero) with zero science and socst test scores in low ses relative to middle ses is not statistically different from zero, or b) that for males with zero science and socst test scores, we are statistically uncertain whether they are more likely to be classified as low ses or middle ses. We can make the second interpretation when we view the _cons as a specific covariate profile (males with zero science and socst test scores). Based on the direction and significance of the coefficient, the _cons indicates whether the profile would have a greater propensity to fall in one of the levels of the dependent variable.
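The four z tests above can be recomputed from the full-precision coefficients and standard errors in the first table. The sketch below is Python used as a calculator, not Stata code; z_and_p is a hypothetical helper name, and the two-sided p-value comes from the standard normal distribution via math.erfc.

```python
import math

# (coefficient, standard error) for low ses relative to middle ses,
# copied from the coefficient table.
low_vs_middle = {
    "science": (-0.0235647, 0.0209747),
    "socst": (-0.0389243, 0.0195165),
    "female": (0.8166202, 0.3909813),
    "_cons": (1.912256, 1.127256),
}


def z_and_p(coef, std_err):
    """Return the z statistic and its two-sided standard normal p-value."""
    z = coef / std_err
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


results = {name: z_and_p(*pair) for name, pair in low_vs_middle.items()}

alpha = 0.05
for name, (z, p) in results.items():
    decision = "reject" if p < alpha else "fail to reject"
    print(f"{name:8s} z = {z:6.2f}  p = {p:.3f}  -> {decision} H0")
```

With alpha set to 0.05, socst and female lead to rejecting the null hypothesis while science and _cons do not, matching the conclusions in the text.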

l. [95% Conf. Interval]: This is the Confidence Interval (CI) for an individual multinomial logit regression coefficient, given the other predictors are in the model, for outcome m relative to the referent group. For a given predictor with a level of 95% confidence, we'd say that we are 95% confident that the "true" population multinomial logit regression coefficient lies between the lower and upper limit of the interval for outcome m relative to the referent group. It is calculated as Coef. ± (z_{α/2})*(Std. Err.), where z_{α/2} is a critical value on the standard normal distribution. The CI is equivalent to the z test statistic: if the CI includes zero, we'd fail to reject the null hypothesis that a particular regression coefficient is zero given the other predictors are in the model. An advantage of a CI is that it is illustrative; it provides a range where the "true" parameter may lie.
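A minimal sketch of the interval calculation in note l follows, again in Python rather than Stata. The function name wald_ci is hypothetical; the critical value is taken from the standard library's NormalDist, and the inputs are the low ses rows for female and science from the first table.

```python
from statistics import NormalDist


def wald_ci(coef, std_err, level=0.95):
    """Coef. +/- z_{alpha/2} * Std. Err. on the multinomial logit scale."""
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    half_width = z_crit * std_err
    return coef - half_width, coef + half_width


low_female = wald_ci(0.8166202, 0.3909813)
low_science = wald_ci(-0.0235647, 0.0209747)

for name, (lower, upper) in [("female", low_female), ("science", low_science)]:
    includes_zero = lower <= 0 <= upper
    print(f"{name:8s} [{lower:.6f}, {upper:.6f}]  includes zero: {includes_zero}")
```

The interval for female excludes zero and the interval for science includes it, which agrees with the z tests for the same two coefficients.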

Relative Risk Ratio Interpretation

The following is the interpretation of the multinomial logistic regression in terms of relative risk ratios, which can be obtained by mlogit, rrr after running the multinomial logit model or by specifying the rrr option when the full model is specified. This part of the interpretation applies to the output below.

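    mlogit ses science socst female, rrr

    Iteration 0:   log likelihood = -210.58254
    Iteration 1:   log likelihood = -194.75041
    Iteration 2:   log likelihood = -194.03782
    Iteration 3:   log likelihood = -194.03485
    Iteration 4:   log likelihood = -194.03485

    Multinomial logistic regression                   Number of obs   =        200
                                                      LR chi2(6)      =      33.10
                                                      Prob > chi2     =     0.0000
    Log likelihood = -194.03485                       Pseudo R2       =     0.0786

    ------------------------------------------------------------------------------
             ses |     RRR(a)   Std. Err.      z    P>|z|   [95% Conf. Interval](b)
    -------------+----------------------------------------------------------------
    low          |
         science |   .9767108   .0204862    -1.12   0.261     .9373726      1.0177
           socst |   .9618236   .0187714    -1.99   0.046      .925727    .9993276
          female |   2.262839   .8847276     2.09   0.037     1.051598    4.869199
    -------------+----------------------------------------------------------------
    high         |
         science |   1.023187   .0213558     1.10   0.272     .9821747    1.065911
           socst |   1.043942   .0207633     2.16   0.031     1.004029    1.085441
          female |   .9676721      .3387    -0.09   0.925     .4872981    1.921595
    ------------------------------------------------------------------------------
    (ses==middle is the base outcome)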

a. Relative Risk Ratio: These are the relative risk ratios for the multinomial logit model shown earlier. They can be obtained by exponentiating the multinomial logit coefficients, e^coef., or by specifying the rrr option. Recall that the multinomial logit model estimates k-1 models, where the kth equation is relative to the referent group. Suppose the model were written out in exponentiated form with the predictor of interest evaluated at x + δ and at x for outcome m relative to the referent group, where δ is the change in the predictor we are interested in (δ is traditionally set to one) while the other variables in the model are held constant. If we then take their ratio, the ratio would reduce to the ratio of two probabilities, the relative risk. In this sense, the exponentiated multinomial logit coefficient provides an estimate of relative risk. However, the exponentiated coefficients are commonly interpreted as odds ratios. The standard interpretation of the relative risk ratios is that, for a unit change in the predictor variable, the relative risk of outcome m relative to the referent group is expected to change by a factor of the respective parameter estimate, given the other variables in the model are held constant.
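The ratio argument in note a can be illustrated numerically. The sketch below, in Python rather than Stata, turns the multinomial logit coefficients from the first table into predicted probabilities, then compares the relative risk of low versus middle ses at x + δ and at x with δ = 1. The covariate profile (a male student with science and socst scores of 50) is a hypothetical example chosen for illustration, and the function names are not from the original page.

```python
import math

# Multinomial logit coefficients from the first table. Middle ses is the base
# outcome, so its linear predictor is fixed at zero.
coefs = {
    "low": {"science": -0.0235647, "socst": -0.0389243,
            "female": 0.8166202, "_cons": 1.912256},
    "high": {"science": 0.022922, "socst": 0.0430036,
             "female": -0.032862, "_cons": -4.057323},
}


def probabilities(science, socst, female):
    """Predicted probability of each ses level for one covariate profile."""
    xb = {"middle": 0.0}
    for outcome, b in coefs.items():
        xb[outcome] = (b["_cons"] + b["science"] * science
                       + b["socst"] * socst + b["female"] * female)
    denom = sum(math.exp(v) for v in xb.values())
    return {outcome: math.exp(v) / denom for outcome, v in xb.items()}


def relative_risk(probs, outcome, base="middle"):
    """Probability of the outcome relative to the base outcome."""
    return probs[outcome] / probs[base]


# Hypothetical profile: male, science = 50, socst = 50, then science + 1.
at_x = probabilities(science=50, socst=50, female=0)
at_x_plus_1 = probabilities(science=51, socst=50, female=0)

ratio = relative_risk(at_x_plus_1, "low") / relative_risk(at_x, "low")
rrr = math.exp(coefs["low"]["science"])
print(f"ratio of relative risks = {ratio:.7f}, exp(coef) = {rrr:.7f}")
```

The ratio does not depend on the profile chosen: the intercept and the other predictors cancel, leaving only the exponentiated coefficient for science.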

low ses relative to middle ses

science: This is the relative risk ratio for a one unit increase in science score for low ses relative to middle ses level given that the other variables in the model are held constant. If a subject were to increase their science test score by one unit, the relative risk for low ses relative to middle ses would be expected to decrease by a factor of 0.977 given the other variables in the model are held constant. In other words, given a one unit increase in science, the relative risk of being in the low ses group rather than the middle ses group would be multiplied by 0.977 when the other variables in the model are held constant. More generally, we can say that if a subject were to increase their science test score, they would be expected to be more likely to fall into middle ses than into low ses.

socst: This is the relative risk ratio for a one unit increase in socst score for low ses relative to middle ses level given that the other variables in the model are held constant. If a subject were to increase their socst test score by one unit, the relative risk for low ses relative to middle ses would be expected to decrease by a factor of 0.962 given the other variables in the model are held constant.

female: This is the relative risk ratio comparing females to males for low ses relative to middle ses level given that the other variables in the model are held constant. For females relative to males, the relative risk for low ses relative to middle ses would be expected to increase by a factor of 2.263 given the other variables in the model are held constant.

high ses relative to middle ses

science: This is the relative risk ratio for a one unit increase in science score for high ses relative to middle ses level given that the other variables in the model are held constant. If a subject were to increase their science test score by one unit, the relative risk for high ses relative to middle ses would be expected to increase by a factor of 1.023 given the other variables in the model are held constant.

socst: This is the relative risk ratio for a one unit increase in socst score for high ses relative to middle ses level given that the other variables in the model are held constant. If a subject were to increase their socst test score by one unit, the relative risk for high ses relative to middle ses would be expected to increase by a factor of 1.043 given the other variables in the model are held constant.

female: This is the relative risk ratio comparing females to males for high ses relative to middle ses level given that the other variables in the model are held constant. For females relative to males, the relative risk for high ses relative to middle ses would be expected to decrease by a factor of 0.968 given the other variables in the model are held constant.

b. [95% Conf. Interval]: This is the CI for the relative risk ratio given the other predictors are in the model. For a given predictor with a level of 95% confidence, we'd say that we are 95% confident that the "true" population relative risk ratio comparing outcome m to the referent group lies between the lower and upper limit of the interval. An advantage of a CI is that it is illustrative; it provides a range where the "true" relative risk ratio may lie.
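The numbers in the relative risk ratio table can be tied back to the logit table with a short sketch, using the female row for low ses relative to middle ses. This is Python used as a calculator, not Stata code. The RRR and its interval limits are the exponentiated logit values; the standard error line uses RRR times the logit standard error, a delta-method approximation that is assumed here because it reproduces the printed value, not because the page states it.

```python
import math

# Logit-scale results for female, low ses relative to middle ses.
coef, std_err = 0.8166202, 0.3909813
ci_lower, ci_upper = 0.050311, 1.582929

rrr = math.exp(coef)
rrr_ci = (math.exp(ci_lower), math.exp(ci_upper))

# Assumption: delta-method standard error, RRR * (logit standard error).
rrr_std_err = rrr * std_err

print(f"RRR = {rrr:.6f}, Std. Err. = {rrr_std_err:.7f}")
print(f"95% CI = [{rrr_ci[0]:.6f}, {rrr_ci[1]:.6f}]")
```

Because the limits are exponentiated, the interval is not symmetric around the RRR, and the no-effect value on this scale is 1 (the exponential of zero) rather than 0.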
